Spiderplot documentation

Spiderplot

The Spiderplot utility produces plots showing haplotype structure around a variant (typically a CNV), similar to Figure 6 in DOI 10.1038/ng.3200.

This is a new version of the Spiderplot code that is much faster and more scalable. It does not currently support using the -population argument to produce multiple plots (one per population) in a single run. To plot multiple populations, run Spiderplot once per population.
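
Running one population at a time can be scripted with a simple loop. The following is a minimal sketch, not a tested recipe: the population labels, the map file name (sample_populations.map) and the output file names are hypothetical, and it assumes that passing a single value to -population together with -populationMapFile restricts the plot to that population.

```shell
#!/bin/sh
# Hypothetical inputs: replace these with your own files and identifiers.
VCF=phased_vcf_file.vcf.gz
POP_MAP=sample_populations.map   # assumed sample-to-population map file
SITE=cnvSiteID

# Hypothetical population labels; they must match the labels in the map file.
for pop in CEU YRI CHB; do
    out="spiderplot_${pop}.pdf"
    # Dry run: "echo" prints each command instead of executing it.
    # Remove "echo" to actually run Spiderplot.
    echo java -Xmx4g -cp SVToolkit.jar:GenomeAnalysisTK.jar \
        org.broadinstitute.sv.apps.Spiderplot \
        -R reference.fasta \
        -vcf "$VCF" \
        -site "$SITE" \
        -populationMapFile "$POP_MAP" \
        -population "$pop" \
        -O "$out"
done
```

Writing each plot to its own output file keeps the per-population results from overwriting each other.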

Input

The primary input is a phased VCF file that should contain the variant of interest along with phased markers flanking the variant.

The optional -plotGroupFile input allows you to specify additional decorations that are added to the plot.
The plot decoration group file should be a tab-delimited file with a header line and the following columns:

LABEL
The legend label for the group.
TYPE
Must be either POINT or BAR.
COLOR
Any valid R color.
SAMPLE
A sample or haplotype identifier/group, which can be the sample ID, a hap ID (e.g. SAMPLE-1) or an allele label.
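
As an illustration of the column layout described above, the sketch below writes a small plot group file and checks that every line has four tab-separated fields. The file name, labels, colors and sample identifiers are all hypothetical, and the use of one row per sample or haplotype is an assumption that the description above does not state explicitly.

```shell
#!/bin/sh
# Write a hypothetical plot group file; columns are separated by tabs.
# Labels, colors, and sample identifiers below are illustrative only.
GROUP_FILE=plot_groups.txt
{
    printf 'LABEL\tTYPE\tCOLOR\tSAMPLE\n'
    printf 'Cases\tPOINT\tred\tSAMPLE1\n'
    printf 'Cases\tPOINT\tred\tSAMPLE2\n'
    printf 'Validated\tBAR\tblue\tSAMPLE3-1\n'
} > "$GROUP_FILE"

# Sanity check: report any line that does not have exactly four fields.
awk -F '\t' 'NF != 4 { printf "line %d has %d fields\n", NR, NF; bad = 1 }
             END { exit bad }' "$GROUP_FILE"
```

The resulting file would then be passed to Spiderplot with -plotGroupFile plot_groups.txt.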

Fine control over plotting can be achieved by overriding the default plotting script (spiderplot.R) using the -plottingScript option.
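
A custom script might be supplied as in the sketch below. The file names my_spiderplot.R, spiderplot_haps.txt and spiderplot_tree.txt are hypothetical. Setting -hapFile and -hapTreeFile is optional here; it is shown on the assumption that keeping the generated input files at known locations makes it easier to develop and test a modified script.

```shell
#!/bin/sh
# Hypothetical file names: a user-edited copy of the plotting script, plus
# locations for the generated inputs that the script reads.
SCRIPT=my_spiderplot.R
HAP_FILE=spiderplot_haps.txt
TREE_FILE=spiderplot_tree.txt

# Warn early if the custom script is missing (the dry run below still prints).
[ -f "$SCRIPT" ] || echo "warning: $SCRIPT not found" >&2

# Dry run: "echo" prints the command instead of executing it.
# Remove "echo" to actually run Spiderplot.
echo java -Xmx4g -cp SVToolkit.jar:GenomeAnalysisTK.jar \
    org.broadinstitute.sv.apps.Spiderplot \
    -R reference.fasta \
    -vcf phased_vcf_file.vcf.gz \
    -site cnvSiteID \
    -plottingScript "$SCRIPT" \
    -hapFile "$HAP_FILE" \
    -hapTreeFile "$TREE_FILE" \
    -O output_plot.pdf
```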

Example

 java -Xmx4g -cp SVToolkit.jar:GenomeAnalysisTK.jar \
     org.broadinstitute.sv.apps.Spiderplot \
     -R reference.fasta \
     -vcf phased_vcf_file.vcf.gz \
     -O output_plot.pdf \
     -site cnvSiteID \
     -flankWidth 50000 \
     -alleleFrequencyThreshold 0.01

Spiderplot-specific arguments

Name                        Type          Default value  Summary

Required Inputs
-vcf                        File          NA             Input vcf file containing phased haplotypes

Required Parameters
-R                          File          NA             Reference file (indexed fasta file)

Optional Outputs
-log                        String        NA             Set the logging location
-O                          File          NA             Output file (pdf)

Optional Parameters
-alleleCountThreshold       Integer       NA             Minimum minor allele count for plotted markers
-alleleFrequencyThreshold   Double        NA             Minimum minor allele frequency for plotted markers
-alleleLabelMapFile         List[File]    NA             Map file or files containing a mapping from VCF alleles to allele labels
-colorMapFile               File          NA             Tab-delimited file mapping alleles to colors (any color value recognized by R)
-flankMarkerCount           Integer       NA             Size in markers of flanks to plot (default no limit)
-flankWidth                 Integer       100000         Size in base pairs of flanks to plot (default 100,000)
-hapIdMapFile               List[File]    NA             Map file or files containing alternate haplotype IDs to use
-hapLabelMapFile            List[File]    NA             Map file or files containing the allele to assign to each haplotype
-L                          String        NA             Specific interval to plot (overrides flankWidth)
-l                          String        INFO           Set the minimum level of logging
-plotGroupFile              File          NA             Tab-delimited file describing plot decorations on haplotypes
-plotHapIds                 String        NA             Whether to plot haplotype IDs (e.g. sample-1, sample-2) on the plot (boolean, default false)
-plotHeight                 String        NA             Plot height in inches or "auto" to auto scale for large plots (default 8)
-plotTitle                  String        NA             Plot title
-plotWidth                  String        NA             Plot width in inches (default 10.5)
-population                 List[String]  NA             Population(s) or .list file of populations
-populationMapFile          List[File]    NA             Map file or files containing sample to population assignments
-sample                     List[String]  NA             Sample or samples to plot (or .list file)
-site                       String        NA             Site ID to plot
-siteInterval               String        NA             Explicitly set the start/end position for the target site
-verbose                    String        NA             Enable extra progress output

Optional Flags
-h                          Flag          NA             Generate the help message
-version                    Flag          NA             Output version information

Advanced Parameters
-debug                      String        NA             Enable verbose debugging output
-hapFile                    File          NA             Location of generated text file with input for plotting script
-hapTreeFile                File          NA             Location of generated tree file with input for plotting script
-P                          List[String]  NA             Override individual configuration parameters
-plottingScript             String        NA             Custom plotting script to use (instead of default script)

Argument details

--alleleCountThreshold / -alleleCountThreshold ( Integer )

Minimum minor allele count for plotted markers.

--alleleFrequencyThreshold / -alleleFrequencyThreshold ( Double )

Minimum minor allele frequency for plotted markers.

--alleleLabelMapFile / -alleleLabelMapFile ( List[File] )

Map file or files containing a mapping from VCF alleles to allele labels.

--colorMapFile / -colorMapFile ( File )

Tab-delimited file mapping alleles to colors (any color value recognized by R).

--debug / -debug ( String )

Enable verbose debugging output.

--flankMarkerCount / -flankMarkerCount ( Integer )

Size in markers of flanks to plot (default no limit).

--flankWidth / -flankWidth ( Integer with default value 100000 )

Size in base pairs of flanks to plot (default 100,000).

--hapFile / -hapFile ( File )

Location of generated text file with input for plotting script.

--hapIdMapFile / -hapIdMapFile ( List[File] )

Map file or files containing alternate haplotype IDs to use.

--hapLabelMapFile / -hapLabelMapFile ( List[File] )

Map file or files containing the allele to assign to each haplotype.

--hapTreeFile / -hapTreeFile ( File )

Location of generated tree file with input for plotting script.

--help / -h ( Flag )

Generate the help message.

--interval / -L ( String )

Specific interval to plot (overrides flankWidth).

--log_to_file / -log ( String )

Set the logging location.

--logging_level / -l ( String with default value INFO )

Set the minimum level of logging.

--outputFile / -O ( File )

Output file (pdf).

--parameter / -P ( List[String] )

Override individual configuration parameters.

--plotGroupFile / -plotGroupFile ( File )

Tab-delimited file describing plot decorations on haplotypes.

--plotHapIds / -plotHapIds ( String )

Whether to plot haplotype IDs (e.g. sample-1, sample-2) on the plot (boolean, default false).

--plotHeight / -plotHeight ( String )

Plot height in inches or "auto" to auto scale for large plots (default 8).

--plottingScript / -plottingScript ( String )

Custom plotting script to use (instead of default script).

--plotTitle / -plotTitle ( String )

Plot title.

--plotWidth / -plotWidth ( String )

Plot width in inches (default 10.5).

--population / -population ( List[String] )

Population(s) or .list file of populations.

--populationMapFile / -populationMapFile ( List[File] )

Map file or files containing sample to population assignments.

--referenceFile / -R ( required File )

Reference file (indexed fasta file).

--sample / -sample ( List[String] )

Sample or samples to plot (or .list file).

--site / -site ( String )

Site ID to plot.

--siteInterval / -siteInterval ( String )

Explicitly set the start/end position for the target site.

--vcfFile / -vcf ( required File )

Input vcf file containing phased haplotypes.

--verbose / -verbose ( String )

Enable extra progress output.

--version / -version ( Flag )

Output version information.