SVAnnotator documentation

SVAnnotator

Annotation engine and framework for SV annotation.

SVAnnotator is a general framework for generating annotations of different kinds on VCF files containing structural variant records.

SVAnnotator is modeled after the GATK VariantAnnotator walker, but works somewhat differently and is tailored for use with structural variation. Each annotator acts as a plug-in, and one or more annotators can be run over the same file in a single invocation.

Each annotator can add annotations (typically INFO fields) to the input VCF file, generate a report file (a tab-delimited text file), generate one or more summary reports (also generally tab-delimited text files), or do any combination of these. Not every annotator supports all of these output modes; the documentation for each individual annotator specifies which output formats it supports.
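Because report files are described as tab-delimited text, they can be inspected with standard shell tools. The following is a minimal sketch only: the report contents and the column names (ID, EXAMPLE_VALUE) are hypothetical stand-ins, since the actual columns depend on the annotator that wrote the report.

```shell
# Build a small stand-in report. The column names and values here are
# hypothetical; real reports have annotator-specific columns.
report=$(mktemp)
printf 'ID\tEXAMPLE_VALUE\n' > "$report"
printf 'DEL_1\t0.41\n' >> "$report"
printf 'DEL_2\t0.38\n' >> "$report"

# List the column names from the header line, one per line, with positions.
header_columns=$(head -n 1 "$report" | tr '\t' '\n' | awk '{ print NR ": " $0 }')
echo "$header_columns"

# Count the data rows (every line after the header).
row_count=$(tail -n +2 "$report" | wc -l | tr -d '[:space:]')
echo "rows: $row_count"

rm -f "$report"
```

Listing the header first is a quick way to confirm which columns a given annotator produced before writing any downstream parsing.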

Common Framework Arguments

The SVAnnotator framework supports a number of generic command line arguments that are documented below. Each individual annotator may support additional arguments.

Where possible, annotators will try to use the same interpretation for the same arguments. As an example, the -sample argument is supported by a number of annotators to specify a (restricted) set of samples to use in the analysis. Following standard SVToolkit conventions, the -sample argument may be present multiple times and may specify either individual samples or a .list file (a text file with extension .list) that contains a list of sample identifiers, or a combination of both. The net effect will be the union of all of the specified samples.
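The union behavior described above can be illustrated with a short shell sketch. It only mimics the documented semantics and is not the toolkit's implementation; the function name expand_sample_args, the file trio.list, and the sample identifiers are assumptions made for the example. It corresponds to a command line containing -sample NA12878 -sample trio.list -sample NA12892.

```shell
# Hypothetical inputs: two individual sample IDs and one .list file.
workdir=$(mktemp -d)
printf 'NA12878\nNA12891\n' > "$workdir/trio.list"

# Expand each value as the text describes: a value ending in .list is read
# as a file of sample identifiers, anything else is taken as a sample ID.
expand_sample_args() {
    for value in "$@"; do
        case "$value" in
            *.list) cat "$value" ;;
            *)      printf '%s\n' "$value" ;;
        esac
    done | sort -u   # union: each sample is kept once
}

samples=$(expand_sample_args NA12878 "$workdir/trio.list" NA12892)
echo "$samples"

rm -r "$workdir"
```

NA12878 is given both directly and through the list file, yet it appears only once in the result, which is the net effect the text describes.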

The only required arguments are a reference sequence (-R) and an input VCF file (-vcf). The set of annotators to run is specified with -A. Individual annotators may require additional arguments specific to that annotator.
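As a sketch of how these arguments fit together, the following shell fragment assembles a command line from the two required arguments plus one -A per annotator, and prints it for review instead of running it. The class path, main class, and annotator names are taken from the example in this document; the file names are placeholders, and whether a given annotator needs further arguments is not covered here.

```shell
# Annotator names from the documented example; file names are placeholders.
annotators="GCContent NonVariant"
reference="human_g1k_v37.fasta"
input_vcf="input.vcf"

# Start with the Java launcher and main class, then the required arguments.
set -- java -Xmx4g -cp SVToolkit.jar:GenomeAnalysisTK.jar \
    org.broadinstitute.sv.main.SVAnnotator \
    -R "$reference" \
    -vcf "$input_vcf"

# Append one -A option per annotator.
for name in $annotators; do
    set -- "$@" -A "$name"
done

# Print the assembled command for review rather than executing it.
command_line="$*"
echo "$command_line"
```

Replacing the final echo with "$@" would execute the assembled command, assuming both jar files are present in the working directory.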

Examples

Run several annotators, generating an annotated VCF file and also report and summary files (for annotators that support them).

 # -comparisonFile is required by the Redundancy annotator.
 java -Xmx4g -cp SVToolkit.jar:GenomeAnalysisTK.jar \
     org.broadinstitute.sv.main.SVAnnotator \
     -A GCContent \
     -A VariantsPerSample \
     -A NonVariant \
     -A Redundancy \
     -R human_g1k_v37.fasta \
     -vcf input.vcf \
     -O output.vcf \
     -comparisonFile input.vcf \
     -writeReport true \
     -writeSummary true \
     -reportDirectory outputDir

SVAnnotator-specific arguments

Name               Type          Default value  Summary
Required Parameters
-R                 File          NA             Reference sequence (indexed fasta file)
Optional Inputs
-vcf               File          NA             Input variant file (VCF format)
Optional Outputs
-O                 File          NA             VCF file to which annotated variants should be written
Optional Parameters
-A                 List[String]  NA             One or more specific annotations to apply to variant calls
-configFile        List[File]    NA             Configuration parameter file(s)
-P                 List[String]  NA             Override individual configuration parameters (for advanced use only)
-reportDirectory   File          NA             Directory for reports (default is current directory)
-reportFile        File          NA             File name for report (default is based on the annotator name)
-reportFileMap     List[String]  NA             Map of file name for reports, overrides reportFile argument when multiple annotators are run
-scattering        String        NA             True if the annotation is being scattered by interval (default false)
-summaryFile       File          NA             File name for summary (default is based on the annotator name)
-summaryFileMap    List[String]  NA             Map of file name for summaries, overrides summaryFile argument when multiple annotators are run
-tempDir           File          NA             Directory for temporary files
-useGATKTraversal  String        NA             Use normal GATK walker traversal (default false)
-writeReport       String        NA             Whether to write a report (default false)
-writeSummary      String        NA             Whether to write a summary (default false)

Argument details

--annotation / -A ( List[String] )

One or more specific annotations to apply to variant calls.

--configFile / -configFile ( List[File] )

Configuration parameter file(s).

--outputFile / -O ( File )

VCF file to which annotated variants should be written.

--parameter / -P ( List[String] )

Override individual configuration parameters (for advanced use only).

--reference_sequence / -R ( required File )

Reference sequence (indexed fasta file).

--reportDirectory / -reportDirectory ( File )

Directory for reports (default is current directory).

--reportFile / -reportFile ( File )

File name for report (default is based on the annotator name).

--reportFileMap / -reportFileMap ( List[String] )

Map of file name for reports, overrides reportFile argument when multiple annotators are run.

--scattering / -scattering ( String )

True if the annotation is being scattered by interval (default false).

--summaryFile / -summaryFile ( File )

File name for summary (default is based on the annotator name).

--summaryFileMap / -summaryFileMap ( List[String] )

Map of file name for summaries, overrides summaryFile argument when multiple annotators are run.

--tempDir / -tempDir ( File )

Directory for temporary files.

--useGATKTraversal / -useGATKTraversal ( String )

Use normal GATK walker traversal (default false).

--vcfFile / -vcf ( File )

Input variant file (VCF format).

--writeReport / -writeReport ( String )

Whether to write a report (default false).

--writeSummary / -writeSummary ( String )

Whether to write a summary (default false).