GeneOverlapAnnotator documentation
GeneOverlapAnnotator
Annotator used to detect and report overlaps between structural variations and annotated genes.
Category: Variant Annotators
The GeneOverlap annotator is invoked through the SVAnnotator framework, which defines arguments common to all annotators.
Introduction
The GeneOverlap annotator reports on the genes and transcripts that overlap a structural variation and the significance of the overlap (currently coding vs. exonic vs. intronic).
Input Formats
The file supplied with -geneTrackFile must be in GTF format and must follow certain restrictions on content and sort order. See the SortGTFFile utility program for details.
This annotator is new, so some GTF files may not work correctly. The GENCODE v17 GTF file works after it has been run through SortGTFFile.
The file supplied with -geneDescriptionFile is a tab-delimited file with two columns: GENENAME and DESCRIPTION. The values in the GENENAME column must match the gene names in the -geneTrackFile GTF file.
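Because the gene names in the two input files must match, it can help to check them before running the annotator. The following is a minimal sketch, not part of SVToolkit. It assumes that gene names are stored in the GTF `gene_name` attribute (as in GENCODE files) and that the description file may start with a `GENENAME` header row. The function names and the toy data are hypothetical.

```python
import csv
import re

# Matches quoted GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def gtf_gene_names(gtf_lines):
    """Collect gene_name attribute values from GTF lines.

    Assumption: gene names live in the gene_name attribute.
    """
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_name" in attrs:
            names.add(attrs["gene_name"])
    return names


def unmatched_descriptions(desc_lines, gene_names):
    """Return gene names from the description file that are absent from the GTF."""
    reader = csv.reader(desc_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    missing = []
    for row in reader:
        # Skip short rows and an assumed GENENAME header row.
        if len(row) < 2 or row[0] == "GENENAME":
            continue
        if row[0] not in gene_names:
            missing.append(row[0])
    return missing


if __name__ == "__main__":
    # Toy data for illustration only.
    gtf = ['chr1\tTOY\tgene\t100\t900\t.\t+\t.\tgene_id "G1"; gene_name "GENE_A";\n']
    desc = ["GENENAME\tDESCRIPTION\n", "GENE_A\tfirst gene\n", "GENE_C\tthird gene\n"]
    print(unmatched_descriptions(desc, gtf_gene_names(gtf)))
```

With real inputs, both functions can be passed open file handles instead of lists of lines.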
Output Formats
This annotator produces a single output: a report file.
The report file contains one or more lines per input site. If the input site overlaps one or more genes, one line is produced for each overlapping gene.
The report file contains the following columns:
- ID: The variant ID from the input file.
- CHROM: The chromosome of the variant.
- START: The start coordinate of the variant on the reference.
- END: The end coordinate of the variant on the reference.
- NOVERLAP: The number of overlapping genes.
- GENENAME: The name of one overlapping gene.
- TRANSCRIPT: The transcript ID of the associated transcript or NA.
- GENEOVERLAP: The kind of overlap, currently GENE, CDS, EXON, INTRON, OTHER (or NA for no genic overlap).
- TXOVERLAP: The kind of transcript overlap, currently TX-ALL, TX-ALL-CDS, TX-EXON-FRAMESHIFT, TX-EXON-INFRAME, TX-TRUNCATED-3P, TX-TRUNCATED-5P, TX-CDS (or NA for no overlap).
- DESCRIPTION: A description of the gene, taken from the gene description file if supplied.
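Since the report has one line per overlapping gene, a common follow-up is to group the lines by variant and keep only certain overlap kinds. The sketch below is one way to do that. It assumes the report is tab-delimited and begins with a header row that uses the column names listed above; neither detail is stated here, so adjust the reader if the actual file differs. The function name and the toy rows are hypothetical.

```python
import csv
from collections import defaultdict


def genes_by_variant(report_lines, overlap_kinds=("CDS",)):
    """Map each variant ID to the gene names whose GENEOVERLAP is in overlap_kinds.

    Assumption: tab-delimited input with a header row naming the columns.
    """
    reader = csv.DictReader(report_lines, delimiter="\t")
    hits = defaultdict(list)
    for row in reader:
        if row["GENEOVERLAP"] in overlap_kinds:
            hits[row["ID"]].append(row["GENENAME"])
    return dict(hits)


if __name__ == "__main__":
    # Toy report for illustration only; the values are not real annotator output.
    header = "\t".join([
        "ID", "CHROM", "START", "END", "NOVERLAP", "GENENAME",
        "TRANSCRIPT", "GENEOVERLAP", "TXOVERLAP", "DESCRIPTION",
    ])
    report = [
        header,
        "SV1\t1\t1000\t5000\t2\tGENE_A\tTX_A1\tCDS\tTX-CDS\tNA",
        "SV1\t1\t1000\t5000\t2\tGENE_B\tTX_B1\tINTRON\tNA\tNA",
        "SV2\t2\t100\t200\t0\tNA\tNA\tNA\tNA\tNA",
    ]
    print(genes_by_variant(report))
```

Passing `overlap_kinds=("CDS", "EXON")` widens the selection to any exonic overlap. Variants with no selected overlap are simply absent from the result.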
Example
java -Xmx4g -cp SVToolkit.jar \
    org.broadinstitute.sv.main.SVAnnotator \
    -A GeneOverlap \
    -R human_g1k_v37.fasta \
    -vcf input.vcf \
    -geneTrackFile gencode.v17.annotation.sorted.gtf \
    -geneTypeFilter protein_coding \
    -geneDescriptionFile ucsc_gene_descriptions.dat \
    -writeReport true \
    -reportDirectory reportdir
GeneOverlapAnnotator specific arguments
Name | Type | Default value | Summary |
---|---|---|---|
Optional Parameters | | | |
-geneDescriptionFile | File | NA | Tab-delimited two column file mapping gene names to text gene descriptions |
-geneTrackFile | File | NA | File of gene models for gene overlap analysis (GTF format) |
-geneTypeFilter | List[String] | NA | Set of gene types to consider when analyzing gene overlap |
-sequenceTranslationMapFile | File | NA | Mapping file from sequence name aliases used in annotation files to reference sequence names (e.g. "chr1" to "1") |
Argument details
--geneDescriptionFile / -geneDescriptionFile ( File )
Tab-delimited two column file mapping gene names to text gene descriptions.
--geneTrackFile / -geneTrackFile ( File )
File of gene models for gene overlap analysis (GTF format).
--geneTypeFilter / -geneTypeFilter ( List[String] )
Set of gene types to consider when analyzing gene overlap.
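To decide which values to pass to this argument, it can be useful to list the gene types present in the GTF file. The sketch below counts them. It assumes that gene types are stored in the `gene_type` attribute on `gene` feature lines, as in GENCODE v17, and that the filter is compared against those values; the second point is an assumption, not something this page confirms. The function name and toy data are hypothetical.

```python
import re
from collections import Counter

# Assumption: the gene type is stored as gene_type "value"; (GENCODE convention).
GENE_TYPE_RE = re.compile(r'gene_type "([^"]*)";')


def gene_type_counts(gtf_lines):
    """Count gene_type values on 'gene' feature lines of a GTF file."""
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only count each gene once, via its 'gene' record
        match = GENE_TYPE_RE.search(fields[8])
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Toy data for illustration only.
    gtf = [
        'chr1\tTOY\tgene\t100\t900\t.\t+\t.\tgene_id "G1"; gene_type "protein_coding";\n',
        'chr1\tTOY\tgene\t2000\t3000\t.\t-\t.\tgene_id "G2"; gene_type "pseudogene";\n',
    ]
    for gene_type, n in gene_type_counts(gtf).most_common():
        print(gene_type, n)
```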
--sequenceTranslationMapFile / -sequenceTranslationMapFile ( File )
Mapping file from sequence name aliases used in annotation files to reference sequence names (e.g. "chr1" to "1").