GeneOverlapAnnotator documentation

GeneOverlapAnnotator

Annotator used to detect and report overlaps between structural variations and annotated genes.

Category: Variant Annotators

The GeneOverlap annotator is invoked through the SVAnnotator framework, which defines arguments common to all annotators.

Introduction

The GeneOverlap annotator reports the genes and transcripts that overlap each structural variation, together with the kind of overlap (currently coding vs. exonic vs. intronic).

Input Formats

The -geneTrackFile must be in GTF format, and it must satisfy certain restrictions on content and sort order. See the SortGTFFile utility program for details.

Because this annotator is new, not all GTF files may work correctly. The GENCODE v17 GTF file works after it has been processed with SortGTFFile.
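Before sorting, it can help to confirm that a gene track at least has the generic GTF layout: nine tab-separated columns, a start coordinate no greater than the end coordinate, and comment lines beginning with #. The sketch below is a minimal illustration under that assumption only; it does not check the content and sort-order restrictions enforced by SortGTFFile, and the two-record file it builds is made up.

```shell
#!/bin/sh
# Generic GTF layout check: 9 tab-separated columns and start <= end.
# It does NOT verify the content or sort-order rules enforced by SortGTFFile.
dir=$(mktemp -d)
gtf="$dir/sample.gtf"   # made-up two-record file for illustration

printf '##description: toy example\n' > "$gtf"
printf 'chr1\tHAVANA\tgene\t11869\t14409\t.\t+\t.\tgene_id "G1"; gene_name "DDX11L1";\n' >> "$gtf"
printf 'chr1\tHAVANA\texon\t11869\t12227\t.\t+\t.\tgene_id "G1"; gene_name "DDX11L1";\n' >> "$gtf"

problems=$(awk -F '\t' '
  /^#/ { next }                          # skip header and comment lines
  NF != 9 { printf "line %d: %d fields, expected 9\n", NR, NF; next }
  ($4 + 0) > ($5 + 0) { printf "line %d: start %s > end %s\n", NR, $4, $5 }
' "$gtf")

if [ -z "$problems" ]; then
  echo "layout OK"
else
  printf '%s\n' "$problems"
fi
```

To check a real file, point gtf at it and drop the printf lines that build the sample.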

The -geneDescriptionFile is a tab-delimited file with two columns: GENENAME and DESCRIPTION. The GENENAME column must match the gene names in the -geneTrackFile GTF file.
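The following sketch builds a small description file and checks the two requirements stated above: exactly two tab-separated columns per line, and gene names that also appear in the GTF. Two points are assumptions, not taken from this page: that the file has no header row, and that the GTF stores gene names in a gene_name attribute (the GENCODE convention). All gene records and descriptions are sample values.

```shell
#!/bin/sh
# Toy gene description file checked against a toy GTF (illustrative values only).
# Assumes no header row in the description file and gene_name attributes in the GTF.
dir=$(mktemp -d)
desc="$dir/gene_descriptions.txt"
gtf="$dir/genes.gtf"

# GENENAME <TAB> DESCRIPTION
printf 'BRCA1\tbreast cancer 1, early onset\n'  > "$desc"
printf 'TP53\ttumor protein p53\n'             >> "$desc"
printf 'NOTAGENE\tplaceholder entry\n'         >> "$desc"

printf 'chr17\tHAVANA\tgene\t41196312\t41277500\t.\t-\t.\tgene_id "G1"; gene_name "BRCA1";\n'  > "$gtf"
printf 'chr17\tHAVANA\tgene\t7565097\t7590856\t.\t-\t.\tgene_id "G2"; gene_name "TP53";\n'   >> "$gtf"

# 1. Lines that do not have exactly two tab-separated columns.
bad_lines=$(awk -F '\t' 'NF != 2 { print NR ": " $0 }' "$desc")

# 2. Description gene names that never occur as a gene_name in the GTF.
sed -n 's/.*gene_name "\([^"]*\)".*/\1/p' "$gtf" | sort -u > "$dir/gtf_names"
cut -f1 "$desc" | sort -u > "$dir/desc_names"
unmatched=$(comm -23 "$dir/desc_names" "$dir/gtf_names")

printf 'malformed lines: %s\n' "${bad_lines:-none}"
printf 'names not in GTF: %s\n' "${unmatched:-none}"
```

Here the deliberately unmatched entry NOTAGENE is reported, while the two real names match. comm requires sorted input, which is why both name lists pass through sort -u first.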

Output Formats

This annotator can produce one output: a report file.

The report file contains one or more lines per input site. If the site overlaps one or more genes, one line is produced for each overlapping gene; otherwise a single line is produced, with GENEOVERLAP set to NA.

The report file contains the following columns:

ID
The variant ID from the input file.
CHROM
The chromosome of the variant.
START
The start coordinate of the variant on the reference.
END
The end coordinate of the variant on the reference.
NOVERLAP
The number of overlapping genes.
GENENAME
The name of one overlapping gene.
TRANSCRIPT
The transcript ID of the associated transcript or NA.
GENEOVERLAP
The kind of overlap, currently GENE, CDS, EXON, INTRON, OTHER (or NA for no genic overlap).
TXOVERLAP
The kind of transcript overlap, currently TX-ALL, TX-ALL-CDS, TX-EXON-FRAMESHIFT, TX-EXON-INFRAME, TX-TRUNCATED-3P, TX-TRUNCATED-5P, TX-CDS (or NA for no overlap).
DESCRIPTION
A description of the gene, taken from the gene description file if supplied.
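The report can be post-processed with standard text tools. This sketch assumes the report is tab-delimited with a first line holding the column names listed above; this page does not state the delimiter or whether a header is written, so check a real report first. It looks up columns by name and prints the ID and GENENAME of every row whose GENEOVERLAP is CDS. The file name and both data rows are made up.

```shell
#!/bin/sh
# Assumes a tab-delimited report whose first line holds the column names.
dir=$(mktemp -d)
report="$dir/sample_report.txt"   # hypothetical file with made-up rows

printf 'ID\tCHROM\tSTART\tEND\tNOVERLAP\tGENENAME\tTRANSCRIPT\tGENEOVERLAP\tTXOVERLAP\tDESCRIPTION\n' > "$report"
printf 'DEL_1\t17\t41200000\t41210000\t1\tBRCA1\tNA\tCDS\tNA\tbreast cancer 1\n' >> "$report"
printf 'DEL_2\t1\t5000000\t5001000\t0\tNA\tNA\tNA\tNA\tNA\n' >> "$report"

# Map header names to column numbers, so the filter does not depend on column order.
cds_hits=$(awk -F '\t' -v OFS='\t' '
  NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
  $(col["GENEOVERLAP"]) == "CDS" { print $(col["ID"]), $(col["GENENAME"]) }
' "$report")

printf '%s\n' "$cds_hits"   # DEL_1 and BRCA1, tab-separated
```

Looking columns up by name rather than position keeps the filter working if columns are added or reordered in a later version.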

Example

 java -Xmx4g -cp SVToolkit.jar \
     org.broadinstitute.sv.main.SVAnnotator \
     -A GeneOverlap \
     -R human_g1k_v37.fasta \
     -vcf input.vcf \
     -geneTrackFile gencode.v17.annotation.sorted.gtf \
     -geneTypeFilter protein_coding \
     -geneDescriptionFile ucsc_gene_descriptions.dat \
     -writeReport true \
     -reportDirectory reportdir
 


GeneOverlapAnnotator-specific arguments

Name                         Type          Default value  Summary
Optional Parameters
-geneDescriptionFile         File          NA             Tab-delimited two-column file mapping gene names to text gene descriptions
-geneTrackFile               File          NA             File of gene models for gene overlap analysis (GTF format)
-geneTypeFilter              List[String]  NA             Set of gene types to consider when analyzing gene overlap
-sequenceTranslationMapFile  File          NA             Mapping file from sequence name aliases used in annotation files to reference sequence names (e.g. "chr1" to "1")

Argument details

--geneDescriptionFile / -geneDescriptionFile ( File )

Tab-delimited two-column file (GENENAME, DESCRIPTION) mapping gene names to text gene descriptions. See Input Formats.

--geneTrackFile / -geneTrackFile ( File )

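File of gene models for gene overlap analysis (GTF format). The file must satisfy the content and sort-order restrictions described under Input Formats; see the SortGTFFile utility program.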

--geneTypeFilter / -geneTypeFilter ( List[String] )

Set of gene types to consider when analyzing gene overlap (for example, protein_coding, as in the example above).
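To choose values for -geneTypeFilter, it helps to see which gene types a GTF actually contains. GENCODE records the type in a gene_type attribute; that -geneTypeFilter is matched against this attribute is an assumption, suggested by the protein_coding value in the example command. The sketch counts gene records (column 3 equal to gene) per gene_type in a made-up three-record file.

```shell
#!/bin/sh
# Count gene records per gene_type value (GENCODE-style attributes assumed).
dir=$(mktemp -d)
gtf="$dir/sample.gtf"   # made-up records for illustration

printf 'chr1\tHAVANA\tgene\t11869\t14409\t.\t+\t.\tgene_id "G1"; gene_type "pseudogene"; gene_name "DDX11L1";\n' > "$gtf"
printf 'chr1\tHAVANA\tgene\t69091\t70008\t.\t+\t.\tgene_id "G2"; gene_type "protein_coding"; gene_name "OR4F5";\n' >> "$gtf"
printf 'chr1\tHAVANA\tgene\t367640\t368634\t.\t+\t.\tgene_id "G3"; gene_type "protein_coding"; gene_name "OR4F29";\n' >> "$gtf"

# Keep only gene records so each gene is counted once, then tally gene_type values.
types=$(awk -F '\t' '$3 == "gene"' "$gtf" \
  | sed -n 's/.*gene_type "\([^"]*\)".*/\1/p' \
  | sort | uniq -c | sort -rn)

printf '%s\n' "$types"
```

Restricting to gene records matters because GENCODE repeats gene_type on transcript and exon lines, which would otherwise inflate the counts.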

--sequenceTranslationMapFile / -sequenceTranslationMapFile ( File )

Mapping file from sequence name aliases used in annotation files to reference sequence names (e.g. "chr1" to "1").
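A typical case is GENCODE, whose GTF files name chromosomes chr1, chr2, ..., chrM, while the b37 reference used in the example names them 1, 2, ..., MT. The sketch below writes a map for that naming difference. The layout it uses (alias, a tab, then the reference sequence name, one pair per line) is an assumption, because this page does not specify the file format; confirm it before use. The output file name is hypothetical.

```shell
#!/bin/sh
# Alias-to-reference name map for chr-prefixed annotation names vs. b37 names.
# Assumed layout: <alias used in annotation file> <TAB> <reference sequence name>.
dir=$(mktemp -d)
map="$dir/chr_to_b37.map"   # hypothetical file name

i=1
while [ "$i" -le 22 ]; do
  printf 'chr%s\t%s\n' "$i" "$i"
  i=$((i + 1))
done > "$map"

for c in X Y; do
  printf 'chr%s\t%s\n' "$c" "$c"
done >> "$map"

# The mitochondrial sequence is named MT in b37 (name mapping only).
printf 'chrM\tMT\n' >> "$map"
```

The result has 25 lines: chr1 through chr22, chrX, chrY, and chrM.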