ReciprocalOverlapAnnotator documentation
ReciprocalOverlapAnnotator
Annotator for computing reciprocal overlap between sets of structural variants.
Category: Variant Annotators
The ReciprocalOverlap annotator is invoked through the SVAnnotator framework, which defines arguments common to all annotators.
Introduction
The ReciprocalOverlap annotator computes the overlap between sets of structural variants, represented as intervals. Compared to other tools, it has several useful features: it processes VCF files directly, it understands VCF confidence intervals, and it supports multiple modes for selecting the "best" overlapping interval.
This annotator compares each variant in the input VCF to all variants in a second (comparison) VCF. The comparison VCF can be the same file as the input VCF to compute the self-overlap of a set of variants.
The intervals used for comparison are determined by the -variantIntervalMode argument, and the modes yield different intervals when confidence intervals are specified in the VCF files. MINIMUM uses the smallest (innermost) reference interval for each variant, MAXIMUM uses the largest (outermost) reference interval, and NOMINAL (the default) uses the exact POS/END values as specified in the VCF.
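The three interval modes can be illustrated with a minimal sketch. It assumes the confidence intervals come from the standard VCF CIPOS and CIEND fields, given as (low, high) offsets around POS and END; how the annotator itself derives the innermost and outermost intervals is not stated here, so the arithmetic and the helper name `variant_interval` are assumptions for illustration only.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]


def variant_interval(pos: int, end: int,
                     cipos: Optional[Interval] = None,
                     ciend: Optional[Interval] = None,
                     mode: str = "NOMINAL") -> Interval:
    """Return the (start, end) interval for one variant under a given mode.

    Assumption: cipos and ciend are (low, high) offsets relative to POS and
    END, as in the VCF CIPOS/CIEND fields. Missing intervals count as (0, 0).
    """
    lo_pos, hi_pos = cipos if cipos is not None else (0, 0)
    lo_end, hi_end = ciend if ciend is not None else (0, 0)
    if mode == "NOMINAL":
        # Exact POS/END values, ignoring any confidence intervals.
        return pos, end
    if mode == "MINIMUM":
        # Innermost interval: latest plausible start, earliest plausible end.
        return pos + hi_pos, end + lo_end
    if mode == "MAXIMUM":
        # Outermost interval: earliest plausible start, latest plausible end.
        return pos + lo_pos, end + hi_end
    raise ValueError(f"unknown interval mode: {mode}")


for mode in ("NOMINAL", "MINIMUM", "MAXIMUM"):
    print(mode, variant_interval(1000, 2000, cipos=(-10, 10), ciend=(-20, 20), mode=mode))
# NOMINAL (1000, 2000)
# MINIMUM (1010, 1980)
# MAXIMUM (990, 2020)
```

In this sketch, a variant without confidence intervals gets the same interval in all three modes.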
For each input variant, a "best hit" comparison variant is chosen from among all of the overlapping variants. The best hit is the variant with either the greatest overlap fraction (the default) or the greatest overlap length in bases, as determined by the -reciprocalOverlapRankBy argument.
By default, variants with the same ID are not compared (to allow easy computation of the self-overlap of a call set). This can be disabled with -reciprocalOverlapCompareIds false, which will force variants with the same ID to be compared.
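The best-hit selection and the ID check described above might be sketched as follows. This is not the annotator's implementation: the `Variant` type and the helper names are hypothetical, coordinates are treated as 1-based inclusive, and the overlap fraction is assumed to be overlap length divided by union length. That definition is a guess suggested by the UNIONLENGTH report column; another common convention is the smaller of the two per-variant overlap fractions.

```python
from typing import List, NamedTuple, Optional, Tuple


class Variant(NamedTuple):
    id: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def length(v: Variant) -> int:
    return v.end - v.start + 1


def overlap_length(a: Variant, b: Variant) -> int:
    """Number of bases shared by a and b (0 if on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def overlap_fraction(a: Variant, b: Variant) -> float:
    """ASSUMED definition: overlap length divided by union length."""
    shared = overlap_length(a, b)
    return shared / (length(a) + length(b) - shared)


def best_hit(query: Variant, candidates: List[Variant],
             rank_by: str = "FRACTION",
             compare_ids: bool = True) -> Tuple[Optional[Variant], int]:
    """Return (best hit or None, number of overlapping candidates)."""
    score = overlap_fraction if rank_by == "FRACTION" else overlap_length
    hits = [
        c for c in candidates
        if overlap_length(query, c) > 0
        # With compare_ids=True, a candidate with the query's ID is skipped,
        # which is what makes self-overlap of a call set meaningful.
        and not (compare_ids and c.id == query.id)
    ]
    return max(hits, key=lambda c: score(query, c), default=None), len(hits)


query = Variant("Q", "1", 1000, 1999)
inner = Variant("A", "1", 1000, 1499)   # small, fully contained in the query
wide = Variant("B", "1", 1400, 4999)    # large, shares more bases with the query
calls = [query, inner, wide]

print(best_hit(query, calls)[0].id)                     # A (fraction 0.5 vs 0.15)
print(best_hit(query, calls, rank_by="LENGTH")[0].id)   # B (600 bases vs 500)
print(best_hit(query, calls, compare_ids=False)[0].id)  # Q (the variant itself)
```

The example shows why the ranking mode matters: a small contained variant wins by fraction, while a large variant with a longer shared segment wins by length.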
Output Formats
This annotator can produce the following outputs: annotated VCF, report file.
The following VCF annotations are produced by this annotator:
- RO_NHITS: The number of comparison variants that overlap this input variant.
- RO_BESTHIT: The identifier of the comparison variant with the greatest overlap to this variant.
- RO_LENGTH: The length of this variant as used in the overlap calculation.
- RO_BHLENGTH: The length of the best hit variant as used in the overlap calculation.
- RO_OVERLAP: The overlap fraction between this variant and the best hit.
- RO_OVERLAPLENTH: The length of the overlap between this variant and the best hit.
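As a rough illustration of reading these annotations back, the sketch below pulls the RO_ entries out of one VCF data line. It assumes the annotations are written as key=value entries in the INFO column, which this page does not state explicitly. The sample line is made up, and values are kept as strings because their declared types are not given here.

```python
from typing import Dict


def ro_annotations(vcf_line: str) -> Dict[str, str]:
    """Collect RO_* entries from the INFO column of one VCF data line.

    Assumption: the annotations are INFO key=value pairs. The INFO column is
    the eighth tab-separated field, with entries separated by semicolons.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    info = fields[7]
    result = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        if key.startswith("RO_") and sep:
            result[key] = value
    return result


# Made-up example line, for illustration only.
line = "\t".join([
    "1", "1000", "Q", "N", "<DEL>", ".", "PASS",
    "END=1999;SVTYPE=DEL;RO_NHITS=2;RO_BESTHIT=A;RO_LENGTH=1000;"
    "RO_BHLENGTH=500;RO_OVERLAP=0.5;RO_OVERLAPLENTH=500",
])

annotations = ro_annotations(line)
print(annotations["RO_BESTHIT"], annotations["RO_OVERLAP"])  # A 0.5
```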
The report file contains one line for every input variant. The following columns are produced in the report file:
- VARIANT: The ID of the input variant.
- CHROM: The chromosome of the input variant.
- START: The start coordinate of the input variant on the reference.
- END: The end coordinate of the input variant on the reference.
- NOVERLAP: The number of overlapping comparison variants.
- BESTHIT: The identifier of the comparison variant with the greatest overlap to this variant.
- BHSTART: The start coordinate of the best hit variant on the reference.
- BHEND: The end coordinate of the best hit variant on the reference.
- LENGTH: The length of the input variant as used in the overlap calculation.
- BHLENGTH: The length of the best hit variant as used in the overlap calculation.
- UNIONLENGTH: The length of the union of the input and best hit variant.
- OVERLAPLENTH: The length of the overlap (intersection) between this variant and the best hit variant.
- OVERLAP: The overlap fraction between this variant and the best hit.
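A report with these columns could be post-processed along the following lines. The sketch assumes the report is tab-delimited with a header row naming the columns, and that rows without a best hit have NOVERLAP equal to 0; neither detail is confirmed on this page. The sample rows, including the NA placeholders, are made up for illustration.

```python
import csv
import io
from typing import Dict, Iterable, List


def read_report(handle: Iterable[str]) -> List[Dict[str, str]]:
    """Read report rows as dictionaries keyed by column name.

    Assumption: tab-delimited text whose first line is the header.
    """
    return list(csv.DictReader(handle, delimiter="\t"))


def hits_at_least(rows: List[Dict[str, str]], min_fraction: float) -> List[Dict[str, str]]:
    """Keep rows that have a best hit with OVERLAP >= min_fraction."""
    selected = []
    for row in rows:
        if int(row["NOVERLAP"]) == 0:
            continue  # no overlapping comparison variant, so no best hit
        if float(row["OVERLAP"]) >= min_fraction:
            selected.append(row)
    return selected


columns = ["VARIANT", "CHROM", "START", "END", "NOVERLAP", "BESTHIT", "BHSTART",
           "BHEND", "LENGTH", "BHLENGTH", "UNIONLENGTH", "OVERLAPLENTH", "OVERLAP"]
# Made-up rows; the NA placeholder for missing best hits is an assumption.
sample_rows = [
    ["DEL_1", "1", "1000", "1999", "2", "CMP_A", "1000", "1499",
     "1000", "500", "1000", "500", "0.5"],
    ["DEL_2", "1", "5000", "5999", "0", "NA", "NA", "NA",
     "1000", "NA", "NA", "NA", "NA"],
]
text = "\n".join("\t".join(r) for r in [columns] + sample_rows) + "\n"

rows = read_report(io.StringIO(text))
print([r["VARIANT"] for r in hits_at_least(rows, 0.5)])  # ['DEL_1']
```

For a real run, `io.StringIO(text)` would be replaced by an open file handle on the report written to the report directory.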
Example
java -Xmx4g -cp SVToolkit.jar \
    org.broadinstitute.sv.main.SVAnnotator \
    -A ReciprocalOverlap \
    -R human_g1k_v37.fasta \
    -vcf input.vcf \
    -comparisonFile comparison.vcf \
    -O output.vcf \
    -writeReport true \
    -reportDirectory reportdir
ReciprocalOverlapAnnotator specific arguments
Name | Type | Default value | Summary |
---|---|---|---|
Required Parameters | | | |
-comparisonFile | File | NA | VCF file of variants for comparison |
Optional Parameters | | | |
-filterVariants | Boolean | true | True to ignore variants that have been filtered |
-reciprocalOverlapCompareIds | Boolean | true | Use variant IDs to avoid comparing a variant with itself to allow self-overlap comparisons |
-reciprocalOverlapRankBy | RankBy | FRACTION | How to select between multiple overlapping variants, possible values: LENGTH, FRACTION (default) |
-variantIntervalMode | IntervalMode | NOMINAL | How to measure variant intervals, possible values: NOMINAL (default), MINIMUM, MAXIMUM |
Argument details
--comparisonFile / -comparisonFile ( required File )
VCF file of variants for comparison.
--filterVariants / -filterVariants ( Boolean with default value true )
True to ignore variants that have been filtered.
--reciprocalOverlapCompareIds / -reciprocalOverlapCompareIds ( Boolean with default value true )
Use variant IDs to avoid comparing a variant with itself to allow self-overlap comparisons.
--reciprocalOverlapRankBy / -reciprocalOverlapRankBy ( RankBy with default value FRACTION )
How to select between multiple overlapping variants, possible values: LENGTH, FRACTION (default).
The --reciprocalOverlapRankBy argument is an enumerated type (RankBy), which can have one of the following values:
- FRACTION
- LENGTH
--variantIntervalMode / -variantIntervalMode ( IntervalMode with default value NOMINAL )
How to measure variant intervals, possible values: NOMINAL (default), MINIMUM, MAXIMUM.
The --variantIntervalMode argument is an enumerated type (IntervalMode), which can have one of the following values:
- NOMINAL
- MINIMUM
- MAXIMUM