ReciprocalOverlapAnnotator documentation

ReciprocalOverlapAnnotator

Annotator for computing reciprocal overlap between sets of structural variants.

Category: Variant Annotators

The ReciprocalOverlap annotator is invoked through the SVAnnotator framework, which defines arguments common to all annotators.

Introduction

The ReciprocalOverlap annotator computes the overlap between sets of structural variants, represented as intervals. Compared to other tools, it has several useful features: it processes VCF files directly, it understands VCF confidence intervals, and it supports multiple modes for selecting the "best" overlapping interval.

This annotator compares each variant in the input VCF to all variants in a second (comparison) VCF. The comparison VCF can be the same file as the input VCF to compute the self-overlap of a set of variants.

The intervals used for comparison are determined by the -variantIntervalMode argument. The modes yield different intervals when confidence intervals are specified in the VCF files. An interval mode of MINIMUM uses the smallest (innermost) reference interval for each variant, MAXIMUM uses the largest (outermost) reference interval, and NOMINAL (the default) uses the exact POS/END values as specified in the VCF.
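The three interval modes can be illustrated with a minimal Python sketch. It assumes the confidence intervals follow the VCF CIPOS/CIEND convention of (low, high) offsets relative to POS and END. The function name variant_interval and its parameters are hypothetical and are not part of SVToolkit, and the annotator's actual handling of confidence intervals may differ in detail.

```python
def variant_interval(pos, end, cipos=(0, 0), ciend=(0, 0), mode="NOMINAL"):
    """Return the (start, end) reference interval for one variant.

    cipos and ciend are (low, high) offsets relative to POS and END,
    as in the VCF CIPOS/CIEND fields (assumed: low <= 0 <= high).
    """
    if mode == "NOMINAL":
        # Exact POS/END values, confidence intervals ignored.
        return pos, end
    if mode == "MINIMUM":
        # Innermost interval: latest plausible start, earliest plausible end.
        return pos + cipos[1], end + ciend[0]
    if mode == "MAXIMUM":
        # Outermost interval: earliest plausible start, latest plausible end.
        return pos + cipos[0], end + ciend[1]
    raise ValueError(f"unknown interval mode: {mode}")
```

For a variant with POS=1000, END=2000, CIPOS=-50,50 and CIEND=-30,30, this sketch returns (1050, 1970) for MINIMUM and (950, 2030) for MAXIMUM. Without confidence intervals, all three modes return the same interval.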

For each input variant, a "best hit" comparison variant is chosen among all of the overlapping variants. The best hit is selected either as the variant with the greatest overlap fraction (the default) or the greatest overlap length in bases, determined by the -reciprocalOverlapRankBy argument.
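The difference between the two ranking modes can be sketched in Python as follows. This is an illustration under stated assumptions, not the annotator's implementation: intervals are treated as closed (start, end) pairs on the same chromosome, and the overlap fraction is assumed to be the conventional reciprocal overlap, that is, the smaller of the two per-variant overlap fractions. This page does not state the exact formula the tool uses. The function names are hypothetical.

```python
def overlap_length(a, b):
    """Number of bases shared by two closed intervals given as (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def reciprocal_fraction(a, b):
    """Assumed definition: the smaller of the two per-variant overlap fractions."""
    ov = overlap_length(a, b)
    len_a = a[1] - a[0] + 1
    len_b = b[1] - b[0] + 1
    return min(ov / len_a, ov / len_b)


def best_hit(query, candidates, rank_by="FRACTION"):
    """Return the ID of the best overlapping candidate, or None if nothing overlaps.

    candidates maps a variant ID to its (start, end) interval.
    """
    hits = {vid: iv for vid, iv in candidates.items()
            if overlap_length(query, iv) > 0}
    if not hits:
        return None
    if rank_by == "LENGTH":
        return max(hits, key=lambda vid: overlap_length(query, hits[vid]))
    if rank_by == "FRACTION":
        return max(hits, key=lambda vid: reciprocal_fraction(query, hits[vid]))
    raise ValueError(f"unknown rank mode: {rank_by}")
```

With a query interval of (100, 199) and candidates A = (150, 1149) and B = (100, 139), ranking by LENGTH selects A (50 shared bases versus 40), while ranking by FRACTION selects B (0.4 versus 0.05), because A is much longer than the query. How the annotator breaks ties is not specified here; this sketch keeps the first maximum.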

By default, variants with the same ID are not compared, which allows easy computation of the self-overlap of a call set. To disable this behavior, specify -reciprocalOverlapCompareIds false, which forces variants with the same ID to be compared.
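The effect of the ID check on a self-overlap comparison can be sketched in Python as follows. The function count_hits and its compare_ids parameter are hypothetical stand-ins for the behavior described above. The sketch assumes closed (start, end) intervals that all lie on the same chromosome.

```python
def count_hits(variant_id, interval, comparison, compare_ids=True):
    """Count comparison variants that overlap the given closed interval.

    comparison is a list of (id, (start, end)) tuples. When compare_ids
    is True, entries whose ID equals variant_id are skipped, so a variant
    is not counted as overlapping itself.
    """
    hits = 0
    for other_id, (start, end) in comparison:
        if compare_ids and other_id == variant_id:
            continue
        # Closed intervals overlap when the later start is not past the earlier end.
        if max(interval[0], start) <= min(interval[1], end):
            hits += 1
    return hits
```

When a call set containing DEL_1 = (100, 200), DEL_2 = (150, 250) and DEL_3 = (500, 600) is compared against itself, DEL_1 has one hit with the ID check enabled (DEL_2) and two hits with it disabled (DEL_1 itself and DEL_2).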

Output Formats

This annotator can produce the following outputs: annotated VCF, report file.

The following VCF annotations are produced by this annotator:

RO_NHITS
The number of comparison variants that overlap this input variant.
RO_BESTHIT
The identifier of the comparison variant with the greatest overlap to this variant.
RO_LENGTH
The length of this variant as used in the overlap calculation.
RO_BHLENGTH
The length of the best hit variant as used in the overlap calculation.
RO_OVERLAP
The overlap fraction between this variant and the best hit.
RO_OVERLAPLENTH
The length of the overlap between this variant and the best hit.
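
Downstream scripts can pick these annotations out of the annotated VCF. The following Python sketch assumes the annotations are written as key=value pairs in the VCF INFO column, which this page does not state explicitly. The function name parse_ro_annotations is hypothetical, and the INFO string in the usage note is made up for illustration.

```python
def parse_ro_annotations(info):
    """Extract the RO_* key=value pairs from a VCF INFO string.

    Values are returned as strings; flag-style entries without '=' are ignored.
    """
    result = {}
    for field in info.split(";"):
        key, sep, value = field.partition("=")
        if sep and key.startswith("RO_"):
            result[key] = value
    return result
```

For the made-up INFO string "SVTYPE=DEL;END=2000;RO_NHITS=2;RO_BESTHIT=DEL_7;RO_OVERLAP=0.85", the sketch returns the three RO_ entries and ignores SVTYPE and END.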

The report file contains one line for every input variant. The following columns are produced in the report file:

VARIANT
The ID of the input variant.
CHROM
The chromosome of the input variant.
START
The start coordinate of the input variant on the reference.
END
The end coordinate of the input variant on the reference.
NOVERLAP
The number of overlapping comparison variants.
BESTHIT
The identifier of the comparison variant with the greatest overlap to this variant.
BHSTART
The start coordinate of the best hit variant on the reference.
BHEND
The end coordinate of the best hit variant on the reference.
LENGTH
The length of the input variant as used in the overlap calculation.
BHLENGTH
The length of the best hit variant as used in the overlap calculation.
UNIONLENGTH
The length of the union of the input and best hit variant.
OVERLAPLENTH
The length of the overlap (intersection) between this variant and the best hit variant.
OVERLAP
The overlap fraction between this variant and the best hit.
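
The report can be filtered with a short script. The following Python sketch assumes the report is tab-delimited text with a header line naming the columns listed above; the delimiter, the header line and the representation of variants without a best hit are assumptions, not documented behavior. The function well_matched and the sample data are hypothetical.

```python
import csv
import io


def well_matched(report, min_overlap=0.5):
    """Yield (VARIANT, BESTHIT) pairs whose OVERLAP is at least min_overlap.

    report is a text file object holding the report (assumed tab-delimited
    with a header line).
    """
    for row in csv.DictReader(report, delimiter="\t"):
        try:
            overlap = float(row["OVERLAP"])
        except (TypeError, ValueError):
            # No numeric overlap, e.g. a variant without a best hit.
            continue
        if overlap >= min_overlap:
            yield row["VARIANT"], row["BESTHIT"]


# Made-up sample restricted to three columns, for illustration only.
sample = "VARIANT\tBESTHIT\tOVERLAP\nDEL_1\tDEL_7\t0.85\nDEL_2\tDEL_9\t0.10\n"
pairs = list(well_matched(io.StringIO(sample)))
```

With the sample above, only DEL_1 passes the 0.5 threshold. For a real report, pass an opened file instead of the in-memory sample.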

Example

 java -Xmx4g -cp SVToolkit.jar \
     org.broadinstitute.sv.main.SVAnnotator \
     -A ReciprocalOverlap \
     -R human_g1k_v37.fasta \
     -vcf input.vcf \
     -comparisonFile comparison.vcf \
     -O output.vcf \
     -writeReport true \
     -reportDirectory reportdir


ReciprocalOverlapAnnotator specific arguments

Name                          Type          Default value  Summary
Required Parameters
-comparisonFile               File          NA             VCF file of variants for comparison
Optional Parameters
-filterVariants               Boolean       true           True to ignore variants that have been filtered
-reciprocalOverlapCompareIds  Boolean       true           Use variant IDs to avoid comparing a variant with itself to allow self-overlap comparisons
-reciprocalOverlapRankBy      RankBy        FRACTION       How to select between multiple overlapping variants, possible values: LENGTH, FRACTION (default)
-variantIntervalMode          IntervalMode  NOMINAL        How to measure variant intervals, possible values: NOMINAL (default), MINIMUM, MAXIMUM

Argument details

--comparisonFile / -comparisonFile ( required File )

VCF file of variants for comparison.

--filterVariants / -filterVariants ( Boolean with default value true )

True to ignore variants that have been filtered.

--reciprocalOverlapCompareIds / -reciprocalOverlapCompareIds ( Boolean with default value true )

Use variant IDs to avoid comparing a variant with itself to allow self-overlap comparisons.

--reciprocalOverlapRankBy / -reciprocalOverlapRankBy ( RankBy with default value FRACTION )

How to select between multiple overlapping variants, possible values: LENGTH, FRACTION (default).

The --reciprocalOverlapRankBy argument is an enumerated type (RankBy), which can have one of the following values:

FRACTION
LENGTH

--variantIntervalMode / -variantIntervalMode ( IntervalMode with default value NOMINAL )

How to measure variant intervals, possible values: NOMINAL (default), MINIMUM, MAXIMUM.

The --variantIntervalMode argument is an enumerated type (IntervalMode), which can have one of the following values:

NOMINAL
MINIMUM
MAXIMUM