CopyNumberClassAnnotator documentation

CopyNumberClassAnnotator

Annotator that reports statistics about the distribution of copy number states at each variant.

Category: Variant Annotators

The CopyNumberClass annotator is invoked through the SVAnnotator framework, which defines arguments common to all annotators.

Introduction

The CopyNumberClass annotator reports several statistics about the distribution of copy number states across the genotyped samples.

This annotator is gender- and ploidy-aware. The copy number distribution (CNDIST) includes all samples regardless of gender, but three calculations are gender-dependent: the minimum number of alleles required to explain the distribution at each site, whether a sample must carry a non-reference allele (NNONREF), and how many samples differ from the mode copy number state (NVARIANT).

NNONREF reports the number of samples whose copy number differs from the copy number expected for a homozygous-reference sample (taking into account changes in expected ploidy on sex chromosomes due to sample gender).

NVARIANT is a measure of how variable a site is, independent of the copy number of the reference allele. NVARIANT is the number of samples that differ from the most common (mode) copy number state. For sex chromosomes, this is calculated independently for males and females and then summed.
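The difference between NNONREF and NVARIANT can be sketched in a few lines of Python. This is only an illustration of the definitions above, not the annotator's implementation: the function names, the "M"/"F" sex labels, and the simple ploidy model (which ignores pseudoautosomal regions) are assumptions.

```python
from collections import Counter

def expected_copy_number(sex, chrom):
    """Expected homozygous-reference copy number (assumed simple human ploidy model)."""
    if chrom == "X":
        return 1 if sex == "M" else 2
    if chrom == "Y":
        return 1 if sex == "M" else 0
    return 2

def nnonref(calls, chrom):
    """Count samples whose copy number differs from the expected reference copy number.

    calls is a list of (sex, copy_number) pairs for confidently genotyped samples.
    """
    return sum(1 for sex, cn in calls if cn != expected_copy_number(sex, chrom))

def nvariant(calls, chrom):
    """Count samples that differ from the mode copy number state.

    On sex chromosomes the mode is taken separately for males and females
    and the two counts are summed; elsewhere all samples are pooled.
    """
    if chrom in ("X", "Y"):
        groups = [[cn for sex, cn in calls if sex == s] for s in ("M", "F")]
    else:
        groups = [[cn for _, cn in calls]]
    total = 0
    for group in groups:
        if group:
            mode_count = Counter(group).most_common(1)[0][1]
            total += len(group) - mode_count
    return total

# An autosomal site where the reference state (CN 2) is rare.
autosomal_calls = [("M", 1), ("F", 1), ("F", 1), ("M", 2)]
# A chromosome X site with one atypical male and one atypical female.
chrx_calls = [("M", 1), ("M", 1), ("M", 2), ("F", 2), ("F", 2), ("F", 1)]
```

With these inputs, the autosomal site has three non-reference samples but only one sample away from the mode, which is the situation in which NNONREF and NVARIANT diverge.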

The -classifyMinimumObservations argument controls how many times a copy number state must be confidently seen for that state to count as "observed" in the population. The default is 1, meaning that any single confident genotype call causes its copy number state to be observed. All outputs except the call rate consider only confident genotypes (based on the filters and quality values) in observed copy number states. If a copy number state is "unobserved" (because it was not seen often enough), all genotypes with this state are treated as no-calls.
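The effect of the observation threshold can be illustrated as follows. This is a minimal sketch of the rule described above, assuming copy number calls are given as a list with None for a no-call; the function name is hypothetical and does not correspond to anything in the toolkit.

```python
from collections import Counter

def apply_minimum_observations(copy_numbers, min_observations=1):
    """Turn calls in under-observed copy number states into no-calls (None).

    copy_numbers: confident copy number calls, with None marking a no-call.
    """
    counts = Counter(cn for cn in copy_numbers if cn is not None)
    observed = {cn for cn, n in counts.items() if n >= min_observations}
    return [cn if cn in observed else None for cn in copy_numbers]

calls = [2, 2, 2, 1, 3, 3, None]
# With a threshold of 2, the single CN 1 call is treated as a no-call.
filtered = apply_minimum_observations(calls, min_observations=2)
```

With the default threshold of 1 the list is returned unchanged, since every state that was called at all counts as observed.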

If an output VCF is specified, then for genotypes with no CN field the annotator attempts to generate CN from the GT field and emits a genotype CN field into the output VCF. If the CN field is already present on a genotype, it is not recomputed from the GT field. Because this computation is done genotype by genotype, some samples at a given variant may have CN fields added while others are left as they are.
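The per-genotype behavior can be sketched like this. The documentation does not say how CN is derived from GT, so the rule used here (summing an assumed haploid copy number for each called allele) is an illustrative assumption, as are the function name and the dictionary representation of a genotype.

```python
def fill_copy_number(genotype, allele_copies):
    """Return the genotype with a CN field added when it can be derived from GT.

    genotype: dict of FORMAT fields for one sample, e.g. {"GT": "0/1"}.
    allele_copies: assumed haploid copy number for each allele index,
                   e.g. [1, 0] for a reference allele and a deletion allele.
    """
    if "CN" in genotype:
        return genotype  # an existing CN field is never recomputed
    gt = genotype.get("GT")
    if gt is None:
        return genotype
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return genotype  # missing alleles: leave the genotype as it is
    result = dict(genotype)
    result["CN"] = sum(allele_copies[int(a)] for a in alleles)
    return result

deletion_alleles = [1, 0]  # hypothetical site: REF carries one copy, ALT carries none
het = fill_copy_number({"GT": "0/1"}, deletion_alleles)
kept = fill_copy_number({"GT": "0/1", "CN": 3}, deletion_alleles)
nocall = fill_copy_number({"GT": "./."}, deletion_alleles)
```

Here the first genotype gains a CN field, the second keeps its existing CN value, and the third is left without one, which mirrors how a single variant can end up with a mix of added and untouched CN fields.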

Output Formats

This annotator can produce the following outputs: report file.

The report file contains one line per variant, with the following columns:

ID
The variant identifier.
CALLRATE
The genotyping call rate for this site.
CNMIN
The minimum observed copy number class.
CNMAX
The maximum observed copy number class.
CNALLELES
The minimum number of haploid copy number alleles that would be required to explain the observed copy number states. This calculation is gender-aware.
NNONREF
The number of samples whose called copy number differs from the expected homozygous-reference copy number. This calculation is gender-aware.
NVARIANT
The number of samples whose called copy number differs from the mode copy number. For sex chromosomes, this is calculated independently for males and females and summed.
CNDIST
A comma-separated list of integers giving the observed number of samples in each copy number state, starting from zero. Males and females are combined.
NDEL
The number of samples with observed copy number below the expected homozygous-reference copy number. This takes into account sample sex and the ploidy of each chromosome.
NDUP
The number of samples with observed copy number above the expected homozygous-reference copy number. This takes into account sample sex and the ploidy of each chromosome.
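As an illustration of how the CNDIST column is laid out, the sketch below parses a CNDIST value and recovers the lowest and highest copy number states that have at least one sample. The function name is hypothetical, and treating those two states as corresponding to CNMIN and CNMAX is an assumption based on the column descriptions above.

```python
def summarize_cndist(cndist):
    """Parse a CNDIST string such as "0,3,95,2" (counts for CN 0, 1, 2, ...)."""
    counts = [int(x) for x in cndist.split(",")]
    observed = [cn for cn, n in enumerate(counts) if n > 0]
    return {
        "counts": counts,
        "called": sum(counts),  # samples counted in any copy number state
        "cnmin": observed[0] if observed else None,
        "cnmax": observed[-1] if observed else None,
    }

summary = summarize_cndist("0,3,95,2")
```

For this made-up value, 100 samples are counted in total, no sample has copy number 0, and the populated states run from copy number 1 to copy number 3.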

Example

 java -Xmx4g -cp SVToolkit.jar \
     org.broadinstitute.sv.main.SVAnnotator \
     -A CopyNumberClass \
     -R human_g1k_v37.fasta \
     -vcf input.vcf \
     -writeReport true \
     -reportDirectory reportdir


CopyNumberClassAnnotator specific arguments

Name                          Type          Default value  Summary
Optional Parameters
-classifyMinimumObservations  Integer       1              Minimum number of confident observations required when classifying copy number states
-filterGenotypes              Boolean       true           True to ignore genotypes that have been filtered with the FT tag
-genotypeQualityThreshold     Double        NA             Ignore genotypes below this genotype quality GQ/CNQ value (default no threshold)
-sample                       List[String]  NA             Sample(s) or .list file of sample names. If specified, only the listed samples will be used to evaluate the variants

Argument details

--classifyMinimumObservations / -classifyMinimumObservations ( Integer with default value 1 )

Minimum number of confident observations required when classifying copy number states.

--filterGenotypes / -filterGenotypes ( Boolean with default value true )

True to ignore genotypes that have been filtered with the FT tag.

--genotypeQualityThreshold / -genotypeQualityThreshold ( Double )

Ignore genotypes below this genotype quality GQ/CNQ value (default no threshold).

--sample / -sample ( List[String] )

Sample(s) or .list file of sample names. If specified, only the listed samples will be used to evaluate the variants.