AlleleFrequencyAnnotator documentation

AlleleFrequencyAnnotator

Annotator that calculates summary statistics on population allele frequency and Hardy-Weinberg equilibrium.

Category: Variant Annotators

The AlleleFrequency annotator is invoked through the SVAnnotator framework, which defines arguments common to all annotators.

Introduction

The AlleleFrequency annotator uses the genotype calls from the VCF file to compute allele frequency statistics. Only bi-allelic variants are currently supported. Genotype values are determined from the GT, FT and GQ tags.
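As a rough illustration of how the three tags could combine into a single genotype value, the Python sketch below classifies one diploid, bi-allelic genotype. The function name and the exact rules are assumptions made for illustration: it treats any FT value other than PASS or "." as filtered, treats a GQ below the threshold as a no-call, and treats any GT that is not two 0/1 alleles as a no-call. The annotator's own rules may differ.

```python
def classify_genotype(gt, ft=None, gq=None, filter_genotypes=True, gq_threshold=None):
    """Classify one diploid, bi-allelic genotype as HOMREF, HET, HOMALT or NOCALL.

    Hypothetical helper: the rules below are illustrative assumptions,
    not the annotator's documented behaviour.
    """
    # Assumption: a genotype counts as filtered when FT is set to anything
    # other than PASS or the missing value ".".
    if filter_genotypes and ft not in (None, ".", "PASS"):
        return "NOCALL"
    # Assumption: a genotype below the GQ threshold is treated as a no-call.
    if gq_threshold is not None and gq is not None and gq < gq_threshold:
        return "NOCALL"
    # Accept both unphased ("/") and phased ("|") separators.
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or any(a not in ("0", "1") for a in alleles):
        return "NOCALL"
    n_alt = alleles.count("1")
    return ("HOMREF", "HET", "HOMALT")[n_alt]


calls = [
    classify_genotype("0/1"),
    classify_genotype("1|1", ft="PASS", gq=40.0, gq_threshold=10.0),
    classify_genotype("0/0", ft="LQ"),
]
```

The filter_genotypes and gq_threshold parameters mirror the -filterGenotypes and -genotypeQualityThreshold arguments described further down, but only loosely.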

Multiple populations (potentially overlapping) may be specified, in which case the allele frequency statistics are calculated separately for each population. If no populations are defined, the samples in the VCF are treated as a single population.
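The per-population behaviour can be pictured with a small Python sketch. It assumes genotype values have already been reduced to the labels HOMREF, HET, HOMALT and NOCALL, that a sample belonging to several populations is counted once in each of them, and that samples absent from the map are skipped when a map is given. The function name and the label "ALL" for the single default population are hypothetical.

```python
from collections import Counter, defaultdict


def count_by_population(calls, sample_pops=None, default_pop="ALL"):
    """Tally genotype labels per population.

    calls:       dict mapping sample ID -> genotype label
    sample_pops: dict mapping sample ID -> set of population IDs, or None
    default_pop: hypothetical name used when no populations are defined
    """
    counts = defaultdict(Counter)
    for sample, call in calls.items():
        if sample_pops:
            # Assumption: samples missing from the map belong to no population.
            pops = sample_pops.get(sample, ())
        else:
            # No populations defined: every sample goes into one population.
            pops = (default_pop,)
        for pop in pops:
            counts[pop][call] += 1
    return dict(counts)


calls = {"S1": "HET", "S2": "HOMREF", "S3": "HOMALT"}
overlapping = {"S1": {"EUR", "CEU"}, "S2": {"EUR"}, "S3": {"AFR"}}
per_pop = count_by_population(calls, overlapping)
single = count_by_population(calls)
```

Here S1 contributes to both EUR and CEU, so the per-population totals can sum to more than the number of samples.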

The test for Hardy-Weinberg equilibrium is an exact method that calculates a two-sided Hardy-Weinberg p-value (Wigginton, Cutler and Abecasis, American Journal of Human Genetics, 2005 [PMID: 15789306], implementation copyright 2003 by Jan Wigginton and Goncalo Abecasis).
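For readers who want to see the shape of the exact test, the following Python sketch re-expresses the recurrence from the cited paper. It is an illustrative rewrite, not the implementation shipped with the toolkit. The function name is hypothetical, and returning NaN when there are no called genotypes is an assumption of this sketch.

```python
def hwe_exact_pvalue(n_het, n_hom_ref, n_hom_alt):
    """Two-sided exact Hardy-Weinberg p-value for a bi-allelic site."""
    if min(n_het, n_hom_ref, n_hom_alt) < 0:
        raise ValueError("genotype counts must be non-negative")
    n_genotypes = n_het + n_hom_ref + n_hom_alt
    if n_genotypes == 0:
        return float("nan")  # assumption: undefined without called genotypes

    hom_rare = min(n_hom_ref, n_hom_alt)
    rare_copies = 2 * hom_rare + n_het

    # Relative probability of each possible heterozygote count,
    # conditional on the observed number of rare-allele copies.
    probs = [0.0] * (rare_copies + 1)

    # Start near the expected heterozygote count, with matching parity.
    mid = rare_copies * (2 * n_genotypes - rare_copies) // (2 * n_genotypes)
    if mid % 2 != rare_copies % 2:
        mid += 1
    probs[mid] = 1.0

    # Walk down from the midpoint, two heterozygotes at a time.
    hets = mid
    hom_r = (rare_copies - mid) // 2
    hom_c = n_genotypes - hets - hom_r
    while hets > 1:
        probs[hets - 2] = (probs[hets] * hets * (hets - 1.0)
                           / (4.0 * (hom_r + 1.0) * (hom_c + 1.0)))
        hets -= 2
        hom_r += 1
        hom_c += 1

    # Walk up from the midpoint.
    hets = mid
    hom_r = (rare_copies - mid) // 2
    hom_c = n_genotypes - hets - hom_r
    while hets <= rare_copies - 2:
        probs[hets + 2] = (probs[hets] * 4.0 * hom_r * hom_c
                           / ((hets + 2.0) * (hets + 1.0)))
        hets += 2
        hom_r -= 1
        hom_c -= 1

    # Sum every outcome that is no more likely than the observed one.
    total = sum(probs)
    p_value = sum(p for p in probs if p <= probs[n_het]) / total
    return min(1.0, p_value)
```

For two individuals carrying two copies of each allele, the only possible outcomes are zero or two heterozygotes, with probabilities 1/3 and 2/3. Calling hwe_exact_pvalue(0, 1, 1) therefore gives 1/3.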

Input Formats

Population map files are tab-delimited files with two columns. The first column specifies the sample identifier and the second column specifies a population identifier. A header line is optional; if present, the column names should be SAMPLE and POPULATION.

Multiple population map files may be provided and these may assign multiple populations to the same sample. This allows multiple levels of population structure or overlapping populations to be described.
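A minimal Python sketch of reading this format is shown below. It assumes a header is recognized only when the first two fields are exactly SAMPLE and POPULATION, and that lines with fewer than two fields are skipped. The function name is hypothetical and the sample and population identifiers are illustrative. Passing the result of one call into the next merges several map files, so one sample can accumulate several populations.

```python
def read_population_map(lines, sample_pops=None):
    """Merge one population map into a dict of sample ID -> set of populations.

    lines:       iterable of text lines, e.g. an open file handle
    sample_pops: dict from a previous call, or None to start fresh
    """
    if sample_pops is None:
        sample_pops = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        # Assumption: blank or malformed lines are silently skipped.
        if len(fields) < 2 or not fields[0]:
            continue
        sample, population = fields[0], fields[1]
        # The header line is optional; skip it when present.
        if (sample, population) == ("SAMPLE", "POPULATION"):
            continue
        sample_pops.setdefault(sample, set()).add(population)
    return sample_pops


# Two map files describing two levels of population structure.
continental = ["SAMPLE\tPOPULATION\n", "NA12878\tEUR\n", "NA19240\tAFR\n"]
specific = ["NA12878\tCEU\n", "NA19240\tYRI\n"]

pops = read_population_map(continental)
pops = read_population_map(specific, pops)
```

With real files, each list of lines would be replaced by an open file handle, for example inside a with open(path) block.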

Output Formats

This annotator can produce one output: a report file.

The report file contains one line for each population for each variant. The columns in the report file are:

ID: The ID of the variant.
PLOIDY: The expected ploidy at this site. Sites on sex chromosomes that are not uniformly diploid will be missing some statistics.
POPULATION: The population identifier.
AAF: The non-reference allele frequency.
NALLELES: The total number of alleles (chromosomes) in the population.
NNOCALLALLELES: The number of alleles (chromosomes) for which no genotype call is available.
NREFALLELES: The number of alleles (chromosomes) that match the reference.
NALTALLELES: The number of alleles (chromosomes) that are the non-reference allele.
NGENOTYPES: The total number of individuals in the population.
NNOCALL: The number of individuals for which no genotype call is available.
NHOMREF: The number of individuals that are called homozygous for the reference allele.
NHET: The number of individuals that are called heterozygous.
NHOMALT: The number of individuals that are called homozygous for the alternate allele.
HWEPVALUE: Hardy-Weinberg equilibrium p-value.
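
The count columns are related by simple arithmetic at a uniformly diploid site. The Python sketch below derives the allele-level columns from the genotype-level ones. It rests on two assumptions that this page does not confirm: each individual contributes exactly two alleles, and AAF is computed over called alleles only (no-calls excluded from the denominator). The function name is hypothetical.

```python
def summarize_diploid_counts(n_hom_ref, n_het, n_hom_alt, n_no_call):
    """Derive report-style columns from genotype counts at a diploid site."""
    n_genotypes = n_hom_ref + n_het + n_hom_alt + n_no_call
    # Each homozygote carries two copies of its allele, each heterozygote one.
    n_ref_alleles = 2 * n_hom_ref + n_het
    n_alt_alleles = 2 * n_hom_alt + n_het
    n_called_alleles = n_ref_alleles + n_alt_alleles
    return {
        "NGENOTYPES": n_genotypes,
        "NNOCALL": n_no_call,
        "NHOMREF": n_hom_ref,
        "NHET": n_het,
        "NHOMALT": n_hom_alt,
        "NALLELES": 2 * n_genotypes,
        "NNOCALLALLELES": 2 * n_no_call,
        "NREFALLELES": n_ref_alleles,
        "NALTALLELES": n_alt_alleles,
        # Assumption: frequency over called alleles; None when nothing is called.
        "AAF": n_alt_alleles / n_called_alleles if n_called_alleles else None,
    }


row = summarize_diploid_counts(n_hom_ref=6, n_het=3, n_hom_alt=1, n_no_call=2)
```

With 6 homozygous reference, 3 heterozygous, 1 homozygous alternate and 2 uncalled individuals, this gives 15 reference and 5 alternate alleles out of 20 called, so AAF is 0.25 under the stated assumptions.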

Example

 java -Xmx4g -cp SVToolkit.jar \
     org.broadinstitute.sv.main.SVAnnotator \
     -A AlleleFrequency \
     -R human_g1k_v37.fasta \
     -vcf input.vcf \
     -populationMapFile 1000G_populations.map \
     -writeReport true \
     -reportDirectory reportdir

AlleleFrequencyAnnotator specific arguments

Name                       Type          Default value  Summary
Optional Parameters
-filterGenotypes           Boolean       true           True to ignore genotypes that have been filtered
-filterVariants            Boolean       true           True to ignore variants that have been filtered
-genotypeQualityThreshold  Double        NA             Ignore genotypes below this genotype quality GQ value (default no threshold)
-population                List[String]  NA             Population(s) or .list file of populations to process
-populationMapFile         List[File]    NA             Map file (or files) containing sample to population assignments

Argument details

--filterGenotypes / -filterGenotypes ( Boolean with default value true )

True to ignore genotypes that have been filtered.

--filterVariants / -filterVariants ( Boolean with default value true )

True to ignore variants that have been filtered.

--genotypeQualityThreshold / -genotypeQualityThreshold ( Double )

Ignore genotypes below this genotype quality GQ value (default no threshold).

--population / -population ( List[String] )

Population(s) or .list file of populations to process.

--populationMapFile / -populationMapFile ( List[File] )

Map file (or files) containing sample to population assignments.