Annotator that calculates summary statistics on population allele frequency and Hardy-Weinberg equilibrium.
Category: Variant Annotators
The AlleleFrequency annotator is invoked through the SVAnnotator framework, which defines arguments common to all annotators.
The AlleleFrequency annotator computes allele frequency statistics from the genotype calls in the VCF file. Only bi-allelic variants are currently supported. The GT, FT and GQ tags are used to determine the genotype values.
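As an illustration of how the GT, FT and GQ tags might combine into a usable genotype call, here is a minimal Python sketch. It is not the annotator's actual code. The function name and parameters are hypothetical; `filter_genotypes` and `gq_threshold` are modeled on the -filterGenotypes and -genotypeQualityThreshold arguments. The sketch assumes that an FT value of PASS or "." means the genotype is unfiltered, and that a genotype with no GQ value is kept.

```python
from typing import Optional, Tuple


def call_from_tags(gt: str,
                   ft: Optional[str] = None,
                   gq: Optional[float] = None,
                   filter_genotypes: bool = True,
                   gq_threshold: Optional[float] = None) -> Optional[Tuple[int, ...]]:
    """Return the called allele indices, or None when the genotype is treated as no-call."""
    # Assumption: FT that is absent, "." or "PASS" means the genotype passed its filters.
    if filter_genotypes and ft not in (None, ".", "PASS"):
        return None
    # Assumption: a missing GQ value is not penalized by the threshold.
    if gq_threshold is not None and gq is not None and gq < gq_threshold:
        return None
    # Phased ("|") and unphased ("/") genotypes are handled identically here.
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return None
    indices = tuple(int(a) for a in alleles)
    # Only bi-allelic variants are supported: allele 0 (reference) or 1 (alternate).
    if any(i not in (0, 1) for i in indices):
        raise ValueError(f"not a bi-allelic genotype: {gt}")
    return indices
```

With these assumptions, `call_from_tags("0/1", gq=5, gq_threshold=10)` is treated as a no-call, while the same genotype without a threshold yields `(0, 1)`.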
Multiple populations (potentially overlapping) may be specified, in which case the allele frequency statistics are calculated separately for each population. If no populations are defined, the samples in the VCF are treated as a single population.
The test for Hardy-Weinberg equilibrium is an exact method that calculates a two-sided Hardy-Weinberg p-value (Wigginton, Cutler and Abecasis, American Journal of Human Genetics, 2005 [PMID: 15789306], implementation copyright 2003 by Jan Wigginton and Goncalo Abecasis).
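To make the exact test concrete, the following Python sketch follows the recurrence described in the cited paper: it enumerates every heterozygote count compatible with the observed allele counts, then sums the probabilities of all outcomes that are no more likely than the observed one. It is an independent illustration, not the implementation shipped with the toolkit. The function name is hypothetical, and returning NaN when there are no called genotypes is a choice made for this sketch.

```python
import math


def hwe_exact_pvalue(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Two-sided exact Hardy-Weinberg p-value from diploid genotype counts."""
    if min(n_hom_ref, n_het, n_hom_alt) < 0:
        raise ValueError("genotype counts must be non-negative")
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        return math.nan  # sketch-specific choice: nothing to test

    # Number of copies of the rarer allele.
    n_rare = 2 * min(n_hom_ref, n_hom_alt) + n_het
    probs = [0.0] * (n_rare + 1)

    # Start near the most likely heterozygote count, with the same parity as n_rare.
    mid = n_rare * (2 * n - n_rare) // (2 * n)
    if mid % 2 != n_rare % 2:
        mid += 1
    probs[mid] = 1.0
    total = 1.0

    # Walk down to fewer heterozygotes (each step removes two hets, adds two homozygotes).
    hets, hom_rare = mid, (n_rare - mid) // 2
    hom_common = n - hets - hom_rare
    while hets > 1:
        probs[hets - 2] = (probs[hets] * hets * (hets - 1.0)
                           / (4.0 * (hom_rare + 1.0) * (hom_common + 1.0)))
        total += probs[hets - 2]
        hom_rare += 1
        hom_common += 1
        hets -= 2

    # Walk up to more heterozygotes.
    hets, hom_rare = mid, (n_rare - mid) // 2
    hom_common = n - hets - hom_rare
    while hets <= n_rare - 2:
        probs[hets + 2] = (probs[hets] * 4.0 * hom_rare * hom_common
                           / ((hets + 2.0) * (hets + 1.0)))
        total += probs[hets + 2]
        hom_rare -= 1
        hom_common -= 1
        hets += 2

    # Normalize, then sum every outcome at most as probable as the observed one.
    probs = [p / total for p in probs]
    p_observed = probs[n_het]
    return min(1.0, sum(p for p in probs if p <= p_observed))
```

For two individuals who are each homozygous for a different allele, `hwe_exact_pvalue(1, 0, 1)` evaluates to 1/3, which matches the value obtained by enumerating the two possible genotype configurations by hand.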
Population map files are tab-delimited files with two columns. The first column specifies the sample identifier and the second column specifies the population identifier. A header line is optional; if present, the column names should be SAMPLE and POPULATION.
Multiple population map files may be provided and these may assign multiple populations to the same sample. This allows multiple levels of population structure or overlapping populations to be described.
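The map file format described above can be read with a few lines of Python. This is a minimal sketch, not the toolkit's parser: the function name is hypothetical, and skipping blank or short lines is an assumption. Because each population maps to a set of samples, a sample listed under several populations (in one file or across files) simply appears in each of those sets.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Set


def read_population_maps(paths: Iterable[str]) -> Dict[str, Set[str]]:
    """Merge one or more population map files into {population: set of samples}."""
    populations: Dict[str, Set[str]] = defaultdict(set)
    for path in paths:
        with open(path, newline="") as handle:
            for line_number, row in enumerate(csv.reader(handle, delimiter="\t")):
                if len(row) < 2:
                    continue  # assumption: blank or malformed lines are ignored
                sample, population = row[0].strip(), row[1].strip()
                # The header line is optional; recognize it only on the first line.
                if line_number == 0 and (sample, population) == ("SAMPLE", "POPULATION"):
                    continue
                populations[population].add(sample)
    return dict(populations)
```

Passing two files, one with fine-grained populations and one with continental groups, yields overlapping sets that describe both levels of population structure.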
This annotator can produce one output: a report file.
This report file contains one line for each population for each variant. The columns in the report file are:
- The ID of the variant.
- The expected ploidy at this site. Sites on sex chromosomes that are not uniformly diploid will be missing some statistics.
- The population identifier.
- The non-reference allele frequency.
- The total number of alleles (chromosomes) in the population.
- The number of alleles (chromosomes) for which no genotype call is available.
- The number of alleles (chromosomes) that match the reference.
- The number of alleles (chromosomes) that are the non-reference allele.
- The total number of individuals in the population.
- The number of individuals for which no genotype call is available.
- The number of individuals that are called homozygous for the reference allele.
- The number of individuals that are called heterozygous.
- The number of individuals that are called homozygous for the alternate allele.
- Hardy-Weinberg equilibrium p-value.
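The counts in the list above are related to one another in a simple way at a diploid site, as the following Python sketch shows. The dictionary keys are hypothetical labels, not the report's actual column names. The sketch assumes that the total allele count includes alleles with no call, and that the non-reference allele frequency is computed over called alleles only.

```python
from typing import Dict, List, Optional, Tuple


def diploid_site_stats(genotypes: List[Optional[Tuple[int, int]]]) -> Dict[str, object]:
    """Summarize one population at one bi-allelic diploid site.

    Each genotype is a pair of allele indices (0 = reference, 1 = alternate),
    or None when no genotype call is available.
    """
    called = [g for g in genotypes if g is not None]
    n_individuals = len(genotypes)
    n_missing_individuals = n_individuals - len(called)

    # Classify called genotypes by their number of alternate alleles.
    n_hom_ref = sum(1 for g in called if sum(g) == 0)
    n_het = sum(1 for g in called if sum(g) == 1)
    n_hom_alt = sum(1 for g in called if sum(g) == 2)

    n_ref_alleles = 2 * n_hom_ref + n_het
    n_alt_alleles = 2 * n_hom_alt + n_het
    n_called_alleles = n_ref_alleles + n_alt_alleles

    return {
        "n_alleles": 2 * n_individuals,  # assumption: includes uncalled alleles
        "n_missing_alleles": 2 * n_missing_individuals,
        "n_ref_alleles": n_ref_alleles,
        "n_alt_alleles": n_alt_alleles,
        "n_individuals": n_individuals,
        "n_missing_individuals": n_missing_individuals,
        "n_hom_ref": n_hom_ref,
        "n_het": n_het,
        "n_hom_alt": n_hom_alt,
        # Assumption: frequency is taken over called alleles; None if nothing was called.
        "alt_allele_frequency": (n_alt_alleles / n_called_alleles
                                 if n_called_alleles else None),
    }
```

Running this once per population, on the subset of samples assigned to that population, would give one record per population per variant, mirroring the layout of the report.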
java -Xmx4g -cp SVToolkit.jar \
    org.broadinstitute.sv.main.SVAnnotator \
    -A AlleleFrequency \
    -R human_g1k_v37.fasta \
    -vcf input.vcf \
    -populationMapFile 1000G_populations.map \
    -writeReport true \
    -reportDirectory reportdir
AlleleFrequencyAnnotator specific arguments
| Argument | Type | Default | Description |
|---|---|---|---|
| -filterGenotypes | Boolean | true | True to ignore genotypes that have been filtered |
| -filterVariants | Boolean | true | True to ignore variants that have been filtered |
| -genotypeQualityThreshold | Double | NA | Ignore genotypes below this genotype quality GQ value (default no threshold) |
| -population | List[String] | NA | Population(s) or .list file of populations to process |
| -populationMapFile | List[File] | NA | Map file (or files) containing sample to population assignments |