

Allele balance across all samples

Category Annotation Modules

VCF Field INFO (variant-level)

Header definition line
  • INFO=<ID=ABHet,Number=1,Type=Float,Description="Allele Balance for heterozygous calls (ref/(ref+alt))">
  • INFO=<ID=ABHom,Number=1,Type=Float,Description="Allele Balance for homozygous calls (A/(A+O)) where A is the allele (ref or alt) and O is anything other">
  • INFO=<ID=OND,Number=1,Type=Float,Description="Overall non-diploid ratio (alleles/(alleles+non-alleles))">
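As an illustration of how these three fields might be read back from an annotated record, here is a minimal Python sketch. It assumes the INFO column has already been isolated as a semicolon-separated string. The function name `parse_allele_balance` and the example values are hypothetical and are not part of GATK.

```python
def parse_allele_balance(info: str) -> dict:
    """Extract ABHet, ABHom and OND from a VCF INFO string, when present."""
    wanted = {"ABHet", "ABHom", "OND"}
    values = {}
    for entry in info.split(";"):
        # Flag-style entries such as "DB" have no "=" and are skipped.
        key, sep, raw = entry.partition("=")
        # "." is the VCF placeholder for a missing value.
        if sep and key in wanted and raw != ".":
            values[key] = float(raw)
    return values


# Made-up INFO string, for illustration only.
example = parse_allele_balance("AC=2;ABHet=0.48;ABHom=0.99;OND=0.012;DP=87")
```

A key that is absent from the INFO string is simply omitted from the returned dictionary, so callers can tell "not annotated" apart from a numeric value.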

    Overview

    These experimental annotations estimate whether the reads supporting a variant call fit the expected allelic ratios, or whether the data may be biased. ABHom is the proportion of reads from homozygous samples that support the call (REF or ALT, depending on whether the call is hom-ref or hom-var). ABHet is the proportion of REF reads from heterozygous samples. OND represents the overall fraction of data that diverges from the diploid hypothesis, based on the number of reads that support something other than the genotyped alleles (called "non-alleles"). Each sample's contribution is weighted by its genotype quality, so that individual miscalls have only a limited effect on the overall ratio.


    $$ ABHom = \frac{\text{\# REF or ALT reads from homozygous samples}}{\text{\# REF + ALT reads from homozygous samples}} $$
    $$ ABHet = \frac{\text{\# REF reads from heterozygous samples}}{\text{\# REF + ALT reads from heterozygous samples}} $$
    $$ OND = \frac{\text{\# reads from non-alleles}}{\text{\# all reads}} $$
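The three ratios can be sketched on toy read counts as follows. This is a simplified illustration, not the tool's implementation. It pools raw counts and leaves out the genotype-quality weighting, whose exact form is not given here. It also assumes that "non-alleles" means reads supporting neither REF nor ALT. The `SampleCounts` type, the `allele_balance` function and all counts are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class SampleCounts:
    genotype: str   # "hom_ref", "het" or "hom_var"
    ref: int        # reads supporting REF
    alt: int        # reads supporting ALT
    other: int = 0  # assumption: reads supporting neither REF nor ALT


def _ratio(num: int, den: int) -> Optional[float]:
    # None signals that no sample contributed to this ratio.
    return num / den if den else None


def allele_balance(samples: Iterable[SampleCounts]) -> dict:
    het_ref = het_total = 0
    hom_called = hom_total = 0
    non_allele = all_reads = 0
    for s in samples:
        non_allele += s.other
        all_reads += s.ref + s.alt + s.other
        if s.genotype == "het":
            het_ref += s.ref
            het_total += s.ref + s.alt
        elif s.genotype in ("hom_ref", "hom_var"):
            # Count reads that support the called allele.
            hom_called += s.ref if s.genotype == "hom_ref" else s.alt
            hom_total += s.ref + s.alt
    return {
        "ABHet": _ratio(het_ref, het_total),
        "ABHom": _ratio(hom_called, hom_total),
        "OND": _ratio(non_allele, all_reads),
    }


# Made-up counts for three samples.
toy = [
    SampleCounts("het", ref=10, alt=10),
    SampleCounts("hom_var", ref=1, alt=19),
    SampleCounts("hom_ref", ref=20, alt=0, other=4),
]
result = allele_balance(toy)
```

With these toy counts, ABHet is 10/20, ABHom is 39/40 and OND is 4/64. Values produced by the tool itself will generally differ because of the genotype-quality weighting.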

    For ABHom, the value should be close to 1.00 because, ideally, all reads from a homozygous sample support a single allele. For ABHet, the value should be close to 0.5, with half of the reads supporting the REF allele and half supporting the ALT allele. Divergence from these expected ratios may indicate a bias in favor of one allele. Note the caveats below regarding cancer and RNAseq analysis.
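One way to act on these expectations is to flag sites whose values stray from 0.5 (ABHet) or from 1.00 (ABHom). The sketch below is only an illustration. The function name `divergent_fields` is hypothetical, and the cutoffs `het_tolerance` and `hom_min` are arbitrary placeholders rather than recommended values.

```python
from typing import List, Optional


def divergent_fields(
    ab_het: Optional[float] = None,
    ab_hom: Optional[float] = None,
    het_tolerance: float = 0.2,  # hypothetical allowed distance from 0.5
    hom_min: float = 0.9,        # hypothetical lower bound for ABHom
) -> List[str]:
    """Return the names of the annotations that diverge from expectation."""
    flagged = []
    if ab_het is not None and abs(ab_het - 0.5) > het_tolerance:
        flagged.append("ABHet")
    if ab_hom is not None and ab_hom < hom_min:
        flagged.append("ABHom")
    return flagged


ok_site = divergent_fields(ab_het=0.5, ab_hom=0.99)
skewed_site = divergent_fields(ab_het=0.85, ab_hom=0.7)
```

A flagged site is a candidate for closer inspection, not proof of an artifact. Any real cutoff would need to be chosen for the dataset at hand.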


    Caveats

    • This annotation will only work properly for biallelic SNPs in diploid organisms where all samples are either called heterozygous or homozygous.
    • This annotation cannot currently be calculated for indels.
    • The reasoning underlying this annotation only applies to germline variants in DNA sequencing data. In somatic/cancer analysis, divergent ratios are expected due to tumor heterogeneity and normal contamination. In RNAseq analysis, divergent ratios may indicate differential allele expression.
    • As stated above, this annotation is experimental and should be interpreted with caution as we cannot guarantee that it is appropriate. Basically, use it at your own risk.
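Because the annotation is only meaningful for biallelic SNPs, a downstream script might check the REF and ALT columns before interpreting the values. The following sketch assumes plain VCF REF and ALT strings. The helper name `is_biallelic_snp` is hypothetical. It does not check ploidy, genotype calls or the type of experiment, so those caveats still have to be verified separately.

```python
def is_biallelic_snp(ref: str, alt: str) -> bool:
    """True when REF and ALT describe a single-base, single-alternate SNP."""
    bases = {"A", "C", "G", "T"}
    alts = alt.split(",")  # multiallelic sites list several ALT alleles
    return (
        len(alts) == 1
        and ref.upper() in bases      # single-base REF, so not a deletion
        and alts[0].upper() in bases  # single-base ALT, so not an insertion
    )


snp = is_biallelic_snp("A", "G")
multiallelic = is_biallelic_snp("A", "G,T")
deletion = is_biallelic_snp("AT", "A")
```

Symbolic alleles such as `<DEL>` or `*` are rejected as well, since they are not single bases.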

    Related annotations


    GATK version 3.6-0-g89b7209 built at 2017/02/09 12:52:48.