# Heterozygosity | Dictionary | Created 2017-12-24

In GATK genotyping, we use an "expected heterozygosity" value to compute the prior probability that a locus is non-reference. Given the expected heterozygosity hets, we calculate the prior probability that all N samples are hom-ref at a site (the sum runs over the 2N chromosomes carried by N diploid samples) as

$$P(\text{all } N \text{ samples hom-ref}) = 1 - \sum_{i=1}^{2N} \frac{\text{hets}}{i}$$

The default value provided for humans is hets = 1e-3. A value of 0.001 implies that two randomly chosen chromosomes from the population would differ from each other at a rate of 1 in 1000 bp. In this context, hets is analogous to the parameter theta (θ) from population genetics. The hets value can be modified if desired.

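
To make the arithmetic concrete, here is a minimal sketch that evaluates the formula above exactly as written. The function name `prob_hom_ref` is hypothetical and is not a GATK API; it assumes diploid samples (hence the sum over 2N chromosomes) and simply illustrates how the hom-ref prior shrinks as more samples are added.

```python
def prob_hom_ref(hets: float, n_samples: int) -> float:
    """Hypothetical helper (not a GATK API): prior probability that all
    N diploid samples are hom-ref, computed as 1 - sum_{i=1}^{2N} hets / i."""
    if not 0.0 <= hets < 1.0:
        raise ValueError("hets must be in [0, 1)")
    if n_samples < 1:
        raise ValueError("n_samples must be >= 1")
    # Sum hets / i over every possible non-reference allele count i = 1..2N.
    p_nonref = sum(hets / i for i in range(1, 2 * n_samples + 1))
    return 1.0 - p_nonref


# Default human value hets = 1e-3 for a few cohort sizes.
for n in (1, 10, 100):
    print(n, prob_hom_ref(1e-3, n))
```

For a single sample this gives 1 - 1.5 × hets = 0.9985. Because the sum is hets times a harmonic number, the non-reference prior grows only roughly logarithmically with the number of samples.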
Note that this quantity has nothing to do with the likelihood of any given sample having a heterozygous genotype. In GATK, that likelihood is determined purely by the probability of the observed data, P(D | AB), under the model that the sample carries an AB heterozygous genotype. The posterior probability of this AB genotype would use the hets prior, but GATK uses this posterior probability only to determine the probability that a site is polymorphic. Changing the hets parameter therefore only affects the chance that a site will be called non-reference across all samples; it does not change the output genotype likelihoods at all, because these are likelihoods, not posterior probabilities. The one quantity that changes whether GATK considers the possibility of a heterozygous genotype at all is the ploidy, which describes how many copies of each chromosome each individual in the species carries.
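
The separation between likelihoods and the site-level posterior can be illustrated with a small sketch. It assumes a single diploid sample and splits the prior as hets / i for non-reference allele count i (so P(AB) = hets, P(BB) = hets / 2, P(AA) = 1 - 1.5 × hets, which matches the formula above for N = 1). The function names and the likelihood values are hypothetical, chosen only for illustration; this is not GATK's implementation.

```python
def genotype_priors(hets: float) -> dict:
    """Hypothetical single-sample diploid prior: P(AC=i) = hets / i for the
    non-reference allele count i = 1 (AB) and i = 2 (BB); the rest is AA."""
    p_het = hets / 1
    p_hom_var = hets / 2
    return {"AA": 1.0 - p_het - p_hom_var, "AB": p_het, "BB": p_hom_var}


def site_posterior_nonref(likelihoods: dict, hets: float) -> float:
    """Posterior probability that the site is non-reference for one sample.
    The genotype likelihoods P(D | G) are read but never modified."""
    priors = genotype_priors(hets)
    joint = {g: likelihoods[g] * priors[g] for g in priors}
    total = sum(joint.values())
    return (joint["AB"] + joint["BB"]) / total


# Made-up genotype likelihoods P(D | AA), P(D | AB), P(D | BB).
lik = {"AA": 0.01, "AB": 0.5, "BB": 0.001}
for hets in (1e-3, 1e-2):
    print(hets, site_posterior_nonref(lik, hets), lik)
```

Raising hets increases the posterior that the site is non-reference, while the likelihood dictionary, which plays the role of the output genotype likelihoods, is identical for every hets value.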