GATK v4.1.1.0 introduces streamlined somatic calling with fewer errors, fewer false-positives, and optimized sensitivity and precision, thanks to several major advances in the Mutect2 pipeline. We hope the changes will make your work more efficient, more accurate, and less expensive, and that these benefits will be worth the slight annoyance of an occasional command line change to the workflow. Read to the bottom for what you need to know to run the new pipeline and take advantage of it.

Reducing errors with key bug fixes

We fixed several bugs that were responsible for error messages about invalid log probabilities, infinities, NaNs etc. We also resolved an issue where CalculateContamination worked poorly on very small gene panels.

Maximizing sensitivity and precision with a streamlined filtering strategy

FilterMutectCalls now filters based on a single quantity: the probability that a variant is not a somatic mutation, regardless of cause. Previously, each filter had its own threshold. We have removed parameters such as -normal-artifact-lod, -max-germline-posterior, -max-strand-artifact-probability, -max-contamination-probability, and even -tumor-lod. FilterMutectCalls automatically determines the probability threshold that optimizes the "F score," the harmonic mean of sensitivity and precision. Users can tweak results in favor of more or less sensitivity by modifying a single parameter, beta, the relative weight of sensitivity versus precision in the harmonic mean. Setting beta above its default favors sensitivity; setting it below the default favors precision.
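To make the thresholding idea concrete, here is a minimal Python sketch of picking the error-probability cutoff that maximizes an expected F score. It is an illustration under stated assumptions, not the FilterMutectCalls implementation: the function names `f_beta` and `best_threshold` and the example probabilities are hypothetical, and the expected counts are computed directly from each call's probability of not being somatic.

```python
from typing import Sequence, Tuple


def f_beta(tp: float, fp: float, fn: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and sensitivity.

    beta > 1 weights sensitivity more heavily; beta < 1 weights precision.
    """
    b2 = beta * beta
    denom = (1.0 + b2) * tp + b2 * fn + fp
    return (1.0 + b2) * tp / denom if denom > 0 else 0.0


def best_threshold(error_probs: Sequence[float], beta: float = 1.0) -> Tuple[float, float]:
    """Return (threshold, expected F score).

    error_probs[i] is the probability that call i is NOT a somatic mutation.
    Calls with error probability <= threshold pass the filter.
    """
    probs = sorted(error_probs)
    # Expected number of real somatic variants among all calls.
    total_true = sum(1.0 - p for p in probs)
    best = (0.0, 0.0)
    tp = fp = 0.0
    for i, p in enumerate(probs):
        tp += 1.0 - p  # expected true positives if this call passes
        fp += p        # expected false positives if this call passes
        # Calls with identical probabilities pass or fail together.
        if i + 1 < len(probs) and probs[i + 1] == p:
            continue
        score = f_beta(tp, fp, total_true - tp, beta)
        if score > best[1]:
            best = (p, score)
    return best


# Hypothetical error probabilities for seven candidate calls.
example_probs = [0.01, 0.02, 0.1, 0.4, 0.6, 0.9, 0.99]
```

With these made-up probabilities, `best_threshold(example_probs, beta=0.5)[0]` is 0.1, `beta=1.0` gives 0.4, and `beta=2.0` gives 0.6: raising beta admits more borderline calls, which is the sensitivity-versus-precision trade-off described above.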

Reducing false-positives with a Bayesian somatic clustering model

We had long suspected that modeling the spectrum of subclonal allele fractions would help distinguish somatic variants from errors. For example, if every somatic variant in a tumor were heterozygous and occurred in 40% of cells, we would expect an allele fraction of about 20% and would know to reject anything with an allele fraction significantly different from that. In the Bayesian framework of Mutect2 this means that we can model the read counts of somatic variants with binomial distributions. We account for an unknown number of subclones with a Dirichlet process binomial mixture model. This is still an oversimplification, because CNVs, small subclones, and genetic drift of passenger mutations all contribute allele fractions that don’t match a few discrete values. Therefore, we include a couple of beta-binomials in the mixture to account for a background spread of allele fractions while still benefiting from clustering. Finally, we use these binomial and beta-binomial likelihoods to refine the tumor log odds calculated by Mutect2, which assume a uniform distribution of allele fractions.
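The effect of clustering on the log odds can be sketched in a few lines of Python. This is a simplified illustration, not the Mutect2 model: it uses a fixed, hand-picked mixture instead of a Dirichlet process, and the function names, cluster weights, and allele fractions are all hypothetical. It relies on the fact that a uniform prior on the allele fraction is a beta-binomial with alpha = beta = 1.

```python
import math


def log_binomial_coeff(n: int, k: int) -> float:
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_binomial(k: int, n: int, f: float) -> float:
    """Log-likelihood of k alt reads out of n at allele fraction f (0 < f < 1)."""
    return log_binomial_coeff(n, k) + k * math.log(f) + (n - k) * math.log1p(-f)


def log_beta_fn(a: float, b: float) -> float:
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_beta_binomial(k: int, n: int, a: float, b: float) -> float:
    """Log-likelihood when the allele fraction is spread as Beta(a, b)."""
    return log_binomial_coeff(n, k) + log_beta_fn(k + a, n - k + b) - log_beta_fn(a, b)


def log_mixture(k, n, clusters, background):
    """clusters: [(weight, allele_fraction)]; background: [(weight, alpha, beta)]."""
    terms = [math.log(w) + log_binomial(k, n, f) for w, f in clusters]
    terms += [math.log(w) + log_beta_binomial(k, n, a, b) for w, a, b in background]
    m = max(terms)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in terms))


def log_odds_correction(k, n, clusters, background):
    """Log ratio of the clustered likelihood to the uniform-allele-fraction likelihood."""
    return log_mixture(k, n, clusters, background) - log_beta_binomial(k, n, 1.0, 1.0)


# Hypothetical tumor: one dominant cluster at allele fraction 0.2,
# plus a flat background component (weights sum to 1).
EXAMPLE_CLUSTERS = [(0.9, 0.2)]
EXAMPLE_BACKGROUND = [(0.1, 1.0, 1.0)]
```

Under these assumed parameters, a candidate with 20 alt reads out of 100 gets a positive correction because it sits on the cluster, while one with 5 alt reads out of 100 gets a negative correction. The background component keeps off-cluster variants from being penalized without limit.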

For more details, refer to our step-by-step tutorial for somatic variant calling with Mutect2 v4.1.1.0 and higher, and to the latest Mutect2 tool documentation.

Thu 23 May 2019