GATKv4.1.1.0 introduces streamlined somatic calling with fewer errors, fewer false-negatives and optimized sensitivity and precision due to several major advances in the Mutect2 pipeline. We hope the changes will help make your work more efficient, more accurate and less expensive, benefits that will be worth the slight annoyance of the occasional command line change to the workflow. Read to the bottom for what you need to know to run and take advantage of the new pipeline.

Reducing errors with key bug fixes

We fixed several bugs that were responsible for error messages about invalid log probabilities, infinities, NaNs etc. We also resolved an issue where CalculateContamination worked poorly on very small gene panels.

Maximizing sensitivity and precision with a streamlined filtering strategy

FilterMutectCalls now filters based on a single quantity, the probability that a variant is not a somatic mutation, regardless of cause. Previously, each had its own threshold. We have removed parameters such as -normal-artifact-lod, -max-germline-posterior, -max-strand-artifact-probability, -max-contamination-probability, and even -tumor-lod. FilterMutectCalls automatically determines the probability threshold that optimizes the "F score," the harmonic mean of sensitivity and precision. Users can tweak results in favor of more or less sensitivity by modifying a single parameter, the variable beta (the relative weight of sensitivity versus precision in the harmonic mean). Setting beta to a value greater than its default filters for greater sensitivity and setting it lower filters for greater precision.

Reducing false-positives with a Bayesian somatic clustering model

We had long suspected that modeling the spectrum of subclonal allele fractions would help distinguish somatic variants from errors. For example, if every somatic variant in a tumor occurred in 40% of cells, we would know to reject anything with an allele fraction significantly different from 20%. In the Bayesian framework of Mutect2 this means that we can model the read counts of somatic variants with binomial distributions. We account for an unknown number of subclones with a Dirichlet process binomial mixture model. Because CNVs, small subclones, and genetic drift of passenger mutations all contribute allele fractions that don’t match a few discrete values, this is still an oversimplification. Therefore, we include a couple of beta-binomials in the mixture to account for a background spread of allele fractions while still benefiting from clustering. Finally, we use these binomial and beta-binomial likelihoods to refine the tumor log odds calculated by Mutect2, which assume a uniform distribution of allele fractions.

For more details refer to our step-by-step tutorial for somatic variant calling with Mutect2 v4.1.1.0 and higher here. Also, refer to the latest Mutect2 tool documentation here.

Return to top

Thu 23 May 2019
Comment on this article

- Recent posts

- Upcoming events

See Events calendar for full list and dates

- Recent events

See Events calendar for full list and dates

- Follow us on Twitter

GATK Dev Team


@wbsimey Happy to hear you’ve found the resources we provide helpful!
30 Jul 19
New crop of GATK workshop videos now available on YouTube! Updated for the GATK4/2019 version of the Best Practices…
25 Jul 19
Don't miss this #GATK workshop -- we've got a great crew lined up and the location isn't half bad either :)
23 Jul 19
@Brunods1001 It’s been updated to use GATK4, which addresses the invalid bam output issue that affected the GATK3 v…
11 Jul 19
Wrapping up the #GATK workshop in Cambridge, UK -- it's been a blast. Great group of participants and fantastic hos…
11 Jul 19

- Our favorite tweets from others

In spite of their stated mission to support human health through genomics, many GATK pipelines are applicable to no…
29 Jul 19
Me: driving myself insane over what data to keep and what to not bother with for thesis and also frantically trying…
18 Jul 19
@RareSeas first attempt at teaching the GATK course, do I look puzzled up there?
11 Jul 19
Can you spot CDGP PhD student, Dr. Alice Denyer, brushing up on the latest bioinformatics tools from @gatk_dev? The…
10 Jul 19
GATK workshop materials available online! Learn it in your own time with @ProjectJupyter notebooks. ^MT
8 Jul 19

See more of our favorite tweets...