Improving the science

As of GATK 4.0, Mutect2 did not work well with the extremely high depths seen in mitochondrial samples and cfDNA liquid biopsies. Thanks to a new active region calculation and a few new filters, Mutect2 now works off the shelf for these cases.

We have been emboldened to improve the local assembly code shared with HaplotypeCaller (HC). Adaptive assembly graph pruning advances both high- and low- depth calling. This feature (default in Mutect2 and optional in HC) uses a probabilistic model that accounts for depth, removing spurious paths in the assembly graph while retaining real mutations. Processing fewer false paths reduces runtime and improves accuracy, even in an assembly region with varying depth, such as one that spans the end of an exome capture target.

Mutect2 can now monitor a tumor after chemotherapy using the new “GGA” (genotype given alleles) mode. Inputting pre-treatment somatic variants as the given alleles forces Mutect2 to genotype those alleles in a given vcf in addition to those that it would otherwise discover (in contrast to HC’s GGA mode). You can also use GGA mode to give special attention to calls from other tools, as in a head-to-head comparison, or to known driver mutations.

Mutect2 now allows joint somatic genotyping of an arbitrary number of tumor and normal samples. With this change, you only need to specify which samples are normal. Mutect2 assumes anything else is a tumor sample. Allowing multiple samples from the same individual enables tracking a tumor and possibly its metastases over time. The feature, currently in beta, will become important as liquid biopsies increase in popularity.

Other new or overhauled tools

Several tools in the orbit of Mutect2 are also entirely new or overhauled:

  • A new Bayesian orientation bias model catches artifact modes prevalent in Illumina’s new Novaseq machines that the old model missed. It filters oxoG and deamination (FFPE) artifacts almost identically to its predecessor, and automatically discovers which artifacts to filter, if any. You can find the new model in CollectF1R2Counts and LearnReadOrientationModel.
  • While the default Mutect2 filters achieve a very low false positive rate, the new FilterAlignmentArtifacts tool is useful when even more stringent filtering is needed. The new tool catches mapping errors by re-aligning reads and checking for multi-mapping, reducing the number of false negatives by realigning reads and their mates together. Because it realigns to the best existing reference regardless of the original alignment, you can now look for mapping errors in an hg19 bam by checking it against the hg38 reference.
  • An overhauled CalculateContamination tool maximizes accuracy even in the face of large numbers of copy number variations, outperforming other known tools.

Sensitivity and precision improvements

Both sensitivity and precision are better than they were a year ago. In the coming weeks and months, watch out for additional improvements, including:

  • Improvements in Mutect2 that account for PCR error and allow you to keep both reads. Mutect2 currently discards one read of a mate pair that overlaps over a variant site on the grounds that the reads are not independent sources of evidence, since they might arise from the same PCR error.

  • A somatic likelihoods model that will force a read and its mate to support the same haplotype, since they must be from the same piece of DNA.

  • A Bayesian model that enables FilterMutectCalls to learn the spectrum of tumor allele fractions and better distinguish between false positives and real somatic variants.

  • An improved FilterMutectCalls tool able to learn whether a sample has few or many somatic mutations - and filter accordingly.

Performance Improvements

On the software engineering side of things, we have made Mutect2 faster and cheaper. Whole-genome pairs of 60x tumors and 60x normals take ~24 hours of total CPU time and cost about a dollar on Google cloud in our public validations on Firecloud. The Mutect2 WDL can parallelize this to reduce the wall clock time to an hour or so! The WDL supports direct reading from google buckets, so a Mutect2 run streams only the reads it needs.

Want to help improve Mutect2?

This is just the tip of our wish list for the next year, and we look forward to getting a lot done. While the joint assembly and genotyping work well, our initial attempt at multi-sample filtering will need improvement. We would be very grateful for suggestions, requests, suggestions for improvement, and requests for features by early adopters.



Comment on this article


- Recent posts


- Upcoming events

See Events calendar for full list and dates


- Recent events

See Events calendar for full list and dates



- Follow us on Twitter

GATK Dev Team

@gatk_dev

RT @curroortuno: Do you want to learn about sequencing data analysis in an amazing city? Register now at @gatk_dev workshop "From reads to…
3 Sep 19
Thank you @murilocervato for hosting our GATK workshop in Sao Paolo last week! Great group of participants, we’ll s… https://t.co/QbT7P5dw5k
3 Sep 19
@RealMattJM “Convoluted”, huh? We see what you did there... https://t.co/GbnpCFWPOX
29 Aug 19
#GATK workshop caption competition: what is deep learning developer Sam Friedman trying to say here? https://t.co/qaJwg3lF1W
28 Aug 19
@wbsimey Happy to hear you’ve found the resources we provide helpful!
30 Jul 19

- Our favorite tweets from others

Do you want to learn about sequencing data analysis in an amazing city? Register now at @gatk_dev workshop "From re… https://t.co/ISBVX2Xwr5
3 Sep 19
Another successful #GATK workshop in the books! @TerraBioApp @gatk_dev https://t.co/oFzPLei0f9
3 Sep 19
Day 2 of #GATK workshop this time in São Paulo, Brazil! Hands-on tutorials using @TerraBioApp #GATK Best Practices… https://t.co/GgtENDNfhk
28 Aug 19
In spite of their stated mission to support human health through genomics, many GATK pipelines are applicable to no… https://t.co/FKQTouZjbv
29 Jul 19
Me: driving myself insane over what data to keep and what to not bother with for thesis and also frantically trying… https://t.co/er2klIcw5i
18 Jul 19

See more of our favorite tweets...