Tool names are cased differently

The first thing to note is that the tool names are cased differently. In GATK3 the tool is spelled MuTect2, with an uppercase T, whereas in GATK4 it is spelled Mutect2, with a lowercase t. Not only is the new name easier to type, but it also helps us distinguish which version of the tool a document refers to.

The two Mutects differ in functionality

Their respective workflow tools differ too. The table shows which tool provides each workflow function in GATK3 versus GATK4.

GATK3 MuTect2 will remain in beta status. GATK4 Mutect2 is in beta status as of the official GATK4 release.

One major difference is that GATK4 splits filtering off into a separate tool, FilterMutectCalls. In GATK3, MuTect2 both calls and filters variants. In GATK4, Mutect2 focuses mainly on calling and performs only minimal upfront filtering of obviously non-somatic sites, leaving most filtering to FilterMutectCalls. This separation makes it easier to test changes to filtering thresholds, because the computationally expensive calling step is decoupled from filtering.
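The benefit of decoupling calling from filtering can be sketched in Python. The `Call` record, its `score` field, the `call_variants` stand-in and the `low_score` filter name are all hypothetical illustrations, not Mutect2 or FilterMutectCalls internals. The sketch only shows the shape of the workflow: the expensive step runs once, while the cheap filtering step is rerun with different thresholds and annotates a FILTER value instead of deleting records.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Call:
    """Hypothetical unfiltered call, standing in for one record of a raw Mutect2 VCF."""
    contig: str
    pos: int
    alt: str
    score: float  # hypothetical per-call confidence score


CALLING_RUNS = 0  # counts how often the expensive step executes


def call_variants() -> List[Call]:
    """Stand-in for the expensive calling step (assembly, pairHMM, and so on)."""
    global CALLING_RUNS
    CALLING_RUNS += 1
    return [
        Call("chr1", 1000, "T", 12.0),
        Call("chr1", 2000, "G", 4.5),
        Call("chr2", 3000, "A", 7.1),
    ]


def annotate_filters(calls: List[Call], min_score: float) -> List[Tuple[Call, str]]:
    """Cheap filtering step: label each call PASS or with a hypothetical filter name.
    Every record is kept and annotated, mirroring how a VCF FILTER column works."""
    return [(c, "PASS" if c.score >= min_score else "low_score") for c in calls]


raw_calls = call_variants()  # expensive: run once
for threshold in (4.0, 6.0, 10.0):  # cheap: try several thresholds on the same calls
    labelled = annotate_filters(raw_calls, threshold)
    passing = sum(1 for _, f in labelled if f == "PASS")
    print(f"min_score={threshold}: {passing} of {len(labelled)} calls PASS")
```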

Another major difference is site-level versus allele-level filtering against the germline resource. GATK3 MuTect2 prefilters any site present in the germline resource, regardless of which allele the tumor carries. GATK4 Mutect2 distinguishes alleles in the germline resource and only filters the site if the tumor allele matches an allele listed there. If the alleles differ, the tool considers the tumor allele a putative somatic mutation.
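The difference between site-level and allele-level matching can be sketched in Python. The resource is modelled as a hypothetical dictionary from (contig, position, REF) to the set of ALT alleles listed there. Alleles are compared exactly as written in the VCF, which records them on the forward strand, so a complementary base does not count as a match. This is a simplification: as described further below, GATK4 Mutect2 uses the resource's allele frequencies to weight somatic probabilities rather than applying a plain yes/no rule.

```python
from typing import Dict, Set, Tuple

# Hypothetical in-memory germline resource: (contig, position, REF) -> ALT alleles listed there.
GermlineResource = Dict[Tuple[str, int, str], Set[str]]


def site_level_match(resource: GermlineResource, contig: str, pos: int, ref: str, alt: str) -> bool:
    """GATK3-style behaviour as described above: any resource record at the site counts,
    whatever allele the tumor carries."""
    return (contig, pos, ref) in resource


def allele_level_match(resource: GermlineResource, contig: str, pos: int, ref: str, alt: str) -> bool:
    """GATK4-style behaviour as described above: only the same ALT allele counts.
    Alleles are compared as written (forward strand), so a complementary base is not a match."""
    return alt in resource.get((contig, pos, ref), set())


resource: GermlineResource = {("chr1", 1000, "C"): {"G"}}

# The tumor carries C>T at a site where the resource lists C>G.
print(site_level_match(resource, "chr1", 1000, "C", "T"))    # True: the whole site is treated as germline
print(allele_level_match(resource, "chr1", 1000, "C", "T"))  # False: T stays a putative somatic mutation
```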

Filtering of sites in the panel of normals (PoN) and the matched normal remains unchanged, except that Mutect2 now prefilters most of these sites, so their records are absent from the output VCF.

With the 1000 Genomes Project now wrapped up, and with germline variant callsets available from even larger cohorts, e.g. gnomAD, GATK4 Mutect2 can account for the germline component of human cancers in a more sophisticated way. It factors germline population allele frequencies into its somatic probability calculations. For a given allele in the tumor that is present in the germline resource, its probability of being a somatic mutation is weighted inversely to the frequency with which the allele is observed in the population: the more common the allele, the less likely it is to be somatic.
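A toy calculation can make the inverse relationship concrete. This is not Mutect2's actual model (see the Mutect2 methods whitepaper for that). It assumes Hardy-Weinberg proportions, so an allele with population frequency AF is carried by a diploid individual with probability 1 - (1 - AF)^2, and it uses a hypothetical per-site somatic prior, `somatic_prior`. Given that an allele is genuinely present in the tumor and the reads fit both explanations equally well, it weighs "somatic" against "germline".

```python
def germline_carrier_prior(af: float) -> float:
    """Probability that a diploid individual carries at least one copy of an allele
    with population frequency `af`, assuming Hardy-Weinberg equilibrium."""
    return 1.0 - (1.0 - af) ** 2


def somatic_probability(af: float, somatic_prior: float = 1e-6) -> float:
    """Toy probability that an allele truly present in the tumor is somatic rather than germline.
    `somatic_prior` is a hypothetical per-site somatic mutation rate, not a Mutect2 default."""
    germline = germline_carrier_prior(af)
    return somatic_prior / (somatic_prior + germline)


for af in (1e-6, 1e-5, 1e-3, 0.1):
    print(f"population AF={af:g}: P(somatic) = {somatic_probability(af):.3g}")
```

The printed probabilities fall steadily as the population frequency rises, which is the qualitative behaviour the paragraph describes.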

The differences between GATK3 MuTect2 and GATK4 Mutect2 are summarized in the following list.

  1. The filtering functionality that annotates the FILTER column is now done by a separate tool called FilterMutectCalls. To filter further based on sequence context artifacts, additionally use FilterByOrientationBias. Note that Mutect2 still performs some upfront filtering (see next point).
  2. Mutect2 ignores sites present in the Panel of Normals (PoN) as well as sites where the matched normal carries the variant at a high allele fraction. By doing so, the tool avoids computationally expensive steps such as graph assembly and pairHMM alignment at those sites. However, there is an option to force the tool to run the full process on sites that are in the PoN (--genotype-pon-sites), which can be useful when comparing results to older MuTect versions.
  3. If you use a known germline variant resource, it must contain population allele frequencies, e.g. from gnomAD for human data, stored in the allele frequency (AF) tag of the VCF INFO field. See the GATK Resource Bundle or the Mutect2 tool documentation for an example.
  4. To create the PoN, call each normal sample using Mutect2's tumor-only mode and then combine the calls with CreateSomaticPanelOfNormals, a tool new to GATK4. This contrasts with the GATK3 workflow, which uses an artifact calling mode in MuTect2 together with CombineVariants to create the PoN. In GATK4, simply skipping the FilterMutectCalls step on the normal calls achieves the same result as the GATK3 artifact mode.
  5. GATK3 MuTect2 uses allele depths (AD) directly to form a maximum likelihood estimate of the allele fraction and calculates variant likelihoods from that estimate. GATK4 Mutect2 instead uses a Bayesian likelihoods model that marginalizes over allele fractions, which accounts for the statistical uncertainty inherent in allele depths. See the Mutect2 methods whitepaper for algorithm details.
  6. In GATK4, we recommend including cross-sample contamination estimates from CalculateContamination when filtering with FilterMutectCalls. CalculateContamination, in turn, relies on the results of GetPileupSummaries and can incorporate information from the matched normal, if available, when calculating the contamination in the tumor sample.
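The contrast in point 5 can be illustrated with a toy binomial model in Python. The real Mutect2 model works with per-read likelihoods (see the whitepaper); here, purely as an assumption for illustration, each of n reads supports the alt allele with probability f, and sequencing error is ignored. Plugging in the maximum likelihood estimate f = k/n evaluates the likelihood at its best case, whereas marginalizing averages it over all allele fractions under a prior (uniform here, also an assumption), reflecting how uncertain f is when only n reads are available.

```python
import math


def binomial_likelihood(k: int, n: int, f: float) -> float:
    """P(k alt reads out of n | allele fraction f), ignoring sequencing error."""
    return math.comb(n, k) * f**k * (1.0 - f) ** (n - k)


def plug_in_mle_likelihood(k: int, n: int) -> float:
    """Likelihood evaluated at the point estimate f = k / n (maximum likelihood)."""
    return binomial_likelihood(k, n, k / n)


def marginal_likelihood(k: int, n: int, grid: int = 10_000) -> float:
    """Likelihood averaged over f with a uniform prior on [0, 1],
    approximated with the midpoint rule on `grid` intervals."""
    h = 1.0 / grid
    return h * sum(binomial_likelihood(k, n, (i + 0.5) * h) for i in range(grid))


k, n = 3, 20  # e.g. 3 alt reads at 20x depth
print("plug-in MLE likelihood:", plug_in_mle_likelihood(k, n))
print("marginal likelihood:   ", marginal_likelihood(k, n))
print("closed form 1/(n+1):   ", 1 / (n + 1))
```

The plug-in value is always at least as large as the marginal one, because an average over f can never exceed the maximum over f. With a uniform prior the marginal likelihood has the closed form 1/(n+1) for every k, which the numerical integral reproduces.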

What remains unchanged is that neither version calls potential loss of heterozygosity (LoH) events. To detect LoH, see the Somatic Copy Number Variant (CNV) workflow.

Tutorials that walk through the GATK3 and GATK4 workflows are available on our forum.

  • Tutorial#9183 outlines the GATK3 MuTect2 workflow.
  • Tutorial#11136 outlines the GATK4 Mutect2 workflow.
  • If you are wondering about the differences between Mutect2 and HaplotypeCaller, see Article#11127.
  • If you are nostalgic for the original MuTect, you can get it as a standalone jar from the MuTect1 Download page. The version is v1.1.7 and it requires Java 7 to run. MuTect1 is a somatic pileup caller that calls SNVs only. That is, it does not call indels, and therefore workflows that use it should include indel realignment. Version 1.1.7 writes results to VCF format (specify with --vcf). For example usage commands, see this thread. For prior versions that give results in MAF format, see the Broad CGA website. For workflows that use a composite of MuTect1 SNV calls and MuTect2 indel calls, see FireCloud Article#7512.


Fri 8 Dec 2017

picard_gatk_mj on 8 Dec 2017

I still cannot understand the plot showing how GATK4 Mutect2 differs in its handling of the PoN, the normal, and the germline resource for the site (C, C) to (-, G). You said it "only filters the site if the tumor allele matches", but do G and C match? Does "matches" mean the same base or the complementary base? Thanks a lot.

ying_sheng_1 on 8 Dec 2017

If you are still interested, this is explained on the following page, under "A variant allele in the case sample is not called if the site is variant in controls."