GATK 4.1.0.0 Release

Posted by bhanuGandham on 30 Jan 2019 (7)


I'm delighted to introduce the first major version update to GATK4, version 4.1.0.0! This release includes several exciting new analysis pipelines and tons of improvements to existing tools, many of which are now officially out of beta (YAY!).

You can check out the full release notes on GitHub to get a sense of the scale of this release, but fair warning, it's a lot. In fact, we felt there was far too much in this release to even give a satisfying overview in a single blog post, so we decided to develop a series of nine blog posts that each cover one of the main functional areas of improvement. The table below lists the nine posts along with a short summary for each. Each blog post was written by the lead developer(s) on that project; it outlines the history of the challenge at hand, the approach that they developed to solve it, and future development prospects.

We plan to publish two posts per week starting tomorrow, so keep an eye out for them: subscribe to forum notifications or follow @gatk_dev on Twitter! We'll add links to the table as the posts become available.

And now without further ado I present to you GATK4.1!!!


Short Variant Caller Roundup
Two Sisters!

Mutect2 and HaplotypeCaller both aim to achieve sensitive SNP and indel discovery, though in very different contexts. Despite their different applications, they're more closely related than first meets the eye. GATK 4.1 features several performance and accuracy improvements, spurred by Mutect2 development and simultaneously benefiting both tools. We're also debuting a new beta version of GVCF mode for Mutect2, bringing the HaplotypeCaller's reference confidence model to somatic analysis.

Turbocharging Germline Short Variants Calling
Be big, feel small!

The Broad generates 20 terabytes of data every day, so it is no surprise that we focus much of our efforts in the germline space on processing more data more efficiently. While efficiency improvements in GATK 4.1 satisfy users with the largest cohorts (think All of Us), rest assured we aren't discounting smaller cohorts! See how GATK 4.1 facilitates generating larger, cheaper germline cohort callsets and improves accuracy and usability for single-sample clinical cases.

New Features and Improvements in Mutect2
Expanding the use cases for a proven tool

Enhanced sensitivity and precision allow GATK 4.1’s Mutect2 to encompass previously challenging domains, including mitochondria, cfDNA, and multiple tumor samples. We’ve improved performance and accuracy in single-sample calling, and have ambitious plans for more progress.

New! Mutect2 for Mitochondrial Analysis
Overcoming barriers to understanding the mitochondrial genome

Calling SNPs and indels on the mitochondrial genome poses unique challenges, due to its circular shape and very high copy number. We now have a tested and validated “Best Practices” pipeline using Mutect2 to call short variants at arbitrary allele fractions in the mitochondrial genome.

New! Mutect2 for Liquid Biopsy
Adapting a proven tool to liquid biopsy studies

Coming soon: a pipeline using Mutect2 for low allele fraction variant detection from duplex-sequenced liquid biopsies. Liquid biopsies present novel challenges, requiring high sensitivity at low allele fraction. With a few minor adjustments to parameters passed to Mutect2 and the addition of a new filter, our pipeline achieves > 90% sensitivity at ~1% allele fraction with less than 1 FP/Mb on three separate panels with territory as large as 2 Mb.

Spark Improvements
Delivering results faster

With GATK 4.1, we continue to improve our support for users who want to run on Apache Spark. This release includes major improvements to MarkDuplicatesSpark in particular, as well as to the full ReadsPipelineSpark, powered by a brand new Spark I/O library, Disq!

CNV out of Beta!
A production-ready tool to call copy-number variants

In the current stage of evolution, we can still see traits inherited from venerable ancestors in the ModelSegments and GermlineCNVCaller pipelines. However, the GATK 4.1 pipelines also feature new adaptations that dramatically improve performance and enable scalability from exomes to genomes. The GATK 4.1 release brings these pipelines out of beta, adding CNV calling officially to GATK’s growing set of capabilities.

Funcotator out of Beta!
A production-ready tool to predict variant function

We created Funcotator to be a fast and accurate functional annotation tool. The latest release of GATK includes updates to Funcotator that make it even more robust and correct, as well as flexible and production-ready. The addition of two sets of data sources to go with Funcotator (including Gencode, ClinVar, gnomAD, and more) enables it to be used out-of-the-box to add annotations to either germline or somatic variants.

CNN out of Beta!
A production-ready suite of tools for single-sample variant filtration

We present the CNNVariant suite of tools, a complement to VQSR for single-sample variant filtration. This toolset includes a pre-trained model, ready to score variants, as well as the capability to train new models for new types of data. We gathered a massive amount of data together to train our model, and validated its performance against different biological samples, sequencing machines, and protocols.



Wed 30 Jan 2019

SkyWarrior on 30 Jan 2019


Awesome release roundup. However, we are still waiting for the much-desired changes to HaplotypeCaller (AKA missed calls due to the -L parameter).

hugolam on 30 Jan 2019


Great, and thanks. After the update to 4.1, I saw the following error with the "--resource" parameter in VariantRecalibrator: A USER ERROR has occurred: Couldn't read file file:///proj/hg19/omni,known=false,training=true,truth=false,prior=12.0:/proj/hg19/omni.vcf. Error was: It doesn't exist. The same command works in the previous version, 4.0.12.0. It seems like it's now adding the current directory to the --resource value and treating the whole thing as a "file" object. Or has the API changed? Thanks!

cnorman on 30 Jan 2019


@hugolam The command line syntax for "tagged" arguments such as `--resource` changed for 4.1. Instead of specifying the tags as part of the argument _value_, specify them as part of the argument _name_: `--resource:known=false,training=true,truth=false,prior=12.0 /proj/hg19/omni.vcf`
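
For anyone updating older scripts, the change described above can be sketched as a small Python helper. This is only an illustration under stated assumptions: `convert_tagged_argument` is a hypothetical name and not part of GATK, and it assumes the old value has the form `tags:path`, with the whole tag string (including a leading resource name such as `omni`, if present) moving onto the argument name.

```python
from typing import List


def convert_tagged_argument(arg_name: str, old_value: str) -> List[str]:
    """Rewrite a pre-4.1 tagged argument into the 4.1 form.

    Old form (assumed): arg_name followed by "tags:path"
    New form:           "arg_name:tags" followed by "path"
    """
    # Split at the first colon only, so colons inside the path
    # (e.g. a URI scheme) stay with the path.
    tags, sep, path = old_value.partition(":")

    # Hypothetical heuristic: treat the prefix as tags only if it
    # contains at least one key=value pair; otherwise leave the
    # argument untouched.
    if not sep or "=" not in tags:
        return [arg_name, old_value]

    return [f"{arg_name}:{tags}", path]


if __name__ == "__main__":
    old = "omni,known=false,training=true,truth=false,prior=12.0:/proj/hg19/omni.vcf"
    print(" ".join(convert_tagged_argument("--resource", old)))
```

An untagged value such as a plain file path is returned unchanged, so the helper can be applied to every `--resource` value in a script without special-casing.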

yingchen69 on 30 Jan 2019


Hi, where is the doc for gatk4 mitochondria pipeline? The github page (https://github.com/gatk-workflows/gatk4-mitochondria-pipeline) is blank. Best, Ying

leshwill on 30 Jan 2019


How do I make GenomicsDB workspaces by chromosome? Does the example -L 20 in the documentation mean chromosome 20? Thank you for your support.

gauthier on 30 Jan 2019


@SkyWarrior David B. has a lead on the -L issue: https://github.com/broadinstitute/gatk/issues/3697 I think he has a prototype, but he's still working through some additional Mutect2 false positives along with everything else on his plate. Hopefully there'll be a PR in a few weeks.

SkyWarrior on 30 Jan 2019


Thanks @gauthier. We know that you guys are super busy to make things even better. :)



