It's a beautiful early autumn day in New England, with small patches of vibrant reds and yellows in the foliage just hinting at the fiery displays to come. Perfect weather for me to de-lurk and bring you some news! (I promise it's not GATK5.)

The long and short of it (but mostly the short) is that we've started collaborating with the DRAGEN team at Illumina, led by Rami Mehio, to improve GATK tools and pipelines. There's a press release if you want the official announcement, or you can read on to get the long version from the GATK team's perspective.


If you're not familiar with DRAGEN, the name stands for Dynamic Read Analysis for GENomics and refers to a secondary analysis platform originally created by a company called Edico Genome, which was acquired by Illumina last year. The DRAGEN team became widely known for making genomic data processing insanely fast on special hardware, but they're not just a speed shop. They have top-notch computational biology expertise: when they reimplemented GATK tools like HaplotypeCaller in DRAGEN, they made some clever tweaks that improved the scientific accuracy of the results. They've done this for other tools as well, and they've also developed their own novel algorithms for other use cases.

That alone is already a big motivation for us to team up with them: they have great ideas for improving our tools and pipelines, and they're willing to share them. Works for us! Then there's the bigger picture of what this means for the kind of research we are working to enable. Both of our teams feel pretty strongly that as the amount of genomic data generation snowballs, particularly in the biomedical field, it's really important to ensure that the results of different studies can be cross-analyzed. For that to be possible, we need to standardize secondary analysis as much as possible to minimize batch effects. We believe that by working together to consolidate our methods and pipeline development efforts, we can remove a major source of heterogeneity in the ecosystem.

So what does that mean in practice?

Rest assured, GATK itself is still going to be GATK, developed by our team at the Broad and released under the same BSD-3 open-source license you know and love. Any improvements that the DRAGEN team contributes to GATK tools will be integrated into the GATK codebase under the same BSD-3 license.

Beyond code improvements to GATK itself, there will also be some changes to the composition of the Best Practices pipelines. For example, we're going to replace BWA with the DRAGEN aligner, which is quite a bit faster, in our DNA pre-processing pipelines (full details and benchmarking results to follow). To reflect the collaborative nature of the work, any pipelines we co-develop with the DRAGEN team will be named DRAGEN-GATK Best Practices.

All the software involved in the DRAGEN-GATK pipelines will be fully open source and available on GitHub, including a new open-source version of the DRAGEN aligner, and we'll continue to publish WDL workflows for every pipeline on GitHub and in Terra workspaces. Importantly, it will all still be runnable on normal hardware, whether you're doing your work on a local server, on an on-premises HPC cluster, or in the cloud. We'll also continue to provide free support for all GATK tools and pipelines, and as part of that we're going to work with the DRAGEN team to make sure we can provide the same level of high-quality support for the tools that they provide.

The DRAGEN team also plans to produce a hardware-accelerated version of any DRAGEN-GATK Best Practices pipeline that we co-develop, which Illumina will offer on the commercial DRAGEN system. We won't touch that work at all (it's not our jam), but we will run comparative evaluations to validate that the hardware-accelerated version of any given pipeline produces results that are functionally equivalent to the "universal" open-source software version. To be clear, it won't be just a rubber-stamp approval; we're highly motivated to make sure that the pipeline implementations are functionally equivalent because our colleagues in the Broad's Genomics Platform are planning to switch some of the Broad's production pipelines to the DRAGEN hardware version for projects where speed is a critical factor.

On that note, what I personally find the most exciting about this partnership is that going forward, everyone in the research community will be able to take advantage of the best ideas from both our teams regardless of whether they want the "regular" software or a hardware-accelerated version. You could even switch between the two within the course of a project and still be able to cross-analyze the outputs. Over the years, I've had to tell a lot of people "sorry, you're going to have to reprocess everything with the same pipeline" so this feels like a huge step in the right direction.

Okay, this sounds great -- so when will the improved tools and pipelines be available?

We're already actively working on porting over improvements from the DRAGEN team, so if you follow the GATK repository on GitHub you should start seeing relevant commits and pull requests any day now. Barring any unforeseen complications, the tool improvements should roll out into regular GATK releases over the next couple of months, and we expect to release the first full DRAGEN-GATK pipeline (for germline short variants) in the first quarter of 2020. We'll post updates here on the blog about how it's going and what you can expect to see as the code rolls in and the release calendar firms up.

In the meantime, don't hesitate to reach out to us if you have any questions that aren't addressed here or in the press release. Note that if you're going to be at the ASHG meeting in Houston later this month, Angel Pizarro and I will be talking about this collaboration at the Illumina Informatics Summit that precedes the conference on Tuesday Oct 15, and I will be available at the Broad Genomics booth in the exhibit hall at ASHG itself on Wednesday Oct 16 if you'd like to discuss this in person. I hope to see a lot of you there!



SkyWarrior on 30 Sep 2019


I cannot wait to test the DRAGEN-aligner!

matdmset on 30 Sep 2019


Can we get notified when either part is released? I'm very curious about the performance benchmarks between DRAGEN and BWA.

Geraldine_VdAuwera on 30 Sep 2019


Hi @matdmset, yes, we'll put out some updates on the blog (and Twitter) as new information and data roll out, including some benchmarking results. We're working on a plan to do that with the DRAGEN team.

raonyguimaraes on 30 Sep 2019


Hi Geraldine, great news!

nans on 30 Sep 2019


Exciting times!! @Geraldine_VdAuwera Will the variant caller also change to DRAGEN's caller, or does HC still hold good?

Geraldine_VdAuwera on 30 Sep 2019


Hi @nans, for germline short variants it’ll still be HaplotypeCaller, with a few tweaks contributed by the DRAGEN team that improve accuracy.

Geraldine_VdAuwera on 30 Sep 2019


Hi all, we've put together a short five-question survey to assess how you would prefer to receive updates about DRAGEN-GATK, which includes an option to sign up for a mailing list and/or newsletter. Please fill it out and pass it along to any colleagues who might be interested so that we can tailor our communications plan accordingly. Thanks! https://www.surveymonkey.com/r/GK9YZ2B



