This is one of two posts announcing the imminent beta release of GATK4; for a technical description of features, see this other post.

"Wait, what?" Yes, you read that right, we're moving GATK4 to a fully open source license -- specifically, BSD 3-clause. And to be clear, this applies to all of GATK4. Not just the core framework (which, little known fact, has always been open source), but all the tools that were previously "protected", including HaplotypeCaller, the new CNV discovery tools, everything. The whole enchilada.

Old-timers in the field (i.e. anyone with what, 3+ years of experience?) will recognize this as a major shift. An important subset of the GATK -- some might say "all the really valuable bits" -- has been under a mixed licensing model since version 2.0 was released in 2012. Under this mixed model, GATK was free for academic/non-profit research purposes, while any for-profit use required a paid commercial license. The proceeds funded further GATK development and support.

Admittedly, the move from the initial open-source state of GATK 1.x to the mixed licensing model caused a fair amount of debate. I'm not going to revisit it in full (even my therapist is sick of hearing about it), but it's fair to say that the licensing created an obstacle for our interactions with some other groups, and that it raised some barriers to access to GATK, especially for smaller companies and startups.

Since then the context within which we operate at the Broad has evolved significantly: a little over two years ago, our small development team was assimilated into a then-newly created larger group called the Data Sciences Platform (DSP), which aims to tackle the big challenges in genomics with robust engineering solutions. This involves applying some novel approaches compared to traditional academic software development, including: 1) give engineers a good home; 2) focus on products, not projects; and 3) maximize openness. This last point in particular means that our DSP mothership-within-Broad recognizes the immense potentiating role of open-source software in driving technological and methodological innovation. In fact, all of DSP's software products have been open-source since its inception, with the notable exception of GATK, which it inherited in a mixed state.

Over the past two years, the collaborations that DSP has cultivated with external groups have immensely benefitted the development of the new framework that would eventually become GATK4. Key features that we have come to rely on were contributed as open-source code by external collaborators: the GenomicsDB datastore that allows us to scale joint genotyping to tens of thousands of whole genomes, by Karthik Gururaj and colleagues at Intel; the Genomics Kernel Library, which provides many impressive speedups for the GATK, by George Powley at Intel; the NIO functionality that allows us to access data on Google Cloud Storage directly, by JP Martin at Google; and the Apache Spark support that allows us to parallelize operations in a much more robust way than before, by Tom White at Cloudera. And it's not all about institutional collaborations; we have also received spontaneous contributions from individuals such as Daniel Gómez-Sánchez of the Institut für Populationsgenetik of Vienna, which have collectively enhanced the GATK codebase and its value to the user community.

So with GATK4 on the cusp of release, and with enthusiasm from all of us at the Broad, we're seizing this opportunity to do a reboot* and bring into alignment our mandate (to build great software), our mission (to empower great research) and our means: a more community-minded approach anchored in openness and free exchange of ideas.

* (at least we had already ditched Jar-Jar "Phone Home" Binks...)

I expect the benefits of this new direction are fairly self-evident, so I'll do us all a favor and close with just one last, somewhat personal note specifically from the development team. We want to thank all the collaborators who have worked with us so far for their support, their invaluable contributions and their faith in what we could accomplish together. And as we turn over this new leaf, we look forward to welcoming into the GATK family anyone who would like to see how much further we can push the genomics envelope.

raonyguimaraes on 24 May 2017

Hi Geraldine, So I don't need to buy a commercial license anymore? Is that correct? Looks amazing, Congratulations!

thondeboer on 24 May 2017

Now THAT is awesome and very welcome news! Congratulations!

Geraldine_VdAuwera on 24 May 2017

That's right, no more commercial license needed to use GATK! Technically this applies starting with GATK4, but operationally we're not expecting new commercial users to buy a license now if they need to run an older version for whatever reason. The licensing manager at would be the right person to contact to get further clarifications on this topic of course.

conradL on 24 May 2017

Congratulations, and thanks! I think everyone (including Broad) will benefit from this.

fac2003 on 24 May 2017

Congratulations. We have developed an open source codebase to train deep neural networks and use them to call genotypes (see, similar idea to DeepVariant, but much more efficient). Since the project licenses are now compatible (BSD and Apache 2), this code could be integrated into GATK if there is interest at your end. Let me know who would be a good contact to discuss this. Best. FC

dsmarcoantonio on 24 May 2017

This is wonderful. Congratulations, this is a huge change.

Geraldine_VdAuwera on 24 May 2017

Thanks everyone! We were already pretty excited to be taking this step, but the outpouring of positive responses to the announcement really makes us feel like it was the right thing to do. It's hugely encouraging -- great positive reinforcement ;-) @fac2003 I've forwarded your proposal to the tech leads; we'll get back to you once they've had the chance to discuss. Thanks for reaching out!

magicDGS on 24 May 2017

That's great, for both the users and developers! And thank you very much for the recognition of my contribution in the "public" framework, I really appreciate it. I'm looking forward to continuing my contribution, and after this also in the "protected" part!

Pepetideo on 24 May 2017

As one of the vocal "debaters" who argued back then that the move to a closed licensing model was a terrible decision, I am really happy you are back on the right path (even if it did require 5 years). Sorry that you required therapy after interacting with me :)

Carl_Li on 24 May 2017

just awesome!

yzharold on 24 May 2017

Wow, a major shift. As a frequent user, I think it is a great cause; perhaps DL is the next step for taking advantage of AI for genetic applications in general.

nitinCelmatix on 24 May 2017

This is wonderful news! Eagerly waiting for the GATK4 general release! Can you give us an idea of when we can expect that? Thanks so much!

Geraldine_VdAuwera on 24 May 2017

We plan to announce the definitive 4.0 release date early next week.
