This is one of two posts announcing the imminent beta release of GATK4; for a technical description of features, see this other post.

"Wait, what?" Yes, you read that right, we're moving GATK4 to a fully open source license -- specifically, BSD 3-clause. And to be clear, this applies to all of GATK4. Not just the core framework (which, little known fact, has always been open source), but all the tools that were previously "protected", including HaplotypeCaller, the new CNV discovery tools, everything. The whole enchilada.

Old-timers in the field (i.e. anyone with what, 3+ years of experience?) will recognize this as a major shift. An important subset of the GATK -- some might say "all the really valuable bits" -- has been under a mixed licensing model since version 2.0 was released in 2012. Under this mixed model, GATK was free for academic/non-profit research purposes, while any for-profit use required a paid commercial license. The proceeds funded further GATK development and support.

Admittedly, the move from the initial open-source state of GATK 1.x to the mixed licensing model caused a fair amount of debate. I'm not going to revisit it in full (even my therapist is sick of hearing about it), but it's fair to say that the licensing created an obstacle for our interactions with some other groups, and that it raised some barriers to access to GATK, especially for smaller companies and startups.

Since then the context within which we operate at the Broad has evolved significantly: a little over two years ago, our small development team was assimilated into a then-newly created larger group called the Data Sciences Platform (DSP), which aims to tackle the big challenges in genomics with robust engineering solutions. This involves applying some novel approaches compared to traditional academic software development, including: 1) give engineers a good home; 2) focus on products, not projects; and 3) maximize openness. This last point in particular means that our DSP mothership-within-Broad recognizes the immense potentiating role of open-source software in driving technological and methodological innovation. In fact, all of DSP's software products have been open-source since its inception, with the notable exception of GATK, which it inherited in a mixed state.

Over the past two years, the collaborations that DSP has cultivated with external groups have immensely benefitted the development of the new framework that would eventually become GATK4. Key features that we have come to rely on were contributed as open-source code by external collaborators: the GenomicsDB datastore that allows us to scale joint genotyping to tens of thousands of whole genomes, by Karthik Gururaj and colleagues at Intel; the Genomics Kernel Library, which provides many impressive speedups for the GATK, by George Powley at Intel; the NIO functionality that allows us to access data on Google Cloud Storage directly, by JP Martin at Google; and the Apache Spark support that allows us to parallelize operations in a much more robust way than before, by Tom White at Cloudera. And it's not all about institutional collaborations; we have also received spontaneous contributions from individuals such as Daniel Gómez-Sánchez of the Institut für Populationsgenetik of Vienna, which have collectively enhanced the GATK codebase and its value to the user community.

So with GATK4 on the cusp of release, and with enthusiasm from all of us at the Broad, we're seizing this opportunity to do a reboot* and bring into alignment our mandate (to build great software), our mission (to empower great research) and our means: a more community-minded approach anchored in openness and free exchange of ideas.

* (at least we had already ditched Jar-Jar "Phone Home" Binks...)

I expect the benefits of this new direction are fairly self-evident, so I'll do us all a favor and close with just one last, somewhat personal note specifically from the development team. We want to thank all the collaborators who have worked with us so far for their support, their invaluable contributions and their faith in what we could accomplish together. And as we turn over this new leaf, we look forward to welcoming into the GATK family anyone who would like to see how much further we can push the genomics envelope.

raonyguimaraes on 24 May 2017

Hi Geraldine, So I don't need to buy a commercial license anymore? Is that correct? Looks amazing, Congratulations!

thondeboer on 24 May 2017

Now THAT is awesome and very welcome news! Congratulations!

Geraldine_VdAuwera on 24 May 2017

That's right, no more commercial license needed to use GATK! Technically this applies starting with GATK4, but operationally we're not expecting new commercial users to buy a license now if they need to run an older version for whatever reason. The licensing manager would be the right person to contact for further clarification on this topic, of course.

conradL on 24 May 2017

Congratulations, and thanks! I think everyone (including Broad) will benefit from this.

fac2003 on 24 May 2017

Congratulations. We have developed an open source codebase to train deep neural networks and use them to call genotypes (see, similar idea to DeepVariant, but much more efficient). Since the project licenses are now compatible (BSD and Apache 2), this code could be integrated into GATK if there is interest at your end. Let me know who would be a good contact to discuss this. Best. FC

dsmarcoantonio on 24 May 2017

This is wonderful. Congratulations, this is a huge change.

Geraldine_VdAuwera on 24 May 2017

Thanks everyone! We were already pretty excited to be taking this step, but the outpouring of positive responses to the announcement really makes us feel like it was the right thing to do. It's hugely encouraging -- great positive reinforcement ;-) @fac2003 I've forwarded your proposal to the tech leads; we'll get back to you once they've had the chance to discuss. Thanks for reaching out!

magicDGS on 24 May 2017

That's great, for both the users and developers! And thank you very much for the recognition of my contribution in the "public" framework, I really appreciate it. I'm looking forward to continuing to contribute, and after this also to the "protected" part!

Pepetideo on 24 May 2017

As one of the vocal "debaters" who argued back then that the move to a closed licencing model was a terrible decision, I am really happy you are back on the right path (even if it did require 5 years). Sorry that you required therapy after interacting with me :)

Carl_Li on 24 May 2017

just awesome!

yzharold on 24 May 2017

Wow, a major shift! As a frequent user, I think it is a great cause. Perhaps DL is the next step for taking advantage of AI for genetic applications in general.

nitinCelmatix on 24 May 2017

This is wonderful news! Eagerly waiting for the GATK4 general release! Can you give us an idea of when we can expect that? Thanks so much!

Geraldine_VdAuwera on 24 May 2017

We plan to announce the definitive 4.0 release date early next week.
