This is one of two posts announcing the imminent beta release of GATK4; for a technical description of features, see this other post.

"Wait, what?" Yes, you read that right, we're moving GATK4 to a fully open source license -- specifically, BSD 3-clause. And to be clear, this applies to all of GATK4. Not just the core framework (which, little-known fact, has always been open source), but all the tools that were previously "protected", including HaplotypeCaller, the new CNV discovery tools, everything. The whole enchilada.


Old-timers in the field (i.e. anyone with what, 3+ years of experience?) will recognize this as a major shift. An important subset of the GATK -- some might say "all the really valuable bits" -- has been under a mixed licensing model since version 2.0 was released in 2012. Under this mixed model, GATK was free for academic/non-profit research purposes, while any for-profit use required a paid commercial license. The proceeds funded further GATK development and support.

Admittedly, the move from the initial open-source state of GATK 1.x to the mixed licensing model caused a fair amount of debate. I'm not going to revisit it in full (even my therapist is sick of hearing about it), but it's fair to say that the licensing created an obstacle for our interactions with some other groups, and that it raised some barriers to access to GATK, especially for smaller companies and startups.

Since then the context within which we operate at the Broad has evolved significantly: a little over two years ago, our small development team was assimilated into a then-newly created larger group called the Data Sciences Platform (DSP), which aims to tackle the big challenges in genomics with robust engineering solutions. This involves applying some novel approaches compared to traditional academic software development, including: 1) give engineers a good home; 2) focus on products, not projects; and 3) maximize openness. This last point in particular means that our DSP mothership-within-Broad recognizes the immense potentiating role of open-source software in driving technological and methodological innovation. In fact, all of DSP's software products have been open-source since its inception, with the notable exception of GATK, which it inherited in a mixed state.

Over the past two years, the collaborations that DSP has cultivated with external groups have immensely benefitted the development of the new framework that would eventually become GATK4. Key features that we have come to rely on were contributed as open-source code by external collaborators: the GenomicsDB datastore that allows us to scale joint genotyping to tens of thousands of whole genomes, by Karthik Gururaj and colleagues at Intel; the Genomics Kernel Library, which provides many impressive speedups for the GATK, by George Powley at Intel; the NIO functionality that allows us to access data on Google Cloud Storage directly, by JP Martin at Google; and the Apache Spark support that allows us to parallelize operations in a much more robust way than before, by Tom White at Cloudera. And it's not all about institutional collaborations; we have also received spontaneous contributions from individuals such as Daniel Gómez-Sánchez of the Institut für Populationsgenetik of Vienna, which have collectively enhanced the GATK codebase and its value to the user community.

So with GATK4 on the cusp of release, and with enthusiasm from all of us at the Broad, we're seizing this opportunity to do a reboot* and bring into alignment our mandate (to build great software), our mission (to empower great research) and our means: a more community-minded approach anchored in openness and free exchange of ideas.

* (at least we had already ditched Jar-Jar "Phone Home" Binks...)

I expect the benefits of this new direction are fairly self-evident, so I'll do us all a favor and close with just one last, somewhat personal note specifically from the development team. We want to thank all the collaborators who have worked with us so far for their support, their invaluable contributions and their faith in what we could accomplish together. And as we turn over this new leaf, we look forward to welcoming into the GATK family anyone who would like to see how much further we can push the genomics envelope.



raonyguimaraes on 24 May 2017


Hi Geraldine, So I don't need to buy a commercial license anymore? Is that correct? Looks amazing, Congratulations!

thondeboer on 24 May 2017


Now THAT is awesome and very welcome news! Congratulations!

Geraldine_VdAuwera on 24 May 2017


That's right, no more commercial license needed to use GATK! Technically this applies starting with GATK4, but operationally we're not expecting new commercial users to buy a license now if they need to run an older version for whatever reason. The licensing manager at softwarelicensing@broadinstitute.org would be the right person to contact to get further clarifications on this topic of course.

conradL on 24 May 2017


Congratulations, and thanks! I think everyone (including Broad) will benefit from this.

fac2003 on 24 May 2017


Congratulations. We have developed an open source codebase to train deep neural networks and use them to call genotypes (see https://github.com/campagnelaboratory/variationanalysis, similar idea to DeepVariant, but much more efficient). Since the project licenses are now compatible (BSD and Apache 2), this code could be integrated into GATK if there is interest at your end. Let me know who would be a good contact to discuss this. Best. FC

dsmarcoantonio on 24 May 2017


This is wonderful. Congratulations, this is a huge change.

Geraldine_VdAuwera on 24 May 2017


Thanks everyone! We were already pretty excited to be taking this step, but the outpouring of positive responses to the announcement really makes us feel like it was the right thing to do. It's hugely encouraging -- great positive reinforcement ;-) @fac2003 I've forwarded your proposal to the tech leads; we'll get back to you once they've had the chance to discuss. Thanks for reaching out!

magicDGS on 24 May 2017


That's great, for both the users and developers! And thank you very much for the recognition of my contribution in the "public" framework, I really appreciate it. I'm looking forward to continuing my contributions, and after this also in the "protected" part!

Pepetideo on 24 May 2017


As one of the vocal "debaters" who argued back then that the move to a closed licensing model was a terrible decision, I am really happy you are back on the right path (even if it did require 5 years). Sorry that you required therapy after interacting with me :)

Carl_Li on 24 May 2017


just awesome!

yzharold on 24 May 2017


Wow, a major shift! As a frequent user, I think it is a great cause. Perhaps DL is the next step for taking advantage of AI for genetic applications in general.

nitinCelmatix on 24 May 2017


This is wonderful news! Eagerly waiting for the GATK4 general release! Can you give us an idea of when we can expect that? Thanks so much!

Geraldine_VdAuwera on 24 May 2017


We plan to announce the definitive 4.0 release date early next week.



