## GATK4 is completely open source

### Posted by Geraldine_VdAuwera on 24 May 2017

This is one of two posts announcing the imminent beta release of GATK4; for a technical description of features, see this other post.

"Wait, what?" Yes, you read that right, we're moving GATK4 to a fully open source license -- specifically, BSD 3-clause. And to be clear, this applies to all of GATK4. Not just the core framework (which, little known fact, has always been open source), but all the tools that were previously "protected", including HaplotypeCaller, the new CNV discovery tools, everything. The whole enchilada.

Old-timers in the field (i.e. anyone with what, 3+ years of experience?) will recognize this as a major shift. An important subset of the GATK -- some might say "all the really valuable bits" -- has been under a mixed licensing model since version 2.0 was released in 2012. Under this mixed model, GATK was free for academic/non-profit research purposes, while any for-profit use required a paid commercial license. The proceeds funded further GATK development and support.

Admittedly, the move from the initial open-source state of GATK 1.x to the mixed licensing model caused a fair amount of debate. I'm not going to revisit it in full (even my therapist is sick of hearing about it), but it's fair to say that the licensing created an obstacle for our interactions with some other groups, and that it raised some barriers to access to GATK, especially for smaller companies and startups.

Since then the context within which we operate at the Broad has evolved significantly: a little over two years ago, our small development team was assimilated into a then-newly created larger group called the Data Sciences Platform (DSP), which aims to tackle the big challenges in genomics with robust engineering solutions. This involves applying some novel approaches compared to traditional academic software development, including: 1) give engineers a good home; 2) focus on products, not projects; and 3) maximize openness. This last point in particular means that our DSP mothership-within-Broad recognizes the immense potentiating role of open-source software in driving technological and methodological innovation. In fact, all of DSP's software products have been open-source since its inception, with the notable exception of GATK, which it inherited in a mixed state.

Over the past two years, the collaborations that DSP has cultivated with external groups have immensely benefitted the development of the new framework that would eventually become GATK4. Key features that we have come to rely on were contributed as open-source code by external collaborators: the GenomicsDB datastore that allows us to scale joint genotyping to tens of thousands of whole genomes, by Karthik Gururaj and colleagues at Intel; the Genomics Kernel Library, which provides many impressive speedups for the GATK, by George Powley at Intel; the NIO functionality that allows us to access data on Google Cloud Storage directly, by JP Martin at Google; and the Apache Spark support that allows us to parallelize operations in a much more robust way than before, by Tom White at Cloudera. And it's not all about institutional collaborations; we have also received spontaneous contributions from individuals such as Daniel Gómez-Sánchez of the Institut für Populationsgenetik of Vienna, which have collectively enhanced the GATK codebase and its value to the user community.

So with GATK4 on the cusp of release, and with enthusiasm from all of us at the Broad, we're seizing this opportunity to do a reboot* and bring into alignment our mandate (to build great software), our mission (to empower great research) and our means: a more community-minded approach anchored in openness and free exchange of ideas.

* (at least we had already ditched Jar-Jar "Phone Home" Binks...)

I expect the benefits of this new direction are fairly self-evident, so I'll do us all a favor and close with just one last, somewhat personal note specifically from the development team. We want to thank all the collaborators who have worked with us so far for their support, their invaluable contributions and their faith in what we could accomplish together. And as we turn over this new leaf, we look forward to welcoming into the GATK family anyone who would like to see how much further we can push the genomics envelope.

#### raonyguimaraes

Hi Geraldine, So I don't need to buy a commercial license anymore? Is that correct? Looks amazing, Congratulations!

#### thondeboer

Now THAT is awesome and very welcome news! Congratulations!

#### Geraldine_VdAuwera

That's right, no more commercial license needed to use GATK! Technically this applies starting with GATK4, but operationally we're not expecting new commercial users to buy a license now if they need to run an older version for whatever reason. The licensing manager at softwarelicensing@broadinstitute.org would be the right person to contact to get further clarifications on this topic of course.

###### Wed 24 May 2017

Congratulations, and thanks! I think everyone (including Broad) will benefit from this.

#### fac2003

Congratulations. We have developed an open source codebase to train deep neural networks and use them to call genotypes (see https://github.com/campagnelaboratory/variationanalysis, similar idea to DeepVariant, but much more efficient). Since the project licenses are now compatible (BSD and Apache 2), this code could be integrated into GATK if there is interest at your end. Let me know who would be a good contact to discuss this. Best. FC

#### dsmarcoantonio

This is wonderful. Congratulations, this is a huge change.

#### Geraldine_VdAuwera

Thanks everyone! We were already pretty excited to be taking this step, but the outpouring of positive responses to the announcement really makes us feel like it was the right thing to do. It's hugely encouraging -- great positive reinforcement ;-) @fac2003 I've forwarded your proposal to the tech leads; we'll get back to you once they've had the chance to discuss. Thanks for reaching out!

#### magicDGS

That's great, for both the users and developers! And thank you very much for the recognition of my contribution in the "public" framework, I really appreciate it. I'm looking forward to continuing to contribute, and after this also to the "protected" part!

#### Pepetideo

As one of the vocal "debaters" who argued back then that the move to a closed licensing model was a terrible decision, I am really happy you are back on the right path (even if it did require 5 years). Sorry that you required therapy after interacting with me :)

just awesome!

#### yzharold

Wow, a major shift! As a frequent user, I think it is a great cause. Perhaps DL is the next step in taking advantage of AI for genetic applications in general.

#### nitinCelmatix

This is wonderful news! Eagerly waiting for the GATK4 general release! Can you give us an idea of when we can expect it? Thanks so much!

#### Geraldine_VdAuwera

We plan to announce the definitive 4.0 release date early next week.

Wed 24 May 2017
