Latest posts

Learn some practical steps for debugging in the cloud.

Specifically, I tested the alpha release of the Google Genomics Pipelines API, which is driven from the command line. Down the road, we will publish similar posts for the UI-driven systems FireCloud and Workbench. In this particular challenge, my aim is to genotype first a trio and then a cohort of 17 whole-genome BAMs that are available in the cloud. I need the resulting VCF callsets within a week.

Read the whole post
See comments (0)

GATK workshops bring you the latest in our methods development. The materials we prepare for workshops often serve as a base for our documentation on new or improved tools and workflows. So not only do GATK workshops cover our established Best Practices, they also give you a taste of what is to come. And let me just say a lot of changes are pouring out of the jar, especially with GATK4.

Let’s get into the logistics of workshops.

Interested in attending a GATK workshop?

Please join the new gatk-workshop group at!forum/gatk-workshop to receive emails about upcoming workshops. These emails are different from the group's email updates, so group membership settings should be as shown below with the group Email delivery preference set to Don’t send email updates. You may also browse the posts in the GATK Blog for mention of our workshop schedule.

We post information and links for upcoming workshops on our forum. Look for the announcement box at the top of the GATK Forum homepage. Depending on the hosting institution, a workshop may be open to non-affiliates, and may or may not charge a fee to offset hosting costs.

Read the whole post
See comments (0)

You may have noticed we’ve been talking about this new thing called WDL--the Workflow Definition Language. We've published a tutorial using WDL to run some GATK tasks, as well as a pipeline implementation of the Best Practices for germline short variant discovery written in WDL. These fully-baked WDL scripts assume you already know what to do with them, but you may be wondering where to start. Whether you need a few pointers to get you started, or you’re completely new to this, we’ve got you covered. (And if you’re just looking for how to run pre-written WDLs, head on over to the executions section. You can still learn a lot from reading the rest of this article too though!)

WDL is designed to be easy to use--"human readable and writable" is our promise. You should think of building a pipeline with WDL like building with legos. The final product (like that full pipeline script I linked before) can look quite complex, but it is a simple matter of going step by step with your WDL building blocks.

I would recommend that you get started by reading our user guide. As you read through and click to the next article at the bottom of each page, the user guide will introduce you to all the pieces you can use in your lego-pipeline--from what pieces you'll need all the way through how to test & run your pipeline once you've finished it.

Once you've got a handle on what WDL can do, head over to the tutorials section. In these sequential tutorials, I walk you through how to use those building blocks to implement a small part of the GATK pipeline. Each tutorial builds on the previous one to help you learn to use WDL in new ways without repeating all of your earlier work.

You've read the user guide and you've run through the tutorials; you now have all you need to get started writing your very own WDLs. If you get stuck on something, you can always see how we do things in these real WDL scripts. If you have a more specific question, don't hesitate to post it on our WDL forum. Happy building!

See comments (0)

Here's the scoop. We've been working with Intel engineers for some time now, and we've all been enjoying it so much, we decided to commit to the relationship big time.

As announced in this Broad press release, we are taking our collaboration with Intel to the next level. Specifically, we have joined forces to create the "Intel-Broad Center for Genomic Data Engineering", with an initial five-year mission to build out life sciences tools and infrastructure, and boldly grow the genomics community's ability to collaborate across diverse datasets and analysis platforms in ways that no one has done before.

Ahem. In practice this is going to enable us to bring you some key improvements on three fronts: hardware recommendations, genomics software tools, and cross-infrastructure collaboration.

Read the whole post
See comments (1)

These are the materials that were presented at the November 2015 GATK workshop at the Broad Institute in Cambridge, MA.

Materials and links:
Slide decks presented on Day 1: Google Drive Folder
Workshop handout document (agenda and resources): PDF on Google Drive
Variant Discovery Tutorial (Day 2 AM): PDF on Google Drive
Variant Filtering Tutorial (Day 2 PM): PDF on Google Drive
Tutorial data bundle (Day 2 PM): ZIP on Google Drive
See comments (0)

The weather in Vancouver is awful right now, and that's probably a good thing -- it should keep the outdoorsy types like myself from succumbing to the natural beauty of British Columbia and skipping out on any of the great science lined up for us this week. And rumor is the Wi-Fi is pretty decent!

I sure hope it is, because this afternoon in the GATK workshop we're going to be running some live demos of how to run GATK analyses on the Cloud. We have screencap videos as backup in case technology abandons us, but it's just not the same to play a recording... (for one thing, the recording is probably more reliable than my brain, but shush).

We'll also have a hands-on tutorial on somatic exome CNV analysis with GATK4, and the overall workshop will be peppered with live polls, in an effort to make the experience as interactive and engaging as possible. This is something the ASHG workshop organizers have been pushing for over the past few meetings, and rightly so.

It's a tall order with a crowd of 225 registered users (we get a ballroom!) but we've got a solid 90 minutes lined up to talk about all brand new GATK content. This is going to be fun!

Oh, and the slides and tutorial worksheet are here, to complement the tutorial bundle which is available from the Download page.

See comments (0)

Tomorrow, a bunch of us are packing our bags and heading to Vancouver for the American Society of Human Genetics' Annual Meeting.

We have a busy week ahead of us, between the GA4GH Plenary Meeting, the various workshops that are organized around the ASHG meeting, and the meeting itself, which draws thousands of researchers from across the globe. Our Broad Genomics team this year is going to be pretty active in a variety of events, which you can find detailed here on the website of the Broad Genomics Services.

Soo Hee and I from our little support team will be rather busy as well. We're finalizing preparations for the workshop we're teaching on Tuesday, which will focus on what's hot in GATK4. As a reminder, GATK4 is currently still in "alpha preview" phase, but we expect it to move to beta status over the course of the next quarter, and I personally hold high hopes of releasing it as the officially supported version in early 2017!

In any case, we have some cool live demos and a full CNV pipeline hands-on tutorial to show off at the workshop to a maxed-out audience of 225 people (no pressure...). Speaking of which, the materials for the workshop are now available for download over here. The bundle file contains both a special GATK4 jar and a test dataset. If you'll be joining us in the workshop, please make sure you have downloaded the bundle BEFORE the workshop, as it is large (~400 MB) and you can't count on the conference center Wi-Fi to be good enough to download it onsite.

If you're coming to ASHG but are not coming to the workshop (did you wait too long to register? ;) ), you can still come chat with us at the Broad Genomics booth in the exhibition hall. I'll post a detailed schedule of when we'll be hanging out there -- there are some sessions I don't want to miss, but I have yet to compile the final list -- and you can for sure find me at the Meet the Expert event that will take place at the booth. I'll be the so-called expert in the Thursday, October 20th 10:00am - 11:00am slot. You can also follow @gatk_dev on Twitter for the latest schedule developments and/or social event opportunities.

And if you're not coming to Vancouver, either because you blame Canada or you study a different organism and you don't see what all the fuss is about these humans we keep going on about -- well, we'll still see you on the forum, and you can always invite us to teach a workshop at your local institution. We've had a really fantastic series this year and are now taking invitations for 2017. More on that later!

See comments (0)

The presentation slide decks and hands-on tutorial materials can be downloaded at this Google Drive link.

See comments (1)

Cross-posted from

For many years now we’ve been hearing from users of both GATK and Picard about how they’d love to see the two projects unite into a single "toolkit-to-rule-them-all", for the sake of user convenience, to promote consistency across tools, and to minimize duplication of effort.

With the advent of GATK 4 this suddenly became a real possibility, as the decision was made to start the new GATK codebase from the Picard base classes rather than the old GATK 3.x base classes. This allows free-form Picard-style tools and GATK “walkers” built upon an engine traversal to peacefully co-exist within the same framework. Last year, a Picard engineer successfully ported all Picard tools to the GATK 4 codebase with only minor changes to the tools themselves. More recently, efforts have been made to harmonize the build systems of the two projects, resulting in Picard’s recent move to Gradle.

Importantly, the core GATK 4 codebase is released entirely under the BSD 3-clause license, a big improvement over the confusing licensing situation in GATK 3.x, with its mix of open-source and proprietary licenses within the same repository -- and that is where any Picard tools moved to the GATK 4 codebase would live, remaining fully open-sourced and free for all.

As all of the technical pieces are now in place to allow for a merger of the two projects (with the guarantee that the open-source nature of Picard code will be preserved) we are soliciting feedback from the Picard developer community about the prospect of a union with GATK. Would people here be generally in favor of such a move? Are there any strong objections to this idea? Any concerns that should be addressed before we head any further down this path?

Read the whole post
See comments (1)

We are starting official support of GRCh38, a reference genome with alternate contigs.

In fact, going forward all of our new projects will use GRCh38. During this transition over the coming year, we will keep supporting GRCh37/hg19. Here are nine takeaways to help you get started in using the latest reference.

Read the whole post
See comments (1)
