Latest posts

The weather in Vancouver is awful right now, and that's probably a good thing -- it should keep the outdoorsy types like myself from succumbing to the natural beauty of British Columbia and skipping out on any of the great science lined up for us this week. And rumor has it the Wi-Fi is pretty decent!

I sure hope it is, because this afternoon in the GATK workshop we're going to be running some live demos of how to run GATK analyses in the cloud. We have screencap videos as backup in case technology abandons us, but playing a recording is just not the same... (for one thing, the recording is probably more reliable than my brain, but shush).

We'll also have a hands-on tutorial on somatic exome CNV analysis with GATK4, and the overall workshop will be peppered with live polls, in an effort to make the experience as interactive and engaging as possible. This is something the ASHG workshop organizers have been pushing for over the past few meetings, and rightly so.

It's a tall order with a crowd of 225 registered attendees (we get a ballroom!) but we've got a solid 90 minutes lined up to talk about all-new GATK content. This is going to be fun!

Oh, and the slides and tutorial worksheet are here, to complement the tutorial bundle which is available from the Download page.


Tomorrow, a bunch of us are packing our bags and heading to Vancouver for the American Society of Human Genetics' Annual Meeting.

We have a busy week ahead of us, between the GA4GH Plenary Meeting, the various workshops that are organized around the ASHG meeting, and the meeting itself, which draws thousands of researchers from across the globe. Our Broad Genomics team this year is going to be pretty active in a variety of events, which you can find detailed here on the website of the Broad Genomics Services.

Soo Hee and I from our little support team will be rather busy as well. We're finalizing preparations for the workshop we're teaching on Tuesday, which will focus on what's hot in GATK4. As a reminder, GATK4 is still in the "alpha preview" phase, but we expect it to move to beta status over the course of the next quarter, and I personally hold high hopes of releasing it as the officially supported version in early 2017!

In any case, we have some cool live demos and a full CNV pipeline hands-on tutorial to show off at the workshop to a maxed-out audience of 225 people (no pressure...). Speaking of which, the materials for the workshop are now available for download over here. The bundle file contains both a special GATK4 jar and a test dataset. If you'll be joining us in the workshop, please make sure you download the bundle BEFORE the workshop: it is large (~400 MB) and you can't count on the conference center Wi-Fi being good enough to download it onsite.

If you're coming to ASHG but are not coming to the workshop (did you wait too long to register? ;) ), you can still come chat with us at the Broad Genomics booth in the exhibition hall. I'll post a detailed schedule of when we'll be hanging out there -- there are some sessions I don't want to miss, but I have yet to compile the final list -- and you can for sure find me at the Meet the Expert event that will take place at the booth. I'll be the so-called expert in the Thursday, October 20th 10:00am - 11:00am slot. You can also follow @gatk_dev on Twitter for the latest schedule developments and/or social event opportunities.

And if you're not coming to Vancouver, either because you blame Canada or you study a different organism and you don't see what all the fuss is about these humans we keep going on about -- well, we'll still see you on the forum, and you can always invite us to teach a workshop at your local institution. We've had a really fantastic series this year and are now taking invitations for 2017. More on that later!


The presentation slide decks and hands-on tutorial materials can be downloaded at this Google Drive link.


Cross-posted from

For many years now we’ve been hearing from users of both GATK and Picard about how they’d love to see the two projects unite into a single "toolkit-to-rule-them-all", for the sake of user convenience, to promote consistency across tools, and to minimize duplication of effort.

With the advent of GATK 4 this suddenly became a real possibility, as the decision was made to start the new GATK codebase from the Picard base classes rather than the old GATK 3.x base classes. This allows for free-form Picard-style tools and GATK “walkers” built upon an engine traversal to peacefully co-exist within the same framework. Last year, a Picard engineer successfully ported all Picard tools to the GATK 4 codebase with only minor changes to the tools themselves. More recently, efforts have been made to harmonize the build systems of the two projects, resulting in Picard’s recent move to gradle.

Importantly, the core GATK 4 codebase is released entirely under the BSD 3-clause license, a big improvement over the confusing licensing situation in GATK 3.x, with its mix of open-source and proprietary licenses within the same repository -- and that is where any Picard tools moved to the GATK 4 codebase would live, remaining fully open-sourced and free for all.

As all of the technical pieces are now in place to allow for a merger of the two projects (with the guarantee that the open-source nature of Picard code will be preserved) we are soliciting feedback from the Picard developer community about the prospect of a union with GATK. Would people here be generally in favor of such a move? Are there any strong objections to this idea? Any concerns that should be addressed before we head any further down this path?


We are starting official support of GRCh38, a reference genome with alternate contigs.

In fact, going forward all of our new projects will use GRCh38. During this transition over the coming year, we will keep supporting GRCh37/hg19. Here are nine takeaways to help you get started in using the latest reference.


Believe it or not we've done seven workshops so far this year, spread across five countries, spanning three continents -- the furthest ones in Australia (Sydney and Melbourne) and the most recent one in Helsinki, Finland. That's a lot of flying but on the bright side, now I have Gold status on American Airlines (hello fast track lane).

So after a restful summer hiatus we're gearing up to revisit continental Europe -- specifically, we're heading to Basel, Switzerland, at the invitation of the Swiss Institute of Bioinformatics.

We'll be following our standard formula of one day of lectures focused on the Best Practices for variant discovery, and one day of optional hands-on practical sessions demonstrating key steps of analysis and interpretation. The registration page is now live at this link:

One important note: we'll be offering the day of hands-on practicals twice (in order to serve more people), which is why the workshop dates span three days -- but to be clear each person will only attend two days out of the three (the lectures are on Day 1 for everyone). The practical sessions have limited space, and tend to fill up fast, so don't wait too long to register -- especially if you have a strong preference about which of the two optional days would work better for you.

If you can't make it to Basel, the next workshops will be at Broad in Boston/Cambridge (USA) on November 7-8, then at VIB in Leuven (Belgium), dates TBD (probably February). Details will follow in due time.

We look forward to seeing many of you in Basel!


Folks, it really makes my day when I get to announce some good news that has been cooking for a long time. So this is going to be a very happy Humpday indeed.

The good news (which I may have hinted at previously) is that we are making our production pipeline scripts public, starting with the one that implements our Best Practices for data pre-processing and initial variant calling (aka GVCF generation) in whole genomes. Not only that, all GRCh38/hg38 resource files needed to run it, plus test data, are in a Google Cloud bucket. In time the bucket will replace our not-so-reliable FTP server as our bundle-sharing mechanism.

Details below the fold, in FAQ format (sort of).

TL;DR: Take this script and run it, for it is our WGS processing production workflow (uBAMs -> GVCF per-sample).


This morning, we unveiled an interactive Google Map, based on anonymized IP addresses collected from the forum database, that shows how the GATK user community is distributed across the globe. Check out Boston/Cambridge!

For the record, this was originally inspired by the World Map of High-throughput Sequencers by James Hadfield (Cancer Research UK, Cambridge) and Nick Loman (University of Birmingham).

As several people have already expressed interest in how this map was put together, I thought I'd give a brief overview of the technical side below the fold. I'm happy to provide more details and/or code if anyone wants to do something similar.


For largely practical reasons, the GATK website home URL has changed. Don't worry, your bookmarked www links will still work foreveeeer -- at least that's what I'm told by our valiant IT folks. As always, let us know if you run into any trouble, not that we're expecting any.


First, I hope those of you in the USA had a relaxing and/or exciting holiday weekend (happy birthday, 'Murica!). For the rest, we thank you for your patience as we recover from the festivities and work our way through the backlog of forum questions.

Now, I wanted to let you know that over the next few weeks, we're going to push out a variety of improvements to the GATK website and documentation contents. We start today with a main push that involves some structural changes that we think will improve the user experience overall and make it easier for new users in particular. Much of this is based on feedback we've received over the years, so hopefully we're following the will of the people!

We've done our best to avoid causing any disruptions for those of you who have been using our website for a long time, but we did have to move a few things around. Here are the highlights; if you have strong feelings about any of this (good or bad) let us know in the comments. Also let us know if you stumble across anything that looks broken and we'll fix it double quick.

