Latest posts

We are starting official support of GRCh38, a reference genome with alternate contigs.

In fact, going forward all of our new projects will use GRCh38. During this transition over the coming year, we will keep supporting GRCh37/hg19. Here are nine takeaways to help you get started in using the latest reference.

Read the whole post
See comments (0)

Believe it or not, we've done seven workshops so far this year, spread across five countries, spanning three continents -- the furthest ones in Australia (Sydney and Melbourne) and the most recent one in Helsinki, Finland. That's a lot of flying, but on the bright side, now I have Gold status on American Airlines (hello fast track lane).

So after a restful summer hiatus we're gearing up to revisit continental Europe -- specifically, we're heading to Basel, Switzerland, at the invitation of the Swiss Institute of Bioinformatics.

We'll be following our standard formula of one day of lectures focused on the Best Practices for variant discovery, and one day of optional hands-on practical sessions demonstrating key steps of analysis and interpretation. The registration page is now live.

One important note: we'll be offering the day of hands-on practicals twice (in order to serve more people), which is why the workshop dates span three days -- but to be clear each person will only attend two days out of the three (the lectures are on Day 1 for everyone). The practical sessions have limited space, and tend to fill up fast, so don't wait too long to register -- especially if you have a strong preference about which of the two optional days would work better for you.

If you can't make it to Basel, the next workshops will be at Broad in Boston/Cambridge (USA) on November 7-8, then at VIB in Leuven (Belgium), dates TBD (probably February). Details will follow in due time.

We look forward to seeing many of you in Basel!

See comments (0)

Folks, it really makes my day when I get to announce some good news that has been cooking for a long time. So this is going to be a very happy Humpday indeed.

The good news (which I may have hinted at previously) is that we are making our production pipeline scripts public, starting with the one that implements our Best Practices for data pre-processing and initial variant calling (aka GVCF generation) in whole genomes. Not only that, all GRCh38/hg38 resource files needed to run it, plus test data, are in a Google Cloud bucket. In time the bucket will replace our not-so-reliable FTP server as our bundle-sharing mechanism.

Details below the fold, in FAQ format (sort of).

TL;DR: Take this script and run it, for it is our WGS processing production workflow (uBAMs -> GVCF per-sample).

Read the whole post
See comments (0)

This morning, we unveiled an interactive Google Map, based on anonymized IP addresses collected from the forum database, that shows how the GATK user community is distributed across the globe. Check out Boston/Cambridge!

For the record, this was originally inspired by the World Map of High-throughput Sequencers by James Hadfield (Cancer Research UK, Cambridge) and Nick Loman (University of Birmingham).

As several people have already expressed interest in how this map was put together, I thought I'd give a brief overview of the technical side below the fold. I'm happy to provide more details and/or code if anyone wants to do something similar.

Read the whole post
See comments (1)

For largely practical reasons, the GATK website home URL has changed. Don't worry, your bookmarked www links will still work foreveeeer -- at least that's what I'm told by our valiant IT folks. As always, let us know if you run into any trouble, not that we're expecting any.

See comments (0)

First, I hope those of you in the USA had a relaxing and/or exciting holiday weekend (happy birthday, 'Murica!). For the rest, we thank you for your patience as we recover from the festivities and work our way through the backlog of forum questions.

Now, I wanted to let you know that over the next few weeks, we're going to push out a variety of improvements to the GATK website and documentation contents. We start today with a main push that involves some structural changes that we think will improve the user experience overall and make it easier for new users in particular. Much of this is based on feedback we've received over the years, so hopefully we're following the will of the people!

We've done our best to avoid causing any disruptions for those of you who have been using our website for a long time, but we did have to move a few things around. Here are the highlights; if you have strong feelings about any of this (good or bad) let us know in the comments. Also let us know if you stumble across anything that looks broken and we'll fix it double quick.

Read the whole post
See comments (0)

We are streamlining our recommended workflows by removing a preprocessing step.

As announced in the GATK v3.6 highlights, variant calling workflows that use HaplotypeCaller or MuTect2 now omit indel realignment. This change does not apply to workflows that call variants with UnifiedGenotyper or the original MuTect. We still recommend indel realignment for these legacy workflows. Recall that indel realignment uses RealignerTargetCreator and IndelRealigner and comes after duplicate marking and before base quality score recalibration (BQSR).
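The ordering described above can be sketched in a few lines of Python. This is only an illustration of the recommendation, not GATK code: the function name, the set names, and the step labels are hypothetical, and the step list covers just the three pre-processing stages named in this post.

```python
# Illustrative sketch: names below are hypothetical labels, not GATK APIs.
# Callers for which indel realignment is still recommended (legacy workflows).
LEGACY_CALLERS = {"UnifiedGenotyper", "MuTect"}
# Callers whose workflows now omit indel realignment.
CALLERS_WITHOUT_REALIGNMENT = {"HaplotypeCaller", "MuTect2"}


def preprocessing_steps(caller):
    """Return the ordered pre-processing steps recommended for a caller."""
    if caller not in LEGACY_CALLERS | CALLERS_WITHOUT_REALIGNMENT:
        raise ValueError(f"unknown caller: {caller}")
    steps = ["duplicate marking"]
    if caller in LEGACY_CALLERS:
        # Indel realignment sits between duplicate marking and BQSR.
        steps += ["RealignerTargetCreator", "IndelRealigner"]
    steps.append("BQSR")
    return steps
```

For example, `preprocessing_steps("HaplotypeCaller")` yields only duplicate marking and BQSR, while `preprocessing_steps("UnifiedGenotyper")` keeps both realignment tools in between.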

In light of these changes, let’s take a brisk stroll through the implications for variant detection. In particular, let’s focus on insertion and deletion events (indels).

Read the whole post
See comments (0)

The presentation slide decks and hands-on tutorial materials can be downloaded at this Google Drive link.

See comments (0)

For my 10,000th posting on this forum, I thought I'd pull together a few numbers.

First, yes, I did just say this is my 10,000th post. That breaks down to 464 new discussions (documentation articles and blog posts, including this one) and 9,536 comments in various threads. I've also posted over 1,000 tweets as @gatk_dev on behalf of the development team. When people ask what I do, I can say with a straight face that I'm a scientist and I tweet for a living. It's awesome.

But hey, here are some more important numbers.

  • Forum and website (since 2012): 35,000 registered users; 3,000 active participants; 6,000 discussions; 20,000 comments; 50,000 page views weekly; 8,000,000 page views total.
  • Codebase: 23 versions; 59 contributors; 14,000 commits; 500,000 lines.
  • Usage: 5,000,000 CPU days; 800,000,000 jobs; 30,000 distinct users.

Details below.

Read the whole post
See comments (2)

What better way to start the summer than with a new GATK release?

Umm no don't answer that, there's loads of good options. You could have a barbecue, eat some ice cream, go on a hike if that's the sort of thing that floats your kayak... Or you might live somewhere where winter is just starting and everything I just said there was terribly insensitive. Sorry.

Ahem. As I mentioned in my recent sneak preview blog post, the bulk of our development effort (speed! copy number! unicorns!) is now going into the GATK4 project. Accordingly, development in the GATK3 framework is winding down, so this release consists mainly of bug fixes, added convenience functions, and relatively minor behavior tweaks.

That being said, we do have a few new experimental features in the VQSR tools (which haven't yet been fully ported to GATK4, hence the ongoing development in GATK3) that are pushing the envelope of allele-specific filtering. So that's interesting, if not yet fully documented (someone should really get on that). And you'll probably care about some of those tweaks I casually mentioned above -- in fact I guarantee that at least one of these things will matter to you in some way. If you read through the whole thing and don't find anything relevant to you, tell me in the comments that I was wrong. That's what the internet is for.

As usual, here I go over the changes that matter the most / to the most; consider going through the release notes as well for a full list of changes.

Read the whole post
See comments (13)
