For largely practical reasons, the GATK website home URL has become http://software.broadinstitute.org/gatk. Don't worry, your bookmarked www. links will still work foreveeeer* -- at least that's what I'm told by our valiant IT folks. As always, let us know if you run into any trouble, not that we're expecting any.
First, I hope those of you in the USA had a relaxing and/or exciting holiday weekend (happy birthday, 'Murica!). For the rest, we thank you for your patience as we recover from the festivities and work our way through the backlog of forum questions.
Now, I wanted to let you know that over the next few weeks, we're going to push out a variety of improvements to the GATK website and documentation contents. We start today with a main push that involves some structural changes that we think will improve the user experience overall and make it easier for new users in particular. Much of this is based on feedback we've received over the years, so hopefully we're following the will of the people!
We've done our best to avoid causing any disruptions for those of you who have been using our website for a long time, but we did have to move a few things around. Here are the highlights; if you have strong feelings about any of this (good or bad) let us know in the comments. Also let us know if you stumble across anything that looks broken and we'll fix it double quick.
As announced in the GATK v3.6 highlights, variant calling workflows that use HaplotypeCaller or MuTect2 now omit indel realignment. This change does not apply to workflows that call variants with UnifiedGenotyper or the original MuTect. We still recommend indel realignment for these legacy workflows. Recall that indel realignment uses RealignerTargetCreator and IndelRealigner and comes after duplicate marking and before base quality score recalibration (BQSR).
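To make the ordering above concrete, here is a minimal sketch of how the pre-processing step list differs by caller. It assumes nothing beyond what the paragraph states. The function name `preprocessing_steps` and the plain-text step labels are hypothetical, chosen for illustration only, and are not part of GATK.

```python
# Illustrative only: preprocessing_steps() is a hypothetical helper, not a GATK API.
# Caller and tool names are taken from the text above.
REALIGNMENT_FREE_CALLERS = {"HaplotypeCaller", "MuTect2"}
LEGACY_CALLERS = {"UnifiedGenotyper", "MuTect"}


def preprocessing_steps(caller):
    """Return the ordered pre-processing steps recommended before variant calling."""
    if caller in REALIGNMENT_FREE_CALLERS:
        # As of the 3.6 recommendations, these workflows omit indel realignment.
        realignment = []
    elif caller in LEGACY_CALLERS:
        # Legacy workflows keep both realignment tools, in this order.
        realignment = ["RealignerTargetCreator", "IndelRealigner"]
    else:
        raise ValueError(f"unknown caller: {caller}")
    # Realignment, when present, sits after duplicate marking and before BQSR.
    return ["duplicate marking", *realignment, "BQSR"]


if __name__ == "__main__":
    for name in ("HaplotypeCaller", "UnifiedGenotyper"):
        print(name, "->", " -> ".join(preprocessing_steps(name)))
```

The sketch only encodes the order of steps. It says nothing about the actual command lines or arguments for each tool.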
In light of these changes, let’s take a brisk stroll through the implications for variant detection. In particular, let’s focus on insertion and deletion events (indels).
The presentation slide decks and hands-on tutorial materials can be downloaded at this Google Drive link.
For my 10,000th posting on this forum, I thought I'd pull together a few numbers.
First, yes I did just say this is my 10,000th post. That breaks down to 464 new discussions (documentation articles and blog posts, including this one) and 9536 comments in various threads. I've also posted over 1,000 tweets as @gatk_dev on behalf of the development team. When people ask what I do, I can say with a straight face that I'm a scientist and I tweet for a living. It's awesome.
But hey, here are some more important numbers.
What better way to start the summer than with a new GATK release?
Umm no don't answer that, there's loads of good options. You could have a barbecue, eat some ice cream, go on a hike if that's the sort of thing that floats your kayak... Or you might live somewhere where winter is just starting and everything I just said there was terribly insensitive. Sorry.
Ahem. As I mentioned in my recent sneak preview blog post, the bulk of our development effort (speed! copy number! unicorns!) is now going into the GATK4 project. Accordingly, development in the GATK3 framework is winding down, so this release consists mainly of bug fixes, added convenience functions, and relatively minor behavior tweaks.
That being said we do have a few new experimental features in the VQSR tools (which haven't yet been fully ported to GATK4, hence the ongoing development in GATK3) that are pushing the envelope of allele-specific filtering. So that's interesting, if not yet fully documented (someone should really get on that). And you'll probably care about some of those tweaks I casually mentioned above -- in fact I guarantee that at least one of these things will matter to you in some way. If you read through the whole thing and don't find anything relevant to you, tell me in the comments that I was wrong. That's what the internet is for.
As usual, here I go over the changes that matter the most / to the most; consider going through the release notes as well for a full list of changes.
GATK 3.6 was released on June 1, 2016. Itemized changes are listed below. For more details, see the user-friendly version highlights.
We've got so many exciting projects going on right now, between the GATK4 alpha, expanding our scope to include copy number and structural variation, and ramping up to offer GATK-as-a-service -- we're going to need more talent! And possibly a bigger boat.
So if you or someone you know (and like) is looking to join a team working on cutting-edge analysis methods and software, real big data and a mission that matters; if you like the idea of a stimulating professional environment with competitive compensation where equal opportunity and work-life balance are not empty phrases; and if your Memorial Day plans get canceled on account of rain -- why not take that time to polish up your resume and apply for one of the jobs below?
We look forward to hearing from you!
In my last blog post, I mentioned that GATK 3.6 would support Java 8. I also mentioned that we had some evidence from our Java migration testing that GATK 3.5 (and presumably older versions as well) may produce correctness errors if run on Java 8. Since then quite a few people have expressed concern because they have been running GATK 3.5 or older versions on Java 8 already. Most wanted to know the nature and magnitude of these problems, and whether they should re-run the affected data on Java 7.
We don't have definitive answers for these questions because we haven't performed end-to-end testing of GATK 3.5 on Java 8. Once we noticed that some of the automated tests were failing when we switched Java versions, we hunted down the source of the test failures (some Java list structures for which iteration order is not the same between Java versions) and fixed them.
So for us, the story stops there. But we understand that those of you who have been misguidedly running GATK on Java 8 need more information to decide what to do. I thought that perhaps sharing the relevant details from our migration test results might help, so I compiled a summary of per-tool tests that were affected, with some developer notes and values that were discussed in the issue ticket.
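For readers unfamiliar with this class of bug, here is a small analogy in Python. It is not the GATK code, and the function names are hypothetical. It shows how output that depends on a collection's iteration order can change silently between runtimes, and how imposing an explicit order removes the dependence. Python sets of strings stand in for the Java structures mentioned above, since their iteration order is likewise not guaranteed.

```python
# Analogy only: these helpers are hypothetical and are not taken from GATK.

def join_unstable(alleles):
    """Join unique alleles in whatever order the set happens to iterate."""
    # The order of a set of strings can differ between interpreter runs
    # (string hashing is randomized), so this output is not reproducible.
    return ",".join(set(alleles))


def join_stable(alleles):
    """Join unique alleles in sorted order, independent of hashing."""
    return ",".join(sorted(set(alleles)))


if __name__ == "__main__":
    observed = ["T", "A", "C", "A"]
    print("unstable:", join_unstable(observed))  # order may vary from run to run
    print("stable:  ", join_stable(observed))
```

Both functions return the same set of alleles. Only the stable version returns them in the same order every time, which is what a test comparing exact output needs.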
Since the GATK first started gaining traction in the research community ca. 2010, its development has sustained a fairly rapid pace, with a new major version (1, 2 and 3 so far) coming out about every 2 years. Each major version was a straight continuation of the same codebase, distinguished by significantly new tools and capabilities (e.g. HaplotypeCaller in 2.0, the GVCF workflow in 3.0).
This year, we're on track for a new major version, but we're breaking the mold of the classic GATK codebase. Over the past 18 months, in parallel to the ongoing 3.x development effort, we built a brand new GATK engine that is faster, more scalable and can support new types of analysis that weren't possible in the original GATK framework. Now we're hard at work porting the classic GATK tools over to the new framework, as well as developing some new ones (copy number!). The resulting toolkit will be formally released as GATK 4 later this year. If you're keen to try it out, it's already available as an alpha preview; I'll follow up on this with more details soon.
But that's not all. At the same time, we kept working on the GATK 3.x package in order to continue delivering improvements to the research community. Now we're just about ready to release version 3.6 -- nothing yuuugely different, but quite a few bug fixes and feature enhancements (especially in the GVCF workflow tools) that have been widely requested. Again, details to follow. Oh, and it supports Java 8! Which previous versions do not -- it may look like they do because they run on it without crashing, but there could be silent correctness errors. Which are the worst; I prefer a good honest in-your-face run-busting error any day.
So there you have it; version 3.6 coming out sometime next week (-ish), and GATK 4 coming out later this year, probably Fall timeframe. In between, we'll have one last 3.x release, either a patch release (3.6-x) or a proper minor release (3.7) depending on how substantial the changes are, to immortalize the last state of the classic GATK before it gets encased in amber.