In fact, going forward all of our new projects will use GRCh38. During this transition over the coming year, we will keep supporting GRCh37/hg19. Here are nine takeaways to help you get started using the latest reference.
Believe it or not we've done seven workshops so far this year, spread across five countries, spanning three continents -- the furthest ones in Australia (Sydney and Melbourne) and the most recent one in Helsinki, Finland. That's a lot of flying but on the bright side, now I have Gold status on American Airlines (hello fast track lane).
So after a restful summer hiatus we're gearing up to revisit continental Europe -- specifically, we're heading to Basel, Switzerland, at the invitation of the Swiss Institute of Bioinformatics.
We'll be following our standard formula of one day of lectures focused on the Best Practices for variant discovery, and one day of optional hands-on practical sessions demonstrating key steps of analysis and interpretation. The registration page is now live at this link: http://www.sib.swiss/training/upcoming-training-events/training/gatk-workshop-lecture
One important note: we'll be offering the day of hands-on practicals twice (in order to serve more people), which is why the workshop dates span three days -- but to be clear each person will only attend two days out of the three (the lectures are on Day 1 for everyone). The practical sessions have limited space, and tend to fill up fast, so don't wait too long to register -- especially if you have a strong preference about which of the two optional days would work better for you.
If you can't make it to Basel, the next workshops will be at Broad in Boston/Cambridge (USA) on November 7-8, then at VIB in Leuven (Belgium), dates TBD (probably February). Details will follow in due time.
We look forward to seeing many of you in Basel!
Folks, it really makes my day when I get to announce some good news that has been cooking for a long time. So this is going to be a very happy Humpday indeed.
The good news (which I may have hinted at previously) is that we are making our production pipeline scripts public, starting with the one that implements our Best Practices for data pre-processing and initial variant calling (aka GVCF generation) in whole genomes. Not only that, all GRCh38/hg38 resource files needed to run it, plus test data, are in a Google Cloud bucket. In time, the bucket will replace our not-so-reliable FTP server as our bundle-sharing mechanism.
Details below the fold, in FAQ format (sort of).
TL;DR: Take this script and run it, for it is our WGS processing production workflow (uBAMs -> GVCF per-sample).
This morning, we unveiled an interactive Google Map, based on anonymized IP addresses collected from the forum database, that shows how the GATK user community is distributed across the globe. Check out Boston/Cambridge!
For the record, this was originally inspired by the World Map of High-throughput Sequencers by James Hadfield (Cancer Research UK, Cambridge) and Nick Loman (University of Birmingham).
As several people have already expressed interest in how this map was put together, I thought I'd give a brief overview of the technical side below the fold. I'm happy to provide more details and/or code if anyone wants to do something similar.
For largely practical reasons, the GATK website home URL has become http://software.broadinstitute.org/gatk. Don't worry, your bookmarked www links will still work foreveeeer -- at least that's what I'm told by our valiant IT folks. As always, let us know if you run into any trouble, not that we're expecting any.
First, I hope those of you in the USA had a relaxing and/or exciting holiday weekend (happy birthday, 'Murica!). For the rest, we thank you for your patience as we recover from the festivities and work our way through the backlog of forum questions.
Now, I wanted to let you know that over the next few weeks, we're going to push out a variety of improvements to the GATK website and documentation contents. We start today with a main push that involves some structural changes that we think will improve the user experience overall and make it easier for new users in particular. Much of this is based on feedback we've received over the years, so hopefully we're following the will of the people!
We've done our best to avoid causing any disruptions for those of you who have been using our website for a long time, but we did have to move a few things around. Here are the highlights; if you have strong feelings about any of this (good or bad) let us know in the comments. Also let us know if you stumble across anything that looks broken and we'll fix it double quick.
As announced in the GATK v3.6 highlights, variant calling workflows that use HaplotypeCaller or MuTect2 now omit indel realignment. This change does not apply to workflows that call variants with UnifiedGenotyper or the original MuTect. We still recommend indel realignment for these legacy workflows. Recall that indel realignment uses RealignerTargetCreator and IndelRealigner and comes after duplicate marking and before base quality score recalibration (BQSR).
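To make the ordering concrete, here is a minimal Python sketch of the rule described above. The function name `preprocessing_steps` and the step labels are hypothetical, made up for illustration; this is not GATK code and it only encodes what this post states, nothing more.

```python
from typing import List

# Callers whose workflows omit indel realignment as of GATK v3.6 (per this post).
NO_REALIGNMENT_CALLERS = {"HaplotypeCaller", "MuTect2"}
# Legacy callers for which indel realignment is still recommended.
LEGACY_CALLERS = {"UnifiedGenotyper", "MuTect"}


def preprocessing_steps(caller: str) -> List[str]:
    """Return the ordered steps between alignment and variant calling.

    Hypothetical helper: the step labels are plain descriptions,
    not command lines.
    """
    if caller not in NO_REALIGNMENT_CALLERS | LEGACY_CALLERS:
        raise ValueError(f"Unknown caller: {caller}")

    steps = ["duplicate marking"]
    if caller in LEGACY_CALLERS:
        # Indel realignment comes after duplicate marking and before BQSR.
        steps += ["RealignerTargetCreator", "IndelRealigner"]
    steps.append("BQSR")
    return steps


if __name__ == "__main__":
    for name in ("HaplotypeCaller", "UnifiedGenotyper"):
        print(name, "->", ", ".join(preprocessing_steps(name)))
```

Keeping the two caller sets explicit means an unrecognized caller raises an error instead of silently skipping (or adding) realignment.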
In light of these changes, let’s take a brisk stroll through the implications for variant detection. In particular, let’s focus on insertion and deletion events (indels).
The presentation slide decks and hands-on tutorial materials can be downloaded at this Google Drive link.
For my 10,000th posting on this forum, I thought I'd pull together a few numbers.
First, yes I did just say this is my 10,000th post. That breaks down to 464 new discussions (documentation articles and blog posts, including this one) and 9,536 comments in various threads. I've also posted over 1,000 tweets as @gatk_dev on behalf of the development team. When people ask what I do I can say with a straight face that I'm a scientist and I tweet for a living, it's awesome.
But hey, here are some more important numbers.
What better way to start the summer than with a new GATK release?
Umm no don't answer that, there's loads of good options. You could have a barbecue, eat some ice cream, go on a hike if that's the sort of thing that floats your kayak... Or you might live somewhere where winter is just starting and everything I just said there was terribly insensitive. Sorry.
Ahem. As I mentioned in my recent sneak preview blog post, the bulk of our development effort (speed! copy number! unicorns!) is now going into the GATK4 project. Accordingly, development in the GATK3 framework is winding down, so this release consists mainly of bug fixes, added convenience functions, and relatively minor behavior tweaks.
That being said we do have a few new experimental features in the VQSR tools (which haven't yet been fully ported to GATK4, hence the ongoing development in GATK3) that are pushing the envelope of allele-specific filtering. So that's interesting, if not yet fully documented (someone should really get on that). And you'll probably care about some of those tweaks I casually mentioned above -- in fact I guarantee that at least one of these things will matter to you in some way. If you read through the whole thing and don't find anything relevant to you, tell me in the comments that I was wrong. That's what the internet is for.
As usual, here I go over the changes that matter the most / to the most; consider going through the release notes as well for a full list of changes.