Specifically, I tested the alpha release of the Google Genomics Pipelines API, which is driven from the command line. Down the road, we will post similar evaluations of the UI-driven systems FireCloud and Workbench. In this particular challenge, my aim is to first genotype a trio and then a cohort of 17 whole-genome BAMs that are available in the cloud. I need the resulting VCF callsets within a week.
GATK workshops bring you the latest in our methods development. The materials we prepare for workshops often serve as a base for our documentation on new or improved tools and workflows. So not only do GATK workshops cover our established Best Practices, they also give you a taste of what is to come. And let me just say a lot of changes are pouring out of the jar, especially with GATK4.
Let’s get into the logistics of workshops.
Please join the new gatk-workshop group at https://groups.google.com/a/broadinstitute.org/forum/?hl=en#!forum/gatk-workshop to receive emails about upcoming workshops. These announcements are separate from the group's regular email updates, so in your group membership settings, set the Email delivery preference to "Don't send email updates." You may also browse the posts in the GATK Blog for mention of our workshop schedule.
We post information and links for upcoming workshops on our forum; look for the announcement box at the top of the GATK Forum homepage. Depending on the hosting institution, a workshop may be open to non-affiliates, and may or may not charge a fee to offset hosting costs.
You may have noticed we’ve been talking about this new thing called WDL--the Workflow Description Language. We've published a tutorial using WDL to run some GATK tasks, as well as a pipeline implementation of the Best Practices for germline short variant discovery written in WDL. These fully-baked WDL scripts assume you already know what to do with them, but you may be wondering where to start. Whether you need a few pointers to get you started, or you’re completely new to this, we’ve got you covered. (And if you’re just looking for how to run pre-written WDLs, head on over to the executions section. You can still learn a lot from reading the rest of this article too though!)
WDL is designed to be easy to use--"human readable and writable" is our promise. You should think of building a pipeline with WDL like building with legos. The final product (like that full pipeline script I linked before) can look quite complex, but it is a simple matter of going step by step with your WDL building blocks.
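To make the building-block idea concrete, here is a minimal sketch of a WDL script (the task and workflow names are made up for illustration, not taken from our tutorials): a task wraps a command and declares its outputs, and a workflow calls tasks and wires them together.

```wdl
# A task is one building block: inputs, a command, and declared outputs.
task greet {
  String name
  command {
    echo "Hello, ${name}!"
  }
  output {
    String greeting = read_string(stdout())
  }
}

# The workflow snaps tasks together and feeds them their inputs.
workflow HelloWdl {
  call greet { input: name = "GATK" }
}
```

An execution engine such as Cromwell runs each `call`, capturing outputs so they can be passed as inputs to later tasks; the tutorials apply this same pattern to real GATK tools.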
I would recommend that you get started by reading our user guide. Read through, clicking to the next article at the bottom of each page, and the user guide will introduce you to all the pieces you can use in your lego-pipeline--from what pieces you'll need all the way through how to test and run your pipeline once you've finished it.
Once you've got a handle on what WDL can do, head over to the tutorials section. In these sequential tutorials, I walk you through how to use those building blocks to implement a small part of the GATK pipeline. Each tutorial builds on the previous one to help you learn to use WDL in new ways without repeating all of your earlier work.
You've read the user guide and you've run through the tutorials; you now have all you need to get started writing your very own WDLs. If you get stuck on something, you can always see how we do things in these real WDL scripts. If you have a more specific question, don't hesitate to post it on our WDL forum. Happy building!
Here's the scoop. We've been working with Intel engineers for some time now, and we've all been enjoying it so much, we decided to commit to the relationship big time.
As announced in this Broad press release, we are taking our collaboration with Intel to the next level. Specifically, we have joined forces to create the "Intel-Broad Center for Genomic Data Engineering", with an initial five-year mission to build out life sciences tools and infrastructure, and boldly grow the genomics community's ability to collaborate across diverse datasets and analysis platforms in ways that no one has done before.
Ahem. In practice this is going to enable us to bring you some key improvements on three fronts: hardware recommendations, genomics software tools, and cross-infrastructure collaboration.
These are the materials that were presented at the November 2015 GATK workshop at the Broad Institute in Cambridge, MA.
| Materials | Download location |
| --- | --- |
| Slide decks presented on Day 1 | Google Drive folder |
| Workshop handout document (agenda and resources) | PDF on Google Drive |
| Variant Discovery Tutorial (Day 2 AM) | PDF on Google Drive |
| Variant Filtering Tutorial (Day 2 PM) | PDF on Google Drive |
| Tutorial data bundle (Day 2 PM) | ZIP on Google Drive |
The weather in Vancouver is awful right now, and that's probably a good thing -- it should keep the outdoorsy types like myself from succumbing to the natural beauty of British Columbia and skipping out on any of the great science lined up for us this week. And rumor is the wifi is pretty decent!
I sure hope it is, because this afternoon in the GATK workshop we're going to be running some live demos of how to run GATK analyses on the Cloud. We have screencap videos as backup in case technology abandons us, but it's just not the same to play a recording... (for one thing, the recording is probably more reliable than my brain, but shush).
We'll also have a hands-on tutorial on somatic exome CNV analysis with GATK4, and the overall workshop will be peppered with live polls, in an effort to make the experience as interactive and engaging as possible. This is something the ASHG workshop organizers have been pushing for over the past few meetings, and rightly so.
It's a tall order with a crowd of 225 registered users (we get a ballroom!) but we've got a solid 90 minutes lined up to talk about all brand new GATK content. This is going to be fun!
Tomorrow, a bunch of us are packing our bags and heading to Vancouver for the American Society of Human Genetics' Annual Meeting.
We have a busy week ahead of us, between the GA4GH Plenary Meeting, the various workshops that are organized around the ASHG meeting, and the meeting itself, which draws thousands of researchers from across the globe. Our Broad Genomics team this year is going to be pretty active in a variety of events, which you can find detailed here on the website of the Broad Genomics Services.
Soo Hee and I from our little support team will be rather busy as well. We're finalizing preparations for the workshop we're teaching on Tuesday, which will focus on what's hot in GATK4. As a reminder, GATK4 is currently still in "alpha preview" phase, but we expect it to move to beta status over the course of the next quarter, and I personally hold high hopes of releasing it as the officially supported version in early 2017!
In any case, we have some cool live demos and a full CNV pipeline hands-on tutorial to show off at the workshop to a maxed-out audience of 225 people (no pressure...). Speaking of which, the materials for the workshop are now available for download over here. The bundle file contains both a special GATK4 jar and a test dataset. If you'll be joining us in the workshop, please make sure you have downloaded the bundle BEFORE the workshop; it's a large file (~400 MB) and you can't count on the conference center wifi being good enough to download it onsite.
If you're coming to ASHG but are not coming to the workshop (did you wait too long to register? ;) ), you can still come chat with us at the Broad Genomics booth in the exhibition hall. I'll post a detailed schedule of when we'll be hanging out there -- there are some sessions I don't want to miss, but I have yet to compile the final list -- and you can for sure find me at the Meet the Expert event that will take place at the booth. I'll be the so-called expert in the Thursday, October 20th 10:00am - 11:00am slot. You can also follow @gatk_dev on Twitter for the latest schedule developments and/or social event opportunities.
And if you're not coming to Vancouver, either because you blame Canada or you study a different organism and you don't see what all the fuss is about these humans we keep going on about -- well, we'll still see you on the forum, and you can always invite us to teach a workshop at your local institution. We've had a really fantastic series this year and are now taking invitations for 2017. More on that later!
The presentation slide decks and hands-on tutorial materials can be downloaded at this Google Drive link.
Cross-posted from https://github.com/broadinstitute/picard/issues/647
For many years now we’ve been hearing from users of both GATK and Picard about how they’d love to see the two projects unite into a single "toolkit-to-rule-them-all", for the sake of user convenience, to promote consistency across tools, and to minimize duplication of effort.
With the advent of GATK 4 this suddenly became a real possibility, as the decision was made to start the new GATK codebase from the Picard base classes rather than the old GATK 3.x base classes. This allows for free-form Picard-style tools and GATK “walkers” built upon an engine traversal to peacefully co-exist within the same framework. Last year, a Picard engineer successfully ported all Picard tools to the GATK 4 codebase with only minor changes to the tools themselves. More recently, efforts have been made to harmonize the build systems of the two projects, resulting in Picard’s recent move to gradle.
Importantly, the core GATK 4 codebase at https://github.com/broadinstitute/gatk is released entirely under the BSD 3-clause license, a big improvement over the confusing licensing situation in GATK 3.x, with its mix of open-source and proprietary licenses within the same repository. That is where any Picard tools moved to the GATK 4 codebase would live, remaining fully open source and free for all.
As all of the technical pieces are now in place to allow for a merger of the two projects (with the guarantee that the open-source nature of Picard code will be preserved) we are soliciting feedback from the Picard developer community about the prospect of a union with GATK. Would people here be generally in favor of such a move? Are there any strong objections to this idea? Any concerns that should be addressed before we head any further down this path?
Going forward, all of our new projects will use GRCh38. During the transition, over the coming year we will keep supporting GRCh37/hg19. Here are nine takeaways to help you get started with the latest reference.