Latest posts
 


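Hello to everyone in the research community, here we come! Together with major companies and institutions in China, we are bringing you tips for using GATK efficiently and at scale!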

Today we are reaching out to the Chinese research community with great news: we are partnering with key companies and institutions in China to empower Chinese researchers to use GATK effectively and at scale.





We've been getting so caught up in the excitement of the imminent beta release of GATK4 (possibly later this week!), we forgot to announce upcoming workshops! And two of them are coming up fast, in just a month from now. Specifically, we'll be in Cambridge, UK, July 12-14 and then in Edinburgh, UK, July 17-19. There's still time to register for both but the hands-on sessions have limited space, so don't wait around!

GATK is also going to be making a brief appearance at BOSC '17 in Prague, CZ, July 21-22. Our team member Kate Voss will give a lightning talk and present a poster about our genomics pipelining stack that is composed of GATK4+WDL+Cromwell. I'm frankly delighted that our abstract was accepted as a late-breaking submission; it's a pleasure to kick off the new open-source era of GATK at the most open-sourcey meeting of the year!

Going forward, be sure to check out the new Events calendar feature we just now added to the website to help you keep track of events more systematically.




This is one of two posts announcing the imminent beta release of GATK4; for a technical description of features, see this other post.

"Wait, what?" Yes, you read that right, we're moving GATK4 to a fully open source license -- specifically, BSD 3-clause. And to be clear, this applies to all of GATK4. Not just the core framework (which, little known fact, has always been open source), but all the tools that were previously "protected", including HaplotypeCaller, the new CNV discovery tools, everything. The whole enchilada.





Unboxing GATK4

Posted by Geraldine_VdAuwera on 24 May 2017


This is one of two posts announcing the imminent beta release of GATK4; for details about the open-source licensing, see this other post.

You've probably heard it by now: we are on the cusp of releasing GATK4 into beta status (targeting mid-June), and we plan to push out a general release shortly thereafter (targeting midsummer). That's great. So what's in the box?

Over two years of active development have gone into producing GATK4, and I'm happy to say we have plenty to show for it. Specifically, we've pushed the evolution of GATK on three fronts: (1) technical performance, i.e. speed and scalability; (2) new functionality and expanded scope of analysis, e.g. we can do CNVs now; and (3) openness to collaboration, through open-sourcing as well as general developer-friendliness (documented code! consistent APIs! clear contribution guidelines!).

Want more detail? Let me give you a tour of the highlights, using slides from the presentation I gave at Bio-IT earlier today (code reuse: it's not just for code anymore).





This is becoming a bit of a yearly tradition; next week we're heading over to Bio-IT World Expo in Boston (so a short hop across the Charles River) to announce the majorly rebooted version of GATK which we've affectionately dubbed GATK4. Because it will be version 4.

Look, if you've ever seen the names we give our tools, you know that naming things isn't exactly where we put our creativity to work. It's a precious resource, and anyway we rather like things to be self-explanatory.

Yes, technically we already announced GATK4 at Bio-IT last year, but no, this is not a re-run. Last year was a heads-up that we were working on this significant new reimplementation of the toolkit. We were mostly there to talk about the core features of the new framework, which famously excited the Spark-savvy in the crowd (because it supports Apache Spark). But it was definitely still under heavy development; while we had the CNV tools just about ready for testing, as I recall there wasn't even a glimmer of the HaplotypeCaller in there yet.

This year is very different. We have a toolkit that is in the final stages of polishing up for public consumption. We have multiple Best Practices workflows, because we're not just about the germline SNPs and indels anymore. And we also have numbers. Dates for the beta and full releases, performance estimates...

All of which we'll present during a luncheon event we're holding with our wonderful partners at Intel Life Sciences, who have contributed some of GATK4's key new features. The luncheon will take place Wednesday the 24th at 12:40 PM, at a location TBD (because I can't figure it out from the Bio-IT program, which is not self-explanatory). We'll be in Track 1: Data and Storage Management, which may sound super boring (no offense to other speakers in this track) but come on and join us if you can; I predict you'll be pleasantly surprised.

As a coda, we'll be holding Q&A sessions in the Intel Hospitality Suite, aka Dartmouth room in the WTC, at the following times: Wednesday the 24th from 1:30 PM to 3:15 PM, and Thursday the 25th from 10:30 to 11:30 AM. Swing on by if you have any burning questions about GATK4.

We look forward to seeing you there! And if you can't make it because of trivial considerations like geographical incompatibility (oceans, shmoceans), check out this blog or follow @gatk_dev on Twitter. We'll post a summary of the announcements shortly after the luncheon presentation.





As part of our job providing support to the GATK user community, our team takes turns traveling to conferences, both to learn what's going on in the field at large and to advertise the latest features of the GATK. I recently attended the Advances in Genome Biology and Technology (AGBT) general meeting in Hollywood, Florida in February. Nice time of year to go there!

When we go to conferences we often do workshops or present posters, but this time was a first: I was there to do a software demo. Well, in fact I had two demos prepared: one about using GATK4 to run commands directly on a Spark cluster, and the other about running GATK workflows on the Cloud using Google's Pipelines API.





What is Beagle?

Beagle is a type of dog known for its even temper and intelligence. It is also the name given to the ship Darwin sailed to the Galapagos (the H.M.S. Beagle), where he developed his theory of natural selection from observing finches. It is also the name of a genomics software package known for phasing and imputing genotypes. Beagle also calls genotypes and detects identity-by-descent (IBD), i.e. it can find segments of identical DNA that indicate two individuals are related.

I will be writing a series of posts where I share with you how I take 23andMe raw data to locate IBD segments using Beagle v4.1 (website; doi:10.1534/genetics.113.150029). For a review of the statistical methods and other theory underlying IBD, see doi:10.1534/genetics.112.148825. To see a skipper’s dog on a ship at sea, watch Irving Johnson’s footage of the Peking barque.





A few of us GATKers (among a flood of other Broadies) traveled to Washington, DC this week for the General Meeting of the American Association for Cancer Research (AACR). Here are PDF copies of the posters we presented on Tuesday morning.

Abbreviated title                                   Presenter                   Link
Somatic mutation discovery with GATK4               Geraldine Van der Auwera    PDF
Allelic Copy Number Variation Discovery             Aaron Chevalier             PDF
Copy Number Variation Discovery in WGS and Exomes   Mehrtash Babadi             PDF

Incidentally, it's the end of the conference so now 10,000 people are trying to get home, and apparently half of them are going to Boston. I was hoping to catch an earlier flight on standby; the gate attendant laughed so hard. Most of the flights are overbooked to start with. So I have some time to kill until 9 PM. Well, I guess there's plenty of documentation in need of writing!




You may have heard that we've been working on a major new release of GATK that we call GATK4. As we are getting closer to the scheduled transition of GATK4 into beta status (from its current lowly alpha state), we are putting a lot of work into fine-tuning the user-facing aspects of the program. We realize that many of our users struggle to make sense of the variety of tools and their numerous options and parameters, and that when something goes wrong, the error messages can seem cryptic and/or overwhelming.

So one of the things we're experimenting with is an interactive support feature that you can invoke directly from the command line, and that should help you figure out solutions to most problems that you might encounter while using GATK. It's not quite fully-featured yet but we'd like to get some feedback to evaluate whether it is helpful to real users, and determine how we can further improve it.

You can download a precompiled jar (fully open source under a BSD license) where this feature is enabled by default, from this page: https://software.broadinstitute.org/gatk/download/gatk4_1. The command syntax is essentially the same as for the current version of GATK, except you no longer provide -T to specify the tool, and -o is all grown up and is now -O. You can get usage information for any tool by running e.g. java -jar GenomeAnalysisTk-4_1.jar PrintReads -h, the same way as you would with the current GATK.
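To make the two syntax changes mentioned above concrete, here is a minimal sketch in Python that rewrites an old-style argument list into the new style. The function name and the file names are hypothetical, made up for illustration. It only handles the two differences named in this post (dropping -T and turning -o into -O); every other argument is passed through unchanged, which may not match how the real tools behave.

```python
def to_gatk4_args(gatk3_args):
    """Rewrite a GATK3-style argument list using only the two changes
    described in the post: the tool name is no longer introduced by -T
    (it moves to the front), and -o becomes -O.

    Hypothetical helper for illustration; not part of GATK.
    """
    tool = None
    rest = []
    i = 0
    while i < len(gatk3_args):
        arg = gatk3_args[i]
        if arg == "-T" and i + 1 < len(gatk3_args):
            # Remember the tool name and skip both "-T" and its value.
            tool = gatk3_args[i + 1]
            i += 2
            continue
        # Rename the output flag; leave everything else untouched.
        rest.append("-O" if arg == "-o" else arg)
        i += 1
    if tool is None:
        raise ValueError("no -T <tool> found in the argument list")
    return [tool] + rest


# Example with made-up file names:
old = ["-T", "PrintReads", "-R", "ref.fasta", "-I", "in.bam", "-o", "out.bam"]
print(" ".join(to_gatk4_args(old)))
```

The result is the part of the command that would follow java -jar and the jar name. Treat it as a way to read the syntax change, not as a migration tool.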

Please try it out and let us know what you think!




Here are some rules of thumb for posting questions:

  1. Post a new question instead of continuing an ongoing discussion thread. The exception is when your question relates directly to the discussion thread, i.e. it comments on the original post or answers a question asked in the thread. To refer to a particular thread, you can include its URL.

  2. Post the question once. This is the case even if you post to the wrong subforum. We can easily move your post to the appropriate one.

  3. Questions should relate to running a GATK tool, Picard tool, GATK Best Practice Workflow, WDL script, Cromwell or FireCloud. All other questions, e.g. those about non-GATK tools, should go to the Biostars or SeqAnswers forums.

Next, I point out specific guidelines for GATK questions, give a formatting tip and explain the motivation behind this note using pie.







