This is an important heads-up regarding the GATK 3.0 release.

The purpose of the ReduceReads algorithm was to enable joint analysis of large cohorts by the UnifiedGenotyper. The new workflow for joint discovery renders the compression step obsolete. In that workflow, each sample first gets a single-sample pass with the HaplotypeCaller in gVCF mode, and then a joint analysis is run on the gVCFs from multiple samples.

In addition, based on our most recent analyses, we have come to the conclusion that the quality of variant calls made on BAMs compressed with ReduceReads is inferior to the standard targeted by GATK tools. In comparison, the results obtained with the new workflow are far superior.

For these reasons, we have made the difficult decision to remove the ReduceReads tool from version 3.0 of the toolkit. To be clear, reduced BAMs will NOT be supported in GATK 3.0.

We realize that this may cause some disruption to your existing workflows, and for that we apologize. Please understand that we are driven to provide tools that produce the best possible results. Now that all the data is in, we have found that the best results cannot be achieved with reduced BAMs, so we feel that the best thing to do is to remove this inferior tool from the toolkit, and promote the new tools.

As always, we welcome your comments, and we look forward to showing you how the new calling workflow yields superior results.


jpitt


Hi @Geraldine_VdAuwera, thanks for the heads-up on this! Since I saw this post, I've looked around without much success. Are HaplotypeCaller's gVCF mode and the subsequent multi-sample gVCF calling implemented in the current 2.8.1 build of GATK, or do we need to wait until 3.0 is released? Also, you say that for multi-sample calling, ReducedReads + UnifiedGenotyper performs suboptimally compared to HaplotypeCaller + gVCF. Does this mean that the calls obtained from ReducedReads BAMs are inherently poor? More specifically, are ReducedReads + UnifiedGenotyper calls of much lower quality than regular bams + UnifiedGenotyper calls? Assuming that HaplotypeCaller gVCF calling is coming in version 3.0, would your team advise against using ReducedReads BAMs in the meantime? Thanks!

Tue 4 Mar 2014

Geraldine_VdAuwera


Hi @jpitt,

The gVCF mode (or reference model) is already implemented in 2.8, though it is further improved in the 3.0 code. However, the tools for running joint analyses on the gVCFs of multiple samples (which is what replaces RR+UG) are not yet available. They will be in the new release, which, if all goes well, will come out tomorrow! So I would advise waiting a day or two for the new version to be available.

> are ReducedReads + UnifiedGenotyper calls of much lower quality than regular bams + UnifiedGenotyper calls?

It's all relative, but let's say that they are inferior enough (particularly compared to the results we get with the new method) that we absolutely encourage you to switch over to the new method.

Tue 4 Mar 2014

BretH


How will you be filtering out low-information reads (like PCR duplicates)? Or are these included in the joint variant analysis step?

Tue 4 Mar 2014

Geraldine_VdAuwera


Hi @BretH, We mark duplicates using Picard tools in pre-processing, same as previously. This causes GATK tools to ignore dupes in subsequent analyses.

Tue 4 Mar 2014

BretH


Thank you for your answer.

Tue 4 Mar 2014
