Better late than never: here are the highlights of the most recent version release, GATK 2.8. This should be short and sweet because, as releases go, 2.8 is light on new features and is best described as a collection of bug fixes, all* dutifully listed in the corresponding release notes document. That said, two of the changes we've made deserve some additional explanation.

* Up to now (this release included) we have not listed updates/patches to Queue in the release notes, but will start doing so from the next version onward.


VQSR & bad variants: no more guessing games

In the last release (2.7, for those of you keeping score at home) we trumpeted that the old -percentBad argument of VariantRecalibrator had been replaced by the shiny new -numBad argument, and that this was going to be awesome for all sorts of good reasons, improved stability among them. Weeeeeeell, it turned out that wasn't quite the case. It worked really well on the subset of analyses that we tested it on initially, but once we expanded to different datasets (and the complaints started rolling in on the forum) we realized that it actually made things worse in some cases, because the default value of -numBad was less appropriate than what -percentBad would have produced. This left people guessing as to what value would work for their particular dataset, with a great big range to choose from and very little useful information to assist in the choice.

So, long story short, we (and by "we" I mean Ryan) built in a new function that allows the VariantRecalibrator to determine for itself, based on the data, how many variants are appropriate to use for the "bad" model. So the short-lived -numBad argument is gone too, replaced by... nothing. No new argument to specify; just let the VariantRecalibrator do its thing.
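If you build your VariantRecalibrator commands from a script, the practical upshot is that any leftover -percentBad or -numBad needs to come out. Here is a minimal Python sketch of that cleanup. The helper name `strip_removed_args` and the example values are made up for illustration, and the code assumes each of the two removed arguments is followed by exactly one value.

```python
# Arguments the text above describes as removed from VariantRecalibrator.
REMOVED_ARGS = {"-percentBad", "-numBad"}


def strip_removed_args(args):
    """Return a copy of an argument list without the removed flags.

    Hypothetical helper, not part of GATK. Assumes each removed flag
    is followed by exactly one value, which is dropped along with it.
    """
    cleaned = []
    skip_value = False
    for token in args:
        if skip_value:
            # This token is the value of a removed flag; drop it too.
            skip_value = False
            continue
        if token in REMOVED_ARGS:
            skip_value = True
            continue
        cleaned.append(token)
    return cleaned


# Illustrative 2.7-style argument list (values are placeholders).
old_cmd = ["-T", "VariantRecalibrator", "-numBad", "1000", "-mode", "SNP"]
new_cmd = strip_removed_args(old_cmd)
print(" ".join(new_cmd))
```

Everything else in the command is left untouched, so the tool falls back on working out the size of the "bad" set by itself.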

Of course if you really want to, you can override the default behavior and tweak the internal thresholds. See the tool doc here; and remember that a good rule of thumb is that if you can't figure out which arguments are involved based on that doc, you probably shouldn't be messing with this advanced functionality.


Reference calculation model

This is still a rather experimental feature, so we're making changes as we go. The two big changes worth mentioning here are that you can now run this on reduced reads, and that we've changed the indexing routine to optimize the compression level. The latter shouldn't have any immediate impact on normal users, but it was necessary for a new feature project we've been working on behind the scenes (the single-sample-to-joint-discovery pipeline we have been alluding to in recent forum discussions). The reason we're mentioning it now is that if you use -ERC GVCF output, you'll also need to specify two new arguments, with these exact values: -variant_index_type LINEAR and -variant_index_parameter 128000. This useful little fact didn't quite make it into the documentation before we released, and leaving the arguments out leads to an error message, so... there you go. No error message for you!
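To make that concrete, here is a small Python sketch that assembles the arguments for a GVCF run with both index arguments in place. The helper name, the file names and the jar name are placeholders, and the sketch assumes the tool being run is HaplotypeCaller with the usual -T, -R, -I and -o arguments; only -ERC GVCF and the two index arguments come from the text above.

```python
def haplotypecaller_gvcf_args(reference, bam, output):
    """Build an argument list for a GVCF-mode run.

    Hypothetical helper for illustration only. The two index
    arguments carry the exact values quoted in the text.
    """
    return [
        "-T", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "-ERC", "GVCF",
        "-variant_index_type", "LINEAR",
        "-variant_index_parameter", "128000",
        "-o", output,
    ]


# Placeholder file names; substitute your own.
gvcf_args = haplotypecaller_gvcf_args("reference.fasta", "sample.bam", "sample.g.vcf")
print(" ".join(["java", "-jar", "GenomeAnalysisTK.jar"] + gvcf_args))
```

Keeping the two index arguments inside the helper means they can't be forgotten when the command is reused for other samples.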


What's up, doc?

That's all for tool changes. In addition to those, we have made a number of corrections in the tool documentation pages, updated the Best Practices (mostly layout, tiny bit of content update related to the VQSR -numBad deprecation) and made some minor changes to the website, e.g. updated the list of publications that cite the GATK and improved the Guide index somewhat (but that's still a work in progress).

