GATK 2.4 was released on February 26, 2013. Highlights are listed below. Read the detailed version history overview here: http://www.broadinstitute.org/gatk/guide/version-history

Important note 1 for this release: with this release comes an updated licensing structure for the GATK. Different files in our public repository are protected with different licenses, so please see the text at the top of any given file for details as to its particular license.

Important note 2 for this release: the GATK team spent a tremendous amount of time and engineering effort adding extensive tests for many of our core tools (a process that will continue into future releases). Unsurprisingly, this testing uncovered many small (and some not so small) bugs, which we subsequently fixed. We usually try to enumerate in our release notes all of the bugs fixed in a given release, but doing so for release 2.4 would be a Herculean effort, so please be aware that many smaller fixes may be omitted from these notes.

Base Quality Score Recalibration

  • The underlying calculation of the recalibration has been improved and generalized so that the empirical quality is now calculated through a Bayesian estimate. This radically improves the accuracy in particular for bins with small numbers of observations.
  • Added many run time improvements so that this tool now runs much faster.
  • Print Reads writes a header when used with the -BQSR argument.
  • Added a check to make sure that BQSR is not being run on a reduced bam (which would be bad).
  • The --maximum_cycle_value argument can now be specified during the Print Reads step to prevent problems when running on bams with extremely long reads.
  • Fixed bug where reads with an existing BQ tag and soft-clipped bases could cause the tool to error out.
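
To illustrate why a Bayesian estimate helps for bins with few observations, here is a minimal sketch. It is not the GATK implementation: the function names, the Gaussian-shaped prior centered on the reported quality, the `prior_sd` value, and the quality range are all assumptions made for illustration. The idea is that the reported quality acts as a prior, and the observed error count only pulls the estimate away from it as evidence accumulates.

```python
import math


def phred_to_error(q):
    """Convert a Phred-scaled quality to an error probability."""
    return 10.0 ** (-q / 10.0)


def log_binomial_likelihood(n_obs, n_err, p_err):
    """Log-likelihood of n_err errors in n_obs observations.

    The binomial coefficient is dropped because it is constant across
    the candidate qualities being compared.
    """
    return n_err * math.log(p_err) + (n_obs - n_err) * math.log1p(-p_err)


def map_empirical_quality(n_obs, n_err, reported_q, prior_sd=5.0, max_q=60):
    """Return the integer quality with the highest posterior score.

    Assumption (illustrative only): the prior is Gaussian-shaped around
    the reported quality with standard deviation prior_sd.
    """
    best_q, best_score = None, -math.inf
    # Start at Q1: Q0 means an error probability of 1, whose log(1 - p) is undefined.
    for q in range(1, max_q + 1):
        log_prior = -0.5 * ((q - reported_q) / prior_sd) ** 2
        score = log_prior + log_binomial_likelihood(n_obs, n_err, phred_to_error(q))
        if score > best_score:
            best_q, best_score = q, score
    return best_q


# With no observations the estimate stays at the reported quality.
no_data = map_empirical_quality(0, 0, reported_q=30)
# With 1 error in 10 observations the raw rate would suggest Q10,
# but the prior keeps the estimate well above that.
small_bin = map_empirical_quality(10, 1, reported_q=30)
# With a million observations the data dominate the prior.
large_bin = map_empirical_quality(1_000_000, 10_000, reported_q=30)
```

In this sketch a bin with a single error in ten observations is not dragged all the way down to the raw empirical rate, which is the kind of behavior the note above describes for sparsely populated bins.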

Unified Genotyper

  • Fixed the QUAL calculation for monomorphic (homozygous reference) sites (the math for previous versions was not correct).
  • Biased downsampling (i.e. contamination removal) values can now be specified as per-sample fractions.
  • Fixed bug where biased downsampling (i.e. contamination removal) was not being performed correctly in the presence of reduced reads.
  • Fixed several bugs in the indel likelihoods calculation that appeared in certain situations (e.g. sometimes the log likelihoods were positive!).
  • Small run time improvements were added.

Haplotype Caller

  • Extensive performance improvements were added to the Haplotype Caller. These include run time enhancements (it is now much faster than previous versions) plus improvements in accuracy for both SNPs and indels. Internal assessment now shows the Haplotype Caller calling variants more accurately than the Unified Genotyper. The changes for this tool are so extensive that they cannot easily be enumerated in these notes.

Variant Annotator

  • The QD annotation is now divided by the average length of the alternate allele (weighted by the allele count); this does not affect SNPs but makes the calculation for indels much more accurate.
  • Fixed Fisher Strand annotation where p-values sometimes summed to slightly greater than 1.0.
  • Fixed Fisher Strand annotation for indels where reduced reads were not being handled correctly.
  • The Haplotype Score annotation no longer applies to indels.
  • Added the Variant Type annotation (not enabled by default) to annotate the VCF record with the variant type.
  • The DepthOfCoverage annotation has been renamed to Coverage.
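
The QD change above can be sketched as follows. This is an illustration rather than the GATK code: the function names are hypothetical, QD is taken as QUAL divided by depth, and the definition of allele "length" (1 for same-length substitutions, otherwise the size difference from the reference allele) is an assumption chosen so that SNPs are unaffected, as the note states.

```python
def mean_alt_allele_length(ref, alt_alleles, allele_counts):
    """Average alternate allele length, weighted by allele count.

    Assumption (illustrative only): a same-length substitution counts as
    length 1, and an indel counts as its size difference from the reference.
    """
    total_count = sum(allele_counts)
    if total_count == 0:
        return 1.0  # no called alternate alleles, so apply no normalization
    weighted = 0
    for alt, count in zip(alt_alleles, allele_counts):
        length = 1 if len(alt) == len(ref) else abs(len(alt) - len(ref))
        weighted += length * count
    return weighted / total_count


def normalized_qd(qual, depth, ref, alt_alleles, allele_counts):
    """QUAL / depth, further divided by the weighted mean alt allele length."""
    return qual / depth / mean_alt_allele_length(ref, alt_alleles, allele_counts)


# A SNP is unchanged: the mean allele length is 1.
snp_qd = normalized_qd(300.0, 30, "A", ["G"], [2])
# A 3-base insertion has its QD divided by 3.
indel_qd = normalized_qd(300.0, 30, "A", ["ATTT"], [2])
```

The sketch does not guard against a depth of zero; a real annotation would need to handle that case.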

Reduce Reads

  • Several small run time improvements were added to make this tool slightly faster.
  • By default this tool now uses a downsampling value of 40x per start position.

Indel Realigner

  • Fixed bug where some reads with soft-clipped bases were not being realigned.

Combine Variants

  • Added run time performance improvements for cases where the PRIORITIZE or REQUIRE_UNIQUE options are used.

Select Variants

  • The --regenotype functionality has been removed from SelectVariants and transferred into its own tool: RegenotypeVariants.

Variant Eval

  • Removed the GenotypeConcordance evaluation module (which had many bugs) and converted it into its own tested, standalone tool (called GenotypeConcordance).

Miscellaneous

  • The VariantContext and related classes have been moved out of the GATK codebase and into Picard's public repository. The GATK now uses the variant.jar as an external library.
  • Added a new Read Filter to reassign just a particular mapping quality to another one (see the ReassignOneMappingQualityFilter).
  • Added the Regenotype Variants tool that allows one to regenotype a VCF file (which must contain likelihoods in the PL field) after samples have been added/removed.
  • Added the Genotype Concordance tool that calculates the concordance of one VCF file against another.
  • Bug fix for VariantsToVCF for records where old dbSNP files had '-' as the reference base.
  • The GATK now automatically converts IUPAC bases in the reference to Ns and errors out on other non-standard characters.
  • Fixed bug for the DepthOfCoverage tool which was not counting deletions correctly.
  • Added Cat Variants, a standalone tool to quickly combine multiple VCF files whose records are non-overlapping (e.g. as produced during scatter-gather).
  • The Somatic Indel Detector has been removed from our codebase and moved to the Broad Cancer group's private repository.
  • Fixed Validate Variants rsID checking which wasn't working if there were multiple IDs.
  • Picard jar updated to version 1.84.1337.
  • Tribble jar updated to version 1.84.1337.
  • Variant jar updated to version 1.85.1357.
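
The reference handling described above (IUPAC bases converted to Ns, other non-standard characters rejected) can be sketched as below. This is only an illustration of the stated behavior, not the GATK code: the function name is hypothetical, and the choices to accept lowercase bases and to raise `ValueError` are assumptions.

```python
STANDARD_BASES = set("ACGTN")
# IUPAC ambiguity codes other than N.
IUPAC_AMBIGUITY_CODES = set("RYSWKMBDHV")


def normalize_reference_bases(sequence):
    """Replace IUPAC ambiguity codes with N and reject anything else non-standard.

    Assumption (illustrative only): case is ignored when classifying a base,
    and standard bases are passed through unchanged.
    """
    normalized = []
    for position, base in enumerate(sequence):
        upper = base.upper()
        if upper in STANDARD_BASES:
            normalized.append(base)
        elif upper in IUPAC_AMBIGUITY_CODES:
            normalized.append("N")
        else:
            raise ValueError(
                f"Non-standard reference character {base!r} at position {position}"
            )
    return "".join(normalized)


# R and Y are ambiguity codes, so they become N.
converted = normalize_reference_bases("ACGTRYN")
```

A character such as '-' falls into neither set, so the sketch raises an error rather than guessing at a replacement.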

aeonsim


Sweet, looks like a nice improvement. Great to see unit tests being implemented extensively. Will there be a 2.4 version highlights post?

Mon 25 Feb 2013

Geraldine_VdAuwera


Yep, the highlights post is in the works -- just a few more revisions to do and it should be out later this afternoon.

Mon 25 Feb 2013

TechnicalVault


Hi Geraldine, I notice the downloads page is still asking us to agree to the old license. Could this be corrected, please?

Mon 25 Feb 2013

Geraldine_VdAuwera


Oops, yes I will correct this. Thanks for pointing it out!

Mon 25 Feb 2013

Geraldine_VdAuwera


@TechnicalVault, I've updated the license. Thanks again for pointing it out, this is important.

Mon 25 Feb 2013

kailee


Is there any way I could still gain access to the Somatic Indel Detector tool?

Mon 25 Feb 2013

Geraldine_VdAuwera


Hi @kailee, you should contact the Cancer Group here at the Broad. The SomaticIndelDetector is now in their hands. They are the group who also develop MuTect.

Mon 25 Feb 2013



