GATK 2.7 was released on August 21, 2013. Highlights are listed below. Read the detailed version history overview here: http://www.broadinstitute.org/gatk/guide/version-history


Reduce Reads

  • Changed the underlying convention of having unstranded reduced reads; instead there are now at least 2 compressed reads at every position, one for each strand (forward and reverse). This allows us to maintain strand information that is useful for downstream filtering.
  • Fixed bug where representative depths were arbitrarily being capped at 127 (instead of the expected 255).
  • Fixed bug where insertions downstream of a variant region weren't triggering a stop to the compression.
  • Fixed bug when using --cancer_mode where alignments were being emitted out of order (and causing the tool to fail).

Unified Genotyper

  • Added --onlyEmitSamples argument that, when provided, instructs that caller to emit only the selected samples into the VCF (even though the calling is performed over all samples present in the provided bam files).
  • FPGA support was added to the underlying HMM that is automatically used when the appropriate hardware is available on the machine.
  • Added a (very) experimental argument (allSitePLs) that will have the caller emit PLs for all sites (including reference sites). Note that this does not give a fully accurate reference model because it models only SNPs. Full a proper handling of the reference model, please use the Haplotype Caller.

Haplotype Caller

  • Added a still somewhat experimental PCR indel error model to the Haplotype Caller. By default this modeling is turned on and is very useful for removing false positive indel calls associated with PCR slippage around short tandem repeats (esp. homopolymers). Users have the option (with the --pcr_indel_model argument) of turning it off or making it even more aggressive (at the expense of losing some true positives too).
  • Added the ability to emit accurate likelihoods for non-variant positions (i.e. what we call a "reference model" that incorporates indels as well as SNP confidences at every position). The output format can be either a record for every position or use the gVCF style recording of blocks. See the --emitRefConfidence argument for more details; note that this replaces the use of "--output_mode EMIT_ALL_SITES" in the HaplotypeCaller.
  • Improvements to the internal likelihoods that are generated by the Haplotype Caller. Specifically, this tool now uses a tri-state correction like the Unified Genotyper, corrects for overlapping read pairs (from the same underlying fragment), and does not run contamination removal (allele-biased downsampling) by default.
  • Several small runtime performance improvements were added (although we are still hard at work on larger improvements that will allow calling to scale to many samples; we're just not there yet).
  • Fixed bug in how adapter clipping was performed (we now clip only after reverting soft-clipped bases).
  • FPGA support was added to the underlying HMM that is automatically used when the appropriate hardware is available on the machine.
  • Improved the "dangling tail" recovery in the assembly algorithm, which allows for higher sensitivity in calling variants at the edges of coverage (e.g. near the ends of targets in an exome).
  • Added the ability to run allele-biased downsampling with different per-sample values like the Unified Genotyper (contributed by Yossi Farjoun).

Variant Annotator

  • Fixed bug where only the last -comp was being annotated at a site.

Indel Realigner

  • Fixed bug that arises because of secondary alignments and that was causing the tool not to update the alignment start of the mate when a read was realigned.

Phase By Transmission

  • Fixed bug where multi-allelic records were being completely dropped by this tool. Now they are emitted unphased.

Variant Recalibrator

  • General improvements to the Gaussian modeling, mostly centered around separating the parameters for the positive and negative training models.
  • The percentBadVariants argument has been replaced with the numBad argument.
  • Added mode to not emit (at all) variant records that are filtered out.
  • This tool now automatically orders the annotation dimensions by their standard deviation instead of the order they were specified on the command-line in order to stabilize the training and have it produce optimal results.
  • Fixed bug where the tool occasionally produced bad log10 values internally.

Miscellaneous

  • General performance improvements to the VCF reading code contributed by Michael McCowan.
  • Error messages are much less verbose and "scary."
  • Added a LibraryReadFilter contributed by Louis Bergelson.
  • Fixed the ReadBackedPileup class to represent mapping qualities as ints, not (signed) bytes.
  • Added the engine-wide ability to do on-the-fly BAM file sample renaming at runtime (see the documentation for the --sample_rename_mapping_file argument for more details).
  • Fixed bug in how the GATK counts filtered reads in the traversal output.
  • Added a new tool called Qualify Intervals.
  • Fixed major bug in the BCF encoding (the previous version was producing problematic files that were failing when trying to be read back into the GATK).
  • Picard/sam/tribble/variant jars updated to version 1.96.1534.

Return to top

aeonsim


Sweet thanks for this. Is there likely to be a Version Highlights post?

Wed 21 Aug 2013

Geraldine_VdAuwera


Yes there will be, ETA Friday if all goes well.

Wed 21 Aug 2013



Wed 21 Aug 2013


At a glance



Follow us on Twitter

GATK Dev Team

@gatk_dev

RT @BroadGenomics: After the Samples Lab, we move into Extractions or Storage. What would you like to see next? #BroadGenomicsTour https://…
29 Aug 16
@BrandonDWomack Sure, just ask in the GATK support forum https://t.co/6z2bZ2AeN0
29 Aug 16
@thatdnaguy Note: future releases may switch to mem at some point. For our prod pipelines this is tied to shift to cloud.
26 Aug 16
@thatdnaguy @exac_exomes Same as first iirc, bwa aln as noted in supplemental doc https://t.co/ziwLjp7iIw
26 Aug 16
@MattBashton Fixed, thanks for reporting!
22 Aug 16

Our favorite tweets from others

it's the nightly build owl for GATK :D https://t.co/OwTRrk6KHA https://t.co/rfmAbdIIQp
11 Aug 16
We're going to make an hg38 version of ExAC. And we'll make @dgmacarthur pay for it. #BioinformaticsCampaignPromises
2 Aug 16
You’re right @gatk_dev honesty is key! About variants manual filtering: “In any case you're probably in for a world of pain.” Ha now I know!
11 Jul 16
.@gatk_dev I like the new documentation index page, the subheading has made my day! :D #doge #GeekHumourFTW https://t.co/9RXnDTMoBm
8 Jul 16
There is no NGS, NG is today so should only be called high-throughput sequencing #CSC #GATKworkshop https://t.co/paHcNimD7o
16 Jun 16
See more of our favorite tweets...
Search blog by tag

ad appistry ashg benchmarks best-practices bug bug-fixed cancer catvariants challenge cloud cluster commandline commandlinegatk community competition compute conferences cram cromwell denovo depthofcoverage diagnosetargets error fix forum gatk3 genotype genotype-refinement genotypegvcfs google grch38 gvcf haploid haplotypecaller hg38 holiday hts htsjdk ibm java8 job job-offer jobs joint-discovery license meetings mendelianviolations multisample multithreading mutect mutect2 ngs nt outreach pairhmm parallelism patch performance phone-home picard pipeline plans ploidy polyploid poster presentations printreads profile promote reference-model release release-notes rnaseq runtime saas script search selectvariants sequencing service slides snow speed status sting support syntax talks team terminology third-party-tools topstory trivia troll tutorial unifiedgenotyper variantannotator variantrecalibrator vcf-gz version-highlights versions vqsr wdl webinar workflow workshop