The GATK 2.0 release adds brand-new (and often still experimental) tools and updates the existing stable tools.

New Tools

  • Base Recalibrator (BQSR v2), an upgrade to CountCovariates/TableRecalibration that generates base substitution, insertion, and deletion error models.
  • Reduce Reads, a BAM compression algorithm that reduces file sizes by 20x-100x while preserving all information necessary for accurate SNP and indel calling. ReduceReads enables the GATK to call tens of thousands of deeply sequenced NGS samples simultaneously.
  • HaplotypeCaller, a multi-sample local de novo assembly and integrated SNP, indel, and short SV caller.
  • Powerful extensions to the Unified Genotyper that support variant calling of pooled samples, mitochondrial DNA, and non-diploid organisms. The extended Unified Genotyper also introduces a novel error modeling approach that uses a reference sample to build a site-specific error model for SNPs and indels, which vastly improves calling accuracy.

Base Quality Score Recalibration

  • IMPORTANT: the Count Covariates and Table Recalibration tools (which together make up BQSR v1) have been retired! To run recalibration with GATK 2.0, use the BaseRecalibrator tool (BQSR v2).

Unified Genotyper

  • Now handles the exception generated when non-standard reference bases are present in the FASTA.
  • Bug fix for indels: when checking the limits of a read to clip, the tool did not account for reads that had already been clipped.
  • Now emits the MLE AC and AF in the INFO field.
  • No longer allows Ns in insertions when discovering indels.

Phase By Transmission

  • Multi-allelic sites are now correctly ignored.
  • Reporting of Mendelian violations is enhanced.
  • Corrected TP overflow.
  • Fixed bug that arose when no PLs were present.
  • Added option to output the father's allele first in phased child haplotypes.
  • Fixed a bug that caused the wrong phasing of child/father pairs.

Variant Eval

  • Improvements to the validation report module: if both eval and comp have genotypes, the comp genotypes are subset to the samples being evaluated when determining TP, FP, FN, and TN status.
  • If present, the MLE AC is used by default for the AlleleCount stratification (otherwise it falls back to the greedy AC).
  • Fixed bugs in the VariantType and IndelSize stratifications.
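
The AlleleCount fallback described above can be pictured with a short Python sketch. It is only a toy model, not VariantEval code: the `stratification_ac` helper is hypothetical, and the INFO keys `MLEAC` (for the MLE AC) and `AC` (for the greedy AC) are assumed here for illustration.

```python
def stratification_ac(info):
    """Return the allele count to stratify on (hypothetical helper).

    Prefers the MLE allele count when the record carries one and
    otherwise falls back to the greedy allele count.
    """
    if "MLEAC" in info:      # assumed key for the MLE AC
        return info["MLEAC"]
    return info["AC"]        # assumed key for the greedy AC


with_mle = stratification_ac({"AC": 3, "MLEAC": 4})
without_mle = stratification_ac({"AC": 3})
```

With both values present the MLE AC is chosen; with only `AC` present the greedy count is used.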

Variant Annotator

  • The FisherStrand annotation no longer hard-codes filters for bases/reads (it previously used MAPQ > 20 && QUAL > 20).
  • Miscellaneous bug fixes to experimental annotations.
  • Added a Clipping Rank Sum Test to detect when variants are present on reads with differential clipping.
  • Fixed the ReadPos Rank Sum Test annotation so that it no longer uses the un-hardclipped start as the alignment start.
  • Fixed bug in the NBaseCount annotation module.
  • The new TandemRepeatAnnotator is now a standard annotation while HRun has been retired.
  • Added PED support for the Inbreeding Coefficient annotation.
  • QD is no longer computed when there is no QUAL.

Variant Quality Score Recalibration

  • The VCF index is now created automatically for the recalFile.

Variant Filtration

  • Now allows you to run with type-unsafe JEXL select expressions, which all default to false when matching.
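
The default-to-false behaviour can be pictured with a small Python sketch. It is a toy model, not GATK's JEXL engine: the `matches` helper and the dictionary record are hypothetical, Python's `TypeError` stands in for a type-unsafe comparison, and `KeyError` stands in for a field that does not exist (which the Miscellaneous notes also describe as evaluating to false).

```python
def matches(predicate, record):
    """Evaluate a predicate on a record, treating failures as 'no match'."""
    try:
        return bool(predicate(record))
    except (TypeError, KeyError):
        # Type mismatch or missing field: the expression counts as false
        # instead of aborting the run.
        return False


# QD holds a string here, so a numeric comparison against it is type unsafe.
record = {"QD": "n/a", "DP": 12}

low_qd = matches(lambda r: r["QD"] < 2.0, record)    # str < float raises TypeError
low_dp = matches(lambda r: r["DP"] < 20, record)     # ordinary numeric comparison
high_fs = matches(lambda r: r["FS"] > 60.0, record)  # FS is absent from the record
```

Only `low_dp` is true; the other two expressions fail to evaluate and therefore do not match.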

Select Variants

  • Added an option that lets the user re-genotype through the exact AF calculation model (if PLs are present) in order to recalculate the QUAL and genotypes.

Combine Variants

  • Added --mergeInfoWithMaxAC argument to keep info fields from the input with the highest AC value.
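
As a rough illustration of the idea behind --mergeInfoWithMaxAC, the Python sketch below picks the INFO fields of whichever input record has the highest AC. It is not CombineVariants code: the `info_with_max_ac` helper is hypothetical, each record is reduced to a plain dictionary, and AC is assumed to be a single integer (tie handling and multi-allelic sites are not modelled).

```python
def info_with_max_ac(infos):
    """Return the INFO dict whose AC value is largest (hypothetical helper).

    Records without an AC entry are treated as AC = 0, which is an
    assumption of this sketch.
    """
    return max(infos, key=lambda info: info.get("AC", 0))


# INFO fields for the same site as seen in three input files.
inputs = [
    {"AC": 2, "DP": 30},
    {"AC": 5, "DP": 18},
    {"AC": 1, "DP": 44},
]
merged_info = info_with_max_ac(inputs)
```

Here `merged_info` is the second dictionary, because its AC of 5 is the highest.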

Somatic Indel Detector

  • GT header line is now output.

Indel Realigner

  • Automatically skips Ion reads just like it does with 454 reads.

Variants To Table

  • Genotype-level fields can now be specified.
  • Added the --moltenize argument to produce molten output of the data.
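
For readers unfamiliar with the term, "molten" output is the long form of a table: one row per record and field instead of one row per record. The Python sketch below shows the general transformation only; the `moltenize` function, the `ID` identifier field, and the three-column layout are assumptions for illustration and may not match the columns VariantsToTable actually writes.

```python
def moltenize(rows, id_field):
    """Turn wide rows into (record id, variable, value) triples."""
    molten = []
    for row in rows:
        for name, value in row.items():
            if name != id_field:
                molten.append((row[id_field], name, value))
    return molten


# A wide table: one row per variant, one column per field.
table = [
    {"ID": "rs1", "CHROM": "1", "POS": 100},
    {"ID": "rs2", "CHROM": "2", "POS": 250},
]
molten = moltenize(table, "ID")
```

Each variant now contributes one row per field, which is often easier to load into plotting or statistics tools.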

Depth Of Coverage

  • Fixed a NullPointerException that could occur if the user requested an interval summary but never provided a -L argument.

Miscellaneous

  • BCF2 support in tools that output VCFs (use the .bcf extension).
  • The GATK Engine no longer automatically strips the suffix "Walker" after the end of tool names; as such, all tools whose name ended with "Walker" have been renamed without that suffix.
  • Fixed bug when specifying a JEXL expression for a field that doesn't exist: the whole expression is now treated as false (previously the JEXL exception was rethrown).
  • There is now a global --interval_padding argument that specifies how many base pairs to add to both ends of each interval provided with -L.
  • Removed all code associated with extended events.
  • Algorithmically faster version of DiffEngine.
  • Improved down-sampling fixes edge cases that used to be handled poorly. Read Walkers can now use down-sampling.
  • GQ is now emitted as an int, not a float.
  • Fixed bug in the Beagle codec that was skipping the first line of the file when decoding.
  • Fixed bug in the VCF writer in the case where there are no genotypes for a record but there are genotypes in the header.
  • Miscellaneous fixes to the VCF headers being produced.
  • Fixed up the BadCigar read filter.
  • Removed the old deprecated genotyping framework revolving around the misordering of alleles.
  • Extensive refactoring of the GATKReports.
  • Picard jar updated to version 1.67.1197.
  • Tribble jar updated to version 110.
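
The effect of the --interval_padding argument mentioned above can be sketched in a few lines of Python. This is an illustration, not the engine's implementation: the `pad_interval` helper is hypothetical, coordinates are assumed to be 1-based and inclusive, and clamping the result to the contig bounds is an assumption of this sketch.

```python
def pad_interval(start, end, padding, contig_length=None):
    """Extend an interval by `padding` bases on both ends (hypothetical helper)."""
    padded_start = max(1, start - padding)   # assumed: never go below position 1
    padded_end = end + padding
    if contig_length is not None:
        # assumed: never run past the end of the contig
        padded_end = min(contig_length, padded_end)
    return padded_start, padded_end


middle = pad_interval(100, 200, 50)
near_start = pad_interval(10, 200, 50)
near_end = pad_interval(10, 200, 50, contig_length=220)
```

An interval of 100-200 padded by 50 becomes 50-250; intervals close to a contig edge are trimmed rather than extended past it.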
