GATK 3.6 was released on June 1, 2016. Itemized changes are listed below. For more details, see the user-friendly version highlights.


Variant calling features

  • HaplotypeCaller will now emit a no-call (./.) for any sample where GQ is zero, in both normal and GVCF modes, instead of emitting a specific genotype in which we have zero confidence.

  • GenotypeGVCFs will now emit a QUAL value for hom-ref sites when run in -allSites mode.

  • Implemented tracking of dropped reads by HaplotypeCaller and MuTect2 (see highlights for details).

  • Assorted optimizations to the joint calling code, expected to speed up genotyping (not the overall tool run) by about 10 percent.

  • Enabled MuTect2 to annotate all the same regular (non-AS) annotations as HaplotypeCaller on request.

Assorted new functionality

  • New ranksum annotations (allele-specific insert size and MQ of mate).
  • New -AS mode to run VQSR in an allele-specific manner in both VariantRecalibrator and ApplyRecalibration (still experimental).
  • VariantRecalibrator can now output the recalibration model to a file (in GATKReport format; use the R library gsalib to read it).
  • Added ability to have VariantRecalibrator retry building the recalibration model if it fails initially. Meant as a workaround for runs on small datasets that fail randomly because the model isn't robust enough. Default behavior remains a single try. Contributed by @depristo / Mark DePristo.
  • ValidateVariants can now perform validation checks specific to GVCFs with the option --gvcf.
  • VariantsToTable now determines each allele's type when -F TYPE and -SMA are specified together.
  • LeftAlignAndTrimVariants now retains genotypes that remain valid after splitting with --splitMultiallelics (previously all were discarded).
  • SelectVariants can now select sites based on the number or fraction of samples that have no-call genotypes (./.) using --maxNOCALLnumber and --maxNOCALLfraction, respectively.
  • DepthOfCoverage now supports collecting coverage statistics for overlapping exons/genes. Contributed by @seru71 / Pawel Sztromwasser.
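To make the no-call count and fraction used by the SelectVariants options above concrete, here is a minimal Python sketch that computes both for a single VCF data line. It is an illustration only, not GATK's implementation: the function name nocall_stats is hypothetical, and it assumes a tab-delimited line whose FORMAT column contains GT. How the tool treats a site that sits exactly on the threshold is not stated in these notes, so the sketch only reports the numbers.

```python
def nocall_stats(vcf_line):
    """Return (no-call count, no-call fraction) for one VCF data line.

    Hypothetical helper for illustration; assumes tab-delimited fields
    and a FORMAT column (index 8) that contains a GT key.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")
    samples = fields[9:]
    nocalls = 0
    for sample in samples:
        gt = sample.split(":")[gt_index]
        # Treat phased (|) and unphased (/) separators the same way.
        alleles = gt.replace("|", "/").split("/")
        # A genotype is a no-call when every allele is missing (".").
        if all(allele == "." for allele in alleles):
            nocalls += 1
    fraction = nocalls / len(samples) if samples else 0.0
    return nocalls, fraction


example = "20\t1000\t.\tA\tG\t50\t.\t.\tGT:GQ\t0/1:30\t./.:0\t0/0:20\t./.:0"
count, fraction = nocall_stats(example)
print(count, fraction)
```

In this example line, two of the four samples are ./., so the count is 2 and the fraction is 0.5.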

Assorted bug fixes

  • Handling of allele depths when the NON_REF allele is non-zero (see highlights for details)
  • A sample ploidy check that may have minor performance implications
  • Threshold evaluation in the max alt alleles filter of MuTect2
  • MQ annotation calculation when processing BP resolution GVCFs
  • RankSum calculations on small sample sizes
  • PrintReads’ ability to emit a @PG header record
  • Writing GVCFs to stdout instead of to file
  • Order of column headers in sample_gene_summary reports output by DepthOfCoverage
  • MNP-merging behavior of ReadBackedPhasing: treatment of spanning deletions and consecutive SNPs
  • SelectVariants and VariantFiltration’s ability to update genotype summary annotations (AC, AN and AF)
  • Subsetting alleles from StrandAlleleCountsBySample annotation

Workarounds for weird sites

  • Added an argument to HaplotypeCaller and GenotypeGVCFs, -maxNumPLValues, that controls the maximum number of PL values that can be emitted for a given site. If the number of PLs resulting from the combination of observed alleles and ploidy exceeds this value, no PLs will be emitted. This will cause subsetting errors in SelectVariants but empowers the user to identify and work around difficult sites where this happens.

  • Extended the functionality of the engine-level argument --reference_window_stop to set the reference window size used by VariantAnnotator when annotating homopolymers through the HomopolymerRun annotation. This makes it possible to deal with the problem of homopolymer stretches that are longer than the default window size.
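To see why the PL count that -maxNumPLValues limits can grow quickly, recall that a site has one PL per possible unordered genotype, and the number of unordered genotypes for a given ploidy and allele count (reference included) is the binomial coefficient C(alleles + ploidy - 1, ploidy). The Python sketch below just evaluates that formula. The function name and the cap of 100 are illustrative assumptions, not values taken from GATK.

```python
from math import comb  # available in Python 3.8 and later


def num_pl_values(num_alleles, ploidy):
    """Number of PL values (unordered genotypes) for a site.

    num_alleles counts the reference allele plus all alternate alleles.
    """
    return comb(num_alleles + ploidy - 1, ploidy)


# Hypothetical cap chosen for illustration only; not a GATK default.
ILLUSTRATIVE_CAP = 100

for alleles, ploidy in [(2, 2), (3, 2), (14, 2), (6, 4)]:
    n = num_pl_values(alleles, ploidy)
    status = "exceeds" if n > ILLUSTRATIVE_CAP else "within"
    print(f"{alleles} alleles, ploidy {ploidy}: {n} PLs ({status} the illustrative cap)")
```

A diploid biallelic site needs only 3 PLs, but a diploid site with 14 alleles needs 105, and a tetraploid site with 6 alleles needs 126, which is why highly multiallelic or high-ploidy sites are the ones likely to hit the limit.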

Deleted functionality

  • Removed Phone Home usage tracking system (see highlights for details)
  • Deprecated the GenotypeAndValidate tool, which was massively outdated and had no unit or integration tests

Tools moved to the open-source core of GATK

  • IndelRealigner and RealignerTargetCreator
  • Post-IR MQ reverter filter
  • BQSRGatherer and its dependencies

Core / engine functionality

  • Enabled Java 8 support (see highlights for details)
  • Updated htsjdk & picard to version 2.4.1
  • Tweaks to the genome coordinates parsing system and contig names to support the Hg38 reference
  • Assorted improvements in the handling of errors, warnings and log output. The engine will now output a summary of WARN messages encountered during a run so you don’t have to parse the full log to see if anything worrying-but-not-fatal happened.

Queue

  • Exposed the time between checks for whether new jobs can be submitted as a user-settable parameter on the CLI. Useful for shortening idle time when testing pipelines. Contributed by @dakl / Daniel Klevebring.

  • Removed mem_free from the resident memory request parameters for Queue because it did not work and would not actually reserve memory.

Tool documentation

  • Improvements and clarifications to many tool docs
  • Refreshed organization and naming of tool categories
  • Fixed display of default values for arguments
  • Switched default doc output to html to make the tool docs provided for nightly builds more readable


ekanterakis on 1 Jun 2016


Hello, is there an option associated with having "VariantRecalibrator retry building the recalibration model if it fails initially"? We've been experiencing this a lot running VQSR on a single WGS sample. Thank you

ekanterakis on 1 Jun 2016


> @ekanterakis said:
> Hello, is there an option associated with having "VariantRecalibrator retry building the recalibration model if it fails initially"? We've been experiencing this a lot running VQSR on a single WGS sample. Thank you

I'm guessing it's "max_attempts" and the default is 1

Geraldine_VdAuwera on 1 Jun 2016


Correct, @ekanterakis



