Base Quality Score Recalibration

  • Multi-threaded support in the BaseRecalibrator tool has been temporarily suspended for performance reasons; we hope to have this fixed for the next release.
  • Implemented support for SOLiD no-call strategies other than throwing an exception.
  • Fixed smoothing in the BQSR bins.
  • Fixed the R plotting script to be compatible with newer versions of R and the ggplot2 library.

Unified Genotyper

  • Renamed the per-sample ML allelic fractions and counts so that they don't have the same name as the per-site INFO fields, and clarified the description in the VCF header.
  • UG now makes use of base insertion and base deletion quality scores if they exist in the reads (output from BaseRecalibrator).
  • Renamed the -maxAlleles argument to -maxAltAlleles so that the name more accurately describes what it does.
  • In pooled mode, if haplotypes cannot be created from the given alleles when genotyping indels (e.g. because the site is too close to a contig boundary), we no longer try to genotype.
  • Added improvements to indel calling in pooled mode: we compute per-read likelihoods in the reference sample to determine whether a read is informative or not.

Haplotype Caller

  • Added LowQual filter to the output when appropriate.
  • Added some support for calling on Reduced Reads. Note that this is still experimental and may not always work well.
  • Now does a better job of capturing low frequency branches that are inside high frequency haplotypes.
  • Updated VQSR to work with the MNP and symbolic variants that are coming out of the HaplotypeCaller.
  • Made fixes to the likelihood based LD calculation for deciding when to combine consecutive events.
  • Fixed bug where non-standard bases from the reference would cause errors.
  • Better separation of arguments that are relevant to the Unified Genotyper but not the Haplotype Caller.

Reduce Reads

  • Fixed bug where reads were soft-clipped beyond the limits of the contig and the tool was failing with a NoSuchElement exception.
  • Fixed divide by zero bug when downsampler goes over regions where reads are all filtered out.
  • Fixed a bug where downsampled reads were not being excluded from the read window, causing them to trail back and get caught by the sliding window exception.

Variant Eval

  • Fixed the AlleleCount stratification when using the MLEAC (it is now capped by the AN).
  • Fixed incorrect allele counting in IndelSummary evaluation.

Combine Variants

  • Now outputs the first non-MISSING QUAL, instead of the maximum.
  • Now supports multi-threaded running (with the -nt argument).
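
The QUAL change above can be illustrated with a minimal sketch. It assumes a missing QUAL is represented as `None` and that the values are listed in input order; the function name `first_non_missing_qual` is hypothetical and is not part of the GATK API.

```python
from typing import Optional, Sequence


def first_non_missing_qual(quals: Sequence[Optional[float]]) -> Optional[float]:
    """Return the first QUAL that is not missing, or None if all are missing.

    Hypothetical helper for illustration only; None stands in for MISSING.
    """
    for qual in quals:
        if qual is not None:
            return qual
    return None


# QUAL values of the records being merged at one site, in input order.
quals = [None, 31.5, 88.0]
print(first_non_missing_qual(quals))  # picks 31.5 rather than the maximum, 88.0
```

Under these assumptions, the order of the inputs determines which QUAL is reported, so reordering the input files can change the merged value.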

Select Variants

  • Fixed behavior of the --regenotype argument to do proper selecting (without losing any of the alternate alleles).
  • No longer adds the DP INFO annotation if DP wasn't used in the input VCF.
  • If MLEAC or MLEAF is present in the original VCF and the number of samples decreases, remove those annotations from the output VC (since they are no longer accurate).
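
The MLEAC/MLEAF rule above might look roughly like the following sketch. It assumes the INFO field is held as a plain dictionary and that the sample counts before and after selection are known; `drop_stale_mle_annotations` is a hypothetical name, not a GATK function.

```python
from typing import Any, Dict

# Annotations that depend on the full set of samples used at calling time.
STALE_KEYS = ("MLEAC", "MLEAF")


def drop_stale_mle_annotations(
    info: Dict[str, Any], n_input_samples: int, n_output_samples: int
) -> Dict[str, Any]:
    """Return a copy of INFO without MLEAC/MLEAF if samples were removed.

    Hypothetical helper for illustration only.
    """
    if n_output_samples < n_input_samples:
        return {key: value for key, value in info.items() if key not in STALE_KEYS}
    return dict(info)


info = {"AC": 2, "MLEAC": 2, "MLEAF": 0.5}
print(drop_stale_mle_annotations(info, n_input_samples=4, n_output_samples=2))
```

When the number of samples is unchanged, the annotations are kept as they are, since they still describe the same cohort.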

Miscellaneous

  • Updated and improved the BadCigar read filter.
  • GATK now generates a proper error when a gzipped FASTA is passed in.
  • Various improvements throughout the BCF2-related code.
  • Removed various parallelism bottlenecks in the GATK.
  • Added support for the X and = CIGAR operators to the GATK.
  • Catch NumberFormatExceptions when parsing the VCF POS field.
  • Fixed bug in FastaAlternateReferenceMaker when input VCF has overlapping deletions.
  • Fixed AlignmentUtils bug for handling Ns in the CIGAR string.
  • Lower-case bases are now allowed in the REF/ALT alleles of a VCF; they are converted to upper case.
  • Added support for handling complex events in ValidateVariants.
  • Picard jar remains at version 1.67.1197.
  • Tribble jar remains at version 110.

