By Laura Gauthier, lead GATK developer for germline short variant discovery

Q: What, there's a HaplotypeCaller paper?

A: Yes! We are super excited to announce the long-awaited release of The HaplotypeCaller Paper -- or rather, the preprint on bioRxiv. (Actually, we announced it on Twitter a while back, but we understand not everyone enjoys such an old-school way of keeping up with the news.) Hopefully you’re as excited as we are, if not more so. We realize this probably raises a few questions for some of you, so we have tried to address them below.

Q: Why did it take so long?!

A: Our mission is to develop the tools that get used by others to do groundbreaking scientific research. Benchmarking and validation are important parts of our prototyping and development cycle, but given that we’re not subject to the “publish or perish” culture of a research lab, submitting manuscripts presenting those results wasn’t a high priority for us.

Q: Are you going to submit it to a peer-reviewed journal?

A: Probably not.

Q: Why not?

A: Our main motivation for posting the HaplotypeCaller manuscript to bioRxiv was to provide something recent/reasonable to cite and to make more details of the methods public. Submitting to a peer-reviewed journal usually involves a lot of time working on revisions that we’d rather put towards working on further improvements to the tools.

Q: Is it still a preprint if it's never intended to go to print?

A: You tell us.

Q: What version of HaplotypeCaller does the paper describe?

A: The paper describes the GATK 3.4 version of the HaplotypeCaller (yes, we started this a while back), but the HaplotypeCaller has not changed significantly in later 3.x versions, so it's fair to say the paper covers all versions up to and including 3.8.

Q: How do these results compare to GATK4?

A: At the time of writing, the GATK4 version of HaplotypeCaller is still considered a beta version. The team is actively validating the GATK4 version to make sure that it is as good as or better than the GATK3 version described in the paper.

Q: How does the methodology compare to GATK4?

A: The GATK engine that parses the BAM and “shards” the data to pass to the tools has been rewritten for improved efficiency over GATK3, and the HaplotypeCaller code has been refactored for better organization and readability. So there's a lot that is different in terms of software implementation. However, the algorithms and equations presented in the manuscript remain the same, so overall the paper's description of how the HaplotypeCaller operates also applies to the GATK4 beta version. It is therefore appropriate to cite the paper for results derived from any version up to the current beta (4.beta.6).

Q: Does the release of this paper hint at a change in how the team prioritizes publication?

A: To some extent. The developers of the somatic variant caller Mutect2 and related tools have put a lot of effort into preparing white papers on the methods involved (Mutect2 itself, the assembly process and the pairHMM algorithm), some of which are shared with the HaplotypeCaller. They hope to release a manuscript featuring Mutect2 somatic SNV and INDEL variant calling results in the near future. Additionally, the GATK development team as a whole aims to make more of our internal benchmarking and validation efforts transparent and available to other tool developers, an effort that our colleague Yossi Farjoun kicked off in style last week with his blog post about the new "SynDip" benchmark.

Wed 13 Dec 2017