By Laura Gauthier, lead GATK developer for germline short variant discovery

Q: What, there's a HaplotypeCaller paper?

A: Yes! We are super excited to announce the long-awaited release of The HaplotypeCaller Paper -- or rather, the preprint in bioRxiv. (Actually we announced it on Twitter a while back but we understand not everyone enjoys such an old-school way of keeping up with the news). Hopefully you’re as excited as we are, if not more so, but we understand that this probably raises a few questions for some of you, so we tried to address some of those below.

Q: Why did it take so long?!

A: Our mission is to develop the tools that get used by others to do groundbreaking scientific research. Benchmarking and validation are important parts of our prototyping and development cycle, but given that we’re not subject to the “publish or perish” culture of a research lab, submitting manuscripts presenting those results wasn’t a high priority for us.

Q: Are you going to submit it to a peer-reviewed journal?

A: Probably not.

Q: Why not?

A: Our main motivation for posting the HaplotypeCaller manuscript to bioRxiv was to provide something recent/reasonable to cite and to make more details of the methods public. Submitting to a peer-reviewed journal usually involves a lot of time working on revisions that we’d rather put towards working on further improvements to the tools.

Q: Is it still a preprint if it's never intended to go to print?

A: You tell us.

Q: What version of HaplotypeCaller does the paper describe?

A: The paper describes the GATK 3.4 version of the HaplotypeCaller (yes we started this a while back) but the HaplotypeCaller has not changed significantly in later 3.x versions so it's fair to say the paper covers up to version 3.8 completely.

Q: How do these results compare to GATK4?

A: At time of writing, the GATK4 version of HaplotypeCaller is still considered a beta version. The team is actively working on validating the GATK4 version to make sure that it’s guaranteed to be as good as or better than the GATK3 version described in the paper.

Q: How does the methodology compare to GATK4?

A: The GATK engine that parses the BAM and “shards” the data to pass to the tools has been rewritten for improved efficiency over GATK3, and the HaplotypeCaller code has been refactored for better organization and readability. So there's a lot that is different in terms of software implementation. However the algorithms and equations presented in the manuscript remain the same, so overall the paper's description of how the HaplotypeCaller operates also applies to the GATK4 beta version, and it is appropriate to use it as a citation for results derived from versions up to the current beta (4.beta.6).

Q: Does the release of this paper hint at a change in how the team prioritizes publication?

A: To some extent. The developers of the somatic variant caller Mutect2 and related tools have put in a lot of effort to prepare white papers on the methods involved (Mutect2 itself, the assembly process and the pairHMM algorithm), some of which are shared with the HaplotypeCaller. They hope to release a manuscript featuring Mutect2 somatic SNV and INDEL variant calling results in the near future. Additionally, the GATK development team as a whole aims to make more of our internal benchmarking and validation efforts more transparent and available to other tool developers; an effort that our colleague Yossi Farjoun kicked off in style in his blog post about the new "SynDip" benchmark last week.

Return to top

Wed 13 Dec 2017
Comment on this article

- Recent posts

- Upcoming events

See Events calendar for full list and dates

- Recent events

See Events calendar for full list and dates

- Follow us on Twitter

GATK Dev Team


RT @dgmacarthur: We’re looking for a talented computational biologist to help drive the development of new methods for the diagnosis of rar…
21 Mar 18
@SeqComplete Please use the gatk wrapper script rather than calling the jars directly. Follow the quick start guide…
19 Mar 18
Join us on Tuesday March 20, we'll be talking about running #GATK4 pipelines on @BroadFireCloud, and how we optimiz…
16 Mar 18
#GATK Forums will be largely unattended today while we huddle for warmth and battle polar bears for food
13 Mar 18
@mikegloud @zaczap You could also come to one of our in-person workshops, we have a bunch coming up
9 Mar 18

- Our favorite tweets from others

@gatk_dev We'll be happy to host a survival boot camp for you guys in Finland! Complementary polar bear provided.
13 Mar 18
@gatk_dev @ericschmidt Standardised, accessible, $5. Wow.
6 Mar 18
I am very grateful to the software from academia that "just works", like GATK, BWA, Picard, Bedtools, SAMtools, Kal…
3 Mar 18
@gatk_dev @BroadFireCloud @github We've been hacking GATK for years to play well with fish and butterflies! My enth…
23 Feb 18
@BroadFireCloud also does call caching!!!!!! Perfect for tinkering with multi-step pipelines.
23 Feb 18

See more of our favorite tweets...