By Laura Gauthier, lead GATK developer for germline short variant discovery

Q: What, there's a HaplotypeCaller paper?

A: Yes! We are super excited to announce the long-awaited release of The HaplotypeCaller Paper, or rather, the preprint on bioRxiv. (Actually, we announced it on Twitter a while back, but we understand not everyone enjoys such an old-school way of keeping up with the news.) Hopefully you’re as excited as we are, if not more so. We realize this probably raises a few questions for some of you, so we’ve tried to address them below.


Q: Why did it take so long?!

A: Our mission is to develop the tools that get used by others to do groundbreaking scientific research. Benchmarking and validation are important parts of our prototyping and development cycle, but given that we’re not subject to the “publish or perish” culture of a research lab, submitting manuscripts presenting those results wasn’t a high priority for us.

Q: Are you going to submit it to a peer-reviewed journal?

A: Probably not.

Q: Why not?

A: Our main motivation for posting the HaplotypeCaller manuscript to bioRxiv was to provide something recent/reasonable to cite and to make more details of the methods public. Submitting to a peer-reviewed journal usually involves a lot of time working on revisions that we’d rather put towards working on further improvements to the tools.

Q: Is it still a preprint if it's never intended to go to print?

A: You tell us.

Q: What version of HaplotypeCaller does the paper describe?

A: The paper describes the GATK 3.4 version of the HaplotypeCaller (yes, we started this a while back), but the HaplotypeCaller has not changed significantly in later 3.x versions, so it's fair to say the paper fully covers versions up to 3.8.

Q: How do these results compare to GATK4?

A: At the time of writing, the GATK4 version of HaplotypeCaller is still considered a beta version. The team is actively validating it to confirm that it performs as well as or better than the GATK3 version described in the paper.

Q: How does the methodology compare to GATK4?

A: The GATK engine that parses the BAM and “shards” the data to pass to the tools has been rewritten for improved efficiency over GATK3, and the HaplotypeCaller code has been refactored for better organization and readability. So a lot is different in terms of software implementation. However, the algorithms and equations presented in the manuscript remain the same, so the paper's description of how the HaplotypeCaller operates also applies to the GATK4 beta version, and it is appropriate to cite it for results derived from versions up to the current beta (4.beta.6).

Q: Does the release of this paper hint at a change in how the team prioritizes publication?

A: To some extent. The developers of the somatic variant caller Mutect2 and related tools have put a lot of effort into preparing white papers on the methods involved (Mutect2 itself, the assembly process and the pairHMM algorithm), some of which are shared with the HaplotypeCaller. They hope to release a manuscript featuring Mutect2 somatic SNV and INDEL variant calling results in the near future. Additionally, the GATK development team as a whole aims to make our internal benchmarking and validation efforts more transparent and more available to other tool developers, an effort that our colleague Yossi Farjoun kicked off in style in his blog post about the new "SynDip" benchmark last week.


Wed 13 Dec 2017