Adapting a proven tool to liquid biopsy studies

By Mark Fleharty & Madeleine Duran

Accurate detection of somatic events from liquid biopsies has the potential to revolutionize precision medicine in cancer. But identifying variants at the low allele fractions found in blood requires higher base qualities than the Illumina platform typically reports. We are happy to unveil a pipeline that overcomes this challenge with an improved Mutect2 and a custom lab process that uses duplex consensus reads to reduce false positives. Our pipeline calls somatic SNPs present at a 1% allele fraction in liquid biopsies with 90% sensitivity, while calling no more than one false positive per megabase. That is welcome news for patients who today must endure invasive biopsies to detect and track cancer.

Three considerations for the viable liquid biopsy

  • High quality bases are absolutely necessary

    Artifacts become more prominent when calling at low allele fractions, making it difficult to distinguish true biological variants from artifacts.

  • Correcting for PCR errors using duplex reads helps reduce false positive rates

    By requiring observations from both strands of the original double-stranded molecule, we significantly reduce the effects of sequencing and PCR error.

  • High depths are critical

    Duplex depths around 800x are necessary to make reliable calls at 1% allele fraction. Generating such high depths increases the occurrence of errors that lead to false positives. Depending on the efficiency of duplex recovery, the raw sequencing required could be 10-20 times the duplex depth.
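As a rough illustration of the depth arithmetic above, the sketch below computes the expected number of alt-supporting duplex reads at 800x and 1% allele fraction, along with the implied raw sequencing depth. It assumes alt reads are sampled binomially, which is a simplification of ours and not a statement about how the pipeline models depth. The function name `prob_at_least` and the two-read cutoff are hypothetical, chosen only for illustration.

```python
from math import comb


def prob_at_least(k: int, depth: int, allele_fraction: float) -> float:
    """P(X >= k) for X ~ Binomial(depth, allele_fraction).

    Assumption: each duplex read independently carries the alt allele
    with probability equal to the allele fraction.
    """
    p_fewer = sum(
        comb(depth, i) * allele_fraction**i * (1 - allele_fraction) ** (depth - i)
        for i in range(k)
    )
    return 1.0 - p_fewer


duplex_depth = 800
allele_fraction = 0.01

# Expected number of alt-supporting duplex reads at this depth.
expected_alt_reads = duplex_depth * allele_fraction

# Raw depth implied by a 10-20x overhead for duplex recovery.
raw_depth_range = (10 * duplex_depth, 20 * duplex_depth)

print("expected alt reads:", expected_alt_reads)
print("raw depth range:", raw_depth_range)
# Hypothetical cutoff of two alt reads, for illustration only.
print("P(at least 2 alt reads):", prob_at_least(2, duplex_depth, allele_fraction))
```

Under these assumptions, a 1% variant at 800x duplex depth is expected to show about eight alt-supporting reads, which is why lower depths quickly become unreliable.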

Meeting the challenges with double strand sequencing

Duplex sequencing produces reads with considerably fewer PCR and sequencer errors, enabling calling of low allele fraction variants with low false positive rates. Our liquid biopsies take advantage of this with a custom lab process that incorporates 6 bp duplex unique molecular indices (UMIs). We benchmarked this pipeline by calling SNPs in a 402-gene panel with 2 Mb of target territory. The UMIs increase the available depth of reads and, more importantly, reduce sequencer and PCR error by utilizing consensus-called reads. We use fgbio for calling duplex consensus reads and GATK 4.1 Mutect2 for variant calling. Our pipeline requires that we observe both strands to form a duplex consensus read.
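To make the idea of a duplex consensus read concrete, here is a toy sketch. It is not fgbio's algorithm: it assumes reads have already been grouped by UMI and strand, are the same length, and are oriented the same way, and it uses a simple per-position majority vote. The function names are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def strand_consensus(reads: List[str]) -> str:
    """Per-position majority base among reads from one strand; N on ties."""
    consensus = []
    for column in zip(*reads):
        counts = Counter(column).most_common()
        if len(counts) > 1 and counts[0][1] == counts[1][1]:
            consensus.append("N")  # no clear majority at this position
        else:
            consensus.append(counts[0][0])
    return "".join(consensus)


def duplex_consensus(top_reads: List[str], bottom_reads: List[str]) -> Optional[str]:
    """Combine both strand consensuses; None if either strand is unobserved."""
    if not top_reads or not bottom_reads:
        return None  # both strands must be observed to form a duplex read
    top = strand_consensus(top_reads)
    bottom = strand_consensus(bottom_reads)
    # Keep a base only where the two strands agree; otherwise emit N.
    return "".join(a if a == b else "N" for a, b in zip(top, bottom))


top = ["ACGT", "ACGT", "ACTT"]
bottom = ["ACGT", "ACGA"]
print(duplex_consensus(top, bottom))
print(duplex_consensus(["ACGT"], []))
```

In this toy model an error confined to one raw read is outvoted, while a disagreement that cannot be resolved becomes an N instead of a confident base call.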

Figure 1. (a) Sensitivities for allele fractions 1% and above are > 90%. (b) Two false positives are detected using normal-normal analysis.

New data type, new error modes

When we apply BQSR, we frequently find quality scores on the order of Q55, an estimated one error in roughly 300,000 base calls. Qualities like this mean we can make high-confidence variant calls with as few as two reads! Unfortunately, new data types also bring new error modes. Read on for more details on these errors, as well as how Mutect2 tackles each one.
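The relationship between a Phred-scaled quality and an error probability is the standard Q = -10 log10(p). The short sketch below applies it to Q55; the helper names are ours.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled quality score to an error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability to a Phred-scaled quality score."""
    return -10 * math.log10(p)


p = phred_to_error_prob(55)
print(f"Q55 error probability: {p:.2e}")
print(f"Q55 is one error in about {1 / p:,.0f} base calls")
```

Q55 works out to roughly one error in 316,000 base calls, consistent with the figure of about 300,000 quoted above.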

Overcoming PCR error

Although PCR error is present in all small-panel sequencing, it’s a more serious source of errors at the low allele fractions seen in liquid biopsy studies. Mutect2 now filters such artifacts with the addition of a modified strand bias filter that requires observing at least one alt-supporting read in both the positive and negative directions.

Figure caption: This IGV screenshot shows a likely PCR artifact: five reads with the alt, but at duplex read depths of about 700x. A modified strand bias filter in Mutect2 removes such alts if they are not observed in both directions.
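The rule described above can be sketched as a simple predicate. This is a minimal illustration of the stated requirement, not Mutect2's implementation; the function name and the `min_per_strand` parameter are hypothetical.

```python
def passes_both_strands(alt_forward: int, alt_reverse: int, min_per_strand: int = 1) -> bool:
    """True if the alt allele is supported in both read directions.

    min_per_strand is a hypothetical knob; the text above only states
    that at least one alt-supporting read is required in each direction.
    """
    return alt_forward >= min_per_strand and alt_reverse >= min_per_strand


# Five alt reads, all in one direction: treated as a likely PCR artifact.
print(passes_both_strands(alt_forward=5, alt_reverse=0))
# Alt reads seen in both directions: kept.
print(passes_both_strands(alt_forward=3, alt_reverse=2))
```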

Reducing false positives by harnessing N base calls

N base calls, where some of the duplicate raw reads disagree with each other at a particular locus, are common in duplex consensus reads.

From time to time, calls at such loci look reasonable and can skew results (see Figure 2). Inspecting known false positives revealed a large number of Ns relative to the number of alt reads, and we found that this ratio is a good indicator of a false positive.

Figure 2: IGV screenshot of an apparent variant with duplex evidence in both strand orientations. We know this particular variant is not real because it was not in our truth dataset.
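A minimal sketch of the N-to-alt heuristic follows. The post does not give a cutoff, so `max_ratio` here is a placeholder value chosen for illustration, and the function names are hypothetical.

```python
def n_ratio(n_count: int, alt_count: int) -> float:
    """Ratio of N base calls to alt-supporting reads at a site."""
    if alt_count == 0:
        return float("inf") if n_count > 0 else 0.0
    return n_count / alt_count


def flag_by_n_ratio(n_count: int, alt_count: int, max_ratio: float = 1.0) -> bool:
    """True if the site has too many Ns relative to alt reads.

    max_ratio is a hypothetical threshold, not a value from the pipeline.
    """
    return n_ratio(n_count, alt_count) > max_ratio


# Many Ns relative to alt reads: flagged as a likely false positive.
print(flag_by_n_ratio(n_count=12, alt_count=4))
# Few Ns relative to alt reads: not flagged.
print(flag_by_n_ratio(n_count=1, alt_count=8))
```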

Benchmarking the difference

We benchmarked this technology using pooled sample analysis to simulate somatic variants from a tumor-normal analysis. These pooled samples were spiked in at 5%, 2.5% and 1%. As an independent measure of the false positive rate (FPR), we ran normal-normal replicates taken from the biological source material. Sensitivity for events at ~1% observed allele fraction (2.5% spike-in) exceeds 90% for samples with a mean target duplex coverage of 800x, with FPR < 1/Mb. Our measured false positive rate for the data shown is 2 FP over 8 Mb of territory.
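For reference, the reported counts convert to a per-megabase rate as follows. The helper name is ours; the inputs are the 2 false positives and 8 Mb of territory quoted above.

```python
def false_positives_per_mb(false_positives: int, territory_bp: int) -> float:
    """False positives per megabase of evaluated territory."""
    return false_positives / (territory_bp / 1_000_000)


rate = false_positives_per_mb(2, 8_000_000)
print(f"{rate} FP/Mb")
print("meets FPR < 1/Mb target:", rate < 1.0)
```

That is 0.25 false positives per megabase, comfortably under the stated target of one per megabase.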

In conclusion

We are excited to unveil this pipeline for delivering SNPs from a 402-gene liquid biopsy assay. Newly added filters in Mutect2 let us take advantage of consensus calling to increase the sensitivity of the assay, opening a path towards a future where patients can replace painful and dangerous biopsies with simple blood draws to track and treat their disease. In the coming weeks, look for the new pipeline as an official GATK workflow published by the Broad Institute's Data Sciences Platform in this GitHub repo.


Fri 29 Mar 2019

sabaferdous on 29 Mar 2019

Brilliant... That's what needed .... :) Very excited to try this on our liquid biopsies (cfDNA) project data....

kokyriakidis on 29 Mar 2019

Hi! Could I have access to the custom lab process you work on to produce the right data? I want to work on liquid biopsy but I am struggling to find a good protocol

Patrick_Turko on 29 Mar 2019

Hello, I'd also like access to the lab protocols. I'm working with a team developing a liquid biopsy screen at a major hospital. Can you please link a paper / preprint, or perhaps contact me privately if the methods are not yet public? Thanks.

Alexandra_Zhayvoron on 29 Mar 2019

Hi! Looking forward to trying out this pipeline on our data. I haven't found it on the GATK repo; are you still planning to publish it?

yingchen69 on 29 Mar 2019

Hi! When is this going to be available? Best, Sean
