Variants with spanning deletions lack phase set information
open | Created 2019-02-07 | Last updated 2019-03-22 | Posted by tfenne


HaplotypeCaller bug


Bug Report

Affected tool(s) or class(es)

HaplotypeCaller 4.1 with -ERC GVCF

Affected version(s)

  • Latest public release version [version?]
  • Latest master branch as of [date of test?]

Description

It would appear that variants covered by a spanning deletion are output without phasing information, even when they are flanked by phased variants on both sides. Because one of the alleles at the spanned site is the upstream deletion itself (the * allele), the phase is known; nevertheless the genotype is not phased and no phase set is attached. The following is a cut-down example from a gVCF:

chr6  51618169  .  GT  G,<NON_REF>    948.60   .  DP=94         GT:AD:DP:F1R2:F2R1:GQ:PGT:PID:PL:PS:SB  0|1:32,39,0:71:3,4,0:29,35,0:99:0|1:51618169_GT_G:956,0,808,1054,926,1980:51618169:3,29,4,35
chr6  51618170  .  T   *,G,<NON_REF>  776.01   .  DP=92         GT:AD:DP:F1R2:F2R1:GQ:PL:SB             1/2:2,39,30,0:71:1,4,2,0:1,35,28,0:99:3533,786,723,1141,0,956,2837,916,1206,2757:1,1,6,63
chr6  51618171  .  G   <NON_REF>      .        .  END=51618173  GT:DP:GQ:MIN_DP:PL                      0/0:90:99:90:0,120,1800
chr6  51618174  .  A   G,<NON_REF>    1001.60  .  DP=89         GT:AD:DP:F1R2:F2R1:GQ:PGT:PID:PL:PS:SB  0|1:33,41,0:74:3,4,0:30,37,0:99:0|1:51618169_GT_G:1009,0,803,1108,926,2034:51618169:3,30,4,37

You can see that the SNP at 51618170 is flanked by phased variants at 51618169 and 51618174, yet it is output with an unphased genotype and no PS (or PID/PGT).
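
For anyone who wants to check their own gVCFs for the same pattern, here is a rough Python sketch. It is not part of GATK; the function names are made up for illustration, and it assumes a single-sample file whose data lines can be split on whitespace (so no spaces inside INFO). The embedded records are trimmed from the example above, with QUAL/FILTER/INFO replaced by "." and FORMAT reduced to GT:DP:PS.

```python
from typing import Dict, List

# Trimmed copy of the four records above (QUAL/FILTER/INFO replaced by ".").
EXAMPLE = """
chr6 51618169 . GT G,<NON_REF>   . . . GT:DP:PS 0|1:71:51618169
chr6 51618170 . T  *,G,<NON_REF> . . . GT:DP    1/2:71
chr6 51618171 . G  <NON_REF>     . . . GT:DP    0/0:90
chr6 51618174 . A  G,<NON_REF>   . . . GT:DP:PS 0|1:74:51618169
"""


def parse_record(line: str) -> Dict[str, object]:
    """Parse one single-sample VCF data line (hypothetical helper)."""
    f = line.split()
    sample = dict(zip(f[8].split(":"), f[9].split(":")))
    return {
        "chrom": f[0],
        "pos": int(f[1]),
        "alts": f[4].split(","),
        "gt": sample.get("GT", "."),
        "ps": sample.get("PS"),  # None when the record has no PS field
    }


def is_variant(rec: Dict[str, object]) -> bool:
    """True if the genotype calls at least one non-reference allele."""
    alleles = str(rec["gt"]).replace("|", "/").split("/")
    return any(a not in ("0", ".") for a in alleles)


def is_phased(rec: Dict[str, object]) -> bool:
    """True if the genotype uses '|' and a phase set is attached."""
    return "|" in str(rec["gt"]) and rec["ps"] is not None


def unphased_spanned_sites(records: List[Dict[str, object]]) -> List[int]:
    """Positions of unphased '*'-allele records whose neighbouring variant
    records on both sides are phased in the same phase set."""
    variants = [r for r in records if is_variant(r)]
    hits = []
    for prev, cur, nxt in zip(variants, variants[1:], variants[2:]):
        same_contig = prev["chrom"] == cur["chrom"] == nxt["chrom"]
        if (
            same_contig
            and "*" in cur["alts"]
            and not is_phased(cur)
            and is_phased(prev)
            and is_phased(nxt)
            and prev["ps"] == nxt["ps"]
        ):
            hits.append(cur["pos"])
    return hits


records = [parse_record(line) for line in EXAMPLE.strip().splitlines()]
print(unphased_spanned_sites(records))
```

The hom-ref block at 51618171 is skipped because it carries no variant call, so the check only compares the spanned site against the nearest variant records on either side.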

I'm not entirely sure if this is on purpose for some reason I don't understand, or simply an edge case in the phasing code that's handled incorrectly.

Steps to reproduce

Run HC on reads containing three variants: a deletion, a variant spanned by that deletion, and a variant just beyond the deletion. FWIW I've requested permission to share an example case from real data and am awaiting an answer.

Expected behavior

I think the spanned variant should be output with phasing information, e.g. in the above case I would expect (abbreviated):

chr6  51618169  .  GT  G,<NON_REF>    ...  GT:DP:PS  0|1:71:51618169
chr6  51618170  .  T   *,G,<NON_REF>  ...  GT:DP:PS  2|1:71:51618169
chr6  51618171  .  G   <NON_REF>      ...  GT:DP     0/0:90
chr6  51618174  .  A   G,<NON_REF>    ...  GT:DP:PS  0|1:74:51618169
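
To spell out the reasoning behind the expected 2|1 genotype, here is a small Python sketch. It is only an illustration of the logic, not how HaplotypeCaller's phasing code works. The function name and parameters are hypothetical, and it assumes a diploid sample in which the * allele at the spanned site refers to the phased upstream deletion (and not to some other overlapping deletion).

```python
from typing import List, Optional


def phase_spanned_genotype(
    deletion_gt: str,
    spanned_gt: str,
    spanned_alts: List[str],
    deletion_allele: int = 1,
) -> Optional[str]:
    """Place the '*' allele on the haplotype that carries the upstream deletion.

    deletion_gt     -- phased GT of the upstream deletion record, e.g. "0|1"
    spanned_gt      -- unphased GT at the spanned site, e.g. "1/2"
    spanned_alts    -- ALT alleles at the spanned site, e.g. ["*", "G", "<NON_REF>"]
    deletion_allele -- allele index of the deletion in its own record (assumed 1)

    Returns the phased GT string, or None when the phase cannot be inferred
    under the assumptions above.
    """
    if "|" not in deletion_gt or "*" not in spanned_alts:
        return None
    del_haps = deletion_gt.split("|")
    # The deletion must sit on exactly one of two haplotypes.
    if len(del_haps) != 2 or del_haps.count(str(deletion_allele)) != 1:
        return None

    star = str(spanned_alts.index("*") + 1)  # ALT indices are 1-based in GT
    alleles = spanned_gt.replace("|", "/").split("/")
    if len(alleles) != 2 or alleles.count(star) != 1:
        return None

    other = alleles[1 - alleles.index(star)]
    slot = del_haps.index(str(deletion_allele))  # haplotype carrying the deletion
    phased = [other, other]
    phased[slot] = star
    return "|".join(phased)


print(phase_spanned_genotype("0|1", "1/2", ["*", "G", "<NON_REF>"]))
```

With the deletion at 51618169 phased as 0|1, the * allele at 51618170 lands on the second haplotype and the G allele on the first, which gives the 2|1 shown in the expected output above. The phase set would simply be inherited from the deletion record.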

Actual behavior

The actual output (abbreviated):

chr6  51618169  .  GT  G,<NON_REF>    ...  GT:DP:PS  0|1:71:51618169
chr6  51618170  .  T   *,G,<NON_REF>  ...  GT:DP     1/2:71
chr6  51618171  .  G   <NON_REF>      ...  GT:DP     0/0:90
chr6  51618174  .  A   G,<NON_REF>    ...  GT:DP:PS  0|1:74:51618169
