Ambiguous results with HaplotypeCaller output modes EMIT_ALL_CONFIDENT_SITES and EMIT_ALL_SITES
open | Created 2019-07-29 | Last updated 2019-08-21| Posted by bhanugandham | See in Github


HaplotypeCaller Vanilla bug


Please find below the user report:

Dear GATK-Team,

First of all, thank you for your great support and constant development of GATK! I was very pleased to see that the output mode options EMIT_ALL_CONFIDENT_SITES and EMIT_ALL_SITES were reintroduced recently in HaplotypeCaller (GitHub Issue 2865, I am not allowed to post links yet). However, I think these options or not yet working.

When I am using the --output-mode options EMIT_ALL_CONFIDENT_SITES and EMIT_ALL_SITES, only some seemingly random monomorphic sites are reported with site quality QUAL="Infinity", instead of all sites (monomorphic sites and variants). This, despite many, many high depth covered monomorphic sites in my dataset. I get the same buggy result for single-sample calling and for multi-sample calling with HaplotypeCaller. I am using GATK version v4.1.2.0, java 1.8.0_101-b13.

Here is the commands I used to get all confident sites for a two samples:
gatk --java-options "-Xmx10G -Xms1G" HaplotypeCaller --tmp-dir ./temp -R ref.fasta -I 22363.subset.bam -I 22365.subset.bam -L scaffold_7232:1-1000 -O test.allconf.vcf.gz --pcr-indel-model NONE --output-mode EMIT_ALL_CONFIDENT_SITES

The resulting VCF file look like this (barring all except the last header lines):

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  22363   22365
scaffold_7232   49      .       C       T       539.88  .       AC=1;AF=0.250;AN=4;BaseQRankSum=-0.334;DP=132;ExcessHet=3.0103;FS=0.857;MLEAC=1;MLEAF=0.250;MQ=42.22;MQRankSum=-0.781;QD=5.57;ReadPosRankSum=-1.285;SOR=0.578   GT:AD:DP:GQ:PL  0/1:74,23:97:99:548,0,2548      0/0:33,0:33:99:0,99,1296
scaffold_7232   241     .       A       .       Infinity        .       AN=4;DP=332;MQ=43.51    GT:AD:DP        0/0:232:232     0/0:100:100
scaffold_7232   276     .       A       C       1514.46 .       AC=2;AF=0.500;AN=4;BaseQRankSum=2.611;DP=318;ExcessHet=4.7712;FS=2.430;MLEAC=2;MLEAF=0.500;MQ=42.98;MQRankSum=-0.010;QD=4.76;ReadPosRankSum=1.388;SOR=0.544     GT:AD:DP:GQ:PL  0/1:172,41:213:99:1039,0,5920   0/1:85,20:105:99:485,0,2913
scaffold_7232   333     .       T       .       Infinity        .       AN=4;DP=311;MQ=41.99    GT:AD:DP        0/0:202:202     0/0:109:109
scaffold_7232   343     .       C       T       4427.46 .       AC=2;AF=0.500;AN=4;BaseQRankSum=0.018;DP=319;ExcessHet=4.7712;FS=5.050;MLEAC=2;MLEAF=0.500;MQ=41.79;MQRankSum=-1.223;QD=13.97;ReadPosRankSum=-0.937;SOR=0.901   GT:AD:DP:GQ:PL  0/1:84,120:204:99:3812,0,2721   0/1:88,25:113:99:625,0,2792
scaffold_7232   453     .       G       A       5311.46 .       AC=2;AF=0.500;AN=4;BaseQRankSum=2.043;DP=303;ExcessHet=4.7712;FS=7.804;MLEAC=2;MLEAF=0.500;MQ=40.00;MQRankSum=-0.946;QD=17.65;ReadPosRankSum=0.174;SOR=0.761    GT:AD:DP:GQ:PL  0/1:91,98:189:99:3866,0,5942    0/1:72,40:112:99:1455,0,3886
scaffold_7232   460     .       T       C       5174.46 .       AC=2;AF=0.500;AN=4;BaseQRankSum=1.691;DP=300;ExcessHet=4.7712;FS=7.886;MLEAC=2;MLEAF=0.500;MQ=39.96;MQRankSum=-1.060;QD=17.36;ReadPosRankSum=-0.211;SOR=0.674   GT:AD:DP:GQ:PL  0/1:90,95:185:99:3727,0,5928    0/1:73,40:113:99:1457,0,3901
scaffold_7232   475     .       G       .       Infinity        .       AN=4;DP=306;MQ=39.88    GT:AD:DP        0/0:194:194     0/0:112:112
scaffold_7232   476     .       A       .       Infinity        .       AN=4;DP=305;MQ=39.90    GT:AD:DP        0/0:193:193     0/0:112:112
scaffold_7232   485     .       G       A       4907.46 .       AC=2;AF=0.500;AN=4;BaseQRankSum=2.955;DP=313;ExcessHet=4.7712;FS=0.000;MLEAC=2;MLEAF=0.500;MQ=39.98;MQRankSum=-0.180;QD=15.73;ReadPosRankSum=-1.691;SOR=0.694   GT:AD:DP:GQ:PL  0/1:107,95:202:99:2997,0,3343   0/1:51,59:110:99:1920,0,1412
scaffold_7232   589     .       T       TAACTTAGGTTGTCTTAAAGAG  1148.91 .       AC=1;AF=0.250;AN=4;BaseQRankSum=2.466;DP=286;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.250;MQ=40.83;MQRankSum=-3.627;QD=6.64;ReadPosRankSum=-2.513;SOR=0.661    GT:AD:DP:GQ:PL  0/1:139,34:173:99:1157,0,6676   0/0:105,0:105:99:0,317,5193
scaffold_7232   592     .       A       C       2229.46 .       AC=2;AF=0.500;AN=4;BaseQRankSum=2.278;DP=278;ExcessHet=4.7712;FS=6.428;MLEAC=2;MLEAF=0.500;MQ=40.93;MQRankSum=0.022;QD=8.02;ReadPosRankSum=-0.480;SOR=0.964     GT:AD:DP:GQ:PL  0/1:130,43:173:99:1242,0,4637   0/1:71,34:105:99:997,0,2331
scaffold_7232   682     .       C       CT      1404.91 .       AC=1;AF=0.250;AN=4;BaseQRankSum=-3.692;DP=291;ExcessHet=3.0103;FS=7.626;MLEAC=1;MLEAF=0.250;MQ=43.48;MQRankSum=1.029;QD=13.01;ReadPosRankSum=-0.245;SOR=0.905   GT:AD:DP:GQ:PL  0/0:161,0:161:99:0,488,7902     0/1:68,40:108:99:1413,0,3038
scaffold_7232   696     .       G       A       1337.88 .       AC=1;AF=0.250;AN=4;BaseQRankSum=-2.273;DP=287;ExcessHet=3.0103;FS=4.200;MLEAC=1;MLEAF=0.250;MQ=43.62;MQRankSum=0.599;QD=12.62;ReadPosRankSum=1.324;SOR=0.730    GT:AD:DP:GQ:PL  0/0:178,0:178:99:0,539,8590     0/1:69,37:106:99:1346,0,3079
scaffold_7232   874     .       A       .       Infinity        .       AN=4;DP=284;MQ=42.99    GT:AD:DP        0/0:191:191     0/0:93:93
scaffold_7232   884     .       C       A       1741.46 .       AC=2;AF=0.500;AN=4;BaseQRankSum=-1.238;DP=272;ExcessHet=4.7712;FS=3.370;MLEAC=2;MLEAF=0.500;MQ=42.96;MQRankSum=-0.331;QD=7.35;ReadPosRankSum=-1.538;SOR=0.486   GT:AD:DP:GQ:PL  0/1:113,43:156:99:1215,0,3899   0/1:59,22:81:99:536,0,2058
scaffold_7232   911     .       C       .       Infinity        .       AN=4;DP=287;MQ=42.32    GT:AD:DP        0/0:197:197     0/0:90:90

Note that the depth across the whole 1 kb scaffold is very high in each sample and that both mapping and base qualities are nearly all very high, so I don't understand why other monomorphic sites are not called as confident. When I repeat the same with the --output-mode option EMIT_ALL_SITES, I get the exact same sites called in the VCF file, with QUAL=Infinity. Also, when I call single individuals instead, I get the same output with Infinity QUAL fields and only a handfull of the almost 1000 covered sites.

I am happy to provide the files above for replicating the issues in a different context, please let me know how I can provide them.

Thank you and best wishes,
David

This Issue was generated from your [forums]
[forums]: https://gatkforums.broadinstitute.org/gatk/discussion/24270/haplotypecaller-output-modes-emit-all-confident-sites-and-emit-all-sites-not-working/p1


Return to top