GATK 4.1.2.0 HaplotypeCaller reports nonsense GT/AD when --min-base-quality-score parameter changes
open | Created 2019-07-16 | Last updated 2019-09-06| Posted by freeseek | See in Github


HaplotypeCaller assembly bug


I have managed to generate a minimal bam file that reproduces the issue.

First of all, you have to download the mini input.bam file from this dropbox link: https://www.dropbox.com/sh/xae79hanumpireu/AABKo1l4Y-z5G5YLBqSpylRva?dl=0

Then the following code will reproduce the issue:

wget https://github.com/broadinstitute/picard/releases/download/2.19.0/picard.jar

wget https://github.com/broadinstitute/gatk/releases/download/4.1.2.0/gatk-4.1.2.0.zip
unzip gatk-4.1.2.0.zip

wget -O- ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz | \
  gzip -d > GCA_000001405.15_GRCh38_no_alt_analysis_set.fna

samtools faidx GCA_000001405.15_GRCh38_no_alt_analysis_set.fna

java -jar picard.jar \
  CreateSequenceDictionary \
  R=GCA_000001405.15_GRCh38_no_alt_analysis_set.fna \
  O=GCA_000001405.15_GRCh38_no_alt_analysis_set.dict

(echo "##fileformat=VCFv4.2"; \
echo "##contig=<ID=chrX,length=156040895>"; \
echo -e "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"; \
echo -e "chr1\t97329945\t.\tT\tA\t.\t.\t."; \
echo -e "chr1\t97329967\t.\tC\tT\t.\t.\t.") | bgzip > input.vcf.gz && \
tabix -f input.vcf.gz

for score in 11 12; do
  gatk-4.1.2.0/gatk HaplotypeCaller \
    -R GCA_000001405.15_GRCh38_no_alt_analysis_set.fna \
    -I input.bam \
    -O output.$score.vcf.gz \
    --genotyping-mode GENOTYPE_GIVEN_ALLELES \
    --alleles input.vcf.gz \
    -L chr1:97329945-97329967 \
    --min-base-quality-score $score && \
  bcftools query \
    -f "[%CHROM\t%POS\t%REF\t%ALT\t%GT\t%AD\n]" \
    output.$score.vcf.gz \
    -r chr1:97329945-97329967
done

When the parameter --min-base-quality-score 11 is used, the GT/AD output is this:

chr1	97329945	T	A	1/1	0,35
chr1	97329967	C	T	1/1	0,33

When the parameter --min-base-quality-score 12 is used, the GT/AD output is this:

chr1	97329945	T	A	0/1	9,10
chr1	97329967	C	T	0/1	6,11

The first output is the output that makes sense. When --min-base-quality-score 12 is used, it is as if HaplotypeCaller invents some reference allele reads and then uses them to genotype the variant as heterozygous. I have seen issues like this all over the genome.

Interestingly, if I run the same code on my own Ubuntu laptop, I get an error instead:

Exception in thread "main" java.lang.IncompatibleClassChangeError: Inconsistent constant pool data in classfile for class org/broadinstitute/hellbender/transformers/ReadTransformer. Method lambda$identity$d67512bf$1(Lorg/broadinstitute/hellbender/utils/read/GATKRead;)Lorg/broadinstitute/hellbender/utils/read/GATKRead; at index 65 is CONSTANT_MethodRef and should be CONSTANT_InterfaceMethodRef
	at org.broadinstitute.hellbender.transformers.ReadTransformer.identity(ReadTransformer.java:30)
	at org.broadinstitute.hellbender.engine.GATKTool.makePreReadFilterTransformer(GATKTool.java:345)
	at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:276)
	at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1039)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
	at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
	at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
	at org.broadinstitute.hellbender.Main.main(Main.java:291)

Return to top