Bugs & Feature Requests
List of known bugs, limitations and requested enhancements


These are all the issues tagged "bug" in the GATK4 development repository. If you have reported a bug and cannot find it in this list, let us know in the thread where you reported it. Note that we no longer list or process issues against older GATK versions (3.x and prior) unless there is substantial negative impact.





Created 2017-09-20 | Updated 2017-09-20 | bug question SV


Got the following:

Current git hash does not match GATK git hash. Run anyway?yes
error: malformed object name 1
usage: git branch [<options>] [-r | -a] [--merged | --no-merged]
   or: git branch [<options>] [-l] [-f] <branch-name> [<start-point>]
   or: git branch [<options>] [-r] (-d | -D) <branch-name>...
   or: git branch [<options>] (-m | -M) [<old-branch>] <new-branch>
   or: git branch [<options>] [-r | -a] [--points-at]
...

Not sure what the first error message about the hashes means; perhaps that is the cause, though.

Assigned to @TedBrookings because I think this is your beast.



Created 2017-08-29 | Updated 2017-08-29 | bug


GATK4 HaplotypeCaller does not generate a .vcf.idx, even though createOutputVariantIndex defaults to true. Explicitly setting it to true still does not generate a VCF index.
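
As a stopgap, the index can be built by hand with htsjdk's Tribble IndexFactory. This is only a sketch, assuming the htsjdk index-writing API behaves as shown; it is not a GATK-endorsed workaround.

import htsjdk.tribble.index.Index;
import htsjdk.tribble.index.IndexFactory;
import htsjdk.variant.vcf.VCFCodec;
import java.io.File;
import java.io.IOException;

public class MakeVcfIndex {
    public static void main(String[] args) throws IOException {
        // Build the .vcf.idx directly, since the tool is not writing one.
        File vcf = new File(args[0]);
        Index idx = IndexFactory.createDynamicIndex(vcf, new VCFCodec());
        idx.write(new File(vcf.getPath() + ".idx"));
    }
}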



Created 2017-08-29 | Updated 2017-09-20 | bug


System

  • GATK4 a1eee32
  • Mac OS X 10.11.6 x86_64
  • machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 CX16 TPR PDCM SSE4.1 SSE4.2 POPCNT
  • machdep.cpu.extfeatures: SYSCALL XD EM64T LAHF RDTSCP TSCI
  • java version "1.8.0_60"
  • Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
  • Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)

Error

After updating a downstream project to the latest master, running the integration/unit tests with Gradle produces the following error.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGILL (0x4) at pc=0x00000001236427f4, pid=4010, tid=20739
#
# JRE version: Java(TM) SE Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# C  [libgkl_compression7227189416687158431.dylib+0x17f4]  Java_com_intel_gkl_compression_IntelDeflater_resetNative+0x164
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /Users/daniel/workspaces/ReadTools/hs_err_pid4010.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

Error report file: hs_err_pid4010.log.txt

Forcing other GKL versions

  • 0.5.2 (working)
  • 0.5.3 (failing)
  • 0.5.5 (failing)
  • 0.5.6 (failing)
  • 0.5.7 (failing)
  • 0.5.8 (failing)


Created 2017-08-25 | Updated 2017-08-28 | bug SV


Fails with an ArrayIndexOutOfBoundsException.

The fix is trivial.

This might affect other similar methods in that class.



Created 2017-08-23 | Updated 2017-09-19 | bug Engine performance PRIORITY_HIGH forum


A user has reported longer runtimes with the GATK4 beta.3 release compared to the beta.2 release. It sounds like this is not expected. Her runtimes are below; the first post in the forum thread has her original report.

Tool                              4.beta.2   4.beta.3
BaseRecalibrator                  1m 3s      3m 3s
ApplyBQSR (scattered)             4m 48s     11m 51s
HaplotypeCaller (scattered)       23m 42s    29m 7s
GenotypeGVCFs (scattered)         4m 6s      9m 28s
VariantRecalibrator (for SNPs)    4m 7s      6m 38s
VariantRecalibrator (for INDELs)  2m 7s      4m 8s
ApplyVQSR (for SNPs)              37s        2m 36s
ApplyVQSR (for INDELs)            39s        2m 35s

This issue was generated from the forum thread: https://gatkforums.broadinstitute.org/gatk/discussion/comment/41669#Comment_41669



Created 2017-08-22 | Updated 2017-08-24 | bug NIO


I ran 408 invocations of an NIO-using ApplyBQSR command with GATK4 and got 2 failures that looked pretty similar; the two failures were also on different shards. Is there something I might be doing wrong?

I can't remember exactly when I built this jar, but it was after commit 4df1d16, less than a week ago. If you need any more info let me know, thanks.

Using GATK jar /usr/gitc/gatk4/gatk-package-4.beta.3-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -Dsnappy.disable=true -XX:+PrintFlagsFinal -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintGCDetails -Xloggc:gc_log.log -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -Xms3000m -jar /usr/gitc/gatk4/gatk-package-4.beta.3-local.jar ApplyBQSR --createOutputBamMD5 --addOutputSAMProgramRecord -R /cromwell_root/broad-references/hg38/v0/Homo_sapiens_assembly38.fasta -I gs://broad-gotc-dev-cromwell-execution/PairedEndSingleSampleWorkflow/4a87f12f-014e-438a-9a10-260c70bf3584/call-SortSampleBam/attempt-4/NA12878.aligned.duplicate_marked.sorted.bam --useOriginalQualities -O NA12878.aligned.duplicates_marked.recalibrated.bam -bqsr /cromwell_root/broad-gotc-dev-cromwell-execution/PairedEndSingleSampleWorkflow/4a87f12f-014e-438a-9a10-260c70bf3584/call-GatherBqsrReports/NA12878.recal_data.csv -SQQ 10 -SQQ 20 -SQQ 30 -L chr5:1+
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell_root/tmp.Ni4zSL
[August 22, 2017 2:52:59 PM UTC] ApplyBQSR  --output NA12878.aligned.duplicates_marked.recalibrated.bam --bqsr_recal_file /cromwell_root/broad-gotc-dev-cromwell-execution/PairedEndSingleSampleWorkflow/4a87f12f-014e-438a-9a10-260c70bf3584/call-GatherBqsrReports/NA12878.recal_data.csv --useOriginalQualities true --static_quantized_quals 10 --static_quantized_quals 20 --static_quantized_quals 30 --intervals chr5:1+ --input gs://broad-gotc-dev-cromwell-execution/PairedEndSingleSampleWorkflow/4a87f12f-014e-438a-9a10-260c70bf3584/call-SortSampleBam/attempt-4/NA12878.aligned.duplicate_marked.sorted.bam --reference /cromwell_root/broad-references/hg38/v0/Homo_sapiens_assembly38.fasta --createOutputBamMD5 true --addOutputSAMProgramRecord true  --preserve_qscores_less_than 6 --quantize_quals 0 --round_down_quantized false --emit_original_quals false --globalQScorePrior -1.0 --interval_set_rule UNION --interval_padding 0 --interval_exclusion_padding 0 --interval_merging_rule ALL --readValidationStringency SILENT --secondsBetweenProgressUpdates 10.0 --disableSequenceDictionaryValidation false --createOutputBamIndex true --createOutputVariantIndex true --createOutputVariantMD5 false --lenient false --addOutputVCFCommandLine true --cloudPrefetchBuffer 40 --cloudIndexPrefetchBuffer -1 --disableBamIndexCaching false --help false --version false --showHidden false --verbosity INFO --QUIET false --use_jdk_deflater false --use_jdk_inflater false --gcs_max_retries 20 --disableToolDefaultReadFilters false
[August 22, 2017 2:52:59 PM UTC] Executing as root@fb0704c97258 on Linux 4.9.0-0.bpo.3-amd64 amd64; OpenJDK 64-Bit Server VM 1.8.0_111-8u111-b14-2~bpo8+1-b14; Version: 4.beta.3-41-g9d05dd8-SNAPSHOT
[August 22, 2017 3:06:11 PM UTC] org.broadinstitute.hellbender.tools.walkers.bqsr.ApplyBQSR done. Elapsed time: 13.19 minutes.
Runtime.totalMemory()=3040870400
htsjdk.samtools.FileTruncatedException: Premature end of file: /PairedEndSingleSampleWorkflow/4a87f12f-014e-438a-9a10-260c70bf3584/call-SortSampleBam/attempt-4/NA12878.aligned.duplicate_marked.sorted.bam
	at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:530)
	at htsjdk.samtools.util.BlockCompressedInputStream.nextBlock(BlockCompressedInputStream.java:468)
	at htsjdk.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:458)
	at htsjdk.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:196)
	at htsjdk.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:331)
	at java.io.DataInputStream.read(DataInputStream.java:149)
	at htsjdk.samtools.util.BinaryCodec.readBytesOrFewer(BinaryCodec.java:404)
	at htsjdk.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:380)
	at htsjdk.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:366)
	at htsjdk.samtools.BAMRecordCodec.decode(BAMRecordCodec.java:209)
	at htsjdk.samtools.BAMFileReader$BAMFileIterator.getNextRecord(BAMFileReader.java:829)
	at htsjdk.samtools.BAMFileReader$BAMFileIndexIterator.getNextRecord(BAMFileReader.java:981)
	at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:803)
	at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:797)
	at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:765)
	at htsjdk.samtools.BAMFileReader$BAMQueryFilteringIterator.advance(BAMFileReader.java:1034)
	at htsjdk.samtools.BAMFileReader$BAMQueryFilteringIterator.next(BAMFileReader.java:1024)
	at htsjdk.samtools.BAMFileReader$BAMQueryFilteringIterator.next(BAMFileReader.java:988)
	at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:576)
	at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:548)
	at org.broadinstitute.hellbender.utils.iterators.SamReaderQueryingIterator.loadNextRecord(SamReaderQueryingIterator.java:114)
	at org.broadinstitute.hellbender.utils.iterators.SamReaderQueryingIterator.next(SamReaderQueryingIterator.java:151)
	at org.broadinstitute.hellbender.utils.iterators.SamReaderQueryingIterator.next(SamReaderQueryingIterator.java:29)
	at org.broadinstitute.hellbender.utils.iterators.SAMRecordToReadIterator.next(SAMRecordToReadIterator.java:29)
	at org.broadinstitute.hellbender.utils.iterators.SAMRecordToReadIterator.next(SAMRecordToReadIterator.java:15)
	at java.util.Iterator.forEachRemaining(Iterator.java:116)
	at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
	at org.broadinstitute.hellbender.engine.ReadWalker.traverse(ReadWalker.java:94)
	at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:838)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:119)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:176)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:195)
	at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:131)
	at org.broadinstitute.hellbender.Main.mainEntry(Main.java:152)
	at org.broadinstitute.hellbender.Main.main(Main.java:233)
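
One plausible mechanism for an intermittent "premature end of file" over NIO, offered here only as an unconfirmed hypothesis, is a short channel read being treated as end-of-stream. NIO reads must be looped, roughly as in this sketch:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

final class NioReadFully {
    // A single channel.read() may legally return fewer bytes than requested
    // (especially over a network filesystem); only a return of -1 means EOF.
    static int readFully(ReadableByteChannel channel, ByteBuffer buffer) throws IOException {
        int total = 0;
        while (buffer.hasRemaining()) {
            int n = channel.read(buffer);
            if (n < 0) break; // true end of stream
            total += n;
        }
        return total;
    }
}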


Created 2017-08-22 | Updated 2017-08-22 | bug SV


It seems there is almost no chance we will ever need that any more.



Created 2017-08-17 | Updated 2017-09-05 | bug SV


It seems that it fails to hard-clip the bases array when the CIGAR contains hard clips.

java.lang.IllegalStateException: CIGAR covers 4 bases but the sequence is 8 read bases 

	at org.broadinstitute.hellbender.utils.bwa.BwaMemAlignmentUtils.applyAlignment(BwaMemAlignmentUtils.java:92)
	at org.broadinstitute.hellbender.tools.spark.sv.discovery.AlignmentIntervalUnitTest.testConstructionFromGATKRead(AlignmentIntervalUnitTest.java:133)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:108)
	at org.testng.internal.Invoker.invokeMethod(Invoker.java:661)
	at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:869)
	at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1193)
	at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:126)
	at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:109)
	at org.testng.TestRunner.privateRun(TestRunner.java:744)
	at org.testng.TestRunner.run(TestRunner.java:602)
	at org.testng.SuiteRunner.runTest(SuiteRunner.java:380)
	at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:375)
	at org.testng.SuiteRunner.privateRun(SuiteRunner.java:340)
	at org.testng.SuiteRunner.run(SuiteRunner.java:289)
	at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
	at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86)
	at org.testng.TestNG.runSuitesSequentially(TestNG.java:1301)
	at org.testng.TestNG.runSuitesLocally(TestNG.java:1226)
	at org.testng.TestNG.runSuites(TestNG.java:1144)
	at org.testng.TestNG.run(TestNG.java:1115)



Created 2017-08-14 | Updated 2017-08-18 | bug


The TargetCoverageSexGenotyper can sometimes fail to calculate genotype log-likelihoods for low-quality (low-coverage) samples. The exception below can be reproduced trivially: just pass it a raw coverage matrix containing a single sample whose counts column is all zeroes. In practice, I've recently encountered several samples that somehow passed upstream QC but failed to genotype due to extensive dropout. This causes program termination, and no genotypes are written for any samples in the input. Could we instead warn and print an NA genotype for such samples?

Runtime.totalMemory()=869269504
org.apache.commons.math3.exception.NotStrictlyPositiveException: mean (0)
	at org.apache.commons.math3.distribution.PoissonDistribution.<init>(PoissonDistribution.java:126)
	at org.apache.commons.math3.distribution.PoissonDistribution.<init>(PoissonDistribution.java:103)
	at org.apache.commons.math3.distribution.PoissonDistribution.<init>(PoissonDistribution.java:80)
	at org.broadinstitute.hellbender.tools.exome.sexgenotyper.TargetCoverageSexGenotypeCalculator.lambda$calculateSexGenotypeData$20(TargetCoverageSexGenotypeCalculator.java:414)
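
A minimal sketch of the requested behavior, with hypothetical names since this is not the actual tool code: commons-math throws NotStrictlyPositiveException for a mean of zero, so guard the construction and let the caller warn and emit an NA genotype instead of aborting the run.

import org.apache.commons.math3.distribution.PoissonDistribution;

final class SexGenotypeLikelihoodGuard {
    // Returns NaN for samples with no coverage; the caller would log a
    // warning and write an NA genotype rather than terminating.
    static double logLikelihoodOrNaN(double meanCoverage, int observedCount) {
        if (meanCoverage <= 0) {
            return Double.NaN;
        }
        return new PoissonDistribution(meanCoverage).logProbability(observedCount);
    }
}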


Created 2017-08-10 | Updated 2017-08-25 | bug GenomicsDB PRIORITY_HIGH


We have received two independent reports of this problem. The latest, by Brad Chapman (posted here), includes a reproducible test case. Files are here: https://s3.amazonaws.com/chapmanb/testcases/gatk4_genotypegvcfs.tar.gz

Steps to reproduce:

Run an import, then GenotypeGVCFs and SelectVariants to test the export:

gatk-launch GenomicsDBImport --genomicsDBWorkspace test_genomicsdb -L chrM:1-1000 --variant Test1.vcf.gz --variant Test2.vcf.gz
gatk-launch GenotypeGVCFs --variant gendb://test_genomicsdb -R hg19.fa --output joint.vcf.gz -L chrM:1-1000
gatk-launch SelectVariants -R hg19.fa -V gendb://test_genomicsdb -O extract.vcf.gz

The input GVCFs have calls within the chrM region being tested:

Test1.vcf.gz:chrM       150     .       T       C,<NON_REF>     49406.77        .       DP=1084;ExcessHet=3.0103;MLEAC=2,0;MLEAF=1.00,0.00;MQ0=0;RAW_MQ=3902400.00    GT:AD:DP:GQ:MCL:MMQ:PGT:PID:PL:SB       1/1:0,1081,0:1081:99:0,0,0:0,60,0:0|1:150_T_C:49435,3362,0,49435,3362,49435:0,0,483,598

But the GenotypeGVCFs output VCF is empty (the file contains only the header; no records). Running SelectVariants shows the sample, but with empty genotype calls:

extract.vcf.gz:chrM     150     .       T       C,<NON_REF>     .       .       DP=2168;ExcessHet=3.01;MQ0=0;RAW_MQ=7804800.00        GT:AD:DP:GQ:MCL ./.:0,1081,0:1081:99:0.00,0.00  ./.:0,1081,0:1081:99:0.00,0.00


Created 2017-08-04 | Updated 2017-08-07 | bug GenomicsDB PRIORITY_HIGH


We've seen 2 instances of this error now:

java.lang.ArrayIndexOutOfBoundsException: 15
	at htsjdk.variant.bcf2.BCF2Utils.decodeType(BCF2Utils.java:122)
	at htsjdk.variant.bcf2.BCF2Decoder.decodeInt(BCF2Decoder.java:220)
	at htsjdk.variant.bcf2.BCF2Decoder.decodeNumberOfElements(BCF2Decoder.java:205)
	at htsjdk.variant.bcf2.BCF2Decoder.decodeTypedValue(BCF2Decoder.java:129)
	at htsjdk.variant.bcf2.BCF2Decoder.decodeTypedValue(BCF2Decoder.java:125)
	at htsjdk.variant.bcf2.BCF2LazyGenotypesDecoder.parse(BCF2LazyGenotypesDecoder.java:75)
	at htsjdk.variant.variantcontext.LazyGenotypesContext.decode(LazyGenotypesContext.java:158)
	at htsjdk.variant.variantcontext.LazyGenotypesContext.getGenotypes(LazyGenotypesContext.java:148)
	at htsjdk.variant.variantcontext.GenotypesContext.getMaxPloidy(GenotypesContext.java:431)
	at htsjdk.variant.variantcontext.VariantContext.getMaxPloidy(VariantContext.java:785)
	at org.broadinstitute.hellbender.tools.walkers.ReferenceConfidenceVariantContextMerger.mergeRefConfidenceGenotypes(ReferenceConfidenceVariantContextMerger.java:405)
	at org.broadinstitute.hellbender.tools.walkers.ReferenceConfidenceVariantContextMerger.merge(ReferenceConfidenceVariantContextMerger.java:92)
	at org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs.apply(GenotypeGVCFs.java:212)
	at org.broadinstitute.hellbender.engine.VariantWalkerBase.lambda$traverse$0(VariantWalkerBase.java:110)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
	at java.util.Iterator.forEachRemaining(Iterator.java:116)
	at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
	at org.broadinstitute.hellbender.engine.VariantWalkerBase.traverse(VariantWalkerBase.java:108)
	at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:838)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:116)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:173)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:192)
	at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:131)
	at org.broadinstitute.hellbender.Main.mainEntry(Main.java:152)
	at org.broadinstitute.hellbender.Main.main(Main.java:233)

It seems to happen in rare cases when reading from a GenomicsDB instance with GenotypeGVCFs. The array that goes out of bounds is the one that maps ID values to BCF2Types. The valid BCF2Type IDs are 0, 1, 2, 3, 5, and 7, as defined in section 6.3.3 of the BCF spec. It seems likely that this is a bug in htsjdk, in htslib, or in GenomicsDB.
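
For illustration only (this is not the htsjdk source): the BCF2 type descriptor byte packs the type ID in its low 4 bits, so a corrupted descriptor whose low nibble is 15 would index past a lookup table sized for the defined IDs, which matches the reported out-of-bounds index.

public class Bcf2TypeDemo {
    // Valid type IDs per the spec are 0,1,2,3,5,7; the table is sized to 8.
    static final String[] ID_TO_TYPE = {
            "MISSING", "INT8", "INT16", "INT32", null, "FLOAT", null, "CHAR"
    };

    public static void main(String[] args) {
        int descriptor = 0xFF;          // hypothetical corrupted descriptor byte
        int typeID = descriptor & 0x0F; // low 4 bits hold the type ID -> 15
        System.out.println(ID_TO_TYPE[typeID]); // ArrayIndexOutOfBoundsException: 15
    }
}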



Created 2017-07-26 | Updated 2017-07-27 | bug Picard


I think CollectSequencingArtifactMetrics requires a reference, but the documentation lists it as optional. Without a reference, I get an instant crash. I also get NullPointerExceptions when providing --DB_SNP.

Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -Dsnappy.disable=true -jar /home/riestma1/gatk-4.beta.3/gatk-package-4.beta.3-local.jar CollectSequencingArtifactMetrics --input sample1.bam --output sample1_pre_adapter_detail_metrics
 ...
18:06:26.220 INFO  CollectSequencingArtifactMetrics - Shutting down engine
[July 26, 2017 6:06:26 PM EDT] org.broadinstitute.hellbender.tools.picard.analysis.artifacts.CollectSequencingArtifactMetrics done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=1517289472
java.lang.NullPointerException
	at org.broadinstitute.hellbender.tools.picard.analysis.artifacts.CollectSequencingArtifactMetrics.acceptRead(CollectSequencingArtifactMetrics.java:214)
	at org.broadinstitute.hellbender.tools.picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java:114)
	at org.broadinstitute.hellbender.tools.picard.analysis.SinglePassSamProgram.doWork(SinglePassSamProgram.java:53)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:116)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:173)
	at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgram.instanceMain(PicardCommandLineProgram.java:62)
	at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:131)
	at org.broadinstitute.hellbender.Main.mainEntry(Main.java:152)
	at org.broadinstitute.hellbender.Main.main(Main.java:233)
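
A fail-fast argument check along these lines (the field name and message are assumptions, not the actual tool code) would turn the NPE into an actionable user error:

import org.broadinstitute.hellbender.exceptions.UserException;
import java.io.File;

final class ArtifactMetricsArgCheck {
    // Hypothetical startup validation: fail with a clear message instead of
    // an NPE deep inside acceptRead().
    static void validateReference(File referenceSequence) {
        if (referenceSequence == null) {
            throw new UserException("CollectSequencingArtifactMetrics requires a reference (-R), "
                    + "even though the documentation lists it as optional.");
        }
    }
}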


Created 2017-07-25 | Updated 2017-08-24 | bug hadoop-bam Spark


Spark tools are falling over when they encounter the new hg38 contig names that include ':' and '-'.

We need to update Hadoop-BAM to understand these, since they are an unfortunate fact of life.
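
For illustration only (this is not Hadoop-BAM's actual parsing code), naive "contig:start-end" splitting misreads such a contig name and produces exactly the NumberFormatException seen in the log below:

public class IntervalParseDemo {
    public static void main(String[] args) {
        String contig = "HLA-A*01:01:01:01";                          // a real hg38 contig name
        String afterDash = contig.substring(contig.indexOf('-') + 1); // "A*01:01:01:01"
        String firstToken = afterDash.split(":")[0];                  // "A*01"
        Integer.parseInt(firstToken); // NumberFormatException: For input string: "A*01"
    }
}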

[ameyner2@node2c15 read_counts]$ /exports/igmm/eddie/bioinfsvice/ameynert/software/gatk-4.beta.2/gatk-launch SparkGenomeReadCounts -I ../../bcbio/final/WW00247b/WW00247b-ready.bam -o WW00247b.prop_cov --reference /exports/igmm/eddie/bioinfsvice/ameynert/bcbio/data/genomes/Hsapiens/hg38/seq/hg38.fa
Using GATK jar /gpfs/igmmfs01/eddie/bioinfsvice/ameynert/software/gatk-4.beta.2/gatk-package-4.beta.2-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -Dsnappy.disable=true -jar /gpfs/igmmfs01/eddie/bioinfsvice/ameynert/software/gatk-4.beta.2/gatk-package-4.beta.2-local.jar SparkGenomeReadCounts -I ../../bcbio/final/WW00247b/WW00247b-ready.bam -o WW00247b.prop_cov --reference /exports/igmm/eddie/bioinfsvice/ameynert/bcbio/data/genomes/Hsapiens/hg38/seq/hg38.fa
16:51:57.743 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs/igmmfs01/eddie/bioinfsvice/ameynert/software/gatk-4.beta.2/gatk-package-4.beta.2-local.jar!/com/intel/gkl/native/libgkl_compression.so
[21 July 2017 16:51:57 BST] SparkGenomeReadCounts  --outputFile WW00247b.prop_cov --reference /exports/igmm/eddie/bioinfsvice/ameynert/bcbio/data/genomes/Hsapiens/hg38/seq/hg38.fa --input ../../bcbio/final/WW00247b/WW00247b-ready.bam  --keepXYMT false --binsize 10000 --readValidationStringency SILENT --interval_set_rule UNION --interval_padding 0 --interval_exclusion_padding 0 --bamPartitionSize 0 --disableSequenceDictionaryValidation false --shardedOutput false --numReducers 0 --sparkMaster local[*] --help false --version false --showHidden false --verbosity INFO --QUIET false --use_jdk_deflater false --use_jdk_inflater false --disableToolDefaultReadFilters false
[21 July 2017 16:51:57 BST] Executing as ameyner2@node2c15.ecdf.ed.ac.uk on Linux 3.10.0-327.36.3.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_121-b13; Version: 4.beta.2
16:51:57.770 INFO  SparkGenomeReadCounts - HTSJDK Defaults.COMPRESSION_LEVEL : 1
16:51:57.770 INFO  SparkGenomeReadCounts - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:51:57.770 INFO  SparkGenomeReadCounts - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:51:57.770 INFO  SparkGenomeReadCounts - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:51:57.770 INFO  SparkGenomeReadCounts - Deflater: IntelDeflater
16:51:57.770 INFO  SparkGenomeReadCounts - Inflater: IntelInflater
16:51:57.770 INFO  SparkGenomeReadCounts - Initializing engine
16:51:57.770 INFO  SparkGenomeReadCounts - Done initializing engine
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/07/21 16:51:58 INFO SparkContext: Running Spark version 2.0.2
17/07/21 16:51:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/07/21 16:51:58 INFO SecurityManager: Changing view acls to: ameyner2
17/07/21 16:51:58 INFO SecurityManager: Changing modify acls to: ameyner2
17/07/21 16:51:58 INFO SecurityManager: Changing view acls groups to: 
17/07/21 16:51:58 INFO SecurityManager: Changing modify acls groups to: 
17/07/21 16:51:58 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(ameyner2); groups with view permissions: Set(); users  with modify permissions: Set(ameyner2); groups with modify permissions: Set()
17/07/21 16:51:58 INFO Utils: Successfully started service 'sparkDriver' on port 43815.
17/07/21 16:51:58 INFO SparkEnv: Registering MapOutputTracker
17/07/21 16:51:58 INFO SparkEnv: Registering BlockManagerMaster
17/07/21 16:51:58 INFO DiskBlockManager: Created local directory at /tmp/ameyner2/blockmgr-d8bbd2bc-8366-4b98-a238-2d51da1689d1
17/07/21 16:51:58 INFO MemoryStore: MemoryStore started with capacity 15.8 GB
17/07/21 16:51:58 INFO SparkEnv: Registering OutputCommitCoordinator
17/07/21 16:51:58 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/07/21 16:51:58 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.41.105.80:4040
17/07/21 16:51:58 INFO Executor: Starting executor ID driver on host localhost
17/07/21 16:51:58 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 34818.
17/07/21 16:51:58 INFO NettyBlockTransferService: Server created on 192.41.105.80:34818
17/07/21 16:51:58 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.41.105.80, 34818)
17/07/21 16:51:58 INFO BlockManagerMasterEndpoint: Registering block manager 192.41.105.80:34818 with 15.8 GB RAM, BlockManagerId(driver, 192.41.105.80, 34818)
17/07/21 16:51:58 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.41.105.80, 34818)
16:51:59.636 INFO  SparkGenomeReadCounts - Dropping non-autosomes, as requested...
16:51:59.672 INFO  SparkGenomeReadCounts - Starting Spark coverage collection...
17/07/21 16:52:00 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 463.4 KB, free 15.8 GB)
17/07/21 16:52:00 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 52.9 KB, free 15.8 GB)
17/07/21 16:52:00 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.41.105.80:34818 (size: 52.9 KB, free: 15.8 GB)
17/07/21 16:52:00 INFO SparkContext: Created broadcast 0 from newAPIHadoopFile at ReadsSparkSource.java:109
17/07/21 16:52:00 INFO FileInputFormat: Total input paths to process : 1
17/07/21 16:52:08 INFO SparkUI: Stopped Spark web UI at http://192.41.105.80:4040
17/07/21 16:52:08 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/07/21 16:52:08 INFO MemoryStore: MemoryStore cleared
17/07/21 16:52:08 INFO BlockManager: BlockManager stopped
17/07/21 16:52:08 INFO BlockManagerMaster: BlockManagerMaster stopped
17/07/21 16:52:08 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/07/21 16:52:08 INFO SparkContext: Successfully stopped SparkContext
16:52:08.357 INFO  SparkGenomeReadCounts - Shutting down engine
[21 July 2017 16:52:08 BST] org.broadinstitute.hellbender.tools.genome.SparkGenomeReadCounts done. Elapsed time: 0.18 minutes.
Runtime.totalMemory()=4171235328
java.lang.NumberFormatException: For input string: "A*01"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:580)
    at java.lang.Integer.parseInt(Integer.java:615)
    at org.seqdoop.hadoop_bam.BAMInputFormat.getIntervals(BAMInputFormat.java:104)
    at org.seqdoop.hadoop_bam.BAMInputFormat.filterByInterval(BAMInputFormat.java:284)
    at org.seqdoop.hadoop_bam.BAMInputFormat.getSplits(BAMInputFormat.java:158)
    at org.seqdoop.hadoop_bam.AnySAMInputFormat.getSplits(AnySAMInputFormat.java:252)
    at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:121)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
    at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:65)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$reduceByKey$3.apply(PairRDDFunctions.scala:328)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$reduceByKey$3.apply(PairRDDFunctions.scala:328)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
    at org.apache.spark.rdd.PairRDDFunctions.reduceByKey(PairRDDFunctions.scala:327)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$countByKey$1.apply(PairRDDFunctions.scala:372)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$countByKey$1.apply(PairRDDFunctions.scala:372)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
    at org.apache.spark.rdd.PairRDDFunctions.countByKey(PairRDDFunctions.scala:371)
    at org.apache.spark.rdd.RDD$$anonfun$countByValue$1.apply(RDD.scala:1175)
    at org.apache.spark.rdd.RDD$$anonfun$countByValue$1.apply(RDD.scala:1175)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
    at org.apache.spark.rdd.RDD.countByValue(RDD.scala:1174)
    at org.apache.spark.api.java.JavaRDDLike$class.countByValue(JavaRDDLike.scala:487)
    at org.apache.spark.api.java.AbstractJavaRDDLike.countByValue(JavaRDDLike.scala:45)
    at org.broadinstitute.hellbender.tools.genome.SparkGenomeReadCounts.collectReads(SparkGenomeReadCounts.java:162)
    at org.broadinstitute.hellbender.tools.genome.SparkGenomeReadCounts.runTool(SparkGenomeReadCounts.java:241)
    at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.runPipeline(GATKSparkTool.java:353)
    at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:38)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:115)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:170)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:189)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:131)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:152)
    at org.broadinstitute.hellbender.Main.main(Main.java:230)
17/07/21 16:52:08 INFO ShutdownHookManager: Shutdown hook called


Created 2017-07-18 | Updated 2017-07-18 | bug Mutect


Using VCFTools to validate:

vcf-validator SAMPLE7T-vs-SAMPLE7N-filtered.vcf

I get a massive number of messages:

.....snip.......
column SAMPLE7N at 19:49136721 .. Could not validate the float [NaN],FORMAT tag [MPOS] expected different number of values (expected 1, found 2),FORMAT tag [MFRL] expected different number of values (expected 1, found 2),FORMAT tag [MMQ] expected different number of values (expected 1, found 2),FORMAT tag [MCL] expected different number of values (expected 1, found 2),FORMAT tag [MBQ] expected different number of values (expected 1, found 2)
.....snip.......
column SAMPLE7T at 19:45901415 .. FORMAT tag [MBQ] expected different number of values (expected 1, found 2),FORMAT tag [MMQ] expected different number of values (expected 1, found 2),FORMAT tag [MCL] expected different number of values (expected 1, found 2),FORMAT tag [MFRL] expected different number of values (expected 1, found 2),FORMAT tag [MPOS] expected different number of values (expected 1, found 2)
.....snip.......

Sure enough, the header does not match the values for those fields (the header declares Number=A), so the validation errors are correct. Not sure what the deal is with FOXOG, but that may not be a big deal.



Created 2017-07-11 | Updated 2017-07-12 | bug


I tried to run VariantRecalibrator using the args echoed from an integration test and found that the resource files weren't listed properly. The command in the test was " --resource known,known=true,prior=10.0:" + getLargeVQSRTestDataDir() + "dbsnp_132_b37.leftAligned.20.1M-10M.vcf", but what came out of the engine was --resource known:/Users/gauthier/workspaces/gatk/src/test/resources/large/VQSR/dbsnp_132_b37.leftAligned.20.1M-10M.vcf. It lost the known=true tag and the prior, which makes the echoed command line unrunnable. Probably affects #2269 too.

This behavior can be replicated by running any of the VariantRecalibration integration tests and checking the console output.



Created 2017-07-02 | Updated 2017-07-07 | bug NativeLibraries


We are currently plagued with cryptic intermittent failures coming from the BWA and FML bindings in Travis CI. These usually manifest as a bare "exited with code 137" (i.e., killed by signal 9) error, but sometimes we get an explicit segfault or out-of-memory error. Examples:

FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ':test'.
> Process 'Gradle Test Executor 1' finished with non-zero exit value 137
:test[M::bwa_idx_load_from_disk] read 0 ALT contigs
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x0000000715180000, 719847424, 0) failed; error='Cannot allocate memory' (errno=12)

#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 719847424 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /home/travis/build/broadinstitute/gatk/hs_err_pid11513.log
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f27ebfe7d9a, pid=11455, tid=0x00007f27e87e5700
#
# JRE version: OpenJDK Runtime Environment (8.0_111-b14) (build 1.8.0_111-8u111-b14-3~14.04.1-b14)
# Java VM: OpenJDK 64-Bit Server VM (25.111-b14 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libfml.6198146539708364717.jnilib+0xed9a]  rld_itr_init+0x4a
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fd2680a350c, pid=11685, tid=0x00007fd2b02bf700
#
# JRE version: OpenJDK Runtime Environment (8.0_111-b14) (build 1.8.0_111-8u111-b14-3~14.04.1-b14)
# Java VM: OpenJDK 64-Bit Server VM (25.111-b14 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libbwa.5694772191018335324.jnilib+0x850c]  bwa_mem2idx+0xcc

The underlying issue in these cases is likely either "out of memory" or, perhaps in the case of the segfaults, "file not found" or "malformed file", but we could greatly improve our ability to interpret Travis failures if we were more careful about checking return values from system calls. E.g., in the function below from the BWA bindings we could check the return values of the mmap() and calloc() calls (flagged in the comments), and die with an appropriate error message if they fail:

bwaidx_t* jnibwa_openIndex( int fd ) {
    struct stat statBuf;
    if ( fstat(fd, &statBuf) == -1 ) return 0;
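    // NOTE: mmap() returns MAP_FAILED on failure, but the result is used unchecked below.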
    uint8_t* mem = mmap(0, statBuf.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
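    // NOTE: calloc() returns NULL on failure; also unchecked before bwa_mem2idx() fills it.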
    bwaidx_t* pIdx = calloc(1, sizeof(bwaidx_t));
    bwa_mem2idx(statBuf.st_size, mem, pIdx);
    pIdx->is_shm = 1;
    mem_fmt_fnc = &fmt_BAMish;
    bwa_verbose = 0;
    return pIdx;
}


Created 2017-06-27 | Updated 2017-06-28 | bug Engine HaplotypeCaller Mutect


@sooheelee has some serious concerns about ReadClipper.hardClipAdaptorSequence(), which is called in Mutect2 and HaplotypeCaller via AssemblyBasedCallerUtils.finalizeRegion(). She thinks that the method being used to find the adaptor boundary for clipping purposes is completely bogus!

This is some pretty old code that was also in the GATK3 versions of these tools, so if it's crazy, then we can at least take "comfort" in the fact that it's been like this for a very long time...
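
For context, the conventional adaptor-boundary computation for an FR read pair looks roughly like this. This is a paraphrase from memory, not the actual ReadClipper source, so treat the details as assumptions:

final class AdaptorBoundaryDemo {
    // On the reverse strand the adaptor sits just before the mate's start;
    // on the forward strand it begins one fragment length past the read start.
    static int adaptorBoundary(int readStart, int mateStart, int fragmentLength,
                               boolean readIsReverseStrand) {
        return readIsReverseStrand
                ? mateStart - 1
                : readStart + Math.abs(fragmentLength);
    }
}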



Created 2017-06-15 | Updated 2017-06-15 | bug


Somehow Kris managed to generate a VCF with an index that doesn't have a properly sorted sequence dictionary: gs://broad-dsde-methods/kcibul/bug_for_louis. I think it was with GATK4 SelectVariants (with a version prior to the command line being put in the header), but I'm not 100% sure.

Generating the index on the fly with GATK3 works fine. I'm not sure if the original tabix index from the GotC pipeline is okay: gs://broad-jg-dev-storage/temp/09C99383.91c5a812-70c5-4526-a3a2-3e99b9cf08fb.g.vcf.gz.tbi

I found this because GATK3 complained about the contig order.



Created 2017-06-08 | Updated 2017-09-19 | bug Spark


I just tried to run PrintReadsSpark on a Cloudera cluster and hit this error.

Command:

$ ./gatk-launch PrintReadsSpark -I NA12878.chr17_69k_70k.dictFix.bam -O /user/yaron/output.bam -- --sparkRunner SPARK --sparkMaster yarn  --num-executors 5 --executor-cores 2 --executor-memory 1g

Results:

Using GATK jar /home/yaron/gatk/build/libs/gatk-spark.jar
Running:
    spark-submit --master yarn --conf spark.kryoserializer.buffer.max=512m --conf spark.driver.maxResultSize=0 --conf spark.driver.userClassPathFirst=true --conf spark.io.compression.codec=lzf --conf spark.yarn.executor.memoryOverhead=600 --conf spark.driver.extraJavaOptions=-DGATK_STACKTRACE_ON_USER_EXCEPTION=true -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=false -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -Dsnappy.disable=true  --conf spark.executor.extraJavaOptions=-DGATK_STACKTRACE_ON_USER_EXCEPTION=true -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=false -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -Dsnappy.disable=true  --num-executors 5 --executor-cores 2 --executor-memory 1g /home/yaron/gatk/build/libs/gatk-spark.jar PrintReadsSpark -I NA12878.chr17_69k_70k.dictFix.bam -O /user/yaron/output.bam --sparkMaster yarn
09:14:13.551 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/yaron/gatk/build/libs/gatk-spark.jar!/com/intel/gkl/native/libgkl_compression.so
[June 8, 2017 9:14:13 AM CST] org.broadinstitute.hellbender.tools.spark.pipelines.PrintReadsSpark  --output /user/yaron/output.bam --input NA12878.chr17_69k_70k.dictFix.bam --sparkMaster yarn  --readValidationStringency SILENT --interval_set_rule UNION --interval_padding 0 --interval_exclusion_padding 0 --bamPartitionSize 0 --disableSequenceDictionaryValidation false --shardedOutput false --numReducers 0 --help false --version false --showHidden false --verbosity INFO --QUIET false --use_jdk_deflater false --use_jdk_inflater false --disableToolDefaultReadFilters false
[June 8, 2017 9:14:13 AM CST] Executing as yaron@dn1 on Linux 4.4.0-31-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_121-b13; Version: 4.alpha.2-281-g752d020-SNAPSHOT
09:14:13.567 INFO  PrintReadsSpark - HTSJDK Defaults.COMPRESSION_LEVEL : 1
09:14:13.567 INFO  PrintReadsSpark - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
09:14:13.567 INFO  PrintReadsSpark - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : false
09:14:13.567 INFO  PrintReadsSpark - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
09:14:13.567 INFO  PrintReadsSpark - Deflater: IntelDeflater
09:14:13.567 INFO  PrintReadsSpark - Inflater: IntelInflater
09:14:13.567 INFO  PrintReadsSpark - Initializing engine
09:14:13.567 INFO  PrintReadsSpark - Done initializing engine
log4j:ERROR A "org.apache.log4j.ConsoleAppender" object is not assignable to a "org.apache.log4j.Appender" variable.
log4j:ERROR The class "org.apache.log4j.Appender" was loaded by
log4j:ERROR [sun.misc.Launcher$AppClassLoader@6d21714c] whereas object of type
log4j:ERROR "org.apache.log4j.ConsoleAppender" was loaded by [org.apache.spark.util.ChildFirstURLClassLoader@6ee12bac].
log4j:ERROR Could not instantiate appender named "console".
log4j:ERROR A "org.apache.log4j.ConsoleAppender" object is not assignable to a "org.apache.log4j.Appender" variable.
log4j:ERROR The class "org.apache.log4j.Appender" was loaded by
log4j:ERROR [sun.misc.Launcher$AppClassLoader@6d21714c] whereas object of type
log4j:ERROR "org.apache.log4j.ConsoleAppender" was loaded by [org.apache.spark.util.ChildFirstURLClassLoader@6ee12bac].
log4j:ERROR Could not instantiate appender named "console".
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
log4j:WARN No appenders could be found for logger (org.apache.spark.SparkContext).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
09:14:26.202 INFO  PrintReadsSpark - Shutting down engine
[June 8, 2017 9:14:26 AM CST] org.broadinstitute.hellbender.tools.spark.pipelines.PrintReadsSpark done. Elapsed time: 0.21 minutes.
Runtime.totalMemory()=494927872
***********************************************************************

A USER ERROR has occurred: Couldn't write file /user/yaron/output.bam because writing failed with exception /user/yaron/output.bam.parts/_SUCCESS: Unable to find _SUCCESS file

***********************************************************************
org.broadinstitute.hellbender.exceptions.UserException$CouldNotCreateOutputFile: Couldn't write file /user/yaron/output.bam because writing failed with exception /user/yaron/output.bam.parts/_SUCCESS: Unable to find _SUCCESS file
        at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.writeReads(GATKSparkTool.java:255)
        at org.broadinstitute.hellbender.tools.spark.pipelines.PrintReadsSpark.runTool(PrintReadsSpark.java:37)
        at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.runPipeline(GATKSparkTool.java:353)
        at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:38)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:116)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:171)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:190)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:121)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:142)
        at org.broadinstitute.hellbender.Main.main(Main.java:220)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.nio.file.NoSuchFileException: /user/yaron/output.bam.parts/_SUCCESS: Unable to find _SUCCESS file
        at org.seqdoop.hadoop_bam.util.SAMFileMerger.mergeParts(SAMFileMerger.java:53)
        at org.broadinstitute.hellbender.engine.spark.datasources.ReadsSparkSink.writeReadsSingle(ReadsSparkSink.java:230)
        at org.broadinstitute.hellbender.engine.spark.datasources.ReadsSparkSink.writeReads(ReadsSparkSink.java:152)
        at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.writeReads(GATKSparkTool.java:250)
        ... 18 more

However, the _SUCCESS file does exist in output.bam.parts. Could someone tell me what the cause might be? Thanks!

$ hdfs dfs -ls output.bam.parts
Found 3 items
-rw-r--r--   3 yaron yaron          0 2017-06-08 09:14 output.bam.parts/_SUCCESS
-rw-r--r--   3 yaron yaron      62019 2017-06-08 09:14 output.bam.parts/part-r-00000.bam
-rw-r--r--   3 yaron yaron         16 2017-06-08 09:14 output.bam.parts/part-r-00000.bam.splitting-bai
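
One hypothesis, not confirmed: the trace shows java.nio.file.NoSuchFileException, i.e. the merge step resolved the path with the default (local) NIO file system provider, while the part files live on HDFS. A quick local check illustrates the mismatch:

import java.nio.file.Files;
import java.nio.file.Paths;

public class SuccessFileCheck {
    public static void main(String[] args) {
        // On the submitting host this prints false, even though
        // `hdfs dfs -ls output.bam.parts` shows _SUCCESS on HDFS.
        System.out.println(Files.exists(Paths.get("/user/yaron/output.bam.parts/_SUCCESS")));
    }
}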


Created 2017-06-06 | Updated 2017-06-07 | bug Mutect


Currently, FilterByOrientation requires rewriting a required input's contents with a sed command:

sed -r "s/picard\.analysis\.artifacts\.SequencingArtifactMetrics\\\$PreAdapterDetailMetrics/org\.broadinstitute\.
 hellbender\.tools\.picard\.analysis\.artifacts\.SequencingArtifactMetrics\$PreAdapterDetailMetrics/g" \
 "metrics.pre_adapter_detail_metrics" > "gatk.pre_adapter_detail_metrics"

The required input is the detailed pre-adapter summary metrics from Picard's CollectSequencingArtifactMetrics. The sed command replaces

## METRICS CLASS	picard.analysis.artifacts.SequencingArtifactMetrics$PreAdapterDetailMetrics

with

## METRICS CLASS	org.broadinstitute.hellbender.tools.picard.analysis.artifacts.SequencingArtifactMetrics$PreAdapterDetailMetrics

If this class is not replaced, then the tool errors with:

htsjdk.samtools.SAMException: Could not locate class with name gatk.analysis.artifacts.SequencingArtifactMetrics$PreAdapterDetailMetrics
	at htsjdk.samtools.metrics.MetricsFile.read(MetricsFile.java:356)
	at org.broadinstitute.hellbender.tools.exome.FilterByOrientationBias.onTraversalStart(FilterByOrientationBias.java:102)
	at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:777)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:115)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:170)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:189)
	at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:122)
	at org.broadinstitute.hellbender.Main.mainEntry(Main.java:143)
	at org.broadinstitute.hellbender.Main.main(Main.java:221)
Caused by: java.lang.ClassNotFoundException: gatk.analysis.artifacts.SequencingArtifactMetrics$PreAdapterDetailMetrics
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:264)
	at htsjdk.samtools.metrics.MetricsFile.loadClass(MetricsFile.java:471)
	at htsjdk.samtools.metrics.MetricsFile.read(MetricsFile.java:353)
	... 8 more

If it is replaced, the tool still errors but with a different error:

java.lang.IllegalArgumentException: Features added out of order: previous (TabixFeature{referenceIndex=0, start=118314029, end=118314036, featureStartFilePosition=1403632633, featureEndFilePosition=-1}) > next (TabixFeature{referenceIndex=0, start=33414233, end=33414234, featureStartFilePosition=1403632876, featureEndFilePosition=-1})
	at htsjdk.tribble.index.tabix.TabixIndexCreator.addFeature(TabixIndexCreator.java:89)
	at htsjdk.variant.variantcontext.writer.IndexingVariantContextWriter.add(IndexingVariantContextWriter.java:170)
	at htsjdk.variant.variantcontext.writer.VCFWriter.add(VCFWriter.java:219)
	at java.util.ArrayList.forEach(ArrayList.java:1249)
	at org.broadinstitute.hellbender.tools.exome.FilterByOrientationBias.onTraversalSuccess(FilterByOrientationBias.java:171)
	at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:781)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:115)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:170)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:189)
	at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:122)
	at org.broadinstitute.hellbender.Main.mainEntry(Main.java:143)
	at org.broadinstitute.hellbender.Main.main(Main.java:221)

It does not matter whether I produce the pre-adapter metrics with the latest Picard jar (v2.9.2); I get the same error.

I'm using an M2 callset from GATK3. Even so, I don't think I should be getting the above error.
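
The "Features added out of order" failure suggests the records are written out of coordinate order, which the tabix index writer rejects. A hedged sketch of one possible fix, assuming the records are merely emitted (not read) out of order, would be to sort the buffered variants before the final write in onTraversalSuccess():

import htsjdk.variant.variantcontext.VariantContext;
import java.util.Comparator;
import java.util.List;

final class SortBeforeWrite {
    // Sort by contig (in sequence-dictionary order), then by start position,
    // since IndexingVariantContextWriter requires coordinate-sorted features.
    static void sortByCoordinate(List<VariantContext> variants, Comparator<String> contigOrder) {
        variants.sort(Comparator.comparing(VariantContext::getContig, contigOrder)
                .thenComparingInt(VariantContext::getStart));
    }
}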



Created 2017-06-05 | Updated 2017-08-01 | bug Spark


@TianJin297 commented on Fri May 26 2017

The command is gatk-protected HaplotypeCallerSpark -I XX_BQSRappliedspark.bam -O XX_spark.vcf -R /curr/data/humann_g1k_v37.2bit --emitRefConfidence GVCF --TMP_DIR tmp

It was run on an Amazon m4.2xlarge instance. The error messages are below.
04:39:06.415 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
05:09:00.269 ERROR Executor:91 - Exception in task 8.0 in stage 1.0 (TID 345)
java.lang.ArrayIndexOutOfBoundsException: 16777215
at com.esotericsoftware.kryo.util.IdentityObjectIntMap.clear(IdentityObjectIntMap.java:382)
at com.esotericsoftware.kryo.util.MapReferenceResolver.reset(MapReferenceResolver.java:65)
at com.esotericsoftware.kryo.Kryo.reset(Kryo.java:865)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:630)
at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:297)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:313)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
05:09:00.455 WARN TaskSetManager:66 - Lost task 8.0 in stage 1.0 (TID 345, localhost): java.lang.ArrayIndexOutOfBoundsException: 16777215
at com.esotericsoftware.kryo.util.IdentityObjectIntMap.clear(IdentityObjectIntMap.java:382)
at com.esotericsoftware.kryo.util.MapReferenceResolver.reset(MapReferenceResolver.java:65)
at com.esotericsoftware.kryo.Kryo.reset(Kryo.java:865)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:630)
at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:297)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:313)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)

05:09:00.456 ERROR TaskSetManager:70 - Task 8 in stage 1.0 failed 1 times; aborting job
05:09:10.808 ERROR MapOutputTrackerMaster:91 - Error communicating with MapOutputTracker
java.lang.NullPointerException
at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:100)
at org.apache.spark.MapOutputTracker.getStatuses(MapOutputTracker.scala:202)
at org.apache.spark.MapOutputTracker.getMapSizesByExecutorId(MapOutputTracker.scala:142)
at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:49)
at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:109)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
05:09:10.813 ERROR Executor:91 - Exception in task 16.0 in stage 1.0 (TID 353)
org.apache.spark.SparkException: Error communicating with MapOutputTracker
at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:104)
at org.apache.spark.MapOutputTracker.getStatuses(MapOutputTracker.scala:202)
at org.apache.spark.MapOutputTracker.getMapSizesByExecutorId(MapOutputTracker.scala:142)
at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:49)
at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:109)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:100)
... 24 more
05:12:04.045 INFO HaplotypeCallerSpark - Shutting down engine
[May 18, 2017 5:12:04 AM UTC] org.broadinstitute.hellbender.tools.HaplotypeCallerSpark done. Elapsed time: 131.63 minutes.
Runtime.totalMemory()=16201547776
org.apache.spark.SparkException: Job aborted due to stage failure: Task 8 in stage 1.0 failed 1 times, most recent failure: Lost task 8.0 in stage 1.0 (TID 345, localhost): java.lang.ArrayIndexOutOfBoundsException: 16777215
at com.esotericsoftware.kryo.util.IdentityObjectIntMap.clear(IdentityObjectIntMap.java:382)
at com.esotericsoftware.kryo.util.MapReferenceResolver.reset(MapReferenceResolver.java:65)
at com.esotericsoftware.kryo.Kryo.reset(Kryo.java:865)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:630)
at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:297)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:313)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1454)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1442)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1441)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1441)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1667)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1622)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1611)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:632)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1873)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1886)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1899)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1913)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:912)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
at org.apache.spark.rdd.RDD.collect(RDD.scala:911)
at org.apache.spark.api.java.JavaRDDLike$class.collect(JavaRDDLike.scala:360)
at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
at org.broadinstitute.hellbender.tools.HaplotypeCallerSpark.writeVariants(HaplotypeCallerSpark.java:205)
at org.broadinstitute.hellbender.tools.HaplotypeCallerSpark.runTool(HaplotypeCallerSpark.java:115)
at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.runPipeline(GATKSparkTool.java:353)
at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:38)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:116)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:121)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:142)
at org.broadinstitute.hellbender.Main.main(Main.java:220)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 16777215
at com.esotericsoftware.kryo.util.IdentityObjectIntMap.clear(IdentityObjectIntMap.java:382)
at com.esotericsoftware.kryo.util.MapReferenceResolver.reset(MapReferenceResolver.java:65)
at com.esotericsoftware.kryo.Kryo.reset(Kryo.java:865)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:630)
at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:297)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:313)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)



Created 2017-06-05 | Updated 2017-08-01 | bug HaplotypeCaller Spark


@TianJin297 commented on Fri May 26 2017

When I was running HaplotypeCallerSpark with one of my samples, I got a "Duplicate key" error.

The command I used is "/gatk-protected HaplotypeCallerSpark -I XX_BQSRappliedspark.bam -O XX_525.gvcf -R /curr/data/humann_g1k_v37.2bit --emitRefConfidence BP_RESOLUTION --TMP_DIR tmp"

It runs on an Amazon m4.2xlarge instance.
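
For context, the "Duplicate key" message in the log below is standard java.util.stream behavior: Collectors.toMap without a merge function throws IllegalStateException when two stream elements map to equal keys, and the "[B@..." in the message indicates the key is a byte[] (which is only equal to itself, by identity). A minimal sketch that reproduces the same exception type (plain Java, not GATK code; the read array is made up for illustration):

import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DuplicateKeyDemo {
    public static void main(String[] args) {
        // byte[] uses identity equals/hashCode, so the same array instance
        // appearing twice in the stream yields two equal keys.
        byte[] read = "ACGT".getBytes();
        // The no-merge-function Collectors.toMap then throws:
        //   java.lang.IllegalStateException: Duplicate key [B@...
        Stream.of(read, read)
              .collect(Collectors.toMap(Function.identity(), b -> b.length));
    }
}

In the stack trace below, the failing toMap call sits inside PairHMMLikelihoodCalculationEngine.buildGapContinuationPenalties, which suggests the same read object (or an equal key) reaches that collector twice.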

00:10:41.089 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
00:10:41.089 WARN DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
00:10:41.089 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
00:10:45.460 WARN DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
00:10:45.460 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
00:11:09.609 WARN TaskMemoryManager:381 - leak 166.6 MB memory from org.apache.spark.util.collection.ExternalAppendOnlyMap@60a3c432
00:11:09.611 ERROR Executor:91 - Exception in task 15.0 in stage 1.0 (TID 519)
java.lang.IllegalStateException: Duplicate key [B@4e233a3c
at java.util.stream.Collectors.lambda$throwingMerger$0(Collectors.java:133)
at java.util.HashMap.merge(HashMap.java:1253)
at java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1320)
at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.buildGapContinuationPenalties(PairHMMLikelihoodCalculationEngine.java:304)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.computeReadLikelihoods(PairHMMLikelihoodCalculationEngine.java:253)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.computeReadLikelihoods(PairHMMLikelihoodCalculationEngine.java:187)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(HaplotypeCallerEngine.java:518)
at org.broadinstitute.hellbender.tools.HaplotypeCallerSpark.lambda$regionToVariants$2(HaplotypeCallerSpark.java:192)
at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:267)
at java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1812)
at java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:294)
at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:169)
at java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:912)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:912)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1899)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1899)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
00:11:09.632 WARN TaskSetManager:66 - Lost task 15.0 in stage 1.0 (TID 519, localhost): java.lang.IllegalStateException: Duplicate key [B@4e233a3c
at java.util.stream.Collectors.lambda$throwingMerger$0(Collectors.java:133)
at java.util.HashMap.merge(HashMap.java:1253)
at java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1320)
at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.buildGapContinuationPenalties(PairHMMLikelihoodCalculationEngine.java:304)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.computeReadLikelihoods(PairHMMLikelihoodCalculationEngine.java:253)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.computeReadLikelihoods(PairHMMLikelihoodCalculationEngine.java:187)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(HaplotypeCallerEngine.java:518)
at org.broadinstitute.hellbender.tools.HaplotypeCallerSpark.lambda$regionToVariants$2(HaplotypeCallerSpark.java:192)
at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:267)
at java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1812)
at java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:294)
at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:169)
at java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:912)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:912)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1899)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1899)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)

00:11:09.634 ERROR TaskSetManager:70 - Task 15 in stage 1.0 failed 1 times; aborting job
00:11:09.810 WARN TaskSetManager:66 - Lost task 33.0 in stage 1.0 (TID 528, localhost): TaskKilled (killed intentionally)
00:11:24.786 INFO HaplotypeCallerSpark - Shutting down engine
[May 26, 2017 12:11:24 AM UTC] org.broadinstitute.hellbender.tools.HaplotypeCallerSpark done. Elapsed time: 10.58 minutes.
Runtime.totalMemory()=16622026752
org.apache.spark.SparkException: Job aborted due to stage failure: Task 15 in stage 1.0 failed 1 times, most recent failure: Lost task 15.0 in stage 1.0 (TID 519, localhost): java.lang.IllegalStateException: Duplicate key [B@4e233a3c
at java.util.stream.Collectors.lambda$throwingMerger$0(Collectors.java:133)
at java.util.HashMap.merge(HashMap.java:1253)
at java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1320)
at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.buildGapContinuationPenalties(PairHMMLikelihoodCalculationEngine.java:304)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.computeReadLikelihoods(PairHMMLikelihoodCalculationEngine.java:253)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.computeReadLikelihoods(PairHMMLikelihoodCalculationEngine.java:187)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(HaplotypeCallerEngine.java:518)
at org.broadinstitute.hellbender.tools.HaplotypeCallerSpark.lambda$regionToVariants$2(HaplotypeCallerSpark.java:192)
at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:267)
at java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1812)
at java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:294)
at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:169)
at java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:912)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:912)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1899)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1899)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1454)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1442)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1441)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1441)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1667)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1622)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1611)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:632)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1873)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1886)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1899)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1913)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:912)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
at org.apache.spark.rdd.RDD.collect(RDD.scala:911)
at org.apache.spark.api.java.JavaRDDLike$class.collect(JavaRDDLike.scala:360)
at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
at org.broadinstitute.hellbender.tools.HaplotypeCallerSpark.writeVariants(HaplotypeCallerSpark.java:205)
at org.broadinstitute.hellbender.tools.HaplotypeCallerSpark.runTool(HaplotypeCallerSpark.java:115)
at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.runPipeline(GATKSparkTool.java:353)
at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:38)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:116)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:121)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:142)
at org.broadinstitute.hellbender.Main.main(Main.java:220)
Caused by: java.lang.IllegalStateException: Duplicate key [B@4e233a3c
at java.util.stream.Collectors.lambda$throwingMerger$0(Collectors.java:133)
at java.util.HashMap.merge(HashMap.java:1253)
at java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1320)
at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.buildGapContinuationPenalties(PairHMMLikelihoodCalculationEngine.java:304)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.computeReadLikelihoods(PairHMMLikelihoodCalculationEngine.java:253)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.computeReadLikelihoods(PairHMMLikelihoodCalculationEngine.java:187)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(HaplotypeCallerEngine.java:518)
at org.broadinstitute.hellbender.tools.HaplotypeCallerSpark.lambda$regionToVariants$2(HaplotypeCallerSpark.java:192)
at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:267)
at java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1812)
at java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:294)
at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:169)
at java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:912)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:912)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1899)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1899)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)



Created 2017-06-05 | Updated 2017-08-01 | bug HaplotypeCaller Spark


@tomwhite commented on Tue May 23 2017

When running on a 160GB BAM I get:

java.lang.IllegalArgumentException: contig must be non-null and not equal to *, and start must be >= 1
     at org.broadinstitute.hellbender.utils.read.SAMRecordToGATKReadAdapter.setPosition(SAMRecordToGATKReadAdapter.java:89)
     at org.broadinstitute.hellbender.utils.clipping.ClippingOp.applyHARDCLIP_BASES(ClippingOp.java:381)
     at org.broadinstitute.hellbender.utils.clipping.ClippingOp.apply(ClippingOp.java:73)
     at org.broadinstitute.hellbender.utils.clipping.ReadClipper.clipRead(ReadClipper.java:147)
     at org.broadinstitute.hellbender.utils.clipping.ReadClipper.clipRead(ReadClipper.java:128)
     at org.broadinstitute.hellbender.utils.clipping.ReadClipper.hardClipSoftClippedBases(ReadClipper.java:332)
     at org.broadinstitute.hellbender.utils.clipping.ReadClipper.hardClipSoftClippedBases(ReadClipper.java:335)
     at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.AssemblyBasedCallerUtils.finalizeRegion(AssemblyBasedCallerUtils.java:84)
     at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.AssemblyBasedCallerUtils.assembleReads(AssemblyBasedCallerUtils.java:238)
     at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(HaplotypeCallerEngine.java:478)
     at org.broadinstitute.hellbender.tools.HaplotypeCallerSpark.lambda$regionToVariants$580(HaplotypeCallerSpark.java:203) 

At first glance this looks like a problem with unmapped reads, but those are filtered out by the tool, so the bug is more likely in the clipping logic. It's hard to diagnose because the exception doesn't say which read caused it, and it's slow to reproduce because it only shows up on a large input.
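
For reference, the message corresponds to the argument validation at the top of setPosition. A minimal sketch of that precondition, reconstructed from the exception message alone (an assumption, not the actual GATK source):

public class SetPositionCheck {
    // Sketch of the validation that SAMRecordToGATKReadAdapter.setPosition
    // appears to perform, based only on the message in the trace above.
    static void setPosition(final String contig, final int start) {
        if (contig == null || contig.equals("*") || start < 1) {
            throw new IllegalArgumentException(
                    "contig must be non-null and not equal to *, and start must be >= 1");
        }
        // ... update the underlying record's position ...
    }

    public static void main(String[] args) {
        setPosition("*", 5);  // throws, like the clipping path above
    }
}

If hard-clipping all soft-clipped bases leaves a read with no aligned bases, the clipper could plausibly end up setting an unmapped position ("*" contig or start 0), which would trip exactly this check.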

Any thoughts @lbergelson, @droazen?


@lbergelson commented on Sat May 27 2017

@tomwhite Is it possible you could upload the bam file somewhere on google cloud along with the command line you used? It's not obvious to me where the error is being caused. It's painful to debug anything on a 160GB file, but I think we can probably do a binary search on the file and find the bad location pretty quickly. I.e. throw compute at the problem instead of human time...
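
A sketch of that bisection idea (hypothetical driver code: runFails is assumed to wrap a HaplotypeCallerSpark run restricted to -L contig:start-end and report whether it reproduces the exception, and the sketch assumes a single small region triggers the failure):

import java.util.function.BiPredicate;

public class IntervalBisect {
    // Narrows [start, end) to a window of <= 1000 bases that still fails.
    static long[] bisect(long start, long end, BiPredicate<Long, Long> runFails) {
        while (end - start > 1000) {
            final long mid = (start + end) / 2;
            if (runFails.test(start, mid)) {
                end = mid;        // left half reproduces the failure
            } else {
                start = mid;      // otherwise the bad read is in the right half
            }
        }
        return new long[] {start, end};
    }
}

Each probe costs one shortened tool run, so localizing the bad region in a ~250 Mb contig takes roughly log2(250e6 / 1000) ≈ 18 runs.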



Created 2017-06-05 | Updated 2017-06-05 | bug Copy Number tools


@asmirnov239 commented on Fri Apr 21 2017

Right now, if a user disables the MAPPED filter, which is a default filter for the CalculateTargetCoverage tool, the tool fails with the following uninformative exception (unless all reads happen to be mapped):

java.lang.IllegalArgumentException: the input location cannot be null
	at org.broadinstitute.hellbender.utils.Utils.nonNull(Utils.java:549)
	at org.broadinstitute.hellbender.tools.exome.HashedListTargetCollection.indexRange(HashedListTargetCollection.java:152)
	at org.broadinstitute.hellbender.tools.exome.CalculateTargetCoverage.apply(CalculateTargetCoverage.java:298)

We should guard against this and throw an informative exception before the traversal starts if the MAPPED filter is disabled; see the sketch below.
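
A minimal sketch of the intended guard (plain Java; the disabled-filter set is a stand-in, since per the next comment there is no public accessor for the resolved filter list yet):

import java.util.Collections;
import java.util.Set;

public class MappedFilterGuard {
    // Intended to run before traversal starts; disabledFilters stands in
    // for the resolved command-line filter state.
    static void validateFilters(final Set<String> disabledFilters) {
        if (disabledFilters.contains("MAPPED")) {
            throw new IllegalArgumentException(
                    "CalculateTargetCoverage requires the MAPPED read filter; do not disable it.");
        }
    }

    public static void main(String[] args) {
        validateFilters(Collections.singleton("MAPPED"));  // throws, demonstrating the guard
    }
}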


@asmirnov239 commented on Fri Apr 21 2017

Since there is no direct API call to access the list of resolved filters (after command-line parsing), this bug fix will have to wait until broadinstitute/barclay#38 is merged.



Created 2017-06-05 | Updated 2017-06-05 | bug


@sooheelee commented on Fri Feb 17 2017

A fix was implemented for HaplotypeCaller but not ported to GenotypeGVCFs, CombineGVCFs, or CombineVariants. Although the user is running v3.7-0-gcfedb67, my understanding is that these types of fixes will only be worked on in GATK4.


Test data submitted by the user can be found at

/humgen/gsa-scr1/pub/incoming/bugrep_jgeibel_1.tar.gz

This includes chicken reference files.


The command the user ran to generate the error is very long because it includes many VCFs:

Program Args: -T GenotypeGVCFs -R /usr/users/geibel/chicken/chickenrefgen/galGal5_Dec2015/galGal5.fa --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72631_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72632_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72633_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72634_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72635_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72636_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72637_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72638_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72639_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72640_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72641_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72642_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72643_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72644_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72645_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72646_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72647_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72648_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72649_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72650_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72651_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72652_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72653_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72654_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72655_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72656_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72657_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72658_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72659_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72660_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72661_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72662_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72663_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72664_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72665_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72666_chr26.raw.g.vcf --variant 
/usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72667_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72668_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72669_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72670_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72671_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72672_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72673_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72674_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72675_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72676_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72678_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72680_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72682_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72683_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_AB_0001_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_AR_0002_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_AS_0003_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_BA_0004_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_BK_0005_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_BH_0006_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_CG_0007_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_CS_0008_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_DG_0009_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_FG_0010_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_HO_0011_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_GG_0012_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_GS_0013_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_KW_0014_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_IT_0015_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_KA_0016_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_KG_0017_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_PA_0018_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_LE_0019_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_MA_0020_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_MR_0021_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_OH_0022_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_OR_0023_chr26.raw.g.vcf --variant 
/usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_OM_0024_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_NH_0025_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_SB_0026_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_SE_0027_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_SH_0028_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_SA_0029_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_SN_0030_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_TO_0031_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_WT_0032_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_WY_0033_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_YO_0034_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_ZC_0035_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_GV_0036_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_CW_0037_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_DL_0038_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_KS_0039_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_OF_0040_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_WR_0041_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_RI_0042_chr26.raw.g.vcf -nt 10 --max_genotype_count 1024 -L chr26 --dbsnp /usr/users/geibel/chicken/chickenrefgen/ENSEMBL_20170106/Gallus_gallus.updated.vcf -o /usr/users/geibel/chicken/pool_sequence_nov2016/data/rawVCF/IndandPool_chr26.raw.vcf 

The user also includes a shell script, JointGenotyping_chr26.sh, in the test data bundle.


The error shows:

##### ERROR --
##### ERROR stack trace 
java.lang.IllegalArgumentException: the number of genotypes is too large for ploidy 20 and allele 16: approx. 3247943160
	at org.broadinstitute.gatk.tools.walkers.genotyper.GenotypeLikelihoodCalculators.getInstance(GenotypeLikelihoodCalculators.java:319)
	at org.broadinstitute.gatk.tools.walkers.variantutils.ReferenceConfidenceVariantContextMerger.mergeRefConfidenceGenotypes(ReferenceConfidenceVariantContextMerger.java:461)
	at org.broadinstitute.gatk.tools.walkers.variantutils.ReferenceConfidenceVariantContextMerger.merge(ReferenceConfidenceVariantContextMerger.java:164)
	at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.map(GenotypeGVCFs.java:302)
	at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.map(GenotypeGVCFs.java:135)
	at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
	at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
	at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
	at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
	at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
	at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
	at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
	at org.broadinstitute.gatk.engine.executive.ShardTraverser.call(ShardTraverser.java:98)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.7-0-gcfedb67):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: the number of genotypes is too large for ploidy 20 and allele 16: approx. 3247943160
##### ERROR ------------------------------------------------------------------------------------------
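
For reference, the count in the error message matches the standard combinations-with-repetition identity: the number of unordered genotypes for ploidy P over A alleles is C(P + A - 1, P), and for P = 20, A = 16 that is C(35, 20) = 3,247,943,160, which exceeds Integer.MAX_VALUE. A quick check (plain Java, not GATK code):

import java.math.BigInteger;

public class GenotypeCount {
    // Exact C(n, k); each partial product is itself a binomial coefficient,
    // so every division is exact.
    static BigInteger choose(final int n, final int k) {
        BigInteger r = BigInteger.ONE;
        for (int i = 1; i <= k; i++) {
            r = r.multiply(BigInteger.valueOf(n - k + i))
                 .divide(BigInteger.valueOf(i));
        }
        return r;
    }

    public static void main(String[] args) {
        // ploidy 20 with 16 alleles -> C(20 + 16 - 1, 20) = C(35, 20)
        System.out.println(choose(35, 20));  // 3247943160, matching the error message
    }
}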


