BaseRecalibratorSpark fails with "java.lang.IllegalArgumentException: Table1 1,3 not equal to 88,3"
open | Created 2019-04-01 | Last updated 2019-04-05 | Posted by akkellogg


Spark bug


Bug Report

Affected tool(s) or class(es)

BaseRecalibratorSpark
BQSRPipelineSpark

Affected version(s)

4.1.0.0

Description

We are running into a problem with BaseRecalibratorSpark: the tool fails shortly after starting. The failure is reproducible with the same BAM file on different machines, and every run produces the same (or a very similar) stacktrace. Vanilla BaseRecalibrator runs just fine on these BAMs, so I don't think the issue is with the BAMs themselves, and BaseRecalibratorSpark has worked fine on other BAM files. Changing the number of Spark threads (see the example invocations below) still results in the same stacktrace.
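For reference, the thread count was varied via the --spark-master argument; only the local[N] value changed between runs (the specific values below are illustrative), and every variant failed identically:

gatk BaseRecalibratorSpark --spark-master local[1] -R human_g1k_v37_decoy_phiXAdaptr.fasta -I rmdup_gatkrealign.bam -O gatk_base_recal_table.txt --known-sites All_20160601.vcf.gz --read-filter GoodCigarReadFilter
gatk BaseRecalibratorSpark --spark-master local[8] -R human_g1k_v37_decoy_phiXAdaptr.fasta -I rmdup_gatkrealign.bam -O gatk_base_recal_table.txt --known-sites All_20160601.vcf.gz --read-filter GoodCigarReadFilter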

I've also tried running BQSRPipelineSpark to see whether it would hit the same issue, and it fails in the same manner (example invocation below).
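A sketch of that run, assuming the same reference, input BAM, and known-sites resource as the BaseRecalibratorSpark command in "Steps to reproduce" (the recalibrated-BAM output path here is illustrative, not our exact invocation):

gatk BQSRPipelineSpark --spark-master local[8] -R human_g1k_v37_decoy_phiXAdaptr.fasta -I rmdup_gatkrealign.bam -O rmdup_gatkrealign.recal.bam --known-sites All_20160601.vcf.gz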

Additionally, I've run ValidateSamFile on the input. It reports some reads missing their mates, but this hasn't caused problems in other tools (including vanilla BaseRecalibrator).
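The validation was along these lines (Picard-style arguments; the exact mode flag used in our run may have differed):

gatk ValidateSamFile -I rmdup_gatkrealign.bam --MODE SUMMARY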

Searching through the forum, I found an old issue with a similar stacktrace, but that one appears to have occurred in GATK 2.4: https://gatkforums.broadinstitute.org/gatk/discussion/3265/bqsrgatherer-exception

In the stacktrace below, I've bolded the error message that appears in each of these samples.

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1486)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1486)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1714)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2119)
    at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1026)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
    at org.apache.spark.rdd.RDD.reduce(RDD.scala:1008)
    at org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1.apply(RDD.scala:1151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
    at org.apache.spark.rdd.RDD.treeAggregate(RDD.scala:1128)
    at org.apache.spark.api.java.JavaRDDLike$class.treeAggregate(JavaRDDLike.scala:439)
    at org.apache.spark.api.java.AbstractJavaRDDLike.treeAggregate(JavaRDDLike.scala:45)
    at org.broadinstitute.hellbender.tools.spark.transforms.BaseRecalibratorSparkFn.apply(BaseRecalibratorSparkFn.java:38)
    at org.broadinstitute.hellbender.tools.spark.BaseRecalibratorSpark.runTool(BaseRecalibratorSpark.java:132)
    at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.runPipeline(GATKSparkTool.java:528)
    at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:30)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
    at org.broadinstitute.hellbender.Main.main(Main.java:291)
Caused by: java.lang.IllegalArgumentException: Table1 1,3 not equal to 88,3
    at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:724)
    at org.broadinstitute.hellbender.utils.recalibration.RecalUtils.combineTables(RecalUtils.java:560)
    at org.broadinstitute.hellbender.utils.recalibration.RecalibrationTables.combine(RecalibrationTables.java:144)
    at org.broadinstitute.hellbender.utils.recalibration.RecalibrationTables.inPlaceCombine(RecalibrationTables.java:178)
    at org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction2$1.apply(JavaPairRDD.scala:1037)
    at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
    at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
    at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336)
    at scala.collection.TraversableOnce$class.aggregate(TraversableOnce.scala:214)
    at scala.collection.AbstractIterator.aggregate(Iterator.scala:1336)
    at org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$24.apply(RDD.scala:1136)
    at org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$24.apply(RDD.scala:1136)
    at org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$25.apply(RDD.scala:1137)
    at org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$25.apply(RDD.scala:1137)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

19/03/26 20:02:39 INFO ShutdownHookManager: Shutdown hook called
19/03/26 20:02:39 INFO ShutdownHookManager: Deleting directory /docker/working/7dd5e9aa-fa24-45ca-9979-13623c0ff8d5/a0d4bfdf-66b4-47af-b002-3c3935a7b633/spark-44911f4d-4d54-42b0-b6d1-35614170c1fc

Using GATK jar /docker/reference/Apps/GATK/4.1.0.0/gatk-package-4.1.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx10876M -Djava.io.tmpdir=./ -jar /docker/reference/Apps/GATK/4.1.0.0/gatk-package-4.1.0.0-local.jar BaseRecalibratorSpark --spark-master local[8] -R /docker/reference/Data/B37/GATKBundle/2.8_subset_arup_v0.1/human_g1k_v37_decoy_phiXAdaptr.fasta -I bam/rmdup_gatkrealign.bam -O gatk_base_recal_table.txt --known-sites /docker/reference/Data/B37/DbSNP/dbSNP_147_20160601/All_20160601.vcf.gz --read-filter GoodCigarReadFilter -jdk-deflater

Steps to reproduce

gatk BaseRecalibratorSpark --spark-master local[8] -R human_g1k_v37_decoy_phiXAdaptr.fasta -I rmdup_gatkrealign.bam -O gatk_base_recal_table.txt --known-sites All_20160601.vcf.gz --read-filter GoodCigarReadFilter
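
For comparison, the non-Spark BaseRecalibrator run that completes successfully on the same BAM is roughly equivalent to the following (our exact invocation may have differed slightly):

gatk BaseRecalibrator -R human_g1k_v37_decoy_phiXAdaptr.fasta -I rmdup_gatkrealign.bam -O gatk_base_recal_table.txt --known-sites All_20160601.vcf.gz --read-filter GoodCigarReadFilter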

Expected behavior

The tools should run to completion, as they do on other BAM files.

Actual behavior

The tools reproducibly fail with the IllegalArgumentException shown in the stacktrace above.
