## Bug in various tools, 2.4-7 : "ArrayIndexOutOfBoundsException"

### Posted by Geraldine_VdAuwera on 14 Mar 2013

As reported here:

If you encounter this bug too, please don't post a new question about it. Feel free to comment in this thread to let us know you have also had the same problem. Tell us what version of the GATK you were using and post your command line.

Thank you for your patience while we work to fix this issue.

### Latest update

We found that the three tools (PrintReads, HaplotypeCaller and UnifiedGenotyper) each had a different issue that happened to produce the same symptom (the ArrayIndexOutOfBoundsException error). The underlying issues have all been fixed in version 2.5. If you encounter an ArrayIndexOutOfBoundsException in a later version, it is certainly caused by a different problem, so please open a new thread to report it.

#### yhoang

Hi, I am getting this problem for every sample I have:

    ##### ERROR stack trace
    java.lang.ArrayIndexOutOfBoundsException: 100
    ##### ERROR A GATK RUNTIME ERROR has occurred (version 2.4-7-g5e89f01):
    ##### ERROR MESSAGE: 100

Command line:

    java -Xmx85g -jar /tools/gatk/2.4-7/gatk/GenomeAnalysisTK.jar \
        -T BaseRecalibrator \
        -R /refs/b37/human_g1k_v37.fasta \
        -I bwamem.rmd.realigned.bam \
        --knownSites /refs/b37/dbsnp_135.b37.vcf \
        --knownSites /refs/b37/1000G_omni2.5.b37.sites.vcf \
        -o bwamem_recal_data.grp

I also have another problem: even after using --fix_misencoded_quality_scores in RealignerTargetCreator, some samples still require running the next step, IndelRealigner, with -allowPotentiallyMisencodedQuals. With an older version (2.1) I did not need to fix anything at all. What is the problem here?

Thanks, Yen

#### Geraldine_VdAuwera

Hi @nicolas, I will tell Mauricio that I'm going to post his phone number on the forum if he doesn't fix the bug quickly :) More seriously, can you post the command line that led to this error?

#### Geraldine_VdAuwera

Hi Yen, could you please try again with the latest version (2.4-9) and let me know if the first error still occurs? Regarding the second problem, older versions of GATK did not check base quality encoding, so you may already have had that problem without knowing it.
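To illustrate what a base quality encoding check can look like, here is a minimal Python sketch. It is not GATK's implementation: it assumes the two common conventions (Phred+33 and Phred+64), and the plausibility cutoff `MAX_PLAUSIBLE_QUALITY` and all function names are hypothetical, chosen only for this example.

```python
# Sketch only, not GATK's code. The cutoff below is an assumption for illustration.
PHRED33_OFFSET = 33
PHRED64_OFFSET = 64
MAX_PLAUSIBLE_QUALITY = 60  # hypothetical threshold for "implausibly high"


def decode_qualities(quality_string, offset=PHRED33_OFFSET):
    """Convert an ASCII quality string into integer Phred scores."""
    return [ord(char) - offset for char in quality_string]


def looks_misencoded(quality_string, max_plausible=MAX_PLAUSIBLE_QUALITY):
    """Return True if decoding as Phred+33 yields an implausibly high score."""
    return any(q > max_plausible for q in decode_qualities(quality_string))


def reencode_phred64_as_phred33(quality_string):
    """Shift every character down by 31 (64 - 33) so the string reads as Phred+33."""
    shift = PHRED64_OFFSET - PHRED33_OFFSET
    return "".join(chr(ord(char) - shift) for char in quality_string)


if __name__ == "__main__":
    phred64_quals = "hhhhgggg"  # in Phred+64, 'h' is Q40 and 'g' is Q39
    print(looks_misencoded(phred64_quals))
    fixed = reencode_phred64_as_phred33(phred64_quals)
    print(fixed, decode_qualities(fixed))
```

A file whose qualities were written as Phred+64 but read as Phred+33 shows scores 31 points too high, which is why a tool that never inspects the values can process such data without complaint.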

#### Geraldine_VdAuwera

Hi @aeonsim, I believe that's the second error case that we found using your data. The fix for that problem is in our development version, but it was too complex to patch onto 2.4, so it will only be released in 2.5. That said, the fix is in the nightly build of our dev tree, so you can try the latest nightly to confirm whether this is indeed fixed for you.

#### aeonsim

@Geraldine_VdAuwera Thanks, I've had a go with the nightly version and I am no longer getting the error.

#### pascalg

Hi, I have the same error as Yen, as shown below. I tried the current version (v2.4-9-g532efad) and the nightly (nightly-2013-04-03-g231698f).

Thanks, Pascal

    java -Xmx14g -jar GenomeAnalysisTK.jar \
        -T BaseRecalibrator \
        -I realigned.bam \
        -R rn5.fa \
        -S LENIENT \
        --log_to_file BR.log \
        -o realigned.recal.bam \
        --plot_pdf_file recal.pdf \
        --intermediate_csv_file recal.csv \
        --knownSites dbSNP136.vcf

    INFO 19:19:14,020 ProgressMeter - chr14:32233412 9.07e+07 5.3 h 3.5 m 24.1% 22.1 h 16.8 h
    INFO 19:20:06,316 GATKRunReport - Uploaded run statistics report to AWS S3
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR stack trace
    java.lang.ArrayIndexOutOfBoundsException: 100
        at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.calculateIsIndel(BaseRecalibrator.java:387)
        at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:253)
        at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:131)
        at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:230)
        at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:218)
        at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
        at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
        at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:102)
        at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:56)
        at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:109)
        at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:283)
        at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:245)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152)
        at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A GATK RUNTIME ERROR has occurred (version 2.4-9-g532efad):
    ##### ERROR
    ##### ERROR Please visit the wiki to see if this is a known problem
    ##### ERROR If not, please post the error, with stack trace, to the GATK forum
    ##### ERROR Visit our website and forum for extensive documentation and answers to
    ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR MESSAGE: 100
    ##### ERROR ------------------------------------------------------------------------------------------

#### Geraldine_VdAuwera

Hi Pascal, Could you please validate your input files? I see you're running with lenient validation, which may be letting through malformed data.

#### pascalg

Thanks Geraldine, you are right, I should not run it in lenient mode. I checked the file with ValidateSamFile (and the VCF file with vcf-validator), but the same error occurs.

#### Geraldine_VdAuwera

I see. Can you please try again with our very latest nightly build (see Downloads page)? I think we have a fix for this now. If that still doesn't work I'll need you to upload a snippet of your bam file to our server. Detailed instructions here: http://www.broadinstitute.org/gatk/guide/article?id=1894

#### Martin1

Hello, I have the same error with version 2.5-2. In my case it happens with the BaseRecalibrator:

    INFO 15:58:09,019 HelpFormatter - --------------------------------------------------------------------------------
    INFO 15:58:09,023 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.5-2-gf57256b, Compiled 2013/05/01 09:27:02
    INFO 15:58:09,023 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 15:58:09,024 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    INFO 15:58:09,031 HelpFormatter - Program Args: -T BaseRecalibrator -nct 4 -R /data/common/Genomes/H.Sapiens/hg19_sorted/gatk/ucsc.hg19.fasta -knownSites /data/common/gatk_bundle_2.3/dbsnp_137.hg19.vcf -knownSites /data/common/gatk_bundle_2.3/1000G_omni2.5.hg19.vcf -knownSites /data/common/gatk_bundle_2.3/hapmap_3.3.hg19.vcf -I 5073.realign.bam -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov ContextCovariate -o 5073.recal_data.grp
    INFO 15:58:09,031 HelpFormatter - Date/Time: 2013/05/02 15:58:09

and the stack trace:

    ##### ERROR stack trace
    java.lang.ArrayIndexOutOfBoundsException: 100
        at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.calculateIsIndel(BaseRecalibrator.java:387)
        at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:253)
        at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:131)
        at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:230)
        at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:218)
        at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)

Is this the same error?

#### Geraldine_VdAuwera

Hi Martin, I think this is different from the others in this thread, but it might be the same as a similar case I'm currently looking into for someone else here: http://gatkforums.broadinstitute.org/discussion/2551/error-stack-trace Can you tell me if your file passes validation?

#### Geraldine_VdAuwera

Can you please try running this again with the latest nightly build (see Downloads page)? The nightlies are built from the latest internal development version; they are not supported for general use but that will tell us if the bug is still present or not.

#### gathrey

Sure. I'll try that and get back to you :)

#### gathrey

I have not been able to run the nightly build because I am getting 'unsupported major.minor version 51.0' errors from Java. I am running the latest Java version and I get the same error on both Linux and Mac. I went all the way back to the oldest nightly build available, but the error is the same. So I have been unable to figure this out, but will keep trying.

#### Geraldine_VdAuwera

I see -- @gathrey, when you say latest java version, which one do you mean exactly? We have been working on migrating from Java 6 to 7 so we're interested in the details if there are any version incompatibilities.
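For context on the error quoted above: "unsupported major.minor version 51.0" is how a JVM reports that a class was compiled for a newer class-file format than it can load. Major version 51 corresponds to Java 7, so the `java` that actually ran was presumably Java 6 or older, whatever version is installed elsewhere on the machine. The Python sketch below reads the version from a class-file header. The function names are hypothetical, and the mapping table covers only a few releases.

```python
import struct
import zipfile

CLASS_MAGIC = 0xCAFEBABE
# Class-file major versions and the Java release that introduced them (partial table).
JAVA_RELEASE_BY_MAJOR = {49: "5", 50: "6", 51: "7", 52: "8"}


def parse_class_version(header):
    """Return (major, minor) from the first 8 bytes of a .class file."""
    if len(header) < 8:
        raise ValueError("need at least 8 bytes of class-file header")
    # Layout: 4-byte magic, 2-byte minor version, 2-byte major version, big-endian.
    magic, minor, major = struct.unpack(">IHH", header[:8])
    if magic != CLASS_MAGIC:
        raise ValueError("not a Java class file")
    return major, minor


def required_java(header):
    """Name the Java release needed to load a class with this header."""
    major, _minor = parse_class_version(header)
    return JAVA_RELEASE_BY_MAJOR.get(major, "unknown")


def first_class_version_in_jar(jar_path):
    """Return (major, minor) of the first .class entry in a jar, or None if there is none."""
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            if name.endswith(".class"):
                with jar.open(name) as handle:
                    return parse_class_version(handle.read(8))
    return None
```

Running `first_class_version_in_jar` on the nightly jar and comparing the result with the output of `java -version` in the same shell would show whether the two disagree.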

#### Geraldine_VdAuwera

Hi @Fer, This must be a different problem, albeit with the same symptom (see my update to the original post above, which I added to clarify). Please check if this issue reproduces without the -nt argument and if it does, post it as a new thread (using the "Ask a question" button) in the "Ask the GATK team" category.

#### dtaliun

Hi, I wonder what the status of the bug reported by @Fer is? I ran into exactly the same error when using v3.3.0, but without -nt.

Best, Daniel

Error trace:

    java.lang.ArrayIndexOutOfBoundsException: 10000
        at org.broadinstitute.gatk.utils.variant.ReferenceConfidenceVariantContextMerger.generatePL(ReferenceConfidenceVariantContextMerger.java:357)
        at org.broadinstitute.gatk.utils.variant.ReferenceConfidenceVariantContextMerger.mergeRefConfidenceGenotypes(ReferenceConfidenceVariantContextMerger.java:331)
        at org.broadinstitute.gatk.utils.variant.ReferenceConfidenceVariantContextMerger.merge(ReferenceConfidenceVariantContextMerger.java:134)
        at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.map(GenotypeGVCFs.java:200)
        at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.map(GenotypeGVCFs.java:119)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
        at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
        at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
        at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
        at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:319)
        at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
        at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:107)

#### Sheila

@dtaliun Hi Daniel, Can you post the exact command you used? Thanks, Sheila

#### dtaliun

Hi Sheila, the command line:

    java -jar -Xmx8g GenomeAnalysisTK.jar -T GenotypeGVCFs -R hs37d5.fa --variant 16_7000001_8000000.vcf.gz -o 16_7000001_8000000.genotypes.vcf.gz

Best, Daniel
