GATK errors normally come with a message that tries to explain what went wrong. If possible, the message suggests a solution or at least links to a helpful documentation article. But sometimes you can get an error called "ArrayIndexOutOfBoundsException", and it doesn't come with a helpful message. The error message is just a cryptic number. What's up with that?

The reason we can't easily put in a message that suggests a fix is that this error can happen with different tools and for different reasons, so there's no one-size-fits-all solution! In this post I'm going to try to explain what the error actually means and what you can do about it.


Internally, GATK tools store most of the data they handle in arrays, which in computer jargon are basically a sort of list. The index is a number that specifies a position in the array. GATK is written in Java, where indices start at 0, so the last valid index is one less than the size of the array. Arrays are created with a fixed size based on the number of elements we want (or expect) to store in them. For example, when we create an array to hold the sequence of a read, we create it to match the exact number of bases in the read. The ArrayIndexOutOfBoundsException error happens when a tool tries to access an index position that does not exist: one that is equal to or larger than the size of the array (or, less commonly, negative).

For example, let’s say you have 15 eggs, but you only have an egg carton that can fit 12 eggs. You will not be able to store the 13th, 14th and 15th eggs. As a thinking human, you know you can just get another carton, or store the extra eggs in those little egg holders in the fridge door. But if you're a GATK tool (ahem), when you try to store the 13th egg, there's no room in the carton so the egg falls and you throw an ArrayIndexOutOfBoundsException. Or if you tried to pick a 13th egg out of a 12-egg carton, as a human you would just look silly. But as a GATK tool you would freak out and throw an ArrayIndexOutOfBoundsException again.
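To make the carton picture concrete, here is a minimal Java sketch. The 12-base array and the helper `baseAt` are hypothetical illustrations, not GATK code. Because valid indices run from 0 to 11, asking for index 12 (the "13th egg") throws the exception.

```java
import java.nio.charset.StandardCharsets;

public class ArrayBoundsDemo {
    // Hypothetical helper: returns the base at a 0-based index.
    // Java throws ArrayIndexOutOfBoundsException if index < 0 or index >= bases.length.
    static char baseAt(byte[] bases, int index) {
        return (char) bases[index];
    }

    public static void main(String[] args) {
        // A hypothetical 12-base read: the array is created with exactly 12 slots.
        byte[] bases = "ACGTACGTACGT".getBytes(StandardCharsets.US_ASCII);
        System.out.println("length = " + bases.length);
        // Index 11 is the last valid position (the 12th egg).
        System.out.println("12th base = " + baseAt(bases, 11));
        try {
            // Index 12 would be the 13th egg: there is no slot for it.
            baseAt(bases, 12);
        } catch (ArrayIndexOutOfBoundsException e) {
            // The message reports the offending index; its exact wording depends on the Java version.
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Depending on the Java version, the exception message may contain little more than the offending index. That is why, on its own, it can look like just a cryptic number.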

Now, let's tie the egg examples into a more realistic sequencing data example, to illustrate why the tool would try to access a nonexistent position in the first place. Let's say we have a read record containing its sequence and corresponding base qualities, which are both stored as arrays of characters. The two arrays are expected to be exactly the same length, so if I'm interested in the 16th base, I can look up its quality score by taking the 16th element in the array of base qualities. But what if the read record is malformed, and only has 12 base qualities despite having a 16-base sequence? You guessed it: I will get an ArrayIndexOutOfBoundsException when I try to look up the 16th quality score.
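The malformed-read scenario can be sketched in the same way. `ReadRecord` below is a deliberately simplified, hypothetical class. GATK and htsjdk use their own read classes, which differ. Here, bases and qualities are stored as byte arrays of ASCII characters, and the code assumes the two arrays have the same length. Looking up the quality of base 16 computes index 15, which does not exist in a 12-element quality array.

```java
import java.nio.charset.StandardCharsets;

public class MalformedReadDemo {
    // Hypothetical, simplified read record (not a real GATK/htsjdk class).
    static final class ReadRecord {
        final byte[] bases;
        final byte[] qualities; // assumed to have the same length as bases

        ReadRecord(String bases, String qualities) {
            this.bases = bases.getBytes(StandardCharsets.US_ASCII);
            this.qualities = qualities.getBytes(StandardCharsets.US_ASCII);
        }

        // Returns the quality character for the n-th base (1-based, as in the text).
        // Relies on the assumption that qualities.length == bases.length.
        char qualityOfBase(int n) {
            return (char) qualities[n - 1];
        }
    }

    public static void main(String[] args) {
        // Malformed record: 16 bases but only 12 quality characters.
        ReadRecord read = new ReadRecord("ACGTACGTACGTACGT", "IIIIIIIIIIII");
        System.out.println("bases: " + read.bases.length + ", qualities: " + read.qualities.length);
        // Bases 1..12 still have a matching quality, so this works.
        System.out.println("quality of base 12: " + read.qualityOfBase(12));
        try {
            // Base 16 maps to index 15, past the end of the 12-element quality array.
            read.qualityOfBase(16);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

The bug is not in the lookup itself. The lookup is correct for well-formed data, and the broken assumption about the record is what triggers the exception.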

That was an example of bad data formatting. These errors can also be caused by miscalculations due to bugs, of course. But if the tool is working for everyone else except you, maybe the problem is with your data! So your first step in troubleshooting this sort of error should be to validate all your data files (Picard ValidateSamFile for BAMs, GATK ValidateVariants for VCFs). If the validations are all okay, then try with the latest version of GATK. If the error still occurs, let us know in the forum and we'll help you figure it out!
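Validation helps because it checks this kind of consistency up front and can say exactly what is wrong, instead of failing later with a bare index. Below is a minimal, hypothetical check in that spirit, written in Java. It is not how Picard ValidateSamFile is implemented. The method name `checkLengths` and the sample reads are illustrative only.

```java
public class ReadConsistencyCheck {
    // Hypothetical check: returns a descriptive error message if the number of
    // base qualities differs from the number of bases, or null if they agree.
    static String checkLengths(String readName, String bases, String qualities) {
        if (bases.length() != qualities.length()) {
            return "Read " + readName + ": " + bases.length() + " bases but "
                    + qualities.length() + " base qualities";
        }
        return null;
    }

    public static void main(String[] args) {
        // Each entry: read name, bases, qualities (illustrative data only).
        String[][] reads = {
            {"read1", "ACGTACGTACGTACGT", "IIIIIIIIIIIIIIII"},
            {"read2", "ACGTACGTACGTACGT", "IIIIIIIIIIII"}
        };
        for (String[] r : reads) {
            String problem = checkLengths(r[0], r[1], r[2]);
            System.out.println(problem == null ? r[0] + ": OK" : problem);
        }
    }
}
```

Reporting the read name and both lengths points straight at the malformed record. The same mismatch, hit deep inside a tool, would surface only as an out-of-bounds index.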



Fri 19 Feb 2016