GATK errors normally come with a message that tries to explain what went wrong. If possible, the message suggests a solution or at least links to a helpful documentation article. But sometimes you can get an error called "ArrayIndexOutOfBoundsException", and it doesn't come with a helpful message. The error message is just a cryptic number. What's up with that?

We can't easily put in a message that suggests a fix because this error can happen with different tools and for different reasons. So there's no one-size-fits-all solution! In this post I'm going to try to explain what the error actually means and what you can do about it.

Internally, GATK tools store most of the data they handle in arrays, which are basically a sort of list in computer jargon. The index is a number that specifies a position in the array. Arrays are created with a fixed size based on the number of elements we want (or expect) to store in them. For example, when we create an array to hold the sequence of a read, we create it to match the exact number of bases in the read. The ArrayIndexOutOfBoundsException error happens when a tool tries to access an index position that does not exist, usually because it points past the end of the array.
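To make that concrete, here is a minimal Java sketch of the situation. It is not GATK code: the class `ArrayBoundsDemo` and the method `baseAt` are made up for illustration. One thing to keep in mind is that Java counts array positions from 0, so a 12-base array has valid positions 0 through 11, and asking for position 12 is already out of bounds.

```java
// Hypothetical demo class for illustration; not part of GATK.
final class ArrayBoundsDemo {

    /** Returns the base at a 0-based index; Java throws if the index is out of range. */
    static char baseAt(char[] bases, int index) {
        return bases[index]; // bounds are checked on every array access
    }

    public static void main(String[] args) {
        // An array sized to hold exactly 12 bases: valid indices are 0..11.
        char[] bases = "ACGTACGTACGT".toCharArray();

        // The last valid position works fine.
        System.out.println("Base at index 11: " + baseAt(bases, 11));

        try {
            // One position past the end of the array.
            baseAt(bases, 12);
        } catch (ArrayIndexOutOfBoundsException e) {
            // The message text depends on the Java version; it may be
            // little more than the offending index.
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```

The exception only reports which position was requested. It has no way of knowing why the tool asked for it, which is why the message on its own is so unhelpful.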

For example, let’s say you have 15 eggs, but you only have an egg carton that can fit 12 eggs. You will not be able to store the 13th, 14th and 15th eggs. As a thinking human, you know you can just get another carton, or store the extra eggs in those little egg holders in the fridge door. But if you're a GATK tool (ahem), when you try to store the 13th egg, there's no room in the carton so the egg falls and you throw an ArrayIndexOutOfBoundsException. Or if you tried to pick a 13th egg out of a 12-egg carton, as a human you would just look silly. But as a GATK tool you would freak out and throw an ArrayIndexOutOfBoundsException again.

Now, let's tie the egg examples into a more realistic sequencing data example, to illustrate why the tool would try to access a nonexistent position in the first place. Let's say we have a read record containing its sequence and corresponding base qualities, which are both stored as arrays of characters. The two arrays are expected to be exactly the same length, so if I'm interested in the 16th base, I can look up its quality score by taking the 16th element in the array of base qualities. But what if the read record is malformed, and only has 12 base qualities despite having a 16-base sequence? You guessed it: I will get an ArrayIndexOutOfBoundsException when I try to look for the 16th quality score.
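The malformed read described above can be sketched in a few lines of Java. Again, this is an illustration under assumptions rather than how GATK actually represents reads: `ReadRecordDemo`, `qualityAt` and `lengthsMatch` are hypothetical names, and the base and quality strings are invented.

```java
// Hypothetical demo class for illustration; not part of GATK.
final class ReadRecordDemo {

    /** Returns the quality character at a 0-based position in the qualities array. */
    static char qualityAt(char[] quals, int position) {
        return quals[position]; // throws if position is outside 0..quals.length-1
    }

    /** True when the sequence and its base qualities have the same length. */
    static boolean lengthsMatch(char[] bases, char[] quals) {
        return bases.length == quals.length;
    }

    public static void main(String[] args) {
        char[] bases = "ACGTACGTACGTACGT".toCharArray(); // 16 bases
        char[] quals = "IIIIIIIIIIII".toCharArray();     // only 12 qualities: malformed

        // A simple sanity check of the kind a validation step performs.
        System.out.println("Lengths match: " + lengthsMatch(bases, quals));

        try {
            // The 16th base lives at index 15, but quals only goes up to index 11.
            qualityAt(quals, 15);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```

Checking that the two lengths agree before doing any lookups is the kind of test that catches this problem early, with a clearer explanation than the exception gives.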

That was an example of bad data formatting. These errors can also be caused by miscalculations due to bugs, of course. But if the tool is working for everyone else except you, maybe the problem is with your data! So your first step in troubleshooting this sort of error should be to validate all your data files (Picard ValidateSamFile for BAMs, GATK ValidateVariants for VCFs). If the validations are all okay, then try with the latest version of GATK. If the error still occurs, let us know in the forum and we'll help you figure it out!

Fri 19 Feb 2016