This article has been retired, as the resources it cites are somewhat out of date. For an introduction to GATK and sequence analysis, see the Best Practices section of the website, which contains a lot of intro-level information and references useful resources.
We know this field can be confusing or even overwhelming to newcomers, and getting to grips with a large and varied toolkit like the GATK can be a big challenge. We have produced a presentation that we hope will help you review all the background information that you need to know in order to use the GATK:
In addition, the following links feature a lot of useful educational material about concepts and terminology related to next-generation sequencing:
A basic review of the sequencing process.
An excellent, detailed overview of the myriad next-gen sequencing methodologies.
A nice piece explaining the problems inherent in trying to analyze terabytes of data. The GATK addresses this issue by requiring that all datasets be in reference order, so that only small chunks of the genome need to be in memory at once, as explained here.
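To illustrate why reference ordering keeps memory use small, here is a minimal Python sketch. It is not the GATK's actual implementation: the `Record` type, the `iter_windows` and `count_per_window` helpers, and the fixed `window_size` are all hypothetical names chosen for illustration. It assumes the records are already sorted by contig and position, and it holds only one window of records in memory at a time.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple

# Hypothetical minimal record: (contig name, 1-based position).
Record = Tuple[str, int]


def iter_windows(
    records: Iterable[Record], window_size: int
) -> Iterator[Tuple[str, int, List[Record]]]:
    """Yield (contig, window_index, records_in_window) one window at a time.

    Assumes `records` is already sorted in reference order. groupby only
    merges adjacent items, so unsorted input would split a window into
    several pieces instead of raising an error.
    """
    def window_key(record: Record) -> Tuple[str, int]:
        contig, position = record
        return contig, (position - 1) // window_size

    for (contig, index), group in groupby(records, key=window_key):
        # Only the records of the current window are materialized here.
        yield contig, index, list(group)


def count_per_window(
    records: Iterable[Record], window_size: int
) -> List[Tuple[str, int, int]]:
    """Toy per-window analysis: count the records in each window."""
    return [
        (contig, index, len(chunk))
        for contig, index, chunk in iter_windows(records, window_size)
    ]
```

Because `records` can be any iterable, it could equally be a generator reading a sorted file line by line, in which case the full dataset never needs to be loaded at once.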