This article covers concepts and terminology frequently used when discussing how to accelerate the execution of GATK tools and pipelines (and genomics tools in general).
When we're talking about achieving faster speeds through hardware upgrades, we're talking about three types of things: "generically" better hardware, "normal" hardware for which there are software optimizations available, and specialized "alphabet soup" processors like GPUs, FPGAs, and TPUs. Read on if you'd like to understand better (with minimal jargon) what the key concepts are and why some types of hardware yield better performance.
Most tools will benefit from being run on hardware that's generically better, e.g. faster processors that can perform more calculations per unit of time, more cores (for multithreaded parallelism), more memory, or faster storage.
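To make the "more cores" point concrete, here's a minimal Java sketch (purely illustrative, not GATK code) that runs the same computation serially and then split across all available cores using parallel streams:

```java
import java.util.stream.LongStream;

public class ParallelDemo {
    public static void main(String[] args) {
        long n = 50_000_000L;

        // Serial: a single core walks the entire range.
        long t0 = System.nanoTime();
        long serialSum = LongStream.rangeClosed(1, n).sum();
        long serialMs = (System.nanoTime() - t0) / 1_000_000;

        // Parallel: the range is split into chunks processed on all available cores.
        long t1 = System.nanoTime();
        long parallelSum = LongStream.rangeClosed(1, n).parallel().sum();
        long parallelMs = (System.nanoTime() - t1) / 1_000_000;

        // Same answer either way; only the wall-clock time differs.
        System.out.println("Sums match: " + (serialSum == parallelSum));
        System.out.println("serial: " + serialMs + " ms, parallel: " + parallelMs + " ms");
    }
}
```

The result is identical either way; with more cores available, the parallel version simply finishes sooner (for work that splits cleanly, which this toy example does and real genomics workloads often don't).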
The latter two save on time spent copying data to and from disk, which is called IO (for Input/Output): more memory lets a tool avoid some IO altogether, while faster storage speeds up the IO that remains. The fastest type of disk you can get is an SSD (Solid State Drive); these used to be considered specialized hardware components, but at this point they've become quite common -- in fact there might be one in the laptop you're using right now. SSDs are an attractive alternative to traditional hard drives (a.k.a. HDDs) in large part because they are much faster, so any tool that spends a lot of time reading from and writing to disk will benefit enormously from running on an SSD rather than an HDD.
How much of a speedup you can get for any given analysis tool depends a lot on what the tool does and how it works: how many calculations it needs to perform, how complex they are, and how much data it needs to load into memory for a given operation.
Let me unpack that a bit. Software optimization in general is the art of writing code so that it runs as efficiently as possible (i.e. as fast as possible, in this context). Hardware-specific optimization is when you do that optimization with a specific make and model of hardware in mind.
Without going into the weeds, this is a thing because processors (CPUs, a.k.a. Central Processing Units) each have an "instruction set": the vocabulary of "machine code" operations that the processor actually executes, and into which the software code we write ultimately gets translated. There are subtle differences between the instruction sets of different processor architectures (= model series) -- for example, some provide special vector instructions that apply the same operation to many values at once. The bottom line is that it's possible to make the computation more efficient by tailoring the software code to a processor's particular instruction set.
Now, we write almost all of the GATK code in Java, which is great for us because it's a programming language that makes it (relatively) easy to write a robust, portable program that will run pretty much anywhere. However, it's notoriously not-so-great for performance because it doesn't let the programmer access the low-level controls that are involved in hardware-specific optimizations. For the record, as far as Java is concerned that's a feature, not a bug, because abstracting away the hardware layer is one of the key ways you make software portable. Also, again from our point of view, optimizing for hardware is hard and is not something we're interested in spending time on -- we'd rather leave that to the experts, and keep our focus on delivering the best possible scientific results.
That's why we enjoy our very fruitful collaboration with Intel engineers, who have written optimized versions of key pieces of GATK software (including the PairHMM algorithm, which is involved in calculating genotype likelihoods in HaplotypeCaller and Mutect2) for all recent Intel processor architectures. These optimizations are packaged within GATK itself, along with a system that enables the GATK engine to detect when it's running on hardware for which optimized kernels are available, and to switch automatically to the most appropriate kernel for that hardware. This means that whenever you run an optimized tool on a machine with an Intel processor, GATK gets a speed boost from that alone.
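That detect-and-switch behavior can be pictured with a small hypothetical sketch (illustrative Java only, not actual GATK engine code; all class and method names here are invented):

```java
// Hypothetical sketch of runtime kernel selection; not actual GATK code.
public class KernelSelection {

    // A common interface lets the engine swap implementations freely.
    interface PairHmmKernel {
        String describe();
    }

    // Portable pure-Java fallback that runs anywhere.
    static class JavaKernel implements PairHmmKernel {
        public String describe() { return "pure-Java PairHMM"; }
    }

    // Stand-in for a native, CPU-specific implementation (e.g. vector-accelerated).
    static class NativeKernel implements PairHmmKernel {
        public String describe() { return "native optimized PairHMM"; }
    }

    // A real engine would attempt to load a native library built for the host
    // CPU and fall back to pure Java if the load fails; here we simulate the
    // fallback path.
    static boolean nativeLibraryAvailable() {
        return false;
    }

    static PairHmmKernel fastestAvailable() {
        return nativeLibraryAvailable() ? new NativeKernel() : new JavaKernel();
    }

    public static void main(String[] args) {
        System.out.println("Selected: " + fastestAvailable().describe());
    }
}
```

The design point is that the scientific code only ever talks to the interface, so swapping in a hardware-specific kernel at runtime requires no changes anywhere else.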
Others have written optimizations for GATK tools in the same vein, including IBM (for their POWER series of processors), but those are currently not packaged within the official distribution of GATK, so there's a bit of additional assembly required.
GPUs, FPGAs and TPUs are types of processors that have become very popular for accelerating computation in some subsets of genomic analyses that lend themselves especially well to massive parallelization, like genome mapping. All three types require specific software optimizations to take full advantage of these processors' capabilities. There are currently no such optimizations available within the official GATK distribution, but our collaborators at Intel are working on developing some. We also know of several other groups working in this space; some with the intention of eventually contributing their improvements to the GATK codebase so everyone can use them freely (yay!), while others are commercial competitors of GATK and therefore less inclined toward public benefaction (ahem).
GPU stands for Graphics Processing Unit; also known as a video card, there's almost certainly one in your laptop, handling everything that gets sent to your monitor(s). GPUs are designed and built in a way that's very different from CPUs, and are especially well suited to doing a ton of calculations in parallel really fast. Why? Because that's what's involved in generating graphical representations on a computer monitor, especially if you're going to be watching movies or playing video games.
FPGA stands for Field-Programmable Gate Array; the "Field-Programmable" part refers to the fact that FPGAs are designed to be reprogrammed at will after manufacturing, in contrast to CPUs and GPUs, whose instruction sets are fixed as part of the manufacturing process. Remember the point I made earlier about hardware-specific optimizations, where we have to rewrite the GATK code to suit the processor's instruction set? Well, here it's sort of the opposite: you can actually reprogram the processor to suit the computation you want to run.
TPU stands for Tensor Processing Unit; it's a (relatively) newfangled device designed by Google specifically for running machine learning algorithms that involve handling something called, you guessed it, tensors. According to the GCP blog, that's what Google Search, Street View, Google Photos and Google Translate all run on under the hood. Since everything related to machine learning and neural networks is really hot right now, you can expect to hear a lot more about TPUs as well.