## Version highlights for GATK version 3.1

### Posted by Geraldine_VdAuwera on 18 Mar 2014

This may seem crazy considering we released the big 3.0 version not two weeks ago, but yes, we have a new version for you already! It's a bit of a special case because this release is all about the hardware-based optimizations we had previously announced. What we hadn't announced yet was that this is the fruit of a new collaboration with a team at Intel (which you can read more about here), so we were waiting for everyone to be ready for the big reveal.

### Intel inside GATK

So basically, the story is that we've started collaborating with the Intel Bio Team to enable key parts of the GATK to run more efficiently on certain hardware configurations. For our first project together, we tackled the PairHMM algorithm, which is responsible for a large proportion of the runtime of HaplotypeCaller analyses. The resulting optimizations, which are the main feature in version 3.1, produce significant speedups for HaplotypeCaller runs on a wide range of hardware.
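For background, the PairHMM computes, for every read-haplotype pair, the likelihood of the read given the haplotype using a forward algorithm over match, insertion, and deletion states; this inner dynamic-programming loop is what the vectorization targets. Below is a heavily simplified Python sketch of that recurrence, with constant made-up parameters; the real GATK implementation uses per-base quality scores and runs the computation in log space or hardware-specific vector code.

```python
def pairhmm_forward(haplotype, read, p_match=0.9, gap_open=0.05, gap_ext=0.1):
    """Simplified pair-HMM forward algorithm: P(read | haplotype).

    Illustrative sketch only: parameters are made up, and the real GATK
    PairHMM derives emission probabilities from per-base qualities.
    """
    n, m = len(haplotype), len(read)
    # fM/fI/fD[i][j]: forward probability of aligning the first i haplotype
    # bases with the first j read bases, ending in Match/Insert/Delete state.
    fM = [[0.0] * (m + 1) for _ in range(n + 1)]
    fI = [[0.0] * (m + 1) for _ in range(n + 1)]
    fD = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Uniform prior over where the read starts on the haplotype.
    for i in range(n + 1):
        fD[i][0] = 1.0 / (n + 1)
    t_mm = 1.0 - 2 * gap_open  # match -> match transition
    t_gm = 1.0 - gap_ext       # gap -> match transition
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            emit = p_match if haplotype[i - 1] == read[j - 1] else (1 - p_match) / 3
            fM[i][j] = emit * (t_mm * fM[i - 1][j - 1]
                               + t_gm * (fI[i - 1][j - 1] + fD[i - 1][j - 1]))
            # Insert consumes a read base (uniform emission over 4 bases).
            fI[i][j] = 0.25 * (gap_open * fM[i][j - 1] + gap_ext * fI[i][j - 1])
            # Delete consumes a haplotype base, no emission.
            fD[i][j] = gap_open * fM[i - 1][j] + gap_ext * fD[i - 1][j]
    # Sum over all haplotype end positions with the full read consumed.
    return sum(fM[i][m] + fI[i][m] for i in range(n + 1))
```

Because this double loop runs for every read against every candidate haplotype, it dominates HaplotypeCaller runtime, which is why it was the first target for vectorization.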

We will continue working with Intel to further improve the performance of GATK tools that have historically been afflicted with performance issues and long runtimes (hello BQSR). As always, we hope these new features will make your life easier, and we welcome your feedback in the forum!

### In practice

Note that these optimizations currently work on Linux systems only, and will not work on Mac or Windows operating systems. In the near future we will add support for Mac OS. We have no plans to add support for Windows since the GATK itself does not run on Windows.

Please note also that to take advantage of these optimizations, you need to opt in by adding the following flag to your GATK command: `-pairHMM VECTOR_LOGLESS_CACHING`.
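For example, a HaplotypeCaller invocation with the opt-in flag might look like the following; the reference, input, and output file names here are placeholders, not from the original post:

```shell
# Hypothetical example; substitute your own reference, BAM, and output paths.
java -jar GenomeAnalysisTK.jar \
    -T HaplotypeCaller \
    -R reference.fasta \
    -I sample.bam \
    -o sample.vcf \
    -pairHMM VECTOR_LOGLESS_CACHING
```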

Here is a handy little table of the speedups you can expect depending on the hardware and operating system you are using. The configurations given here are the minimum requirements for benefiting from the expected speedup ranges shown in the third column. Keep in mind that these numbers are based on tests in controlled conditions; in the wild, your mileage may vary.

| Linux kernel version | Architecture / Processor | Expected speedup | Instruction set |
| --- | --- | --- | --- |
| Any 64-bit Linux | Any x86 64-bit | 1-1.5x | Non-vector |
| Linux 2.6 or newer | Penryn (Core 2 or newer) | 1.3-1.8x | SSE 4.1 |
| Linux 2.6.30 or newer | SandyBridge (i3, i5, i7, Xeon E3, E5, E7 or newer) | 2-2.5x | AVX |

To find out exactly which processor is in your machine, you can run this command in the terminal:
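One option, assuming a standard Linux system, is to inspect `/proc/cpuinfo`; the feature flags show whether SSE 4.1 (`sse4_1`) or AVX (`avx`) is available:

```shell
# Show the CPU model name.
grep -m 1 "model name" /proc/cpuinfo
# Show the instruction-set flags; look for sse4_1 or avx in the output.
grep -m 1 flags /proc/cpuinfo
```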

#### blueskypy

Which GATK commands should `-pairHMM VECTOR_LOGLESS_CACHING` be added to?

#### Geraldine_VdAuwera

HaplotypeCaller commands. In future we plan to enable other tools to take advantage of hardware optimizations. This is the objective of our budding collaboration with Intel.

#### TechnicalVault

Could we get an "Instruction Set" column and the corresponding cpuinfo flags added to that table? It's easier than trying to remember which Intel processor came in what order.

#### Geraldine_VdAuwera

Hah, sure -- that was in the original draft but we removed it because we didn't think people would want to know. But happy to add it back.

#### whiteering

Is the 2-2.5x speedup on AVX-enabled machines for HaplotypeCaller only, or for the whole GATK pipeline? According to the poster presented at AGBT, 35x and 720x speedups are expected for HaplotypeCaller on AVX-enabled Intel Xeon machines with 1 core and 24 cores, respectively. Would you please clarify the situation in a bit more detail?

#### Geraldine_VdAuwera

@whiteering‌, the speedups available in 3.1 only affect the HaplotypeCaller. In future we will have speedups for other parts of the pipeline, but it will be a while yet before we can deliver those.

#### blueskypy

I see the following "note" from HC with `-pairHMM VECTOR_LOGLESS_CACHING`:

> FTZ enabled - may decrease accuracy if denormal numbers encountered
> Using SSE4.1 accelerated implementation of PairHMM

Should users be worried about the "may decrease accuracy" part?

#### Geraldine_VdAuwera

@blueskypy No, you don't need to worry about this at all. It's a leftover development note, will be removed in the next version.

#### adouble2

We don't seem to see a significant speedup when running HaplotypeCaller with `-pairHMM VECTOR_LOGLESS_CACHING`. We seem to meet the requirements (Xeon CPU E5-2670, AVX, Linux 2.6.32), but the performance actually decreased (89 minutes without the pairHMM flag, 90 minutes with). Is there something else that could keep us from seeing a 2x speedup? Below is the edited output of the run with pairHMM, just in case you spot something that I should have noticed.

```
INFO 16:58:49,340 HelpFormatter - Program Args: -T HaplotypeCaller -R /ifs/data/bio/assemblies/H.sapiens/hg19/hg19.fasta -L chr2 --dbsnp data/dbsnp_135.hg19__ReTag.vcf --downsampling_type NONE --annotation AlleleBalanceBySample --annotation ClippingRankSumTest --read_filter BadCigar --num_cpu_threads_per_data_thread 12 --out TEST_CHR2_HaplotypeCaller.vcf -I TEST_group_1_CHR2_indelRealigned_recal.bam -I TEST_group_2_CHR2_indelRealigned_recal.bam -I TEST_group_3_CHR2_indelRealigned_recal.bam -I TEST_group_4_CHR2_indelRealigned_recal.bam -pairHMM VECTOR_LOGLESS_CACHING
INFO 16:58:49,342 HelpFormatter - Executing as m@n06 on Linux 2.6.32-431.3.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_25-b15.
INFO 16:58:49,342 HelpFormatter - Date/Time: 2014/04/04 16:58:49
INFO 16:58:49,342 HelpFormatter - --------------------------------------------------------------------------------
INFO 16:58:49,342 HelpFormatter - --------------------------------------------------------------------------------
INFO 16:58:49,783 GenomeAnalysisEngine - Strictness is SILENT
INFO 16:58:49,876 GenomeAnalysisEngine - Downsampling Settings: No downsampling
INFO 16:58:49,882 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 16:58:49,966 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.08
INFO 16:58:50,027 HCMappingQualityFilter - Filtering out reads with MAPQ < 20
INFO 16:58:50,327 IntervalUtils - Processing 243199373 bp from intervals
INFO 16:58:50,341 MicroScheduler - Running the GATK in parallel mode with 12 total threads, 12 CPU thread(s) for each of 1 data thread(s), of 32 processors available on this machine
INFO 16:58:50,478 GenomeAnalysisEngine - Preparing for traversal over 4 BAM files
INFO 16:58:51,173 GenomeAnalysisEngine - Done preparing for traversal
INFO 16:58:51,173 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 16:58:51,173 ProgressMeter - Location processed.active regions runtime per.1M.active regions completed total.runtime remaining
INFO 16:58:51,305 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units
WARN 16:58:54,452 VectorLoglessPairHMM - WARNING: the VectorLoglessPairHMM is an experimental implementation still under active development. Use at your own risk!
FTZ enabled - may decrease accuracy if denormal numbers encountered
Using AVX accelerated implementation of PairHMM
WARN 16:58:54,510 VectorLoglessPairHMM - WARNING: the VectorLoglessPairHMM is an experimental implementation still under active development. Use at your own risk!
. . .
INFO 18:29:01,584 ProgressMeter - chr2:243199373 0.00e+00 90.2 m 8945.8 w 100.0% 90.2 m 0.0 s
WARN 18:29:05,339 VectorLoglessPairHMM - WARNING: the VectorLoglessPairHMM is an experimental implementation still under active development. Use at your own risk!
Time spent in setup for JNI call : 0.0
Total compute time in PairHMM computeLikelihoods() : 0.0
INFO 18:29:05,340 HaplotypeCaller - Ran local assembly on 68951 active regions
INFO 18:29:05,884 ProgressMeter - done 2.43e+08 90.2 m 22.0 s 100.0% 90.2 m 0.0 s
INFO 18:29:05,884 ProgressMeter - Total runtime 5414.71 secs, 90.25 min, 1.50 hours
```

#### Kurt

I have this same model as well and also didn't see the speedup, but I did see a speedup on older nodes/models that were SSE-enabled.

#### Kurt

I'll have to clear it with the powers that be tomorrow, but I should be able to provide you with a bam file or more, since some of them are hapmap samples. We have an aspera license, so it might be faster to download them through that mechanism once we put them up on that server.

#### Carneiro

Can any of you share the dataset you are working on so we can try to reproduce it here? In your logs, it looks like it didn't use the AVX version at all, since the log shows "Total compute time in PairHMM computeLikelihoods() : 0.0". Something must be wrong. I'm guessing it may have to do with the fact that this is an AMD machine, and we haven't tested the platform identification on AMD (although it's supposed to be standardized...): "Executing as m@n06 on Linux 2.6.32-431.3.1.el6.x86_64 amd64"

#### Kurt

@Carneiro I'm waiting for my IT group to put a few bam files up on our aspera server. In the meantime, I went back through the logs again and found that any time `-nct` is used, the log always reflects "Total compute time in PairHMM computeLikelihoods() : 0.0". It appears this is also why that entry is in @adouble2's log. Any time `-nct` is not specified (whether opting in to `-pairHMM VECTOR_LOGLESS_CACHING` or not), that entry is calculated and placed into the log.

However, I still do not see a speedup on the AVX-enabled CPU models, while on the SSE-enabled models I did see a 20% increase. Currently in my pipeline I set both `-nct` and `-pairHMM`, because I did not see a decrease in wall clock time and I figured I might as well set them both in case this combination (`-nct` + `-pairHMM`) becomes enabled in 3.2.

Both the SSE- and AVX-enabled nodes come out as "Executing as ...kernel amd64". However, for the Xeon CPU E5-2670 I do not get "Using AVX accelerated implementation of PairHMM" like @adouble2 did in the log, but for my older models I do get "Using SSE4.1 accelerated implementation of PairHMM".

#### Kurt

@Carneiro‌, @Geraldine_VdAuwera, Just sent you an email regarding the files. Well, apparently I had the wrong email address for Mauricio, but I cc'd Geraldine on it. Best Regards, Kurt

#### Kurt

Sure thing Karthik (@kgururaj‌), I will let you know sometime this weekend/early next week. Best, Kurt

#### Kurt

@kgururaj‌, this is the only thing that I can see so far in regards to the library not being loaded:

> DEBUG 07:29:45,885 VectorLoglessPairHMM - libVectorLoglessPairHMM not found in JVM library path - trying to unpack from StingUtils.jar
> DEBUG 07:29:45,890 PairHMMLikelihoodCalculationEngine$1 - Failed to load native library for VectorLoglessPairHMM - using Java implementation of LOGLESS_CACHING

#### kgururaj

@Kurt‌ Thanks for the log - the native library failed to load, which is why you do not see any speedup. A couple of checks:

1. A sanity check, just to be on the safe side: see whether the GATK jar file contains the native library by running `jar tf target/GenomeAnalysisTK.jar | grep libVector`. You should see: `org/broadinstitute/sting/utils/pairhmm/libVectorLoglessPairHMM.so`
2. At runtime, the AVX library file is unbundled from the GATK jar and loaded. While the HaplotypeCaller is running on the server, do you see a file `/tmp/libVectorLoglessPairHMM*.so`? This file should be created by Java when it tries to load the library (assuming you have write permissions for the /tmp directory). The library file should be available after the log shows the warning message "VectorLoglessPairHMM - WARNING: the VectorLoglessPairHMM is an experimental implementation". Note that Java deletes the library file when it terminates, so you should check *while* the HaplotypeCaller is running and *after* the warning message listed above is seen in the log.

#### croshong

While using VectorLoglessPairHMM in HaplotypeCaller I see more than a 2x speedup, but the log always emits a warning like this:

> WARN 06:18:21,666 VectorLoglessPairHMM - WARNING: the VectorLoglessPairHMM is an experimental implementation still under active development. Use at your own risk!

Does this mean that VectorLoglessPairHMM is still under development and there is some possible danger in using this option?

#### Sheila

@croshong‌ Hi, There is nothing to worry about now. The warning will be removed in the next version. -Sheila

#### Kurt

I have this same model and also didn't see any speed-up, but did on older nodes that had SSE. http://gatkforums.broadinstitute.org/discussion/3965/nct-settings-and-vector-logless-caching#latest
