This may seem crazy considering we released the big 3.0 version not two weeks ago, but yes, we have a new version for you already! It's a bit of a special case because this release is all about the hardware-based optimizations we had previously announced. What we hadn't announced yet was that this is the fruit of a new collaboration with a team at Intel (which you can read more about here), so we were waiting for everyone to be ready for the big reveal.


Intel inside GATK

So basically, the story is that we've started collaborating with the Intel Bio Team to enable key parts of the GATK to run more efficiently on certain hardware configurations. For our first project together, we tackled the PairHMM algorithm, which is responsible for a large proportion of the runtime of HaplotypeCaller analyses. The resulting optimizations, which are the main feature in version 3.1, produce significant speedups for HaplotypeCaller runs on a wide range of hardware.

We will continue working with Intel to further improve the performance of GATK tools that have historically been afflicted with performance issues and long runtimes (hello BQSR). As always, we hope these new features will make your life easier, and we welcome your feedback in the forum!

In practice

Note that these optimizations currently work on Linux systems only, and will not work on Mac or Windows operating systems. In the near future we will add support for Mac OS. We have no plans to add support for Windows since the GATK itself does not run on Windows.

Please note also that to take advantage of these optimizations, you need to opt in by adding the following flag to your GATK command: -pairHMM VECTOR_LOGLESS_CACHING.
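For example, a minimal HaplotypeCaller command with the opt-in flag might look like this (a sketch only; the reference, BAM and output file names are placeholders for your own files):

$ java -jar GenomeAnalysisTK.jar \
    -T HaplotypeCaller \
    -R reference.fasta \
    -I sample.bam \
    -o output.vcf \
    -pairHMM VECTOR_LOGLESS_CACHING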

Here is a handy little table of the speedups you can expect depending on the hardware and operating system you are using. The configurations given here are the minimum requirements for benefiting from the expected speedup ranges shown in the third column. Keep in mind that these numbers are based on tests in controlled conditions; in the wild, your mileage may vary.

Linux kernel version   | Architecture / Processor                             | Expected speedup | Instruction set
Any 64-bit Linux       | Any x86 64-bit                                       | 1-1.5x           | Non-vector
Linux 2.6 or newer     | Penryn (Core 2 or newer)                             | 1.3-1.8x         | SSE 4.1
Linux 2.6.30 or newer  | Sandy Bridge (i3, i5, i7, Xeon E3, E5, E7 or newer)  | 2-2.5x           | AVX

To find out exactly which processor is in your machine, you can run this command in the terminal:

$ cat /proc/cpuinfo | grep "model name"                                                                                    
model name  : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
model name  : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
model name  : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
model name  : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
model name  : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
model name  : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
model name  : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
model name  : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz

In this example, the machine has 4 cores (8 threads), so you see the answer 8 times. With the model name (here i7-2600) you can look up your hardware's relevant capabilities in the Wikipedia page on vector extensions.
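If you would rather check the supported instruction sets directly instead of looking up the model number, the "flags" field in /proc/cpuinfo lists them. Here is one quick way to do it (a minimal sketch; sse4_1 and avx are the flag names to look for):

$ grep -m 1 flags /proc/cpuinfo | grep -o -w -e sse4_1 -e avx
sse4_1
avx

If both flags are printed, your CPU supports AVX and you can expect the corresponding speedup range from the table above; if only sse4_1 is printed, expect the SSE 4.1 range.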

Alternatively, Intel has provided us with some links to lists of processors categorized by architecture, in which you can look up your hardware:

Penryn processors

Sandy Bridge processors

Finally, a few notes to clarify some concepts regarding Linux kernels vs. distributions and processors vs. architectures:

  • Sandy Bridge and Penryn are microarchitectures: CPU designs that determine, among other things, which vector instruction sets (such as SSE 4.1 and AVX) are available. Core 2, Core i3, i5, i7 and Xeon E3, E5, E7 are processor families that implement a given microarchitecture and can therefore make use of the corresponding improvements (see table above).

  • The Linux kernel version is independent of the Linux distribution (e.g. Ubuntu, RedHat, etc.). Any distribution can use any kernel it wants; each distribution ships with a default kernel, but covering that is beyond the scope of this article (there are at least 300 Linux distributions out there), and you can always install whatever kernel version you want.

  • Kernel version 2.6.30 was released in 2009, so we expect every sane person or IT department out there to be running something newer than that. You can check which kernel you have with uname -r, as shown just after this list.
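As a quick check (a minimal sketch; the version string shown here is just an example and will differ on your system), uname prints the version of the kernel you are actually running, independently of which distribution you use:

$ uname -r
2.6.32-431.3.1.el6.x86_64

As long as the reported version is 2.6.30 or newer, you meet the kernel requirement for the AVX speedup in the table above.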


blueskypy


Sweet!!! This is what I have :), expecting a 2X speedup!

-bash-4.1$ cat /proc/cpuinfo | grep "model name"
model name : Intel(R) Xeon(R) CPU X5672 @ 3.20GHz
model name : Intel(R) Xeon(R) CPU X5672 @ 3.20GHz
model name : Intel(R) Xeon(R) CPU X5672 @ 3.20GHz
model name : Intel(R) Xeon(R) CPU X5672 @ 3.20GHz
model name : Intel(R) Xeon(R) CPU X5672 @ 3.20GHz
model name : Intel(R) Xeon(R) CPU X5672 @ 3.20GHz
model name : Intel(R) Xeon(R) CPU X5672 @ 3.20GHz
model name : Intel(R) Xeon(R) CPU X5672 @ 3.20GHz
model name : Intel(R) Xeon(R) CPU X5672 @ 3.20GHz
model name : Intel(R) Xeon(R) CPU X5672 @ 3.20GHz
model name : Intel(R) Xeon(R) CPU X5672 @ 3.20GHz
model name : Intel(R) Xeon(R) CPU X5672 @ 3.20GHz
model name : Intel(R) Xeon(R) CPU X5672 @ 3.20GHz
model name : Intel(R) Xeon(R) CPU X5672 @ 3.20GHz
model name : Intel(R) Xeon(R) CPU X5672 @ 3.20GHz
model name : Intel(R) Xeon(R) CPU X5672 @ 3.20GHz

Tue 18 Mar 2014

blueskypy


which GATK commands should `-pairHMM VECTOR_LOGLESS_CACHING` be added to?

Tue 18 Mar 2014

Geraldine_VdAuwera


HaplotypeCaller commands. In future we plan to enable other tools to take advantage of hardware optimizations. This is the objective of our budding collaboration with Intel.

Tue 18 Mar 2014

TechnicalVault


Could we get a "Instruction Set" and corresponding cpuinfo flags added to that table? It's easier than trying to remember than which Intel Processor came in what order.

Tue 18 Mar 2014

Geraldine_VdAuwera


Hah, sure -- that was in the original draft but we removed it because we didn't think people would want to know. But happy to add it back.

Tue 18 Mar 2014

whiteering


Is the speedup of 2-2.5x on AVX-enabled machines for HaplotypeCaller only, or for the whole GATK pipeline? According to the poster presented at AGBT, 35x and 720x speedups are expected for HaplotypeCaller on AVX-enabled Intel Xeon machines with 1 core and 24 cores, respectively. Would you please clarify the situation in a bit more detail?

Tue 18 Mar 2014

Geraldine_VdAuwera


@whiteering‌, the speedups available in 3.1 only affect the HaplotypeCaller. In future we will have speedups for other parts of the pipeline, but it will be a while yet before we can deliver those.

Tue 18 Mar 2014

blueskypy


I see the following "note" from HC with `-pairHMM VECTOR_LOGLESS_CACHING` > FTZ enabled - may decrease accuracy if denormal numbers encountered > Using SSE4.1 accelerated implementation of PairHMM Should users be worried about the "may decrease accuracy" part?

Tue 18 Mar 2014

Geraldine_VdAuwera


@blueskypy No, you don't need to worry about this at all. It's a leftover development note and will be removed in the next version.

Tue 18 Mar 2014

adouble2


We don't seem to see a significant speedup when running HaplotypeCaller with -pairHMM VECTOR_LOGLESS_CACHING. We seem to meet the requirements (Xeon CPU E5-2670, AVX, Linux 2.6.32), but the performance actually decreased (89 minutes without the pairHMM flag, 90 minutes with). Is there something else that could keep us from seeing a 2x speed-up? Below is the edited output of the run with pairHMM just in case you spot something that I should have noticed.

INFO 16:58:49,340 HelpFormatter - Program Args: -T HaplotypeCaller -R /ifs/data/bio/assemblies/H.sapiens/hg19/hg19.fasta -L chr2 --dbsnp data/dbsnp_135.hg19__ReTag.vcf --downsampling_type NONE --annotation AlleleBalanceBySample --annotation ClippingRankSumTest --read_filter BadCigar --num_cpu_threads_per_data_thread 12 --out TEST_CHR2_HaplotypeCaller.vcf -I TEST_group_1_CHR2_indelRealigned_recal.bam -I TEST_group_2_CHR2_indelRealigned_recal.bam -I TEST_group_3_CHR2_indelRealigned_recal.bam -I TEST_group_4_CHR2_indelRealigned_recal.bam -pairHMM VECTOR_LOGLESS_CACHING
INFO 16:58:49,342 HelpFormatter - Executing as m@n06 on Linux 2.6.32-431.3.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_25-b15.
INFO 16:58:49,342 HelpFormatter - Date/Time: 2014/04/04 16:58:49
INFO 16:58:49,342 HelpFormatter - --------------------------------------------------------------------------------
INFO 16:58:49,342 HelpFormatter - --------------------------------------------------------------------------------
INFO 16:58:49,783 GenomeAnalysisEngine - Strictness is SILENT
INFO 16:58:49,876 GenomeAnalysisEngine - Downsampling Settings: No downsampling
INFO 16:58:49,882 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 16:58:49,966 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.08
INFO 16:58:50,027 HCMappingQualityFilter - Filtering out reads with MAPQ < 20
INFO 16:58:50,327 IntervalUtils - Processing 243199373 bp from intervals
INFO 16:58:50,341 MicroScheduler - Running the GATK in parallel mode with 12 total threads, 12 CPU thread(s) for each of 1 data thread(s), of 32 processors available on this machine
INFO 16:58:50,478 GenomeAnalysisEngine - Preparing for traversal over 4 BAM files
INFO 16:58:51,173 GenomeAnalysisEngine - Done preparing for traversal
INFO 16:58:51,173 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 16:58:51,173 ProgressMeter - Location processed.active regions runtime per.1M.active regions completed total.runtime remaining
INFO 16:58:51,305 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units
WARN 16:58:54,452 VectorLoglessPairHMM - WARNING: the VectorLoglessPairHMM is an experimental implementation still under active development. Use at your own risk!
FTZ enabled - may decrease accuracy if denormal numbers encountered
Using AVX accelerated implementation of PairHMM
WARN 16:58:54,510 VectorLoglessPairHMM - WARNING: the VectorLoglessPairHMM is an experimental implementation still under active development. Use at your own risk!
...
INFO 18:29:01,584 ProgressMeter - chr2:243199373 0.00e+00 90.2 m 8945.8 w 100.0% 90.2 m 0.0 s
WARN 18:29:05,339 VectorLoglessPairHMM - WARNING: the VectorLoglessPairHMM is an experimental implementation still under active development. Use at your own risk!
Time spent in setup for JNI call : 0.0
Total compute time in PairHMM computeLikelihoods() : 0.0
INFO 18:29:05,340 HaplotypeCaller - Ran local assembly on 68951 active regions
INFO 18:29:05,884 ProgressMeter - done 2.43e+08 90.2 m 22.0 s 100.0% 90.2 m 0.0 s
INFO 18:29:05,884 ProgressMeter - Total runtime 5414.71 secs, 90.25 min, 1.50 hours

Tue 18 Mar 2014

Kurt


I have this same model as well and also didn't see the speedup, but did see a speed up on older nodes/models that were SSE enabled.

Tue 18 Mar 2014

Kurt


I'll have to clear it with the powers that be tomorrow, but I should be able to provide you with a bam file or more since some of them are hapmap samples. We have an aspera license, so it might be faster to download them through that mechanism once we put them up on that server.

Tue 18 Mar 2014

Carneiro


Can any of you share the dataset you are working on so we can try to reproduce it here? In your logs, it seems like it didn't use the AVX version at all, since "Total compute time in PairHMM computeLikelihoods() : 0.0". Something must be wrong. I'm guessing it may have to do with the fact that this is an AMD machine and we haven't tested the platform identification on AMD (although it's supposed to be standardized...): "Executing as m@n06 on Linux 2.6.32-431.3.1.el6.x86_64 amd64"

Tue 18 Mar 2014

Kurt


@Carneiro I'm waiting for my IT group to put a few bam files up on our aspera server. In the meantime, I went back through the logs again and found that anytime -nct is used, the log always reflects "Total compute time in PairHMM computeLikelihoods() : 0.0". It appears this is also why that is in @adouble2's log. Anytime -nct is not specified (whether opting in with -pairHMM VECTOR_LOGLESS_CACHING or not), that entry is calculated and placed into the log. However, I still do not see a speedup on the AVX-enabled CPU models, while on the SSE-enabled models I did see a 20% increase.

Currently in my pipeline I set both -nct and -pairHMM because I did not see a decrease in wall clock time, and I figured I might as well set them both in case this combination (-nct + -pairHMM) becomes enabled in 3.2. Both the SSE- and AVX-enabled nodes come out as "Executing as ...kernel amd64". However, for the Xeon CPU E5-2670 I do not get "Using AVX accelerated implementation of PairHMM" like @adouble2 did in the log, but for my older models I do get "Using SSE4.1 accelerated implementation of PairHMM".

Tue 18 Mar 2014

Kurt


@Carneiro‌, @Geraldine_VdAuwera, Just sent you an email regarding the files. Well, apparently I had the wrong email address for Mauricio, but I cc'd Geraldine on it. Best Regards, Kurt

Tue 18 Mar 2014

kgururaj


@Kurt‌ Hi Kurt, Mauricio forwarded me the log messages from your runs. For the SSE run, it seems the library loaded and the vector code executed correctly (SSE_NODE_LOGS/SSE.log). However, for the AVX logs (AVX_NODE_LOGS/200459444@0123857183.HAPLOTYPE.CALLER.AVX.log), the library appears not to have loaded at all. The HaplotypeCaller falls back to Java mode, which is why you do not see any performance improvement (slower compared to SSE).

I would suggest running the HaplotypeCaller with some debug messages printed to see why the library was not loaded. Can you try running it on the AVX system with the additional argument "-l DEBUG"? You do not need to run the full HaplotypeCaller; the library is loaded within the first two minutes and will print out information about whether it was loaded or not. A quick check to see whether the library was loaded is to run: grep accelerated log_file

Also, the time printed in the log files, "Total compute time in PairHMM computeLikelihoods() : time_in_seconds", is valid only when using a single thread (no -nct option). From your logs (NOTHING.log), when only Java is used for PairHMM (no vectorization): Total compute time in PairHMM computeLikelihoods() : 5752.94 out of a total time of 18222.16. Thus, PairHMM consumed less than one-third of the total time. For SSE.log: Total compute time in PairHMM computeLikelihoods() : 1722.0774865170001 out of a total time of 14287.14. Although the vectorized PairHMM kernel ran more than 3 times as fast as the Java kernel, the overall speedup is relatively small since the other parts of HaplotypeCaller ran at the same speed as before.

Tue 18 Mar 2014

Kurt


Sure thing Karthik, @kgururaj‌, I will let you know sometime this weekend/early next week. Best, Kurt

Tue 18 Mar 2014

Kurt


@kgururaj‌, this is the only thing that I can see so far in regards to the library not being loaded:

DEBUG 07:29:45,885 VectorLoglessPairHMM - libVectorLoglessPairHMM not found in JVM library path - trying to unpack from StingUtils.jar
DEBUG 07:29:45,890 PairHMMLikelihoodCalculationEngine$1 - Failed to load native library for VectorLoglessPairHMM - using Java implementation of LOGLESS_CACHING

Tue 18 Mar 2014

kgururaj


@Kurt‌ Thanks for the log - the native library failed to load, which is why you do not see any speedup. A couple of checks:

1. A sanity check, just to be on the safe side: see whether the GATK jar file contains the native library:
jar tf target/GenomeAnalysisTK.jar | grep libVector
You should see: org/broadinstitute/sting/utils/pairhmm/libVectorLoglessPairHMM.so

2. At runtime, the AVX library file is unbundled from the GATK jar and loaded. While the HaplotypeCaller is running on the server, do you see a file "/tmp/libVectorLoglessPairHMM*.so"? This file should be created by Java when it tries to load the library (assuming you have write permissions for the /tmp directory). The library file should be available after the log file shows the following warning message: "VectorLoglessPairHMM - WARNING: the VectorLoglessPairHMM is an experimental implementation". Note that Java deletes the library file when it terminates, so you should check *while* the HaplotypeCaller is running and *after* the warning message listed above appears in the log.

Tue 18 Mar 2014

croshong


While using VectorLoglessPairHMM in HaplotypeCaller, I see more than a 2X speedup, but the log file always emits a warning like this:

WARN 06:18:21,666 VectorLoglessPairHMM - WARNING: the VectorLoglessPairHMM is an experimental implementation still under active development. Use at your own risk!

Does this mean that VectorLoglessPairHMM is still under development and there is some possible danger in using this option?

Tue 18 Mar 2014

Sheila


@croshong‌ Hi, There is nothing to worry about now. The warning will be removed in the next version. -Sheila

Tue 18 Mar 2014

Kurt


I have this same model and also didn't see any speed-up, but did on older nodes that had SSE. http://gatkforums.broadinstitute.org/discussion/3965/nct-settings-and-vector-logless-caching#latest

Tue 18 Mar 2014





