Resource Bundle


The GATK resource bundle is a collection of standard files for working with human resequencing data with the GATK. We provide several versions of the bundle, one for each reference build, but be aware that we no longer actively support very old versions (b36/hg18). In addition, we are currently transitioning to support the GRCh38/hg38 reference build, but we have not yet generated all of the files necessary for all use cases (in particular, we are still missing the hg38 version of the Broad's exome intervals).

As of August 2016, the human genome reference builds we support actively are the following:

  • For Best Practices short variant discovery in WGS (uBAM to GVCF): GRCh38/hg38 and b37/hg19
  • For Best Practices short variant discovery in exome and other targeted sequencing: b37/hg19

Please see this article for further details on the content of this resource bundle.


Where the Bundle lives

The resource bundle is hosted on two different platforms: an FTP server and a Google Cloud bucket.

  • The FTP server is intended for people who wish to download the files and run analyses on them locally. It can be accessed easily as indicated below. Its downsides are that it is hosted only at the Broad (no mirrors), that it has tight limits on concurrent downloads, and that users in some countries have reported difficulties accessing it, for example because of firewalls.
  • The Google Cloud bucket is intended primarily for people who plan to run analyses on the Google Cloud, and who can therefore refer to the resource files directly by their bucket paths, without needing to copy or download the files first. The files can also be downloaded directly from the cloud, providing a helpful alternative for those who experience difficulties with the FTP. Note that the bucket currently only contains GRCh38/hg38 resources for WGS analysis, as explained in this article.

Google Cloud bucket

The bucket can be accessed using a regular web browser at the location shown below. Access requires a valid Google account, which can be obtained for free from Google.

https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0/
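
Tools that read from Google Cloud Storage usually expect a gs:// path rather than the browser address shown above. The sketch below derives one from the other; it assumes the usual console URL layout of /storage/browser/<bucket>/<path>, and the function name console_url_to_gs_uri is a hypothetical helper, not part of any GATK or Google tool.

```python
from urllib.parse import urlparse

# Path prefix used by the Cloud Console storage browser (assumed layout).
CONSOLE_PREFIX = "/storage/browser/"


def console_url_to_gs_uri(console_url: str) -> str:
    """Convert a Cloud Console storage-browser URL into a gs:// URI."""
    parsed = urlparse(console_url)
    if (parsed.netloc != "console.cloud.google.com"
            or not parsed.path.startswith(CONSOLE_PREFIX)):
        raise ValueError(f"not a storage browser URL: {console_url}")
    # Everything after the prefix is "<bucket>/<object path>".
    return "gs://" + parsed.path[len(CONSOLE_PREFIX):]


BUNDLE_URL = ("https://console.cloud.google.com/storage/browser/"
              "genomics-public-data/resources/broad/hg38/v0/")
print(console_url_to_gs_uri(BUNDLE_URL))
# prints gs://genomics-public-data/resources/broad/hg38/v0/
```

The resulting path can then be passed to any tool that accepts gs:// locations, for example as the argument to gsutil ls.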


FTP Server Access

To access the bundle on the FTP server, use the following login credentials in your favorite FTP client:

	location: ftp.broadinstitute.org/bundle
	username: gsapubftp-anonymous
	password:

If you use your browser as an FTP client, make sure to include the login information in the address; otherwise you will reach the general Broad Institute FTP instead of our team FTP. This should work as a direct link:

ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/

The bundle/ directory contains five subdirectories, one for each build of the human genome that we have resources for: b36, b37, hg18, hg19 and hg38 (aka GRCh38). Be aware that the hg38 resource set is provided as-is, and its contents may still be incomplete.
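
For scripted access, the same login can be performed with Python's standard ftplib module. This is a minimal sketch, not an official client: it assumes the server is reachable from your network, that the blank password shown above is accepted, and that each build lives in a subdirectory of bundle/ named exactly as listed. The helper names bundle_path and list_build are hypothetical.

```python
from ftplib import FTP

HOST = "ftp.broadinstitute.org"
USER = "gsapubftp-anonymous"
PASSWORD = ""  # the password field is blank in the credentials above
BUILDS = ("b36", "b37", "hg18", "hg19", "hg38")


def bundle_path(build: str) -> str:
    """Return the path of a build's subdirectory, relative to the login directory."""
    if build not in BUILDS:
        raise ValueError(f"unknown build {build!r}; expected one of {BUILDS}")
    return f"bundle/{build}"


def list_build(build: str, timeout: float = 30.0) -> list:
    """Log in and return the file names found in one build's subdirectory."""
    with FTP(HOST, timeout=timeout) as ftp:
        ftp.login(user=USER, passwd=PASSWORD)
        ftp.cwd(bundle_path(build))
        return ftp.nlst()


# Example (requires network access):
#     for name in list_build("hg38"):
#         print(name)
```

Because the server limits concurrent downloads, a script like this should open one connection at a time rather than fetching files in parallel.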