Collection Details

C2 collection details

Gene sets in this collection come from such sources as:

  • Online pathway databases: Gene sets representing metabolic and signaling pathways are imported from the online pathway databases listed here.
  • Biomedical literature: Over the past few years, microarray studies have identified signatures of several important biological and clinical states (e.g. cancer metastasis, stem cell characteristics, drug resistance). This collection makes many of these signatures, originally published as tables in a paper, available as gene sets. To do this, we compiled a list of microarray articles with published gene expression signatures and, from each article, extracted one or more gene sets from tables in the main text or supplementary information. Currently, this collection includes gene sets from more than 340 PubMed articles. We are working to create a more automated method of curating gene sets from the literature.
  • L2L: Gene sets compiled from published mammalian microarray studies (Newman and Weiner, Genome Biology 2005, 6(9):R81).
  • MYC Target Gene Database: gene sets curated by Dr. Chi Dang from the MYC Target Gene Database at Johns Hopkins University School of Medicine.

C2: CP collection details

The pathway gene sets are curated from the following online databases:

Name URL/Reference
Pathway Interaction Database
Signaling Gateway
Signal Transduction KE

C4: CGN collection details

This collection is identical to that previously reported in (Subramanian, Tamayo et al. 2005).

Starting with a curated list of 380 cancer-associated genes (Brentani, Caballero et al. 2003, Proc. Natl. Acad. Sci. USA 100, 13418-13423), the authors (Subramanian, Tamayo et al. 2005) mined four expression compendia for correlated gene sets. Gene neighborhoods with <25 genes at a Pearson correlation threshold of 0.8 were omitted, yielding 427 sets. The four compendia are:

  • Human tissue compendium (Novartis): Gene expression profiles from the Novartis normal tissue compendium, as published in Su, A. I., Wiltshire, T., Batalov, S., Lapp, H., Ching, K. A., Block, D., Zhang, J., Soden, R., Hayakawa, M., Kreiman, G., et al. (2004) Proc. Natl. Acad. Sci. USA 101, 6062-6067.
  • Global Cancer Map (Broad Institute): Gene expression profiles from the global cancer map, as published in Ramaswamy, S., Tamayo, P., Rifkin, R., Mukherjee, S., Yeang, C. H., Angelo, M., Ladd, C., Reich, M., Latulippe, E., Mesirov, J. P., et al. (2001) Proc. Natl. Acad. Sci. USA 98, 15149-15154.
  • NCI-60 cell lines (National Cancer Institute): Gene expression profiles from the NCI 60 data set downloaded from the Developmental Therapeutics Program web site. No preprocessing was done other than collapsing probe IDs to gene symbols.
  • Novartis carcinoma compendium (Novartis): Gene expression profiles from the Novartis carcinoma compendium, as published in Su, A. I., Welsh, J. B., Sapinoso, L. M., Kern, S. G., Dimitrov, P., Lapp, H., Schultz, P. G., Powell, S. M., Moskaluk, C. A., Frierson, H. F., Jr., et al. (2001) Cancer Res. 61, 7388-7393.
From the Investigate Gene Sets page, one may use the first three compendia to display gene expression profiles for selected gene sets.
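
The neighborhood construction described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' actual procedure: the function names (pearson, gene_neighborhood, build_neighborhoods) and the input layout (a dict mapping gene symbol to a list of expression values) are hypothetical, the seed gene is counted as a member of its own neighborhood, and only positive correlations at or above the threshold are kept.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile has no defined correlation
    return cov / sqrt(var_x * var_y)


def gene_neighborhood(expression, seed, threshold=0.8):
    """Genes whose profile correlates with the seed at or above threshold.

    Assumption: the seed itself is included (its self-correlation is 1).
    """
    seed_profile = expression[seed]
    return {
        gene
        for gene, profile in expression.items()
        if pearson(seed_profile, profile) >= threshold
    }


def build_neighborhoods(expression, seeds, threshold=0.8, min_size=25):
    """One neighborhood per seed gene; neighborhoods below min_size are omitted."""
    neighborhoods = {}
    for seed in seeds:
        if seed not in expression:
            continue  # seed gene not measured in this compendium
        members = gene_neighborhood(expression, seed, threshold)
        if len(members) >= min_size:
            neighborhoods[seed] = members
    return neighborhoods
```

Calling build_neighborhoods once per compendium, with the 380 cancer-associated genes as seeds, would mirror the description above; the defaults of 0.8 and 25 are the values stated in the text.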

C5 collection details

Gene sets in this collection are derived from the controlled vocabulary of the Gene Ontology (GO) project (The Gene Ontology Consortium. Gene Ontology: tool for the unification of biology. Nature Genet. (2000) 25: 25-29). The gene sets are based on GO terms (gene_ontology_edit.obo, downloaded 1/25/2008) and their associations to human genes (gene2go, downloaded 1/22/2008).

Each GO term belongs to one of the three ontologies: molecular function (MF), cellular component (CC) or biological process (BP). A gene product might be associated with or located in one or more cellular components. It is active in one or more biological processes, during which it performs one or more molecular functions. Each ontology captures a unique aspect of the gene product.

A GO annotation consists of a GO term associated with a specific reference that describes the work or analysis upon which the association between a specific GO term and gene product is based. Each annotation must also include an evidence code to indicate how the annotation to a particular term is supported. Only associations with the following evidence codes are included in MSigDB gene sets: IDA, IPI, IMP, IGI, IEP, ISS, TAS.
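
The evidence-code filter described above can be sketched as follows. This is an illustration only: the function name and the input layout, an iterable of (gene, GO term, evidence code) tuples, are assumptions made for the example and do not reflect the actual gene2go file format or the MSigDB build code.

```python
# Evidence codes named in the text as eligible for MSigDB gene sets.
INCLUDED_EVIDENCE_CODES = frozenset(
    {"IDA", "IPI", "IMP", "IGI", "IEP", "ISS", "TAS"}
)


def gene_sets_from_annotations(annotations):
    """Group genes by GO term, keeping only eligible evidence codes.

    annotations: iterable of (gene, go_term, evidence_code) tuples
    (a hypothetical, simplified layout).
    """
    gene_sets = {}
    for gene, go_term, evidence_code in annotations:
        if evidence_code in INCLUDED_EVIDENCE_CODES:
            gene_sets.setdefault(go_term, set()).add(gene)
    return gene_sets
```

A term whose annotations all carry other evidence codes, for example the electronically inferred IEA, produces no gene set at all under this sketch.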

GO gene sets for very broad categories, such as Biological Process, have been omitted from MSigDB. GO gene sets with fewer than 10 genes have also been omitted. Gene sets with the same members have been resolved based on the GO tree structure: if a parent term has only one child term and their gene sets have the same members, the child gene set is omitted; if the gene sets of sibling terms have the same members, the sibling gene sets are omitted.
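
The size and redundancy rules above can be sketched as below. Several points are assumptions rather than documented behavior: the function name, the representation of the GO tree as a dict from parent term to a list of child terms, applying the size filter before the redundancy rules, and dropping every member of a group of identical siblings. GO terms can have more than one parent; this sketch simply applies the rules once per parent.

```python
def prune_go_gene_sets(gene_sets, children, min_size=10):
    """Apply the size filter and the two redundancy rules.

    gene_sets: dict mapping GO term -> set of genes
    children:  dict mapping parent term -> list of child terms
    """
    # Size filter: omit gene sets with fewer than min_size genes.
    kept = {
        term: frozenset(genes)
        for term, genes in gene_sets.items()
        if len(genes) >= min_size
    }

    dropped = set()
    for parent, kids in children.items():
        # Rule 1: a parent with exactly one child and identical members
        # keeps the parent set and omits the child set.
        if len(kids) == 1:
            child = kids[0]
            if parent in kept and child in kept and kept[parent] == kept[child]:
                dropped.add(child)

        # Rule 2: sibling terms with identical members are omitted
        # (assumption: all siblings in the identical group are dropped).
        by_members = {}
        for kid in kids:
            if kid in kept:
                by_members.setdefault(kept[kid], []).append(kid)
        for group in by_members.values():
            if len(group) > 1:
                dropped.update(group)

    return {term: set(genes) for term, genes in kept.items() if term not in dropped}
```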

C7 collection details

The immunologic signatures collection (also called ImmuneSigDB) is composed of gene sets that represent cell types, states, and perturbations within the immune system. The signatures were generated by manual curation of published studies in human and mouse immunology.

We first capture relevant microarray datasets published in the immunology literature that have raw data deposited in the Gene Expression Omnibus (GEO). For each published study, the relevant comparisons are identified (e.g. WT vs. KO, pre- vs. post-treatment) and brief, biologically meaningful descriptions are created. All data are processed and normalized the same way to identify the gene sets, which correspond to the top or bottom genes (FDR < 0.25 or maximum of 200 genes) ranked by mutual information for each assigned comparison.
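
The selection of top and bottom genes for one comparison can be sketched as follows. The reading of the cutoff is an assumption: genes must pass FDR < 0.25, and each signature is capped at 200 genes. The function name, the input layout, and the use of a signed score (positive for higher in the first condition, negative for lower) are likewise hypothetical, since the text only says genes are ranked by mutual information.

```python
def select_signatures(scored_genes, fdr_cutoff=0.25, max_genes=200):
    """Return (top_genes, bottom_genes) for one comparison.

    scored_genes: iterable of (gene, signed_score, fdr) tuples
    (a hypothetical layout; the sign encodes direction of change).
    """
    # Keep only genes that pass the FDR cutoff.
    passing = [(gene, score) for gene, score, fdr in scored_genes if fdr < fdr_cutoff]

    # Top genes: strongest positive scores first, capped at max_genes.
    up = sorted(
        (item for item in passing if item[1] > 0),
        key=lambda item: item[1],
        reverse=True,
    )[:max_genes]

    # Bottom genes: most negative scores first, capped at max_genes.
    down = sorted(
        (item for item in passing if item[1] < 0),
        key=lambda item: item[1],
    )[:max_genes]

    return [gene for gene, _ in up], [gene for gene, _ in down]
```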

The immunologic signatures collection was generated as part of our collaboration with the Haining Lab at Dana-Farber Cancer Institute and the Human Immunology Project Consortium (HIPC). To cite your use of the collection, and for further information, please refer to: Godec J, Tan Y, Liberzon A, Tamayo P, Bhattacharya S, Butte A, Mesirov JP, Haining WN. Compendium of Immune Signatures Identifies Conserved and Species-Specific Biology in Response to Inflammation. Immunity (2016), published online 12 Jan 2016.