This collection is an initial release of 50 hallmarks, which condense information from over 4,000 original, overlapping gene sets from v4.0 MSigDB collections C1 through C6. We refer to these original gene sets as "founder" sets.
Hallmark gene set pages provide links to the corresponding founder sets for more in-depth exploration. They also link to the microarray data that were used to refine and validate the hallmark signatures.
To cite your use of the collection, and for further information, please refer to Liberzon A, Birger C, Thorvaldsdóttir H, Ghandi M, Mesirov JP, Tamayo P. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 2015 Dec 23;1(6):417-425.
Each GO term belongs to one of three ontologies: molecular function (MF), cellular component (CC) or biological process (BP). A gene product might be associated with or located in one or more cellular components, be active in one or more biological processes, and perform one or more molecular functions. Each ontology captures a distinct aspect of the gene product.
A GO annotation consists of a GO term associated with a gene product, together with a specific reference describing the work or analysis on which that association is based. Each annotation must also include an evidence code indicating how the annotation to that term is supported (http://geneontology.org/page/guide-go-evidence-codes).
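To make the structure of an annotation concrete, here is a minimal sketch of one annotation record. The class and field names are hypothetical and do not reflect the actual GO annotation file format; the only rules it encodes are the ones stated above: every term belongs to exactly one of the three ontologies, and every annotation carries a reference and an evidence code.

```python
from dataclasses import dataclass
from enum import Enum


class Aspect(Enum):
    """The three GO ontologies a term can belong to."""
    MOLECULAR_FUNCTION = "MF"
    CELLULAR_COMPONENT = "CC"
    BIOLOGICAL_PROCESS = "BP"


@dataclass(frozen=True)
class GOAnnotation:
    """One gene product to GO term association (hypothetical schema, for illustration)."""
    gene: str            # gene product identifier
    go_id: str           # GO term identifier, e.g. "GO:0006915"
    aspect: Aspect       # the single ontology the term belongs to
    reference: str       # source describing the supporting work or analysis
    evidence_code: str   # how the association is supported, e.g. "IDA"

    def __post_init__(self) -> None:
        # Reject annotations that lack a reference or an evidence code.
        if not self.reference:
            raise ValueError("annotation requires a reference")
        if not self.evidence_code:
            raise ValueError("annotation requires an evidence code")
```

For example, `GOAnnotation("TP53", "GO:0006915", Aspect.BIOLOGICAL_PROCESS, "PMID:placeholder", "IDA")` builds a record; the reference value is a placeholder, not a real citation, and passing an empty evidence code raises `ValueError`.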
GO gene sets for very broad categories, such as Biological Process, have been omitted. GO sets with fewer than 10 genes (NCBI Entrez Gene IDs) have also been omitted. We defined two sets as "highly similar" if their Jaccard coefficient was > 0.85. For each pair of highly similar sets, we kept the larger set and repeated the procedure until no such pairs remained.
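The size filter and redundancy pruning can be sketched as follows. This is a minimal illustration, assuming gene sets are given as a dict from set name to a set of gene IDs; the function names are hypothetical, the tie-breaking rule for equal-sized sets (keep the alphabetically first name) is an assumption, and the exclusion of very broad root categories is not modeled.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filter_go_sets(gene_sets: dict, min_size: int = 10, threshold: float = 0.85) -> dict:
    """Drop small sets, then repeatedly resolve highly similar pairs by keeping the larger set."""
    # Step 1: omit sets with fewer than min_size genes.
    kept = {name: genes for name, genes in gene_sets.items() if len(genes) >= min_size}
    # Step 2: repeat until no pair exceeds the similarity threshold.
    while True:
        pair = next(
            ((a, b) for a, b in combinations(sorted(kept), 2)
             if jaccard(kept[a], kept[b]) > threshold),
            None,
        )
        if pair is None:
            return kept
        a, b = pair
        # Keep the larger set; on a tie, keep the alphabetically first name (assumption).
        smaller = b if len(kept[a]) >= len(kept[b]) else a
        del kept[smaller]
```

Re-scanning after each removal mirrors "repeated the procedure until all such pairs were resolved"; it is quadratic per pass, which is acceptable for a sketch but not tuned for the full GO collection.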
We first collect relevant microarray datasets from the immunology literature whose raw data have been deposited in Gene Expression Omnibus (GEO). For each published study, we identify the relevant comparisons (e.g., WT vs. KO; pre- vs. post-treatment) and write brief, biologically meaningful descriptions. All data are processed and normalized in the same way. For each comparison, the resulting gene sets consist of the top or bottom genes ranked by mutual information (FDR < 0.25, or a maximum of 200 genes).
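The gene selection step can be sketched as below. This is a minimal illustration under two stated assumptions: each gene already has a signed association score (positive meaning higher in the first group of the comparison) and an FDR value, and "FDR < 0.25 or a maximum of 200 genes" is read as "genes passing FDR < 0.25, capped at 200 per direction". The `RankedGene` type and `select_up_down` function are hypothetical; the mutual-information ranking and FDR estimation themselves are not shown.

```python
from typing import List, NamedTuple, Tuple


class RankedGene(NamedTuple):
    gene: str
    score: float  # signed association score; positive = up in first group (assumption)
    fdr: float    # false discovery rate for this gene


def select_up_down(
    ranked: List[RankedGene], fdr_cutoff: float = 0.25, max_genes: int = 200
) -> Tuple[List[str], List[str]]:
    """Return (up, down) gene lists: significant genes, strongest first, capped at max_genes."""
    # Top genes: positive score, passing the FDR cutoff, strongest first.
    up = sorted(
        (g for g in ranked if g.score > 0 and g.fdr < fdr_cutoff),
        key=lambda g: g.score,
        reverse=True,
    )
    # Bottom genes: negative score, passing the FDR cutoff, most negative first.
    down = sorted(
        (g for g in ranked if g.score < 0 and g.fdr < fdr_cutoff),
        key=lambda g: g.score,
    )
    return [g.gene for g in up[:max_genes]], [g.gene for g in down[:max_genes]]
```

Each comparison would then contribute two gene sets, one per direction.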
The immunologic signatures collection was generated as part of our collaboration with the Haining Lab at Dana-Farber Cancer Institute and the Human Immunology Project Consortium (HIPC). To cite your use of the collection, and for further information, please refer to Godec J, Tan Y, Liberzon A, Tamayo P, Bhattacharya S, Butte A, Mesirov JP, Haining WN. Compendium of Immune Signatures Identifies Conserved and Species-Specific Biology in Response to Inflammation. Immunity. 2016;44(1):194-206.