MSigDB Acknowledgements

From GeneSetEnrichmentAnalysisWiki
Revision as of 11:56, 25 April 2006 by Hkuehn

The Molecular Signatures Database (MSigDB) is a collection of gene sets maintained by the GSEA team. The team welcomes and appreciates contributions to this shared resource and encourages users to submit their gene sets to gsea@broad.mit.edu. The MSigDB contains four categories of gene sets. Our thanks to the contributors.

Category C1, Positional (chromosomal location): Contains 24 gene sets corresponding to the genes on each of the 24 human chromosomes, as well as 301 sets corresponding to cytogenetic bands. These gene sets can be helpful in identifying effects related to epigenetic silencing, dosage compensation, copy number polymorphisms, and aneuploidy or other chromosomal deletions/amplifications.

Category C2, Curated (functional): Contains gene sets of metabolic and signaling pathways drawn from the following publicly available, manually curated databases:

  1. BioCarta: http://www.biocarta.com/
  2. Signaling pathway database: http://www.grt.kyushu-u.ac.jp/spad/menu.html
  3. Signaling gateway: http://www.signaling-gateway.org/
  4. Signal transduction knowledge environment: http://stke.sciencemag.org/
  5. Human protein reference database: http://www.hprd.org/
  6. GenMAPP: http://www.genmapp.org/
  7. Gene ontology: http://www.geneontology.org/
  8. Sigma-Aldrich pathways: http://www.sigmaaldrich.com/Area_of_Interest/Biochemicals/Enzyme_Explorer/Key_Resources.html
  9. Gene arrays, BioScience corporation: http://www.superarray.com/
  10. Human cancer genome anatomy consortium: http://cgap.nci.nih.gov/
  11. L2L, John Newman and Alan Weiner, Department of Biochemistry, University of Washington: http://depts.washington.edu/l2l/

In addition, this category contains gene sets representing gene expression signatures of genetic and chemical perturbations, culled from experimental results in the literature.

Several colleagues have contributed to this effort, including:

Jean-Pierre Bourquin (Orkin lab)
Ben Ebert (Golub lab)
Yujin Hoshida (Golub lab)
Jean Junior (Student)
John Newman (L2L, University of Washington)
Kate Stafford (MIT, UROP)
Lisa Sturla (Pomeroy lab)

For the source or contributor of a gene set, please see its annotation in the GeneSetCard or in the MSigDB Browser included in the GSEA Java desktop application.

Category C3, Motif (motif-based): Each gene set in this category contains genes that lie downstream of a motif that is conserved across the human, mouse, rat, and dog genomes. The motifs are catalogued in Xie et al., 2005 and its Supplemental Information, and represent known or likely regulatory elements in promoters and 3'-untranslated regions. Please cite Xie et al. if using this category.

Category C4, Computed (correlational): Correlation gene sets are groups of co-expressed genes, defined by computationally mining large-scale experimental datasets and built from the expression neighborhoods of ~400 cancer-related genes (seeds).
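
To illustrate the idea of a seed's neighborhood, here is a minimal sketch, not the procedure the GSEA team actually used. This page does not state the similarity measure or the neighborhood size, so the sketch assumes Pearson correlation and a fixed top-k cutoff. The gene names, the expression values, and the helper functions pearson and neighborhood are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant profile has no defined correlation; treat it as 0.
        return 0.0
    return cov / sqrt(var_x * var_y)


def neighborhood(expression, seed, size):
    """Return the `size` genes whose profiles correlate best with `seed`.

    `expression` maps a gene name to its expression profile across samples.
    Assumption: "neighborhood" means the top-k genes by Pearson correlation.
    """
    seed_profile = expression[seed]
    scored = [
        (pearson(seed_profile, profile), gene)
        for gene, profile in expression.items()
        if gene != seed
    ]
    # Highest correlation first; ties broken alphabetically by gene name.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [gene for _, gene in scored[:size]]


# Hypothetical toy data: one seed gene and four candidates over four samples.
expression = {
    "SEED1": [1.0, 2.0, 3.0, 4.0],
    "GENE_A": [2.0, 4.0, 6.0, 8.1],   # rises with the seed
    "GENE_B": [4.0, 3.0, 2.0, 1.0],   # anti-correlated with the seed
    "GENE_C": [1.0, 3.0, 2.0, 4.0],   # loosely follows the seed
    "GENE_D": [5.0, 5.0, 5.0, 5.0],   # flat profile
}

gene_set = neighborhood(expression, "SEED1", 2)
print(gene_set)
```

With this toy data, the resulting set holds the two genes that track the seed most closely. A real analysis would need to choose the datasets, the similarity measure, and the cutoff deliberately.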


Each gene set in the MSigDB is fully described by a Gene Set Card. For the notion of Gene Set Cards, our thanks to Rebhan, M., Chalifa-Caspi, V., Prilusky, J., Lancet, D.: GeneCards: encyclopedia for genes, proteins and diseases. Weizmann Institute of Science, Bioinformatics Unit and Genome Center (Rehovot, Israel), 1997. World Wide Web URL: http://www.genecards.org/.