MSigDB v5.0 Release Notes

From GeneSetEnrichmentAnalysisWiki
Revision as of 22:37, 16 March 2015 by Liberzon

New collection H: Hallmark signatures

H: Hallmarks is a new collection of 50 gene sets. These gene sets represent specific, well-defined biological states or processes and display coherent expression. The hallmark gene sets were generated by a computational methodology based on identifying gene set overlaps and extracting coherent representatives of them. Details of the procedure will become available after the manuscript describing it is accepted for publication. The hallmark gene sets reduce noise and redundancy and provide a better biological space for GSEA and other gene set-based analyses of genomic data.

We envision this collection as the starting point for exploring the MSigDB resource and GSEA. This initial release of 50 hallmarks condenses information from over 4,000 original overlapping gene sets from the v4.0 MSigDB collections C1 through C6. We refer to the original gene sets as “founder” sets.

Hallmark gene set pages provide links to the corresponding founder sets for more in-depth exploration. In addition, hallmark gene set pages include links to the microarray data that were used to refine and validate the hallmark signatures.

Updates to C2 collection

C2:CP Matrisome gene sets

The CP (Canonical Pathways) sub-collection has 10 new gene sets from the Matrisome Project. The "matrisome" refers to the ensemble of genes encoding extracellular matrix (ECM) and ECM-associated proteins (as defined by Naba and collaborators). The Matrisome Project is a collaborative effort between the laboratory of Richard Hynes at MIT, researchers at the Barbara K. Ostrom (1978) Bioinformatics & Computing Facility at the Koch Institute at MIT, and the Broad Institute, pursuing extensive in silico and experimental characterization of ECM components.

Updates to C2:CGP collection

In response to requests from multiple users of our resource, we removed all 7 gene sets based on the publication in Nat Med 2006 by Patti et al., which has been retracted.

Alerted by sharp-eyed users of MSigDB, we redefined four gene sets based on the publication in Cancer Cell 2010 by Verhaak et al.

At the request of Dr. Durand, we have updated the records of two gene sets he contributed earlier.

We also fixed errors in a number of other gene sets.

All these changes are documented here.

Changes in the XML file format

To accommodate new features in the Hallmarks collection, we have introduced additional attributes for gene set description in the database XML format. The new attributes are:

  • FOUNDER_NAMES = pipe ('|') separated list of v4.0 MSigDB ‘founder’ gene sets
  • REFINEMENT_DATASETS = pipe ('|') separated list of GEO or ArrayExpress identifiers of microarray data used to refine hallmark signatures
  • VALIDATION_DATASETS = pipe ('|') separated list of GEO or ArrayExpress identifiers of microarray data used to validate hallmark signatures
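
As an illustration of how the three new attributes might be read, here is a minimal Python sketch. It rests on several assumptions that these notes do not state: that each gene set is an XML element named GENESET, that it carries a STANDARD_NAME attribute, and that the new fields appear as attributes on that element. The sample record, including its gene set name and dataset identifiers, is made up for demonstration and does not come from MSigDB.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record. The element name GENESET, the STANDARD_NAME
# attribute, and every value below are illustrative assumptions, not data
# taken from the MSigDB XML file.
SAMPLE_XML = """<MSIGDB>
  <GENESET STANDARD_NAME="HALLMARK_EXAMPLE"
           FOUNDER_NAMES="FOUNDER_SET_A|FOUNDER_SET_B"
           REFINEMENT_DATASETS="GSE_PLACEHOLDER_1|E-PLACEHOLDER-2"
           VALIDATION_DATASETS=""/>
</MSIGDB>"""

# The three attributes introduced for the Hallmarks collection.
PIPE_ATTRIBUTES = ("FOUNDER_NAMES", "REFINEMENT_DATASETS", "VALIDATION_DATASETS")


def split_pipe_list(value):
    """Split a pipe-separated attribute value, dropping empty entries."""
    return [item.strip() for item in (value or "").split("|") if item.strip()]


def read_hallmark_attributes(xml_text):
    """Map each gene set name to its pipe-separated attributes as lists."""
    root = ET.fromstring(xml_text)
    records = {}
    for gene_set in root.iter("GENESET"):
        name = gene_set.get("STANDARD_NAME")
        records[name] = {
            attr: split_pipe_list(gene_set.get(attr)) for attr in PIPE_ATTRIBUTES
        }
    return records


if __name__ == "__main__":
    for name, attrs in read_hallmark_attributes(SAMPLE_XML).items():
        print(name, attrs)
```

Treating a missing or empty attribute as an empty list keeps the calling code uniform for gene sets outside the Hallmarks collection, which may not carry these attributes at all.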

For more information, please refer to the detailed description of the MSigDB XML file format here.

Viewing previous versions of MSigDB

Files from previous versions of MSigDB (v4.0, v3.1, v3.0, v2.5, v2.1 and v1.0) are archived and available on the Downloads page. You can view them through the MSigDB Browser tool in the GSEA desktop application.