MSigDB v3.0 Release Notes
Revision as of 15:23, 30 July 2010
Major changes in Release 3.0 of the Molecular Signatures Database (MSigDB) include the following:
- The gene sets have been updated; in particular, the C2 gene set collection was extensively reviewed, revised, and expanded
- The XML format of the database has changed
- Entrez IDs are now supported
- A bug in the Compute Overlaps algorithm has been corrected
- The MSigDB v2.5 files have been archived
Gene Sets Update
The following describes the changes made to the gene set collections for MSigDB v3.0.
C1: Positional gene sets
No changes were made in the C1 gene sets. For a description of this collection, see the <a href="http://www.broad.mit.edu/gsea/msigdb/collections.jsp">Browse Collections</a> page.
C2: Curated gene sets (+2,075)
The C2 collection consists of gene sets collected from various sources such as online pathway databases, publications in PubMed, and knowledge of domain experts. Gene sets in this collection have been extensively revised and expanded through a comprehensive search of all articles published in selected, high-profile journals since 2006.
- CGP: chemical and genetic perturbations (3,127 gene sets). See <a href="http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/Msigdb_mapping_v2.5_to_v3">this page</a> for information about MSigDB 2.5 gene sets that have been renamed, retired, recombined, or replaced in the MSigDB 3.0 release. These gene sets have been reviewed extensively, and during the reviewing process, we have applied changes to many existing gene sets and their contents, as follows:
- added exact source of the gene set (e.g., Table 1)
- added GEO or ArrayExpress ID when available
- changed the brief description of the gene set; added links to human Entrez Gene entries and PubChem Compound entries as appropriate
- used the original gene identifiers as reported in the source paper (not all gene sets did this originally)
- verified the gene set contents and made corrections when necessary
- resolved cases of redundant gene sets
- CP: canonical pathways (840 gene sets). We have replaced all gene sets in this collection with the most up-to-date sets from BioCarta, KEGG, and Reactome. We retrieved human pathways from the KEGG and BioCarta websites, and Reactome contributed their pathways in collaboration with MSigDB. We applied the following filters to this data:
- Source priority: KEGG > Reactome > BioCarta
- Size priority: keep the set with the smaller size
- Name length priority: keep the set with the shorter name
- External ID priority: keep the set with the smaller ID
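The four priorities above can be read as an ordered tie-breaking rule. The following is a minimal sketch under that reading, not the actual MSigDB curation code: it assumes the priorities are applied in the order listed to a group of gene sets that has already been judged redundant (how redundancy is detected is not stated here), and that "smaller ID" means the lexicographically smaller string. The `GeneSet` class, its field names, and `pick_representative` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

# Lower rank wins; the order follows "KEGG > Reactome > BioCarta".
SOURCE_RANK = {"KEGG": 0, "Reactome": 1, "BioCarta": 2}


@dataclass(frozen=True)
class GeneSet:
    # Hypothetical record type; field names are illustrative only.
    name: str
    source: str
    external_id: str
    genes: frozenset


def priority_key(gs):
    """Sort key: source rank, then set size, then name length, then ID."""
    # Assumption: external IDs are compared as plain strings.
    return (SOURCE_RANK[gs.source], len(gs.genes), len(gs.name), gs.external_id)


def pick_representative(redundant_group):
    """Return the gene set to keep from a group already deemed redundant."""
    return min(redundant_group, key=priority_key)
```

Encoding the rule as a single tuple key keeps the order of the priorities explicit: a later criterion is consulted only when all earlier ones tie.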
Note that all the gene set names for C2 have changed. Many of the names used in v2.5 were confusing or wrong, so these have been clarified or corrected. For CGP, the new naming convention is that all gene set names begin with the name of the first author of the source paper.
C3: Motif gene sets
No changes were made in the C3 gene sets. For a description of this collection, see the <a href="http://www.broad.mit.edu/gsea/msigdb/collections.jsp">Browse Collections</a> page.
C4: Computational gene sets
No changes were made in the C4 gene sets. For a description of this collection, see the <a href="http://www.broad.mit.edu/gsea/msigdb/collections.jsp">Browse Collections</a> page.
C5: Gene Ontology gene sets
No changes were made in the C5 gene sets. For a description of this collection, see the <a href="http://www.broad.mit.edu/gsea/msigdb/collections.jsp">Browse Collections</a> page.
For more information
For complete descriptions of all collections or to download the updated gene sets, go to the <a href="http://www.broad.mit.edu/gsea/msigdb/collections.jsp">Browse Collections</a> page.
Other Updates
XML Format Changes
The XML format has changed. See this page for more information.
Entrez IDs Now Supported
All gene sets now have Entrez IDs as well as human gene symbols, and alternate GMT files are included on the <a href="http://www.broadinstitute.org/gsea/downloads.jsp">Downloads</a> page. In addition, we have added a new CHIP file that maps Entrez IDs to human gene symbols. This means that data files analyzed in GSEA can now use Entrez IDs.
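As an illustration of how an Entrez-ID GMT file and the new CHIP file fit together, here is a minimal sketch that translates a gene set from Entrez IDs to gene symbols. It assumes the usual tab-delimited layouts: a GMT line is a set name, a description, then the members, and a CHIP file has a header row with `Probe Set ID` and `Gene Symbol` columns. These layouts, the function names, and the sample set `EXAMPLE_SET` are assumptions made for illustration; check them against the files you download.

```python
import csv
import io


def parse_gmt(handle):
    """Read GMT lines into {set name: [member IDs]}."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *members = fields
        gene_sets[name] = [m for m in members if m]
    return gene_sets


def parse_chip(handle):
    """Read a CHIP table into {identifier: gene symbol}."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Probe Set ID"]: row["Gene Symbol"] for row in reader}


def to_symbols(gene_sets, id_to_symbol):
    """Translate member IDs to symbols, dropping IDs with no mapping."""
    return {
        name: [id_to_symbol[g] for g in ids if g in id_to_symbol]
        for name, ids in gene_sets.items()
    }


# Tiny in-memory stand-ins for the downloaded files (illustrative data).
gmt_text = "EXAMPLE_SET\texample description\t7157\t672\n"
chip_text = (
    "Probe Set ID\tGene Symbol\tGene Title\n"
    "7157\tTP53\ttumor protein p53\n"
    "672\tBRCA1\tbreast cancer 1\n"
)

sets_by_id = parse_gmt(io.StringIO(gmt_text))
symbols = to_symbols(sets_by_id, parse_chip(io.StringIO(chip_text)))
print(symbols)
```

With real files, replace the `io.StringIO` objects with handles returned by `open(path)`.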
Compute Overlaps Error Corrected
A user-reported bug in the Compute Overlaps algorithm has been corrected, improving the quality of the P values.
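These notes do not say how the P values are computed. For readers who want a reference point, the significance of an overlap between two gene sets is commonly assessed with a hypergeometric tail probability. The sketch below shows that standard calculation only; treating it as what the MSigDB tool does is an assumption, and the function and parameter names are hypothetical.

```python
from math import comb


def overlap_p_value(universe_size, set_a_size, set_b_size, overlap):
    """P(X >= overlap) when set B is drawn at random from the universe.

    X counts how many of the set_b_size drawn genes fall in set A,
    so X follows a hypergeometric distribution.
    """
    max_overlap = min(set_a_size, set_b_size)
    total = comb(universe_size, set_b_size)
    tail = sum(
        comb(set_a_size, i) * comb(universe_size - set_a_size, set_b_size - i)
        for i in range(overlap, max_overlap + 1)
    )
    return tail / total
```

For example, `overlap_p_value(10, 5, 5, 5)` is the chance that a random 5-gene draw from a 10-gene universe hits all 5 genes of a given set, which is 1 in 252.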
MSigDB v2.5 Files
The MSigDB v2.5 files have been archived and remain available for download on the <a href="http://www.broadinstitute.org/gsea/downloads.jsp">Downloads</a> page.