Difference between revisions of "MSigDB v5.0 Release Notes"

From GeneSetEnrichmentAnalysisWiki
Latest revision as of 03:12, 25 September 2016


New collection H: Hallmark signatures

H: Hallmarks is a new collection of 50 gene sets. These gene sets represent specific, well-defined biological states or processes and display coherent expression. The hallmark gene sets were generated by a computational methodology based on identifying gene set overlaps and extracting coherent representatives of them. Details of the procedure will become available after the manuscript describing it is accepted for publication. The hallmark gene sets reduce noise and redundancy and provide a better biological space for GSEA and other gene set-based analyses of genomic data.

We envision this collection as the starting point for exploring the MSigDB resource and GSEA. This initial release contains 50 hallmarks, which condense information from over 4,000 original overlapping gene sets from the v4.0 MSigDB collections C1 through C6. We refer to the original gene sets as “founder” sets.

Hallmark gene set pages provide links to the corresponding founder sets for more in-depth exploration. In addition, hallmark gene set pages include links to the microarray data that were used to refine and validate the hallmark signatures.

Updates to C2 collection

C2:CP Matrisome gene sets

The CP (Canonical Pathways) sub-collection has 10 new gene sets from the Matrisome Project. The "matrisome" refers to the ensemble of genes encoding extracellular matrix (ECM) and ECM-associated proteins (as defined by Naba and collaborators). The Matrisome Project is a collaborative effort between the laboratory of Richard Hynes at MIT, researchers at the Barbara K. Ostrom (1978) Bioinformatics & Computing Facility at the Koch Institute at MIT, and the Broad Institute, pursuing extensive in silico and experimental characterization of ECM components.

Updates to C2:CGP collection

In response to requests from multiple users of our resource, we removed all 7 gene sets based on the publication in Nat Med 2006 by Potti et al., which has been retracted.

Alerted by sharp-eyed users of MSigDB, we redefined four gene sets based on the publication in Cancer Cell 2010 by Verhaak et al.

At the request of Dr. Durand, we have updated the records of two gene sets he contributed earlier.

We also fixed errors in a number of other gene sets.

Changes in the XML file format

To accommodate new features in the Hallmarks collection, we have introduced additional attributes for gene set description in the database XML format. The new attributes are:

  • FOUNDER_NAMES = pipe-separated ('|') list of v4.0 MSigDB ‘founder’ gene sets
  • REFINEMENT_DATASETS = pipe-separated ('|') list of GEO or ArrayExpress identifiers of microarray data used to refine hallmark signatures
  • VALIDATION_DATASETS = pipe-separated ('|') list of GEO or ArrayExpress identifiers of microarray data used to validate hallmark signatures

For more information, please refer to the detailed description of the MSigDB XML file format on the MSigDB XML description page.
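As a rough illustration of how the three new attributes might be read, the Python sketch below splits each pipe-separated value into a list. Only the attribute names FOUNDER_NAMES, REFINEMENT_DATASETS and VALIDATION_DATASETS come from these notes; the element names (MSIGDB, GENESET), the STANDARD_NAME attribute and every value in the sample record are assumptions made up for the example, so check them against the actual XML file before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: element names, STANDARD_NAME and all values are
# placeholders invented for illustration, not real MSigDB data.
SAMPLE_XML = """
<MSIGDB>
  <GENESET STANDARD_NAME="HALLMARK_EXAMPLE"
           FOUNDER_NAMES="FOUNDER_SET_A|FOUNDER_SET_B"
           REFINEMENT_DATASETS="GSE00001|E-EXMP-1"
           VALIDATION_DATASETS=""/>
</MSIGDB>
"""

# The three attributes introduced in v5.0, as named in the release notes.
PIPE_ATTRIBUTES = ("FOUNDER_NAMES", "REFINEMENT_DATASETS", "VALIDATION_DATASETS")


def split_pipe_list(value):
    """Split a pipe-separated attribute value, dropping empty entries."""
    return [item.strip() for item in (value or "").split("|") if item.strip()]


def read_hallmark_attributes(xml_text):
    """Map each gene set name to its parsed pipe-separated attributes."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for gene_set in root.iter("GENESET"):  # assumed element name
        name = gene_set.get("STANDARD_NAME")  # assumed attribute name
        result[name] = {
            attr: split_pipe_list(gene_set.get(attr)) for attr in PIPE_ATTRIBUTES
        }
    return result


parsed = read_hallmark_attributes(SAMPLE_XML)
for set_name, attributes in parsed.items():
    print(set_name, attributes)
```

Treating a missing or empty attribute as an empty list keeps the same code usable for gene sets outside the Hallmarks collection, which may not carry these attributes at all.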

Viewing previous versions of MSigDB

Files from previous versions of MSigDB (v4.0, v3.1, v3.0, v2.5, v2.1 and v1.0) are archived and available on the Downloads page. You can view them through the MSigDB Browser tool in the GSEA desktop application.