Difference between revisions of "MSigDB v7.2 Release Notes"

From GeneSetEnrichmentAnalysisWiki
m (Fixed minor version for GTRD update)
 
(15 intermediate revisions by 2 users not shown)
Line 7: Line 7:
 
</span>
 
</span>
  
This page describes the changes made to the gene set collections for Release 7.2 of the Molecular Signatures Database (MSigDB). This release includes STUFF.
+
This page describes the changes made to the gene set collections for Release 7.2 of the Molecular Signatures Database (MSigDB). This release includes a substantial reorganization of C5 to accommodate the addition of the Human Phenotype Ontology, the addition of gene sets from WikiPathways to C2:CP, and the promotion of SCSig to C8, among other minor updates and additions.
  
 
<b>Note:</b> Due to substantial changes introduced in MSigDB 7.0, using GSEA 4.0.0+ is recommended when utilizing MSigDB 7.0+ resources.<br>
 
<b>Note:</b> Due to substantial changes introduced in MSigDB 7.0, using GSEA 4.0.0+ is recommended when utilizing MSigDB 7.0+ resources.<br>
 
<b>Advisory</b>: It is strongly recommended that users of MSigDB 7.2 '''always''' use the GSEA "Collapse/Remap to gene symbols" feature with the provided Symbol Remapping chip file if your dataset was generated with a transcriptome other than '''Ensembl v101/GENCODE v35'''.
 
<b>Advisory</b>: It is strongly recommended that users of MSigDB 7.2 '''always''' use the GSEA "Collapse/Remap to gene symbols" feature with the provided Symbol Remapping chip file if your dataset was generated with a transcriptome other than '''Ensembl v101/GENCODE v35'''.
  
<h2>Changes to Collection Organization</h2>
+
<h2>New Additions and Changes to Collection Organization</h2>
  
<h2>Updates to Gene Sets by Collection</h2>
+
<h3>C2:CP:WikiPathways</h3>
 +
Beginning in MSigDB 7.2, the WikiPathways analysis subset gene sets are included as a canonical pathway subset in C2. This initial release reflects the WikiPathways September 2020 release.
 +
 
 +
<h3>C2:CGP</h3>
 +
60 gene sets have been curated from literature or contributed by users and are now available in C2:CGP.
 +
 
 +
36 of these gene sets, drawn from two publications (prefixed with "MANNE" and "BLANCO_MELO"), are derived from research related to the ongoing COVID-19 global pandemic.
 +
 
 +
The remaining sets consist of data contributed by the following individuals:
 +
<ul>
 +
<li>Francesca Buffa, University of Oxford - BUFFA_HYPOXIA_METAGENE Signature
 +
<li>Orlando Musso, INSERM (Institut National de la Santé et de la Recherche Médicale), France - 4 "DESERT" and 12 "MEBARKI" Hepatocellular Carcinoma gene sets
 +
<li>Goodwin Jinesh, UT MD Anderson Cancer Center - 6 "JINESH_BLEBBISHIELD" gene sets
 +
<li>Russell Ryan, University of Michigan - "RYAN_MANTLE_CELL_LYMPHOMA_NOTCH_DIRECT_UP" gene set
 +
</ul>
 +
 
 +
<h3>C5 ontology</h3>
 +
C5 has been renamed from "C5 GO gene sets" to "C5: ontology gene sets". This change reflects the addition of a new sub-collection of gene sets from the [https://hpo.jax.org/ Human Phenotype Ontology project]. This initial release is categorized under C5:HPO and reflects the August 2020 release of the Human Phenotype Ontology. This sub-collection has been redundancy filtered through a procedure comparable to that of the GO and Reactome sub-collections.
 +
 
 +
<h3>C8: cell type signature gene sets</h3>
 +
The previously supplemental release of [https://www.gsea-msigdb.org/gsea/msigdb/supplementary_genesets.jsp#SCSig gene sets for single cell identities] has been updated and promoted to a full MSigDB collection.
 +
 
 +
The new C8 differs from the previously released supplemental release in the following ways:
 +
<ul>
 +
<li>Added 26 new gene sets from Durante et al. ''Single-cell analysis of olfactory neurogenesis and differentiation in adult humans''.
 +
<li>Added 25 new gene sets from Cui et al. ''Single-Cell Transcriptome Analysis Maps the Developmental Track of the Human Heart''.
 +
<li>Performed additional significance filtering for 35 gene sets from Hay et al. ''The Human Cell Atlas bone marrow single-cell interactive web portal''. This additional filtering resulted in the outright deletion of two sets, and the reduction of several more to below the MSigDB inclusion threshold.
 +
</ul>
 +
 
 +
<h2>Updates to Existing Gene Sets by Collection</h2>
  
 
<h3>C1 (positional gene sets)</h3>
 
<h3>C1 (positional gene sets)</h3>
C1 has been updated to reflect the primary assembly of the current release of the Human Genome as present in Ensembl 101 and GENCODE 35 (GRCh38) (+5 gene sets). Gene annotations for this collection are derived from the ''Chromosome'' and ''Karyotype band'' tracks from the Ensembl BioMart (version 101) and reflect the gene architecture as represented on the primary assembly.
+
C1 has been updated to reflect the primary assembly of the current release of the Human Genome as present in Ensembl 101 and GENCODE 35 (GRCh38) (+0 gene set). Gene annotations for this collection are derived from the ''Chromosome'' and ''Karyotype band'' tracks from the Ensembl BioMart (version 101) and reflect the gene architecture as represented on the primary assembly.
  
 
<h3>C2:CP:Reactome</h3>
 
<h3>C2:CP:Reactome</h3>
 
<ul>
 
<ul>
     <li>Reactome gene sets have been updated to reflect the state of the Reactome pathway architecture as of '''Reactome v73''' (+209 gene sets).
+
     <li>Reactome gene sets have been updated to reflect the state of the Reactome pathway architecture as of '''Reactome v73''' (+22 gene sets).
 
<li>As previously described in the [[MSigDB_v7.0_Release_Notes#C2:CP:Reactome_-_Major_overhaul | Reactome release notes for MSigDB 7.0]], in order to limit redundancy between gene sets within the Reactome sub-collection we applied a filtering procedure based on Jaccard coefficients and distance from the top level of the Reactome event hierarchy.
 
<li>As previously described in the [[MSigDB_v7.0_Release_Notes#C2:CP:Reactome_-_Major_overhaul | Reactome release notes for MSigDB 7.0]], in order to limit redundancy between gene sets within the Reactome sub-collection we applied a filtering procedure based on Jaccard coefficients and distance from the top level of the Reactome event hierarchy.
 
</ul>
 
</ul>
 +
 +
<h3>C3 regulatory target gene sets</h3>
 +
C3:GTRD has been updated to GTRD v20.04. A substantial addition of new content to the source database caused many gene sets to grow beyond the MSigDB maximum size threshold for inclusion. This resulted in a net decrease in the size of the collection (-176 gene sets).
  
 
<h3>C5:GO (Gene Ontology)</h3>
 
<h3>C5:GO (Gene Ontology)</h3>
<p> Gene sets in this collection are derived from the controlled vocabulary of the Gene Ontology (GO) project: The Gene Ontology Consortium. Gene Ontology: tool for the unification of biology (<span class="plainlinks">[http://www.geneontology.org Nature Genet 2000]</span>). The gene sets are named by GO term and contain genes annotated by that term. This collection has been updated to the most recent GO annotations as present in the GO-basic obo file released on 2020-07-16 and NCBI GO annotations downloaded on 2020-07-30.</p>
+
<p> Gene sets in these sub-collections are derived from the controlled vocabulary of the Gene Ontology (GO) project: The Gene Ontology Consortium. Gene Ontology: tool for the unification of biology (<span class="plainlinks">[http://www.geneontology.org Nature Genet 2000]</span>). The gene sets are named by GO term and contain genes annotated by that term. This collection has been updated to the most recent GO annotations as present in the GO-basic obo file released on 2020-08-11 and NCBI gene2go annotations downloaded on 2020-09-03.</p>
 
<p>This collection is divided into three sub-collections:</p>
 
<p>This collection is divided into three sub-collections:</p>
 
<ul>
 
<ul>
     <li><strong>BP</strong>: GO Biological process (+564 gene sets). Gene sets derived from the Biological Process Ontology.</li>
+
     <li><strong>BP</strong>: GO Biological process (+43 gene sets). Gene sets derived from the Biological Process Ontology.</li>
     <li><strong>CC</strong>: GO Cellular component (+164 gene sets). Gene sets derived from the Cellular Component Ontology.</li>
+
     <li><strong>CC</strong>: GO Cellular component (+2 gene sets). Gene sets derived from the Cellular Component Ontology.</li>
     <li><strong>MF</strong>: GO Molecular function (+213 gene sets). Gene sets derived from the Molecular Function Ontology.</li>
+
     <li><strong>MF</strong>: GO Molecular function (+34 gene sets). Gene sets derived from the Molecular Function Ontology.</li>
 
</ul>
 
</ul>
 
<p>These updates were generated in accordance with the procedure described in the [[MSigDB_v7.0_Release_Notes#C5_.28Gene_Ontology_collection.29_-_Major_overhaul | GO release notes for MSigDB 7.0.]]
 
<p>These updates were generated in accordance with the procedure described in the [[MSigDB_v7.0_Release_Notes#C5_.28Gene_Ontology_collection.29_-_Major_overhaul | GO release notes for MSigDB 7.0.]]
Line 41: Line 73:
  
 
Gene orthology annotations for mapping mouse and rat genes to their best match human orthologs have been updated to <span class="plainlinks">[https://www.alliancegenome.org/ Alliance of Genome Resources] orthology database release 3.1.1.
 
Gene orthology annotations for mapping mouse and rat genes to their best match human orthologs have been updated to <span class="plainlinks">[https://www.alliancegenome.org/ Alliance of Genome Resources] orthology database release 3.1.1.
 +
 +
<h2>Addendum</h2>
 +
Hallmark founder gene sets in the MSigDB XML file have had their identifiers adjusted to reflect their internal "systematic name". This change enables more precise tracking of Hallmark founder gene sets across releases. Previously these gene sets were identified by their standard name as represented in the initial release of the MSigDB Hallmarks collection.

Latest revision as of 14:19, 16 February 2021


This page describes the changes made to the gene set collections for Release 7.2 of the Molecular Signatures Database (MSigDB). This release includes a substantial reorganization of C5 to accommodate the addition of the Human Phenotype Ontology, the addition of gene sets from WikiPathways to C2:CP, and the promotion of SCSig to C8, among other minor updates and additions.

Note: Due to substantial changes introduced in MSigDB 7.0, using GSEA 4.0.0+ is recommended when utilizing MSigDB 7.0+ resources.
Advisory: It is strongly recommended that users of MSigDB 7.2 always use the GSEA "Collapse/Remap to gene symbols" feature with the provided Symbol Remapping chip file if your dataset was generated with a transcriptome other than Ensembl v101/GENCODE v35.

New Additions and Changes to Collection Organization

C2:CP:WikiPathways

Beginning in MSigDB 7.2, the WikiPathways analysis subset gene sets are included as a canonical pathway subset in C2. This initial release reflects the WikiPathways September 2020 release.

C2:CGP

60 gene sets have been curated from literature or contributed by users and are now available in C2:CGP.

36 of these gene sets, drawn from two publications (prefixed with "MANNE" and "BLANCO_MELO"), are derived from research related to the ongoing COVID-19 global pandemic.

The remaining sets consist of data contributed by the following individuals:

  • Francesca Buffa, University of Oxford - BUFFA_HYPOXIA_METAGENE Signature
  • Orlando Musso, INSERM (Institut National de la Santé et de la Recherche Médicale), France - 4 "DESERT" and 12 "MEBARKI" Hepatocellular Carcinoma gene sets
  • Goodwin Jinesh, UT MD Anderson Cancer Center - 6 "JINESH_BLEBBISHIELD" gene sets
  • Russell Ryan, University of Michigan - "RYAN_MANTLE_CELL_LYMPHOMA_NOTCH_DIRECT_UP" gene set

C5 ontology

C5 has been renamed from "C5 GO gene sets" to "C5: ontology gene sets". This change reflects the addition of a new sub-collection of gene sets from the Human Phenotype Ontology project. This initial release is categorized under C5:HPO and reflects the August 2020 release of the Human Phenotype Ontology. This sub-collection has been redundancy filtered through a procedure comparable to that of the GO and Reactome sub-collections.

C8: cell type signature gene sets

The previously supplemental release of gene sets for single cell identities has been updated and promoted to a full MSigDB collection.

The new C8 differs from the previously released supplemental release in the following ways:

  • Added 26 new gene sets from Durante et al. Single-cell analysis of olfactory neurogenesis and differentiation in adult humans.
  • Added 25 new gene sets from Cui et al. Single-Cell Transcriptome Analysis Maps the Developmental Track of the Human Heart.
  • Performed additional significance filtering for 35 gene sets from Hay et al. The Human Cell Atlas bone marrow single-cell interactive web portal. This additional filtering resulted in the outright deletion of two sets, and the reduction of several more to below the MSigDB inclusion threshold.

Updates to Existing Gene Sets by Collection

C1 (positional gene sets)

C1 has been updated to reflect the primary assembly of the current release of the Human Genome as present in Ensembl 101 and GENCODE 35 (GRCh38) (+0 gene sets). Gene annotations for this collection are derived from the Chromosome and Karyotype band tracks from the Ensembl BioMart (version 101) and reflect the gene architecture as represented on the primary assembly.

C2:CP:Reactome

  • Reactome gene sets have been updated to reflect the state of the Reactome pathway architecture as of Reactome v73 (+22 gene sets).
  • As previously described in the Reactome release notes for MSigDB 7.0, in order to limit redundancy between gene sets within the Reactome sub-collection we applied a filtering procedure based on Jaccard coefficients and distance from the top level of the Reactome event hierarchy.
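As an illustration of the kind of redundancy filtering described above, the following is a minimal Python sketch and not the actual MSigDB procedure. It computes the Jaccard coefficient between two gene sets and greedily keeps sets that are closer to the top of a hierarchy. The function names, the `depth` mapping, the preference for shallower sets, and the `threshold` value of 0.85 are all assumptions made for this example; the real rule and cutoff are described in the MSigDB 7.0 release notes.

```python
def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| of two gene collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filter_redundant(gene_sets, depth, threshold=0.85):
    """Greedy redundancy filter (illustrative only).

    gene_sets: dict mapping set name -> iterable of gene symbols
    depth:     dict mapping set name -> distance from the top of the hierarchy
               (hypothetical input; smaller means closer to the top)
    threshold: hypothetical Jaccard cutoff above which two sets count as redundant
    """
    kept = []
    # Visit shallow sets first so that, among redundant sets, the one nearest
    # the top of the hierarchy is retained (an assumption of this sketch).
    for name in sorted(gene_sets, key=lambda n: (depth[n], n)):
        if all(jaccard(gene_sets[name], gene_sets[k]) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy data, not real Reactome content.
toy_sets = {
    "PARENT": {"A", "B", "C", "D"},
    "CHILD": {"A", "B", "C", "D"},
    "OTHER": {"X", "Y"},
}
toy_depth = {"PARENT": 1, "CHILD": 2, "OTHER": 1}
kept_names = filter_redundant(toy_sets, toy_depth)
```

In this toy input, "CHILD" duplicates "PARENT" exactly and sits deeper in the hierarchy, so it is the one dropped.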

C3 regulatory target gene sets

C3:GTRD has been updated to GTRD v20.04. A substantial addition of new content to the source database caused many gene sets to grow beyond the MSigDB maximum size threshold for inclusion. This resulted in a net decrease in the size of the collection (-176 gene sets).
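The effect described above, where a source update adds genes yet shrinks the collection, can be sketched in Python as follows. This is only an illustration: the function name, the toy gene sets, and the `min_size` and `max_size` values are hypothetical, and the actual MSigDB inclusion thresholds are not stated on this page.

```python
def apply_size_limits(gene_sets, min_size, max_size):
    """Keep only gene sets whose number of unique genes lies within the limits."""
    return {
        name: genes
        for name, genes in gene_sets.items()
        if min_size <= len(set(genes)) <= max_size
    }


# Toy data: the same two target sets before and after a source database update.
before_update = {
    "TF_A_TARGETS": ["G1", "G2", "G3"],
    "TF_B_TARGETS": ["G1", "G2"],
}
after_update = {
    "TF_A_TARGETS": ["G1", "G2", "G3", "G4", "G5"],  # grew past the limit
    "TF_B_TARGETS": ["G1", "G2"],
}

# Hypothetical thresholds chosen only to make the example small.
kept_before = apply_size_limits(before_update, min_size=2, max_size=4)
kept_after = apply_size_limits(after_update, min_size=2, max_size=4)
net_change = len(kept_after) - len(kept_before)
```

Here `TF_A_TARGETS` gains genes and is excluded as a result, so the collection shrinks by one set even though the source data grew.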

C5:GO (Gene Ontology)

Gene sets in these sub-collections are derived from the controlled vocabulary of the Gene Ontology (GO) project: The Gene Ontology Consortium. Gene Ontology: tool for the unification of biology (Nature Genet 2000). The gene sets are named by GO term and contain genes annotated by that term. This collection has been updated to the most recent GO annotations as present in the GO-basic obo file released on 2020-08-11 and NCBI gene2go annotations downloaded on 2020-09-03.

This collection is divided into three sub-collections:

  • BP: GO Biological process (+43 gene sets). Gene sets derived from the Biological Process Ontology.
  • CC: GO Cellular component (+2 gene sets). Gene sets derived from the Cellular Component Ontology.
  • MF: GO Molecular function (+34 gene sets). Gene sets derived from the Molecular Function Ontology.

These updates were generated in accordance with the procedure described in the GO release notes for MSigDB 7.0.

CHIP file updates

All CHIP files previously provided in the standard MSigDB 7.1 release have been updated for MSigDB 7.2 in accordance with previously described procedures.

Gene orthology annotations for mapping mouse and rat genes to their best match human orthologs have been updated to Alliance of Genome Resources orthology database release 3.1.1.

Addendum

Hallmark founder gene sets in the MSigDB XML file have had their identifiers adjusted to reflect their internal "systematic name". This change enables more precise tracking of Hallmark founder gene sets across releases. Previously these gene sets were identified by their standard name as represented in the initial release of the MSigDB Hallmarks collection.