Difference between revisions of "MSigDB v7.5.1 Release Notes"

From GeneSetEnrichmentAnalysisWiki
(Copy content from 7.4 Release notes to seed page)
 
(10 intermediate revisions by the same user not shown)
Line 7: Line 7:
 
</span>
 
</span>
  
This page describes the changes made to the gene set collections for Release 7.4 of the Molecular Signatures Database (MSigDB). This release contains updates to GO and Reactome, as well as a bugfix for certain sets in C8 introduced with MSigDB 7.3 that contained errors to their gene members.
+
This page describes the changes made to the gene set collections for Release 7.5.x of the Molecular Signatures Database (MSigDB). This release contains updates to C1, GO, HPO, and Reactome, as well as the addition of curated sets to C8 and user-submitted sets to C2:CGP. This update incorporates the removal of clone-based gene IDs introduced in Ensembl 104.
 +
 
  
 
<b>Note:</b> Due to substantial changes introduced in MSigDB 7.0, using GSEA 4.0.0+ is recommended when utilizing MSigDB 7.0+ resources.<br>
 
<b>Note:</b> Due to substantial changes introduced in MSigDB 7.0, using GSEA 4.0.0+ is recommended when utilizing MSigDB 7.0+ resources.<br>
<b>Advisory</b>: It is strongly recommended that users of MSigDB 7.4 '''always''' use the GSEA "Collapse/Remap to gene symbols" feature with the provided Symbol Remapping chip file if your dataset was generated with a transcriptome other than '''Ensembl v103/GENCODE v37'''.
+
<b>Advisory</b>: It is strongly recommended that users of MSigDB 7.5 '''always''' use the GSEA "Collapse/Remap to gene symbols" feature with the provided Symbol Remapping chip file if your dataset was generated with a transcriptome other than '''Ensembl v105/GENCODE v39'''.
 +
 
 +
<h1>Specific Updates in the MSigDB 7.5.1 Patch Release</h1>
 +
<h3>C2:CGP and C2:CP:WikiPathways</h3>
 +
<ul>
 +
    <li>SATOH_COLORECTAL_CANCER_MYC_DN contributed by Rintaro Saito, Institute for Advanced Biosciences, Keio University (+1 gene set) from [https://pubmed.ncbi.nlm.nih.gov/28847964 (PMID:28847964)] was inadvertently omitted from MSigDB 7.5; this has been corrected.</li>
 +
    <li>In C2:CP:WikiPathways duplicate WP_ARYL_HYDROCARBON_RECEPTOR_PATHWAY gene sets have been disambiguated by using the specific WikiPathways ID number as a suffix; the two sets are now indicated as <span class="plainlinks">[https://www.gsea-msigdb.org/gsea/msigdb/cards/WP_ARYL_HYDROCARBON_RECEPTOR_PATHWAY_WP2586.html WP_ARYL_HYDROCARBON_RECEPTOR_PATHWAY_WP2586]</span> and <span class="plainlinks">[https://www.gsea-msigdb.org/gsea/msigdb/cards/WP_ARYL_HYDROCARBON_RECEPTOR_PATHWAY_WP2873 WP_ARYL_HYDROCARBON_RECEPTOR_PATHWAY_WP2873]</span></li>
 +
    <li>In C2:CP:WikiPathways duplicate WP_HEDGEHOG_SIGNALING_PATHWAY gene sets have been disambiguated by using the specific WikiPathways ID number as a suffix; the two sets are now indicated as <span class="plainlinks">[https://www.gsea-msigdb.org/gsea/msigdb/cards/WP_HEDGEHOG_SIGNALING_PATHWAY_WP4249.html WP_HEDGEHOG_SIGNALING_PATHWAY_WP4249]</span> and <span class="plainlinks">[https://www.gsea-msigdb.org/gsea/msigdb/cards/WP_HEDGEHOG_SIGNALING_PATHWAY_WP47 WP_HEDGEHOG_SIGNALING_PATHWAY_WP47]</span></li>
 +
    <li>MSigDB version numbers in CHIP files and other collection GMTs have also been incremented to MSigDB 7.5.1.</li>
 +
</ul>
 +
 
 +
<h3>Known Issues</h3>
 +
<ul><li>The gene set descriptions in C1 incorrectly state that the gene positional information was taken from Ensembl 103. The "Version history" field correctly gives the source as Ensembl 105.</li></ul>
 +
<br>
 +
<h1>MSigDB 7.5 Initial Release</h1>
 +
 
 +
<h2>Updates to Collections</h2>
  
<h2>Updates to Existing Gene Sets by Collection</h2>
+
<h3>C1</h3>
 +
Updated human gene annotations to Ensembl 105 (+21 gene sets).
 +
<h3>C2:CGP</h3>
 +
Gene sets contributed by the following individuals have been added to C2:CGP
 +
<ul>
 +
    <li>NRF response gene sets contributed by Lara Ibrahim, The Scripps Research Institute (+6 gene sets) from [https://pubmed.ncbi.nlm.nih.gov/33096892/ (PMID:33096892)]</li>
 +
    <li>Gene sets describing the epithelial-mesenchymal transition (EMT) upon transforming growth factor beta (TGFb) stimulation contributed by Dharmesh D. Bhuva, Walter and Eliza Hall Institute of Medical Research (+6 gene sets) from [https://pubmed.ncbi.nlm.nih.gov/28119430 (PMID:28119430)]</li>
 +
    <li>SEAVEY_EPITHELIOID_HEMANGIOENDOTHELIOMA contributed by Caleb Seavey, Cleveland Clinic Foundation (+1 gene set) from [https://pubmed.ncbi.nlm.nih.gov/33766982 (PMID:33766982)]</li>
 +
    <li>SATOH_COLORECTAL_CANCER_MYC_UP contributed by Rintaro Saito, Institute for Advanced Biosciences, Keio University (+1 gene set) from [https://pubmed.ncbi.nlm.nih.gov/28847964 (PMID:28847964)]</li>
 +
    <li>GLASS_IGF2BP1_CLIP_TARGETS_KNOCKDOWN_DN contributed by Markus Gla&szlig;, Martin Luther University Halle-Wittenberg (+1 gene set) from [https://pubmed.ncbi.nlm.nih.gov/33829040 (PMID:33829040)]</li>
 +
 
 +
</ul>
  
 
<h3>C2:CP:Reactome</h3>
 
<h3>C2:CP:Reactome</h3>
 +
 
<ul>
 
<ul>
     <li>Reactome gene sets have been updated to reflect the state of the Reactome pathway architecture as of '''Reactome v76''' (+35 gene sets).
+
     <li>Reactome gene sets have been updated to reflect the state of the Reactome pathway architecture as of '''Reactome v78''' (+11 gene sets).</li>
<li>As previously described in the [[MSigDB_v7.0_Release_Notes#C2:CP:Reactome_-_Major_overhaul | Reactome release notes for MSigDB 7.0]], in order to limit redundancy between gene sets within the Reactome sub-collection we applied a filtering procedure based on Jaccard coefficients and distance from the top level of the Reactome event hierarchy.
+
    <li>As previously described in the [[MSigDB_v7.0_Release_Notes#C2:CP:Reactome_-_Major_overhaul | Reactome release notes for MSigDB 7.0]], in order to limit redundancy between gene sets within the Reactome sub-collection we applied a filtering procedure based on Jaccard coefficients and distance from the top level of the Reactome event hierarchy.</li>
 
</ul>
 
</ul>
 +
 +
<h3>C2:CP:WikiPathways</h3>
 +
WikiPathways gene sets have been updated to the January 10, 2022 release (+47 gene sets).
 +
 +
<h3>C3:TFT:GTRD</h3>
 +
As a result of changes in Ensembl gene annotations, 5 gene sets were removed from GTRD because they no longer fall below the maximum size threshold (<2000 genes) (-5 gene sets).
  
 
<h3>C5:GO (Gene Ontology)</h3>
 
<h3>C5:GO (Gene Ontology)</h3>
<p> Gene sets in these sub-collections are derived from the controlled vocabulary of the Gene Ontology (GO) project: The Gene Ontology Consortium. Gene Ontology: tool for the unification of biology (<span class="plainlinks">[http://www.geneontology.org Nature Genet 2000]</span>). The gene sets are named by GO term and contain genes annotated by that term. This collection has been updated to the most recent GO annotations as present in the GO-basic obo file released on 2021-02-01 and NCBI gene2go annotations downloaded on 2021-03-30.</p>
+
<p> Gene sets in these sub-collections are derived from the controlled vocabulary of the Gene Ontology (GO) project: The Gene Ontology Consortium. Gene Ontology: tool for the unification of biology (<span class="plainlinks">[http://www.geneontology.org Nature Genet 2000]</span>). The gene sets are named by GO term and contain genes annotated by that term. This collection has been updated to the most recent GO annotations as present in the GO-basic obo file released on 2021-12-15 and NCBI gene2go annotations downloaded on 2022-01-03.</p>
 
<p>This collection is divided into three sub-collections:</p>
 
<p>This collection is divided into three sub-collections:</p>
 
<ul>
 
<ul>
     <li><strong>BP</strong>: GO Biological process (+2 gene sets). Gene sets derived from the Biological Process Ontology.</li>
+
     <li><strong>BP</strong>: GO Biological process (+177 gene sets). Gene sets derived from the Biological Process Ontology.</li>
     <li><strong>CC</strong>: GO Cellular component (+0 gene sets). Gene sets derived from the Cellular Component Ontology.</li>
+
     <li><strong>CC</strong>: GO Cellular component (+10 gene sets). Gene sets derived from the Cellular Component Ontology.</li>
     <li><strong>MF</strong>: GO Molecular function (+0 gene sets). Gene sets derived from the Molecular Function Ontology.</li>
+
     <li><strong>MF</strong>: GO Molecular function (+30 gene sets). Gene sets derived from the Molecular Function Ontology.</li>
 
</ul>
 
</ul>
  
Gene sets in GO sub-collection prior to MSigDB 7.3 had the universal prefix "GO_", this prefix has been updated to be sub-collection specific. As of MSigDB 7.3 gene sets in GO:BP now begin with "GOBP_", GO:CC now begin with "GOCC_", and GO:MF now begin with "GOMF_". This change should enable better "at a glance" determinations of which GO sub-collection was the origin of a specific gene set hit in analysis pipelines.
+
<p>These updates were generated in accordance with the procedure described in the [[MSigDB_v7.0_Release_Notes#C5_.28Gene_Ontology_collection.29_-_Major_overhaul | GO release notes for MSigDB 7.0.]]
  
<p>These updates were generated in accordance with the procedure described in the [[MSigDB_v7.0_Release_Notes#C5_.28Gene_Ontology_collection.29_-_Major_overhaul | GO release notes for MSigDB 7.0.]]
+
<h3>C5:HPO (Human Phenotype Ontology)</h3>
 +
 
 +
Gene sets in this sub-collection have been updated to reflect the 2021-10-10 release of the Human Phenotype Ontology database (+258 gene sets). This sub-collection has been redundancy filtered through a procedure comparable to that of the GO and Reactome sub-collections.
  
 
<h3>C8: cell type signature gene sets</h3>
 
<h3>C8: cell type signature gene sets</h3>
 
+
Added eye gene sets from <span class="plainlinks">[https://www.ncbi.nlm.nih.gov/labs/pmc/articles/PMC8478974/ Gautam and Hamashima et al. 2021. Multi-species single-cell transcriptomic analysis of ocular compartment regulons.]</span> (+29 gene sets)
The 77 Global cell type gene sets with the prefix designation "DESCARTES_MAIN_FETAL_" from the [https://descartes.brotmanbaty.org Descartes database] Human Gene Expression During Development atlas [https://pubmed.ncbi.nlm.nih.gov/33184181 (Cao et al. PMID33184181)] have had their gene set members replaced. This resulted in the deletion of two sets that no longer pass the minimum gene member threshold. The prior version of these sets had been erroneously compiled  from the same sources as the tissue-specific cell type sets by merging the 172 Tissue specific cell types across tissues, and did not properly represent the combined-tissue level differential expression calculations.
 
  
 
<h3>CHIP file updates</h3>
 
<h3>CHIP file updates</h3>
 
+
<ul>
All CHIP files previously provided in the standard MSigDB 7.3 release have been updated for MSigDB 7.4 in accordance with previously described procedures. These CHIPs contain updated gene ID lists from NCBI, HGNC, MGI and RGD but do not change the target Ensembl version.
+
    <li>MSigDB 7.5 gene annotations and gene mapping CHIP files have been updated to data from Ensembl 105.</li>
 
+
    <li>MSigDB 7.5 includes new handling for deprecated Ensembl Gene IDs thanks to work by <span class="plainlinks">[https://github.com/dhimmel Daniel Himmelstein]</span>. Briefly, historical IDs that map uniquely to one "newest" ensembl gene ID were extracted from the "old_to_newest.tsv" file from the respective Human, Mouse, and Rat repositories generated for Ensembl 105 <span class="plainlinks">[https://github.com/related-sciences/ensembl-genes (see: Github repository)]</span> and then merged into the species specific Ensembl_Gene_ID chip file.</li>
Gene orthology annotations for mapping mouse and rat genes to their best match human orthologs have been updated to <span class="plainlinks">[https://www.alliancegenome.org/ Alliance of Genome Resources] orthology database release 4.0.0.
+
    <li>Gene orthology annotations for mapping mouse and rat genes to their best match human orthologs have been updated to <span class="plainlinks">[https://www.alliancegenome.org/ Alliance of Genome Resources]</span> orthology database release 4.2.</li>
 +
    <li><b>Warning:</b> Rat Microarray derived annotations. Ensembl 105 brought a major update to the rat genome assembly, transitioning from the deprecated Rnor_6.0 assembly to the modern mRatBN7.2 assembly. However, Ensembl has not yet released updated microarray probe mappings for the mRatBN7.2 assembly. In order to continue to provide CHIP files and internal mappings in MSigDB for experiments derived from these platforms, we have carried forward the historical Probe-to-Gene mappings from Ensembl 103/MSigDB v7.4 and remapped the target genes to the current assembly using the Ensembl_Gene_ID_MSigDB.v7.5 chip. However, until Ensembl releases updated Probe-to-Gene mappings derived directly from the mRatBN7.2 assembly, the quality of MSigDB's rat microarray chip files may be impacted. Rat CHIP files affected by this have received the temporary suffix <b>_REMAPPED</b> after the MSigDB version number.</li>
 +
</ul>

Revision as of 19:10, 17 March 2022


This page describes the changes made to the gene set collections for Release 7.5.x of the Molecular Signatures Database (MSigDB). This release contains updates to C1, GO, HPO, and Reactome, as well as the addition of curated sets to C8 and user-submitted sets to C2:CGP. This update incorporates the removal of clone-based gene IDs introduced in Ensembl 104.


Note: Due to substantial changes introduced in MSigDB 7.0, using GSEA 4.0.0+ is recommended when utilizing MSigDB 7.0+ resources.
Advisory: It is strongly recommended that users of MSigDB 7.5 always use the GSEA "Collapse/Remap to gene symbols" feature with the provided Symbol Remapping chip file if your dataset was generated with a transcriptome other than Ensembl v105/GENCODE v39.

Specific Updates in the MSigDB 7.5.1 Patch Release

C2:CGP and C2:CP:WikiPathways

  • SATOH_COLORECTAL_CANCER_MYC_DN contributed by Rintaro Saito, Institute for Advanced Biosciences, Keio University (+1 gene set) from (PMID:28847964) was inadvertently omitted from MSigDB 7.5; this has been corrected.
  • In C2:CP:WikiPathways duplicate WP_ARYL_HYDROCARBON_RECEPTOR_PATHWAY gene sets have been disambiguated by using the specific WikiPathways ID number as a suffix; the two sets are now indicated as WP_ARYL_HYDROCARBON_RECEPTOR_PATHWAY_WP2586 and WP_ARYL_HYDROCARBON_RECEPTOR_PATHWAY_WP2873
  • In C2:CP:WikiPathways duplicate WP_HEDGEHOG_SIGNALING_PATHWAY gene sets have been disambiguated by using the specific WikiPathways ID number as a suffix; the two sets are now indicated as WP_HEDGEHOG_SIGNALING_PATHWAY_WP4249 and WP_HEDGEHOG_SIGNALING_PATHWAY_WP47
  • MSigDB version numbers in CHIP files and other collection GMTs have also been incremented to MSigDB 7.5.1.
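
The suffixing rule described in the WikiPathways items above can be sketched in a few lines of Python. This is only an illustration, not the MSigDB build code: the function name `disambiguate` and the last entry (`WP_EXAMPLE_UNIQUE_PATHWAY`, `WP0000`) are hypothetical, added to show that names which are already unique are left alone.

```python
from collections import Counter


def disambiguate(entries):
    """Return one name per (set_name, wikipathways_id) pair.

    Names that occur more than once get the WikiPathways ID appended
    as a suffix; names that are already unique are returned unchanged.
    """
    counts = Counter(name for name, _ in entries)
    return [
        f"{name}_{wp_id}" if counts[name] > 1 else name
        for name, wp_id in entries
    ]


entries = [
    ("WP_HEDGEHOG_SIGNALING_PATHWAY", "WP4249"),
    ("WP_HEDGEHOG_SIGNALING_PATHWAY", "WP47"),
    ("WP_ARYL_HYDROCARBON_RECEPTOR_PATHWAY", "WP2586"),
    ("WP_ARYL_HYDROCARBON_RECEPTOR_PATHWAY", "WP2873"),
    ("WP_EXAMPLE_UNIQUE_PATHWAY", "WP0000"),  # hypothetical, already unique
]

for name in disambiguate(entries):
    print(name)
```

Pipelines that reference either of the two affected set names by their pre-7.5.1 form would need to switch to the suffixed names.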

Known Issues

  • The gene set descriptions in C1 incorrectly state that the gene positional information was taken from Ensembl 103. The "Version history" field correctly gives the source as Ensembl 105.


MSigDB 7.5 Initial Release

Updates to Collections

C1

Updated human gene annotations to Ensembl 105 (+21 gene sets).

C2:CGP

Gene sets contributed by the following individuals have been added to C2:CGP

  • NRF response gene sets contributed by Lara Ibrahim, The Scripps Research Institute (+6 gene sets) from (PMID:33096892)
  • Gene sets describing the epithelial-mesenchymal transition (EMT) upon transforming growth factor beta (TGFb) stimulation contributed by Dharmesh D. Bhuva, Walter and Eliza Hall Institute of Medical Research (+6 gene sets) from (PMID:28119430)
  • SEAVEY_EPITHELIOID_HEMANGIOENDOTHELIOMA contributed by Caleb Seavey, Cleveland Clinic Foundation (+1 gene set) from (PMID:33766982)
  • SATOH_COLORECTAL_CANCER_MYC_UP contributed by Rintaro Saito, Institute for Advanced Biosciences, Keio University (+1 gene set) from (PMID:28847964)
  • GLASS_IGF2BP1_CLIP_TARGETS_KNOCKDOWN_DN contributed by Markus Glaß, Martin Luther University Halle-Wittenberg (+1 gene set) from (PMID:33829040)

C2:CP:Reactome

  • Reactome gene sets have been updated to reflect the state of the Reactome pathway architecture as of Reactome v78 (+11 gene sets).
  • As previously described in the Reactome release notes for MSigDB 7.0, in order to limit redundancy between gene sets within the Reactome sub-collection we applied a filtering procedure based on Jaccard coefficients and distance from the top level of the Reactome event hierarchy.
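
For readers unfamiliar with the Jaccard coefficient, the following Python sketch shows how it measures overlap between two gene sets and how it could drive a simple redundancy filter. The details here are assumptions made for illustration: the 0.85 threshold, the greedy "keep the set nearer the top of the hierarchy" rule, and the names `filter_redundant` and `depth` are not taken from these notes; the actual procedure is the one described in the MSigDB 7.0 release notes.

```python
def jaccard(a, b):
    """Jaccard coefficient: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filter_redundant(gene_sets, depth, threshold=0.85):
    """Greedy redundancy filter (illustrative assumption, not the MSigDB code).

    gene_sets: dict mapping set name -> set of gene symbols
    depth:     dict mapping set name -> distance from the top of the hierarchy
    Sets are visited from shallowest to deepest; a set is kept only if its
    Jaccard coefficient with every already-kept set is at or below threshold.
    """
    kept = []
    for name in sorted(gene_sets, key=lambda n: (depth[n], n)):
        if all(jaccard(gene_sets[name], gene_sets[k]) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy data: CHILD duplicates PARENT and sits deeper in the hierarchy.
gene_sets = {
    "PARENT": {"A", "B", "C", "D"},
    "CHILD": {"A", "B", "C", "D"},
    "OTHER": {"X", "Y"},
}
depth = {"PARENT": 1, "CHILD": 2, "OTHER": 1}

print(filter_redundant(gene_sets, depth))  # ['OTHER', 'PARENT']
```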

C2:CP:WikiPathways

WikiPathways gene sets have been updated to the January 10, 2022 release (+47 gene sets).

C3:TFT:GTRD

As a result of changes in Ensembl gene annotations, 5 gene sets were removed from GTRD because they no longer fall below the maximum size threshold (<2000 genes) (-5 gene sets).

C5:GO (Gene Ontology)

Gene sets in these sub-collections are derived from the controlled vocabulary of the Gene Ontology (GO) project: The Gene Ontology Consortium. Gene Ontology: tool for the unification of biology (Nature Genet 2000). The gene sets are named by GO term and contain genes annotated by that term. This collection has been updated to the most recent GO annotations as present in the GO-basic obo file released on 2021-12-15 and NCBI gene2go annotations downloaded on 2022-01-03.

This collection is divided into three sub-collections:

  • BP: GO Biological process (+177 gene sets). Gene sets derived from the Biological Process Ontology.
  • CC: GO Cellular component (+10 gene sets). Gene sets derived from the Cellular Component Ontology.
  • MF: GO Molecular function (+30 gene sets). Gene sets derived from the Molecular Function Ontology.

These updates were generated in accordance with the procedure described in the GO release notes for MSigDB 7.0.

C5:HPO (Human Phenotype Ontology)

Gene sets in this sub-collection have been updated to reflect the 2021-10-10 release of the Human Phenotype Ontology database (+258 gene sets). This sub-collection has been redundancy filtered through a procedure comparable to that of the GO and Reactome sub-collections.

C8: cell type signature gene sets

Added eye gene sets from Gautam and Hamashima et al. 2021. Multi-species single-cell transcriptomic analysis of ocular compartment regulons. (+29 gene sets)

CHIP file updates

  • MSigDB 7.5 gene annotations and gene mapping CHIP files have been updated to data from Ensembl 105.
  • MSigDB 7.5 includes new handling for deprecated Ensembl Gene IDs thanks to work by Daniel Himmelstein. Briefly, historical IDs that map uniquely to one "newest" ensembl gene ID were extracted from the "old_to_newest.tsv" file from the respective Human, Mouse, and Rat repositories generated for Ensembl 105 (see: Github repository) and then merged into the species specific Ensembl_Gene_ID chip file.
  • Gene orthology annotations for mapping mouse and rat genes to their best match human orthologs have been updated to Alliance of Genome Resources orthology database release 4.2.
  • Warning: Rat Microarray derived annotations. Ensembl 105 brought a major update to the rat genome assembly, transitioning from the deprecated Rnor_6.0 assembly to the modern mRatBN7.2 assembly. However, Ensembl has not yet released updated microarray probe mappings for the mRatBN7.2 assembly. In order to continue to provide CHIP files and internal mappings in MSigDB for experiments derived from these platforms, we have carried forward the historical Probe-to-Gene mappings from Ensembl 103/MSigDB v7.4 and remapped the target genes to the current assembly using the Ensembl_Gene_ID_MSigDB.v7.5 chip. However, until Ensembl releases updated Probe-to-Gene mappings derived directly from the mRatBN7.2 assembly, the quality of MSigDB's rat microarray chip files may be impacted. Rat CHIP files affected by this have received the temporary suffix _REMAPPED after the MSigDB version number.
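
The "map uniquely to one newest ID" rule for deprecated Ensembl gene IDs, described in the CHIP file updates above, can be illustrated with a short Python sketch. It works on in-memory (old ID, newest ID) pairs because the column layout of "old_to_newest.tsv" is not described here; the function name `unique_old_to_newest` and the placeholder IDs are hypothetical.

```python
from collections import defaultdict


def unique_old_to_newest(rows):
    """Keep only historical IDs that map to exactly one newest gene ID.

    rows: iterable of (old_id, newest_id) pairs.
    Returns a dict old_id -> newest_id; old IDs with more than one
    distinct target are dropped as ambiguous.
    """
    targets = defaultdict(set)
    for old_id, newest_id in rows:
        targets[old_id].add(newest_id)
    return {
        old_id: next(iter(newest))
        for old_id, newest in targets.items()
        if len(newest) == 1
    }


# Placeholder identifiers, not real Ensembl gene IDs.
rows = [
    ("ENSG_OLD_A", "ENSG_NEW_1"),
    ("ENSG_OLD_B", "ENSG_NEW_2"),
    ("ENSG_OLD_B", "ENSG_NEW_3"),  # ambiguous: two possible targets
    ("ENSG_OLD_C", "ENSG_NEW_1"),
]

mapping = unique_old_to_newest(rows)
print(mapping)
```

In this sketch several old IDs may point to the same newest ID (as `ENSG_OLD_A` and `ENSG_OLD_C` do); only old IDs with more than one candidate target are excluded.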