Difference between revisions of "MSigDB v2022.1.Hs Release Notes"

From GeneSetEnrichmentAnalysisWiki
(Created page with '<span class="plainlinks"> [http://www.broadinstitute.org/gsea/ GSEA Home] | [http://www.broadinstitute.org/gsea/downloads.jsp Downloads] | [http://www.broadinstitute.org/gsea/ms…')
 
m
 
(6 intermediate revisions by the same user not shown)
Line 11: Line 11:
 
This page describes updates made to the Molecular Signatures Database for release 2022.1. This release introduces several major changes to previous conventions. MSigDB is now split into two major divisions: a series of gene set collections provided in the namespace of human gene symbols, and a series of gene set collections provided in the namespace of mouse gene symbols. As such, the versioning convention of MSigDB has changed to adopt the format Year.Release.Species. This initial release in the new format is versioned 2022.1.Hs for the human collections and 2022.1.Mm for the mouse collections. Likewise, CHIP files have been updated to reflect this convention, as well as the specific series of collections (i.e. human or mouse) that they are targeted towards.
 
  
'''Note that in order to access the MSigDB mouse collections through the GSEA UI, the latest version of GSEA (4.3.0) is required.'''
+
'''In order to access the MSigDB mouse collections through the GSEA UI, the latest version of GSEA (4.3.0) is required.'''
  
 
MSigDB v2022.1 is based on gene annotation data from Ensembl Release 107 (Jul 2022).
 
 +
 +
<b>Note: </b>On September 12th we hot-fixed an issue in which two gene sets in the Reactome sub-collection were identified by identical standard names because of an upstream issue in the Reactome database. Because this issue caused GSEA to fail when processing the GMT file, it should not have affected any GSEA results. If you previously encountered a "GeneSets should have unique names" error when running the MSigDB v2022.1.Hs Reactome gene sets, please re-run the analysis, as the issue has been resolved.
  
 
<h1>Updates to Human Collections (MSigDB v2022.1.Hs)</h1>
 
 +
 +
<h2>C1: positional gene sets</h2>
 +
Updated human gene annotations to Ensembl 107.
 +
<h2>C2:CGP</h2>
 +
 +
15 gene sets contributed by MSigDB users have been added to C2:CGP:
 +
<ul>
 +
    <li><span class="plainlinks">[https://gsea-msigdb.org/gsea/msigdb/human/geneset/LIU_OVARIAN_CANCER_TUMORS_AND_XENOGRAFTS_XDGS_UP LIU_OVARIAN_CANCER_TUMORS_AND_XENOGRAFTS_XDGS_UP]</span></li>
 +
    <li><span class="plainlinks">[https://gsea-msigdb.org/gsea/msigdb/human/geneset/LIU_OVARIAN_CANCER_TUMORS_AND_XENOGRAFTS_XDGS_DN LIU_OVARIAN_CANCER_TUMORS_AND_XENOGRAFTS_XDGS_DN]</span></li>
 +
    <li><span class="plainlinks">[https://gsea-msigdb.org/gsea/msigdb/human/geneset/LIU_OVARIAN_CANCER_TUMORS_AND_XENOGRAFTS_KINASES_DN LIU_OVARIAN_CANCER_TUMORS_AND_XENOGRAFTS_KINASES_DN]</span></li>
 +
    <li><span class="plainlinks">[https://gsea-msigdb.org/gsea/msigdb/human/geneset/BANG_VERTEPORFIN_ENDOMETRIAL_CANCER_CELLS_UP BANG_VERTEPORFIN_ENDOMETRIAL_CANCER_CELLS_UP]</span></li>
 +
    <li><span class="plainlinks">[https://gsea-msigdb.org/gsea/msigdb/human/geneset/BANG_VERTEPORFIN_ENDOMETRIAL_CANCER_CELLS_DN BANG_VERTEPORFIN_ENDOMETRIAL_CANCER_CELLS_DN]</span></li>
 +
    <li><span class="plainlinks">[https://gsea-msigdb.org/gsea/msigdb/human/geneset/CARRILLOREIXACH_HEPATOBLASTOMA_VS_NORMAL_UP CARRILLOREIXACH_HEPATOBLASTOMA_VS_NORMAL_UP]</span></li>
 +
    <li><span class="plainlinks">[https://gsea-msigdb.org/gsea/msigdb/human/geneset/CARRILLOREIXACH_HEPATOBLASTOMA_VS_NORMAL_DN CARRILLOREIXACH_HEPATOBLASTOMA_VS_NORMAL_DN]</span></li>
 +
    <li><span class="plainlinks">[https://gsea-msigdb.org/gsea/msigdb/human/geneset/CARRILLOREIXACH_14Q32OVEREXPRESSION_IN_HEPATOBLASTOMA CARRILLOREIXACH_14Q32OVEREXPRESSION_IN_HEPATOBLASTOMA]</span></li>
 +
    <li><span class="plainlinks">[https://gsea-msigdb.org/gsea/msigdb/human/geneset/CARRILLOREIXACH_HEPATOBLASTOMA_VS_NORMAL_HYPERMETHYLATED_AND_DN CARRILLOREIXACH_HEPATOBLASTOMA_VS_NORMAL_HYPERMETHYLATED_AND_DN]</span></li>
 +
    <li><span class="plainlinks">[https://gsea-msigdb.org/gsea/msigdb/human/geneset/CARRILLOREIXACH_HEPATOBLASTOMA_VS_NORMAL_HYPOMETHYLATED_AND_UP CARRILLOREIXACH_HEPATOBLASTOMA_VS_NORMAL_HYPOMETHYLATED_AND_UP]</span></li>
 +
    <li><span class="plainlinks">[https://gsea-msigdb.org/gsea/msigdb/human/geneset/CARRILLOREIXACH_MRS3_VS_LOWER_RISK_HEPATOBLASTOMA_UP CARRILLOREIXACH_MRS3_VS_LOWER_RISK_HEPATOBLASTOMA_UP]</span></li>
 +
    <li><span class="plainlinks">[https://gsea-msigdb.org/gsea/msigdb/human/geneset/CARRILLOREIXACH_MRS3_VS_LOWER_RISK_HEPATOBLASTOMA_DN CARRILLOREIXACH_MRS3_VS_LOWER_RISK_HEPATOBLASTOMA_DN]</span></li>
 +
    <li><span class="plainlinks">[https://gsea-msigdb.org/gsea/msigdb/human/geneset/CURSONS_NATURAL_KILLER_CELLS CURSONS_NATURAL_KILLER_CELLS]</span></li>
 +
    <li><span class="plainlinks">[https://gsea-msigdb.org/gsea/msigdb/human/geneset/MISIAK_ANAPLASTIC_THYROID_CARCINOMA_UP MISIAK_ANAPLASTIC_THYROID_CARCINOMA_UP]</span></li>
 +
    <li><span class="plainlinks">[https://gsea-msigdb.org/gsea/msigdb/human/geneset/MISIAK_ANAPLASTIC_THYROID_CARCINOMA_DN MISIAK_ANAPLASTIC_THYROID_CARCINOMA_DN]</span></li>
 +
</ul>
 +
<br>
 +
<p>STANHILL_HRAS_TRANSFROMATION_UP and SHARMA_ASTROCYTOMA_WITH_NF1_SYNDROM were archived in previous MSigDB releases because they no longer passed the thresholds for inclusion (<5 genes). These sets once again pass the thresholds and have been included in MSigDB.</p>
 +
<p>Two gene sets, BIERIE_INFLAMMATORY_RESPONSE_TGFB1 and FUJIWARA_PARK2_IN_LIVER_CANCER_DN, are not included in this release of MSigDB as they no longer pass MSigDB inclusion thresholds (>5 genes).</p>
 +
<p>SATOH_COLORECTAL_CANCER_MYC_UP and SATOH_COLORECTAL_CANCER_MYC_DN have been renamed SOGA_COLORECTAL_CANCER_MYC_UP and SOGA_COLORECTAL_CANCER_MYC_DN, respectively, to reflect the wishes of the set contributors.</p>
 +
 +
<h2>C2:CP:Reactome</h2>
 +
 +
<ul>
 +
    <li>Reactome gene sets have been updated to reflect the state of the Reactome pathway architecture as of '''Reactome v81''' (+19 gene sets).</li>
 +
    <li>As previously described in the [[MSigDB_v7.0_Release_Notes#C2:CP:Reactome_-_Major_overhaul | Reactome release notes for MSigDB 7.0]], in order to limit redundancy between gene sets within the Reactome sub-collection we applied a filtering procedure based on Jaccard coefficients and distance from the top level of the Reactome event hierarchy.</li>
 +
</ul>
 +
 +
<h2>C2:CP:WikiPathways</h2>
 +
WikiPathways gene sets have been updated to the August 10, 2022 release (+48 gene sets).
 +
 +
<h2>C3:TFT:GTRD</h2>
 +
<p>GCM2_TARGET_GENES was removed as it no longer passes MSigDB inclusion thresholds (set members >2000 genes).</p>
 +
 +
<h2>C5:GO (Gene Ontology)</h2>
 +
<p> Gene sets in these sub-collections are derived from the controlled vocabulary of the Gene Ontology (GO) project: The Gene Ontology Consortium. Gene Ontology: tool for the unification of biology (<span class="plainlinks">[http://www.geneontology.org Nature Genet 2000]</span>). The gene sets are named by GO term and contain genes annotated by that term. This collection has been updated to the most recent GO annotations as present in the GO-basic obo file released on 2022-07-01 and NCBI gene2go annotations downloaded on 2022-07-15.</p>
 +
 +
<p>This collection is divided into three sub-collections:</p>
 +
<ul>
 +
    <li><strong>BP</strong>: GO Biological process (+105 gene sets). Gene sets are derived from the Biological Process Ontology and are prefixed with "GOBP_".</li>
 +
    <li><strong>CC</strong>: GO Cellular component (+29 gene sets). Gene sets are derived from the Cellular Component Ontology and are prefixed with "GOCC_".</li>
 +
    <li><strong>MF</strong>: GO Molecular function (+25 gene sets). Gene sets are derived from the Molecular Function Ontology and are prefixed with "GOMF_".</li>
 +
</ul>
 +
 +
<p>These updates were generated in accordance with the procedure described in the [[MSigDB_v7.0_Release_Notes#C5_.28Gene_Ontology_collection.29_-_Major_overhaul | GO release notes for MSigDB 7.0]].</p>
 +
 +
<h2>C5:HPO (Human Phenotype Ontology)</h2>
 +
 +
Gene sets in this sub-collection have been updated to reflect the 2022-06-11 release of the Human Phenotype Ontology database (+71 gene sets). This sub-collection has been redundancy filtered through a procedure comparable to that of the GO and Reactome sub-collections.
 +
 +
<h2>C8: cell type signature gene sets</h2>
 +
<p>Added gene sets describing pancreatic cell type identity signatures from <span class="plainlinks">[https://www.ncbi.nlm.nih.gov/labs/pmc/articles/PMC9019032/ van Gurp et al. 2022 Generation of human islet cell type-specific identity genesets]</span> (+4 gene sets).</p>
 +
 +
<h2>CHIP file updates</h2>
 +
<ul>
 +
    <li>MSigDB 2022.1.Hs gene annotations and gene mapping CHIP files have been updated to data from Ensembl 107.</li>
 +
    <li>Gene orthology annotations for mapping mouse and rat genes to their best match human orthologs have been updated to <span class="plainlinks">[https://www.alliancegenome.org/ Alliance of Genome Resources]</span> orthology database release 5.2.1 (2022-07-15).</li>
 +
    <li>Rat microarray annotations derived from mappings to the mRatBN7.2 assembly are now available; the previous warning associated with remapped data from the deprecated Rnor_6.0 assembly has been removed.</li>
 +
</ul>

Latest revision as of 01:30, 13 September 2022


Important Notices

This page describes updates made to the Molecular Signatures Database for release 2022.1. This release introduces several major changes to previous conventions. MSigDB is now split into two major divisions: a series of gene set collections provided in the namespace of human gene symbols, and a series of gene set collections provided in the namespace of mouse gene symbols. As such, the versioning convention of MSigDB has changed to adopt the format Year.Release.Species. This initial release in the new format is versioned 2022.1.Hs for the human collections and 2022.1.Mm for the mouse collections. Likewise, CHIP files have been updated to reflect this convention, as well as the specific series of collections (i.e. human or mouse) that they are targeted towards.
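
As a minimal sketch of the Year.Release.Species convention, a version string such as 2022.1.Hs can be split into its three parts. The names `MSigDBVersion` and `parse_msigdb_version`, the optional leading "v", and the restriction to the Hs and Mm species codes are assumptions made for illustration; they are not part of MSigDB or GSEA.

```python
import re
from typing import NamedTuple


class MSigDBVersion(NamedTuple):
    """Hypothetical container for a Year.Release.Species version."""
    year: int
    release: int
    species: str


# Assumption: only the two species codes named in these notes are accepted,
# and a leading "v" (as in "v2022.1.Hs") is tolerated.
_VERSION_RE = re.compile(r"^v?(\d{4})\.(\d+)\.(Hs|Mm)$")


def parse_msigdb_version(text: str) -> MSigDBVersion:
    """Split a version string into year, release number and species code."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a Year.Release.Species version: {text!r}")
    year, release, species = match.groups()
    return MSigDBVersion(int(year), int(release), species)


print(parse_msigdb_version("2022.1.Hs"))
```

Older version strings that predate this convention, such as 7.0, are rejected by this sketch rather than guessed at.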

In order to access the MSigDB mouse collections through the GSEA UI, the latest version of GSEA (4.3.0) is required.

MSigDB v2022.1 is based on gene annotation data from Ensembl Release 107 (Jul 2022).

Note: On September 12th we hot-fixed an issue in which two gene sets in the Reactome sub-collection were identified by identical standard names because of an upstream issue in the Reactome database. Because this issue caused GSEA to fail when processing the GMT file, it should not have affected any GSEA results. If you previously encountered a "GeneSets should have unique names" error when running the MSigDB v2022.1.Hs Reactome gene sets, please re-run the analysis, as the issue has been resolved.
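
To check a locally cached GMT file for the duplicate-name problem described in the note, something like the following could be used. It is a rough sketch that assumes the usual tab-delimited GMT layout (set name, description, then gene symbols); the function name `duplicate_gene_set_names` and the example sets are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def duplicate_gene_set_names(gmt_lines: Iterable[str]) -> List[str]:
    """Return gene set names that occur on more than one GMT line."""
    # Assumption: the first tab-delimited field of each line is the set name.
    names = [line.split("\t", 1)[0] for line in gmt_lines if line.strip()]
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


# Made-up example lines, not taken from any MSigDB release.
example = [
    "SET_A\thttps://example.org/a\tTP53\tMYC",
    "SET_B\thttps://example.org/b\tEGFR\tKRAS",
    "SET_A\thttps://example.org/a2\tBRCA1",
]
print(duplicate_gene_set_names(example))
```

An empty result means every set name in the file is unique.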

Updates to Human Collections (MSigDB v2022.1.Hs)

C1: positional gene sets

Updated human gene annotations to Ensembl 107.

C2:CGP

15 gene sets contributed by MSigDB users have been added to C2:CGP.


STANHILL_HRAS_TRANSFROMATION_UP and SHARMA_ASTROCYTOMA_WITH_NF1_SYNDROM were archived in previous MSigDB releases because they no longer passed the thresholds for inclusion (<5 genes). These sets once again pass the thresholds and have been included in MSigDB.

Two gene sets, BIERIE_INFLAMMATORY_RESPONSE_TGFB1 and FUJIWARA_PARK2_IN_LIVER_CANCER_DN, are not included in this release of MSigDB as they no longer pass MSigDB inclusion thresholds (>5 genes).

SATOH_COLORECTAL_CANCER_MYC_UP and SATOH_COLORECTAL_CANCER_MYC_DN have been renamed SOGA_COLORECTAL_CANCER_MYC_UP and SOGA_COLORECTAL_CANCER_MYC_DN, respectively, to reflect the wishes of the set contributors.

C2:CP:Reactome

  • Reactome gene sets have been updated to reflect the state of the Reactome pathway architecture as of Reactome v81 (+19 gene sets).
  • As previously described in the Reactome release notes for MSigDB 7.0, in order to limit redundancy between gene sets within the Reactome sub-collection we applied a filtering procedure based on Jaccard coefficients and distance from the top level of the Reactome event hierarchy.
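
For readers unfamiliar with the Jaccard coefficient mentioned in the second bullet, it is the size of the intersection of two sets divided by the size of their union. The sketch below computes only that coefficient; it does not reproduce the MSigDB filtering procedure, and the function name and example gene sets are made up for illustration.

```python
from typing import Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient |A & B| / |A | B|, defined here as 0.0 for two empty sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical gene sets: 2 shared genes out of 5 distinct genes overall.
set_one = {"TP53", "MYC", "EGFR", "KRAS"}
set_two = {"TP53", "MYC", "BRCA1"}
print(jaccard(set_one, set_two))
```

A value near 1 indicates two sets with nearly identical membership, which is the kind of redundancy the filtering is meant to limit.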

C2:CP:WikiPathways

WikiPathways gene sets have been updated to the August 10, 2022 release (+48 gene sets).

C3:TFT:GTRD

GCM2_TARGET_GENES was removed as it no longer passes MSigDB inclusion thresholds (set members >2000 genes).
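
These notes mention two size limits: sets that fall below 5 genes and sets that exceed 2000 genes do not pass the inclusion thresholds. A size filter along those lines might look like the sketch below. Treating both bounds as inclusive is an assumption, as are the constant names, function names, and example sets.

```python
from typing import Dict, Set

MIN_GENES = 5     # assumed inclusive lower bound, inferred from "<5 genes"
MAX_GENES = 2000  # assumed inclusive upper bound, inferred from ">2000 genes"


def passes_size_thresholds(genes: Set[str],
                           min_genes: int = MIN_GENES,
                           max_genes: int = MAX_GENES) -> bool:
    """True when the number of genes lies within the assumed bounds."""
    return min_genes <= len(genes) <= max_genes


def filter_gene_sets(gene_sets: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Keep only the gene sets whose size passes the thresholds."""
    return {name: genes for name, genes in gene_sets.items()
            if passes_size_thresholds(genes)}


# Made-up gene sets for illustration only.
example = {
    "TOO_SMALL_SET": {"TP53", "MYC", "EGFR"},
    "OK_SET": {"TP53", "MYC", "EGFR", "KRAS", "BRCA1"},
    "TOO_LARGE_SET": {f"GENE{i}" for i in range(2001)},
}
print(sorted(filter_gene_sets(example)))
```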

C5:GO (Gene Ontology)

Gene sets in these sub-collections are derived from the controlled vocabulary of the Gene Ontology (GO) project: The Gene Ontology Consortium. Gene Ontology: tool for the unification of biology (Nature Genet 2000). The gene sets are named by GO term and contain genes annotated by that term. This collection has been updated to the most recent GO annotations as present in the GO-basic obo file released on 2022-07-01 and NCBI gene2go annotations downloaded on 2022-07-15.

This collection is divided into three sub-collections:

  • BP: GO Biological process (+105 gene sets). Gene sets are derived from the Biological Process Ontology and are prefixed with "GOBP_".
  • CC: GO Cellular component (+29 gene sets). Gene sets are derived from the Cellular Component Ontology and are prefixed with "GOCC_".
  • MF: GO Molecular function (+25 gene sets). Gene sets are derived from the Molecular Function Ontology and are prefixed with "GOMF_".

These updates were generated in accordance with the procedure described in the GO release notes for MSigDB 7.0.
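
Because each GO sub-collection carries its own name prefix, gene set names can be sorted into BP, CC and MF by prefix alone. The following is a small sketch of that idea; the helper names and the example set names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

# Prefixes as stated in these notes.
GO_PREFIXES = {"GOBP_": "BP", "GOCC_": "CC", "GOMF_": "MF"}


def go_subcollection(name: str) -> Optional[str]:
    """Return BP, CC or MF for a GO gene set name, or None for other names."""
    for prefix, label in GO_PREFIXES.items():
        if name.startswith(prefix):
            return label
    return None


def group_by_subcollection(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group GO gene set names by sub-collection, skipping non-GO names."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        label = go_subcollection(name)
        if label is not None:
            groups[label].append(name)
    return dict(groups)


# Illustrative names only; they are not claimed to exist in MSigDB.
names = ["GOBP_EXAMPLE_PROCESS", "GOCC_EXAMPLE_COMPONENT",
         "GOMF_EXAMPLE_FUNCTION", "HP_EXAMPLE_PHENOTYPE"]
print(group_by_subcollection(names))
```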

C5:HPO (Human Phenotype Ontology)

Gene sets in this sub-collection have been updated to reflect the 2022-06-11 release of the Human Phenotype Ontology database (+71 gene sets). This sub-collection has been redundancy filtered through a procedure comparable to that of the GO and Reactome sub-collections.

C8: cell type signature gene sets

Added gene sets describing pancreatic cell type identity signatures from van Gurp et al. 2022 Generation of human islet cell type-specific identity genesets (+4 gene sets).

CHIP file updates

  • MSigDB 2022.1.Hs gene annotations and gene mapping CHIP files have been updated to data from Ensembl 107.
  • Gene orthology annotations for mapping mouse and rat genes to their best match human orthologs have been updated to Alliance of Genome Resources orthology database release 5.2.1 (2022-07-15).
  • Rat microarray annotations derived from mappings to the mRatBN7.2 assembly are now available; the previous warning associated with remapped data from the deprecated Rnor_6.0 assembly has been removed.
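
To illustrate how an orthology CHIP file is used, the sketch below reads a tab-delimited table into a lookup from source identifier to human gene symbol. The column headers "Probe Set ID", "Gene Symbol" and "Gene Title" are an assumption about the CHIP layout, the function name is hypothetical, and the rows are written for illustration rather than copied from a released file.

```python
import csv
import io
from typing import Dict, TextIO


def load_symbol_mapping(handle: TextIO) -> Dict[str, str]:
    """Map each source identifier to its target gene symbol."""
    # Assumption: tab-delimited text with a header row naming these columns.
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Probe Set ID"]: row["Gene Symbol"] for row in reader}


# Illustrative in-memory stand-in for a mouse-to-human orthology CHIP file.
example_chip = io.StringIO(
    "Probe Set ID\tGene Symbol\tGene Title\n"
    "Trp53\tTP53\ttransformation related protein 53\n"
    "Myc\tMYC\tmyelocytomatosis oncogene\n"
)
mapping = load_symbol_mapping(example_chip)
print(mapping["Trp53"])
```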