MSigDB v7.4 Release Notes

This page describes the changes made to the gene set collections for Release 7.4 of the Molecular Signatures Database (MSigDB). This release contains updates to GO and Reactome, as well as a bug fix for certain C8 gene sets, introduced in MSigDB 7.3, whose gene members contained errors.

Note: Due to substantial changes introduced in MSigDB 7.0, using GSEA 4.0.0+ is recommended when utilizing MSigDB 7.0+ resources.
Advisory: Users of MSigDB 7.4 are strongly advised to always use the GSEA "Collapse/Remap to gene symbols" feature with the provided Symbol Remapping chip file if their dataset was generated with a transcriptome other than Ensembl v103/GENCODE v37.

Updates to Existing Gene Sets by Collection

C2:CP:Reactome

  • Reactome gene sets have been updated to reflect the state of the Reactome pathway architecture as of Reactome v76 (+35 gene sets).
  • As previously described in the Reactome release notes for MSigDB 7.0, to limit redundancy between gene sets within the Reactome sub-collection, we applied a filtering procedure based on Jaccard coefficients and distance from the top level of the Reactome event hierarchy.
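
To make the idea of Jaccard-based redundancy filtering concrete, here is a minimal Python sketch. It is not the MSigDB procedure itself: the threshold value, the rule of preferring sets closer to the top of the hierarchy, the `depth` mapping, and all gene set and gene names are assumptions made purely for illustration.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two gene collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filter_redundant(gene_sets, depth, threshold=0.8):
    """Greedy sketch: visit sets from shallowest to deepest and keep a set
    only if its Jaccard coefficient with every kept set is below threshold.

    gene_sets: dict mapping set name -> iterable of gene symbols
    depth:     dict mapping set name -> distance from the top of the
               hierarchy (hypothetical input, not an MSigDB file)
    threshold: placeholder cutoff, NOT the value used by MSigDB
    """
    kept = []
    for name in sorted(gene_sets, key=lambda n: (depth[n], n)):
        if all(jaccard(gene_sets[name], gene_sets[k]) < threshold for k in kept):
            kept.append(name)
    return kept


# Made-up example data
sets = {
    "PARENT": ["A", "B", "C", "D"],
    "CHILD": ["A", "B", "C", "D"],   # identical to PARENT, deeper in the tree
    "OTHER": ["X", "Y"],
}
depth = {"PARENT": 1, "CHILD": 2, "OTHER": 1}

print(jaccard({"A", "B", "C"}, {"B", "C", "D"}))  # 0.5
print(filter_redundant(sets, depth))              # ['OTHER', 'PARENT']
```

In this toy case CHILD is dropped because it fully overlaps PARENT, which sits nearer the top of the hierarchy. Whether the real procedure favors shallower or deeper events, and at what cutoff, is described in the MSigDB 7.0 Reactome release notes rather than here.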

C5:GO (Gene Ontology)

Gene sets in these sub-collections are derived from the controlled vocabulary of the Gene Ontology (GO) project (The Gene Ontology Consortium. Gene Ontology: tool for the unification of biology. Nature Genet 2000). The gene sets are named by GO term and contain genes annotated by that term. This collection has been updated to the most recent GO annotations, as present in the GO-basic obo file released on 2021-02-01 and the NCBI gene2go annotations downloaded on 2021-03-30.

This collection is divided into three sub-collections:

  • BP: GO Biological process (+2 gene sets). Gene sets derived from the Biological Process Ontology.
  • CC: GO Cellular component (+0 gene sets). Gene sets derived from the Cellular Component Ontology.
  • MF: GO Molecular function (+0 gene sets). Gene sets derived from the Molecular Function Ontology.

Prior to MSigDB 7.3, gene sets in the GO sub-collections shared the universal prefix "GO_". This prefix is now sub-collection specific: as of MSigDB 7.3, gene sets in GO:BP begin with "GOBP_", those in GO:CC begin with "GOCC_", and those in GO:MF begin with "GOMF_". This change should make it easier to determine at a glance which GO sub-collection a specific gene set hit came from in analysis pipelines.
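
As one way to use these prefixes in an analysis pipeline, the following Python sketch labels a gene set name with its GO sub-collection. The function name `go_subcollection`, the returned labels, and the example gene set names are illustrative assumptions, not part of any MSigDB or GSEA tool.

```python
# Prefixes introduced in MSigDB 7.3, mapped to their sub-collection.
GO_PREFIXES = {
    "GOBP_": "GO:BP",
    "GOCC_": "GO:CC",
    "GOMF_": "GO:MF",
}


def go_subcollection(gene_set_name):
    """Return the GO sub-collection implied by a gene set name.

    Returns "GO (legacy prefix)" for pre-7.3 style "GO_" names, where the
    sub-collection cannot be read from the name, and None for non-GO names.
    """
    for prefix, label in GO_PREFIXES.items():
        if gene_set_name.startswith(prefix):
            return label
    if gene_set_name.startswith("GO_"):
        return "GO (legacy prefix)"
    return None


# Illustrative names only
for name in ["GOBP_EXAMPLE_PROCESS", "GOMF_EXAMPLE_ACTIVITY",
             "GO_EXAMPLE_TERM", "REACTOME_EXAMPLE_PATHWAY"]:
    print(name, "->", go_subcollection(name))
```

Results produced with releases before 7.3 carry only the legacy "GO_" prefix, so the sub-collection for those hits has to be looked up elsewhere.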

These updates were generated in accordance with the procedure described in the GO release notes for MSigDB 7.0.

C8: cell type signature gene sets

The 77 global cell type gene sets with the prefix "DESCARTES_MAIN_FETAL_", from the Descartes database Human Gene Expression During Development atlas (Cao et al., PMID 33184181), have had their gene members replaced. This resulted in the deletion of two sets that no longer pass the minimum gene member threshold. The prior version of these sets had been erroneously compiled from the same sources as the tissue-specific cell type sets, by merging the 172 tissue-specific cell types across tissues, and did not properly represent the combined-tissue differential expression calculations.
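
Users who ran analyses against the MSigDB 7.3 version of C8 may want to check whether any of the affected sets appear in their gene set files. The sketch below assumes the standard tab-delimited GMT layout (set name, description, then gene symbols); the helper names and the example content are made up for illustration.

```python
AFFECTED_PREFIX = "DESCARTES_MAIN_FETAL_"


def parse_gmt(text):
    """Parse GMT-formatted text into a dict of set name -> list of genes.

    Each line is expected to hold: name <TAB> description <TAB> gene ...
    Lines with fewer than three fields are skipped.
    """
    gene_sets = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue
        name, _description, *genes = fields
        gene_sets[name] = [g for g in genes if g]
    return gene_sets


def affected_sets(gene_sets):
    """Return the sorted names of sets carrying the affected C8 prefix."""
    return sorted(n for n in gene_sets if n.startswith(AFFECTED_PREFIX))


# Made-up GMT content; set names and genes are placeholders
example_gmt = (
    "DESCARTES_MAIN_FETAL_EXAMPLE_CELLS\tplaceholder\tGENE1\tGENE2\n"
    "OTHER_EXAMPLE_SET\tplaceholder\tGENE3\tGENE4\n"
)

print(affected_sets(parse_gmt(example_gmt)))
```

Any results involving the names this returns were computed with the erroneous 7.3 membership and are candidates for re-running against 7.4.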

CHIP file updates

All CHIP files previously provided in the standard MSigDB 7.3 release have been updated for MSigDB 7.4 in accordance with previously described procedures. These CHIPs contain updated gene ID lists from NCBI, HGNC, MGI and RGD but do not change the target Ensembl version.

Gene orthology annotations for mapping mouse and rat genes to their best match human orthologs have been updated to Alliance of Genome Resources orthology database release 4.0.0.