MSigDB v7.5.1 Release Notes

From GeneSetEnrichmentAnalysisWiki

Revision as of 18:54, 11 January 2022 by Acastanza (Talk | contribs)

This page describes the changes made to the gene set collections for Release 7.5 of the Molecular Signatures Database (MSigDB). This release contains updates to C1, GO, HPO, and Reactome, as well as the addition of curated sets to C8 and user-submitted sets to C2:CGP.

Note: Due to substantial changes introduced in MSigDB 7.0, using GSEA 4.0.0+ is recommended when utilizing MSigDB 7.0+ resources.
Advisory: Users of MSigDB 7.5 are strongly advised to always use the GSEA "Collapse/Remap to gene symbols" feature with the provided Symbol Remapping chip file when their dataset was generated with a transcriptome other than Ensembl v105/GENCODE v39.


Updates to Collections




  • Reactome gene sets have been updated to reflect the state of the Reactome pathway architecture as of Reactome v78 (+{} gene sets).
  • As previously described in the Reactome release notes for MSigDB 7.0, in order to limit redundancy between gene sets within the Reactome sub-collection we applied a filtering procedure based on Jaccard coefficients and distance from the top level of the Reactome event hierarchy.
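The redundancy filtering mentioned above can be illustrated with a minimal sketch. This is not the MSigDB procedure itself, which is documented in the 7.0 release notes. The 0.85 threshold, the rule of keeping the set closer to the top of the hierarchy, and the names `jaccard`, `filter_redundant`, and `depth` are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient |a & b| / |a | b| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filter_redundant(gene_sets, depth, threshold=0.85):
    """Drop one member of each pair of gene sets that overlap too strongly.

    gene_sets: dict mapping gene set name -> set of gene symbols
    depth:     dict mapping gene set name -> distance from the top level
               of the event hierarchy (hypothetical input)
    threshold: assumed cutoff, not the value used by MSigDB
    """
    removed = set()
    for a, b in combinations(sorted(gene_sets), 2):
        if a in removed or b in removed:
            continue
        if jaccard(gene_sets[a], gene_sets[b]) > threshold:
            # Assumed tie-break: keep the set nearer the top of the hierarchy.
            removed.add(a if depth[a] > depth[b] else b)
    return {name: genes for name, genes in gene_sets.items() if name not in removed}
```

With two identical sets at different depths, only the shallower one survives under these assumptions; unrelated sets are untouched.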


WikiPathways gene sets have been updated to the January 10, 2022 release (+{} gene sets).

C5:GO (Gene Ontology)

Gene sets in these sub-collections are derived from the controlled vocabulary of the Gene Ontology (GO) project: The Gene Ontology Consortium. Gene Ontology: tool for the unification of biology (Nature Genet 2000). The gene sets are named by GO term and contain genes annotated by that term. This collection has been updated to the most recent GO annotations as present in the GO-basic obo file released on 2021-12-15 and NCBI gene2go annotations downloaded on 2022-01-03.

This collection is divided into three sub-collections:

  • BP: GO Biological process (+{} gene sets). Gene sets derived from the Biological Process Ontology.
  • CC: GO Cellular component (+{} gene sets). Gene sets derived from the Cellular Component Ontology.
  • MF: GO Molecular function (+{} gene sets). Gene sets derived from the Molecular Function Ontology.

These updates were generated in accordance with the procedure described in the GO release notes for MSigDB 7.0.
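As a rough illustration of "named by GO term and contain genes annotated by that term", the sketch below groups annotation rows into gene sets. It is a simplification and not the procedure from the 7.0 release notes. The `(gene, term, category)` row layout, the category labels, and the name prefixes are assumptions, and the example rows in the usage note are made up.

```python
import re
from collections import defaultdict

# Assumed mapping from annotation category to a sub-collection prefix.
PREFIX = {"Process": "GOBP", "Component": "GOCC", "Function": "GOMF"}


def build_go_gene_sets(annotations):
    """Group (gene_symbol, go_term_name, category) rows into named gene sets."""
    gene_sets = defaultdict(set)
    for gene, term, category in annotations:
        # Build an upper-case, underscore-separated name from the term text.
        slug = re.sub(r"[^A-Za-z0-9]+", "_", term).strip("_").upper()
        gene_sets[f"{PREFIX[category]}_{slug}"].add(gene)
    return dict(gene_sets)
```

For example, the toy rows `("GENE1", "dna repair", "Process")` and `("GENE2", "dna repair", "Process")` would end up in a single set named `GOBP_DNA_REPAIR`.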

C5:HPO (Human Phenotype Ontology)

Gene sets in this sub-collection have been updated to reflect the 2021-10-10 release of the Human Phenotype Ontology database (+{} gene sets). This sub-collection has been redundancy filtered through a procedure comparable to that of the GO and Reactome sub-collections.

C8: cell type signature gene sets


CHIP file updates

  • MSigDB 7.5 includes new handling for deprecated Ensembl Gene IDs thanks to work by Daniel Himmelstein. Briefly, historical IDs that map uniquely to one "newest" Ensembl gene ID were extracted from the "old_to_newest.tsv" file from the respective Human, Mouse, and Rat repositories generated for Ensembl 105 (see: GitHub repository) and then merged into the species-specific Ensembl_Gene_ID chip file.
  • Warning: Rat microarray-derived annotations. Ensembl 105 brought a major update to the rat genome assembly, transitioning from the deprecated Rnor_6.0 assembly to the modern mRatBN7.2 assembly. However, Ensembl has not yet released updated microarray probe mappings for the mRatBN7.2 assembly. In order to continue to provide CHIP files and internal mappings in MSigDB for experiments derived from these platforms, we have carried forward the historical probe-to-gene mappings from Ensembl 103/MSigDB v7.4 and remapped the target genes to the current assembly using the Ensembl_Gene_ID_MSigDB.v7.5 chip. Until Ensembl releases updated probe-to-gene mappings derived directly from the mRatBN7.2 assembly, the quality of MSigDB's rat microarray chip files may be impacted. Rat CHIP files affected by this have received the temporary suffix _REMAPPED after the MSigDB version number.
  • Gene orthology annotations for mapping mouse and rat genes to their best match human orthologs have been updated to Alliance of Genome Resources orthology database release 4.2.
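The handling of deprecated Ensembl Gene IDs described in the first item above, keeping only historical IDs that map to exactly one "newest" ID, could be sketched as follows. The column names `old_ensembl_gene_id` and `newest_ensembl_gene_id` are assumptions about the layout of "old_to_newest.tsv", and the function name is hypothetical.

```python
import csv
import io
from collections import defaultdict


def unique_old_to_newest(tsv_text,
                         old_col="old_ensembl_gene_id",      # assumed column name
                         new_col="newest_ensembl_gene_id"):  # assumed column name
    """Return {old_id: newest_id} for old IDs with exactly one newest ID."""
    newest_by_old = defaultdict(set)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        newest_by_old[row[old_col]].add(row[new_col])
    # Old IDs that map to several newest IDs are ambiguous and are dropped.
    return {old: next(iter(new))
            for old, new in newest_by_old.items()
            if len(new) == 1}
```

The resulting dictionary could then be merged into a species-specific ID table. Reading from a string keeps the sketch self-contained; in practice the text would come from the downloaded file.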
