MSigDB v7.2 Release Notes

From GeneSetEnrichmentAnalysisWiki
Revision as of 17:42, 3 September 2020 by Acastanza (talk | contribs)


This page describes the changes made to the gene set collections for Release 7.2 of the Molecular Signatures Database (MSigDB). This release includes STUFF.

Note: Due to substantial changes introduced in MSigDB 7.0, we recommend GSEA 4.0.0 or later when using resources from MSigDB 7.0 or later.
Advisory: Users of MSigDB 7.2 are strongly advised to use the GSEA "Collapse/Remap to gene symbols" feature with the provided Symbol Remapping chip file whenever their dataset was generated with a transcriptome other than Ensembl v101/GENCODE v35.
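
To illustrate what collapsing and remapping to gene symbols does conceptually, here is a minimal sketch. It is not GSEA's implementation: the identifiers, the dictionary-based chip mapping, and the choice of the per-sample maximum as the collapsing rule are all assumptions made for the example.

```python
def collapse_to_symbols(expression, chip):
    """Collapse identifier-level rows to gene symbols.

    expression: dict of identifier -> list of per-sample values
    chip: dict of identifier -> gene symbol (stand-in for a chip file)
    Rows sharing a symbol are merged by taking the per-sample maximum,
    which is an assumed rule chosen for illustration.
    """
    collapsed = {}
    for feature_id, values in expression.items():
        symbol = chip.get(feature_id)
        if symbol is None:
            continue  # identifiers missing from the mapping are dropped
        if symbol not in collapsed:
            collapsed[symbol] = list(values)
        else:
            collapsed[symbol] = [
                max(a, b) for a, b in zip(collapsed[symbol], values)
            ]
    return collapsed


# Hypothetical toy data: two identifiers map to the same symbol.
chip = {"ID_A": "GENE1", "ID_B": "GENE1", "ID_C": "GENE2"}
expression = {
    "ID_A": [1.0, 5.0],
    "ID_B": [3.0, 2.0],
    "ID_C": [4.0, 4.0],
    "ID_D": [9.0, 9.0],  # not in the mapping, so it is discarded
}
collapsed = collapse_to_symbols(expression, chip)
```

In this toy case GENE1 ends up with one row built from ID_A and ID_B, and ID_D disappears because the mapping has no symbol for it. The same thing happens to identifiers from an older transcriptome that the current chip file does not cover.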

Changes to Collection Organization

Updates to Gene Sets by Collection

C1 (positional gene sets)

C1 has been updated to reflect the primary assembly of the current human genome release (GRCh38) as present in Ensembl 101 and GENCODE 35. Gene annotations for this collection are derived from the Chromosome and Karyotype band tracks of the Ensembl BioMart (version 101) and reflect the gene architecture as represented on the primary assembly.
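
As a rough illustration of how positional gene sets can be assembled from chromosome and karyotype band annotations, consider the sketch below. The record layout, the gene names, and the `chr<chromosome><band>` naming scheme are assumptions for the example and are not taken from the MSigDB build pipeline.

```python
from collections import defaultdict


def positional_gene_sets(records):
    """Group gene symbols by chromosome and karyotype band.

    records: iterable of (symbol, chromosome, band) tuples, as might be
    exported from a gene annotation table. Genes without a band
    annotation are skipped.
    """
    sets = defaultdict(set)
    for symbol, chromosome, band in records:
        if not band:
            continue  # no cytogenetic band available for this gene
        sets[f"chr{chromosome}{band}"].add(symbol)
    return dict(sets)


# Hypothetical annotation rows.
records = [
    ("GENE_A", "1", "p36"),
    ("GENE_B", "1", "p36"),
    ("GENE_C", "X", "q28"),
    ("GENE_D", "MT", ""),  # no band, so it is excluded
]
band_sets = positional_gene_sets(records)
```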


C2:CP:Reactome

  • Reactome gene sets have been updated to reflect the state of the Reactome pathway architecture as of Reactome v73 (+NUMBER gene sets).
  • As previously described in the Reactome release notes for MSigDB 7.0, to limit redundancy between gene sets within the Reactome sub-collection, we applied a filtering procedure based on Jaccard coefficients and distance from the top level of the Reactome event hierarchy.
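
The redundancy filtering mentioned above can be pictured with the following sketch. It is only an illustration: the threshold of 0.85, the greedy order, and the rule of preferring the set closer to the top of the hierarchy are assumptions, and the actual procedure is the one described in the MSigDB 7.0 Reactome release notes.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filter_redundant(gene_sets, depth, threshold=0.85):
    """Greedily keep gene sets that are not too similar to any kept set.

    gene_sets: dict of name -> set of gene symbols
    depth: dict of name -> distance from the top of the hierarchy
    threshold: assumed similarity cutoff (hypothetical value)
    Sets nearer the top level are visited first, so when two sets
    overlap heavily the shallower one is retained.
    """
    kept = []
    for name in sorted(gene_sets, key=lambda n: (depth[n], n)):
        if all(
            jaccard(gene_sets[name], gene_sets[k]) <= threshold
            for k in kept
        ):
            kept.append(name)
    return kept


# Hypothetical pathways: the child duplicates its parent exactly.
gene_sets = {
    "PATHWAY_PARENT": {"A", "B", "C", "D"},
    "PATHWAY_CHILD": {"A", "B", "C", "D"},
    "PATHWAY_OTHER": {"E", "F"},
}
depth = {"PATHWAY_PARENT": 1, "PATHWAY_CHILD": 2, "PATHWAY_OTHER": 2}
kept = filter_redundant(gene_sets, depth)
```

Here the child pathway is dropped because its Jaccard coefficient with the already retained parent is 1.0, while the unrelated pathway is kept.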

C5:GO (Gene Ontology)

Gene sets in this collection are derived from the controlled vocabulary of the Gene Ontology (GO) project (The Gene Ontology Consortium. Gene Ontology: tool for the unification of biology. Nature Genet 2000). The gene sets are named by GO term and contain genes annotated by that term. This collection has been updated to the most recent GO annotations as present in the GO-basic obo file released on 2020-07-16 and NCBI GO annotations downloaded on 2020-07-30.

This collection is divided into three sub-collections:

  • BP: GO Biological process (+NUMBER gene sets). Gene sets derived from the Biological Process Ontology.
  • CC: GO Cellular component (+NUMBER gene sets). Gene sets derived from the Cellular Component Ontology.
  • MF: GO Molecular function (+NUMBER gene sets). Gene sets derived from the Molecular Function Ontology.

These updates were generated in accordance with the procedure described in the GO release notes for MSigDB 7.0.

CHIP file updates

All CHIP files previously provided in the standard MSigDB 7.1 release have been updated for MSigDB 7.2 in accordance with previously described procedures.

Gene orthology annotations for mapping mouse and rat genes to their best-match human orthologs have been updated to Alliance of Genome Resources orthology database release 3.1.1.
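
To make the idea of a best-match ortholog mapping concrete, the sketch below selects one human gene per source gene from a list of candidate pairs. The scoring field (a count of supporting evidence), the tie-breaking rule, and all gene names are hypothetical. The sketch does not reproduce the Alliance of Genome Resources data format or the procedure used to build the CHIP files.

```python
def best_human_orthologs(candidates):
    """Pick one human ortholog per source gene.

    candidates: iterable of (source_gene, human_gene, support) tuples,
    where support is an assumed numeric score (higher is better).
    On ties, the candidate seen first is kept.
    """
    best = {}
    for source_gene, human_gene, support in candidates:
        current = best.get(source_gene)
        if current is None or support > current[1]:
            best[source_gene] = (human_gene, support)
    return {src: human for src, (human, _) in best.items()}


# Hypothetical candidate pairs with made-up support scores.
candidates = [
    ("MouseGeneA", "HUMAN_GENE_1", 9),
    ("MouseGeneA", "HUMAN_GENE_2", 2),
    ("RatGeneB", "HUMAN_GENE_3", 8),
]
mapping = best_human_orthologs(candidates)
```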