MSigDB v7.2 Release Notes
This page describes the changes made to the gene set collections for Release 7.2 of the Molecular Signatures Database (MSigDB). This release includes a substantial reorganization of C5, the addition of gene sets from WikiPathways to C2:CP, and the promotion of a supplemental collection to C8, among other updates and additions.
Note: Due to substantial changes introduced in MSigDB 7.0, using GSEA 4.0.0+ is recommended when utilizing MSigDB 7.0+ resources.
Advisory: Users of MSigDB 7.2 are strongly advised to always use the GSEA "Collapse/Remap to gene symbols" feature with the provided Symbol Remapping chip file if their dataset was generated with a transcriptome other than Ensembl v101/GENCODE v35.
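GSEA performs this remapping itself. The minimal sketch below only illustrates the idea behind collapsing identifiers to gene symbols with a "keep the maximum value per symbol" rule, similar in spirit to the GSEA Max_probe mode. It assumes a simplified three-column CHIP layout ("Probe Set ID", "Gene Symbol", "Gene Title"). The identifier "OLD_ID_0001" and the expression values are hypothetical, and this is not GSEA's implementation.

```python
import csv
import io

# ASSUMPTION: simplified CHIP excerpt, tab-delimited with a header row.
# "OLD_ID_0001" is a hypothetical identifier from an older annotation.
CHIP_TEXT = (
    "Probe Set ID\tGene Symbol\tGene Title\n"
    "ENSG00000141510\tTP53\ttumor protein p53\n"
    "ENSG00000012048\tBRCA1\tBRCA1 DNA repair associated\n"
    "OLD_ID_0001\tTP53\ttumor protein p53\n"
)


def load_chip(text):
    """Map each identifier in the CHIP file to its gene symbol."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["Probe Set ID"]: row["Gene Symbol"] for row in reader}


def collapse_max_probe(expression, chip):
    """Collapse identifier -> value to symbol -> value for one sample.

    When several identifiers map to the same symbol, the largest value
    is kept. Identifiers missing from the CHIP mapping are dropped.
    """
    collapsed = {}
    for ident, value in expression.items():
        symbol = chip.get(ident)
        if symbol is None:
            continue
        if symbol not in collapsed or value > collapsed[symbol]:
            collapsed[symbol] = value
    return collapsed


chip = load_chip(CHIP_TEXT)
# Hypothetical expression values for a single sample.
sample = {"ENSG00000141510": 5.2, "OLD_ID_0001": 7.9,
          "ENSG00000012048": 3.1, "NOT_IN_CHIP": 1.0}
result = collapse_max_probe(sample, chip)
```

Here both TP53 identifiers collapse to one TP53 entry carrying the larger value, and the unmapped identifier is discarded.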
New Additions and Changes to Collection Organization
C2:CP:WikiPathways
Beginning in MSigDB 7.2, the WikiPathways analysis subset gene sets are included as a canonical pathway subset in C2. This initial release reflects the WikiPathways September release.
C2:CGP
60 gene sets have been curated from literature or contributed by users and are now available in C2:CGP.
36 of these gene sets, derived from two publications (prefixed with "MANNE" and "BLANCO_MELO"), come from research related to the ongoing COVID-19 global pandemic.
The remaining sets consist of data contributed by the following individuals:
- Francesca Buffa - BUFFA_HYPOXIA_METAGENE Signature
- Orlando Musso - 4 "DESERT" and 12 "MEBARKI" Hepatocellular Carcinoma gene sets
- Goodwin Jinesh - 6 "JINESH_BLEBBISHIELD" gene sets
- Russell Ryan - "RYAN_MANTLE_CELL_LYMPHOMA_NOTCH_DIRECT_UP" gene set
C5 ontology
C5 has been renamed from "C5 GO gene sets" to "C5: ontology gene sets". This change reflects the addition of a new sub-collection of gene sets from the Human Phenotype Ontology project. This initial release is categorized under C5:HP and reflects the August release of the Human Phenotype Ontology. This sub-collection has been redundancy filtered through a procedure comparable to that of the GO and Reactome sub-collections.
C8: cell type signature gene sets
The previously released supplemental collection of gene sets for single-cell identities has been updated and promoted to a full MSigDB collection.
The new C8 differs from the previously released supplemental collection in the following ways:
- Added 26 new gene sets from Durante et al. Single-cell analysis of olfactory neurogenesis and differentiation in adult humans.
- Added 25 new gene sets from Cui et al. Single-Cell Transcriptome Analysis Maps the Developmental Track of the Human Heart.
- Performed additional significance filtering for 35 gene sets from Hay et al. The Human Cell Atlas bone marrow single-cell interactive web portal. This filtering deleted two sets outright and reduced several more to below the MSigDB inclusion threshold.
Updates to Existing Gene Sets by Collection
C1 (positional gene sets)
C1 has been updated to reflect the primary assembly of the current release of the Human Genome as present in Ensembl 101 and GENCODE 35 (GRCh38) (+1 gene set). Gene annotations for this collection are derived from the Chromosome and Karyotype band tracks from the Ensembl BioMart (version 101) and reflect the gene architecture as represented on the primary assembly.
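To illustrate what "positional" means here, the sketch below groups genes into sets by chromosome and cytogenetic band. The annotation rows stand in for the BioMart chromosome and band attributes and are a small hand-picked sample. The "chr" + chromosome + band naming scheme is illustrative and not necessarily how MSigDB names its C1 sets.

```python
from collections import defaultdict

# Hand-picked sample rows: (gene symbol, chromosome, karyotype band),
# standing in for the Ensembl BioMart chromosome and band annotations.
ANNOTATIONS = [
    ("TP53", "17", "p13.1"),
    ("BRCA1", "17", "q21.31"),
    ("ERBB2", "17", "q12"),
    ("PTEN", "10", "q23.31"),
]


def positional_sets(rows):
    """Group gene symbols into sets keyed by chromosome and band.

    ASSUMPTION: the set-naming scheme below is illustrative only.
    """
    sets = defaultdict(set)
    for symbol, chrom, band in rows:
        sets[f"chr{chrom}{band}"].add(symbol)
    return dict(sets)


result = positional_sets(ANNOTATIONS)
```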
C2:CP:Reactome
- Reactome gene sets have been updated to reflect the state of the Reactome pathway architecture as of Reactome v73 (+25 gene sets).
- As previously described in the Reactome release notes for MSigDB 7.0, in order to limit redundancy between gene sets within the Reactome sub-collection we applied a filtering procedure based on Jaccard coefficients and distance from the top level of the Reactome event hierarchy.
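The redundancy filter is described above only in outline. The sketch below shows how a Jaccard coefficient can drive such a filter, under stated assumptions: the 0.85 threshold, the rule "keep the set nearer the top level of the hierarchy", and the pathway names are all hypothetical and are not the parameters used for MSigDB.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two gene sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filter_redundant(gene_sets, depth, threshold=0.85):
    """Greedily drop one member of each highly overlapping pair.

    gene_sets: name -> set of genes
    depth: name -> distance from the top level of the event hierarchy
    ASSUMPTIONS: the threshold and the "keep the shallower set" rule are
    illustrative choices, not MSigDB's documented parameters.
    """
    removed = set()
    # Visit pairs in sorted order so the result is deterministic.
    for x, y in combinations(sorted(gene_sets), 2):
        if x in removed or y in removed:
            continue
        if jaccard(gene_sets[x], gene_sets[y]) >= threshold:
            # Drop the deeper set; break depth ties by name.
            removed.add(max((x, y), key=lambda n: (depth[n], n)))
    return {n: g for n, g in gene_sets.items() if n not in removed}


# Hypothetical pathways: PATHWAY_A_SUB overlaps PATHWAY_A almost entirely.
sets = {
    "PATHWAY_A": {f"G{i}" for i in range(1, 10)},
    "PATHWAY_A_SUB": {f"G{i}" for i in range(1, 11)},
    "PATHWAY_B": {"G20", "G21"},
}
depth = {"PATHWAY_A": 1, "PATHWAY_A_SUB": 2, "PATHWAY_B": 1}
kept = filter_redundant(sets, depth)
```

Because the filter is greedy, the outcome can depend on the order in which pairs are visited; sorting the names keeps the result reproducible.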
C3 regulatory target gene sets
C3:GTRD has been updated to GTRD v20.06. A large amount of new content in the source database pushed many gene sets above the MSigDB maximum size threshold for inclusion, resulting in a net decrease in the size of the collection (-176 gene sets).
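The reason growth in the source data can shrink the collection is that size bounds are applied after the update. The sketch below makes this concrete. The MIN_SIZE and MAX_SIZE values and the transcription factor names are hypothetical placeholders, not MSigDB's actual inclusion thresholds, and integers stand in for gene identifiers.

```python
# ASSUMPTION: hypothetical size bounds, not MSigDB's actual thresholds.
MIN_SIZE = 5
MAX_SIZE = 2000


def apply_size_thresholds(gene_sets, min_size=MIN_SIZE, max_size=MAX_SIZE):
    """Keep only gene sets whose size lies within [min_size, max_size]."""
    return {name: genes for name, genes in gene_sets.items()
            if min_size <= len(genes) <= max_size}


# Hypothetical target sets before and after a source database update;
# integers stand in for gene identifiers.
old = {"TF_X_TARGETS": set(range(1500)), "TF_Y_TARGETS": set(range(300))}
new = {"TF_X_TARGETS": set(range(2600)), "TF_Y_TARGETS": set(range(320))}

# TF_X_TARGETS grows past the maximum and drops out of the collection.
net_change = len(apply_size_thresholds(new)) - len(apply_size_thresholds(old))
```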
C5:GO (Gene Ontology)
Gene sets in these sub-collections are derived from the controlled vocabulary of the Gene Ontology (GO) project: The Gene Ontology Consortium. Gene Ontology: tool for the unification of biology (Nature Genet 2000). The gene sets are named by GO term and contain genes annotated by that term. This collection has been updated to the most recent GO annotations as present in the GO-basic obo file released on 2020-08-11 and NCBI gene2go annotations downloaded on 2020-09-03.
This collection is divided into three sub-collections:
- BP: GO Biological process (+43 gene sets). Gene sets derived from the Biological Process Ontology.
- CC: GO Cellular component (+2 gene sets). Gene sets derived from the Cellular Component Ontology.
- MF: GO Molecular function (+34 gene sets). Gene sets derived from the Molecular Function Ontology.
These updates were generated in accordance with the procedure described in the GO release notes for MSigDB 7.0.
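The sketch below shows how term-named gene sets could be assembled from NCBI gene2go-style annotations and split into the BP, CC, and MF sub-collections by the Category column. It assumes the tab-delimited gene2go column layout shown in the header string. The GeneIDs are invented placeholders, so the rows are not real annotations. Only direct annotations are used. Propagation up the GO hierarchy and the redundancy and size filtering mentioned above are deliberately left out.

```python
import csv
from collections import defaultdict

# ASSUMPTION: excerpt in the gene2go column layout; GeneIDs 101/102/201
# are invented placeholders, not real annotations.
GENE2GO = (
    "#tax_id\tGeneID\tGO_ID\tEvidence\tQualifier\tGO_term\tPubMed\tCategory\n"
    "9606\t101\tGO:0005634\tIDA\tlocated_in\tnucleus\t-\tComponent\n"
    "9606\t102\tGO:0005634\tIDA\tlocated_in\tnucleus\t-\tComponent\n"
    "9606\t101\tGO:0003677\tIDA\tenables\tDNA binding\t-\tFunction\n"
    "9606\t102\tGO:0003677\tIDA\tNOT enables\tDNA binding\t-\tFunction\n"
    "9606\t102\tGO:0006915\tIEA\tinvolved_in\tapoptotic process\t-\tProcess\n"
    "10090\t201\tGO:0005634\tIDA\tlocated_in\tnucleus\t-\tComponent\n"
)

CATEGORY_TO_SUBCOLLECTION = {"Process": "BP", "Component": "CC", "Function": "MF"}


def go_gene_sets(text, tax_id="9606"):
    """Group GeneIDs by GO term name, split into BP/CC/MF.

    Only direct, non-negated annotations for one species are used.
    """
    lines = text.splitlines()
    header = lines[0].lstrip("#").split("\t")
    reader = csv.DictReader(lines[1:], fieldnames=header, delimiter="\t")
    collections = defaultdict(lambda: defaultdict(set))
    for row in reader:
        if row["tax_id"] != tax_id:
            continue  # keep human annotations only
        if row["Qualifier"].startswith("NOT"):
            continue  # skip negated annotations
        sub = CATEGORY_TO_SUBCOLLECTION[row["Category"]]
        collections[sub][row["GO_term"]].add(row["GeneID"])
    return {sub: dict(terms) for sub, terms in collections.items()}


result = go_gene_sets(GENE2GO)
```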
CHIP file updates
All CHIP files previously provided in the standard MSigDB 7.1 release have been updated for MSigDB 7.2 in accordance with previously described procedures.
Gene orthology annotations for mapping mouse and rat genes to their best match human orthologs have been updated to Alliance of Genome Resources orthology database release 3.1.1.
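As a rough illustration of a "best match" ortholog mapping, the sketch below keeps, for each mouse gene, the human candidate with the highest support score. The table layout and the numeric score are assumptions standing in for whatever evidence the orthology resource provides; this is not the Alliance of Genome Resources schema. "Gene1a", "GENE1", and "GENE1B" are hypothetical names.

```python
# ASSUMPTION: simplified ortholog rows (mouse gene, human gene, score);
# the score is a hypothetical stand-in for orthology evidence.
ORTHOLOGS = [
    ("Trp53", "TP53", 10),
    ("Gene1a", "GENE1", 7),
    ("Gene1a", "GENE1B", 3),
]


def best_human_orthologs(rows):
    """Map each mouse gene to its highest-scoring human ortholog.

    On equal scores, the first candidate encountered is kept.
    """
    best = {}
    for mouse, human, score in rows:
        if mouse not in best or score > best[mouse][1]:
            best[mouse] = (human, score)
    return {mouse: human for mouse, (human, _) in best.items()}


mapping = best_human_orthologs(ORTHOLOGS)
# Translate a mouse gene list, dropping genes without a human match.
human_genes = [mapping[g] for g in ["Trp53", "Gene1a", "Unmapped1"] if g in mapping]
```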
Addendum
Hallmark founder gene sets in the MSigDB XML file have had their identifiers adjusted to reflect their internal "systematic name". This change enables more precise tracking of Hallmark founder gene sets across releases. Previously these gene sets were identified by their standard name as represented in the initial release of the MSigDB Hallmarks collection.
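Keying gene sets by a stable systematic identifier, rather than by a display name, makes cross-release comparisons straightforward. The sketch below assumes that GENESET elements in the XML carry STANDARD_NAME and SYSTEMATIC_NAME attributes. The two XML fragments, the set names, and the identifier "M00001" are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: GENESET elements expose STANDARD_NAME and SYSTEMATIC_NAME
# attributes; names and the identifier below are placeholders.
XML_RELEASE_A = """<MSIGDB>
  <GENESET STANDARD_NAME="FOUNDER_SET_OLD_NAME" SYSTEMATIC_NAME="M00001"/>
</MSIGDB>"""
XML_RELEASE_B = """<MSIGDB>
  <GENESET STANDARD_NAME="FOUNDER_SET_NEW_NAME" SYSTEMATIC_NAME="M00001"/>
</MSIGDB>"""


def index_by_systematic_name(xml_text):
    """Map systematic name -> standard name for every GENESET element."""
    root = ET.fromstring(xml_text)
    return {gs.get("SYSTEMATIC_NAME"): gs.get("STANDARD_NAME")
            for gs in root.iter("GENESET")}


a = index_by_systematic_name(XML_RELEASE_A)
b = index_by_systematic_name(XML_RELEASE_B)
# Sets present in both releases whose standard name changed.
renamed = {sid: (a[sid], b[sid]) for sid in a.keys() & b.keys() if a[sid] != b[sid]}
```

Matching on the systematic name still links the two records even though the standard name differs between the two hypothetical releases.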