MSigDB v7.2 Release Notes
This page describes the changes made to the gene set collections for Release 7.2 of the Molecular Signatures Database (MSigDB).
Note: Due to substantial changes introduced in MSigDB 7.0, GSEA 4.0.0 or later is recommended when using MSigDB 7.0 or later resources.
Advisory: Users of MSigDB 7.2 are strongly advised to always use the GSEA "Collapse/Remap to gene symbols" feature with the provided Symbol Remapping CHIP file whenever their dataset was generated with a transcriptome other than Ensembl v101/GENCODE v35.
Contents
Changes to Collection Organization
Updates to Gene Sets by Collection
C1 (positional gene sets)
C1 has been updated to reflect the primary assembly of the current human genome release (GRCh38), as represented in Ensembl 101 and GENCODE 35. Gene annotations for this collection are derived from the Chromosome and Karyotype band tracks of Ensembl BioMart (version 101) and reflect gene locations on the primary assembly.
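To illustrate how positional gene sets relate to the BioMart chromosome and band annotation, the sketch below groups genes by chromosome and karyotype band. It is a minimal sketch, not the MSigDB build pipeline: the input tuples, approximating the primary assembly as chromosomes 1-22, X and Y, collapsing sub-bands (p36.33 to p36), and the `chr{chromosome}{band}` set names are all assumptions, and `positional_gene_sets` is a hypothetical helper.

```python
from collections import defaultdict

# Assumption: the primary assembly is approximated by chromosomes 1-22, X and Y;
# patch and scaffold names (e.g. "HG1_PATCH") are ignored.
PRIMARY_CHROMOSOMES = {str(i) for i in range(1, 23)} | {"X", "Y"}


def positional_gene_sets(rows):
    """Group genes into positional sets by chromosome and karyotype band.

    rows: iterable of (gene_symbol, chromosome, band) tuples, e.g. exported
    from a BioMart query. Hypothetical helper for illustration only.
    """
    sets = defaultdict(set)
    for symbol, chromosome, band in rows:
        if not symbol or not band or chromosome not in PRIMARY_CHROMOSOMES:
            continue  # skip unplaced genes and non-primary sequences
        major_band = band.split(".")[0]  # assumption: collapse sub-bands, p36.33 -> p36
        sets[f"chr{chromosome}{major_band}"].add(symbol)
    return {name: sorted(genes) for name, genes in sets.items()}


rows = [
    ("TP53", "17", "p13.1"),
    ("TP73", "1", "p36.32"),
    ("SKI", "1", "p36.33"),
    ("BRCA1", "17", "q21.31"),
    ("GENE_ON_PATCH", "HG1_PATCH", "p13.1"),  # hypothetical non-primary record
]
c1_sets = positional_gene_sets(rows)
```

Collapsing sub-bands keeps the sets coarse enough to contain several genes each; whether and how MSigDB merges sub-bands is not stated in these notes.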
C2:CP:Reactome
- Reactome gene sets have been updated to reflect the state of the Reactome pathway architecture as of Reactome v73 (+209 gene sets).
- To limit redundancy between gene sets within the Reactome sub-collection, we again applied the filtering procedure described in the Reactome release notes for MSigDB 7.0, which is based on Jaccard coefficients and on distance from the top level of the Reactome event hierarchy.
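The Jaccard coefficient of two gene sets A and B is the size of their intersection divided by the size of their union, so identical sets score 1 and disjoint sets score 0. The sketch below shows one way such a coefficient could be combined with hierarchy depth: gene sets are visited from the top of the event hierarchy downward, and a set is kept only if its Jaccard coefficient with every set already kept is below a threshold. The 0.85 threshold, the visiting order, and the helper names are assumptions for illustration; the actual MSigDB rule is the one described in the MSigDB 7.0 Reactome release notes.

```python
def jaccard(a, b):
    """Size of the intersection of two gene sets divided by the size of their union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filter_redundant(gene_sets, depth, threshold=0.85):
    """Greedy redundancy filter (illustrative, not the MSigDB implementation).

    gene_sets: dict mapping set name -> set of genes
    depth: dict mapping set name -> distance from the top level of the hierarchy
    Sets closer to the top are visited first (ties broken by name); a set is
    kept only if its Jaccard coefficient with every kept set is below threshold.
    """
    kept = []
    for name in sorted(gene_sets, key=lambda n: (depth[n], n)):
        if all(jaccard(gene_sets[name], gene_sets[k]) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical example: a child event whose genes duplicate its parent's.
sets = {
    "PARENT": {"g1", "g2", "g3", "g4"},
    "CHILD": {"g1", "g2", "g3", "g4"},
    "OTHER": {"g5", "g6"},
}
depth = {"PARENT": 0, "CHILD": 1, "OTHER": 0}
kept = filter_redundant(sets, depth)
```

Visiting sets in order of depth makes the outcome deterministic: when two sets are redundant, the one nearer the top of the hierarchy survives under this assumed rule.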
C5:GO (Gene Ontology)
Gene sets in this collection are derived from the controlled vocabulary of the Gene Ontology (GO) project (The Gene Ontology Consortium. Gene Ontology: tool for the unification of biology. Nature Genet 2000). The gene sets are named by GO term and contain the genes annotated to that term. This collection has been updated to the most recent GO annotations, as present in the go-basic.obo file released on 2020-07-16 and the NCBI GO annotations downloaded on 2020-07-30.
This collection is divided into three sub-collections:
- BP: GO Biological Process (+564 gene sets). Gene sets derived from the Biological Process Ontology.
- CC: GO Cellular Component (+164 gene sets). Gene sets derived from the Cellular Component Ontology.
- MF: GO Molecular Function (+213 gene sets). Gene sets derived from the Molecular Function Ontology.
These updates were generated in accordance with the procedure described in the GO release notes for MSigDB 7.0.
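The basic mapping from GO annotations to gene sets (one set per GO term, split into the BP, CC and MF sub-collections by the term's namespace) can be sketched as follows. This is a simplified illustration, not the MSigDB 7.0 procedure: it uses direct annotations only (no propagation to parent terms and no size or redundancy filtering), the upper-case `GO_` naming is an assumed convention, and the input structures are hypothetical stand-ins for data parsed from go-basic.obo and the NCBI annotation file.

```python
import re
from collections import defaultdict

# Sub-collection codes keyed by the "namespace" field of go-basic.obo term stanzas.
NAMESPACE_TO_SUBCOLLECTION = {
    "biological_process": "BP",
    "cellular_component": "CC",
    "molecular_function": "MF",
}


def set_name(term_name):
    """Turn a GO term name into an upper-case, underscore-separated set name (assumed convention)."""
    return "GO_" + re.sub(r"[^A-Za-z0-9]+", "_", term_name).strip("_").upper()


def go_gene_sets(annotations, terms):
    """Build {sub-collection: {set name: sorted genes}} from direct annotations.

    annotations: iterable of (gene_symbol, go_id) pairs
    terms: dict mapping go_id -> (term_name, namespace)
    Annotations are not propagated to parent terms in this sketch.
    """
    result = defaultdict(lambda: defaultdict(set))
    for gene, go_id in annotations:
        if go_id not in terms:
            continue  # obsolete or unknown term
        name, namespace = terms[go_id]
        result[NAMESPACE_TO_SUBCOLLECTION[namespace]][set_name(name)].add(gene)
    return {sub: {n: sorted(g) for n, g in sets.items()} for sub, sets in result.items()}


terms = {
    "GO:0006915": ("apoptotic process", "biological_process"),
    "GO:0005634": ("nucleus", "cellular_component"),
}
annotations = [
    ("TP53", "GO:0006915"),
    ("CASP3", "GO:0006915"),
    ("TP53", "GO:0005634"),
    ("UNKNOWN", "GO:9999999"),  # hypothetical annotation to a term not in the ontology
]
go_sets = go_gene_sets(annotations, terms)
```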
CHIP file updates
All CHIP files provided with the standard MSigDB 7.1 release have been updated for MSigDB 7.2, following the procedures described in earlier release notes.
Gene orthology annotations for mapping mouse and rat genes to their best-match human orthologs have been updated to the Alliance of Genome Resources orthology database, release 3.1.1.
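The remapping that the advisory at the top of these notes recommends can be pictured as follows: a CHIP file maps each dataset identifier (a probe, an Ensembl ID, or a mouse or rat gene) to a human gene symbol, and identifiers that share a symbol are merged. This minimal sketch assumes a tab-delimited CHIP file with `Probe Set ID` and `Gene Symbol` header columns and merges rows by per-sample maximum, similar in spirit to GSEA's "Max_probe" collapsing mode. It is not GSEA's implementation; the helper names and sample identifiers are hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_chip(handle):
    """Read a CHIP file into {identifier: gene symbol}.

    Assumes a tab-delimited file whose header includes "Probe Set ID" and
    "Gene Symbol" columns; rows with an empty symbol are skipped.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Probe Set ID"]: row["Gene Symbol"]
            for row in reader if row.get("Gene Symbol")}


def collapse_to_symbols(expression, chip):
    """Collapse {identifier: [value per sample]} to {symbol: [value per sample]}.

    Identifiers mapping to the same symbol are merged by taking the per-sample
    maximum; identifiers missing from the CHIP mapping are dropped.
    """
    grouped = defaultdict(list)
    for identifier, values in expression.items():
        symbol = chip.get(identifier)
        if symbol:
            grouped[symbol].append(values)
    return {symbol: [max(column) for column in zip(*rows)]
            for symbol, rows in grouped.items()}


# Hypothetical CHIP content and expression values for two samples.
chip_text = (
    "Probe Set ID\tGene Symbol\tGene Title\n"
    "probe_a\tTP53\ttumor protein p53\n"
    "probe_b\tTP53\ttumor protein p53\n"
    "probe_c\t\t\n"
)
chip = read_chip(io.StringIO(chip_text))
expression = {"probe_a": [1.0, 5.0], "probe_b": [3.0, 2.0], "probe_c": [9.0, 9.0]}
collapsed = collapse_to_symbols(expression, chip)
```

Because `probe_c` has no symbol in the CHIP file, its values are dropped rather than carried into the collapsed dataset, which is why an up-to-date CHIP file matters when the dataset's annotation differs from Ensembl v101/GENCODE v35.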