MSigDB v7.1 Release Notes
This page describes the changes made to the gene set collections for Release 7.1 of the Molecular Signatures Database (MSigDB). This is a minor release that includes updates to gene symbol mappings, updated data from external resources, and new datasets for potential transcription factor and microRNA regulatory target genes.
Note: Due to substantial changes in MSigDB, it is recommended that users migrate to GSEA 4.0.0+ when utilizing MSigDB 7.0+ resources.
Advisory: It is strongly recommended that users of MSigDB 7.1 always use the GSEA "Collapse dataset to gene symbols" feature with the provided Symbol Remapping chip file if their dataset was generated with a transcriptome other than Ensembl v99/GENCODE v33.
This advisory has been updated to reflect MSigDB symbol annotations as of the 7.1 update.
- 1 Updates to MSigDB Gene Symbol Mapping Procedures
- 2 Updates to Gene Sets by Collection
- 3 Changes to C3
Updates to MSigDB Gene Symbol Mapping Procedures
Update to Ensembl annotations
Beginning in MSigDB 7.0, gene identifiers are mapped to their HGNC-approved Gene Symbol and NCBI Gene ID through annotations extracted from Ensembl's BioMart data service. MSigDB 7.1 incorporates annotation information exported from Ensembl release 99. For any analysis run against MSigDB 7.1 gene sets, users should ensure that the dataset gene symbols match Ensembl release 99/GENCODE release 33. Alternatively, MSigDB 7.1 provides CHIP files designed for use with the GSEA Collapse/Remap dataset feature, which can be used to re-annotate the dataset.
- Gene annotations supplied in the MSigDB 7.1 release are derived from Ensembl release 99, corresponding to GENCODE release 33, and reflect the HGNC Gene Symbols as of the GENCODE 33 freeze date of August 2019.
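As a rough illustration of what remapping and collapsing a dataset to gene symbols involves, the following Python sketch may help. It is not GSEA's implementation. It assumes a tab-delimited CHIP-style table with columns named "Probe Set ID" and "Gene Symbol", and it assumes that when several identifiers map to the same symbol the maximum value per sample is kept. The identifiers, values, and function names (`load_chip`, `collapse_to_symbols`) are hypothetical.

```python
import csv
import io


def load_chip(chip_text):
    """Parse a tab-delimited CHIP-style table into {source_id: gene_symbol}.

    Column names are an assumption made for this sketch.
    """
    reader = csv.DictReader(io.StringIO(chip_text), delimiter="\t")
    return {row["Probe Set ID"]: row["Gene Symbol"] for row in reader}


def collapse_to_symbols(expression, chip):
    """Remap row identifiers to gene symbols.

    When several identifiers map to one symbol, keep the per-sample maximum
    (an assumed collapse rule). Rows without a mapping are dropped.
    """
    collapsed = {}
    for source_id, values in expression.items():
        symbol = chip.get(source_id)
        if not symbol:
            continue  # identifier absent from the CHIP table
        if symbol in collapsed:
            collapsed[symbol] = [max(a, b) for a, b in zip(collapsed[symbol], values)]
        else:
            collapsed[symbol] = list(values)
    return collapsed


# Hypothetical CHIP content: two outdated identifiers map to one current symbol.
chip_text = (
    "Probe Set ID\tGene Symbol\tGene Title\n"
    "OLD_A\tNEW_A\tplaceholder\n"
    "ALIAS_A\tNEW_A\tplaceholder\n"
    "GENE_B\tGENE_B\tplaceholder\n"
)

# Hypothetical dataset: identifier -> expression values for two samples.
expression = {
    "OLD_A": [1.0, 5.0],
    "ALIAS_A": [2.0, 3.0],
    "GENE_B": [4.0, 4.0],
    "UNMAPPED": [9.0, 9.0],
}

collapsed = collapse_to_symbols(expression, load_chip(chip_text))
```

In this toy input, `OLD_A` and `ALIAS_A` merge into a single `NEW_A` row, and `UNMAPPED` is discarded because the CHIP table has no entry for it.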
Change to gene orthology mapping procedure for non-human genes
Previously in MSigDB 7.0 we implemented a ranking procedure whereby the best human orthologue for each non-human gene was selected using solely Ensembl orthology table statistics. MSigDB 7.1 replaces this procedure. MSigDB 7.1 utilizes best match orthology tables exported via the
CHIP file updates
All CHIP files previously provided in the standard MSigDB 7.0 release have been updated for MSigDB 7.1 in accordance with previously described procedures.
Updates to Gene Sets by Collection
C1 (positional gene sets) - Minor Update
C1 has been updated to reflect the primary assembly of the current release of the Human Genome as present in Ensembl 99 and GENCODE 33 (GRCh38). Gene annotations for this collection are derived from the Chromosome and Karyotype band tracks from the Ensembl BioMart (version 99) and reflect the gene architecture as represented on the primary assembly.
C2:CP:Reactome - Minor Update
- Reactome gene sets have been updated to reflect the state of the Reactome pathway architecture as of Reactome v71 (+10 gene sets).
- As previously described in the Reactome release notes for MSigDB 7.0, in order to limit redundancy between gene sets within the Reactome sub-collection we applied a filtering procedure based on Jaccard coefficients and distance from the top level of the Reactome event hierarchy.
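The kind of redundancy filter mentioned above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure actually used for MSigDB: the threshold of 0.5, the rule that the gene set closer to the top of the hierarchy is retained, and all names and data below are hypothetical.

```python
def jaccard(a, b):
    """Jaccard coefficient: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filter_redundant(gene_sets, depth, threshold=0.5):
    """Greedily keep gene sets, visiting those nearest the hierarchy top first.

    A set is dropped when its Jaccard coefficient with any already-kept set
    exceeds the threshold. Both the threshold and the preference for
    shallower sets are assumptions made for this sketch.
    """
    kept = []
    for name in sorted(gene_sets, key=lambda n: (depth[n], n)):
        if all(jaccard(gene_sets[name], gene_sets[k]) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical gene sets and their distance from the top of the hierarchy.
gene_sets = {
    "PATHWAY_TOP": {"A", "B", "C", "D"},
    "PATHWAY_CHILD": {"A", "B", "C"},
    "PATHWAY_OTHER": {"X", "Y"},
}
depth = {"PATHWAY_TOP": 1, "PATHWAY_CHILD": 2, "PATHWAY_OTHER": 1}

kept = filter_redundant(gene_sets, depth)
```

Here `PATHWAY_CHILD` shares three of four genes with `PATHWAY_TOP` (Jaccard coefficient 0.75), so it is dropped in favor of the shallower set.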
C5 (Gene Ontology collection) - Minor Update
Gene sets in this collection are derived from the controlled vocabulary of the Gene Ontology (GO) project: The Gene Ontology Consortium. Gene Ontology: tool for the unification of biology. The gene sets are named by GO term and contain genes annotated by that term. This collection has been updated to the most recent GO annotations as of January 15, 2020.
This collection is divided into three sub-collections:
- BP: GO Biological process (+180 gene sets). Gene sets derived from the Biological Process Ontology.
- CC: GO Cellular component (+2 gene sets). Gene sets derived from the Cellular Component Ontology.
- MF: GO Molecular function (+18 gene sets). Gene sets derived from the Molecular Function Ontology.
These updates were generated in accordance with the procedure described in the GO release notes for MSigDB 7.0.
Changes to C3
Changes to collection structure
MSigDB 7.1 has been updated to introduce new content for the analysis of gene sets in the context of their targeting by microRNAs or Transcription Factors. This new content necessitated several revisions to C3.
- C3 has been renamed from "C3: motif gene sets" to "C3: regulatory target gene sets" to better reflect the new content in the collection.
- The previous "MIR: microRNA targets" sub-collection has been demoted to a sub-level and renamed to "MIR_Legacy subset of MIR". There have been no changes to this sub-level collection content beyond gene symbol updates.
- The "MIR: microRNA targets" sub-collection now consists of a combined collection of the old MIR collection (MIR_Legacy) as well as the newly introduced MIRDB content.
- The previous "TFT: transcription factor targets" sub-collection has been demoted to a sub-level and renamed to "TFT_Legacy subset of TFT". There have been no changes to this sub-level collection content beyond gene symbol updates.
- The "TFT: transcription factor targets" sub-collection now consists of a combined collection of the old TFT collection (TFT_Legacy) as well as the newly introduced GTRD content.
New C3 Resources
New miRNA target content from miRDB
A newly created sub-level for predicted targets of miRNAs, consisting of 2377 new gene sets, has been added to the "MIR: microRNA targets" sub-collection of C3.
New Transcription Factor target content from the Gene Transcription Regulation Database (GTRD)
A newly created sub-level for predicted targets of transcription factors, consisting of 221 new gene sets, has been added to the "TFT: transcription factor targets" sub-collection of C3.