MSigDB v7.3 Release Notes

This page describes the changes made to the gene set collections for Release 7.3 of the Molecular Signatures Database (MSigDB). This release includes a reorganization of C7 to accommodate the addition of vaccination response gene sets provided by the Human Immunology Project Consortium, along with other minor updates and additions.

Note: Due to substantial changes introduced in MSigDB 7.0, using GSEA 4.0.0+ is recommended when utilizing MSigDB 7.0+ resources.
Advisory: Users of MSigDB 7.3 are strongly advised to always use the GSEA "Collapse/Remap to gene symbols" feature with the provided Symbol Remapping chip file when the dataset was generated with a transcriptome other than Ensembl v103/GENCODE v37.

New Additions and Changes to Collection Organization

C2:CGP

Gene sets describing the molecular effect of S1PR3 overexpression in leukemia (PMID33458693) and signatures describing the effects of anti-TNF therapy on inflammatory bowel disease (PMID33429950) have been added to C2:CGP, along with gene sets contributed by the following individuals:

  • Jorge Benitez, University of California, San Diego - BENITEZ_GBM_PROTEASOME_INHIBITION_RESPONSE signature (PMID33428749)
  • Martin Fischer, Leibniz Institute on Aging, Fritz Lipmann Institute - RIEGE_DELTANP63_DIRECT_TARGETS_UP signature (PMID33263276)

C7: immunologic signature gene sets

  • C7 has been reorganized to accommodate the addition of new data. The previous C7 collection has been moved to the sub-collection level and renamed C7:ImmuneSigDB to reflect its original publication title.
  • A new sub-collection, C7:VAX, has been added to C7. This sub-collection consists of 347 gene sets curated from the literature by the Human Immunology Project Consortium (HIPC). These sets describe human immunological responses to specific vaccines. Sets in this collection include signatures of age-specific responses, post-vaccination response time-courses, and predictive signatures of responders vs. non-responders to vaccination, among other curated data.

C8: cell type signature gene sets

333 gene sets of cell identity signatures derived from single-cell sequencing have been added to C8. These consist of:

"Filtered by similarity" Annotations

Gene set sub-collections updated in this release that underwent redundancy filtering for inclusion in MSigDB now have an additional field, "Filtered by similarity", on the gene set page. This field lists the source database IDs of other candidate gene sets that clustered with the selected set by Jaccard similarity coefficient (Jaccard coefficient >0.85 with the selected set) but were filtered out of the collection on the basis of tree distance or set size. Each database ID links to the source resource's page for that identifier, as in the EXTERNAL_DETAILS_URL field.
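
As an illustration of the similarity criterion only, the sketch below computes the Jaccard coefficient between two gene sets and lists the candidates that exceed the 0.85 threshold with a selected set. The function names, the candidate IDs, and the gene identifiers are hypothetical, and the sketch does not reproduce the clustering, tree-distance, or set-size steps of the MSigDB filtering procedure.

```python
def jaccard(a, b):
    """Jaccard similarity coefficient: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


def similar_sets(selected, candidates, threshold=0.85):
    """Return the sorted IDs of candidate sets whose Jaccard coefficient
    with `selected` is strictly greater than `threshold`."""
    return sorted(
        set_id
        for set_id, genes in candidates.items()
        if jaccard(selected, genes) > threshold
    )


# Hypothetical toy data, not real MSigDB gene sets or source IDs.
selected = {"G1", "G2", "G3", "G4", "G5", "G6", "G7", "G8", "G9", "G10"}
candidates = {
    "SRC:0001": selected - {"G10"},        # 9 shared genes, 10 in the union
    "SRC:0002": {"G1", "G2", "G3", "X1"},  # 3 shared genes, 11 in the union
}
filtered_by_similarity = similar_sets(selected, candidates)
```

With this toy input, only the first candidate passes the threshold, so it is the only ID that would appear in a "Filtered by similarity" style list.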

Updates to Existing Gene Sets by Collection

C1 (positional gene sets)

C1 has been updated to reflect the primary assembly of the current release of the Human Genome as present in Ensembl 103 and GENCODE 37 (GRCh38). Gene annotations for this collection are derived from the Chromosome and Karyotype band tracks from the Ensembl BioMart (version 103) and reflect the gene architecture as represented on the primary assembly.

C2:CP:Reactome

  • Reactome gene sets have been updated to reflect the state of the Reactome pathway architecture as of Reactome v75 (+15 gene sets).
  • As previously described in the Reactome release notes for MSigDB 7.0, in order to limit redundancy between gene sets within the Reactome sub-collection we applied a filtering procedure based on Jaccard coefficients and distance from the top level of the Reactome event hierarchy.

C2:CP:WikiPathways

WikiPathways gene sets have been updated to reflect the state of WikiPathways Release 20210310 (+28 gene sets).

C3 regulatory target gene sets

C3:GTRD has been updated to GTRD v20.06 (+175 gene sets). This update also corrects an error in which data from certain transcription factors with short promoter regions may have been omitted.

C5:GO (Gene Ontology)

Gene sets in these sub-collections are derived from the controlled vocabulary of the Gene Ontology (GO) project: The Gene Ontology Consortium. Gene Ontology: tool for the unification of biology (Nature Genet 2000). The gene sets are named by GO term and contain genes annotated by that term. This collection has been updated to the most recent GO annotations as present in the GO-basic obo file released on 2021-02-01 and NCBI gene2go annotations downloaded on 2021-02-16.

This collection is divided into three sub-collections:

  • BP: GO Biological process (-94 gene sets). Gene sets derived from the Biological Process Ontology.
  • CC: GO Cellular component (-5 gene sets). Gene sets derived from the Cellular Component Ontology.
  • MF: GO Molecular function (+11 gene sets). Gene sets derived from the Molecular Function Ontology.

Gene sets in the GO sub-collections previously shared the universal prefix "GO_"; this prefix is now sub-collection specific. Gene sets in GO:BP now begin with "GOBP_", those in GO:CC with "GOCC_", and those in GO:MF with "GOMF_". This change should make it easier to determine at a glance which GO sub-collection a specific gene set hit came from in analysis pipelines.
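
For pipelines that post-process enrichment results, the prefixes can be used to label each hit with its sub-collection. The following is a minimal sketch; the helper name and the example gene set names are illustrative assumptions and are not part of any MSigDB or GSEA API.

```python
# Prefixes introduced in MSigDB 7.3, mapped to their GO sub-collections.
GO_PREFIXES = {
    "GOBP_": "GO:BP",
    "GOCC_": "GO:CC",
    "GOMF_": "GO:MF",
}


def go_subcollection(gene_set_name):
    """Return the GO sub-collection implied by a gene set name's prefix,
    or None when the name carries none of the 7.3 GO prefixes
    (for example, a pre-7.3 name that still starts with "GO_")."""
    for prefix, subcollection in GO_PREFIXES.items():
        if gene_set_name.startswith(prefix):
            return subcollection
    return None


# Illustrative names only.
hits = ["GOBP_EXAMPLE_PROCESS", "GOMF_EXAMPLE_ACTIVITY", "GO_EXAMPLE_OLD_NAME"]
labels = {name: go_subcollection(name) for name in hits}
```

Names that still carry the old "GO_" prefix return None, which can serve as a signal that results were produced with a pre-7.3 version of the collection.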

These updates were generated in accordance with the procedure described in the GO release notes for MSigDB 7.0.

C5:HPO (Human Phenotype Ontology)

Gene sets in this sub-collection have been updated to reflect the 2021-02-09 release of the Human Phenotype Ontology database (+319 gene sets). This sub-collection has been redundancy filtered through a procedure comparable to that of the GO and Reactome sub-collections.

CHIP file updates

All CHIP files previously provided in the standard MSigDB 7.2 release have been updated for MSigDB 7.3 in accordance with previously described procedures.

Gene orthology annotations for mapping mouse and rat genes to their best-match human orthologs have been updated to Alliance of Genome Resources orthology database release 3.2. Genes with no ortholog listed in the Alliance Orthology file and genes with multiple orthologs are now processed using additional information from Ensembl's orthology table, following a procedure adapted from the one used in MSigDB 7.0, to enable the selection of additional best-match orthologs.
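
The general shape of such a two-source lookup can be sketched as follows. These notes do not specify the actual selection rules, so everything here is an assumption made for illustration: the data structures, the numeric score attached to each secondary-table entry, the tie-breaking rule, and the gene names are all hypothetical.

```python
def best_match_ortholog(gene, primary, secondary):
    """Pick one human ortholog for `gene`.

    primary:   dict mapping gene -> list of human orthologs
               (stands in for the primary orthology file).
    secondary: dict mapping gene -> {human ortholog: score}
               (stands in for a secondary orthology table; the score
               is a hypothetical ranking value).
    """
    hits = primary.get(gene, [])
    if len(hits) == 1:
        return hits[0]  # unambiguous in the primary source

    scored = secondary.get(gene, {})
    if hits:
        # Several primary orthologs: only rank among those candidates.
        scored = {h: s for h, s in scored.items() if h in hits}
    if not scored:
        return None  # no usable information in either source

    # Highest score wins; sorting first makes ties resolve alphabetically.
    return max(sorted(scored), key=scored.get)


# Hypothetical toy tables.
primary = {"MouseA": ["HUMAN_A"], "MouseB": ["HUMAN_B1", "HUMAN_B2"]}
secondary = {
    "MouseB": {"HUMAN_B1": 0.4, "HUMAN_B2": 0.9},
    "MouseC": {"HUMAN_C": 0.7},
}
mapping = {
    g: best_match_ortholog(g, primary, secondary)
    for g in ["MouseA", "MouseB", "MouseC", "MouseD"]
}
```

In this toy example the secondary table resolves both the ambiguous gene and the gene missing from the primary table, while a gene absent from both sources stays unmapped.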