MSigDB collections
Revision as of 10:47, 7 March 2014
This page provides detailed descriptions of all collections of gene sets in MSigDB.
To learn about changes and other information specific to a particular release of MSigDB, please refer to the corresponding Release_Notes.
H: Hallmarks
some text
C1: positional gene sets
Genes from the same genomic location (chromosome or cytogenetic band) are grouped in a gene set. Cytogenetic annotations are from three sources:
- Human Genome Organization (HUGO) Gene Nomenclature Committee (HGNC)
- UniGene
- Affymetrix microarray annotations
We merged the relevant annotations from these resources and derived a single cytogenetic band location for every gene symbol. These were then grouped into sets. Decimals in cytogenetic bands were ignored. For example, 5q31.1 was considered 5q31. Therefore, genes annotated as 5q31.2 and those annotated as 5q31.3 were both placed in the same set, 5q31.
When there were conflicts, the UniGene entry was used.
These sets are helpful in identifying effects related to chromosomal deletions or amplifications, dosage compensation, epigenetic silencing, and other regional effects.
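The band-truncation and grouping rule described above can be sketched in a few lines of Python. This is only an illustration, not the MSigDB pipeline: the gene names and the `annotations` mapping are hypothetical, and the step that merges the three annotation sources (with UniGene taking precedence on conflicts) is assumed to have already produced one band per gene.

```python
from collections import defaultdict


def truncate_band(band: str) -> str:
    """Drop the decimal part of a cytogenetic band, e.g. '5q31.1' -> '5q31'."""
    return band.split(".", 1)[0]


def group_by_band(gene_to_band: dict) -> dict:
    """Group gene symbols into sets keyed by their truncated band."""
    sets = defaultdict(set)
    for gene, band in gene_to_band.items():
        sets[truncate_band(band)].add(gene)
    return dict(sets)


# Hypothetical, already-merged annotations (one band per gene symbol).
annotations = {
    "GENE_A": "5q31.2",
    "GENE_B": "5q31.3",
    "GENE_C": "7p22",
}

positional_sets = group_by_band(annotations)
# GENE_A and GENE_B both land in the '5q31' set; GENE_C is alone in '7p22'.
```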
C2: curated gene sets
Gene sets collected from various sources such as online pathway databases, scientific publications and personal contributions from domain experts.
CGP: chemical and genetic perturbations
Gene sets represent expression signatures of genetic and chemical perturbations. A number of these gene sets come in pairs: an xxx_UP (xxx_DN) gene set representing genes induced (repressed) by the perturbation.
- Sets curated from the biomedical literature by the MSigDB team
- L2L: a database of published microarray gene expression data (Newman and Weiner), kindly shared with MSigDB. These sets list John Newman as the contributor.
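Because many CGP sets come in xxx_UP / xxx_DN pairs, it can be convenient to match them by name before an analysis. The following is a minimal sketch that assumes the pairing can be recovered from the `_UP` and `_DN` suffixes alone; the set names and the helper `pair_up_down` are hypothetical and are not part of any MSigDB tool.

```python
def pair_up_down(names):
    """Return {prefix: {'UP': name, 'DN': name}} for complete _UP/_DN pairs."""
    pairs = {}
    for name in names:
        for suffix in ("_UP", "_DN"):
            if name.endswith(suffix):
                prefix = name[: -len(suffix)]
                pairs.setdefault(prefix, {})[suffix[1:]] = name
    # Keep only prefixes for which both directions are present.
    return {prefix: d for prefix, d in pairs.items() if len(d) == 2}


# Hypothetical gene set names, for illustration only.
names = [
    "EXAMPLE_TREATMENT_UP",
    "EXAMPLE_TREATMENT_DN",
    "EXAMPLE_KNOCKDOWN_UP",  # no matching _DN set, so it is left unpaired
    "EXAMPLE_PATHWAY",       # no direction suffix at all
]

paired = pair_up_down(names)
```

Sets without a partner are dropped from the result, since, as noted above, only some of the sets come in pairs.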
CP: canonical pathways
Gene sets from the pathway databases. Usually, these gene sets are canonical representations of a biological process compiled by domain experts.
C3: motif gene sets
Gene sets group genes by cis-regulatory motifs. The motifs are catalogued in Xie et al. and represent known or putative conserved regulatory elements in promoters and 3'-UTRs. These sets make it possible to link changes in a genomic experiment to conserved, putative cis-regulatory elements.
C4: computational gene sets
Gene sets defined by mining large collections of cancer-oriented genes.
C5: GO gene sets
Gene sets are named by Gene Ontology (GO) terms and contain genes annotated by that term.
C6: oncogenic signatures
Gene sets represent signatures of cellular pathways that are often dysregulated in cancer. The majority of signatures were generated directly from microarray data from NCBI GEO or from in-house unpublished expression profiling experiments that involved perturbation of known cancer genes. In addition, a small number of oncogenic signatures were curated from scientific publications.
C7: immunologic signatures
Gene sets that represent cell states and perturbations within the immune system. The signatures were generated by manual curation of published studies in human and mouse immunology. For each study, pairwise comparisons of relevant classes were made and genes were ranked by mutual information. Gene sets correspond to the top- or bottom-ranking genes (FDR < 0.25 or maximum of 200 genes) for each comparison. This resource is generated as part of the Human Immunology Project Consortium (HIPC).
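The selection rule for a single comparison can be illustrated with a short Python sketch. It rests on one reading of the rule above, namely that genes passing FDR < 0.25 are kept and the list is capped at 200 genes; that reading is an assumption. The ranking by mutual information is not implemented here, and the input list, gene names, and FDR values are hypothetical.

```python
def select_signature(ranked, fdr_cutoff=0.25, max_genes=200):
    """Pick genes from a best-first ranked list of (gene, fdr) pairs.

    Assumed interpretation: keep genes with fdr < fdr_cutoff,
    then cap the result at max_genes.
    """
    passing = [gene for gene, fdr in ranked if fdr < fdr_cutoff]
    return passing[:max_genes]


# Hypothetical ranking for one pairwise comparison, best-ranked gene first.
ranked = [("G1", 0.01), ("G2", 0.10), ("G3", 0.30)]
top_set = select_signature(ranked)

# A long hypothetical list in which every gene passes the FDR cutoff.
many = [(f"G{i}", 0.01) for i in range(300)]
capped_set = select_signature(many)
```

For the bottom-ranking set, the same function could be applied to the ranking in reverse order.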