MSigDB XML description

From GeneSetEnrichmentAnalysisWiki
Revision as of 02:21, 25 September 2016 by Eby (talk | contribs)

The MSigDB database in XML format captures both the content (i.e., the member genes) and the annotations of the gene sets in a given release of MSigDB. This page describes the tags and attributes of the XML file.

The attributes of the MSIGDB tag describe the database in the file as a whole.

| XML ATTRIBUTE | DESCRIPTION | REQUIREMENT |
| --- | --- | --- |
| NAME | Name of the database | required |
| VERSION | Version of the database | required |
| BUILD_DATE | Date the XML file was built | required |
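
As a minimal sketch of how these database-level attributes might be read, the Python snippet below parses a hand-written fragment with the standard library's `xml.etree.ElementTree`. It assumes that MSIGDB is the root element of the file; the sample values (including the date format) and the helper name `read_database_info` are hypothetical placeholders, not taken from a real release.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written fragment. The attribute values are placeholders
# and do not come from a real MSigDB release.
SAMPLE_XML = '<MSIGDB NAME="example_db" VERSION="0.0" BUILD_DATE="Jan 01, 2000"></MSIGDB>'

# The three attributes listed as required for the MSIGDB tag.
REQUIRED_DB_ATTRIBUTES = ("NAME", "VERSION", "BUILD_DATE")


def read_database_info(xml_text):
    """Return the required MSIGDB attributes as a dict, or raise ValueError."""
    root = ET.fromstring(xml_text)
    # Assumption: MSIGDB is the root element of the document.
    if root.tag != "MSIGDB":
        raise ValueError(f"unexpected root tag: {root.tag}")
    missing = [name for name in REQUIRED_DB_ATTRIBUTES if name not in root.attrib]
    if missing:
        raise ValueError(f"missing required attributes: {missing}")
    return {name: root.attrib[name] for name in REQUIRED_DB_ATTRIBUTES}


db_info = read_database_info(SAMPLE_XML)
print(db_info["NAME"], db_info["VERSION"], db_info["BUILD_DATE"])
```

For a full release file, an incremental parser such as `ET.iterparse` may be a better fit than loading the whole text into memory, since the file contains every gene set.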



The attributes of the GENESET tags describe the individual gene sets in the file.

| XML ATTRIBUTE | DESCRIPTION | REQUIREMENT |
| --- | --- | --- |
| STANDARD_NAME | Gene set name | required |
| SYSTEMATIC_NAME | Gene set name for internal indexing purposes | required |
| HISTORICAL_NAMES | Comma-separated list of older gene set names, starting from VERSION="V.2.5" of MSigDB | optional |
| ORGANISM | Organism name | required |
| PMID | PubMed ID for the source publication | optional |
| AUTHORS | Authors of the gene set source publication, according to PubMed ID | optional |
| GEOID | GEO or ArrayExpress ID for the raw microarray data in the GEO or ArrayExpress repository | optional |
| GENESET_LISTING_URL | URL of the original source that listed the gene set members | optional |
| EXTERNAL_DETAILS_URL | URL of the original source page of the gene set | optional |
| CHIP | Indicates the type of the original gene set members, equivalent to the CHIP file, e.g., "HG-U133A" | required |
| CATEGORY_CODE | Gene set collection code, e.g., C2 | required |
| SUB_CATEGORY_CODE | Gene set subcategory code, e.g., CGP | optional |
| CONTRIBUTOR | Name of the person or institution that contributed the gene set to MSigDB | required |
| CONTRIBUTOR_ORG | Name of the organization associated with the gene set contributor | required |
| DESCRIPTION_BRIEF | Brief description of the gene set | required |
| DESCRIPTION_FULL | Full description of the gene set or abstract of the source publication | optional |
| TAGS | Optional tags to enhance gene set annotations; currently not in use | optional |
| MEMBERS | Comma-separated list of gene set members as they originally appeared in the source | required |
| MEMBERS_SYMBOLIZED | Comma-separated list of gene set members in the form of human gene symbols | required |
| MEMBERS_EZID | Comma-separated list of gene set members in the form of human Entrez Gene IDs | required |
| MEMBERS_MAPPING | Pipe-separated list of mappings between gene set members, each in the form: MEMBERS, MEMBERS_SYMBOLIZED, MEMBERS_EZID | required |
| FOUNDER_NAMES | Pipe-separated list of v4.0 MSigDB founder gene sets for the hallmark signatures | applies to hallmarks only |
| REFINEMENT_DATASETS | Pipe-separated list of GEO or ArrayExpress identifiers of microarray data used to refine hallmark signatures, each in the form: GEO or ArrayExpress ID, comparison details | applies to hallmarks only |
| VALIDATION_DATASETS | Pipe-separated list of GEO or ArrayExpress identifiers of microarray data used to validate hallmark signatures, each in the form: GEO or ArrayExpress ID, comparison details | applies to hallmarks only |
| STATUS | Indicates gene set status. In the current release, all gene sets have STATUS="public" | required |
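
The comma-separated and pipe-separated attributes above can be unpacked with a few lines of Python. The following is a rough sketch rather than an official reader: it assumes that GENESET elements are direct children of the MSIGDB root and that each MEMBERS_MAPPING record holds exactly three comma-separated fields. The sample fragment is invented (placeholder names and IDs, with most required attributes omitted for brevity), and `split_list` and `parse_gene_set` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the gene set name, member IDs, symbols, and Entrez IDs
# are invented placeholders, and most required attributes are left out.
SAMPLE_XML = """<MSIGDB NAME="example_db" VERSION="0.0" BUILD_DATE="Jan 01, 2000">
  <GENESET STANDARD_NAME="EXAMPLE_SET"
           CATEGORY_CODE="C2"
           SUB_CATEGORY_CODE="CGP"
           HISTORICAL_NAMES=""
           MEMBERS="probe_1,probe_2"
           MEMBERS_SYMBOLIZED="GENE1,GENE2"
           MEMBERS_EZID="101,102"
           MEMBERS_MAPPING="probe_1,GENE1,101|probe_2,GENE2,102"
           STATUS="public"/>
</MSIGDB>"""


def split_list(value, sep=","):
    """Split a delimited attribute value, dropping empty items."""
    return [item.strip() for item in value.split(sep) if item.strip()]


def parse_gene_set(element):
    """Convert one GENESET element into a plain dict of parsed fields."""
    attrs = element.attrib
    mapping = []
    for record in split_list(attrs.get("MEMBERS_MAPPING", ""), sep="|"):
        # Assumption: three comma-separated fields per record, in the order
        # MEMBERS, MEMBERS_SYMBOLIZED, MEMBERS_EZID. Unpacking raises
        # ValueError if a record has a different number of fields.
        member, symbol, entrez_id = [field.strip() for field in record.split(",")]
        mapping.append({"member": member, "symbol": symbol, "entrez_id": entrez_id})
    return {
        "name": attrs.get("STANDARD_NAME"),
        "category": attrs.get("CATEGORY_CODE"),
        "sub_category": attrs.get("SUB_CATEGORY_CODE"),  # optional, may be None
        "historical_names": split_list(attrs.get("HISTORICAL_NAMES", "")),
        "members": split_list(attrs.get("MEMBERS", "")),
        "symbols": split_list(attrs.get("MEMBERS_SYMBOLIZED", "")),
        "entrez_ids": split_list(attrs.get("MEMBERS_EZID", "")),
        "mapping": mapping,
        # Hallmark-only attribute; empty for other collections.
        "founder_names": split_list(attrs.get("FOUNDER_NAMES", ""), sep="|"),
        "status": attrs.get("STATUS"),
    }


root = ET.fromstring(SAMPLE_XML)
gene_sets = [parse_gene_set(element) for element in root.findall("GENESET")]
for gene_set in gene_sets:
    print(gene_set["name"], gene_set["symbols"])
```

Entrez Gene IDs are kept as strings here so that no assumption is made about how unmapped members are represented; convert them to integers only after checking the actual file.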