MSigDB XML description

From GeneSetEnrichmentAnalysisWiki
Revision as of 11:14, 4 August 2010 by Liberzon (talk | contribs)

The MSigDB database in XML format captures both the content (i.e., gene members) and the annotations of the gene sets in a given release of MSigDB. This page describes the tags and attributes of the XML file.

The attributes of the MSIGDB tag describe the database in the file as a whole.

| XML ATTRIBUTE | DESCRIPTION | REQUIREMENT |
| --- | --- | --- |
| NAME | Name of the database | required |
| VERSION | Version of the database | required |
| BUILD_DATE | Date the XML file was built | required |
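As an illustration of how the database-level attributes might be read, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It assumes that MSIGDB is the root element of the file, which this page does not state explicitly. The sample string and its attribute values are hypothetical placeholders, not taken from a real release, and the helper name `read_database_info` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the attribute values are placeholders, not real release data.
SAMPLE = '<MSIGDB NAME="example_db" VERSION="0.0" BUILD_DATE="2010-01-01"></MSIGDB>'

# The three attributes listed as required for the MSIGDB tag.
REQUIRED_DB_ATTRS = ("NAME", "VERSION", "BUILD_DATE")


def read_database_info(xml_text):
    """Return the required MSIGDB attributes as a dict, or raise ValueError.

    Assumption: MSIGDB is the root element of the document.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "MSIGDB":
        raise ValueError(f"unexpected root tag: {root.tag}")
    missing = [name for name in REQUIRED_DB_ATTRS if name not in root.attrib]
    if missing:
        raise ValueError(f"missing required attributes: {missing}")
    return {name: root.attrib[name] for name in REQUIRED_DB_ATTRS}


info = read_database_info(SAMPLE)
print(info["NAME"], info["VERSION"], info["BUILD_DATE"])
```

For a full release file, `ET.parse(path).getroot()` could replace `ET.fromstring`. The check for missing attributes simply mirrors the "required" column of the table above.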



The attributes of the GENESET tags describe the individual gene sets in the file.

| XML ATTRIBUTE | DESCRIPTION | REQUIREMENT |
| --- | --- | --- |
| STANDARD_NAME | Gene set name | required |
| SYSTEMATIC_NAME | Gene set name for internal indexing purposes | required |
| HISTORICAL_NAMES | Comma-separated list of older gene set names, starting from VERSION="V.2.5" of MSigDB | optional |
| ORGANISM | Organism name | required |
| PMID | PubMed ID for the source publication | optional |
| AUTHORS | Authors of the gene set source publication, according to PubMed ID | optional |
| GEOID | GEO or ArrayExpress ID for the raw microarray data in the GEO or ArrayExpress repository | optional |
| GENE_SET_LISTING_URL | URL of the original source that listed the gene set members | optional |
| EXTERNAL_DETAILS_URL | URL of the original source page of the gene set | optional |
| CHIP | Indicates the type of the original gene set members, equivalent to the CHIP file, e.g., "HG-U133A" | required |
| CATEGORY_CODE | Gene set collection code, e.g., C2 | required |
| SUB_CATEGORY_CODE | Gene set subcategory code, e.g., CGP | optional |
| CONTRIBUTOR | Name of the person or institution that contributed the gene set to MSigDB | required |
| CONTRIBUTOR_ORG | Name of the organization associated with the gene set contributor | required |
| DESCRIPTION_BRIEF | Brief description of the gene set | required |
| DESCRIPTION_FULL | Full description of the gene set or abstract of the source publication | optional |
| TAGS | Optional tags to enhance gene set annotations; currently not in use | optional |
| MEMBERS | Comma-separated list of gene set members as they originally appeared in the source | required |
| MEMBERS_SYMBOLIZED | Comma-separated list of gene set members in the form of human gene symbols | required |
| MEMBERS_EZID | Comma-separated list of gene set members in the form of human Entrez Gene IDs | required |
| MEMBERS_MAPPING | Pipe-separated list of mappings between gene set members, each in the form: MEMBERS, MEMBERS_SYMBOLIZED, MEMBERS_EZID | required |
| STATUS | Indicates gene set status. STATUS="public" means that the set is included in the present MSigDB release. STATUS="deprecated:reason" indicates an archived set from previous releases that is not included in the present MSigDB release; this feature is not yet functional. | |
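To make the list-valued attributes concrete, the following Python sketch splits MEMBERS_SYMBOLIZED, MEMBERS_MAPPING, and STATUS for each GENESET element. It is an illustration under several assumptions: GENESET elements are taken to be nested inside the MSIGDB element, the sample set names, member names, and IDs are invented placeholders, and the helper names (`split_list`, `parse_mapping`, `parse_status`, `load_gene_sets`) are hypothetical. Because the deprecated status is described as not yet functional, the handling of `deprecated:reason` here only follows the documented form.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: set names, members and IDs are made-up placeholders.
SAMPLE = """
<MSIGDB NAME="example_db" VERSION="0.0" BUILD_DATE="2010-01-01">
  <GENESET STANDARD_NAME="EXAMPLE_SET_A" STATUS="public"
           MEMBERS="probe_1,probe_2"
           MEMBERS_SYMBOLIZED="GENE1,GENE2"
           MEMBERS_EZID="101,102"
           MEMBERS_MAPPING="probe_1,GENE1,101|probe_2,GENE2,102"/>
  <GENESET STANDARD_NAME="EXAMPLE_SET_B" STATUS="deprecated:example reason"
           MEMBERS="probe_3"
           MEMBERS_SYMBOLIZED="GENE3"
           MEMBERS_EZID="103"
           MEMBERS_MAPPING="probe_3,GENE3,103"/>
</MSIGDB>
"""


def split_list(value, sep=","):
    """Split a separated attribute value, dropping surrounding spaces and empty items."""
    return [item.strip() for item in value.split(sep) if item.strip()]


def parse_mapping(value):
    """Split MEMBERS_MAPPING into (original, symbol, entrez_id) tuples.

    Assumption: the three fields of an entry contain no commas themselves.
    """
    triples = []
    for entry in split_list(value, sep="|"):
        parts = [part.strip() for part in entry.split(",")]
        if len(parts) != 3:
            raise ValueError(f"malformed mapping entry: {entry!r}")
        triples.append(tuple(parts))
    return triples


def parse_status(value):
    """Return (status, reason); reason is None when no ':reason' part is present."""
    status, _, reason = value.partition(":")
    return status, (reason or None)


def load_gene_sets(xml_text):
    """Return a dict keyed by STANDARD_NAME with parsed members and status."""
    root = ET.fromstring(xml_text.strip())
    gene_sets = {}
    for gene_set in root.iter("GENESET"):
        status, reason = parse_status(gene_set.get("STATUS", ""))
        gene_sets[gene_set.get("STANDARD_NAME")] = {
            "status": status,
            "reason": reason,
            "symbols": split_list(gene_set.get("MEMBERS_SYMBOLIZED", "")),
            "mapping": parse_mapping(gene_set.get("MEMBERS_MAPPING", "")),
        }
    return gene_sets


sets = load_gene_sets(SAMPLE)
public_names = [name for name, info in sets.items() if info["status"] == "public"]
print(public_names)
```

Splitting on `|` first and on `,` second follows the form given for MEMBERS_MAPPING. Empty entries are skipped so that a trailing separator does not produce a spurious mapping.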