Difference between revisions of "MSigDB XML description"

Revision as of 22:31, 16 March 2015

The MSigDB database in XML format captures both the content (i.e., gene members) and the annotations of the gene sets in a given release of MSigDB. This page describes the tags and attributes of the XML file.

Attributes of the MSIGDB tag document the whole database in the file.

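<table>
    <tbody>
        <tr>
            <th>XML ATTRIBUTE</th>
            <th>DESCRIPTION</th>
            <th></th>
        </tr>
        <tr>
            <td>MSIGDB NAME</td>
            <td>Name of the database</td>
            <td>required</td>
        </tr>
        <tr>
            <td>VERSION</td>
            <td>Version of the database</td>
            <td>required</td>
        </tr>
        <tr>
            <td>BUILD_DATE</td>
            <td>Date the XML file was built</td>
            <td>required</td>
        </tr>
    </tbody>
</table>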
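
As an illustration of how the database-level attributes might be read, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It assumes that the root element is named MSIGDB and that the name attribute is spelled NAME on that element (the table lists it as "MSIGDB NAME"). The sample XML and its values are invented placeholders, not taken from a real release.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the attribute values are placeholders, not real MSigDB data.
SAMPLE_XML = '<MSIGDB NAME="example_db" VERSION="0.0" BUILD_DATE="1970-01-01"></MSIGDB>'

# Assumption: the database name is stored in an attribute called NAME.
REQUIRED_DB_ATTRIBUTES = ("NAME", "VERSION", "BUILD_DATE")


def read_database_attributes(xml_text):
    """Return the required MSIGDB attributes as a dict, or raise ValueError."""
    root = ET.fromstring(xml_text)
    if root.tag != "MSIGDB":
        raise ValueError(f"expected root tag MSIGDB, found {root.tag!r}")
    missing = [name for name in REQUIRED_DB_ATTRIBUTES if name not in root.attrib]
    if missing:
        raise ValueError(f"missing required attributes: {missing}")
    return {name: root.attrib[name] for name in REQUIRED_DB_ATTRIBUTES}


db_info = read_database_attributes(SAMPLE_XML)
print(db_info)
```

Raising an error on a missing attribute reflects that all three are marked as required; a more lenient reader could fall back to defaults instead.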



Attributes of the GENESET tags document individual gene sets in the file.

<table>
    <tbody>
        <tr>
            <th>XML ATTRIBUTE</th>
            <th>DESCRIPTION</th>
            <th></th>
        </tr>
        <tr>
            <td>STANDARD_NAME</td>
            <td>Gene set name.</td>
            <td>required</td>
        </tr>
        <tr>
            <td>SYSTEMATIC_NAME</td>
            <td>Gene set name for internal indexing purposes.</td>
            <td>required</td>
        </tr>
        <tr>
            <td>HISTORICAL_NAMES</td>
            <td>Comma-separated list of older gene set names, starting from VERSION=&quot;V.2.5&quot; of MSigDB.</td>
            <td>optional</td>
        </tr>
        <tr>
            <td>ORGANISM</td>
            <td>Organism name.</td>
            <td>required</td>
        </tr>
        <tr>
            <td>PMID</td>
            <td>PubMed ID for the source publication.</td>
            <td>optional</td>
        </tr>
        <tr>
            <td>AUTHORS</td>
            <td>Authors of the gene set source publication, according to PubMed ID.</td>
            <td>optional</td>
        </tr>
        <tr>
            <td>GEOID</td>
            <td>GEO or ArrayExpress ID of the raw microarray data in the GEO or ArrayExpress repository.</td>
            <td>optional</td>
        </tr>
        <tr>
            <td>GENESET_LISTING_URL</td>
            <td>URL of the original source that listed the gene set members.</td>
            <td>optional</td>
        </tr>
        <tr>
            <td>EXTERNAL_DETAILS_URL</td>
            <td>URL of the original source page of the gene set.</td>
            <td>optional</td>
        </tr>
        <tr>
            <td>CHIP</td>
            <td>Indicates the type of the original gene set members, equivalent to the CHIP file, e.g., &quot;HG-U133A&quot;.</td>
            <td>required</td>
        </tr>
        <tr>
            <td>CATEGORY_CODE</td>
            <td>Gene set collection code, e.g., C2.</td>
            <td>required</td>
        </tr>
        <tr>
            <td>SUB_CATEGORY_CODE</td>
            <td>Gene set subcategory code, e.g., CGP.</td>
            <td>optional</td>
        </tr>
        <tr>
            <td>CONTRIBUTOR</td>
            <td>Name of the person or institution that contributed the gene set to MSigDB.</td>
            <td>required</td>
        </tr>
        <tr>
            <td>CONTRIBUTOR_ORG</td>
            <td>Name of the organization associated with the gene set contributor.</td>
            <td>required</td>
        </tr>
        <tr>
            <td>DESCRIPTION_BRIEF</td>
            <td>Brief description of the gene set.</td>
            <td>required</td>
        </tr>
        <tr>
            <td>DESCRIPTION_FULL</td>
            <td>Full description of the gene set or abstract of the source publication.</td>
            <td>optional</td>
        </tr>
        <tr>
            <td>TAGS</td>
            <td>Optional tags to enhance gene set annotations; currently not in use.</td>
            <td>optional</td>
        </tr>
        <tr>
            <td>MEMBERS</td>
            <td>Comma-separated list of gene set members as they originally appeared in the source.</td>
            <td>required</td>
        </tr>
        <tr>
            <td>MEMBERS_SYMBOLIZED</td>
            <td>Comma-separated list of gene set members in the form of human gene symbols.</td>
            <td>required</td>
        </tr>
        <tr>
            <td>MEMBERS_EZID</td>
            <td>Comma-separated list of gene set members in the form of human Entrez Gene IDs.</td>
            <td>required</td>
        </tr>
        <tr>
            <td>MEMBERS_MAPPING</td>
            <td>Pipe-separated list of mappings between gene set members in the form of:<br />
            MEMBERS, MEMBERS_SYMBOLIZED, MEMBERS_EZID |</td>
            <td>required</td>
        </tr>
        <tr>
            <td>FOUNDER_NAMES</td>
            <td>Pipe-separated list of v4.0 MSigDB founder gene sets for the hallmark signatures.</td>
            <td>applies to hallmarks only</td>
        </tr>
        <tr>
            <td>REFINEMENT_DATASETS</td>
            <td>Pipe-separated list of GEO or ArrayExpress identifiers of microarray data used to refine hallmark signatures, in the form of:<br />
            GEO or ArrayExpress ID, comparison details |</td>
            <td>applies to hallmarks only</td>
        </tr>
        <tr>
            <td>VALIDATION_DATASETS</td>
            <td>Pipe-separated list of GEO or ArrayExpress identifiers of microarray data used to validate hallmark signatures, in the form of:<br />
            GEO or ArrayExpress ID, comparison details |</td>
            <td>applies to hallmarks only</td>
        </tr>
        <tr>
            <td>STATUS</td>
            <td>Indicates gene set status. In the current release, all gene sets have their STATUS=&quot;public&quot;.</td>
            <td>required</td>
        </tr>
    </tbody>
</table>
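
The comma-separated and pipe-separated attributes described above could be unpacked along the following lines. This is a minimal Python sketch, not an official parser. It assumes that GENESET elements are nested inside the MSIGDB root, that no individual member identifier contains a comma or pipe, and that a trailing pipe may be present. The sample gene set, its name, and its members are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with only a subset of the attributes; not real MSigDB data.
SAMPLE_XML = """<MSIGDB NAME="example_db" VERSION="0.0" BUILD_DATE="1970-01-01">
  <GENESET STANDARD_NAME="EXAMPLE_GENE_SET"
           SYSTEMATIC_NAME="M00000"
           CATEGORY_CODE="C2"
           SUB_CATEGORY_CODE="CGP"
           MEMBERS="1000_at,1001_at"
           MEMBERS_SYMBOLIZED="GENE_A,GENE_B"
           MEMBERS_EZID="111,222"
           MEMBERS_MAPPING="1000_at,GENE_A,111|1001_at,GENE_B,222|"
           STATUS="public"/>
</MSIGDB>"""


def split_list(value, sep=","):
    """Split a delimited attribute value, dropping empty entries and whitespace."""
    return [item.strip() for item in value.split(sep) if item.strip()]


def parse_members_mapping(value):
    """Turn 'member,symbol,ezid|...' into a list of 3-tuples of strings."""
    triples = []
    for entry in split_list(value, sep="|"):
        parts = [part.strip() for part in entry.split(",")]
        if len(parts) != 3:
            # Assumption: every entry has exactly three comma-separated fields.
            raise ValueError(f"unexpected mapping entry: {entry!r}")
        triples.append(tuple(parts))
    return triples


def read_gene_sets(xml_text):
    """Return a dict keyed by STANDARD_NAME with selected, parsed attributes."""
    root = ET.fromstring(xml_text)
    gene_sets = {}
    for node in root.iter("GENESET"):
        attrs = node.attrib
        gene_sets[attrs["STANDARD_NAME"]] = {
            "category": attrs["CATEGORY_CODE"],
            # Optional attribute: None when absent.
            "sub_category": attrs.get("SUB_CATEGORY_CODE"),
            "symbols": split_list(attrs["MEMBERS_SYMBOLIZED"]),
            "mapping": parse_members_mapping(attrs["MEMBERS_MAPPING"]),
        }
    return gene_sets


gene_sets = read_gene_sets(SAMPLE_XML)
for name, info in gene_sets.items():
    print(name, info["category"], info["symbols"])
```

Required attributes are read with direct indexing so that a missing one fails loudly, while optional ones use .get(). The hallmark-only attributes (FOUNDER_NAMES, REFINEMENT_DATASETS, VALIDATION_DATASETS) are also pipe-separated, so the same split_list helper with sep="|" would likely apply to them.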