Difference between revisions of "MSigDB XML description"
Latest revision as of 09:49, 2 May 2017
The MSigDB database in XML format captures both the content (i.e., gene members) and annotation about the gene sets in a given release of MSigDB. This page describes the tags and attributes of the XML file.
Attributes of the MSIGDB tag document the whole database in the file.
XML ATTRIBUTE | DESCRIPTION | REQUIREMENT |
---|---|---|
MSIGDB NAME | Name of the database | required |
VERSION | Version of the database | required |
BUILD_DATE | Date the XML file was built | required |
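As a minimal sketch of how these database-level attributes might be read, the Python below parses a small inline document with the standard library. It assumes the root element is named `MSIGDB` and that the first attribute in the table is stored simply as `NAME`; the sample values and the helper name `read_database_info` are hypothetical and not taken from a real release.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the attribute values are invented for illustration.
SAMPLE_DB = '<MSIGDB NAME="msigdb" VERSION="0.0" BUILD_DATE="Jan 01, 2000"></MSIGDB>'

# Attributes the table above marks as required (assuming "MSIGDB NAME" means
# the NAME attribute of the MSIGDB tag).
REQUIRED_DB_ATTRS = ("NAME", "VERSION", "BUILD_DATE")


def read_database_info(xml_text):
    """Return the required database-level attributes as a dict."""
    root = ET.fromstring(xml_text)
    if root.tag != "MSIGDB":
        raise ValueError(f"unexpected root tag: {root.tag}")
    missing = [name for name in REQUIRED_DB_ATTRS if name not in root.attrib]
    if missing:
        raise ValueError(f"missing required attributes: {missing}")
    return {name: root.attrib[name] for name in REQUIRED_DB_ATTRS}


info = read_database_info(SAMPLE_DB)
print(info)
```

For a full release file, `ET.parse(path).getroot()` could replace `ET.fromstring`, although a streaming parser may be a better fit if the file is large.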
Attributes of the GENESET tags document individual gene sets in the file.
XML ATTRIBUTE | DESCRIPTION | REQUIREMENT |
---|---|---|
STANDARD_NAME | Gene set name | required |
SYSTEMATIC_NAME | Gene set name for internal indexing purposes | required |
HISTORICAL_NAMES | Comma-separated list of older gene set names, starting from VERSION="V.2.5" of MSigDB | optional |
ORGANISM | Organism name | required |
PMID | PubMed ID for the source publication | optional |
AUTHORS | Authors of the gene set source publication, according to PubMed ID | optional |
GEOID | GEO or ArrayExpress ID of the raw microarray data in the corresponding repository | optional |
EXACT_SOURCE | Exact source of the gene set, usually a specific figure or table in the source publication | optional |
GENESET_LISTING_URL | URL of the original source that listed the gene set members | optional |
EXTERNAL_DETAILS_URL | URL of the original source page of the gene set | optional |
CHIP | Indicates the type of the original gene set members, equivalent to the CHIP file, e.g., "HG-U133A" | required |
CATEGORY_CODE | Gene set collection code, e.g., C2 | required |
SUB_CATEGORY_CODE | Gene set subcategory code, e.g., CGP | optional |
CONTRIBUTOR | Name of the person or institution that contributed the gene set to MSigDB | required |
CONTRIBUTOR_ORG | Name of the organization associated with the gene set contributor | required |
DESCRIPTION_BRIEF | Brief description of the gene set | required |
DESCRIPTION_FULL | Full description of the gene set or abstract of the source publication | optional |
TAGS | Optional tags to enhance gene set annotations; currently not in use | optional |
MEMBERS | Comma-separated list of gene set members as they originally appeared in the source | required |
MEMBERS_SYMBOLIZED | Comma-separated list of gene set members in the form of human gene symbols | required |
MEMBERS_EZID | Comma-separated list of gene set members in the form of human Entrez Gene IDs | required |
MEMBERS_MAPPING | Pipe-separated list of mappings between gene set members, each in the form: MEMBERS, MEMBERS_SYMBOLIZED, MEMBERS_EZID | required |
FOUNDER_NAMES | Pipe-separated list of v4.0 MSigDB founder gene sets for the hallmark signatures | applies to hallmarks only |
REFINEMENT_DATASETS | Pipe-separated list of GEO or ArrayExpress identifiers of microarray data used to refine hallmark signatures, each in the form: GEO or ArrayExpress ID, comparison details | applies to hallmarks only |
VALIDATION_DATASETS | Pipe-separated list of GEO or ArrayExpress identifiers of microarray data used to validate hallmark signatures, each in the form: GEO or ArrayExpress ID, comparison details | applies to hallmarks only |
STATUS | Indicates gene set status. In the current release, all gene sets have their STATUS="public" | required |
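The list-valued attributes above use two separators: commas for the member lists and pipes for MEMBERS_MAPPING, whose entries are themselves comma-separated triples. The following Python sketch shows one way to split them. It assumes each gene set is a `GENESET` element carrying these fields as XML attributes; the sample gene set, the `PROBE_A`/`PROBE_B` identifiers, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with a subset of attributes; names such as EXAMPLE_SET
# and PROBE_A are invented for illustration.
SAMPLE = """<MSIGDB NAME="msigdb" VERSION="0.0" BUILD_DATE="Jan 01, 2000">
  <GENESET STANDARD_NAME="EXAMPLE_SET" SYSTEMATIC_NAME="M0"
           ORGANISM="Homo sapiens" CHIP="HG-U133A"
           CATEGORY_CODE="C2" SUB_CATEGORY_CODE="CGP"
           MEMBERS="PROBE_A,PROBE_B"
           MEMBERS_SYMBOLIZED="TP53,EGFR"
           MEMBERS_EZID="7157,1956"
           MEMBERS_MAPPING="PROBE_A,TP53,7157|PROBE_B,EGFR,1956"
           STATUS="public"/>
</MSIGDB>"""


def split_list(value, sep=","):
    """Split a separated attribute value, dropping empty items."""
    return [item.strip() for item in value.split(sep) if item.strip()]


def parse_mapping(value):
    """Split MEMBERS_MAPPING into (member, symbol, entrez_id) triples."""
    triples = []
    for entry in split_list(value, sep="|"):
        parts = [part.strip() for part in entry.split(",")]
        if len(parts) != 3:
            raise ValueError(f"expected 3 fields in mapping entry: {entry!r}")
        triples.append(tuple(parts))
    return triples


def parse_geneset(elem):
    """Collect the name, collection codes, and member lists of one GENESET."""
    attrs = elem.attrib
    return {
        "name": attrs["STANDARD_NAME"],
        "category": attrs["CATEGORY_CODE"],
        "sub_category": attrs.get("SUB_CATEGORY_CODE"),  # optional attribute
        "members": split_list(attrs["MEMBERS"]),
        "members_symbolized": split_list(attrs["MEMBERS_SYMBOLIZED"]),
        "members_ezid": split_list(attrs["MEMBERS_EZID"]),
        "members_mapping": parse_mapping(attrs["MEMBERS_MAPPING"]),
    }


root = ET.fromstring(SAMPLE)
genesets = [parse_geneset(elem) for elem in root.iter("GENESET")]
print(genesets[0]["members_mapping"])
```

Optional attributes are read with `.get()` so that a gene set lacking them yields `None` rather than an error. For the hallmark-only REFINEMENT_DATASETS and VALIDATION_DATASETS fields, splitting each pipe-separated entry on the first comma only may be safer, since the comparison details could themselves contain commas.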