CHIP File Selection Help

From GeneSetEnrichmentAnalysisWiki
Revision as of 12:12, 15 September 2022 by Acastanza (talk | contribs)



GSEA 4.3.0 introduced support for new gene set database files and chip files that enable native analysis of mouse data. These files were first made available with MSigDB v2022.1.

These files are now split into two tabs, one for human and one for mouse, in both the gene set and chip file selection windows of the GSEA UI.

Gene set files from the Human Collections have the suffix ".Hs.symbols.gmt" following the MSigDB version (v2022.1 in this initial release); gene set files from the Mouse Collections have the suffix ".Mm.symbols.gmt". The contents of Human Collections gene set files are provided as HGNC Gene Symbols, and the contents of Mouse Collections gene set files are provided as MGI Gene Symbols. These symbols follow different canonical formats (e.g. MTOR for human vs. Mtor for mouse).

As such, it is critical to pick CHIP files that map your data into the appropriate namespace (either human or mouse symbols).

Therefore, if a gene set file from the Human Collection (MSigDB) window is selected, you must also select a CHIP file from the Human Collection Chips (MSigDB) tab of the Chip platform selector window. Likewise, if a gene set file from the Mouse Collection (MSigDB) window is selected, you must also select a CHIP file from the Mouse Collection Chips (MSigDB) tab of the Chip platform selector window.

Chip files, like gene set database files, are versioned and have a species-specific suffix: ".Hs.chip" following the MSigDB version (v2022.1 in this initial release) for files targeting the Human Collections, and ".Mm.chip" for files targeting the Mouse Collections.

Both Human Collection Chips and Mouse Collection Chips contain a full complement of chips for native as well as orthology-based analysis. Chips from the Human Collection Chips tab (files ending in .Hs.chip) that begin with Human_ convert human gene IDs to human symbols. Files from this tab (still with the .Hs.chip suffix) that begin with Mouse_ or Rat_ also have the text "_Human_Orthologs_" in the file name. When used, they apply orthology mapping data to convert the dataset to match the human gene symbols namespace of the MSigDB Human Collections database.

Likewise, chips from the Mouse Collection Chips tab (files ending in .Mm.chip) that begin with Mouse_ convert mouse gene IDs to mouse symbols. Files from this tab (still with the .Mm.chip suffix) that begin with Human_ or Rat_ also have the text "_Mouse_Orthologs_" in the file name. When used, they apply orthology mapping data to convert the dataset to match the mouse gene symbols namespace of the MSigDB Mouse Collections database.
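The naming rules above can be read off a chip file name mechanically. The following is a minimal Python sketch of that reading, not part of GSEA itself: the function name describe_chip and the example file name are hypothetical, and the sketch assumes only the prefix, suffix, and "_Orthologs_" conventions described on this page.

```python
def describe_chip(filename: str) -> dict:
    """Infer source species, target namespace, and mapping mode from a chip file name.

    Hypothetical helper: relies only on the naming conventions described above.
    """
    # The suffix identifies which MSigDB collection the chip targets.
    if filename.endswith(".Hs.chip"):
        target = "Human"
    elif filename.endswith(".Mm.chip"):
        target = "Mouse"
    else:
        raise ValueError(f"not a versioned MSigDB chip file name: {filename!r}")

    # The prefix before the first underscore identifies the dataset species.
    source = filename.split("_", 1)[0]
    if source not in ("Human", "Mouse", "Rat"):
        raise ValueError(f"unexpected species prefix in {filename!r}")

    # Orthology chips carry the target species in an "_<Target>_Orthologs_" tag.
    has_ortholog_tag = f"_{target}_Orthologs_" in filename
    mode = "native" if source == target else "orthology"
    if has_ortholog_tag != (mode == "orthology"):
        raise ValueError(f"prefix and ortholog tag disagree in {filename!r}")

    return {"source": source, "target": target, "mode": mode}


# Hypothetical file name, used only to illustrate the convention.
info = describe_chip("Mouse_EXAMPLE_ID_Human_Orthologs_MSigDB.v2022.1.Hs.chip")
print(info)
```

For the hypothetical name shown, the helper reports a mouse dataset mapped by orthology into the human symbol namespace.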

So, if the gene set database file c5.go.bp.v2022.1.Hs.symbols.gmt was selected from the Human Collection tab, the appropriate CHIP file, regardless of dataset species, would be one of the _MSigDB.v2022.1.Hs.chip files available from the Human Collection CHIPs (MSigDB) tab. If the gene set database file m5.go.bp.v2022.1.Mm.symbols.gmt was selected from the Mouse Collection tab, the appropriate CHIP file, regardless of dataset species, would be one of the _MSigDB.v2022.1.Mm.chip files available from the Mouse Collection CHIPs (MSigDB) tab.
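When file selections are scripted rather than made in the UI, the same pairing rule can be checked from the file names alone. The sketch below is an illustration under that assumption; the helper names and the chip file names are hypothetical placeholders, while the two gene set file names are the ones used in the example above.

```python
def collection_species(filename: str) -> str:
    """Return 'Hs' or 'Mm' from the species tag in a gene set or chip file name."""
    if filename.endswith((".Hs.symbols.gmt", ".Hs.chip")):
        return "Hs"
    if filename.endswith((".Mm.symbols.gmt", ".Mm.chip")):
        return "Mm"
    raise ValueError(f"no recognised species suffix in {filename!r}")


def same_namespace(gmt_file: str, chip_file: str) -> bool:
    """True when the gene set file and the chip file target the same symbol namespace."""
    return collection_species(gmt_file) == collection_species(chip_file)


# Chip file names here are hypothetical placeholders.
print(same_namespace("c5.go.bp.v2022.1.Hs.symbols.gmt",
                     "Mouse_EXAMPLE_ID_Human_Orthologs_MSigDB.v2022.1.Hs.chip"))
print(same_namespace("m5.go.bp.v2022.1.Mm.symbols.gmt",
                     "Human_EXAMPLE_ID_MSigDB.v2022.1.Hs.chip"))
```

The first call pairs a Human Collections gene set file with a chip that maps into human symbols, so it passes; the second pairs a Mouse Collections gene set file with a human-namespace chip, which is the mismatch this page warns against.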