https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php?title=Using_RNA-seq_Datasets_with_GSEA&feed=atom&action=history Using RNA-seq Datasets with GSEA - Revision history 2024-03-29T12:50:13Z Revision history for this page on the wiki MediaWiki 1.34.4 https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php?title=Using_RNA-seq_Datasets_with_GSEA&diff=4426&oldid=prev Acastanza at 21:30, 21 September 2020 2020-09-21T21:30:10Z <p></p> <table class="diff diff-contentalign-left" data-mw="interface"> <col class="diff-marker" /> <col class="diff-content" /> <col class="diff-marker" /> <col class="diff-content" /> <tr class="diff-title" lang="en"> <td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td> <td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 21:30, 21 September 2020</td> </tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l8" >Line 8:</td> <td colspan="2" class="diff-lineno">Line 8:</td></tr> <tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;/ul&gt;</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;/ul&gt;</div></td></tr> <tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>These quantifications are not properly normalized for comparisons across samples.</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; 
color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>These quantifications are not properly normalized for comparisons across samples.</div></td></tr> <tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>&lt;p&gt;'''Note: '''[https://gsea-msigdb.github.io/<del class="diffchange diffchange-inline">ssGSEAProjection</del>-gpmodule/<del class="diffchange diffchange-inline">v9</del>/index.html ssGSEA] (single-sample GSEA) projections perform substantially different mathematical operations from standard GSEA. For the ssGSEA implementation, gene-level summed TPM serves as an appropriate metric for analysis of RNA-seq quantifications.&lt;/p&gt;</div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>&lt;p&gt;'''Note: '''[https://gsea-msigdb.github.io/<ins class="diffchange diffchange-inline">ssGSEA</ins>-gpmodule/<ins class="diffchange diffchange-inline">v10</ins>/index.html ssGSEA] (single-sample GSEA) projections perform substantially different mathematical operations from standard GSEA. 
For the ssGSEA implementation, gene-level summed TPM serves as an appropriate metric for analysis of RNA-seq quantifications.&lt;/p&gt;</div></td></tr> <tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;br&gt;</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;br&gt;</div></td></tr> <tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;h2&gt;Count Normalization for Standard GSEA&lt;/h2&gt;</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;h2&gt;Count Normalization for Standard GSEA&lt;/h2&gt;</div></td></tr> </table> Acastanza https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php?title=Using_RNA-seq_Datasets_with_GSEA&diff=4374&oldid=prev Acastanza: Page formatting 2019-11-19T16:18:18Z <p>Page formatting</p> <table class="diff diff-contentalign-left" data-mw="interface"> <col class="diff-marker" /> <col class="diff-content" /> <col class="diff-marker" /> <col class="diff-content" /> <tr class="diff-title" lang="en"> <td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td> <td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 16:18, 19 November 2019</td> 
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l1" >Line 1:</td> <td colspan="2" class="diff-lineno">Line 1:</td></tr> <tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">&lt;h2&gt;Quantification Types and Input Data &lt;/h2&gt;</ins></div></td></tr> <tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;p&gt;GSEA requires as input an expression dataset, which contains expression profiles for multiple samples. While the software supports multiple input file formats for these datasets, the tab-delimited GCT format is the most common. The first column of the GCT file contains feature identifiers (gene ids or symbols in the case of data derived from RNA-Seq experiments). The second column contains a description of the feature; this column is ignored by GSEA and may be filled with “NA”s. Subsequent columns contain the expression values for each feature, with one sample's expression value per column.</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;p&gt;GSEA requires as input an expression dataset, which contains expression profiles for multiple samples. While the software supports multiple input file formats for these datasets, the tab-delimited GCT format is the most common. The first column of the GCT file contains feature identifiers (gene ids or symbols in the case of data derived from RNA-Seq experiments). 
The second column contains a description of the feature; this column is ignored by GSEA and may be filled with “NA”s. Subsequent columns contain the expression values for each feature, with one sample's expression value per column.</div></td></tr> <tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>It is important to note that there are no hard and fast rules regarding how a GCT file's expression values are derived. The important point is that they are comparable to one another across features within a sample and comparable to one another across samples. RNA-seq quantification pipelines typically produce quantifications containing one or more of the following:&lt;/p&gt;</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>It is important to note that there are no hard and fast rules regarding how a GCT file's expression values are derived. The important point is that they are comparable to one another across features within a sample and comparable to one another across samples. 
RNA-seq quantification pipelines typically produce quantifications containing one or more of the following:&lt;/p&gt;</div></td></tr> <tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l6" >Line 6:</td> <td colspan="2" class="diff-lineno">Line 7:</td></tr> <tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;li&gt;FPKM/RPKM&lt;/li&gt;</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;li&gt;FPKM/RPKM&lt;/li&gt;</div></td></tr> <tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;/ul&gt;</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;/ul&gt;</div></td></tr> <tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">&lt;p&gt;</del>These quantifications are not normalized for comparisons across samples. <del class="diffchange diffchange-inline">Normalizing RNA-seq quantification to support comparisons of a feature's expression levels across samples is important for GSEA. 
Normalization methods (such as, TMM, geometric mean) which operate on raw counts data should be applied prior to running GSEA.&lt;/p&gt; </del></div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>These quantifications are not <ins class="diffchange diffchange-inline">properly </ins>normalized for comparisons across samples.</div></td></tr> <tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>&lt;p&gt;'''Note: '''[https://gsea-msigdb.github.io/ssGSEAProjection-gpmodule/v9/index.html ssGSEA] (single-sample GSEA) projections perform substantially different mathematical operations from standard GSEA<del class="diffchange diffchange-inline">, for this </del>implementation, gene-level summed TPM serves as an appropriate metric for analysis of RNA-seq quantifications.&lt;/p&gt;</div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>&lt;p&gt;'''Note: '''[https://gsea-msigdb.github.io/ssGSEAProjection-gpmodule/v9/index.html ssGSEA] (single-sample GSEA) projections perform substantially different mathematical operations from standard GSEA<ins class="diffchange diffchange-inline">. 
For the ssGSEA </ins>implementation, gene-level summed TPM serves as an appropriate metric for analysis of RNA-seq quantifications.&lt;/p&gt;</div></td></tr> <tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">&lt;br&gt;</ins></div></td></tr> <tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">&lt;h2&gt;Count Normalization for Standard GSEA&lt;/h2&gt;</ins></div></td></tr> <tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">&lt;p&gt;Normalizing RNA-seq quantification to support comparisons of a feature's expression levels across samples is important for GSEA. 
Normalization methods (such as TMM or geometric mean) which operate on raw counts data should be applied prior to running GSEA.&lt;/p&gt; </ins></div></td></tr> <tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr> <tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;p&gt;Tools such as DESeq2 can be made to produce properly normalized data (normalized counts) which are compatible with GSEA. The [https://genepattern.github.io/DESeq2/v1/index.html DESeq2 module] available through the [https://cloud.genepattern.org/ GenePattern environment] produces a GSEA compatible “normalized counts” table in the [[Data_formats#GCT:_Gene_Cluster_Text_file_format_.28.2A.gct.29|GCT format]] which can be directly used in the GSEA application.&lt;/p&gt;</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;p&gt;Tools such as DESeq2 can be made to produce properly normalized data (normalized counts) which are compatible with GSEA. 
The [https://genepattern.github.io/DESeq2/v1/index.html DESeq2 module] available through the [https://cloud.genepattern.org/ GenePattern environment] produces a GSEA compatible “normalized counts” table in the [[Data_formats#GCT:_Gene_Cluster_Text_file_format_.28.2A.gct.29|GCT format]] which can be directly used in the GSEA application.&lt;/p&gt;</div></td></tr> <tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;br&gt;</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;br&gt;</div></td></tr> <tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l15" >Line 15:</td> <td colspan="2" class="diff-lineno">Line 20:</td></tr> <tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;p&gt;The GSEA algorithm ranks the features listed in a GCT file. It provides a number of alternative statistics that can be used for feature ranking. But in all cases (or at least in the cases where the dataset represents expression profiles for differing categorical phenotypes) the ranking statistics capture some measure of genes' differential expression between a pair of categorical phenotypes. 
While these metrics are widely used for RNA-seq datasets, the GSEA team has yet to fully evaluate whether these ranking statistics, originally selected for their effectiveness when used with Microarray-based expression data, are entirely appropriate for use with data derived from RNA-seq experiments.&lt;/p&gt;  </div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;p&gt;The GSEA algorithm ranks the features listed in a GCT file. It provides a number of alternative statistics that can be used for feature ranking. But in all cases (or at least in the cases where the dataset represents expression profiles for differing categorical phenotypes) the ranking statistics capture some measure of genes' differential expression between a pair of categorical phenotypes. While these metrics are widely used for RNA-seq datasets, the GSEA team has yet to fully evaluate whether these ranking statistics, originally selected for their effectiveness when used with Microarray-based expression data, are entirely appropriate for use with data derived from RNA-seq experiments.&lt;/p&gt;  </div></td></tr> <tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;br&gt;</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;br&gt;</div></td></tr> <tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; 
border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>As an alternative to standard GSEA, analysis of data derived from RNA-seq experiments may also be conducted through the <del class="diffchange diffchange-inline">GSEAPreranked </del>tool.</div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">&lt;h2&gt;Alternative Method: GSEA-Preranked&lt;/h2&gt;</ins></div></td></tr> <tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">This previously served as the GSEA team's recommended pipeline for analysis of RNA-seq data; however, we now recommend the normalized counts procedure described above. </ins>As an alternative to standard GSEA, analysis of data derived from RNA-seq experiments may also be conducted through the <ins class="diffchange diffchange-inline">GSEA-Preranked </ins>tool.  
</div></td></tr> <tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;p&gt;In particular:&lt;/p&gt;</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;p&gt;In particular:&lt;/p&gt;</div></td></tr> <tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;ol&gt;</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;ol&gt;</div></td></tr> </table> Acastanza https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php?title=Using_RNA-seq_Datasets_with_GSEA&diff=4373&oldid=prev Acastanza at 19:33, 18 November 2019 2019-11-18T19:33:40Z <p></p> <table class="diff diff-contentalign-left" data-mw="interface"> <col class="diff-marker" /> <col class="diff-content" /> <col class="diff-marker" /> <col class="diff-content" /> <tr class="diff-title" lang="en"> <td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td> <td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 19:33, 18 November 2019</td> </tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l8" >Line 8:</td> <td colspan="2" class="diff-lineno">Line 8:</td></tr> <tr><td class='diff-marker'> </td><td 
style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;p&gt;These quantifications are not normalized for comparisons across samples. Normalizing RNA-seq quantification to support comparisons of a feature's expression levels across samples is important for GSEA. Normalization methods (such as, TMM, geometric mean) which operate on raw counts data should be applied prior to running GSEA.&lt;/p&gt;  </div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;p&gt;These quantifications are not normalized for comparisons across samples. Normalizing RNA-seq quantification to support comparisons of a feature's expression levels across samples is important for GSEA. 
Normalization methods (such as, TMM, geometric mean) which operate on raw counts data should be applied prior to running GSEA.&lt;/p&gt;  </div></td></tr> <tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;p&gt;'''Note: '''[https://gsea-msigdb.github.io/ssGSEAProjection-gpmodule/v9/index.html ssGSEA] (single-sample GSEA) projections perform substantially different mathematical operations from standard GSEA, for this implementation, gene-level summed TPM serves as an appropriate metric for analysis of RNA-seq quantifications.&lt;/p&gt;</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;p&gt;'''Note: '''[https://gsea-msigdb.github.io/ssGSEAProjection-gpmodule/v9/index.html ssGSEA] (single-sample GSEA) projections perform substantially different mathematical operations from standard GSEA, for this implementation, gene-level summed TPM serves as an appropriate metric for analysis of RNA-seq quantifications.&lt;/p&gt;</div></td></tr> <tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>&lt;p&gt;Tools such as <del class="diffchange diffchange-inline">Voom or </del>DESeq2 can be made to produce properly normalized data which are compatible with GSEA. 
<del class="diffchange diffchange-inline">[http://software.broadinstitute.org/cancer/software/genepattern/modules/docs/VoomNormalize/2 Voom-Normalize] and </del>[https://genepattern.github.io/DESeq2/v1/index.html DESeq2] <del class="diffchange diffchange-inline">modules which produce this output are </del>available through the [https://cloud.genepattern.org/ GenePattern environment]<del class="diffchange diffchange-inline">. These tools to produce </del>GSEA compatible “normalized counts” <del class="diffchange diffchange-inline">tables </del>in the [[Data_formats#GCT:_Gene_Cluster_Text_file_format_.28.2A.gct.29|GCT format]].&lt;/p&gt;</div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>&lt;p&gt;Tools such as DESeq2 can be made to produce properly normalized data <ins class="diffchange diffchange-inline">(normalized counts) </ins>which are compatible with GSEA. 
<ins class="diffchange diffchange-inline">The </ins>[https://genepattern.github.io/DESeq2/v1/index.html DESeq2 <ins class="diffchange diffchange-inline">module</ins>] available through the [https://cloud.genepattern.org/ GenePattern environment] <ins class="diffchange diffchange-inline">produces a </ins>GSEA compatible “normalized counts” <ins class="diffchange diffchange-inline">table </ins>in the [[Data_formats#GCT:_Gene_Cluster_Text_file_format_.28.2A.gct.29|GCT format]] <ins class="diffchange diffchange-inline">which can be directly used in the GSEA application</ins>.&lt;/p<ins class="diffchange diffchange-inline">&gt;</ins></div></td></tr> <tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">&lt;br</ins>&gt;</div></td></tr> <tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;p&gt;'''Note: '''While GSEA can accept transcript-level quantification directly and sum these to gene-level, these quantifications are not typically properly normalized for between sample comparisons. As such, transcript level CHIP annotations are no longer provided by the GSEA-MSigDB team.&lt;/p&gt;</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;p&gt;'''Note: '''While GSEA can accept transcript-level quantification directly and sum these to gene-level, these quantifications are not typically properly normalized for between sample comparisons. 
As such, transcript level CHIP annotations are no longer provided by the GSEA-MSigDB team.&lt;/p&gt;</div></td></tr> <tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>For more information on performing GSEA with RNA-seq data see: [[RNA-Seq_Data_and_Ensembl_CHIP_files|RNA-seq Data and Ensembl CHIP Files]]</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>For more information on performing GSEA with RNA-seq data see: [[RNA-Seq_Data_and_Ensembl_CHIP_files|RNA-seq Data and Ensembl CHIP Files]]</div></td></tr> </table> Acastanza https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php?title=Using_RNA-seq_Datasets_with_GSEA&diff=4367&oldid=prev Acastanza at 18:25, 14 November 2019 2019-11-14T18:25:14Z <p></p> <table class="diff diff-contentalign-left" data-mw="interface"> <col class="diff-marker" /> <col class="diff-content" /> <col class="diff-marker" /> <col class="diff-content" /> <tr class="diff-title" lang="en"> <td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td> <td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 18:25, 14 November 2019</td> </tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l1" >Line 1:</td> <td colspan="2" class="diff-lineno">Line 1:</td></tr> <tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del style="font-weight: bold; text-decoration: none;">&lt;h1&gt;Using 
RNA-seq Datasets with GSEA&lt;/h1&gt;</del></div></td><td colspan="2"> </td></tr> <tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del style="font-weight: bold; text-decoration: none;"></del></div></td><td colspan="2"> </td></tr> <tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;p&gt;GSEA requires as input an expression dataset, which contains expression profiles for multiple samples. While the software supports multiple input file formats for these datasets, the tab-delimited GCT format is the most common. The first column of the GCT file contains feature identifiers (gene ids or symbols in the case of data derived from RNA-Seq experiments). The second column contains a description of the feature; this column is ignored by GSEA and may be filled with “NA”s. Subsequent columns contain the expression values for each feature, with one sample's expression value per column.</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;p&gt;GSEA requires as input an expression dataset, which contains expression profiles for multiple samples. While the software supports multiple input file formats for these datasets, the tab-delimited GCT format is the most common. The first column of the GCT file contains feature identifiers (gene ids or symbols in the case of data derived from RNA-Seq experiments). 
The second column contains a description of the feature; this column is ignored by GSEA and may be filled with “NA”s. Subsequent columns contain the expression values for each feature, with one sample's expression value per column.</div></td></tr> <tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>It is important to note that there are no hard and fast rules regarding how a GCT file's expression values are derived. The important point is that they are comparable to one another across features within a sample and comparable to one another across samples. RNA-seq quantification pipelines typically produce quantifications containing one or more of the following:&lt;/p&gt;</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>It is important to note that there are no hard and fast rules regarding how a GCT file's expression values are derived. The important point is that they are comparable to one another across features within a sample and comparable to one another across samples. 
RNA-seq quantification pipelines typically produce quantifications containing one or more of the following:&lt;/p&gt;</div></td></tr> </table> Acastanza https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php?title=Using_RNA-seq_Datasets_with_GSEA&diff=4365&oldid=prev Acastanza: moved RNA-Seq Data and GSEA to Using RNA-seq Datasets with GSEA 2019-11-14T18:24:57Z <p>moved <a href="/cancer/software/gsea/wiki/index.php/RNA-Seq_Data_and_GSEA" class="mw-redirect" title="RNA-Seq Data and GSEA">RNA-Seq Data and GSEA</a> to <a href="/cancer/software/gsea/wiki/index.php/Using_RNA-seq_Datasets_with_GSEA" title="Using RNA-seq Datasets with GSEA">Using RNA-seq Datasets with GSEA</a></p> <table class="diff diff-contentalign-left" data-mw="interface"> <tr class="diff-title" lang="en"> <td colspan="1" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td> <td colspan="1" style="background-color: #fff; color: #222; text-align: center;">Revision as of 18:24, 14 November 2019</td> </tr><tr><td colspan="2" class="diff-notice" lang="en"><div class="mw-diff-empty">(No difference)</div> </td></tr></table> Acastanza https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php?title=Using_RNA-seq_Datasets_with_GSEA&diff=4364&oldid=prev Acastanza at 18:24, 14 November 2019 2019-11-14T18:24:45Z <p></p> <table class="diff diff-contentalign-left" data-mw="interface"> <col class="diff-marker" /> <col class="diff-content" /> <col class="diff-marker" /> <col class="diff-content" /> <tr class="diff-title" lang="en"> <td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td> <td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 18:24, 14 November 2019</td> </tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l1" >Line 1:</td> <td colspan="2" class="diff-lineno">Line 1:</td></tr> <tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 
88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>&lt;h1&gt;Using <del class="diffchange diffchange-inline">GSEA with </del>RNA-seq Datasets&lt;/h1&gt;</div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>&lt;h1&gt;Using RNA-seq Datasets <ins class="diffchange diffchange-inline">with GSEA</ins>&lt;/h1&gt;</div></td></tr> <tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr> <tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;p&gt;GSEA requires as input an expression dataset, which contains expression profiles for multiple samples. While the software supports multiple input file formats for these datasets, the tab-delimited GCT format is the most common. The first column of the GCT file contains feature identifiers (gene ids or symbols in the case of data derived from RNA-Seq experiments). The second column contains a description of the feature; this column is ignored by GSEA and may be filled with “NA”s. 
Subsequent columns contain the expression values for each feature, with one sample's expression value per column.</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>&lt;p&gt;GSEA requires as input an expression dataset, which contains expression profiles for multiple samples. While the software supports multiple input file formats for these datasets, the tab-delimited GCT format is the most common. The first column of the GCT file contains feature identifiers (gene ids or symbols in the case of data derived from RNA-Seq experiments). The second column contains a description of the feature; this column is ignored by GSEA and may be filled with “NA”s. Subsequent columns contain the expression values for each feature, with one sample's expression value per column.</div></td></tr> </table> Acastanza https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php?title=Using_RNA-seq_Datasets_with_GSEA&diff=4363&oldid=prev Acastanza: Initial draft 2019-11-14T18:24:01Z <p>Initial draft</p> <p><b>New page</b></p><div>&lt;h1&gt;Using GSEA with RNA-seq Datasets&lt;/h1&gt;<br /> <br /> &lt;p&gt;GSEA requires as input an expression dataset, which contains expression profiles for multiple samples. While the software supports multiple input file formats for these datasets, the tab-delimited GCT format is the most common. The first column of the GCT file contains feature identifiers (gene ids or symbols in the case of data derived from RNA-Seq experiments). The second column contains a description of the feature; this column is ignored by GSEA and may be filled with “NA”s. 
Subsequent columns contain the expression values for each feature, with one sample's expression value per column.<br /> It is important to note that there are no hard and fast rules regarding how a GCT file's expression values are derived. The important point is that they are comparable to one another across features within a sample and comparable to one another across samples. RNA-seq quantification pipelines typically produce quantifications containing one or more of the following:&lt;/p&gt;<br /> &lt;ul&gt;<br /> &lt;li&gt;Counts/Expected Counts&lt;/li&gt;<br /> &lt;li&gt;Transcripts per Million (TPM)&lt;/li&gt;<br /> &lt;li&gt;FPKM/RPKM&lt;/li&gt;<br /> &lt;/ul&gt;<br /> &lt;p&gt;These quantifications are not normalized for comparisons across samples. Normalizing RNA-seq quantification to support comparisons of a feature's expression levels across samples is important for GSEA. Normalization methods (such as TMM or geometric mean) that operate on raw counts data should be applied prior to running GSEA.&lt;/p&gt; <br /> &lt;p&gt;'''Note: '''[https://gsea-msigdb.github.io/ssGSEAProjection-gpmodule/v9/index.html ssGSEA] (single-sample GSEA) projections perform substantially different mathematical operations from standard GSEA. For the ssGSEA implementation, gene-level summed TPM serves as an appropriate metric for analysis of RNA-seq quantifications.&lt;/p&gt;<br /> &lt;p&gt;Tools such as Voom or DESeq2 can be made to produce properly normalized data that are compatible with GSEA. [http://software.broadinstitute.org/cancer/software/genepattern/modules/docs/VoomNormalize/2 Voom-Normalize] and [https://genepattern.github.io/DESeq2/v1/index.html DESeq2] modules that produce this output are available through the [https://cloud.genepattern.org/ GenePattern environment]. 
These tools produce GSEA-compatible “normalized counts” tables in the [[Data_formats#GCT:_Gene_Cluster_Text_file_format_.28.2A.gct.29|GCT format]].&lt;/p&gt;<br /> &lt;p&gt;'''Note: '''While GSEA can accept transcript-level quantification directly and sum these to the gene level, these quantifications are not typically properly normalized for between-sample comparisons. As such, transcript-level CHIP annotations are no longer provided by the GSEA-MSigDB team.&lt;/p&gt;<br /> For more information on performing GSEA with RNA-seq data see: [[RNA-Seq_Data_and_Ensembl_CHIP_files|RNA-seq Data and Ensembl CHIP Files]]<br /> &lt;br&gt;<br /> &lt;p&gt;The GSEA algorithm ranks the features listed in a GCT file. It provides a number of alternative statistics that can be used for feature ranking. In all cases (or at least in the cases where the dataset represents expression profiles for differing categorical phenotypes), the ranking statistics capture some measure of genes' differential expression between a pair of categorical phenotypes. While these metrics are widely used for RNA-seq datasets, the GSEA team has yet to fully evaluate whether these ranking statistics, originally selected for their effectiveness when used with microarray-based expression data, are entirely appropriate for use with data derived from RNA-seq experiments.&lt;/p&gt; <br /> &lt;br&gt;<br /> As an alternative to standard GSEA, analysis of data derived from RNA-seq experiments may also be conducted through the GSEAPreranked tool.<br /> &lt;p&gt;In particular:&lt;/p&gt;<br /> &lt;ol&gt;<br /> &lt;li&gt;Prior to conducting gene set enrichment analysis, conduct your differential expression analysis using any of the tools developed by the bioinformatics community (e.g., cuffdiff, edgeR, DESeq).&lt;/li&gt;<br /> &lt;li&gt;Based on your differential expression analysis, rank your features and capture your ranking in an RNK-formatted file. 
The ranking metric can be whatever measure of differential expression you choose from the output of your selected DE tool. For example, cuffdiff provides the (base 2) log of the fold change.&lt;/li&gt;<br /> &lt;li&gt;Run GSEAPreranked. If the exact magnitude of the rank metric is not directly biologically meaningful, select &quot;classic&quot; for your enrichment score (thus not weighting each gene's contribution to the enrichment score by the value of its ranking metric).&lt;/li&gt;<br /> &lt;/ol&gt;<br /> Please note that if you choose to use any of the gene sets available from MSigDB in your analysis, you need to make sure that the features listed in your RNK file are genes, and that the genes are identified by their HUGO gene symbols. All gene symbols listed in the RNK file must be unique and must match the ENSEMBL version used in the targeted version of MSigDB. We also recommend that the values of the ranking metric be unique.</div> Acastanza
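The GSEAPreranked steps described in the initial draft above (rank features from a differential expression result, then capture the ranking in an RNK file with unique gene symbols) might be sketched in Python as follows. This is a minimal illustration, not the GSEA team's own tooling: the function name `write_rnk` is hypothetical, the input is assumed to be (gene symbol, metric) pairs already extracted from a DE tool's output, and resolving duplicate symbols by keeping the value with the largest absolute metric is one arbitrary choice among several. The sketch does not break ties between equal metric values.

```python
import csv
import io


def write_rnk(de_rows, out):
    """Write (gene_symbol, metric) pairs to `out` as a tab-delimited RNK file.

    Assumption: when a symbol appears more than once, only the entry with
    the largest absolute metric is kept, so every symbol is unique.
    """
    best = {}
    for symbol, metric in de_rows:
        if symbol not in best or abs(metric) > abs(best[symbol]):
            best[symbol] = metric

    # Sort from most positive to most negative metric for readability.
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)

    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for symbol, metric in ranked:
        writer.writerow([symbol, metric])
    return ranked


# Hypothetical log2 fold changes; TP53 appears twice on purpose.
example = [("TP53", 2.5), ("MYC", -1.0), ("TP53", -3.0), ("EGFR", 0.5)]
buffer = io.StringIO()
ranked = write_rnk(example, buffer)
print(buffer.getvalue(), end="")
```

To write to disk instead of an in-memory buffer, pass a file opened with `open("ranked_genes.rnk", "w", newline="")` (the file name is illustrative).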