GSEA v4.2.x Release Notes

Revision as of 20:13, 15 December 2021



GSEA Desktop v4.2.0 (Dec 2021)

The GSEA v4.2.0 release includes a number of improvements and bug fixes, including:

  • Added a Spearman Correlation metric for continuous phenotypes.
  • Added a new Absolute Max of Probes collapse mode.
  • Added a feature to allow saving the resulting dataset when the Collapse or Remap_Only options are set for a GSEA analysis. If the 'Create GCT files' option under Advanced Fields is set to true, the dataset will be saved as a GCT in the edb sub-folder of the analysis result directory.
  • Modified to save the console log to a 'gsea.log' file in 'gsea_home'.
  • Updated to Log4J 2.16.0. Note, however, that we do not believe any version of GSEA Desktop is affected by the vulnerability in earlier Log4j versions, because it is a desktop application and does not expose any input forms to users over the web. If you are exposing GSEA through a website or other networked server, we recommend that you update to 4.2.0 immediately.
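
For readers unfamiliar with the Spearman metric mentioned above, the following is a minimal Python sketch of a rank-based correlation between one gene's expression values and a continuous phenotype. It is an illustration only, not the GSEA source code: the function names, the average-rank handling of ties, and the choice to skip samples where either value is missing are all assumptions.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation; NaN when either input has zero variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return math.nan
    return cov / math.sqrt(var_x * var_y)


def spearman(expression, phenotype):
    """Spearman correlation: Pearson correlation computed on the ranks.

    Assumption: samples with a missing (NaN) value on either side are skipped.
    """
    pairs = [(e, p) for e, p in zip(expression, phenotype)
             if not (math.isnan(e) or math.isnan(p))]
    if len(pairs) < 2:
        return math.nan
    xs, ys = zip(*pairs)
    return pearson(average_ranks(xs), average_ranks(ys))
```

Because only the ranks are used, any monotonic relationship between expression and phenotype scores the same as a linear one, which is the usual reason to prefer Spearman over Pearson for non-linear or outlier-prone data.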

There are also updates to improve the handling of missing values in the input datasets, both in the file parsers and in the computations. GSEA generally ignores missing values, but in certain situations it did not. These arose primarily from missing tab-delimited fields and explicit NA or NaN input values, but the handling of missing values was also improved overall.

  • Added more prominent warnings in the logs, the UI, and the reports when there are missing values in the input.
  • Modified the GCT, TXT, RNK, and PCL parsers to better handle these cases. NA values were formerly not treated as missing and would cause a numeric parsing error. Likewise for quoted empty values. These are now treated simply as missing values and ignored.
  • Fixed bugs in most metric calculations where the missing values were not ignored as intended. This affected all metrics except signal-to-noise (S2N, the default) and tTest.
  • Fixed the collapse calculations to also ignore missing values among the individual probes in the same way as the metric computations. This can affect the calculation of mean or median, for example.
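
The parser and collapse behavior described above can be sketched as follows. This is a hedged illustration in Python rather than the actual GSEA parser: the helper names, the set of tokens treated as missing, and the mode names ("mean", "median", "abs_max") are hypothetical. The "abs_max" branch assumes that the Absolute Max of Probes mode picks the probe with the largest absolute value and keeps its sign.

```python
import math
import statistics

# Assumption: these tokens (case-insensitive, after unquoting) mean "missing".
MISSING_TOKENS = {"", "na", "nan"}


def parse_cell(text):
    """Parse one tab-delimited field, mapping NA, NaN, and empty to NaN."""
    token = text.strip().strip('"').strip()
    if token.lower() in MISSING_TOKENS:
        return math.nan
    return float(token)


def parse_row(line, n_samples):
    """Parse a row of values, padding absent trailing tab fields as missing."""
    fields = line.rstrip("\n").split("\t")
    fields += [""] * (n_samples - len(fields))
    return [parse_cell(f) for f in fields[:n_samples]]


def collapse(probe_rows, mode="mean"):
    """Collapse several probe rows into one gene row, sample by sample.

    Missing values are dropped before the statistic is computed, so a
    sample with one missing probe still gets a value from the others.
    """
    collapsed = []
    for column in zip(*probe_rows):
        present = [v for v in column if not math.isnan(v)]
        if not present:
            collapsed.append(math.nan)  # every probe is missing here
        elif mode == "mean":
            collapsed.append(statistics.fmean(present))
        elif mode == "median":
            collapsed.append(statistics.median(present))
        elif mode == "abs_max":
            # Largest magnitude wins; the original sign is retained.
            collapsed.append(max(present, key=abs))
        else:
            raise ValueError(f"unknown collapse mode: {mode}")
    return collapsed
```

For example, three probes with values 1.0, NA, and 3.0 for one sample collapse to a mean of 2.0 here, whereas propagating the missing value would make the whole result missing.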

Likewise, there are also updates to provide warnings about explicit infinite values in the input dataset. Such values can cause unexpected results during computation or plotting and are not recommended. Infinite values in the input will, however, be handled and used as-is in the metric computations.

Infinite values coming out of the metric computations will be adjusted to 0.01 when using the various "weighted" scoring modes, to avoid interfering with the rest of the enrichment results and any subsequent reporting. This has the effect of de-emphasizing that particular gene in any scoring.

This adjustment has historically been applied to the "weighted" scoring mode but was not previously documented; it has been extended to the "weighted_p1.5" and "weighted_p2" modes. It is not applied to the Classic K-S scoring mode since the expression values are not directly used with this mode.

This adjustment is also made to infinite values during plotting to avoid errors from the charting library being unable to render such values.
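
As a rough illustration of the adjustment described above, the Python sketch below replaces infinite metric scores with 0.01 before they are used as per-gene weights. It is not taken from the GSEA source. The exponents mapped to each scoring mode are assumptions inferred from the mode names, and the function names are hypothetical. The sign of the replacement does not matter in this sketch because only the magnitude is used as a weight.

```python
import math

INFINITE_REPLACEMENT = 0.01  # value stated in the release notes

# Assumption: exponent applied to |score| in each mode, inferred from the names.
WEIGHT_EXPONENTS = {
    "classic": 0.0,
    "weighted": 1.0,
    "weighted_p1.5": 1.5,
    "weighted_p2": 2.0,
}


def adjust_infinite(score):
    """Replace +inf or -inf with the small constant; leave other scores alone."""
    return INFINITE_REPLACEMENT if math.isinf(score) else score


def gene_weights(scores, scoring_mode="weighted"):
    """Return the per-gene weights used when accumulating an enrichment score."""
    exponent = WEIGHT_EXPONENTS[scoring_mode]
    if exponent == 0.0:
        # Classic mode: every gene counts equally, so no adjustment is needed.
        return [1.0 for _ in scores]
    return [abs(adjust_infinite(s)) ** exponent for s in scores]
```

With scores of 2.0, infinity, and -3.0 in the "weighted" mode, the weights become 2.0, 0.01, and 3.0, so the gene with the infinite score contributes almost nothing instead of dominating the running sum.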

Warnings are also provided for NaN values coming out of metric computations (resulting from division-by-zero or taking the root of a negative value).

The vast majority of datasets should be unaffected by these changes, as such values should be relatively rare. If you have run analyses on datasets with missing, NA, NaN, or infinite values and are concerned about changes to the results, we recommend re-running the analysis with GSEA 4.2.0 to evaluate the possible differences.

Beyond that, there are a number of miscellaneous improvements and bug fixes. Chief among these are:

  • Changed the FDR q-value scale on the NES vs Significance plot. This was formerly 0-100 but has been changed to 0.0-1.0 to match the values in the report table.
  • Added minimum-sample warnings and errors for the continuous phenotype metrics. Fixed a bug where the minimum-sample check was not applied with gene_set permutation mode.
  • Added a warning about use of the FDR when only one gene set is being analyzed. Reported FDRs are not an accurate representation of the actual false discovery rate when derived from a single gene set.
  • Modified the launcher scripts to fix some issues with recent Java 11 releases on newer versions of macOS and to better support symlinks on Mac and Linux.
  • Fixed bugs with GMT caching and the gene set subset-select feature on Windows.
  • Fixed a bug with some UI parameter widgets handling empty values.
  • Fixed a bug where the analysis RPT file was not saved if there was an error.
  • Fixed a bug with GMT & CHIP sorting for MSigDB point releases.
  • Fixed some issues with blank fields in the CHIP parser.
  • Fixed some bugs in the GCT & TXT export functions.
  • Improved the error message for a missing phenotype selection.
  • Updated the CHIP Download link in the Help menu to use our new location.
  • Fixed a UI dialog-centering bug.
  • Added GSEA & MSigDB citation info to the report.