Gsea enhancements

From GeneSetEnrichmentAnalysisWiki
Revision as of 12:27, 19 April 2006 by Hkuehn

Beta testing 4/18:

1. Created a tiny dataset with 4 samples; created a phenotype on the fly with 1 sample in ClassA and 3 in ClassB; got the error below. If I create the phenotype on the fly with 2 samples in each class, life is good.

<Error Details>

Full Error Message ----

Nan hit score for feature: TACC2

Stack Trace ----

  # of exceptions: 1

Nan hit score for feature: TACC2------

java.lang.IllegalStateException: Nan hit score for feature: TACC2
    at
    at
    at
    at
    at
    at
    at
    at xtools.gsea.Gsea.execute_one(
    at xtools.gsea.Gsea.execute(
    at$
    at Source)
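One possible explanation, not confirmed by these notes: if genes are ranked by signal-to-noise, (meanA - meanB) / (sdA + sdB), and each class's standard deviation uses an n - 1 denominator, then a class with a single sample gives 0/0 = NaN, which propagates into every hit score. The sketch below uses hypothetical names (it is not the GSEA source) to show that effect, plus a guard that would reject phenotypes with fewer than 2 samples per class before the analysis starts.

```java
// Hypothetical illustration of how a 1-sample class can produce NaN scores; not GSEA code.
final class SignalToNoiseSketch {

    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // Sample standard deviation with an (n - 1) denominator.
    // For n == 1 this evaluates 0.0 / 0.0, which is NaN in IEEE 754 arithmetic.
    static double sampleStdDev(double[] xs) {
        double m = mean(xs);
        double ss = 0.0;
        for (double x : xs) ss += (x - m) * (x - m);
        return Math.sqrt(ss / (xs.length - 1));
    }

    // Signal-to-noise: (meanA - meanB) / (sdA + sdB); NaN if either sd is NaN.
    static double signalToNoise(double[] a, double[] b) {
        return (mean(a) - mean(b)) / (sampleStdDev(a) + sampleStdDev(b));
    }

    // Guard to run when the phenotype is created, before any ranking.
    static void requireTwoPerClass(int sizeA, int sizeB) {
        if (sizeA < 2 || sizeB < 2) {
            throw new IllegalArgumentException("Each class needs at least 2 samples; got "
                    + sizeA + " and " + sizeB);
        }
    }

    public static void main(String[] args) {
        double[] classA = {5.0};            // 1 sample, as in the 1-vs-3 test above
        double[] classB = {1.0, 2.0, 3.0};
        System.out.println(signalToNoise(classA, classB)); // prints NaN
        try {
            requireTwoPerClass(classA.length, classB.length);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Whether single-sample classes should be rejected or given some fallback noise estimate is a design decision; rejecting them early at least replaces the IllegalStateException with an actionable message.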

2. From the command line, gene set names are case sensitive. They should be case INsensitive. (Tested using xtools.gsea.LeadingEdgeTool, where -gsets is a comma-separated list of gene set names.)
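A minimal sketch of case-insensitive lookup for the comma-separated -gsets value, assuming gene sets are held in a name-keyed map. The class, its methods, and the example gene set names are hypothetical, not the xtools API. A TreeMap ordered by the standard String.CASE_INSENSITIVE_ORDER comparator treats "P53_UP" and "p53_up" as the same key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical gene set registry; not the xtools.gsea implementation.
final class GeneSetRegistry {
    // Keys compare case-insensitively, so lookups ignore the user's capitalization.
    private final Map<String, List<String>> sets =
            new TreeMap<String, List<String>>(String.CASE_INSENSITIVE_ORDER);

    void add(String name, List<String> members) {
        sets.put(name, members);
    }

    // Resolves a -gsets style argument such as "p53_up, CELL_CYCLE".
    List<List<String>> resolve(String gsetsArg) {
        List<List<String>> out = new ArrayList<List<String>>();
        for (String raw : gsetsArg.split(",")) {
            String name = raw.trim();
            List<String> members = sets.get(name);
            if (members == null) {
                throw new IllegalArgumentException("No such gene set: " + name);
            }
            out.add(members);
        }
        return out;
    }

    public static void main(String[] args) {
        GeneSetRegistry registry = new GeneSetRegistry();
        registry.add("P53_UP", Arrays.asList("TP53", "MDM2", "CDKN1A"));
        System.out.println(registry.resolve("p53_up").get(0)); // [TP53, MDM2, CDKN1A]
    }
}
```

One consequence to decide on: two gene sets whose names differ only by case would collide in such a map, so loading should report those pairs.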

GSEA Build 4/10:

Added a new preference that disables the online check for gene sets. We noticed that when a user's laptop was not connected to the internet, the gene sets widget in the GSEA panel took a very long time to launch. Users can now disable the online check by unchecking the preference. This works the same way Netscape and email programs handle working offline.
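A minimal sketch of the gating logic, assuming the preference is a boolean stored with java.util.prefs; the key name, class, and probe interface are hypothetical. When the user unchecks the preference, the network probe is never called, so an offline laptop pays no timeout; when the probe does run, short connect and read timeouts bound the wait.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.prefs.Preferences;

// Hypothetical sketch; not the GSEA preference code.
final class OnlineCheck {
    static final String PREF_CHECK_ONLINE = "gsea.check.online"; // hypothetical key

    interface Probe { boolean reachable(); }

    // Short-circuit: if the preference is off, probe.reachable() is never invoked.
    static boolean useOnlineGeneSets(boolean checkOnlinePref, Probe probe) {
        return checkOnlinePref && probe.reachable();
    }

    static boolean readPreference() {
        Preferences prefs = Preferences.userNodeForPackage(OnlineCheck.class);
        return prefs.getBoolean(PREF_CHECK_ONLINE, true); // default: check online
    }

    // A probe with bounded waiting; which URL to probe is up to the caller.
    static Probe httpProbe(final String url, final int timeoutMs) {
        return new Probe() {
            public boolean reachable() {
                try {
                    HttpURLConnection c = (HttpURLConnection) URI.create(url).toURL().openConnection();
                    c.setConnectTimeout(timeoutMs);
                    c.setReadTimeout(timeoutMs);
                    c.setRequestMethod("HEAD");
                    boolean ok = c.getResponseCode() < 400;
                    c.disconnect();
                    return ok;
                } catch (IOException | IllegalArgumentException | ClassCastException e) {
                    return false; // offline, malformed, or non-HTTP URL
                }
            }
        };
    }
}
```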


GSEA Beta 4/7:

1. software_download.html: easilly -> easily. Might also want to add: If you cannot remember the location of the GSEA home folder, start the application and click "Help>Show GSEA Home Folder".


2. Installed into c:\gsea_home. Installation seems to have created TWO home directories: c:\gsea_home and C:\Documents and Settings\hkuehn\gsea_home\.

No way. Show me
See dropbox/aravind/gsea_install.ppt for pictures of the two directories that get created. I deleted ALL gsea_home directories from everywhere and then did a fresh install from Beta to c:\program files\gsea_home this morning (4/12). These are the two directories that got created. (I immediately went to Preferences and pointed everything to c:\program files\gsea_home; everything seems to be working fine; nothing seems to be pointing to or using c:\documents and settings.)

The directories have different contents (e.g. c:\gsea_home has an uninstall directory and annotations; the other has no uninstall directory and no annotations). The GSEA app (Help>Show GSEA Home Folder, Show GSEA Output, Preferences) all show C:\Documents and Settings\hkuehn\gsea_home\.

Attempted to run GSEA: no annotations. The annotations are in c:\gsea_home, but the app is looking in C:\Documents and Settings\hkuehn\gsea_home\.

3. On the MSigDB page, Find Sets that Contain this gene doesn't seem to be working. Tried a search for MEST: no such gene. Checked the gene set cards for a gene symbol (tp53) and searched for that on the MSigDB page: no such gene.

Can't reproduce; seems to work for me.
OK; works for me as well.

4. On Preference window, what's the path to the user home directory used for?

It's a very important preference. It allows one to place the gsea_home folder anywhere and then tell GSEA where it lives. This allows one, for example, to keep GSEA out of a backed-up folder (recommended for users at the Broad: a backed-up gsea_home makes login much slower because many MB of annotation files are needlessly synchronized).
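The resolution order this preference implies can be sketched as follows, assuming that without it the app defaults to a gsea_home folder under the OS user's home directory (which matches the C:\Documents and Settings\hkuehn\gsea_home path reported above). The class and method names are hypothetical. In this order the installer's target folder plays no part, which would be consistent with an install into c:\gsea_home still leaving the app looking in the profile folder.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical resolver; not the GSEA source.
final class GseaHome {
    // 1. A non-blank user-home preference wins.
    // 2. Otherwise fall back to <user.home>/gsea_home.
    static Path resolve(String preferenceValue, String userHome) {
        if (preferenceValue != null && preferenceValue.trim().length() > 0) {
            return Paths.get(preferenceValue.trim());
        }
        return Paths.get(userHome, "gsea_home");
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, System.getProperty("user.home")));
    }
}
```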

On the GSEA Web Site:

1. Doc page top menu has a different order than other pages:
    Other:  Home | Software | MSigDB | GeneSetCards | Documentation | Resources
    Doc:    Home | Documentation | Software | MSigDB | GeneSetCards | Resources


2. GSEA icon on doc page should link to GSEA home page (not doc page)

Unable to fix

3. MSigDB page needs to point to license notice, as on V1 web site.


4. Are we putting Java docs for GSEA application on the web site? Where are they?

Probably not unless someone asks

5. Make the Contact Us page more prominent (add it to Resources page?).

6. On the Contact Us page, software credits and MSigDB credits should point to the appropriate wiki pages.

7. How should people cite GSEA and/or MSigDB in their papers (eg I used GSEA with my own gene set or the p53 gene set is enriched in my analysis)? We need to add this info to the Publications page (and to MSigDB page?).
See email

8. On the software download page (, incorrect link to GSEA-R zip file.

9. The title bar on the Gene Set Card pages is missing the GSEA logo and has the menu items in the wrong order (Doc before Software rather than before Resources). Since I notice that you've already fixed many of the above, perhaps it's just that I was looking while you were playing with the pages?

Next build will fix this

Found 4/4:

1. Metric for ranking genes parameter: specifying Bhattacharyya with a Continuous phenotype should give error 1011 (instead, I get a hardcoded error).

Bhattacharyya is a continuous metric, so isn't this correct?
When I run Bhattacharyya with a continuous phenotype, I get the following error (no error help button):

 Tool execution error
 Message: Template is not biphasic. Name: 100_g_at_profile_in_p53_dataset_hgu95av2.cls#100_g_at # splits= 50
 This metric can only be used with 2 class comparisons

java.lang.RuntimeException: Template is not biphasic. Name: 100_g_at_profile_in_p53_dataset_hgu95av2.cls#100_g_at # splits= 50
 This metric can only be used with 2 class comparisons
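The error above surfaces from deep inside the ranking code. A sketch of validating the metric against the phenotype type before the run starts, so the user gets the coded error (1011 for Bhattacharyya with a continuous phenotype, as these notes expect) instead of a RuntimeException. The enum layout and exception class are hypothetical; the allowed kinds follow the notes: Bhattacharyya needs a 2-class comparison, and Pearson is accepted for both categorical and continuous phenotypes.

```java
import java.util.EnumSet;

// Hypothetical up-front validation; not the GSEA source.
final class MetricCheck {
    enum PhenotypeKind { TWO_CLASS, CONTINUOUS }

    enum Metric {
        // Bhattacharyya needs a 2-class comparison; misuse reports error 1011.
        BHATTACHARYYA(1011, EnumSet.of(PhenotypeKind.TWO_CLASS)),
        // Pearson accepts both kinds, so its error 1010 is never raised by this table.
        PEARSON(1010, EnumSet.of(PhenotypeKind.TWO_CLASS, PhenotypeKind.CONTINUOUS));

        final int errorCode;
        final EnumSet<PhenotypeKind> allowed;

        Metric(int errorCode, EnumSet<PhenotypeKind> allowed) {
            this.errorCode = errorCode;
            this.allowed = allowed;
        }
    }

    static final class MetricMismatchException extends RuntimeException {
        final int code;

        MetricMismatchException(int code, String message) {
            super(message);
            this.code = code;
        }
    }

    // Run when parameters are validated, before any permutations start.
    static void check(Metric metric, PhenotypeKind kind) {
        if (!metric.allowed.contains(kind)) {
            throw new MetricMismatchException(metric.errorCode, "Error " + metric.errorCode
                    + ": " + metric + " cannot be used with a " + kind + " phenotype");
        }
    }
}
```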

2. Metric for ranking genes parameter: specifying Pearson with a Categorical phenotype should give error 1010 (instead, I get a hardcoded error).

Pearson is allowed for categorical & continuous
When I run Pearson with a Categorical phenotype, I get the following error (no help button); now that I look more closely, it doesn't seem to be related to the Categorical/Continuous thing...

Message: For input string: "MUT"

java.lang.NumberFormatException: For input string: "MUT"
    at sun.misc.FloatingDecimal.readJavaFormatString(Unknown Source)
    at java.lang.Float.parseFloat(Unknown Source)....
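The trace shows Float.parseFloat being applied to "MUT", i.e. a class label from a categorical phenotype being read as a number. A sketch of a helper (hypothetical, not the GSEA phenotype parser) that turns that case into a message the user can act on, keeping the original NumberFormatException as the cause:

```java
// Hypothetical helper; not the GSEA phenotype parser.
final class ContinuousPhenotype {
    // Parses the per-sample values of a continuous phenotype.
    static float[] parse(String[] tokens) {
        float[] values = new float[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            try {
                values[i] = Float.parseFloat(tokens[i].trim());
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException("Sample " + (i + 1) + " has value \"" + tokens[i]
                        + "\", which is not a number. Is a categorical phenotype selected"
                        + " where a continuous one is expected?", e);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        try {
            parse(new String[] {"MUT", "WT", "MUT"});
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```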

3. Add Fold Change back into Metric for ranking genes parameter drop-down list.

Done. It's called 'Ratio_of_Means' to be more precise about what fold change is.
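A minimal sketch of a ratio-of-means score, assuming it is the mean expression in class A divided by the mean in class B, computed on the values as given (not log-transformed). How the real metric handles zero or negative means is not stated in these notes, so this hypothetical class simply returns whatever floating-point division gives.

```java
// Hypothetical sketch of a Ratio_of_Means score; not the GSEA source.
final class RatioOfMeans {
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // Fold change as mean(classA) / mean(classB). A zero mean in class B yields an
    // infinite value (or NaN if both means are 0), so callers may want to guard against it.
    static double ratioOfMeans(double[] classA, double[] classB) {
        return mean(classA) / mean(classB);
    }

    public static void main(String[] args) {
        double[] mutant = {4.0, 6.0};    // mean 5.0
        double[] wildType = {1.0, 1.5};  // mean 1.25
        System.out.println(ratioOfMeans(mutant, wildType)); // 4.0
    }
}
```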

Found 4/3 (Aravind's notes from Heidi's cube):

tool cache doesn't show up

tool cache wrong on errored out report

file chooser static
No fix. Known limitation

installer must install databases/Hs.alias
installer must delete folder
Made new software upgrade to v2 page

Found 4/3 with new build:
1. Attempt to run the leading edge analysis gives following error (Build HTML Report works fine):
java.lang.NoClassDefFoundError: org/apache/commons/math/MathException
    at org.genepattern.gsea.LeadingEdgeAnalysis.<init>(
    at org.genepattern.gsea.LeadingEdgeWidget.runAnalysis(
    at org.genepattern.gsea.LeadingEdgeWidget.access$400(
    at org.genepattern.gsea.LeadingEdgeWidget$
    at foxtrot.AbstractWorkerThread$
    at Method)
    at foxtrot.AbstractWorkerThread.runTask(
    at Source)

Message: na
java.lang.Throwable: na
    at org.genepattern.gsea.XToolsMessageHandler.showErrorDialog(
    at org.genepattern.uiutil.UIUtil.showErrorDialog(
    at org.genepattern.gsea.LeadingEdgeWidget.runAnalysis(
    at org.genepattern.gsea.LeadingEdgeWidget.access$000(
    at org.genepattern.gsea.LeadingEdgeWidget$1.actionPerformed(
    at javax.swing.AbstractButton.fireActionPerformed(Unknown Source)
    at javax.swing.AbstractButton$Handler.actionPerformed(Unknown Source)
    at javax.swing.DefaultButtonModel.fireActionPerformed(Unknown Source)
    at javax.swing.DefaultButtonModel.setPressed(Unknown Source)
    at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(Unknown Source)
    at java.awt.Component.processMouseEvent(Unknown Source)
    at javax.swing.JComponent.processMouseEvent(Unknown Source)
    at java.awt.Component.processEvent(Unknown Source)
    at java.awt.Container.processEvent(Unknown Source)
    at java.awt.Component.dispatchEventImpl(Unknown Source)
    at java.awt.Container.dispatchEventImpl(Unknown Source)
    at java.awt.Component.dispatchEvent(Unknown Source)
    at java.awt.LightweightDispatcher.retargetMouseEvent(Unknown Source)
    at java.awt.LightweightDispatcher.processMouseEvent(Unknown Source)
    at java.awt.LightweightDispatcher.dispatchEvent(Unknown Source)
    at java.awt.Container.dispatchEventImpl(Unknown Source)
    at java.awt.Window.dispatchEventImpl(Unknown Source)
    at java.awt.Component.dispatchEvent(Unknown Source)
    at java.awt.EventQueue.dispatchEvent(Unknown Source)
    at java.awt.EventDispatchThread.pumpOneEventForHierarchy(Unknown Source)
    at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source)
    at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
    at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
    at Source)
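The NoClassDefFoundError for org/apache/commons/math/MathException suggests the Apache Commons Math jar was missing from the new build's runtime classpath, while Build HTML Report evidently never touches that class. A sketch of a cheap check the leading edge widget could run before starting the analysis; the class name is hypothetical, while Class.forName(String, boolean, ClassLoader) is standard Java.

```java
// Hypothetical startup check; not the GSEA source.
final class DependencyCheck {
    // True if the class can be loaded (without running its static initializers).
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, DependencyCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String needed = "org.apache.commons.math.MathException"; // from the trace above
        if (!isOnClasspath(needed)) {
            System.err.println("Leading edge analysis is unavailable: " + needed
                    + " is not on the classpath. Is the Commons Math jar bundled?");
        }
    }
}
```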


Notes from user tests 3/31

Needs Java 1.5.
So if you launch on a Mac and see the error message "unsupported class version", it means that your Mac (or Windows machine) does not have the correct Java version.
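A sketch of a friendlier version check using the standard java.specification.version property ("1.5", "1.8", and from Java 9 on "9", "17", ...); the class name is hypothetical. One caveat: the check can only run if this launcher class is compiled for an older class-file version than the rest of the application; otherwise the JVM reports the unsupported class version before main is reached.

```java
// Hypothetical launcher check; not the GSEA source.
final class JavaVersionCheck {
    // "1.5" -> 5, "1.8" -> 8, "9" -> 9, "17" -> 17
    static int majorVersion(String spec) {
        String[] parts = spec.split("\\.");
        int first = Integer.parseInt(parts[0]);
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        if (majorVersion(spec) < 5) {
            System.err.println("GSEA needs Java 1.5 or later; this JVM reports " + spec);
            System.exit(1);
        }
    }
}
```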
For the mac:

Feature Additions for build 3/2

1. Leading edge interactive viewer includes HTML report option
2. Installer works: Download new version from: XXX
3. New msigdb xml file and gene set cards created. Gene set descriptions cleaned up. Anything odd is now a bug.
4. I'll be adding to the software version 2 release notes wiki page (please subscribe and edit as needed).
5. Unified application messages (they are all now in the status bar at the bottom), no more system console viewer
6. Added a splash screen so that users know the application is loading after they double-click its desktop icon
7. Implemented the startup screen (let me know what you think of its content)
8. For the chip platform parameter, multiple chips can be selected (for example if a dataset is made from 2 different chips).
9. By default the hyperlinks from the software will point to the PROD server. To make them connect to the DEV server, use the -Ddebug=true flag
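Item 9 can be read as follows. Boolean.getBoolean("debug") is the standard Java call that returns true only when the system property debug is set to "true" (case-insensitive), as with -Ddebug=true. The server URLs below are placeholders, since the real PROD and DEV hosts are not given here, and the class is hypothetical.

```java
// Hypothetical; the URLs are placeholders, not the real GSEA servers.
final class ServerChoice {
    static final String PROD_BASE = "https://prod.example.org/gsea/";
    static final String DEV_BASE = "https://dev.example.org/gsea/";

    // Boolean.getBoolean("debug") is true only if the system property "debug"
    // equals "true", ignoring case (e.g. java -Ddebug=true ...).
    static String baseUrl() {
        return Boolean.getBoolean("debug") ? DEV_BASE : PROD_BASE;
    }

    public static void main(String[] args) {
        System.out.println("Hyperlinks point to " + baseUrl());
    }
}
```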

Found/requested 3/27 (with installed Gsea2):

1. From the Load Data page: load a gene set matrix, use Extract Gene Sets to create gene sets in memory, use Create Ranked List to create a ranked list from one of those gene sets. It tells me that it created the ranked list and gives me the following info messages, but I can't find the ranked list: it's not in the default output folder, the object cache list, or the drop-downs on the PreRanked GSEA page.

1355 [INFO ] Starting: => Extract GeneSets from the GeneMatrix
1385 [INFO ] Successfully created a GeneSet from the Dataset s2_gene_set_database.diabetes.gmt into: 323 gene sets
1425 [INFO ] Null widget - no window opened
9346 [INFO ] Starting: => Create a RankedList
9366 [INFO ] Successfully created a RankedList from the GeneSet 41BBPATHWAY into: 34 members
9366 [INFO ] Null widget - no window opened

I fixed this but then removed the action, because I wondered whether converting a gene set to a ranked list would cause confusion between what a gene set is and what a ranked list is. Thoughts?
Agreed; I removed it from the doc.

2. On the Leading Edge Report page, I built a report, realized I built the wrong one, so selected and built a different one. A new tab appears, but it doesn't get the focus (so I didn't notice it at first). Also, the tabs that get created should have close (X) icons.
Please ask Josh; sent him an email.

3. The output folder creates a subfolder for each day (mar21, mar22, etc). Shouldn't the folder name include the year? Or, do we expect people to be deleting the reports on a regular basis?

Good point. But folder names are key to a lot of things, and I'm wary of changing the convention at this stage.
Sounds reasonable. We can decide whether to change it or doc it in the next release (doc=recommend that people periodically delete old reports, renaming and saving what they want to keep).
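To make the trade-off concrete, a sketch of the current month-day naming next to a year-qualified variant. The "MMMdd" pattern (with zero-padded days, e.g. mar05) is an assumption inferred from the mar21/mar22 examples, the year suffix is only one possible convention, and the class is hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical folder-name helpers; not the GSEA source.
final class OutputFolderNames {
    // Month + day, e.g. "mar21"; the same name recurs every year.
    static String monthDay(LocalDate d) {
        return DateTimeFormatter.ofPattern("MMMdd", Locale.US).format(d).toLowerCase(Locale.US);
    }

    // Year-qualified variant, e.g. "mar21_2006".
    static String monthDayYear(LocalDate d) {
        return DateTimeFormatter.ofPattern("MMMdd_yyyy", Locale.US).format(d).toLowerCase(Locale.US);
    }

    public static void main(String[] args) {
        LocalDate a = LocalDate.of(2006, 3, 21);
        LocalDate b = LocalDate.of(2007, 3, 21);
        System.out.println(monthDay(a).equals(monthDay(b)));         // true: same folder
        System.out.println(monthDayYear(a).equals(monthDayYear(b))); // false
    }
}
```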

4. On the Run GSEA page, select phenotypes, on the Select one or more phenotypes window, the button should be Create an on-the-fly phenotype... rather than Create a on-the-fly phenotype....

Found/requested 3/24 (with installed Gsea2):

1. Names generated by GSEA were too long for Windows. My personal profile failed to load and I was logged in with a temporary profile. I had to zip (and delete) one of my GSEA report folders before I could log into Windows with my own profile. Put the zipped reports folder in dropbox/foraravind.

IMPortant to fix. Fixed
Is this impossible to fix or important to fix?
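Classic Win32 file APIs limit a full path to MAX_PATH = 260 characters (including the terminating NUL), so long generated report names nested under a deep profile folder can produce paths Windows cannot handle. A sketch of one way to cap generated names, assuming it is acceptable to truncate and append a short hash so distinct names stay distinct. The class, method, and reserve parameter (room kept for the files written inside the folder) are hypothetical, and the notes do not say how the actual fix works.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for capping generated folder names; not the GSEA fix.
final class WindowsSafeNames {
    static final int MAX_PATH = 260; // classic Win32 limit, including the terminating NUL

    // Returns 'name' unchanged if parentDir + "\" + name, plus 'reserve' characters kept
    // free for files inside the folder, fits in MAX_PATH - 1 characters. Otherwise it
    // truncates the name and appends "_" + 8 hex digits of a SHA-256 hash of the full name.
    static String fit(String parentDir, String name, int reserve) {
        int budget = (MAX_PATH - 1) - parentDir.length() - 1 - reserve;
        if (name.length() <= budget) {
            return name;
        }
        String suffix = "_" + shortHash(name);
        int keep = budget - suffix.length();
        if (keep < 1) {
            throw new IllegalArgumentException("Parent path leaves no room for a name: " + parentDir);
        }
        return name.substring(0, keep) + suffix;
    }

    static String shortHash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 4; i++) {
                hex.append(String.format("%02x", digest[i] & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }
}
```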

Found/requested 3/22 (with installed Gsea2):

1. I expected chip or no chip specified to determine whether gene symbols/titles showed up in gene set detail report. Ran GSEA with p53, native, no chip, and got no gene symbol as expected. Added chip and got gene symbol as expected. But, then removed chip and still got the gene symbols (as if they were stored in memory or something). Parameters on the report confirm that I did not have chip specified. All three reports (nochip, chip, and chip_nochip) are in dropbox/foraravind.

Indeed, if you set chip once for a dataset, the program remembers this association during the current session.

2. Created my own gene set file, one gene set with text description and another with URL. Ran GSEA. The Enrichment Result report tries to link my gene sets to MSigDB rather than using my descriptions. (That report also on the dropbox/foraravind.)

Fixed. (Note: it was clumsy to place the description text in the table, as it could be very long.) So, the 2 modes are: 1) na in the description field of the gmx/gmt file -> auto-links to MSigDB; 2) a valid URL, i.e. text that starts with http -> links to the custom http ... site specified.
Doc with file formats.
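The two modes can be sketched against the GMT layout (tab-separated: gene set name, description, then one gene per column). The MSigDB base URL below is a placeholder, the class is hypothetical, and both matching "na" case-insensitively and returning no link for any other text are assumptions the notes do not cover.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical GMT row model; not the GSEA parser.
final class GmtRow {
    // Placeholder; the real MSigDB card URL pattern is not given in these notes.
    static final String MSIGDB_BASE = "https://msigdb.example.org/cards/";

    final String name;
    final String description;
    final List<String> genes;

    private GmtRow(String name, String description, List<String> genes) {
        this.name = name;
        this.description = description;
        this.genes = genes;
    }

    // GMT: name <TAB> description <TAB> gene1 <TAB> gene2 ...
    static GmtRow parse(String line) {
        String[] fields = line.split("\t");
        if (fields.length < 3) {
            throw new IllegalArgumentException("Expected name, description and at least one gene: " + line);
        }
        List<String> genes = new ArrayList<String>(Arrays.asList(fields).subList(2, fields.length));
        return new GmtRow(fields[0], fields[1].trim(), genes);
    }

    // Mode 1: "na" -> link to the gene set's MSigDB page.
    // Mode 2: text starting with "http" -> link to that URL.
    // Anything else: no link (an assumption; the notes only define the two modes).
    String link() {
        if (description.equalsIgnoreCase("na")) return MSIGDB_BASE + name;
        if (description.startsWith("http")) return description;
        return null;
    }

    public static void main(String[] args) {
        System.out.println(parse("MY_SET\tna\tTP53\tMDM2").link());
        System.out.println(parse("MY_OTHER_SET\thttp://example.org/my_set\tTP53").link());
    }
}
```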

3. On the Leading Edge report, when I click select reports from application cache, I expected to get today's reports (the ones in the Object Cache list on the Load Data page). Instead, I get a list of all reports, neatly arranged by date, excluding today's. When I run reports, they appear in the Object Cache as I expect.

This is a 'feature'. The object cache lists the program's memory, while the leading edge cache lists the file system, i.e. all analyses ever done (which could be too large to load into memory).

4. Run GSEA using the new Gene Matrix from web site tab to select the C2 gene sets. Analysis fails due to duplicate gene sets.

Fixed (new gene sets database)
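Beyond replacing the database, a loader could report duplicate names up front instead of failing mid-analysis. A minimal sketch with a hypothetical helper that lists each repeated gene set name once, in the order the repeats are found (exact-match comparison; normalizing case first would be a separate choice):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical load-time check; not the GSEA source.
final class DuplicateGeneSets {
    static List<String> duplicateNames(List<String> names) {
        Set<String> seen = new HashSet<String>();
        Set<String> reported = new HashSet<String>();
        List<String> dups = new ArrayList<String>();
        for (String name : names) {
            // seen.add returns false on a repeat; reported.add keeps each name listed once.
            if (!seen.add(name) && reported.add(name)) {
                dups.add(name);
            }
        }
        return dups;
    }

    public static void main(String[] args) {
        System.out.println(duplicateNames(Arrays.asList("A", "B", "A", "C", "B", "A"))); // [A, B]
    }
}
```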

5. On the new error handling, when I click the red error in the processes box, it should automatically open the console viewer and show me the error. It took me a little bit to realize that I had to Click for details... to open the console and then click Error to see the error.


6. Analysis History page still isn't showing up for me.


7. On the MSigDB page, when I click export, I expect to export what I have displayed in the table on the MSigDB page. When I selected All Items, I thought that meant all items displayed in the table; it seems to mean all items in MSigDB? (Thanks for going back to one button.)


8. RunGSEA form, the normalization parameter, you were going to delete the varmean option (or at least change the current name VarMeanPosNegSeparate back to varmean).


9. On the Algorithms page of the Preferences window, delete the phrase "(they can also be changed in params)" -- only one of them can.


10. Run GSEA, add a phenotypes file, create phenotypes on the fly (works fine), click the Show Phenotypes from all Sources. Should show labels from both the file you added and the one you just created, but now shows only one at a time. (This seems to have broken in the 3/21 build; I'm pretty sure it was working in the 3/20 build).

11. On the GSEA analysis report: indent the phenotype permutation warning (or make it another bullet) so it looks more like part of the "Other" section. Also, remove the message at the bottom: "# of genesets before size filtering: 3 and # of genesets after size filtering: 3" (that info is already in the "Gene set details" section of the report).


12. Leading Edge Viewer looks good! Two things: (1) The bottom two viewers have zoom controlled by CTRL+[ and CTRL+], but I can't get the focus to the right-hand viewer. The CTRL sequence always zooms the left viewer.  (2) The viewers completely replace the old HTML report, which included the details of gene sets used (name, # of members, # of members in signal, tag%, list%, signal strength). That info seems useful and now lost? (The 2nd viewer (the set-to-set comparison) has a nice open spot for a "Display gene set details" button...)

Josh is adding (2) and can help fix (1); sent email.
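The notes do not say how the zoom keys are registered. One common cause of this symptom in Swing is binding the keystroke with WHEN_IN_FOCUSED_WINDOW, so one component handles it for the whole window, or viewers that never take keyboard focus. A sketch of per-viewer bindings using the standard InputMap/ActionMap API; the ZoomViewer class, its zoom field, and the 1.25 zoom step are hypothetical.

```java
import java.awt.event.ActionEvent;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.awt.event.MouseAdapter;
import java.awt.event.MouseEvent;
import javax.swing.AbstractAction;
import javax.swing.JComponent;
import javax.swing.JPanel;
import javax.swing.KeyStroke;

// Hypothetical zoomable viewer; not the leading edge viewer code.
class ZoomViewer extends JPanel {
    double zoom = 1.0;

    ZoomViewer() {
        setFocusable(true);
        // Clicking a viewer gives it keyboard focus, so its own bindings become active.
        addMouseListener(new MouseAdapter() {
            @Override
            public void mousePressed(MouseEvent e) {
                requestFocusInWindow();
            }
        });
        bindZoom(KeyEvent.VK_OPEN_BRACKET, "zoomOut", 1 / 1.25);
        bindZoom(KeyEvent.VK_CLOSE_BRACKET, "zoomIn", 1.25);
    }

    // WHEN_FOCUSED: CTRL+[ and CTRL+] reach only the viewer that has focus.
    private void bindZoom(int keyCode, String actionName, final double factor) {
        KeyStroke stroke = KeyStroke.getKeyStroke(keyCode, InputEvent.CTRL_DOWN_MASK);
        getInputMap(JComponent.WHEN_FOCUSED).put(stroke, actionName);
        getActionMap().put(actionName, new AbstractAction() {
            @Override
            public void actionPerformed(ActionEvent e) {
                zoom *= factor;
                repaint();
            }
        });
    }
}
```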

Feature Additions for build 3/2

1. Leading edge interactive viewer
2. Added a preferences field for path to user home dir.
3. Made native the default space in GSEA. Not sure if this is better?
        I prefer gene_symbol since that's what we recommend. Fixed
4. default collection of gene sets available via ftp

Found/requested 3/17:

1. Add Command button to Leading Edge page.

Can't do this easily because of the way the command feature is set up.
You said you could do this after all.
N/A anymore because of Josh's improved interactive implementation.
Command syntax doc'd

2. Running leading edge from the command line gives me fatal errors:
         719  [FATAL] Could not make dir: C:\Documents and Settings\hkuehn\.xtools_home\databases    at
         719  [FATAL] Could not make dir: C:\Documents and Settings\hkuehn\.xtools_home\chip2chip    at  
         799  [FATAL] Could not make dir: C:\Documents and Settings\hkuehn\.xtools_home\reports_cache_foo    at

You must set the -DGSEA=true flag:

java -Xmx ... -DGSEA=true xtools.....
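Judging from the messages above, without -DGSEA=true the tool falls back to the generic .xtools_home folder. A sketch of that switch, with the folder-name mapping inferred from these notes rather than taken from the source (the class is hypothetical); Boolean.getBoolean reads the system property that -DGSEA=true sets.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical; the mapping below is inferred from the error messages, not the xtools source.
final class HomeFolderFlag {
    static String homeFolderName(boolean gseaMode) {
        return gseaMode ? "gsea_home" : ".xtools_home";
    }

    // GSEA mode only when the JVM was started with -DGSEA=true (case-insensitive).
    static Path homeDir(String userHome) {
        return Paths.get(userHome, homeFolderName(Boolean.getBoolean("GSEA")));
    }

    public static void main(String[] args) {
        System.out.println(homeDir(System.getProperty("user.home")));
    }
}
```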

3. On the MSigDB page, the Find sets that Overlap search gives a page with no Export button; make that page more like the Find sets that contain this gene page.

Josh's reworked implementation will replace this, and the export option should be added to it.

4. On Run GSEA, change parameter name from "Analyze in the feature space" to "Gene/probe identifier format"

Let's talk. Forgot to bring this up in our last chat.
What about using "Collapse dataset" as the name, where values are True (use 'chip' to collapse dataset to gene symbols) and False (blah blah).

5. Remove Downloads>Download Gene Sets (no longer needed now that we have MSigDB page).

Done Doc'd

6. Change the first two Help items to "GSEA web site" and "GSEA documentation". The first points to the home page of the web site and the second points to the doc page of the web site.

Done Doc'd

Bugs found 3/15/2006 include:

1. Generate a report, go to "Other" section, look at parameters. Error, file cannot be found.


2. Started a pretty long analysis and killed it. Killing it took a few minutes rather than a few seconds. If that's expected, perhaps change the message to say it may take a few minutes.

What's a long analysis? (For leading edge clustering, more than a handful of sets, say 20, is likely pointless.)

3. Leading edge report brings up the gct files in a text editor rather than Excel; can't really read them in a text editor.

Opening gct files now works like this: first, check the preferences to see whether Excel (or the other programs) exists at the location indicated there. If so, use it. Otherwise, issue a generic open-file command in Windows that opens the file in whatever editor Windows has registered for that file type. On the Mac, the latter mode is always used. On Unix, I don't know what will happen.
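The lookup order for opening gct files can be sketched like this. The class is hypothetical, and for the generic open it substitutes java.awt.Desktop (available from Java 6), which asks the OS to open the file with its registered application; the notes describe a Windows open command instead. On Unix, Desktop support depends on the desktop environment, which the isDesktopSupported check makes explicit.

```java
import java.awt.Desktop;
import java.io.File;
import java.io.IOException;
import java.util.Locale;

// Hypothetical sketch; not the GSEA source.
final class GctOpener {
    enum Mode { CONFIGURED_PROGRAM, SYSTEM_DEFAULT }

    // Windows: use the program from the preferences if it exists on disk.
    // Everywhere else (and as the Windows fallback): the OS file association.
    static Mode choose(String osName, String configuredProgram) {
        boolean windows = osName.toLowerCase(Locale.ROOT).startsWith("windows");
        if (windows && configuredProgram != null && new File(configuredProgram).isFile()) {
            return Mode.CONFIGURED_PROGRAM;
        }
        return Mode.SYSTEM_DEFAULT;
    }

    static void open(File gct, String configuredProgram) throws IOException {
        if (choose(System.getProperty("os.name"), configuredProgram) == Mode.CONFIGURED_PROGRAM) {
            new ProcessBuilder(configuredProgram, gct.getAbsolutePath()).start();
        } else if (Desktop.isDesktopSupported()
                && Desktop.getDesktop().isSupported(Desktop.Action.OPEN)) {
            Desktop.getDesktop().open(gct);
        } else {
            throw new IOException("No way to open " + gct + " on this platform");
        }
    }
}
```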