Gsea enhancements

From GeneSetEnrichmentAnalysisWiki
Revision as of 15:21, 27 March 2006 by Hkuehn (talk | contribs)

Found/requested 3/27 (with installed Gsea2):

1. From the Load Data page: load a gene set matrix, use Extract Gene Sets to create gene sets in memory, use Create Ranked List to create a ranked list from one of those gene sets. It tells me that it created the ranked list and gives me the following info messages, but I can't find the ranked list: it's not in the default output folder, the object cache list, or the drop-downs on the PreRanked GSEA page.

1355 [INFO ] Starting: => Extract GeneSets from the GeneMatrix
1385 [INFO ] Successfully created a GeneSet from the Dataset s2_gene_set_database.diabetes.gmt into: 323 gene sets
1425 [INFO ] Null widget - no window opened
9346 [INFO ] Starting: => Create a RankedList
9366 [INFO ] Successfully created a RankedList from the GeneSet 41BBPATHWAY into: 34 members
9366 [INFO ] Null widget - no window opened

2. On the Leading Edge Report page, I built a report, realized I had built the wrong one, then selected and built a different one. A new tab appears, but it doesn't get the focus (so I didn't notice it at first). Also, the tabs that get created should have close (X) icons.

3. The output folder creates a subfolder for each day (mar21, mar22, etc). Shouldn't the folder name include the year? Or, do we expect people to be deleting the reports on a regular basis?

Found/requested 3/24 (with installed Gsea2):

1. Names generated by GSEA were too long for Windows. My personal profile failed to load and I was logged in with a temporary profile. I had to zip (and delete) one of my GSEA report folders before I could log into Windows with my own profile. I put the zipped reports folder in dropbox/foraravind.
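A minimal sketch of how overlong report paths could be spotted before they cause trouble. It assumes the problem was the classic Windows 260-character path limit (MAX_PATH), which these notes do not confirm; the function name and the limit parameter are hypothetical and not part of GSEA.

```python
import os

# Classic Windows path limit (MAX_PATH). Assumed here to be the cause of the
# problem above; the notes only say the names were "too long for Windows".
WINDOWS_MAX_PATH = 260


def overlong_paths(root, limit=WINDOWS_MAX_PATH):
    """Return the absolute paths under root whose length reaches the limit."""
    too_long = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.abspath(os.path.join(dirpath, name))
            if len(full) >= limit:
                too_long.append(full)
    return sorted(too_long)
```

Running this over the output folder would list the report files and folders worth renaming or archiving.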


Found/requested 3/22 (with installed Gsea2):

1. I expected the presence or absence of a chip to determine whether gene symbols/titles show up in the gene set detail report. Ran GSEA with p53, native, no chip, and got no gene symbols, as expected. Added chip and got gene symbols, as expected. But then I removed the chip and still got the gene symbols (as if they were stored in memory or something). Parameters on the report confirm that I did not have chip specified. All three reports (nochip, chip, and chip_nochip) are in dropbox/foraravind.

Indeed, if you set chip once for a dataset, the program remembers this association during the current session.

2. Created my own gene set file, one gene set with a text description and another with a URL. Ran GSEA. The Enrichment Result report tries to link my gene sets to MSigDB rather than using my descriptions. (That report is also in dropbox/foraravind.)

Fixed. Note: it was clumsy to place description text in the table, as it could be very long. So there are two modes: (1) na in the description field of the gmx/gmt file auto-links to MSigDB; (2) a valid URL, i.e. text that starts with http, links to the custom site specified.
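The two description modes can be sketched as a small lookup. This is an illustration only, not GSEA's implementation: the function name is hypothetical, the MSigDB link template is a placeholder (the real link format is not given in these notes), and returning no link for any other description text is an assumption.

```python
# Placeholder only; the real MSigDB link format is not stated in these notes.
MSIGDB_LINK_TEMPLATE = "MSIGDB_BASE_URL/{name}"


def gene_set_link(name, description):
    """Pick the report link for a gene set from its gmx/gmt description field."""
    desc = description.strip()
    if desc.lower().startswith("http"):
        # Mode 2: a URL in the description field links to that custom site.
        return desc
    if desc.lower() == "na":
        # Mode 1: "na" auto-links to MSigDB.
        return MSIGDB_LINK_TEMPLATE.format(name=name)
    # Assumption: any other free text produces no link.
    return None
```

For example, a gene set whose description is `http://example.org/my_set` would link there, while one with `na` would get the MSigDB link.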

3. On the Leading Edge report, when I click select reports from application cache, I expected to get today's reports (the ones in the Object Cache list on the Load Data page). Instead, I get a list of all reports, neatly arranged by date, excluding today's. When I run reports, they appear in the Object Cache as I expect.

This is a 'feature'. The object cache lists the program's memory, while the leading edge cache lists the file system, i.e. all analyses ever done (which could be too large to load into memory).

4. Run GSEA using the new Gene Matrix from web site tab to select the C2 gene sets. Analysis fails due to duplicate gene sets.
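A minimal sketch of how duplicates could be detected up front. It assumes "duplicate gene sets" means repeated gene set names, and that the collection is in GMT format (one tab-separated line per set: name, description, then genes); the function name and sample gene names are hypothetical.

```python
from collections import Counter


def duplicate_gene_set_names(gmt_lines):
    """Return the sorted gene set names that occur more than once."""
    names = []
    for line in gmt_lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # The first tab-separated field of a GMT line is the gene set name.
        names.append(line.split("\t")[0])
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)
```

A check like this could run when the collection is loaded, so that the user gets a list of the offending names instead of a failed analysis.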

5. On the new error handling, when I click the red error in the processes box, it should automatically open the console viewer and show me the error. It took me a little while to realize that I had to click "Click for details..." to open the console and then click Error to see the error.

Fixed

6. Analysis History page still isn't showing up for me.

Fixed

7. On the MSigDB page, when I click export, I'm expecting to export what I have displayed in the table on the MSigDB page. When I selected All Items, I thought that meant all items displayed in the table; it seems to mean all items in the MSigDB? (Thanks for going back to one button.)

Fixed

8. RunGSEA form, the normalization parameter, you were going to delete the varmean option (or at least change the current name VarMeanPosNegSeparate back to varmean).

Fixed

9. On the Algorithms page of the Preferences window, delete the phrase "(they can also be changed in params)" -- only one of them can.

Fixed

10. Run GSEA, add a phenotypes file, create phenotypes on the fly (works fine), click the Show Phenotypes from all Sources. Should show labels from both the file you added and the one you just created, but now shows only one at a time. (This seems to have broken in the 3/21 build; I'm pretty sure it was working in the 3/20 build).


11. On the GSEA analysis report:  Indent the phenotype permutation warning (or make it another bullet) so it looks more like part of the "Other" section. Also, remove the message at the  bottom: "# of genesets before size filtering: 3 and # of genesets after size filtering: 3" (that info is already in the "Gene set details" section of the report).

Fixed

12. Leading Edge Viewer looks good! Two things: (1) The bottom two viewers have zoom controlled by CTRL+[ and CTRL+], but I can't get the focus to the right-hand viewer. The CTRL sequence always zooms the left viewer.  (2) The viewers completely replace the old HTML report, which included the details of gene sets used (name, # of members, # of members in signal, tag%, list%, signal strength). That info seems useful and now lost? (The 2nd viewer (the set-to-set comparison) has a nice open spot for a "Display gene set details" button...)

Josh is adding 2) and can help fix 1).



Feature Additions for build 3/21

1. Leading edge interactive viewer
2. Added a preferences field for path to user home dir.
3. Made native the default space in GSEA. Not sure if this is better?
        I prefer gene_symbol since that's what we recommend.
4. default collection of gene sets available via ftp

Found/requested 3/17:

1. Add Command button to Leading Edge page.

Can't do this easily because of the way the command mechanism is set up.
You said you could do this after all.

2. Leading edge from the command line gives me fatal errors:
         719  [FATAL] Could not make dir: C:\Documents and Settings\hkuehn\.xtools_home\databases
              at edu.mit.broad.xbench.core.api.VdbManagerImpl._mkdir(VdbManagerImpl.java:89)
         719  [FATAL] Could not make dir: C:\Documents and Settings\hkuehn\.xtools_home\chip2chip
              at edu.mit.broad.xbench.core.api.VdbManagerImpl._mkdir(VdbManagerImpl.java:89)
         799  [FATAL] Could not make dir: C:\Documents and Settings\hkuehn\.xtools_home\reports_cache_foo
              at edu.mit.broad.xbench.core.api.VdbManagerImpl._mkdir(VdbManagerImpl.java:89)

You must set the -DGSEA=true flag, so:

java -Xmx ... -DGSEA=true xtools.....



3. MSigDB page: the Find sets that Overlap search gives a page with no Export button; make that page more like the
      Find sets that contain this gene page.

Josh's reworked implementation will replace this, and the export option should be added to it.

4. On Run GSEA, change parameter name from "Analyze in the feature space" to "Gene/probe identifier format"

Let's talk. Forgot to bring this up in our last chat.
What about using "Collapse dataset" as the name, where values are True (use 'chip' to collapse dataset to gene symbols) and False (blah blah).

5. Remove Downloads>Download Gene Sets (no longer needed now that we have MSigDB page).

Done

6. Change first two Help items to "GSEA web site" and "GSEA documentation". First points to home page of web site and
           second points to the doc page of the web site.

Done

Bugs found 3/15/2006 include:

1. Generate a report, go to "Other" section, look at parameters. Error, file cannot be found.

Fixed.

2. Started a pretty long analysis, killed it. It took a few minutes versus a few seconds. If that's expected, perhaps change message to say may take a few minutes.

What's a long analysis? (For leading edge clustering, more than a handful of sets, say 20, is likely pointless.)

3. Leading edge report brings up the gct files in a text editor rather than Excel; can't really read them in a text editor.

Opening gct files now works like this: first, check the preferences to see whether Excel (or one of the other programs) exists at the location indicated there. If so, use it. Otherwise, issue a generic open-file command in Windows, which opens the file in whatever editor Windows has registered for that file type. On the Mac, the latter mode is always used. On Unix, I don't know what will happen.
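The fallback order described above can be sketched as a function that only builds the command to run. This is an illustration, not GSEA's Java implementation: the function name and parameters are hypothetical, and the Unix branch (`xdg-open`) is purely an assumption, since the note leaves Unix behaviour open.

```python
import os
import sys


def choose_open_command(path, preferred_app=None, platform=sys.platform,
                        exists=os.path.exists):
    """Build the command used to open a gct file, following the order above."""
    if platform == "darwin":
        # On the Mac the generic open mode is always used.
        return ["open", path]
    if preferred_app and exists(preferred_app):
        # The program named in the preferences exists, so use it.
        return [preferred_app, path]
    if platform.startswith("win"):
        # Generic open: Windows picks the program registered for the file type.
        return ["cmd", "/c", "start", "", path]
    # Assumption: on Unix desktops, delegate to xdg-open.
    return ["xdg-open", path]
```

The returned list could then be passed to `subprocess.run`. Keeping the decision separate from the launch makes the fallback order easy to check without opening any program.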