September 2009
http://www.genepattern.org/

 1. GenePattern 3.2

GenePattern version 3.2 is now available. New features in this release include:

  • Shared analysis result files: Users can now share analysis result files with all GenePattern users or with a group of users. All jobs are still private by default.

  • New job status page: The job status page now displays complete information about the analysis job, including run status, parameter values, analysis results, and any visualizers included in the job. This new job status page can be displayed at any time for easy access to analysis results and the visualizers used to view them.

  • New results summary page: The results summary page has been reorganized to provide a clear and comprehensive view of your jobs.

  • Flexible authentication and authorization: Authentication and authorization in GenePattern 3.2 can be configured through two new Java interfaces.

  • Desktop Client retired: The features of GenePattern's Java-based application interface have been migrated to the GenePattern web interface. GenePattern 3.2 and future releases will use the web interface.

GenePattern Web site: http://www.genepattern.org/
GenePattern 3.2 public server: http://genepattern.broadinstitute.org/gp/
GenePattern 3.2 download: http://www.genepattern.org/download/
GenePattern 3.2 release notes: http://www.genepattern.org/doc/relnotes/3.2/

We welcome your feedback and encourage you to send questions and comments to gp-help@broadinstitute.org.


 2. New and updated modules

A number of modules have been added to GenePattern since our December 2008 newsletter.

New modules

The following new modules have been added since December 2008:

  • CoxRegression and LogisticRegression are survival analysis modules. Cox proportional hazards modeling is used to assess the association of variable(s) of interest with time-to-event data (e.g., death or tumor recurrence). Logistic regression is used to assess the association of variable(s) of interest with binary clinical data (e.g., treatment response). Both modules are based on the R package for survival analysis.

  • ESPPredictor predicts, from sequence alone, which peptides for any given protein are likely to work well for targeted mass spectrometry assay development. It is based on work published by Fusaro et al. in "Prediction of high-responding peptides for targeted protein assays by mass spectrometry" (Nature Biotechnology 27:190-198, 2009).

  • NearestTemplatePrediction performs class prediction using a predefined list of marker genes for two or more classes. It is based on work published by Hoshida et al. in "Gene Expression in Fixed Tissues and Outcome in Hepatocellular Carcinoma" (N Engl J Med 359:1995-2004, 2008).

  • The FLAME modules define and characterize discrete populations in flow cytometric data using FLow analysis with Automated Multivariate Estimation (FLAME). They are based on work published by Pyne et al. in "Automated high-dimensional flow cytometric data analysis" (PNAS 106:8519-8524, 2009).

    • FLAMEPreviewTransformation applies several transformations to a representative sample from the flow cytometric data and generates a scatter plot for each one.
    • FLAMEPreprocess performs a series of preprocessing operations on flow cytometric data files, including column/channel selection, bi-exponential transformation, and optional live-cell gating.
    • FLAMEMixtureModel clusters each preprocessed sample data file over a range of possible cluster numbers.
    • FLAMEChooseOptimalClusterNumber determines the optimal number of clusters for each sample based on the range of cluster numbers provided to FLAMEMixtureModel.
    • FLAMEMetacluster takes the data samples, which have been optimally clustered into subpopulations, and matches the subpopulations so that a given population can be identified uniformly across all samples.
    • FLAMEContourDataGenerator generates the data that FLAMEViewer needs to draw a 3-D contour plot of the clusters in the sample.
    • FLAMEViewer displays a 3-D scatterplot and, if FLAMEContourDataGenerator was run, a 3-D contour plot of the clusters in the sample.
    • FLAMEContourViewer.Pipeline runs the FLAMEContourDataGenerator and FLAMEViewer modules.

  • Mark Doderer (University of Texas at San Antonio) contributed the following modules, which implement two new class prediction methods for protein sequences. The modules are described in the paper "Species Independent Protein Localization Prediction for Multi-compartmentalized Proteins", presented by Doderer, Yoon, and Kwek at MLMTA 2007 (the 2007 International Conference on Machine Learning: Models, Technologies & Applications).

    • BlastTrainTest and BlastXValidation implement a sequence similarity class prediction method, which uses BLAST to find a known protein that is homologous to the unknown protein and then uses the known label for the prediction. Input dataset features and class labels must be in the ARFF file format.
    • ModEcocTrainTest and ModEcocXValidation implement the Modified Error-Correcting Output Code (ModEcoc) class prediction method, which was developed to address the need for a classifier that can handle multiple labels and make a priori predictions of combinations of classes not seen in the training set. Input dataset features and class labels must be in the ARFF file format.
    • Arff2Gct and Gct2Arff convert between the ARFF file format and GenePattern's GCT file format. An ARFF (Attribute-Relation File Format) file is a text file that describes data using a set of attributes. It was developed for the Weka machine learning software by the Machine Learning Project at the Department of Computer Science at the University of Waikato.
    • PredictionResultsViewer (v5) updates the PredictionResultsViewer module to display the prediction results generated by the new protein class prediction modules.
    • CombineOdf combines the .odf output files from two class prediction modules and creates a multi-label .odf file such as the one produced by the ModEcoc and BLAST class prediction modules.
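
To make the ARFF and GCT formats mentioned above more concrete, here is a minimal Python sketch of a conversion from a small, dense ARFF file to GCT text. It is not the Arff2Gct module's implementation. The toy data, the function names, the generated sample names, and the choice to turn numeric attributes into GCT rows while dropping the class attribute are all assumptions made for illustration.

```python
# Hypothetical toy input: each ARFF data row is one sample (instance).
ARFF_TEXT = """\
% Toy ARFF file for illustration only.
@relation toy_expression
@attribute gene_a numeric
@attribute gene_b numeric
@attribute class {tumor,normal}
@data
1.5,0.2,tumor
0.3,2.1,normal
"""


def parse_arff(text):
    """Parse a minimal dense ARFF file into (attribute names, data rows).

    Handles only unquoted attribute names and comma-separated values.
    """
    names, rows, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        lower = line.lower()
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@attribute"):
            names.append(line.split()[1])
        elif lower.startswith("@data"):
            in_data = True
    return names, rows


def arff_to_gct(text, class_attr="class"):
    """Return GCT text with one row per feature and one column per sample.

    Assumption: the class attribute is simply dropped, and samples are
    named sample_1, sample_2, ... because ARFF rows carry no identifiers.
    """
    names, rows = parse_arff(text)
    features = [name for name in names if name != class_attr]
    samples = [f"sample_{i + 1}" for i in range(len(rows))]
    out = [
        "#1.2",                                   # GCT version line
        f"{len(features)}\t{len(samples)}",       # row count, column count
        "\t".join(["Name", "Description"] + samples),
    ]
    for feature in features:
        col = names.index(feature)
        out.append("\t".join([feature, feature] + [row[col] for row in rows]))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(arff_to_gct(ARFF_TEXT), end="")
```

The sketch transposes the data because ARFF stores one instance per line, while GCT stores one feature per line with samples as columns.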

Updated modules

In addition, the following modules have been updated since December 2008:

  • GSEA (v5) includes a fourfold speed improvement and minor corrections to file handling.

  • HierarchicalClustering (v5) now clusters only columns by default. Row clustering is still available.
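
To illustrate the difference between column and row clustering noted above, the following Python sketch clusters a tiny expression matrix both ways. It is an illustration only, not the HierarchicalClustering module's code. The Pearson-correlation distance, the average linkage, the toy matrix, and all function names are assumptions chosen for the example.

```python
from math import sqrt


def pearson_distance(x, y):
    """Return 1 - Pearson correlation of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def cluster(vectors, labels):
    """Average-linkage agglomerative clustering; returns merges in order."""
    clusters = {label: [i] for i, label in enumerate(labels)}
    merges = []
    while len(clusters) > 1:
        names = list(clusters)
        best = None
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                pairs = [(p, q) for p in clusters[a] for q in clusters[b]]
                dist = sum(
                    pearson_distance(vectors[p], vectors[q]) for p, q in pairs
                ) / len(pairs)
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        # Replace the two closest clusters with their union.
        clusters[f"({a},{b})"] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b))
    return merges


def transpose(matrix):
    """Swap rows and columns so that columns can be clustered as vectors."""
    return [list(col) for col in zip(*matrix)]


# Toy data: rows are genes, columns are samples.
GENES = ["g1", "g2", "g3"]
SAMPLES = ["s1", "s2", "s3", "s4"]
MATRIX = [
    [1.0, 1.1, 5.0, 5.2],
    [2.0, 2.1, 1.0, 0.9],
    [3.0, 3.3, 2.0, 2.1],
]

column_merges = cluster(transpose(MATRIX), SAMPLES)  # group similar samples
row_merges = cluster(MATRIX, GENES)                  # group similar genes

if __name__ == "__main__":
    print("column merges:", column_merges)
    print("row merges:", row_merges)
```

Column clustering groups samples with similar expression profiles, while row clustering groups genes that vary together across samples; in this sketch the only difference between the two is whether the matrix is transposed first.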

Access the modules

The above new and updated modules are available on the GenePattern public server: http://genepattern.broadinstitute.org/gp/.

For comprehensive documentation on all modules in the repository, see the Modules page of the GenePattern web site.


 3. We are hiring

The GenePattern team is looking to hire an experienced Java software developer. For more information, see the job description at the Broad Institute Career Center.


 4. Contact us

We always welcome your feedback and encourage you to send questions and comments to gp-help@broadinstitute.org.

User survey

We'd like to know more about your day-to-day experiences with GenePattern. Our user survey is a brief online form that lets you give us feedback about the software and other aspects of using GenePattern. Your responses are greatly appreciated - they help us to understand how GenePattern is being used and how to make it a more valuable tool.

Early adopters

If you'd like early access to new GenePattern releases to help us test new GenePattern features, join the early adopters mailing list.

