December 2008

 GenePattern Workshops in February

Upcoming Dates

One-day workshops will be held at the Broad Institute in Cambridge, Massachusetts on:

  • February 4, Wednesday, 9am-5pm
  • February 11, Wednesday, 9am-5pm

Register now. Registration is free for academic and nonprofit organizations and $600 for for-profit organizations. If these dates are inconvenient, use the registration form to request that we notify you of future workshops.

About the Workshop

Our popular GenePattern workshop introduces GenePattern and the methods behind the GenePattern modules for Gene Expression Analysis, including:

  • Running analyses using the GenePattern web interface
  • Differential gene expression analysis
  • Classification/prediction methods
  • Clustering
  • Using pipelines to chain modules together to create and share methodologies

Request for Workshop Topics

What would you like to hear about in a future workshop: proteomics, SNP analysis, how to write modules, or an in-depth discussion of computational methods? Tell us what interests you.

We'd like to hear from you regardless of whether you've attended a workshop.

 New and Updated Modules

New Modules

A number of modules have been added to GenePattern since our February newsletter:

  • CaArray2.1.0Importer imports data files from a caArray 2.1.0 repository into GenePattern. caArray is a web-accessible array data management system developed by the National Cancer Institute Center for Bioinformatics (NCICB).

  • ComBat runs the ComBat (Combining Batches) R script on a microarray dataset. The script uses an Empirical Bayes method to adjust for potential batch effects in the dataset. It is based on work published by Johnson, WE, Rabinovic, A, and Li, C in Adjusting batch effects in microarray expression data using Empirical Bayes methods (Biostatistics 8(1):118-127, 2007).

  • GctToPcl and PclToGct convert between Stanford's PCL file format and GenePattern's GCT file format.

  • GISTIC identifies regions of the genome that are significantly amplified or deleted across a set of samples. It implements the Genomic Identification of Significant Targets in Cancer (GISTIC) method published by Beroukhim R, Getz G, et al. in Assessing the significance of chromosomal aberrations in cancer: Methodology and application to glioma (Proc Natl Acad Sci, 104:20007-20012, 2007). A GISTICPreprocess module prepares SNP files for GISTIC, and a Beroukhim.Getz.2007.PNAS.Glioma.GISTIC pipeline runs GISTIC with published data. The modules and pipeline are available only on the GenePattern public server.

  • IGVPreprocessor preprocesses data for display in the Integrative Genomics Viewer (IGV). Preprocessing computes the data that will be displayed at selected zoom levels and stores the results in a smaller binary .h5 file. Most data files can be loaded into IGV without preprocessing. Available only on the GenePattern public server.

  • IlluminaDASLPipeline creates a GenePattern GCT file from the raw data generated by a DNA-mediated Annealing, Selection, Extension and Ligation (DASL) assay (Illumina) and scanned using a BeadArray Reader (Illumina). The pipeline comprises the IlluminaScanExtractor, IlluminaNormalizer, and IlluminaConcatenator modules. It implements the method published by Hoshida Y, et al. in Gene Expression in Fixed Tissues and Outcome in Hepatocellular Carcinoma (N Engl J Med., 2008 Oct 15). Available only on the GenePattern public server.

  • SubMap and SubMapBrowser identify the correspondence or commonality of subtypes found in multiple, independent data sets generated on various platforms. They implement the Subclass Mapping method published by Hoshida Y, et al. in Subclass Mapping: Identifying Common Subtypes in Independent Disease Data sets (PLoS ONE 2(11): e1195, 2007).

  • SurvivalCurve generates the survival curve for censored survival data. It is based on the survival 2.20 R package.

  • SurvivalDifference tests whether there is a difference between two or more survival curves based on sample classes defined by genomic data. It is based on the survival 2.20 R package.
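
To illustrate what a survival curve for censored data looks like, here is a minimal Python sketch of the Kaplan-Meier estimator, a standard way to estimate such a curve. It is an illustration only: SurvivalCurve itself relies on the R survival package, and the function name and sample data below are hypothetical and not part of GenePattern.

```python
def kaplan_meier(times, events):
    """Estimate a survival curve from censored data (illustrative sketch).

    times  -- follow-up time for each sample
    events -- 1 if the event (e.g. death) was observed at that time,
              0 if the sample was censored at that time
    Returns a list of (time, survival_probability) pairs, one per
    distinct time at which at least one event was observed.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")

    data = sorted(zip(times, events))
    at_risk = len(data)      # samples still under observation
    survival = 1.0
    curve = []

    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0          # events plus censorings at time t
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            leaving += 1
            i += 1
        if deaths > 0:
            # Multiply by the fraction of at-risk samples surviving time t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Hypothetical data: five samples, two of them censored.
    for t, s in kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0]):
        print(t, round(s, 4))
```

Censored samples never lower the curve directly. They only shrink the number of samples at risk at later time points.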

Updated Modules

In addition, the following modules have been updated:

  • ComparativeMarkerSelection (v5) provides a new test statistic for calculating the difference in gene expression between two classes. The paired t-test option computes a paired, two-sample t-statistic.

  • CopyNumberDivideByNormals (v2) now accepts sample information files that do not include a Gender column.

  • ExpressionFileCreator (v8) now supports the latest Affymetrix CEL file formats (version 3, version 4, and Command Console version 1). As of v6, ExpressionFileCreator also supports custom CDF files.

  • HierarchicalClustering (v4) corrects a bug that we discovered in the original Cluster software. In versions 3 and earlier of HierarchicalClustering, selecting a centering option (row or column) always applied the "center by mean" choice, regardless of whether you chose to center by mean or median. In version 4, the centering options work correctly.

    We have contacted the authors of the original Cluster software. They have corrected this problem in version 1.40 of their software.

  • ImputeMissingValuesKNN (v13) is the new name of the ImputeMissingValues.KNN module.

  • LOHPaired (v3) now bases pairings on array names rather than sample names (Array column of the sample information file rather than the Sample column).

  • PCAViewer (v5) adds the ability to color-code groups of features (experiments or genes) that have been projected onto a 2D or 3D principal component plot.
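
For readers unfamiliar with the paired t-statistic mentioned under ComparativeMarkerSelection, here is a minimal Python sketch for a single gene. It assumes that samples are paired by position across the two classes. The function name and sample values are hypothetical, and this is not the module's own code.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(class_a, class_b):
    """Paired t-statistic for one gene measured in two paired classes.

    class_a[i] and class_b[i] are assumed to come from the same pair
    (for example, tumor and normal tissue from the same patient).
    """
    if len(class_a) != len(class_b):
        raise ValueError("paired classes must have the same number of samples")
    if len(class_a) < 2:
        raise ValueError("at least two pairs are required")

    # Work on the within-pair differences rather than the raw values.
    diffs = [a - b for a, b in zip(class_a, class_b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")

    # t = mean difference divided by its standard error.
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


if __name__ == "__main__":
    # Hypothetical expression values for one gene in four sample pairs.
    print(paired_t_statistic([5.0, 7.0, 9.0, 6.0], [4.0, 5.0, 6.0, 6.0]))
```

Because the statistic is computed on within-pair differences, variation between pairs that affects both members equally cancels out.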

Access the Modules

All new and updated modules are available on the GenePattern 3.1 public server. To install new and updated modules on another GenePattern server, open the GenePattern Web Client and click Modules & Pipelines>Install from Repository. For comprehensive documentation on the modules in the repository, see our module page.

 Let Us Know How You're Using GenePattern

If you have a GenePattern story to share, we'd love to hear about it. If you're using GenePattern for your research, to share analysis methods, or to publish analysis results, please let us know: email the GenePattern team.

User Survey

We'd like to know more about your day-to-day experiences with GenePattern. Our user survey is a brief online form that lets you give us feedback about the software and other aspects of using GenePattern. Your responses are greatly appreciated - they help us to understand how GenePattern is being used and how to make it a more valuable tool.

Early Adopters

If you'd like early access to new GenePattern releases to help us test new GenePattern features, join the early adopters mailing list.

