Cancer Program Data Resources

Achilles Project

The goal of the Achilles Project is to create a genome-wide catalog of tumor dependencies to identify vulnerabilities associated with genetic or epigenetic alterations.

ALL/AML Classification

The landmark 1999 study by Golub, Slonim et al. was the first demonstration that genomic approaches (in this case, gene expression profiling) could be used to identify new cancer subtypes or assign tumors to known classes. Here, you can download the original proof-of-concept data demonstrating successful classification between acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL) without previous knowledge of these classes.

Cancer Cell Line Encyclopedia (CCLE)

The Cancer Cell Line Encyclopedia (CCLE) project is an effort to conduct a detailed genetic characterization of a large panel of human cancer cell lines. The CCLE provides public access to analysis and visualization of DNA copy number, mRNA expression, mutation data and more for 1000 cancer cell lines.

Cancer Cell Line Factory (CCLF)

The limited availability of appropriate disease models is a major bottleneck for research. If model systems representing the genetic diversity of diseases such as cancer were readily available, experiments to understand the biological consequences of particular gene mutations and to develop new therapeutic strategies could proceed more rapidly. Until recently, despite enormous effort dedicated to the creation of cancer cell line models, success rates have been very low (often zero).

We have launched a pilot Cancer Cell Line Factory project at the Broad Institute, together with hospital partners, to explore how best to overcome current obstacles, produce faithful models at scale and examine the ethical issues associated with enabling rapid and unfettered access. Our project aims to:

  1. Establish robust SOPs for major cancer types
  2. Comprehensively characterize genomes of new models
  3. Enable rapid and unfettered access
  4. Prioritize treatments for each cancer patient

We've made exciting progress and are always looking for new collaborators.

Connectivity Map (CMap)

The Connectivity Map (or CMap) is a catalog of gene-expression data collected from human cells treated with chemical compounds and genetic reagents. Computational methods that reduce the number of genomic measurements required, together with streamlined methodologies, enable the current effort to significantly increase the size of the CMap database and, with it, our potential to connect human diseases with the genes that underlie them and the drugs that treat them.

Differentiation Map (DMAP) Portal

A differentiation map of hematopoiesis, from the paper "Densely Interconnected Transcriptional Circuits Control Cell States in Human Hematopoiesis."

DLBCL Genome Sequencing

To gain insight into the genomic basis of diffuse large B-cell lymphoma (DLBCL), we performed massively parallel whole-exome sequencing of 55 primary tumor samples from patients with DLBCL and matched normal tissue. We identified recurrent mutations in genes that are likely to play a role in the biology of DLBCL.


FireBrowse

FireBrowse is a simple and elegant way to explore cancer data, backed by a powerful computational infrastructure, application programming interface (API), graphical tools and online reports. It sits above the TCGA GDAC Firehose, one of the deepest and most integratively characterized open cancer datasets in the world, with over 80K sample aliquots from 11,000+ cancer patients spanning 38 unique disease cohorts. FireBrowse makes it possible to find any of the thousands of data archives generated by Firehose in just two clicks. Likewise, two clicks are all that's needed to find any of the ~1500 analysis reports created by Firehose in each analysis run. For programmers, a powerful RESTful API is provided, with bindings to the UNIX command line, Python and R. For scientists, we provide graphical tools such as viewGene, to explore expression levels, and iCoMut, to explore the comprehensive analysis profile of each TCGA disease study within a single, interactive figure.

Global Cancer Map

The Global Cancer Map datasets consist of gene expression data for 218 tumor samples, spanning 14 common tumor types, and 90 normal tissue samples, assayed on the Affymetrix Hu6800 and Hu35KsubA chips.

Head and Neck Cancer Sequencing

Head and neck squamous cell carcinoma (HNSCC) is a common, morbid, and frequently lethal malignancy. To uncover its mutational spectrum, we analyzed whole-exome sequencing data from 74 tumor-normal pairs.