Extracts absolute copy numbers per cancer cell from a mixed DNA population. Use this module for the per-sample processing step in the workflow.
Author: Scott Carter, Matthew Meyerson, Gad Getz
Algorithm Version: ABSOLUTE 1.0.6
ABSOLUTE provides various models of tumor cell purity and ploidy for subsequent manual solution selection. It infers multiple models of purity, malignant cell ploidy and absolute somatic copy numbers from copy-ratio data. It determines possible models for absolute copy numbers per cancer cell from a mixed DNA population and gives copy numbers for genomic segments and, if mutation data are provided, for mutated alleles. Using homologue-specific copy ratio (HSCR) data reduces the ambiguity of copy profiles compared with using total copy-ratio data, e.g. from comparative genomic hybridization (CGH) or low-pass sequencing. Results from multiple ABSOLUTE runs are compiled by GenePattern's ABSOLUTE.summarize module into a format that facilitates manual solution selection. Manual review is necessary because, for a given tumor, the highest-scoring model is not always the best solution. Manually selected solutions are then provided to ABSOLUTE.review for finalized results.
Elucidation of the sequence of the multiple genomic events that give rise to tumorigenesis is an ongoing area of research. Genomic events include functional mutations, genomic rearrangements including translocations and chromothripsis, gene conversion or loss of heterozygosity (LOH), and somatic copy number alterations (SCNAs) that range from regional and chromosomal amplifications and deletions to whole genome duplications (Burrell et al.). SCNAs can lead to gene dosage changes impacting phenotype; SCNAs and copy neutral LOH events at heterozygous or mutant loci can lead to unequal dose contributions of one allele over the other.
Current models calculate somatic alterations in units of genomes or DNA mass and are interpreted in the context of a tumor's purity and overall ploidy. However, to compare across samples, copy numbers should be measured in copies per cancer cell. Absolute copy numbers could be inferred by normalizing relative data on cytological measurements of DNA mass per cell or on single-cell sequencing data. Alternatively, ABSOLUTE can be used to mathematically model solutions of tumor cell purity and ploidy.
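The link between relative copy ratios and copies per cancer cell can be illustrated with a short sketch. It assumes the usual mixture model in which normal cells contribute two copies of every locus and `ploidy` is the average copy number across the cancer genome. The function names are illustrative and are not part of the ABSOLUTE package; the authoritative equations are in the Carter et al. publication.

```python
def expected_copy_ratio(q, purity, ploidy):
    """Expected relative copy ratio for a segment with q copies per cancer cell.

    Assumption: normal cells are diploid, and `ploidy` is the average
    copy number over the cancer genome.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    # Average DNA copies per cell across the whole mixed sample.
    mean_copies = purity * ploidy + 2.0 * (1.0 - purity)
    # Copies of this particular segment per cell in the mixed sample.
    segment_copies = purity * q + 2.0 * (1.0 - purity)
    return segment_copies / mean_copies


def absolute_copy_number(ratio, purity, ploidy):
    """Invert the model: copies per cancer cell implied by a copy ratio."""
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    mean_copies = purity * ploidy + 2.0 * (1.0 - purity)
    return (ratio * mean_copies - 2.0 * (1.0 - purity)) / purity
```

Under these assumptions, a four-copy segment in a sample with purity 0.5 and ploidy 2 gives a copy ratio of 1.5, and inverting that ratio with the same purity and ploidy recovers four copies. The same ratio is consistent with other purity and ploidy pairs, which is why several candidate solutions are reported.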
Inferring absolute copy numbers and ranking solutions depends on three factors: (1) sample heterogeneity from copy ratios and mutation data, (2) karyotype models from a reference panel built into ABSOLUTE algorithm v1.0.6, and (3) allelic fraction from mutation data. Providing mutation data is optional but recommended.
(1) Sample heterogeneity. Samples are heterogeneous at two tiers. (i) Tumor purity is the fraction of cells in the sample that are tumor cells; normal cells, e.g. from normal tissue and blood, nearly always contaminate samples. Normal cells are diploid (2N) and are further identified by normal genotype. (ii) Tumor cell heterogeneity, if any, arises from polygenomic populations, either segregated or intermixed, due to ongoing subclonal evolution. Each tumor population is grouped by ploidy, which is defined in units of normal haploid genomes for genomic segments. Segments are defined beforehand as regions of equal copy ratio.
One method the authors used to validate purity estimates compared calculated and histological purity estimates with methylation signatures characteristic of leukocytes, since blood is a common sample contaminant.
(2) Karyotype models. Copy-ratio data provide a number of putative integer value solutions of ploidy from which purity is then inferred. In the first two charts above, solutions are in different colors (circles and bars). To better rank these solutions, ABSOLUTE refers to external data in the form of karyotype models. These mixture models of recurrent cancer karyotypes were bootstrapped from thousands of pre-TCGA tumor samples matched to cytological data (Carter et al., 2012). Karyotype models do not affect the calculation of individual solutions, only their ranking. Likelihoods from the SCNAs, SSNVs, and pan-cancer karyotype models are combined to produce rankings.
For increased sensitivity in ambiguous cases, when given a primary disease parameter, ABSOLUTE incorporates karyotype models specific to the tumor type in ranking solutions. The impact of this is seen, for example, in differentiating ambiguous solutions, one of which implies a genome doubling event. The frequency of genome doublings varies across tumor types and reflects disease- and tissue-specific biology. Genome doublings are rare in hematopoietic neoplasms, e.g. ALL and CLL, and have a higher incidence in other types of cancer, such as oesophageal adenocarcinoma (Barrett et al. 1999).
(3) Allelic fraction. ABSOLUTE uses the optional, but recommended, mutation data in two ways. (i) ABSOLUTE infers the purity of a sample from copy number data in conjunction with mutation data. (ii) ABSOLUTE estimates cellular multiplicity, that is, average allelic copies per cancer cell, to potentially reveal subclonal populations as diagrammed in the fourth chart. Putative solutions incorporating mutation data aid in the manual selection of a best solution. What is key for ABSOLUTE is that the mutation data represent somatic events.
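As a rough illustration of how read counts relate to multiplicity, the sketch below rescales an observed allelic fraction by purity and local copy number. It assumes diploid normal cells and a known absolute copy number `q` at the mutated locus. The function names are hypothetical, and the calculation is a point estimate only; ABSOLUTE itself uses a probabilistic model described in the Carter et al. publication.

```python
def allelic_fraction(alt_count, ref_count):
    """Fraction of reads at a site that carry the alternate allele."""
    total = alt_count + ref_count
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_count / total


def mutation_multiplicity(alt_count, ref_count, purity, q):
    """Point estimate of mutated copies per cancer cell.

    Assumptions: normal cells are diploid and carry no mutated copies,
    and q is the absolute copy number of the locus in cancer cells.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    af = allelic_fraction(alt_count, ref_count)
    # Total copies of the locus per cell in the mixed sample.
    locus_copies = purity * q + 2.0 * (1.0 - purity)
    return af * locus_copies / purity
```

With purity 0.5 and two copies at the locus, 25 alternate reads out of 100 give a multiplicity of about 1, consistent with a clonal mutation on one copy. Ten alternate reads out of 100 give about 0.4, a value well below 1 that would be consistent with a subclonal mutation.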
Given the likely divergent instigations of different types of genomic events in cancer, SCNAs alone provide limited resolution in inferring tumor heterogeneity. Sequence mutation information provides ABSOLUTE an alternative point of reference, that is, more incremental information in tumor progression, that then allows a more comprehensive modeling of tumor heterogeneity.
The module's default parameters reflect the original analysis aims of balancing over-fitting subclonal copy alterations to derive more complex karyotypes against the applicability of a simpler solution in finding tumor samples with high purity. For example, default parameters discard solutions with greater than 5% subclonal fractions and thus skew presented solutions to those of increased ploidy. Change default parameters for samples expected to have a higher proportion of heterogeneous nuclei, especially those for which mutation data are also provided.
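The effect of these thresholds can be sketched as a simple filter over candidate solutions. The `CandidateSolution` class and its field names are hypothetical and do not reflect the internal data structures of ABSOLUTE; the default values are the ones documented for the min ploidy, max ploidy, max non clonal, and max neg genome parameters.

```python
from dataclasses import dataclass


@dataclass
class CandidateSolution:
    """Hypothetical container for one purity/ploidy model."""
    purity: float
    ploidy: float
    frac_non_clonal: float  # genome fraction modeled as subclonal
    frac_neg_genome: float  # genome fraction modeled as below zero


def passes_filters(sol, min_ploidy=0.95, max_ploidy=10.0,
                   max_non_clonal=0.05, max_neg_genome=0.005):
    """Return True if the solution survives all four thresholds."""
    return (min_ploidy <= sol.ploidy <= max_ploidy
            and sol.frac_non_clonal <= max_non_clonal
            and sol.frac_neg_genome <= max_neg_genome)
```

Under the defaults, a candidate with 12% of the genome modeled as subclonal would be discarded. Raising `max_non_clonal` to 1 retains it, which illustrates why relaxing the defaults matters for samples expected to be heterogeneous.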
Equations used in the algorithm are in the Carter et al. publication.
ABSOLUTE extracts the absolute copy number of local DNA segments per cancer cell from the mixed DNA population in three steps:
ABSOLUTE expects copy ratios very close to 1.0 and will fail if ratios are less than 0.75 or greater than 1.25. ABSOLUTE analysis can also fail when the segment count exceeds the max.as.seg.count threshold. An excessive number of segments is usually associated with noisy or poor-quality data.
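A quick check before submitting a job can catch both failure modes. The sketch below makes two assumptions that the text above does not confirm: that copy ratios are on a linear scale, and that the 0.75 to 1.25 bounds apply to the length-weighted mean copy ratio of the sample. The function and its input layout are illustrative and are not part of ABSOLUTE.

```python
def preflight(segments, max_seg_count=1500, lo=0.75, hi=1.25):
    """Return a list of likely problems for one sample.

    segments: list of (start, end, copy_ratio) tuples, linear-scale ratios.
    Assumption: the 0.75-1.25 bounds are checked against the
    length-weighted mean copy ratio.
    """
    problems = []
    if len(segments) > max_seg_count:
        problems.append(
            f"{len(segments)} segments exceeds the limit of {max_seg_count}")
    total_len = sum(end - start for start, end, _ in segments)
    if total_len <= 0:
        problems.append("segments have no total length")
        return problems
    mean_ratio = sum((end - start) * r for start, end, r in segments) / total_len
    if not lo <= mean_ratio <= hi:
        problems.append(
            f"mean copy ratio {mean_ratio:.3f} is outside [{lo}, {hi}]")
    return problems
```

An empty list means neither check was triggered. A non-empty list suggests re-examining normalization or segmentation before running the module.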
Carter SL, Cibulskis K, Helman E, McKenna A, Shen H, Zack T, Laird PW, Onofrio RC, Winckler W, Weir BA, Beroukhim R, Pellman D, Levine DA, Lander ES, Meyerson M, Getz G. Absolute quantification of somatic DNA alterations in human cancer. Nat Biotechnol. 2012;30(5):413-21.
|seg dat file *||A HAPSEG output file (<plate.name>_<array.name>.segdat.RData) or other segmented copy number data file. If you supply a tab-delimited segmentation file, see the Input Files section for file details.|
|output file name base *||If specified, provides a base filename for all output files. The default value is the sample name parameter.|
|sigma p *||Provisional value of excess sample level variance used for mode search. Default: 0|
|max sigma h *||Maximum value of excess sample level variance. For more details, see equation 6 in the ABSOLUTE paper. Default: 0.015|
|min ploidy *||Specifies the minimum ploidy value, N, for the algorithm to consider, and models implying lower ploidy values will be discarded. Default: 0.95N|
|max ploidy *||Specifies maximum ploidy value, N, to consider, and models implying greater ploidy values will be discarded. Default: 10N|
|primary disease *||Primary disease of the sample for specific tumor karyotype matching. Enter 'NA' to use the pan-cancer karyotype reference. This parameter affects the ranking of solutions, not the solutions themselves. If the input does not match an entry in the following list, ABSOLUTE defaults to the pan-cancer reference: Acute myelogenous leukemia, Bladder Cancer, BLCA, Brain Cancer, BRCA, Breast, Breast Cancer, Carcinoid, ccRCC, Cervical Cancer, CESC, Chronic Lymphocytic Leukemia, Chronic lymphocytic leukemia, CLL, COAD, Colon Cancer, Colorectal, Dedifferentiated Liposarcoma, Endometrial Cancer, Esophageal adenocarcinoma, Esophageal Cancer, Esophageal squamous, Ewing Sarcoma, Gastric, Gastric Cancer, GBM, GIST, Glioma, Head and Neck Cancer, Hepatocellular Carcinoma, Kidney Cancer, Kidney cancer, KIRC, LAML, Leiomyosarcoma, Liver Cancer, LUAD, Lung, Lung adenocarcinoma, Lung adenosquamous, Lung Cancer, Lung SCLC, Lung squamous, LUSC, Lymphoma, Medulloblastoma, Melanoma, Mesothelioma, MFH, Mulitple Myeloma, Multiple Myeloma, Myelodysplasia, Myeloproliferative Disorder, Myxoid Liposarcoma, Neuroblastoma, NSCLC, Osteosarcoma, OV, Ovarian, Ovarian Cancer, Pancreatic Cancer, Pediatric Acute lymphoblastic leukemia, Pediatric GIST, Pleomorphic Liposarcoma, PRAD, Prostate, Prostate Cancer, READ, Rectum Cancer, Renal Cancer, Rhabdoid, Rhabdoid Tumor, Sarcoma, Stomach Cancer, Synovial sarcoma, Thyroid, Thyroid Cancer|
The platform used to generate the data. Supported platforms are:
|sample name *||The name of the sample for display and for use in downstream module ABSOLUTE.summarize, which, for multiple concurrent file input, requires unique sample names.|
|max as seg count *||Maximum number of allelic segments. Samples with a higher segment count will be flagged as 'failed'. Default: 1500|
|max neg genome *||Sometimes, due to noise in the data, ABSOLUTE may model the fraction of the genome attributed to tumor subclones to be less than zero. This parameter specifies the maximum allowable fraction of the genome that can be modeled as being less than zero without discarding a given solution. Default: 0.005|
|max non clonal *||Maximum genome fraction that may be modeled as non-clonal, that is, as being derived from tumor subclones. Solutions implying greater values will be discarded. Default: 0.05|
|copy number type *||The copy number type to assess based on input data type.|
|maf file||If available, somatic mutation data in mutation annotation format (MAF) that includes t_ref_count and t_alt_count columns. See Input Files section for more details. If using this parameter, also specify the min mut af parameter described next.|
|min mut af||Mutations with lower allelic fractions than the indicated minimum mutation allelic fraction will be excluded from analysis. Zero is an accepted value. Note that if maf file is specified, min mut af must also be specified.|
* - required
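The effect of min mut af can be previewed before running the module. The sketch below reads tab-delimited MAF text, computes each mutation's allelic fraction from the t_alt_count and t_ref_count columns, and keeps rows at or above the threshold. The function name is hypothetical, and skipping lines that start with `#` is an assumption about header comments in the file; the filtering that ABSOLUTE performs internally may differ in detail.

```python
import csv


def filter_maf_rows(lines, min_mut_af):
    """Return MAF rows whose allelic fraction is at least min_mut_af.

    lines: iterable of text lines from a tab-delimited MAF file.
    Rows with zero read depth are dropped because no fraction is defined.
    """
    # Assumption: comment or version lines begin with '#'.
    data_lines = (ln for ln in lines if not ln.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    kept = []
    for row in reader:
        alt = int(row["t_alt_count"])
        ref = int(row["t_ref_count"])
        depth = alt + ref
        if depth == 0:
            continue
        if alt / depth >= min_mut_af:
            kept.append(row)
    return kept
```

To try it on a file, pass an open file handle, for example `filter_maf_rows(open(path), 0.1)`, where `path` points to your own MAF file.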
Each file must represent a single sample; files containing multiple samples are not accepted. ABSOLUTE algorithm v1.0.6 filters out features mapping to any chromosome labeled “X”.
For ALLELIC copy number type analysis, supply an RData file produced by HAPSEG or AllelicCapseg. These datasets allow incorporation of copy neutral LOH events. Segmentation data produced by any other means must conform to the output formats of HAPSEG/AllelicCapseg for ABSOLUTE to consider copy neutral LOH events.
For TOTAL copy number type analysis, supply a tab-delimited segmentation file in plain-text format. File extension does not matter. ABSOLUTE algorithm v1.0.6 requires the following five columns. Additional columns are ignored.
Whether or not you receive an error message, and especially if a PDF is not produced, examine the stdout.txt and stderr.txt files from your jobs for clues about what may have caused an error and to see which portions of the data were excluded from the analysis by the filtering mechanisms in place.
The dataset samples are from a single human bladder cancer patient. The following parameters were used to obtain results: max sigma h = 0.2, min ploidy = 0.5, max ploidy = 8, primary disease = BLCA, platform = SNP_6.0, varied sample names, max non clonal = 1, and copy number type = total. For the two samples with somatic mutation data, min mut af = 0.1.
The results were then passed through ABSOLUTE.summarize, manually reviewed and augmented to select for alternative solutions and finalized through ABSOLUTE.review. Download these example results and the example override file using the following links:
Acceptance of the module license is required for its use. A copy of the license text is available at http://www.broadinstitute.org/cancer/cga/sites/default/files/images/ABSOLUTE_HAPSEG_license_2013.pdf. The ABSOLUTE module runs only on GenePattern 3.4.2 or above and requires R2.15 with the following packages, each of which will automatically download and install when the module is installed:
Please install R2.15.3 instead of R2.15.2 before installing the module. The GenePattern team has confirmed test data reproducibility for this module using R2.15.3 compared to R2.15.2 and can only provide limited support for other versions. The GenePattern team recommends R2.15.3, which fixes significant bugs in R2.15.2, and which must be installed and configured independently as discussed in Using Different Versions of R and Using the R Installer Plug-in. These sections also provide information on patch level fixes that are necessary when additional installations of R are made and considerations for those who use R outside of GenePattern.
|1.5||2015-10-13||Updated to make use of the R package installer.|