What is a mutational process?

A catalogue of somatic mutations in a cancer genome is the cumulative outcome of several mutagenic processes that operate over the patient’s lifetime. These processes arise from exposure to exogenous DNA-damaging agents or carcinogens (tobacco smoking or UV radiation), endogenous mutagens (reactive oxygen species or AID/APOBEC cytidine deaminases), or genomic defects in DNA repair or replication (microsatellite instability, homologous recombination defects, or DNA polymerase proofreading defects) [1,2]. Although mutation rates and mutation spectra vary widely across samples and tumor types, somatic mutations often leave a characteristic molecular imprint, termed a “mutation signature”, i.e., a common pattern of mutation types or local sequence contexts shared among samples [3,4].


Characterizing the underlying mutational processes, by correctly inferring the number of signatures together with the signature profiles and their activities across samples, provides key insight into cancer initiation and progression. However, the number of mutational processes (K*) is highly variable across patients, even within a single tumor type, and its accurate estimation is non-trivial because the duration and intensity of exposure to a given mutational process differ between patients. Non-negative matrix factorization (NMF) has been widely used to decipher mutation signatures from cancer somatic mutations stratified into 96 base substitutions in tri-nucleotide sequence contexts. In contrast to conventional NMF, which requires K* as an input parameter, "SignatureAnalyzer" uses a Bayesian variant of NMF that infers the optimal K* from the data itself by balancing “data fidelity” (likelihood) against “model complexity” (regularization) [5,6,7]. As an illustrative example of SignatureAnalyzer, Figure 1A shows six mutational processes extracted from 14 WES samples of immortalized mouse cell lines exposed to several exogenous or endogenous mutagens [8], and Figure 1B shows the activity of the discovered mutational processes across samples.
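
The factorization described above can be illustrated with a minimal sketch of conventional NMF, assuming a fixed number of signatures K and the standard multiplicative updates for the Kullback-Leibler divergence. This is not the Bayesian variant used by SignatureAnalyzer and does not infer K* from the data; the function names (nmf_kl, kl_divergence) and the synthetic 96 x 14 count matrix are hypothetical and exist only for this example.

```python
import numpy as np

def nmf_kl(V, k, n_iter=500, seed=0, eps=1e-10):
    """Factor V (contexts x samples) into W (contexts x k) and H (k x samples).

    Plain NMF with a fixed k, using multiplicative updates that minimize
    the KL divergence between V and W @ H. Illustrative only.
    """
    rng = np.random.default_rng(seed)
    n_contexts, n_samples = V.shape
    W = rng.random((n_contexts, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # Update activities H while holding signatures W fixed
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        # Update signatures W while holding activities H fixed
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    # Rescale so each signature sums to 1; the scale moves into the activities
    scale = W.sum(axis=0)
    return W / scale, H * scale[:, None]

def kl_divergence(V, W, H, eps=1e-10):
    """Generalized KL divergence between the counts V and the fit W @ H."""
    WH = W @ H + eps
    return float(np.sum(V * np.log((V + eps) / WH) - V + WH))

# Synthetic data (hypothetical): 3 signatures over 96 contexts, 14 samples
rng = np.random.default_rng(1)
W_true = rng.dirichlet(np.ones(96), size=3).T      # 96 x 3
H_true = rng.gamma(2.0, 50.0, size=(3, 14))        # 3 x 14
V = rng.poisson(W_true @ H_true).astype(float)     # 96 x 14 count matrix

W, H = nmf_kl(V, k=3)
print(W.shape, H.shape, round(kl_divergence(V, W, H), 2))
```

Here k must be chosen by the user, which is exactly the limitation that a Bayesian formulation with automatic relevance determination [5] is designed to remove.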


Figure 1: Mutation signature discovery by BayesNMF for 14 WES samples of immortalized cell lines derived from primary murine embryonic fibroblasts (MEFs) exposed in vitro to several carcinogens [8]. (A) Six mutational signature profiles across 96 mutation contexts and (B) signature activities across 14 samples. The six mutation signatures recapitulate well-known characteristics of cancer mutations: dominant A>T mutations in the aristolochic acid (AA) signature, C>A transversions in the smoking signature, C>T mutations exclusively at pyrimidine dimer sites (CC or TC) in the UV signature, massive C>T mutations similar to those seen in temozolomide-treated GBM patients in the alkylating signature, C>T mutations at hotspot RCY motifs (R=A/G, Y=pyrimidine) in the AID signature, and a background signature in control samples.

Source Code

SignatureAnalyzer ingests a lego matrix containing mutation counts for the 96 tri-nucleotide mutation contexts across samples and automatically generates the signature and activity plots for the inferred K. The results from several runs are saved as RData files for downstream analyses.
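
To make the input concrete, the following sketch shows one way a 96-context count matrix could be assembled from a list of single-base substitutions. It is an assumption-laden illustration, not the tool's own loader: the input tuple format (sample, reference tri-nucleotide, alternate base), the label style such as T[C>T]A, the row ordering, and the function names are all hypothetical, and the exact layout SignatureAnalyzer expects should be checked against its source code.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Six substitution classes, written with a pyrimidine (C or T) reference base
SUBSTITUTIONS = [("C", "A"), ("C", "G"), ("C", "T"),
                 ("T", "A"), ("T", "C"), ("T", "G")]

# 6 substitution classes x 4 five-prime bases x 4 three-prime bases = 96 contexts
CONTEXTS = [f"{five}[{ref}>{alt}]{three}"
            for ref, alt in SUBSTITUTIONS
            for five, three in product(BASES, repeat=2)]

def context_label(trinucleotide, alt):
    """Return the pyrimidine-centred label for one substitution."""
    tri, alt = trinucleotide.upper(), alt.upper()
    if tri[1] in "AG":
        # Purine reference: report the mutation on the opposite strand
        tri = tri.translate(COMPLEMENT)[::-1]
        alt = alt.translate(COMPLEMENT)
    return f"{tri[0]}[{tri[1]}>{alt}]{tri[2]}"

def build_lego_matrix(mutations):
    """Count mutations into a 96 x n_samples matrix (list of rows)."""
    samples = sorted({sample for sample, _, _ in mutations})
    row = {c: i for i, c in enumerate(CONTEXTS)}
    col = {s: j for j, s in enumerate(samples)}
    matrix = [[0] * len(samples) for _ in CONTEXTS]
    for sample, tri, alt in mutations:
        matrix[row[context_label(tri, alt)]][col[sample]] += 1
    return samples, matrix

# Hypothetical input: (sample, reference tri-nucleotide, alternate base)
mutations = [("s1", "TCA", "T"), ("s1", "TGA", "A"), ("s2", "ACG", "A")]
samples, lego = build_lego_matrix(mutations)
print(len(CONTEXTS), samples, lego[CONTEXTS.index("T[C>T]A")])
```

The two s1 mutations fall into the same row because a G>A change in a TGA context is the reverse complement of a C>T change in a TCA context.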

Key References

[1] Helleday, T, Eshtad, S, Nik-Zainal, S. Mechanisms underlying mutational signatures in human cancers. Nat Rev Genet 2014;15:585-98.
[2] Roberts, SA, Gordenin, DA. Hypermutation in human cancer genomes: footprints and mechanisms. Nat Rev Cancer 2014;14:786-800.
[3] Alexandrov, LB, Nik-Zainal, S, Wedge, DC, Aparicio, SA, et al. Signatures of mutational processes in human cancer. Nature 2013;500:415-21.
[4] Lawrence, MS, Stojanov, P, Polak, P, Kryukov, GV, et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 2013;499:214-8.
[5] Tan, VY, Fevotte, C. Automatic relevance determination in nonnegative matrix factorization with the beta-divergence. IEEE Trans Pattern Anal Mach Intell 2013;35:1592-605.
[6] Kasar, S, Kim, J, et al. Whole-genome sequencing reveals activation-induced cytidine deaminase signatures during indolent chronic lymphocytic leukaemia evolution. Nat Commun 2015;6:8866. doi:10.1038/ncomms9866
[7] Kim, J, Mouw, K, Polak, P, et al. Somatic ERCC2 mutations are associated with a distinct genomic signature in urothelial tumors. Nat Genet 2016. doi:10.1038/ng.3557
[8] Olivier, M, et al. Modeling mutation landscapes of human cancers in vitro. Sci Rep 2014;4:4482. doi:10.1038/srep04482


This algorithm has been previously referred to as "Mutation Signature Profiling".