MoChA WDL
MoChA WDL is a set of pipelines released under the MIT license for multiple steps often required when working with DNA microarray data (or whole genome sequence data):
- mocha.wdl - phasing with SHAPEIT5 and mosaic chromosomal alterations detection
- impute.wdl - imputation with IMPUTE5
- score.wdl - polygenic score computations
- assoc.wdl - PCA and association using REGENIE
- shift.wdl - allelic shift analysis for mosaic chromosomal alterations
The pipelines can be used with both Illumina and Affymetrix data, and also for detection of germline copy number variants. The mocha.wdl pipeline uses BCFtools/gtc2vcf and BCFtools/mocha, while the score.wdl pipeline uses BCFtools/score. Both assoc.wdl and shift.wdl output results using the GWAS-VCF specification.
Download
MoChA WDL pipelines are entirely written in WDL. We provide resources to run the MoChA WDL pipelines, including the reference panels for phasing and imputation:
mocha.GRCh37.zip - Resources for the human GRCh37 reference (approx. 13GB) updated on 2023-04-27
mocha.GRCh38.zip - Resources for the human GRCh38 reference (approx. 17GB) updated on 2023-10-10
For more information on how to run the pipelines, see our GitHub page. For any feedback, send an email to giulio.genovese@gmail.com
Publications
Publications that used MoChA WDL:
2025
- Hubbard, A. K. et al. Integration of Germline and Somatic Variation Improves Chronic Lymphocytic Leukemia Risk Stratification. Cancer Research (2025) doi:10.1158/0008-5472.CAN-24-4251
- Otsuka, I. et al. Increased somatic mosaicism in autosomal and X chromosomes for suicide death. Molecular Psychiatry (2025) doi:10.1038/s41380-024-02718-y
- Brierley, C. et al. Clonal hematopoiesis in metastatic urothelial and renal cell carcinoma. npj Precision Oncology (2025) doi:10.1038/s41698-025-00965-y
- Uchiyama, S. et al. Associations between mosaic loss and schizophrenia or bipolar disorder of young onset. medRxiv (2025) doi:10.1101/2025.06.13.25329510
- Uchiyama, S. et al. Mosaic loss of chromosome Y characterises late-onset rheumatoid arthritis and contrasting associations of polygenic risk score based on age at onset. Annals of the Rheumatic Diseases (2025) doi:10.1016/j.ard.2025.01.034
- Zhou, W. et al. Estimation of mosaic loss of Y chromosome cell fraction with genotyping arrays lacking coverage in the pseudoautosomal region. BMC Bioinformatics (2025) doi:10.1186/s12859-025-06076-6
2024
- Xu, J. et al. Evaluation of imputation performance of multiple reference panels in a Pakistani population. Human Genetics and Genomics Advances (2024) doi:10.1016/j.xhgg.2024.100395
- Lim, J. et al. Associations Between Mosaic Loss of Sex Chromosomes and Incident Hospitalization for Atrial Fibrillation in the United Kingdom. Journal of the American Heart Association (2024) doi:10.1161/JAHA.124.036984
- Chang, K. et al. The Contribution of Mosaic Chromosomal Alterations to Schizophrenia. Biological Psychiatry (2024) doi:10.1016/j.biopsych.2024.06.015
- Lim, J. et al. Associations between mosaic loss of sex chromosomes and incident hospitalization for atrial fibrillation in the United Kingdom. medRxiv (2024) doi:10.1101/2024.05.29.24308171
- Liu, A. et al. Genetic drivers and cellular selection of female mosaic X chromosome loss. Nature (2024) doi:10.1038/s41586-024-07533-7
- Stankowska, W. et al. Tumor Predisposing Post-Zygotic Chromosomal Alterations in Bladder Cancer-Insights from Histologically Normal Urothelium. Cancers (2024) doi:10.3390/cancers16050961
- Ling, E. et al. Concerted neuron-astrocyte gene expression declines in aging and schizophrenia. bioRxiv (2024) doi:10.1101/2024.01.07.574148
2023
- Hubbard, A. K. et al. Serum biomarkers are altered in UK Biobank participants with mosaic chromosomal alterations. Human Molecular Genetics (2023) doi:10.1093/hmg/ddad133
2021
- Niroula, A. et al. Distinction of lymphoid and myeloid clonal hematopoiesis. Nature Medicine (2021) doi:10.1038/s41591-021-01521-4
- Zekavat, S. et al. Hematopoietic mosaic chromosomal alterations increase the risk for diverse types of infection. Nature Medicine (2021) doi:10.1038/s41591-021-01371-0
Releases
MoChA pipeline version 2025-08-19 WDL
MoChA pipeline version 2024-09-27 WDL
MoChA pipeline version 2024-05-05 WDL (warning: it can generate malformed indexes due to use of HTSlib 1.19 and bug #1740)
MoChA pipeline version 2023-09-19 WDL (warning: it includes a severe regression due to the ligate component in SHAPEIT5 5.1.1 not correcting switch errors among consecutive phasing blocks which is now fixed)
MoChA pipeline version 2022-12-21 WDL
MoChA pipeline version 2022-05-18 WDL
MoChA pipeline version 2022-01-14 WDL
MoChA pipeline version 2021-10-15 WDL
MoChA pipeline version 2021-05-14 WDL
MoChA pipeline version 2021-03-15 WDL
MoChA pipeline version 2021-01-20 WDL
MoChA pipeline version 2020-09-02 WDL
MoChA pipeline version 2020-08-25 WDL
MoChA pipeline version 2020-08-13 WDL
MoChA pipeline version 2020-08-11 WDL
MoChA pipeline version 2020-07-22 WDL
Imputation pipeline version 2025-08-19 WDL
Imputation pipeline version 2024-09-27 WDL
Imputation pipeline version 2024-05-05 WDL (warning: it can generate malformed indexes due to use of HTSlib 1.19 and bug #1740)
Imputation pipeline version 2023-09-19 WDL
Imputation pipeline version 2022-12-21 WDL
Imputation pipeline version 2022-05-18 WDL
Imputation pipeline version 2022-01-14 WDL
Imputation pipeline version 2021-10-15 WDL
Imputation pipeline version 2021-05-14 WDL
Imputation pipeline version 2021-03-15 WDL
Imputation pipeline version 2021-01-20 WDL
Allelic shift pipeline version 2025-08-19 WDL
Allelic shift pipeline version 2024-09-27 WDL
Allelic shift pipeline version 2024-05-05 WDL (warning: it can generate malformed indexes due to use of HTSlib 1.19 and bug #1740)
Allelic shift pipeline version 2023-09-19 WDL
Allelic shift pipeline version 2022-12-21 WDL
Allelic shift pipeline version 2022-05-18 WDL
Allelic shift pipeline version 2022-01-12 WDL
Allelic shift pipeline version 2021-10-15 WDL
Allelic shift pipeline version 2021-05-14 WDL
Allelic shift pipeline version 2021-03-15 WDL
Association pipeline version 2025-08-19 WDL
Association pipeline version 2024-09-27 WDL (warning: the regenie docker gives error "regenie: error while loading shared libraries: libcurl.so.4: cannot open shared object file: No such file or directory" so you need to use option "assoc.regenie_docker": "regenie:1.20-dev")
Association pipeline version 2024-05-05 WDL (warning: it can generate malformed indexes due to use of HTSlib 1.19 and bug #1740)
Association pipeline version 2023-09-19 WDL
Association pipeline version 2022-12-21 WDL
Association pipeline version 2022-05-18 WDL
Association pipeline version 2022-01-12 WDL
Association pipeline version 2021-10-15 WDL
Polygenic score pipeline version 2025-08-19 WDL
Polygenic score pipeline version 2024-09-27 WDL
Polygenic score pipeline version 2024-05-05 WDL (warning: it can generate malformed indexes due to use of HTSlib 1.19 and bug #1740)
Polygenic score pipeline version 2023-09-19 WDL
Polygenic score pipeline version 2022-12-21 WDL
Polygenic score pipeline version 2022-05-18 WDL
Polygenic score pipeline version 2022-01-12 WDL
Polygenic score pipeline version 2021-10-15 WDL
Polygenic score pipeline version 2021-05-14 WDL
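The warning on the 2024-09-27 association pipeline names an input option that replaces the regenie docker. The sketch below shows one way to add that option to an inputs mapping before writing it out as JSON. Only the `assoc.regenie_docker` key and its value come from the warning above. The helper name `with_regenie_override` and the `assoc.sample_set_id` entry are hypothetical placeholders, so adapt them to your own inputs file.

```python
import json

# Key and value quoted from the 2024-09-27 association pipeline warning.
REGENIE_OVERRIDE = {"assoc.regenie_docker": "regenie:1.20-dev"}


def with_regenie_override(inputs: dict) -> dict:
    """Return a copy of an inputs mapping with the regenie docker override added."""
    merged = dict(inputs)  # copy, so the caller's mapping is left untouched
    merged.update(REGENIE_OVERRIDE)
    return merged


# Hypothetical existing inputs; replace with the contents of your own inputs file.
inputs = {"assoc.sample_set_id": "example"}
print(json.dumps(with_regenie_override(inputs), indent=2))
```

The override should only be needed for the 2024-09-27 association release, since the warning is attached to that version alone.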
Development Versions
These versions might contain bugs, can change without notice, and rely on Docker images that can be recalled at any time. Use at your own risk.
MoChA pipeline development WDL
Imputation pipeline development WDL
Allelic shift pipeline development WDL
Association pipeline development WDL
Polygenic score pipeline development WDL
Docker Images
us.gcr.io/mccarroll-mocha/bcftools:1.22-20250819
us.gcr.io/mccarroll-mocha/bcftools:1.20-20240927
us.gcr.io/mccarroll-mocha/bcftools:1.20-20240505
us.gcr.io/mccarroll-mocha/bcftools:1.17-20230919
us.gcr.io/mccarroll-mocha/bcftools:1.16-20221221
us.gcr.io/mccarroll-mocha/bcftools:1.15.1-20220518
us.gcr.io/mccarroll-mocha/bcftools:1.14-20220112
us.gcr.io/mccarroll-mocha/bcftools:1.13-20211015
us.gcr.io/mccarroll-mocha/bcftools:1.11-20210514
us.gcr.io/mccarroll-mocha/bcftools:1.11-20210315
us.gcr.io/mccarroll-mocha/mocha:1.11-20210120
us.gcr.io/mccarroll-mocha/mocha:1.10.2-20200901
us.gcr.io/mccarroll-mocha/mocha:1.10.2-20200824
us.gcr.io/mccarroll-mocha/r_mocha:1.22-20250819
us.gcr.io/mccarroll-mocha/r_mocha:1.20-20240927
us.gcr.io/mccarroll-mocha/r_mocha:1.20-20240505
us.gcr.io/mccarroll-mocha/r_mocha:1.17-20230919
us.gcr.io/mccarroll-mocha/r_mocha:1.16-20221221
us.gcr.io/mccarroll-mocha/r_mocha:1.15.1-20220518
us.gcr.io/mccarroll-mocha/r_mocha:1.14-20220112
us.gcr.io/mccarroll-mocha/r_mocha:1.13-20211015
us.gcr.io/mccarroll-mocha/r_mocha:1.11-20210514
us.gcr.io/mccarroll-mocha/mocha_plot:1.11-20210315
us.gcr.io/mccarroll-mocha/mocha_plot:1.11-20210120
us.gcr.io/mccarroll-mocha/mocha_plot:1.10.2-20200901
us.gcr.io/mccarroll-mocha/mocha_plot:1.10.2-20200824
us.gcr.io/mccarroll-mocha/pgs:1.22-20250819
us.gcr.io/mccarroll-mocha/pgs:1.20-20240927
us.gcr.io/mccarroll-mocha/pgs:1.20-20240505
us.gcr.io/mccarroll-mocha/pgs:1.17-20230919
us.gcr.io/mccarroll-mocha/bcftools:1.20-dev (development)
us.gcr.io/mccarroll-mocha/pgs:1.20-dev (development)
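The image references above appear to follow a `<registry path>/<name>:<version>-<YYYYMMDD>` pattern, with `-dev` in place of the date for development images. A minimal sketch of splitting a reference into its parts under that assumed pattern follows. It can help when matching an image to a pipeline release. The names `ImageRef` and `parse_image` are hypothetical, and the trailing "(development)" label in the list is not part of the reference and must be removed before parsing.

```python
import re
from datetime import date
from typing import NamedTuple, Optional


class ImageRef(NamedTuple):
    registry_path: str           # e.g. "us.gcr.io/mccarroll-mocha"
    name: str                    # e.g. "bcftools"
    version: str                 # e.g. "1.22"
    build_date: Optional[date]   # None for "-dev" tags


# Assumed tag layout: <version>-<YYYYMMDD> or <version>-dev.
_PATTERN = re.compile(
    r"^(?P<path>.+)/(?P<name>[^/:]+):(?P<version>[0-9.]+)-(?P<suffix>\d{8}|dev)$"
)


def parse_image(ref: str) -> ImageRef:
    """Split an image reference into registry path, name, version, and build date."""
    m = _PATTERN.match(ref)
    if m is None:
        raise ValueError(f"unrecognized image reference: {ref!r}")
    suffix = m["suffix"]
    build = None
    if suffix != "dev":
        build = date(int(suffix[:4]), int(suffix[4:6]), int(suffix[6:]))
    return ImageRef(m["path"], m["name"], m["version"], build)


ref = parse_image("us.gcr.io/mccarroll-mocha/bcftools:1.15.1-20220518")
print(ref.name, ref.version, ref.build_date)
```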
Credits
MoChA WDL is developed by Giulio Genovese at the Broad Institute and at the McCarroll Lab in the Harvard Medical School Department of Genetics under the supervision of Steven McCarroll.
We would like to thank the following people: Pradeep Natarajan for initial motivation to implement the pipeline in the WDL format; Aoxing Liu at FinnGen, Bryan Gorman at the VA, and Tim Bigdeli at SUNY Downstate for critical feedback with applications to large biobanks; Vladislav Tuzov at the Estonian Biobank, Daniil Sarkisyan at Uppsala University, and Joe Dennis at the University of Cambridge for feedback with the configuration of Cromwell on SLURM; Chris Whelan, Chris Llanwarne, Jason Cerrato, Kyle Vernest, Khalid Shakir, and many other members of the Terra/Cromwell team for their help and advice.