MoChA WDL

MoChA WDL is a set of pipelines, released under the MIT license, covering multiple steps often required when working with DNA microarray data (or whole genome sequence data). It can be used with both Illumina and Affymetrix data, and also for the detection of germline copy number variants. The mocha.wdl pipeline uses BCFtools/gtc2vcf and BCFtools/mocha, while the score.wdl pipeline uses BCFtools/score. Both assoc.wdl and shift.wdl output results using the GWAS-VCF specification.

Download

MoChA WDL pipelines are entirely written in WDL. We provide resources to run the MoChA WDL pipelines, including the reference panels for phasing and imputation:

mocha.GRCh37.zip - Resources for the human GRCh37 reference (approx. 13GB) updated on 2023-04-27

mocha.GRCh38.zip - Resources for the human GRCh38 reference (approx. 17GB) updated on 2023-10-10

For more information on how to run the pipelines, see our GitHub page. For any feedback, send an email to giulio.genovese@gmail.com.

Publications

Publications that used MoChA WDL:

Releases

MoChA pipeline version 2024-09-27 WDL
MoChA pipeline version 2024-05-05 WDL (warning: it can generate malformed indexes due to use of HTSlib 1.19 and bug #1740)
MoChA pipeline version 2023-09-19 WDL (warning: it includes a severe regression, now fixed, caused by the ligate component in SHAPEIT5 5.1.1 not correcting switch errors among consecutive phasing blocks)
MoChA pipeline version 2022-12-21 WDL
MoChA pipeline version 2022-05-18 WDL
MoChA pipeline version 2022-01-14 WDL
MoChA pipeline version 2021-10-15 WDL
MoChA pipeline version 2021-05-14 WDL
MoChA pipeline version 2021-03-15 WDL
MoChA pipeline version 2021-01-20 WDL
MoChA pipeline version 2020-09-02 WDL
MoChA pipeline version 2020-08-25 WDL
MoChA pipeline version 2020-08-13 WDL
MoChA pipeline version 2020-08-11 WDL
MoChA pipeline version 2020-07-22 WDL

Imputation pipeline version 2024-09-27 WDL
Imputation pipeline version 2024-05-05 WDL (warning: it can generate malformed indexes due to use of HTSlib 1.19 and bug #1740)
Imputation pipeline version 2023-09-19 WDL
Imputation pipeline version 2022-12-21 WDL
Imputation pipeline version 2022-05-18 WDL
Imputation pipeline version 2022-01-14 WDL
Imputation pipeline version 2021-10-15 WDL
Imputation pipeline version 2021-05-14 WDL
Imputation pipeline version 2021-03-15 WDL
Imputation pipeline version 2021-01-20 WDL

Allelic shift pipeline version 2024-09-27 WDL
Allelic shift pipeline version 2024-05-05 WDL (warning: it can generate malformed indexes due to use of HTSlib 1.19 and bug #1740)
Allelic shift pipeline version 2023-09-19 WDL
Allelic shift pipeline version 2022-12-21 WDL
Allelic shift pipeline version 2022-05-18 WDL
Allelic shift pipeline version 2022-01-12 WDL
Allelic shift pipeline version 2021-10-15 WDL
Allelic shift pipeline version 2021-05-14 WDL
Allelic shift pipeline version 2021-03-15 WDL

Association pipeline version 2024-09-27 WDL (warning: the regenie docker fails with the error "regenie: error while loading shared libraries: libcurl.so.4: cannot open shared object file: No such file or directory"; as a workaround, use the option "assoc.regenie_docker": "regenie:1.20-dev")
Association pipeline version 2024-05-05 WDL (warning: it can generate malformed indexes due to use of HTSlib 1.19 and bug #1740)
Association pipeline version 2023-09-19 WDL
Association pipeline version 2022-12-21 WDL
Association pipeline version 2022-05-18 WDL
Association pipeline version 2022-01-12 WDL
Association pipeline version 2021-10-15 WDL
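
As a sketch of the libcurl workaround noted for the 2024-09-27 association pipeline, the option can be added as a top-level entry in the JSON inputs file passed to the workflow. Only the key and value below come from that warning; every other entry already in your inputs file is assumed to stay unchanged:

```json
{
  "assoc.regenie_docker": "regenie:1.20-dev"
}
```

Since this points to a development image, it may be worth removing the override once a release with a working regenie docker is available.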

Polygenic score pipeline version 2024-09-27 WDL
Polygenic score pipeline version 2024-05-05 WDL (warning: it can generate malformed indexes due to use of HTSlib 1.19 and bug #1740)
Polygenic score pipeline version 2023-09-19 WDL
Polygenic score pipeline version 2022-12-21 WDL
Polygenic score pipeline version 2022-05-18 WDL
Polygenic score pipeline version 2022-01-12 WDL
Polygenic score pipeline version 2021-10-15 WDL
Polygenic score pipeline version 2021-05-14 WDL

Development Versions

These versions might become buggy without notice and rely on Docker images that can be recalled at any time. Use at your own risk.

MoChA pipeline development WDL

Imputation pipeline development WDL

Allelic shift pipeline development WDL

Association pipeline development WDL

Polygenic score pipeline development WDL

Docker Images

us.gcr.io/mccarroll-mocha/bcftools:1.20-20240927
us.gcr.io/mccarroll-mocha/bcftools:1.20-20240505
us.gcr.io/mccarroll-mocha/bcftools:1.17-20230919
us.gcr.io/mccarroll-mocha/bcftools:1.16-20221221
us.gcr.io/mccarroll-mocha/bcftools:1.15.1-20220518
us.gcr.io/mccarroll-mocha/bcftools:1.14-20220112
us.gcr.io/mccarroll-mocha/bcftools:1.13-20211015
us.gcr.io/mccarroll-mocha/bcftools:1.11-20210514
us.gcr.io/mccarroll-mocha/bcftools:1.11-20210315
us.gcr.io/mccarroll-mocha/mocha:1.11-20210120
us.gcr.io/mccarroll-mocha/mocha:1.10.2-20200901
us.gcr.io/mccarroll-mocha/mocha:1.10.2-20200824

us.gcr.io/mccarroll-mocha/r_mocha:1.20-20240927
us.gcr.io/mccarroll-mocha/r_mocha:1.20-20240505
us.gcr.io/mccarroll-mocha/r_mocha:1.17-20230919
us.gcr.io/mccarroll-mocha/r_mocha:1.16-20221221
us.gcr.io/mccarroll-mocha/r_mocha:1.15.1-20220518
us.gcr.io/mccarroll-mocha/r_mocha:1.14-20220112
us.gcr.io/mccarroll-mocha/r_mocha:1.13-20211015
us.gcr.io/mccarroll-mocha/r_mocha:1.11-20210514
us.gcr.io/mccarroll-mocha/mocha_plot:1.11-20210315
us.gcr.io/mccarroll-mocha/mocha_plot:1.11-20210120
us.gcr.io/mccarroll-mocha/mocha_plot:1.10.2-20200901
us.gcr.io/mccarroll-mocha/mocha_plot:1.10.2-20200824

us.gcr.io/mccarroll-mocha/pgs:1.20-20240927
us.gcr.io/mccarroll-mocha/pgs:1.20-20240505
us.gcr.io/mccarroll-mocha/pgs:1.17-20230919

us.gcr.io/mccarroll-mocha/bcftools:1.20-dev (development)
us.gcr.io/mccarroll-mocha/pgs:1.20-dev (development)

Credits

MoChA WDL is developed by Giulio Genovese at the Broad Institute and at the McCarroll Lab in the Harvard Medical School Department of Genetics, under the supervision of Steven McCarroll.

We would like to thank the following people: Pradeep Natarajan for the initial motivation to implement the pipeline in the WDL format; Aoxing Liu at FINNGEN, Bryan Gorman at the VA, and Tim Bigdeli at SUNY Downstate for critical feedback on applications to large biobanks; Vladislav Tuzov at the Estonian Biobank and Daniil Sarkisyan at Uppsala University for feedback on the configuration of Cromwell on SLURM; and Chris Whelan, Chris Llanwarne, Jason Cerrato, Kyle Vernest, Khalid Shakir, and many other members of the Terra/Cromwell team for their help and advice.