gtc2vcf
gtc2vcf is a free software tool released under the MIT license for rapidly converting DNA microarray data from Illumina or Affymetrix into standard VCF files. It can be used to prepare data for analysis with the BCFtools/mocha plugin or the MoChA WDL pipelines.
If necessary, you can use the liftover tool from score to convert output VCFs to a different reference, though we recommend realigning the manifest files with the --fasta-flank/--sam-flank infrastructure instead.
gtc2vcf can read Illumina IDAT, BPM, EGT, and GTC binary files and Affymetrix CEL and CHP binary files.
gtc2vcf is written entirely in C as a BCFtools plugin. It can be compiled with BCFtools or downloaded as a set of binary files. It requires BCFtools 1.20 or newer to run. A ggplot2-based script to plot single variants is also provided.
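Because the plugin requires BCFtools 1.20 or newer, it can help to check the installed version before running it. The following is a minimal sketch, not part of gtc2vcf: the function names `version_at_least` and `check_bcftools` are hypothetical, and the comparison assumes GNU `sort -V` is available and that the first line of `bcftools --version` has the form `bcftools 1.20`.

```shell
# Hypothetical helper: succeed if version $1 is at least version $2.
# Assumes GNU coreutils `sort -V` (version sort) is available.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical wrapper: read the version from the first line of
# `bcftools --version` (assumed to look like "bcftools 1.20").
check_bcftools() {
    found=$(bcftools --version 2>/dev/null | head -n 1 | awk '{print $2}')
    if [ -z "$found" ]; then
        echo "bcftools not found in PATH" >&2
        return 1
    fi
    version_at_least "$found" 1.20
}
```

For example, `check_bcftools && bcftools +gtc2vcf` would only invoke the plugin when the version check succeeds.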
Download
From this page you can download the latest Linux x86_64 BCFtools plugin binaries for the stable version and the development version.
Source code is also available for the stable version and the development version.
To run a BCFtools plugin binary, say gtc2vcf.so, there are four options:
$ export BCFTOOLS_PLUGINS=/path/to/bcftools/plugins && bcftools +gtc2vcf
$ export BCFTOOLS_PLUGINS=/path/to/bcftools/plugins && bcftools plugin gtc2vcf
$ bcftools +$BCFTOOLS_PLUGINS/gtc2vcf.so
$ bcftools plugin $BCFTOOLS_PLUGINS/gtc2vcf.so
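All four options above rely on gtc2vcf.so being present in the plugin directory. A small check like the one below can confirm that before running the plugin. This is a sketch under stated assumptions: `have_gtc2vcf_plugin` is a hypothetical helper, and it treats BCFTOOLS_PLUGINS as a colon-separated list of directories.

```shell
# Hypothetical helper: succeed if gtc2vcf.so exists in any directory of the
# colon-separated list given as $1 (defaults to $BCFTOOLS_PLUGINS).
have_gtc2vcf_plugin() {
    search_path=${1:-${BCFTOOLS_PLUGINS:-}}
    old_ifs=$IFS
    IFS=:
    for dir in $search_path; do
        if [ -f "$dir/gtc2vcf.so" ]; then
            IFS=$old_ifs
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}
```

For example, `have_gtc2vcf_plugin || echo "gtc2vcf.so not found" >&2` reports a missing plugin without invoking BCFtools.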
For more information on how to run the software, see our GitHub page. For any feedback, send an email to giulio.genovese@gmail.com.
Beta
For Ubuntu users, a Debian package to install the plugins will be provided in the future.
Publications
Publications that used gtc2vcf (or affy2vcf):
2024
- Pinakhina, D. et al. The effect size of rs521851 in the intron of MAGI2/S-SCAM on HADS-D scores correlates with EAT-26 scores for eating disorders risk. Frontiers in Psychiatry (2024) doi:10.3389/fpsyt.2024.1416009
- Otsuka, I. et al. Increased somatic mosaicism in autosomal and X chromosomes for suicide death. Molecular Psychiatry (2024) doi:10.1038/s41380-024-02718-y
- Haas, D. et al. Early pregnancy associations with Gestational Diabetes: methods and cohort results of the Hoosier Moms Cohort. North American Proceedings in Gynecology & Obstetrics (2024) doi:10.54053/001c.121481
- Patel, W. et al. Role of Transporters and Enzymes in Metabolism and Distribution of 4-Chlorokynurenine (AV-101). Mol. Pharmaceutics (2024) doi:10.1021/acs.molpharmaceut.3c00700
2023
- Locatelli, N. S. et al. Genome assemblies and genetic maps highlight chromosome-scale macrosynteny in Atlantic acroporids. bioRxiv (2023) doi:10.1101/2023.12.22.573044
- Herrick, N. & Walsh, S. ILIAD: A suite of automated Snakemake workflows for processing genomic data for downstream applications. BMC Bioinformatics (2023) doi:10.1186/s12859-023-05548-x
- Babadi, M. et al. GATK-gCNV enables the discovery of rare copy number variants from exome sequencing data. Nature Genetics (2023) doi:10.1038/s41588-023-01449-0
- Kamphuis, P. et al. Clonal Hematopoiesis Defined by Somatic Mutations Infrequently Co-occurs With Mosaic Loss of the Y Chromosome in a Population-based Cohort. Hemasphere (2023) doi:10.1097/HS9.0000000000000956
- Cheng, C. et al. Mosaic Chromosomal Alterations Are Associated With Increased Lung Cancer Risk: Insight From the INTEGRAL-ILCCO Cohort Analysis. Journal of Thoracic Oncology (2023) doi:10.1016/j.jtho.2023.05.001
2022
- Czamara, D. et al. Genome-Wide Copy Number Variant and High-Throughput Transcriptomics Analyses of Placental Tissues Underscore Persisting Child Susceptibility in At-Risk Pregnancies Cleared in Standard Genetic Testing. International Journal of Molecular Sciences (2022) doi:10.3390/ijms231911448
- Babadi, M. et al. GATK-gCNV: A Rare Copy Number Variant Discovery Algorithm and Its Application to Exome Sequencing in the UK Biobank. bioRxiv (2022) doi:10.1101/2022.08.25.504851
- Meduri, E. et al. The RNA editing landscape in acute myeloid leukemia reveals associations with disease mutations and clinical outcome. iScience (2022) doi:10.1016/j.isci.2022.105622
- Banes, G. L. et al. Nine out of ten samples were mistakenly switched by The Orang-utan Genome Consortium. Sci Data (2022) doi:10.1038/s41597-022-01602-0
- Qin, N. et al. Association of the interaction between mosaic chromosomal alterations and polygenic risk score with the risk of lung cancer: an array-based case-control association and prospective cohort study. The Lancet Oncology (2022) doi:10.1016/S1470-2045(22)00600-3
- Vasquez Kuntz, K. L. et al. Inheritance of somatic mutations by animal offspring. Science Advances (2022) doi:10.1126/sciadv.abn0707
- Uddin, M. et al. Germline genomic and phenomic landscape of clonal hematopoiesis in 323,112 individuals. medRxiv (2022) doi:10.1101/2022.07.29.22278015
- Hsu, Y. et al. Human rs75776403 polymorphism links differential phenotypic and clinical outcomes to a CLEC18A p.T151M-driven multiomics. Journal of Biomedical Science (2022) doi:10.1186/s12929-022-00822-1
- Eberhardt, R. Y. et al. Detection of mosaic chromosomal alterations in children with severe developmental disorders recruited to the DDD study. medRxiv (2022) doi:10.1101/2022.03.28.22273024
2021
- Koskela, J. T. et al. Genetic variant in SPDL1 reveals novel mechanism linking pulmonary fibrosis risk and cancer protection. medRxiv (2021) doi:10.1101/2021.05.07.21255988
- Vogel, I. SureTypeSCR: R package for rapid quality control and genotyping of SNP arrays from single cells. F1000Research (2021) doi:10.12688/f1000research.53287.1
- Ainsworth, R. Genetic models to predict the development of colorectal cancer. (2021) doi:10.26021/11897
- Zekavat, S. et al. Hematopoietic mosaic chromosomal alterations increase the risk for diverse types of infection. Nat Med (2021) doi:10.1038/s41591-021-01371-0
- Sherman, M. A. et al. Large mosaic copy number variations confer autism risk. Nat Neurosci (2021) doi:10.1038/s41593-020-00766-5
- Pini, T. et al. Liquid chromatography-tandem mass spectrometry reveals an active response to DNA damage in human spermatozoa. F&S Science 2, 153-163 (2021) doi:10.1016/j.xfss.2021.03.001
2020
- Kitchen, S. A. et al. STAGdb: a 30K SNP genotyping array and Science Gateway for Acropora corals and their dinoflagellate symbionts. Sci Rep 10, 12488 (2020) doi:10.1038/s41598-020-69101-z
- Kuntz, K. L. V. et al. Juvenile corals inherit mutations acquired during the parent's lifespan. bioRxiv (2020) doi:10.1101/2020.10.19.345538
Releases
Version 2024-09-27 source and Linux x86_64 binaries (compiled for BCFtools 1.20)
Version 2024-05-05 source and Linux x86_64 binaries (compiled for BCFtools 1.20)
Version 2023-09-19 source and Linux x86_64 binaries (compiled for BCFtools 1.17)
Version 2022-12-21 source and Linux x86_64 binaries (compiled for BCFtools 1.16)
Version 2022-05-18 source and Linux x86_64 binaries (compiled for BCFtools 1.15.1)
Version 2022-01-12 source and Linux x86_64 binaries (compiled for BCFtools 1.14)
Version 2021-10-15 source and Linux x86_64 binaries (compiled for BCFtools 1.13)
Version 2021-05-14 source and Linux x86_64 binaries (compiled for BCFtools 1.11)
Version 2021-03-15 source and Linux x86_64 binaries (compiled for BCFtools 1.11)
Version 2021-01-20 source and Linux x86_64 binaries (compiled for BCFtools 1.11)
Version 2020-09-01 source and Linux x86_64 binaries (compiled for BCFtools 1.10.2)
Version 2020-08-25 source and Linux x86_64 binaries (compiled for BCFtools 1.10.2)
Version 2020-08-13 source and Linux x86_64 binaries (compiled for BCFtools 1.10.2)
Version 2020-08-11 source and Linux x86_64 binaries (compiled for BCFtools 1.10.2)
Version 2020-07-20 source and Linux x86_64 binaries (compiled for BCFtools 1.10.2)
Credits
gtc2vcf is developed by Giulio Genovese at the Broad Institute and at the McCarroll Lab in the Harvard Medical School Department of Genetics under the supervision of Steven McCarroll.
We would like to thank the following people: Ryan Kelley at Illumina for sharing specifications about Illumina binary formats; Jay Carey and George Grant at the Broad Institute for useful discussions about the implementation; Brad Hamilton at GoodCell for providing the motivation to implement indel support; Weiyin Zhou for feedback and for testing the code for Illumina data; Bryan Gorman at the VA and Aoxing Liu at FIMM for testing the code for Affymetrix data; and Heng Li, Petr Danecek, John Marshall, James Bonfield, and Shane McCarthy for developing HTSlib and BCFtools.