gtc2vcf

gtc2vcf is a free software tool, released under the MIT license, for rapidly converting DNA microarray data from Illumina or Affymetrix into standard VCF files. It can be used to prepare data for analysis with the BCFtools/mocha plugin or the MoChA WDL pipelines. If necessary, you can use the liftover tool from score to convert output VCFs to a different reference, though we recommend realigning the manifest files with the --fasta-flank/--sam-flank infrastructure instead.

gtc2vcf can read Illumina IDAT, BPM, EGT, and GTC binary files and Affymetrix CEL and CHP binary files.

gtc2vcf is written entirely in C as a BCFtools plugin. It can be compiled with BCFtools or downloaded as a set of binary files. It requires BCFtools 1.19 or newer to run. A script to plot single variants, based on ggplot2, is also provided.
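The version requirement above can be checked before loading the plugin. The following is a minimal sketch, not part of gtc2vcf: the helper names parse_bcftools_version and version_ge are hypothetical, and the sketch assumes that the first line of `bcftools --version` has the form "bcftools <version>" and that the local sort supports the -V (version sort) option.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with gtc2vcf or BCFtools).

# Read `bcftools --version` output on stdin and print the version number,
# assuming the first line looks like "bcftools 1.19".
parse_bcftools_version() {
    sed -n '1s/^bcftools //p'
}

# version_ge A B succeeds when version A is greater than or equal to B.
# Assumes `sort -V` is available (GNU coreutils and recent BSD sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Only run the check when bcftools is actually installed.
if command -v bcftools >/dev/null 2>&1; then
    installed=$(bcftools --version | parse_bcftools_version)
    if ! version_ge "$installed" 1.19; then
        echo "bcftools $installed is older than 1.19" >&2
    fi
fi
```

The comparison is kept as a separate function so it can be reused with a different minimum version if the requirement changes.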

Download

From this page you can download the latest Linux x86_64 BCFtools plugin binaries for the stable version and the development version.
Source code is also available for the stable version and the development version.

To run a BCFtools plugin binary, say gtc2vcf.so, there are four options:
$ export BCFTOOLS_PLUGINS=/path/to/bcftools/plugins && bcftools +gtc2vcf
$ export BCFTOOLS_PLUGINS=/path/to/bcftools/plugins && bcftools plugin gtc2vcf
$ bcftools +$BCFTOOLS_PLUGINS/gtc2vcf.so
$ bcftools plugin $BCFTOOLS_PLUGINS/gtc2vcf.so
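If none of these options finds the plugin, it can help to confirm where the .so file actually lives. The following is a minimal sketch, not part of gtc2vcf or BCFtools: the helper name find_plugin is hypothetical, and it assumes BCFTOOLS_PLUGINS holds a colon-separated list of directories.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with gtc2vcf or BCFtools):
# print the first file named <name>.so found in a colon-separated list
# of directories. The list defaults to $BCFTOOLS_PLUGINS.
find_plugin() {
    name=$1
    path=${2:-${BCFTOOLS_PLUGINS:-}}
    old_ifs=$IFS
    IFS=:
    for dir in $path; do
        if [ -n "$dir" ] && [ -f "$dir/$name.so" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir/$name.so"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Example use, passing the full path as in the last two options above:
#   plugin=$(find_plugin gtc2vcf) && bcftools plugin "$plugin"
```

A non-zero exit status means no listed directory contains the plugin, which usually points to a wrong or unset BCFTOOLS_PLUGINS value.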
For more information on how to run the software, see our GitHub page. To give feedback, send an email to giulio.genovese@gmail.com.

Beta

For Ubuntu users, a Debian package to install the plugins will be provided in the future.

Publications

Publications that used gtc2vcf (or affy2vcf):

Releases

Version 2023-09-19 source and Linux x86_64 binaries (compiled for BCFtools 1.17)
Version 2022-12-21 source and Linux x86_64 binaries (compiled for BCFtools 1.16)
Version 2022-05-18 source and Linux x86_64 binaries (compiled for BCFtools 1.15.1)
Version 2022-01-12 source and Linux x86_64 binaries (compiled for BCFtools 1.14)
Version 2021-10-15 source and Linux x86_64 binaries (compiled for BCFtools 1.13)
Version 2021-05-14 source and Linux x86_64 binaries (compiled for BCFtools 1.11)
Version 2021-03-15 source and Linux x86_64 binaries (compiled for BCFtools 1.11)
Version 2021-01-20 source and Linux x86_64 binaries (compiled for BCFtools 1.11)
Version 2020-09-01 source and Linux x86_64 binaries (compiled for BCFtools 1.10.2)
Version 2020-08-25 source and Linux x86_64 binaries (compiled for BCFtools 1.10.2)
Version 2020-08-13 source and Linux x86_64 binaries (compiled for BCFtools 1.10.2)
Version 2020-08-11 source and Linux x86_64 binaries (compiled for BCFtools 1.10.2)
Version 2020-07-20 source and Linux x86_64 binaries (compiled for BCFtools 1.10.2)

Credits

gtc2vcf is developed by Giulio Genovese at the Broad Institute and at the McCarroll Lab in the Harvard Medical School Department of Genetics, under the supervision of Steven McCarroll.

We would like to thank the following people: Ryan Kelley at Illumina for sharing specifications about Illumina binary formats; Jay Carey and George Grant at the Broad Institute for useful discussions about the implementation; Brad Hamilton at GoodCell for the motivation needed to implement indel support; Weiyin Zhou for feedback and for testing the code with Illumina data; Bryan Gorman at the VA and Aoxing Liu at FIMM for testing the code with Affymetrix data; and Heng Li, Petr Danecek, John Marshall, James Bonfield, and Shane McCarthy for developing HTSlib and BCFtools.