Haplotype map format
Dictionary | Created 2017-12-24

Some Picard tools require a haplotype map that maps SNPs to LD (linkage disequilibrium) blocks. These tools include CrosscheckReadGroupFingerprints and CheckFingerprint. You can find a poster about fingerprinting here.

For these tools, the HAPLOTYPE_MAP parameter is used to specify the file. There are two acceptable formats for this file: a plain text-based file with tab-separated fields, and VCF (supported extensions: .vcf, .vcf.gz or .bcf), following the requirements outlined below.

The original haplotype map file format

The original format has two parts: a header and a body.

The header is a standard SAM header, with an @HD line to define the file type and @SQ lines to define the reference contigs. You can easily derive such a header from your reference dictionary file.
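As a rough illustration of deriving the header, the sketch below keeps only the @HD and @SQ lines of a sequence dictionary. It assumes the dictionary is itself a SAM-header-style text file; the function name `header_from_dict` and the two-contig example content are hypothetical and are not part of Picard.

```python
def header_from_dict(dict_lines):
    """Keep only the @HD and @SQ lines of a sequence dictionary."""
    return [line.rstrip("\n") for line in dict_lines
            if line.startswith(("@HD", "@SQ"))]


# Hypothetical two-contig dictionary content, for illustration only.
example_dict = [
    "@HD\tVN:1.5\tSO:unsorted\n",
    "@SQ\tSN:chr1\tLN:1000\n",
    "@SQ\tSN:chr2\tLN:2000\n",
]

header = header_from_dict(example_dict)
print("\n".join(header))
```

With a real dictionary you would pass the open file object instead of the example list, then write the returned lines at the top of the haplotype map.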

The body contains a column header line that starts with a hash (#), followed by lines that annotate SNPs and blocks in high LD.

• NAME is a SNP identifier, e.g. dbSNP rsID.
• MAF is the minor allele frequency.
• ANCHOR_SNP refers to the NAME of a SNP that groups SNPs in high LD with each other. The tool counts all of the SNPs with the same ANCHOR_SNP as one group.
• Although the column header requires the PANELS label, the PANELS column field value is optional.
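The grouping described in the list above can be sketched as follows. This is a minimal reader, not Picard's own parser. It assumes an eight-column layout (CHROMOSOME, POSITION, NAME, MAJOR_ALLELE, MINOR_ALLELE, MAF, ANCHOR_SNP, PANELS), which this page does not spell out in full, and it assumes that a SNP with an empty ANCHOR_SNP anchors its own group. The function names and the example rows are hypothetical.

```python
from collections import defaultdict


def parse_body(lines):
    """Return one dict per SNP line, keyed by the names in the '#' column header."""
    columns, records = None, []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("@"):
            continue  # skip blank lines and the SAM header
        if line.startswith("#"):
            columns = line[1:].split("\t")
            continue
        if columns is None:
            raise ValueError("SNP line found before the column header line")
        fields = line.split("\t")
        # The PANELS value is optional, so pad short rows with empty strings.
        fields += [""] * (len(columns) - len(fields))
        records.append(dict(zip(columns, fields)))
    return records


def group_by_anchor(records):
    """Map each ANCHOR_SNP to the NAMEs of the SNPs that share it."""
    groups = defaultdict(list)
    for rec in records:
        # Assumption of this sketch: an empty ANCHOR_SNP means the SNP anchors itself.
        groups[rec["ANCHOR_SNP"] or rec["NAME"]].append(rec["NAME"])
    return dict(groups)


# Hypothetical file content, for illustration only.
example = [
    "@HD\tVN:1.5\tSO:unsorted",
    "@SQ\tSN:chr1\tLN:1000",
    "#CHROMOSOME\tPOSITION\tNAME\tMAJOR_ALLELE\tMINOR_ALLELE\tMAF\tANCHOR_SNP\tPANELS",
    "chr1\t100\trs1\tA\tG\t0.25\trs1\t",
    "chr1\t150\trs2\tC\tT\t0.30\trs1",
    "chr1\t900\trs3\tG\tA\t0.10\trs3\t",
]

groups = group_by_anchor(parse_body(example))
print(groups)
```

Here rs1 and rs2 share the anchor rs1 and so count as one group, while rs3 forms a group of its own.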

SNPs listed with the same ANCHOR_SNP belong to the same haplotype block. If the MAFs within a block disagree, the tool uses the MAF of the first SNP, i.e. the one with the smallest genomic position, as the MAF of the block.
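The MAF rule for a block can be illustrated with a short sketch. It assumes each SNP is represented as a dict with POSITION and MAF entries and that all SNPs in a block sit on the same contig; `block_maf` is a hypothetical helper, not a Picard function.

```python
def block_maf(snps):
    """Return the MAF of the SNP with the smallest genomic position."""
    first = min(snps, key=lambda snp: int(snp["POSITION"]))
    return float(first["MAF"])


# Hypothetical block whose two SNPs report different MAFs.
block = [
    {"NAME": "rs2", "POSITION": "150", "MAF": "0.30"},
    {"NAME": "rs1", "POSITION": "100", "MAF": "0.25"},
]

print(block_maf(block))
```

The SNP at position 100 comes first, so its MAF is taken for the whole block regardless of the order in which the lines appear.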

The VCF-based haplotype map

Starting with Picard v2.10.1 (released 2017/7/11), tools recognize a VCF-format haplotype map if the file name ends in .vcf, .vcf.gz or .bcf. Tools interpret files with any other extension as the original text-based format described above.
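The extension rule amounts to a simple suffix check, sketched below. The helper name `haplotype_map_format` is hypothetical, and the check is case-sensitive only because this page does not say how Picard treats upper-case extensions.

```python
VCF_EXTENSIONS = (".vcf", ".vcf.gz", ".bcf")


def haplotype_map_format(path):
    """Classify a haplotype map path as 'vcf' or 'text' by its extension."""
    return "vcf" if path.endswith(VCF_EXTENSIONS) else "text"


for name in ("map.vcf", "map.vcf.gz", "map.bcf", "map.txt"):
    print(name, haplotype_map_format(name))
```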

• The VCF format haplotype map contains exactly one sample whose genotype calls are all heterozygous, e.g. 0/1 or 0|1.
• The tool determines haplotype block grouping using phased genotypes (with a pipe |) and the PS (phase set) format field annotation.
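The two requirements above can be sketched for uncompressed VCF text as follows. This is an illustrative reading of the rules, not Picard's implementation: it assumes a single sample column, groups phased calls by contig and PS value, and treats an unphased call or a call without PS as a block of its own, which this page does not confirm. The function name and example records are hypothetical.

```python
from collections import defaultdict


def group_vcf_records(lines):
    """Group SNP IDs into blocks using phased genotypes and the PS field."""
    blocks = defaultdict(list)
    for raw in lines:
        if raw.startswith("#") or not raw.strip():
            continue  # skip VCF meta lines, the column header, and blanks
        fields = raw.rstrip("\n").split("\t")
        chrom, pos, snp_id = fields[0], fields[1], fields[2]
        # Pair FORMAT keys (column 9) with the single sample's values (column 10).
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        genotype = sample["GT"]
        alleles = genotype.replace("|", "/").split("/")
        if len(alleles) != 2 or alleles[0] == alleles[1]:
            raise ValueError(f"{snp_id}: genotype {genotype} is not heterozygous")
        phase_set = sample.get("PS", ".")
        if "|" in genotype and phase_set != ".":
            key = ("PS", chrom, phase_set)
        else:
            # Assumption of this sketch: no phase information means a one-SNP block.
            key = ("single", chrom, pos)
        blocks[key].append(snp_id)
    return dict(blocks)


# Hypothetical records, for illustration only.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE",
    "chr1\t100\trs1\tA\tG\t.\t.\t.\tGT:PS\t0|1:100",
    "chr1\t150\trs2\tC\tT\t.\t.\t.\tGT:PS\t0|1:100",
    "chr1\t900\trs3\tG\tA\t.\t.\t.\tGT\t0/1",
]

vcf_blocks = group_vcf_records(example_vcf)
print(vcf_blocks)
```

In this example rs1 and rs2 are phased and share PS 100, so they land in one block, while the unphased rs3 stays separate.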