# Picard haplotype map file format

Dictionary | Created 2017-05-03 | Last updated 2017-07-11

The haplotype map that certain Picard tools require is a file that maps SNPs to LD (linkage disequilibrium) blocks. These tools include Picard CrosscheckReadGroupFingerprints and CheckFingerprint. For these tools, the HAPLOTYPE_MAP parameter defines the file.

• For details on what the tools do and their parameters, see https://broadinstitute.github.io/picard/.
• For an overview of fingerprinting math and comparative results for different data types, see the related poster. You can find a link to posters on this page.
• To view the javadoc documentation for tools within the Picard Jar, type

java -jar picard.jar <tool name> -h

As of this writing (5/5/2017, Picard v2.9.0), the HAPLOTYPE_MAP file is a tab-separated text file. In a future release of Picard, this parameter will also accept VCF formats with file names ending in .vcf, .vcf.gz or .bcf. At that time, tools will interpret all other file extensions for this parameter as the original text-based format.

These two formats differ in their requirements as we outline below.

## The original haplotype map file format

The original format has a header followed by a body.

The header is a standard SAM header, with an @HD line to define the file type and @SQ lines to define the reference contigs. You can easily derive such a header from your reference dictionary file.
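Because a reference dictionary (.dict) file is itself a SAM-style header, one way to derive the haplotype map header is to copy its @HD and @SQ lines. The following is a minimal sketch under that assumption; the function name `header_from_dict` and the example contig lines are hypothetical and only illustrate the idea.

```python
def header_from_dict(dict_lines):
    """Return the @HD and @SQ lines of a sequence dictionary, newline-stripped.

    Any other record types in the dictionary are dropped, since the text above
    only mentions @HD and @SQ lines in the haplotype map header.
    """
    return [
        line.rstrip("\n")
        for line in dict_lines
        if line.startswith(("@HD", "@SQ"))
    ]


# Illustrative dictionary content (hypothetical values).
example_dict = [
    "@HD\tVN:1.5\tSO:unsorted\n",
    "@SQ\tSN:chr1\tLN:248956422\n",
    "@SQ\tSN:chr2\tLN:242193529\n",
    "@CO\ta comment line that is not copied\n",
]

header = header_from_dict(example_dict)
```

In practice you would pass an open file handle for your own .dict file instead of `example_dict` and write the returned lines at the top of the haplotype map.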

The body contains a column header line starting with a # hash followed by lines that annotate SNPs and blocks in high LD.

• NAME is a SNP identifier, e.g. dbSNP rsID
• MAF is minor allele frequency
• ANCHOR_SNP refers to the NAME of a SNP that groups SNPs in high LD with each other. The tool counts all of the SNPs with the same ANCHOR_SNP as one group.
• Although the column header requires the PANELS label, the PANELS column field value is optional.

SNPs listed with the same ANCHOR_SNP belong to the same haplotype block. If the MAFs within a block disagree, the tool uses the MAF of the first SNP, i.e. the one with the smallest genomic position, as the MAF of the block.
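The grouping and MAF rules can be sketched in a few lines of Python. This is an illustration, not Picard's implementation. It assumes a column header of `#CHROMOSOME POSITION NAME MAJOR_ALLELE MINOR_ALLELE MAF ANCHOR_SNP PANELS` (only NAME, MAF, ANCHOR_SNP and PANELS are named in the text above), and it assumes that a SNP with an empty ANCHOR_SNP acts as its own anchor. The SNP names and values are made up.

```python
from collections import defaultdict


def parse_blocks(lines):
    """Group body rows by ANCHOR_SNP, skipping the SAM header lines."""
    body = [line for line in lines if not line.startswith("@")]
    # The first body line is the column header, prefixed with '#'.
    columns = body[0].lstrip("#").rstrip("\n").split("\t")
    blocks = defaultdict(list)
    for line in body[1:]:
        if not line.strip():
            continue
        row = dict(zip(columns, line.rstrip("\n").split("\t")))
        # Assumption: an empty ANCHOR_SNP means the SNP anchors its own block.
        anchor = row.get("ANCHOR_SNP") or row["NAME"]
        blocks[anchor].append(row)
    return blocks


def block_maf(snps):
    """MAF of the block: that of the SNP with the smallest genomic position."""
    first = min(snps, key=lambda row: int(row["POSITION"]))
    return float(first["MAF"])


# Hypothetical haplotype map with two blocks: {rsA, rsB} and {rsC}.
example_map = [
    "@HD\tVN:1.5",
    "@SQ\tSN:chr1\tLN:248956422",
    "#CHROMOSOME\tPOSITION\tNAME\tMAJOR_ALLELE\tMINOR_ALLELE\tMAF\tANCHOR_SNP\tPANELS",
    "chr1\t2000\trsB\tA\tG\t0.30\trsA\t",
    "chr1\t1000\trsA\tC\tT\t0.25\t\t",
    "chr1\t5000\trsC\tG\tA\t0.10\t\t",
]

blocks = parse_blocks(example_map)
mafs = {anchor: block_maf(snps) for anchor, snps in blocks.items()}
```

Here rsA and rsB share a block, and the block takes the MAF of rsA because rsA has the smaller position, even though rsB appears first in the file.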

## The VCF-based haplotype map

Picard v2.10.1+ (released 2017/7/11) accepts this format. Tools will recognize a VCF format if the file extension ends in .vcf, .vcf.gz or .bcf. Tools will interpret all other file extensions as the original text-based format we describe above.
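The extension rule amounts to a simple suffix check. The sketch below mirrors it in Python; the function name `haplotype_map_format` is hypothetical, and the check is assumed to be case-sensitive, which the text above does not specify.

```python
# Suffixes that select the VCF-based format, as listed above.
VCF_SUFFIXES = (".vcf", ".vcf.gz", ".bcf")


def haplotype_map_format(path):
    """Return "vcf" for VCF-style file names, otherwise "text"."""
    return "vcf" if path.endswith(VCF_SUFFIXES) else "text"


formats = {
    name: haplotype_map_format(name)
    for name in ("map.vcf", "map.vcf.gz", "map.bcf", "map.txt", "map.vcf.txt")
}
```

Note that a name such as `map.vcf.txt` falls through to the text format, because only the end of the file name is inspected.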

• The VCF format haplotype map contains exactly one sample whose genotype calls are all heterozygous, e.g. 0/1 or 0|1.
• The tool determines haplotype block grouping using phased genotypes (with a pipe |) and the PS (phase set) format field annotation.
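The two requirements above can be checked on an uncompressed VCF with plain text parsing. The following is a rough sketch, not Picard's logic: it assumes that phased records sharing a PS value on the same contig form one block, and that unphased records, or phased records without PS, each form their own block. The sample name, SNP IDs and function name are hypothetical.

```python
def phase_blocks(vcf_lines):
    """Validate a single-sample, all-heterozygous VCF and group SNP IDs into blocks."""
    blocks = {}
    for line in vcf_lines:
        if line.startswith("##") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            # Sample columns start after the FORMAT column (index 8).
            if len(fields[9:]) != 1:
                raise ValueError("haplotype map VCF must contain exactly one sample")
            continue
        chrom, pos, snp_id = fields[0], fields[1], fields[2]
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        genotype = sample["GT"]
        phased = "|" in genotype
        alleles = genotype.split("|" if phased else "/")
        if len(alleles) != 2 or "." in alleles or alleles[0] == alleles[1]:
            raise ValueError(f"{snp_id}: genotype {genotype} is not heterozygous")
        if phased and "PS" in sample:
            key = ("PS", chrom, sample["PS"])
        else:
            # Assumption: without phase information the SNP stands alone.
            key = ("SNP", chrom, pos)
        blocks.setdefault(key, []).append(snp_id)
    return blocks


# Hypothetical VCF haplotype map: rsA and rsB share phase set 1000, rsC is unphased.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tHAPMAP",
    "chr1\t1000\trsA\tC\tT\t.\t.\t.\tGT:PS\t0|1:1000",
    "chr1\t2000\trsB\tA\tG\t.\t.\t.\tGT:PS\t0|1:1000",
    "chr1\t5000\trsC\tG\tA\t.\t.\t.\tGT\t0/1",
]

vcf_blocks = phase_blocks(example_vcf)
```

For compressed or binary inputs (.vcf.gz, .bcf) a dedicated VCF library would be needed; this sketch only handles plain-text lines.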