Combine per-sample gVCF files produced by HaplotypeCaller into a multi-sample gVCF file
CombineGVCFs is meant to be used for hierarchical merging of gVCFs that will eventually be input into GenotypeGVCFs. Use this tool when you need to genotype too many individual gVCFs to pass them all to GenotypeGVCFs at once: first run CombineGVCFs on smaller batches of samples, then pass the resulting combined gVCFs to GenotypeGVCFs.
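The batching strategy above can be planned with a short Python sketch that only builds (and prints) command lines; it does not run GATK. It reuses the flags from the usage example (-T, -R, --variant, -o). The helper name plan_hierarchical_merge, the batch size, and the output names batchN.g.vcf and cohort.vcf are hypothetical choices for illustration.

```python
import shlex
from typing import List, Tuple

GATK = ["java", "-jar", "GenomeAnalysisTK.jar"]

def plan_hierarchical_merge(
    gvcfs: List[str], reference: str, batch_size: int
) -> Tuple[List[List[str]], List[str]]:
    """Build one CombineGVCFs command per batch, plus one GenotypeGVCFs command."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    combine_cmds: List[List[str]] = []
    batch_outputs: List[str] = []
    for i in range(0, len(gvcfs), batch_size):
        out = f"batch{i // batch_size}.g.vcf"  # hypothetical naming scheme
        cmd = GATK + ["-T", "CombineGVCFs", "-R", reference]
        for path in gvcfs[i:i + batch_size]:
            cmd += ["--variant", path]  # one --variant per per-sample gVCF
        cmd += ["-o", out]
        combine_cmds.append(cmd)
        batch_outputs.append(out)
    # The combined batch gVCFs, not the per-sample files, go to GenotypeGVCFs.
    genotype_cmd = GATK + ["-T", "GenotypeGVCFs", "-R", reference]
    for out in batch_outputs:
        genotype_cmd += ["--variant", out]
    genotype_cmd += ["-o", "cohort.vcf"]
    return combine_cmds, genotype_cmd

if __name__ == "__main__":
    samples = [f"sample{n}.g.vcf" for n in range(1, 6)]
    combines, genotype = plan_hierarchical_merge(samples, "reference.fasta", 2)
    for cmd in combines:
        print(shlex.join(cmd))
    print(shlex.join(genotype))
```

With five samples and a batch size of 2, this plans three CombineGVCFs runs and a single GenotypeGVCFs run over the three batch outputs.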
Input: two or more HaplotypeCaller gVCFs to combine.
Output: a combined multi-sample gVCF.
Usage example:
java -jar GenomeAnalysisTK.jar \
   -T CombineGVCFs \
   -R reference.fasta \
   --variant sample1.g.vcf \
   --variant sample2.g.vcf \
   -o cohort.g.vcf
Only gVCF files produced by HaplotypeCaller (or CombineGVCFs) can be used as input for this tool. Some other programs produce files they call gVCFs, but these lack information that GenotypeGVCFs requires, namely accurate genotype likelihoods for every position.
If the gVCF files contain allele-specific annotations, add -G Standard -G AS_Standard to the command line.
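A small Python sketch of this rule, applied to an argument list like the one in the usage example. Whether the inputs actually carry allele-specific annotations is left to the caller: the allele_specific parameter and the helper name add_annotation_groups are hypothetical.

```python
from typing import List

def add_annotation_groups(cmd: List[str], allele_specific: bool) -> List[str]:
    """Return a copy of a CombineGVCFs argument list with the annotation groups
    for allele-specific inputs appended (one -G flag per group)."""
    extra = ["-G", "Standard", "-G", "AS_Standard"] if allele_specific else []
    return list(cmd) + extra  # copy, so the caller's list is left untouched

base = ["java", "-jar", "GenomeAnalysisTK.jar", "-T", "CombineGVCFs",
        "-R", "reference.fasta", "--variant", "sample1.g.vcf",
        "--variant", "sample2.g.vcf", "-o", "cohort.g.vcf"]
print(" ".join(add_annotation_groups(base, allele_specific=True)))
```

Repeating -G once per group mirrors the command-line form given above.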
These Read Filters are automatically applied to the data by the Engine before processing by CombineGVCFs.
This tool uses a sliding window on the reference.
All tools inherit arguments from the GATK engine's "CommandLineGATK" argument collection, which can be used to modify various aspects of the tool's behavior. For example, the -L argument directs the GATK engine to restrict processing to specific genomic intervals, and the -rf argument applies read filters that exclude some of the data from the analysis.
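As an illustration of the inherited -L argument, the following Python sketch splits one CombineGVCFs job into per-interval runs, for example to scatter work across a compute farm. It only builds argument lists. The helper name shard_commands, the output names shardN.g.vcf, and the contig names (which depend on your reference) are hypothetical.

```python
from typing import List

def shard_commands(base_cmd: List[str], intervals: List[str]) -> List[List[str]]:
    """Return one CombineGVCFs argument list per interval.

    base_cmd must not already contain -o: each shard gets its own
    (hypothetical) output name so the runs do not overwrite each other.
    """
    shards: List[List[str]] = []
    for k, interval in enumerate(intervals):
        # -L restricts this run to a single interval (contig or contig:start-stop)
        shards.append(list(base_cmd) + ["-L", interval, "-o", f"shard{k}.g.vcf"])
    return shards

base = ["java", "-jar", "GenomeAnalysisTK.jar", "-T", "CombineGVCFs",
        "-R", "reference.fasta",
        "--variant", "sample1.g.vcf", "--variant", "sample2.g.vcf"]
for cmd in shard_commands(base, ["chr1", "chr2:1-50000000"]):
    print(" ".join(cmd))
```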
This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the list below the table.
| Argument name(s) | Default value | Summary |
|---|---|---|
| --variant / -V | NA | One or more input gVCF files |
| --out / -o | stdout | File to which the combined gVCF should be written |
| --breakBandsAtMultiplesOf | 0 | If > 0, reference bands will be broken up at genomic positions that are multiples of this number |
| --group / -G | [StandardAnnotation] | One or more classes/groups of annotations to apply to variant calls |
| --convertToBasePairResolution / -bpResolution | false | If specified, convert banded gVCFs to all-sites gVCFs |
| --annotation / -A | [AS_RMSMappingQuality] | One or more specific annotations to recompute. The single value 'none' removes the default annotations |
Arguments in this list are specific to this tool. Keep in mind that other arguments, such as the command-line GATK engine arguments, are shared with other tools; see the inherited arguments described above.
--annotation / -A
One or more specific annotations to recompute. The single value 'none' removes the default annotations.
Which annotations to recompute for the combined output VCF file.
--breakBandsAtMultiplesOf
If > 0, reference bands will be broken up at genomic positions that are multiples of this number.
To reduce file size, gVCFs group similar reference positions into bands. However, in some cases users need to know that no band spans a given genomic position (e.g. when scatter-gathering jobs across a compute farm). This option breaks bands at predefined positions: for example, a value of 10,000 ensures that no band spans across chr1:10000, chr1:20000, etc. The --convertToBasePairResolution argument is simply a special case of this argument with a value of 1.
Type: int. Default: 0. Range: no minimum or maximum (-∞ to ∞).
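The band-breaking rule can be sketched in Python as a function on a single band's inclusive coordinates. This is a minimal illustration, not GATK's implementation: it assumes that each position divisible by n starts a new band (GATK's exact boundary convention is an assumption here), and the function name break_band is hypothetical.

```python
from typing import List, Tuple

def break_band(start: int, end: int, n: int) -> List[Tuple[int, int]]:
    """Split an inclusive reference band [start, end] so that every position
    that is a multiple of n begins a new band (assumed convention).
    n <= 0 leaves the band intact, like the default value of 0."""
    if n <= 0:
        return [(start, end)]
    pieces: List[Tuple[int, int]] = []
    cur = start
    nxt = (cur // n + 1) * n  # first multiple of n strictly after cur
    while nxt <= end:
        pieces.append((cur, nxt - 1))  # close the current band just before the multiple
        cur = nxt                      # the multiple itself starts the next band
        nxt += n
    pieces.append((cur, end))
    return pieces

print(break_band(9995, 20005, 10000))
# [(9995, 9999), (10000, 19999), (20000, 20005)]
print(break_band(5, 7, 1))
# [(5, 5), (6, 6), (7, 7)]
```

With n = 1 every position becomes its own band, which is consistent with --convertToBasePairResolution being the special case of a value of 1.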
--convertToBasePairResolution / -bpResolution
If specified, convert banded gVCFs to all-sites gVCFs.
Type: boolean. Default: false.
--dbsnp / -D
The rsIDs from this file are used to populate the ID column of the output. The DB INFO flag is also set when appropriate. dbSNP is not used in any way for the calculations themselves.
This argument supports reference-ordered data (ROD) files in the following formats: BCF2, VCF, VCF3.
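The effect of a dbSNP file on the output can be sketched in Python as a lookup that touches only the ID column and the DB flag. This is a simplified illustration, not GATK's matching logic: it assumes an exact match on (chrom, pos, ref) plus at least one shared ALT allele, and the Record class, the in-memory index, and the rsID rs12345 are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

@dataclass
class Record:
    chrom: str
    pos: int
    ref: str
    alts: Tuple[str, ...]
    id: str = "."                                      # VCF ID column
    info_flags: Set[str] = field(default_factory=set)  # INFO flags such as DB

# Hypothetical in-memory index: (chrom, pos, ref) -> (rsID, known ALT alleles)
DbsnpIndex = Dict[Tuple[str, int, str], Tuple[str, Set[str]]]

def annotate(record: Record, dbsnp: DbsnpIndex) -> Record:
    """Copy the rsID into the ID column and set the DB flag on a match.
    Only ID and INFO change; genotypes and likelihoods are not touched."""
    hit = dbsnp.get((record.chrom, record.pos, record.ref))
    if hit is not None:
        rsid, known_alts = hit
        if known_alts & set(record.alts):  # simplified allele match (assumption)
            record.id = rsid
            record.info_flags.add("DB")
    return record

index: DbsnpIndex = {("chr1", 100, "A"): ("rs12345", {"G"})}
print(annotate(Record("chr1", 100, "A", ("G", "<NON_REF>")), index).id)
# rs12345
```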
--group / -G
One or more classes/groups of annotations to apply to variant calls.
Which groups of annotations to add to the output VCF file. The single value 'none' removes the default group. See the VariantAnnotator -list argument to view available groups. Note that specifying annotations by group is not recommended because it obscures the specific requirements of individual annotations; any requirements that are not met (e.g. a missing pedigree file for a pedigree-based annotation) may cause the run to fail.
--out / -o
File to which the combined gVCF should be written. Default: stdout.
--variant / -V
One or more input gVCF files.
The gVCF files to merge together.
Type: List[RodBindingCollection[VariantContext]]. Default: NA. Required.
GATK version 3.8-0-ge9d806836 built at 2017/07/29 01:40:22.