Showing docs for version 3.6-0 | The latest version is 4.1.4.0


CombineGVCFs

Combine per-sample gVCF files produced by HaplotypeCaller into a multi-sample gVCF file

Category: Variant Manipulation Tools

Traversal: LocusWalker

PartitionBy: LOCUS


Overview

CombineGVCFs is meant for hierarchical merging of gVCFs that will eventually be input to GenotypeGVCFs. Use this tool when there are too many individual gVCFs to genotype at once: instead of passing them all to GenotypeGVCFs, first run CombineGVCFs on smaller batches of samples, then pass the combined gVCFs to GenotypeGVCFs.
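As a rough illustration of this batching workflow, the following Python sketch splits a list of per-sample gVCFs into batches and builds one CombineGVCFs command line per batch, using only options that appear in this document (-T, -R, --variant, -o). The batch size, the file names, the jar path, and the helper names plan_batches and combine_command are assumptions made for the example. The commands are only constructed, not executed.

```python
from typing import List


def plan_batches(gvcfs: List[str], batch_size: int) -> List[List[str]]:
    """Split the input gVCF paths into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [gvcfs[i:i + batch_size] for i in range(0, len(gvcfs), batch_size)]


def combine_command(reference: str, inputs: List[str], output: str,
                    jar: str = "GenomeAnalysisTK.jar") -> List[str]:
    """Build (but do not run) a CombineGVCFs command for one batch."""
    cmd = ["java", "-jar", jar, "-T", "CombineGVCFs", "-R", reference]
    for path in inputs:
        cmd += ["--variant", path]
    cmd += ["-o", output]
    return cmd


# Hypothetical cohort of six samples, merged in batches of three.
samples = [f"sample{i}.g.vcf" for i in range(1, 7)]
batches = plan_batches(samples, batch_size=3)
commands = [
    combine_command("reference.fasta", batch, f"batch{n}.g.vcf")
    for n, batch in enumerate(batches, start=1)
]
for cmd in commands:
    print(" ".join(cmd))
```

Each argument list could be run with subprocess.run(cmd, check=True), and the resulting batch files would then be passed together to GenotypeGVCFs.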

Input

Two or more HaplotypeCaller gVCFs to combine.

Output

A combined multi-sample gVCF.

Usage example

 java -jar GenomeAnalysisTK.jar \
   -T CombineGVCFs \
   -R reference.fasta \
   --variant sample1.g.vcf \
   --variant sample2.g.vcf \
   -o cohort.g.vcf
 

Caveat

Only gVCF files produced by HaplotypeCaller (or CombineGVCFs) can be used as input for this tool. Some other programs produce files that they call gVCFs but those lack some important information (accurate genotype likelihoods for every position) that GenotypeGVCFs requires for its operation.


Additional Information

Read filters

These Read Filters are automatically applied to the data by the Engine before processing by CombineGVCFs.

Window size

This tool uses a sliding window on the reference.

  • Window start: 0 bp before the locus
  • Window stop: 1 bp after the locus

Command-line Arguments

Engine arguments

All tools inherit arguments from the GATK Engine's "CommandLineGATK" argument collection, which can be used to modify various aspects of the tool's function. For example, the -L argument directs the GATK engine to restrict processing to specific genomic intervals, and the -rf argument lets you apply certain read filters to exclude some of the data from the analysis.
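For instance, a wrapper script could append the -L engine argument to a CombineGVCFs command to restrict it to one interval at a time. The sketch below is in Python. The helper name with_interval, the interval strings, and the file names are assumptions for illustration, and the commands are only built, not run.

```python
from typing import List


def with_interval(base_cmd: List[str], interval: str) -> List[str]:
    """Return a copy of base_cmd restricted to one interval via the -L engine argument."""
    return base_cmd + ["-L", interval]


base = [
    "java", "-jar", "GenomeAnalysisTK.jar",
    "-T", "CombineGVCFs",
    "-R", "reference.fasta",
    "--variant", "sample1.g.vcf",
    "--variant", "sample2.g.vcf",
]

# Hypothetical intervals; one command (and one output file) per interval.
intervals = ["chr1:1-10000", "chr1:10001-20000"]
per_interval = [
    with_interval(base, interval) + ["-o", f"cohort.part{n}.g.vcf"]
    for n, interval in enumerate(intervals, start=1)
]
for cmd in per_interval:
    print(" ".join(cmd))
```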

CombineGVCFs specific arguments

This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the Argument details list below the table.

Argument name(s) | Default value | Summary

Required Inputs
--variant, -V | NA | One or more input gVCF files

Optional Inputs
--dbsnp, -D | none | dbSNP file

Optional Outputs
--out, -o | stdout | File to which the combined gVCF should be written

Optional Parameters
--breakBandsAtMultiplesOf | 0 | If > 0, reference bands will be broken up at genomic positions that are multiples of this number
--group, -G | [StandardAnnotation] | One or more classes/groups of annotations to apply to variant calls

Optional Flags
--convertToBasePairResolution, -bpResolution | false | If specified, convert banded gVCFs to all-sites gVCFs

Advanced Parameters
--annotation, -A | [AS_RMSMappingQuality] | One or more specific annotations to recompute. The single value 'none' removes the default annotations

Argument details

Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Engine arguments above.


--annotation / -A

One or more specific annotations to recompute. The single value 'none' removes the default annotations
Which annotations to recompute for the combined output VCF file.

List[String]  [AS_RMSMappingQuality]


--breakBandsAtMultiplesOf / -breakBandsAtMultiplesOf

If > 0, reference bands will be broken up at genomic positions that are multiples of this number
To reduce file sizes, gVCFs group similar reference positions into bands. However, there are cases where users need to know that no band spans a given genomic position (e.g. when scatter-gathering jobs across a compute farm). This option lets users break bands at predefined positions. For example, a value of 10,000 ensures that no band spans chr1:10000, chr1:20000, etc. Note that the --convertToBasePairResolution argument is just a special case of this argument with a value of 1.

int  0  [ [ -∞  ∞ ] ]
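The effect of this argument on a single reference band can be pictured with a small Python sketch. It assumes 1-based inclusive band coordinates, and it assumes that a new band starts at each multiple of the given value (so a band ending at 9999 is followed by one starting at 10000). The exact boundary convention used by the tool is not stated here, and split_band is a hypothetical helper, not part of GATK.

```python
from typing import List, Tuple


def split_band(start: int, end: int, multiple: int) -> List[Tuple[int, int]]:
    """Split the inclusive band [start, end] so that a new piece begins at
    every position that is a multiple of `multiple` (assumed convention)."""
    if multiple <= 0:
        # A value of 0 (the default) leaves the band untouched.
        return [(start, end)]
    pieces = []
    piece_start = start
    # First multiple strictly greater than the band start.
    next_break = (start // multiple + 1) * multiple
    while next_break <= end:
        pieces.append((piece_start, next_break - 1))
        piece_start = next_break
        next_break += multiple
    pieces.append((piece_start, end))
    return pieces


print(split_band(9500, 20500, 10000))
print(split_band(5, 8, 1))
```

Under these assumptions, the first call yields three pieces, (9500, 9999), (10000, 19999) and (20000, 20500), and the second yields one piece per position, which mirrors the note that --convertToBasePairResolution behaves like a value of 1.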


--convertToBasePairResolution / -bpResolution

If specified, convert banded gVCFs to all-sites gVCFs

boolean  false


--dbsnp / -D

dbSNP file
The rsIDs from this file are used to populate the ID column of the output. Also, the DB INFO flag will be set when appropriate. Note that dbSNP is not used in any way for the calculations themselves.

This argument supports reference-ordered data (ROD) files in the following formats: BCF2, VCF, VCF3

RodBinding[VariantContext]  none
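The annotation behaviour described for this argument can be modelled roughly as follows. This Python sketch is a simplified illustration, not GATK's implementation: it assumes records are plain dictionaries with a single alternate allele, that a dbSNP entry matches when chromosome, position, reference allele and alternate allele all agree, and that annotate_with_dbsnp and the rsID shown are hypothetical.

```python
from typing import Dict, Tuple

# (chromosome, position, ref allele, alt allele) -> rsID; made-up dbSNP content
DbsnpIndex = Dict[Tuple[str, int, str, str], str]


def annotate_with_dbsnp(record: dict, dbsnp: DbsnpIndex) -> dict:
    """Return a copy of record with ID and the DB INFO flag set on a dbSNP match.

    All other fields are copied unchanged, reflecting that dbSNP plays no
    part in the calculations themselves.
    """
    annotated = dict(record, info=dict(record.get("info", {})))
    key = (record["chrom"], record["pos"], record["ref"], record["alt"])
    rsid = dbsnp.get(key)
    if rsid is not None:
        annotated["id"] = rsid
        annotated["info"]["DB"] = True
    return annotated


dbsnp = {("chr1", 10177, "A", "AC"): "rs123"}  # hypothetical entry
known = {"chrom": "chr1", "pos": 10177, "ref": "A", "alt": "AC",
         "id": ".", "info": {"DP": 30}}
novel = {"chrom": "chr1", "pos": 20000, "ref": "G", "alt": "T",
         "id": ".", "info": {"DP": 12}}

print(annotate_with_dbsnp(known, dbsnp))
print(annotate_with_dbsnp(novel, dbsnp))
```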


--group / -G

One or more classes/groups of annotations to apply to variant calls
Which groups of annotations to add to the output VCF file. The single value 'none' removes the default group. See the VariantAnnotator -list argument to view available groups. Note that this usage is not recommended because it obscures the specific requirements of individual annotations. Any requirements that are not met (e.g. failing to provide a pedigree file for a pedigree-based annotation) may cause the run to fail.

String[]  [StandardAnnotation]


--out / -o

File to which the combined gVCF should be written

VariantContextWriter  stdout


--variant / -V

One or more input gVCF files
The gVCF files to merge together

R List[RodBindingCollection[VariantContext]]  NA





GATK version 3.6-0-g89b7209 built at 2017/02/09 12:52:48.