Showing docs for version 3.6-0 | The latest version is 4.1.4.0


ValidateVariants

Validate a VCF file with an extra strict set of criteria

Category Variant Evaluation Tools

Traversal LocusWalker

PartitionBy LOCUS


Overview

This tool is designed to validate much of the information inside a VCF file. In addition to standard adherence to the VCF specification, this tool performs extra strict validations to ensure the information contained within the file is correct. These include:

REF
the correctness of the reference base(s).
CHR_COUNTS
accuracy of AC & AN values.
IDS
tests against rsIDs when a dbSNP file is provided. For this check to work, you must supply a dbSNP variant file with --dbsnp, as shown in the examples below.
ALLELES
whether every alternate allele is present in at least one sample.

By default, the tool applies all the strict validations unless you exclude specific ones using -Xtype|--validationTypeToExclude <code>, where <code> is one of the types listed above. You can exclude as many types as you want.

You can exclude all strict validations with the special code ALL. In this case, the tool only tests adherence to the VCF specification.
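
As a minimal sketch of excluding more than one validation type, the snippet below assumes the usual GATK 3 convention that a list-valued argument such as -Xtype is given once per value. The helper name xtype_flags is hypothetical and not part of GATK. The script only prints the resulting command so it can be checked before running it.

```shell
#!/bin/sh
# Turn a list of validation types into repeated -Xtype flags.
# xtype_flags is a hypothetical helper, not part of GATK.
xtype_flags() {
  flags=""
  for t in "$@"; do
    flags="$flags -Xtype $t"
  done
  # Strip the leading space before printing.
  printf '%s\n' "${flags# }"
}

# Print the full command for inspection instead of running it.
echo "java -jar GenomeAnalysisTK.jar -T ValidateVariants" \
     "-R reference.fasta -V input.vcf $(xtype_flags IDS CHR_COUNTS)"
```

Here the IDS and CHR_COUNTS checks would be skipped, while REF and ALLELES would still be applied.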

Input

A variant set to validate using -V or --variant as shown below.

Usage examples

To perform VCF format tests and all strict validations

 java -jar GenomeAnalysisTK.jar \
   -T ValidateVariants \
   -R reference.fasta \
   -V input.vcf \
   --dbsnp dbsnp.vcf
 

To perform VCF format tests and all strict validations on VCFs containing alleles of up to 208 bases

 java -jar GenomeAnalysisTK.jar \
   -T ValidateVariants \
   -R reference.fasta \
   -V input.vcf \
   --dbsnp dbsnp.vcf \
   --reference_window_stop 208
 

To perform only VCF format tests

 java -jar GenomeAnalysisTK.jar \
   -T ValidateVariants \
   -R reference.fasta \
   -V input.vcf \
   --validationTypeToExclude ALL
 

To perform all validations except the strict ALLELES validation

 java -jar GenomeAnalysisTK.jar \
   -T ValidateVariants \
   -R reference.fasta \
   -V input.vcf \
   --validationTypeToExclude ALLELES
 

Additional Information

Read filters

These Read Filters are automatically applied to the data by the Engine before processing by ValidateVariants.

Window size

This tool uses a sliding window on the reference.

  • Window start: 0 bp before the locus
  • Window stop: 100 bp after the locus

Command-line Arguments

Engine arguments

All tools inherit arguments from the GATK Engine's "CommandLineGATK" argument collection, which can be used to modify various aspects of the tool's function. For example, the -L argument directs the GATK engine to restrict processing to specific genomic intervals, and the -rf argument applies read filters that exclude some of the data from the analysis.

ValidateVariants specific arguments

This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the list further down below the table or click on an argument name to jump directly to that entry in the list.

Argument name(s)                     Default value  Summary

Required Inputs
--variant, -V                        NA             Input VCF file

Optional Inputs
--dbsnp, -D                          none           dbSNP file

Optional Parameters
--validationTypeToExclude, -Xtype    []             which validation type to exclude from a full strict validation

Optional Flags
--doNotValidateFilteredRecords       false          skip validation on filtered records
--validateGVCF, -gvcf                false          Validate this file as a GVCF
--warnOnErrors                       false          just emit warnings on errors instead of terminating the run at the first instance

Argument details

Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Engine arguments above.


--dbsnp / -D

dbSNP file

This argument supports reference-ordered data (ROD) files in the following formats: BCF2, VCF, VCF3

RodBinding[VariantContext]  none


--doNotValidateFilteredRecords / -doNotValidateFilteredRecords

skip validation on filtered records
By default, even filtered records are validated.

Boolean  false


--validateGVCF / -gvcf

Validate this file as a GVCF
Validate this file as a GVCF. In particular, every variant must include a <NON_REF> allele, and every base in the territory under consideration must be covered by a variant (or a reference block). If you specified intervals (using -L or -XL) to restrict analysis to a subset of genomic regions, those intervals must be covered in a valid GVCF.

Boolean  false
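
A possible GVCF validation run restricted to one interval might look like the sketch below. The file name sample.g.vcf and the interval 20 are placeholders chosen for illustration. Substitute your own GVCF and a contig or range that exists in your reference. The script prints the command rather than executing it.

```shell
#!/bin/sh
# File name and interval are placeholders, not values from the GATK docs.
GVCF="sample.g.vcf"
INTERVAL="20"

# -gvcf enables the GVCF checks; -L limits them to one interval, which the
# GVCF must then cover completely to pass.
CMD="java -jar GenomeAnalysisTK.jar -T ValidateVariants \
-R reference.fasta -V $GVCF -gvcf -L $INTERVAL"

# Print the command for review; paste it into a shell to run it.
echo "$CMD"
```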


--validationTypeToExclude / -Xtype

which validation type to exclude from a full strict validation

List[ValidationType]  []


--variant / -V

Input VCF file
Variants from this VCF file are used by this tool as input. The file must at least contain the standard VCF header lines, but can be empty (i.e., no variants are contained in the file).

This argument supports reference-ordered data (ROD) files in the following formats: BCF2, VCF, VCF3

Required  RodBinding[VariantContext]  NA


--warnOnErrors / -warnOnErrors

just emit warnings on errors instead of terminating the run at the first instance

Boolean  false
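
To collect every problem in one pass rather than stopping at the first, one option is to combine --warnOnErrors with a log file, as in this sketch. The log name validate.log is a hypothetical choice, and the format of the warning messages is not specified here, so inspect the log by hand rather than relying on a particular pattern. The script prints the command instead of running it.

```shell
#!/bin/sh
# validate.log is a hypothetical file name chosen for this sketch.
LOG="validate.log"

# --warnOnErrors keeps the run going past the first invalid record;
# "> file 2>&1" sends both output streams to the log file.
CMD="java -jar GenomeAnalysisTK.jar -T ValidateVariants \
-R reference.fasta -V input.vcf --warnOnErrors > $LOG 2>&1"

# Print the command for review; paste it into a shell to run it.
echo "$CMD"
```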



GATK version 3.6-0-g89b7209 built at 2017/02/09 12:52:48.