Showing tool doc from version 4.1.0.0 | The latest version is 4.1.1.0

**EXPERIMENTAL** CNNVariantWriteTensors

Write variant tensors for training a CNN to filter variants

Category: Variant Filtering


Overview

This tool writes variant tensors for training a Convolutional Neural Network (CNN) to filter variants. After running it, train a model on the tensors with the CNNVariantTrain tool.

Inputs

  • The input VCF (--variant / -V) contains the variants to make into tensors. These variant calls must be annotated with the standard Best Practices annotations.
  • The truth VCF (--truth-vcf) contains validated variant calls, such as those in the Genome in a Bottle, Platinum Genomes, or CHM VCFs. Variants present in both the input VCF and the truth VCF are used as positive training data.
  • The truth BED (--truth-bed) is a BED file that defines the confident region for the validated calls. Variants from the input VCF that fall inside this region but are not in the truth VCF are used as negative training data.
  • The --tensor-type argument determines what type of tensors is written. Set it to "reference" to write 1D tensors or "read_tensor" to write 2D tensors.
  • The --bam-file argument is required to write 2D tensors, which incorporate read data.
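The labeling rule in the inputs list can be read as: a variant found in the truth VCF is positive, and a variant inside the confident region but absent from the truth VCF is negative. The sketch below encodes that reading of the description; it is not GATK's implementation. It assumes variants are matched on (contig, position, ref, alt) with one ALT allele per record, that BED intervals are 0-based half-open while VCF positions are 1-based, and all function names are hypothetical.

```python
# Hypothetical sketch of the labeling rule described above; not GATK's code.
# Assumptions: variants match the truth VCF on (contig, pos, ref, alt), one ALT
# allele per record; BED intervals are 0-based half-open, VCF positions 1-based.
from typing import Dict, List, Optional, Set, Tuple

Variant = Tuple[str, int, str, str]  # (contig, 1-based pos, ref, alt)
BedIntervals = Dict[str, List[Tuple[int, int]]]  # contig -> [(start, end), ...]


def in_confident_region(contig: str, pos: int, bed: BedIntervals) -> bool:
    """Return True if a 1-based VCF position lies inside any BED interval."""
    zero_based = pos - 1
    return any(start <= zero_based < end for start, end in bed.get(contig, []))


def variant_kind(ref: str, alt: str) -> Optional[str]:
    """Classify as SNP or INDEL; other types (e.g. MNPs) return None here."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "INDEL"
    return None


def label_variant(v: Variant, truth: Set[Variant],
                  bed: BedIntervals) -> Optional[str]:
    """Return 'positive', 'negative', or None if the variant gets no label."""
    if v in truth:
        return "positive"  # present in both the input VCF and the truth VCF
    contig, pos, _, _ = v
    if in_confident_region(contig, pos, bed):
        return "negative"  # inside the confident region, absent from truth
    return None  # outside the confident region and not in truth
```

Whether the tool also restricts positives to the confident region is not stated on this page, so the sketch does not apply that check.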

Outputs

  • --output-tensor-dir: this directory is created and populated with variant tensors. It is divided into training, validation, and test sets, and each set is further divided into positive and negative SNPs and INDELs.
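The output layout can be pictured as a three-level tree: split, then label and variant type, then one tensor per variant. The sketch below is only an illustration under stated assumptions: the 80/10/10 split fractions, the `positive_SNP`-style folder names, the hash-based split assignment, and the `.hd5` (HDF5) file naming are all hypothetical and not taken from the tool.

```python
# Hypothetical sketch of the output layout described above; not GATK's code.
# The split fractions, folder names, and .hd5 file names are assumptions.
import hashlib
import os

# Hypothetical train/validation/test fractions (must sum to 1).
SPLITS = (("train", 0.8), ("valid", 0.1), ("test", 0.1))


def assign_split(variant_key: str) -> str:
    """Deterministically map a variant to a split by hashing its key."""
    digest = hashlib.md5(variant_key.encode("utf-8")).hexdigest()
    u = int(digest[:8], 16) / 2**32  # approximately uniform value in [0, 1)
    cumulative = 0.0
    for name, fraction in SPLITS:
        cumulative += fraction
        if u < cumulative:
            return name
    return SPLITS[-1][0]  # guard against floating-point rounding


def tensor_path(out_dir: str, variant_key: str, label: str, kind: str) -> str:
    """Return <out_dir>/<split>/<label>_<kind>/<variant_key>.hd5."""
    split = assign_split(variant_key)
    return os.path.join(out_dir, split, f"{label}_{kind}", f"{variant_key}.hd5")
```

Hashing the variant key, rather than drawing a random number, keeps the assignment stable across reruns, so a given variant never moves between the training and test sets.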

Usage example

Write Reference Tensors

 gatk CNNVariantWriteTensors \
   -R reference.fasta \
   -V input.vcf.gz \
   -truth-vcf platinum-genomes.vcf \
   -truth-bed platinum-confident-region.bed \
   -tensor-type reference \
   -output-tensor-dir my-tensor-folder
 

Write Read Tensors

 gatk CNNVariantWriteTensors \
   -R reference.fasta \
   -V input.vcf.gz \
   -truth-vcf platinum-genomes.vcf \
   -truth-bed platinum-confident-region.bed \
   -tensor-type read_tensor \
   -bam-file input.bam \
   -output-tensor-dir my-tensor-folder
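The two usage examples differ only in the tensor type and the BAM input. As a sketch, a hypothetical Python helper (not part of GATK) can assemble either command as an argument list using only the flags shown on this page, rejecting a read-tensor run that lacks a BAM file.

```python
# Hypothetical helper that builds the usage examples above as an argument list.
# It uses only flags documented on this page; it does not run GATK itself.
from typing import List, Optional


def build_write_tensors_cmd(reference: str, variants: str, truth_vcf: str,
                            truth_bed: str, out_dir: str,
                            tensor_type: str = "reference",
                            bam_file: Optional[str] = None) -> List[str]:
    if tensor_type not in ("reference", "read_tensor"):
        raise ValueError(f"unknown tensor type: {tensor_type}")
    if tensor_type == "read_tensor" and not bam_file:
        # 2D read tensors incorporate read data, so a BAM file is required.
        raise ValueError("read_tensor requires --bam-file")
    cmd = ["gatk", "CNNVariantWriteTensors",
           "-R", reference,
           "-V", variants,
           "-truth-vcf", truth_vcf,
           "-truth-bed", truth_bed,
           "-tensor-type", tensor_type,
           "-output-tensor-dir", out_dir]
    if bam_file:
        cmd += ["-bam-file", bam_file]
    return cmd
```

With `gatk` on the PATH, the list can be passed to `subprocess.run(cmd, check=True)`.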
 

CNNVariantWriteTensors specific arguments

This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the argument details list below the table.

Required Arguments

| Argument name(s) | Default value | Summary |
|---|---|---|
| --output-tensor-dir | null | Directory of training tensors. Subdivided into train, valid and test sets. |
| --reference, -R | null | Reference fasta file. |
| --truth-bed | null | Confident region of the validated VCF file. |
| --truth-vcf | null | Validated VCF file. |
| --variant, -V | null | Input VCF file |

Optional Tool Arguments

| Argument name(s) | Default value | Summary |
|---|---|---|
| --arguments_file | [] | read one or more arguments files and add them to the command line |
| --bam-file | "" | BAM or BAMout file to use for read data when generating 2D tensors. |
| --downsample-indels | 0.5 | Fraction of INDELs to write tensors for. |
| --downsample-snps | 0.05 | Fraction of SNPs to write tensors for. |
| --gcs-max-retries, -gcs-retries | 20 | If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection |
| --gcs-project-for-requester-pays | "" | Project to bill when accessing "requester pays" buckets. If unset, these buckets cannot be accessed. |
| --help, -h | false | display the help message |
| --max-tensors | 1000000 | Maximum number of tensors to write. |
| --tensor-type | reference | Name of the tensors to generate. |
| --version | false | display the version number for this tool |

Optional Common Arguments

| Argument name(s) | Default value | Summary |
|---|---|---|
| --gatk-config-file | null | A configuration file to use with the GATK. |
| --QUIET | false | Whether to suppress job-summary info on System.err. |
| --tmp-dir | null | Temp directory to use. |
| --use-jdk-deflater, -jdk-deflater | false | Whether to use the JdkDeflater (as opposed to IntelDeflater) |
| --use-jdk-inflater, -jdk-inflater | false | Whether to use the JdkInflater (as opposed to IntelInflater) |
| --verbosity | INFO | Control verbosity of logging. |

Advanced Arguments

| Argument name(s) | Default value | Summary |
|---|---|---|
| --annotation-set | best_practices | Which set of annotations to use. |
| --channels-last | true | Store the channels in the last axis of tensors, tensorflow->true, theano->false |
| --showHidden | false | display hidden arguments |

Argument details

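The list below describes each argument from the table above, in alphabetical order. Each entry gives the long and short argument names (NA if there is no short name), a description, the type, the default value and, for numeric arguments, the allowed range. Keep in mind that other arguments, shared with other GATK tools, are also available.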


--annotation-set / -annotation-set

Which set of annotations to use.

String  best_practices


--arguments_file / NA

read one or more arguments files and add them to the command line

List[File]  []


--bam-file / -bam-file

BAM or BAMout file to use for read data when generating 2D tensors.

String  ""


--channels-last / -channels-last

Store the channels in the last axis of tensors. Set to true for TensorFlow (channels-last layout) or false for Theano (channels-first layout).

boolean  true
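The --channels-last flag only changes where the channel axis sits. For a 2D tensor with hypothetical dimensions `rows`, `cols`, and `channels`, true stores it as (rows, cols, channels), the layout TensorFlow expects by default, and false as (channels, rows, cols), the layout Theano used. The pure-Python helper below is a hypothetical illustration, not part of GATK; it converts a channels-first nested list into channels-last order.

```python
# Hypothetical sketch of what --channels-last changes: the position of the
# channel axis. Pure Python, so no TensorFlow or NumPy is needed.
from typing import List

Tensor3D = List[List[List[float]]]


def channels_first_to_last(t: Tensor3D) -> Tensor3D:
    """Convert a (channels, rows, cols) nested list to (rows, cols, channels)."""
    channels, rows, cols = len(t), len(t[0]), len(t[0][0])
    # out[r][k][c] == t[c][r][k]: the same values, with the channel axis moved last.
    return [[[t[c][r][k] for c in range(channels)] for k in range(cols)]
            for r in range(rows)]
```

For example, a tensor of shape (2, 2, 3) in channels-first order becomes shape (2, 3, 2) in channels-last order; no values change, only their indexing.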


--downsample-indels / -downsample-indels

Fraction of INDELs to write tensors for.

float  0.5  [ [ -∞  ∞ ] ]


--downsample-snps / -downsample-snps

Fraction of SNPs to write tensors for.

float  0.05  [ [ -∞  ∞ ] ]
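Together, --downsample-snps, --downsample-indels, and --max-tensors bound how many tensors are written. The sketch below assumes each variant is kept independently with the fraction for its type and that writing stops once the cap is reached; this is an assumption about the sampling scheme, and the function name is hypothetical.

```python
# Hypothetical sketch of per-type downsampling plus a global cap; not GATK's code.
# Assumption: each variant is kept independently with its type's fraction.
import random
from typing import Iterable, List, Tuple


def downsample(variants: Iterable[Tuple[str, str]],  # (variant_id, "SNP" or "INDEL")
               snp_fraction: float = 0.05,
               indel_fraction: float = 0.5,
               max_tensors: int = 1_000_000,
               seed: int = 0) -> List[str]:
    """Keep each variant with its type's probability, up to max_tensors."""
    rng = random.Random(seed)  # seeded so the selection is reproducible
    kept: List[str] = []
    for variant_id, kind in variants:
        if len(kept) >= max_tensors:
            break  # the global cap (--max-tensors) has been reached
        fraction = snp_fraction if kind == "SNP" else indel_fraction
        if rng.random() < fraction:
            kept.append(variant_id)
    return kept
```

Under these assumptions, a hypothetical call set with 4,000,000 SNPs and 500,000 INDELs yields about 4,000,000 × 0.05 + 500,000 × 0.5 = 200,000 + 250,000 = 450,000 tensors with the default settings, well under the default cap of 1,000,000.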


--gatk-config-file / NA

A configuration file to use with the GATK.

String  null


--gcs-max-retries / -gcs-retries

Number of times to attempt to re-initiate the connection if the GCS bucket channel errors out.

int  20  [ [ -∞  ∞ ] ]
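The retry count can be read as: after a channel error, reopen the connection and retry, up to the given number of times, before failing. The minimal sketch below illustrates that reading only; it omits the backoff a real client would typically add, and `open_channel` and `read_once` are hypothetical placeholder callables, not GATK or Google Cloud Storage APIs.

```python
# Hypothetical sketch of --gcs-max-retries semantics: on a channel error,
# re-initiate the connection up to max_retries times before giving up.
# open_channel and read_once are placeholders, not GATK or GCS APIs.
from typing import Callable, TypeVar

T = TypeVar("T")


def read_with_retries(open_channel: Callable[[], object],
                      read_once: Callable[[object], T],
                      max_retries: int = 20) -> T:
    channel = open_channel()
    attempts = 0
    while True:
        try:
            return read_once(channel)
        except OSError:
            if attempts >= max_retries:
                raise  # retries exhausted: surface the last error
            attempts += 1
            channel = open_channel()  # re-initiate the connection
```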


--gcs-project-for-requester-pays / NA

Project to bill when accessing "requester pays" buckets. If unset, these buckets cannot be accessed.

String  ""


--help / -h

display the help message

boolean  false


--max-tensors / -max-tensors

Maximum number of tensors to write.

int  1000000  [ [ 0  ∞ ] ]


--output-tensor-dir / -output-tensor-dir

Directory of training tensors. Subdivided into train, valid and test sets.

String  null  (required)


--QUIET / NA

Whether to suppress job-summary info on System.err.

Boolean  false


--reference / -R

Reference fasta file.

String  null  (required)


--showHidden / -showHidden

display hidden arguments

boolean  false


--tensor-type / -tensor-type

Type of tensors to generate.

The --tensor-type argument is an enumerated type (TensorType), which can have one of the following values:

reference (writes 1D tensors)
read_tensor (writes 2D tensors that incorporate read data; requires --bam-file)

TensorType  reference


--tmp-dir / NA

Temp directory to use.

String  null


--truth-bed / -truth-bed

Confident region of the validated VCF file.

String  null  (required)


--truth-vcf / -truth-vcf

Validated VCF file.

String  null  (required)


--use-jdk-deflater / -jdk-deflater

Whether to use the JdkDeflater (as opposed to IntelDeflater)

boolean  false


--use-jdk-inflater / -jdk-inflater

Whether to use the JdkInflater (as opposed to IntelInflater)

boolean  false


--variant / -V

Input VCF file

String  null  (required)


--verbosity / -verbosity

Control verbosity of logging.

The --verbosity argument is an enumerated type (LogLevel), which can have one of the following values:

ERROR
WARNING
INFO
DEBUG

LogLevel  INFO


--version / NA

display the version number for this tool

boolean  false





GATK version 4.1.0.0 built at Wed, 30 Jan 2019 10:21:04 +0530.