Showing tool doc from version 4.0.4.0 | The latest version is 4.1.2.0

**EXPERIMENTAL** CNNVariantTrain

Train a CNN model for filtering variants

Category Variant Evaluation and Refinement


Overview

Train a Convolutional Neural Network (CNN) for filtering variants. This tool requires training data generated by CNNVariantWriteTensors.

Inputs

  • input-tensor-dir The training data created by CNNVariantWriteTensors.
  • The tensor-type argument determines what types of tensors the model will expect. Set it to "reference" for 1D tensors or "read_tensor" for 2D tensors.

Outputs

  • output-dir The model weights file and semantic configuration json are saved here. This defaults to the current working directory.
  • model-name The name for your model.

Usage example

Train a 1D CNN on Reference Tensors

 gatk CNNVariantTrain \
   -tensor-type reference \
   -input-tensor-dir my_tensor_folder \
   -model-name my_1d_model
 

Train a 2D CNN on Read Tensors

 gatk CNNVariantTrain \
   -input-tensor-dir my_tensor_folder \
   -tensor-type read_tensor \
   -model-name my_2d_model
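
Train with a custom output directory and training schedule

As a further illustration, the sketch below combines the output and training-schedule arguments listed in the argument table in a single invocation. The directory names, model name, and numeric values are placeholders chosen for illustration, and the DRY_RUN switch is a convention of this sketch rather than a GATK feature. By default the script only prints the command it would run.

```shell
#!/usr/bin/env bash
# Sketch only: paths, model name, and step counts are illustrative placeholders.
set -euo pipefail

tensor_dir="my_tensor_folder"   # tensors written by CNNVariantWriteTensors
output_dir="my_model_folder"    # hypothetical output location
model_name="my_1d_model"

# Build the command as an array so each argument stays a separate word.
cmd=(gatk CNNVariantTrain
  --input-tensor-dir "$tensor_dir"
  --tensor-type reference
  --output-dir "$output_dir"
  --model-name "$model_name"
  --epochs 20
  --training-steps 50
  --validation-steps 5)

# DRY_RUN is a convention of this sketch, not a GATK option.
if [ "${DRY_RUN:-1}" = "1" ]; then
  # Print the command instead of running it.
  printf '%s\n' "${cmd[*]}"
else
  # Create the directory up front in case the tool does not.
  mkdir -p "$output_dir"
  "${cmd[@]}"
fi
```

Running the script with DRY_RUN=0 executes the command instead of printing it.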
 

CNNVariantTrain specific arguments

This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the list further down below the table or click on an argument name to jump directly to that entry in the list.

Argument name(s)                    Default value         Summary

Required Arguments
--input-tensor-dir                  null                  Directory of training tensors to create.

Optional Tool Arguments
--arguments_file                    []                    read one or more arguments files and add them to the command line
--epochs                            10                    Maximum number of training epochs.
--gcs-max-retries / -gcs-retries    20                    If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection
--help / -h                         false                 display the help message
--image-dir                         null                  Path where plots and figures are saved.
--model-name                        variant_filter_model  Name of the model to be trained.
--output-dir                        ./                    Directory where models will be saved, defaults to current working directory.
--tensor-type                       reference             Name of the tensors to generate, reference for 1D reference tensors and read_tensor for 2D tensors.
--training-steps                    10                    Number of training steps per epoch.
--validation-steps                  2                     Number of validation steps per epoch.
--version                           false                 display the version number for this tool

Optional Common Arguments
--gatk-config-file                  null                  A configuration file to use with the GATK.
--QUIET                             false                 Whether to suppress job-summary info on System.err.
--TMP_DIR                           []                    Undocumented option
--use-jdk-deflater / -jdk-deflater  false                 Whether to use the JdkDeflater (as opposed to IntelDeflater)
--use-jdk-inflater / -jdk-inflater  false                 Whether to use the JdkInflater (as opposed to IntelInflater)
--verbosity                         INFO                  Control verbosity of logging.

Advanced Arguments
--annotation-set                    best_practices        Which set of annotations to use.
--channels-last                     true                  Store the channels in the last axis of tensors, tensorflow->true, theano->false
--showHidden                        false                 display hidden arguments

Argument details

Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Inherited arguments above.


--annotation-set / -annotation-set

Which set of annotations to use.

String  best_practices


--arguments_file / NA

read one or more arguments files and add them to the command line

List[File]  []


--channels-last / -channels-last

Store the channels in the last axis of tensors, tensorflow->true, theano->false

boolean  true


--epochs / -epochs

Maximum number of training epochs.

int  10  [ [ 0  ∞ ] ]


--gatk-config-file / NA

A configuration file to use with the GATK.

String  null


--gcs-max-retries / -gcs-retries

If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection

int  20  [ [ -∞  ∞ ] ]


--help / -h

display the help message

boolean  false


--image-dir / -image-dir

Path where plots and figures are saved.

String  null


--input-tensor-dir / -input-tensor-dir

Directory of training tensors to create.

R String  null


--model-name / -model-name

Name of the model to be trained.

String  variant_filter_model


--output-dir / -output-dir

Directory where models will be saved, defaults to current working directory.

String  ./


--QUIET / NA

Whether to suppress job-summary info on System.err.

Boolean  false


--showHidden / -showHidden

display hidden arguments

boolean  false


--tensor-type / -tensor-type

Name of the tensors to generate, reference for 1D reference tensors and read_tensor for 2D tensors.

The --tensor-type argument is an enumerated type (TensorType), which can have one of the following values:

reference
read_tensor

TensorType  reference
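
Because only the two enumerated values are accepted, a wrapper script can check the value before launching a training run. The following is a minimal sketch of such a check. The helper name validate_tensor_type and the folder name are hypothetical and not part of GATK, and the script prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Sketch only: validate_tensor_type is a hypothetical helper, not part of GATK.
set -euo pipefail

# Succeeds only for the two TensorType values listed above.
validate_tensor_type() {
  case "$1" in
    reference|read_tensor)
      return 0 ;;
    *)
      echo "unsupported tensor type: $1 (expected reference or read_tensor)" >&2
      return 1 ;;
  esac
}

# Take the tensor type from the first script argument, defaulting to read_tensor.
tensor_type="${1:-read_tensor}"

if validate_tensor_type "$tensor_type"; then
  # Print the command that would be run; my_tensor_folder is a placeholder.
  echo "gatk CNNVariantTrain --input-tensor-dir my_tensor_folder --tensor-type $tensor_type"
fi
```

A hyphenated spelling such as read-tensor is rejected by this check, since it is not one of the listed values.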


--TMP_DIR / NA

Undocumented option

List[File]  []


--training-steps / -training-steps

Number of training steps per epoch.

int  10  [ [ 0  ∞ ] ]


--use-jdk-deflater / -jdk-deflater

Whether to use the JdkDeflater (as opposed to IntelDeflater)

boolean  false


--use-jdk-inflater / -jdk-inflater

Whether to use the JdkInflater (as opposed to IntelInflater)

boolean  false


--validation-steps / -validation-steps

Number of validation steps per epoch.

int  2  [ [ 0  ∞ ] ]
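
Taken together, --epochs, --training-steps, and --validation-steps bound the length of a training run. Since --epochs is described as a maximum, the products below are best read as upper limits. This is a small worked calculation with the default values, assuming every epoch runs its full number of steps.

```shell
#!/usr/bin/env bash
# Sketch only: assumes every epoch runs its full number of steps.
set -euo pipefail

epochs=10            # default of --epochs
training_steps=10    # default of --training-steps
validation_steps=2   # default of --validation-steps

# Upper bounds on the total work across the whole run.
max_training_steps=$(( epochs * training_steps ))
max_validation_steps=$(( epochs * validation_steps ))

echo "at most $max_training_steps training steps and $max_validation_steps validation steps"
```

With the defaults this comes to at most 100 training steps and 20 validation steps.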


--verbosity / -verbosity

Control verbosity of logging.

The --verbosity argument is an enumerated type (LogLevel), which can have one of the following values:

ERROR
WARNING
INFO
DEBUG

LogLevel  INFO


--version / NA

display the version number for this tool

boolean  false



GATK version 4.0.4.0 built at 23-40-2018 11:40:56.