Showing tool doc from version 4.0.3.0 | The latest version is 4.1.3.0

AnalyzeCovariates

Evaluate and compare base quality score recalibration (BQSR) tables

Category: Diagnostics and Quality Control


Overview

This tool generates plots to assess the quality of a recalibration run as part of the Base Quality Score Recalibration (BQSR) procedure.

Summary of the BQSR procedure

The goal of this procedure is to correct for systematic bias that affects the assignment of base quality scores by the sequencer. The first pass consists of calculating error empirically and finding patterns in how error varies with basecall features over all bases. The relevant observations are written to a recalibration table. The second pass consists of applying numerical corrections to each individual basecall based on the patterns identified in the first step (recorded in the recalibration table) and writing out the recalibrated data to a new BAM or CRAM file.
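The two passes described above can be sketched as a short shell function. This is only an illustration under stated assumptions: the file names (`recal1.table`, `recalibrated.bam`, and so on) are placeholders, the `run` helper and `DRY_RUN` variable are hypothetical conveniences that print commands instead of executing them, and the BaseRecalibrator and ApplyBQSR flags should be checked against the documentation for your installed GATK version.

```shell
#!/usr/bin/env bash
# Sketch of the BQSR passes that produce the tables AnalyzeCovariates plots.
# All file names are placeholders. With DRY_RUN=1 (the default here) each
# command is printed rather than executed; set DRY_RUN=0 to run it for real.

DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Arguments: input BAM, reference FASTA, known-sites VCF.
bqsr_two_pass() {
  local bam="$1" ref="$2" known="$3"

  # First pass: collect error observations into a recalibration table.
  run gatk BaseRecalibrator -I "$bam" -R "$ref" --known-sites "$known" -O recal1.table

  # Apply the corrections and write the recalibrated reads to a new BAM.
  run gatk ApplyBQSR -I "$bam" -R "$ref" --bqsr-recal-file recal1.table -O recalibrated.bam

  # Second pass: build a table from the recalibrated reads.
  run gatk BaseRecalibrator -I recalibrated.bam -R "$ref" --known-sites "$known" -O recal2.table

  # Plot the first-pass and second-pass tables together.
  run gatk AnalyzeCovariates -before recal1.table -after recal2.table -plots AnalyzeCovariates.pdf
}

# Example (prints four commands in dry-run mode):
#   bqsr_two_pass input.bam reference.fasta known_sites.vcf.gz
```

Printing the commands first makes it easy to review the table names that will later be passed to `-before` and `-after`.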

Inputs

The tool can take up to three different sets of recalibration tables. The resulting plots will be overlaid on top of each other to make comparisons easy.

Set | Argument | Label | Color | Description
Original | -before | BEFORE | Maroon1 | First-pass recalibration tables obtained by applying org.broadinstitute.hellbender.transformers.BQSRReadTransformer to the original alignment.
Recalibrated | -after | AFTER | Blue | Second-pass recalibration tables obtained by applying org.broadinstitute.hellbender.transformers.BQSRReadTransformer to the alignment recalibrated using the first-pass tables.
Input | -bqsr | BQSR | Black | Any recalibration table without a specific role.

You need to specify at least one set. Multiple sets need to have the same values for the following parameters:

covariate (order is not important), no_standard_covs, run_without_dbsnp, solid_recal_mode, solid_nocall_strategy, mismatches_context_size, mismatches_default_quality, deletions_default_quality, insertions_default_quality, maximum_cycle_value, low_quality_tail, default_platform, force_platform, quantizing_levels and binary_tag_name
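One way to check this requirement before plotting is to compare the settings recorded in each table. The sketch below is not part of GATK. It assumes that a recalibration table is a GATKReport text file whose settings sit in a section introduced by a `#:GATKTable:Arguments:` header line and closed by a blank line; verify that layout against your own tables first. The function names `recal_args` and `same_recal_args` are hypothetical.

```shell
#!/usr/bin/env bash
# ASSUMPTION: the table stores its settings in a section that starts with a
# "#:GATKTable:Arguments:" header line and ends at the next blank line.

# Print the "Argument Value" rows of one table, whitespace-normalized and sorted.
recal_args() {
  awk '
    /^#:GATKTable:Arguments:/   { in_args = 1; next }  # section header found
    in_args && NF == 0          { exit }               # blank line ends the section
    in_args && $1 != "Argument" { $1 = $1; print }     # skip the column header row
  ' "$1" | sort
}

# Succeed when both tables record identical settings; otherwise print a diff.
# Note: the covariate row is compared verbatim, so a reordered covariate list
# is reported as a difference even though the tool tolerates it.
same_recal_args() {
  diff <(recal_args "$1") <(recal_args "$2")
}

# Example:
#   same_recal_args recal1.table recal2.table && echo "settings match"
```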

Outputs

Currently this tool generates two outputs:

-plots my-report.pdf
A PDF document containing plots for assessing the quality of the recalibration
-csv my-report.csv
A CSV file containing a table with all the data required to generate those plots

You need to specify at least one of them.
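The input and output rules above (at least one table set, at least one output) can be checked before launching the tool. The following is a minimal sketch with a hypothetical helper name, `check_ac_args`; it only looks for the argument names listed on this page and does not validate the file paths themselves.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: return 0 only when the argument list names
# at least one recalibration table set and at least one output.
check_ac_args() {
  local has_input=0 has_output=0 arg
  for arg in "$@"; do
    case "$arg" in
      -before|--before-report-file|-after|--after-report-file|-bqsr|--bqsr-recal-file)
        has_input=1 ;;
      -plots|--plots-report-file|-csv|--intermediate-csv-file)
        has_output=1 ;;
    esac
  done
  if [ "$has_input" -eq 0 ]; then
    echo "specify at least one of -before, -after or -bqsr" >&2
    return 1
  fi
  if [ "$has_output" -eq 0 ]; then
    echo "specify at least one of -plots or -csv" >&2
    return 1
  fi
  return 0
}

# Example: request both outputs in one run, but only if the check passes.
#   args=(-bqsr recal1.table -plots my-report.pdf -csv my-report.csv)
#   check_ac_args "${args[@]}" && gatk AnalyzeCovariates "${args[@]}"
```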

Usage examples

Plot a single recalibration table

   gatk AnalyzeCovariates \
     -bqsr recal1.table \
     -plots AnalyzeCovariates.pdf
 

Plot "before" (first pass) and "after" (second pass) recalibration tables to compare them

   gatk AnalyzeCovariates \
     -before recal1.table \
     -after recal2.table \
     -plots AnalyzeCovariates.pdf
 

Plot up to three recalibration tables for comparison

   gatk AnalyzeCovariates \
     -bqsr recal1.table \
     -before recal2.table \
     -after recal3.table \
     -plots AnalyzeCovariates.pdf
 

Notes

  • Sometimes you may want to compare recalibration tables where the "after" table was actually generated first. To suppress warnings about the dates of creation of the files, use the `--ignore-last-modification-times` argument.
  • You can ignore the before/after semantics completely if you like, but all tables must have been generated using the same parameters.
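For scripted runs, the modification-time situation from the first note can be detected up front. The sketch below assumes the warning concerns a "before" table that is newer than the "after" table; the helper name `mtime_flag` is hypothetical, and the `-nt` file test is a bash feature.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print --ignore-last-modification-times when the
# "before" table was modified more recently than the "after" table,
# and print nothing otherwise.
mtime_flag() {
  local before="$1" after="$2"
  if [ "$before" -nt "$after" ]; then
    printf '%s\n' '--ignore-last-modification-times'
  fi
}

# Example (the command substitution is left unquoted on purpose so that an
# empty result adds no argument):
#   gatk AnalyzeCovariates \
#     -before recal1.table \
#     -after recal2.table \
#     $(mtime_flag recal1.table recal2.table) \
#     -plots AnalyzeCovariates.pdf
```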

AnalyzeCovariates specific arguments

This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the list below the table.

Argument name(s) | Default value | Summary

Optional Tool Arguments
--after-report-file / -after | null | file containing the BQSR second-pass report file
--arguments_file | [] | read one or more arguments files and add them to the command line
--before-report-file / -before | null | file containing the BQSR first-pass report file
--bqsr-recal-file / -bqsr | null | Input covariates table file for on-the-fly base quality score recalibration
--gcs-max-retries / -gcs-retries | 20 | If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection
--help / -h | false | display the help message
--ignore-last-modification-times | false | do not emit warning messages related to suspicious last modification time order of inputs
--intermediate-csv-file / -csv | null | location of the csv intermediate file
--plots-report-file / -plots | null | location of the output report
--version | false | display the version number for this tool

Optional Common Arguments
--gatk-config-file | null | A configuration file to use with the GATK.
--QUIET | false | Whether to suppress job-summary info on System.err.
--TMP_DIR | [] | Undocumented option
--use-jdk-deflater / -jdk-deflater | false | Whether to use the JdkDeflater (as opposed to IntelDeflater)
--use-jdk-inflater / -jdk-inflater | false | Whether to use the JdkInflater (as opposed to IntelInflater)
--verbosity | INFO | Control verbosity of logging.

Advanced Arguments
--showHidden | false | display hidden arguments

Argument details

Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Inherited arguments above.


--after-report-file / -after

file containing the BQSR second-pass report file
File containing the recalibration tables from the second pass.

File  null


--arguments_file / NA

read one or more arguments files and add them to the command line

List[File]  []


--before-report-file / -before

file containing the BQSR first-pass report file
File containing the recalibration tables from the first pass.

File  null


--bqsr-recal-file / -bqsr

Input covariates table file for on-the-fly base quality score recalibration
Enables recalibration of base qualities, intended primarily for use with BaseRecalibrator and ApplyBQSR (see Best Practices workflow documentation). The covariates tables are produced by the BaseRecalibrator tool. Please be aware that you should only run recalibration with a covariates file created from the same input BAM(s).

File  null


--gatk-config-file / NA

A configuration file to use with the GATK.

String  null


--gcs-max-retries / -gcs-retries

If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection

int  20  [ [ -∞  ∞ ] ]


--help / -h

display the help message

boolean  false


--ignore-last-modification-times / NA

do not emit warning messages related to suspicious last modification time order of inputs
If true, no warning is shown when the last-modification times of the before and after input files suggest that they have been reversed.

boolean  false


--intermediate-csv-file / -csv

location of the csv intermediate file
Output csv file name.

File  null


--plots-report-file / -plots

location of the output report
Output report file name.

File  null


--QUIET / NA

Whether to suppress job-summary info on System.err.

Boolean  false


--showHidden / -showHidden

display hidden arguments

boolean  false


--TMP_DIR / NA

Undocumented option

List[File]  []


--use-jdk-deflater / -jdk-deflater

Whether to use the JdkDeflater (as opposed to IntelDeflater)

boolean  false


--use-jdk-inflater / -jdk-inflater

Whether to use the JdkInflater (as opposed to IntelInflater)

boolean  false


--verbosity / -verbosity

Control verbosity of logging.

The --verbosity argument is an enumerated type (LogLevel), which can have one of the following values:

ERROR
WARNING
INFO
DEBUG

LogLevel  INFO


--version / NA

display the version number for this tool

boolean  false



GATK version 4.0.3.0 built at 09-43-2018 09:43:10.