Showing tool doc from version 4.1.2.0 | The latest version is 4.1.2.0

DenoiseReadCounts

Denoises read counts to produce denoised copy ratios

Category: Copy Number Variant Discovery


Overview

Denoises read counts to produce denoised copy ratios.

Typically, a panel of normals produced by CreateReadCountPanelOfNormals is provided as input. The input counts are then standardized by 1) transforming to fractional coverage, 2) performing optional explicit GC-bias correction (if the panel contains GC-content annotated intervals), 3) filtering intervals to those contained in the panel, 4) dividing by interval medians contained in the panel, 5) dividing by the sample median, and 6) transforming to log2 copy ratio. The result is then denoised by subtracting the projection onto the specified number of principal components from the panel.

If no panel is provided, then the input counts are instead standardized by 1) transforming to fractional coverage, 2) performing optional explicit GC-bias correction (if GC-content annotated intervals are provided), 3) dividing by the sample median, and 4) transforming to log2 copy ratio. No denoising is performed, so the denoised result is simply taken to be identical to the standardized result.
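The no-panel path described above can be sketched in a few lines of Python. This is only an illustration of the listed steps, not GATK's implementation: the function name `standardize_without_panel` and the `epsilon` floor applied before the logarithm are assumptions introduced here, and the optional GC-bias correction step is omitted.

```python
import math
from statistics import median


def standardize_without_panel(counts, epsilon=1e-10):
    """Illustrative no-panel standardization of per-interval read counts.

    `epsilon` is a hypothetical floor that keeps log2 defined for
    zero-count intervals; it is an assumption of this sketch.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("total read count must be positive")

    # 1) Transform to fractional coverage.
    fractional = [c / total for c in counts]

    # 2) Explicit GC-bias correction would happen here (omitted in this sketch).

    # 3) Divide by the sample median.
    sample_median = median(fractional)
    if sample_median <= 0:
        raise ValueError("sample median must be positive")
    ratios = [f / sample_median for f in fractional]

    # 4) Transform to log2 copy ratio.
    return [math.log2(max(r, epsilon)) for r in ratios]


# With no panel, the denoised result is identical to the standardized result.
standardized = standardize_without_panel([100, 200, 400])
denoised = list(standardized)
```

An interval whose coverage equals the sample median ends up at a log2 copy ratio of 0, which is why the median is a convenient normalizer for a mostly copy-neutral sample.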

If performed, explicit GC-bias correction is done by GCBiasCorrector.

Note that the number of principal components from the input panel used for denoising is set by --number-of-eigensamples; if fewer are available in the panel, all of them are used. This parameter thus controls the amount of denoising, which ultimately affects the sensitivity of the analysis.
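As a rough illustration of how the number of eigensamples controls the amount of denoising, the following Python sketch subtracts the projection of a standardized profile onto the leading principal components. It assumes the components are already orthonormal and restricted to the same intervals as the sample; the function name `denoise` and the list-of-lists representation are hypothetical, and the code is not taken from GATK.

```python
def denoise(standardized, eigensamples, number_of_eigensamples=None):
    """Subtract the projection of `standardized` onto the leading eigensamples.

    `eigensamples` is assumed to be a list of orthonormal vectors, each the
    same length as `standardized`, ordered from most to least significant.
    """
    available = len(eigensamples)
    if number_of_eigensamples is None:
        # Not specified: use every eigensample in the panel.
        k = available
    elif number_of_eigensamples < 0:
        raise ValueError("number_of_eigensamples must be non-negative")
    else:
        # If fewer are available than requested, use all of them.
        k = min(number_of_eigensamples, available)

    denoised = list(standardized)
    for component in eigensamples[:k]:
        # Projection weight of the sample on this component.
        weight = sum(s * c for s, c in zip(standardized, component))
        # Remove that component's contribution.
        denoised = [d - weight * c for d, c in zip(denoised, component)]
    return denoised
```

Using more components removes more of the systematic variation captured by the panel, at the risk of also removing real signal; using fewer leaves more noise in the result.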

See comments for CreateReadCountPanelOfNormals regarding coverage on sex chromosomes. If sex chromosomes are not excluded from coverage collection, it is strongly recommended that case samples be denoised only with panels built exclusively from individuals of the same sex as the case samples.

Inputs

  • Counts TSV or HDF5 file from CollectReadCounts.
  • (Optional) Panel-of-normals from CreateReadCountPanelOfNormals. If provided, it will be used to standardize and denoise the input counts. This may include explicit GC-bias correction if annotated intervals were used to create the panel.
  • (Optional) GC-content annotated-intervals from AnnotateIntervals. This can be provided in place of a panel of normals to perform explicit GC-bias correction.

Outputs

  • Standardized-copy-ratios file. This is a tab-separated values (TSV) file with a SAM-style header containing a read group sample name, a sequence dictionary, a row specifying the column headers contained in CopyRatioCollection.CopyRatioTableColumn, and the corresponding entry rows.
  • Denoised-copy-ratios file. This is a tab-separated values (TSV) file with a SAM-style header containing a read group sample name, a sequence dictionary, a row specifying the column headers contained in CopyRatioCollection.CopyRatioTableColumn, and the corresponding entry rows.
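Both output files share the same layout, so a single reader can handle either. The Python sketch below separates the SAM-style header lines (those starting with `@`) from the column-header row and entry rows. The function name `read_copy_ratio_tsv` is hypothetical, and the column names in the sample text (CONTIG, START, END, LOG2_COPY_RATIO) and its values are made up for illustration; check them against an actual output file.

```python
import csv
import io


def read_copy_ratio_tsv(handle):
    """Split a copy-ratio TSV into SAM-style header lines and entry rows."""
    header_lines = []
    body_lines = []
    for line in handle:
        if line.startswith("@"):
            # SAM-style header: read group, sequence dictionary, and so on.
            header_lines.append(line.rstrip("\n"))
        elif line.strip():
            # Column-header row followed by entry rows.
            body_lines.append(line)
    # DictReader takes the first body line as the column headers.
    rows = list(csv.DictReader(body_lines, delimiter="\t"))
    return header_lines, rows


# Made-up sample content, for illustration only.
sample_text = (
    "@HD\tVN:1.6\n"
    "@RG\tID:GATKCopyNumber\tSM:sample\n"
    "CONTIG\tSTART\tEND\tLOG2_COPY_RATIO\n"
    "chr1\t1000\t2000\t0.12\n"
    "chr1\t2001\t3000\t-0.50\n"
)
header, rows = read_copy_ratio_tsv(io.StringIO(sample_text))
```

Because the rows are keyed by whatever the column-header row contains, the reader does not depend on the exact column names.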

Usage examples

     gatk DenoiseReadCounts \
          -I sample.counts.hdf5 \
          --count-panel-of-normals panel_of_normals.pon.hdf5 \
          --standardized-copy-ratios sample.standardizedCR.tsv \
          --denoised-copy-ratios sample.denoisedCR.tsv
 
     gatk DenoiseReadCounts \
          -I sample.counts.hdf5 \
          --annotated-intervals annotated_intervals.tsv \
          --standardized-copy-ratios sample.standardizedCR.tsv \
          --denoised-copy-ratios sample.denoisedCR.tsv
 
     gatk DenoiseReadCounts \
          -I sample.counts.hdf5 \
          --standardized-copy-ratios sample.standardizedCR.tsv \
          --denoised-copy-ratios sample.denoisedCR.tsv
 

DenoiseReadCounts specific arguments

This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the Argument details list below the table.

Argument name(s) | Default value | Summary

Required Arguments
--denoised-copy-ratios | null | Output file for denoised copy ratios.
--input / -I | null | Input TSV or HDF5 file containing integer read counts in genomic intervals for a single case sample (output of CollectReadCounts).
--standardized-copy-ratios | null | Output file for standardized copy ratios. GC-bias correction will be performed if annotations for GC content are provided.

Optional Tool Arguments
--annotated-intervals | null | Input file containing annotations for GC content in genomic intervals (output of AnnotateIntervals). Intervals must be identical to and in the same order as those in the input read-counts file. If a panel of normals is provided, this input will be ignored.
--arguments_file | [] | read one or more arguments files and add them to the command line
--count-panel-of-normals | null | Input HDF5 file containing the panel of normals (output of CreateReadCountPanelOfNormals).
--gcs-max-retries / -gcs-retries | 20 | If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection
--gcs-project-for-requester-pays | "" | Project to bill when accessing "requester pays" buckets. If unset, these buckets cannot be accessed.
--help / -h | false | display the help message
--number-of-eigensamples | null | Number of eigensamples to use for denoising. If not specified or if the number of eigensamples available in the panel of normals is smaller than this, all eigensamples will be used.
--version | false | display the version number for this tool

Optional Common Arguments
--gatk-config-file | null | A configuration file to use with the GATK.
--QUIET | false | Whether to suppress job-summary info on System.err.
--tmp-dir | null | Temp directory to use.
--use-jdk-deflater / -jdk-deflater | false | Whether to use the JdkDeflater (as opposed to IntelDeflater)
--use-jdk-inflater / -jdk-inflater | false | Whether to use the JdkInflater (as opposed to IntelInflater)
--verbosity | INFO | Control verbosity of logging.

Advanced Arguments
--showHidden | false | display hidden arguments

Argument details

Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Optional Common Arguments in the table above.


--annotated-intervals / NA

Input file containing annotations for GC content in genomic intervals (output of AnnotateIntervals). Intervals must be identical to and in the same order as those in the input read-counts file. If a panel of normals is provided, this input will be ignored.

File  null


--arguments_file / NA

read one or more arguments files and add them to the command line

List[File]  []


--count-panel-of-normals / NA

Input HDF5 file containing the panel of normals (output of CreateReadCountPanelOfNormals).

File  null


--denoised-copy-ratios / NA

Output file for denoised copy ratios.

R File  null


--gatk-config-file / NA

A configuration file to use with the GATK.

String  null


--gcs-max-retries / -gcs-retries

If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection

int  20  [ [ -∞  ∞ ] ]


--gcs-project-for-requester-pays / NA

Project to bill when accessing "requester pays" buckets. If unset, these buckets cannot be accessed.

String  ""


--help / -h

display the help message

boolean  false


--input / -I

Input TSV or HDF5 file containing integer read counts in genomic intervals for a single case sample (output of CollectReadCounts).

R File  null


--number-of-eigensamples / NA

Number of eigensamples to use for denoising. If not specified or if the number of eigensamples available in the panel of normals is smaller than this, all eigensamples will be used.

Integer  null


--QUIET / NA

Whether to suppress job-summary info on System.err.

Boolean  false


--showHidden / -showHidden

display hidden arguments

boolean  false


--standardized-copy-ratios / NA

Output file for standardized copy ratios. GC-bias correction will be performed if annotations for GC content are provided.

R File  null


--tmp-dir / NA

Temp directory to use.

GATKPathSpecifier  null


--use-jdk-deflater / -jdk-deflater

Whether to use the JdkDeflater (as opposed to IntelDeflater)

boolean  false


--use-jdk-inflater / -jdk-inflater

Whether to use the JdkInflater (as opposed to IntelInflater)

boolean  false


--verbosity / -verbosity

Control verbosity of logging.

The --verbosity argument is an enumerated type (LogLevel), which can have one of the following values:

ERROR
WARNING
INFO
DEBUG

LogLevel  INFO


--version / NA

display the version number for this tool

boolean  false



GATK version 4.1.2.0 built at Tue, 23 Apr 2019 14:55:55 -0400.