Showing tool doc from version 4.0.10.0 | The latest version is 4.0.10.1

CalculateContamination

Calculate the fraction of reads coming from cross-sample contamination

Category: Diagnostics and Quality Control


Overview

Calculates the fraction of reads coming from cross-sample contamination, given results from GetPileupSummaries. The resulting contamination table is used with FilterMutectCalls.
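As a rough sketch of where this tool sits between the two tools named above, the three steps could be chained as follows. The wrapper function, its parameters, and the output file names are hypothetical placeholders; passing the common-sites VCF to both -V and -L is an assumption about a typical setup, and the flags should be checked against the GetPileupSummaries and FilterMutectCalls docs for your GATK version.

```shell
# Hypothetical wrapper; all file names are placeholders supplied by the caller.
# Usage: contamination_workflow TUMOR_BAM COMMON_SITES_VCF UNFILTERED_VCF PREFIX
contamination_workflow() (
  set -e
  tumor_bam=$1 sites_vcf=$2 unfiltered_vcf=$3 prefix=$4

  # 1. Summarize read support at common population variant sites.
  gatk GetPileupSummaries \
    -I "$tumor_bam" \
    -V "$sites_vcf" \
    -L "$sites_vcf" \
    -O "$prefix.pileups.table"

  # 2. Estimate contamination from the pileup summaries.
  gatk CalculateContamination \
    -I "$prefix.pileups.table" \
    -O "$prefix.contamination.table"

  # 3. Pass the estimate to the Mutect2 filtering step.
  gatk FilterMutectCalls \
    -V "$unfiltered_vcf" \
    --contamination-table "$prefix.contamination.table" \
    -O "$prefix.filtered.vcf"
)
```

The function body runs in a subshell with `set -e`, so a failure in one step stops the later steps without changing the caller's shell options.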

This tool is featured in the Somatic Short Mutation calling Best Practice Workflow. See Tutorial#11136 for a step-by-step description of the workflow and Article#11127 for an overview of what traditional somatic calling entails. For the latest pipeline scripts, see the Mutect2 WDL scripts directory.

This tool borrows from ContEst (Cibulskis et al.) the idea of estimating contamination from reference reads at homozygous-alternate sites. However, ContEst uses a probabilistic model that assumes a diploid genotype with no copy number variation and independent contaminating reads; that is, it assumes that each contaminating read is drawn randomly and independently from a different human. This tool uses a simpler estimate of contamination that relaxes these assumptions. In particular, it works in the presence of copy number variation and with an arbitrary number of contaminating samples. It is also designed to work well without matched normal data, although one can run GetPileupSummaries on a matched normal BAM file and pass the result to this tool.
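The core idea of counting reference reads at homozygous-alternate sites can be illustrated with a small awk sketch. This is not the tool's actual model, which is more involved; it is only the back-of-the-envelope version. It assumes a tab-separated pileup table with the columns contig, position, ref_count, alt_count, other_alt_count and allele_frequency. The function name and the 0.8 alt-fraction cutoff for calling a site homozygous-alternate are hypothetical choices for illustration. If a fraction c of reads is contamination and a contaminating read carries the reference allele with probability (1 - allele_frequency), then the expected reference count at such a site is c * depth * (1 - allele_frequency), which gives the ratio used below.

```shell
# Rough estimate: contamination ~= sum(ref reads) / sum(depth * (1 - AF))
# over sites that look homozygous-alternate in the sample.
# Hypothetical helper; the default 0.8 cutoff is an illustrative choice.
# Usage: estimate_contamination PILEUP_TABLE [MIN_ALT_FRACTION]
estimate_contamination() {
  awk -F '\t' -v min_alt_frac="${2:-0.8}" '
    /^#/ || $1 == "contig" { next }          # skip metadata and header lines
    {
      ref = $3; alt = $4; other = $5; af = $6
      depth = ref + alt + other
      if (depth == 0) next
      if (alt / depth < min_alt_frac) next   # keep only apparent hom-alt sites
      ref_reads += ref
      expected  += depth * (1 - af)          # ref reads expected at 100% contamination
    }
    END {
      if (expected > 0) printf "%.4f\n", ref_reads / expected
      else print "NA"
    }
  ' "$1"
}
```

Because the estimate only pools counts across sites, it makes no assumption about how many samples contributed the contaminating reads, which mirrors the relaxation described above.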

Usage examples

Tumor-only mode

 gatk CalculateContamination \
   -I pileups.table \
   -O contamination.table
 

Matched normal mode

 gatk CalculateContamination \
   -I tumor-pileups.table \
   -matched normal-pileups.table \
   -O contamination.table
 

The resulting table gives the contamination fraction, one line per sample, in the form SampleID<TAB>Contamination. The file has no header.
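For a quick check of the output, a small awk filter can list samples above a chosen cutoff. This sketch assumes the two-column, tab-separated, headerless layout described above; the function name and the default cutoff of 0.02 are hypothetical and not a GATK recommendation. Lines whose second field is not numeric are skipped, so a header line, if your version writes one, would be ignored.

```shell
# Print samples whose contamination fraction exceeds a cutoff.
# Hypothetical helper; the 0.02 default is an illustrative value only.
# Usage: flag_contaminated CONTAMINATION_TABLE [CUTOFF]
flag_contaminated() {
  awk -F '\t' -v max="${2:-0.02}" '
    # Only consider lines whose second column looks like a number.
    $2 ~ /^[0-9.eE+-]+$/ && ($2 + 0) > (max + 0) { print $1 "\t" $2 }
  ' "$1"
}
```

For example, `flag_contaminated contamination.table 0.05` would print only the samples estimated above 5% contamination.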

CalculateContamination specific arguments

This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the Argument details list below the table.

Argument name(s) | Default value | Summary

Required Arguments
--input, -I | null | The input table
--output, -O | null | The output table

Optional Tool Arguments
--arguments_file | [] | read one or more arguments files and add them to the command line
--gcs-max-retries, -gcs-retries | 20 | If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection
--gcs-project-for-requester-pays | "" | Project to bill when accessing "requester pays" buckets. If unset, these buckets cannot be accessed.
--help, -h | false | display the help message
--high-coverage-ratio-threshold | 3.0 | The maximum coverage relative to the mean.
--low-coverage-ratio-threshold | 0.5 | The minimum coverage relative to the median.
--matched-normal, -matched | null | The matched normal input table
--tumor-segmentation, -segments | null | The output table containing segmentation of the tumor by minor allele fraction
--version | false | display the version number for this tool

Optional Common Arguments
--gatk-config-file | null | A configuration file to use with the GATK.
--QUIET | false | Whether to suppress job-summary info on System.err.
--tmp-dir | null | Temp directory to use.
--use-jdk-deflater, -jdk-deflater | false | Whether to use the JdkDeflater (as opposed to IntelDeflater)
--use-jdk-inflater, -jdk-inflater | false | Whether to use the JdkInflater (as opposed to IntelInflater)
--verbosity | INFO | Control verbosity of logging.

Advanced Arguments
--showHidden | false | display hidden arguments

Argument details

Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Inherited arguments above.


--arguments_file / NA

read one or more arguments files and add them to the command line

List[File]  []


--gatk-config-file / NA

A configuration file to use with the GATK.

String  null


--gcs-max-retries / -gcs-retries

If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection

int  20  [ [ -∞  ∞ ] ]


--gcs-project-for-requester-pays / NA

Project to bill when accessing "requester pays" buckets. If unset, these buckets cannot be accessed.

String  ""


--help / -h

display the help message

boolean  false


--high-coverage-ratio-threshold / NA

The maximum coverage relative to the mean.

double  3.0  [ [ -∞  ∞ ] ]


--input / -I

The input table

R File  null


--low-coverage-ratio-threshold / NA

The minimum coverage relative to the median.

double  0.5  [ [ -∞  ∞ ] ]
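One literal reading of the two coverage thresholds is that a site is kept only when its depth lies between (low ratio x median depth) and (high ratio x mean depth). The sketch below applies that reading to a pileup table; it is an interpretation of the argument descriptions, not a statement of how the tool computes these statistics internally. It assumes the tab-separated columns contig, position, ref_count, alt_count, other_alt_count and allele_frequency, and the function name is hypothetical.

```shell
# Keep sites whose depth d satisfies low * median <= d <= high * mean.
# Hypothetical helper; defaults mirror the documented 0.5 and 3.0.
# Usage: filter_by_coverage PILEUP_TABLE [LOW_RATIO] [HIGH_RATIO]
filter_by_coverage() (
  table=$1 low=${2:-0.5} high=${3:-3.0}

  # Depth per site = ref_count + alt_count + other_alt_count, sorted for the median.
  stats=$(awk -F '\t' '/^#/ || $1 == "contig" { next } { print $3 + $4 + $5 }' "$table" |
    sort -n |
    awk '{ d[NR] = $1; sum += $1 }
         END {
           if (NR == 0) exit 1
           median = (NR % 2) ? d[(NR + 1) / 2] : (d[NR / 2] + d[NR / 2 + 1]) / 2
           printf "%.6f %.6f\n", median, sum / NR
         }') || exit 1
  median=${stats% *}
  mean=${stats#* }

  awk -F '\t' -v lo="$low" -v hi="$high" -v median="$median" -v mean="$mean" '
    /^#/ || $1 == "contig" { print; next }   # pass metadata and header lines through
    { d = $3 + $4 + $5 }
    d >= lo * median && d <= hi * mean       # print sites inside the coverage window
  ' "$table"
)
```

Using the median for the lower bound and the mean for the upper bound follows the wording of the two descriptions; the mean is pulled upward by a few very deep sites, so the upper bound is the more permissive of the two.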


--matched-normal / -matched

The matched normal input table

File  null


--output / -O

The output table

R File  null


--QUIET / NA

Whether to suppress job-summary info on System.err.

Boolean  false


--showHidden / -showHidden

display hidden arguments

boolean  false


--tmp-dir / NA

Temp directory to use.

String  null


--tumor-segmentation / -segments

The output table containing segmentation of the tumor by minor allele fraction

File  null
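Neither usage example above writes the segmentation table, so here is a sketch of a matched-normal run that also requests it. The wrapper function and the output file names are hypothetical placeholders; the flags are the ones listed in this document.

```shell
# Hypothetical wrapper: matched-normal run that also writes the segmentation table.
# Usage: calc_contamination_with_segments TUMOR_PILEUPS NORMAL_PILEUPS PREFIX
calc_contamination_with_segments() {
  gatk CalculateContamination \
    -I "$1" \
    -matched "$2" \
    --tumor-segmentation "$3.segments.table" \
    -O "$3.contamination.table"
}
```

Called as `calc_contamination_with_segments tumor-pileups.table normal-pileups.table sample1`, this would write sample1.segments.table alongside sample1.contamination.table.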


--use-jdk-deflater / -jdk-deflater

Whether to use the JdkDeflater (as opposed to IntelDeflater)

boolean  false


--use-jdk-inflater / -jdk-inflater

Whether to use the JdkInflater (as opposed to IntelInflater)

boolean  false


--verbosity / -verbosity

Control verbosity of logging.

The --verbosity argument is an enumerated type (LogLevel), which can have one of the following values:

ERROR
WARNING
INFO
DEBUG

LogLevel  INFO


--version / NA

display the version number for this tool

boolean  false



GATK version 4.0.10.0 built at 04-52-2018 02:52:21.