Showing tool doc from version 4.0.3.0 | The latest version is 4.1.2.0

CrosscheckReadGroupFingerprints (Picard)

DEPRECATED: USE CrosscheckFingerprints. Checks if all read groups within a set of BAM files appear to come from the same individual

Category Diagnostics and Quality Control


Overview

Program to check that all read groups within the set of BAM files appear to come from the same individual.
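
As a minimal sketch of how the two required arguments fit together, the Python helper below assembles an argument list for the tool without running it. The helper name `build_crosscheck_command`, the file names, and the assumption that the tool is launched through the `gatk` wrapper script with list arguments passed by repeating the flag are illustrative, not taken from this page.

```python
from typing import List, Optional


def build_crosscheck_command(
    inputs: List[str],
    haplotype_map: str,
    output: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for the tool; nothing is executed here."""
    if not inputs:
        raise ValueError("--INPUT is required: pass at least one file")
    # Assumption: the tool is invoked through the `gatk` wrapper script.
    cmd = ["gatk", "CrosscheckReadGroupFingerprints"]
    for path in inputs:
        # Assumption: a list-valued argument is given by repeating the flag.
        cmd += ["--INPUT", path]
    cmd += ["--HAPLOTYPE_MAP", haplotype_map]
    if output is not None:
        # Optional; per the argument table, metrics go to stdout otherwise.
        cmd += ["--OUTPUT", output]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_crosscheck_command(["a.bam", "b.bam"], "haplotype_map.txt")))
```

The resulting list could be handed to `subprocess.run` once the paths point at real files.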

CrosscheckReadGroupFingerprints (Picard) specific arguments

This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the Argument details list below the table.

Argument name(s) Default value Summary
Required Arguments
--HAPLOTYPE_MAP
 -H
null The file lists a set of SNPs, optionally arranged in high-LD blocks, to be used for fingerprinting. See https://software.broadinstitute.org/gatk/documentation/article?id=9526 for details.
--INPUT
 -I
[] One or more input files (or lists of files) with which to compare fingerprints.
Optional Tool Arguments
--ALLOW_DUPLICATE_READS
false Allow the use of duplicate reads in performing the comparison. Can be useful when duplicate marking has been overly aggressive and coverage is low.
--arguments_file
[] read one or more arguments files and add them to the command line
--CROSSCHECK_BY
READGROUP Specifies which data type should be used as the basic comparison unit. Fingerprints from read groups can be "rolled up" to the LIBRARY, SAMPLE, or FILE level before being compared. Fingerprints from VCF can be compared by SAMPLE or FILE.
--CROSSCHECK_LIBRARIES
false Instead of producing the normal comparison of read-groups, roll fingerprints up to the library level and print out a library x library matrix with LOD scores.
--CROSSCHECK_SAMPLES
false Instead of producing the normal comparison of read-groups, roll fingerprints up to the sample level and print out a sample x sample matrix with LOD scores.
--EXIT_CODE_WHEN_MISMATCH
1 When one or more mismatches between groups are detected, exit with this value instead of 0.
--EXPECT_ALL_GROUPS_TO_MATCH
false Expect all groups' fingerprints to match, irrespective of their sample names. By default (with this value set to false), groups (readgroups, libraries, files, or samples) with different sample names are expected to mismatch, and those with the same sample name are expected to match.
--EXPECT_ALL_READ_GROUPS_TO_MATCH
false Expect all read groups' fingerprints to match, irrespective of their sample names. By default (with this value set to false), read groups with different sample names are expected to mismatch, and those with the same sample name are expected to match.
--GENOTYPING_ERROR_RATE
0.01 Assumed genotyping error rate that provides a floor on the probability that a genotype comes from the expected sample. Must be greater than zero.
--help
 -h
false display the help message
--LOD_THRESHOLD
 -LOD
0.0 If any two groups (with the same sample name) match with a LOD score lower than the threshold, the tool will exit with a non-zero code to indicate an error. The program will also exit with an error if it finds two groups with different sample names that match with a LOD score greater than -LOD_THRESHOLD. A LOD score of 0 means equal likelihood that the groups match vs. come from different individuals; a negative LOD score of -N means it is 10^N times more likely that the groups are from different individuals, and +N means it is 10^N times more likely that the groups are from the same individual.
--LOSS_OF_HET_RATE
0.5 The rate at which a heterozygous genotype in a normal sample turns into a homozygous genotype (via loss of heterozygosity) in the tumor (the model assumes independent events, so this needs to be larger than reality).
--MATRIX_OUTPUT
 -MO
null Optional output file to write matrix of LOD scores to. This is less informative than the metrics output and only contains Normal-Normal LOD score (i.e. doesn't account for Loss of Heterozygosity). It is however sometimes easier to use visually.
--NUM_THREADS
1 The number of threads to use to process files and generate fingerprints.
--OUTPUT
 -O
null Optional output file to write metrics to. Default is to write to stdout.
--OUTPUT_ERRORS_ONLY
false If true then only groups that do not relate to each other as expected will have their LODs reported.
--SECOND_INPUT
 -SI
[] A second set of input files (or lists of files) with which to compare fingerprints. If this option is provided the tool compares each sample in INPUT with the sample from SECOND_INPUT that has the same sample ID. In addition, data will be grouped by SAMPLE regardless of the value of CROSSCHECK_BY. When operating in this mode, each sample in INPUT must also have a corresponding sample in SECOND_INPUT. If this is violated, the tool will proceed to check the matching samples, but report the missing samples and return a non-zero error-code.
--version
false display the version number for this tool
Optional Common Arguments
--COMPRESSION_LEVEL
5 Compression level for all compressed files created (e.g. BAM and VCF).
--CREATE_INDEX
false Whether to create a BAM index when writing a coordinate-sorted BAM file.
--CREATE_MD5_FILE
false Whether to create an MD5 digest for any BAM or FASTQ files created.
--GA4GH_CLIENT_SECRETS
client_secrets.json Google Genomics API client_secrets.json file path.
--MAX_RECORDS_IN_RAM
500000 When writing files that need to be sorted, this will specify the number of records stored in RAM before spilling to disk. Increasing this number reduces the number of file handles needed to sort the file, and increases the amount of RAM needed.
--QUIET
false Whether to suppress job-summary info on System.err.
--REFERENCE_SEQUENCE
 -R
null Reference sequence file.
--TMP_DIR
[] One or more directories with space available to be used by this program for temporary storage of working files
--USE_JDK_DEFLATER
 -use_jdk_deflater
false Use the JDK Deflater instead of the Intel Deflater for writing compressed output
--USE_JDK_INFLATER
 -use_jdk_inflater
false Use the JDK Inflater instead of the Intel Inflater for reading compressed input
--VALIDATION_STRINGENCY
STRICT Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded.
--VERBOSITY
INFO Control verbosity of logging.
Advanced Arguments
--showHidden
false display hidden arguments

Argument details

Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Optional Common Arguments in the table above.


--ALLOW_DUPLICATE_READS / NA

Allow the use of duplicate reads in performing the comparison. Can be useful when duplicate marking has been overly aggressive and coverage is low.

boolean  false


--arguments_file / NA

read one or more arguments files and add them to the command line

List[File]  []


--COMPRESSION_LEVEL / NA

Compression level for all compressed files created (e.g. BAM and VCF).

int  5  [ [ -∞  ∞ ] ]


--CREATE_INDEX / NA

Whether to create a BAM index when writing a coordinate-sorted BAM file.

Boolean  false


--CREATE_MD5_FILE / NA

Whether to create an MD5 digest for any BAM or FASTQ files created.

boolean  false


--CROSSCHECK_BY / NA

Specifies which data type should be used as the basic comparison unit. Fingerprints from read groups can be "rolled up" to the LIBRARY, SAMPLE, or FILE level before being compared. Fingerprints from VCF can be compared by SAMPLE or FILE.

The --CROSSCHECK_BY argument is an enumerated type (DataType), which can have one of the following values:

FILE
SAMPLE
LIBRARY
READGROUP

DataType  READGROUP
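
To illustrate what "rolling up" to a coarser comparison unit means, the sketch below groups read groups by the chosen level. The `ReadGroup` record and the `group_by_level` function are hypothetical and only show the grouping step; how the tool actually merges fingerprints within a group is not described on this page and is not modelled here.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class ReadGroup(NamedTuple):
    """Hypothetical record for illustration; not a Picard type."""
    read_group: str
    library: str
    sample: str
    file: str


# Maps each CROSSCHECK_BY value to the record field used as the grouping key.
_LEVEL_TO_FIELD = {
    "READGROUP": "read_group",
    "LIBRARY": "library",
    "SAMPLE": "sample",
    "FILE": "file",
}


def group_by_level(read_groups: List[ReadGroup], level: str) -> Dict[str, List[str]]:
    """Return {group key: [read group ids]} for the requested level."""
    if level not in _LEVEL_TO_FIELD:
        raise ValueError(f"unknown CROSSCHECK_BY value: {level}")
    groups: Dict[str, List[str]] = defaultdict(list)
    for rg in read_groups:
        groups[getattr(rg, _LEVEL_TO_FIELD[level])].append(rg.read_group)
    return dict(groups)
```

With the default READGROUP level every read group forms its own unit; with SAMPLE, all read groups that share a sample name are compared as one unit.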


--CROSSCHECK_LIBRARIES / NA

Instead of producing the normal comparison of read-groups, roll fingerprints up to the library level and print out a library x library matrix with LOD scores.

boolean  false


--CROSSCHECK_SAMPLES / NA

Instead of producing the normal comparison of read-groups, roll fingerprints up to the sample level and print out a sample x sample matrix with LOD scores.

boolean  false


--EXIT_CODE_WHEN_MISMATCH / NA

When one or more mismatches between groups are detected, exit with this value instead of 0.

int  1  [ [ -∞  ∞ ] ]


--EXPECT_ALL_GROUPS_TO_MATCH / NA

Expect all groups' fingerprints to match, irrespective of their sample names. By default (with this value set to false), groups (readgroups, libraries, files, or samples) with different sample names are expected to mismatch, and those with the same sample name are expected to match.

boolean  false
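
The expectation rule described for this argument can be written as a small predicate. This is a sketch of the rule as stated, not the tool's code; the function name `expected_to_match` is hypothetical.

```python
def expected_to_match(
    sample_a: str,
    sample_b: str,
    expect_all_groups_to_match: bool = False,
) -> bool:
    """Return True when two groups are expected to share a fingerprint."""
    if expect_all_groups_to_match:
        # Sample names are ignored: every pair is expected to match.
        return True
    # Default: same sample name -> expected match, otherwise expected mismatch.
    return sample_a == sample_b
```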


--EXPECT_ALL_READ_GROUPS_TO_MATCH / NA

Expect all read groups' fingerprints to match, irrespective of their sample names. By default (with this value set to false), read groups with different sample names are expected to mismatch, and those with the same sample name are expected to match.

Exclusion: This argument cannot be used at the same time as EXPECT_ALL_GROUPS_TO_MATCH.

boolean  false


--GA4GH_CLIENT_SECRETS / NA

Google Genomics API client_secrets.json file path.

String  client_secrets.json


--GENOTYPING_ERROR_RATE / NA

Assumed genotyping error rate that provides a floor on the probability that a genotype comes from the expected sample. Must be greater than zero.

double  0.01  [ [ -∞  ∞ ] ]


--HAPLOTYPE_MAP / -H

The file lists a set of SNPs, optionally arranged in high-LD blocks, to be used for fingerprinting. See https://software.broadinstitute.org/gatk/documentation/article?id=9526 for details.

R File  null


--help / -h

display the help message

boolean  false


--INPUT / -I

One or more input files (or lists of files) with which to compare fingerprints.

R List[File]  []


--LOD_THRESHOLD / -LOD

If any two groups (with the same sample name) match with a LOD score lower than the threshold, the tool will exit with a non-zero code to indicate an error. The program will also exit with an error if it finds two groups with different sample names that match with a LOD score greater than -LOD_THRESHOLD. A LOD score of 0 means equal likelihood that the groups match vs. come from different individuals; a negative LOD score of -N means it is 10^N times more likely that the groups are from different individuals, and +N means it is 10^N times more likely that the groups are from the same individual.

double  0.0  [ [ -∞  ∞ ] ]
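
The LOD interpretation and the two error conditions above can be sketched as follows. The function names are hypothetical, and "-LOD_THRESHOLD" is read here as the negated threshold value, which is an interpretation of the description rather than something this page states explicitly.

```python
def likelihood_ratio(lod: float) -> float:
    """Convert a LOD score to odds of 'same individual' vs. 'different individuals'."""
    return 10.0 ** lod


def is_unexpected(lod: float, expected_match: bool, lod_threshold: float = 0.0) -> bool:
    """Return True when a comparison would count as an error under the stated rule."""
    if expected_match:
        # Same sample name: a LOD below the threshold is an unexpected mismatch.
        return lod < lod_threshold
    # Different sample names: a LOD above the negated threshold is an unexpected match.
    return lod > -lod_threshold


if __name__ == "__main__":
    for lod in (-3.0, 0.0, 2.0):
        print(lod, likelihood_ratio(lod))
```

For example, a LOD of 2 corresponds to the groups being 100 times more likely to come from the same individual than from different individuals.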


--LOSS_OF_HET_RATE / NA

The rate at which a heterozygous genotype in a normal sample turns into a homozygous genotype (via loss of heterozygosity) in the tumor (the model assumes independent events, so this needs to be larger than reality).

double  0.5  [ [ -∞  ∞ ] ]


--MATRIX_OUTPUT / -MO

Optional output file to write matrix of LOD scores to. This is less informative than the metrics output and only contains Normal-Normal LOD score (i.e. doesn't account for Loss of Heterozygosity). It is however sometimes easier to use visually.

Exclusion: This argument cannot be used at the same time as SECOND_INPUT.

File  null


--MAX_RECORDS_IN_RAM / NA

When writing files that need to be sorted, this will specify the number of records stored in RAM before spilling to disk. Increasing this number reduces the number of file handles needed to sort the file, and increases the amount of RAM needed.

Integer  500000  [ [ -∞  ∞ ] ]


--NUM_THREADS / NA

The number of threads to use to process files and generate fingerprints.

int  1  [ [ -∞  ∞ ] ]


--OUTPUT / -O

Optional output file to write metrics to. Default is to write to stdout.

File  null


--OUTPUT_ERRORS_ONLY / NA

If true then only groups that do not relate to each other as expected will have their LODs reported.

boolean  false


--QUIET / NA

Whether to suppress job-summary info on System.err.

Boolean  false


--REFERENCE_SEQUENCE / -R

Reference sequence file.

File  null


--SECOND_INPUT / -SI

A second set of input files (or lists of files) with which to compare fingerprints. If this option is provided the tool compares each sample in INPUT with the sample from SECOND_INPUT that has the same sample ID. In addition, data will be grouped by SAMPLE regardless of the value of CROSSCHECK_BY. When operating in this mode, each sample in INPUT must also have a corresponding sample in SECOND_INPUT. If this is violated, the tool will proceed to check the matching samples, but report the missing samples and return a non-zero error-code.

Exclusion: This argument cannot be used at the same time as MATRIX_OUTPUT, MO.

List[File]  []
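
The sample-pairing behaviour described for SECOND_INPUT can be sketched like this. The function `pair_samples` and the exit value of 1 for missing samples are assumptions made for illustration; the description above only says that the exit code is non-zero.

```python
from typing import Iterable, List, Tuple


def pair_samples(
    input_samples: Iterable[str],
    second_input_samples: Iterable[str],
    exit_code_when_missing: int = 1,  # assumption: the page only says "non-zero"
) -> Tuple[List[str], List[str], int]:
    """Return (samples to compare, samples missing from SECOND_INPUT, exit code)."""
    second = set(second_input_samples)
    first = sorted(set(input_samples))
    # Samples present in both sets are still compared...
    matched = [s for s in first if s in second]
    # ...while INPUT samples without a counterpart are reported.
    missing = [s for s in first if s not in second]
    exit_code = exit_code_when_missing if missing else 0
    return matched, missing, exit_code
```

The exit code here reflects missing samples only; fingerprint mismatches among the matched samples would be reported separately.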


--showHidden / -showHidden

display hidden arguments

boolean  false


--TMP_DIR / NA

One or more directories with space available to be used by this program for temporary storage of working files

List[File]  []


--USE_JDK_DEFLATER / -use_jdk_deflater

Use the JDK Deflater instead of the Intel Deflater for writing compressed output

Boolean  false


--USE_JDK_INFLATER / -use_jdk_inflater

Use the JDK Inflater instead of the Intel Inflater for reading compressed input

Boolean  false


--VALIDATION_STRINGENCY / NA

Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded.

The --VALIDATION_STRINGENCY argument is an enumerated type (ValidationStringency), which can have one of the following values:

STRICT
LENIENT
SILENT

ValidationStringency  STRICT


--VERBOSITY / NA

Control verbosity of logging.

The --VERBOSITY argument is an enumerated type (LogLevel), which can have one of the following values:

ERROR
WARNING
INFO
DEBUG

LogLevel  INFO


--version / NA

display the version number for this tool

boolean  false


GATK version 4.0.3.0 built at 09-43-2018 09:43:10.