Showing tool doc from version 4.1.4.0 | The latest version is 4.1.4.0

FilterIntervals

Filters intervals based on annotations and/or count statistics

Category: Copy Number Variant Discovery


Overview

Given specified intervals, annotated intervals output by AnnotateIntervals, and/or counts output by CollectReadCounts, FilterIntervals outputs a filtered Picard interval list. The initial set of intervals on which filtering is performed is the set intersection of the specified intervals with the intervals in the annotated-intervals file and in the first counts file (whichever of these are provided). The parameters of both the annotation-based and the count-based filters can be adjusted. Annotation-based filters are applied first, followed by count-based filters. The result may be passed via -L to other tools (e.g., DetermineGermlineContigPloidy and GermlineCNVCaller) to mask the filtered intervals from analysis.
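The initial intersection step can be illustrated with a short Python sketch. It assumes intervals are matched exactly as (contig, start, end) tuples and that the order of the specified intervals is preserved; the function name `initial_intervals` is hypothetical and this is not the tool's own code. The check that at least one optional input was supplied is left out.

```python
from typing import Iterable, List, Optional, Set, Tuple

# An interval as (contig, start, end), 1-based and closed as in Picard interval lists.
Interval = Tuple[str, int, int]


def initial_intervals(
    specified: List[Interval],
    annotated: Optional[Iterable[Interval]] = None,
    first_counts_file: Optional[Iterable[Interval]] = None,
) -> List[Interval]:
    """Intersect the specified intervals with whichever optional inputs are given.

    Intervals are matched exactly on (contig, start, end) (assumption), and the
    order of the specified intervals is kept.
    """
    keep: Set[Interval] = set(specified)
    for other in (annotated, first_counts_file):
        if other is not None:
            keep &= set(other)
    return [iv for iv in specified if iv in keep]


# Annotation-based filters would then run on this set, followed by count-based filters.
```

For example, with three abutting bins where the annotations cover the first two and the counts cover the last two, only the middle bin survives the intersection.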

Inputs

  • Intervals to be filtered (typically, the bins output by PreprocessIntervals). The --interval-merging-rule argument must be set to OVERLAPPING_ONLY, and all other common arguments for interval padding or merging must be left at their defaults. A blacklist of regions in which intervals should always be filtered (regardless of the other annotation-based or count-based filters) may also be provided via -XL; this can be used to filter pseudoautosomal regions (PARs), for example. Partial bins created by interval exclusion may be dropped upon intersection with the intervals present in the other optional inputs.
  • (Optional) Annotated-intervals file from AnnotateIntervals. Must be provided if no counts files are provided.
  • (Optional) Counts files (TSV or HDF5 output of CollectReadCounts). Must be provided if no annotated-intervals file is provided.
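The input requirements above can be summarized as a small Python check. The `FilterIntervalsArgs` container and the `validate` function are hypothetical; the argument names and defaults mirror the argument table on this page, and reading "defaults" as a padding of 0 and an interval set rule of UNION is an assumption. Because the default merging rule is ALL, the check fails unless OVERLAPPING_ONLY is passed explicitly.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FilterIntervalsArgs:
    # Hypothetical container for a subset of the FilterIntervals arguments.
    interval_merging_rule: str = "ALL"               # --interval-merging-rule / -imr
    interval_padding: int = 0                        # --interval-padding / -ip
    interval_exclusion_padding: int = 0              # --interval-exclusion-padding / -ixp
    interval_set_rule: str = "UNION"                 # --interval-set-rule / -isr
    annotated_intervals: Optional[str] = None        # --annotated-intervals
    inputs: List[str] = field(default_factory=list)  # --input / -I


def validate(args: FilterIntervalsArgs) -> List[str]:
    """Return a list of problems; an empty list means the arguments are acceptable."""
    problems = []
    if args.interval_merging_rule != "OVERLAPPING_ONLY":
        problems.append("--interval-merging-rule must be OVERLAPPING_ONLY")
    if args.interval_padding != 0 or args.interval_exclusion_padding != 0:
        problems.append("interval padding arguments must be left at 0")
    if args.interval_set_rule != "UNION":
        problems.append("--interval-set-rule must be left at UNION")
    if args.annotated_intervals is None and not args.inputs:
        problems.append("provide --annotated-intervals and/or at least one -I counts file")
    return problems
```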

Output

  • Filtered Picard interval-list file.

Usage examples

     gatk FilterIntervals \
          -L preprocessed_intervals.interval_list \
          -XL blacklist_intervals.interval_list \
          --interval-merging-rule OVERLAPPING_ONLY \
          -I sample_1.counts.hdf5 \
          -I sample_2.counts.hdf5 \
          ... \
          --annotated-intervals annotated_intervals.tsv \
          -O filtered_intervals.interval_list

     gatk FilterIntervals \
          -L preprocessed_intervals.interval_list \
          --interval-merging-rule OVERLAPPING_ONLY \
          --annotated-intervals annotated_intervals.tsv \
          -O filtered_intervals.interval_list

     gatk FilterIntervals \
          -L preprocessed_intervals.interval_list \
          --interval-merging-rule OVERLAPPING_ONLY \
          -I sample_1.counts.hdf5 \
          -I sample_2.counts.hdf5 \
          ... \
          -O filtered_intervals.interval_list

FilterIntervals specific arguments

This table summarizes the command-line arguments accepted by this tool, grouped into required arguments, optional tool-specific arguments, optional common arguments, and advanced arguments. For more details on each argument, see the list below the table.

Argument name(s) | Default value | Summary

Required Arguments
--intervals / -L | [] | One or more genomic intervals over which to operate
--output / -O | null | Output Picard interval-list file containing the filtered intervals.

Optional Tool Arguments
--annotated-intervals | null | Input file containing annotations for genomic intervals (output of AnnotateIntervals). Must be provided if no counts files are provided.
--arguments_file | [] | read one or more arguments files and add them to the command line
--extreme-count-filter-maximum-percentile | 99.0 | Maximum-percentile parameter for the extreme-count filter. Intervals with a count that has a percentile strictly greater than this in a percentage of samples strictly greater than extreme-count-filter-percentage-of-samples will be filtered out. (This is the second count-based filter applied.)
--extreme-count-filter-minimum-percentile | 1.0 | Minimum-percentile parameter for the extreme-count filter. Intervals with a count that has a percentile strictly less than this in a percentage of samples strictly greater than extreme-count-filter-percentage-of-samples will be filtered out. (This is the second count-based filter applied.)
--extreme-count-filter-percentage-of-samples | 90.0 | Percentage-of-samples parameter for the extreme-count filter. Intervals with a count that has a percentile outside of [extreme-count-filter-minimum-percentile, extreme-count-filter-maximum-percentile] in a percentage of samples strictly greater than this will be filtered out. (This is the second count-based filter applied.)
--gcs-max-retries / -gcs-retries | 20 | If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection
--gcs-project-for-requester-pays | "" | Project to bill when accessing "requester pays" buckets. If unset, these buckets cannot be accessed.
--help / -h | false | display the help message
--input / -I | [] | Input TSV or HDF5 files containing integer read counts in genomic intervals (output of CollectReadCounts). Must be provided if no annotated-intervals file is provided.
--interval-merging-rule / -imr | ALL | Interval merging rule for abutting intervals
--low-count-filter-count-threshold | 5 | Count-threshold parameter for the low-count filter. Intervals with a count strictly less than this threshold in a percentage of samples strictly greater than low-count-filter-percentage-of-samples will be filtered out. (This is the first count-based filter applied.)
--low-count-filter-percentage-of-samples | 90.0 | Percentage-of-samples parameter for the low-count filter. Intervals with a count strictly less than low-count-filter-count-threshold in a percentage of samples strictly greater than this will be filtered out. (This is the first count-based filter applied.)
--maximum-gc-content | 0.9 | Maximum allowed value for GC-content annotation (inclusive).
--maximum-mappability | 1.0 | Maximum allowed value for mappability annotation (inclusive).
--maximum-segmental-duplication-content | 0.5 | Maximum allowed value for segmental-duplication-content annotation (inclusive).
--minimum-gc-content | 0.1 | Minimum allowed value for GC-content annotation (inclusive).
--minimum-mappability | 0.9 | Minimum allowed value for mappability annotation (inclusive).
--minimum-segmental-duplication-content | 0.0 | Minimum allowed value for segmental-duplication-content annotation (inclusive).
--version | false | display the version number for this tool

Optional Common Arguments
--exclude-intervals / -XL | [] | One or more genomic intervals to exclude from processing
--gatk-config-file | null | A configuration file to use with the GATK.
--interval-exclusion-padding / -ixp | 0 | Amount of padding (in bp) to add to each interval you are excluding.
--interval-padding / -ip | 0 | Amount of padding (in bp) to add to each interval you are including.
--interval-set-rule / -isr | UNION | Set merging approach to use for combining interval inputs
--QUIET | false | Whether to suppress job-summary info on System.err.
--tmp-dir | null | Temp directory to use.
--use-jdk-deflater / -jdk-deflater | false | Whether to use the JdkDeflater (as opposed to IntelDeflater)
--use-jdk-inflater / -jdk-inflater | false | Whether to use the JdkInflater (as opposed to IntelInflater)
--verbosity | INFO | Control verbosity of logging.

Advanced Arguments
--showHidden | false | display hidden arguments

Argument details

This list describes every argument in the table above, covering both the arguments specific to this tool and the common arguments shared with other GATK tools.


--annotated-intervals / NA

Input file containing annotations for genomic intervals (output of AnnotateIntervals). Must be provided if no counts files are provided.

File  null


--arguments_file / NA

read one or more arguments files and add them to the command line

List[File]  []


--exclude-intervals / -XL

One or more genomic intervals to exclude from processing
Use this argument to exclude certain parts of the genome from the analysis (like -L, but the opposite). This argument can be specified multiple times. You can use samtools-style intervals either explicitly on the command line (e.g. -XL 1 or -XL 1:100-200) or by loading in a file containing a list of intervals (e.g. -XL myFile.intervals).

List[String]  []
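The three samtools-style forms mentioned above (a whole contig, a single position, and a start-end range) can be illustrated with a small Python parser. This is a hypothetical sketch, not GATK's parser: it does not read interval files, and it does not handle contig names that themselves contain a colon.

```python
import re
from typing import Optional, Tuple

# (contig, start, end); end is None when the interval covers the whole contig.
ParsedInterval = Tuple[str, int, Optional[int]]

# Hypothetical pattern: "contig", "contig:pos", or "contig:start-end".
_INTERVAL_RE = re.compile(r"^(?P<contig>[^:]+)(?::(?P<start>\d+)(?:-(?P<end>\d+))?)?$")


def parse_interval(text: str) -> ParsedInterval:
    """Parse a samtools-style interval string into 1-based, closed coordinates."""
    m = _INTERVAL_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not a samtools-style interval: {text!r}")
    contig = m.group("contig")
    if m.group("start") is None:
        return (contig, 1, None)  # e.g. "-XL 1": the whole contig
    start = int(m.group("start"))
    end = int(m.group("end")) if m.group("end") else start  # "1:100" is a single base
    if end < start:
        raise ValueError(f"end before start: {text!r}")
    return (contig, start, end)
```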


--extreme-count-filter-maximum-percentile / NA

Maximum-percentile parameter for the extreme-count filter. Intervals with a count that has a percentile strictly greater than this in a percentage of samples strictly greater than extreme-count-filter-percentage-of-samples will be filtered out. (This is the second count-based filter applied.)

double  99.0  [ [ 0  100 ] ]


--extreme-count-filter-minimum-percentile / NA

Minimum-percentile parameter for the extreme-count filter. Intervals with a count that has a percentile strictly less than this in a percentage of samples strictly greater than extreme-count-filter-percentage-of-samples will be filtered out. (This is the second count-based filter applied.)

double  1.0  [ [ 0  100 ] ]


--extreme-count-filter-percentage-of-samples / NA

Percentage-of-samples parameter for the extreme-count filter. Intervals with a count that has a percentile outside of [extreme-count-filter-minimum-percentile, extreme-count-filter-maximum-percentile] in a percentage of samples strictly greater than this will be filtered out. (This is the second count-based filter applied.)

double  90.0  [ [ 0  100 ] ]
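The extreme-count filter can be sketched in Python as follows. Several assumptions are labeled in the code: percentile thresholds are computed separately for each sample over the intervals that survived the earlier filters, and percentiles use linear interpolation between closest ranks (NumPy's default rule), which may differ from the estimator the tool actually uses. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """Percentile with linear interpolation between closest ranks (p in [0, 100]).

    Assumption: this is NumPy's default rule; the tool's own estimator may differ.
    """
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def extreme_count_filter(
    counts: List[List[int]],
    min_percentile: float = 1.0,
    max_percentile: float = 99.0,
    percentage_of_samples: float = 90.0,
) -> List[bool]:
    """Return a keep flag per interval.

    counts[s][i] is the count of sample s in interval i, restricted to the
    intervals that passed the earlier filters (assumption).
    """
    num_extreme = [0] * len(counts[0])
    for sample_counts in counts:
        # Per-sample thresholds (assumption: computed over this sample's intervals).
        low = percentile(sample_counts, min_percentile)
        high = percentile(sample_counts, max_percentile)
        for i, count in enumerate(sample_counts):
            if count < low or count > high:  # percentile outside [min, max]
                num_extreme[i] += 1
    # Drop an interval only if it is extreme in strictly more than the given percentage of samples.
    return [100.0 * n / len(counts) <= percentage_of_samples for n in num_extreme]
```

With three samples and the default percentage of 90.0, an interval is dropped only if it is extreme in all three samples, because two out of three (about 66.7%) does not strictly exceed 90%.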


--gatk-config-file / NA

A configuration file to use with the GATK.

String  null


--gcs-max-retries / -gcs-retries

If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection

int  20  [ [ -∞  ∞ ] ]


--gcs-project-for-requester-pays / NA

Project to bill when accessing "requester pays" buckets. If unset, these buckets cannot be accessed.

String  ""


--help / -h

display the help message

boolean  false


--input / -I

Input TSV or HDF5 files containing integer read counts in genomic intervals (output of CollectReadCounts). Must be provided if no annotated-intervals file is provided.

List[File]  []


--interval-exclusion-padding / -ixp

Amount of padding (in bp) to add to each interval you are excluding.
Use this to add padding to the intervals specified using -XL. For example, '-XL 1:100' with a padding value of 20 would turn into '-XL 1:80-120'. This is typically used to add padding around targets when analyzing exomes. FilterIntervals requires this argument to be left at its default of 0.

int  0  [ [ -∞  ∞ ] ]


--interval-merging-rule / -imr

Interval merging rule for abutting intervals
By default, the program merges abutting intervals (i.e. intervals that are directly side-by-side but do not actually overlap) into a single continuous interval. However, you can change this behavior if you want them to be treated as separate intervals instead. FilterIntervals requires this argument to be set to OVERLAPPING_ONLY.

The --interval-merging-rule argument is an enumerated type (IntervalMergingRule), which can have one of the following values:

ALL
Merge both overlapping and abutting intervals
OVERLAPPING_ONLY
Merge only overlapping intervals, not abutting ones

IntervalMergingRule  ALL
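The difference between the two rules can be seen in a short Python sketch (hypothetical function name; intervals are 1-based, closed (contig, start, end) tuples). It also suggests why this tool requires OVERLAPPING_ONLY: bins that abut one another, such as consecutive bins from PreprocessIntervals, would be collapsed into a single interval under ALL.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (contig, start, end), 1-based and closed


def merge_intervals(intervals: List[Interval], rule: str = "ALL") -> List[Interval]:
    """Merge intervals: ALL joins overlapping and abutting ones, OVERLAPPING_ONLY only overlapping ones."""
    if rule not in ("ALL", "OVERLAPPING_ONLY"):
        raise ValueError(f"unknown rule: {rule}")
    # Under ALL, a next interval starting at prev.end + 1 (abutting) is also joined.
    gap = 1 if rule == "ALL" else 0
    merged: List[Interval] = []
    for contig, start, end in sorted(intervals):
        if merged and merged[-1][0] == contig and start <= merged[-1][2] + gap:
            prev = merged[-1]
            merged[-1] = (contig, prev[1], max(prev[2], end))
        else:
            merged.append((contig, start, end))
    return merged
```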


--interval-padding / -ip

Amount of padding (in bp) to add to each interval you are including.
Use this to add padding to the intervals specified using -L. For example, '-L 1:100' with a padding value of 20 would turn into '-L 1:80-120'. This is typically used to add padding around targets when analyzing exomes. FilterIntervals requires this argument to be left at its default of 0.

int  0  [ [ -∞  ∞ ] ]
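The padding arithmetic in the example above can be sketched in Python; the same arithmetic applies to --interval-exclusion-padding. The function name is hypothetical, and clamping the start at position 1 is an assumption (clamping at the contig end would require the contig length, which this sketch does not take).

```python
from typing import Tuple

Interval = Tuple[str, int, int]  # (contig, start, end), 1-based and closed


def pad_interval(interval: Interval, padding: int) -> Interval:
    """Widen an interval by `padding` bp on each side, clamping the start at position 1."""
    contig, start, end = interval
    return (contig, max(1, start - padding), end + padding)
```

For example, `pad_interval(("1", 100, 100), 20)` reproduces the '-L 1:100' to '1:80-120' example.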


--interval-set-rule / -isr

Set merging approach to use for combining interval inputs
By default, the program will take the UNION of all intervals specified using -L and/or -XL. However, you can change this setting for -L, for example if you want to take the INTERSECTION of the sets instead. For example, to perform the analysis only on chromosome 1 exomes, you could specify -L exomes.intervals -L 1 --interval-set-rule INTERSECTION. However, it is not possible to modify the merging approach for intervals passed using -XL (they will always be merged using UNION). Note that if you specify both -L and -XL, the -XL interval set will be subtracted from the -L interval set. FilterIntervals requires this argument to be left at its default of UNION.

The --interval-set-rule argument is an enumerated type (IntervalSetRule), which can have one of the following values:

UNION
Take the union of all intervals
INTERSECTION
Take the intersection of intervals (the subset that overlaps all intervals specified)

IntervalSetRule  UNION
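The set operations described above can be sketched in Python (hypothetical function names; 1-based, closed intervals). `intersect` keeps only the overlapping parts of two interval lists, which is one reading of the INTERSECTION rule, and `subtract` shows how removing -XL regions can split a bin into partial bins. Such partial bins no longer match the bins in the annotated-intervals or counts files exactly, which is how they can be dropped at the intersection step described under Inputs.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (contig, start, end), 1-based and closed


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Overlapping parts of two interval lists (one reading of INTERSECTION)."""
    out: List[Interval] = []
    for ca, sa, ea in a:
        for cb, sb, eb in b:
            if ca == cb and max(sa, sb) <= min(ea, eb):
                out.append((ca, max(sa, sb), min(ea, eb)))
    return sorted(out)


def subtract(included: List[Interval], excluded: List[Interval]) -> List[Interval]:
    """Remove every excluded base from the included intervals, possibly splitting them."""
    result: List[Interval] = []
    for contig, start, end in included:
        pieces = [(start, end)]
        for xc, xs, xe in excluded:
            if xc != contig:
                continue
            next_pieces = []
            for s, e in pieces:
                if xe < s or xs > e:  # no overlap: keep the piece unchanged
                    next_pieces.append((s, e))
                    continue
                if s < xs:  # part to the left of the excluded region
                    next_pieces.append((s, xs - 1))
                if e > xe:  # part to the right of the excluded region
                    next_pieces.append((xe + 1, e))
            pieces = next_pieces
        result.extend((contig, s, e) for s, e in pieces)
    return result
```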


--intervals / -L

One or more genomic intervals over which to operate

List[String]  []  (required)


--low-count-filter-count-threshold / NA

Count-threshold parameter for the low-count filter. Intervals with a count strictly less than this threshold in a percentage of samples strictly greater than low-count-filter-percentage-of-samples will be filtered out. (This is the first count-based filter applied.)

int  5  [ [ 0  ∞ ] ]


--low-count-filter-percentage-of-samples / NA

Percentage-of-samples parameter for the low-count filter. Intervals with a count strictly less than low-count-filter-count-threshold in a percentage of samples strictly greater than this will be filtered out. (This is the first count-based filter applied.)

double  90.0  [ [ 0  100 ] ]
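A Python sketch of the low-count filter's decision rule, with its two parameters as keyword arguments (the function name and the samples-by-intervals list layout are assumptions). Note the two strict inequalities: a count equal to the threshold is not low, and an interval is dropped only when the percentage of low samples strictly exceeds the percentage-of-samples parameter.

```python
from typing import List


def low_count_filter(
    counts: List[List[int]],
    count_threshold: int = 5,
    percentage_of_samples: float = 90.0,
) -> List[bool]:
    """Return a keep flag per interval; counts[s][i] is the count of sample s in interval i."""
    num_samples = len(counts)
    keep = []
    for i in range(len(counts[0])):
        # Samples whose count is strictly below the threshold in this interval.
        num_low = sum(1 for sample in counts if sample[i] < count_threshold)
        # Drop only if the low samples make up strictly more than the given percentage.
        keep.append(100.0 * num_low / num_samples <= percentage_of_samples)
    return keep
```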


--maximum-gc-content / NA

Maximum allowed value for GC-content annotation (inclusive).

double  0.9  [ [ 0  1 ] ]


--maximum-mappability / NA

Maximum allowed value for mappability annotation (inclusive).

double  1.0  [ [ 0  1 ] ]


--maximum-segmental-duplication-content / NA

Maximum allowed value for segmental-duplication-content annotation (inclusive).

double  0.5  [ [ 0  1 ] ]


--minimum-gc-content / NA

Minimum allowed value for GC-content annotation (inclusive).

double  0.1  [ [ 0  1 ] ]


--minimum-mappability / NA

Minimum allowed value for mappability annotation (inclusive).

double  0.9  [ [ 0  1 ] ]


--minimum-segmental-duplication-content / NA

Minimum allowed value for segmental-duplication-content annotation (inclusive).

double  0.0  [ [ 0  1 ] ]
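The annotation-based filters amount to inclusive range checks, which a short Python sketch can show. The annotation keys (GC_CONTENT, MAPPABILITY, SEGMENTAL_DUPLICATION_CONTENT) are assumed names for the AnnotateIntervals columns, the bounds are the default values listed above, and skipping an annotation that is absent is an assumption rather than documented behavior.

```python
from typing import Dict, Tuple

# Default inclusive [min, max] bounds, taken from the argument defaults above.
# The keys are assumed annotation column names.
DEFAULT_BOUNDS: Dict[str, Tuple[float, float]] = {
    "GC_CONTENT": (0.1, 0.9),
    "MAPPABILITY": (0.9, 1.0),
    "SEGMENTAL_DUPLICATION_CONTENT": (0.0, 0.5),
}


def passes_annotation_filters(
    annotations: Dict[str, float],
    bounds: Dict[str, Tuple[float, float]] = DEFAULT_BOUNDS,
) -> bool:
    """True if every annotation present lies within its inclusive [min, max] range."""
    for key, (lo, hi) in bounds.items():
        value = annotations.get(key)
        if value is None:
            continue  # assumption: an absent annotation is not filtered on
        if not (lo <= value <= hi):
            return False
    return True
```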


--output / -O

Output Picard interval-list file containing the filtered intervals.

File  null  (required)


--QUIET / NA

Whether to suppress job-summary info on System.err.

Boolean  false


--showHidden / -showHidden

display hidden arguments

boolean  false


--tmp-dir / NA

Temp directory to use.

GATKPathSpecifier  null


--use-jdk-deflater / -jdk-deflater

Whether to use the JdkDeflater (as opposed to IntelDeflater)

boolean  false


--use-jdk-inflater / -jdk-inflater

Whether to use the JdkInflater (as opposed to IntelInflater)

boolean  false


--verbosity / -verbosity

Control verbosity of logging.

The --verbosity argument is an enumerated type (LogLevel), which can have one of the following values:

ERROR
WARNING
INFO
DEBUG

LogLevel  INFO


--version / NA

display the version number for this tool

boolean  false





GATK version 4.1.4.0 built at Wed, 9 Oct 2019 15:19:59 -0400.