Read filters are internal filters that the GATK engine can apply when running tools that take read data as input. They let you select which reads are included in the analysis based on various criteria. The full list of read filters is given in the Tool Documentation section of the user guide.
The names of read filters are typically formulated to express which reads they allow through. For example, PairedReadFilter selects reads that have the "paired" flag and discards those that do not, and NotDuplicateReadFilter selects reads that do not have the "duplicate" flag and discards those that do. Note that in some cases (including the latter example) this logic is the inverse of what the corresponding filter did in older versions of GATK.
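To make the naming convention concrete, here is a minimal Python sketch of the two filters as "keep" predicates on the SAM flag field. It is an illustration only, not GATK's implementation: the function names are hypothetical, and the only facts taken from outside this page are the SAM flag bits for "paired" (0x1) and "duplicate" (0x400).

```python
# SAM flag bits as defined in the SAM specification.
FLAG_PAIRED = 0x1
FLAG_DUPLICATE = 0x400


def paired_read_filter(flag: int) -> bool:
    """Hypothetical stand-in: keep the read when the paired bit is set."""
    return bool(flag & FLAG_PAIRED)


def not_duplicate_read_filter(flag: int) -> bool:
    """Hypothetical stand-in: keep the read when the duplicate bit is NOT set."""
    return not (flag & FLAG_DUPLICATE)


# Four example flag values: paired, unpaired, paired duplicate, unpaired duplicate.
flags = [0x1, 0x0, 0x401, 0x400]

kept_paired = [f for f in flags if paired_read_filter(f)]
kept_non_duplicate = [f for f in flags if not_duplicate_read_filter(f)]
```

In both cases the predicate returns True for the reads the filter name describes, which are the reads that pass through to the tool.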
Most GATK tools apply one or more read filters by default. You can look up the defaults for each tool on its Tool Documentation page. We do not recommend disabling a tool's default read filters, because they protect the tool from receiving types of data (e.g. malformed reads) that would make it malfunction. However, it is possible to disable all read filters by using AllowAllReadsReadFilter, which overrides all others.
To apply a read filter, use the following syntax in your command line:
--read-filter ReadFilterName
Some read filters have an on/off behavior, while others take arguments that modify their behavior or allow you to set threshold values. For example, when using ReadLengthReadFilter to filter reads based on their length, you can specify a maximum length like this:
--read-filter ReadLengthReadFilter --max-read-length 76
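As a rough illustration of what a threshold argument does, the Python sketch below builds a length predicate from a maximum (and optional minimum) value. It is not GATK code: the function name is hypothetical, and it assumes the bounds are inclusive and that the minimum defaults to 1.

```python
from typing import Callable


def make_read_length_filter(max_read_length: int,
                            min_read_length: int = 1) -> Callable[[str], bool]:
    """Return a predicate that keeps reads whose length lies within the bounds.

    Assumption: both bounds are inclusive.
    """
    def keep(bases: str) -> bool:
        return min_read_length <= len(bases) <= max_read_length
    return keep


# Equivalent in spirit to setting a maximum read length of 76.
keep = make_read_length_filter(max_read_length=76)

reads = ["A" * 50, "A" * 76, "A" * 100]
kept_lengths = [len(r) for r in reads if keep(r)]
```

With a maximum of 76, the 50-base and 76-base reads pass and the 100-base read is discarded.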
And of course, you can add as many filters as you like by using multiple copies of the --read-filter argument in the same command:
--read-filter ReadLengthReadFilter --max-read-length 76 --read-filter NotDuplicateReadFilter
These arguments are not positional, so the order in which you put them in the command does not matter. You don't even need to group them together in the command (there can be others that come in between) but in general we do recommend you try to keep them together for readability (do future-you a favor!).
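One way to see why the order does not change the outcome is to model each filter as a predicate and keep a read only when every predicate accepts it. The Python sketch below works under that assumption (filters combined with a logical AND). The record layout and function names are hypothetical and are not part of GATK.

```python
from typing import Callable, Dict, List

Read = Dict[str, int]  # hypothetical record: {"flag": ..., "length": ...}
ReadFilter = Callable[[Read], bool]

FLAG_DUPLICATE = 0x400  # duplicate bit from the SAM specification


def max_length(limit: int) -> ReadFilter:
    """Keep reads whose length is at most `limit`."""
    return lambda read: read["length"] <= limit


def not_duplicate(read: Read) -> bool:
    """Keep reads whose duplicate bit is not set."""
    return not (read["flag"] & FLAG_DUPLICATE)


def apply_filters(reads: List[Read], filters: List[ReadFilter]) -> List[Read]:
    """Keep a read only if every filter accepts it (logical AND)."""
    return [r for r in reads if all(f(r) for f in filters)]


reads = [
    {"flag": 0x0, "length": 76},    # passes both filters
    {"flag": 0x400, "length": 50},  # duplicate, so it is dropped
    {"flag": 0x0, "length": 100},   # too long, so it is dropped
]

# Same two filters, listed in opposite orders.
kept_ab = apply_filters(reads, [max_length(76), not_duplicate])
kept_ba = apply_filters(reads, [not_duplicate, max_length(76)])
```

Because a logical AND is commutative, both orderings keep exactly the same reads.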