MalformedReadFilter

Filter out malformed reads

Category Read Filters


Overview

This filter is applied automatically by all GATK tools to protect them from crashing on malformed reads. A few types of malformation (such as the absence of sequence bases) are not filtered out by default and cause the tool to fail with an error; these cases can be preempted by setting the optional flags described below, which cause the problem reads to be filtered out as well.

Criteria used by default

  • Invalid Alignment Start: Read alignment start is inconsistent with the read unmapped flag; the read is not flagged as 'unmapped', but its alignment start is either NO_ALIGNMENT_START or -1.
  • Invalid Alignment End: Read aligns to a negative number of bases in the reference.
  • Alignment Disagrees With Header: Read is aligned to a nonexistent contig, or to a point after the end of the contig.
  • Missing or Undefined Read Group: Either the RG tag is missing, it is not defined in the header, or required elements such as RGID are missing.
  • Cigar Disagrees With Alignment: Read has a valid alignment start, but the CIGAR string is empty.
  • CIGAR Is Not Supported: Read CIGAR contains operators that are not supported (the N operator is treated separately; see CIGAR With N Operator under the optional criteria).
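
To get a feel for which reads a few of the default criteria would catch, the checks can be approximated on SAM text before running GATK. The following is only a rough sketch, not the filter's actual implementation: `scan_default_criteria` is a hypothetical helper name, it covers only some of the criteria listed above, and it assumes that NO_ALIGNMENT_START appears as POS 0 in SAM text and that read groups are carried in the `RG:Z:` tag.

```shell
# Hypothetical helper (not part of GATK): reads SAM text, including the header,
# on stdin and prints "<read name><TAB><reason>" for reads that look malformed
# under an approximation of some of the default criteria.
scan_default_criteria() {
  awk -F '\t' '
    # Header: remember contig lengths (@SQ) and read group IDs (@RG).
    /^@SQ/ { name = ""; len = 0
             for (i = 2; i <= NF; i++) {
               if ($i ~ /^SN:/) name = substr($i, 4)
               if ($i ~ /^LN:/) len = substr($i, 4) + 0
             }
             contig_len[name] = len; next }
    /^@RG/ { for (i = 2; i <= NF; i++)
               if ($i ~ /^ID:/) rg_known[substr($i, 4)] = 1
             next }
    /^@/   { next }
    {
      unmapped = int($2 / 4) % 2                      # FLAG bit 0x4
      # Invalid Alignment Start (assumes NO_ALIGNMENT_START shows up as POS 0)
      if (!unmapped && $4 <= 0) print $1 "\tinvalid alignment start"
      # Cigar Disagrees With Alignment: valid start but empty CIGAR
      if (!unmapped && $4 > 0 && $6 == "*") print $1 "\tCIGAR disagrees with alignment"
      # Alignment Disagrees With Header
      if (!unmapped && $3 != "*") {
        if (!($3 in contig_len)) print $1 "\tcontig not in header"
        else if ($4 > contig_len[$3]) print $1 "\tstart past end of contig"
      }
      # Missing or Undefined Read Group
      rg = ""
      for (i = 12; i <= NF; i++) if ($i ~ /^RG:Z:/) rg = substr($i, 6)
      if (rg == "") print $1 "\tmissing read group"
      else if (!(rg in rg_known)) print $1 "\tread group not in header"
    }'
}
```

Assuming samtools is installed, the helper could be fed with `samtools view -h input.bam | scan_default_criteria`. An empty result only means that none of the approximated checks fired; GATK's own filter remains the authority on what counts as malformed.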

Optional criteria

  • Mismatching Bases And Quals: Read does not have the same number of bases and base qualities.
  • Bases Not Stored: Read has no stored bases; the SEQ field contains '*' instead.
  • CIGAR With N Operator: Read CIGAR contains the N operator (typical of RNA-seq data).
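
Because these three criteria are off by default, it can help to check in advance whether an input contains any reads that would trigger them. The sketch below scans SAM text and prints the short flag name for each optional criterion that at least one read meets. It is an approximation under stated assumptions, not GATK's own logic: `suggest_optional_flags` is a hypothetical helper name, and treating a '*' in the QUAL field as zero stored qualities is an assumption.

```shell
# Hypothetical helper (not part of GATK): reads SAM text on stdin and prints
# one flag per optional criterion that at least one read would trigger.
suggest_optional_flags() {
  awk -F '\t' '
    /^@/ { next }                              # skip header lines
    $10 == "*" { no_bases = 1 }                # Bases Not Stored
    $10 != "*" {                               # Mismatching Bases And Quals
      quals = ($11 == "*") ? 0 : length($11)   # assumption: "*" means no quals
      if (length($10) != quals) mismatch = 1
    }
    $6 ~ /N/ { n_cigar = 1 }                   # CIGAR With N Operator
    END {
      if (no_bases) print "-filterNoBases"
      if (mismatch) print "-filterMBQ"
      if (n_cigar)  print "-filterRNC"
    }'
}
```

Assuming samtools is installed, a typical invocation would be `samtools view -h input.bam | suggest_optional_flags`; any flags it prints are candidates to add to the GATK command line.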

Usage example

Set the malformed read filter to also filter out reads that have no stored sequence bases:

     java -jar GenomeAnalysisTK.jar \
         -T ToolName \
         -R reference.fasta \
         -I input.bam \
         -o output.file \
         -filterNoBases
 

Note that the MalformedReadFilter itself does not need to be specified on the command line because it is applied automatically.


Command-line Arguments

MalformedReadFilter specific arguments

This table summarizes the command-line arguments that are specific to this filter. For more details on each argument, see the Argument details list below the table.

Argument name(s)                                  Default value  Summary
Optional Flags
--filter_bases_not_stored / -filterNoBases        false          Filter out reads with no stored bases (i.e. '*' where the sequence should be), instead of failing with an error
--filter_mismatching_base_and_quals / -filterMBQ  false          Filter out reads with mismatching numbers of bases and base qualities, instead of failing with an error
--filter_reads_with_N_cigar / -filterRNC          false          Filter out reads with CIGAR containing the N operator, instead of failing with an error

Argument details

Arguments in this list are specific to this filter. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments).


--filter_bases_not_stored / -filterNoBases

Filter out reads with no stored bases (i.e. '*' where the sequence should be), instead of failing with an error

boolean  false


--filter_mismatching_base_and_quals / -filterMBQ

Filter out reads with mismatching numbers of bases and base qualities, instead of failing with an error

boolean  false


--filter_reads_with_N_cigar / -filterRNC

Filter out reads with CIGAR containing the N operator, instead of failing with an error

boolean  false



GATK version 3.7-0-gcfedb67 built at 2017/02/09 12:35:06.