Sam flags down a boat

Posted by shlee on 18 Feb 2016

Let's talk about SAM flags. We'll breeze over some background (skipping details better left to CS majors), filter alignment records via their flag bits using Samtools, and end with how to create a valid BAM containing read pairs or sets where at least one record has the specified flag bit.

SAM flags summarize many properties of reads, represented by flag bits, into a single number. Think of each flag bit as one layer of watercolor paint on a canvas. Each SAM flag is then the distinct chroma resulting from multiple layers of overlapping pigment.

Cats pPuUrR12; Sam flags down a boat

That's the mnemonic I created to memorize the twelve characters in pPuUrR12sfdb. Typically, I use the Explain flags page on the Picard website to decipher SAM flag values. To do better, I made a summary chart and a mnemonic for the twelve SAM flag bits. The chart's rows 1–8 refer to cats purring (pur1 and PUR2), and rows 9–12 refer to the title of this blog post (sfdb).

If you are an American football fan and/or watched the Super Bowl last week, you may know the Carolina Panthers' mascot Sir Purr. Sir Purr's team lost 24:10, which gives us another mnemonic: Sir pPuUrR12 loses Superbowl Fifty to Denver Broncos.

Dive deeper into bits with this exercise

If you're interested, this exercise uses Mac OS's Calculator app. Switch to Calculator's Programmer mode and play around with the features (A–E) I've highlighted in the figure.

In the example in (F), SAM flag 99 displays in binary as 0000 0110 0011. This is read right-to-left, where 0=no and 1=yes, giving us yes values for positions 1, 2, 6 and 7. Using our mnemonic, this translates to pPR1 or paired, properly paired, mate reverse and read1.
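The same right-to-left reading can be sketched in plain shell arithmetic, without the Calculator app. This is a minimal illustration and not part of Samtools: the function name decode_flag is hypothetical, and the bit labels are shorthand for the twelve bits defined in the SAM specification, listed from the lowest bit (1) to the highest (2048).

```shell
# Hypothetical helper: list the flag bits that are set in a SAM flag value.
# Labels are shorthand for the SAM specification bits, lowest bit first.
decode_flag() {
    flag=$1
    bit=1
    out=""
    for name in paired proper_pair unmapped mate_unmapped reverse mate_reverse \
                read1 read2 secondary qc_fail duplicate supplementary; do
        # A non-zero bitwise AND means this bit is a "yes".
        if [ $(( flag & bit )) -ne 0 ]; then
            out="${out:+$out,}$name"
        fi
        bit=$(( bit * 2 ))
    done
    printf '%s\n' "$out"
}

decode_flag 99   # prints: paired,proper_pair,mate_reverse,read1
```

Each pass through the loop tests one bit and then doubles the bit value, which mirrors stepping one position to the left in the binary display.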

If you forget what a particular bit represents, use the samtools flags command as shown below.

samtools flags 256

Replacing 256 with SECONDARY or 0x100 returns the same answer.

0x100   256 SECONDARY

Filter reads by their flag bits using Samtools

The flagstat subcommand gives a useful summary count of records for each SAM flag category.

samtools flagstat xyz.bam 

The next set of commands either display the actual records or, with -c, return a count of records. The two filter options differ only in capitalization: -f includes only records that have all of the specified flag bits set, and -F excludes records that have any of the specified flag bits set. Combining the two, e.g. -F 2 -f 256, returns records that satisfy both conditions, here secondary alignments that are not properly paired.

samtools view -f 0x63 xyz.bam | more #display records with all bits represented by 99 
samtools view -c -f 99 xyz.bam #count records with all bits represented by 99
samtools view -c -f 1 -f 2 -f 32 -f 64 xyz.bam #same as above
samtools view -c -F 2 xyz.bam #count records that do not have the 2 bit in flag
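To see why these filters behave as they do, the include and exclude tests can be mimicked with shell arithmetic on bare flag values. This is only a sketch of the bitwise logic, not Samtools code: the function name passes_filter and the example flag values are made up for illustration.

```shell
# Hypothetical helper mimicking the logic of: samtools view -f REQUIRE -F EXCLUDE
# Succeeds when all REQUIRE bits are set and no EXCLUDE bit is set.
passes_filter() {
    flag=$1; require=$2; exclude=$3
    [ $(( flag & require )) -eq "$require" ] && [ $(( flag & exclude )) -eq 0 ]
}

# 0x63 and the sum of bits 1, 2, 32 and 64 are the same value as 99.
echo $(( 0x63 )) $(( 1 | 2 | 32 | 64 ))   # prints: 99 99

# Apply the logic of -F 2 -f 256 (secondary and not properly paired).
for flag in 99 147 355 329; do
    if passes_filter "$flag" 256 2; then
        echo "$flag kept"
    else
        echo "$flag dropped"
    fi
done
# prints: 99 dropped, 147 dropped, 355 dropped, 329 kept (one per line)
```

Flag 355 is secondary but also has the 2 bit set, so the exclude test drops it. Flag 329 (256 + 64 + 8 + 1) is secondary without the 2 bit, so it passes both tests.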

Subset reads by flag bit into a valid BAM

For transforming BAMs, I stick to using Picard tools for downstream GATK compatibility.

At one point I wanted to visualize secondary alignments in isolation, but still in the context of their read sets. To create such a subsetted BAM, I used three commands. The first uses the 256 SAM flag to extract the read names of secondary alignments, the second sorts the input BAM by queryname, and the third uses FilterSamReads and the list of read names to generate a valid BAM containing our reads of interest.

# Generate a list of unique read names of secondary alignments
samtools view -f 256 xyz.bam | cut -f1 | sort | uniq > xyz_f256.txt 

# Queryname sort the input BAM
java -jar picard.jar SortSam INPUT=xyz.bam OUTPUT=xyz_querynamesort.bam SORT_ORDER=queryname

# Create a new BAM containing read sets
java -jar picard.jar FilterSamReads INPUT=xyz_querynamesort.bam OUTPUT=xyz_f256.bam \
FILTER=includeReadList READ_LIST_FILE=xyz_f256.txt SORT_ORDER=coordinate CREATE_INDEX=true TMP_DIR=/tmp

Since forward and reverse reads in a pair and multiple alignments for a given read are all identically named in the BAM, the list of read names pulls out the alignment set for a given read name. You can also set the FILTER parameter to excludeReadList.
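The name-based selection can be illustrated on a toy example that needs neither Samtools nor Picard. The sketch below assumes a made-up, header-less, three-column table (read name, flag, contig) rather than a real SAM file, and uses awk in place of FilterSamReads purely to show the logic.

```shell
# Toy records (name, flag, contig); all values are invented for illustration.
tmp=$(mktemp -d)
printf 'readA\t99\tchr1\nreadA\t147\tchr1\nreadA\t355\tchr2\nreadB\t83\tchr1\nreadB\t163\tchr1\n' > "$tmp/toy.txt"

# Step 1: unique names of records whose 256 (secondary) bit is set.
# Integer division by 256, then modulo 2, isolates that bit in portable awk.
awk -F'\t' 'int($2 / 256) % 2 == 1 { print $1 }' "$tmp/toy.txt" | sort -u > "$tmp/names.txt"

# Step 2: keep every record whose name is on the list, whatever its own flag.
subset=$(awk -F'\t' 'NR == FNR { keep[$1]; next } $1 in keep' "$tmp/names.txt" "$tmp/toy.txt")
printf '%s\n' "$subset"

rm -r "$tmp"
```

Only the record with flag 355 is secondary, yet all three readA records are kept because they share a name, while both readB records are left out.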

Nautical flags at top spell SAM FLAGS. If I've gone overboard with sailing references, je m'excuse. I've had a great year racing solings.
