Sam flags down a boat

Posted by shlee on 18 Feb 2016

Let's talk about SAM flags. We'll breeze over some background (skipping details better left to CS majors), filter alignment records via their flag bits using Samtools, and end with how to create a valid BAM containing read pairs or sets where at least one record has the specified flag bit.

SAM flags summarize many properties of reads, represented by flag bits, into a single number. Think of each flag bit as one layer of watercolor paint on a canvas. Each SAM flag is then the distinct chroma resulting from multiple layers of overlapping pigment.

Cats pPuUrR12; Sam flags down a boat

That's the mnemonic I created to memorize the twelve characters in `pPuUrR12sfdb`. Typically, I use the Explain flags page on the Picard website to decipher SAM flag values. To do better, I made a summary chart and mnemonic for the twelve SAM flag bits. The chart's rows 1–8 refer to cats purring (`pur1` and `PUR2`), and rows 9–12 refer to the title of this blogpost `sfdb`.
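For readers without the chart in front of them, here is a minimal shell sketch of the same idea. It pairs each mnemonic character with a flag bit in the order the characters appear, and uses the bit names that `samtools flags` prints. The function names `flag_table` and `mnemonic_to_flag` are made up for this illustration and are not part of Samtools.

```shell
# Mnemonic character, decimal value, hex value and samtools name for each
# of the twelve SAM flag bits, from the lowest bit to the highest.
flag_table() {
    cat <<'EOF'
p 1 0x1 PAIRED
P 2 0x2 PROPER_PAIR
u 4 0x4 UNMAP
U 8 0x8 MUNMAP
r 16 0x10 REVERSE
R 32 0x20 MREVERSE
1 64 0x40 READ1
2 128 0x80 READ2
s 256 0x100 SECONDARY
f 512 0x200 QCFAIL
d 1024 0x400 DUP
b 2048 0x800 SUPPLEMENTARY
EOF
}

# Combine the bit values for a string of mnemonic characters, e.g. pPR1.
# (Hypothetical helper for illustration only.)
mnemonic_to_flag() {
    total=0
    chars=$1
    while [ -n "$chars" ]; do
        c=${chars%"${chars#?}"}   # first character of what is left
        chars=${chars#?}          # drop that character
        value=$(flag_table | awk -v c="$c" '$1 == c { print $2 }')
        if [ -z "$value" ]; then
            echo "unknown mnemonic character: $c" >&2
            return 1
        fi
        total=$(( total | value ))
    done
    echo "$total"
}

mnemonic_to_flag pPR1    # prints 99
```

Each character contributes one bit, so the order of characters in the argument does not matter.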

If you are an American football fan and/or watched the Super Bowl last week, you may know the Carolina Panthers' mascot Sir Purr. Sir Purr's team lost 24:10 and gives us another mnemonic: Sir pPuUrR12 loses Superbowl Fifty to Denver Broncos.

Dive deeper into bits with this exercise

If you're interested, this exercise uses Mac OS's Calculator app. Switch to Calculator's Programmer mode and play around with the features (A–E) I've highlighted in the figure.

In the example in (F), SAM flag `99` displays in binary as `0000 0110 0011`. Read it right to left, where 0=no and 1=yes: the yes values fall at positions 1, 2, 6 and 7, which carry the values 1, 2, 32 and 64 (1 + 2 + 32 + 64 = 99). Using our mnemonic, this translates to `pPR1`, or paired, properly paired, mate reverse and read1.
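If you don't have the Calculator app handy, the same right-to-left reading can be sketched in plain shell arithmetic. This is only an illustration; `set_bit_positions` is a hypothetical helper name, not a Samtools command.

```shell
# Print the 1-based positions of the bits set in a SAM flag,
# starting from the least significant (rightmost) bit.
set_bit_positions() {
    flag=$1
    pos=1
    out=""
    while [ "$flag" -gt 0 ]; do
        if [ $(( flag & 1 )) -eq 1 ]; then
            out="$out${out:+ }$pos"   # append position, space-separated
        fi
        flag=$(( flag >> 1 ))         # move on to the next bit
        pos=$(( pos + 1 ))
    done
    echo "$out"
}

set_bit_positions 99    # prints: 1 2 6 7
```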

If you forget what a particular bit represents, use the `samtools flags` command as shown below.

``samtools flags 256``

Replacing `256` with `SECONDARY` or `0x100` returns the same answer.

``0x100   256 SECONDARY``

Filter reads by their flag bits using Samtools

The `flagstat` subcommand gives a useful summary count for the SAM flag categories.

``samtools flagstat xyz.bam ``

The next set of commands either displays actual records or returns a count (`-c`) of records. Depending on capitalization, `-f` includes only records that have all of the specified flag bits set, and `-F` excludes records that have any of the specified flag bits set. Listing two flag values, e.g. `-F 2 -f 256`, returns records that satisfy both conditions, e.g. not proper pair and secondary.

```
samtools view -f 0x63 xyz.bam | more            # display records with all bits represented by 99
samtools view -c -f 99 xyz.bam                  # count records with all bits represented by 99
samtools view -c -f 1 -f 2 -f 32 -f 64 xyz.bam  # same as above
samtools view -c -F 2 xyz.bam                   # count records that do not have the 2 bit in flag
```
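To make the include/exclude logic concrete, here is a small sketch of the bitwise test that `-f` and `-F` amount to, written in plain shell arithmetic so it runs without a BAM. `passes_filter` is a hypothetical name used only for this illustration.

```shell
# Succeed if a record with flag $1 would be kept by `-f $2 -F $3`:
# every bit in $2 must be set, and no bit in $3 may be set.
passes_filter() {
    [ $(( $1 & $2 )) -eq $(( $2 )) ] && [ $(( $1 & $3 )) -eq 0 ]
}

# 99 has bits 1, 2, 32 and 64 set, so it satisfies -f 0x63.
passes_filter 99 0x63 0 && echo "99 kept by -f 0x63"

# 355 = 256 + 99 is secondary but also properly paired, so -F 2 drops it.
passes_filter 355 256 2 || echo "355 dropped by -F 2 -f 256"
```

Passing `0` for either argument leaves that side of the filter switched off.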

Subset reads by flag bit into a valid BAM

For transforming BAMs, I stick to using Picard tools for downstream GATK compatibility.

At one point I wanted to visualize in isolation only secondary alignments but in the context of their read sets. To create such a subsetted BAM, I used three commands. The first uses the `256` SAM flag to extract the read names, the second sorts reads by queryname, and the third command uses FilterSamReads and the list of read names to generate a valid BAM containing our reads of interest.

```
# Generate a list of unique read names of secondary alignments
samtools view -f 256 xyz.bam | cut -f1 | sort | uniq > xyz_f256.txt

# Queryname sort the input BAM
java -jar picard.jar SortSam INPUT=xyz.bam OUTPUT=xyz_querynamesort.bam SORT_ORDER=queryname

# Create a new BAM containing read sets
# (the read-list arguments on the continuation line are inferred from the surrounding text)
java -jar picard.jar FilterSamReads INPUT=xyz_querynamesort.bam OUTPUT=xyz_f256.bam \
    READ_LIST_FILE=xyz_f256.txt FILTER=includeReadList
```

Since forward and reverse reads in a pair and multiple alignments for a given read are all identically named in the BAM, the list of read names pulls out the alignment set for a given read name. You can also set the `FILTER` parameter to `excludeReadList`.
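The name-based selection can be tried on toy data without any BAM at all. The sketch below works on tab-separated `name<TAB>flag` lines rather than real alignment records, and `read_sets_with_bit` is a hypothetical helper, so treat it as a picture of the logic and not a replacement for FilterSamReads.

```shell
# Read name<TAB>flag records from stdin and print every record whose read
# name has at least one record with the single flag bit $1 (e.g. 256) set.
read_sets_with_bit() {
    bit=$1
    all=$(cat)
    # Step 1: unique names of reads that carry the bit in any record.
    names=$(printf '%s\n' "$all" |
        awk -F '\t' -v b="$bit" 'int($2 / b) % 2 == 1 { print $1 }' |
        sort -u | tr '\n' ' ')
    # Step 2: keep all records for those names, whatever their own flag is.
    printf '%s\n' "$all" |
        awk -F '\t' -v list="$names" '
            BEGIN { n = split(list, a, " "); for (i = 1; i <= n; i++) keep[a[i]] = 1 }
            ($1 in keep)'
}

# Made-up records: readA has a secondary alignment (355), readB does not.
printf '%s\t%s\n' readA 99 readA 147 readA 355 readB 83 readB 163 |
    read_sets_with_bit 256
```

All three `readA` records come back, including the two primary alignments, while `readB` is left out. Negating the final condition to `!($1 in keep)` would mimic the exclude behavior instead.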

Nautical flags at top spell SAM FLAGS. If I've gone overboard with sailing references, je m'excuse. I've had a great year racing solings.
