We discovered today that we made an error in the documentation article that describes the RNAseq Best Practices workflow. The error is not critical but is likely to cause an increased rate of False Positive calls in your dataset.

The error was made in the description of the "Split & Trim" pre-processing step. We originally wrote that you need to reassign mapping qualities to 60 using the ReassignMappingQuality read filter. However, this causes all MAPQs in the file to be reassigned to 60, whereas what you want to do is reassign MAPQs only for good alignments which STAR identifies with MAPQ 255. This is done with a different read filter, called ReassignOneMappingQuality. The correct command is therefore:

java -jar GenomeAnalysisTK.jar -T SplitNCigarReads -R ref.fasta -I dedupped.bam -o split.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS

In our hands we see a bump in the rate of FP calls from 4% to 8% when the wrong filter is used. We don't see any significant amount of false negatives (lost true positives) with the bad command, although we do see a few more true positives show up in the results of the bad command. So basically the effect is to excessively increase sensitivity, at the expense of specificity, because poorly mapped reads are taken into account with a "good" mapping quality, where they would normally be discarded.

This effect will be stronger in datasets with lower overall quality, so your results may vary. Let us know if you observe any really dramatic effects, but we don't expect that to happen.

To be clear, we do recommend re-processing your data if you can, but if that is not an option, keep in mind how this affects the rate of false positive discovery in your data.

We apologize for this error (which has now been corrected in the documentation) and for the inconvenience it may cause you.



sirian on 11 Jun 2014


Thanks for the correction! I was actually wondering a little bit why you changed every score.

sboyle on 11 Jun 2014


Thanks for the correction Geraldine!

kam on 11 Jun 2014


Here's a working link for [ReassignOneMappingQuality](https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_engine_filters_ReassignOneMappingQualityFilter.php). Have you considered using the mapping quality formula proposed by the authors of subread? This mapping quality score works for any read mapper. See page 20 in the [Subread User's Guide](http://bioinf.wehi.edu.au/subread-package/SubreadUsersGuide.pdf).

Geraldine_VdAuwera on 11 Jun 2014


@kam This may be a good recommendation to make to the authors of the mappers.

justinjj on 11 Jun 2014


Dear Geraldine, Could you please clarify me is there any difference or issue if I use the mapq as 50 instead of 60 as suggested to run GATK? The bwa aligner higher quality is 60 but the tophat2 (uses bowtie2) provide higher quality/unique mapq as 50 and GATK runs without any error in both cases also by star aligner "--outSAMmapqUnique 50" The RNAseq mappers is already giving meaningful quality score? https://software.broadinstitute.org/gatk/guide/article?id=3891 (So we use the GATK’s ReassignOneMappingQuality read filter to reassign all good alignments to the default value of 60. This is not ideal, and we hope that in the future RNAseq mappers will emit meaningful quality scores, but in the meantime this is the best we can do.) Thanks.

Geraldine_VdAuwera on 11 Jun 2014


Yes, that's fine. If newer versions of the mappers produce MAPQ scores in that range, there is no need to reassign a different value.

justinjj on 11 Jun 2014


Thanks Geraldine.

NeillGibson on 11 Jun 2014


Hi, How can I do the re assignment of the STAR mapping quality in GATK4? I have some trouble finding the correct documentation / an GATK4 SplitNCigarReads CLI example. See also https://gatkforums.broadinstitute.org/gatk/discussion/10800/gatk4-how-to-reassign-star-mapping-quality-from-255-to-60-with-splitncigarreads/ Thank you.

Geraldine_VdAuwera on 11 Jun 2014


@NeillGibson Since this was published many moons ago, the STAR aligner gained the ability to assign meaningful mapping quality scores on request. So we did not implement the ability to reassign MAPQs into GATK4.

NeillGibson on 11 Jun 2014


@Geraldine_VdAuwera Thank you for this information. I passed this information on to the issue that I started at bcbio, where I found out first about this issue. https://github.com/chapmanb/bcbio-nextgen/issues/2163 One risk that I see is that using the `STAR --outSAMmapqUnique 60` option maybe fixes the issue with GATK, but that other downstream tools maybe still depend on the (still default) STAR mapping quality value of 255 (e.g. cufflinks). > The mapping quality MAPQ (column 5) is 255 for uniquely mapping reads, and int(-10*log10(1- > 1/Nmap)) for multi-mapping reads. This scheme is same as the one used by TopHat and is com- > patible with Cuffinks. The default MAPQ=255 for the unique mappers maybe changed with > --outSAMmapqUnique parameter (integer 0 to 255) to ensure compatibility with downstream tools > such as GATK. > https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf

Geraldine_VdAuwera on 11 Jun 2014


Ah, that’s fair concern. I do think that eventually we will want to port the read transformer that made it possible to change the mapqs, but to be frank it’s not a high priority for us. You can still use the read transformer capability using PrintReads in the old version as a stopgap in the meantime.

aggp11 on 11 Jun 2014


Are there any updated "Best Practices" for RNA-Seq variant calling with GATK 4.0?

Sheila on 11 Jun 2014


@aggp11 Hi, Not yet, but I think they are in progress. -Sheila

Geraldine_VdAuwera on 11 Jun 2014


That's right, there's been some work in that direction; the latest version of that (which is not yet all-GATK4) is at https://github.com/gatk-workflows/gatk3-4-rnaseq-germline-snps-indels

melnuesch on 11 Jun 2014


As a suggestion, it would be very nice if you do something like this https://software.broadinstitute.org/gatk/documentation/article.php?id=3891 , that serves as a nice guide, with the GATK 4 code. Especially for students that are learning this for the first time, this makes the difference. It is way more user friendly than the github workflow ( https://github.com/gatk-workflows/gatk3-4-rnaseq-germline-snps-indels )

Geraldine_VdAuwera on 11 Jun 2014


Yep that’s the plan, just haven’t found time to do it




- Recent posts


- Upcoming events

See Events calendar for full list and dates


- Recent events

See Events calendar for full list and dates



- Follow us on Twitter

GATK Dev Team

@gatk_dev

@hdeus To be clear we're not yet using convolutional neural nets (CNNs) in copy number (CNV) analysis. We're using… https://t.co/MYPQVjvzyC
18 May 18
Small correction on this #BioIT18 agenda item: Mark will actually be introducing Lee Lichtenstein from our team to… https://t.co/0UxLmX1uN0
16 May 18
RT @broadinstitute: Geraldine Van der Auwera of @gatk_dev will be presenting on the team’s Best Practices Pipeline tomorrow (5/16) at 12:40…
16 May 18
At Bio-IT World? Check out our abundant lineup of GATK4 talks and demos at #BioIT18 https://t.co/ahkUjny6Cw
15 May 18
Hey that pipeline looks familiar :) https://t.co/68lupDUkxN
15 May 18

- Our favorite tweets from others

Bioinformatics in a nutshell. 😑 #genetics #research #phd #gatk https://t.co/EjqaeFf4YZ
18 May 18
Want to hear the latest on WDL, Cromwell, FireCloud, and GATK #BioIT18 ? See this blog for tomorrow's schedule of t… https://t.co/S7kf58tECu
16 May 18
We caught up with @broadinstitute's Anthony Philippakis and Illumina's Susan Tousi today for perspective on today's… https://t.co/PNaVXbNB0r
15 May 18
#BioIT18 folks - come to booth #410 on 5/15 at 5:00 to learn about our $5 genome analysis pipeline (5 is clearly th… https://t.co/GH0zwrLlea
14 May 18
Excited to be registered for #GCCBOSC! Looking forward to the GATK4.0 and snakemake workshops. https://t.co/fMfcH38OhO
11 May 18

See more of our favorite tweets...