We discovered today that we made an error in the documentation article that describes the RNAseq Best Practices workflow. The error is not critical but is likely to cause an increased rate of False Positive calls in your dataset.

The error was made in the description of the "Split & Trim" pre-processing step. We originally wrote that you need to reassign mapping qualities to 60 using the ReassignMappingQuality read filter. However, this causes all MAPQs in the file to be reassigned to 60, whereas what you want to do is reassign MAPQs only for good alignments which STAR identifies with MAPQ 255. This is done with a different read filter, called ReassignOneMappingQuality. The correct command is therefore:

java -jar GenomeAnalysisTK.jar -T SplitNCigarReads -R ref.fasta -I dedupped.bam -o split.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS

In our hands we see a bump in the rate of FP calls from 4% to 8% when the wrong filter is used. We don't see any significant amount of false negatives (lost true positives) with the bad command, although we do see a few more true positives show up in the results of the bad command. So basically the effect is to excessively increase sensitivity, at the expense of specificity, because poorly mapped reads are taken into account with a "good" mapping quality, where they would normally be discarded.

This effect will be stronger in datasets with lower overall quality, so your results may vary. Let us know if you observe any really dramatic effects, but we don't expect that to happen.

To be clear, we do recommend re-processing your data if you can, but if that is not an option, keep in mind how this affects the rate of false positive discovery in your data.

We apologize for this error (which has now been corrected in the documentation) and for the inconvenience it may cause you.


sirian


Thanks for the correction! I was actually wondering a little bit why you changed every score.

Wed 11 Jun 2014

sboyle


Thanks for the correction Geraldine!

Wed 11 Jun 2014

kam


Here's a working link for [ReassignOneMappingQuality](https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_engine_filters_ReassignOneMappingQualityFilter.php). Have you considered using the mapping quality formula proposed by the authors of subread? This mapping quality score works for any read mapper. See page 20 in the [Subread User's Guide](http://bioinf.wehi.edu.au/subread-package/SubreadUsersGuide.pdf).

Wed 11 Jun 2014

Geraldine_VdAuwera


@kam This may be a good recommendation to make to the authors of the mappers.

Wed 11 Jun 2014

justinjj


Dear Geraldine, Could you please clarify me is there any difference or issue if I use the mapq as 50 instead of 60 as suggested to run GATK? The bwa aligner higher quality is 60 but the tophat2 (uses bowtie2) provide higher quality/unique mapq as 50 and GATK runs without any error in both cases also by star aligner "--outSAMmapqUnique 50" The RNAseq mappers is already giving meaningful quality score? https://software.broadinstitute.org/gatk/guide/article?id=3891 (So we use the GATK’s ReassignOneMappingQuality read filter to reassign all good alignments to the default value of 60. This is not ideal, and we hope that in the future RNAseq mappers will emit meaningful quality scores, but in the meantime this is the best we can do.) Thanks.

Wed 11 Jun 2014

Geraldine_VdAuwera


Yes, that's fine. If newer versions of the mappers produce MAPQ scores in that range, there is no need to reassign a different value.

Wed 11 Jun 2014

justinjj


Thanks Geraldine.

Wed 11 Jun 2014




- Recent posts


- Upcoming events

See Events calendar for full list and dates


- Recent events

See Events calendar for full list and dates



- Follow us on Twitter

GATK Dev Team

@gatk_dev

It's official: GATK 4.0 will be released Jan 9, 2018 https://t.co/e76tduFJKk
16 Oct 17
GATK events and workshops update https://t.co/cnn39FPBzY
13 Oct 17
RT @BroadGenomics: Join our @WDL_dev & @gatk_dev teams to learn more on reproducible workflows! #ASHG17 https://t.co/qt6nypKSLc
13 Oct 17
We're excited to be part of this evening workshop demoing the GATK4 WDL pipelines! https://t.co/ScyNqIvmsl
6 Oct 17
I added a video to a @YouTube playlist https://t.co/JhzxN3INZ9 MPG Primer: High throughput sequencing and variant calling pipelines
6 Oct 17

- Our favorite tweets from others

Although it made me cry sometimes, I owe them a lot and love them much more. https://t.co/vUj0cBllgn
16 Oct 17
Round of applause at #BOSC2017 for GATK4 being open sourced. https://t.co/WRhTeKtKTX
23 Jul 17
Round of applause for @Katewanders - Broad Institute will open source data science tools from now on https://t.co/CvLhwgBQUK #BOSC2017
23 Jul 17
The @gatk_dev team, that delivered an excellent "GATK Best Practices for Variant Discovery" workshop this week, on… https://t.co/Z8GNduuDeJ
20 Jul 17
Amazing session @edgenome with @gatk_dev comes to an end. Enriched learning! Thx #gatk #gatk2017 #Genomics #Edinburgh #Bioinformatics
19 Jul 17
See more of our favorite tweets...