We discovered today that we made an error in the documentation article that describes the RNAseq Best Practices workflow. The error is not critical but is likely to cause an increased rate of False Positive calls in your dataset.

The error was made in the description of the "Split & Trim" pre-processing step. We originally wrote that you need to reassign mapping qualities to 60 using the ReassignMappingQuality read filter. However, this causes all MAPQs in the file to be reassigned to 60, whereas what you want to do is reassign MAPQs only for good alignments which STAR identifies with MAPQ 255. This is done with a different read filter, called ReassignOneMappingQuality. The correct command is therefore:

java -jar GenomeAnalysisTK.jar -T SplitNCigarReads -R ref.fasta -I dedupped.bam -o split.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS

In our hands we see a bump in the rate of FP calls from 4% to 8% when the wrong filter is used. We don't see any significant amount of false negatives (lost true positives) with the bad command, although we do see a few more true positives show up in the results of the bad command. So basically the effect is to excessively increase sensitivity, at the expense of specificity, because poorly mapped reads are taken into account with a "good" mapping quality, where they would normally be discarded.

This effect will be stronger in datasets with lower overall quality, so your results may vary. Let us know if you observe any really dramatic effects, but we don't expect that to happen.

To be clear, we do recommend re-processing your data if you can, but if that is not an option, keep in mind how this affects the rate of false positive discovery in your data.

We apologize for this error (which has now been corrected in the documentation) and for the inconvenience it may cause you.


sirian


Thanks for the correction! I was actually wondering a little bit why you changed every score.

Wed 11 Jun 2014

sboyle


Thanks for the correction Geraldine!

Wed 11 Jun 2014

kam


Here's a working link for [ReassignOneMappingQuality](https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_engine_filters_ReassignOneMappingQualityFilter.php). Have you considered using the mapping quality formula proposed by the authors of subread? This mapping quality score works for any read mapper. See page 20 in the [Subread User's Guide](http://bioinf.wehi.edu.au/subread-package/SubreadUsersGuide.pdf).

Wed 11 Jun 2014

Geraldine_VdAuwera


@kam This may be a good recommendation to make to the authors of the mappers.

Wed 11 Jun 2014




At a glance


Follow us on Twitter

GATK Dev Team

@gatk_dev

New #GATK web address: https://t.co/SmXppw36ir (but www links or bookmarks will still work) https://t.co/PqbYbGhSWH
20 Jul 16
@yokofakun GenomeSTRiP is developed by a different group (Bob Handsaker at Harvard); expect Bob to answer on the forum.
18 Jul 16
@TechnicalVault and we're currently working on a public roadmap document for GATK4 (2/2)
11 Jul 16
@TechnicalVault Happy to discuss roadmap, not meant to be secret. We try to post updates on blog, eg https://t.co/1xKZXf3dLu (1/2)
11 Jul 16
@biobenkj @torstenseemann @broadinstitute Post question to #GATK forum -> will be answered. https://t.co/bEGBZUtyjL
11 Jul 16

Our favorite tweets from others

You’re right @gatk_dev honesty is key! About variants manual filtering: “In any case you're probably in for a world of pain.” Ha now I know!
11 Jul 16
.@gatk_dev I like the new documentation index page, the subheading has made my day! :D #doge #GeekHumourFTW https://t.co/9RXnDTMoBm
8 Jul 16
There is no NGS, NG is today so should only be called high-throughput sequencing #CSC #GATKworkshop https://t.co/paHcNimD7o
16 Jun 16
The @dgmacarthur lab leaving as they came, rock stars of science in their stretch limo https://t.co/IQ0eCOT5H6
14 May 16
Hey @BroadGenomics we just flew past 250,000 exomes + genomes. Good job everyone.
13 May 16
See more of our favorite tweets...
Search blog by tag

ad appistry ashg benchmarks best-practices bug bug-fixed cancer catvariants challenge cloud cluster commandline commandlinegatk competition compute conferences cram cromwell denovo depthofcoverage diagnosetargets error fix forum gatk3 genotype genotype-refinement genotypegvcfs google gvcf haploid haplotypecaller holiday hts htsjdk ibm java8 job job-offer jobs joint-discovery license meetings mendelianviolations multisample multithreading mutect mutect2 ngs nt outreach pairhmm paper parallelism patch performance phone-home picard pipeline plans ploidy polyploid poster presentations printreads profile promote reference-model release release-notes rnaseq runtime saas search selectvariants sequencing service slides snow speed status sting support syntax talks team terminology third-party-tools topstory trivia troll tutorial unifiedgenotyper variantannotator variantrecalibrator vcf-gz version-highlights versions vqsr wdl webinar workflow workshop