Allele-specific strand bias estimated by the Symmetric Odds Ratio test

Category Annotation Modules

VCF Field INFO (variant-level)

Type ActiveRegionBasedAnnotation, ReducibleAnnotation, AS_StandardAnnotation

Header definition line
  • INFO=<ID=AS_SOR,Number=A,Type=Float,Description="Allele specific strand Odds Ratio of 2x|Alts| contingency table to detect allele specific strand bias">
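    The header declares Number=A, which in the VCF specification means the field holds one comma-separated value per alternate allele. The following is a minimal sketch of pulling those values out of an INFO column string. It assumes plain string parsing, and the function parse_as_sor is hypothetical, not part of GATK or any VCF library:

```python
from typing import List


def parse_as_sor(info: str) -> List[float]:
    """Return the AS_SOR values (one per ALT allele) from a VCF INFO string.

    Hypothetical helper using plain string handling; no VCF library assumed.
    """
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        if key == "AS_SOR" and sep:
            # Number=A: comma-separated, one value per ALT; "." means missing.
            return [float("nan") if v == "." else float(v) for v in value.split(",")]
    return []  # AS_SOR not present in this record


print(parse_as_sor("DP=42;AS_SOR=0.693,2.303;MQ=60.00"))  # [0.693, 2.303]
```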

  • Overview

    Strand bias is a type of sequencing bias in which one DNA strand is favored over the other, which can result in incorrect evaluation of the amount of evidence observed for one allele vs. the other.

    The AS_StrandOddsRatio annotation is one of several methods that aim to evaluate whether there is strand bias in the data. It is an updated form of the Fisher Strand Test that better accounts for the large amounts of data in high-coverage situations. It is used to determine whether there is strand bias between the forward and reverse strands for the reference or alternate allele, and it does so separately for each alternate allele. The reported value is natural-log (ln) scaled.
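    The following sketch shows what evaluating each allele separately means at a multiallelic site. It builds one 2x2 table per alternate allele, pairing the REF strand counts with that ALT allele's strand counts. The helper name per_allele_tables and the (+, -) tuple layout are hypothetical. How the tables are formed here is an illustrative assumption, not GATK code:

```python
from typing import List, Tuple

StrandCounts = Tuple[int, int]  # (reads on + strand, reads on - strand)
Table2x2 = List[List[int]]      # row 0 = REF, row 1 = one ALT allele


def per_allele_tables(ref: StrandCounts, alts: List[StrandCounts]) -> List[Table2x2]:
    """Pair the REF strand counts with each ALT allele's counts separately.

    Hypothetical helper: yields one 2x2 table, and hence one annotation
    value, per ALT allele.
    """
    return [[list(ref), list(alt)] for alt in alts]


# A site with two ALT alleles: REF 10+/1-, ALT1 9+/2-, ALT2 0+/6-
for table in per_allele_tables((10, 1), [(9, 2), (0, 6)]):
    print(table)
```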

    Statistical notes

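    The odds ratios of the 2x2 contingency table below are

    $$ R = \frac{X[0][0] \cdot X[1][1]}{X[0][1] \cdot X[1][0]} $$

    and its inverse:

    $$ \frac{1}{R} = \frac{X[0][1] \cdot X[1][0]}{X[0][0] \cdot X[1][1]} $$

             + strand   - strand
      REF    X[0][0]    X[0][1]
      ALT    X[1][0]    X[1][1]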

    The sum R + 1/R is used to detect a difference in strand bias between REF and ALT. Adding the ratio to its inverse makes the statistic symmetric, so swapping the + and - strands does not change it. A high value indicates a large difference, in which one entry is very small compared with the others. A scale factor of refRatio/altRatio, where

    $$ refRatio = \frac{\min(X[0][0], X[0][1])}{\max(X[0][0], X[0][1])} $$

    and

    $$ altRatio = \frac{\min(X[1][0], X[1][1])}{\max(X[1][0], X[1][1])} $$

    ensures that the annotation value is large only when the ALT reads are more strongly skewed toward one strand than the REF reads are.
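    The calculation in these statistical notes can be sketched in Python. This is a minimal sketch under stated assumptions, not the GATK implementation. It assumes row 0 is REF and row 1 is ALT, with columns (+, -). It takes refRatio and altRatio as min/max ratios, so that the scale factor is large only when ALT is more strand-skewed than REF. It also adds a hypothetical pseudocount of 1 to every cell to avoid division by zero, a detail the text does not specify. The name strand_odds_ratio is illustrative:

```python
import math
from typing import Sequence

PSEUDOCOUNT = 1.0  # assumption: added to every cell so no ratio divides by zero


def strand_odds_ratio(table: Sequence[Sequence[int]]) -> float:
    """ln-scaled symmetric strand odds ratio for one REF-vs-ALT 2x2 table.

    table[0] = REF (+, -) counts, table[1] = ALT (+, -) counts.
    refRatio and altRatio are min/max ratios (1.0 = perfectly balanced).
    """
    x00, x01 = (c + PSEUDOCOUNT for c in table[0])
    x10, x11 = (c + PSEUDOCOUNT for c in table[1])

    r = (x00 * x11) / (x01 * x10)  # odds ratio R
    symmetric = r + 1.0 / r        # R + 1/R: unchanged if the strands are swapped

    ref_ratio = min(x00, x01) / max(x00, x01)
    alt_ratio = min(x10, x11) / max(x10, x11)

    return math.log(symmetric * ref_ratio / alt_ratio)


print(round(strand_odds_ratio([[10, 1], [9, 2]]), 3))    # REF and ALT skewed alike
print(round(strand_odds_ratio([[10, 10], [10, 0]]), 3))  # only ALT is skewed
```

    In the first table, the REF and ALT reads favor the + strand to a similar degree, so the score stays low (about 0.313). In the second, REF is balanced while ALT is seen only on the + strand, and the score is much higher (about 4.804).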

    See the method document on statistical tests for a more detailed explanation of this statistical test.


    The name AS_StrandOddsRatio is not entirely appropriate, because the implementation changed between the start of development and the release of this annotation; SOR is no longer strictly an odds ratio. The goal was to distinguish certain cases in the data without penalizing variants that occur at the ends of exons, which tend to be covered by reads in only one direction (depending on which end of the exon they are on). For example, if a variant has 10 REF reads on the + strand, 1 REF read on the - strand, 9 ALT reads on the + strand and 2 ALT reads on the - strand, it is not actually strand biased, but its FS score is poor. The resulting implementation was derived in part from empirically testing read count tables of various sizes and ratios.
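    As a worked check of the example above, the 10/1/9/2 table can be run through the formulas in the statistical notes. This assumes, for illustration only, a pseudocount of 1 added to every cell (a detail this page does not state) and refRatio and altRatio taken as min/max ratios:

    $$ X = \begin{pmatrix} 10+1 & 1+1 \\ 9+1 & 2+1 \end{pmatrix} = \begin{pmatrix} 11 & 2 \\ 10 & 3 \end{pmatrix}, \qquad R = \frac{11 \cdot 3}{2 \cdot 10} = \frac{33}{20} $$

    $$ R + \frac{1}{R} = \frac{33}{20} + \frac{20}{33} = \frac{1489}{660}, \qquad \frac{refRatio}{altRatio} = \frac{2/11}{3/10} = \frac{20}{33} $$

    $$ \ln\left(\frac{1489}{660} \cdot \frac{20}{33}\right) = \ln\frac{1489}{1089} \approx 0.313 $$

    The result is low, in line with the statement that this table is not actually strand biased.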

    Related annotations

    • StrandOddsRatio outputs a version of this annotation that includes all alternate alleles in a single calculation.
    • StrandBiasBySample outputs counts of read depth per allele for each strand orientation.
    • FisherStrand uses Fisher's Exact Test to evaluate strand bias.


    GATK version 3.8-0-ge9d806836 built at 2017/07/29 01:40:22.