RNA Secondary Structure Formats

BP (RNA base pairing)

A BP file (.bp) is a text file format that describes connections between ranges of nucleotides. It is primarily used to indicate base pairing interactions or estimated pairing probabilities for RNA structures. BP files are rendered in IGV using colored semicircular arcs.

File Header. A file begins with any number of header lines listing all arc colors and associated labels. Each of these lines is tab-delimited and must begin with "color:", followed by the red, green, and blue color components (each an integer from 0 to 255), followed by an optional text label which will be shown in the track menu color legend. Arcs are rendered in the order their colors are listed (i.e. arcs using the last listed color are drawn on top). Track lines are not currently supported for this file type.

Example header line:  color: 51 114 38 High-probability basepairs

Paired Ranges. Each tab-delimited line in the rest of the file describes a single arc. The first field is the name of the associated IGV chromosome. The second through fifth fields are the 1-based inclusive nucleotide coordinates of the two paired ranges (the two sides of a helix, if this is an RNA structure). The sixth and last field is a zero-based integer index that selects the arc color from the colors listed in the header.

Example BP file: example.bp
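The layout described above can be read with a few lines of Python. This is a minimal sketch, not IGV's own parser. The names parse_bp and Arc are hypothetical, and the order of the four coordinates (start and end of the first range, then start and end of the second range) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Arc:
    chrom: str
    left_start: int   # assumed order: first range start/end,
    left_end: int     # then second range start/end
    right_start: int
    right_end: int
    color_index: int  # zero-based index into the header colors


def parse_bp(text):
    """Split BP text into header colors and arcs."""
    colors = []  # (red, green, blue, label) in listed order
    arcs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if fields[0].startswith("color"):
            # Header line: color:<TAB>R<TAB>G<TAB>B[<TAB>label]
            red, green, blue = (int(x) for x in fields[1:4])
            label = fields[4] if len(fields) > 4 else ""
            colors.append((red, green, blue, label))
        else:
            # Arc line: chromosome, four coordinates, color index
            s1, e1, s2, e2 = (int(x) for x in fields[1:5])
            arcs.append(Arc(fields[0], s1, e1, s2, e2, int(fields[-1])))
    return colors, arcs


sample = (
    "color:\t51\t114\t38\tHigh-probability basepairs\n"
    "chr1\t1\t5\t20\t24\t0\n"
)
colors, arcs = parse_bp(sample)
```

Note that this sketch treats any line whose first field starts with "color" as a header line, so it would misread a chromosome with such a name.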

 

The following RNA secondary structure formats can be imported into IGV and converted to the .bp format. After choosing a file to import, the user will be prompted to select the applicable chromosome and optional strand and starting position. IGV will then create a .bp file and load it.
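As an illustration of what such a conversion involves, the sketch below turns a list of base pairs into BP arc lines. It is not IGV's implementation. The function name pairs_to_bp_lines is hypothetical, grouping stacked pairs into helices is an assumed strategy, strand handling is left out, and the coordinate order within a line is assumed.

```python
def pairs_to_bp_lines(pairs, chrom, start=1, color_index=0):
    """Convert (i, j) base pairs (1-based, i < j) into BP arc lines.

    Consecutive stacked pairs (i, j), (i + 1, j - 1) are merged into one
    helix. `start` is the chromosome position of structure position 1.
    """
    offset = start - 1
    lines = []
    helix = None  # [left_start, left_end, right_start, right_end]

    def emit(h):
        coords = "\t".join(str(x + offset) for x in h)
        lines.append(f"{chrom}\t{coords}\t{color_index}")

    for i, j in sorted(pairs):
        if helix and i == helix[1] + 1 and j == helix[2] - 1:
            # Pair stacks directly on the previous one: extend the helix
            helix[1], helix[2] = i, j
        else:
            if helix:
                emit(helix)
            helix = [i, i, j, j]
    if helix:
        emit(helix)
    return lines


bp_lines = pairs_to_bp_lines([(1, 10), (2, 9), (5, 14), (6, 13)], "chr1", start=101)
```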

DB (dot bracket)

DB (dot bracket) format (.db, .dbn) is a plain text format that can encode secondary structure. Lines beginning with > or # are currently ignored. The nucleotide sequence is also currently ignored.

Secondary structure notation:

  • Unpaired nucleotides are indicated with the . or : characters.
  • Matching pairs of parentheses indicate base pairs.
  • To indicate non-nested base pairs (pseudoknots), additional brackets may be used: [], {}, or <>.

Files containing multiple sequences or structures are currently not supported.

Example:

GGUGCAUGCCGAGGGGCGGUUGGCCUCGUAAAAAGCCGCAAAAAAUAGCAUGUAGUACC
((((((((((((((.[[[[[[..))))).....]]]]]]........)))))...))))
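The notation can be decoded into base pairs by keeping one stack per bracket type, which is what allows pseudoknot brackets to cross the parentheses. The following is a minimal sketch under that approach. The function name parse_dot_bracket is hypothetical, and this is not necessarily how IGV reads the format.

```python
def parse_dot_bracket(structure):
    """Return sorted (i, j) base pairs, 1-based, from a dot-bracket string."""
    openers = {"(": ")", "[": "]", "{": "}", "<": ">"}
    closers = {close: open_ for open_, close in openers.items()}
    stacks = {open_: [] for open_ in openers}  # one stack per bracket type
    pairs = []
    for pos, ch in enumerate(structure, start=1):
        if ch in openers:
            stacks[ch].append(pos)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch not in ".:":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    for open_, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {open_!r} at position {stack[-1]}")
    return sorted(pairs)


print(parse_dot_bracket("((..))"))  # [(1, 6), (2, 5)]
```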

CT (connectivity table)  

The CT format (.ct) is used by software packages such as RNAstructure. See the CT File Format on the Mathews Lab web page.

Only the first structure in a CT file will be imported by IGV. CT files with additional headers (often starting with the # character) are currently not supported.

Example CT file: example.ct
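For orientation, the sketch below reads the first structure from CT text. It assumes the commonly used CT layout, which is not spelled out on this page: a first line starting with the number of bases, then one whitespace-separated line per base whose first column is the base index and whose fifth column is the index of its pairing partner (0 if unpaired). The function name parse_ct_first_structure is hypothetical.

```python
def parse_ct_first_structure(text):
    """Return (i, j) base pairs (1-based, i < j) of the first CT structure."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_bases = int(lines[0].split()[0])  # assumed: count leads the first line
    pairs = []
    # Read only the first n_bases records; later structures are skipped
    for ln in lines[1:n_bases + 1]:
        fields = ln.split()
        i, j = int(fields[0]), int(fields[4])
        if j > i:  # record each pair once, skip unpaired bases (j == 0)
            pairs.append((i, j))
    return pairs


ct_text = (
    "4 example\n"
    "1 G 0 2 4 1\n"
    "2 A 1 3 0 2\n"
    "3 A 2 4 0 3\n"
    "4 C 3 0 1 4\n"
)
ct_pairs = parse_ct_first_structure(ct_text)
```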

DP (dot plot or pairing probability) 

The DP file format (.dp) can be generated using the RNAstructure software package by running partition followed by ProbabilityPlot on the resulting .pfs file with the -t option for text file output. For modeling the structures of large mRNAs, the program Superfold runs partition on multiple overlapping windows, then heuristically merges the windows. Superfold outputs a merged .dp file by default.

File format:

  • 1st line is the number of entries in the file.
  • 2nd line is column names.
  • Remaining lines describe pairing probabilities between 1-based nucleotide positions, given as three tab-separated fields:

<left> <right> <-log10(probability of pairing)>

Upon import, IGV colors pairs above 80% probability dark green. Pairs between 30 and 80% probability are colored blue. Pairs between 10 and 30% probability are colored light yellow.
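The third column stores -log10(p), so the probability is recovered as p = 10^(-value). The sketch below parses DP text and assigns each pair the color class described above. It is an illustration rather than IGV's code. The function names are hypothetical, and how values exactly at 10%, 30%, or 80% are handled is an assumption, since the text does not state which side of each boundary is inclusive.

```python
def classify_probability(p):
    """Map a pairing probability to the color name described above."""
    if p > 0.80:
        return "dark green"
    if p >= 0.30:   # boundary handling is an assumption
        return "blue"
    if p >= 0.10:   # boundary handling is an assumption
        return "light yellow"
    return None     # below 10%: no color is described


def parse_dp(text):
    """Return (left, right, probability, color) for each DP record."""
    records = []
    # Line 1 is the entry count and line 2 the column names; skip both
    for ln in text.splitlines()[2:]:
        if not ln.strip():
            continue
        left, right, neg_log10 = ln.split()[:3]
        p = 10 ** -float(neg_log10)
        records.append((int(left), int(right), p, classify_probability(p)))
    return records


dp_text = "3\ni\tj\t-log10(Probability)\n1\t20\t0.05\n2\t19\t0.4\n3\t18\t0.7\n"
dp_records = parse_dp(dp_text)
```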

Other

IGV also supports viewing RNA secondary structures in BED format.