Segmentation task

Scripture's main algorithm "segments" the genome into regions enriched in read coverage. It takes as input a read alignment file, genome information and filtering parameters, and produces a transcript graph. It requires about 2 GB of memory for the data in the paper.

Command: java -Xmx2000m -jar scripture.jar <mandatory parameters> <optional parameters>
Mandatory Parameters

-alignment: Path to a spliced read alignment file. In this first version only one alignment file is supported, so multiple sequencing lanes must be combined before invoking Scripture. Alignments must be in BAM or SAM format and must be both sorted and indexed. To sort and index, we recommend igvtools (for SAM) and samtools (for BAM).

-out: Path to a file for Scripture to write its output. The output is a BED file containing all identified transcripts. Scripture also writes a full graph file containing all segments found in the data (significant or not) to a file named after the value of this parameter with an extra .dot extension. The format of the graph file is DOT. This file is self-contained and can be used to estimate expression or for further processing (e.g. to add paired data if none was used when this task was run). See the extract task for information on how to view the contents of this file.
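As a rough illustration of the two output files described above, the following Python sketch derives the graph file name from the -out value and reads the leading columns of a BED file. It assumes the standard BED layout (chromosome, start, end, then an optional name) and tab-separated fields; the helper names graph_path and read_bed are hypothetical and not part of Scripture.

```python
import io

def graph_path(out_path):
    """Return the graph file name: the -out value plus an extra .dot extension."""
    return out_path + ".dot"

def read_bed(handle):
    """Read the leading BED columns from a text stream.

    Assumes standard BED columns: chrom, start, end and an optional name.
    Comment, track and browser lines are skipped.
    """
    records = []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # not a usable BED record
        records.append({
            "chrom": fields[0],
            "start": int(fields[1]),
            "end": int(fields[2]),
            "name": fields[3] if len(fields) > 3 else None,
        })
    return records

# Made-up example content, for illustration only.
example = io.StringIO("track name=example\nchr19\t100\t500\ttranscript_1\nchr19\t900\t1200\n")
for record in read_bed(example):
    print(record["chrom"], record["start"], record["end"], record["name"])
print(graph_path("chr19.segments.bed"))
```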

-sizeFile: A 2-column tab separated file containing the chromosome name and size for the organism.
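If no size file is at hand, one way to produce it is to count sequence lengths in a FASTA file. The Python sketch below is only an illustration: the function names are hypothetical, and it assumes the chromosome name is the first whitespace-delimited token of each FASTA header line.

```python
import io

def chromosome_sizes(fasta_handle):
    """Return a list of (name, length) pairs, one per FASTA record."""
    sizes = []
    name, length = None, 0
    for line in fasta_handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                sizes.append((name, length))
            # Assumption: the chromosome name is the first token of the header.
            tokens = line[1:].split()
            name, length = (tokens[0] if tokens else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        sizes.append((name, length))
    return sizes

def format_size_file(sizes):
    """Format (name, length) pairs as 2-column tab-separated text."""
    return "".join(f"{name}\t{length}\n" for name, length in sizes)

# Made-up sequences, for illustration only.
example = io.StringIO(">chr1 example\nACGT\nAC\n>chr2\nGGG\n")
print(format_size_file(chromosome_sizes(example)), end="")
```

For a real genome, open the FASTA file with open(path) in place of the in-memory example and write the formatted text to the file passed as -sizeFile.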

-chr: Chromosome to segment (in this version Scripture calls transcripts one chromosome at a time).

-chrSequence: Full path to the sequence of the chromosome to segment, in FASTA format.

Optional Parameters

-start: Start of region to segment if not segmenting the full chromosome.

-end: End of region to segment when not segmenting the full chromosome.

-windows: Comma-separated list of fixed-size windows to scan. By default Scripture identifies regions of uninterrupted coverage and uses these regions to segment the data. However, in some cases it is useful to specify alternative or multiple window sizes.

-alpha: Desired genome-wide significance level, the default is 0.05.

-pairedEnd: Paired-end data. This file can be in either SAM or BAM format and should contain the full insert (from the end of the first pair to the beginning of the second pair) as it maps to the genome.

-upWeightSplices: Spliced reads are less common than reads that map without a splice. Because spliced reads must be flanked by splice sites, their random (background) distribution is likely different from that of reads that map contiguously. When this flag is present, spliced reads are given more weight when computing coverage. Use this flag to increase sensitivity when discovering transcripts.
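To show how the parameters above fit together, here is a minimal Python sketch that assembles the command line as an argument list. The function name, the file names and the chromosome in the example are hypothetical. The flags are the ones documented on this page. The sketch only builds and prints the command; it does not run Scripture.

```python
def build_segment_command(alignment, out, size_file, chrom, chr_sequence,
                          start=None, end=None, windows=None, alpha=None,
                          paired_end=None, up_weight_splices=False,
                          jar="scripture.jar", max_heap="2000m"):
    """Assemble the java command for the segmentation task as a list of arguments."""
    # Mandatory parameters, in the order they are documented.
    cmd = ["java", f"-Xmx{max_heap}", "-jar", jar,
           "-alignment", alignment,
           "-out", out,
           "-sizeFile", size_file,
           "-chr", chrom,
           "-chrSequence", chr_sequence]
    # Optional parameters are added only when a value was supplied.
    if start is not None:
        cmd += ["-start", str(start)]
    if end is not None:
        cmd += ["-end", str(end)]
    if windows:
        cmd += ["-windows", ",".join(str(w) for w in windows)]
    if alpha is not None:
        cmd += ["-alpha", str(alpha)]
    if paired_end:
        cmd += ["-pairedEnd", paired_end]
    if up_weight_splices:
        cmd.append("-upWeightSplices")  # a flag: it takes no value
    return cmd

# Hypothetical file names, for illustration only.
command = build_segment_command(
    "reads.sorted.bam", "chr19.segments.bed", "sizes.txt", "chr19", "chr19.fa",
    windows=[100, 200], alpha=0.01, up_weight_splices=True)
print(" ".join(command))
```

Building the command as a list rather than a single string means it can be passed directly to subprocess.run without shell quoting issues for paths that contain spaces.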