0) Choose whether to align reads or use pre-aligned reads
- If you wish to personally align reads and run Scripture from your own alignments, start from Step #1 below.
- If you wish to use pre-aligned reads:
- First, download reads that are aligned to the mouse (mm9) genome: ftp://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/series/GSE20851/GSE20851_GSM521650_ES.aligned.sam.gz.
- Second, unzip the file:
>> gunzip GSE20851_GSM521650_ES.aligned.sam.gz
- Third, use the index command of IGVTools to index the alignment file:
>> igvtools index GSE20851_GSM521650_ES.aligned.sam
- Now, jump to Step #7 below, but replace the value of the -alignment parameter with GSE20851_GSM521650_ES.aligned.sam. Also, remove the -pairedEnd parameter.
1) Download reads from the Gene Expression Omnibus (GEO)
>> gunzip SRR039999_1.fastq.gz
>> gunzip SRR039999_2.fastq.gz
>> gunzip SRR040000_1.fastq.gz
>> gunzip SRR040000_2.fastq.gz
>> gunzip SRR040001_1.fastq.gz
>> gunzip SRR040001_2.fastq.gz
- These RNA-Seq reads, in FASTQ format, consist of three lanes, each of which has two files (*_1, *_2) for independent left and right reads.
2) Download Bowtie reference index for mouse (mm9)
>> unzip mm9.ebwt.zip
- This reference index is used by TopHat for aligning the reads.
3) Perform spliced alignment using TopHat
- A full manual for TopHat can be found at: http://tophat.cbcb.umd.edu/manual.html.
- Ensure that your path includes the bowtie command, which is used by TopHat.
- Let the BowTie reference index files from above (mm9.1.ebwt, mm9.2.ebwt, mm9.3.ebwt, and mm9.4.ebwt) be located in PATH_TO_BOWTIE_REFERENCE_INDEX.
- Align each of the six read files (SRR0*.fastq) to the mm9 genome:
>> tophat -o tophat_out_SRR039999_1 [PATH_TO_BOWTIE_REFERENCE_INDEX]/mm9 SRR039999_1.fastq
>> tophat -o tophat_out_SRR039999_2 [PATH_TO_BOWTIE_REFERENCE_INDEX]/mm9 SRR039999_2.fastq
>> tophat -o tophat_out_SRR040000_1 [PATH_TO_BOWTIE_REFERENCE_INDEX]/mm9 SRR040000_1.fastq
>> tophat -o tophat_out_SRR040000_2 [PATH_TO_BOWTIE_REFERENCE_INDEX]/mm9 SRR040000_2.fastq
>> tophat -o tophat_out_SRR040001_1 [PATH_TO_BOWTIE_REFERENCE_INDEX]/mm9 SRR040001_1.fastq
>> tophat -o tophat_out_SRR040001_2 [PATH_TO_BOWTIE_REFERENCE_INDEX]/mm9 SRR040001_2.fastq
- Each of the above commands creates an alignment file in SAM format called accepted_hits.sam that is placed in the folder specified by the "-o" parameter (tophat_out_SRR*).
- Note: Each of the above commands may require about 4GB of memory.
4) Make paired end alignment files using Scripture
- Remove the headers from the six TopHat alignment files (headers begin with "@") and sort each by read name:
>> sed '1,2d' tophat_out_SRR039999_1/accepted_hits.sam | sort > tophat_out_SRR039999_1/accepted_hits.sorted.sam
>> sed '1,2d' tophat_out_SRR039999_2/accepted_hits.sam | sort > tophat_out_SRR039999_2/accepted_hits.sorted.sam
>> sed '1,2d' tophat_out_SRR040000_1/accepted_hits.sam | sort > tophat_out_SRR040000_1/accepted_hits.sorted.sam
>> sed '1,2d' tophat_out_SRR040000_2/accepted_hits.sam | sort > tophat_out_SRR040000_2/accepted_hits.sorted.sam
>> sed '1,2d' tophat_out_SRR040001_1/accepted_hits.sam | sort > tophat_out_SRR040001_1/accepted_hits.sorted.sam
>> sed '1,2d' tophat_out_SRR040001_2/accepted_hits.sam | sort > tophat_out_SRR040001_2/accepted_hits.sorted.sam
- On each of the three read lanes, run Scripture with task makePairedFile:
>> java -Xmx4000m -jar scripture.jar -task makePairedFile -pair1 tophat_out_SRR039999_1/accepted_hits.sorted.sam -pair2 tophat_out_SRR039999_2/accepted_hits.sorted.sam -out SRR039999.paired.sam -sorted
>> java -Xmx4000m -jar scripture.jar -task makePairedFile -pair1 tophat_out_SRR040000_1/accepted_hits.sorted.sam -pair2 tophat_out_SRR040000_2/accepted_hits.sorted.sam -out SRR040000.paired.sam -sorted
>> java -Xmx4000m -jar scripture.jar -task makePairedFile -pair1 tophat_out_SRR040001_1/accepted_hits.sorted.sam -pair2 tophat_out_SRR040001_2/accepted_hits.sorted.sam -out SRR040001.paired.sam -sorted
- For each lane, Scripture outputs a paired end alignment in SAM format.
5) Combine TopHat alignments, then sort and index
- Concatenate all six TopHat alignment files:
>> cat tophat_out_SRR039999_1/accepted_hits.sorted.sam tophat_out_SRR039999_2/accepted_hits.sorted.sam tophat_out_SRR040000_1/accepted_hits.sorted.sam tophat_out_SRR040000_2/accepted_hits.sorted.sam tophat_out_SRR040001_1/accepted_hits.sorted.sam tophat_out_SRR040001_2/accepted_hits.sorted.sam > all_alignments.sam
- Use the sort command of IGVTools to sort the concatenated alignment file by start position:
>> igvtools sort all_alignments.sam all_alignments.sorted.sam
- Use the index command of IGVTools to index the concatenated, sorted alignment file:
>> igvtools index all_alignments.sorted.sam
- You should now have two files that will be used by Scripture: all_alignments.sorted.sam and all_alignments.sorted.sam.sai.
6) Combine paired end alignments, then sort and index
- Concatenate all three paired end alignment files:
>> cat SRR039999.paired.sam SRR040000.paired.sam SRR040001.paired.sam > all_alignments.paired.sam
- Use the sort command of IGVTools to sort the concatenated paired end alignment file by start position:
>> igvtools sort all_alignments.paired.sam all_alignments.paired.sorted.sam
- Use the index command of IGVTools to index the concatenated, sorted paired end alignment file:
>> igvtools index all_alignments.paired.sorted.sam
- You should now have two files that will be used by Scripture: all_alignments.paired.sorted.sam and all_alignments.paired.sorted.sam.sai.
7) Run Scripture
- The main tool of Scripture is an ab initio transcriptome reconstruction. This segmentation task is performed on each chromosome.
- For an example, we will run Scripture on chromosome 19. Download mm9.sizes (the mouse sizes file) and chr19.fa (the mouse chr19 FASTA file).
- To run Scripture on this chromosome, using all of our previous data:
>>java –jar scripture.jar –alignment all_alignments.sorted.sam –out chr19.scriptureESTest.segments –sizeFile mm9.sizes –chr chr19 –chrSequence chr19.fa -pairedEnd all_alignments.paired.sorted.sam
- You will see the following output:
Using Version VPaperR2
Computing weights..... upweighting? false weight: 1.0
Has pairs: true
Has upweighting turned on: false
Computing weights..... upweighting? false weight: 1.0
AlignmentDataModel loaded, initializing model stats
model stats loaded, initializing model
Built the model: 0.001 free memory: 92240704
Loaded chromosome Sequence
Segmenting accross graph
Going to get read iterator to make graph with counts
Got read iterator
Made it through all reads
Made first graph
Got extended piecesMade second graph
Done making spliced graph
Got Simple paths...
Estimated distribution
Made path trees
Done adding paired ends (if available)
Done getting paths. Total: 99832
Done with local segmentation
Done setting local rate in graph
Done scoring paths.
Note: This will take ~15 minutes on a standard machine to complete
- The output is stored in 2 files. The first, chr19.scriptureESTest.segments, is a BED file containing all the transcripts reconstructed. This file also contains the expression level of this transcript as well as a significance level for the transcript. The second file is chr19.scriptureESTest.segments.dot. This file is the transcript graph reconstructed by Scripture. This file is in a language called dot and can be visualized in the program called GraphViz.
8) Visualizing the transcript graphs
- The transcript graphs can be visualized using programs such as GraphViz. However, the full chromosome graph file is too big to load into this program. Scripture includes a method to extract regions from this file for visualization purposes.
- To run this program use the main Scripture jar and run task “extractDot”. The arguments to use are:
-in: This is the path to the Dot file from a previous Scripture run
-chr: The chromosome of the region to extract
-start: The start coordinate of the region to extract
-end: The end coordinate of the region to extract
-out: The output file for the new extracted dot file
- To run on our example type:
>>java –jar –task extractDot –in chr19.scriptureESTest.segments.dot –chr chr19 -start 32165265 -end 32455031 -out sgms1.dot
- Load the file into GraphViz and layout using the dot layout engine. The graph will look like this