CufflinksCuffmergePipeline (v1)

Creates multiple transcript assemblies using Cufflinks and then merges these assemblies into one using Cuffmerge

Author: GenePattern

Contact:

gp-help@broadinstitute.org

Algorithm Version: v2.0.2

Introduction

This pipeline runs the Cufflinks and Cuffmerge modules to first create a set of transcript assemblies and then merge the individual assemblies into one. This merged assembly can then be used as input into the Cuffdiff module.

Cufflinks

Cufflinks assembles transcripts and estimates their abundances in RNA-seq samples. It accepts aligned RNA-seq reads and assembles the alignments into a parsimonious set of transcripts, reporting as few full-length transcript fragments (transfrags) as are needed to explain the data. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one.

Cuffmerge

The main purpose of Cufflinks.cuffmerge is to merge several Cufflinks assemblies, making it easier to produce an assembly GTF file suitable for use with Cufflinks.cuffdiff. Cufflinks.cuffmerge also runs Cuffcompare in the background and automatically filters out transcribed fragments (transfrags) that are likely to be artifacts.

Cufflinks.cuffmerge is essentially a "meta-assembler": it treats the assembled transfrags from Cufflinks the way that Cufflinks treats reads, merging them parsimoniously to produce the smallest number of transcripts that explain the data. Furthermore, when a reference genome annotation is available, Cufflinks.cuffmerge can integrate reference transcripts into the merged assembly. It can also perform a reference-annotation-based transcript (RABT) assembly to merge reference transcripts with sample transfrags and produce a single annotation file for use in downstream differential analysis.
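
As an illustration of the merge step, the following minimal sketch assumes the standalone cuffmerge command-line tool is being driven directly rather than through the GenePattern module. It writes the text manifest that cuffmerge reads (one assembly GTF path per line) and builds the argument list. The function names, file names, and the output directory name are hypothetical; only the -o, -s, and -g options come from the Cufflinks manual. Nothing is executed here.

```python
from pathlib import Path


def write_assembly_list(gtf_paths, list_path):
    """Write one assembly GTF path per line, the manifest format cuffmerge reads."""
    Path(list_path).write_text("".join(f"{p}\n" for p in gtf_paths))
    return str(list_path)


def build_cuffmerge_command(list_path, genome_fasta, reference_gtf=None,
                            output_dir="merged_asm"):
    """Return the cuffmerge argument list; the caller decides whether to run it."""
    cmd = ["cuffmerge", "-o", output_dir, "-s", str(genome_fasta)]
    if reference_gtf is not None:
        # Optional reference annotation to merge with the sample assemblies.
        cmd += ["-g", str(reference_gtf)]
    cmd.append(str(list_path))
    return cmd
```

The returned list could be passed to subprocess.run(cmd, check=True) on a machine where cuffmerge is installed; the merged assembly would then be expected at merged.gtf inside the chosen output directory.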

References

Trapnell C, Hendrickson D, Sauvageau M, Goff L, Rinn JL, Pachter L. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nature Biotechnology. 2013;31:46-53.

Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H, Salzberg SL, Rinn JL, Pachter L. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature Protocols. 2012;7:562-578.

Roberts A, Pimentel H, Trapnell C, Pachter L. Identification of novel transcripts in annotated genomes using RNA-Seq. Bioinformatics. 2011;27(17):2325-2329.

Trapnell C, Williams BA, Pertea G, Mortazavi AM, Kwan G, van Baren MJ, Salzberg SL, Wold B, Pachter L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010;28:511-515.

Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009;25:1105-1111.

Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25.

Parameters

Name                             Description
aligned files *                  One or more aligned SAM/BAM input files.
GTF guide                        Reference annotation (GFF/GTF file) to guide RABT assembly.
mask GTF file                    A GTF file specifying transcripts to be ignored.
library type                     The library type used to generate reads.
min frags per transfrag          Assembled transfrags supported by fewer than this many aligned RNA-Seq fragments are not reported.
additional command line options  Additional options to pass to the Cufflinks program on the command line. Use this to specify Cufflinks options and switches that the module does not otherwise expose; check the Cufflinks manual for details. Recommended for experts only; use at your own discretion.
reference GTF                    An optional reference annotation GTF. The input assemblies are merged with the reference GTF and included in the final output. Cuffmerge will use this to attach gene names and other metadata to the merged catalog.
genome file *                    A file containing the genomic DNA sequences for the reference. This should be a multi-FASTA file with all contigs present.

* - required
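
For readers who want to see how the Cufflinks-related parameters above might translate to a direct command-line call, here is a minimal sketch. The mapping from module parameter names to the cufflinks options --GTF-guide, --mask-file, --library-type, and --min-frags-per-transfrag is an assumption based on matching names in the Cufflinks manual; it is not taken from the module's own implementation. The function name and the example file names are hypothetical, and nothing is executed.

```python
def build_cufflinks_command(aligned_file, output_dir, gtf_guide=None,
                            mask_gtf=None, library_type=None,
                            min_frags_per_transfrag=None,
                            additional_options=()):
    """Return a cufflinks argument list for one aligned SAM/BAM file."""
    cmd = ["cufflinks", "-o", str(output_dir)]
    if gtf_guide is not None:
        cmd += ["--GTF-guide", str(gtf_guide)]      # guide for RABT assembly
    if mask_gtf is not None:
        cmd += ["--mask-file", str(mask_gtf)]       # transcripts to ignore
    if library_type is not None:
        cmd += ["--library-type", library_type]     # e.g. "fr-unstranded"
    if min_frags_per_transfrag is not None:
        cmd += ["--min-frags-per-transfrag", str(min_frags_per_transfrag)]
    cmd += list(additional_options)                 # passed through unchanged
    cmd.append(str(aligned_file))                   # input file goes last
    return cmd
```

Calling the function once per aligned file, each with its own output directory, mirrors the pipeline's first stage of producing one assembly per sample.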

Input Files

  1. One or more files containing RNA-seq read alignments in SAM (a tab-delimited format) or BAM (a compressed binary version of SAM) format.
  2. A reference annotation GTF file (optional).
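
Because the alignment inputs may be either text (SAM) or compressed binary (BAM), a quick format check before submitting files can be useful. The sketch below is a rough heuristic, not part of the pipeline: it assumes BAM files begin with the gzip magic bytes and decompress to the "BAM\1" marker, and that SAM alignment records have at least 11 tab-separated fields. The function name is hypothetical.

```python
import gzip


def sniff_alignment_format(path):
    """Return "BAM", "SAM", or "unknown" based on the start of the file."""
    with open(path, "rb") as handle:
        head = handle.read(2)
    if head == b"\x1f\x8b":  # gzip magic number, also used by BAM's BGZF blocks
        with gzip.open(path, "rb") as handle:
            return "BAM" if handle.read(4) == b"BAM\x01" else "unknown"
    with open(path, "r", errors="replace") as handle:
        for line in handle:
            if line.startswith("@"):  # skip SAM header lines
                continue
            # Judge the file by its first alignment record.
            fields = line.rstrip("\n").split("\t")
            return "SAM" if len(fields) >= 11 else "unknown"
    # Empty files and header-only files are not classified.
    return "unknown"
```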

Output Files

  1. merged.gtf - A GTF file containing a merged transcript assembly
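
As a quick sanity check on the merged assembly, the sketch below counts distinct transcripts per gene in a GTF file. It assumes the standard nine-column, tab-delimited GTF layout with gene_id and transcript_id attributes in the last column; the function name is hypothetical and the parser is deliberately minimal rather than a full GTF reader.

```python
import re
from collections import defaultdict

# Matches attribute pairs of the form: key "value"
_ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def transcripts_per_gene(gtf_lines):
    """Map each gene_id to the number of distinct transcript_id values seen."""
    genes = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GTF record
        attrs = dict(_ATTRIBUTE.findall(fields[8]))
        if "gene_id" in attrs and "transcript_id" in attrs:
            genes[attrs["gene_id"]].add(attrs["transcript_id"])
    return {gene: len(ids) for gene, ids in genes.items()}
```

A typical use would be to open merged.gtf and pass the file handle directly, for example: with open("merged.gtf") as handle: counts = transcripts_per_gene(handle).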

Platform Dependencies

Task Type:
pipeline

CPU Type:
any

Operating System:
any

Language:
Java

Version Comments

Version  Release Date  Description
1        2014-07-30