CufflinksWrapperPipeline (v1)

A wrapper pipeline containing Cufflinks that is meant to be run only as part of the CufflinkCuffmerge scatter-gather pipeline.

Author: Cole Trapnell et al., University of Maryland Center for Bioinformatics and Computational Biology

Contact:

gp-help@broadinstitute.org

Algorithm Version: Cufflinks 2.0.2

Summary

This is a wrapper pipeline containing Cufflinks; it is meant to be used only as part of the CufflinkCuffmerge scatter-gather pipeline. Cufflinks assembles transcripts and estimates their abundances in RNA-seq samples. It accepts aligned RNA-seq reads and assembles the alignments into a parsimonious set of transcripts, reporting as few full-length transcript fragments (transfrags) as are needed to explain the data. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one. For more information about Cufflinks, please refer to the Cufflinks module documentation.

Parameters

Name                          Description
input file *                  Input file in SAM or BAM format
transfrag label               A label for the transfrags in the output files
GTF                           Reference annotation (GFF/GTF file) for isoform expression estimates
GTF guide                     Reference annotation (GFF/GTF file) to guide RABT assembly
mask file                     A GTF file specifying transcripts to be ignored
frag bias correct             Reference (FASTA/FA) for bias detection and correction algorithm
multi read correct            Whether to do an initial estimation procedure to more accurately weight reads mapping to multiple locations in the genome
library type                  The library type used to generate reads.
min frags per transfrag       Assembled transfrags supported by fewer than this many aligned RNA-Seq fragments are not reported.
additional cufflinks options  Additional options to be passed along to the Cufflinks program at the command line. This parameter gives you a means to specify otherwise unavailable Cufflinks options and switches not supported by the module; check the Cufflinks manual for details. Recommended for experts only; use this at your own discretion.

* - required
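
As a rough illustration of how the parameters above could translate into a Cufflinks command line, the following sketch builds an argument list in Python. The helper name build_cufflinks_command and its keyword arguments are hypothetical, and the mapping from module parameters to flags is an assumption based on the long option names in the Cufflinks manual; it is not the module's actual implementation.

```python
def build_cufflinks_command(input_file, transfrag_label=None, gtf=None,
                            gtf_guide=None, mask_file=None,
                            frag_bias_correct=None, multi_read_correct=False,
                            library_type=None, min_frags_per_transfrag=None,
                            additional_options=()):
    """Return an argument list for a Cufflinks run (hypothetical helper)."""
    if not input_file:
        raise ValueError("input file is required")

    cmd = ["cufflinks"]

    # Options that take a value. The parameter-to-flag mapping is an
    # assumption; options left as None are simply skipped.
    valued_options = [
        ("--label", transfrag_label),
        ("--GTF", gtf),
        ("--GTF-guide", gtf_guide),
        ("--mask-file", mask_file),
        ("--frag-bias-correct", frag_bias_correct),
        ("--library-type", library_type),
        ("--min-frags-per-transfrag", min_frags_per_transfrag),
    ]
    for flag, value in valued_options:
        if value is not None:
            cmd += [flag, str(value)]

    # Boolean switch: only emitted when requested.
    if multi_read_correct:
        cmd.append("--multi-read-correct")

    # Expert options are passed through untouched, then the alignment file.
    cmd += list(additional_options)
    cmd.append(input_file)
    return cmd


if __name__ == "__main__":
    # File names here are placeholders, not files shipped with the module.
    print(" ".join(build_cufflinks_command(
        "aligned.bam", gtf_guide="genes.gtf", multi_read_correct=True)))
```

Returning a list rather than a single string keeps file names with spaces intact if the list is later handed to subprocess.run.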

Input Files

  1. <input.file> (required)
    File of RNA-seq read alignments in SAM (a tab-delimited format) or BAM (a compressed binary version of SAM) format.  SAM is a standard short-read alignment format that allows aligners to attach custom tags to individual alignments.  This file is the output of a read mapping application, such as TopHat, and the alignment section contains information regarding the mapped location of each sequenced RNA-seq read on a reference genome.
    For more information on the SAM format, see the specification.

    Cufflinks will accept SAM alignments generated by any read mapper.  These must, however, use the custom 'XS' tag.  This attribute, which must have a value of "+" or "-", indicates which strand the RNA that produced this read came from.  While this tag can be applied to any alignment, including unspliced ones, it must be present for all spliced alignment records (those with an 'N' operation in the CIGAR string).

    Also, the SAM file supplied to Cufflinks must be sorted by reference position. If you aligned your reads with TopHat, your alignments will be properly sorted already.  If not, this can be done with the SortSam module.
  2. <GTF> (optional)
    A tab-delimited reference annotation file in GTF format.  This file is used by Cufflinks to estimate abundances of isoforms.  These reference annotation files can be downloaded for many genomes from sites such as the UCSC Genome Browser.  For more information on the GTF format, see the specification.
    The GenePattern FTP site hosts a number of reference annotation GTFs, available in a dropdown selection (requires GenePattern 3.7.0+).

  3. <GTF.guide> (optional)
    A tab-delimited reference annotation file in GTF format.  This file is used by Cufflinks to guide RABT assembly.
    The GenePattern FTP site hosts a number of reference annotation GTFs, available in a dropdown selection (requires GenePattern 3.7.0+).

  4. <mask.file> (optional)
    A tab-delimited GTF file that specifies transcripts to be ignored.

  5. <frag.bias.correct> (optional)
    Reference multi-FASTA file for the bias detection and correction algorithm.  For more information on the FASTA format, see this description.
    The GenePattern FTP site hosts a number of reference genomes, available in a dropdown selection (requires GenePattern 3.7.0+).
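
The two requirements on the input alignments described above (a strand tag with value "+" or "-" on every spliced record, and sorting by reference position) can be checked before submitting a job. The following is a minimal sketch that works on plain-text SAM lines only; the function name check_sam_lines is hypothetical, and the check assumes the strand tag is written as XS:A:+ or XS:A:- in the optional fields. It is not part of the module.

```python
def check_sam_lines(lines):
    """Return a list of human-readable problems found in SAM text lines."""
    problems = []
    seen_refs = []      # reference names in order of first appearance
    last_pos = None     # position of the previous alignment record

    for n, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines

        fields = line.split("\t")
        if len(fields) < 11:
            problems.append(f"line {n}: fewer than 11 mandatory fields")
            continue

        rname, pos, cigar = fields[2], int(fields[3]), fields[5]
        tags = fields[11:]

        # Spliced records (an N operation in the CIGAR string) need a
        # strand tag whose value is "+" or "-".
        if "N" in cigar:
            strand = [t[5:] for t in tags if t.startswith("XS:A:")]
            if not strand or strand[0] not in ("+", "-"):
                problems.append(f"line {n}: spliced record lacks a valid XS tag")

        # Sorted by reference position: positions never decrease within a
        # reference, and a reference is not revisited once left.
        if seen_refs and rname == seen_refs[-1]:
            if pos < last_pos:
                problems.append(f"line {n}: position {pos} precedes {last_pos}")
        elif rname in seen_refs:
            problems.append(f"line {n}: reference {rname} appears in two blocks")
        else:
            seen_refs.append(rname)
        last_pos = pos

    return problems
```

A BAM file would first need to be converted to text (for example with samtools view -h) before its lines could be passed to this function.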

Output Files

  1. transcripts.gtf
    This GTF file contains Cufflinks' assembled isoforms. The first 7 columns are standard GTF, and the last column contains attributes, some of which are also standardized ("gene_id" and "transcript_id"). There is one GTF record per row, and each record represents either a transcript or an exon within a transcript.
  2. genes.fpkm_tracking
    This is a tab-delimited file containing one row per gene; the columns contain the attributes in the GTF file.  This file contains gene-level coordinates and expression values.  Note that since the output for Cufflinks is for a single sample, the "q" numbering format (see the file format information) is not used.
  3. isoforms.fpkm_tracking
    This is a tab-delimited file containing one row per isoform; the columns contain the attributes in the GTF file.  This file contains transcript-level coordinates and expression values.  Note that since the output for Cufflinks is for a single sample, the "q" numbering format (see the file format information) is not used.
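
To show how the attributes column of transcripts.gtf can be read, here is a small sketch that splits one GTF record into its nine tab-separated columns and turns the attribute string into a dictionary. The function name parse_gtf_line is hypothetical, and the sample record in the usage section is made up for illustration; it is not real Cufflinks output.

```python
import re

# Matches attribute pairs of the form: key "value";
_ATTRIBUTE = re.compile(r'(\S+)\s+"([^"]*)"\s*;')


def parse_gtf_line(line):
    """Parse one GTF record into a dict; attributes become a nested dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")

    seqname, source, feature, start, end, score, strand, frame, attrs = fields
    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,          # "transcript" or "exon" in this file
        "start": int(start),
        "end": int(end),
        "score": score,
        "strand": strand,
        "frame": frame,
        "attributes": dict(_ATTRIBUTE.findall(attrs)),
    }


if __name__ == "__main__":
    # Illustrative record only; the values are invented.
    sample = ('chr1\tCufflinks\ttranscript\t100\t500\t1000\t+\t.\t'
              'gene_id "CUFF.1"; transcript_id "CUFF.1.1";')
    record = parse_gtf_line(sample)
    print(record["feature"], record["attributes"]["transcript_id"])
```

Attribute values are kept as strings so that the caller decides which ones to convert to numbers.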

Platform Dependencies

Task Type:
pipeline

CPU Type:
x86_64

Operating System:
Mac, Linux

Language:
C++, Perl

Version Comments

Version Release Date Description
1 2014-07-18