FastQC (v1)

Generates a QC report on raw sequence data.

Author: Babraham Institute

Contact:

Marc-Danie Nazaire, gp-help@broadinstitute.org

Algorithm Version: 0.10.1

Introduction

The FastQC module runs the FastQC quality control tool developed at the Babraham Institute. FastQC takes raw sequencing data (short reads in a FASTQ, BAM, or SAM file) produced by a next-generation sequencing (NGS) platform and produces a quality control report that can identify problems originating either in the sequencer or during library preparation. The analysis is performed by a series of analysis modules. The report opens with a quick overview that gives a status (normal, slightly abnormal, or very unusual) for each analysis module. For each analysis module, the report also contains a graph or table presenting the corresponding quality statistics.

Parameters

Name Description
input file * A raw sequence file in FASTQ, SAM, or BAM format (.fastq, .sam, .bam).
input format Bypasses the normal sequence file format detection and forces the program to use the specified format. Valid formats are bam, sam, bam_mapped, sam_mapped, and fastq.
contaminant file Specifies a non-default file containing the list of contaminants to screen overrepresented sequences against. The file must contain sets of named contaminants in the form name[tab]sequence. Lines prefixed with a hash (#) are ignored.
kmer size * Specifies the length of Kmer to look for in the Kmer content module. The length must be between 2 and 10. The default is 5.
extract output Whether to output an uncompressed version of the report. Set this to yes to view the report directly from within GenePattern.

* - required
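
The constraints in the parameter table can be checked before a job is submitted. The following is a minimal Python sketch, not part of the module itself: the function name check_parameters and its argument names are hypothetical, and only the valid format names, the 2 to 10 range, and the default of 5 come from the table above.

```python
# Allowed values for the "input format" parameter, as listed in the table.
VALID_FORMATS = {"bam", "sam", "bam_mapped", "sam_mapped", "fastq"}


def check_parameters(input_format=None, kmer_size=5):
    """Hypothetical helper: validate values against the documented constraints.

    input_format is optional (None means "let FastQC detect the format").
    kmer_size defaults to 5 and must lie between 2 and 10 inclusive.
    """
    if input_format is not None and input_format not in VALID_FORMATS:
        raise ValueError(
            "input format must be one of: " + ", ".join(sorted(VALID_FORMATS))
        )
    if not 2 <= kmer_size <= 10:
        raise ValueError("kmer size must be between 2 and 10")
    return {"input_format": input_format, "kmer_size": kmer_size}


if __name__ == "__main__":
    print(check_parameters("fastq", 7))
```

Calling check_parameters with a kmer size of 11, or with a format name that is not in the list, raises a ValueError instead of returning the settings.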

Input Files

  1. input file
    A raw sequence file in FASTQ, SAM, or BAM format.
  2. contaminant file
    A tab-delimited file with one contaminant per line, in the form name[tab]sequence. Lines starting with "#" are ignored. For example:
# This is an example contaminant file

Illumina Single End Adapter 1	GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG
Illumina Single End Adapter 2	CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT
Illumina Single End PCR Primer 1	AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
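
To make the contaminant file format concrete, here is a small Python sketch that reads such a file into a dictionary. It is an illustration of the name[tab]sequence layout described above, not FastQC's own parser. The function name parse_contaminants is hypothetical, and skipping blank lines is an assumption based on the blank line in the example.

```python
def parse_contaminants(lines):
    """Hypothetical helper: map contaminant names to sequences.

    Lines starting with "#" are ignored, as the format description states.
    Blank lines are also skipped (an assumption based on the example file).
    """
    contaminants = {}
    for line_number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        # Split on the first tab: the name may contain spaces,
        # the sequence may not.
        name, tab, sequence = line.partition("\t")
        if not tab or not sequence.strip():
            raise ValueError(
                "line %d is not in name[tab]sequence form" % line_number
            )
        contaminants[name.strip()] = sequence.strip()
    return contaminants


if __name__ == "__main__":
    example = [
        "# This is an example contaminant file",
        "",
        "Illumina Single End Adapter 1\tGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG",
        "Illumina Single End Adapter 2\tCAAGCAGAAGACGGCATACGAGCTCTTCCGATCT",
    ]
    for name, sequence in parse_contaminants(example).items():
        print(name, len(sequence))
```

Running a check like this on a custom contaminant file before uploading it catches the most common mistake, which is separating the name and sequence with spaces instead of a tab.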

 

Output Files

  1. <input.file_basename>_fastqc.zip
    A zip file containing an HTML report. 
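
As an illustration of working with the output, the Python sketch below derives the expected zip name from an input file name and lists the HTML files inside a downloaded archive. Both function names are hypothetical. The sketch assumes that <input.file_basename> means the input file name with its final extension removed, and it makes no assumption about the internal layout of the zip beyond the HTML report mentioned above.

```python
import zipfile
from pathlib import Path


def expected_report_name(input_file):
    """Hypothetical helper: build the <input.file_basename>_fastqc.zip name.

    Assumes the basename is the file name without its final extension.
    """
    return Path(input_file).stem + "_fastqc.zip"


def list_html_reports(zip_path):
    """Hypothetical helper: return the names of HTML files in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return [
            name for name in archive.namelist()
            if name.lower().endswith(".html")
        ]


if __name__ == "__main__":
    print(expected_report_name("sample_1.fastq"))
```

With the extract output parameter set to yes, the report can be viewed directly in GenePattern, so a script like this is only needed when the zip file is processed outside GenePattern.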

Example Data

An example of a report from a good Illumina dataset can be found here.

An example of a report from a bad Illumina dataset can be found here.

Platform Dependencies

Task Type:
RNA-seq

CPU Type:
any

Operating System:
any

Language:
Java

Version Comments

Version Release Date Description
1 2017-03-17 Production release
.5 2014-05-13 Beta Release