GenePattern

Analytic Technique for Assessment of RNAi by Similarity

Author: Aviad Tsherniak, Broad Institute, aviad@broadinstitute.org

Contact:

Aviad Tsherniak (Broad Institute), aviad@broadinstitute.org

Algorithm Version:

Introduction

This module implements the ATARiS method as described in [1]. It is designed to analyze phenotypic readouts from multiple-sample RNAi screens in which each gene is targeted by multiple RNAi reagents. Reagents designed to target the same gene often induce different degrees of on-target and off-target gene suppression, resulting in inconsistent phenotypes. This renders interpretation and downstream analysis of the gene-level phenotype challenging.

Algorithm

ATARiS generates two types of results:

1. Gene solutions

For each gene, ATARiS tries to identify subsets of its RNAi reagents that produce a significantly similar phenotypic pattern across the screened samples. If such a subset is found, the data from these reagents are summarized into a gene solution profile. The solution consists of one phenotype value for each sample, representing the phenotypic outcome of suppressing the target gene in that sample relative to the outcome in other screened samples. Genes for which no such subset can be found will have no gene solutions associated with them. For some genes, more than one solution may exist (e.g., two pairs of reagents, each producing effects similar between the reagents but not between the pairs). In these cases, all gene solutions will be reported.

The resulting file, containing the gene solutions for all screened genes, can be used in downstream analyses with standard genomic tools (e.g., tools designed for gene expression analysis).

2. Reagent consistency scores

For each reagent, ATARiS computes a consistency score that represents the confidence that its observed phenotypic effects are the result of on-target gene suppression.
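To make the idea of reagents sharing a "similar phenotypic pattern across the screened samples" more concrete, the sketch below compares hypothetical reagent profiles for a single gene using pairwise Pearson correlation. This is only an illustration and is not the ATARiS algorithm, whose actual procedure and significance testing are described in [1]. The reagent names and values are invented, and Pearson correlation is an assumed stand-in for the similarity measure.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length phenotype profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical readouts: four reagents targeting one gene, five samples each.
profiles = {
    "reagent_1": [-2.0, -0.5, 0.1, -1.8, 0.3],
    "reagent_2": [-1.7, -0.4, 0.2, -1.5, 0.1],
    "reagent_3": [0.4, -1.9, -1.6, 0.2, -0.1],
    "reagent_4": [0.5, -1.6, -1.8, 0.0, 0.2],
}

# Correlation for every pair of reagents targeting the gene.
pairwise = {
    (a, b): pearson(profiles[a], profiles[b])
    for a, b in combinations(profiles, 2)
}

# Print the pairs from most to least similar.
for (a, b), r in sorted(pairwise.items(), key=lambda kv: -kv[1]):
    print(f"{a} vs {b}: r = {r:+.2f}")
```

In this toy data, reagents 1 and 2 agree with each other, as do reagents 3 and 4, but the two pairs disagree. This mirrors the case above in which a gene may have more than one solution.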

References

1. D.D. Shao, A. Tsherniak et al., “ATARiS: Computational quantification of gene suppression phenotypes from multi-sample RNAi screens”, Genome Research, 2012

Parameters

Name	Description
gct.file *	Input GCT file, with unique reagent identifiers in the first column and gene symbols in the second.
identifier *	Text string identifying the current run; it prefixes the output file names.
random.seed *	Seed for the random number generator.
null.significance *	Gene solution significance level (1 - null percentile). Default: 0.15
min.A.value *	Minimum effect magnitude for an RNAi reagent (relative to the reagent with the maximal effect). Default: 0.3

* - required

Input Files

gct.file

This file contains the phenotypic readouts for all RNAi reagents and samples screened. A GCT file is a tab-delimited text file originally used for gene expression data. It can be easily created using Excel or similar programs. See http://www.broadinstitute.org/cancer/software/genepattern/gp_guides/file-formats/sections/gct for more details.

Each data row of the input GCT file contains readouts of phenotype produced by one RNAi reagent. The first column of the file contains unique reagent IDs. The second column contains gene symbols/IDs (in any format) representing the genes targeted by each reagent. Each additional column holds the data for one screened sample with the column header being the (unique) sample name/ID. Note that the distributions of values for all samples are assumed to be similar (e.g., after sample-wise Z-score transformation).
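As a rough sketch of how such an input file might be prepared, the Python snippet below applies a sample-wise Z-score to a small table and writes it in GCT v1.2 layout (version line, dimensions line, header row, data rows). The reagent IDs, gene symbols, sample names and values are hypothetical, and using the population standard deviation for the Z-score is an assumption; the module does not prescribe a particular normalization.

```python
import io
from statistics import mean, pstdev


def zscore_columns(rows):
    """Z-score each sample (column) so all samples have a similar distribution.

    Assumes every column has non-zero spread; a constant column would
    cause a division by zero.
    """
    scaled_columns = []
    for col in zip(*rows):
        mu, sigma = mean(col), pstdev(col)
        scaled_columns.append([(v - mu) / sigma for v in col])
    # Transpose back to one list per reagent (row).
    return [list(row) for row in zip(*scaled_columns)]


def write_gct(handle, reagent_ids, gene_symbols, sample_names, rows):
    """Write a GCT v1.2 table: reagent ID in column 1, gene symbol in column 2."""
    handle.write("#1.2\n")
    handle.write(f"{len(rows)}\t{len(sample_names)}\n")
    handle.write("\t".join(["Name", "Description", *sample_names]) + "\n")
    for rid, gene, values in zip(reagent_ids, gene_symbols, rows):
        handle.write("\t".join([rid, gene, *(f"{v:.4f}" for v in values)]) + "\n")


# Hypothetical screen: three reagents, two samples.
reagent_ids = ["shRNA_A1", "shRNA_A2", "shRNA_B1"]
gene_symbols = ["GENE_A", "GENE_A", "GENE_B"]
sample_names = ["sample_1", "sample_2"]
raw = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]

buffer = io.StringIO()
write_gct(buffer, reagent_ids, gene_symbols, sample_names, zscore_columns(raw))
gct_text = buffer.getvalue()
print(gct_text)
```

To write a real file, pass a handle opened with `open(path, "w", newline="")` in place of the in-memory buffer.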

Output Files

<identifier>.Gs.gct

A tab-delimited table in GCT file format containing the identified gene solutions in rows. The first column is a unique gene solution identifier. The second column is the gene symbol/ID of the targeted gene. Each following column contains the phenotype values for the corresponding input sample.

<identifier>.hp.table.txt

A tab-delimited table. Each row contains information for one screened RNAi reagent. The field ‘isUsed’ indicates whether the reagent's data were used to generate a gene solution. ‘sol.number’ identifies which solution of the targeted gene used data from this reagent. ‘sol.name’ is a unique identifier for the solution generated using the reagent data. ‘sol.id’ is a binary string with one digit for each reagent targeting the gene, in their order of appearance in this file: a ‘1’ means that the reagent was used to generate the current gene solution, and a ‘0’ means that it was not. The ‘cscore’, ‘pval’ and ‘qval’ columns hold the consistency score and its corresponding p-value and q-value for each reagent.

<identifier>.gene.table.txt

A tab-delimited table detailing for each targeted gene the number of gene solutions identified for it and the total number of RNAi reagents specified in the input file as targeting it.

<identifier>.results.summary.txt

A general summary of the analysis. Contains, among others, the overall number of gene solutions found and the overall number of reagents used.
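The ‘sol.id’ field of <identifier>.hp.table.txt can be decoded with a few lines of code. The sketch below maps each digit of the binary string onto the reagents targeting a gene, in their order of appearance in the table. The function name and reagent IDs are hypothetical, and the caller is assumed to have already collected the reagents for the gene in file order.

```python
def reagents_in_solution(sol_id, gene_reagents):
    """Return the reagents marked '1' in a sol.id string.

    gene_reagents lists the reagents targeting the gene, in their order of
    appearance in the hp.table file (one digit of sol_id per reagent).
    """
    if len(sol_id) != len(gene_reagents):
        raise ValueError("sol.id needs one digit per reagent targeting the gene")
    if set(sol_id) - {"0", "1"}:
        raise ValueError("sol.id may contain only '0' and '1'")
    return [reagent for digit, reagent in zip(sol_id, gene_reagents) if digit == "1"]


# Hypothetical gene targeted by four reagents; the solution uses the 1st and 3rd.
gene_reagents = ["shRNA_A1", "shRNA_A2", "shRNA_A3", "shRNA_A4"]
used = reagents_in_solution("1010", gene_reagents)
print(used)
```

When loading the table with a spreadsheet or data-frame tool, it is probably safest to read ‘sol.id’ as text, since numeric parsing would drop leading zeros (e.g., "0110" becoming 110).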

Example Data

To reproduce the results of running ATARiS on the Achilles’ Heel dataset (Dec. 2012) as published in [1], run this module with the input file Achilles_102lines.gct and the following settings:

random.seed – 12345
null.significance – 0.15
min.A.value – 0.3

Requirements

ATARiS can only be used on the GenePattern public server. Please contact the authors listed above if you have an interest in installing ATARiS locally.

Platform Dependencies

Module type: RNAi analysis

CPU type: Any

OS: Any

Language: R v2.14

Version Comments

Version	Release Date	Description
3	2016-06-07	Bug-fix: added missing package installer declaration file.
2	2015-12-02	Updated to use R-2.15 and the new package installer mechanism.

ATARiS (v3)
