ChainFinder is provided as a compiled executable that is compatible with 64-bit Unix systems. Running ChainFinder requires the MATLAB Compiler Runtime (MCR), which can be downloaded from MathWorks at http://www.mathworks.com/products/compiler/mcr/index.html. Please note that ChainFinder version 1.0.0 requires MCR version 8.0 (2012b), while ChainFinder version 1.0.1 requires MCR version 8.1 (2013a).
To run ChainFinder, first download and unpack the folder from the link above. Then from within the ChainFinder folder, execute the command:
./run_ChainFinder.sh <MCR_directory>
where <MCR_directory> is the directory in which the MCR or MATLAB is installed.
Input files
ChainFinder requires several input files, as detailed below. Examples of each can be found within the folder “sample_data”.
The file “parameters.txt” must be provided and must list values for the parameters below in the following format (one parameter and value per line; see “parameters.txt” in the “ChainFinder” folder for an example):
<Parameter>: <value>
Parameter (default value) | Comments
run_name (new_run) | An identifier for a given analysis
rearrangement_data (sampledata/prostate_rr_sample.txt) | The name of a tab-delimited text file containing information about rearrangement breakpoints for each sample, formatted as described below
copy_number_data (sampledata/prostate_cn_sample.txt) | The name of a tab-delimited text file containing segmented copy number alteration data for each sample, formatted as described below
background_rate_file (sampledata/prostate_rr_sample.txt) | The name of a tab-delimited text file listing rearrangements that will be used to calculate local rates of chromosomal breakage. This can be the same file as specified for <rearrangement_data>, or another file formatted in the same manner
copy_number_type (snp) | Indicates whether segmented copy number profiles were generated from SNP arrays (“snp”) or sequencing data (“seq”)
summarize_genes (true) | Indicates whether or not to create an output file listing genes that are potentially disrupted by chains of rearrangements
mu_window (1000000) | The size of the window in base-pairs for tallying the rearrangements listed in <background_rate_file> to estimate local rates of rearrangements across the genome
gene_test_window (25000) | Genes that fall within this distance (in base-pairs) of a chain breakpoint will be noted in the gene summary output file
array_probes (sampledata/probes_hg19.mat) | A “.mat” (MATLAB) file containing an array called “probes” composed of two columns. The chromosome number and genomic coordinate of each probe are listed in the first and second columns, respectively. The chromosomes and probe coordinates must be sorted in ascending order. X and Y chromosomes are specified as 23 and 24, respectively. Note: the default file is configured for Affymetrix SNP 6.0 arrays mapped to hg19 coordinates.
deletion_thresh (-0.1) | Copy number segments with values below this threshold will be considered as deletions
probe_window (8) | Indicates how far from a breakpoint to search for the edge of a deletion segment that may correspond to the breakpoint, in numbers of array probes. Note: this value is only used if <copy_number_type> is set to “snp”
bp_window (5000) | Indicates how far from a breakpoint to search for the edge of a deletion segment that may correspond to the breakpoint, in base-pairs. Note: this value is only used if <copy_number_type> is set to “seq”
significance_thresh (0.05) | The Benjamini-Hochberg-corrected q-value at which deviation from the independent model of rearrangements will be considered significant
genome_size (2846426791) | Base-pairs in the reference genome build
gene_table (gene_table_hg19.txt) | A text file containing the genomic coordinates of genes for annotation of output files (required only if <summarize_genes> is set to “true”)
test_distance_thresh (1000000) | Breakpoints within this reference genome distance will be tested for significant adjacency
create_circos_file (true) | Indicates whether ChainFinder should generate a “.conf” file for displaying rearrangements on a Circos plot
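As an illustration of the “<Parameter>: <value>” format, the following is a minimal Python sketch that writes a “parameters.txt” file from the defaults listed above. The helper names (format_parameters, write_parameters) and the override value used in the usage note are hypothetical and are not part of ChainFinder; the sketch also assumes that every parameter is written out, one per line.

```python
from pathlib import Path

# Parameter names and default values copied from the parameter table.
# Everything is kept as a string so values are written exactly as shown.
DEFAULTS = {
    "run_name": "new_run",
    "rearrangement_data": "sampledata/prostate_rr_sample.txt",
    "copy_number_data": "sampledata/prostate_cn_sample.txt",
    "background_rate_file": "sampledata/prostate_rr_sample.txt",
    "copy_number_type": "snp",
    "summarize_genes": "true",
    "mu_window": "1000000",
    "gene_test_window": "25000",
    "array_probes": "sampledata/probes_hg19.mat",
    "deletion_thresh": "-0.1",
    "probe_window": "8",
    "bp_window": "5000",
    "significance_thresh": "0.05",
    "genome_size": "2846426791",
    "gene_table": "gene_table_hg19.txt",
    "test_distance_thresh": "1000000",
    "create_circos_file": "true",
}


def format_parameters(overrides=None):
    """Return the file contents, one '<Parameter>: <value>' pair per line."""
    params = dict(DEFAULTS)
    for name, value in (overrides or {}).items():
        if name not in DEFAULTS:
            # Guard against typos in parameter names (hypothetical check).
            raise KeyError(f"unknown ChainFinder parameter: {name}")
        params[name] = str(value)
    return "".join(f"{name}: {value}\n" for name, value in params.items())


def write_parameters(path, overrides=None):
    """Write the formatted parameters to the given path."""
    Path(path).write_text(format_parameters(overrides))
```

For example, write_parameters("parameters.txt", {"run_name": "my_run"}) would produce a file whose first line is “run_name: my_run”, with all other parameters left at their defaults.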
The input files listed in “parameters.txt” must be provided as tab-delimited text files in the following format. The columns listed below are required for each input file (please see the files in the “sample_data” folder for examples). The indicated header must be listed at the top of each column:
rearrangement_data:
Header | Values
sample | A unique name for each sample (must be consistent across all input files that refer to the sample and may not contain spaces)
num | A number to identify each rearrangement
chr1 | Chromosome of the first breakpoint in the fusion
pos1 | Base-pair coordinate of the first breakpoint in the fusion
str1 | The strand direction of the first breakpoint (0 for forward, 1 for reverse)
chr2 | Chromosome of the second breakpoint in the fusion
pos2 | Base-pair coordinate of the second breakpoint in the fusion
str2 | The strand direction of the second breakpoint (0 for forward, 1 for reverse)
site1 (optional) | An optional description of the genomic context of the first breakpoint (e.g., nearby genes)
site2 (optional) | An optional description of the genomic context of the second breakpoint (e.g., nearby genes)
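The rearrangement_data layout above can be sketched in Python as follows. This is only an illustration of the column order and the two stated constraints (no spaces in sample names, strands coded as 0 or 1); the function name rearrangements_to_tsv is hypothetical, and any rows passed to it in an example are made up rather than taken from the sample data.

```python
import csv
import io

# Required and optional column headers, as listed for rearrangement_data.
REQUIRED = ["sample", "num", "chr1", "pos1", "str1", "chr2", "pos2", "str2"]
OPTIONAL = ["site1", "site2"]


def rearrangements_to_tsv(rows, include_sites=False):
    """Serialize a list of dicts as tab-delimited text with a header row."""
    fields = REQUIRED + (OPTIONAL if include_sites else [])
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=fields,
        delimiter="\t",
        lineterminator="\n",
        extrasaction="ignore",  # silently drop keys that are not columns
    )
    writer.writeheader()
    for row in rows:
        missing = [name for name in REQUIRED if name not in row]
        if missing:
            raise ValueError(f"missing required columns: {missing}")
        if " " in str(row["sample"]):
            raise ValueError("sample names may not contain spaces")
        for key in ("str1", "str2"):
            if int(row[key]) not in (0, 1):
                raise ValueError(f"{key} must be 0 (forward) or 1 (reverse)")
        writer.writerow(row)
    return buf.getvalue()
```

The returned string can be written to the file named by <rearrangement_data>. Optional site1 and site2 columns are left empty for rows that do not provide them.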
background_rate_file:
Header | Values
sample | A unique name for each sample (must be consistent across all input files that refer to the sample and may not contain spaces)
chr1 | Chromosome of the first breakpoint in the fusion
pos1 | Base-pair coordinate of the first breakpoint in the fusion
chr2 | Chromosome of the second breakpoint in the fusion
pos2 | Base-pair coordinate of the second breakpoint in the fusion
copy_number_data:
Header | Values
sample | A unique name for each sample (must be consistent across all input files that refer to the sample and may not contain spaces)
chr | Chromosome
start | Base-pair coordinate of copy number segment start
end | Base-pair coordinate of copy number segment end
segment_mean | Amplitude of copy number segment (e.g. log2 ratio)
num_probes | If <copy_number_type> is set to “snp”, this indicates the number of array probes contained within the copy number segment
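To show how the segment_mean column relates to the deletion_thresh parameter, the following Python sketch reads copy_number_data text and keeps the segments whose value falls below the threshold. It illustrates only the documented comparison and makes no claim about how ChainFinder processes these segments internally; the function name deletion_segments is hypothetical.

```python
import csv
import io


def deletion_segments(tsv_text, deletion_thresh=-0.1):
    """Return rows whose segment_mean is below deletion_thresh.

    tsv_text is the full contents of a copy_number_data file, including
    the header row. Each returned row is a dict keyed by column header.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row for row in reader
        if float(row["segment_mean"]) < deletion_thresh
    ]
```

With the default threshold of -0.1, a segment with segment_mean of -0.1 itself is not reported, because the parameter description refers to values below the threshold.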
gene_table:
Header | Values
gene | Gene name
chr | Chromosome
gene_start | Base-pair coordinate of gene start
gene_end | Base-pair coordinate of gene end
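Because each input file must carry the indicated column headers, a quick check before running ChainFinder can catch formatting mistakes early. The Python sketch below is one possible way to do this; the function name missing_headers is hypothetical, and treating num_probes as required only when <copy_number_type> is “snp” is an assumption based on the column description above.

```python
# Required column headers for each tab-delimited input file.
REQUIRED_HEADERS = {
    "rearrangement_data": [
        "sample", "num", "chr1", "pos1", "str1", "chr2", "pos2", "str2",
    ],
    "background_rate_file": ["sample", "chr1", "pos1", "chr2", "pos2"],
    "copy_number_data": ["sample", "chr", "start", "end", "segment_mean"],
    "gene_table": ["gene", "chr", "gene_start", "gene_end"],
}


def missing_headers(kind, header_line, copy_number_type="snp"):
    """Return the required headers absent from a file's first line."""
    required = list(REQUIRED_HEADERS[kind])
    # Assumption: num_probes only matters for SNP-array copy number data.
    if kind == "copy_number_data" and copy_number_type == "snp":
        required.append("num_probes")
    present = header_line.rstrip("\r\n").split("\t")
    return [name for name in required if name not in present]
```

An empty list means that all required headers were found; extra columns such as site1 and site2 are ignored by this check.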
Outputs:
Output file | Description
Chain_summary_<run_name>.txt | Summarizes rearrangement chain metrics for each sample and each chain
<sample>_chain_genes.txt | Summarizes genes that are potentially deleted in the context of a chain or are within <gene_test_window> of a chained breakpoint (one file is created for each sample)
<sample>_chains_final.txt | Annotated list of all rearrangements assigned to chains for a given sample
<sample>_chains_long.txt | Detailed output documenting the calculations performed by ChainFinder for a given sample
<sample>_chain_circos.conf, <sample>_chain_<#>.links, <sample>_cn.txt | Files created within the “Circos” folder that can be used as inputs to Circos to plot rearrangement chains coded by color
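When scripting around a run, it can help to know which output file names to look for. The Python sketch below builds that list from the naming patterns above. It rests on several assumptions that the documentation does not state outright: that the gene summary file appears only when <summarize_genes> is “true”, that all Circos files depend on <create_circos_file>, and that the Circos files sit in a folder named “Circos” relative to the other outputs. The per-chain “.links” files are omitted because their number depends on the chains found. The function name expected_outputs is hypothetical.

```python
def expected_outputs(run_name, samples, summarize_genes=True,
                     create_circos_file=True):
    """List output file names implied by the documented naming patterns."""
    names = [f"Chain_summary_{run_name}.txt"]
    for sample in samples:
        names.append(f"{sample}_chains_final.txt")
        names.append(f"{sample}_chains_long.txt")
        if summarize_genes:
            # Assumption: written only when summarize_genes is "true".
            names.append(f"{sample}_chain_genes.txt")
        if create_circos_file:
            # Assumption: both files depend on create_circos_file.
            names.append(f"Circos/{sample}_chain_circos.conf")
            names.append(f"Circos/{sample}_cn.txt")
    return names
```

The returned names are relative; the directory in which ChainFinder writes its outputs is not specified here, so callers would need to join them with the appropriate path.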