What is the ChainFinder Algorithm?

How does ChainFinder work?

An outline of the algorithm is provided below. For terminology and further details, please refer to the supplemental material of Baca et al., Punctuated Evolution of Prostate Cancer Genomes, Cell (2013) (http://dx.doi.org/10.1016/j.cell.2013.03.021) . It is important to note that the ability of ChainFinder to detect inter-dependent rearrangements depends on the number of rearrangements within a given genome. The sensitivity to detect complex rearrangements is decreased in more extensively rearranged genomes; therefore, caution should be used when comparing complex rearrangements between tumor genomes.

ChainFinder Algorithm Outline:

1. Create a graph representation of the somatic rearrangement breakpoints specified in the file <rearrangement_data>. This graph is constructed as a sparse square matrix, where each row and column represents a breakpoint. The matrix describes the connectivity between breakpoint nodes in the graph, which in turn reflects the grouping of breakpoints that are deemed inter-dependent by ChainFinder. The value 1 at position (m,n) indicates that breakpoints m and n are connected by an edge on the graph. In this initial step, the nodes corresponding to the two breakpoints from each rearrangement are connected by an edge.

2. Search for pairs of breakpoints to test for deviation from the model of independent breakpoints.

a. These pairs of breakpoints must:

i. Be nearest neighbors on the reference genome

ii. Not be involved in the same rearrangement

iii. Fall within <test_distance_thresh> of each other

b. Calculate the expected local breakpoint density in the vicinity of each such pair of breakpoints.

i. Subdivide the genome into windows of size <mu_window> and tally the number of samples in <background_rate_file> with one or more breakpoints within each window.

ii. Use this value to scale the global breakpoint rate (equal to the number of breakpoints in the tumor genome divided by <genome_size>) to obtain μlocal, the expected local rate of breakpoints per base-pair.

c. For each pair of neighboring breakpoints, calculate the adjacency probability PXY as follows, where L is the reference genome distance between the breakpoints: PXY = 1-(1-2μlocal)L

d. Adjust all values of PXY for multiple hypothesis testing using the Bonferroni-Hochberg procedure.

e. Update the graph from step (1) to connect all neighboring breakpoint pairs with adjusted PXY values less than the threshold specified by <false_discovery_rate>.

3. Cycle through all breakpoints and identify segments of DNA deletion in <copy_number_data> with a boundary near a given breakpoint on the reference genome. This step identifies deletion events that may correspond to rearrangements detected by sequencing.

a. If <copy_number_type> is set to “snp”, search within <probe_window> SNP probes of a breakpoint for a corresponding deletion segment edge. The reference genome mapping of these SNP probes is documented in <array_probes>.

b. If <copy_number_type> is set to “seq”, search within <bp_window> of a breakpoint for a corresponding deletion segment edge.

c. Each breakpoint may only be associated with one segment edge; however, multiple candidate breakpoints may be provisionally assigned to the same deletion segment edge at this stage (a one-to-one pairing of segment edges to breakpoints is established later).

d. For a breakpoint to be paired to a deletion segment edge, the reads supporting a breakpoint must point toward the corresponding deletion segment.

4. Search for breakpoints and deletion segments that correspond to “simple deletions” and remove them from consideration as deletion bridge breakpoints in subsequent steps.

a. Breakpoints of simple deletions must:

i. Be assigned to a deletion segment edge (not necessarily the same segment)

ii. Be somatically fused to one another (i.e., from the same rearrangement)

iii. Point toward each other

b. In addition, the copy number between the two breakpoints at either end of a simple deletion must be consistently less than <deletion_thresh>.

5. Search for potential “deletion bridges” (deletion segments spanning breakpoints from two distinct rearrangements).

a. Create a list of all breakpoints corresponding to segment boundaries that are not involved in “simple deletions.”

b. List potential deletion bridges by finding all pairs of breakpoints from step (5a) that point toward each other and are connected by segments with copy number consistently less than <deletion_thesh>. This approach allows for the possibility of multiple, overlapping deletions that may arise on distinct alleles or in distinct tumor subclones.

c. Search for breakpoints and deletions that may correspond to the loss of a chromosomal end (i.e., deletions extending to the q- or p-terminus of a chromosomes that are bounded by a breakpoint only on one side). These deletions are treated as deletion bridges in subsequent steps.

d. Update the graph created in step (1) to provisionally connect the nodes of breakpoints that are potentially connected by a deletion bridge.

6. Search for cycles involving inter-dependent rearrangements and deletions. This step aids in detecting sets of inter-dependent rearrangements that were not identified based on the calculations of PXY in step (2).

a. Update the graph to temporarily connect all breakpoints within <test_distance_thresh> of each other.

b. Enumerate all sets of breakpoint nodes that are connected to one another by one or more paths. These sets are referred to as “metapaths”

7. Test each metapath as follows:

a. List all potential deletion bridges contained within the metapath.

b. Create an exclusivity matrix for bridges in the path that indicates which sets of bridges are mutually exclusive.

i. Bridges are mutually exclusive if they share the same breakpoint or deletion segment edge.

ii. Additionally, if deletion bridges overlap, the overlapping region must have a consistently lower copy number than the region on either side of the overlap (consistent with deletion of multiple alleles, rather than events affecting the same allele). If this criterion is not met for two potential deletion bridges, they are considered mutually exclusive.

c. Assess all cycles within the metapath to find collections of breakpoints that deviate significantly from the independent breakpoints model.

i. Search each metapath for cycles (closed paths beginning and ending at a single node that involve four or more breakpoint nodes).

ii. List the breakpoints and potential deletion bridges contained in each cycle.

iii. Remove any cycles from the list that involve mutually exclusive bridges.

iv. List all pairs of breakpoints contained in the cycle that are within <test_distance_thresh> of each other and look up the associated PXY values calculated in step (2).

v. Multiply all pairwise combinations of PXY values within the cycle. For example, for a cycle involving three pairs of neighboring breakpoints (ab, cd and ef) with PXY values given by Pab, Pcd and Pef, calculate the following products: {PabPcd; PabPef; PcdPef}

vi. Determine which cycles deviate significantly from the independent breakpoint model by finding cycles for which where every PXY product combination can be rejected at a family-wise error rate for all test below <false_discovery_rate>.

vii. Remove deletion bridges that are not contained within significant cycles. The breakpoints at either end of a deletion bridge must participate in a cycle that deviates significantly from the independent model.

d. If two or potential deletion bridge exist within a metapath and are mutually exclusive, choose between them as follows:

i. Create an exclusivity matrix for all significant cycles within the metapath based on whether they contain mutually exclusive deletion bridges. Cycles containing mutually exclusive deletion bridges are mutually exclusive.

ii. Choose between sets of mutually permissible cycles as follows:

1. Select the set that incorporates the maximum number of breakpoints in deletion bridges.

2. If a unique solution with a maximum number of breakpoints in deletion bridges does not exist, search for a unique solution with a maximum number of breakpoints in significant cycles.

3. If a unique solution with a maximum number of breakpoints in significant cycles does not exist, retain only cycles and bridges that are shared between all interpretations.

8. Finalize the graph based on adjacent breakpoint pairs and breakpoint cycles that deviate significantly from the independent breakpoint model (i.e., as calculated in steps (2) and (7)). Trim each metapath to contain only those breakpoints and associated deletion bridges for which the independent model was rejected.

9. List all genes that are spanned by deletions associated with chained (inter-dependent) rearrangements, or are within <gene_test_window> of a chained rearrangement breakpoint.