Flow Cytometry Gating and Clustering


Description

Gating is an inherent component of flow cytometry (FCM) data analysis: it is the process of subsetting particles (i.e., cells) according to their physical and fluorescence characteristics. These properties are reflected in the parameter values of the events stored in FCS files. In practice, gating corresponds to assigning classes (labels) to these events, which can be done either manually or automatically. While manual gating is still dominant in traditional FCM, automatic gating methods are becoming more important in contemporary and high-throughput approaches. This suite supports the application of manually created gates saved in Gating-ML as well as a variety of clustering algorithms developed for use with flow cytometry data.

Manual gating

  • ApplyGatingML applies a Gating-ML file to FCS data to gate (filter) and/or transform it. Each gate in the Gating-ML file creates a population saved in a CSV file.
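
To illustrate gating as label assignment, the following is a minimal sketch of a rectangle (range) gate in plain Python. It is not the ApplyGatingML implementation and does not parse Gating-ML; the function name `rectangle_gate`, the parameter names, and all event values are hypothetical and chosen only for illustration.

```python
def rectangle_gate(events, bounds):
    """Label each event True if it lies inside every [low, high) range.

    events: list of dicts mapping parameter name -> measured value
    bounds: dict mapping parameter name -> (low, high)
    """
    labels = []
    for event in events:
        inside = all(low <= event[name] < high
                     for name, (low, high) in bounds.items())
        labels.append(inside)
    return labels


# Hypothetical events with two scatter parameters (values are made up).
events = [
    {"FSC-A": 52000, "SSC-A": 18000},
    {"FSC-A": 8000, "SSC-A": 3000},
    {"FSC-A": 61000, "SSC-A": 95000},
]

# Hypothetical gate: one range per parameter.
gate = {"FSC-A": (30000, 90000), "SSC-A": (5000, 40000)}

labels = rectangle_gate(events, gate)
population = [e for e, keep in zip(events, labels) if keep]
```

Here `labels` holds one class per event, and `population` is the subset of events that would be written out for this gate.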

Clustering

  • FlowClustClassifyFCS uses the model-based FlowClust algorithm to find populations in an FCS file.
  • FlowMeansCluster clusters flow cytometry data using the FlowMeans algorithm. This algorithm applies a nonparametric approach to perform automated gating of cell populations in flow cytometry data. Clustering results are obtained by counting the number of modes in every single dimension, followed by multi-dimensional clustering. Then adjacent clusters (in terms of Euclidean or Mahalanobis distance) are merged. The number of clusters is determined using a change point detection algorithm based on piecewise linear regression. This approach allows multiple clusters to represent the same population and enables the algorithm to find non-spherical cell populations.
  • FlowMergeCluster uses the FlowMerge algorithm to cluster flow cytometry data. The max BIC model-fitting criterion for mixture models generally overestimates the number of cell populations in flow cytometry data because the number of mixture components required to accurately model a distribution is usually greater than the number of distinct cell populations. Model-fitting criteria based on entropy, such as the ICL, provide better estimates of the number of clusters but tend to fit the underlying distribution poorly. FlowMerge combines these two approaches by merging mixture components from the max BIC fit based on an entropy criterion. This approach allows multiple mixture components to represent the same cell subpopulation. Merged clusters are mixtures themselves and are summarized by a weighted combination of their component model parameters. The result is a mixture model that retains the good model-fitting properties of the max BIC solution and whose number of components more closely reflects the true number of distinct cell subpopulations.
  • ImmPortFLOCK is a GenePattern implementation of the FLOCK version 1 method at the Immunology Database and Analysis Portal (ImmPort). FLOCK is a grid-based density clustering method for automated population identification from multi-dimensional FCM data. It has been used to objectively identify seventeen distinct B cell subsets in a human peripheral blood sample and to identify and quantify novel plasmablast subsets in peripheral blood that respond transiently to tetanus and other vaccinations. The use of algorithms like FLOCK for FCM data analysis obviates the need for subjective and labor-intensive manual gating to identify and quantify cell subsets. Novel populations identified by these computational approaches can serve as hypotheses for further experimental study.
  • KMeansClassifyFCS uses the K-Means algorithm to find populations in an FCS file.
  • SamSPECTRALClusterFCS uses the SamSPECTRAL algorithm, which is a modification of spectral clustering, to cluster flow cytometry data. Spectral clustering is a non-parametric clustering method that has proved useful in many pattern recognition areas. It does not require a priori assumptions about the size, shape, or distribution of clusters, and it is not sensitive to outliers or noise. However, there are serious empirical barriers to applying spectral clustering to large data sets, such as those common in flow cytometry.
  • SuggestNumberOfPopulationsFCS suggests the number of clusters (K value) for a clustering algorithm. It tests a range of K values and suggests a number of populations using the Bayesian Information Criterion (BIC) in combination with the FlowClust algorithm. In addition, it outputs the estimated BIC as well as the Integrated Completed Likelihood (ICL) score for each of the K values.
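
The BIC-based choice of K described above can be sketched as follows. This is a minimal illustration, not the SuggestNumberOfPopulationsFCS or FlowClust code. It assumes the sign convention BIC = 2 ln L - p ln n, under which larger values are better (consistent with the "max BIC" wording used here); the log-likelihoods, parameter counts, and function names are hypothetical.

```python
import math


def bic(log_likelihood, n_params, n_events):
    """BIC under the assumed 'larger is better' convention."""
    return 2.0 * log_likelihood - n_params * math.log(n_events)


def suggest_k(fits, n_events):
    """Pick the K with the highest BIC.

    fits: dict mapping K -> (log-likelihood, number of free parameters)
    Returns the chosen K and the BIC score of every candidate.
    """
    scores = {k: bic(ll, p, n_events) for k, (ll, p) in fits.items()}
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Hypothetical fits of a 2-D Gaussian mixture to 1000 events.
# A K-component model has (K - 1) weights + 2K means + 3K covariance
# terms = 6K - 1 free parameters.
fits = {
    1: (-5200.0, 5),
    2: (-4700.0, 11),
    3: (-4650.0, 17),
    4: (-4640.0, 23),
}

best_k, scores = suggest_k(fits, n_events=1000)
```

With these made-up numbers, the likelihood gain from K = 3 to K = 4 is too small to offset the penalty for the six extra parameters, so K = 3 is suggested.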

Meta-clustering

  • ImmPortCrossSample applies centroids of an FCM data clustering result to a new FCM data file (TXT generated from FCS) for population identification and mapping between the two samples. The module is a GenePattern implementation of the cross-sample comparison method at the Immunology Database and Analysis Portal (ImmPort).
  • MClustClusterLabel performs a model-based labeling (matching) of clustered flow cytometry data. Independent clustering of several flow cytometry samples (e.g., blood from different patients) typically divides each FCS file into several groups corresponding to the cell subpopulations in that sample. This module matches these clustering results across different samples.
  • MClustClusterLabelBIC uses the Bayesian Information Criterion (BIC) for model-based labeling of clusters, i.e., it estimates the number of labels for MClustClusterLabel.
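
As an illustration of applying the centroids of one clustering result to a new sample, the sketch below assigns each new event to its nearest centroid by Euclidean distance. This is an assumption about the simplest form such a mapping could take, not the ImmPort cross-sample method itself; the function name and all coordinates are hypothetical.

```python
import math


def assign_to_centroids(events, centroids):
    """Return, for each event, the index of the nearest centroid."""
    labels = []
    for event in events:
        distances = [math.dist(event, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels


# Hypothetical centroids from a previously clustered reference sample.
centroids = [(1.0, 1.0), (5.0, 5.0)]

# Hypothetical events from a new sample, on the same two parameters.
new_events = [(0.8, 1.3), (4.6, 5.2), (1.4, 0.9)]

new_labels = assign_to_centroids(new_events, centroids)
```

Because both samples share the centroid indices, population 0 in the new sample corresponds to population 0 in the reference sample, which is what makes the two results comparable.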

Feature extraction

  • FCMFeatureExtraction extracts features from gated flow cytometry data. These features typically include the number of events (cells) in each subpopulation, the proportion (percentage) of events in each subpopulation, and/or the mean value of each (or each selected) parameter in each subpopulation (e.g., the Mean Fluorescence Intensity [MFI]).
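
The features listed above can be sketched in a few lines. This is a minimal illustration for a single parameter, not the FCMFeatureExtraction implementation; the function name, population labels, and intensity values are hypothetical.

```python
from collections import defaultdict


def extract_features(values, labels):
    """Compute count, percentage, and mean per labeled subpopulation.

    values: one measured value per event (e.g., a fluorescence intensity)
    labels: the subpopulation label of each event, in the same order
    """
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)

    total = len(values)
    return {
        label: {
            "count": len(vals),
            "percent": 100.0 * len(vals) / total,
            "mean": sum(vals) / len(vals),
        }
        for label, vals in groups.items()
    }


# Hypothetical intensities and gating labels for five events.
intensities = [100.0, 120.0, 900.0, 1100.0, 110.0]
populations = ["A", "A", "B", "B", "A"]

features = extract_features(intensities, populations)
```

For a fluorescence parameter, the `"mean"` entry plays the role of the MFI of each subpopulation.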

References