ClusterSeparationAnnotator documentation

ClusterSeparationAnnotator

Annotator that reports on the separation of read depth clusters used in genotyping.

Category: Variant Annotators

The ClusterSeparation annotator is invoked through the SVAnnotator framework, which defines arguments common to all annotators.

Introduction

The ClusterSeparation annotator reports on the model fit and cluster separation of the Gaussian mixture model used in read depth genotyping. The cluster separation statistics are used when filtering sites, since sites with poorly separated copy number clusters are harder to genotype accurately.

Output Formats

This annotator can produce the following outputs: annotated VCF, report file.

The annotated VCF contains the following additional INFO fields:

GSCLUSTERSEP
Cluster separation metric (mean Mahalanobis distance between the CN1 and CN2 clusters across population samples).
GSCLUSTERSEPWEIGHTEDMEAN
Mean of the cluster separation metric, weighted by estimated sample copy number.
GSCLUSTERSEPWEIGHTEDMEDIAN
Median of the cluster separation metric, weighted by estimated sample copy number.
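
The INFO fields listed above can be pulled out of the annotated VCF with standard text tools. The following is a minimal sketch, not part of SVToolkit: the function name extract_clustersep is hypothetical, and it assumes only the standard VCF layout (tab-delimited, ID in column 3, INFO in column 8, INFO entries written as KEY=VALUE and separated by semicolons). Sites without a GSCLUSTERSEP entry are reported as NA.

```shell
# Hypothetical helper (not part of SVToolkit).
# Usage: extract_clustersep ANNOTATED_VCF
# Prints "ID<TAB>GSCLUSTERSEP" for every site in the VCF.
extract_clustersep() {
    awk -F'\t' '
        /^#/ { next }    # skip meta-information and column header lines
        {
            value = "NA"
            n = split($8, info, ";")
            for (i = 1; i <= n; i++) {
                # Match the exact key; the trailing "=" keeps
                # GSCLUSTERSEPWEIGHTEDMEAN/MEDIAN from matching.
                if (index(info[i], "GSCLUSTERSEP=") == 1) {
                    value = substr(info[i], length("GSCLUSTERSEP=") + 1)
                }
            }
            print $3 "\t" value
        }
    ' "$1"
}
```

For example, extract_clustersep output.vcf would list one site per line for the output of the command shown in the Example section. The same pattern works for the weighted fields by changing the key.
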

The report file contains one line per site, with the following columns:

ID
Site identifier.
GSM1
Genotyping model parameter M1 (scaling factor for cluster means).
GSM2
Genotyping model parameter M2 (scaling factor for cluster variances).
GSCLUSTERSEP
Cluster separation metric (mean Mahalanobis distance between the CN1 and CN2 clusters across population samples).
GSCLUSTERSEPWEIGHTEDMEAN
Mean of the cluster separation metric, weighted by estimated sample copy number.
GSCLUSTERSEPWEIGHTEDMEDIAN
Median of the cluster separation metric, weighted by estimated sample copy number.
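
Because the report has one line per site, it lends itself to simple threshold filtering. The sketch below is an illustration under stated assumptions rather than a documented workflow: it assumes the report is tab-delimited and begins with a header line containing the column names listed above, and both the function name filter_by_clustersep and the threshold passed to it are hypothetical. This page does not specify a recommended cutoff.

```shell
# Hypothetical helper (not part of SVToolkit).
# Usage: filter_by_clustersep REPORT_FILE MIN_SEPARATION
# Prints the ID of every site whose GSCLUSTERSEP is >= MIN_SEPARATION.
# Assumes a tab-delimited report whose first line is a header row.
filter_by_clustersep() {
    awk -F'\t' -v min="$2" '
        NR == 1 {
            # Locate the columns by name rather than by fixed position.
            for (i = 1; i <= NF; i++) {
                if ($i == "ID") idcol = i
                if ($i == "GSCLUSTERSEP") sepcol = i
            }
            if (!idcol || !sepcol) {
                print "missing ID or GSCLUSTERSEP column" > "/dev/stderr"
                exit 1
            }
            next
        }
        # Ignore non-numeric entries (for example NA), then apply the cutoff.
        $sepcol ~ /^-?[0-9]+(\.[0-9]*)?([eE][-+]?[0-9]+)?$/ && $sepcol + 0 >= min + 0 {
            print $idcol
        }
    ' "$1"
}
```

Looking up columns by header name keeps the sketch independent of column order, which this page does not guarantee.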

Example

 java -Xmx4g -cp SVToolkit.jar \
     org.broadinstitute.sv.main.SVAnnotator \
     -A ClusterSeparation \
     -R human_g1k_v37.fasta \
     -vcf input.vcf \
     -auxFilePrefix rundir/prefix \
     -O output.vcf \
     -writeReport true \
     -reportDirectory reportdir


ClusterSeparationAnnotator specific arguments

Name Type Default value Summary
Required Parameters
-auxFilePrefix String NA Path prefix to auxiliary data files generated by genotyping (typically a prefix of the output VCF)

Argument details

--auxFilePrefix / -auxFilePrefix ( required String )

Path prefix to auxiliary data files generated by genotyping (typically a prefix of the output VCF).