ClusterSeparationAnnotator documentation
ClusterSeparationAnnotator
Annotator that reports on the separation of read depth clusters used in genotyping.
Category: Variant Annotators
The ClusterSeparation annotator is invoked through the SVAnnotator framework, which defines arguments common to all annotators.
Introduction
The ClusterSeparation annotator reports on the model fit and cluster separation of the Gaussian mixture model used in read depth genotyping. The cluster separation statistics are an important part of filtering sites for accurate genotyping.
Output Formats
This annotator can produce the following outputs: annotated VCF, report file.
The annotated VCF contains the following additional INFO fields:
- GSCLUSTERSEP: Cluster separation metric (mean Mahalanobis distance between the CN1 and CN2 clusters across population samples).
- GSCLUSTERSEPWEIGHTEDMEAN: Mean of cluster separation metric weighted by estimated sample copy number.
- GSCLUSTERSEPWEIGHTEDMEDIAN: Median cluster separation metric weighted by estimated sample copy number.
The report file contains the following fields:
- ID: Site identifier.
- GSM1: Genotyping model parameter M1 (scaling factor for cluster means).
- GSM2: Genotyping model parameter M2 (scaling factor for cluster variances).
- GSCLUSTERSEP: Cluster separation metric (mean Mahalanobis distance between the CN1 and CN2 clusters across population samples).
- GSCLUSTERSEPWEIGHTEDMEAN: Mean of cluster separation metric weighted by estimated sample copy number.
- GSCLUSTERSEPWEIGHTEDMEDIAN: Median cluster separation metric weighted by estimated sample copy number.
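To make the three separation statistics concrete, the following Python sketch computes a one-dimensional Mahalanobis-style distance per sample and then summarizes it across samples. This is an illustration only, not the annotator's implementation: the function names, the assumption that the CN1 and CN2 clusters share a single variance within each sample, the per-sample weights, and all numeric values are hypothetical.

```python
import math


def mahalanobis_1d(mean_a, mean_b, variance):
    """Distance between two 1-D cluster means in units of a shared standard deviation."""
    return abs(mean_a - mean_b) / math.sqrt(variance)


def weighted_mean(values, weights):
    """Weighted arithmetic mean of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total weight."""
    half = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("weights must sum to a positive number")


# Hypothetical per-sample parameters: (CN1 mean, CN2 mean, shared variance).
samples = [(10.0, 20.0, 4.0), (12.0, 24.0, 9.0), (8.0, 16.0, 16.0)]
# Hypothetical per-sample weights (assumption: how the tool derives them is not shown here).
weights = [1.0, 2.0, 1.0]

distances = [mahalanobis_1d(m1, m2, var) for m1, m2, var in samples]
cluster_sep = sum(distances) / len(distances)   # unweighted mean across samples
cluster_sep_wmean = weighted_mean(distances, weights)
cluster_sep_wmedian = weighted_median(distances, weights)
```

Larger values indicate that the CN1 and CN2 clusters sit further apart relative to their spread, which is why the statistic is useful when filtering sites.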
Example
java -Xmx4g -cp SVToolkit.jar \
    org.broadinstitute.sv.main.SVAnnotator \
    -A ClusterSeparation \
    -R human_g1k_v37.fasta \
    -vcf input.vcf \
    -auxFilePrefix rundir/prefix \
    -O output.vcf \
    -writeReport true \
    -reportDirectory reportdir
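Once the annotated VCF has been written, the GSCLUSTERSEP INFO field can be used to select sites. The sketch below is one possible way to do this with plain string handling in Python. The function names, the example records, and the threshold of 5.0 are hypothetical and chosen only for illustration; this document does not specify a recommended cutoff.

```python
def parse_info(info_field):
    """Split a VCF INFO string such as 'A=1;B=2;FLAG' into a dict (flags map to '')."""
    info = {}
    if info_field == ".":
        return info
    for entry in info_field.split(";"):
        key, _, value = entry.partition("=")
        info[key] = value
    return info


def sites_passing(vcf_lines, min_separation):
    """Yield (site ID, GSCLUSTERSEP) for records with GSCLUSTERSEP >= min_separation."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        columns = line.rstrip("\n").split("\t")
        info = parse_info(columns[7])  # INFO is the eighth VCF column
        try:
            separation = float(info.get("GSCLUSTERSEP"))
        except (TypeError, ValueError):
            continue  # field absent or not numeric
        if separation >= min_separation:
            yield columns[2], separation


# Hypothetical records; in practice, iterate over the lines of the annotated VCF.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\tDEL_1\tA\t<DEL>\t.\tPASS\tEND=2000;GSCLUSTERSEP=8.5",
    "1\t5000\tDEL_2\tC\t<DEL>\t.\tPASS\tEND=6000;GSCLUSTERSEP=2.1",
    "1\t9000\tDEL_3\tG\t<DEL>\t.\tPASS\tIMPRECISE",
]
passing = list(sites_passing(example, min_separation=5.0))  # 5.0 is an arbitrary cutoff
```

To apply the same filter to a real file, pass an open file handle for output.vcf in place of the example list.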
ClusterSeparationAnnotator specific arguments
Name | Type | Default value | Summary |
---|---|---|---|
Required Parameters | | | |
-auxFilePrefix | String | NA | Path prefix to auxiliary data files generated by genotyping (typically a prefix of the output VCF) |