Requires a Java 6 runtime.
java -Xmx2g -jar muTect-XXXX-XX-XX.jar \
    --analysis_type MuTect \
    --reference_sequence <reference> \
    --cosmic <cosmic.vcf> \
    --dbsnp <dbsnp.vcf> \
    --intervals <intervals_to_process> \
    --input_file:normal <normal.bam> \
    --input_file:tumor <tumor.bam> \
    --out <call_stats.out> \
    --coverage_file <coverage.wig.txt>
These parameters depend on the genome build your alignments use:
For HG18
<reference> - Homo_sapiens_assembly18.fasta
<dbsnp.vcf> - dbsnp_132.hg18.vcf
<cosmic.vcf> - hg18_cosmic_v54_120711.vcf
For HG19/GRCh37
<reference> - Homo_sapiens_assembly19.fasta
<dbsnp.vcf> - dbsnp_132_b37.leftAligned.vcf
<cosmic.vcf> - hg19_cosmic_v54_120711.vcf
For Mouse MM9
<reference> - Mus_musculus_assembly9.fasta
<dbsnp.vcf> - dbsnp_128_mm9.vcf
<cosmic.vcf> - no COSMIC VCF is available for mouse; omit the --cosmic parameter entirely
These parameters are specific to the sample/BAM:
<intervals_to_process> - either a literal list of "chrom:start-end" separated by semicolons (e.g. chr1:1500-2500; chr2:2500-3500) or a file of such entries with one entry per line
<normal.bam> - BAM file for the Normal (positional, this must be before the tumor BAM file)
<tumor.bam> - BAM file for the Tumor
<call_stats.out> - filename to write detailed caller output
<coverage.wig.txt> - filename for coverage output
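The build-specific file names above can be collected in one place so they are not retyped for every run. The following is a minimal sketch, not part of MuTect: the function name mutect_build_args and the build keys hg18, hg19 and mm9 are hypothetical, and the file names are simply the ones listed above, assumed to be in the current directory.

```shell
# mutect_build_args BUILD: print the build-specific MuTect flags, one token per line.
# Hypothetical helper; file names are the ones listed in this document.
mutect_build_args() {
    case "$1" in
        hg18)
            printf '%s\n' --reference_sequence Homo_sapiens_assembly18.fasta \
                          --dbsnp dbsnp_132.hg18.vcf \
                          --cosmic hg18_cosmic_v54_120711.vcf ;;
        hg19)
            printf '%s\n' --reference_sequence Homo_sapiens_assembly19.fasta \
                          --dbsnp dbsnp_132_b37.leftAligned.vcf \
                          --cosmic hg19_cosmic_v54_120711.vcf ;;
        mm9)
            # No COSMIC VCF exists for mouse, so --cosmic is left out.
            printf '%s\n' --reference_sequence Mus_musculus_assembly9.fasta \
                          --dbsnp dbsnp_128_mm9.vcf ;;
        *)
            echo "unknown build: $1" >&2
            return 1 ;;
    esac
}
```

The output can then be spliced into the command line, for example $(mutect_build_args hg19) placed after --analysis_type MuTect. This relies on word splitting, so it only works while the file names contain no spaces.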
How do I interpret the output?
The output of the caller is currently very verbose in order to aid development. To restrict it to the set of confident calls, keep only the lines that do not contain the string REJECT:
grep -v REJECT <my.call_stats.txt>
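Matching REJECT anywhere on the line also drops any row that happens to contain that string in another field, such as a sample name. A stricter variant tests the judgement column itself. This is a minimal sketch under assumptions: the file is tab-separated, the header row begins with contig, and judgement is the last column, as in the example header in this document. The function name keep_calls is hypothetical.

```shell
# keep_calls FILE: print the header row plus every row whose judgement is KEEP.
# Assumes tab-separated columns with judgement as the last column.
keep_calls() {
    awk -F '\t' '
        /^#/           { next }          # skip comment lines, if the file has any
        $1 == "contig" { print; next }   # pass the header row through unchanged
        $NF == "KEEP"  { print }         # confident calls only
    ' "$1"
}
```

Usage: keep_calls example.call_stats.txt > example.keep.txt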
The output has many columns. The more prominent ones are defined below:
- contig - the contig location of this candidate
- position - the 1-based position of this candidate on the given contig
- ref_allele - the reference allele for this candidate
- alt_allele - the mutant (alternate) allele for this candidate
- tumor_name - name of the tumor as given on the command line, or extracted from the BAM
- normal_name - name of the normal as given on the command line, or extracted from the BAM
- score - for future development
- dbsnp_site - is this a dbsnp site as defined by the dbsnp bitmask supplied to the caller
- covered - was the site powered to detect a mutation (80% power for a 0.3 allelic fraction mutation)
- power - tumor_power * normal_power
- tumor_power - given the tumor sequencing depth, what is the power to detect a mutation at 0.3 allelic fraction
- normal_power - given the normal sequencing depth, what power did we have to detect (and reject) this as a germline variant
- total_pairs - total tumor and normal read depth which come from paired reads
- improper_pairs - number of reads which have abnormal pairing (orientation and distance)
- map_Q0_reads - total number of mapping quality zero reads in the tumor and normal at this locus
- init_t_lod - deprecated
- t_lod_fstar - CORE STATISTIC: log of (likelihood tumor event is real / likelihood event is sequencing error)
- tumor_f - allelic fraction of this candidate based on read counts
- contaminant_fraction - estimate of contamination fraction used (supplied or defaulted)
- contaminant_lod - log likelihood of (event is contamination / event is sequencing error)
- t_ref_count - count of reference alleles in tumor
- t_alt_count - count of alternate alleles in tumor
- t_ref_sum - sum of quality scores of reference alleles in tumor
- t_alt_sum - sum of quality scores of alternate alleles in tumor
- t_ins_count - count of insertion events at this locus in tumor
- t_del_count - count of deletion events at this locus in tumor
- normal_best_gt - most likely genotype in the normal
- init_n_lod - log likelihood of (normal being reference / normal being altered)
- n_ref_count - count of reference alleles in normal
- n_alt_count - count of alternate alleles in normal
- n_ref_sum - sum of quality scores of reference alleles in normal
- n_alt_sum - sum of quality scores of alternate alleles in normal
- judgement - final judgement of site, KEEP or REJECT (not enough evidence or artifact)
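With this many columns, it is often easier to pull out a few by name than to count field positions. The sketch below looks up column indices from the header row. It assumes the file is tab-separated and that the first non-comment line is the header; the function name select_columns is hypothetical.

```shell
# select_columns FILE NAME[,NAME...]: print the named columns, tab-separated,
# header row included. Fails if a requested name is not in the header.
select_columns() {
    awk -F '\t' -v OFS='\t' -v want="$2" '
        /^#/ { next }                                  # skip comment lines, if any
        !have_header {
            for (i = 1; i <= NF; i++) idx[$i] = i      # map column name to position
            n = split(want, names, ",")
            for (j = 1; j <= n; j++)
                if (!(names[j] in idx)) {
                    print "unknown column: " names[j] > "/dev/stderr"
                    exit 1
                }
            have_header = 1
        }
        {
            line = $(idx[names[1]])
            for (j = 2; j <= n; j++) line = line OFS $(idx[names[j]])
            print line
        }
    ' "$1"
}
```

Usage: select_columns example.call_stats.txt contig,position,ref_allele,alt_allele,t_lod_fstar,tumor_f,judgement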
Example
Here is an example invocation of the caller on BAMs aligned to HG19:
java -Xmx2g -jar muTect-1.0.27783.jar \
    --analysis_type MuTect \
    --reference_sequence Homo_sapiens_assembly19.fasta \
    --dbsnp dbsnp_132_b37.leftAligned.vcf \
    --cosmic hg19_cosmic_v54_120711.vcf \
    --intervals 17:7577100-7577200 \
    --input_file:normal Normal.cleaned.bam \
    --input_file:tumor Tumor.cleaned.bam \
    --out example.call_stats.txt \
    --coverage_file example.coverage.wig.txt
This produces a call stats file containing a single confident mutation in the 100 bp window (in TP53 in this case):
contig position ref_allele alt_allele tumor_name normal_name score dbsnp_site covered power tumor_power normal_power total_pairs improper_pairs map_Q0_reads init_t_lod t_lod_fstar tumor_f contaminant_fraction contaminant_lod t_ref_count t_alt_count t_ref_sum t_alt_sum t_ins_count t_del_count normal_best_gt init_n_lod n_ref_count n_alt_count n_ref_sum n_alt_sum failure_reasons judgement
17 7577106 G A TUMOR NORMAL 0 DBSNP COVERED 0.988119 0.988122 0.999996 90 0 0 13.724831 17.74093 0.162162 0 2.152275 31 6 1207 220 0 0 GG 13.545461 45 0 1760 0 KEEP
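As a sanity check on the row above, the reported tumor_f of 0.162162 appears consistent with t_alt_count / (t_ref_count + t_alt_count), using t_alt_count = 6 and t_ref_count = 31. The sketch below redoes that arithmetic. The formula is inferred from this one example and the column description, not taken from the caller's source, and the function name allelic_fraction is hypothetical.

```shell
# allelic_fraction ALT_COUNT REF_COUNT: alt / (alt + ref), to six decimal places.
# Prints NA when there are no reads at all.
allelic_fraction() {
    awk -v alt="$1" -v ref="$2" 'BEGIN {
        total = alt + ref
        if (total == 0) { print "NA"; exit }
        printf "%.6f\n", alt / total
    }'
}

allelic_fraction 6 31   # prints 0.162162
```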