The Variant Call Format, or VCF: an admirable effort to strike a balance between human readability and machine readability. Too bad it manages to fall short on both counts. If you have ever run into a VCF file (and if you're following this blog, we're betting you have), you'll know that the tab-separated values don't always align perfectly. And that INFO field? It's just a jumble of annotations! You shouldn't need to play pin-the-annotation-on-the-value every time you open a VCF. Parsing a VCF to collect annotations for each variant is just as much of a pain, especially when the values you want live in the FORMAT fields.

Well, good news: GATK has a nifty tool called VariantsToTable that can export any information you want from a VCF to a handy table format! With all the time you'll save by not having to read or parse your VCF file by hand, you can learn better party games than pin-the-annotation, like Keep Talking.


To the left, we see your typical VCF readout. It's messy, and there are a lot of fields for each variant call. But just look at that table to the right! It's nicely aligned, and you can tell right away what the values are. Don't worry though, it's not that much work to get your messy VCFs to look this nice. The only thing you need to figure out is which fields you want to look at. You can even look at all of them if you're not sure. Specify fields of interest with the -F (for INFO annotations and VCF column headers) or -GF (for genotype fields like PL and GQ) arguments on the command line. If you open your VCF, you can browse through it to see which annotations and fields are present in your file.
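If browsing the file by eye sounds tedious, the header can tell you which fields are declared. Here is a minimal Python sketch, assuming your VCF declares its annotations in standard `##INFO=<ID=...>` and `##FORMAT=<ID=...>` header lines. The function name `list_vcf_fields` and the example header are made up for illustration; they are not part of GATK.

```python
import re

# Matches header lines such as ##INFO=<ID=DP,...> or ##FORMAT=<ID=GQ,...>
HEADER_ID = re.compile(r"^##(INFO|FORMAT)=<ID=([^,>]+)")


def list_vcf_fields(lines):
    """Return (info_ids, format_ids) declared in the VCF header lines.

    Hypothetical helper: INFO IDs are candidates for -F,
    FORMAT IDs are candidates for -GF.
    """
    info_ids, format_ids = [], []
    for line in lines:
        if not line.startswith("##"):
            break  # metadata lines end at the #CHROM column header
        match = HEADER_ID.match(line)
        if match:
            kind, field_id = match.groups()
            (info_ids if kind == "INFO" else format_ids).append(field_id)
    return info_ids, format_ids


# Made-up header, for illustration only
example_header = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth">',
    '##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count">',
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    '##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype quality">',
    '##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Phred-scaled likelihoods">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
]

info_ids, format_ids = list_vcf_fields(example_header)
print("-F candidates:", info_ids)
print("-GF candidates:", format_ids)
```

For a real, uncompressed file you could pass an open file handle instead of the list, for example `with open("file.vcf") as handle: list_vcf_fields(handle)`. Keep in mind that a field declared in the header is not guaranteed to be present on every record.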

In my case, I want to compare the QUAL, the GQ (genotype quality), and the DP (read depth) for my file. To keep track of which variant I'm looking at, I've included the variant-identifying columns (CHROM and POS) and then specified the three annotations I want included in the generated table.

 java -jar GenomeAnalysisTK.jar \
     -R reference.fasta \
     -T VariantsToTable \
     -V file.vcf \
     -F CHROM -F POS -F QUAL -GF GQ -F DP \
     -o results.table

You may also wish to add the --allowMissingData argument to your command, if some of your variant records are missing values for any of the fields you want to display. This is particularly useful when not all of your variants are marked with the same annotations across the board.

"But wait," you say, "there are still variants missing in my table! I counted!" Fear not, those are just variants that failed a filter at some point along the way. By default, the tool ignores them (as do many other GATK tools). To export all the variants in your VCF (yes, even the no-good filter-failing ones), simply add --showFiltered to your command line.
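If you'd like to check how many records that default behavior drops, you can count them yourself. This is a rough Python sketch, not what the tool does internally. It assumes an uncompressed VCF with the standard column order (FILTER is the seventh column) and treats both `PASS` and `.` (no filters applied) as unfiltered. The function name and the example records are made up for illustration.

```python
def count_filter_status(lines):
    """Return (unfiltered, filtered) record counts for VCF text lines.

    Assumption: a record counts as filtered when its FILTER column
    holds anything other than PASS or "." (filters never applied).
    """
    unfiltered = filtered = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        filter_value = line.rstrip("\n").split("\t")[6]
        if filter_value in ("PASS", "."):
            unfiltered += 1
        else:
            filtered += 1
    return unfiltered, filtered


# Made-up records, for illustration only
example_records = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "20\t10001\t.\tA\tG\t50.0\tPASS\tDP=10",
    "20\t10050\t.\tC\tT\t12.5\tLowQual\tDP=3",
    "20\t10100\t.\tG\tA\t99.0\t.\tDP=20",
]

unfiltered, filtered = count_filter_status(example_records)
print("unfiltered:", unfiltered, "filtered:", filtered)
```

If the filtered count matches the number of rows missing from your table, the filter explanation accounts for the gap.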

After generating your spiffy new table, you can open it in any number of programs. My program of choice is RStudio, where you can simply go to Import Dataset > From Text File and check 'Yes' under Heading. You can also import .table files into Matlab (Import Data > Select File > Select "Table" data type > Import Selection) or Excel (File > Open File). Note that Matlab and Excel will not recognize .table files by default, but they can still open them. Once you have your data open, there are all sorts of analyses you can do, ranging from generating distribution plots to comparing different sets of variants.
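If you'd rather script it, the table is plain tab-separated text with a header row, so most languages can read it directly. Below is a minimal sketch using only the Python standard library. It assumes that genotype fields show up as columns named `SAMPLE.FIELD` and that missing values are written as `NA`. The sample name `NA12878`, the helper names, and the example values are all made up for illustration.

```python
import csv
import io
import statistics

# Made-up table contents, standing in for a file like results.table
example_table = (
    "CHROM\tPOS\tQUAL\tDP\tNA12878.GQ\n"
    "20\t10001\t50.0\t10\t99\n"
    "20\t10050\t30.0\tNA\t45\n"
    "20\t10100\t100.0\t20\tNA\n"
)


def load_table(handle):
    """Read a tab-separated table into a list of dicts; "NA" becomes None."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        {key: (None if value == "NA" else value) for key, value in row.items()}
        for row in reader
    ]


def column_mean(rows, column):
    """Mean of a numeric column, skipping missing values."""
    values = [float(row[column]) for row in rows if row[column] is not None]
    return statistics.mean(values)


rows = load_table(io.StringIO(example_table))
mean_qual = column_mean(rows, "QUAL")
mean_dp = column_mean(rows, "DP")
print("mean QUAL:", mean_qual)
print("mean DP:", mean_dp)
```

To read a real file, swap the `io.StringIO(...)` call for an open file handle, for example `with open("results.table", newline="") as handle: rows = load_table(handle)`.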

Now go out and make some tables!


Thu 25 Feb 2016