This section lays out all you need to know to start writing and running WDL workflows that do useful things -- TODAY.
Don't worry, this is going to be easier than you might think.
First, we'll introduce the building blocks of WDL and how they fit together to form the base structure of a WDL script. Then we'll show how to add variables so that input files and parameters can be specified outside of the script itself, which will allow you to use the same script for different runs without modification. Next, we'll cover how to add plumbing, i.e. how you can chain together the components that perform units of work in different ways to form sophisticated pipelines. Finally, we'll look at how to validate syntax, which sounds boring but is really helpful since it will tell you quickly whether your WDL script is runnable or not. Because nobody likes starting a run only to see it fail because it's missing a semicolon somewhere.
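As a rough illustration of that base structure, here is a minimal sketch of a script with one workflow, one task, and one input variable. The names `helloWorkflow`, `sayHello`, `name`, and the file name `hello.wdl` are made up for this example, and the syntax assumes the older draft-style WDL without a version header; newer versions of the language declare inputs differently. The script is wrapped in a shell heredoc so that you can paste it into a terminal to create the file.

```shell
# Write a minimal, hypothetical WDL script to hello.wdl.
# The quoted 'EOF' keeps the shell from expanding ${name} itself.
cat > hello.wdl <<'EOF'
workflow helloWorkflow {
  call sayHello
}

task sayHello {
  String name
  command {
    echo "Hello, ${name}!"
  }
  output {
    String greeting = read_string(stdout())
  }
}
EOF
```

Because `name` is declared without a value, it has to be supplied from outside the script at run time, which is what lets the same script be reused across runs.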
This is going to be short and sweet. We'll show you how to generate a JSON template for specifying inputs (spoiler: it's super easy) and fill it out. Then we'll present the main options for executing your WDL script, focusing on the execution engine we use in our own work, which is called Cromwell. If you had asked us two years ago if we'd ever have an execution engine that we could use both in development and in production, locally and on the cloud, we would have said "when pigs fly"...
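To give a sense of what that looks like: for a hypothetical workflow `helloWorkflow` that calls a task `sayHello` with a single `String` input called `name`, the generated template would contain one entry that maps the fully qualified input name to its type. Filling it out means replacing the type placeholder with an actual value. The sketch below writes both versions to disk; all names and file names are illustrative.

```shell
# What a generated template might look like: each key is workflow.task.variable,
# and each value is a placeholder naming the expected type.
cat > hello_inputs.template.json <<'EOF'
{
  "helloWorkflow.sayHello.name": "String"
}
EOF

# The filled-out version: the placeholder is replaced by a real value.
cat > hello_inputs.json <<'EOF'
{
  "helloWorkflow.sayHello.name": "World"
}
EOF
```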
Once you have grokked the core concepts involved in writing and running WDL workflows, the next step will be to actually do it yourself! In the Tutorials section, we provide a series of fully worked-out "build-a-WDL" examples in which we walk you through each step of composing workflows that demo all the key features of WDL. For best effects, work your way through the tutorials in the order suggested on the Tutorials page.
For a wider variety of use cases, you can also browse the Real Workflows, which are scripts that address real analysis problems and that we use in our own work. These include the Broad Genomics production pipelines for variant discovery in exomes and whole genomes.
Below is a list of everything you need in order to run workflows written in WDL (using the Cromwell execution engine, because that's what we use), with installation instructions where necessary. Because we use GATK in most of the tutorials and example WDL scripts on this website, we include a link to GATK installation instructions as well, but this is optional if you don’t plan to run the GATK WDLs.
The wdltool toolkit is a utility package that provides accessory functionality for writing and running WDL scripts, including syntax validation and input template generation. You can download the latest release of the pre-compiled executable here.
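The two functions mentioned above, syntax validation and input template generation, can be wrapped in small shell helpers. This is a sketch under assumptions: the jar location, the `WDLTOOL_JAR` variable, and the helper names `wdl_validate` and `wdl_inputs` are invented here, and the jar file name will depend on the release you download.

```shell
# Assumed location of the downloaded jar; change this to match your setup.
WDLTOOL_JAR="${WDLTOOL_JAR:-$HOME/tools/wdltool.jar}"

# Check the syntax of a WDL script.
wdl_validate() {
  java -jar "$WDLTOOL_JAR" validate "$1"
}

# Write an inputs template next to the script,
# e.g. myWorkflow.wdl -> myWorkflow_inputs.json.
wdl_inputs() {
  java -jar "$WDLTOOL_JAR" inputs "$1" > "${1%.wdl}_inputs.json"
}
```

With these defined, `wdl_validate myWorkflow.wdl` checks a (hypothetical) script and `wdl_inputs myWorkflow.wdl` writes its template to `myWorkflow_inputs.json`.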
You will need a text editor of some sort to write your WDL scripts. Note that a word processor (like Microsoft Word) is not the same thing as a text editor (like Notepad); please use a text editor. If you have no preferred text editor, we recommend installing SublimeText, as we find that it displays code more clearly than other text editors we've tried. As an added convenience when developing WDL workflows, syntax highlighting has been developed for SublimeText, TextMate, vim, and IntelliJ. You can follow the links for installation instructions for your editor of choice.
Cromwell is an execution engine capable of running scripts written in WDL that describe data processing and analysis workflows involving command-line tools (such as pipelines implementing the GATK Best Practices for Variant Discovery). If you are familiar with GATK, you may have heard of or even used an execution engine called Queue that was designed to run GATK workflows written as Qscripts. Together, Cromwell and WDL constitute a user-friendly alternative to Queue and Qscripts.
The installation of Cromwell itself is quite simple. The latest release can be downloaded here in the form of a pre-compiled jar. For ease of use, you can also add an environment variable to your terminal profile pointing at the Cromwell jar file.
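One possible way to set up that environment variable is sketched below, for a Bourne-style shell profile such as `~/.bashrc` or `~/.bash_profile`. The path, the variable name `CROMWELL_JAR`, and the `cromwell` wrapper function are assumptions made for this example, not names that Cromwell itself requires.

```shell
# Hypothetical path; point this at wherever you saved the Cromwell jar.
export CROMWELL_JAR="$HOME/tools/cromwell.jar"

# Convenience wrapper so that `cromwell run ...` expands to
# `java -jar <jar> run ...`.
cromwell() {
  java -jar "$CROMWELL_JAR" "$@"
}
```

After reloading your profile, `cromwell run myWorkflow.wdl` would launch a (hypothetical) script. How the inputs file is passed depends on the release: recent ones accept `--inputs myWorkflow_inputs.json`, while older ones took the inputs file as a second positional argument, so check the documentation for the version you downloaded.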
Cromwell requires Java version 8, which you can find here.
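If you are not sure which Java your terminal picks up, a quick check along these lines may help. It is only a sketch: it reports the first `java` found on your `PATH`, which is not necessarily the only version installed, and the variable name `java_version_line` is arbitrary.

```shell
# Report which Java is first on the PATH, if any.
# `java -version` writes to stderr, hence the 2>&1 redirect.
if command -v java >/dev/null 2>&1; then
  java_version_line=$(java -version 2>&1 | head -n 1)
else
  java_version_line="java not found on PATH"
fi
echo "$java_version_line"
```

A version string beginning with `1.8` corresponds to Java 8.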
Cromwell is capable of utilizing Docker images to assist in specifying environments when running workflows. If you’ve never worked with Docker before, this page may answer many of your questions. Docker is optional if you are simply working on your local machine (i.e. your computer rather than a remote server). If you are using a remote server, more often than not Docker is required. In our tutorials, we always tell you which optional installations will be required.
To use Docker, please install it according to your operating system, following the instructions given on the installation page.
Our tutorials feature tools from the GATK (Genome Analysis Toolkit) and Picard to demonstrate how to write WDL scripts that perform real data processing and analysis tasks; to follow them, you’ll need to install GATK, Picard, and their dependencies. You can find a complete walkthrough for installing these on the GATK website. The linked document provides instructions for installing several additional software packages that are useful for GATK-specific tutorials, but the only one that you really need to install for running WDL tutorials, besides GATK and Picard, is Java 1.7*. Installing the R library gsalib (available on CRAN) is optional but highly recommended. In each tutorial on this website, we will always tell you which optional installations are required. Note that GATK and Cromwell currently require different versions of Java, so see this article for help dealing with that temporary problem.
*Note: As of version 3.6, GATK runs with Java version 1.8. You will not need Java 1.7 if you use GATK 3.6.