Add Variables


In this context, variables are placeholders that we write into the script instead of actual filenames and parameter values. We can then specify the filenames and values we want to use at runtime (meaning when we run the script) without modifying the script at all, which is very convenient. We don't have to use variables for everything, mind you -- for some parameters it makes sense to hardcode the values if they're never going to change from run to run.

So let's look at how to include variables in our WDL script (we'll talk later about how to specify variable values at runtime). There are two different levels at which we might want to include variables: within an individual task, or at the level of the whole workflow, so that the variables will be available to any of the tasks that it calls. We'll start with task-level variables because they're the simplest case, then build on that to approach workflow-level variables, which come with a few (very reasonable and totally not hard) complications. In the course of this explanation we'll touch on how variable values can be passed from one task to the next, which is a sort of preview for the next topic, on connecting tasks together.


Task-level variables


The command component is a required property of a task. The body of the command block specifies the literal command line to run (basically any command that you could otherwise run in a terminal shell) with placeholders (e.g. ${input_file}) for the variable parts of the command line that need to be filled in. Note that all variable placeholders MUST be defined in the task input definitions.

Usage example

command {
    java -jar myExecutable.jar \
        INPUT=${input_file} \
        OUTPUT=${output_basename}.txt
}

Further details in the WDL spec


The output component is a (mostly*) required property of a task. It is used to explicitly identify the output(s) of the task command for the purpose of flow control. The outputs identified here will be used to build the workflow graph, so it is important to include all outputs that are used as inputs to other tasks in the workflow.

* Technically, output is not required for tasks that don't produce an output that is used anywhere else, like in the canonical "Hello World" example. But this is very rare, as most of the time when you are writing a workflow that actually does something useful, each task command will produce some sort of output. Because otherwise, why would you run it, right?

All types of variables accepted by WDL can be included here. The output definitions MUST include an explicit type declaration.

Usage example

output {
    File out = "${output_basename}.txt"
}

Further details in the WDL spec


Note that variables are not technically a formal WDL component like task or workflow, since there is no entity named variables in the WDL specification. We are covering variables as a group and lumping them in with the proper components for the purposes of simplicity in documentation. We sure hope that doesn't backfire and confuse everyone more instead.

As is common in most programming languages, WDL distinguishes 5 basic types of variables, also called primitive types:

  • String : A series of alphanumeric characters; typically used to store (short) text, filenames or, in genomics, information such as DNA sequences.
  • Float : A decimal number; for example 3.1459 (can be negative too).
  • Int : An integer number; for example 16 (can be negative too).
  • Boolean : A logical element that represents binary values; for example true or false.
  • File : An object that represents a file, which is a bit different from just the filename itself.

These primitive types can be grouped into more complex data structures, also called compound types:

  • Array : A list of elements that are stored and retrieved by their index position; for example ["A", "B", "C", "D"] is an array of strings or Array[String] where we can pick the "B" element by taking the second element (index position 1 since WDL arrays are 0-indexed).
  • Map : A list of key-value pairs; for example {"color": "blue", "size": "large"} is a map of strings to strings or Map[String, String] where we can ask what is the color of our object.
  • Object : This one is a little weird and not commonly used, so see the spec for more information.
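
To make the types listed above a bit more concrete, here is a minimal sketch of what declarations with values might look like, written in the same draft-style WDL syntax used on this page. The workflow name, variable names and values are all made up for illustration, and reference.fa is a hypothetical path.

```wdl
# Illustrative only: every name and value here is hypothetical.
workflow type_examples {
    # Primitive types
    String sequence = "GATTACA"
    Float threshold = 0.05
    Int threads = 16
    Boolean verbose = true
    File reference = "reference.fa"

    # Compound types
    Array[String] letters = ["A", "B", "C", "D"]
    Map[String, String] attributes = {"color": "blue", "size": "large"}

    # Arrays are 0-indexed, so index 1 selects the second element, "B"
    String second_letter = letters[1]

    # Map values are looked up by key, so this selects "blue"
    String color = attributes["color"]
}
```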

You can declare variables in the task itself or in the workflow, using the following syntax:

Type variableName

Sometimes, you may not want to use a variable every time you call the task. To make a variable optional (meaning you do not need to set a value either in your JSON inputs file or in your workflow call), simply add the ? modifier*, like so:

Type? variableName

* Note that you may also see the ? modifier separated from the Type by a space, like this: Type ? variableName. Both formats are currently allowed (up to Cromwell v24), but the extra space may be disallowed in future versions.

When working with optional variables in your command, you can specify a default value. This tells the execution engine, "if I don't give my own value for this variable, use this default value instead." The syntax for that is:

${default="value" variableName}

At this time, it is not possible to use optional variables at the workflow level.

Further details in the WDL spec
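
As a rough sketch of how the ? modifier and the default= placeholder option fit together, consider the toy task below. The task name greet, the variable greeting and the workflow name are assumptions made up for this example. Because the workflow calls the task without setting greeting, the command should fall back to the default value.

```wdl
# Hypothetical task: "greet" and "greeting" are illustrative names.
task greet {
    # Optional input: it may be left unset in the inputs file and in the call
    String? greeting

    command {
        echo ${default="Hello" greeting}
    }
    output {
        # Capture whatever the command printed to stdout
        String message = read_string(stdout())
    }
}

workflow optional_example {
    # No value is supplied for greeting, so the default is used
    call greet
}
```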


Adding task-level variables


Going back to task_A from our earlier example, let's look at what it actually contains in its command and output component blocks. We made up an imaginary program called do_stuff which presumably does something more interesting than printing "Hello World". This program requires two files to be provided with the arguments R= and I=, respectively, and produces an output file that must be named using the argument O=. If we were to hardcode the values, we could just write the command line as we would run it in the terminal, e.g. do_stuff R=reference.fa I=input.bam O=variants.vcf.

To replace these hardcoded values with variables, we first have to declare the variables, which is a fancy way of saying we write their name and what type of value they stand for at the top of the task block, e.g. File ref. Then we can insert the variable name within the command, at the appropriate place, within curly braces and prefaced by a dollar sign, e.g. R=${ref}.

Here for the value of O= we use the variable to specify only a base name. The script will automagically concatenate this base name with the .ext file extension that we hardcoded, producing the complete output file name from ${id}.ext.

Finally, we identify any arguments of the command that we want to track as program outputs (in this case, the O= argument) and declare them by copying their assigned contents to the output block as shown in the example. Notice that here too we specify the variable type explicitly.
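
Putting the steps above together, task_A might look something like the sketch below. The names ref, id and out and the do_stuff program come from the text; the name in_file for the second input is an assumption, since the text does not name that variable.

```wdl
task task_A {
    # Declarations: one type and one name per variable
    File ref
    File in_file    # assumed name for the file passed to I=
    String id

    command {
        do_stuff R=${ref} I=${in_file} O=${id}.ext
    }
    output {
        # Same file name that the command was told to write with O=
        File out = "${id}.ext"
    }
}
```

Nothing in this task says which files to use; those values are supplied from outside when the task is called.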


Workflow-level variables


The call component is used within the workflow body to specify that a particular task should be executed. In its simplest form, a call just needs a task name.

Optionally, we can add a code block to specify input variables for the task. We can also modify the call statement to call the task under an alias, which allows the same task to be run multiple times with different parameters within the same workflow. This makes it very easy to reuse code; how this works in practice is explained in detail in the Plumbing Options section of the Quick Start guide.

Note that the order in which call statements are executed does not depend on the order in which they appear in the script; instead it is determined based on a graph of dependencies between task calls. This means that the program infers what order task calls should run in by evaluating which of their inputs are outputs of other task calls. This is also explained in detail in the Plumbing Options section.

Usage examples

# in its simplest form
call my_task

# with input variables
call my_task {
    input: task_var1 = workflow_var1, task_var2 = workflow_var2, ...
}

# with an alias and input variables
call my_task as task_alias {
    input: task_var1 = workflow_var1, task_var2 = workflow_var2, ...
}

Further details in the WDL spec
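
To illustrate the alias form, here is a small self-contained sketch in which the same task is called twice under different aliases, each time with a different input value. The task say, its variable word and the alias names are all invented for this example.

```wdl
# Hypothetical task used only to demonstrate aliased calls.
task say {
    String word

    command {
        echo ${word}
    }
    output {
        String said = read_string(stdout())
    }
}

workflow alias_example {
    # Two calls to the same task, told apart by their aliases
    call say as say_hello {
        input: word = "hello"
    }
    call say as say_goodbye {
        input: word = "goodbye"
    }
}
```

Within the workflow, the outputs of each call would then be referred to through the alias, as in say_hello.said and say_goodbye.said.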


The task component is a top-level component of WDL scripts. It contains all the information necessary to "do something" centering around a command accompanied by definitions of input files and parameters, as well as the explicit identification of its output(s) in the output component. It can also be given additional (optional) properties using the runtime, meta and parameter_meta components.

Tasks are "called" from within the workflow component, which is what causes them to be executed when we run the script. The same task can be run multiple times with different parameters within the same workflow, which makes it very easy to reuse code. How this works in practice is explained in detail in the Plumbing Options section.

Usage example

task my_task {
    [ input definitions ]
    command { ... }
    output { ... }
}

Further details in the WDL spec
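
For a slightly fuller picture, the sketch below fills in the task skeleton and adds the optional runtime, meta and parameter_meta components. The task name, the variable names, the Docker image tag and the metadata values are all assumptions chosen for illustration; which runtime attributes are honored depends on the execution engine and backend.

```wdl
# Hypothetical task: names, image and metadata are illustrative.
task count_lines {
    File input_file
    String output_basename

    command {
        wc -l < ${input_file} > ${output_basename}.txt
    }
    output {
        File out = "${output_basename}.txt"
    }
    runtime {
        # Assumed container image; any image that provides wc would do
        docker: "ubuntu:20.04"
    }
    meta {
        description: "Counts the lines in a text file"
    }
    parameter_meta {
        input_file: "Text file whose lines will be counted"
        output_basename: "Base name of the output file, without extension"
    }
}
```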


The workflow component is a required top-level component of a WDL script. It contains call statements that invoke task components, as well as workflow-level input definitions.

There are various options for chaining tasks together through call and other statements; these are all detailed in the Plumbing Options documentation.

Usage example

workflow myWorkflowName {
    call my_task
}

Further details in the WDL spec


Adding workflow-level variables


Moving one level out, to the body of the workflow, you see that we have now also declared a set of variables at the top. These declarations follow essentially the same rules as those inside a task. All we need to do now is connect these two levels, so that arguments passed to the workflow can be used as inputs to the task.

To do so, we add a code block to the call statement. This block contains an input: line that enumerates which workflow-level variables connect to which task-level variables.

We do something very similar for the second task that we're calling in this workflow, task_B, with a key difference. First, the common part: task_B also takes in the reference file as input, so we can simply feed the same workflow-level variable, my_ref, to the corresponding variable in task_B. However, for its other input we need to give task_B the output of task_A. Conveniently, we can simply refer to that using a task_name.output_variable syntax -- so in this case, task_A.out.
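
The wiring described above could be sketched as follows. The names my_ref, task_A, task_B, ref, id and out come from the text; the workflow name, my_input, name, in_file and the do_more_stuff program are assumptions added so that the example is complete.

```wdl
task task_A {
    File ref
    File in_file    # assumed name
    String id

    command {
        do_stuff R=${ref} I=${in_file} O=${id}.ext
    }
    output {
        File out = "${id}.ext"
    }
}

task task_B {
    File ref
    File in_file    # assumed name

    command {
        # do_more_stuff is a made-up program, like do_stuff
        do_more_stuff R=${ref} I=${in_file} O=final.ext
    }
    output {
        File out = "final.ext"
    }
}

workflow myWorkflowName {
    # Workflow-level variables, declared just like task-level ones
    File my_ref
    File my_input    # assumed name
    String name      # assumed name

    call task_A {
        input: ref = my_ref, in_file = my_input, id = name
    }
    call task_B {
        # task_A.out makes task_B depend on task_A finishing first
        input: ref = my_ref, in_file = task_A.out
    }
}
```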

Finally, we still need to know how to pass values to the workflow to populate all of those variables, don't we? Yes. Yes we do.

You could hardcode the values when you declare the variables, of course, and for some parameters that you know will always be the same, it makes sense to do it that way. But there are certainly some variables that you need to keep, well, variable, so you don't have to edit your script from run to run.

How that is done in practice is mostly up to the execution engine, not WDL. Pipelining systems typically use one of two main strategies to fulfill this need: either provide the values as part of the command line that launches the workflow execution, or provide a sort of configuration file that lists all the desired values. In the Cromwell execution engine we prefer to use a file of input definitions, as detailed in a later section.


Well, that wasn't so bad. In the next section, we'll look at how we can connect tasks through their inputs and outputs, plus some additional functions, to produce full-featured workflows without unnecessary code complexity.

