Cromwell 26 has a new way to deal with I/O (Input/Output) operations. This new approach reduces network load and provides better control over the resources allocated throughout the system. This blog post describes some of the optimizations and how they improve Cromwell's stability and reliability.
Cromwell 25 was released a few weeks ago but already Cromwell 26 is available! You can get the latest JAR file on GitHub.
Failure metadata will now be in a consistent JSON format; previously it varied depending on the originating Cromwell version. Failures will be an array of JSON objects, each representing a failure. See the Changelog for an example.
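If you consume this metadata from a script, an array of failure objects is easy to walk. The sketch below is only an illustration: the field names `message` and `causedBy` and the sample payload are assumptions made for the example, so check the Changelog for the authoritative shape.

```python
import json

# Hypothetical metadata fragment. The "message" and "causedBy" field names
# are assumptions for this sketch, not copied from the Changelog.
SAMPLE = """
{
  "failures": [
    {"message": "Task failed", "causedBy": [
      {"message": "Disk full", "causedBy": []}
    ]},
    {"message": "Timeout", "causedBy": []}
  ]
}
"""


def flatten_failures(failures, depth=0):
    """Return (depth, message) pairs for every failure and nested cause."""
    rows = []
    for failure in failures:
        rows.append((depth, failure.get("message", "")))
        rows.extend(flatten_failures(failure.get("causedBy", []), depth + 1))
    return rows


metadata = json.loads(SAMPLE)
for depth, message in flatten_failures(metadata["failures"]):
    # Indent nested causes under the failure they belong to.
    print("  " * depth + message)
```

Because every failure has the same shape, one recursive function covers both top-level failures and their nested causes.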
Cromwell will retry your workflow when faced with transient errors from the Pipelines API (formerly known as JES). For example, when authentication fails (like this user faced) or it cannot access files, Cromwell will try a few more times to get past these errors.
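To give a feel for what "try a few more times" means, here is a minimal retry sketch. It is not Cromwell's implementation: the `TransientError` class, the attempt count, and the exponential backoff delays are all assumptions chosen for illustration.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for errors that are worth retrying."""


def run_with_retries(operation, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation(), retrying on TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error.
            sleep(base_delay * 2 ** (attempt - 1))


# Usage: an operation that fails twice before succeeding.
attempts = []


def flaky_call():
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientError("authentication failed, try again")
    return "ok"


recorded_delays = []
result = run_with_retries(flaky_call, sleep=recorded_delays.append)
print(result, recorded_delays)
```

Passing `sleep` as a parameter keeps the example fast to run: the demo records the delays instead of actually waiting.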
You can configure the number of I/O queries that Cromwell makes in the config file. This is mostly useful as a performance tuning option for the Pipelines API backend.
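Conceptually, capping the number of I/O queries is a throttle: allow at most N requests per time window. The sketch below shows that idea with a simple sliding window. The class name, parameters, and algorithm are assumptions for illustration; the real option names live in the Cromwell config file and the Changelog, and Cromwell's internals may work differently.

```python
import time
from collections import deque


class RequestThrottle:
    """Allow at most max_requests per per_seconds window (illustrative only)."""

    def __init__(self, max_requests, per_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.per_seconds = per_seconds
        self.clock = clock
        self.sent = deque()  # Timestamps of requests inside the current window.

    def try_acquire(self):
        """Return True and record the request if it fits in the window."""
        now = self.clock()
        # Forget requests that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.per_seconds:
            self.sent.popleft()
        if len(self.sent) < self.max_requests:
            self.sent.append(now)
            return True
        return False


# Usage with a fake clock so the example runs instantly.
fake_now = [0.0]
throttle = RequestThrottle(max_requests=2, per_seconds=10, clock=lambda: fake_now[0])
print(throttle.try_acquire(), throttle.try_acquire(), throttle.try_acquire())
```

Lowering the limit trades throughput for fewer quota errors from the storage service, which is why this is a tuning knob rather than a fixed value.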
To promote reusing the same code, we packaged up our Docker hashes for the FireCloud team so they can fully enable Call Caching (aka Job Avoidance). Go team!
But it won't because there is new WDL syntax that supports if ... then ... else expressions. For example,
    Boolean morning = ...
    String greeting = "good " + if morning then "morning" else "afternoon"
It will take some additional time to upgrade to Cromwell 26 from Cromwell 25 (or a previous version of Cromwell). If you're curious about roughly how long, see the note in the Changelog.
We upgraded the Lenthall and WDL4S repos, though we are still publishing Scala 2.11 artifacts for each.
See the Changelog for more details.
TL;DR: In Cromwell 25, we added a backend named TES to Cromwell’s portfolio, promoting the GA4GH vision of interoperability between genomic analysis tools. We exercise the TES backend using Funnel, a neat piece of software coming out of Kyle Ellrott’s group at OHSU that allows us to dispatch jobs to a variety of platforms using the same API.
The Global Alliance for Genomics & Health (aka GA4GH) is an international coalition formed to enable the sharing of genomic and clinical data in order to help unlock potential advancements in medicine and science. For the most part, the GA4GH provides APIs that are implemented by frameworks and tools throughout our field. With these standardized APIs, analysts and software developers can take advantage of a much broader and richer ecosystem of tools than they previously could. But more on this later. The take-home message is that the scientific community as a whole can now spend more time working towards bettering humanity instead of just gluing tools together.
Within the GA4GH is the Containers & Workflows (CWF) working group, which focuses on providing APIs that define generic schemes for identifying tools, submitting workflows, and submitting jobs to compute platforms (disclosure: I happen to co-chair this group). At my day job as one of the developers of Cromwell, I’m interested in how we can implement these APIs to promote interoperability of different computing platforms in the bioinformatics space.
Well, no, there isn't a Cromwell/WDL musical (yet) -- it's just that I gave a talk last night in New York, at a meetup organized by Phosphorus and hosted by FirstMark. We had a great crowd, and I have to give them mad props for listening to me go on about GATK workflows and pipelining strategies for well over the allotted hour. Especially considering I've been getting over a bout of laryngitis and my voice kept oscillating between high-pitched whine and raspy whisper... There will be a video posted on YouTube in the next few days and it would be awesome if they could get someone to do a voice dub! In the meantime, my slide deck is available here.
UPDATE: And here's the video on YouTube.
Happy 25th birthday, Cromwell! You're a quarter of a century old!
Using Docker Hashes, call caching will soon be available for FireCloud. Call caching is already available through Cromwell directly, and by default it is disabled.
For users who are concerned about running repeatable WDLs, we recommend avoiding floating tags, such as "ubuntu:latest". When you use floating tags or expressions, those jobs will not be call cached when using Cromwell directly. The reasoning behind this decision is that if the value behind the tag changes, such as the latest version of Ubuntu, a cached result could differ from what a fresh run would produce. FireCloud will still use call caching when there are floating tags in WDLs.
Note that currently Cromwell doesn't track changes to the output files (there's a feature request), so you could get a call cache hit to a file that has been modified, rather than the originally cached output. To avoid this, make sure to rename the file if you change it.
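To follow the floating-tag advice above, you may want to check your Docker image references before running. The helper below is a rough sketch that treats only digest-pinned references (`name@sha256:<hex>`) as fixed. That rule is an assumption made for this example; Cromwell's own definition of a floating tag may differ.

```python
HEX_DIGITS = set("0123456789abcdef")


def is_pinned_by_digest(image):
    """True when the reference names an immutable digest, e.g. name@sha256:<hex>."""
    _, at_sign, digest = image.partition("@")
    if not at_sign:
        return False  # No digest part, so this is a bare name or a tag.
    algorithm, colon, hex_part = digest.partition(":")
    if not (algorithm and colon and hex_part):
        return False
    return all(char in HEX_DIGITS for char in hex_part)


# Usage: the second reference uses a made-up digest purely for illustration.
for reference in ["ubuntu:latest", "ubuntu@sha256:" + "ab" * 32]:
    print(reference, is_pinned_by_digest(reference))
```

A digest identifies exact image contents, so a reference written this way cannot silently change between runs the way "ubuntu:latest" can.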
Every call in Cromwell is now labeled by default so you can query them. See the README for how to apply custom labels.
MySQL users can use batched inputs by adding "rewriteBatchedStatements=true" to the JDBC URL.
Members of the Cromwell community added support for TES (Task Execution Schema), thanks! Stay tuned to the WDL blog for a post this week with more information about TES.
WDL-away your day with us on the WDL blog! These blog posts will be geared towards developers who are looking for the nitty-gritty-codey details of WDL and Cromwell, such as new backends, parameters, etc. We'll post about new WDL documents and Cromwell releases, as well as other technical topics.
And don't forget to follow us on Twitter!
Cromwell version 0.21 was released on 9/23/2016. You can download the release here. See below for the release notes.