July 2018 Release Notes

Posted by KateN on 10 Jul 2018 (0)


July 31, 2018

  • FireCloud has upgraded to Cromwell 34. See the Cromwell release notes for versions 33 and 34 for the new features now available to you in FireCloud.
    • WDL 1.0 support, requester pays, and PAPI v2 are not yet available in FireCloud. Work is underway to implement these specific components.
  • Notebooks clusters should no longer auto-pause when a Jupyter notebook or a terminal is open in a user's browser.
  • Leo now contains an optional user script that can be used to install GATK on clusters.
  • The notebooks Docker image now includes the tidyverse R package by default.
  • Leo now contains an optional user script that allows downloading notebooks as PDF files. Note that this script significantly increases the size of the notebooks image, which slows down cluster creation, so use it with caution. A follow-on change will disallow downloading as PDF if this script is not used.
  • FireCloud no longer tracks user behaviors in Google Analytics.
  • Minor performance improvement from reducing unnecessary AJAX requests when switching tabs inside a workspace.
  • Performance improvement via caching of WDL validation. Users may notice improved performance when viewing a method configuration inside a workspace.




Cromwell version 32 was released last Thursday evening, June 7th, and if you saw the release notes, you may have wondered what we meant by "File read limits." For those who didn't see the notes, we explained that this feature improves the stability of Cromwell, and therefore of FireCloud, but we didn't go into much detail. In this post, I'll explain how it helps by describing the problem, the solution, and the potential impacts.

Problem

FireCloud can slow down or stop working completely if a user passes a large file to the WDL read_lines() function. read_lines() is frequently used to ingest a list of filenames, genomic intervals, and similar items for scattering. Users have accidentally pointed it at a BAM file, which fills Cromwell's memory and slows down the engine. This vulnerability means that a single user can take down the service.

Solution

File read limits are a Cromwell configuration option that caps how much data a WDL read_lines() call can read. By setting a modest limit, we safeguard the system from being taken down by a single user. FireCloud's Cromwell configuration now uses a file read limit of 1 MB. The other read_X functions also have limits, which can be found here.

Potential Impacts

If you see an error message like this, you've been impacted: "File is larger than 1000000 Bytes…" We've listed a few workarounds in our Solutions to Problems section.
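If you generate interval or filename lists yourself, it can help to check their size before handing them to read_lines(). The sketch below is a hypothetical illustration, not Cromwell's implementation: it assumes the cap is exactly 1,000,000 bytes (matching the error text above) and that a file at or below that size is accepted. The helper names fits_read_lines_limit and read_lines_with_limit are made up for this example.

```python
import os
import tempfile
from typing import List

# Assumption: FireCloud's 1 MB cap equals 1,000,000 bytes, as the error text suggests.
READ_LINES_LIMIT_BYTES = 1_000_000


def fits_read_lines_limit(path: str, limit: int = READ_LINES_LIMIT_BYTES) -> bool:
    """Return True if the file at `path` is no larger than `limit` bytes."""
    return os.path.getsize(path) <= limit


def read_lines_with_limit(path: str, limit: int = READ_LINES_LIMIT_BYTES) -> List[str]:
    """Hypothetical stand-in for a size-capped read_lines(): check the size first,
    refuse oversized files, otherwise return the lines without trailing newlines."""
    size = os.path.getsize(path)
    if size > limit:
        raise ValueError(f"File is larger than {limit} Bytes (actual size: {size} bytes)")
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read().splitlines()


# Usage: write a small interval list, read it back, then force a tiny limit
# to show the failure path. The temporary file is removed afterwards.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("chr1:1-1000000\nchr2:1-1000000\n")
try:
    fits = fits_read_lines_limit(tmp.name)
    lines = read_lines_with_limit(tmp.name)
    try:
        read_lines_with_limit(tmp.name, limit=10)
        rejected = False
    except ValueError as err:
        rejected = True
        print(err)
finally:
    os.remove(tmp.name)

print(fits, lines)
```

Checking the size up front, rather than reading and then measuring, mirrors the point of the limit: an oversized file such as a BAM is rejected before anything is loaded into memory.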

Help

If you find yourself blocked by these changes, please don't hesitate to reach out for help. We want to ensure that the benefit of making the system more stable outweighs any individual disruptions.



Release Notes: June 2018

Posted by KateN on 7 Jun 2018 (0)


June 26, 2018

  • With a single click, FireCloud can now populate the outputs in your method config with reasonable default attribute names, so that you don't have to. The button is right next to the Outputs section of the config in your workspace.
  • You can now choose "Copy Link Address" for the metadata files ("Download 'x' metadata") in the Data tab. This will increase the speed of downloading these files when using the command line.

June 21, 2018

  • UX improvements related to call caching, submission and workflow monitoring:
    • Call caching status is now displayed at the submission level and has been removed from the workflow level.
    • Call caching status now accurately reflects the value supplied by the user at submission time. Previously, the call caching value could falsely show as disabled for certain workflows.
    • Hovering over a submission's status column in the Monitor tab now shows the counts of that submission's workflows, grouped by workflow status.
    • When viewing an individual workflow, that workflow's status now shows as both icon and text. Previously it only had an icon.
    • When viewing an individual workflow, that workflow's unexpanded calls now show their status.
    • When a workflow or a call does not have stdout or stderr logs, the stdout/stderr fields are now hidden. Previously the fields were displayed with a blank value, taking up screen real estate.
  • Updated the swagger-ui response models for the Monitor submission status and Retrieve workflow cost endpoints.
  • Fixed intermittent errors after restarting a cluster and opening a notebook.
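The Monitor tab now shows a submission's workflow counts grouped by status. If you fetch submission status programmatically instead, a similar tally is easy to compute client-side. This is a minimal sketch under an assumption: that the response has been parsed into a dict with a "workflows" list whose entries carry a "status" string. Check that shape against the swagger-ui response model before relying on it; the function name is hypothetical.

```python
from collections import Counter
from typing import Dict


def count_workflows_by_status(submission: dict) -> Dict[str, int]:
    """Group a submission's workflows by status and count them.

    Assumes (hypothetically) a parsed JSON object with a "workflows" list,
    each entry having a "status" field; missing statuses count as "Unknown".
    """
    statuses = (wf.get("status", "Unknown") for wf in submission.get("workflows", []))
    return dict(Counter(statuses))


example = {
    "workflows": [
        {"status": "Succeeded"},
        {"status": "Failed"},
        {"status": "Succeeded"},
        {"status": "Running"},
    ]
}
print(count_workflows_by_status(example))  # {'Succeeded': 2, 'Failed': 1, 'Running': 1}
```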




By Eric Weitz, software engineer, Data Sciences Platform at the Broad Institute

Have you heard of the FAIR principles? They are a set of guidelines proposed as part of a growing movement to make data more Findable, Accessible, Interoperable, and Reusable. As this movement gains traction, we are seeing more FAIR-related activities at major meetings and conferences. For example, the recent Bio-IT World meeting in Boston included a conference track dedicated to FAIR, as well as a hackathon.

I was part of a team of four people from the Broad Institute's Data Sciences Platform that participated in the Bio-IT hackathon. Our goal: make data more FAIR in Single Cell Portal, which is built on top of FireCloud. In addition to improving the Single Cell Portal’s scientific data management, the hackathon also gave our team a chance to work with developers from other organizations in a manner that was uniquely nimble.





We are excited to introduce a new Featured workspace that demonstrates the GenoMetric Query Language (GMQL) created by a team from Politecnico di Milano in Italy. For some context on Featured workspaces, please read our previous blog post.

GMQL is a high-level, declarative language supporting queries over thousands of heterogeneous datasets and samples; as such, it enables genomic "big data" analysis. Built on the Hadoop framework and the Apache Spark platform, GMQL is designed to be highly scalable, flexible, and simple to use. You can try the system here through its several interfaces, which come with documentation and biological query examples on ENCODE, TCGA, and other public datasets, or you can clone the Featured workspace and launch an example analysis.

The GMQL 101 workspace features three methods, each with an increasing level of complexity, to give you a taste of how the query language works. The first method joins two datasets and then extracts a third dataset based on a specific condition: pairs of regions that are less than 1000 bases apart. The second method takes a VCF and performs an epigenomic analysis using gene annotations and ChIP-seq results. It shows how you can select high-confidence regions, use RefSeq annotations to find regions that overlap a gene, and count the mutations falling within the high-confidence regions. Finally, the third method combines GATK4's Mutect2 pipeline with the second method, showing an epigenomic analysis from start (calling somatic variants) to finish (annotating variants). For any GMQL-specific questions or problems, you can visit the GMQL GitHub page.
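To make the two region operations in these methods concrete, here is a plain-Python sketch of the underlying ideas: a distance join that keeps region pairs less than 1000 bases apart, and a count of point mutations falling inside each region. This is not GMQL syntax and not how GMQL executes (GMQL is declarative and runs on Spark); it is a naive nested-loop illustration that assumes 0-based, half-open coordinates. All names here are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class Region(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # exclusive


def distance(a: Region, b: Region) -> float:
    """Gap in bases between two regions: 0 if they overlap or touch,
    infinity if they are on different chromosomes."""
    if a.chrom != b.chrom:
        return float("inf")
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def pairs_within(left: List[Region], right: List[Region],
                 max_gap: int = 1000) -> List[Tuple[Region, Region]]:
    """Naive O(n*m) distance join: all (l, r) pairs less than max_gap bases apart."""
    return [(l, r) for l in left for r in right if distance(l, r) < max_gap]


def count_mutations_in_regions(regions: List[Region],
                               mutations: List[Tuple[str, int]]) -> Dict[Region, int]:
    """Count point mutations (chrom, position) that fall inside each region."""
    return {
        reg: sum(1 for chrom, pos in mutations
                 if chrom == reg.chrom and reg.start <= pos < reg.end)
        for reg in regions
    }


peaks = [Region("chr1", 100, 200), Region("chr1", 5000, 5100)]
genes = [Region("chr1", 900, 1500), Region("chr2", 100, 300)]
close = pairs_within(peaks, genes)  # only the first peak is within 1000 bases of a gene

mutations = [("chr1", 150), ("chr1", 199), ("chr1", 200), ("chr1", 5050), ("chr2", 150)]
counts = count_mutations_in_regions(peaks, mutations)
print(close)
print(counts)
```

A real engine would sort or index regions by chromosome and position rather than comparing every pair; the nested loops are kept only for readability.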

Many thanks to Luca Nanni, Arif Canakoglu, Pietro Pinoli, and Stefano Ceri for putting together this workspace. It takes a lot of thought and effort to create a valuable learning resource like this, and we are still figuring out the most successful way to do this. Please share your thoughts in the Comments section below on the effectiveness of this workspace and any other Featured workspaces you try out. If you are interested in featuring examples of your methods in this way, please tell us here, and we can talk to you about the process.




More and more method developers are using the Method Repository to make their pipelines publicly accessible to the FireCloud community. Making methods public lets other researchers use them instead of building similar methods of their own. However, providing a method on its own, without a configuration or documentation, limits its reusability. This post is about how Featured workspaces solved this problem for GATK4 methods and how an outside group will contribute the first third-party Featured workspace, demonstrating that any developer can do this.

Featured workspaces hold the latest version of a method, configured to work out of the box on an accompanying example dataset. This means you can launch the method without doing any setup, such as finding data or configuring pipelines. You can see the required inputs and configuration settings clearly, and once the method is launched, you can check out all the outputs it produces. When you are ready to run it on your own data, all you need to do is replace the example dataset with your own, following the guidance in the docs. This takes the guesswork out of configuring a method for your own dataset. Altogether, these workspaces should make it easy to reuse methods.

We originally developed this “packaging” for methods with a group of GATK method developers we work closely with, to help people learn and test the most up-to-date GATK4 pipelines. These Featured workspaces went live once many of the GATK4 tools left beta status around January 2018 (GATK4 launch). People are interested in other pipelines besides GATK, and tomorrow we will announce a new Featured workspace put together by a team from Politecnico di Milano showcasing a different tool. Stay tuned!

Interested in putting together a workspace like this and having it featured? Let us know in this sign-up survey. We can walk you through the process we just went through with our friends from Politecnico di Milano.




We are planning system upgrades to address recent stability issues that have affected the reliability of service in FireCloud. The upgrade process will cause a temporary interruption of service, which we estimate may last up to 30 minutes. We do not yet have a specific time to announce, but we expect it to be in the afternoon or evening (EST) of Sunday, May 27. We will post an update here when we can narrow down the window more precisely. Thank you for your patience while we work to improve the quality of service in FireCloud.




UPDATE: The issue described below has been resolved.

Due to an individual user's submission that amounts to a very large number of jobs (~60k), all new workflow submissions are currently being held in the queue (with status QueuedInCromwell). To be clear, as far as we can tell this is NOT a FireCloud malfunction; it seems to be a Google Cloud limitation that we are encountering for the first time. We are working with GCP support and evaluating options to unblock the queue, hopefully without interrupting that one very ambitious and totally legitimate submission. We will strive to resume normal workflow throughput by Monday morning EST.

We understand that this is causing many of you considerable inconvenience, yet we are hopeful that this case will provide an opportunity to push back the current limitations to the next level. Please remember that what we are all doing here, together, is blazing a new trail; building a new model for how we do science at scale, collaboratively. The fact that these scaling problems are arising at all demonstrates that we are on the right path, that the research community needs this level of scalability. And we will do everything in our power to deliver it.

Thank you for your patience and stay tuned for updates.



FireCloud DataShuttle Release

Posted by KateN on 17 May 2018 (2)


Alpha version 0.1.1

Release Overview

Broad Institute's Genomics Platform and Data Sciences Platform announce the general availability of FireCloud DataShuttle 0.1.1. The FireCloud DataShuttle lets users easily browse files, upload and download data directly between FireCloud workspaces or Google buckets and their local drives, and monitor the status of these transfers.

The FireCloud DataShuttle was developed to facilitate the work of researchers and project managers who transfer a high volume of files and desire a more efficient and clearer process.




Release Notes: May 2018

Posted by KateN on 1 May 2018 (0)


May 31, 2018

  • You can now include spaces in workspace names.
  • Actual cloud costs, when available, are now displayed in the details page for individual workflows. Note: These costs are currently only available for Broad-based billing accounts.
  • Fixed an issue where the links to open a GCS bucket were incorrect for certain subworkflows.




