Release Notes: August 2018

Posted by KateN on 9 Aug 2018

August 21, 2018

  • Improved performance and stability in the uncommon case of reading from an entity data model that has extremely wide or long attributes.
  • Improved the UX for the NPS survey for new users. Users who have already responded to the survey will not see the survey again.
  • Squashed a rarely occurring bug that caused errors when viewing individual workflows if those workflows had no immediate calls/tasks.
  • Added the ability to set a Google client ID at cluster creation time. If provided, it allows notebook auth refresh to kick off without cross-tab communication to the notebook.
  • Added a v2 version of the createCluster endpoint, documented here: !/cluster/createClusterV2. The new endpoint is faster than the previous version and is not subject to the occasional race conditions that occurred when clusters with the same name were created in quick succession.

By Moran Cabili, product manager, Data Sciences Platform at the Broad Institute

We heard from many of you, both new FireCloud users and experienced WDL pipeline developers, that you need a quick way to test whether a WDL workflow runs successfully on FireCloud. Until now, you had to reference an entity in the workspace data model, which took extra effort and tended to confuse newcomers. We are happy to announce that this speed bump is gone: you can now bypass the data model and even upload a JSON file of inputs to get your WDL up and running in record time.
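For readers who have not written an inputs JSON before: Cromwell-style inputs files map fully qualified input names (the workflow name, a dot, then the input name) to values. The sketch below builds such a file in Python for a hypothetical workflow called `HelloWorld` with hypothetical inputs `name` and `repeat`; substitute the names declared in your own WDL. It only illustrates the file format and is not a FireCloud tool.

```python
import json

# Hypothetical workflow name; replace it with the workflow name in your WDL.
WORKFLOW_NAME = "HelloWorld"


def build_inputs(workflow: str, values: dict) -> dict:
    """Prefix each input name with the workflow name, as Cromwell-style
    inputs JSON files expect (e.g. "HelloWorld.name")."""
    return {f"{workflow}.{key}": value for key, value in values.items()}


# Hypothetical inputs; use the inputs your workflow actually declares.
inputs = build_inputs(WORKFLOW_NAME, {"name": "FireCloud", "repeat": 3})

# Save this text as a .json file to upload with your method configuration.
inputs_json = json.dumps(inputs, indent=2)
print(inputs_json)
```

Inputs that belong to a specific call are qualified one level deeper (workflow, then call, then input name), so the same helper can be reused by passing keys such as `"my_task.threads"`.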

Today we bring to you a new facet of our forum: the feature request section!

Within this new section, you will be able to suggest new features, upvote existing ones, and see the status of features we implement. We can better gauge the number of people interested in a particular feature by the vote count, which helps to determine which features we work on first. So, if you represent a group of people who all care about a certain feature, ask everyone to vote on your feature request!

Features can be requested in this category of the forum. Simply click the blue New Idea button to start your thread. Be as clear as possible in your title so other users will be able to see what your feature is about and will be more likely to vote on it. If you see a feature you like, please upvote it so we know you'd like to see it implemented.

We've pre-populated this section with all the feature requests that have been posted in the last two months. You can search for your own feature requests by looking at discussions you've started. If we missed yours, or if there's a feature from further back than May, you can ask us to move it by tagging @KateN in the thread. Or, simply create a new request.

Take a peek at what's been posted and vote for what you want to see!

Release Notes: July 2018

Posted by KateN on 10 Jul 2018

July 31, 2018

  • FireCloud has upgraded to Cromwell 34. See the Cromwell release notes for versions 33 and 34 for the new features now available to you here.
    • WDL 1.0 support, requester pays, and PAPI v2 are not yet available in FireCloud. Work is underway to implement these specific components.
  • Notebook clusters should no longer auto-pause when a Jupyter notebook or a terminal is open in a user's browser.
  • Leo now contains an optional user script that can be used to install GATK on clusters.
  • The notebooks docker image now includes the tidyverse R package by default.
  • Leo now contains an optional user script that allows downloading notebooks as PDF files. Note that this script significantly increases the size of the notebooks image, which slows down cluster creation, so use it with caution. A follow-on change will disallow downloading as PDF if this script is not used.
  • FireCloud no longer tracks user behaviors in Google Analytics.
  • Minor performance improvement from reducing unnecessary AJAX requests when switching tabs inside a workspace.
  • Performance improvement via caching of WDL validation. Users may notice improved performance when viewing a method configuration inside a workspace.

Cromwell version 32 was released last Thursday evening, June 7, and if you saw the release notes, you may have wondered what we meant by “File read limits.” For those who didn’t see the notes: we explained that this feature improves the stability of Cromwell, and therefore of FireCloud, but we didn’t go into much detail. In this post, I’ll explain how it improves stability by walking through the problem, the solution, and the potential impacts.

Problem: FireCloud will slow down or stop working entirely if a user passes a large file to the WDL read_lines() function. read_lines() is frequently used to ingest a list of filenames, genomic intervals, and similar items for scattering. Users have accidentally pointed it at a BAM file, which loads the whole file into Cromwell’s memory and slows down the engine. This vulnerability means that a single user can take down the service for everyone.

Solution: File read limits are a Cromwell configuration option that caps the size of a file that can be read through a WDL read_lines() call. Capping the file size safeguards the system from being taken down by a single user. FireCloud’s Cromwell configuration now uses a file read limit of 1 MB. Limits also apply to the other read_X functions and can be found here.
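To make the behavior concrete, here is a minimal Python sketch of a size-capped read_lines(). It is not Cromwell's implementation (Cromwell enforces the limit through its own configuration); the names `read_lines_limited` and `FileTooLargeError`, and the check-the-size-before-reading approach, are assumptions made for illustration. It uses the 1 MB (1,000,000 byte) limit described above and raises an error whose message is modeled on the one users see.

```python
import os
import tempfile

# Read limit in bytes: 1 MB = 1,000,000 bytes, the figure used in FireCloud.
READ_LIMIT_BYTES = 1_000_000


class FileTooLargeError(Exception):
    """Hypothetical error raised when a file exceeds the read limit."""


def read_lines_limited(path: str, limit: int = READ_LIMIT_BYTES) -> list[str]:
    """Return the lines of `path`, refusing files larger than `limit` bytes.

    The size is checked *before* reading, so an oversized input (for example,
    a BAM passed where a list of intervals was expected) is never loaded
    into memory.
    """
    size = os.path.getsize(path)
    if size > limit:
        raise FileTooLargeError(
            f"File is larger than {limit} Bytes (actual size: {size})"
        )
    with open(path, "r", encoding="utf-8") as fh:
        return fh.read().splitlines()


# Demonstration with a small interval list and an oversized file.
with tempfile.TemporaryDirectory() as tmp:
    small = os.path.join(tmp, "intervals.list")
    with open(small, "w", encoding="utf-8") as fh:
        fh.write("chr1:1-1000\nchr2:1-1000\n")
    print(read_lines_limited(small))  # ['chr1:1-1000', 'chr2:1-1000']

    big = os.path.join(tmp, "not_a_list.bam")
    with open(big, "wb") as fh:
        fh.write(b"\0" * (READ_LIMIT_BYTES + 1))
    try:
        read_lines_limited(big)
    except FileTooLargeError as err:
        print(err)  # File is larger than 1000000 Bytes (actual size: 1000001)
```

Failing fast on the file size, rather than reading and then counting, is what keeps one oversized input from exhausting the shared engine's memory.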

Potential impacts: If you see an error message like this, you've been affected: “File is larger than 1000000 Bytes…” We've listed a few workarounds in our Solutions to Problems section.

Help: If you find yourself blocked by these changes, please don't hesitate to reach out for help. We want to make sure that the benefit of a more stable system outweighs any individual disruptions.

Release Notes: June 2018

Posted by KateN on 7 Jun 2018

June 26, 2018

  • With a single click, FireCloud can now populate the outputs in your method config with reasonable default attribute names, so you don't have to. The button is right next to the Outputs section of the method config in your workspace.
  • You can now choose "Copy Link Address" for the metadata files ("Download 'x' metadata") in the Data tab. This makes downloading these files from the command line faster.

June 21, 2018

  • UX improvements related to call caching, submission and workflow monitoring:
    • Call caching status is now displayed at the submission level and has been removed from the workflow level.
    • Call caching status now accurately reflects the value supplied by the user at submission time. Previously, the call caching value could falsely show as disabled for certain workflows.
    • Hovering over a submission's status column in the Monitor tab now shows the counts of that submission's workflows, grouped by workflow status.
    • When viewing an individual workflow, that workflow's status now shows as both icon and text. Previously it only had an icon.
    • When viewing an individual workflow, that workflow's unexpanded calls now show their status.
    • When a workflow or a call does not have stdout or stderr logs, the stdout/stderr fields are now hidden. Previously the fields were displayed with a blank value, taking up screen real estate.
  • Updated the swagger-ui response models for the Monitor submission status and Retrieve workflow cost endpoints.
  • Fixed intermittent errors after restarting a cluster and opening a notebook.

By Eric Weitz, software engineer, Data Sciences Platform at the Broad Institute

Have you heard of the FAIR principles? They are a set of guidelines proposed as part of a growing movement to make data more Findable, Accessible, Interoperable, and Reusable. As this movement gains traction, we are seeing more FAIR-related activities at major meetings and conferences. For example, the recent Bio-IT World meeting in Boston included a conference track dedicated to FAIR, as well as a hackathon.

I was part of a team of four people from the Broad Institute's Data Sciences Platform that participated in the Bio-IT hackathon. Our goal: make data more FAIR in Single Cell Portal, which is built on top of FireCloud. In addition to improving the Single Cell Portal’s scientific data management, the hackathon also gave our team a chance to work with developers from other organizations in a manner that was uniquely nimble.

We are excited to introduce a new Featured workspace that demonstrates the GenoMetric Query Language (GMQL) created by a team from Politecnico di Milano in Italy. For some context on Featured workspaces, please read our previous blog post.

GMQL is a high-level, declarative language that supports queries over thousands of heterogeneous datasets and samples; as such, it enables genomic “big data” analysis. Built on the Apache Hadoop framework and the Apache Spark platform, GMQL is designed to be highly scalable, flexible, and simple to use. You can try the system here through its several interfaces, which come with documentation and biological query examples on ENCODE, TCGA, and other public datasets, or you can clone the Featured workspace and launch an example analysis.

The GMQL 101 workspace features three methods of increasing complexity to give you a taste of how the query language works. The first method joins two datasets and then extracts a third dataset based on a specific condition: pairs of regions that are less than 1000 bases apart. The second method takes a VCF and performs an epigenomic analysis using gene annotations and ChIP-seq results. It shows how you can select high-confidence regions, use RefSeq annotations to find regions that overlap a gene, and count the mutations falling within the high-confidence regions. Finally, the third method combines GATK4’s Mutect2 pipeline with the second method, showing an epigenomic analysis from start (calling somatic variants) to finish (annotating variants). For any GMQL-specific questions or problems, you can visit the GMQL GitHub page.
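The distance condition in the first method, and the mutation counting in the second, can be illustrated with a plain-Python sketch. This is not GMQL syntax or GMQL's Spark-based engine; the `Region` type, the 0-based half-open coordinates, and the gap-based distance (zero for regions that overlap or touch) are assumptions chosen for illustration, and the example regions and variants are made up. A naive double loop keeps the logic visible; a real engine would sort or bin regions instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A genomic region on one chromosome, 0-based, half-open [start, end)."""
    chrom: str
    start: int
    end: int


def gap(a: Region, b: Region) -> int:
    """Bases separating two regions on the same chromosome (0 if they overlap or touch)."""
    return max(0, b.start - a.end, a.start - b.end)


def pairs_within(ref: list[Region], exp: list[Region],
                 max_gap: int = 1000) -> list[tuple[Region, Region]]:
    """Return (ref, exp) pairs on the same chromosome less than max_gap bases apart."""
    return [
        (r, e)
        for r in ref
        for e in exp
        if r.chrom == e.chrom and gap(r, e) < max_gap
    ]


def count_in_regions(positions: list[tuple[str, int]], regions: list[Region]) -> int:
    """Count (chrom, pos) variants that fall inside at least one region."""
    return sum(
        any(r.chrom == chrom and r.start <= pos < r.end for r in regions)
        for chrom, pos in positions
    )


# Made-up example data for illustration only.
genes = [Region("chr1", 10_000, 12_000), Region("chr2", 5_000, 6_000)]
peaks = [Region("chr1", 12_500, 12_800), Region("chr1", 20_000, 20_500),
         Region("chr2", 6_000, 6_100)]

for g, p in pairs_within(genes, peaks):
    print(g, p)  # two pairs: 500 bases apart on chr1, touching on chr2

variants = [("chr1", 10_500), ("chr1", 15_000), ("chr2", 5_999)]
print(count_in_regions(variants, genes))  # 2
```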

Many thanks to Luca Nanni, Arif Canakoglu, Pietro Pinoli, and Stefano Ceri for putting together this workspace. It takes a lot of thought and effort to create a valuable learning resource like this, and we are still figuring out the most successful way to do this. Please share your thoughts in the Comments section below on the effectiveness of this workspace and any other Featured workspaces you try out. If you are interested in featuring examples of your methods in this way, please tell us here, and we can talk to you about the process.

More and more method developers are using the Method Repository to make their pipelines publicly accessible to the FireCloud community. Making methods public lets other researchers use them instead of building similar methods of their own. However, providing a method on its own, without a configuration or documentation, limits its reusability. This post is about how Featured workspaces solved this problem for GATK4 methods and how an outside group will contribute the first third-party Featured workspace, demonstrating that any developer can do this.

Featured workspaces hold the latest version of a method, configured to work out of the box on an accompanying example dataset. This means you can launch the method without any setup, such as finding data or configuring pipelines. You can see the required inputs and configuration settings clearly and, once the method is launched, examine all the outputs it produces. When you are ready to run it on your own data, all you need to do is replace the example dataset with your own, following the guidance in the docs. This takes the guesswork out of configuring a method for your own dataset. Altogether, these workspaces should make it easy to reuse methods.

We originally developed this “packaging” for methods with a group of GATK method developers we work closely with, to help people learn and test the most up-to-date GATK4 pipelines. These Featured workspaces went live once many of the GATK4 tools left beta status around January 2018 (GATK4 launch). People are also interested in pipelines beyond GATK, and tomorrow we will announce a new Featured workspace put together by a team from Politecnico di Milano showcasing a different tool. Stay tuned!

Interested in putting together a workspace like this and having it featured? Let us know in this sign-up survey. We can walk you through the process we just went through with our friends from Politecnico di Milano.

We are planning some system upgrades to address recent stability issues that have affected the reliability of service in FireCloud. The upgrade process will cause a temporary interruption of service; we estimate the interruption may last up to 30 minutes. We do not yet have a specific time to announce; we expect it will be in the afternoon or evening (EST) of Sunday, May 27. We will post an update here when we are able to narrow down the window of time more precisely. Thank you for your patience while we work to improve the quality of service in FireCloud.
