Latest posts

Notebooks get a facelift

Posted by rtitle on 12 Sep 2018

By Robert Title, Engineering Manager, Data Sciences Platform at the Broad Institute

We are excited to announce that we have substantially improved the way you interact with Jupyter Notebooks in FireCloud. We hope this will increase your productivity and empower you to collaborate more effectively. These changes are publicly available as of today, Sep 12, 2018, and can be accessed in the Notebooks tab of your FireCloud workspace. Read on for a more detailed description of what is changing, and why it is better.


Release Notes: September 2018

Posted by KateN on 12 Sep 2018

September 25, 2018

  • Fixed an infinite spinner that appeared when previewing a DOS object that resolved to a text or log file.
  • Fixed an error caused by populating a method configuration with a JSON file larger than 4 KB.
  • Downloading a file larger than 2 GB through the FireCloud UI in your browser previously failed. These downloads now work, though we recommend using gsutil rather than your browser for large downloads.
  • Cluster creation errors no longer cause clusters to appear stuck in Creating status in the UI.
  • The cluster creation dialog now includes static text telling users how long cluster creation is expected to take.
  • Users should no longer see 404 errors when opening a notebook from FireCloud.
  • FireCloud now displays appropriate error messages when a user does not have permission to modify notebooks.
  • Clusters are now automatically paused after 30 minutes of idle time.
  • Leonardo now better handles cases where clusters or projects are deleted in Google.
  • Leonardo is now more resilient to timeouts from Google Container Registry (GCR) when pulling the notebook image.


Release Notes: August 2018

Posted by KateN on 9 Aug 2018

August 21, 2018

  • Improved performance and stability when reading from the entity data model in the uncommon case where the data model has extremely wide or long attributes.
  • Improved the UX of the NPS survey for new users. Users who have already responded to the survey will not see it again.
  • Fixed a rare bug that caused errors when viewing individual workflows that had no immediate calls or tasks.
  • Added the ability to set a Google client ID at cluster creation time. If provided, it allows notebook auth refresh to start without cross-tab communication with the notebook.
  • Added createClusterV2, a new version of the createCluster endpoint, documented here: !/cluster/createClusterV2. The new endpoint is faster than the previous version and is not subject to occasional race conditions when clusters of the same name are created in quick succession.


By Moran Cabili, product manager, Data Sciences Platform at the Broad Institute

We heard from many of you, both new FireCloud users and experienced WDL pipeline developers, that you need a quick way to test whether a WDL workflow runs successfully on FireCloud. Until now, you had to reference an entity in the workspace data model, which took extra effort and tended to confuse newcomers. We are happy to announce that this speed bump is gone: you can now bypass the data model and even upload a JSON file of inputs to get your WDL up and running in record time.
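
If you prefer to generate that inputs file programmatically, a small script is enough. The sketch below assumes the Cromwell-style inputs format, in which each key is the workflow name and the input name joined by a dot; the workflow `HelloWorld` and its inputs `name` and `threads` are hypothetical, so substitute the names from your own WDL.

```python
import json


def prefixed_inputs(workflow_name, inputs):
    """Qualify each input name with the workflow name (e.g. "HelloWorld.name"),
    the naming this sketch assumes for Cromwell-style inputs JSON files."""
    return {f"{workflow_name}.{key}": value for key, value in inputs.items()}


def inputs_json(workflow_name, inputs):
    """Serialize the qualified inputs as pretty-printed JSON text."""
    return json.dumps(prefixed_inputs(workflow_name, inputs), indent=2, sort_keys=True)


if __name__ == "__main__":
    # "HelloWorld", "name", and "threads" are hypothetical names for illustration.
    print(inputs_json("HelloWorld", {"name": "FireCloud", "threads": 2}))
```

Save the printed output as a .json file and upload it in your method configuration. Because the function only adds the workflow-name prefix, task-level keys such as "say_hello.greeting" (also hypothetical) pass through the same way.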


Today we are introducing a new section of our forum: the feature request section!

Within this new section, you will be able to suggest new features, upvote existing ones, and see the status of features we implement. We can better gauge the number of people interested in a particular feature by the vote count, which helps to determine which features we work on first. So, if you represent a group of people who all care about a certain feature, ask everyone to vote on your feature request!

Features can be requested in this category of the forum. Simply click the blue New Idea button to start your thread. Be as clear as possible in your title so other users will be able to see what your feature is about and will be more likely to vote on it. If you see a feature you like, please upvote it so we know you'd like to see it implemented.

We've pre-populated this section with all the feature requests posted in the last two months. You can find your own feature requests by looking at the discussions you've started. If we missed yours, or if you'd like to revive a request from before May, tag @KateN in the thread and ask us to move it. Or simply create a new request.

Take a peek at what's been posted and vote for what you want to see!


July 2018 Release Notes

Posted by KateN on 10 Jul 2018

July 31, 2018

  • FireCloud has upgraded to Cromwell 34. See the release notes for versions 33 and 34 here for the new features now available to you.
    • WDL 1.0 support, requester pays, and PAPI v2 are not yet available in FireCloud. Work is underway to implement these specific components.
  • Notebooks clusters should no longer auto-pause when a Jupyter notebook or a terminal is open in a user's browser.
  • Leo now contains an optional user script that can be used to install GATK on clusters.
  • The notebooks docker image now includes the tidyverse R package by default.
  • Leo now contains an optional user script to allow downloading notebooks as PDF files. Note this significantly increases the size of the notebooks image which leads to slower cluster creation time, so use with caution. A follow-on change will be made to disallow downloading as PDF if this script is not used.
  • FireCloud no longer tracks user behaviors in Google Analytics.
  • Minor performance improvement from reducing unnecessary AJAX requests when switching tabs inside a workspace.
  • Performance improvement via caching of WDL validation. Users may notice improved performance when viewing a method configuration inside a workspace.


Cromwell version 32 was released last Thursday evening, June 7th, and if you saw the release notes, you may have wondered what we meant by “File read limits.” For those who didn’t see the notes, we explained that this feature improves the stability of Cromwell, and therefore of FireCloud, but we didn’t go into much detail. In this post, I’ll explain how it helps by walking through the problem, the solution, and the potential impacts.

Problem: FireCloud can slow down or stop working entirely if a user passes a large file to the WDL read_lines() function. read_lines() is frequently used to ingest a list of filenames, genomic intervals, and similar items for scattering. Users have accidentally pointed it at a BAM file, which fills up Cromwell’s memory and slows down the engine. This vulnerability means that a single user can take down the service.

Solution: File read limits are a Cromwell configuration option that caps the size of a file that can be read through a WDL read_lines() call. Setting this limit safeguards the system from being taken down by a single user. FireCloud’s Cromwell configuration now uses a file read limit of 1 MB. Similar limits apply to the other read_X functions and can be found here.

Potential impacts: If you see an error message like “File is larger than 1000000 Bytes…”, you've been impacted. We've listed a few workarounds in our Solutions to Problems section.
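
If you want to catch this before submitting, a quick pre-flight check of your list file's size can help. The Python sketch below is not part of Cromwell or FireCloud; it simply compares a size in bytes against the 1,000,000-byte figure from the error message, and all function names in it are hypothetical. Treating a file of exactly 1,000,000 bytes as allowed is an assumption based on the "larger than" wording.

```python
import os

# FireCloud's Cromwell configuration caps read_lines() input at 1 MB; the
# error message reports the limit as 1000000 bytes. Allowing a file of exactly
# this size is an assumption based on the "larger than" wording.
READ_LINES_LIMIT_BYTES = 1_000_000


def within_read_lines_limit(num_bytes, limit=READ_LINES_LIMIT_BYTES):
    """Return True if a file of `num_bytes` bytes should pass the size check."""
    return num_bytes <= limit


def local_file_fits(path, limit=READ_LINES_LIMIT_BYTES):
    """Check a file on local disk, e.g. before copying it to your bucket."""
    return within_read_lines_limit(os.path.getsize(path), limit)


def list_file_size(lines):
    """Bytes needed to write `lines` one per line in UTF-8 with newline endings."""
    return sum(len(line.encode("utf-8")) + 1 for line in lines)
```

For example, 20,000 paths of 60 characters each take 20,000 × 61 = 1,220,000 bytes once newlines are counted, which is over the limit.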

Help: If you find yourself blocked by these changes, please don't hesitate to reach out for help. We want to ensure that the benefit of a more stable system outweighs any individual disruption.


Release Notes: June 2018

Posted by KateN on 7 Jun 2018

June 26, 2018

  • With a single click, FireCloud can now populate the outputs in your method config with reasonable default attribute names, so that you don't have to. The button is right next to the Outputs section of the method config in your workspace.
  • You can now choose "Copy Link Address" for the metadata files ("Download 'x' metadata") in the Data tab. This will increase the speed of downloading these files when using the command line.

June 21, 2018

  • UX improvements related to call caching, submission and workflow monitoring:
    • Call caching status is now displayed at the submission level and has been removed from the workflow level.
    • Call caching status now accurately reflects the value supplied by the user at submission time. Previously, the call caching value could falsely show as disabled for certain workflows.
    • Hovering over a submission's status column in the Monitor tab now shows the counts of that submission's workflows, grouped by workflow status.
    • When viewing an individual workflow, that workflow's status now shows as both icon and text. Previously it only had an icon.
    • When viewing an individual workflow, that workflow's unexpanded calls now show their status.
    • When a workflow or a call does not have stdout or stderr logs, the stdout/stderr fields are now hidden. Previously the fields were displayed with a blank value, taking up screen real estate.
  • Updated the swagger-ui response models for the Monitor submission status and Retrieve workflow cost endpoints.
  • Fixed intermittent errors after restarting a cluster and opening a notebook.


By Eric Weitz, software engineer, Data Sciences Platform at the Broad Institute

Have you heard of the FAIR principles? They are a set of guidelines proposed as part of a growing movement to make data more Findable, Accessible, Interoperable, and Reusable. As this movement gains traction, we are seeing more FAIR-related activities at major meetings and conferences. For example, the recent Bio-IT World meeting in Boston included a conference track dedicated to FAIR, as well as a hackathon.

I was part of a team of four people from the Broad Institute's Data Sciences Platform that participated in the Bio-IT hackathon. Our goal: make data more FAIR in Single Cell Portal, which is built on top of FireCloud. In addition to improving the Single Cell Portal’s scientific data management, the hackathon also gave our team a chance to work with developers from other organizations in a manner that was uniquely nimble.


We are excited to introduce a new Featured workspace that demonstrates the GenoMetric Query Language (GMQL) created by a team from Politecnico di Milano in Italy. For some context on Featured workspaces, please read our previous blog post.

GMQL is a high-level, declarative language supporting queries over thousands of heterogeneous datasets and samples; as such, it enables genomic “big data” analysis. Built on the Hadoop framework and the Apache Spark platform, GMQL is designed to be highly scalable, flexible, and simple to use. You can try the system here through its several interfaces, which include documentation and biological query examples on ENCODE, TCGA, and other public datasets, or you can clone the Featured workspace and launch an example analysis.

The GMQL 101 workspace features three methods of increasing complexity to give you a taste of how the query language works. The first method joins two datasets and then extracts a third dataset based on a specific condition: pairs of regions that are less than 1000 bases apart. The second method takes a VCF and performs an epigenomic analysis using gene annotations and ChIP-seq results. It shows how you can select high-confidence regions, use RefSeq annotations to find regions that overlap a gene, and count the mutations falling within the high-confidence regions. Finally, the third method combines GATK4’s Mutect2 pipeline with the second method, showing an epigenomic analysis from start (calling somatic variants) to finish (annotating variants). For any GMQL-specific questions or problems, you can visit the GMQL GitHub page.
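
To make the first method's join condition concrete, here is a naive sketch in plain Python. It is not GMQL syntax and not how GMQL evaluates joins; it assumes regions are half-open intervals [start, end) and measures distance as the gap between two regions, with overlapping or touching regions at distance 0. The names `Region`, `distance`, and `distance_join` are hypothetical.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # assumed 0-based, inclusive
    end: int    # assumed exclusive (half-open interval)


def distance(a: Region, b: Region) -> int:
    """Gap in bases between two regions on the same chromosome; 0 if they overlap."""
    return max(0, b.start - a.end, a.start - b.end)


def distance_join(left, right, max_distance=1000):
    """Return every (left, right) pair on the same chromosome whose regions
    are less than `max_distance` bases apart."""
    return [
        (a, b)
        for a in left
        for b in right
        if a.chrom == b.chrom and distance(a, b) < max_distance
    ]
```

The nested loop compares every pair, which is fine for a handful of regions; at genomic scale you would sort or bin regions by position rather than compare all pairs.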

Many thanks to Luca Nanni, Arif Canakoglu, Pietro Pinoli, and Stefano Ceri for putting together this workspace. It takes a lot of thought and effort to create a valuable learning resource like this, and we are still figuring out the most successful way to do this. Please share your thoughts in the Comments section below on the effectiveness of this workspace and any other Featured workspaces you try out. If you are interested in featuring examples of your methods in this way, please tell us here, and we can talk to you about the process.

