Notebooks get a facelift

Posted by rtitle on 12 Sep 2018

By Robert Title, Engineering Manager, Data Sciences Platform at the Broad Institute

We are excited to announce that we have substantially improved the way you interact with Jupyter Notebooks in FireCloud. We hope this will increase your productivity and empower you to collaborate more effectively. These changes are publicly available as of today, Sep 12, 2018, and can be accessed in the Notebooks tab of your FireCloud workspace. Read on for a more detailed description of what is changing, and why it is better.

First, some background. Earlier this year, we announced a beta preview of Jupyter Notebooks in FireCloud. This feature brought the ability to spin up a dedicated cluster (a compute environment based on Google Cloud Dataproc) in your billing project, on which you can run a Jupyter Notebook. Since then, we’ve released several important improvements to the functionality, including the ability to pause/resume clusters, auto-pause to save you money, support for Jupyter extensions, bug/security fixes, new kernels, library upgrades, and more. The overall user experience, however, has essentially remained the same -- until now.

A major limitation of the initial Notebooks Beta release was that it focused on cluster management and lacked utilities for managing the notebooks themselves. In the old system, a notebook was ultimately just a text file stored on your cluster running in the cloud. When you deleted your cluster, everything on it -- including notebooks -- was deleted as well. To avoid losing work, you had to download notebooks using the Jupyter UI and store them somewhere else, such as a directory on your laptop; we've heard of people copying the contents of their notebooks to a Google Doc! This is error-prone and inconvenient. It also isn't aligned with FireCloud’s philosophy of openness and collaboration: clusters are not shared with other members of the workspace, so any notebooks that live on them are not shared either.

Now, instead of displaying clusters (which are only visible to you), the new Notebooks tab displays notebooks (which are visible to all members of the workspace). When you work in a notebook, any changes you make are automatically persisted back to the workspace. This enables some powerful new ways of using notebooks in FireCloud. For example, your team can collaborate to develop notebooks containing analysis code, results, and documentation. You can then share your workspace containing notebooks so other researchers can easily reproduce the analysis.

We’re going to post some documentation updates with more detailed instructions on how to use the new Notebooks management interface, though we think it might be intuitive enough that you won't need to read them! Here is a screenshot:

You can Create or Upload a notebook, which adds it to the workspace. You can also Rename/Duplicate/Delete existing notebooks in the workspace. These operations do not require starting a cluster at all: they simply perform file operations on the notebook files stored in the workspace bucket.
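To make the "file operations only" point concrete, here is a minimal sketch that models the workspace bucket as an in-memory dictionary. Everything in it is an assumption for illustration: the `WorkspaceBucket` class, its method names, and the `notebooks/` prefix are hypothetical and are not FireCloud's actual implementation. Rename is modeled as a copy followed by a delete, which is how object stores commonly handle it.

```python
class WorkspaceBucket:
    """Hypothetical in-memory stand-in for a workspace bucket.

    Maps object names to notebook contents. No cluster is involved.
    """

    def __init__(self):
        self._objects = {}

    def upload(self, name, contents):
        """Create or upload a notebook file."""
        self._objects[name] = contents

    def duplicate(self, src, dst):
        """Copy an existing notebook to a new name."""
        if dst in self._objects:
            raise FileExistsError(dst)
        self._objects[dst] = self._objects[src]

    def delete(self, name):
        """Remove a notebook file."""
        del self._objects[name]

    def rename(self, src, dst):
        """Rename is modeled as copy-then-delete (an assumption of this sketch)."""
        self.duplicate(src, dst)
        self.delete(src)

    def list_notebooks(self):
        """Return notebook names in sorted order."""
        return sorted(self._objects)


bucket = WorkspaceBucket()
bucket.upload("notebooks/analysis.ipynb", "{}")
bucket.duplicate("notebooks/analysis.ipynb", "notebooks/analysis-copy.ipynb")
bucket.rename("notebooks/analysis.ipynb", "notebooks/qc.ipynb")
bucket.delete("notebooks/analysis-copy.ipynb")
print(bucket.list_notebooks())  # ['notebooks/qc.ipynb']
```

None of these calls touches a cluster, which mirrors why the operations work without starting one.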

To actually open a notebook and execute code, you need to associate the notebook with a cluster, which you can create with a couple of clicks or choose from a list of existing clusters -- and yes, you can associate multiple notebooks with the same cluster. Once the association is made, the notebook is copied to the cluster and can be opened with Jupyter. We also handle saving any changes you make back to your workspace as you work. You no longer need to upload/download notebook files using the Jupyter UI, although you are still free to do that if you wish. If you do upload a file using the Jupyter UI, we'll save it back to your workspace for you.

Here is a diagram illustrating the above flow using an example workspace containing three notebooks and two clusters.
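The same flow can be sketched as a small in-memory model, using three notebooks and two clusters as in the example workspace. This is only an illustration under stated assumptions: the `Workspace` and `Cluster` classes, their fields, and the notebook and cluster names are hypothetical, and the real service may work differently.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical cluster holding local copies of associated notebooks."""
    name: str
    local_files: dict = field(default_factory=dict)  # notebook name -> contents


@dataclass
class Workspace:
    """Hypothetical workspace holding the persisted notebooks."""
    notebooks: dict = field(default_factory=dict)     # notebook name -> contents
    associations: dict = field(default_factory=dict)  # notebook name -> cluster name

    def associate(self, notebook, cluster):
        """Copy the workspace notebook onto the cluster so it can be opened."""
        cluster.local_files[notebook] = self.notebooks[notebook]
        self.associations[notebook] = cluster.name

    def save(self, notebook, cluster, new_contents):
        """Apply an edit on the cluster and persist it back to the workspace."""
        if self.associations.get(notebook) != cluster.name:
            raise ValueError(f"{notebook} is not associated with {cluster.name}")
        cluster.local_files[notebook] = new_contents
        self.notebooks[notebook] = new_contents


ws = Workspace(notebooks={"qc.ipynb": "v1", "variants.ipynb": "v1", "plots.ipynb": "v1"})
cluster_a = Cluster("cluster-a")
cluster_b = Cluster("cluster-b")

# Two notebooks share one cluster; the third uses the other cluster.
ws.associate("qc.ipynb", cluster_a)
ws.associate("variants.ipynb", cluster_a)
ws.associate("plots.ipynb", cluster_b)

ws.save("qc.ipynb", cluster_a, "v2")
del cluster_a  # deleting the cluster does not affect the workspace copy
print(ws.notebooks["qc.ipynb"])  # v2
```

The key property is the last step: because edits are written back to the workspace, the notebook survives deletion of the cluster it was opened on.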

Development Roadmap

Following the initial release, here are a few follow-on UI improvements that we’d like to make in the short term:

  • HTML preview of notebooks: We’d like to add the ability to preview a notebook stored in the workspace, rendered as HTML, without needing to launch a cluster. This will allow workspace readers to look at notebooks even if they don’t have can-compute permissions.
  • JupyterLab & terminal: In addition to Jupyter Notebooks, we would like to provide access to JupyterLab in FireCloud. We’d also like to make it easier to access Jupyter’s in-browser bash terminal.
  • Additional cluster management options: There are some cluster management features that are not exposed in FireCloud, including configurable Jupyter extensions, auto-pause configuration, and environment customization. We would like FireCloud to make use of these features.

Furthermore, in the longer term we are looking at the following themes to improve our product. The dates are very rough estimates, but they give some sense of relative priority. For context, Leonardo is the service that provides notebooks functionality to FireCloud, and is where most of the development effort is focused.

  • More Analysis Tools: We believe we can provide users with other analysis tools besides Jupyter using the Leonardo infrastructure. The next tool we support will most likely be RStudio in Q4 2018, followed by IGV Desktop in Q1 2019.
  • Bring your own Docker (Q4 2018): We have some capabilities to customize the notebook environment via a user-provided bash script. We’d also like to support custom Docker images as a more powerful way for users to control their notebook environment.
  • Hail 0.2 support (Q4 2018): We currently install Hail 0.1 on Leonardo-created clusters. We would like to upgrade to Hail 0.2 and deprecate 0.1.
  • Data access (Q1 2019): Today in a notebook you can access Google Cloud Storage or BigQuery data using standard libraries in Python or R. For Python users, we also provide the FireCloud client library, which can be used to access FireCloud objects such as workspaces and the workspace data model. We’d like to improve our client library offering by providing an R version and making the Python library more user-friendly.
  • Collaboration (Q2 2019): We now have notebook persistence in the workspace, but we don’t have more sophisticated collaboration tools such as collaborative editing (think Google Docs) or version control. There is some exciting development from the Jupyter team on this front that we’d like to try to make use of in the future.
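As an illustration of the Data access item above, here is a minimal sketch of reading a text object from Google Cloud Storage inside a notebook. It assumes the `google-cloud-storage` Python package is installed and that credentials are available in the environment; the helper names `split_gs_uri` and `read_text` and the example URI are hypothetical and are not part of the FireCloud client library.

```python
def split_gs_uri(uri):
    """Split 'gs://bucket/path/to/object' into (bucket_name, object_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    bucket_name, _, object_name = uri[len(prefix):].partition("/")
    if not bucket_name or not object_name:
        raise ValueError(f"URI must include a bucket and an object name: {uri!r}")
    return bucket_name, object_name


def read_text(uri, client=None):
    """Download a Cloud Storage object as text.

    If no client is passed, one is created with default credentials
    (assumes google-cloud-storage is installed).
    """
    if client is None:
        from google.cloud import storage  # imported lazily; third-party package
        client = storage.Client()
    bucket_name, object_name = split_gs_uri(uri)
    return client.bucket(bucket_name).blob(object_name).download_as_text()


# Hypothetical usage inside a notebook (bucket and path are placeholders):
# text = read_text("gs://my-workspace-bucket/data/samples.tsv")
```

Accepting an optional `client` argument keeps the helper easy to exercise with a stub object when no cloud credentials are at hand.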

We hope that the above changes will be beneficial to your work in FireCloud. If you have any further questions or comments, our team closely monitors notebook-related posts on the FireCloud Forum. Happy notebooking!
