By Robert Title, Engineering Manager, Data Sciences Platform at the Broad Institute
We are excited to announce that we have substantially improved the way you interact with Jupyter Notebooks in FireCloud. We hope this will increase your productivity and empower you to collaborate more effectively. These changes are publicly available as of today, Sep 12, 2018, and can be accessed in the Notebooks tab of your FireCloud workspace. Read on for a more detailed description of what is changing, and why it is better.
First, some background. Earlier this year, we announced a beta preview of Jupyter Notebooks in FireCloud. This feature brought the ability to spin up a dedicated cluster (a compute environment based on Google Cloud Dataproc) in your billing project, on which you can run a Jupyter Notebook. Since then, we’ve released several important improvements to the functionality, including the ability to pause/resume clusters, auto-pause to save you money, support for Jupyter extensions, bug/security fixes, new kernels, library upgrades, and more. The overall user experience, however, has essentially remained the same -- until now.
A major limitation of the initial Notebooks Beta release was that it focused on cluster management and lacked utilities for managing the notebooks themselves. In the old system, a notebook was just a text file stored on your cluster in the cloud. When you deleted your cluster, everything on it -- including notebooks -- was deleted as well. To avoid losing work, you had to download notebooks through the Jupyter UI and store them somewhere else, such as a directory on your laptop; we've even heard of people copying the contents of their notebooks into a Google Doc! This is error-prone and inconvenient. It also isn't aligned with FireCloud's philosophy of openness and collaboration: clusters are not shared with other members of the workspace, so any notebooks that live on them are not shared either.
Now, instead of displaying clusters (which are only visible to you), the new Notebooks tab displays notebooks (which are visible to all members of the workspace). When you work in a notebook, any changes you make are automatically persisted back to the workspace. This enables some powerful new ways of using notebooks in FireCloud. For example, your team can collaborate to develop notebooks containing analysis code, results, and documentation. You can then share your workspace containing notebooks so other researchers can easily reproduce the analysis.
We’re going to post some documentation updates with more detailed instructions on how to use the new Notebooks management interface, though we think it might be intuitive enough that you won't need to read them! Here is a screenshot:
You can Create or Upload a notebook, which adds it to the workspace. You can also Rename/Duplicate/Delete existing notebooks in the workspace. These operations do not require starting a cluster at all: they simply perform file operations on the notebook files stored in the workspace bucket.
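To make the "no cluster required" point concrete, here is a toy Python model of the notebook management operations as plain file operations on a workspace bucket. It is a sketch under stated assumptions, not FireCloud's implementation: the `WorkspaceBucket` class, its method names, and the `notebooks/` object prefix are all hypothetical, and the bucket is just an in-memory dict standing in for cloud storage.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class WorkspaceBucket:
    """Hypothetical in-memory stand-in for a workspace bucket.

    Keys are object names such as "notebooks/qc.ipynb"; values are notebook
    JSON text. None of these operations touch a cluster.
    """
    objects: dict[str, str] = field(default_factory=dict)
    prefix: str = "notebooks/"  # assumed folder layout, for illustration only

    def _path(self, name: str) -> str:
        return f"{self.prefix}{name}.ipynb"

    def create(self, name: str) -> None:
        # A new, empty notebook: a minimal nbformat v4 document.
        empty = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 2}
        self.upload(name, json.dumps(empty))

    def upload(self, name: str, content: str) -> None:
        self.objects[self._path(name)] = content

    def duplicate(self, src: str, dst: str) -> None:
        if self._path(dst) in self.objects:
            raise FileExistsError(dst)
        self.objects[self._path(dst)] = self.objects[self._path(src)]

    def rename(self, old: str, new: str) -> None:
        # Modeled as copy-then-delete, as is common for object stores.
        self.duplicate(old, new)
        self.delete(old)

    def delete(self, name: str) -> None:
        del self.objects[self._path(name)]

    def list_notebooks(self) -> list[str]:
        # What a notebooks listing would show: names without prefix/suffix.
        return sorted(
            key[len(self.prefix):-len(".ipynb")]
            for key in self.objects
            if key.startswith(self.prefix) and key.endswith(".ipynb")
        )


bucket = WorkspaceBucket()
bucket.create("qc")
bucket.upload("gwas", '{"cells": []}')
bucket.duplicate("qc", "qc-copy")
bucket.rename("gwas", "gwas-v2")
bucket.delete("qc-copy")
print(bucket.list_notebooks())  # ['gwas-v2', 'qc']
```

Because every operation only reads and writes bucket objects, all workspace members see the same notebook list regardless of which clusters exist.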
To actually open a notebook and execute code, you need to associate the notebook with a cluster: either create one with a couple of clicks or choose from a list of existing clusters (and yes, you can associate multiple notebooks with the same cluster). Once the association is made, the notebook is copied to the cluster and can be opened with Jupyter. As you work, we also save any changes you make back to your workspace. There is no longer any need to upload or download notebook files using the Jupyter UI, although you are still free to do so. If you do upload a file using the Jupyter UI, we'll save it back to your workspace for you.
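The association-and-sync flow can be sketched as a small Python model, using three notebooks and two clusters. This is a simplified illustration, not Leonardo's actual mechanism: `Workspace`, `Cluster`, `associate`, and `save_back` are hypothetical names, and where the real service saves changes automatically as you work, this sketch calls `save_back` explicitly.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical per-user cluster with its own local file system."""
    name: str
    local_files: dict[str, str] = field(default_factory=dict)


@dataclass
class Workspace:
    """Hypothetical workspace: notebooks live in the shared bucket."""
    bucket: dict[str, str] = field(default_factory=dict)
    associations: dict[str, str] = field(default_factory=dict)  # notebook -> cluster

    def associate(self, notebook: str, cluster: Cluster) -> None:
        # Copy the notebook from the workspace bucket onto the cluster.
        cluster.local_files[notebook] = self.bucket[notebook]
        self.associations[notebook] = cluster.name

    def save_back(self, cluster: Cluster) -> None:
        # Persist every file on the cluster back to the bucket, including
        # files uploaded through the Jupyter UI.
        for notebook, content in cluster.local_files.items():
            self.bucket[notebook] = content
            self.associations.setdefault(notebook, cluster.name)


ws = Workspace(bucket={"a.ipynb": "v1", "b.ipynb": "v1", "c.ipynb": "v1"})
c1, c2 = Cluster("cluster-1"), Cluster("cluster-2")
ws.associate("a.ipynb", c1)
ws.associate("b.ipynb", c1)  # several notebooks can share one cluster
ws.associate("c.ipynb", c2)

c1.local_files["a.ipynb"] = "v2"        # an edit made in Jupyter
c1.local_files["d.ipynb"] = "uploaded"  # a file uploaded via the Jupyter UI
ws.save_back(c1)

print(ws.bucket)
# {'a.ipynb': 'v2', 'b.ipynb': 'v1', 'c.ipynb': 'v1', 'd.ipynb': 'uploaded'}
```

The design point the model captures is that the bucket, not the cluster, is the source of truth: once changes are saved back, deleting `cluster-1` would no longer lose any work.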
Here is a diagram illustrating the above flow using an example workspace containing three notebooks and two clusters.
Following the initial release, here are a few follow-on UI improvements that we’d like to make in the short term:
Furthermore, in the longer term we are looking at the following themes to improve our product. The dates are very rough estimates, but they provide some sense of their relative prioritization. For context, Leonardo is the service which provides notebooks functionality to FireCloud, and is where most of the development effort is focused.
We hope that the above changes will be beneficial to your work in FireCloud. If you have any further questions or comments, our team closely monitors notebook-related posts on the FireCloud Forum. Happy notebooking!