Latest posts

Today we bring you a new facet of our forum: the feature request section!

Within this new section, you will be able to suggest new features, upvote existing ones, and see the status of features we implement. We can better gauge the number of people interested in a particular feature by the vote count, which helps to determine which features we work on first. So, if you represent a group of people who all care about a certain feature, ask everyone to vote on your feature request!

Features can be requested in this category of the forum. Simply click the blue New Idea button to start your thread. Be as clear as possible in your title so other users will be able to see what your feature is about and will be more likely to vote on it. If you see a feature you like, please upvote it so we know you'd like to see it implemented.

We've pre-populated this section with all the feature requests that have been posted in the last two months. You can search for your own feature requests by looking at discussions you've started. If we missed yours, or if there's a feature from further back than May, you can ask us to move it by tagging @KateN in the thread. Or, simply create a new request.

Take a peek at what's been posted and vote for what you want to see!


July 2018 Release Notes

Posted by KateN on 10 Jul 2018

July 17, 2018

  • You can now populate method config inputs directly from a JSON file.
  • Fixed an issue that caused timeouts and errors when attempting to view a workflow that contains many subworkflows.
  • Cluster creation now takes an optional value for the cluster auto-pause threshold. If not specified, a system default will be used. See Leonardo swagger for details on the new setting.
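To illustrate the first item, a method config inputs JSON maps fully qualified input names to values. A minimal sketch is shown below; the workflow name, input names, and bucket path are hypothetical, not taken from the release notes:

```json
{
  "MyWorkflow.input_bam": "gs://my-bucket/sample.bam",
  "MyWorkflow.sample_name": "NA12878",
  "MyWorkflow.scatter_count": 30
}
```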


Cromwell version 32 was released last Thursday evening, June 7th, and if you saw the release notes, you may have wondered what we meant by “File read limits.” For those who didn’t see the notes, we explained that this feature improves Cromwell and thus FireCloud stability, but didn’t get into much detail about it. In this post, I’ll explain how it helps stability by framing the problem, the solution, and the potential impacts.

Problem: FireCloud will slow down or stop working entirely if a user passes a large file to the WDL read_lines() call. The read_lines() call is frequently used to ingest a list of filenames, genomic intervals, and the like for scattering purposes. Users have accidentally pointed it at a BAM file, filling Cromwell’s memory and slowing down the engine. This vulnerability means that one user can take down the service.
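To make the failure mode concrete, here is a minimal WDL sketch of the common pattern (the workflow and task names are illustrative, not from the post). read_lines() turns a small text file into an array to scatter over, but it loads the entire file into memory, which is why pointing it at a multi-gigabyte BAM instead of a one-record-per-line list can exhaust the engine:

```wdl
task process_interval {
  String interval
  command {
    echo "processing ${interval}"
  }
  output {
    String processed = interval
  }
}

workflow scatter_by_intervals {
  # Intended use: a small text file with one genomic interval per line.
  File intervals_list

  # read_lines() loads the whole file into memory. Under the new
  # limit, a file larger than the configured threshold fails fast
  # here instead of degrading the shared Cromwell engine.
  Array[String] intervals = read_lines(intervals_list)

  scatter (interval in intervals) {
    call process_interval { input: interval = interval }
  }
}
```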

Solution: File read limits is a Cromwell configuration option that limits what can be read in through a WDL read_lines() call. By establishing a lower limit, the system is safeguarded from being taken down by a single user. FireCloud’s Cromwell configuration now uses a file read limit of 1 MB. Limits are also used for other read_X functions and can be found here.

Potential Impacts: If you see an error message like this, you’ve been impacted: “File is larger than 1000000 Bytes…” We’ve listed a few workarounds in our Solutions to Problems section.

Help: If you find yourself blocked by these changes, please don’t hesitate to reach out for help. We want to ensure that the benefit of making the system more stable outweighs any individual disruptions.


Release Notes: June 2018

Posted by KateN on 7 Jun 2018

June 26, 2018

  • With a single click, FireCloud can now populate the outputs in your method config with reasonable default attribute names, so you don't have to. The button is right next to the Outputs section of your config in your workspace.
  • You can now choose "Copy Link Address" for the metadata files ("Download 'x' metadata") in the Data tab. This will increase the speed of downloading these files when using the command line.

June 21, 2018

  • UX improvements related to call caching, submission and workflow monitoring:
    • Call caching status is now displayed at the submission level and has been removed from the workflow level.
    • Call caching status now accurately reflects the value supplied by the user at submission time. Previously, the call caching value could falsely show as disabled for certain workflows.
    • Hovering over a submission's status column in the Monitor tab now shows the counts of that submission's workflows, grouped by workflow status.
    • When viewing an individual workflow, that workflow's status now shows as both icon and text. Previously it only had an icon.
    • When viewing an individual workflow, that workflow's unexpanded calls now show their status.
    • When a workflow or a call does not have stdout or stderr logs, the stdout/stderr fields are now hidden. Previously the fields were displayed with a blank value, taking up screen real estate.
  • Updated the swagger-ui response models for the Monitor submission status and Retrieve workflow cost endpoints.
  • Fixed intermittent errors after restarting a cluster and opening a notebook.


By Eric Weitz, software engineer, Data Sciences Platform at the Broad Institute

Have you heard of the FAIR principles? They are a set of guidelines proposed as part of a growing movement to make data more Findable, Accessible, Interoperable, and Reusable. As this movement gains traction, we are seeing more FAIR-related activities at major meetings and conferences. For example, the recent Bio-IT World meeting in Boston included a conference track dedicated to FAIR, as well as a hackathon.

I was part of a team of four people from the Broad Institute's Data Sciences Platform that participated in the Bio-IT hackathon. Our goal: make data more FAIR in Single Cell Portal, which is built on top of FireCloud. In addition to improving the Single Cell Portal’s scientific data management, the hackathon also gave our team a chance to work with developers from other organizations in a manner that was uniquely nimble.


We are excited to introduce a new Featured workspace that demonstrates the GenoMetric Query Language (GMQL) created by a team from Politecnico di Milano in Italy. For some context on Featured workspaces, please read our previous blog post.

GMQL is a high-level, declarative language supporting queries over thousands of heterogeneous datasets and samples; as such, it enables genomic “big data” analysis. Based on the Hadoop framework and the Apache Spark platform, GMQL is designed to be highly scalable, flexible, and simple to use. You can try the system here through its several interfaces, with documentation and biological query examples on ENCODE, TCGA, and other public datasets, or clone the Featured workspace and launch an example analysis.

The GMQL 101 workspace features three methods, each with an increasing level of complexity, to give you a taste of how the query language works. The first method shows how to join two datasets and then extract a third dataset based on a specific condition: pairs of regions that are less than 1000 bases apart. The second method takes a VCF and performs an epigenomic analysis using gene annotation and ChIP-seq results. It shows how you can select high-confidence regions, use RefSeq annotations to find regions that overlap a gene, and count the mutations falling within the high-confidence regions. Finally, the third method combines GATK4’s Mutect2 pipeline with the second method, showing an epigenomic analysis from start (calling somatic variants) to finish (annotating variants). For any GMQL-specific questions or problems, you can visit the GMQL GitHub page.

Many thanks to Luca Nanni, Arif Canakoglu, Pietro Pinoli, and Stefano Ceri for putting together this workspace. It takes a lot of thought and effort to create a valuable learning resource like this, and we are still figuring out the most successful way to do this. Please share your thoughts in the Comments section below on the effectiveness of this workspace and any other Featured workspaces you try out. If you are interested in featuring examples of your methods in this way, please tell us here, and we can talk to you about the process.


More and more method developers are using the Method Repository to make their pipelines publicly accessible to the FireCloud community. By making their methods public, developers let other researchers use them instead of building their own, similar methods. However, providing a method on its own, without a configuration or documentation, limits reusability. This post is about how Featured workspaces solved this problem for GATK4 methods, and how an outside group will contribute the first third-party Featured workspace, demonstrating that any developer can do this.

Featured workspaces hold the latest version of a method, configured to work out of the box on an accompanying example dataset. This means you can launch the method without doing any setup, e.g., finding data or configuring pipelines. You can see the required inputs and configuration settings clearly, and once launched, check out all the outputs that it produces. When you are ready to run it on your own data, all you need to do is replace the example dataset with your own, following the guidance in the docs. This takes the guesswork out of configuring a method on your own dataset. Altogether, these workspaces should make it easy to reuse methods.

We originally developed this “packaging” for methods with a group of GATK method developers we work closely with, to help people learn and test the most up-to-date GATK4 pipelines. These Featured workspaces went live once many of the GATK4 tools left beta status around January 2018 (GATK4 launch). People are interested in other pipelines besides GATK, and tomorrow we will announce a new Featured workspace put together by a team from Politecnico di Milano showcasing a different tool. Stay tuned!

Interested in putting together a workspace like this and having it featured? Let us know in this sign-up survey. We can walk you through the process we just went through with our friends from Politecnico di Milano.


We are planning system upgrades to address the recent stability issues that have affected reliability of service in FireCloud. The upgrade process will cause a temporary interruption of service; we estimate the interruption may last up to 30 minutes. We do not yet have a specific time to announce; we expect it will be in the afternoon or evening (EST) of Sunday, May 27. We will post an update here when we are able to narrow down the window of time more precisely. Thank you for your patience while we work to improve the quality of service in FireCloud.


UPDATE: The issue described below has been resolved.

Due to an individual user's submission that amounts to a very large number of jobs (~60k), all new workflow submissions are currently being held in the queue (with status QueuedInCromwell). To be clear, as far as we can tell this is NOT a FireCloud malfunction; it seems to be a Google Cloud limitation that we are encountering for the first time. We are working with GCP support and evaluating options to unblock the queue, hopefully without interrupting that one very ambitious and totally legitimate submission. We will strive to resume normal workflow throughput by Monday morning EST.

We understand that this is causing many of you considerable inconvenience, yet we are hopeful that this case will provide an opportunity to push the current limitations back to the next level. Please remember that what we are all doing here, together, is blazing a new trail: building a new model for how we do science at scale, collaboratively. The fact that these scaling problems are arising at all demonstrates that we are on the right path, and that the research community needs this level of scalability. We will do everything in our power to deliver it.

Thank you for your patience and stay tuned for updates.


FireCloud DataShuttle Release

Posted by KateN on 17 May 2018

Alpha version 0.1.1

Release Overview

The Broad Institute’s Genomics Platform and Data Sciences Platform announce the general availability of FireCloud DataShuttle 0.1.1. The FireCloud DataShuttle allows users to easily browse files, upload and download data directly between FireCloud workspaces or Google buckets and their local drives, and monitor the status of these transfers.

The FireCloud DataShuttle was developed to facilitate the work of researchers and project managers who transfer a high volume of files and desire a more efficient and clearer process.

