Data privacy and control of sensitive data are significant requirements in the world of genetic analysis. Yet research is frequently collaborative, and collaboration is what FireCloud was designed for! So how do you keep data securely tucked away while still making it easy to share?
Conventionally, you as a researcher working with sensitive data (let’s say it’s private to your lab) would need to be authorized to work with it. You have a workspace in which you did your analysis, and you’ve derived data from the original sensitive data. Let's assume that the derived data must still be kept under controlled access.
You know you are a member of your lab (indicated by the blue color in the diagram above), so you can access your lab’s data (blue again) in the workspace. You do some analysis and generate a clone of the workspace containing your derived data, which is possible because you have this lab authorization. Let’s say a new coworker asks you to share the workspace with them. You would be responsible for checking that this new coworker is officially a part of your lab rather than an imposter! (Unlikely, but still.) Your cloned workspace has no inherent way of checking whether or not the recipient has the proper authorization. You’d have to keep track for yourself which of your fellow scientists do and do not have up-to-date authorization.
Enter Authorization Domains. Think of an Authorization Domain as a badge you wear that gives you access to workspaces with the same badge. When you do work and clone the workspace, that badge stays with the new copy. You no longer need to worry about accidentally sharing sensitive data, because if you try to share the cloned workspace with a user who doesn’t have the right badge, that researcher won’t be able to enter the workspace.
Let’s look back at our example from before. You now have your lab’s badge, and you’re working with a workspace that also has the lab’s badge. You do your analysis, generate some derived data, and create a clone. The copied workspace also has the lab badge. Now, you try to share the workspace with your new coworker, and the authorization domain will check to see if your coworker has the right badge, or if they are an imposter! (Or more likely, need to go get a badge.) Either way, the responsibility is on the authorization domain to keep track of access, not you.
So how do you use them?
When creating a workspace, you can select one or more groups to set as the Authorization Domain. (If you don’t see your group in the list, you may need to create it. See Step 2.) An Authorization Domain can only be set when creating the workspace, and once set, it cannot be removed from the workspace. It will be copied over to any cloned version of the workspace to keep any derived data protected as well.
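The set-once, copied-on-clone behavior can be pictured with a small Python sketch. This is only an illustration of the rule described above, not FireCloud's implementation: the `Workspace` class, its `clone` method, and the workspace names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Workspace:
    """Hypothetical model of a workspace; not a FireCloud API."""
    name: str
    authorization_domain: frozenset = frozenset()

    def clone(self, new_name):
        # The clone carries the same Authorization Domain as the original.
        return Workspace(new_name, self.authorization_domain)


original = Workspace("lab-analysis", frozenset({"FireCloudWorkshop"}))
derived = original.clone("lab-analysis-derived")

print(derived.authorization_domain == original.authorization_domain)  # True
```

Modeling the domain as an immutable field mirrors the idea that the badge is fixed at creation time and travels with every copy.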
When multiple groups are set as the Authorization Domain, the system requires the user to be a member of all of the groups in order to access the workspace. This is because there are strict guidelines with third-party dbGaP registered datasets (TCGA and Target). For example, say there is a workspace whose Authorization Domain contains the TCGA and Target groups. If a user is invited to the workspace, the system checks both the TCGA and the Target whitelists for their account before allowing access to the workspace.
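The "member of all groups" rule amounts to a subset check. Here is a minimal Python sketch, assuming group memberships can be represented as sets of group names; the `can_access` function and the "MyLab" group are hypothetical and do not reflect FireCloud's actual code.

```python
def can_access(user_groups, authorization_domain):
    """Return True only if the user belongs to every group in the domain."""
    return set(authorization_domain) <= set(user_groups)


workspace_domain = {"TCGA", "Target"}

# A user on both whitelists (plus an unrelated group) gets in.
print(can_access({"TCGA", "Target", "MyLab"}, workspace_domain))  # True

# A user on only one of the two whitelists does not.
print(can_access({"TCGA"}, workspace_domain))  # False
```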
To import data from another workspace (pictured to the right), the groups in the Authorization Domain of the source workspace (where the data is coming from) must be a subset of the groups in the Authorization Domain of the destination workspace (where the data is going to). For example, if the destination workspace has the TCGA-dbGap-Authorized and Tiffs-Test-Group groups in its Authorization Domain, you can import data from workspaces whose Authorization Domain is set to TCGA-dbGap-Authorized only (rows 2, 3, and 5-10), Tiffs-Test-Group only (row 4), both groups (row 1), or no groups. If the source workspace had additional groups, you would not be able to import from it. In this example, FireCloud informs you that there are six workspaces that are unavailable because of this.
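The import rule can be sketched the same way: a source workspace is eligible only when its groups are a subset of the destination's groups. The example below is a hedged illustration in Python. The `can_import` function, the source workspace names, and the "Another-Group" group are made up for this example; only the two destination group names come from the text above.

```python
def can_import(source_domain, destination_domain):
    """Import is allowed when every source group is also a destination group."""
    return set(source_domain) <= set(destination_domain)


destination = {"TCGA-dbGap-Authorized", "Tiffs-Test-Group"}

# Hypothetical source workspaces and their Authorization Domains.
sources = {
    "both-groups": {"TCGA-dbGap-Authorized", "Tiffs-Test-Group"},
    "tcga-only": {"TCGA-dbGap-Authorized"},
    "tiff-only": {"Tiffs-Test-Group"},
    "no-domain": set(),
    "extra-group": {"TCGA-dbGap-Authorized", "Another-Group"},
}

for name, domain in sources.items():
    print(name, can_import(domain, destination))
```

Only the "extra-group" source is rejected, because it carries a group that the destination workspace does not have.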
There are two ways to get your badge, depending on whether you use a third-party or user-defined group. The difference between third-party and user-defined groups is how membership in the group is managed.
In the case of third-party groups, external permissions are checked in order to give you access. At the time of writing, FireCloud supports two third-party groups, TCGA Controlled Access and Target. To gain access to them, you must link your FireCloud account to your eRA Commons or NIH account on your Profile page. FireCloud then checks for the user ID of the linked account in the dbGaP whitelist to complete the authorization.
User-defined groups are created and managed within FireCloud. Groups are simple to set up, and are perfect to use when you want to share data with a set group of people (like within your lab). Your PI can create a user-defined group, e.g. “FireCloudWorkshop”, by going to the Groups page found in the menu under your username, and then add each member of your lab to the group, thereby giving them the “FireCloudWorkshop” badge. The PI (or anyone they give Owner access to the group) is then responsible for giving and revoking these badges.
Now to complete the process, share the workspace. You can share the workspace with the group you used in the Authorization Domain or share with an individual. To share with a group, start typing the name into the Sharing dialog and choose from the autocomplete options.
If you share with an individual or a group that is not in the Authorization Domain, those users will see the workspace greyed out in their workspace list. When they click it, FireCloud facilitates a request process that sends an email to all owners of the groups in the Authorization Domain. Once the user has the proper badge(s), they can enter the workspace to see the protected data.
All this is to help you share the data you want to work with amongst all your collaborators--without the headache of double-checking that each user has the proper access permissions. FireCloud keeps track of all of that for you!