FIRECLOUD | Doc #10738 | (howto) Import metadata

# (howto) Import metadata

Tutorials | Created 2017-11-07 | Last updated 2018-04-27

#### Copying from an existing workspace

1. Choose the workspace you want to import metadata from. Note that you can only import data from workspaces that are compatible with the Authorization Domain you have set.
2. Pick the participants, samples, pairs, or sets you want. Importing a set brings over all the data the set requires. For example, if you import a sample set, the sample and participant data linked to that set are also copied.

Notes:

• Import conflicts can occur if your workspace already contains a participant, sample, or pair identical to one you are importing. FireCloud will notify you that the entity already exists in the workspace.
• Copying metadata from another workspace does not import any linked files into your workspace bucket. Instead, the copied metadata refers to file paths in the bucket of the source workspace. If that bucket is deleted, your workspace data model will point to bucket paths that no longer exist.

#### Importing a file

You import metadata corresponding to an entity type -- Participant, Sample, or Pair -- by uploading load files in tab-separated-value format, a type of text file (.tsv or .txt). A separate file must be used for each entity type. The first line of each file must contain the appropriate field names as column headers. See the individual entity entries for examples of load files.
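For instance, a minimal sample load file might look like the sketch below. The `entity:` header prefix and the `participant` reference column follow FireCloud's load-file conventions; the sample ids and the `bam` attribute column are hypothetical.

```tsv
entity:sample_id	participant	bam
SAMPLE_001	PARTICIPANT_A	gs://my-bucket/sample_001.bam
SAMPLE_002	PARTICIPANT_B	gs://my-bucket/sample_002.bam
```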

Note that for each of the basic entities, the data model also supports set entities, which are essentially lists of the basic entity type:

• Participant Set
• Sample Set
• Pair Set

In set load files, each line lists the membership of a non-set entity (e.g., participant) in a set (e.g., participant set). The first column contains the identifier of the set entity and the second column contains a key referencing a member of that set. For example, a load file for a participant set looks like this:
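A minimal sketch of such a membership file is shown below. The `membership:` header prefix and the `participant` column follow FireCloud's load-file conventions; the TCGA_COAD set id comes from the note that follows, and the participant ids are hypothetical.

```tsv
membership:participant_set_id	participant
TCGA_COAD	TCGA-AA-3518
TCGA_COAD	TCGA-AA-3525
TCGA_COAD	TCGA-AG-A002
```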

Note that multiple rows in a set load file may have the same set entity id (e.g. TCGA_COAD).

Load files must be imported in a strict order because entities reference one another.

The order is as follows ("A > B" means entity type A must precede B in order of upload):

• participants > samples
• samples > pairs
• participants > participant sets
• samples > sample sets
• pairs > pair sets
• set membership > set entity, e.g., participants > samples > sample set membership > sample set entity.
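The ordering rules above can be sketched in Python. This is an illustration, not FireCloud tooling: it ranks load files by inspecting each file's first column header, assuming the `entity:<type>_id` and `membership:<set type>_id` header conventions used in FireCloud load files.

```python
# Required upload order: basic entities first, then their sets.
UPLOAD_ORDER = [
    "participant", "sample", "pair",
    "participant_set", "sample_set", "pair_set",
]

def upload_rank(first_header):
    # "entity:sample_id" -> "sample"; "membership:pair_set_id" -> "pair_set"
    _, _, name = first_header.partition(":")
    if name.endswith("_id"):
        name = name[:-len("_id")]
    return UPLOAD_ORDER.index(name)

headers = [
    "membership:sample_set_id",
    "entity:sample_id",
    "entity:participant_id",
]
# Sorting by upload_rank yields a safe import order:
# participants, then samples, then sample set membership.
print(sorted(headers, key=upload_rank))
```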

You may have multiple files or strings of metadata that belong to one participant, sample, pair, or set of these. For example, say you have been given genotyping files in VCF format for a collection of samples, for a total of twenty-two files per sample set. Creating a new data-table column for each file would be time-consuming, and you would also have to launch the analysis in FireCloud repeatedly to run on each file. Instead, you want to build a WDL that takes an array of VCF files as input, so that your tools run on each item in the array without manual intervention.

To get the array into your data model, you can write WDL code that outputs a file of file paths or strings in array format. This requires, as input, a file that contains one file path or string per line. A task in your WDL can read the lines of that file and output them to your data model as an array; you can then use the method configuration to assign the array to a workspace attribute (“workspace.X”) or to an attribute of the participant, sample, pair, or set that you are running on (“this.X”).

Here are two examples that can be adapted to your use case. In both, the input is a file listing VCF file paths, one per line, in “gs://” format.

Example 1 leaves the command portion blank so that you can manipulate the array if you wish. This WDL copies your files to the virtual machine the task spins up, which makes sense if you are manipulating the array of files further. The 50 GB disk size accounts for the files being copied to the virtual machine; change it to suit your use case. If you do not want to manipulate the array, see Example 2.

#### Example 1:

Example 1’s Method and Method Configuration are published in the Methods Repository.

workflow fof_usage_wf {
  File file_of_files
  call fof_usage_task {
    input:
      fof = file_of_files
  }
  output {
    Array[File] array_output = fof_usage_task.array_of_files
  }
}

task fof_usage_task {
  File fof
  Array[File] my_files = read_lines(fof)
  command {
    # do stuff with arrays below
    # ....
  }
  runtime {
    docker: "ubuntu:16.04"
    disks: "local-disk 50 HDD"
    memory: "2 GB"
  }
  output {
    Array[File] array_of_files = my_files
  }
}

#### Example 2:

workflow fof_usage_wf {
  File file_of_files
  Array[File] array_of_files = read_lines(file_of_files)
  output { Array[File] array_output = array_of_files }
}