This module imports data from TCGA by taking in a GDC manifest file, downloading the files listed on that manifest, renaming them to be human-friendly, and compiling them into a GCT file to be computer-friendly.
Author: Edwin Juarez
Contact: https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!forum/genepattern-help
Algorithm Version:
Remember that you will need to download a manifest file and a metadata file from the GDC data portal (https://portal.gdc.cancer.gov/). To download these two files, follow these instructions: https://github.com/genepattern/TCGAImporter/blob/master/how_to_download_a_manifest_and_metadata.pdf
If you'd like a more comprehensive tutorial of the GDC website, you can find it here: https://docs.gdc.cancer.gov/Data_Portal/Users_Guide/Getting_Started/
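As a rough illustration of what the manifest contains, the sketch below reads one with Python's standard library. It assumes the usual tab-separated GDC manifest layout with `id` and `filename` columns; the function name `read_manifest` and the sample rows are made up for this example and are not part of TCGAImporter.

```python
import csv
import io


def read_manifest(handle):
    """Return (file_id, filename) pairs from a tab-separated GDC manifest.

    Assumes the manifest has a header row containing "id" and "filename".
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [(row["id"], row["filename"]) for row in reader]


# Tiny in-memory manifest with made-up IDs, for illustration only.
example = io.StringIO(
    "id\tfilename\tmd5\tsize\tstate\n"
    "aaaa-1111\tsample_one.htseq.counts.gz\tabc\t100\tvalidated\n"
    "bbbb-2222\tsample_two.htseq.counts.gz\tdef\t200\tvalidated\n"
)
entries = read_manifest(example)
for file_id, filename in entries:
    print(file_id, filename)
```

With a real manifest, pass an open file handle (for example `open("manifest.txt", newline="")`) instead of the in-memory example.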
conda create --name GP_dfgdc_env pip
source activate GP_dfgdc_env
pip install -r requirements.txt
Note that you will need to have the GDC download client in the same folder. If you don't know what this means, read more here: https://docs.gdc.cancer.gov/Data_Portal/Users_Guide/Getting_Started
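For orientation, here is a minimal sketch of how the download client could be called on a manifest from Python. It assumes the client executable is named `gdc-client` and sits in the current folder, and it uses the client's `-m` (manifest) and `-d` (download directory) options; the helper `build_download_command` and the paths are hypothetical, and this is not necessarily how the module itself invokes the client.

```python
import shlex


def build_download_command(manifest_path, client="./gdc-client", out_dir=None):
    """Build the argument list for a gdc-client manifest download."""
    cmd = [client, "download", "-m", manifest_path]
    if out_dir is not None:
        # -d sets the directory the downloaded files are written to.
        cmd += ["-d", out_dir]
    return cmd


cmd = build_download_command("manifest.txt", out_dir="raw_files")
# Show the command as it would be typed in a shell.
print(" ".join(shlex.quote(part) for part in cmd))

# To actually run the download (requires the client to be present):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building the command as a list keeps paths containing spaces intact when it is later passed to `subprocess.run`.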
Name | Description |
---|---|
imanifest * | The relative path of the manifest used to download the data. This file is obtained from the GDC data portal (https://portal.gdc.cancer.gov/). |
metadata * | The metadata file obtained from the GDC data portal (https://portal.gdc.cancer.gov/). |
output_file_name * | The base name to use for output files. E.g., if you type "TCGA_dataset", the GCT file will be named "TCGA_dataset.gct". |
gct * | Whether or not to create a GCT file. |
translate_gene_id * | Whether or not to translate ENSEMBL IDs (e.g., ENSG00000012048) to Hugo Gene Symbols (e.g., BRCA1). |
cls * | Whether or not to create a CLS file separating Normal and Tumor classes based on TCGA Sample ID. |
* - required
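To make the `gct`, `translate_gene_id`, and `cls` options more concrete, the sketch below shows one way such outputs could be produced. It is an illustration under stated assumptions, not the module's actual code: all function names, the one-entry symbol table, and the sample barcodes are hypothetical. It relies on the TCGA barcode convention that the fourth field starts with a two-digit sample-type code (01-09 tumor, 10-19 normal) and on the GenePattern GCT 1.2 and CLS text layouts.

```python
def translate_gene_id(ensembl_id, symbols):
    """Map an ENSEMBL ID to a gene symbol, falling back to the ID itself."""
    base = ensembl_id.split(".")[0]  # drop a version suffix such as ".19"
    return symbols.get(base, ensembl_id)


def sample_class(barcode):
    """Return "Tumor" or "Normal" from the sample-type code of a TCGA barcode."""
    code = int(barcode.split("-")[3][:2])
    if code < 10:
        return "Tumor"
    if code < 20:
        return "Normal"
    raise ValueError(f"unhandled sample type code in {barcode!r}")


def format_gct(samples, rows):
    """Render {gene_name: [values]} as GCT 1.2 text."""
    lines = [
        "#1.2",
        f"{len(rows)}\t{len(samples)}",
        "\t".join(["Name", "Description"] + list(samples)),
    ]
    for name, values in rows.items():
        # The description column simply repeats the gene name here.
        lines.append("\t".join([name, name] + [str(v) for v in values]))
    return "\n".join(lines) + "\n"


def format_cls(labels):
    """Render a list of class labels as CLS text."""
    names = sorted(set(labels))
    index = {name: i for i, name in enumerate(names)}
    header = f"{len(labels)} {len(names)} 1"
    return "\n".join(
        [header, "# " + " ".join(names), " ".join(str(index[l]) for l in labels)]
    ) + "\n"


# Hypothetical inputs, for illustration only.
symbols = {"ENSG00000012048": "BRCA1"}
samples = ["TCGA-AA-0001-01A", "TCGA-AA-0002-11A"]
rows = {translate_gene_id("ENSG00000012048.19", symbols): [10, 3]}

gct_text = format_gct(samples, rows)
cls_text = format_cls([sample_class(s) for s in samples])
print(gct_text)
print(cls_text)
```

Falling back to the original ENSEMBL ID when no symbol is known keeps every row in the output instead of silently dropping untranslated genes.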
TCGAImporter is distributed under a modified BSD license available at https://raw.githubusercontent.com/genepattern/TCGAImporter/master/LICENSE
Task Type: Download dataset
CPU Type: any
Operating System: any
Language: Python 3.6
Version | Release Date | Description |
---|---|---|
5 | 2018-08-06 | Fixing small bugs and increasing performance of gene name translation |
4 | 2018-05-16 | Renaming the module from download_from_gdc to TCGAImporter |
3 | 2018-04-16 | Preparing for prebuild |
1 | 2018-04-16 | Initial version |