Databases downloaded on Oct 23, 2015

1. How to build database Fasta files?

Response:

Step 1: Download the sequence files.

  a. In case of Bacteria, Fungi, Archaea:
     Download genome sequence files from the NCBI reference sequence database (ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/)

  b. In case of Viruses / Phage:
     i. Download genome sequence files using the following two queries and combine the results:
        http://www.ncbi.nlm.nih.gov/nuccore?term=Viruses[Organism]+AND+srcdb_refseq[PROP]+NOT+wgs[prop]+NOT+cellular+organisms[ORGN]+NOT+AC_000001%3AAC_999999[pacc]
        http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Search&db=nuccore&term=Viruses[Organism]+NOT+srcdb_refseq[PROP]+NOT+cellular+organisms[ORGN]+AND+nuccore+genome+samespecies[Filter]+NOT+nuccore+genome[filter]+NOT+gbdiv+syn[prop]

  c. In case of Human genomes:
     i. The Ensembl_plus_RNA database is constructed from the following Fasta files:
        wget -nd ftp://ftp.ensembl.org/pub/current_fasta/homo_sapiens/cdna/Homo_sapiens.GRCh37.74.cdna.all.fa.gz
        wget -nd ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/RNA/rna.fa.gz
     ii. Female reference database Fasta:
        wget -nd ftp://ftp.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.gz
     iii. Human Genome_plus_Transcriptome database Fasta:
        wget -nd ftp://ftp.ncbi.nih.gov/blast/db/human_genomic_transcript.tar.gz
     iv. Human reference genome from NCBI:
        wget -nd -r "ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/Assembled_chromosomes/seq/*.fa.gz"

Step 2: Clean up the reference files.

  i. In case of Bacteria and Archaea:
     a. Remove plasmid sequences from the sequence database.
     b. For Bacteria, remove sequences shorter than 10K in length.
  ii. In case of Viruses, remove patent sequences from the sequence database.

Step 3: Build the Human plus Microbial database from the Bacteria, Viruses, Archaea, Phage, Fungi, and Human reference genome files.

CURATED DATABASES ARE AVAILABLE AT:
http://software.broadinstitute.org/pathseq/Downloads.html
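
The Step 2 clean-up rules could be sketched roughly as follows. This is an illustration only, not the script that was used to build the curated databases. It assumes that plasmid and patent records can be recognized by the words "plasmid" and "patent" in the Fasta header, and that "10K" means 10,000 bases. The names read_fasta, keep_record, and clean_fasta are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

# One Fasta record: (header without the leading '>', full sequence).
Record = Tuple[str, str]


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from the lines of a Fasta file."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def keep_record(header: str, sequence: str, group: str,
                min_length: int = 10_000) -> bool:
    """Apply the Step 2 rules for one record.

    group is one of "bacteria", "archaea", "viruses" (assumed labels).
    Header keyword matching is an assumption made for this sketch.
    """
    text = header.lower()
    # Bacteria and Archaea: drop plasmid sequences.
    if group in ("bacteria", "archaea") and "plasmid" in text:
        return False
    # Bacteria only: drop sequences shorter than the length cutoff.
    if group == "bacteria" and len(sequence) < min_length:
        return False
    # Viruses: drop patent sequences.
    if group == "viruses" and "patent" in text:
        return False
    return True


def clean_fasta(lines: Iterable[str], group: str,
                min_length: int = 10_000) -> List[Record]:
    """Return only the records that pass the Step 2 rules."""
    return [(h, s) for h, s in read_fasta(lines)
            if keep_record(h, s, group, min_length)]
```

To use it, open a decompressed Fasta file and pass the file object to clean_fasta together with the group name, for example clean_fasta(open("bacteria.fa"), "bacteria"), where bacteria.fa is a placeholder file name. Matching on header text is a simple heuristic, so the headers of the downloaded files should be checked before relying on it.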