Databases downloaded on Oct 23, 2015

1. How to build the database FASTA files?
Response:
Step 1: Download the genome sequence files:
a. For Bacteria, Fungi, and Archaea:
	Download the genome sequence files from the NCBI Reference Sequence (RefSeq) database (ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/)

b. For Viruses / Phage:

	i. Download the genome sequence files using the following two queries and combine the results:
		http://www.ncbi.nlm.nih.gov/nuccore?term=Viruses[Organism]+AND+srcdb_refseq[PROP]+NOT+wgs[prop]+NOT+cellular+organisms[ORGN]+NOT+AC_000001%3AAC_999999[pacc]
		http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Search&db=nuccore&term=Viruses[Organism]+NOT+srcdb_refseq[PROP]+NOT+cellular+organisms[ORGN]+AND+nuccore+genome+samespecies[Filter]+NOT+nuccore+genome[filter]+NOT+gbdiv+syn[prop]

c. For Human genomes:

	i. The Ensembl_plus_RNA database is constructed from the following FASTA files:
		wget -nd ftp://ftp.ensembl.org/pub/current_fasta/homo_sapiens/cdna/Homo_sapiens.GRCh37.74.cdna.all.fa.gz
		wget -nd ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/RNA/rna.fa.gz

	ii. Female reference database FASTA
		wget -nd ftp://ftp.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.gz

	iii. Human Genome_plus_Transcriptome database FASTA
		wget -nd ftp://ftp.ncbi.nih.gov/blast/db/human_genomic_transcript.tar.gz
	
	iv. Human reference genome from NCBI
		wget -nd -r "ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/Assembled_chromosomes/seq/*.fa.gz"
		
	
Step 2: Clean up the reference files:
	i. For Bacteria and Archaea:
		a. Remove plasmid sequences from the sequence database.
		b. For Bacteria, remove sequences shorter than 10 kb (10,000 bases).

	ii. For Viruses, remove patent sequences from the sequence database.
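
The clean-up rules in Step 2 could be applied with a small script along the following lines. This is only a sketch, not the procedure used to build the curated databases: it assumes that plasmid and patent records can be recognized by the words "plasmid" and "patent" in the FASTA header, which is an assumption and may not hold for every record. The function names (parse_fasta, clean_records) and the demo records are hypothetical.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def clean_records(
    records: Iterable[Record],
    drop_keywords: Sequence[str] = (),
    min_length: int = 0,
) -> List[Record]:
    """Drop records whose header contains a keyword or whose sequence is too short.

    ASSUMPTION: keyword matching on the header is a stand-in for however
    plasmid and patent sequences are actually identified.
    """
    kept = []
    for header, seq in records:
        lowered = header.lower()
        if any(keyword.lower() in lowered for keyword in drop_keywords):
            continue
        if len(seq) < min_length:
            continue
        kept.append((header, seq))
    return kept


# Made-up demo records (not real sequences).
demo_bacteria = [
    ">chr1 Example bacterium chromosome", "A" * 12000,
    ">p1 Example bacterium plasmid pX", "C" * 15000,
    ">frag1 Example bacterium contig", "G" * 500,
]
demo_viruses = [
    ("v1 Example virus complete genome", "ACGT"),
    ("v2 Sequence 1 from patent (hypothetical)", "ACGT"),
]

# Bacteria: remove plasmids and sequences shorter than 10,000 bases.
bacteria = clean_records(
    parse_fasta(demo_bacteria), drop_keywords=("plasmid",), min_length=10_000
)
# Viruses: remove patent sequences only.
viruses = clean_records(demo_viruses, drop_keywords=("patent",))

print([header.split()[0] for header, _ in bacteria])
print([header.split()[0] for header, _ in viruses])
```

For Archaea, the same call without min_length would remove plasmids only, since the length cutoff is stated for Bacteria.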

Step 3: Build the Human plus Microbial database from the Bacteria, Viruses, Archaea, Phage, Fungi, and Human reference genome sequences.
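
One plausible reading of Step 3 is that the cleaned FASTA files are concatenated into a single combined FASTA file. The sketch below illustrates that reading only; the actual build may involve further steps not described here. The function name combine_fasta and all demo file names are hypothetical, and the demo data is made up so that the example runs on its own.

```python
import gzip
import tempfile
from pathlib import Path


def open_maybe_gzip(path: Path):
    """Open a FASTA file for binary reading, decompressing .gz files on the fly."""
    if path.suffix == ".gz":
        return gzip.open(path, "rb")
    return open(path, "rb")


def combine_fasta(inputs, output) -> None:
    """Concatenate FASTA files into one file, in the order given."""
    with open(output, "wb") as out:
        for path in inputs:
            last_byte = b""
            with open_maybe_gzip(Path(path)) as src:
                while True:
                    chunk = src.read(1 << 20)  # copy in 1 MiB chunks
                    if not chunk:
                        break
                    out.write(chunk)
                    last_byte = chunk[-1:]
            # Make sure the next file's first header starts on its own line.
            if last_byte and last_byte != b"\n":
                out.write(b"\n")


# Self-contained demo with made-up data and hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    microbial = tmp_dir / "microbial_demo.fa"
    human = tmp_dir / "human_demo.fa.gz"
    microbial.write_bytes(b">m1 demo microbe\nACGT")  # no trailing newline
    with gzip.open(human, "wb") as handle:
        handle.write(b">h1 demo human\nGGCC\n")

    combined = tmp_dir / "human_plus_microbial_demo.fa"
    combine_fasta([microbial, human], combined)
    combined_text = combined.read_text()

print(combined_text)
```

Copying in chunks keeps memory use low for genome-sized inputs, and the newline check guards against a header being glued onto the last sequence line of the previous file.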

CURATED DATABASES ARE AVAILABLE AT http://software.broadinstitute.org/pathseq/Downloads.html