PathSeq
PathSeq is a computational tool for identifying and analyzing
microbial sequences in high-throughput human sequencing data.
It is designed to scale to large numbers of sequencing reads.
The PathSeq process consists of a subtractive phase, in which
input reads that align to human reference sequences are
removed, and an analytic phase, in which the remaining reads
are aligned to microbial reference sequences (viral, fungal,
bacterial, archaeal). Reads of unknown origin are identified
for pathogen discovery in downstream analysis (e.g. de novo
assembly). PathSeq is currently available as a tool suite in
the Genome Analysis Toolkit (GATK).
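The two phases can be illustrated with a minimal Python sketch. This is a toy model, not PathSeq's implementation: it assumes that sharing one exact k-mer with a reference counts as a match, standing in for real read alignment, and the names classify_reads, build_index, and matches are hypothetical helpers invented for this example.

```python
from typing import Dict, Iterable, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references: Iterable[str], k: int) -> Set[str]:
    """Collect every k-mer found in any of the reference sequences."""
    index: Set[str] = set()
    for ref in references:
        index |= kmers(ref, k)
    return index


def matches(read: str, index: Set[str], k: int) -> bool:
    """Toy stand-in for alignment (an assumption of this sketch):
    a read matches if it shares at least one exact k-mer with the index."""
    return not kmers(read, k).isdisjoint(index)


def classify_reads(
    reads: Iterable[str],
    human_refs: Iterable[str],
    microbe_refs: Iterable[str],
    k: int = 8,
) -> Dict[str, List[str]]:
    """Split reads into human, microbial, and unknown bins."""
    human_index = build_index(human_refs, k)
    microbe_index = build_index(microbe_refs, k)
    result: Dict[str, List[str]] = {"human": [], "microbial": [], "unknown": []}
    for read in reads:
        if matches(read, human_index, k):
            # Subtractive phase: reads matching human references are set aside.
            result["human"].append(read)
        elif matches(read, microbe_index, k):
            # Analytic phase: remaining reads are compared to microbial references.
            result["microbial"].append(read)
        else:
            # Neither human nor microbial: candidates for downstream analysis.
            result["unknown"].append(read)
    return result
```

Calling classify_reads with a list of reads and two lists of reference sequences returns a dictionary with the keys "human", "microbial", and "unknown". The "unknown" list corresponds to the reads of unknown origin that would be passed on to downstream analysis.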
The following figure illustrates the typical approach to pathogen discovery with PathSeq. RNA or DNA is extracted from the tissue of interest, and sequencing libraries are constructed and run on the next-generation DNA sequencing platform of choice. The resulting sequence data are then processed with the GATK PathSeq pipeline, which can be run in a variety of computing environments. PathSeq reports potential microbes in the sequence data, as well as the complete set of reads that could not be identified as human or microbial.