PathSeq is a computational tool for identifying and analyzing microbial sequences in high-throughput human sequencing data, designed to scale to large numbers of sequencing reads. It works in two phases: a subtractive phase, in which reads that align to human reference sequences are removed, and an analytic phase, in which the remaining reads are aligned to microbial reference sequences (viral, fungal, bacterial, and archaeal). Reads that match neither human nor microbial references are flagged as being of unknown origin so that they can be used for pathogen discovery in downstream analysis (e.g., de novo assembly).
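To make the two phases concrete, here is a toy sketch in Python of subtract-then-classify logic. It is not PathSeq's implementation. It substitutes exact k-mer overlap for read alignment and ignores reverse complements and sequencing errors. The function names, the k-mer size `k`, and the `min_frac` threshold are hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional, Set


def kmers(seq: str, k: int) -> Set[str]:
    # All length-k substrings; empty if the sequence is shorter than k.
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def match_fraction(read: str, ref_kmers: Set[str], k: int) -> float:
    # Fraction of the read's distinct k-mers that also occur in the reference.
    read_kmers = kmers(read, k)
    if not read_kmers:
        return 0.0
    return len(read_kmers & ref_kmers) / len(read_kmers)


def classify(reads: List[str], human: Dict[str, str], microbes: Dict[str, str],
             k: int = 4, min_frac: float = 0.8) -> Dict[str, List[str]]:
    # Hypothetical toy classifier: k-mer overlap stands in for alignment.
    human_kmers: Set[str] = set()
    for seq in human.values():
        human_kmers |= kmers(seq, k)
    microbe_kmers = {name: kmers(seq, k) for name, seq in microbes.items()}

    bins: Dict[str, List[str]] = {"human": [], "unknown": []}
    for name in microbes:
        bins[name] = []

    for read in reads:
        # Subtractive phase: remove reads that look human.
        if match_fraction(read, human_kmers, k) >= min_frac:
            bins["human"].append(read)
            continue
        # Analytic phase: assign the read to the best-matching microbial reference.
        best_name: Optional[str] = None
        best_frac = 0.0
        for name, ref in microbe_kmers.items():
            frac = match_fraction(read, ref, k)
            if frac > best_frac:
                best_name, best_frac = name, frac
        if best_name is not None and best_frac >= min_frac:
            bins[best_name].append(read)
        else:
            # Unidentified reads are kept for downstream discovery (e.g., assembly).
            bins["unknown"].append(read)
    return bins
```

For example, `classify(["ACGTACGT", "GGCCCAAA", "TTTTTTTT"], {"chr_toy": "ACGTACGTTTGACCA"}, {"virus_toy": "GGGCCCAAATTT"})` places the first read in `"human"`, the second in `"virus_toy"`, and the third in `"unknown"`. Checking the human reference first mirrors the subtractive-then-analytic order described above. In this toy, microbial reference names must not collide with the `"human"` and `"unknown"` bins.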

PathSeq is currently available as a tool suite in the Genome Analysis Toolkit (GATK).

The following figure illustrates the typical approach to pathogen discovery with PathSeq. RNA or DNA is extracted from the tissue of interest, and sequencing libraries are constructed and run on the next-generation sequencing platform of choice. The resulting sequence data can then be processed with the GATK PathSeq pipeline in a variety of computing environments. PathSeq reports the microbes potentially present in the sequence data, as well as the complete set of reads that could not be identified as either human or microbial.