|
Note: The program has been discontinued and superceeded by MUMmerGPU. Similar to Cmatch,
MUMmerGPU aligns reads in parallel on the GPU, but MUMmerGPU can compute
substring alignments in addition to end-to-end alignments computed by
Cmatch.
Our program, Cmatch, performs exact string matching for a set of query
sequences and achieves a speedup of as much as 35x on a recent GPU over
the equivalent CPU-bound version. String matching has a long history in
computational biology with roots in finding similar proteins and gene sequences
in a database of known sequences. The explosion in sequence data available
in the 80s and 90s motivated the development of ever faster techniques for
searching for similar sequences, and ultimately lead the use of parallelized
execution of string matching algorithms using sophisticated data structures
called suffix trees. Suffix trees can be constructed time proportional to
the length of the corpus, and provide exact matching of a query in time
proporional to the length of the query, completely independent of the size
of the corpus. Here, we present our string-matching kernel for use in the
Compute Unified Device Architecture, which executes parallelized searching
of a suffix tree for finding exact matches for a set of query strings. We
compare our GPGPU suffix tree search to a serial CPU version of the algorithm,
and analogous components of the widely used CPU program MUMmer, and explore
issues associated with storing a suffix tree in a graphics card’s memory,
and data distribution among the GPU’s processing units.
Paper
Powerpoint Presentation (PDF)
Source Code
Sample Data
|