BlastReduce: High Performance Short Read Mapping with MapReduce
Next-generation DNA sequencing machines generate sequence data at an
unprecedented rate, but traditional single-processor sequence alignment
algorithms are struggling to keep pace with them. BlastReduce is a new
parallel read mapping algorithm optimized for aligning sequence data from
those machines to reference genomes, for use in a variety of biological
analyses, including SNP discovery, genotyping, and personal genomics. It is
modeled after the widely used BLAST sequence alignment algorithm, but uses the
open-source Hadoop implementation of MapReduce to parallelize execution across
multiple compute nodes. To evaluate its performance, BlastReduce was used to
map next-generation sequence data to a reference bacterial genome in a variety
of configurations. The results show that BlastReduce scales linearly with the
number of sequences processed and achieves high speedup as the number of processors
increases. Furthermore, BlastReduce is fully compatible with cloud
computing, and can be easily executed on massively parallel remote resources
to meet peak demand. BlastReduce is available as open-source software at:
http://www.cbcb.umd.edu/software/blastreduce/.
Source Code coming soon
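
The abstract describes a BLAST-style read mapper parallelized with MapReduce but does not give implementation details. The following is a minimal sketch of how such a seed-based mapping could be phrased as map, shuffle, and reduce steps. It is not the BlastReduce code: the function names, the seed length, the in-memory shuffle standing in for Hadoop, and the simple ungapped mismatch count used in place of a BLAST-style extension are all assumptions made for illustration.

```python
from collections import defaultdict

SEED_LEN = 4  # hypothetical seed length, chosen small for the toy example


def map_kmers(seq_id, seq, is_ref):
    """Map step: emit (seed, (seq_id, offset, is_ref)) for every k-mer."""
    for i in range(len(seq) - SEED_LEN + 1):
        yield seq[i:i + SEED_LEN], (seq_id, i, is_ref)


def shuffle(pairs):
    """Stand-in for the framework's shuffle: group values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_seed(hits):
    """Reduce step: for one seed, pair each reference occurrence with each
    read occurrence and emit the implied alignment start on the reference."""
    ref_offsets = [off for (_, off, is_ref) in hits if is_ref]
    read_hits = [(sid, off) for (sid, off, is_ref) in hits if not is_ref]
    for ref_off in ref_offsets:
        for read_id, read_off in read_hits:
            yield read_id, ref_off - read_off


def map_reads(reference, reads, max_mismatches=1):
    """Run the toy pipeline and return (read_id, start, mismatches) tuples."""
    pairs = list(map_kmers("ref", reference, True))
    for read_id, seq in reads.items():
        pairs.extend(map_kmers(read_id, seq, False))

    # Collect distinct candidate placements from all shared seeds.
    candidates = set()
    for hits in shuffle(pairs).values():
        candidates.update(reduce_seed(hits))

    # Verify each candidate with an ungapped comparison (an assumption;
    # a real mapper would use a more capable extension step).
    alignments = []
    for read_id, start in sorted(candidates):
        read = reads[read_id]
        if start < 0 or start + len(read) > len(reference):
            continue
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            alignments.append((read_id, start, mismatches))
    return alignments


if __name__ == "__main__":
    reference = "ACGTTGCATGCCAGT"
    reads = {"r1": "TTGCATGC", "r2": "CATGACAG"}
    for alignment in map_reads(reference, reads):
        print(alignment)
```

Because each seed is processed independently in the reduce step, the work can in principle be spread across many nodes, which is the property the abstract attributes to the MapReduce formulation.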