I am a Research Associate in the Biostatistics department at the Johns Hopkins Bloomberg School of Public Health. I work at the intersection of computer science and genomics, especially genomics as it relates to second-generation DNA sequencing (see 1, 2). Second-generation DNA sequencing instruments are improving rapidly and are now capable of sequencing hundreds of billions of nucleotides, enough to cover the human genome hundreds of times over, in about a week for a few thousand dollars (see 1, 2). Consequently, sequencing is now a common tool in the study of molecular biology, genetics, and human disease.
But with these developments comes a problem: per-sequencer throughput, currently growing about 4-fold per year, is outpacing growth in computer speed. As the throughput gap widens over time, the crucial research bottlenecks are increasingly computational: computing, storage, labor, and power (see 1, 2). My goal is to address these computational problems using approaches from computer science, including text indexing, approximate string matching, and data-intensive computing.
At Johns Hopkins, I collaborate with biologists, biostatisticians, and other computer scientists to develop efficient methods for analyzing second-generation sequencing data.
I defended my Ph.D. at the University of Maryland Department of Computer Science in February 2012 and officially graduate in May. My Ph.D. advisor is Steven L. Salzberg. I also received an M.Sc. from the University of Maryland in 2009, co-advised by Mihai Pop and Steven L. Salzberg.
As a graduate student, I leveraged my background in high-performance software and hardware into research toward high-performance solutions for contemporary genome-sequence analysis problems, especially the short read alignment problem. For my Master's thesis, I wrote and released an extremely efficient short read aligner, Bowtie, in collaboration with fellow CBCB graduate student Cole Trapnell and others. I also explored ways of adapting Bowtie and other algorithms to a cloud computing environment. That work culminated in Crossbow, a scalable software pipeline for whole genome resequencing analysis capable of running automatically on clusters rented from the Amazon AWS cloud computing service. Crossbow was created in collaboration with fellow student Mike Schatz and others.
I received my B.A. in Computer Science from Columbia University in June 2003. I then worked full-time for more than four years at Reservoir Labs, a small computer science R&D consulting firm in New York. While there, I worked for a diverse set of clients, contributing expertise in high-performance software and compilers.

