
I am a Research Associate in the Biostatistics department at the Johns Hopkins Bloomberg School of Public Health. I work at the intersection of computer science and genomics, especially genomics as it relates to second-generation DNA sequencing (see 1, 2). Second-generation DNA sequencing instruments are improving rapidly and can now sequence hundreds of billions of nucleotides, enough to cover the human genome hundreds of times over, in about a week for a few thousand dollars (see 1, 2). Consequently, sequencing is now a common tool in the study of molecular biology, genetics, and human disease.

But with these developments comes a problem: per-sequencer throughput, currently growing about 4-fold per year, is outpacing growth in computer speed. As the throughput gap widens, the crucial research bottlenecks are increasingly computational: computing, storage, labor, and power (see 1, 2). My goal is to address these computational problems using approaches from computer science, including text indexing, approximate string matching, and data-intensive computing.

At Johns Hopkins, I collaborate with biologists, biostatisticians, and other computer scientists to develop efficient methods for analyzing second-generation sequencing data.

I defended my Ph.D. at the University of Maryland Department of Computer Science in February 2012 and officially graduate in May. My Ph.D. advisor is Steven L. Salzberg. I also received an M.Sc. from the University of Maryland in 2009, co-advised by Mihai Pop and Steven L. Salzberg. As a graduate student, I applied my background in high-performance software and hardware to research on high-performance solutions for contemporary genome-sequence analysis problems, especially the short read alignment problem. For my Master's thesis, I wrote and released an extremely efficient short read aligner, Bowtie, in collaboration with fellow CBCB graduate student Cole Trapnell and others. I also explored ways of adapting Bowtie and other algorithms to a cloud computing environment. That work culminated in Crossbow, a scalable software pipeline for whole genome resequencing analysis capable of running automatically on clusters rented from the Amazon AWS cloud computing service. Crossbow was created in collaboration with fellow student Mike Schatz and others.

I received my B.A. in Computer Science from Columbia University in June 2003. I then worked full-time for more than 4 years at Reservoir Labs, a small Computer Science R&D consulting firm in New York. While there, I worked for a diverse set of clients, contributing expertise in high-performance software and compilers.