I am an Assistant Professor in the Department of Computer Science in the Whiting School of Engineering at Johns Hopkins University. I am jointly appointed in the Department of Biostatistics in the Bloomberg School of Public Health, and I am affiliated with the McKusick-Nathans Institute of Genetic Medicine.
DNA sequencers are improving rapidly and are now capable of generating enough data to cover the human genome dozens of times over in about a week. Consequently, sequencing is now a ubiquitous tool in the study of biology, genetics, and disease. But because sequencing throughput is outpacing growth in computer speed and storage capacity, the most crucial bottlenecks in biological research are increasingly computational: computing, storage, labor, and power.
My goal is to make all types of high-throughput biological data, especially sequencing data, easy to analyze and interpret. I use approaches from computer science -- algorithms, text indexing, and high-performance computing, especially cloud computing -- to create high-impact software tools (see sidebar) that benefit the wide community of scientists who rely on these data for their research. Because it is reasonable to abstract DNA and RNA sequences as strings, and because most interesting questions in genomics ultimately boil down to questions of sequence similarity and statistics, computer scientists are uniquely positioned to drive biology and genomics research forward.
At Johns Hopkins University, I collaborate with biologists, biostatisticians, and other computer scientists to develop efficient methods for analyzing second-generation sequencing data.