I am an Assistant Professor in the Department of Computer Science in the Whiting School of Engineering at Johns Hopkins University. I am jointly appointed in the Department of Biostatistics in the Bloomberg School of Public Health, and I am affiliated with the McKusick-Nathans Institute of Genetic Medicine.
High-throughput life science instruments, especially DNA sequencers, are improving very rapidly. A DNA sequencer can now generate enough data to cover the human genome dozens of times over in about a week. Consequently, sequencing is a ubiquitous tool in the study of biology, genetics and disease. But because sequencing throughput is outpacing growth in computer speed and storage capacity, the most pressing bottlenecks in biological research are increasingly computational: compute time, storage, labor, and power.
My laboratory's goal is to make high-throughput life science data as useful as possible to everyday life scientists. We pursue this goal in three ways:
- We develop methods and software tools that are efficient, allowing researchers to interact with datasets quickly and effectively. See: Bowtie, Bowtie 2.
- We develop scalable tools that allow researchers to work with very large datasets, or large collections of datasets. See: Crossbow, Myrna, ReCount.
- We work on making the output from our software as interpretable as possible, meaning that when a life scientist looks at a result, he or she should be able to understand what conclusion was reached, why it was reached, and how much confidence to place in it.
We combine approaches from computer science -- algorithms, text indexing, and high-performance computing, especially cloud computing -- with approaches from statistics to create high-impact software tools (see sidebar) that benefit the wide community of life scientists who rely on these data for their research.
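As one illustration of the text-indexing ideas mentioned above, the sketch below shows Burrows-Wheeler backward search, the core technique behind FM-index-based read aligners such as Bowtie. This is a deliberately simplified, assumption-laden toy (naive BWT construction, linear-time rank queries), not Bowtie's actual implementation; function names here are illustrative only.

```python
# Simplified sketch of Burrows-Wheeler backward search, the text-indexing
# idea underlying FM-index-based aligners like Bowtie. Illustrative only;
# a real index precomputes rank structures and a suffix-array sample.

def bwt(text):
    """Burrows-Wheeler transform of text, with terminator '$' appended."""
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)

def backward_search(bw, pattern):
    """Count occurrences of pattern in the original text via LF-mapping."""
    # C[c] = number of characters in bw lexicographically smaller than c
    counts = {}
    for c in bw:
        counts[c] = counts.get(c, 0) + 1
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]

    def occ(c, i):
        # Occurrences of c in bw[:i]; a real index answers this in O(1).
        return bw[:i].count(c)

    # Maintain the half-open range [lo, hi) of suffix-array rows whose
    # suffixes begin with the (growing) pattern suffix.
    lo, hi = 0, len(bw)
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ(c, lo)
        hi = C[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo

genome = "ACAACG"
index = bwt(genome)
print(backward_search(index, "AC"))  # prints 2: matches at positions 0 and 2
```

The key property is that each character of the query is processed with a constant number of rank queries on the index, so matching time depends on the query length rather than the genome length; this is what makes index-based alignment of billions of short reads practical.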