Mihai's Research

Below is a list of several projects in my lab. Note that not all of them are active or current; they all reflect my research interests, and work in these areas depends on the availability of funding, time, and "able bodies". For software packages developed as part of this research, see our Software page.

Some of these projects provide opportunities for undergraduate research, either as summer projects or as part of the CS honors program. For more information on how to apply for such research opportunities, see our Undergraduate Programs page.

  • Metagenomics - analysis of entire microbial communities
    As sequencing technologies become cheaper and generate more data, it has become feasible to apply sequencing beyond the study of single organisms. This has led to the creation of a new scientific field, called metagenomics, which involves large-scale sequencing of entire microbial communities. For more information on this field, see our Metagenomics page.

    Research in our lab is focused on the following problems:
    Metagenomic assembly. Funded under the Human Microbiome Project, we are developing algorithms and software tools for assembling metagenomic data, with a particular focus on uncovering polymorphisms between closely related organisms in a sample.
    Comparison of clinical metagenomic datasets. We have developed a statistical package, Metastats, that can be used to compare clinical datasets comprising large numbers of samples in order to identify taxonomic groups that correlate with phenotype (e.g. disease state).
    Understanding the causes of diarrhea in developing countries. We are involved in a collaboration with the UM School of Medicine to uncover the causes of diarrhea in children from developing countries. In this study we will be analyzing fecal samples from hundreds of sick children and matched controls.
    Systems biology. The ultimate goal of our research is to develop ways to analyze the interactions between organisms within an environment or between the environment and the human host. We would like to develop predictive models that allow us to explore, for example, how antibiotics will affect the normal gut flora.
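The dataset comparison described above can be illustrated with a minimal per-taxon test. The sketch below is not the Metastats implementation: it swaps in a simple two-sided permutation test on the difference in mean relative abundance, followed by Benjamini-Hochberg false-discovery-rate correction across taxa. The function names and the input layout (for each group, a dict from taxon name to per-sample relative abundances, with the same taxa in both groups) are hypothetical choices made for this example.

```python
import random
from statistics import mean


def permutation_pvalue(group_a, group_b, n_perm=1000, seed=0):
    """Two-sided permutation test on the difference in mean relative abundance."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign samples to the two groups
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def compare_groups(abundances_a, abundances_b, n_perm=1000):
    """Test every taxon; both dicts map taxon -> per-sample relative abundances."""
    taxa = sorted(abundances_a)  # assumes both groups report the same taxa
    pvals = [permutation_pvalue(abundances_a[t], abundances_b[t], n_perm) for t in taxa]
    qvals = benjamini_hochberg(pvals)
    return {t: (p, q) for t, p, q in zip(taxa, pvals, qvals)}
```

Calling `compare_groups(cases, controls)` returns a (p-value, q-value) pair per taxon; taxa with small q-values are candidates for association with the phenotype. With few samples per group, the smallest attainable p-value is limited by the number of distinct ways to relabel the samples.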

  • Extracting information from genome assembly graphs
    We are developing algorithms for extracting information about interesting biological phenomena (such as genome variation) from genome assembly graphs. This work is related to the metagenomics research described above, but also has applications to the study of polymorphic organisms.
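One simple structure that genome variation can leave in an assembly graph is a "bubble": two paths that diverge from one node and reconverge at another, as happens, for example, when two closely related strains differ by a single nucleotide. The sketch below assumes a directed graph stored as a dict from each node to its list of successors and reports only simple, non-nested two-branch bubbles; the function name and graph representation are hypothetical and not taken from our software.

```python
from collections import defaultdict


def find_simple_bubbles(edges):
    """Return (source, sink, branch1, branch2) tuples for simple bubbles.

    edges: dict mapping node -> list of successor nodes.
    A branch is followed only through nodes with exactly one predecessor
    and one successor, so nested or complex bubbles are not reported.
    """
    indeg = defaultdict(int)
    for succs in edges.values():
        for v in succs:
            indeg[v] += 1

    def walk(start):
        path, seen = [start], {start}
        node = start
        while indeg[node] == 1 and len(edges.get(node, [])) == 1:
            node = edges[node][0]
            if node in seen:  # guard against cycles
                break
            path.append(node)
            seen.add(node)
        return path  # the last element is where the branch stops

    bubbles = []
    for source, succs in edges.items():
        if len(succs) < 2:
            continue
        ends = {}
        for v in succs:
            path = walk(v)
            # Group branches by the node where they stop (candidate sink).
            ends.setdefault(path[-1], []).append(path[:-1])
        for sink, branches in ends.items():
            if len(branches) >= 2:
                bubbles.append((source, sink, branches[0], branches[1]))
    return bubbles
```

For a graph where `s` branches into `a1 -> a2` and `b1 -> b2`, both ending at `t`, the function reports one bubble between `s` and `t`; the two branches would correspond to the two variant sequences.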

  • Applications of Cloud Computing to Bioinformatics
    New DNA sequencing technologies are generating large amounts of data at a significantly higher pace than was possible just a few years ago. The analysis of next-generation sequencing data poses significant computational challenges, due both to the sheer size of the data sets being analyzed and to the specific characteristics of the new sequence reads. We are currently conducting research to evaluate whether highly parallel computing clusters can be used to efficiently analyze such data, with the goal of allowing researchers to rent CPU cycles rather than having to build and maintain an expensive computational infrastructure in their labs. We are primarily focused on algorithms for sequence alignment and genome assembly.
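To illustrate how such analyses can be split across many machines, the sketch below counts k-mers (substrings of length k, which appear in many alignment and assembly algorithms) in the MapReduce style: a map step that processes each read independently, a shuffle that groups the emitted pairs by key, and a reduce step that sums them. Everything runs locally here, whereas on a real cluster the framework would distribute the map and reduce calls; the function names are hypothetical and this is not our software.

```python
from collections import defaultdict
from itertools import chain


def map_kmers(read, k):
    """Map phase: emit (k-mer, 1) for every k-mer in one read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1


def shuffle_by_key(pairs):
    """Shuffle phase: group values by key (done by the framework on a cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce phase: sum the values emitted for each k-mer."""
    return {key: sum(values) for key, values in groups.items()}


def count_kmers(reads, k):
    # Each read could be mapped on a different machine; here they are chained locally.
    emitted = chain.from_iterable(map_kmers(read, k) for read in reads)
    return reduce_counts(shuffle_by_key(emitted))
```

Because each read is mapped independently and each k-mer is reduced independently, both phases parallelize naturally; the shuffle is where data moves between machines.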

  • Genome assembly using new sequencing and mapping technologies
    Here we are focused on developing new ways to leverage the characteristics of new experimental technologies in order to reconstruct the genomes of organisms. In particular, we are working on new assembly algorithms for short-read sequencing data (also see the Genome Assembly with Short Reads page), and on ways to incorporate other experimental data, such as mate-pair information (see our scaffolder Bambus) or optical mapping data (see Scaffolding with Optical Maps).
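The short-read algorithms mentioned above are not spelled out here; as a generic illustration, many short-read assemblers are built on a de Bruijn graph, sketched below under simplifying assumptions (error-free reads, a single strand, no coverage information). The function names are hypothetical, and this is a textbook construction rather than our assembler.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Nodes are (k-1)-mers; every k-mer in a read adds an edge
    from its (k-1)-prefix to its (k-1)-suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def spell_unambiguous_path(graph):
    """Return the sequence spelled by the graph if it is a single
    non-branching path, otherwise None (branches often signal repeats)."""
    nodes = set(graph)
    indeg = defaultdict(int)
    for succs in graph.values():
        for v in succs:
            indeg[v] += 1
            nodes.add(v)
    if any(len(s) > 1 for s in graph.values()) or any(d > 1 for d in indeg.values()):
        return None
    starts = [n for n in nodes if indeg[n] == 0]
    if len(starts) != 1:
        return None
    node = starts[0]
    sequence, visited = node, 1
    while graph.get(node):
        node = next(iter(graph[node]))
        sequence += node[-1]  # each step adds one new base
        visited += 1
    # A separate cycle would leave some nodes unvisited.
    return sequence if visited == len(nodes) else None
```

For example, the reads `ATGGC` and `GGCTA` with k = 4 form a single path that spells `ATGGCTA`, while a read containing a repeated 2-mer such as `ATGATGC` with k = 3 creates a branch and the function returns None.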

  • Understanding antibiotic resistance
    The ability of microorganisms to acquire resistance to antibiotics is a significant threat to human health. Surprisingly, we found that little information on antibiotic resistance was available in a curated, computer-readable format. To enable computational analyses of antibiotic resistance in both isolate genomes and metagenomes, we created ARDB, a database of all the information we could easily extract from the literature and from other public databases. This database is freely available to all scientists, both through the web and as a flat-file download from ftp://ftp.cbcb.umd.edu/pub/data/ARDB.

  • Prokaryotic genome annotation
    Together with colleagues at the NMRC, we have developed a modular prokaryotic annotation pipeline, primarily for use in the various genome projects we are involved in, but also as a framework for exploring research questions regarding the functional annotation of genomes and metagenomes. The software is open source and available from http://sourceforge.net/projects/diyg.
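Gene finding is typically one of the first stages of a prokaryotic annotation pipeline. As a deliberately simplified illustration (not the gene caller used in the pipeline above), the sketch below scans the three forward reading frames for open reading frames that start with ATG and end at the first in-frame stop codon. Real gene finders also scan the reverse strand, consider alternative start codons, and score candidates with statistical models of coding sequence. The function name and the `min_codons` threshold are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=100):
    """Return sorted (start, end) 0-based, end-exclusive coordinates of
    forward-strand ORFs running from ATG through the first in-frame stop."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG in this frame opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:  # count includes the stop
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)
```

For example, `find_orfs("CCATGAAATTTTAGCC", min_codons=2)` finds the ORF `ATG AAA TTT TAG` at coordinates (2, 14).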

  • Semi-automated genome finishing
    This research grew unexpectedly out of our work on genome assembly and on incorporating new types of data into the assembly process. We noticed that mate-pair information, optical mapping data, and other information generated during the genome assembly process (specifically assembly graphs) can be used to improve genome assemblies (e.g. by resolving certain classes of repeats) as well as to guide the design of experiments aimed at finishing genomes. We have successfully applied some of these ideas to the finishing of Aggregatibacter aphrophilus and Vibrio harveyi, and we are currently in the process of finishing Yersinia rohdei and Yersinia ruckeri.
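As one concrete example of how mate-pair information can inform finishing experiments, mate pairs that link two oriented contigs yield an estimate of the gap between them, which could, for instance, indicate the expected size of a PCR product spanning the gap. The sketch below assumes each link is given as two distances (from the outer end of the read on contig A to A's end, and from the start of contig B to the outer end of the read on B) and that the library insert size is known; the names are hypothetical and this is not the Bambus implementation.

```python
from statistics import mean, pstdev


def estimate_gap(links, insert_size):
    """Estimate the gap between contigs A and B (oriented A then B).

    links: list of (dist_a, dist_b) pairs, one per mate pair, where dist_a is
    the distance from the outer end of the read on A to the end of A, and
    dist_b is the distance from the start of B to the outer end of the read on B.
    Each pair implies gap = insert_size - dist_a - dist_b; a negative value
    suggests that the contigs overlap.
    """
    if not links:
        raise ValueError("at least one mate-pair link is required")
    estimates = [insert_size - dist_a - dist_b for dist_a, dist_b in links]
    # The spread across links reflects variation in the library insert size.
    return mean(estimates), pstdev(estimates)
```

With a 3,000 bp library and links (1200, 1500), (1100, 1500), and (1300, 1500), the per-link estimates are 300, 400, and 200 bp, giving a mean gap of 300 bp.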

  • Basic research on assembly complexity
    Research on genome assembly algorithms has been ongoing since the late 1980s and early 1990s, including a fair amount of research on the computational complexity of various related problems. However, the emergence of new sequencing technologies and other high-throughput experimental techniques has created new research opportunities. I am particularly interested in research at the boundary between theory and practice, e.g. exploring the complexity of various assembly problems when faced with real data sets, rather than the worst-case scenarios usually assumed in complexity analysis. Furthermore, I am interested in how information from sequencing and mapping technologies can be incorporated into the assembly process, and whether such information can simplify the assembly problem.
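One classical source of assembly ambiguity in real data is repeats that no single read can span. As a rough, illustrative check rather than a complexity result, the sketch below finds the length of the longest exact repeat in a sequence by binary search over repeat length (valid because any repeat of length L contains a repeat of length L - 1) and compares it to the read length. The function names are hypothetical, and the "repeat plus one flanking base on each side" criterion is a simplifying assumption; the precise conditions under which a genome can be reconstructed uniquely are more subtle and depend, for example, on how repeats are arranged.

```python
def has_repeat_of_length(genome, length):
    """True if some substring of the given length occurs at least twice."""
    seen = set()
    for i in range(len(genome) - length + 1):
        sub = genome[i:i + length]
        if sub in seen:
            return True
        seen.add(sub)
    return False


def longest_repeat(genome):
    """Length of the longest exact repeat (overlapping copies allowed)."""
    lo, hi = 0, len(genome) - 1
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if has_repeat_of_length(genome, mid):
            lo = mid  # a repeat this long exists; try longer
        else:
            hi = mid - 1
    return lo


def has_unbridgeable_repeat(genome, read_length):
    # Simplifying assumption: a read bridges a repeat only if it covers the
    # repeat plus at least one flanking base on each side.
    return longest_repeat(genome) + 2 > read_length
```

For example, in `ACGTTTACGTA` the longest exact repeat is `ACGT` (length 4), so under this criterion reads of length 6 could bridge it but reads of length 5 could not.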