Mihai's Research
Below is a list of several ongoing projects in
my lab. Note that not all projects are necessarily active or current.
They all represent research interests of mine, and work in these areas
depends on the availability of funding, time, and "able bodies".
For software packages developed as part of this research see our
Software page.
Some of these
projects provide opportunities for undergraduate research, either as
summer projects or as part of the CS honors program. For more
information on how to apply for such research opportunities see our
Undergraduate
Programs page.
- Metagenomics - analysis of entire
microbial communities
As sequencing technologies are becoming
cheaper and generating more data, it has become feasible to apply
sequencing beyond the study of single organisms. This has led to the
creation of a new scientific field, called metagenomics, that
involves large-scale sequencing of entire microbial communities. For
more information on this field see our Metagenomics
page.
Research in our lab is focused on the following problems:
Metagenomic assembly. Funded under the Human Microbiome
Project, we are developing algorithms and software tools
for assembling metagenomic data, with a particular focus on
uncovering polymorphisms between closely related organisms in
a sample.
Comparison of clinical metagenomic
datasets. We have
developed a statistical package, Metastats, that can
be used to compare clinical datasets comprising large numbers
of samples in order to identify taxonomic groups that
correlate with phenotype (e.g. disease
state).
Understanding the causes of diarrhea
in 3rd world countries. We are involved in a
collaboration with the UM School of Medicine to uncover the
causes of diarrhea in children from 3rd world countries. In
this study we will be analyzing fecal samples from hundreds of
sick children and matched controls.
Systems biology. The ultimate goal of our research is
to develop ways to analyze the interactions between organisms
within an environment or between the environment and the human
host. We would like to develop predictive models that allow us
to explore, for example, how antibiotics will affect the
normal gut flora.
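To make the idea of comparing clinical datasets concrete, the following is a minimal sketch of a generic permutation test on the relative abundance of a single taxonomic group in cases versus controls. It is not the Metastats implementation, whose method is not described on this page. The function name, parameters, and abundance values are all hypothetical and serve only as an illustration.

```python
import random
from statistics import mean


def permutation_test(cases, controls, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in group means.

    `cases` and `controls` hold the relative abundance of one taxon
    in each sample. Hypothetical helper, not part of Metastats.
    """
    rng = random.Random(seed)
    observed = abs(mean(cases) - mean(controls))
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    hits = 0
    for _ in range(n_permutations):
        # Randomly reassign phenotype labels and recompute the statistic.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_cases]) - mean(pooled[n_cases:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Made-up relative abundances of one taxon in sick and healthy samples.
sick = [0.30, 0.28, 0.35, 0.31, 0.33]
healthy = [0.05, 0.07, 0.04, 0.06, 0.08]
p_value = permutation_test(sick, healthy)
print(f"estimated p-value: {p_value:.4f}")
```

In a real study this test would be repeated for every taxonomic group, so the resulting p-values would also need a correction for multiple testing.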
- Extracting
information from genome assembly graphs
We are
developing algorithms for extracting information about
interesting biological phenomena (such as genome variation)
from genome assembly graphs. This work is related to the
metagenomics research described above but also has
applications to the study of polymorphic organisms.
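As an illustration of how variation can show up in an assembly graph, here is a small sketch that builds a de Bruijn graph from two sequences and reports "bubbles", that is, places where the graph branches and the branches later rejoin. This is an assumption-laden toy, not the lab's algorithm: it assumes error-free sequences, a de Bruijn representation, and branch points with exactly two successors. All function names and the example sequences are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(sequences, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it."""
    graph = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk(graph, node, max_steps):
    """Follow a path from `node` while each node has exactly one successor."""
    path = [node]
    while len(path) < max_steps and len(graph.get(node, ())) == 1:
        (node,) = graph[node]
        path.append(node)
    return path


def find_simple_bubbles(graph, max_steps=50):
    """Return (start, merge, branch_a, branch_b) for two-way branches that rejoin."""
    bubbles = []
    for start, successors in graph.items():
        if len(successors) != 2:
            continue
        first, second = (walk(graph, s, max_steps) for s in sorted(successors))
        seen = set(second)
        # The first node shared by both branches is where they rejoin.
        merge = next((n for n in first if n in seen), None)
        if merge is not None:
            branch_a = first[:first.index(merge)]
            branch_b = second[:second.index(merge)]
            bubbles.append((start, merge, branch_a, branch_b))
    return bubbles


# Two made-up sequences that differ by a single substitution (T vs. A).
reference = "ATGGCGTACCTGA"
variant = "ATGGCGAACCTGA"
graph = build_de_bruijn([reference, variant], k=4)
for start, merge, branch_a, branch_b in find_simple_bubbles(graph):
    print(start, "->", branch_a, "|", branch_b, "->", merge)
```

With real sequencing data, sequencing errors and repeats produce similar structures, so a practical tool would also need coverage information to tell true variation from noise.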
- Applications
of Cloud Computing to Bioinformatics
New DNA sequencing technologies are generating large amounts
of data at a significantly higher pace than was possible just a few
years ago. The analysis of new generation sequencing data
poses significant computational challenges, both due to the
sheer size of the data-sets being analyzed and due to
individual characteristics of the new sequences. We are
currently conducting research to evaluate whether
highly-parallel computing clusters can be used to efficiently
analyze such data, with the goal of providing researchers with
the ability to rent CPU cycles rather than have to implement
and maintain an expensive computational infrastructure in
their labs. We are primarily focused on algorithms for
sequence alignment and for genome assembly.
- Genome assembly using new
sequencing and mapping technologies
Here we are focused
on developing new ways to leverage the characteristics of new
experimental technologies in order to reconstruct the genomes
of organisms. In particular we are working on new assembly
algorithms for short-read sequencing data (also see the Genome
Assembly with Short Reads page), and on ways to
incorporate other experimental data, such as mate-pair
information (see our scaffolder Bambus) or
optical mapping data (see Scaffolding with Optical
Maps).
- Understanding antibiotic
resistance
The ability of microorganisms to acquire
resistance to antibiotics is a significant threat to human health.
Surprisingly, we found that little information on antibiotic
resistance was available in a curated, computer-readable format. To
enable computational analyses of antibiotic resistance in both
isolate genomes and metagenomes, we created a database of all
information we could easily extract from literature and other public
databases - ARDB. This
database is freely available to all scientists, both through the web
and as a flat-file download from
ftp://ftp.cbcb.umd.edu/pub/data/ARDB.
- Prokaryotic genome
annotation
Together with colleagues at the NMRC we have
developed a modular prokaryotic annotation pipeline, primarily for
use in various genome projects we are involved in, but also as a
framework for exploring research questions regarding the functional
annotation of genomes and metagenomes. The software is available,
open-source, from http://sourceforge.net/projects/diyg.
- Semi-automated genome
finishing
This work is an unexpected result of our
research on genome assembly and on incorporating new types of data
in the assembly process. We noticed that mate-pair information,
optical mapping data, as well as other information generated during
the genome assembly process (specifically assembly graphs) can be
used to improve genome assemblies (e.g. by resolving certain classes
of repeats) as well as to guide the design of experiments aimed at
finishing genomes. We have successfully applied some of these ideas
to the finishing of Aggregatibacter
aphrophilus and Vibrio harveyi, and we are
currently in the process of finishing Yersinia rohdei and
Yersinia ruckeri.
- Basic research on assembly
complexity
Research on genome assembly algorithms has been
ongoing since the late 1980s and early 1990s, including a fair amount of
research on the computational complexity of various related
problems. New research opportunities have been created, however, by
the emergence of new sequencing technologies and other
high-throughput experimental techniques. I am particularly
interested in research at the boundary between theory and practice,
e.g. exploring the complexity of various assembly problems when
faced with real data-sets, rather than the worst-case scenarios
usually assumed in complexity analysis. Furthermore, I am
interested in how information from sequencing and mapping
technologies can be incorporated in the assembly process, and
whether such information can simplify the assembly problem.