Assembly and Analysis Software for Exploring the Human Microbiome
People
Faculty: Mihai Pop, Steven Salzberg
Post-doctoral fellows: Niranjan Nagarajan, Arthur Brady
Students: Sergey Koren, Mohammad Ghodsi, Bo Liu, James White, Ted Gibbons
Funding
Our work is supported by the NIH through grant R01-HG-004885 to
Mihai Pop.
Metagenomic assembly
The main challenge in metagenomic assembly arises from the
heterogeneous nature of metagenomic data. Most environments contain
an uneven representation of the member species, and furthermore, the
organisms in the environment frequently belong to clusters of closely
related strains whose genomes are largely similar but differ due to
mobile genetic elements and point mutations. These characteristics of
the data make it virtually impossible to construct a single assembly
of each organism present in a sample. Instead, many organisms will be
under-sampled and will be assembled in a highly fragmented form, while
groups of closely related organisms will end up assembled together
into a polymorphic structure that can be modeled as a computational
graph.
We are currently exploring several approaches for analyzing and
visualizing metagenomic assembly graphs, including procedures for
graph simplification, for detection of genomic polymorphisms (work related
to our research on the analysis of genomic variation from assembly
information), and new approaches for repeat identification and resolution.
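As an illustration of polymorphism detection in an assembly graph, the
sketch below looks for the simplest "bubble" pattern: a node whose
outgoing branches each pass through a single intermediate node and then
reconverge on a common node. This is a minimal sketch under stated
assumptions, not the procedure used in this project. The function name
find_simple_bubbles, the edge-list representation, and the restriction
to one node per branch are all hypothetical choices made for the example.

```python
def find_simple_bubbles(edges):
    """Return (source, sink, branches) for each simple bubble.

    `edges` is an iterable of (from_node, to_node) pairs describing a
    directed graph. A simple bubble here means: `source` has at least two
    successors, each of which has exactly one predecessor and exactly one
    successor, and all of them lead to the same `sink`.
    """
    succ, pred = {}, {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
        pred.setdefault(v, set()).add(u)

    bubbles = []
    for source in sorted(succ):
        branches = sorted(succ[source])
        if len(branches) < 2:
            continue  # no divergence at this node
        # Each branch node must be an unbranched pass-through.
        if any(len(pred.get(b, ())) != 1 or len(succ.get(b, ())) != 1
               for b in branches):
            continue
        sinks = {next(iter(succ[b])) for b in branches}
        if len(sinks) == 1:  # all branches reconverge
            bubbles.append((source, sinks.pop(), branches))
    return bubbles


# Toy graph (made up): contigs B and C are alternative variants
# between A and D.
toy_edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
print(find_simple_bubbles(toy_edges))
```

In a real assembly graph the branches would usually span several nodes
and carry coverage information, so a practical detector would need to
follow paths rather than single nodes.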
Metagenomic gene finding
Gene finding in metagenomic datasets is complicated by the fragmented
nature of metagenomic assemblies, and by the fact that many organisms
are only poorly sampled, which can lead to further fragmentation and,
because of high error rates, to frame-shifts. We are working on extensions of
the Glimmer gene finder to accommodate these characteristics of
metagenomic data.
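To make the fragmentation problem concrete, the sketch below scans the
three forward reading frames of a fragment for stop-free stretches and
flags those that touch a fragment edge, since such a stretch may be only
part of a longer gene. It is a simplified illustration and not how
Glimmer works: it ignores start codons, codon statistics, and
frame-shifts. The names open_frames and reverse_complement, the min_len
parameter, and the toy sequence are assumptions made for the example.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def open_frames(seq, min_len=60):
    """Yield (start, end, frame, truncated) for stop-free stretches.

    Coordinates are 0-based, end-exclusive, and exclude the stop codon.
    `truncated` is True when the stretch reaches the left or right edge
    of the fragment without meeting a stop codon on that side.
    """
    seq = seq.upper()
    for frame in range(3):
        start = frame
        # Position just past the last complete codon in this frame.
        last_codon_end = frame + ((len(seq) - frame) // 3) * 3
        for pos in range(frame, last_codon_end, 3):
            if seq[pos:pos + 3] in STOPS:
                if pos - start >= min_len:
                    # Truncated only if no upstream stop was seen.
                    yield (start, pos, frame, start == frame)
                start = pos + 3
        if last_codon_end - start >= min_len:
            # Ran off the right edge without meeting a stop codon.
            yield (start, last_codon_end, frame, True)


# Toy fragment (made up): one complete stop-to-stop stretch in frame 0.
fragment = "TAAATGAAACCCTAG"
for hit in open_frames(fragment, min_len=6):
    print(hit)
```

The reverse strand can be scanned the same way by passing
reverse_complement(fragment) and mapping the coordinates back.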
Metagenomic binning
We have developed a metagenomic binning program specifically targeted
at short DNA fragments (such as reads). This program, called Phymm, uses the
Interpolated Markov Model framework from Glimmer to accurately
classify reads as short as 100 bp. We are currently exploring whether
binning reads prior to assembly can improve the quality of metagenomic
analysis.
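The general idea of composition-based binning can be sketched with a
much simpler model than the one Phymm uses. The example below trains one
fixed-order Markov model per reference sequence and assigns a read to
the model that gives it the highest log-likelihood. An Interpolated
Markov Model combines several context lengths, whereas this sketch uses
a single fixed order, so it should be read as an illustration only. The
function names, the pseudocount smoothing, and the toy training
sequences are all assumptions and are not taken from Phymm.

```python
import math
from collections import defaultdict


def train_markov(seq, order=3, pseudocount=1.0):
    """Count how often each base follows each length-`order` context."""
    counts = defaultdict(lambda: defaultdict(float))
    for i in range(order, len(seq)):
        counts[seq[i - order:i]][seq[i]] += 1
    return {"order": order, "counts": counts, "pseudocount": pseudocount}


def log_likelihood(model, read):
    """Log-probability of `read` under the model (4-letter alphabet)."""
    k, alpha = model["order"], model["pseudocount"]
    total = 0.0
    for i in range(k, len(read)):
        context = model["counts"].get(read[i - k:i], {})
        numerator = context.get(read[i], 0.0) + alpha
        denominator = sum(context.values()) + 4 * alpha
        total += math.log(numerator / denominator)
    return total


def classify(models, read):
    """Return the name of the model that scores the read highest."""
    return max(models, key=lambda name: log_likelihood(models[name], read))


# Toy "genomes" (made up) with very different base composition.
models = {
    "AT_rich": train_markov("ATATATTTAAATATATAATTAAT" * 5, order=2),
    "GC_rich": train_markov("GCGCGGCCGCGCCGGCGCGC" * 5, order=2),
}
print(classify(models, "ATTATAATAT"))
print(classify(models, "GCGGCGCCGC"))
```

Real references are far less distinct than these toy sequences, which
is one reason short reads are hard to classify with a single fixed
context length.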
Research sub-projects
Publications
Software
- AMOS - open-source genome assembly framework. The assembly software
developed in this project will be incorporated within AMOS.
- Phymm - metagenomic binning software.