Genome Sequencing
CBCB scientists are involved in many different genome sequencing
projects, both as principal investigators and as collaborators.
Most of the DNA sequencing projects that we participate in are based at
The Institute for Genomic Research, a large, highly efficient
production facility. We also
collaborate with other sequencing centers, including the Broad Institute, the Washington University Genome
Sequencing Center, the DOE Joint
Genome Institute, and Agencourt Biosciences. Our
research interests include assembly, annotation, evolutionary analysis,
and other aspects of genome sequence analysis that vary with each
project. We actively seek scientific
partners with whom to launch new genome sequencing projects.
Current Projects
Genomes we are currently working on include the influenza virus, Xanthomonas oryzae (described
below), Aedes aegypti (the yellow fever mosquito), Entamoeba histolytica, Toxoplasma gondii,
Trichomonas vaginalis, Plasmodium vivax, Sphingomonas wittichii, cichlid fish (several species),
Danio rerio (zebrafish), Drosophila (several species), and the papaya plant.
Influenza virus sequencing
This project was initiated by David Lipman (NCBI) and Steven
Salzberg (UMD) in 2004, and has grown into an international consortium
that aims to
dramatically improve the availability of influenza genomic sequence in
the public domain. We have been sequencing the complete genomes of a
large collection of human influenza A isolates, as well as a select
number of avian and other non-human influenza strains, including highly
pathogenic H5N1. The strains are chosen to represent a wide
geographical and chronological survey of the influenza virus. All data
are released immediately in the public
domain. The sequencing effort was originally led by a group at The
Institute for Genomic Research (TIGR), and continues now under TIGR's
successor organization, in close collaboration with NCBI, which
maintains an excellent influenza virus site. The project has been
funded from the beginning by NIAID, which maintains a site that
describes project milestones and provides information for those who
want to contribute flu samples.
Pseudomonas aeruginosa sequencing
This joint project between Steven Salzberg and Vincent
Lee is one of our first efforts to sequence new bacterial genomes
entirely with very short read technology. We have generated over
8.5 million reads using Solexa sequencing technology, each 33 bp
long. Assembly and annotation are ongoing.
Our first target is a clinical isolate of P. aeruginosa, a significant human
pathogen that is a major cause of infection in cystic fibrosis
patients. Our plans include the sequencing of many more strains.
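As a rough sense of scale, the read count and read length quoted above can be turned into an expected depth of coverage. The sketch below is illustrative only: the genome size of about 6.3 Mbp is an assumption (roughly the size of the published P. aeruginosa PAO1 reference genome), not a figure from this project, and the function and constant names are hypothetical. Because the text says "over" 8.5 million reads, the result is best read as a lower bound.

```python
def expected_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth of coverage = total sequenced bases / genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


# Figures taken from the text: 8.5 million reads of 33 bp each.
NUM_READS = 8_500_000
READ_LENGTH = 33

# ASSUMPTION: ~6.3 Mbp, roughly the size of the P. aeruginosa PAO1
# reference genome. The clinical isolate's true size is not given here.
ASSUMED_GENOME_SIZE = 6_300_000

total_bases = NUM_READS * READ_LENGTH
coverage = expected_coverage(NUM_READS, READ_LENGTH, ASSUMED_GENOME_SIZE)

print(f"Total bases sequenced: {total_bases:,}")
print(f"Approximate coverage:  {coverage:.1f}x")
```

Under that assumed genome size, the reads amount to about 280 million bases, or roughly 44-fold average coverage. Average depth says nothing about how evenly the reads are distributed, which matters a great deal when assembling from reads this short.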
Xanthomonas sequencing
This joint project between Adam Bogdanove at Iowa State and Steven
Salzberg at the University of Maryland is sequencing to completion the
genomes of three Xanthomonas pathovars: X. oryzae oryzicola (Xoc), X. campestris armoraciae (Xca), and
X. oryzae oryzae (Xoo).
Xanthomonas oryzae is a Gram-negative bacterium and is the causative
agent of bacterial blight on rice. Bacterial blight is a major disease
in tropical Asian countries where high-yielding rice cultivars are
often highly susceptible to it. It is a vascular disease resulting in
tannish-gray to white lesions along the leaf veins. In severely
infected fields, bacterial blight can cause yield losses of up to
50%. For more information see the Bogdanove
lab website and the Xanthomonas
genomics page at Iowa State.
Dumpster-diving for genomes
Recently we have discovered that the "raw" sequence data at the NCBI
Trace Archive sometimes contain previously undetected genomes.
These genomes are endosymbionts that live inside the cells of other
organisms, and their genomes are "accidentally" sequenced when the host
organism is sequenced. For example, we recently found the genomes
of three brand-new species of the endosymbiotic bacterium Wolbachia lurking in the genomes of
three Drosophila species. (See our paper on this
finding.) We have found traces of bacteria in other species
as well, and will continue to scan the Trace Archive for new species as
that repository grows.
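The idea of screening a host organism's raw reads for an endosymbiont can be illustrated with a toy k-mer filter. This is a simplified stand-in, not the method used in the paper cited above: the sequences are made up, the function names are hypothetical, and the k-mer size and threshold are arbitrary. A real scan would use a proper aligner against a known endosymbiont genome.

```python
from typing import Iterable, Iterator, List, Set, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping substring of length k."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(reference: str, k: int) -> Set[str]:
    """Collect the k-mers of a reference sequence on both strands."""
    return set(kmers(reference, k)) | set(kmers(reverse_complement(reference), k))


def screen_reads(reads: Iterable[Tuple[str, str]], index: Set[str],
                 k: int, min_fraction: float = 0.5) -> List[str]:
    """Return names of reads sharing at least min_fraction of their k-mers
    with the reference index."""
    hits = []
    for name, seq in reads:
        read_kmers = list(kmers(seq, k))
        if not read_kmers:
            continue  # read shorter than k
        shared = sum(1 for km in read_kmers if km in index)
        if shared / len(read_kmers) >= min_fraction:
            hits.append(name)
    return hits


# Toy data (invented): a short "endosymbiont" reference and three reads.
K = 8
reference = "ATGGCGTACGTTAGCCTAGGATCCGATAAGCT"
reads = [
    ("read_fwd", reference[4:24]),                       # forward-strand match
    ("read_rev", reverse_complement(reference[8:28])),   # reverse-strand match
    ("read_host", "TTTTTTTTTTTTTTTTTTTT"),               # unrelated "host" read
]

hits = screen_reads(reads, build_index(reference, K), K)
print(hits)
```

Indexing both strands matters because shotgun reads come from either strand of the DNA. Exact k-mer matching is fast but unforgiving of sequencing errors and divergence, so a new endosymbiont species would typically need a more tolerant similarity search.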
Environmental sequencing (Metagenomics)
Metagenomics, or environmental sequencing, is a new field of research
in which scientists analyze the genomes of organisms recovered directly
from the environment. Most naturally occurring bacteria cannot yet be
grown in culture and therefore cannot be analyzed
by traditional means. Metagenomic studies provide us with a
mechanism for analyzing previously unknown organisms. At the same
time we can examine the diversity of organisms present in specific
environments as well as analyze the complex interactions between
members of a specific environment. While most metagenomic studies
to date have concentrated on bacterial populations, it is important to
note that viral and fungal populations are also of significant
scientific interest.
We have been analyzing the
bacterial populations present within the human gastrointestinal (GI)
tract. In a recent publication in Science (see below), a collaboration
with The Institute for Genomic Research, we sequenced and assembled
DNA from the bacterial populations of two healthy human subjects, in an
attempt to understand not only the variety of GI bacteria, but also the
differences in bacterial populations between individuals.
Selected Publications
- Whole-Genome
Analysis of Human Influenza A Virus Reveals Multiple Persistent
Lineages and Reassortment among Recent H3N2 Viruses. Edward C.
Holmes, Elodie Ghedin, Naomi Miller, Jill Taylor, Yiming Bao, Kirsten
St. George, Bryan T. Grenfell, Steven L. Salzberg, Claire M. Fraser,
David J. Lipman, Jeffery K. Taubenberger. PLoS Biology 3:9 (2005),
e300.
- Metagenomic
Analysis of the Human Distal Gut Microbiome. Steven R.
Gill, Mihai Pop, Robert T. DeBoy, Paul B. Eckburg, Peter J.
Turnbaugh, Buck S. Samuel, Jeffrey I. Gordon, David A. Relman, Claire
M. Fraser-Liggett, Karen E. Nelson. Science 312:5778 (2006),
1355-1359.
- Comparative
Genomics of Trypanosomatid Parasitic Protozoa. Najib M.
El-Sayed et al. Science 309:5733 (2005), 404-409.
- Serendipitous discovery
of Wolbachia genomes in multiple Drosophila species.
S.L. Salzberg, J.C. Dunning Hotopp, A.L. Delcher, M. Pop,
D.R. Smith, M.B. Eisen, and W.C. Nelson. Genome
Biology 2005, 6:R23.
- Genomic
insights into methanotrophy: the complete genome sequence
of Methylococcus capsulatus (Bath). N. Ward, et al., PLoS Biology 2:10 (2004), e303.
- Comparative
genome sequencing for discovery of novel polymorphisms in Bacillus
anthracis. T.D. Read, S.L. Salzberg, M. Pop, M. Shumway,
L. Umayam, L. Jiang, E. Holtzapple, J. Busch, K.L. Smith, J.M. Schupp,
D. Solomon, P. Keim, and C.M. Fraser. Science 296 (2002),
2028-2033.
- Genome
sequence of the human malaria parasite Plasmodium falciparum.
M.J. Gardner et al., Nature 419 (2002), 498-511.