Genome Projects
CBCB scientists are involved in many different genome sequencing
projects, both as principal investigators and as collaborators.
We collaborate with large-scale sequencing centers, including the
Broad Institute, the Washington University Genome Sequencing Center,
and the DOE Joint Genome Institute, as well as with a number of
smaller centers that use next-generation sequencing technology. Many
of us previously worked at The Institute for Genomic Research, a
large, highly efficient genome center that operated from 1992 to
2006. Our research interests include assembly, annotation,
evolutionary analysis, and other aspects of genome sequence analysis
that vary with each project. We are actively seeking scientific
partners with whom to launch new genome sequencing projects.
Current and recent projects
Genomes we are currently working on include the domestic cow, Bos
taurus; the influenza virus; Xanthomonas oryzae; the papaya plant,
Carica papaya; the anthrax bacterium, Bacillus anthracis; the yellow
fever mosquito, Aedes aegypti; Entamoeba histolytica; Toxoplasma
gondii; Trichomonas vaginalis; Plasmodium vivax; Sphingomonas
wittichii; several species of cichlid fish; several Drosophila
species; and several Wolbachia endosymbionts.
Influenza virus sequencing
This project was initiated by David Lipman (NCBI) and Steven
Salzberg (UMD) in 2004, and grew into an international consortium
that continues to dramatically improve the availability of influenza
genomic sequence in the public domain. The project has been
sequencing the complete genomes of a large collection of human
influenza A isolates, as well as a select number of avian and other
non-human influenza strains, including highly pathogenic H5N1. The
strains are chosen to represent a wide geographical and
chronological survey of the influenza virus. All data are released
immediately into the public domain. The sequencing effort was
originally led by a group at The Institute for Genomic Research
(TIGR), and continues now under TIGR's successor organization, in
close collaboration with NCBI, which maintains an excellent
influenza virus site. The project has been funded from the beginning
by NIAID, which maintains a site that describes project milestones
and provides information for those who want to contribute flu
samples.
Bos taurus (domestic cow)
The sequencing of the domestic cow was led by the Baylor Human
Genome Sequencing Center, which began the project in 2003 and
completed sequencing in 2005. Working closely with members of the
cow research community, we re-assembled the genome using data from
Baylor and other centers, and released our first re-assembly in early
2008. Our most recent assembly (as of March 2009) is UMD2.0, which
we released in November 2008 and which is available for download
from our FTP site. (A publication is pending.) We also provide a
BLAST service for the Bos taurus genome, through which anyone can
search the assembly for genes and other regions of interest.
Pseudomonas aeruginosa
This joint project between Steven Salzberg and Vincent Lee is one of
our first efforts to sequence new bacterial genomes entirely with
very short read technology. We have generated over 8.5 million reads
using Solexa sequencing technology, each 33 bp long. Assembly and
annotation are ongoing. Our first target is a clinical isolate of
Pseudomonas aeruginosa, a significant human pathogen that is a major
cause of infection in cystic fibrosis patients. We plan to sequence
many more strains.
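To put the read count in perspective, the short sketch below works out the total sequence yield implied by the figures above (8.5 million reads of 33 bp) and the resulting fold coverage. The genome size used here is a hypothetical placeholder chosen only for illustration; it is not a figure from this project, and the function names are ours.

```python
def total_bases(num_reads: int, read_length: int) -> int:
    """Total number of sequenced bases across all reads."""
    return num_reads * read_length


def fold_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of times each genome position is covered."""
    return total_bases(num_reads, read_length) / genome_size


# Figures from the text: at least 8.5 million reads, 33 bp each.
NUM_READS = 8_500_000
READ_LENGTH = 33

# ASSUMPTION: placeholder genome size for illustration only.
ASSUMED_GENOME_SIZE = 6_500_000

if __name__ == "__main__":
    print(total_bases(NUM_READS, READ_LENGTH))  # 280500000
    print(round(fold_coverage(NUM_READS, READ_LENGTH, ASSUMED_GENOME_SIZE), 1))
```

Because the text says "over" 8.5 million reads, the yield computed this way is a lower bound. High nominal coverage does not by itself guarantee a complete assembly, since very short reads struggle to resolve repeats.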
Bacillus anthracis (anthrax)
Following the anthrax attacks through the U.S. mail in 2001,
researchers at TIGR, including several scientists now at CBCB, were
asked to sequence the genome of the Bacillus anthracis strain used
in those attacks. We published our first analysis of the attack
strain in Science in 2002, and we continued to work with the FBI to
sequence isolates from several different sources, including the
letters mailed to the offices of U.S. Senators Tom Daschle and
Patrick Leahy. Most of the investigation remained secret until
mid-2008, but since then the requirements for secrecy have been
lifted. As a result, in 2009 we published the complete genome
sequence of the Ames "ancestor" strain, the original source of the
anthrax bacteria deposited at Ft. Detrick in 1981. Analyses of other
samples from the attacks will be published in the near future.
Xanthomonas
This joint project between Adam Bogdanove at Iowa State and Steven
Salzberg at the University of Maryland has sequenced to completion
the genomes of three Xanthomonas pathovars: X. oryzae oryzicola
(Xoc), X. campestris armoraciae (Xca), and X. oryzae oryzae (Xoo).
Xanthomonas oryzae is a Gram-negative bacterium and the causative
agent of bacterial blight of rice. Bacterial blight is a major
disease in tropical Asian countries, where high-yielding rice
cultivars are often highly susceptible to it. It is a vascular
disease that produces tannish-gray to white lesions along the leaf
veins. In severely infested fields, bacterial blight can cause yield
losses of up to 50%. For more information see the Bogdanove lab
website and the Xanthomonas genomics page at Iowa State. The Xoo
genome paper was published in 2008 in BMC Genomics, and the Xoc and
Xca genome papers should appear soon. All genomes have been
available in GenBank, with no restrictions, since their completion.
Dumpster-diving for genomes
In 2005, we discovered that the "raw" sequence data at the NCBI
Trace Archive sometimes contain previously undetected genomes.
These genomes belong to endosymbionts that live inside the cells of
other organisms, and they are "accidentally" sequenced when the host
organism is sequenced. For example, we found the genomes of three
brand-new species of the endosymbiotic bacterium Wolbachia lurking
in the genomes of three Drosophila species. (See our paper on this
finding.) We have found traces of bacteria in other species as well,
and will continue to scan the Trace Archive for new species as that
repository grows. More recently we assembled the near-complete
genome of the Wolbachia endosymbiont of Culex quinquefasciatus JHB,
which we published in the Journal of Bacteriology in 2009.
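As a rough illustration of the idea, and not the pipeline used in the work described above, the sketch below screens a host organism's reads for likely endosymbiont sequence by counting the k-mers each read shares with a known reference sequence. The function names, the toy sequences, and the `k` and `min_fraction` thresholds are all assumptions made for this example.

```python
def kmers(seq: str, k: int) -> set:
    """Return the set of all length-k substrings of seq (empty if too short)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def screen_reads(reads: dict, reference: str, k: int = 8,
                 min_fraction: float = 0.5) -> list:
    """Names of reads whose k-mers mostly occur in the reference.

    Simplification: only the forward strand is checked. A real screen
    would also test the reverse complement and tolerate mismatches.
    """
    ref_kmers = kmers(reference, k)
    hits = []
    for name, seq in reads.items():
        read_kmers = kmers(seq, k)
        if not read_kmers:
            continue  # read shorter than k, nothing to compare
        shared = len(read_kmers & ref_kmers) / len(read_kmers)
        if shared >= min_fraction:
            hits.append(name)
    return hits


# Toy data, invented for illustration only.
REFERENCE = "ATGGCGTACGTTAGCCTAGGCTTAACGGATCCGTA"
READS = {
    "read_a": REFERENCE[5:25],   # drawn from the reference
    "read_b": "T" * 20,          # unrelated sequence
}

if __name__ == "__main__":
    print(screen_reads(READS, REFERENCE))
```

Reads flagged this way would only be candidates. In practice they would be confirmed by alignment and then assembled separately from the host genome.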
Environmental sequencing (Metagenomics)
Metagenomics, or environmental sequencing, is a new field of
research in which scientists analyze the genomes of organisms
recovered directly from the environment. Most naturally occurring
bacteria cannot yet be grown in culture and therefore cannot be
analyzed by traditional means. Metagenomic studies provide us with a
mechanism for analyzing previously unknown organisms. At the same
time we can examine the diversity of organisms present in specific
environments as well as analyze the complex interactions between
members of a specific environment. While most metagenomic studies
to date have concentrated on bacterial populations, it is important to
note that viral and fungal populations are also of significant
scientific interest.
We have also been analyzing the
bacterial populations present within the human gastrointestinal (GI)
tract. In a 2006 publication in Science
(see below), a collaboration with TIGR,
we sequenced and assembled the bacterial populations from two healthy
human subjects, in an attempt to understand not only the variety of GI
bacteria, but also the differences in bacterial populations between
different individuals.
Selected Publications
- Whole-Genome Analysis of Human Influenza A Virus Reveals Multiple
Persistent Lineages and Reassortment among Recent H3N2 Viruses.
Edward C. Holmes, Elodie Ghedin, Naomi Miller, Jill Taylor, Yiming
Bao, Kirsten St. George, Bryan T. Grenfell, Steven L. Salzberg,
Claire M. Fraser, David J. Lipman, Jeffery K. Taubenberger. PLoS
Biology 3:9 (2005), e300.
- Metagenomic Analysis of the Human Distal Gut Microbiome. Steven R.
Gill, Mihai Pop, Robert T. DeBoy, Paul B. Eckburg, Peter J.
Turnbaugh, Buck S. Samuel, Jeffrey I. Gordon, David A. Relman,
Claire M. Fraser-Liggett, Karen E. Nelson. Science 312:5778 (2006),
1355-1359.
- Comparative Genomics of Trypanosomatid Parasitic Protozoa. Najib
M. El-Sayed et al. Science 309:5733 (2005), 404-409.
- Serendipitous discovery of Wolbachia genomes in multiple
Drosophila species. S.L. Salzberg, J.C. Dunning Hotopp, A.L.
Delcher, M. Pop, D.R. Smith, M.B. Eisen, and W.C. Nelson. Genome
Biology 6 (2005), R23.
- Genomic insights into methanotrophy: the complete genome sequence
of Methylococcus capsulatus (Bath). N. Ward et al. PLoS Biology 2:10
(2004), e303.
- Comparative genome sequencing for discovery of novel polymorphisms
in Bacillus anthracis. T.D. Read, S.L. Salzberg, M. Pop, M. Shumway,
L. Umayam, L. Jiang, E. Holtzapple, J. Busch, K.L. Smith, J.M.
Schupp, D. Solomon, P. Keim, and C.M. Fraser. Science 296 (2002),
2028-2033.
- Genome sequence of the human malaria parasite Plasmodium
falciparum. M.J. Gardner et al. Nature 419 (2002), 498-511.