Genome Sequencing

CBCB scientists are involved in many different genome sequencing projects, both as principal investigators and as collaborators.  Most of the DNA sequencing projects that we participate in are based at The Institute for Genomic Research, a large, highly efficient production facility.   We also collaborate with other sequencing centers, including the Broad Institute, the Washington University Genome Sequencing Center, the DOE Joint Genome Institute, and Agencourt Biosciences.  Our research interests include assembly, annotation, evolutionary analysis, and other aspects of genome sequence analysis that vary with each project.  We are actively interested in seeking scientific partners with whom to launch new genome sequencing projects.

Current Projects

Genomes we are currently working on include the influenza virus, Xanthomonas oryzae (described below), Aedes aegypti (the yellow fever mosquito), Entamoeba histolytica, Toxoplasma gondii, Trichomonas vaginalis, Plasmodium vivax, Sphingomonas wittichii, cichlid fish (several species), Danio rerio (zebrafish), Drosophila (several species), and the papaya plant.


Influenza virus sequencing

This project was initiated by David Lipman (NCBI) and Steven Salzberg (UMD) in 2004, and has grown into an international consortium that aims to dramatically improve the availability of influenza genomic sequence in the public domain. We have been sequencing the complete genomes of a large collection of human influenza A isolates, as well as a select number of avian and other non-human influenza strains, including highly pathogenic H5N1. The strains are chosen to represent a wide geographical and chronological survey of the influenza virus. All data are released immediately into the public domain. The sequencing effort was originally led by a group at The Institute for Genomic Research (TIGR), and continues now under TIGR's successor organization, in close collaboration with NCBI, which maintains an excellent influenza virus site. The project has been funded from the beginning by NIAID, which maintains a site that describes project milestones and provides information for those who want to contribute flu samples.

Pseudomonas aeruginosa and short-read sequencing

This joint project between Steven Salzberg and Vincent Lee is one of our first efforts to sequence new bacterial genomes entirely with very short read technology. We have generated over 8.5 million reads, each 33 bp long, using Solexa sequencing technology. Assembly and annotation are ongoing. Our first target is a clinical isolate of P. aeruginosa, a significant human pathogen that is a major cause of infection in cystic fibrosis patients. We plan to sequence many more strains.
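To put the read counts above in perspective, the short sketch below works out the total number of bases sequenced and the approximate average depth of coverage. The read count and read length come from the text; the genome size of roughly 6.3 Mbp is an assumption based on typical published P. aeruginosa genomes, not a measurement of this isolate, and the function name is purely illustrative.

```python
def fold_coverage(num_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average depth of coverage: total sequenced bases divided by genome length."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return num_reads * read_length_bp / genome_size_bp


NUM_READS = 8_500_000               # "over 8.5 million reads" (lower bound from the text)
READ_LENGTH_BP = 33                 # read length stated in the text
ASSUMED_GENOME_SIZE_BP = 6_300_000  # ASSUMPTION: typical P. aeruginosa genome size

total_bases = NUM_READS * READ_LENGTH_BP
coverage = fold_coverage(NUM_READS, READ_LENGTH_BP, ASSUMED_GENOME_SIZE_BP)

print(f"Total bases sequenced: {total_bases:,}")
print(f"Approximate coverage:  {coverage:.1f}x")
```

Under that assumed genome size, the reads amount to about 280 million bases, or roughly 45-fold average coverage. The real figure depends on the actual genome size of the isolate and on how many reads pass quality filtering.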

Xanthomonas oryzae and related species

This joint project between Adam Bogdanove at Iowa State and Steven Salzberg at the University of Maryland is sequencing to completion the genomes of three Xanthomonas pathovars: X. oryzae pv. oryzicola (Xoc), X. campestris pv. armoraciae (Xca), and X. oryzae pv. oryzae (Xoo). Xanthomonas oryzae is a Gram-negative bacterium and is the causative agent of bacterial blight on rice. Bacterial blight is a major disease in tropical Asian countries, where high-yielding rice cultivars are often highly susceptible to it. It is a vascular disease resulting in tannish-gray to white lesions along the leaf veins. In severely infested fields, bacterial blight can cause yield losses of up to 50%. For more information see the Bogdanove lab website and the Xanthomonas genomics page at Iowa State.

Dumpster-diving for genomes

Recently we have discovered that the "raw" sequence data at the NCBI Trace Archive sometimes contains previously undetected genomes.  These genomes are endosymbionts that live inside the cells of other organisms, and their genomes are "accidentally" sequenced when the host organism is sequenced.  For example, we recently found the genomes of three brand-new species of the endosymbiotic bacterium Wolbachia lurking in the genomes of three Drosophila species. (See our paper on this finding.)  We have found traces of bacteria in other species as well, and will continue to scan the Trace Archive for new species as that repository grows.
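As a rough illustration of how such a scan might work in principle, the sketch below flags reads that share short subsequences (k-mers) with a known endosymbiont reference sequence. This is a toy stand-in, not the method used in our paper: the function names, the k-mer length, the threshold, and the sequences are all hypothetical, and a real screen would rely on a proper alignment tool and would also check the reverse-complement strand.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def screen_reads(reads: Dict[str, str], reference: str,
                 k: int = 12, min_shared: int = 3) -> List[str]:
    """Return names of reads sharing at least min_shared k-mers with the reference.

    Only the forward strand is compared; this is a simplification.
    """
    ref_kmers = kmers(reference.upper(), k)
    hits = []
    for name, seq in reads.items():
        shared = len(kmers(seq.upper(), k) & ref_kmers)
        if shared >= min_shared:
            hits.append(name)
    return hits


# Toy data (hypothetical sequences, not real genomes).
reference = "ATGCGTACGTTAGCCTAGGATCCGATTACGGCTAAGCTTGCA"
reads = {
    "symbiont_read": reference[5:35],                # drawn from the reference
    "host_read": "TTTTTTTTTTGGGGGGGGGGCCCCCCCCCC",   # unrelated to the reference
}

hits = screen_reads(reads, reference)
print(hits)
```

In this toy case only the read drawn from the reference is flagged. Reads flagged this way would then be candidates for assembly into the endosymbiont's genome.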

Environmental sequencing (Metagenomics)

Metagenomics, or environmental sequencing, is a new field of research in which scientists analyze the genomes of organisms recovered directly from the environment. Most naturally occurring bacteria cannot yet be grown in culture and therefore cannot be analyzed by traditional means. Metagenomic studies provide us with a mechanism for analyzing previously unknown organisms. At the same time, we can examine the diversity of organisms present in specific environments and analyze the complex interactions between members of a specific environment. While most metagenomic studies to date have concentrated on bacterial populations, viral and fungal populations are also of significant scientific interest.

We have also been analyzing the bacterial populations present within the human gastrointestinal (GI) tract. In a recent publication in Science (see below), produced in collaboration with The Institute for Genomic Research, we sequenced and assembled the bacterial populations from two healthy human subjects, in an attempt to understand not only the variety of GI bacteria, but also the differences in bacterial populations between individuals.

Selected Publications