Genome Projects

CBCB scientists are involved in many different genome sequencing projects, both as principal investigators and as collaborators. We collaborate with large-scale sequencing centers, including the Broad Institute, the Washington University Genome Sequencing Center, and the DOE Joint Genome Institute, and with a number of smaller centers using next-generation sequencing technology. Many of us worked previously at The Institute for Genomic Research, a large, highly efficient genome center that operated from 1992 to 2006. Our research interests include assembly, annotation, evolutionary analysis, and other aspects of genome sequence analysis that vary with each project. We actively seek scientific partners with whom to launch new genome sequencing projects.

Current and recent projects

Genomes we are currently working on include the domestic cow, Bos taurus; the influenza virus; Xanthomonas oryzae; the papaya plant, Carica papaya; the anthrax bacterium, Bacillus anthracis; the yellow fever mosquito, Aedes aegypti; Entamoeba histolytica; Toxoplasma gondii; Trichomonas vaginalis; Plasmodium vivax; Sphingomonas wittichii; cichlid fish (several species); Drosophila (several species); and several Wolbachia endosymbionts.


Influenza virus sequencing

This project was initiated by David Lipman (NCBI) and Steven Salzberg (UMD) in 2004, and grew into an international consortium that continues to dramatically improve the availability of influenza genomic sequence in the public domain. The project has been sequencing the complete genomes of a large collection of human influenza A isolates, as well as a select number of avian and other non-human influenza strains, including highly pathogenic H5N1. The strains are chosen to represent a wide geographical and chronological survey of the influenza virus. All data are released immediately into the public domain. The sequencing effort was originally led by a group at The Institute for Genomic Research (TIGR), and continues now under TIGR's successor organization, in close collaboration with NCBI, which maintains an excellent influenza virus site. The project has been funded from the beginning by NIAID, which maintains a site that describes project milestones and provides information for those who want to contribute flu samples.

The domestic cow, Bos taurus

The sequencing of the domestic cow was led by the Baylor Human Genome Sequencing Center, which began the project in 2003 and completed sequencing in 2005. Working closely with members of the cow research community, we re-assembled the genome using data from Baylor and other centers, and released our first re-assembly in early 2008. Our most recent assembly (as of March 2009) is UMD2.0, which we released in November 2008 and which is available for download from our FTP site. (A publication is pending.) We also provide a BLAST service for the Bos taurus genome, through which anyone can search the assembly for genes and other regions of interest.

Pseudomonas aeruginosa and short-read sequencing

This joint project between Steven Salzberg and Vincent Lee is one of our first efforts to sequence new bacterial genomes entirely with very short read technology. We have generated over 8.5 million reads using Solexa sequencing technology, each 33 bp long. Assembly and annotation are ongoing. Our first target is a clinical isolate of P. aeruginosa, a significant human pathogen that is a major cause of infection in cystic fibrosis patients. We plan to sequence many more strains.
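
To give a sense of scale for the read counts quoted above, the following is a minimal back-of-the-envelope sketch in Python. The read count and read length come from the text; the genome size of roughly 6.3 Mbp is an assumption based on previously sequenced P. aeruginosa strains, not a figure for the clinical isolate, and the function names are purely illustrative.

```python
def total_bases(num_reads: int, read_length: int) -> int:
    """Total number of sequenced bases across all reads."""
    return num_reads * read_length


def fold_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of times each genome position is covered by a read."""
    return total_bases(num_reads, read_length) / genome_size


NUM_READS = 8_500_000            # "over 8.5 million reads", so a lower bound
READ_LENGTH = 33                 # bp per read
ASSUMED_GENOME_SIZE = 6_300_000  # ASSUMPTION: approximate size, not the isolate's

bases = total_bases(NUM_READS, READ_LENGTH)
coverage = fold_coverage(NUM_READS, READ_LENGTH, ASSUMED_GENOME_SIZE)
print(f"{bases:,} bases, roughly {coverage:.0f}x average coverage")
```

Under that assumed genome size, the reads amount to about 280 million bases. The estimate ignores low-quality or contaminant reads, so usable coverage would be somewhat lower.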

Bacillus anthracis (anthrax)

Following the anthrax attacks through the U.S. mail in 2001, researchers at TIGR, including several scientists now at CBCB, were asked to sequence the genome of the Bacillus anthracis used in those attacks.  We published our first analysis of the attack strain in Science in 2002, and we continued to work with the FBI to sequence different isolates from several different sources, including the letters mailed to the offices of U.S. Senators Tom Daschle and Patrick Leahy.  Most of the investigation remained secret until mid-2008, but since then the requirements for secrecy have been lifted.  As a result, we recently (2009) published the complete genome sequence of the Ames "ancestor" strain, the original source of the anthrax bacteria deposited at Ft. Detrick in 1981.  Analyses of other samples from the attacks will be published in the near future.

Xanthomonas oryzae and related species

This joint project between Adam Bogdanove at Iowa State and Steven Salzberg at the University of Maryland has sequenced to completion the genomes of three Xanthomonas pathovars: X. oryzae oryzicola (Xoc), X. campestris armoraciae (Xca), and X. oryzae oryzae (Xoo). Xanthomonas oryzae is a Gram-negative bacterium and is the causative agent of bacterial blight on rice. Bacterial blight is a major disease in tropical Asian countries where high-yielding rice cultivars are often highly susceptible to it. It is a vascular disease resulting in tannish-gray to white lesions along the leaf veins. In severely infested fields, bacterial blight can cause yield losses up to 50%. For more information see the Bogdanove lab website and the Xanthomonas genomics page at Iowa State. The Xoo genome paper was published in 2008 in BMC Genomics, and the Xoc and Xca genomes should appear soon. All genomes have been available in GenBank, with no restrictions, since their completion.

Dumpster-diving for genomes

In 2005, we discovered that the "raw" sequence data at the NCBI Trace Archive sometimes contains previously undetected genomes.  These genomes are endosymbionts that live inside the cells of other organisms, and their genomes are "accidentally" sequenced when the host organism is sequenced.  For example, we found the genomes of three brand-new species of the endosymbiotic bacterium Wolbachia lurking in the genomes of three Drosophila species. (See our paper on this finding.)  We have found traces of bacteria in other species as well, and will continue to scan the Trace Archive for new species as that repository grows.  More recently we assembled the near-complete genome of the Wolbachia endosymbiont of Culex quinquefasciatus JHB, which we published in J. Bacteriology in 2009.
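
One simple way to picture this kind of search is to screen host reads for short subsequences (k-mers) shared with a known endosymbiont reference. The Python sketch below illustrates that general idea only; it is not the method used in our published work, and the function names, toy sequences, and thresholds (`k`, `min_shared`) are all hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def kmers(seq: str, k: int) -> set:
    """Return the set of all length-k substrings of seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def screen_reads(reads: dict, reference: str, k: int = 8, min_shared: int = 3) -> list:
    """Return names of reads sharing at least min_shared k-mers with the reference.

    Both strands of the reference are indexed, since a read may come from either.
    """
    ref_kmers = kmers(reference, k) | kmers(reverse_complement(reference), k)
    hits = []
    for name, seq in reads.items():
        if len(kmers(seq, k) & ref_kmers) >= min_shared:
            hits.append(name)
    return hits


# Toy data (hypothetical): a short "endosymbiont" reference and three "host" reads.
reference = "ATGCGTACGTTAGCCTAGGCTA"
reads = {
    "read1": reference[3:15],                      # forward-strand match
    "read2": "TTTTTTTTTTTTTTTT",                   # unrelated sequence
    "read3": reverse_complement(reference[8:20]),  # reverse-strand match
}
print(screen_reads(reads, reference))
```

In practice, reads flagged this way would still need to be aligned and assembled before any claim about a new genome could be made. Real screens would use longer k-mers and established alignment tools rather than exact set intersection.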

Environmental sequencing (Metagenomics)

Metagenomics, or environmental sequencing, is a new field of research in which scientists analyze the genomes of organisms recovered directly from the environment. Most naturally occurring bacteria cannot yet be grown in culture and therefore cannot be analyzed by traditional means. Metagenomic studies provide us with a mechanism for analyzing previously unknown organisms. At the same time, we can examine the diversity of organisms present in specific environments and analyze the complex interactions among the members of a given environment. While most metagenomic studies to date have concentrated on bacterial populations, viral and fungal populations are also of significant scientific interest.

We have also been analyzing the bacterial populations present within the human gastrointestinal (GI) tract. In a 2006 publication in Science (see below), a collaboration with TIGR, we sequenced and assembled the bacterial populations from two healthy human subjects, in an attempt to understand not only the variety of GI bacteria, but also the differences in bacterial populations between different individuals.

Selected Publications