CBCB Seminar Series

Fall 2009


2:00 p.m. Thursday, Sept. 10, 2009 - two talks

Venue: Biomolecular Science Building Room 3118

Title: Searching for Genes in Novel Genomes.
By:
Brona Brejova, Department of Computer Science, Comenius University, Slovakia
Abstract: New rapid sequencing methods now allow affordable sequencing of previously unexplored genomes. Gene prediction in these novel genomes is difficult because of the lack of reliable training data needed to adjust the parameters of the models used for this task.

We have developed a novel method for estimating the parameters of hidden Markov models for gene finding in newly sequenced species. Our approach does not rely on curated training data sets, but instead uses extrinsic evidence (including paired-end ditags that have not been used in gene finding previously) and iterative training. This new method is particularly suitable for annotation of species with large evolutionary distance to the closest annotated species. We have used our approach to produce an initial annotation of the newly sequenced Schistosoma japonicum draft genome. Our new gene set provides a first glimpse at the gene complement of a flatworm (phylum Platyhelminthes).

Joint work with Tomas Vinar, Dan Brown, Ming Li, and Yan Zhou.

Title: Evolutionary Histories of Gene Clusters in Primates.
By:
Tomas Vinar, Dept. of Applied Informatics, Comenius University in Bratislava
Abstract: Approximately 5% of the human genome is composed of complex gene clusters that arose by repeated segmental duplications. These regions are hot spots of evolutionary innovation and contain many biomedically important gene families. We propose that these gene clusters should be analyzed in the context of their duplication histories, which allow construction of accurate gene trees for the purpose of comparative genomic analysis, enable analysis of chimeric genes and promoter regions, and facilitate transfer of annotations between species.

We have developed novel methods for reconstructing the duplication histories from genomic sequences of multiple species. Our methods are based on a simple probabilistic model of evolution of gene clusters by segmental duplication, and we use MCMC sampling to infer duplication histories with high likelihood under this model.

This is joint work with Brona Brejova (Comenius), Adam Siepel (Cornell), Webb Miller (Penn State U.), and Eric Green (NHGRI).

2:00 p.m. Thursday, Oct. 8, 2009

Title: Chasing Change: Primate Centromere Evolution

By:
Mary Schueler, National Human Genome Research Institute, National Institutes of Health

Venue: Biomolecular Science Building Room 3118

Abstract: Rapid evolution is a hallmark of centromeric DNA in eukaryotic genomes. The centromere has a conserved functional role mediated by the kinetochore protein complex in all species. We performed comparative mapping and sequencing of centromeric regions and the genomic loci of three foundation kinetochore proteins (Centromere Proteins A, B, and C) to gain a detailed view of the evolutionary events that have shaped the primate centromere.

A histone H3 variant, Centromere Protein A (CENP-A), is the foundation of the centromere-specific nucleosome. Comparative sequence analyses involving 14 primate species have, for the first time, identified amino acid residues within both the histone fold domain and the N-terminal tail that are under strong positive selection in the primate lineage. Similar comparative analyses of CENP-B, a kinetochore protein with a specific binding site within alpha-satellite DNA, somewhat surprisingly, do not show signs of positive selection. However, CENP-C, another foundation protein essential for centromere function, is under strong positive selection. Residues under selection are found throughout the protein, including several in the centromere-localization and DNA-binding regions.

A model of progressive proximal expansion of alpha-satellite DNA at the primate X centromere predicts that older alpha satellite lacking higher-order structure lies adjacent to the chromosome arms, while regions of more recently evolved alpha satellite flank the higher-order alpha-satellite arrays. Comparative mapping and sequencing of these regions confirm this predicted organization in six primates. Our ongoing comparative genomic studies should further develop this model of centromere evolution and provide the reagents necessary for testing a correlation between evolution of kinetochore proteins and centromeric DNA.

2:00 p.m. Thursday, Oct. 15, 2009

Title: Computational Techniques for Inferring Phylogenetic Relationships Using Multiple Loci.

By:
Luay Nakhleh, Department of Computer Science, Rice University

Venue: Biomolecular Science Building Room 3118

Abstract: Accurate inference of phylogenetic relationships of species, and understanding their relationships with gene trees are two central themes in molecular and evolutionary biology. Traditionally, a species tree is inferred by (1) sequencing a genomic region of interest from the group of species under study, (2) reconstructing its evolutionary history, and (3) declaring it to be the estimate of the species tree. However, recent analyses of increasingly available multi-locus data from various groups of organisms have demonstrated that different genomic regions may have evolutionary histories (called “gene trees”) that may disagree with each other, as well as with that of the species. This observation has called into question the suitability of the traditional approach to species tree inference. Further, when some, or all, of these disagreements are caused by reticulate evolutionary events, such as hybridization, then the phylogenetic relationship of the species is more appropriately modeled by a phylogenetic network than a tree. As a result, a new, post-genomic paradigm has emerged, in which multiple genomic regions are analyzed simultaneously, and their evolutionary histories are reconciled in order to infer the evolutionary history of the species, which may not necessarily be treelike.

In this talk, I will describe our recent work on developing mathematical criteria and algorithmic techniques for analyzing incongruence among gene trees, and inferring phylogenetic relationships among species despite such incongruence. This includes work on lineage sorting, reticulate evolution, as well as simultaneous treatment of both.

Speaker BIO: Luay Nakhleh is an Assistant Professor of Computer Science and Biochemistry and Cell Biology at Rice University. He received the B.Sc. degree from the Technion, Israel Institute of Technology, in 1996, the Master’s degree from Texas A&M University in 1998, and the PhD degree from the University of Texas at Austin in 2004, all three degrees in Computer Science. His research interests fall in the general areas of computational biology and bioinformatics; in particular, he works on computational phylogenomics and its connection with other fields in biology. Luay has published over 50 manuscripts on his work, supervised the dissertations of two recent PhD graduates, and currently supervises the dissertations of six PhD students. Luay has received several awards, including the Texas Excellent Teaching Award from UT Austin in 2001, the Outstanding Dissertation Award from UT Austin in 2005, the Roy E. Campbell Faculty Development Award from Rice University in 2006, the DOE Early Career Award in 2006, the NSF CAREER Award in 2009, and the Phi Beta Kappa Teaching Prize in 2009.

11:00 a.m. Thursday, Oct. 22, 2009

Title: Finding the trees in Darwin's forest.
By:
Robert K. Bradley, Massachusetts Institute of Technology
Venue: Biomolecular Science Building Room 3118
Abstract: TBA

2:00 p.m. Thursday, Dec. 17, 2009

Title: Genetic analysis of O-repeat biosynthesis in Neisseria sicca 4320
By:
Clinton Miller
Venue: Biomolecular Science Building Room 3118
Abstract:

One of the important virulence determinants found in pathogenic Neisseria is lipooligosaccharide (LOS). LOS differs from lipopolysaccharide (LPS) in that it lacks the o-repeat characteristic of LPS. LOS has been shown to be important for invasion, host immune evasion, and bacterial attachment to host tissue. A great diversity of structures is found both within pathogenic species and commensal species. Variations between strains are mediated by changes in biosynthetic gene clusters. Variation within a strain is mediated by changes in the expression state of genes. While the genetic basis of LOS production in the pathogenic Neisseria has been extensively studied, little research has focused on the genetics underlying LPS/LOS production and resulting diversity in commensal Neisseria. A commensal strain that caused a fatal case of bacterial endocarditis, Neisseria sicca 4320, was found to produce a novel o-repeat structure in addition to the typical Neisserial LOS. The genome of N. sicca 4320 was sequenced and analyzed to identify genes possibly involved in the synthesis of the o-repeat structure. The identified genes were cloned and prepared for inactivation in order to generate knockout mutants.