CBCB Seminar Series
Spring 2006
2 p.m. Thursday January 26, 2006
Title: Organizational meeting
Speaker: Stephen M. Mount, Ph.D.
Venue: Biomolecular Science
Building Room 3118
Abstract: Discussion of the
schedule for Spring 2006.
2 p.m. Thursday February 16, 2006
Title: High-throughput Biology:
Genome Assembly and Beyond
Speaker: Mihai Pop, Ph.D.
Venue: Computer Science Instructional
Center Room 1115
Abstract:
Computers have become indispensable tools in biological research. The
increasing use of high-throughput laboratory experiments has yielded large
amounts of data that cannot be managed, let alone analyzed, without the
help of specialized software. The integration of computational methods and
mathematical analyses into biological research has led to our ability to
sequence the DNA of organisms, recognize genes, and begin to unravel the
complex interactions that define life itself.
In this presentation I will describe several examples of this close
integration between computational science and biology, primarily from my
recent work in the field of genome sequencing and assembly. The talk will
provide an overview of the biological questions being addressed and will
highlight the computational challenges underlying each specific genome
analysis task. I will then present the techniques I used to analyze the
bacteria present in the human gastrointestinal tract and will conclude
with an overview of several exciting ongoing research projects.
(This is a candidate talk.)
2 p.m. Thursday February 23, 2006
Title: Relational life science
databases: Lessons from Cognia and NIAID
Speaker: Christopher Larsen, Ph.D.
(NIAID Bioinformatics Resource Center)
Venue: Biomolecular Sciences
Building #296 Room 3118
Abstract:
Life science databases store and relate millions of bits of information.
The data range in scale from DNA sequence to protein, organelle, cell,
and even tissue and epidemiology. Creating such databases is a necessary
downstream consequence of both the genomics revolution and the long
history of research publication.
Dr. Larsen's work has been aimed at integrating all sources of life
science data. It has focused on building relational structures to house
and query that information. The talk will focus on the successes and
pitfalls of his last two efforts, at Cognia and NIAID, and will also draw
guidance from other sources peripherally involved in his work, such as
GenBank, Wiley Interscience, GO (the Gene Ontology), SwissProt, BioPerl,
and others.
Focus will be on the problems to be overcome in storing and relating data,
and potential paths in the future for the field to take.
2 p.m. Thursday March 9, 2006
Title: Towards an RNA Splicing
Code
Speaker: Christopher Burge, Ph.D.
(MIT)
Venue: Computer Science Instructional
Center Room 2117
Abstract:
I will describe my lab's progress toward understanding the rules for exon
recognition by the RNA splicing machinery in mammals. Current efforts are
focused on systematic identification and characterization of sequences
that function as exonic and intronic splicing silencers (ESS, ISS) and
enhancers (ESE, ISE), using a combination of cell-based and computational
screens. The identified splicing regulatory elements are being integrated
with statistical models of the core splice site motifs into computer
algorithms that simulate RNA splicing specificity. Recently, we have
shown that ESS sequences play general roles in splice site definition at
both the 5' and 3' splice sites, and we are investigating the mechanisms
of this activity. We have also obtained evidence that ESS sequences are
likely to control alternative 5' and 3' splice site usage in many exons, a
common type of alternative splicing in mammals.
12:30 p.m. Thursday March 16, 2006
Title: Understanding Protein
Function on a Genome-scale using Networks
Speaker: Mark B. Gerstein, Ph.D. (MB&B Dept. Yale
University)
Venue: Computer Science Instructional
Center Room 1115
Abstract:
My talk will be concerned with topics in proteomics, in particular
predicting protein function on a genomic scale. We approach this through
the prediction and analysis of biological networks -- both of
protein-protein interactions and transcription-factor-target
relationships. I will describe how these networks can be determined
through Bayesian integration of many genomic features and how they can be
analyzed in terms of various simple topological statistics.
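As a rough illustration of what Bayesian integration of genomic features can look like, the sketch below combines per-feature likelihood ratios under a naive independence assumption. The feature names, likelihood ratios, and prior odds are hypothetical values chosen for illustration; they are not taken from the talk or from the references below.

```python
from math import prod


def posterior_odds(prior_odds, likelihood_ratios):
    """Naive Bayes combination: posterior odds = prior odds * product of LRs.

    Assumes the features are conditionally independent given the class.
    """
    return prior_odds * prod(likelihood_ratios)


def odds_to_probability(odds):
    """Convert odds O into a probability O / (1 + O)."""
    return odds / (1.0 + odds)


# Hypothetical evidence for one candidate protein pair (illustrative numbers).
evidence = {
    "coexpression": 4.0,      # likelihood ratio from expression correlation
    "shared_function": 10.0,  # likelihood ratio from functional annotation
    "essentiality": 2.5,      # likelihood ratio from co-essentiality
}
prior = 1.0 / 600.0           # assumed prior odds that a random pair interacts

odds = posterior_odds(prior, evidence.values())
print(f"posterior probability of interaction: {odds_to_probability(odds):.3f}")
```

Working in odds keeps the update multiplicative, so each additional genomic feature contributes a single factor.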
http://bioinfo.mbb.yale.edu
http://topnet.gersteinlab.org
References:
A Bayesian networks approach for predicting protein-protein
interactions from genomic data. R Jansen, H Yu, D Greenbaum, Y Kluger, NJ
Krogan, S Chung, A Emili, M Snyder, JF Greenblatt, M Gerstein (2003)
Science 302: 449-53.
ExpressYourself: A modular platform for processing and visualizing
microarray data. NM Luscombe, TE Royce, P Bertone, N Echols, CE Horak, JT
Chang, M Snyder, M Gerstein (2003) Nucleic Acids Res 31: 3477-82.
TopNet: a tool for comparing biological sub-networks, correlating
protein properties with topological statistics. H Yu, X Zhu, D Greenbaum,
J Karro, M Gerstein (2004) Nucleic Acids Res 32: 328-37.
Genomic analysis of regulatory network dynamics reveals large
topological changes. NM Luscombe, MM Babu, H Yu, M Snyder, SA Teichmann, M
Gerstein (2004) Nature 431: 308-12.
Annotation transfer between genomes: protein-protein interologs and
protein-DNA regulogs. H Yu, NM Luscombe, HX Lu, X Zhu, Y Xia, JD Han, N
Bertin, S Chung, M Vidal, M Gerstein (2004) Genome Res 14: 1107-18.
2 p.m. Thursday March 30, 2006
Title: Genome Explorations:
Bizarre Bacteria, Exotic Environments, and How They Interact
Speaker: Naomi Ward, Ph.D.
(TIGR)
Venue: Biomolecular Science
Building Room 3118
Abstract:
Genomics, which explores the biology of organisms through their genetic
blueprints, has led us to revise our definitions of microbial entities,
reconsider their capabilities, and re-evaluate the microbiological toolbox
of methods and approaches. In the breadth of its influence on various
subdisciplines of microbiology (e.g., physiology, ecology, host-pathogen
relationships), and its interaction with other disciplines (e.g., human
and veterinary medicine, agriculture, evolutionary biology, structural
biology), the impact of genomics on microbiology has been enormous. Some
of the most recently emerging disciplinary interactions (those occurring
between genomics, ecology, and taxonomy) will be presented, illustrated by
examples from recent projects. These include the predicted marine
"opportunitroph" Silicibacter pomeroyi, the morphologically bizarre
Hyphomonas neptunium, and Acidobacterium capsulatum, member of a
ubiquitous but poorly understood bacterial phylum. Recent work on the
deep-sea microbial communities associated with Alaskan corals and giant
tubeworms of the Galapagos Rift will also be presented.
Some papers that may be of interest:
Moran, M. A., A. Buchan, J.M. Gonzalez, J.F. Heidelberg, J. Henriksen,
W.B. Whitman, R.P. Kiene, L. Brinkac, M. Lewis, S. Johri, B. Weaver, G.
Pai, J.A. Eisen, G. King, M.R. Belas, C. Fuqua, E. Rahe, W. Sheldon, W.
Ye, J.M. Carlton, D.A. Rasko, I.T. Paulsen, Q. Ren, S.C. Daugherty, R.T.
Deboy, R.J. Dodson, A.S. Durkin, R. Madupu, W.C. Nelson, S.A. Sullivan, M.
J. Rosovitz, D.H. Haft, J. Selengut, and N. Ward. 2004. Genome Sequence
of Silicibacter pomeroyi reveals adaptations to the marine environment.
Nature 432:910-913.
Badger, J.H., J.A. Eisen, and N. Ward. 2005. Genomic analysis of
Hyphomonas neptunium contradicts 16S rRNA-based phylogenetic analysis;
implications for the taxonomy of the orders Rhodobacterales and
Caulobacterales. International Journal of Systematic and Evolutionary
Microbiology 55:1021-6.
Ward, N., and C.M. Fraser. 2005. How genomics has affected the concept
of microbiology. Current Opinion in Microbiology 8(5):564-71.
Ward, N. 2006. New directions and interactions in metagenomics
research. FEMS Microbiology Ecology 55:331-8.
Penn, K., D. Wu, and N. Ward. 2006. Characterization of bacterial
communities associated with deep-sea corals on Gulf of Alaska seamounts.
Applied and Environmental Microbiology 72(2):1680-3.
2 p.m. Thursday April 6, 2006
Title: Decomposition of
overlapping protein complexes: a graph theoretical method for analyzing
static and dynamic protein associations
Speaker: Elena Zotenko
Venue: Biomolecular Science
Building Room 3118
Abstract:
(joint work with Katia Guimaraes, Raja Jothi, and Teresa Przytycka)
The complexity in biological systems arises not only from various
individual protein molecules but also from their organization into systems
with numerous interacting partners. In fact, most cellular processes are
carried out by groups of proteins that associate together to perform a
specific task. Recent advances in high-throughput determination of protein
interactions have resulted not only in complete protein interaction maps
for several model organisms, such as yeast and fruit fly, but also in more
specialized protein interaction maps that include proteins involved in a
particular cellular process, such as the NF-kB signaling pathway and
the cell cycle.
Protein interactions are routinely represented as graphs or protein
interaction networks, with proteins as nodes and interactions as edges.
Even though these networks may contain inaccuracies due to experimental
errors and may not capture all the complexity of protein interactions in
an underlying biological process, the study of their topological
properties has become an important tool in searching for general
principles that govern the organization of molecular networks. In 1999,
Hartwell et al. introduced the notion of a functional module, a group of
cellular components and their interactions to which a specific biological
function can be attributed. The authors also suggested the modular
organization of molecular interaction networks, where each functional
module involves a small number of cellular components and is autonomous,
i.e., its interaction with other modules is limited to a few cellular
components.
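The graph representation described above can be sketched in a few lines. This is a minimal illustration only: the protein names and the module are made up, and the boundary test is one simple way to read the "autonomy" property (few members interact outside the module), not the method presented in the talk.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected network: proteins are nodes, interactions are edges."""
    network = defaultdict(set)
    for a, b in interactions:
        network[a].add(b)
        network[b].add(a)
    return network


def degree(network, protein):
    """Number of interaction partners of a protein (0 if it is unknown)."""
    return len(network.get(protein, ()))


def boundary_proteins(network, module):
    """Members of a module that have at least one partner outside the module."""
    return {p for p in module if network.get(p, set()) - module}


# Hypothetical interaction list (names are placeholders, not real proteins).
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E")]
net = build_network(edges)
module = {"A", "B", "C"}

print(degree(net, "C"))
print(sorted(boundary_proteins(net, module)))
```

In this toy network only one member of the module touches the rest of the graph, which is the kind of limited interface the modular view predicts.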
I will start my talk with an overview of computational techniques proposed
for identification and analysis of functional modules within a protein
interaction network. In the second part of my talk I will describe our
recent work on identification and representation of functional groups
within a functional module. Intuitively, if a functional module performs
a function that requires a sequence of steps (as in the case of a
signaling pathway) then functional groups are snapshots of protein
associations at these steps. The proposed representation helps in
understanding the transitions between functional groups and, depending on
the nature of the network, is capable of elucidating temporal relations
between functional groups. I will conclude my talk by showing the results
of applying our method to several protein interaction networks that
underlie well-studied cellular processes.
2 p.m. Thursday April 13, 2006
Title: The truly horrific tale of
the generation and analysis of the Trichomonas vaginalis genome sequence,
a sexually transmitted pathogen of humans
Speaker: Jane Carlton, Ph.D.
(TIGR)
Venue: Biomolecular Science
Building Room 3118
Abstract:
Trichomonas vaginalis, a human extracellular parasite of the urogenital
tract, is the most prevalent non-viral, sexually transmitted parasite
found in North America, where it is responsible for approximately 5
million cases of trichomoniasis annually. In addition to its prevalence,
infection with T. vaginalis is emerging as one of the most important
cofactors in amplifying HIV transmission, and in contributing to low birth
weight, stillbirth and neonatal death. A project to sequence the genome of
T. vaginalis at TIGR was funded in 2002 by the NIAID, NIH. At 7.2-fold
coverage, the genome sequence is providing insights into the parasite's
extraordinary biology. More than one third of the ~160 megabase genome
consists of highly similar copies of transposable elements and repeats,
indicative of a recent genome expansion that may have occurred during the
transition of the parasite from an enteric to a urogenital environment.
Selected amplification of many gene families has occurred, including
massive amplification of genes coding for cell surface molecules predicted
to be involved in pathogenesis. An unusual pathway for cysteine
biosynthesis has been identified. Genes coding for trichopores, lytic
pore-forming proteins, have been identified. Finally, lateral gene
transfer of bacterial genes, also predicted to have been transferred in
another lumenal parasite, has helped to shape the unique metabolism of the
parasite.
2 p.m. Thursday April 20, 2006
Title: Chromosomal abnormalities
underlying mental retardation
Speaker: Jonathan Pevsner, Ph.D.
(JHU & KKI)
Venue: Computer Science Instructional
Center Room 2117
Abstract:
Mental retardation affects 2-3% of the U.S. population. It is defined by
broad criteria including significantly subaverage intelligence, onset by
age 18, and impaired function in a group of adaptive skills. Down syndrome
(DS), caused by a trisomy of chromosome 21, is the most common genetic
cause of mental retardation. We have measured the effects of trisomy 21 on
transcription and translation, based on studies of gene and protein
expression in the developing brain and heart. In a parallel approach, we
have analyzed chromosomal abnormalities underlying mental retardation and
other disorders. In particular we have identified chromosomal anomalies
such as microdeletions and microduplications in Down syndrome and other
mental retardation cases through the analysis of single nucleotide
polymorphisms (SNPs). We developed SNPscan, a web-accessible tool to
analyze and visualize chromosomal abnormalities from SNP data.
2 p.m. Thursday April 27, 2006
Title: Sequence Polymorphism
Detection and Analysis
Speaker: Jim C. Mullikin, Ph.D.
(NIH/NHGRI)
Venue: Biomolecular Science
Building Room 3118
Abstract:
Most single nucleotide polymorphism (SNP) discovery across the human
genome, available through dbSNP, has been accomplished by random shotgun
sequencing of additional individuals and comparing those sequences to
the reference genome using my software package called ssahaSNP. The
ssahaSNP package is the combination of a very fast "Sequence Search and
Alignment by Hashing Algorithm" (SSAHA) followed by SNP detection
based on the Neighborhood Quality Standard (NQS). Understanding the SNP
discovery process is important in many downstream analyses; therefore, I
will describe the various phases of the SNP discovery process. The
International Haplotype Map (HapMap) Project drew from this increasing
pool of publicly available SNPs, and now provides a dataset of nearly 4
million SNPs successfully genotyped across 270 individuals, i.e. over
one billion genotypes. This combination of SNP discovery and genotyping
provides an amazing resource for further analysis. I will show some
examples of how to access these data and some analyses I have performed.
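To give a feel for hashing-based sequence search, the sketch below builds a generic k-mer hash index over a reference string and looks up exact k-mer seeds from a query. It is an assumption-laden toy, not the SSAHA or ssahaSNP implementation: the function names, the choice of overlapping k-mers, and the tiny sequences are all illustrative, and quality-based SNP filtering (NQS) is not modeled.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def seed_hits(index, query, k):
    """Return (query_offset, reference_position) pairs for exact k-mer matches."""
    hits = []
    for offset in range(len(query) - k + 1):
        for pos in index.get(query[offset:offset + k], ()):
            hits.append((offset, pos))
    return hits


# Toy sequences chosen for illustration only.
reference = "ACGTACGT"
index = build_kmer_index(reference, k=4)
print(seed_hits(index, "TACGT", k=4))
```

Seeds that share the same difference between reference position and query offset lie on one diagonal, which is the usual starting point for extending them into an alignment.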
2 p.m. Thursday May 4, 2006
Title: lslink: Enhancing
the Semantics of Links in Life Science Data Resources
Speaker: Woei-Jyh (Adam) Lee
Venue: Biomolecular Science
Building Room 3118
Abstract:
Web accessible data resources contain an abundance of data on scientific
objects such as genes, proteins, sequences, citations, etc. Biologists
typically explore these resources by navigating links between entries in
data resources (objects) as well as paths (informally, concatenations of
links). While these links capture a rich semantics that is often well
understood by the scientist, the link itself does not explicitly capture
or represent meaning. Consequently, scientists spend significant time
following links only to reject many data entries that are reached. The
lack of explicit meaning also limits the sharing of this knowledge among
groups of scientists who are not in the same specialization. Finally,
automated tools such as scripts or mediators that may be used for data
gathering and data integration are limited, since they have no
knowledge of the implicit semantics.
Links between entries in the resources are created for many different
reasons. Biologists capture new discoveries of an experiment or study
using links, whereas data curators add links to augment, to complete or to
make consistent, the knowledge captured among multiple resources. For
example, a result reported in a paper in PubMed may lead a curator to
insert a link from a data entry in, say, OMIM to this publication in PubMed.
Algorithms insert links automatically when discovering similarities between
two data items, e.g., to represent sequence similarity following a BLAST
search. Manually curated links added by record originators or curators are
generally inserted into the database record itself, whereas
algorithmically generated links are generally kept in a separate linking
table. Thus, the simple unlabeled physical links that are in use today are
insufficient to represent such subtle and diverse relationships.
We have addressed this problem by developing a methodology of
lslinks between entries in resources. The lslink enhances
existing links with a label (meaning). We further develop a data model and
a query language that can exploit lslinks while traversing paths
through the data resources. Contributions of this research include the
following: (1) A methodology that includes information extraction, link
generation and link labeling to enhance the semantics of lslinks.
(2) An extended example of lslink extraction and labeling where we
enhance the link from PubMed entries to markers in the human genome. (3) A
proof-of-concept prototype comprising the extraction protocol, a hierarchy
of link labels, and an experiment on machine assisted labeling of links.
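The idea of a labeled link, and of a query that follows a path of labels, can be sketched as follows. This is a hypothetical data model written for illustration: the class name, the label vocabulary, and the entry identifiers are all invented here and are not the lslink data model or query language described in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledLink:
    """A link between two entries, annotated with an explicit meaning."""
    source: str
    target: str
    label: str


def follow_path(links, start, labels):
    """Follow a sequence of labels from a start entry; return the entries reached."""
    frontier = {start}
    for label in labels:
        frontier = {
            link.target
            for link in links
            if link.source in frontier and link.label == label
        }
    return frontier


# Hypothetical entries and labels (placeholders, not real database records).
links = [
    LabeledLink("PubMed:1", "OMIM:100", "describes_disease"),
    LabeledLink("PubMed:1", "Marker:M1", "maps_marker"),
    LabeledLink("OMIM:100", "Gene:X", "caused_by"),
]

print(follow_path(links, "PubMed:1", ["describes_disease", "caused_by"]))
```

Because each hop is filtered by label, a script can skip the entries a scientist would have rejected by hand after following an unlabeled link.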