CBCB Seminar Series
Summer 2004
4 p.m. Thursday June 17, 2004
Title: Organizational meeting &
research projects presentation
Venue: Computer Science
Instructional Building Room 3120.
Abstract: We will discuss the schedule
for Summer 2004.
4 p.m. Thursday June 24, 2004
Title: SNPs Problems, Complexity
and Algorithms
Speaker: Xue Wu
Venue: Computer Science
Instructional Building Room 2120.
Abstract:
Single nucleotide polymorphisms (SNPs) are the most frequent form of human
genetic variation. They are of fundamental importance for a variety of
applications including medical diagnostics and drug design. They also
provide the highest-resolution genomic fingerprint for tracking disease
genes. This paper is devoted to algorithmic problems related to
computational SNPs validation based on genome assembly of diploid
organisms. In diploid genomes, there are two copies of each chromosome. A
description of the SNPs sequence information from one of the two
chromosomes is called a SNPs haplotype. The basic problem addressed here is
haplotyping, i.e., given a set of SNPs prospects inferred from the
assembly alignment of a genomic region of a chromosome, find the maximally
consistent pair of SNPs haplotypes by removing data "errors" related to
DNA sequencing errors, repeats, and paralogous recruitment.
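As a rough illustration of what "consistent pair of haplotypes" means, the sketch below checks whether a set of SNP fragments can be split into two groups with no internal disagreement. It is a minimal sketch under stated assumptions, not the method from the paper: fragments are assumed to be equal-length strings over '0', '1' and '-' (no call), and the function names are hypothetical. It only tests consistency; it does not solve the optimization problem of removing the fewest erroneous fragments or SNPs.

```python
from collections import deque


def fragments_conflict(a, b):
    """Two fragments conflict if they disagree at a site both of them cover."""
    return any(x != y and x != "-" and y != "-" for x, y in zip(a, b))


def two_color_fragments(fragments):
    """Try to assign each fragment to haplotype 0 or 1.

    Returns a list of 0/1 labels if the conflict graph is bipartite,
    otherwise None (the data cannot be explained by two haplotypes
    without removing something).
    """
    n = len(fragments)
    color = [None] * n
    for start in range(n):
        if color[start] is not None:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if v == u or not fragments_conflict(fragments[u], fragments[v]):
                    continue
                if color[v] is None:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return None  # odd cycle of conflicts: inconsistent data
    return color


def consensus_haplotype(fragments, color, side):
    """Merge all fragments assigned to one side into a single haplotype."""
    length = len(fragments[0])
    sites = []
    for i in range(length):
        calls = {f[i] for f, c in zip(fragments, color) if c == side and f[i] != "-"}
        # Fragments on the same side never conflict, so at most one call remains.
        sites.append(calls.pop() if calls else "-")
    return "".join(sites)


# Hypothetical toy data: four fragments over four SNP sites.
fragments = ["00--", "-01-", "11--", "-10-"]
labels = two_color_fragments(fragments)
haplotypes = (
    consensus_haplotype(fragments, labels, 0),
    consensus_haplotype(fragments, labels, 1),
)
```

Adding a fragment such as "01--" to this toy set creates a cycle of three mutually conflicting fragments, so the check returns None; deciding which fragments to discard in that situation is the hard part that the talk addresses.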
References:
Background knowledge of SNPs: NIH introduction to SNPs
4 p.m. Thursday July 1, 2004
Title: Discovering molecular
pathways from protein interaction and gene expression data
Speaker: Woei-Jyh (Adam) Lee
Venue: Computer Science
Instructional Building Room 2120.
Abstract:
In this paper, we describe an approach for identifying pathways from gene
expression and protein interaction data. Our approach is based on the
assumption that many pathways exhibit two properties: their genes exhibit
a similar gene expression profile, and the protein products of the genes
often interact. Our approach is based on a unified probabilistic model,
which is learned from the data using the EM algorithm. We present results
on two Saccharomyces cerevisiae gene expression data sets, combined with a
binary protein interaction data set. Our results show that our approach is
much more successful than other approaches at discovering both coherent
functional groups and entire protein complexes.
4 p.m. Thursday July 8, 2004
Title: Exploring Deterministic
Reconstruction of Repetitive DNA for Use in Genome Assembly
Speaker: Suzanne Sindi
Venue: Computer Science
Instructional Building Room 2120.
Abstract:
Whole Genome Shotgun Assembly is a method for determining the sequence of
a genome. The presence of highly repetitive DNA complicates this method
and can reduce the accuracy of the final assembled sequence. Using an
approach from symbolic dynamical systems, we present a way to represent
highly repetitive sequences of DNA. We discuss potential applications of
these representations to Whole Genome Shotgun Assembly.
4 p.m. Thursday July 22, 2004
Title: Extracting synonymous gene
and protein terms from biological literature
Speaker: Rezarta Islamaj
Venue: Computer Science
Instructional Building Room 2120.
Abstract:
Genes and proteins are often associated with multiple names. More names
are added as new functional or structural information is discovered.
Because authors can use any one of the known names for a gene or protein,
information retrieval and extraction would benefit from identifying the
gene and protein terms that are synonyms of the same substance. We have
explored four complementary approaches for extracting gene and protein
synonyms from text, namely the unsupervised, partially supervised, and
supervised machine-learning techniques, as well as the manual
knowledge-based approach. We report results of a large-scale evaluation of
these alternatives over an archive of biological journal articles. Our
evaluation shows that our extraction techniques could be a valuable
supplement to resources such as SWISSPROT, as our systems were able to
capture gene and protein synonyms not listed in the SWISSPROT database.
4 p.m. Thursday July 29, 2004
Title: Splice site prediction: the
general problem, the proposed methods, their characteristics and
differences
Speaker: Rezarta Islamaj
Venue: Computer Science Instructional
Building Room 2120.
Abstract:
Splice sites have been modeled by a variety of methods over the past
twenty years. Still, the search for improvement continues, as splice site
detection is a key ingredient for accurate gene finding. I will review
several methods widely used in the literature, describing their
characteristics and their differences, and illustrate the discussion with
several experiments and test results. The results show that we
reach the best performance by combining boosted decision trees as a
modeling framework with information from a larger sequence window.
However, the splice site prediction problem is far from solved. I am
currently continuing my experiments to examine possible improvements.
4 p.m. Thursday August 12, 2004
Title: Indexing techniques in
protein structural comparison
Speaker: Elena Zotenko
Venue: Computer Science
Instructional Building Room 2120.
Abstract:
Given a query protein, the ability to identify all structurally similar
proteins is of primary importance in the study of protein evolution and
function. As the number of protein structures grows, there is a need to
develop screening methods that will perform quick yet accurate filtering
of the database before a more computationally expensive protein structure
comparison method is applied.
The long-term objective of my research is to develop such a screening method
for VAST (Vector Alignment Search Tool), a protein structure comparison
method used at NCBI.
In this talk I will give an overview of protein structure and
protein structure comparison methods. Then I will concentrate on two
approaches to indexing protein structures. Finally, I will talk about my
research over the past several months: investigating the possibility of a
screening method that borrows from these two approaches.
References:
- L. Holm, C. Sander, "3-D lookup: fast protein structure database searches at 90% reliability",
Proc Int Conf Intell Syst Mol Biol, 1995, 3:179-87
- P. Rogen, B. Fain, "Automatic classification of protein structure by using Gauss integrals",
PNAS, 2003, 100:119-124
- P. Rogen, H. Bohr, "A new family of global protein shape descriptors",
Mathematical Biosciences, 2003, 182:167-181
4 p.m. Thursday August 26, 2004
Title: Allosteric determinants in
guanine nucleotide-binding
Speaker: Nozomi Sakakibara
Venue: Computer Science
Instructional Building Room 2120.
Abstract:
For mapping energetic interactions in proteins, a technique was developed
that uses evolutionary data for a protein family to measure statistical
interactions between amino acid positions. For the PDZ domain family, this
analysis predicted a set of energetically coupled positions for a binding
site residue that includes unexpected long-range interactions. Mutational
studies confirm these predictions, demonstrating that the statistical
energy function is a good indicator of thermodynamic coupling in proteins.
Sets of interacting residues form connected pathways through the protein
fold that may be the basis for efficient energy conduction within
proteins.
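To make the idea of "statistical interactions between amino acid positions" concrete, the sketch below scores how strongly two columns of a multiple sequence alignment co-vary. Note the substitution: it uses plain mutual information as a simpler stand-in, not the statistical coupling energy defined in the referenced paper. The function name and the toy alignment are hypothetical, and the alignment is assumed to be a list of equal-length strings.

```python
from collections import Counter
from math import log


def column_mutual_information(alignment, i, j):
    """Mutual information (in nats) between alignment columns i and j.

    This is a generic covariation measure used here for illustration only;
    it is NOT the perturbation-based coupling energy of the original method.
    """
    n = len(alignment)
    count_i = Counter(seq[i] for seq in alignment)
    count_j = Counter(seq[j] for seq in alignment)
    count_ij = Counter((seq[i], seq[j]) for seq in alignment)
    total = 0.0
    for (a, b), c in count_ij.items():
        # p(a, b) * ln( p(a, b) / (p(a) * p(b)) ), written with raw counts
        total += (c / n) * log(c * n / (count_i[a] * count_j[b]))
    return total


# Hypothetical toy alignment: columns 0 and 1 always change together,
# while column 2 is fully conserved.
toy_alignment = ["AKL", "AKL", "GEL", "GEL"]
coupled = column_mutual_information(toy_alignment, 0, 1)
uncoupled = column_mutual_information(toy_alignment, 0, 2)
```

In this toy case the first pair of columns scores ln 2 and the pair involving the conserved column scores zero. A real analysis would also need to correct for phylogenetic bias and small sample sizes, which this sketch ignores.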
Members of the G protein superfamily contain nucleotide-dependent switches
that dictate the specificity of their interactions with binding partners.
Using a sequence-based method termed statistical coupling analysis (SCA),
we have attempted to identify the allosteric core of these proteins, the
network of amino acid residues that couples the domains responsible for
nucleotide binding and protein-protein interactions. One-third of the 38
residues identified by SCA were mutated in the G protein Gsα, and the
interactions of guanosine 5'-3-O-(thio)triphosphate- and GDP-bound mutant
proteins were tested with both adenylyl cyclase (preferential binding to
GTP-Gsα) and the G protein βγ subunit complex (preferential binding to
GDP-Gsα). A two-state allosteric model predicts that mutation of residues
that control the equilibrium between GDP- and GTP-bound conformations of
the protein will cause the ratio of affinities of these species for
adenylyl cyclase and Gβγ to vary in a reciprocal fashion. Observed results
were consistent with this prediction. The network of residues identified
by the SCA appears to comprise a core allosteric mechanism conferring
nucleotide-dependent switching; the specific features of different G
protein family members are built on this core.
References:
- Steve W. Lockless, Rama Ranganathan, "Evolutionarily Conserved Pathways of Energetic Connectivity in Protein Families",
Science, Vol. 286, Issue 5438, 295-299, 8 October 1999
- Mark E. Hatley, Steve W. Lockless, Scott K. Gibson, Alfred G. Gilman
and Rama Ranganathan, "Allosteric determinants in guanine nucleotide-binding proteins",
PNAS, Vol. 100, No. 24, 14445-14450, 25 November 2003