2:00 p.m., Wednesday, April 14, 2010
Title: "Whole-Genome Sequence Analysis for Pathogen Detection and Diagnostics"
By: Adam Phillippy, CBCB
Venue: 3118 Biomolecular Sciences
Abstract:
Pathogenic microbes, both natural and weaponized, pose
significant dangers to human health and safety. To defend against these
threats, it is essential to rapidly detect and characterize pathogens in any
environmental or clinical medium with high accuracy. Now that the genome
sequences of thousands of bacteria and viruses are known, it is possible to
design biomolecular tests to rapidly detect and characterize pathogens based
solely on their DNA. Possible applications are far-reaching and include
real-time clinical diagnosis and biosurveillance. However, these tests
require sophisticated computational design and analysis to operate
effectively.
This dissertation presents novel computational methods for improving the
accuracy of three modern diagnostic technologies: polymerase chain reaction
(PCR), array comparative genomic hybridization (CGH), and whole-genome
sequencing. For designing real-time PCR detection assays, an efficient
search algorithm and data structure are presented for analyzing over 100
billion nucleotides of genomic DNA to identify the most distinguishing
sequences of a pathogen. Laboratory validation shows that these "signature"
sequences can be used to detect pathogens in complex samples and
differentiate them from their non-pathogenic relatives. For CGH, pan-genome
array design and analysis algorithms are presented for the characterization
of microbial isolates. These methods are used to study multiple strains of
the foodborne pathogen, Listeria monocytogenes, revealing new insights into
the diversity and evolution of the species. Finally, multiple methods are
presented for the validation of whole-genome sequence assemblies. These
validated assemblies provide the ultimate diagnostic, decoding the entire
DNA sequence of a genome with high confidence.
A Dissertation Defense for the degree of Ph.D. in Computer Science
2:00 p.m., Thursday, April 15, 2010
RECOMB 2010 Practice Talks
Venue: 3118 Biomolecular Sciences
Title (RECOMB 2010 Practice Talk): "Dense Subgraphs with Restrictions and Applications to Gene Annotation Graphs"
Authors: Barna Saha, Allison Hoch, Samir Khuller, Louiqa Raschid and Xiao-Ning Zhang
Speaker: Barna Saha, a third-year Computer Science graduate student working with Samir Khuller on algorithm design and analysis.
Abstract:
We focus on finding complex annotation patterns representing
novel and interesting hypotheses from gene annotation data.
We define a generalization of the densest subgraph problem by adding
an additional distance restriction (defined by a separate metric)
to the nodes of the subgraph.
We show that while this generalization makes the problem NP-hard
for arbitrary metrics,
when the metric comes from the distance metric of a tree, or an
interval graph, the problem can be solved optimally in polynomial time.
We also show that the densest subgraph problem with a specified
subset of vertices that have to be included in the solution
can be solved optimally in polynomial time. In addition, we consider
extensions in which, rather than finding a single solution, we wish
to list all subgraphs of almost maximum density.
We apply this method to a dataset of genes and their annotations
obtained from The Arabidopsis Information Resource (TAIR).
A user evaluation confirms that the patterns found in the
distance-restricted densest subgraph for a dataset of photomorphogenesis genes
are indeed validated in the literature; a control dataset validates
that these are not random patterns. Interestingly, the complex
annotation patterns potentially lead to new and as yet unknown
hypotheses.
We perform experiments to determine the properties of the dense
subgraphs as we vary parameters, including the number of genes and the
distance.
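As background for the abstract above, the density of a subgraph is commonly defined as its number of edges divided by its number of vertices. The Python sketch below is not the authors' algorithm and ignores the distance restriction entirely; it only illustrates the unrestricted problem using the standard greedy "peeling" heuristic, which is known to return a subgraph with at least half the optimal density. The function names `density` and `greedy_densest_subgraph` are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def density(nodes, edges):
    """Number of edges with both endpoints in `nodes`, divided by |nodes|."""
    nodes = set(nodes)
    if not nodes:
        return 0.0
    inside = sum(1 for u, v in edges if u in nodes and v in nodes)
    return inside / len(nodes)


def greedy_densest_subgraph(edges):
    """Peel off a minimum-degree vertex at each step; keep the densest set seen.

    This is the classic greedy heuristic for the unrestricted densest
    subgraph problem, not the distance-restricted variant in the talk.
    """
    # Build an undirected adjacency map, ignoring self-loops and duplicates.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    remaining = set(adj)
    if not remaining:
        return set(), 0.0

    num_edges = sum(len(neigh) for neigh in adj.values()) // 2
    best_nodes = set(remaining)
    best_density = num_edges / len(remaining)

    while len(remaining) > 1:
        # Pick a vertex of minimum current degree (ties broken by name).
        u = min(remaining, key=lambda x: (len(adj[x]), str(x)))
        num_edges -= len(adj[u])
        for w in adj[u]:
            adj[w].discard(u)
        del adj[u]
        remaining.remove(u)

        current = num_edges / len(remaining)
        if current > best_density:
            best_nodes, best_density = set(remaining), current

    return best_nodes, best_density
```

For example, on a 4-clique with a short path hanging off one vertex, the peeling removes the path vertices first and then stops improving, so the clique is reported as the densest set.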
-------------
Title (RECOMB 2010 Practice Talk): "Extracting between-pathway models from E-MAP interactions using expected graph compression"
Speaker: David Kelley
Abstract:
Genetic interactions (such as synthetic lethal interactions)
have become quantifiable on a large scale using the epistatic miniarray
profile (E-MAP) method. An E-MAP allows the construction of a large,
weighted network of both aggravating and alleviating genetic interactions
between genes. By clustering genes into modules and establishing relationships
between those modules, we can discover compensatory pathways.
We introduce a general framework for applying greedy clustering
heuristics to probabilistic graphs. We use this framework to apply a graph
clustering method called graph summarization to an E-MAP that targets
yeast chromosome biology. This results in a new method for clustering
E-MAP data that we call Expected Graph Compression (EGC). We validate
modules and compensatory pathways using enriched Gene Ontology
annotations and a novel method based on correlated gene expression
from a comprehensive collection of expression experiments. EGC finds a
number of modules that are not found by any of the previous methods to
cluster E-MAP data. Further, EGC uncovers core submodules contained
within several previously found modules, suggesting that EGC can reveal
the finer structure of E-MAP networks.
1:00 p.m., Friday, April 16, 2010
Title: "High Performance Computing for DNA Sequence Alignment and Assembly"
By: Michael C. Schatz, CBCB
Venue: 3118 Biomolecular Sciences
Abstract:
We are at the dawn of a new era in computational biology. DNA
sequencing projects that required years of effort and hundreds of millions
of dollars of equipment just a few years ago can now be performed quickly
and cheaply by individual labs. This dramatic shift is expanding the scale
and scope of sequencing to previously unimaginable limits, and will
ultimately lead to new discoveries about our basic biology, the diversity of
life, and personalized medicine. However, these ambitious goals can only be
realized if we can develop new computational methods that can effectively
analyze the overwhelming volumes of data generated.
In my presentation, I'll describe my research developing efficient methods
for analyzing large biological datasets, including the use of highly parallel
commodity graphics processing units produced by nVidia and the parallel
computing framework MapReduce developed by Google. My dissertation research
demonstrates how these technologies can be applied to the critical tasks of
large-scale alignment and assembly, enabling genotyping and de novo assembly
of whole genomes from billions of short reads. Coupled with
inexpensive cloud computing, these programs can quickly, cheaply, and
accurately analyze tremendous biological datasets and have the potential to
make otherwise infeasible studies practical.
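To give a flavor of the MapReduce programming model mentioned above, the following is a toy, single-process Python sketch that counts k-mers (length-k substrings) in a set of reads using explicit map, shuffle, and reduce steps. It is an illustrative assumption about how such a computation can be phrased, not the software described in the dissertation, and the function names (`map_read`, `shuffle`, `reduce_counts`, `count_kmers`) are hypothetical. A real MapReduce framework would distribute the map and reduce calls across many machines.

```python
from collections import defaultdict


def map_read(read, k):
    """Map step: emit a (k-mer, 1) pair for every length-k window of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1


def shuffle(pairs):
    """Shuffle step: group emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(kmer, values):
    """Reduce step: sum the counts emitted for a single k-mer."""
    return kmer, sum(values)


def count_kmers(reads, k):
    """Run map, shuffle, and reduce in sequence within one process."""
    pairs = (pair for read in reads for pair in map_read(read, k))
    grouped = shuffle(pairs)
    return dict(reduce_counts(kmer, values) for kmer, values in grouped.items())
```

Because each read is mapped independently and each k-mer is reduced independently, both steps can in principle run in parallel, which is the property that makes the model attractive for billions of short reads.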
A Dissertation Defense for the degree of Ph.D. in Computer Science
2:00 p.m., Thursday, April 29, 2010
Title: "Structural Assembly of Molecular Complexes Based on Residual Dipolar Couplings"
Speaker: Konstantin Berlin, a finishing PhD student in Computer Science
Venue: 3118 Biomolecular Sciences
Abstract:
We present PATI, a computationally efficient and accurate ab initio predictor of the residual dipolar couplings (RDCs) from a protein structure. Building upon PATI, we develop and evaluate a rigid-body molecular docking method, called PATIDOCK, that relies solely on the three-dimensional structure of the individual components and the experimentally derived RDCs for the complex, and show that it is possible to accurately assemble a protein-protein complex by utilizing PATI to guide the docking method. The proposed docking method is robust against experimental errors in the RDCs and computationally efficient. We analyze the accuracy and efficiency of this method using experimental or synthetic RDC data for several proteins, as well as synthetic data for a large variety of protein-protein complexes. We also test our method on two protein systems for which the structure of the complex and steric-alignment data are available (Lys48-linked diubiquitin and a complex of ubiquitin and a ubiquitin-associated domain) and analyze the effect of flexible unstructured tails on the outcome of docking. The results demonstrate that it is fundamentally possible to assemble a protein-protein complex based solely on experimental RDC data and the prediction of the alignment tensor from three-dimensional structures. Additionally, we show a method for combining RDCs with other experimental data, such as ambiguous constraints from interface mapping, to further improve structure characterization of the protein complexes.