CBCB Seminar Series
Spring 2007
12:30 p.m. Thursday January 25, 2007
Title: Organizational meeting
Speaker: Stephen M. Mount, Ph.D.
Venue: Biomolecular Science
Building Room 3118
Abstract: To discuss the
schedule for Spring 2007.
12:30 p.m. Thursday February 1, 2007
Title: Evolutionary dynamics of
microbial gene overlaps
Speaker: Carl Kingsford, Ph.D.
Venue: Biomolecular Science
Building Room 3118
Abstract:
Among sequenced prokaryotes, more than 29% of all annotated genes overlap
at least one of their two flanking genes. We study this common phenomenon
and present a comprehensive analysis of adjacent genes where the 3' ends
either overlap or nearly overlap. We describe the non-uniform
distribution of the lengths of the overlap regions and explain this
pattern using a simple evolutionary model based on extension to the
next-occurring stop codon. We further report a mirror-image pattern in
the distribution of separation distances of closely spaced genes, and we
conjecture that this distribution results from the conversion of
overlapping genes to non-overlapping genes.
Joint work with Art Delcher and Steven Salzberg.
12:30 p.m. Thursday February 8, 2007
Title: Reconciliation of Genome
Assemblies
Speaker: Aleksey Zimin, Ph.D.
Venue: Biomolecular Science
Building Room 3118
Abstract:
Draft genome assemblies have misassemblies and gaps. Many genomes (for
example, eight species of Drosophila, Rhesus Macaque) are assembled by
several centers, using their own assembly software, and then the
collaboration picks the draft assembly that they judge to be the best.
The other assemblies are usually discarded. The draft assemblies produced
by different assembly programs differ, and frequently one assembly program
is able to properly assemble a difficult region of the genome, while the
others cannot. There is a wealth of information available through these
alternative assemblies. We have developed a technique that we call
assembly reconciliation that can merge draft genome assemblies. It takes
one draft assembly, detects apparent errors, and, when possible, patches
the problem areas using pieces from alternative draft assemblies. It also
closes gaps in places where one of the alternative assemblies has spanned
the gap correctly.
11:00 a.m. Tuesday February 13, 2007
(This is part of the Computer Science Distinguished Colloquium Series.)
Title: Are There Rearrangement
Hotspots in the Human Genome?
Speaker: Pavel A. Pevzner, Ph.D.
(University of California at San Diego)
Venue: A.V. Williams Building, ECE Conference Room
2460
Abstract:
Rearrangements are genomic "earthquakes" that change the chromosomal
architectures. The fundamental question in molecular evolution is whether
there exist "chromosomal faults" where rearrangements are happening over
and over again. In a landmark paper, Nadeau and Taylor (J.H.Nadeau and
B.A. Taylor. Proceedings of the National Academy of Sciences, 81, 814-818
(1984)) formulated the Random Breakage Model (RBM) of chromosome evolution
that postulates that there are no rearrangement hotspots in the human genome.
In the next two decades, numerous mapping and sequencing studies with
progressively increasing levels of resolution made RBM the de facto
theory of chromosome evolution. Despite the fact that RBM had prophetic
predictive power, it was recently refuted by Pevzner and Tesler
(P.Pevzner and G.Tesler. Proceedings of the National Academy of Sciences,
100, 7672-7677 (2003)) who introduced the Fragile Breakage Model (FBM)
postulating that the human genome is a mosaic of solid regions (with low
propensity for rearrangements) and fragile regions (rearrangement
hotspots). However, the rebuttal of RBM caused a controversy and led to a
split among researchers studying genome evolution. In particular, it
remains unclear whether some complex rearrangements (e.g., transpositions)
can create an appearance of rearrangement hotspots. We contribute to the
ongoing debate by analyzing multi-break rearrangements that break a genome
into multiple fragments and further glue them together in a new order.
While multi-break rearrangements were studied in depth for k=2 breaks, the
k-break rearrangement distance problem for arbitrary k remains unsolved.
We prove a theorem for computing multi-break rearrangement distance and
use it to resolve the "FBM versus RBM" controversy.
This is joint work with Max Alekseyev.
Biography:
Dr. Pevzner is the Ronald R. Taylor Chair Professor of Computer Science and
Director of the Center for Algorithmic and Systems Biology at the University
of California, San Diego. He holds a Ph.D. (1988) from the Moscow Institute of
Physics and Technology, Russia. Dr. Pevzner authored the graduate textbook
"Computational Molecular Biology: An Algorithmic Approach" in 2000 and the
undergraduate textbook "An Introduction to Bioinformatics Algorithms" in 2004
(jointly with Neil Jones). He was named a Howard Hughes Medical Institute
Professor in 2006.
11:00 a.m. Monday February 19, 2007
Special Seminar (Faculty Candidate
Talk)
Title: Detection and
characterization of genes in genomic sequences
Speaker: Liliana Florea (George
Washington University)
Venue: A.V. Williams Building,
Room 3258
Abstract:
New and more effective sequencing technologies will bring a
proliferation in the number of genomes available over the next few
years, which will need to be analyzed to determine genes and other
functional elements. Interpreting the raw sequence data into useful
biological information, also known as genome annotation, is a complex
process that requires the efficient integration of computational
analyses, auxiliary sequence data, and biological expertise. We
describe our ongoing work to create a collection of algorithms,
methods and tools for annotating genome sequences, starting from i)
tools and mathematical models for fast and accurate high-throughput
alignment of cDNA sequences to a target genome, to generate primary
data, to ii) methods for inferring genes and their variations
(alternative splice variants) in genomic sequences from the primary
evidence, and to iii) large-scale bioinformatics analyses of gene
annotation data to extract biologically meaningful patterns such as
models of exon evolution and potential underlying regulatory elements.
Our tools are fast, accurate and efficient to meet the demands of
timely and up-to-date annotation of newly sequenced model organisms.
12:30 p.m. Thursday February 22, 2007
Title: Prospects for association
mapping in Lake Malawi cichlid fishes
Speaker: Thomas D. Kocher,
Ph.D.
Venue: Biomolecular Science
Building Room 3118
Abstract:
Genome projects are underway for several cichlid fish species. Most of
the genomic resources have been developed from tilapia (Oreochromis
niloticus), including genetic and BAC fingerprint maps. NIH has approved
a project to develop a 5x draft assembly of tilapia. The haplochromine
cichlids which dominate the East African lakes have also been targeted for
sequencing. NIH has also approved sequencing 2x from Astatotilapia
burtoni (Lake Tanganyika), 2x from Paralabidochromis chilotes (Lake
Victoria) and 2x from Metriaclima zebra (Lake Malawi). This complex but
fundamentally sparse data set may require new strategies for comparative
assembly. DOE-JGI has deposited 0.1x shotgun coverage for each of 5
species of Lake Malawi cichlids in the Trace Archives. Since the
radiation of species in Lake Malawi can be likened to a recombinant inbred
panel, we hope to use SNPs mined from these data for association mapping
of quantitative traits in the Lake Malawi cichlid species flock.
11:00 a.m. Monday February 26, 2007
Special Seminar (Faculty Candidate
Talk)
Title: Learning predictive models
of gene regulation
Speaker: Christina Leslie, Ph.D.
(Computational Biology
Group, Columbia University)
Venue: CSIC Building, Room
1115
Abstract:
Studying the behavior of gene regulatory networks by learning from
high-throughput genomic data has become one of the central problems in
computational systems biology. Most work in this area has focused on
learning structure from data -- e.g. finding clusters or modules of
potentially co-regulated genes, or building a graph of putative
regulatory "edges" between genes -- and has been successful at
generating qualitative hypotheses about regulatory networks.
Instead of adopting the structure learning viewpoint, our focus is to
build predictive models of gene regulation that allow us both to make
accurate quantitative predictions on new or held-out experiments (test
data) and to capture mechanistic information about transcriptional
regulation. Our algorithm, called MEDUSA, integrates promoter
sequence, mRNA expression, and transcription factor occupancy data to
learn gene regulatory programs that predict the differential
expression of target genes. Instead of using clustering or
correlation of expression profiles to infer regulatory relationships,
the algorithm learns to predict up/down expression of target genes by
identifying condition-specific regulators and discovering regulatory
motifs that may mediate their regulation of targets. We use boosting,
a technique from statistical learning, to help avoid overfitting as
the algorithm searches through the high dimensional space of potential
regulators and sequence motifs. We will report computational results
on the yeast environmental stress response, where MEDUSA achieves high
prediction accuracy on held-out experiments and retrieves key
stress-related transcriptional regulators, signal transducers, and
transcription factor binding sites. We will also describe recent
results on the hypoxic response in yeast, where we used MEDUSA to
propose the first global model of the oxygen sensing and regulatory
network, including new putative context-specific regulators. Through
our experimental collaborator on this project, the Zhang Lab at
Columbia University, we are in the process of validating our
computational predictions with wet lab experiments.
11:00 a.m. Wednesday February 28, 2007
Special Seminar (Faculty Candidate
Talk)
Title: Deciphering Information
Encoded in the Dark Matter of the Human Genome
Speaker: Xiaohui Xie, Ph.D. (Broad
Institute of Massachusetts Institute of Technology and Harvard University)
Venue: CSIC Building, Room
1122
Abstract:
Among the 3 billion bases contained in the human genome, only 1.5% are
well characterized, primarily in the form of protein-coding genes. One
of the main challenges in genomics is to understand the function of the
other 98.5% of the genome. Comparison of the human genome to several
other related genomes has revealed that these regions harbor a
strikingly large number of highly conserved noncoding elements,
accounting for over two-thirds of the portion of the human genome under
selection.
And yet the function of these conserved noncoding elements (CNEs)
remains largely unknown. We also know little about their evolutionary
origins, or the molecular mechanisms that have preserved them through
millions of years of evolution.
I will describe computational methods for systematically dissecting the
function of the CNEs. Using statistical analysis and comparative
genomics, we have uncovered hundreds of novel regulatory motifs within
the CNEs, matching hundreds of thousands of conserved instances in the
genome. These motifs form distinct classes, including transcriptional
regulatory elements, small RNA genes, microRNA targeting sites, and
chromatin barriers.
I will also describe an effort to characterize the evolution of
regulatory sequences. I will propose the creative role of transposable
elements as a major force for duplicating and dispersing regulatory
elements in the human genome. Comparison of metatherian and eutherian
genomes reveals that over 15% of the eutherian CNEs arose from sequence
inserted by transposons.
In a few years, genome sequences of over 50 mammals will become
available. I will discuss how these data will empower the methods I have
described, and provide us with an opportunity to unravel all information
coded in the human genome.
12:30 p.m. Thursday March 1, 2007
Title: Systems biology as seen
from inside the Drosophila blastoderm
Speaker: John
Reinitz, Ph.D. (The State University of New York at Stony Brook)
Venue: Biomolecular Science
Building Room 3118
Abstract:
This talk will be concerned with two fundamental questions. The first
is the determination of a morphogenetic field, and the second
is the control of transcription in metazoan genes with large promoters.
One of the central ideas in animal development is that of
the determination of cell fates in a morphogenetic field. A second
central idea, or perhaps observation, is that morphogenetic fields are
capable of regulation, a classical term for the correction of
errors. In the past, regulation was investigated by surgical
perturbation of embryos. In the modern context regulation can also be
studied in the context of genetic perturbations or of individual
variations in gene expression in an isogenic population. We consider
this problem in the early embryo of the fruit fly Drosophila, a
well-characterized system for molecular developmental genetics which
can also be used as a naturally grown differential display system for
reverse engineering networks of genes. This system is being used by
us and others to address fundamental questions about the
reliability of developmental processes.
In the Drosophila system which we study, determination
of the morphogenetic field is implemented by means of differential
regulation of transcription. The control of this process by
groups of binding sites is as yet poorly understood. We present
a new model of transcriptional control and show how it can be used
to understand anomalous expression of even-skipped stripe
7 and to predict the results of site directed mutagenesis experiments.
Biography:
John Reinitz works in the Department of Applied Mathematics and Statistics
at Stony Brook University, although his flies live across campus at the
Center for Developmental Genetics. Starting in 1982, Professor Reinitz has
been using methods from quantitative biology, bioinformatics, mathematics,
and numerical computing to investigate fundamental problems in gene
regulation and development. His PhD work under the direction of J. Rimas
Vaisnys, "A Theoretical and Experimental Analysis of a Genetic Switch in
Phage Lambda" (Yale, 1988) explored a simple system. Since then he has
focused on the Drosophila blastoderm, spending time at Columbia University
(with Dr. Michael Levine), the Santa Fe Institute (where he remains an
external faculty member), Yale Medical School, Mount Sinai School of
Medicine and Stony Brook University, where he has been since 2001.
11:00 a.m. Wednesday March 7, 2007
Special Seminar (Faculty Candidate
Talk)
Title: Computational Prediction of
Protein Structure and Transcription Termination Signals
Speaker: Carl Kingsford, Ph.D.
(University of Maryland, College Park)
Venue: CSIC Building, Room
1115
Abstract:
Because experimentalists generate sequences of new genes more quickly
than the corresponding 3D protein structures can be determined,
computational methods for predicting a protein's shape from its amino
acid sequence are necessary. I will discuss my work applying
mathematical programming to finding the optimal (i.e. lowest-energy)
configuration of protein side chains, given only the protein's
sequence and backbone shape. This approach has been used successfully
for homology modeling and for designing proteins with desired shapes.
While we have shown the underlying graph problem to be NP-hard to
approximate, our method can find optimal solutions to real-world
instances quickly. In addition, our method is easily extensible to other
settings.
In the second half of my talk, I will address a separate problem
concerning the organization of bacterial genomes. In many bacteria,
transcription of DNA to RNA is terminated by a signal in the DNA called
a Rho-independent transcription terminator. Detecting such terminators
can shed light on the grouping of genes into transcription units and can
improve gene function prediction.
I will describe a computational method to rapidly and accurately find
these Rho-independent transcription terminators. We have used our method
to predict the locations of terminators in 343 prokaryotic genomes. This
is the largest collection of such predictions available, and they have
had immediate utility in the study of biological phenomena: Using them,
we have developed new insights about the relationship between
terminators and DNA uptake signals (a type of genomic signal involved in
importing external DNA into the cell) and discovered a new uptake signal
in the organism Haemophilus ducreyi.
These two topics illustrate types of contributions that computer science
can make to the biological sciences and also demonstrate the breadth of
computational techniques that must be brought to bear to make sense of
biological data.
12:30 p.m. Thursday March 8, 2007
Title: Human-specific gene
inactivation or modification by coding sequence disruptions
Speaker: Yoonsoo Hahn, Ph.D.
(National Cancer Institute, National Institutes of Health)
Venue: Biomolecular Science
Building Room 3118
Abstract:
Some of the loss of ape characters and gain of human traits can be
achieved by gene inactivation during human evolution. We devised
bioinformatics methods for systematic identification of putative
human-specific coding region disruptions that might have occurred after
the human and chimpanzee divergence. First, we collected human genes
showing an insertion, a deletion, or a premature stop codon when compared
with the orthologous chimpanzee genome sequence. Then, we selected those
cases wherein the chimpanzee ortholog maintains the ancestral open reading
frame as demonstrated by the presence of an intact homolog in a third
species. Using this procedure, we identified seven frameshift, nine
nonsense, and two exon-deletion mutations in the human genes, which have
not been reported previously. Possible functional influence of the
mutation on each gene will be discussed. We propose that inactivation or
modification of genes by coding sequence disruption is a part of the
normal process of induction and facilitation of certain phenotypic traits
in the human.
Biography:
Yoonsoo Hahn is a post-doctoral Visiting Fellow at the National Cancer
Institute (NCI), NIH, where he works with Dr. BK Lee. He earned his PhD in
Molecular Biology in 2000 at the Korea Advanced Institute of Science and
Technology, Korea. He served as a Senior Scientist at the Korea Research
Institute of Bioscience and Biotechnology until he joined NCI in 2002. His
research interests are to identify and characterize genetic changes in the
human genome during evolution and to relate them to human-specific
phenotypic traits.
11:00 a.m. Monday March 12, 2007
Special Seminar (Faculty Candidate
Talk)
Title: Bayesian Learning for
Deciphering Gene Regulation
Speaker: Yuan (Alan) Qi, Ph.D.
(Massachusetts Institute of Technology)
Venue: A.V. Williams Building,
Room 2460
Abstract:
Gene regulation plays a fundamental role in biological systems. As more
high-throughput biological data becomes available, it is possible to
quantitatively study gene regulation in a systematic way. In this talk I
will present my work on three problems related to gene regulation:
(1) identifying genes that affect organism development; (2) detecting
protein-DNA binding events and cis-regulatory elements; and (3)
deciphering regulatory cascades at the transcriptional level for stem
cell development. To address these problems, I developed biologically
interpretable Bayesian models and designed novel learning methods. They
capture key aspects of biological processes and make functional
predictions, some of which were confirmed by biological experiments. I
will conclude with brief descriptions of my plan for future work,
including fusing multiple data sources and deciphering gene regulation
at the post-transcriptional level.
11:00 a.m. Wednesday March 14, 2007
Special Seminar (Faculty Candidate
Talk)
Title: Inferring biological
networks from diverse genomic data
Speaker: Chad Myers (Princeton
University)
Venue: CSIC Building, Room
1115
Abstract:
Understanding protein function and modeling protein-protein interactions
in biological networks is a key challenge in modern systems biology.
Recent developments in biotechnology have enabled high-throughput
measurement of several cellular phenomena including gene expression,
protein-protein interactions, protein localization, and sequence. The
wealth of data generated by such technology promises to support
computational prediction of network models, but so far, successful
approaches that translate these data into accurate, experimentally
testable hypotheses have been limited.
I will discuss key insights into why we face this imbalance between
genomic data and established knowledge and present computational
approaches for addressing these challenges. Specifically, I will focus
on methods for measuring genomic dataset reliability and illustrate how
reliability often varies across different biological contexts. We have
developed a Bayesian framework for leveraging this variation to improve
network prediction accuracy and implemented this approach in a public,
web-based system for user-driven search and visualization of genomic
data. I will describe the supporting machine learning methods as well
as important data visualization features, which play a critical role in
making the system practical. To illustrate the power of our approach, I
will demonstrate how we have used it to correctly predict function for
several previously uncharacterized genes in yeast and to elucidate the
behavior of Hsp90, a target of recent cancer drugs. I will close with a
brief overview of plans for future research motivated by this work.
12:30 p.m. Thursday March 15, 2007
Title: Peptide Identification by
Spectral Matching of Tandem Mass Spectra using Hidden Markov Models
Speaker: Xue Wu
Venue: Biomolecular Science
Building Room 3118
Abstract:
Peptide identification by tandem mass spectrometry is the dominant
proteomics workflow for protein characterization in complex samples. The
peptide fragmentation spectra generated by these workflows exhibit
characteristic fragmentation patterns that can be used to identify the
peptide. In other fields, where the compounds of interest do not have the
convenient linear structure of peptides, fragmentation spectra are
identified by comparing new spectra with libraries of identified spectra,
an approach called spectral matching. In contrast to sequence-based tandem
mass spectrometry search engines used for peptides, spectral matching can
make use of the intensities of fragment peaks in library spectra to assess
the quality of a match. We evaluate a hidden Markov model approach to
spectral matching, in which many examples of a peptide's fragmentation
spectrum are summarized in a generative probabilistic model that captures
not only the expected ion intensities, but also the variation in the
intensities of the peaks. Results show HMMs can identify many additional
mass spectra not identified by traditional tandem mass spectrometry
database search engines such as X!Tandem.
12:30 p.m. Thursday March 29, 2007
Title: The Molecular Basis for
Cold-Adaptation, and the Evolution of Polar Protist Floras
Speaker: Michael P. Cummings, Ph.D.
Venue: Biomolecular Science
Building Room 3118
Abstract:
Extreme cold presents a formidable challenge for life. For example,
microtubules, which are required for cell division, spontaneously
depolymerize at cold temperatures. However, some organisms, particularly
some protists, require cold temperatures to live. To understand the nature
of cold-adaptation at the molecular level, we examined a large sample of
cold-adapted and warm-adapted tubulins. Using machine learning methods, we
identified the residues associated with cold-adaptation. Additionally, we
used the tubulin sequences and rDNA sequences from the same organisms to
address hypotheses regarding the origins of polar floras.
12:30 p.m. Thursday April 5, 2007
Title: cis-Regulatory Sequence
Evolution Across the Metazoa
Speaker: Cristian
I. Castillo-Davis, Ph.D.
Venue: Biomolecular Science
Building Room 3118
Abstract:
While recent large-scale studies have revealed which functional classes of
protein-coding sequences are highly conserved in different species, little
is known about genome-wide rates of noncoding sequence change across the
Metazoa relative to gene function. Here, we investigate divergence in 5'
proximal noncoding sequences in six genomes, between representative
species pairs in three morphologically and phylogenetically distinct
animal phyla: Chordata, Arthropoda, and Nematoda. Results reveal a
consistent pattern within each phylum; the most highly conserved 5'
noncoding sequences in each genome are proximal to genes involved in basic
developmental processes including embryogenesis, organogenesis and
neurogenesis as determined by database annotations, microarray
experiments, and whole genome RNAi data. These results are consistent with
greater cis-regulatory complexity in developmental genes and/or stronger
purifying selection on developmentally-related regulatory sequences in
animals. These findings suggest a shared genomic regulatory architecture
across the higher Metazoa.
12:30 p.m. Thursday April 12, 2007
Title: Wavelet Transformation and
Genome Analysis
Speaker: Jiuzhou (John) Song,
Ph.D.
Venue: Biomolecular Science
Building Room 3118
Abstract:
Comparative genomics has been a valuable method for extracting and
extrapolating genome information among closely related bacteria. The
efficiency of the traditional methods is strongly influenced by the
software used. To overcome this problem, we propose using
wavelet analysis to perform comparative genomics. First, global comparison
using wavelet analysis gives the difference at a quantitative level. Then
local comparison using keto-excess or purine-excess plots shows precise
positions of inversions, translocations, and horizontally transferred DNA
fragments. We found that the level of energy spectra difference is
related to the similarity of bacterial strains; it could be a quantitative
index to describe the similarities of genomes. The strategy is described
in detail by comparisons of closely related strains: S. typhi CT18,
S. typhi Ty2, S. typhimurium LT2, H. pylori 26695, and
H. pylori J99.
12:30 p.m. Thursday April 19, 2007
Title: Using Annotations from
Controlled Vocabularies to Find Meaningful Associations
Speaker: Woei-Jyh (Adam) Lee
Venue: Biomolecular Science
Building Room 3118
Abstract:
In this talk, I will present the LSLink (or Life Science Link)
methodology that provides users with a set of tools to explore the rich
Web of interconnected and annotated objects in multiple repositories, and
to identify meaningful associations. Consider a physical link between
objects in two repositories, where each of the objects is annotated with
controlled vocabulary (CV) terms from two ontologies. Using a set of
LSLink instances generated from a background dataset of knowledge,
we identify associations between pairs of CV terms that are potentially
significant and may lead to new knowledge. We develop an approach based on
the logarithm of the odds (LOD) to determine confidence and
support for the associations between pairs of CV terms. Using a case
study of Entrez Gene objects annotated with GO terms linked to PubMed
objects annotated with MeSH terms, we describe a user validation and
analysis task to explore potentially significant associations.
12:30 p.m. Thursday April 26, 2007
Venue: Biomolecular Science
Building Room 3118
Title: Finding misassemblies in
draft genomes
Speaker: Guillaume Marçais
Abstract:
I will describe new methods for finding misassemblies in genomes. I will
present an example of a 4000-base omission in Drosophila
melanogaster.
Title: Closing Gaps in Assemblies
Speaker: Poorani Subramanian
Abstract:
Closing gaps in draft assemblies often involves resequencing or other
expensive and time consuming techniques. We propose an algorithm for
closing gaps using existing data and discuss its uses in solving similar
problems.
12:30 p.m. Thursday May 3, 2007
Title: Detection of Pathogens in
the Presence of Complex Backgrounds
Speaker: Yuriy Fofanov, Ph.D. (University of
Houston)
Venue: Biomolecular Science
Building Room 3118
Abstract:
Reliable detection and identification of pathogens in complex biological
samples or in the presence of contaminating DNA from a variety of sources
is complicated by the difficulty in finding a single, unique genomic
sequence that is present simultaneously in all genomes of a pathogen
species and absent in the genomes of the host and/or sample background. A
variety of nucleic acid-based tests have been developed for viral pathogen
identification, including PCR, microarrays, etc. Despite this, the
probability of false positives due to mispriming with the host/background
DNA remains a problem.
We have developed a set of novel algorithms that make it possible
to efficiently calculate for each subsequence in the target (pathogen)
genome the number of base changes necessary to convert a signature
sequence to the closest sequence present in the host genome where all
possible base changes and combinations of base changes are considered.
This allows exclusion of all subsequences that are present in a selected
host/background genome (e.g., human) in the PCR primer and/or microarray
probe design step with greatly increased speed and effectiveness compared
to current design methods. As a result, we are able to identify
ultraspecific signatures for pathogen detection. These ultraspecific
signatures greatly improve the reliability of a detection assay as it is
less likely to misprime with non-target organisms and thus has a lower
probability of false positive identification.
While ultraspecific signatures have worked well within the
laboratory, application to real clinical and environmental samples must be
considered, as both often contain an unknown amount of other genomic material
present. Knowledge of the total genomic diversity, incorporating the
total length of all genomes present in clinical, environmental (air,
water, soil, or surface) or food samples, is critical for genome-based
identification approaches as it allows one to estimate the probability of
false positives and determine the number and length of probes/primers
needed. We are currently developing new DNA technology for the estimation
of the effective genomic sizes of environmental and clinical samples.
Coupled with the ultraspecific design strategy for improved quality
signature selection, more robust and reliable assays can be designed for
essentially any organism of interest in any complex sample.
Biography:
Dr. Yuriy Fofanov received his M.Sc. in 1977 and Ph.D. in 1988 at Kuibyshev
(Samara) State University (USSR). He is currently an Assistant Professor
in the Dept. of Computer Science at the University of Houston, and has been
an Adjunct Assistant Professor in the Dept. of Health Informatics in the
School of Health Information Science at the University of Texas since 2001.
His research includes population-scale HLA typing, new approaches to
detecting the presence of foreign DNA in human clinical samples, tools for
ultraspecific probe/primer design, and bioinformatics approaches and assay
development for the estimation of the total genomic diversity of complex
backgrounds.