CBCB Seminar Series

Spring 2007

12:30 p.m. Thursday January 25, 2007

Title: organizational meeting
Speaker: Stephen M. Mount, Ph.D.
Venue: Biomolecular Science Building Room 3118
Abstract: To discuss the schedule in Spring 2007.

12:30 p.m. Thursday February 1, 2007

Title: Evolutionary dynamics of microbial gene overlaps
Speaker: Carl Kingsford, Ph.D.
Venue: Biomolecular Science Building Room 3118

Among sequenced prokaryotes, more than 29% of all annotated genes overlap at least one of their two flanking genes. We study this common phenomenon and present a comprehensive analysis of adjacent genes where the 3' ends either overlap or nearly overlap. We describe the non-uniform distribution of the lengths of the overlap regions and explain this pattern using a simple evolutionary model based on extension to the next-occurring stop codon. We further report a mirror-image pattern in the distribution of separation distances of closely spaced genes, and we conjecture that this distribution results from the conversion of overlapping genes to non-overlapping genes.

Joint work with Art Delcher and Steven Salzberg.
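The stop-codon extension model can be illustrated with a minimal sketch. It assumes an uppercase DNA string read in the gene's own frame and the standard stop codons TAA, TAG, and TGA. The helper names are hypothetical and this is not the authors' code. It only reports how many bases a gene would gain if its stop codon were lost and translation continued to the next in-frame stop.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumed)


def next_inframe_stop(seq: str, start: int) -> Optional[int]:
    """Index of the first stop codon at or after `start`, stepping codon by codon.

    `start` must be the first base of a codon in the frame of interest.
    Returns None if the sequence ends before another stop codon appears.
    """
    seq = seq.upper()
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


def extension_length(seq: str, lost_stop: int) -> Optional[int]:
    """Bases added to a gene whose stop codon at `lost_stop` mutated into a sense codon."""
    nxt = next_inframe_stop(seq, lost_stop + 3)
    return None if nxt is None else nxt - lost_stop
```

For example, `extension_length("ATGAAATAGCCCGGGTGAAAA", 6)` returns 9. Once the TAG at position 6 is lost, the frame runs through CCC and GGG to the TGA at position 15.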

12:30 p.m. Thursday February 8, 2007

Title: Reconciliation of Genome Assemblies
Speaker: Aleksey Zimin, Ph.D.
Venue: Biomolecular Science Building Room 3118

Draft genome assemblies contain misassemblies and gaps. Many genomes (for example, eight species of Drosophila and the rhesus macaque) are assembled by several centers, each using its own assembly software. The collaboration then picks the draft assembly it judges to be the best, and the other assemblies are usually discarded. The draft assemblies produced by different assembly programs differ, and frequently one program is able to assemble a difficult region of the genome properly while the others cannot. There is a wealth of information available in these alternative assemblies. We have developed a technique, which we call assembly reconciliation, that merges draft genome assemblies. It takes one draft assembly, detects apparent errors, and, when possible, patches the problem areas using pieces from alternative draft assemblies. It also closes gaps where one of the alternative assemblies has spanned the gap correctly.

11:00 a.m. Tuesday February 13, 2007

(This is part of the Computer Science Distinguished Colloquium Series.)
Title: Are There Rearrangement Hotspots in the Human Genome?
Speaker: Pavel A. Pevzner, Ph.D. (University of California at San Diego)
Venue: A.V. Williams Building, ECE Conference Room 2460

Rearrangements are genomic "earthquakes" that change chromosomal architecture. A fundamental question in molecular evolution is whether there exist "chromosomal faults" where rearrangements happen over and over again. In a landmark paper, Nadeau and Taylor (J. H. Nadeau and B. A. Taylor, Proceedings of the National Academy of Sciences, 81, 814-818 (1984)) formulated the Random Breakage Model (RBM) of chromosome evolution, which postulates that there are no rearrangement hotspots in the human genome. Over the next two decades, numerous mapping and sequencing studies with progressively increasing resolution made RBM the de facto theory of chromosome evolution. Although RBM had prophetic predictive power, it was recently refuted by Pevzner and Tesler (P. Pevzner and G. Tesler, Proceedings of the National Academy of Sciences, 100, 7672-7677 (2003)), who introduced the Fragile Breakage Model (FBM). The FBM postulates that the human genome is a mosaic of solid regions (with low propensity for rearrangements) and fragile regions (rearrangement hotspots). However, the rebuttal of RBM caused a controversy and led to a split among researchers studying genome evolution. In particular, it remains unclear whether some complex rearrangements (e.g., transpositions) can create the appearance of rearrangement hotspots. We contribute to the ongoing debate by analyzing multi-break rearrangements, which break a genome into multiple fragments and glue them together in a new order. While multi-break rearrangements have been studied in depth for k = 2 breaks, the k-break rearrangement distance problem for arbitrary k has remained unsolved. We prove a theorem for computing the multi-break rearrangement distance and use it to resolve the "FBM versus RBM" controversy.

This is joint work with Max Alekseyev.
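For the k = 2 case mentioned in the abstract, a widely used result for genomes made only of circular chromosomes is that the 2-break distance equals n - c. Here n is the number of synteny blocks and c is the number of cycles in the breakpoint graph formed by the adjacencies of the two genomes. The sketch below is our illustration of that special case, not the speakers' k-break theorem. It assumes both genomes contain the same blocks, each exactly once, written as signed integers, and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Extremity = Tuple[int, str]  # (block id, "t" for tail or "h" for head)


def adjacencies(genome: List[List[int]]) -> Dict[Extremity, Extremity]:
    """Map each block extremity to its neighbour in a genome of circular chromosomes."""
    adj: Dict[Extremity, Extremity] = {}
    for chrom in genome:
        for i, block in enumerate(chrom):
            nxt = chrom[(i + 1) % len(chrom)]  # circular: the last block wraps to the first
            right = (abs(block), "h" if block > 0 else "t")
            left = (abs(nxt), "t" if nxt > 0 else "h")
            adj[right] = left
            adj[left] = right
    return adj


def two_break_distance(p: List[List[int]], q: List[List[int]]) -> int:
    """2-break distance n - c between two circular genomes on the same blocks."""
    adj_p, adj_q = adjacencies(p), adjacencies(q)
    n = sum(len(chrom) for chrom in p)
    seen: Set[Extremity] = set()
    cycles = 0
    for start in adj_p:
        if start in seen:
            continue
        cycles += 1
        x = start
        while x not in seen:  # alternate P-edges and Q-edges until the cycle closes
            seen.add(x)
            y = adj_p[x]
            seen.add(y)
            x = adj_q[y]
    return n - cycles
```

Here `[[1, 2, 3]]` and `[[1, -2, 3]]` differ by one inversion of block 2, which is a single 2-break, so `two_break_distance` returns 1 for that pair.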


Dr. Pevzner is the Ronald R. Taylor Chair Professor of Computer Science and Director of the Center for Algorithmic and Systems Biology at the University of California, San Diego. He holds a Ph.D. (1988) from the Moscow Institute of Physics and Technology, Russia. Dr. Pevzner is the author of the graduate textbook "Computational Molecular Biology: An Algorithmic Approach" (2000) and, jointly with Neil Jones, the undergraduate textbook "An Introduction to Bioinformatics Algorithms" (2004). He was named a Howard Hughes Medical Institute Professor in 2006.

11:00 a.m. Monday February 19, 2007

Special Seminar (Faculty Candidate Talk)
Title: Detection and characterization of genes in genomic sequences
Speaker: Lillian Florea (George Washington University)
Venue: A.V. Williams Building, Room 3258

New and more effective sequencing technologies will greatly increase the number of available genomes over the next few years, and these genomes will need to be analyzed to identify genes and other functional elements. Interpreting raw sequence data as useful biological information, a process known as genome annotation, is complex and requires the efficient integration of computational analyses, auxiliary sequence data, and biological expertise. We describe our ongoing work to create a collection of algorithms, methods, and tools for annotating genome sequences, spanning (i) tools and mathematical models for fast and accurate high-throughput alignment of cDNA sequences to a target genome, which generate the primary data; (ii) methods for inferring genes and their variants (alternative splice variants) in genomic sequences from this primary evidence; and (iii) large-scale bioinformatics analyses of gene annotation data to extract biologically meaningful patterns, such as models of exon evolution and potential underlying regulatory elements. Our tools are fast, accurate, and efficient enough to meet the demands of timely, up-to-date annotation of newly sequenced model organisms.

12:30 p.m. Thursday February 22, 2007

Title: Prospects for association mapping in Lake Malawi cichlid fishes
Speaker: Thomas D. Kocher, Ph.D.
Venue: Biomolecular Science Building Room 3118

Genome projects are underway for several cichlid fish species. Most of the genomic resources have been developed for tilapia (Oreochromis niloticus), including genetic and BAC fingerprint maps. NIH has approved a project to develop a 5x draft assembly of tilapia. The haplochromine cichlids, which dominate the East African lakes, have also been targeted for sequencing: NIH has approved 2x sequencing of Astatotilapia burtoni (Lake Tanganyika), Paralabidochromis chilotes (Lake Victoria), and Metriaclima zebra (Lake Malawi). This complex but fundamentally sparse data set may require new strategies for comparative assembly. DOE-JGI has deposited 0.1x shotgun coverage for each of five species of Lake Malawi cichlids in the Trace Archive. Because the radiation of species in Lake Malawi can be likened to a recombinant inbred panel, we hope to use SNPs mined from these data for association mapping of quantitative traits in the Lake Malawi cichlid species flock.
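To make "fundamentally sparse" concrete, consider the idealized Lander-Waterman model, which assumes reads land uniformly at random. Under that model, the expected fraction f of a genome covered by at least one read at coverage c is given below. This is a textbook approximation, not a figure from the talk.

```latex
f(c) = 1 - e^{-c}, \qquad
f(5) \approx 0.993, \qquad
f(2) \approx 0.865, \qquad
f(0.1) \approx 0.095
```

Under this model a 2x survey leaves roughly 13.5% of each genome unsampled, and 0.1x coverage samples fewer than one base in ten.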

11:00 a.m. Monday February 26, 2007

Special Seminar (Faculty Candidate Talk)
Title: Learning predictive models of gene regulation
Speaker: Christina Leslie, Ph.D. (Computational Biology Group, Columbia University)
Venue: CSIC Building, Room 1115

Studying the behavior of gene regulatory networks by learning from high-throughput genomic data has become one of the central problems in computational systems biology. Most work in this area has focused on learning structure from data -- e.g. finding clusters or modules of potentially co-regulated genes, or building a graph of putative regulatory "edges" between genes -- and has been successful at generating qualitative hypotheses about regulatory networks.

Instead of adopting the structure learning viewpoint, our focus is to build predictive models of gene regulation that allow us both to make accurate quantitative predictions on new or held-out experiments (test data) and to capture mechanistic information about transcriptional regulation. Our algorithm, called MEDUSA, integrates promoter sequence, mRNA expression, and transcription factor occupancy data to learn gene regulatory programs that predict the differential expression of target genes. Instead of using clustering or correlation of expression profiles to infer regulatory relationships, the algorithm learns to predict up/down expression of target genes by identifying condition-specific regulators and discovering regulatory motifs that may mediate their regulation of targets. We use boosting, a technique from statistical learning, to help avoid overfitting as the algorithm searches through the high dimensional space of potential regulators and sequence motifs. We will report computational results on the yeast environmental stress response, where MEDUSA achieves high prediction accuracy on held-out experiments and retrieves key stress-related transcriptional regulators, signal transducers, and transcription factor binding sites. We will also describe recent results on the hypoxic response in yeast, where we used MEDUSA to propose the first global model of the oxygen sensing and regulatory network, including new putative context-specific regulators. Through our experimental collaborator on this project, the Zhang Lab at Columbia University, we are in the process of validating our computational predictions with wet lab experiments.

11:00 a.m. Wednesday February 28, 2007

Special Seminar (Faculty Candidate Talk)
Title: Deciphering Information Encoded in the Dark Matter of the Human Genome
Speaker: Xiaohui Xie, Ph.D. (Broad Institute of Massachusetts Institute of Technology and Harvard University)
Venue: CSIC Building, Room 1122

Among the 3 billion bases contained in the human genome, only 1.5% are well characterized, primarily in the form of protein-coding genes. One of the main challenges in genomics is to understand the function of the other 98.5% of the genome. Comparison of the human genome to several other related genomes has revealed that these regions harbor a strikingly large number of highly conserved noncoding elements, accounting for over two-thirds of the portion of the human genome under selection.

And yet the function of these conserved noncoding elements (CNEs) remains largely unknown. We also know little about their evolutionary origins or about the molecular mechanisms that have preserved them through millions of years of evolution.

I will describe computational methods for systematically dissecting the function of the CNEs. Using statistical analysis and comparative genomics, we have uncovered hundreds of novel regulatory motifs within the CNEs, matching hundreds of thousands of conserved instances in the genome. These motifs form distinct classes, including transcriptional regulatory elements, small RNA genes, microRNA targeting sites, and chromatin barriers.

I will also describe an effort to characterize the evolution of regulatory sequences, and I will propose that transposable elements play a creative role as a major force for duplicating and dispersing regulatory elements in the human genome. Comparison of metatherian and eutherian genomes reveals that over 15% of eutherian CNEs arose from sequence inserted by transposons.

In a few years, genome sequences of over 50 mammals will become available. I will discuss how these data will empower the methods I have described and give us an opportunity to unravel all of the information encoded in the human genome.

12:30 p.m. Thursday March 1, 2007

Title: Systems biology as seen from inside the Drosophila blastoderm
Speaker: John Reinitz, Ph.D. (The State University of New York at Stony Brook)
Venue: Biomolecular Science Building Room 3118

This talk will address two fundamental questions: the determination of a morphogenetic field, and the control of transcription in metazoan genes with large promoters.

One of the central ideas in animal development is the determination of cell fates in a morphogenetic field. A second central idea, or perhaps observation, is that morphogenetic fields are capable of regulation, a classical term for the correction of errors. In the past, regulation was investigated by surgical perturbation of embryos. In the modern context, regulation can also be studied through genetic perturbations or through individual variation in gene expression in an isogenic population. We consider this problem in the early embryo of the fruit fly Drosophila, a well-characterized system for molecular developmental genetics that can also be used as a naturally grown differential display system for reverse engineering networks of genes. We and others are using this system to address fundamental questions about the reliability of developmental processes.

In the Drosophila system that we study, determination of the morphogenetic field is implemented through differential regulation of transcription. How groups of binding sites control this process is still poorly understood. We present a new model of transcriptional control and show how it can be used to understand the anomalous expression of even-skipped stripe 7 and to predict the results of site-directed mutagenesis experiments.


John Reinitz works in the Department of Applied Mathematics and Statistics at Stony Brook University, although his flies live across campus at the Center for Developmental Genetics. Starting in 1982, Professor Reinitz has been using methods from quantitative biology, bioinformatics, mathematics, and numerical computing to investigate fundamental problems in gene regulation and development. His PhD work under the direction of J. Rimas Vaisnys, "A Theoretical and Experimental Analysis of a Genetic Switch in Phage Lambda" (Yale, 1988) explored a simple system. Since then he has focused on the Drosophila blastoderm, spending time at Columbia University (with Dr. Michael Levine), the Santa Fe Institute (where he remains an external faculty member), Yale Medical School, Mount Sinai School of Medicine and Stony Brook University, where he has been since 2001.

11:00 a.m. Wednesday March 7, 2007

Special Seminar (Faculty Candidate Talk)
Title: Computational Prediction of Protein Structure and Transcription Termination Signals
Speaker: Carl Kingsford, Ph.D. (University of Maryland, College Park)
Venue: CSIC Building, Room 1115

Because experimentalists generate sequences of new genes more quickly than the corresponding 3D protein structures can be determined, computational methods for predicting a protein's shape from its amino acid sequence are necessary. I will discuss my work applying mathematical programming to finding the optimal (i.e. lowest-energy) configuration of protein side chains, given only the protein's sequence and backbone shape. This approach has been used successfully for homology modeling and for designing proteins with desired shapes. While we have shown the underlying graph problem to be NP-hard to approximate, our method can find optimal solutions to real-world instances quickly. In addition, our method is easily extensible to other settings.

In the second half of my talk, I will address a separate problem concerning the organization of bacterial genomes. In many bacteria, transcription of DNA to RNA is terminated by a signal in the DNA called a Rho-independent transcription terminator. Detecting such terminators can shed light on the grouping of genes into transcription units and can improve gene function prediction.

I will describe a computational method to rapidly and accurately find these Rho-independent transcription terminators. We have used our method to predict the locations of terminators in 343 prokaryotic genomes. This is the largest collection of such predictions available, and they have had immediate utility in the study of biological phenomena: using them, we have developed new insights into the relationship between terminators and DNA uptake signals (a type of genomic signal involved in importing external DNA into the cell) and discovered a new uptake signal in the organism Haemophilus ducreyi.

These two topics illustrate types of contributions that computer science can make to the biological sciences and also demonstrate the breadth of computational techniques that must be brought to bear to make sense of biological data.

12:30 p.m. Thursday March 8, 2007

Title: Human-specific gene inactivation or modification by coding sequence disruptions
Speaker: Yoonsoo Hahn, Ph.D. (National Cancer Institute, National Institutes of Health)
Venue: Biomolecular Science Building Room 3118

Some of the loss of ancestral ape characters and gain of human traits may have been achieved through gene inactivation during human evolution. We devised bioinformatics methods for the systematic identification of putative human-specific coding-region disruptions that may have occurred after the divergence of humans and chimpanzees. First, we collected human genes showing an insertion, a deletion, or a premature stop codon when compared with the orthologous chimpanzee genome sequence. Then we selected the cases in which the chimpanzee ortholog maintains the ancestral open reading frame, as demonstrated by the presence of an intact homolog in a third species. Using this procedure, we identified seven frameshift, nine nonsense, and two exon-deletion mutations in human genes that have not been reported previously. The possible functional consequences of each mutation will be discussed. We propose that inactivation or modification of genes by coding sequence disruption is part of the normal process of inducing and facilitating certain phenotypic traits in humans.


Yoonsoo Hahn is a postdoctoral Visiting Fellow at the National Cancer Institute (NCI), NIH, where he works with Dr. B. K. Lee. He earned his PhD in Molecular Biology in 2000 at the Korea Advanced Institute of Science and Technology. He served as a Senior Scientist at the Korea Research Institute of Bioscience and Biotechnology until he joined NCI in 2002. His research focuses on identifying and characterizing genetic changes in the human genome during evolution and relating them to human-specific phenotypic traits.

11:00 a.m. Monday March 12, 2007

Special Seminar (Faculty Candidate Talk)
Title: Bayesian Learning for Deciphering Gene Regulation
Speaker: Yuan (Alan) Qi, Ph.D. (Massachusetts Institute of Technology)
Venue: A.V. Williams Building, Room 2460

Gene regulation plays a fundamental role in biological systems. As more high-throughput biological data become available, it is possible to study gene regulation quantitatively and systematically. In this talk I will present my work on three problems related to gene regulation: (1) identifying genes that affect organism development; (2) detecting protein-DNA binding events and cis-regulatory elements; and (3) deciphering regulatory cascades at the transcriptional level in stem cell development. To address these problems, I developed biologically interpretable Bayesian models and designed novel learning methods. These models capture key aspects of biological processes and make functional predictions, some of which have been confirmed by biological experiments. I will conclude with brief descriptions of my plans for future work, including fusing multiple data sources and deciphering gene regulation at the post-transcriptional level.

11:00 a.m. Wednesday March 14, 2007

Special Seminar (Faculty Candidate Talk)
Title: Inferring biological networks from diverse genomic data
Speaker: Chad Myers (Princeton University)
Venue: CSIC Building, Room 1115

Understanding protein function and modeling protein-protein interactions in biological networks are key challenges in modern systems biology. Recent developments in biotechnology have enabled high-throughput measurement of several cellular phenomena, including gene expression, protein-protein interactions, protein localization, and sequence. The wealth of data generated by such technology promises to support computational prediction of network models, but so far, successful approaches that translate these data into accurate, experimentally testable hypotheses have been limited.

I will discuss key insights into why we face this imbalance between genomic data and established knowledge and present computational approaches for addressing these challenges. Specifically, I will focus on methods for measuring genomic dataset reliability and illustrate how reliability often varies across different biological contexts. We have developed a Bayesian framework for leveraging this variation to improve network prediction accuracy and implemented this approach in a public, web-based system for user-driven search and visualization of genomic data. I will describe the supporting machine learning methods as well as important data visualization features, which play a critical role in making the system practical. To illustrate the power of our approach, I will demonstrate how we have used it to correctly predict function for several previously uncharacterized genes in yeast and to elucidate the behavior of Hsp90, a target of recent cancer drugs. I will close with a brief overview of plans for future research motivated by this work.

12:30 p.m. Thursday March 15, 2007

Title: Peptide Identification by Spectral Matching of Tandem Mass Spectra using Hidden Markov Models
Speaker: Xue Wu
Venue: Biomolecular Science Building Room 3118

Peptide identification by tandem mass spectrometry is the dominant proteomics workflow for protein characterization in complex samples. The peptide fragmentation spectra generated by these workflows exhibit characteristic fragmentation patterns that can be used to identify the peptide. In other fields, where the compounds of interest do not have the convenient linear structure of peptides, fragmentation spectra are identified by comparing new spectra with libraries of identified spectra, an approach called spectral matching. Unlike the sequence-based tandem mass spectrometry search engines used for peptides, spectral matching can use the intensities of fragment peaks in library spectra to assess the quality of a match. We evaluate a hidden Markov model (HMM) approach to spectral matching, in which many examples of a peptide's fragmentation spectrum are summarized in a generative probabilistic model that captures not only the expected ion intensities but also the variation in peak intensities. Results show that HMMs can identify many additional mass spectra not identified by traditional tandem mass spectrometry database search engines such as X!Tandem.

12:30 p.m. Thursday March 29, 2007

Title: The Molecular Basis for Cold-Adaptation, and the Evolution of Polar Protist Floras
Speaker: Michael P. Cummings, Ph.D.
Venue: Biomolecular Science Building Room 3118

Extreme cold presents a formidable challenge for life. For example, microtubules, which are required for cell division, spontaneously depolymerize at cold temperatures. However, some organisms, particularly some protists, require cold temperatures to live. To understand the nature of cold adaptation at the molecular level, we examined a large sample of cold-adapted and warm-adapted tubulins. Using machine learning methods, we identified the residues associated with cold adaptation. Additionally, we used the tubulin sequences and rDNA sequences from the same organisms to address hypotheses regarding the origins of polar floras.

12:30 p.m. Thursday April 5, 2007

Title: cis-Regulatory Sequence Evolution Across the Metazoa
Speaker: Cristian I. Castillo-Davis, Ph.D.
Venue: Biomolecular Science Building Room 3118

While recent large-scale studies have revealed which functional classes of protein-coding sequences are highly conserved in different species, little is known about genome-wide rates of noncoding sequence change across the Metazoa relative to gene function. Here, we investigate divergence in 5' proximal noncoding sequences in six genomes, from representative species pairs in three morphologically and phylogenetically distinct animal phyla: Chordata, Arthropoda, and Nematoda. Results reveal a consistent pattern within each phylum: the most highly conserved 5' noncoding sequences in each genome are proximal to genes involved in basic developmental processes, including embryogenesis, organogenesis, and neurogenesis, as determined by database annotations, microarray experiments, and whole-genome RNAi data. These results are consistent with greater cis-regulatory complexity in developmental genes and/or stronger purifying selection on developmentally related regulatory sequences in animals. These findings suggest a shared genomic regulatory architecture across the higher Metazoa.

12:30 p.m. Thursday April 12, 2007

Title: Wavelet Transformation and Genome Analysis
Speaker: Jiuzhou (John) Song, Ph.D.
Venue: Biomolecular Science Building Room 3118

Comparative genomics has been a valuable method for extracting and extrapolating genome information among closely related bacteria. The performance of traditional methods depends heavily on the software used. To overcome this problem, we propose using wavelet analysis to perform comparative genomics. First, a global comparison using wavelet analysis quantifies the difference between genomes. Then a local comparison using keto-excess or purine-excess plots shows the precise positions of inversions, translocations, and horizontally transferred DNA fragments. We first found that the level of difference between energy spectra is related to the similarity of bacterial strains, so it could serve as a quantitative index of genome similarity. The strategy is described in detail through comparisons of closely related strains: S. typhi CT18, S. typhi Ty2, S. typhimurium LT2, H. pylori 26695, and H. pylori J99.
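A minimal sketch of the two ingredients follows, assuming the standard definitions of cumulative skew walks: purine excess counts A and G against C and T, and keto excess counts G and T against A and C. The abstract does not say which signal the wavelet transform is applied to. Applying an orthonormal Haar transform to the keto-excess walk, and summing per-level energy gaps into one index, are our assumptions, and `spectrum_difference` is a hypothetical name.

```python
import math
from typing import Dict, List, Sequence


def cumulative_excess(seq: str, plus: str, minus: str) -> List[int]:
    """Running sum: +1 for bases in `plus`, -1 for bases in `minus`, 0 otherwise (e.g. N)."""
    total, walk = 0, []
    for base in seq.upper():
        if base in plus:
            total += 1
        elif base in minus:
            total -= 1
        walk.append(total)
    return walk


def purine_excess(seq: str) -> List[int]:
    return cumulative_excess(seq, "AG", "CT")  # purines (A, G) minus pyrimidines (C, T)


def keto_excess(seq: str) -> List[int]:
    return cumulative_excess(seq, "GT", "AC")  # keto bases (G, T) minus amino bases (A, C)


def haar_energy(signal: Sequence[float]) -> Dict[int, float]:
    """Energy of the orthonormal Haar detail coefficients at each decomposition level.

    The signal is truncated to the largest power-of-two length that fits.
    """
    if not signal:
        return {}
    n = 1 << (len(signal).bit_length() - 1)
    approx = [float(x) for x in signal[:n]]
    energies: Dict[int, float] = {}
    level = 1
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        detail = [(approx[i] - approx[i + 1]) / math.sqrt(2) for i in pairs]
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2) for i in pairs]
        energies[level] = sum(d * d for d in detail)
        level += 1
    return energies


def spectrum_difference(genome_a: str, genome_b: str) -> float:
    """Hypothetical index: summed per-level gap between keto-excess Haar energies."""
    ea, eb = haar_energy(keto_excess(genome_a)), haar_energy(keto_excess(genome_b))
    return sum(abs(ea.get(k, 0.0) - eb.get(k, 0.0)) for k in set(ea) | set(eb))
```

A sharp change in the slope of either walk is the kind of local signal that can flag an inversion or an inserted foreign fragment. Identical inputs give a `spectrum_difference` of 0.0.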

12:30 p.m. Thursday April 19, 2007

Title: Using Annotations from Controlled Vocabularies to Find Meaningful Associations
Speaker: Woei-Jyh (Adam) Lee
Venue: Biomolecular Science Building Room 3118

In this talk, I will present the LSLink (or Life Science Link) methodology, which provides users with a set of tools to explore the rich Web of interconnected and annotated objects in multiple repositories and to identify meaningful associations. Consider a physical link between objects in two repositories, where each of the objects is annotated with controlled vocabulary (CV) terms from two ontologies. Using a set of LSLink instances generated from a background dataset of knowledge, we identify associations between pairs of CV terms that are potentially significant and may lead to new knowledge. We develop an approach based on the logarithm of the odds (LOD) to determine the confidence in, and support for, associations between pairs of CV terms. Using a case study of Entrez Gene objects annotated with GO terms linked to PubMed objects annotated with MeSH terms, we describe a user validation and analysis task to explore potentially significant associations.
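The abstract does not give the LSLink scoring formula. One common way to score an association between a term on the source object (for example a GO term) and a term on the linked object (for example a MeSH term) is the log odds ratio of a 2x2 contingency table. The sketch below is only that. The table layout, the 0.5 continuity correction, and the base-10 logarithm are our assumptions, and `log_odds_ratio` is a hypothetical name.

```python
import math
from typing import Iterable, Set, Tuple


def log_odds_ratio(links: Iterable[Tuple[Set[str], Set[str]]], term_a: str, term_b: str) -> float:
    """Log10 odds ratio that `term_a` on a source object co-occurs with `term_b` on its linked object.

    Each link is a pair (terms annotating the source object, terms annotating the target object).
    """
    both = a_only = b_only = neither = 0
    for src_terms, dst_terms in links:
        has_a, has_b = term_a in src_terms, term_b in dst_terms
        if has_a and has_b:
            both += 1
        elif has_a:
            a_only += 1
        elif has_b:
            b_only += 1
        else:
            neither += 1
    # Add 0.5 to every cell (Haldane-Anscombe correction) so empty cells never give log(0).
    return math.log10(((both + 0.5) * (neither + 0.5)) / ((a_only + 0.5) * (b_only + 0.5)))
```

A score near 0 means the two terms co-occur about as often as independence would predict, and larger positive scores indicate stronger association.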

12:30 p.m. Thursday April 26, 2007

Venue: Biomolecular Science Building Room 3118

Title: Finding misassemblies in draft genomes
Speaker: Guillaume Marçais

I will describe new methods for finding misassemblies in genomes. I will present an example of a 4000-base omission in Drosophila melanogaster.

Title: Closing Gaps in Assemblies
Speaker: Poorani Subramanian

Closing gaps in draft assemblies often involves resequencing or other expensive and time-consuming techniques. We propose an algorithm for closing gaps using existing data and discuss its use in solving similar problems.

12:30 p.m. Thursday May 3, 2007

Title: Detection of Pathogens in the Presence of Complex Backgrounds
Speaker: Yuriy Fofanov, Ph.D. (University of Houston)
Venue: Biomolecular Science Building Room 3118

Reliable detection and identification of pathogens in complex biological samples, or in the presence of contaminating DNA from a variety of sources, is complicated by the difficulty of finding a single, unique genomic sequence that is present in all genomes of a pathogen species and absent from the genomes of the host and/or sample background. A variety of nucleic acid-based tests, including PCR and microarrays, have been developed for viral pathogen identification. Despite this, false positives due to mispriming with host/background DNA remain a problem.

We have developed a set of novel algorithms that make it possible to efficiently calculate, for each subsequence in the target (pathogen) genome, the number of base changes necessary to convert that signature sequence to the closest sequence present in the host genome, considering all possible base changes and combinations of base changes. This allows all subsequences that are present in a selected host/background genome (e.g., human) to be excluded at the PCR primer and/or microarray probe design step, with greatly increased speed and effectiveness compared to current design methods. As a result, we are able to identify ultraspecific signatures for pathogen detection. These ultraspecific signatures greatly improve the reliability of a detection assay because it is less likely to misprime with non-target organisms and thus has a lower probability of false positive identification.
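The quantity being computed can be illustrated with a brute-force sketch. It is not the speakers' fast algorithm, and the function names are hypothetical. It assumes uppercase ACGT input, counts substitutions only (Hamming distance, no insertions or deletions), and checks both host strands. Its running time is proportional to target length times host length times k, so it is only usable on toy inputs.

```python
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def revcomp(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def min_changes_to_host(target: str, host: str, k: int) -> Dict[int, int]:
    """For each k-mer start in `target`, the fewest substitutions turning it into some host k-mer."""
    if len(host) < k:
        raise ValueError("host sequence is shorter than k")
    host_kmers = set()
    for strand in (host, revcomp(host)):  # a probe can bind either host strand
        host_kmers.update(strand[i:i + k] for i in range(len(strand) - k + 1))
    return {
        i: min(hamming(target[i:i + k], h) for h in host_kmers)
        for i in range(len(target) - k + 1)
    }


def ultraspecific_sites(target: str, host: str, k: int, min_changes: int) -> List[int]:
    """Start positions whose k-mer is at least `min_changes` substitutions from every host k-mer."""
    return [i for i, d in min_changes_to_host(target, host, k).items() if d >= min_changes]
```

For example, `ultraspecific_sites("AAAACCCC", "AAAA", 4, 3)` keeps only positions 3 and 4, whose 4-mers (ACCC and CCCC) are at least three substitutions away from every host 4-mer on either strand.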

While ultraspecific signatures have worked well in the laboratory, their application to real clinical and environmental samples must take into account that both often contain an unknown amount of other genomic material. Knowledge of the total genomic diversity, that is, the total length of all genomes present in clinical, environmental (air, water, soil, or surface), or food samples, is critical for genome-based identification approaches because it allows one to estimate the probability of false positives and determine the number and length of probes/primers needed. We are currently developing new DNA technology for estimating the effective genomic sizes of environmental and clinical samples. Coupled with the ultraspecific design strategy for improved signature selection, this should allow more robust and reliable assays to be designed for essentially any organism of interest in any complex sample.
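A simple model shows why total genomic diversity matters for false positives. Suppose a background with G bases in total behaves like random sequence with equal, independent base frequencies. Then an exact match to a given L-base probe is expected about 2G/4^L times by chance (counting both strands), and the Poisson approximation gives P ≈ 1 - exp(-2G/4^L). Real genomes violate the uniformity assumption, and this model ignores the near matches that ultraspecific design also excludes, so treat the sketch as an order-of-magnitude guide. The function names are hypothetical.

```python
import math


def chance_match_probability(background_bases: float, probe_len: int) -> float:
    """P(at least one exact chance match) under a uniform random-sequence model, both strands."""
    expected = 2.0 * background_bases / 4.0 ** probe_len
    return 1.0 - math.exp(-expected)


def min_probe_length(background_bases: float, max_false_positive: float) -> int:
    """Shortest probe length whose chance-match probability is at most `max_false_positive`."""
    if max_false_positive <= 0:
        raise ValueError("max_false_positive must be positive")
    length = 1
    while chance_match_probability(background_bases, length) > max_false_positive:
        length += 1
    return length
```

For a background the size of the human genome (about 3 billion bases), `min_probe_length(3e9, 0.01)` returns 20. Because the expected count scales as G/4^L, every fourfold increase in total background length adds roughly one base to the required probe length.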


Dr. Yuriy Fofanov received his M.Sc. in 1977 and his Ph.D. in 1988 from Kuibyshev (Samara) State University (USSR). He has been an Assistant Professor in the Department of Computer Science at the University of Houston, and an Adjunct Assistant Professor in the Department of Health Informatics, School of Health Information Science, University of Texas, since 2001. His research includes population-scale HLA typing, new approaches for detecting foreign DNA in human clinical samples, tools for ultraspecific probe/primer design, and bioinformatics approaches and assay development for estimating the total genomic diversity of complex backgrounds.