CBCB Seminar Series


Spring 2009



2 p.m. Thursday February 5, 2009


Title: Improved assembly of the Bos taurus genome.
By: Aleksey Zimin, Institute for Physical Science and Technology, University of Maryland
Venue: Biomolecular Science Building Room 3118
Abstract: The genome of the cow (Bos taurus) was recently sequenced and assembled by the Baylor College of Medicine (BCM) Human Genome Sequencing Center (HGSC). BCM's latest draft is called Btau4.2. We produced an independent assembly from the public Trace Archive data using a variety of methods, including the Celera Assembler, the UMD Overlapper, and additional assembly debugging, mapping, and improvement tools. We used publicly available map data to map the scaffolds onto the chromosomes. Our latest draft is called Bos_taurus_UMD_2.0 and was released in November 2008. Bos_taurus_UMD_2.0 places almost 6% more sequence onto the chromosomes and fixes a number of large inversions/omissions that are present in Btau4.2 and are independently verified by our collaborators. In this talk the two assemblies will be compared on a variety of criteria, including quantitative measures, agreement with the published maps, and amount of coding sequence present. Procedures used to create the assembly and map the assembled scaffolds onto the chromosomes will be described briefly. Our assembly is publicly available on our ftp site: ftp.cbcb.umd.edu/pub/data/assembly/Bos_taurus/

Also: A brief meeting to discuss the schedule for Spring 2009.


2 p.m. Friday February 6, 2009

(UMIACS Special Seminar)
Title: Achieving Anonymity in Clinical Genomics Databases: Is it Possible?
By: Bradley A. Malin, Ph.D., Vanderbilt University
Venue: A.V. Williams Building Room 3258
Abstract:

For years, medical researchers have been directed to de-identify patients' health records and biological data before such information is shared beyond the collecting institution. This policy is reinforced by Institutional Review Boards, as well as regulations at the state and federal level, such as the Privacy Rule of the Health Insurance Portability and Accountability Act. De-identified data appears to be protected; however, the decreasing costs, and increasing adoption, of information and networking technologies have created a complex landscape that has eroded the protections afforded by such policies. Consequently, our research has shown that de-identification provides little in the way of protection guarantees. In this talk, I will review various automated approaches we have developed to link patients' identities to seemingly anonymous biomedical data, often using nothing more than publicly available information. Yet, I will also explore why all hope is not lost and how we can integrate policy with statistical and computational formalisms to provably measure the risks associated with sharing data according to various policies, as well as how to provably protect patients' records from privacy-invading attacks without impeding the workflow of worthwhile biomedical research endeavors. This talk will draw upon real emerging biomedical research infrastructures, such as de-identified repositories of electronic medical and genomic records at the National Institutes of Health.

Biography:

Brad Malin is an Assistant Professor of Biomedical Informatics in the School of Medicine and an Assistant Professor of Computer Science in the School of Engineering at Vanderbilt University. He is the founder and director of Vanderbilt's Health Information Privacy Laboratory (HIPLAB), which integrates computer science, policy, and biomedical knowledge to construct privacy enhancing technologies for emerging health information systems. His research on data privacy in electronic medical and genomic repositories has received several awards from the American and International Medical Informatics Associations and has been cited in various congressional briefings. Among other sponsored research projects, he currently directs a program in data privacy risk evaluation and protection for the National Human Genome Research Institute at the National Institutes of Health. He received a doctorate and master's in computer science, a master's in public policy and management, and a bachelor's in biological sciences, all from Carnegie Mellon University.

2 p.m. Thursday February 12, 2009

Title: LOCST: a Low Complexity Sequence Search Tool
By: Stephen M. Mount, University of Maryland
Venue: Biomolecular Science Building Room 3118
Abstract: Alignment-based tools such as BLAST are in widespread use for identifying similar proteins. Low-complexity regions are typically not included in such alignments even though they are often important for function. Examples include arginine-serine-rich proteins involved in splicing and proline-rich, glutamine-rich and acidic transcription activation domains. An approach for identifying and evaluating similar low-complexity regions within proteins based on shared repeated dipeptides will be presented, as will its implementation in the program LOCST (Low Complexity Sequence Search Tool). This work was performed with Nicolas Tilmans and Stephen Fiorelli.
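
The abstract does not say how LOCST scores shared repeated dipeptides, so the following is only a rough illustration of the general idea, not the program's actual method. It counts overlapping dipeptides in two protein sequences, keeps those that repeat within each sequence, and reports the ones the two sequences share. The function names, the `min_count` threshold, and the use of the smaller count as a score are all assumptions made for this sketch.

```python
from collections import Counter


def repeated_dipeptides(seq, min_count=2):
    """Count overlapping dipeptides in seq; keep those seen at least min_count times.

    min_count is a hypothetical threshold chosen for illustration.
    """
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    return {dp: n for dp, n in counts.items() if n >= min_count}


def shared_repeated_dipeptides(seq_a, seq_b, min_count=2):
    """Return dipeptides repeated in both sequences.

    The value is the smaller of the two counts, an assumed (not sourced) score.
    """
    a = repeated_dipeptides(seq_a, min_count)
    b = repeated_dipeptides(seq_b, min_count)
    return {dp: min(a[dp], b[dp]) for dp in a.keys() & b.keys()}


if __name__ == "__main__":
    # Toy arginine-serine-rich fragments, made up for this example.
    print(shared_repeated_dipeptides("RSRSRSRS", "SRSRSRPP"))
```

Two arginine-serine-rich fragments share the repeated dipeptides RS and SR even when they would not align well as a whole, which is the kind of similarity the talk is concerned with.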


2 p.m. Thursday February 26, 2009

Title: Protein Annotation Prediction By Clustering Within Interaction Networks
By: Carl Kingsford, University of Maryland
Venue: Biomolecular Science Building Room 3118
Abstract: Determining protein function is a fundamental biological challenge, and protein-protein interaction networks are an increasingly useful data source from which to computationally predict protein annotations. One approach to automated detection of protein complexes and prediction of biological processes is to divide an interaction network into biologically meaningful modules or clusters. I will present several graph clustering techniques and illustrate their usefulness for predicting protein annotations. I will describe a novel method to decompose a hierarchical tree decomposition into a collection of clusters that optimally match a set of known annotations. We find that our approach generally outperforms commonly used heuristics for identifying protein complexes from hierarchical clusterings. The technique is general and may be of use in other applications where hierarchical clustering is used. I will also show how a graph compression technique called graph summarization leads to more biologically meaningful modules than other graph clustering algorithms. Time permitting, I will also describe how protein interaction networks can be used to transfer functional annotations between species.

4:00 p.m. Monday March 23, 2009

Title: Characterization of Human Epigenomes
By: Keji Zhao
Senior Investigator
Laboratory of Molecular Immunology, National Heart, Lung, and Blood Institute, National Institutes of Health
Venue: Room 0467 Animal Sciences Building


2:00 p.m. Thursday March 26, 2009

Title: Protein recognition and gating in the ribosome exit tunnel
By: Paula Petrone, Stanford University Department of Biophysics, Group of Prof. Dr. V. Pande
Venue: Biomolecular Science Building Room 3118
Abstract: The ribosome is a large complex catalyst responsible for the synthesis of new proteins, an essential function for life. New proteins emerge from the ribosome through an exit tunnel as nascent polypeptide chains. Recent findings indicate that tunnel interactions with the nascent polypeptide chain might be relevant for the regulation of translation. However, the specific ribosomal structural features that mediate this process are unknown. In my talk, I will address the computational methods I have developed for the study of the physicochemical environment of the tunnel. By looking at the interactions between components of the ribosome exit tunnel and different chemical probes, our simulations indicate that transport out of the tunnel could be different for diverse amino acid species. By relating our simulation data to earlier biochemical studies, our analysis provides a context for interpreting sequence-dependent nascent chain phenomenology in the ribosome tunnel.

11:00 a.m., Tuesday March 31, 2009

Title: pplacer: Bayesian phylogenetic placement of metagenomic short reads
By: Erick Matsen, U.C. Berkeley
Venue: Biomolecular Science Building Room 3118
Abstract: An abundance of metagenomic short reads raises a very difficult question for bioinformaticians: how do these short reads fit into previously characterized diversity? Equally important, how do we get confidence intervals on these placements? In this talk I will present "pplacer", which places short reads in a user-supplied reference gene tree. Pplacer takes a statistically rigorous Bayesian approach, where positions of the fragment sequence are evaluated according to normalized posterior probability; because we are fixing a reference tree, we can perform direct numerical integration over the likelihood function to obtain confidence estimates rather than resorting to MCMC. Pplacer is the first such stand-alone program that allows the user to supply a reference alignment and tree; it can be used via a simple command line interface or as part of a pipeline. We have also implemented a large-scale fragment simulation pipeline which allows the user to empirically determine an appropriate "cutoff" for accurate short read placement. Such simulations have also given us a new perspective on what phylogenetic placement scores mean, namely that posterior probability being spread over a number of locations can indicate global rather than local uncertainty.
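
As a rough illustration of what "normalized posterior probability" over placements can mean, the sketch below turns per-edge log marginal likelihoods into probabilities that sum to one. It is not pplacer's code or interface: the function name and the edge labels are hypothetical, a uniform prior over edges is assumed, and the per-edge values are taken as already integrated over the attachment position.

```python
import math


def normalized_posteriors(log_marginals):
    """Normalize per-edge log marginal likelihoods into posterior probabilities.

    Assumes a uniform prior over edges (an assumption of this sketch).
    The maximum is subtracted before exponentiating to avoid underflow.
    """
    peak = max(log_marginals.values())
    weights = {edge: math.exp(v - peak) for edge, v in log_marginals.items()}
    total = sum(weights.values())
    return {edge: w / total for edge, w in weights.items()}


if __name__ == "__main__":
    # Made-up values for three hypothetical edges of a reference tree.
    example = {"edge_1": -1200.0, "edge_2": -1201.0, "edge_3": -1210.0}
    for edge, p in normalized_posteriors(example).items():
        print(edge, round(p, 4))
```

When the resulting probability mass is spread across several edges rather than concentrated on one, the placement is uncertain, which is the situation the last sentence of the abstract discusses.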


1:00 p.m. Thursday, April 2, 2009

Title: Novel approaches to metagenomic analysis
By: James Robert White
Venue: Biomolecular Science Building Room 3118
(CBCB seminar and presentation for AMSC candidacy exam; note earlier time)
Abstract: The human body plays host to thousands of bacterial species in a variety of ecosystems. Until recently, microbial communities have been impossible to investigate thoroughly, as the vast majority of bacteria cannot be cultured through laboratory techniques. New technologies (e.g. high-throughput sequencing, 16S rRNA surveys) allow us to deeply sample the genetic content of a microbial environment in order to estimate its overall composition and functional capacity. Recent studies in this context have revealed that human obesity has a microbial component: obese gut microbiomes are distinct from those of the lean population. This result indicates potential therapeutic approaches to treating obesity by manipulating gut microflora. However, our limited knowledge of the microbial interactions in the gut hinders our ability to design future experiments or effective treatments. Using 16S rRNA time-series sequence data from obese individuals on a one-year diet, I have employed a mathematical model to study microbial population dynamics in the human gut. In this talk I will discuss the model formulation and predicted competitive and commensal interactions among dominant phyla in the distal gut. I will further discuss the application of this model to estimate the potential impact of prebiotic and probiotic therapies for treating human obesity. Through this problem, I hope to illustrate the insight mathematical modeling can bring to the field of metagenomics.