CBCB Seminar Series
Spring 2009
2 p.m. Thursday February 5, 2009
Title:
Improved assembly of the Bos taurus genome.
By: Aleksey Zimin, Institute for
Physical Science and Technology, University of Maryland
Venue: Biomolecular Science
Building Room 3118
Abstract:
The genome of the cow (Bos taurus) was recently sequenced and assembled by
the Baylor College of Medicine (BCM) Human Genome Sequencing Center
(HGSC). BCM's latest draft is called Btau4.2. We produced an
independent assembly from the public Trace Archive
data using a variety of methods, including the Celera Assembler,
the UMD Overlapper, and additional assembly debugging, mapping, and
improvement tools. We used publicly available map data to map the
scaffolds onto the chromosomes. Our latest draft is called
Bos_taurus_UMD_2.0 and was released in November 2008.
Bos_taurus_UMD_2.0 places almost 6% more sequence onto the chromosomes and
fixes a number of large inversions/omissions that are present in
Btau4.2 and are independently verified by our collaborators.
In this talk the two assemblies will be compared on a variety of
criteria including quantitative measures, agreement to the published
maps and amount of coding sequence present. Procedures used to create
the assembly and map the assembled scaffolds onto the chromosomes will
be described briefly.
Our assembly is publicly available on our FTP site:
ftp.cbcb.umd.edu/pub/data/assembly/Bos_taurus/
Also: A brief meeting to discuss
the schedule
for Spring 2009.
2 p.m. Friday February 6, 2009
(UMIACS Special Seminar)
Title: Achieving Anonymity in
Clinical Genomics Databases: Is it Possible?
By: Bradley A. Malin, Ph.D.,
Vanderbilt University
Venue: A.V. Williams
Building Room 3258
Abstract:
For years, medical researchers have been directed to de-identify patients'
health records and biological data before such information is shared
beyond the collecting institution. This policy is reinforced by
Institutional Review Boards, as well as regulations at the state and
federal level, such as the Privacy Rule of the Health Insurance
Portability and Accountability Act. De-identified data appears to be
protected; however, the decreasing costs, and increasing adoption, of
information and networking technologies have created a complex landscape
that has eroded the protections afforded by such policies.
Consequently, our research has shown that de-identification provides
little in the form of protection guarantees. In this talk, I will review
various automated approaches we have developed to link patients'
identities to seemingly anonymous biomedical data, often using nothing
more than publicly available information. Yet, I will also explore why
all hope is not lost and how we can integrate policy with statistical and
computational formalisms to provably measure the risks associated with
sharing data according to various policies, as well as how to provably
protect patients' records from privacy-invading attacks without impeding
the workflow of worthwhile biomedical research endeavors. This talk will
draw upon real emerging biomedical research infrastructures, such as
de-identified repositories of electronic medical and genomic records at
the National Institutes of Health.
Biography:
Brad Malin is an Assistant Professor of Biomedical Informatics in the
School of Medicine and an Assistant Professor of Computer Science in the
School of Engineering at Vanderbilt University. He is the founder and
director of Vanderbilt's Health Information Privacy Laboratory (HIPLAB),
which integrates computer science, policy, and biomedical knowledge to
construct privacy-enhancing technologies for emerging health information
systems. His research on data privacy in electronic medical and genomic
repositories has received several awards from the American and
International Medical Informatics Associations and has been cited in
various congressional briefings. Among other sponsored research projects,
he currently directs a program in data privacy risk evaluation and
protection for the National Human Genome Research Institute at the
National Institutes of Health. He received a doctorate and master's in
computer science, a master's in public policy and management, and a
bachelor's in biological sciences, all from Carnegie Mellon University.
2 p.m. Thursday February 12, 2009
Title:
LOCST: a Low Complexity Sequence Search Tool
By: Stephen M. Mount, University of Maryland
Venue: Biomolecular Science
Building Room 3118
Abstract: Alignment-based tools such as BLAST are in
widespread use for identifying similar proteins. Low-complexity regions are typically
not included in such alignments even though they are often important for function. Examples
include arginine-serine-rich proteins involved in splicing and proline-rich, glutamine-rich, and
acidic transcription activation domains. An approach for identifying and evaluating similar
low-complexity regions within proteins based on shared repeated dipeptides will be presented,
as will its implementation in the program LOCST (Low Complexity Sequence Search Tool).
This work was performed with Nicolas Tilmans and Stephen Fiorelli.
2 p.m. Thursday February 26, 2009
Title:
Protein Annotation Prediction By Clustering Within Interaction Networks
By: Carl Kingsford, University of Maryland
Venue: Biomolecular Science
Building Room 3118
Abstract:
Determining protein function is a fundamental biological challenge, and
protein-protein interaction networks are an increasingly useful data source
from which to computationally predict protein annotations. One approach to
automated detection of protein complexes and prediction of biological processes
is to divide an interaction network into biologically meaningful modules or
clusters. I will present several graph clustering techniques and illustrate
their usefulness for predicting protein annotations. I will describe a novel
method to decompose a hierarchical tree decomposition into a collection of
clusters that optimally match a set of known annotations. We find that our
approach generally outperforms commonly used heuristics for identifying protein
complexes from hierarchical clusterings. The technique is general and may be
of use in other applications where hierarchical clustering is used. I will
also show how a graph compression technique called graph summarization leads to
more biologically meaningful modules than other graph clustering algorithms.
Time permitting, I will also describe how protein interaction networks can be
used to transfer functional annotations between species.