CBCB Seminar Series

Most CBCB seminars are held from 2 p.m. until 3:15 p.m. on Thursdays in the CBCB seminar room, 3118 at Biomolecular Sciences Building #296
Some external seminars are listed here. These, and some other exceptions, will have a different time and/or place.
For directions to CBCB please scroll down.

2:00 pm Thursday, Feb. 11, 2010

Title: " HMMER: a new generation of homology search software"

By: Sean Eddy

Venue: 1103 Biosciences Research Bldg.

Abstract:Database homology searching might be the most important application in computational molecular biology, and since the 1990s, BLAST has been our main workhorse. Since BLAST's introduction, theoretical advances have been made in applying probabilistic inference methods to homology searches using hidden Markov model (HMM) approaches. General adoption of probabilistic methods has been limited by some key problems, including the fact that the popular HMM implementations (including my HMMER software) are computationally demanding. I will talk about HMMER3, a new generation of HMMER that aims to even more fully deploy probabilistic inference technology on homology searches, while at the same time attaining (and perhaps soon surpassing) BLAST's speed.

Speaker information:Sean Eddy is a group leader at Janelia Farm.
This talk was canceled due to snow and has been rescheduled for Sept. 8, 2011.

11:30 a.m. Monday, Feb. 22, 2010

Title: "Copy number variation detection from SNP genotyping data and next-generation sequencing data"

By: Kai Wang

Venue: 1140 BPS Biology-Psychology

Speaker information: kai.genotypic.com

2:00 p.m. Wednesday, Feb. 24, 2010

Title: "Protein interaction networks in viruses and bacteria"

By: Peter Uetz

Venue: 1103 BRB

Speaker information: JCVI web site

3:00-4:30 p.m. Wednesday, March 3, 2010

Title: "A Teaching Career to Facilitate Student Learning"

By: Malcolm Campbell

Venue: 1103 BRB (Bioscience Research Bldg., not Biomolecular Sciences)

More information: Dr. Malcolm Campbell is a biology professor at Davidson College, NC and the founding director of the Genome Consortium for Active Teaching (GCAT). This presentation is hosted by the Teaching and Learning Center of the College of Chemical and Life Sciences.

2:00 p.m., Thursday, March 4, 2010

Title: "Undergraduates Use Synthetic Biology to Build Bacterial Computers"

By: Malcolm Campbell

Venue: 3118 Biomolecular Sciences

Week of March 8. Special Seminars by Prof. Zhaobang Zeng, NC State University

Date and Time: Monday, March 8, 4:00 pm
Title: "Mapping multiple QTLs with epistasis: theory, method and practice"

Date and Time: Tuesday, March 9, 4:00 pm
Title: "eQTL Mapping analysis"

Date and Time: Wednesday, March 10, 4:00 pm
Title: "Identifying network structure from eQTL mapping and Epistasis and Gene pathway inference"

Date and Time: Thursday, March 11, 2:00
Title: "Study genetic basis and pathways of complex traits"

By: Dr. Zhao-Bang Zeng, North Carolina State University

Venue: 0467 Animal Sciences (reading room)

Speaker information: Zhao-Bang Zeng
Host: Jiuzhou Song Ph.D, Animal Sciences

2:00 p.m. Thursday, March 25, 2010

Title: Computational discovery of epistatic interactions in P. falciparum - A tale of two loci

By:
Yang Huang, Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, NIH

Venue: 3118 Biomolecular Sciences (as usual)

Abstact: Identification of epistatic interactions between genomic loci is fundamental for understanding genome organization and gene regulation. In the last decade, expression Quantitative Trait Loci (eQTL) studies have been widely used to determine the relation between single-locus genotype and gene expression. However, computational and statistical challenges have limited genome-wide studies of epistatic interactions affecting gene expression. Previously, we developed a Graph based eQTL Decomposition method (GeD) that allows us to model genotype and expression data using an eQTL association graph. Based on the eQTL association graph, We developed a new method for reliable detection of epistatic interactions that can overcome some of the statistical limitations of classical methods and applied it to uncover epistatic interactions among genomic loci of the human malaria parasite P. falciparum. In the landscape of epistatic interactions of this parasite segregating in a set of 34 haploid progeny, we found numerous hotspots with potentially important regulatory functions. We also observed that elevated linkage disequilibrium (LD) between two loci on different chromosomes in P. falciparum correlates with the number of regulated target genes regulated jointly by these loci. Such results indicate LD's important role in maintaining parasite specific, biological functions.

2:00 p.m., Thursday, April 8, 2010

Title: "From genomics data integration to using functional relationship networks to understand disease at the molecular level"

By: Olga Troyanskaya

Venue: 3118 Biomolecular Sciences (directions)

Speaker information: Olga Troyanskaya

2:00 p.m., Friday, April 9, 2010

Title: "Transcript Assembly and Abundance Estimation with High-Throughput RNA Sequencing"

By: Cole Trapnell, CBCB

Venue: 3118 in CBCB (our usual venue)

Abstact: We present algorithms and statistical methods for the reconstruction and abundance estimation of transcript sequences from high throughput RNA sequencing ("RNA-Seq"). We evaluate these approaches through large-scale experiments of a well-studied model of muscle development.

We begin with an overview of sequencing assays and outline why the short read alignment problem is fundamental to the analysis of these assays. We then describe two approaches to the contiguous alignment problem, one of which uses massively parallel graphics hardware to accelerate alignment, and one of which exploits an indexing scheme based on the Burrows-Wheeler transform. We then turn to the spliced alignment problem, which is fundamental to RNA-Seq, and present an algorithm, TopHat. TopHat is the first algorithm that can align the reads from a large RNA-Seq experiment to a mammalian-sized genome without the aid of reference gene models.

In the second part of the thesis, we present the first comparative RNA-Seq assembly algorithm, Cufflinks, which is adapted from a constructive proof of Dilworth'sTheorem, a classic result in combinatorics. We evaluate Cufflinks by assembling the transcriptome from a time course RNA-Seq experiment of developing skeletal muscle cells. The assembly contains 13,689 known transcripts and 3,724 novel ones. Of the novel transcripts, 62% were strongly supported by earlier sequencing experiments or by homologous transcripts in other organisms. We further validated interesting genes with isoform-specific RT-PCR.

We then present a statistical model for RNA-Seq included in Cufflinks and with which we estimate abundances of transcripts from RNA-seq data. Simulation studies demonstrate that the model is highly accurate. We apply this model to the muscle data, and track the abundances of individual isoforms over development.

Finally, we present significance tests for changes in relative and absolute abundances between time points, which we employ to uncover differential expression and differential regulation. By testing for relative abundance changes within and between transcripts sharing a transcription start site, we find significant shifts in the rates of alternative splicing and promoter preference in hundreds of genes, including those believed to regulate muscle development.

A Dissertation Defense for the degree of Ph.D. in Computer Science

2:00 p.m., Wednesday, April 14, 2010

Title: "Whole-Genome Sequence Analysis for Pathogen Detection and Diagnostics"

By: Adam Phillippy, CBCB

Venue: 3118 Biomolecular Sciences

Abstact: Pathogenic microbes, both natural and weaponized, pose significant dangers to human health and safety. To defend against these threats, it is essential to rapidly detect and characterize pathogens in any environmental or clinical medium with high accuracy. Now that the genome sequences of thousands of bacteria and viruses are known, it is possible to design biomolecular tests to rapidly detect and characterize pathogens based solely on their DNA. Possible applications are far-reaching and include real-time clinical diagnosis and biosurveillance. However, these tests require sophisticated computational design and analysis to operate effectively.

This dissertation presents novel computational methods for improving the accuracy of three modern diagnostic technologies: polymerase chain reaction (PCR), array comparative genomic hybridization (CGH), and whole-genome sequencing. For designing real-time PCR detection assays, an efficient search algorithm and data structure are presented for analyzing over 100 billion nucleotides of genomic DNA to identify the most distinguishing sequences of a pathogen. Laboratory validation shows that these "signature" sequences can be used to detect pathogens in complex samples and differentiate them from their non-pathogenic relatives. For CGH, pan-genome array design and analysis algorithms are presented for the characterization of microbial isolates. These methods are used to study multiple strains of the foodborne pathogen, Listeria monocytogenes, revealing new insights into the diversity and evolution of the species. Finally, multiple methods are presented for the validation of whole-genome sequence assemblies. These validated assemblies provide the ultimate diagnostic, decoding the entire DNA sequence of a genome with high confidence.

A Dissertation Defense for the degree of Ph.D. in Computer Science

2:00 p.m., Thursday, April 15, 2010

RECOMB 2010 Practice Talks

Venue: 3118 Biomolecular Sciences directions

Title (RECOMB 2010 Practice Talk): "Dense Subgraphs with Restrictions and Applications to Gene Annotation Graphs"

Authors: Barna Saha, Allison Hoch, Samir Khuller, Louiqa Raschid and Xiao-Ning Zhang

Speaker: Barna Saha, a third year Computer Science graduate student working with Samir Khuller on algorithm design and analysis.

Abstract: We focus on finding complex annotation patterns representing novel and interesting hypotheses from gene annotation data. We define a generalization of the densest subgraph problem by adding an additional distance restriction (defined by a separate metric) to the nodes of the subgraph. We show that while this generalization makes the problem NP-hard for arbitrary metrics, when the metric comes from the distance metric of a tree, or an interval graph, the problem can be solved optimally in polynomial time. We also show that the densest subgraph problem with a specified subset of vertices that have to be included in the solution can be solved optimally in polynomial time. In addition, we consider other extensions when not just one solution needs to be found, but we wish to list all subgraphs of almost maximum density as well. We apply this method to a dataset of genes and their annotations obtained from The Arabidopsis Information Resource (TAIR). A user evaluation confirms that the patterns found in the distance restricted densest subgraph for a dataset of photomorphogenesis genes are indeed validated in the literature; a control dataset validates that these are not random patterns. Interestingly, the complex annotation patterns potentially lead to new and as yet unknown hypotheses. We perform experiments to determine the properties of the dense subgraphs, as we vary parameters, including the number of genes and the distance.
-------------
Title (RECOMB 2010 Practice Talk): "Extracting between-pathway models from E-MAP interactions using expected graph compression"

Speaker: David Kelley

Abstract: Genetic interactions (such as synthetic lethal interactions) have become quantifiable on a large-scale using the epistatic miniarray profile (E-MAP) method. An E-MAP allows the construction of a large, weighted network of both aggravating and alleviating genetic interactions between genes. By clustering genes into modules and establishing relationships between those modules, we can discover compensatory pathways. We introduce a general framework for applying greedy clustering heuristics to probabilistic graphs.We use this framework to apply a graph clustering method called graph summarization to an E-MAP that targets yeast chromosome biology. This results in a new method for clustering E-MAP data that we call Expected Graph Compression (EGC). We validate modules and compensatory pathways using enriched Gene Ontology annotations and a novel method based on correlated gene expression from a comprehensive collection of expression experiments. EGC finds a number of modules that are not found by any of the previous methods to cluster E-MAP data. Further, EGC uncovers core submodules contained within several previously found modules, suggesting that EGC can reveal the finer structure of E-MAP networks.

1:00 p.m., Friday, April 16, 2010

Title: " High Performance Computing for DNA Sequence Alignment and Assembly"

By: Michael C. Schatz, CBCB

Venue: 3118 Biomolecular Sciences

Abstact: We are at the dawn of a new era in computational biology. DNA sequencing projects that required years of effort and hundreds of millions of dollars of equipment just a few years ago, can now be performed quickly and cheaply by individual labs. This dramatic shift is expanding the scale and scope of sequencing to previously unimaginable limits, and will ultimately lead to new discoveries about our basic biology, the diversity of life, and personalized medicine. However, these ambitious goals can only be realized if we can develop new computational methods that can effectively analyze the overwhelming volumes of data generated.

In my presentation, I'll describe my research developing efficient methods for analyzing large biological datasets, including by using highly parallel commodity graphics processing units produced by nVidia, and the parallel computing framework MapReduce developed by Google. My dissertation research demonstrates how these technologies can be applied to the critical tasks of large-scale alignment and assembly, enabling genotyping and de novo assembly of whole genome genomes from billions of short reads. Coupled with inexpensive cloud computing, these programs can quickly, cheaply, and accurately analyze tremendous biological datasets and have the potential to make otherwise infeasible studies practical.

A Dissertation Defense for the degree of Ph.D. in Computer Science

2:00 p.m., Thursday, April 29, 2010

Title: "Structural Assembly of Molecular Complexes Based on Residual Dipolar Couplings"

Speaker: Konstantin Berlin, a finishing PhD student in Computer Science

Venue: 3118 Biomolecular Sciences directions
Abstact: We present PATI, a computationally efficient and accurate abinitio predictor of the residual dipolar couplings (RDCs) from a protein structure. Building upon PATI, we develop and evaluate a rigid-body molecular docking method, called PATIDOCK, that relies solely on the three-dimensional structure of the individual components and the experimentally derived RDCs for the complex, and show that it is possible to accurately assemble a protein-protein complex by utilizing PATI to guide the docking method. The proposed docking method is robust against experimental errors in the RDCs and computationally efficient. We analyze the accuracy and efficiency of this method using experimental or synthetic RDC data for several proteins, as well as synthetic data for a large variety of protein-protein complexes. We also test our method on two protein systems for which the structure of the complex and steric-alignment data are available (Lys48-linked diubiquitin and a complex of ubiquitin and a ubiquitin-associated domain) and analyze the effect of flexible unstructured tails on the outcome of docking. The results demonstrate that it is fundamentally possible to assemble a protein-protein complex based solely on experimental RDC data and the prediction of the alignment tensor from three-dimensional structures. Additionally we show a method for combining RDCs with other experimental data, such as ambiguous constraints from interface mapping, to further improve structure characterization of the protein complexes.


Past Events



Other Events



Directions

More detailed transportation options to CBCB can be found here.

From the Capital Beltway to Parking Lot:
  • take Capital Beltway (I-495) Exit 25 and turn onto Baltimore Avenue (US Route 1) South
  • go two miles south on Baltimore Ave and enter the main gate at Campus Drive
  • take the right lane into campus and make first right turn onto Paint Branch Drive
  • stop at the first stop sign then pass Stadium Drive on the left
  • stop at the second stop sign then pass Parking Lot XX2 on the right
  • look for the Paint Branch Drive Visitor Lot on the left
  • turn left onto Technology Drive and park in the Paint Branch Drive Visitor Lot


From Parking Lot to CBCB:
  • the back of the Biomolecular Sciences Building #296 faces this parking lot
  • walk around to the front of the building and using the keypad near the front door
  • dial the number of one of the CBCB staff members in order to gain entrance to the building
  • CBCB is located on the third floor of the Biomolecular Sciences Building #296