CBCB Research in Progress Series (RIPS)

UPDATE (Sept. 13, 2020): The RIPS talks will be held virtually until a general return to campus and moderately sized in-person gatherings are deemed safe (so, at least for the current semester). Even so, this is a great opportunity for everyone in our CBCB community to come together and to learn about the research being done by our colleagues in different labs within the center. The format for the RIPS talks will be the same as in previous semesters: each week we aim to have two 30-minute talks. If you feel you need an hour-long slot, please specify this in the second row of the corresponding date on the spreadsheet. The signup sheet for this semester is here. The day and time for the RIPS seminar this semester will be Thursdays from 10-11 AM. Request the Zoom meeting ID and passcode to view this week's RIPS talk.

The CBCB RIPS series provides an informal forum for computational biologists to keep abreast of colleagues' projects, to help students and postdocs hone their presentation skills, and to get expert feedback on new or ongoing projects. The forum is targeted toward anyone working at the interface of biology and the analytical sciences.


Other seminars you may be interested in attending can be found HERE

    Fall Semester 2020

    Each entry below lists the date, speaker, PI/lab/host, and topic with abstract. Talks are at 10 AM unless a different time is noted with the entry.
    Date: 10/22/2020
    Speaker: Dr. Justin Zook
    PI/Lab/Host: NIST
    Topic: Genome in a Bottle Benchmarks for Challenging Genome Regions
    Abstract: The NIST-led Genome in a Bottle Consortium has developed widely used benchmarks for human genome sequencing and variant calling. Recently, we've used long reads and new de novo assembly and AI-based methods to expand these benchmarks to increasingly challenging regions of the genome, such as segmental duplications and the Major Histocompatibility Complex (MHC). These benchmarks were used in the precisionFDA Truth Challenge V2 to encourage innovation in variant calling from short and long reads in these challenging regions, as well as in the Human Pangenome Reference Consortium to benchmark diploid assembly methods. We're now working on using diploid assembly methods and work from the Telomere-to-Telomere Consortium to expand these benchmarks to some of the most challenging, repetitive regions of the genome.

    Date: 10/15/2020
    Speaker: Jason Fan
    PI/Lab/Host: Patro Lab
    Topic: Matrix (Factorization) Reloaded: Flexible Methods for Imputing Genetic Interactions with Cross-Species and Side Information
    Abstract:
    Motivation: Mapping genetic interactions (GIs) can reveal important insights into cellular function, and has potential translational applications. There has been great progress in developing high-throughput experimental systems for measuring GIs (e.g. with double knockouts) as well as in defining computational methods for inferring (imputing) unknown interactions. However, existing computational methods for imputation have largely been developed for and applied in baker's yeast, even as experimental systems have begun to allow measurements in other contexts. Importantly, existing methods face a number of limitations in requiring specific side information and with respect to computational cost. Further, few have addressed how GIs can be imputed when data is scarce.
    Results: We address these limitations by presenting a new imputation framework, called Extensible Matrix Factorization (EMF). EMF is a framework of composable models that flexibly exploit cross-species information in the form of GI data across multiple species, and arbitrary side information in the form of kernels (e.g. from protein-protein interaction networks). We perform a rigorous set of experiments on these models in matched GI datasets from baker's and fission yeast. These include the first such experiments on genome-scale GI datasets in multiple species in the same study. We find that EMF models that exploit side and cross-species information improve imputation, especially in data-scarce settings. Further, we show that EMF outperforms the state-of-the-art deep learning method, even when using strictly less data, and incurs orders of magnitude less computational cost.
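    For readers unfamiliar with the general idea behind this kind of imputation, the sketch below shows a plain low-rank matrix factorization fit by gradient descent on the observed entries of an interaction matrix, with missing entries read off the reconstruction. This is not the EMF method described in the talk (it has no cross-species data and no kernel side information); the function name, parameters, and toy data are purely illustrative, and it assumes only NumPy.

        # Minimal, generic sketch of matrix-factorization imputation (illustration only;
        # NOT the EMF implementation from the talk). Assumes NumPy is available.
        import numpy as np

        def factorize_impute(M, mask, rank=5, lr=0.01, reg=0.1, epochs=500, seed=0):
            """Impute missing entries of an interaction matrix M.

            M    : observed interaction scores (values at unobserved positions are ignored)
            mask : boolean array, True where M is observed
            Returns the completed matrix U @ V.T.
            """
            rng = np.random.default_rng(seed)
            n, m = M.shape
            U = 0.1 * rng.standard_normal((n, rank))
            V = 0.1 * rng.standard_normal((m, rank))
            for _ in range(epochs):
                # Reconstruction error, restricted to observed entries.
                E = mask * (U @ V.T - M)
                # Gradient steps with L2 regularization on the factors.
                U -= lr * (E @ V + reg * U)
                V -= lr * (E.T @ U + reg * V)
            return U @ V.T

        # Toy usage: a small synthetic low-rank matrix with ~30% of entries hidden.
        rng = np.random.default_rng(1)
        true = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))
        mask = rng.random(true.shape) > 0.3
        imputed = factorize_impute(true, mask)
        print("RMSE on held-out entries:",
              np.sqrt(np.mean((imputed - true)[~mask] ** 2)))

    As the abstract describes, EMF extends this basic setup by composing models that also draw on GI data from other species and on arbitrary kernel side information (e.g. from protein-protein interaction networks), which is where its gains in data-scarce settings come from.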