SYLLABUS

CMSC828N: Computational Gene Finding and Genome Assembly


Tuesdays and Thursdays, 3:30-4:45, Room 3118 CSIC

Professor: Steven Salzberg, 3125 Biomolecular Sciences Building, salzberg (at) umd.edu
Office hours: By appointment.
Textbook: Computational Gene Prediction (CGP) by William H. Majoros
(available in class; reference section online here and a small erratum here)
Supplemental texts, free online at the NCBI Bookshelf (click title to view):
Molecular Biology of the Cell, by Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, and Peter Walter.  Garland Publishing, 2002.
Genomes, by T.A. Brown, BIOS Scientific Publishers, 2002.

Note: the syllabus will change as the semester progresses.

Day 1: Thursday, Jan 26
Introduction to the course.  Molecular biology background.
Reading: CGP, "Introduction" chapter 1; "Mathematical Preliminaries" chapter 2.

Week 1: Jan 31-Feb 2
Biology background for gene finding. 
Computational gene finding defined.  Basic sequence alignment algorithms.
Reading: CGP, "Overview of Computational Gene Prediction" chapter 3.
Slides from Jan 31 lecture.
Laboratory 1.

Week 2: Feb 7-9
Bacterial gene finding.  Markov chains. 
Case study: the Glimmer gene finder. 
Reading: CGP, "Signal and Content Sensors" chapter 7, sections 7.1-7.2.
Feb 7 lecture notes and figures [1] [2].

Week 3: Feb 14-16
HMM algorithms: forward, Viterbi.  Signals in bacterial genomes: start sites, transcription terminators, operons.
Reading: S. Salzberg, A. Delcher, S. Kasif, and O. White. Microbial gene identification using interpolated Markov models, Nucleic Acids Research 26:2 (1998), 544-548.
Lab 1 due Feb 16.
Get the Laboratory 2 assignment here.

Week 4: Feb 21-23
Case study: GlimmerHMM.  HMM algorithms continued: expectation maximization.
Reading: CGP, "Toy Exon Finder" chapter 5.
Slides from Mihaela Pertea's Feb 21 lecture on GlimmerHMM.
Suggested readings for class presentations here.

Week 5: Feb 28-Mar 2
Other eukaryotic gene finding topics.  cDNA and EST sequences, spliced alignment, alternative splicing, and micro-exons.
Reading: (1) CGP, "Hidden Markov Models" chapter 6; (2) Sean Eddy's Profile HMMs paper (link).

Week 6: Mar 7-9
Biotechnology background on sequencing and assembly.  Whole-genome shotgun sequencing.
March 9: special seminar at 2:00pm, followed by class discussion (at the usual class time) of RNA splicing machinery.
Readings: (a) Chapter 6, "Sequencing Genomes", in Genomes, by T.A. Brown, free at the NCBI Bookshelf.  (b) Gene Myers' 1999 intro paper on whole-genome sequencing.

Lab 2 due Mar 9.

Week 7: Mar 14-16
Class presentations on selected readings.
Reading: CGP, "Signal and Content Sensors", chapter 7, section 7.3 to end of chapter.
Get the Laboratory 3 assignment here.

Spring Break, Mar 20-24

Week 8: Mar 28-30
Shortest common superstring problem.  The greedy assembly algorithm.  Hash indexing for overlap computation.  Screening repeats.  Thursday: the AMOS assembly viewing toolkit, Assembly Investigator.
Reading: (1) Pages 1-10 of Kececioglu and Myers, Combinatorial Algorithms for DNA Sequence Assembly, Algorithmica 13 (1995); (2) The phrap assembler documentation, http://www.phrap.org/phredphrap/phrap.html.
AMOS Validation and Visualization
Lab 3 files in AMOS format

Week 9: Apr 4-6
Base calling and trimming algorithms.  Thursday: Using MUMmer for assembly alignment and comparison.
Reading: Myers et al., A Whole-Genome Assembly of Drosophila, Science 287 (2000).
Project proposals due April 6.
Genome alignment and assembly validation slides

Week 10: Apr 11-13
Class presentations on selected readings.
Lab 3 due Apr 13.

Week 11: Apr 18-20
The Celera Assembler algorithm.
April 20: Special lecture by Prof. Peter Bickel, Dept. of Statistics, UC Berkeley: "Using Comparative Genomics to Assess the Function of Noncoding Sequences".  Lecture and discussion: 4:15-5:50pm, Physics Bldg., Room 1410.  This lecture is part of Statistics Day, http://www.statconsortium.umd.edu.

Reading: Batzoglou et al., ARACHNE: A whole genome shotgun assembler.  Genome Research 12 (2002).

Week 12: Apr 25-27
Continued discussion of Celera Assembler and Arachne.  Scaffolding with Bambus.
Art Delcher's assembly lecture slides.

Week 13: May 2-4
Additional assembly topics: comparative assembly, genome finishing and gap closure. The status of the human genome.
Reading: Tettelin et al., Optimized Multiplex PCR: Efficiently Closing a Whole-Genome Shotgun Sequencing Project.  Genomics 62 (1999), 500-507.

Week 14: May 9
New sequencing technology: pyrosequencing.
Take-home exams distributed.
Project due: May 9.