SYLLABUS

CMSC828N: Computational Gene Finding and Genome Assembly


Tuesdays and Thursdays, 3:30-4:45pm, Room 3118 Biomolecular Sciences Building

Professor: Steven Salzberg, 3125 Biomolecular Sciences Building, salzberg (at) umiacs.umd.edu
Office hours: By appointment.
Textbook: Computational Gene Prediction (CGP) by William H. Majoros


Supplemental texts, free online at the NCBI Bookshelf (click title to view):
Molecular Biology of the Cell, by Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, and Peter Walter. Garland Publishing, 2002.
Genomes, by T.A. Brown, BIOS Scientific Publishers, 2002.

Note: additional links to lecture notes and assignments will appear on the syllabus as the semester progresses.

Day 1: Thursday, Jan 25
Introduction to the course.  Molecular biology background.

Reading: Chapter 1, "The Human Genome," in Genomes, by T.A. Brown, free at the NCBI Bookshelf.
Lecture slides from Jan 25.

Week 1: Jan 30-Feb 1
Biotechnology background on sequencing, assembly.  Whole-genome shotgun sequencing.  Pairwise sequence alignment. Basic assembly: shortest common superstring, greedy assembly algorithms.

Reading: (a) Chapter 6, "Sequencing Genomes," in Genomes, by T.A. Brown, free at the NCBI Bookshelf.  (b) Gene Myers' 1999 intro paper on whole-genome sequencing.
Slides from Jan 30 lecture and from Feb 1 lecture.
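
Illustration for the Week 1 topic of greedy assembly: the sketch below is a minimal, unofficial example and is not taken from the lecture slides. It repeatedly merges the pair of reads with the longest exact suffix-prefix overlap, which is the classic greedy heuristic for the shortest common superstring problem. It assumes error-free reads from a single strand; the function names overlap and greedy_assemble are made up for this example.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads):
    """Greedy superstring heuristic: merge the best-overlapping pair until one string remains."""
    # Drop exact duplicates and reads fully contained in another read.
    reads = list(dict.fromkeys(reads))
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = -1, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Merge the chosen pair, keeping the overlapping region only once.
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads[0] if reads else ""


if __name__ == "__main__":
    toy_reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
    print(greedy_assemble(toy_reads))
```

The all-pairs overlap search is quadratic in the number of reads and is only meant for toy inputs. The greedy result is a superstring of the reads but is not guaranteed to be the shortest one, and repeats in a real genome can lead it to incorrect merges.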

Week 2: Feb 6-8
The Celera Assembler algorithm.  Hash indexing for overlap computation.  Screening repeats.

Reading: (1) Pages 1-10 of Kececioglu and Myers, Combinatorial Algorithms for DNA Sequence Assembly, Algorithmica 13 (1995); (2) The phrap assembler documentation, http://www.phrap.org/phredphrap/phrap.html.

Get the laboratory 1 assignment here.    Delcher's alignment slides    Celera Assembler slides
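
Illustration for the Week 2 topic of hash indexing: the sketch below shows the general idea of using a k-mer hash table to find candidate overlapping reads without comparing every pair directly. It is a simplified example and does not reproduce the Celera Assembler's implementation. The function names and the max_occurrences parameter (a crude stand-in for repeat screening) are assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def build_kmer_index(reads, k):
    """Map each k-mer to the set of read indices that contain it."""
    index = defaultdict(set)
    for read_id, seq in enumerate(reads):
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].add(read_id)
    return index


def candidate_pairs(reads, k, max_occurrences=None):
    """Return pairs of read indices that share at least one k-mer.

    K-mers found in more than max_occurrences reads are skipped, on the
    assumption that very frequent k-mers come from repeats.
    """
    index = build_kmer_index(reads, k)
    pairs = set()
    for ids in index.values():
        if max_occurrences is not None and len(ids) > max_occurrences:
            continue
        pairs.update(combinations(sorted(ids), 2))
    return pairs


if __name__ == "__main__":
    toy_reads = ["ACGTACGTGG", "CGTGGTTTAA", "TTTTTTTTTT"]
    print(candidate_pairs(toy_reads, k=5))
```

Only the candidate pairs would then be passed to a full alignment step. This sketch ignores reverse complements and sequencing errors, both of which a real overlapper must handle.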


Week 3: Feb 13-15
Tuesday, Feb 13, 11:00am-12:00pm, SPECIAL LECTURE by Prof. Pavel Pevzner, UC San Diego. "Are There Rearrangement Hotspots in the Human Genome?"  Location: A.V. Williams Building, ECE Conference Room 2460.

The Arachne assembler algorithm.


Reading: Myers et al., A Whole-Genome Assembly of Drosophila, Science 287 (2000).

Arachne lecture notes

Week 4: Feb 20-22
Lab 1 due Feb 20.
Arachne continued.  Using MUMmer for assembly alignment and comparison.

Readings:
    S. Batzoglou et al., ARACHNE: A whole-genome shotgun assembler, Genome Research 12:1 (2002), 177-189.
    A.L. Delcher et al., Alignment of Whole Genomes, Nucleic Acids Research 27:11 (1999), 2369-2376.  Note that Figure 6 is supposed to be in color, and was mistakenly printed as black and white.

Readings for class presentations: use Wentian Li's bibliography page as a starting point.

Slides from MUMmer lecture

Week 5: Feb 27-Mar 1
Debugging assemblies with Hawkeye, the assembly viewing tool.  Scaffolding with Bambus.

Get the Laboratory 2 assignment here, along with the input data file.

Week 6: Mar 6-8
Class presentations on selected readings.

Reading: Tettelin et al., Optimized Multiplex PCR: Efficiently Closing a Whole-Genome Shotgun Sequencing Project.  Genomics 62 (1999), 500-507.

Week 7: Mar 13-15
Lab 2 due Mar 13.
Additional assembly topics: genome finishing and gap closure.  Editing reads: AutoEditor. Introduction to computational gene finding topics.

Lecture notes on genome closure and finishing
Lecture notes on AutoEditor

Reading: Chapters 1-2 of CGP, "Introduction" and "Mathematical preliminaries".

Spring Break, Mar 19-23

Week 8: Mar 27-29
Bacterial gene finding.  Markov chains.  Case study: the Glimmer gene finder.

Lecture slides on Markov chains

Reading: CGP, "Overview of Computational Gene Prediction," Chapter 3.  Also: S. Salzberg, A. Delcher, S. Kasif, and O. White. Microbial gene identification using interpolated Markov models, Nucleic Acids Research 26:2 (1998), 544-548.

Get lab 3 here.
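
Illustration for the Week 8 topic of Markov chains: the sketch below trains a fixed-order Markov chain on DNA sequences and scores a new sequence by its log-likelihood. It is a deliberately simplified example. Glimmer uses interpolated Markov models, which combine several orders, and that is not reproduced here. The function names, the add-one pseudocount, and the toy training data are assumptions made for this example.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov_chain(sequences, order=1, pseudocount=1.0):
    """Count (context -> next base) transitions and return a probability function."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1

    def prob(context, base):
        # Pseudocounts keep unseen transitions from getting probability zero.
        ctx = counts.get(context, {})
        total = sum(ctx.values()) + pseudocount * len(ALPHABET)
        return (ctx.get(base, 0.0) + pseudocount) / total

    return prob


def log_likelihood(seq, prob, order=1):
    """Sum of log transition probabilities along seq (the first `order` bases are not scored)."""
    return sum(math.log(prob(seq[i - order:i], seq[i])) for i in range(order, len(seq)))


if __name__ == "__main__":
    model = train_markov_chain(["ACACACAC"], order=1)
    print(log_likelihood("ACAC", model), log_likelihood("AAAA", model))
```

A gene finder built on this idea would typically train one model on known coding sequence and another on non-coding sequence, then compare the two log-likelihoods for each candidate region.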


Week 9: Apr 3-5
Overlapping genes in bacteria (guest lecture by Carl Kingsford).  Eukaryotic gene finding: introduction to HMMs and the Forward algorithm.

Reading: CGP, "Signal and Content Sensors" chapter 7.

Lecture slides on Glimmer and bacterial gene finding.
Lecture on the Forward algorithm for HMMs.
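
Illustration for the Week 9 topic of the Forward algorithm: the sketch below computes the total probability of an observed sequence under an HMM by summing over all state paths with dynamic programming. It is a minimal example, not the code from the lecture. The two-state model and all of its parameter values are invented for illustration and do not come from any trained gene finder.

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """Return P(obs) under the HMM, summed over all state paths."""
    # alpha[s] = probability of the observations so far, ending in state s.
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical toy model: all numbers below are made up for this example.
STATES = ("exon", "intron")
START_P = {"exon": 0.5, "intron": 0.5}
TRANS_P = {
    "exon": {"exon": 0.9, "intron": 0.1},
    "intron": {"exon": 0.2, "intron": 0.8},
}
EMIT_P = {
    "exon": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "intron": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}

if __name__ == "__main__":
    print(forward("ACGT", STATES, START_P, TRANS_P, EMIT_P))
```

For long sequences the raw probabilities underflow, so practical implementations rescale alpha at each step or work in log space.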


Week 10: Apr 10-12
Class presentations on selected readings.
Lab 3 due Apr 12.

Reading: CGP, "Toy Exon Finder" chapter 5.
Lab 4, Mini-project, available here: Instructions and Files.

Week 11: Apr 17-19
HMM algorithms: forward, Viterbi, forward-backward.  Design of HMMs and the Toyscan algorithm.

Reading: CGP, "Hidden Markov Models" chapter 6.
Lecture on the Backward algorithm and the E-M algorithm for HMMs.
Bill Majoros' slides on HMM design for gene finding.
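
Illustration for the Week 11 topic of the Viterbi algorithm: the sketch below finds the single most probable state path for an observed sequence, working in log space. It is a minimal example and is not the Toyscan algorithm from CGP. The two-state AT-rich/GC-rich model and its parameter values are invented for illustration, and the code assumes every probability in the model is non-zero.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (most probable state path, its log probability)."""
    # v[s] = best log probability of any path ending in state s.
    v = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    backpointers = []
    for symbol in obs[1:]:
        new_v, ptr = {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: v[r] + math.log(trans_p[r][s]))
            new_v[s] = (v[best_prev] + math.log(trans_p[best_prev][s])
                        + math.log(emit_p[s][symbol]))
            ptr[s] = best_prev
        backpointers.append(ptr)
        v = new_v
    # Trace back from the best final state.
    last = max(states, key=lambda s: v[s])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, v[last]


# Hypothetical toy model: all numbers below are made up for this example.
STATES = ("AT_rich", "GC_rich")
START_P = {"AT_rich": 0.5, "GC_rich": 0.5}
TRANS_P = {
    "AT_rich": {"AT_rich": 0.9, "GC_rich": 0.1},
    "GC_rich": {"AT_rich": 0.1, "GC_rich": 0.9},
}
EMIT_P = {
    "AT_rich": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
    "GC_rich": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4},
}

if __name__ == "__main__":
    path, logp = viterbi("AAAATTTTGGGGCCCC", STATES, START_P, TRANS_P, EMIT_P)
    print(path)
```

The only structural difference from the Forward recursion is that the sum over previous states is replaced by a maximum, with backpointers kept so the best path can be recovered.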

Week 12: Apr 24-26
Case study: GlimmerHMM.  Generalized HMM algorithms.   Gene finding in humans: the EGASP competition.

Reading: CGP, "Generalized HMMs" chapter 8.
Ela Pertea's GlimmerHMM lecture slides.
EGASP slides, part 1 (M. Reese) and part 2 (P. Flicek).

Week 13: May 1-3
Combining multiple gene finders with JIGSAW.  Exon splicing enhancers, pair HMMs, alternative splicing, and transcription terminators.
SPECIAL NOTE: May 3 class to be held in Bio/Psych Room 1230, at 2:00pm.  Topic: Solexa sequencing technology.

Reading: (1) the JIGSAW paper.  (2) CGP, "Signal and Content Sensors", chapter 7, section 7.3 to end of chapter.

Lecture notes on GeneSplicer and Combiner.

Week 14: May 8-10
The status of the human genome: assembly and annotation.

Lab 4 due May 8.
Take-home exams distributed May 10.