SYLLABUS

CMSC828N: Computational Gene Finding and Genome Assembly


Tuesdays and Thursdays, 12:30-1:45pm, Room 3118 Biomolecular Sciences Building

Professor: Steven Salzberg, 3125 Biomolecular Sciences Building, salzberg (at) umiacs.umd.edu
Office hours: By appointment.
Textbook: Computational Gene Prediction (CGP) by William H. Majoros


Supplemental texts, free online at the NCBI Bookshelf (click title to view):
Molecular Biology of the Cell, by Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, and Peter Walter. Garland Publishing, 2002.
Genomes, by T.A. Brown, BIOS Scientific Publishers, 2002.

Note: additional links to lecture notes and assignments will appear on the syllabus as the semester progresses.

Week 1: Sept 2-4
Introduction to the course.  Molecular biology background. 
Biotechnology background on sequencing, assembly. 

Reading: Chapter 1, "The Human Genome," in Genomes, by T.A. Brown, free at the NCBI Bookshelf.

Week 2: Sept 9-11
Whole-genome shotgun sequencing.  Pairwise sequence alignment. Basic assembly: shortest common superstring, greedy assembly algorithms.

Reading: (a) Chapter 6, "Sequencing Genomes," in Genomes, by T.A. Brown, free at the NCBI Bookshelf.  (b) Gene Myers' 1999 intro paper on whole-genome sequencing.
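
Illustrative sketch for the Week 2 topic of greedy assembly (not part of the assigned reading): the greedy heuristic for shortest common superstring repeatedly merges the pair of reads with the longest suffix-prefix overlap. The function names and the min_len parameter below are assumptions made for this example. The sketch assumes error-free reads from a single strand and uses exact matching, which real assemblers do not.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int = 1) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 1) -> list[str]:
    """Merge the best-overlapping pair until no pair overlaps by at least min_len."""
    # Drop exact duplicates and reads fully contained in another read.
    reads = list(dict.fromkeys(reads))
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while True:
        best_len, best_pair = 0, None
        # Quadratic scan over ordered pairs; fine for a toy input only.
        for a, b in permutations(reads, 2):
            k = overlap(a, b, min_len)
            if k > best_len:
                best_len, best_pair = k, (a, b)
        if best_pair is None:
            return reads  # remaining strings are the "contigs"
        a, b = best_pair
        reads.remove(a)
        reads.remove(b)
        reads.append(a + b[best_len:])


print(greedy_assemble(["ATTAGAC", "GACCTGA", "CTGATTC"], min_len=3))
```

The greedy choice is not guaranteed to give the shortest superstring, and repeats longer than the reads can cause incorrect merges, which is one motivation for the overlap-graph methods covered in later weeks.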


Week 3: Sept 16-18
The Celera Assembler algorithm.  Hash indexing for overlap computation.  Screening repeats.

Reading: (1) Myers, The Fragment Assembly String Graph, Bioinformatics 21 (2005); (2) The phrap assembler documentation, http://www.phrap.org/phredphrap/phrap.html.
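
Illustrative sketch for the Week 3 topic of hash indexing for overlap computation: rather than comparing every pair of reads, an assembler can index each k-mer by the reads that contain it and align only pairs sharing at least one k-mer. This is a generic toy version, not the Celera Assembler's actual implementation. The function names and the max_occurrences cutoff (a crude stand-in for repeat screening) are assumptions made for this example.

```python
from collections import defaultdict


def build_kmer_index(reads, k):
    """Map each k-mer to the set of read indices in which it occurs."""
    index = defaultdict(set)
    for read_id, seq in enumerate(reads):
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].add(read_id)
    return index


def candidate_overlaps(reads, k, max_occurrences=None):
    """Count distinct shared k-mers for every pair of reads that share at least one.

    k-mers found in more than max_occurrences reads are skipped, on the
    assumption that very frequent k-mers come from repeats.
    """
    index = build_kmer_index(reads, k)
    shared = defaultdict(int)
    for kmer, ids in index.items():
        if max_occurrences is not None and len(ids) > max_occurrences:
            continue
        ordered = sorted(ids)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                shared[(a, b)] += 1
    return dict(shared)


reads = ["ATTAGAC", "GACCTGA", "CTGATTC", "TTTTTTT"]
print(candidate_overlaps(reads, k=3))
```

Only the candidate pairs returned here would be passed on to a full alignment step; pairs with no shared k-mer are never compared.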

Week 4: Sept 23-25

Topics: The Celera Assembler and the Arachne assembler algorithms.


Reading: Myers et al., A Whole-Genome Assembly of Drosophila, Science 287 (2000).

Celera assembler slides

Arachne lecture notes

Week 5: Sept 30-Oct 2
Lab 1 due Sept 27.
Short read assemblers: Velvet, Edena.  Using MUMmer for assembly alignment and comparison.
Oct 2: guest lecturer TBA.

Readings:
    S. Batzoglou et al., ARACHNE: A whole-genome shotgun assembler, Genome Research 12, Issue 1, 177-189, January 2002.
    A.L. Delcher et al.,  Alignment of Whole Genomes   Nucleic Acids Research, 27:11 (1999), 2369-2376.  Note that Figure 6 is supposed to be in color, and was mistakenly printed as black and white.

Readings for class presentations: choose from this list or use Wentian Li's bibliography page for more choices.

Lecture notes on genome closure and finishing
Lecture notes on the Figaro trimming algorithm (by James White)

Week 6: Oct 7-9
Oct 7: guest lecturer TBA.
Multiple genome alignment with MUMmer.  Comparative assembly. 

Lab 2 assignment will be available here.


Week 7: Oct 14-16
Class presentations on selected readings.

Reading: Tettelin et al., Optimized Multiplex PCR: Efficiently Closing a Whole-Genome Shotgun Sequencing Project.  Genomics 62 (1999), 500-507.

Week 8: Oct 21-23

Additional assembly topics: Debugging assemblies with Hawkeye, the assembly viewing tool.  Scaffolding with Bambus. Introduction to computational gene finding topics.

Reading: Chapters 1-2 of CGP, "Introduction" and "Mathematical preliminaries".

Week 9: Oct 28-30
Bacterial gene finding.  Markov chains.  Case study: the Glimmer gene finder.

Lab 2 due Oct 23 (tentative).

Reading: CGP, "Overview of Computational Gene Prediction," Chapter 3.  Also: S. Salzberg, A. Delcher, S. Kasif, and O. White. Microbial gene identification using interpolated Markov models, Nucleic Acids Research 26:2 (1998), 544-548.

Week 10: Nov 4-6
Overlapping genes in bacteria.  Eukaryotic gene finding: introduction to HMMs and the Forward algorithm.

Reading: CGP, "Signal and Content Sensors" chapter 7.


Week 11: Nov 11-13
Nov 6: Class presentations on selected readings.

Lab 3 due Nov 11.
Reading: CGP, "Toy Exon Finder" chapter 5.

Week 12: Nov 18-20
HMM algorithms: forward, Viterbi, forward-backward.  Design of HMMs and the Toyscan algorithm.

Reading: CGP, "Hidden Markov Models" chapter 6.
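
Illustrative sketch for the Week 12 topic of HMM algorithms: the Viterbi algorithm finds the single most probable state path for an observed sequence by dynamic programming. The two-state model below (state "C" for GC-rich, state "N" for AT-rich) and all of its probabilities are invented for this example and are not taken from the textbook. The sketch works in log space and assumes every probability it touches is nonzero.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (most probable state path, its log probability)."""
    # Initialization: log P(start in s) + log P(first symbol | s).
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = []
    for symbol in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best predecessor state for being in s at this position.
            prev = max(states, key=lambda p: V[-1][p] + math.log(trans_p[p][s]))
            scores[s] = (V[-1][prev] + math.log(trans_p[prev][s])
                         + math.log(emit_p[s][symbol]))
            pointers[s] = prev
        V.append(scores)
        back.append(pointers)
    # Traceback from the best final state.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, V[-1][last]


# Hypothetical toy parameters, chosen only to make the example readable.
states = ["C", "N"]
start_p = {"C": 0.5, "N": 0.5}
trans_p = {"C": {"C": 0.9, "N": 0.1}, "N": {"C": 0.1, "N": 0.9}}
emit_p = {
    "C": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "N": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}

path, logp = viterbi("GGGGAAAA", states, start_p, trans_p, emit_p)
print("".join(path))
```

Replacing the max over predecessor states with a sum (a log-sum-exp in log space) turns the same recurrence into the forward algorithm, which computes the total probability of the sequence rather than the best path.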

Week 13: Nov 25 (Nov 27 is Thanksgiving)
Case study: GlimmerHMM.  Generalized HMM algorithms.   Gene finding in humans: the EGASP and NGASP competitions.

Reading: CGP, "Generalized HMMs" chapter 8.

Week 14: Dec 2-4
Combining multiple gene finders with JIGSAW.  Exon splicing enhancers, pair HMMs, alternative splicing, and transcription terminators.

Reading: (1) the JIGSAW paper.  (2) CGP, "Signal and Content Sensors", chapter 7, section 7.3 to end of chapter.


Week 15: Dec 9-11 (last week)
The status of the human genome: assembly and annotation.

Lab 4 due Dec 11.  Take-home exams distributed Dec 11, due Dec 18.