SYLLABUS
CMSC828N: Computational Gene Finding and Genome Assembly
Tuesdays and Thursdays, 12:30-1:45pm, Room 3118 Biomolecular Sciences Building
Professor: Steven Salzberg, 3125 Biomolecular Sciences Building,
salzberg (at) umiacs.umd.edu
Office hours: By appointment.
Textbook: Computational Gene Prediction (CGP) by William H. Majoros
Supplemental texts, free online at the NCBI Bookshelf:
Molecular Biology of the Cell, by Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, and Peter Walter. Garland Publishing, 2002.
Genomes, by T.A. Brown. BIOS Scientific Publishers, 2002.
Note: additional links to lecture notes and assignments will appear on the syllabus as the semester progresses.
Week 1: Sept 2-4
Introduction to the course. Molecular biology background. Biotechnology background on sequencing and assembly.
Reading: Chapter 1, "The Human Genome," in Genomes, by T.A. Brown, free at the NCBI Bookshelf.
Week 2: Sept 9-11
Whole-genome shotgun sequencing. Pairwise sequence alignment. Basic assembly: shortest common superstring, greedy assembly algorithms.
Reading: (a) Chapter 6, "Sequencing Genomes," in Genomes, by T.A. Brown, free at the NCBI Bookshelf. (b) Gene Myers' 1999 intro paper on whole-genome sequencing.
Week 3: Sept 16-18
The Celera Assembler algorithm. Hash indexing for overlap computation. Screening repeats.
Reading: (1) Myers, The Fragment Assembly String Graph, Bioinformatics 21 (2005); (2) The phrap assembler documentation, http://www.phrap.org/phredphrap/phrap.html.
Week 4: Sept 23-25
Topics: The Celera Assembler and the Arachne assembler algorithms.
Reading: Myers et al., A Whole-Genome Assembly of Drosophila, Science 287 (2000).
Celera assembler slides
Arachne lecture notes
Week 5: Sept 30-Oct 2
Lab 1 due Sept 27.
Short read assemblers: Velvet, Edena. Using MUMmer for assembly alignment and comparison.
Oct 2: guest lecturer TBA.
Readings:
S. Batzoglou et al., ARACHNE: A whole-genome shotgun assembler, Genome Research 12, Issue 1, 177-189, January 2002.
A.L. Delcher et al., Alignment of Whole Genomes, Nucleic Acids Research 27:11 (1999), 2369-2376. Note that Figure 6 is supposed to be in color and was mistakenly printed in black and white.
Readings for class presentations: choose from this list or use Wentian Li's bibliography page for more choices.
Lecture notes on genome closure and finishing
Lecture notes on the Figaro trimming algorithm (by James White)
Week 6: Oct 7-9
Oct 7: guest lecturer TBA.
Multiple genome alignment with MUMmer. Comparative assembly.
Lab 2 assignment will be available here.
Week 7: Oct 14-16
Class presentations on selected readings.
Reading: Tettelin et al., Optimized Multiplex PCR: Efficiently Closing a Whole-Genome Shotgun Sequencing Project. Genomics 62 (1999), 500-507.
Week 8: Oct 21-23
Additional assembly topics: Debugging assemblies with Hawkeye, the assembly viewing tool. Scaffolding with Bambus. Introduction to computational gene finding topics.
Reading: Chapters 1-2 of CGP, "Introduction" and "Mathematical preliminaries".
Week 9: Oct 28-30
Bacterial gene finding. Markov chains. Case study: the Glimmer gene finder.
Lab 2 due Oct 23 (tentative).
Reading: CGP, "Overview of Computational Gene Prediction," Chapter 3. Also: S. Salzberg, A. Delcher, S. Kasif, and O. White. Microbial gene identification using interpolated Markov models, Nucleic Acids Research 26:2 (1998), 544-548.
Week 10: Nov 4-6
Overlapping genes in bacteria. Eukaryotic gene finding: introduction to HMMs and the Forward algorithm.
Reading: CGP, "Signal and Content Sensors", chapter 7.
Week 11: Nov 11-13
Nov 6: Class presentations on selected readings.
Lab 3 due Nov 11.
Reading: CGP, "Toy Exon Finder", chapter 5.
Week 12: Nov 18-20
HMM algorithms: forward, Viterbi, forward-backward. Design of HMMs and the Toyscan algorithm.
Reading: CGP, "Hidden Markov Models", chapter 6.
Week 13: Nov 25 (Nov 27 is Thanksgiving)
Case study: GlimmerHMM.
Generalized HMM algorithms. Gene finding in humans: the EGASP and NGASP competitions.
Reading: CGP, "Generalized HMMs", chapter 8.
Week 14: Dec 2-4
Combining multiple gene finders with JIGSAW. Exon splicing enhancers, pair HMMs, alternative splicing, and transcription terminators.
Reading: (1) the JIGSAW paper. (2) CGP, "Signal and Content Sensors", chapter 7, section 7.3 to end of chapter.
Week 15: Dec 9-11 (last week)
The status of the human genome: assembly and annotation.
Lab 4 due Dec 11. Take-home exams distributed Dec 11, due Dec 18.