SYLLABUS
CMSC828N: Computational Gene Finding and Genome Assembly
Tuesdays and Thursdays, 3:30-4:45pm, Room 3118 Biomolecular Sciences Building
Professor: Steven Salzberg, 3125 Biomolecular Sciences Building,
salzberg (at) umiacs.umd.edu
Office hours: By appointment.
Textbook: Computational Gene Prediction (CGP) by William H. Majoros
Supplemental texts, free online at the NCBI Bookshelf:
Molecular Biology of the Cell, by Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, and Peter Walter. Garland Publishing, 2002.
Genomes, by T.A. Brown, BIOS Scientific Publishers, 2002.
Note: additional links to lecture notes and assignments will appear on the syllabus as the semester progresses.
Day 1: Thursday, Jan 25
Introduction to the course. Molecular biology background.
Reading: Chapter 1, "The Human Genome," in Genomes, by T.A. Brown, free at the NCBI Bookshelf.
Lecture slides from Jan 25.
Week 1: Jan 30-Feb 1
Biotechnology background on sequencing and assembly. Whole-genome shotgun sequencing. Pairwise sequence alignment. Basic assembly: shortest common superstring, greedy assembly algorithms.
Reading: (a) Chapter 6, "Sequencing Genomes," in Genomes, by T.A. Brown, free at the NCBI Bookshelf. (b) Gene Myers' 1999 intro paper on whole-genome sequencing.
Slides from the Jan 30 and Feb 1 lectures.
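For illustration, the greedy approach to shortest common superstring can be sketched in a few lines of Python. This is a minimal sketch under simplifying assumptions, not the method of any particular assembler: reads are error-free, single-stranded, and overlaps must match exactly. The helper names overlap and greedy_assemble are hypothetical. At each step the sketch merges the pair of reads with the longest suffix-prefix overlap; like every greedy superstring heuristic, it is not guaranteed to find the shortest superstring.

```python
def overlap(a, b, min_len=1):
    """Length of the longest suffix of a that exactly equals a prefix of b (0 if none >= min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def greedy_assemble(reads, min_len=1):
    """Greedy shortest-common-superstring heuristic; returns a list of contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Reads contained inside another read add no new sequence.
    contigs = [r for r in contigs if not any(r != s and r in s for s in contigs)]
    while len(contigs) > 1:
        best_len, best_pair = 0, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_pair = k, (i, j)
        if best_pair is None:
            break  # no overlaps left: remaining contigs cannot be joined
        i, j = best_pair
        merged = contigs[i] + contigs[j][best_len:]
        contigs = [c for n, c in enumerate(contigs) if n not in (i, j)]
        contigs.append(merged)
    return contigs


reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

Ties between equally long overlaps are broken by list order, so reordering the reads can yield a different (equally valid) superstring.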
Week 2: Feb 6-8
The Celera Assembler algorithm.
Hash indexing for overlap computation. Screening repeats.
Reading: (1) Pages 1-10 of Kececioglu and Myers, Combinatorial Algorithms for DNA Sequence Assembly, Algorithmica 13 (1995); (2) the phrap assembler documentation, http://www.phrap.org/phredphrap/phrap.html.
Get the Laboratory 1 assignment here.
Delcher's alignment slides
Celera Assembler slides
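Hash indexing and repeat screening can be illustrated with a toy k-mer table. The sketch below is a generic illustration, not the Celera Assembler's actual code: it indexes every k-mer of every read in a Python dictionary, reports read pairs that share at least one k-mer as overlap candidates (a real assembler would then verify each candidate with an alignment), and skips k-mers occurring more than a hypothetical max_occ times as a crude stand-in for repeat screening. Reverse complements are ignored for brevity, and the function names are hypothetical.

```python
from collections import defaultdict


def kmer_index(reads, k):
    """Map every k-mer to the list of (read_id, offset) positions where it occurs."""
    index = defaultdict(list)
    for rid, seq in enumerate(reads):
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((rid, pos))
    return index


def candidate_overlaps(reads, k, max_occ=None):
    """Pairs of distinct reads sharing at least one k-mer.

    k-mers seen more than max_occ times (hypothetical threshold) are skipped
    as likely repeats, which keeps high-copy sequence from flooding the pair list.
    """
    pairs = set()
    for hits in kmer_index(reads, k).values():
        if max_occ is not None and len(hits) > max_occ:
            continue  # over-represented k-mer: treat as repeat and ignore
        for i, (ra, _) in enumerate(hits):
            for rb, _ in hits[i + 1:]:
                if ra != rb:
                    pairs.add((min(ra, rb), max(ra, rb)))
    return pairs


reads = ["ACGTAC", "GTACGG", "TTTTTT"]
print(candidate_overlaps(reads, k=4, max_occ=2))  # {(0, 1)}
```

Only reads that share a k-mer are ever compared, which is the point of hashing: it avoids aligning all n^2 read pairs.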
Week 3: Feb 13-15
Tuesday, Feb 13, 11:00am-12:00pm, SPECIAL LECTURE by Prof. Pavel Pevzner, UC San Diego. "Are There Rearrangement Hotspots in the Human Genome?" Location: A.V. Williams Building, ECE Conference Room 2460.
The Arachne assembler algorithm.
Reading: Myers et al., A Whole-Genome Assembly of Drosophila, Science 287 (2000).
Arachne lecture notes
Week 4: Feb 20-22
Lab 1 due Feb 20.
Arachne continued. Using MUMmer for assembly alignment and comparison.
Readings:
S. Batzoglou et al., ARACHNE: A whole-genome shotgun assembler, Genome Research 12, Issue 1, 177-189, January 2002.
A.L. Delcher et al., Alignment of Whole Genomes, Nucleic Acids Research 27:11 (1999), 2369-2376. Note that Figure 6 is supposed to be in color, and was mistakenly printed in black and white.
Readings for class presentations: use Wentian Li's bibliography page as a starting point.
Slides from the MUMmer lecture
Week 5: Feb 27-Mar 1
Debugging assemblies with Hawkeye, the assembly viewing tool.
Scaffolding with Bambus.
Get the Laboratory 2 assignment here, along with the input data file.
Week 6: Mar 6-8
Class presentations on selected readings.
Reading: Tettelin et al., Optimized Multiplex PCR: Efficiently Closing a Whole-Genome Shotgun Sequencing Project, Genomics 62 (1999), 500-507.
Week 7: Mar 13-15
Lab 2 due Mar 13.
Additional assembly topics: genome finishing and gap closure. Editing reads: AutoEditor. Introduction to computational gene finding topics.
Lecture notes on genome closure and finishing
Lecture notes on AutoEditor
Reading: Chapters 1-2 of CGP, "Introduction" and "Mathematical preliminaries."
Spring Break, Mar 19-23
Week 8: Mar 27-29
Bacterial gene finding. Markov chains. Case study: the Glimmer gene finder.
Lecture slides on Markov chains
Reading: CGP, "Overview of Computational Gene Prediction," Chapter 3. Also:
S. Salzberg, A. Delcher, S. Kasif, and O. White. Microbial gene identification using interpolated Markov models, Nucleic Acids Research 26:2 (1998), 544-548.
Get lab 3 here.
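As a simplified illustration of how a Markov chain scores DNA, the sketch below trains two fixed-order chains (one on "coding" and one on "non-coding" examples) and compares their log-likelihoods. Note the assumptions: Glimmer itself uses interpolated Markov models and scores codon positions separately, whereas this sketch uses a single fixed-order chain with add-one pseudocounts. The function names and the tiny training sequences are hypothetical, made up purely for demonstration.

```python
import math
from collections import defaultdict


def train_markov(seqs, order, alphabet="ACGT", pseudocount=1.0):
    """Fit a fixed-order Markov chain and return prob(context, base) = P(base | context)."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1

    def prob(context, base):
        c = counts.get(context, {})  # .get does not insert unseen contexts
        total = sum(c.values()) + pseudocount * len(alphabet)
        return (c.get(base, 0.0) + pseudocount) / total

    return prob


def log_likelihood(seq, prob, order):
    """Sum of log P(seq[i] | seq[i-order:i]); the first `order` bases are not scored."""
    return sum(math.log(prob(seq[i - order:i], seq[i])) for i in range(order, len(seq)))


def log_odds(seq, coding, noncoding, order):
    """Positive when the coding model explains seq better than the non-coding model."""
    return log_likelihood(seq, coding, order) - log_likelihood(seq, noncoding, order)


coding = train_markov(["ATGATGATGATG"], order=2)
noncoding = train_markov(["CCCCCCCCCCCC"], order=2)
print(log_odds("ATGATGATG", coding, noncoding, order=2))
```

The pseudocounts keep unseen contexts from producing log(0); interpolated models address the same sparsity problem more carefully by blending estimates from several orders.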
Week 9: Apr 3-5
Overlapping genes in bacteria (guest lecture by Carl Kingsford). Eukaryotic gene finding: introduction to HMMs and the Forward algorithm.
Reading: CGP, "Signal and Content Sensors," Chapter 7.
Lecture slides on Glimmer and bacterial gene finding.
Lecture on the Forward algorithm for HMMs.
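The Forward algorithm can be sketched for a toy two-state HMM. The states AT_rich and GC_rich and all of their probabilities below are hypothetical values chosen for illustration, not a real gene model. The function keeps alpha[t], the probability of the sequence prefix seen so far ending in state t, and so sums over all state paths in O(N^2 L) time for N states and sequence length L. It uses raw probabilities, which underflow on realistic sequence lengths; practical implementations use scaling or log-space arithmetic.

```python
# Hypothetical toy model: two states with made-up parameters.
STATES = ("AT_rich", "GC_rich")
START = {"AT_rich": 0.5, "GC_rich": 0.5}
TRANS = {"AT_rich": {"AT_rich": 0.9, "GC_rich": 0.1},
         "GC_rich": {"AT_rich": 0.1, "GC_rich": 0.9}}
EMIT = {"AT_rich": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
        "GC_rich": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2}}


def forward(obs, states, start_p, trans_p, emit_p):
    """Return P(obs | model), summed over every possible state path."""
    # alpha[t] = P(obs[0..i], state at position i is t)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            t: emit_p[t][symbol] * sum(alpha[s] * trans_p[s][t] for s in states)
            for t in states
        }
    return sum(alpha.values())


print(forward("GGCA", STATES, START, TRANS, EMIT))
```

For a two-symbol input such as "GC" the result can be checked by hand: the four possible state paths contribute 0.0405 + 0.018 + 0.003 + 0.003 = 0.0645.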
Week 10: Apr 10-12
Class presentations on selected readings.
Lab 3 due Apr 12.
Reading: CGP, "Toy Exon Finder," Chapter 5.
Lab 4 (mini-project) available here: Instructions and Files.
Week 11: Apr 17-19
HMM algorithms: forward, Viterbi, forward-backward. Design of HMMs and the Toyscan algorithm.
Reading: CGP, "Hidden Markov Models," Chapter 6.
Lecture on the Backward algorithm and the E-M algorithm for HMMs.
Bill Majoros' slides on HMM design for gene finding.
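The Viterbi algorithm differs from the forward algorithm in one step: it replaces the sum over predecessor states with a maximum, and it keeps back-pointers so the single most probable state path can be recovered. The sketch below uses a hypothetical two-state toy model (AT_rich and GC_rich with made-up probabilities, not a real gene model) and works in log space to avoid underflow; it assumes every probability is nonzero, since math.log(0) raises an error.

```python
import math

# Hypothetical toy model: two states with made-up parameters.
STATES = ("AT_rich", "GC_rich")
START = {"AT_rich": 0.5, "GC_rich": 0.5}
TRANS = {"AT_rich": {"AT_rich": 0.9, "GC_rich": 0.1},
         "GC_rich": {"AT_rich": 0.1, "GC_rich": 0.9}}
EMIT = {"AT_rich": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
        "GC_rich": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2}}


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (most probable state path, its log probability) for obs."""
    v = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    back = []  # back[i][t] = best predecessor of state t at position i + 1
    for symbol in obs[1:]:
        ptr, nv = {}, {}
        for t in states:
            best = max(states, key=lambda s: v[s] + math.log(trans_p[s][t]))
            ptr[t] = best
            nv[t] = v[best] + math.log(trans_p[best][t]) + math.log(emit_p[t][symbol])
        back.append(ptr)
        v = nv
    last = max(states, key=lambda s: v[s])
    path = [last]
    for ptr in reversed(back):  # trace back-pointers from the end
        path.append(ptr[path[-1]])
    path.reverse()
    return path, v[last]


print(viterbi("GGCCGG", STATES, START, TRANS, EMIT)[0])
```

In a gene finder the states would represent gene structure (exons, introns, intergenic sequence), and the Viterbi path is read off as the predicted gene model.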
Week 12: Apr 24-26
Case study: GlimmerHMM.
Generalized HMM algorithms. Gene finding in humans: the EGASP competition.
Reading: CGP, "Generalized HMMs," Chapter 8.
Ela Pertea's GlimmerHMM lecture slides.
EGASP slides, part 1 (M. Reese) and part 2 (P. Flicek).
Week 13: May 1-3
Combining multiple gene finders with JIGSAW. Exonic splicing enhancers, pair HMMs, alternative splicing, and transcription terminators.
SPECIAL NOTE: May 3 class to be held in Bio/Psych Room 1230, at
2:00pm. Topic: Solexa sequencing technology.
Reading: (1) the JIGSAW paper; (2) CGP, "Signal and Content Sensors," Chapter 7, Section 7.3 to the end of the chapter.
Lecture notes on GeneSplicer and Combiner.
Week 14: May 8-10
The status of the human genome: assembly and annotation.
Lab 4 due May 8.
Take-home exams distributed May 10.