SYLLABUS
CMSC828N:
Computational
Gene Finding and Genome Assembly
Tuesdays and Thursdays,
12:30-1:45pm,
Room 3118 Biomolecular Sciences Building
Professor: Steven Salzberg, 3125 Biomolecular Sciences Building,
salzberg (at) umiacs.umd.edu
Office hours: By appointment.
Textbook: Computational Gene
Prediction (CGP) by William H. Majoros
Supplemental texts, free online at the NCBI
Bookshelf (click title to view):
Molecular
Biology of the Cell, by Bruce
Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts,
and Peter Walter. Garland Publishing, 2002.
Genomes,
by T.A. Brown, BIOS Scientific Publishers, 2002.
Note: additional links to lecture notes and assignments will appear on
the syllabus as the semester progresses.
Week 1: Sept 2-4
Introduction to the course. Molecular biology
background. Biotechnology
background on
sequencing, assembly.
Reading: Chapter
1, The Human Genome, in Genomes,
by T.A. Brown, free at the NCBI Bookshelf.
Lecture 1
slides here.
Lecture 2 slides
here.
Week 2: Sept 9-11
Whole-genome shotgun
sequencing. Pairwise sequence
alignment. Basic assembly: shortest
common superstring, greedy assembly algorithms. Repeat-induced
mis-assemblies.
Reading: (a) Chapter
6, "Sequencing Genomes," in Genomes,
by T.A. Brown, free at the NCBI Bookshelf. (b) Gene Myers' 1999
intro paper on whole-genome sequencing.
Lecture
3 slides here.
Lab 1 instructions here.
Lab 1 sample data
here.
Lab
1 input data here.
Week 3: Sept 16-18
The Celera Assembler algorithm.
Genome sequencing technology. Error
correction with AutoEditor.
Reading: (1) Myers, The Fragment
Assembly String Graph, Bioinformatics
21 (2005); (2) The phrap assembler documentation, http://www.phrap.org/phredphrap/phrap.html.
Celera
assembler slides
Sequencing technology slides, part 1
and part 2
AutoEditor slides
Week 4: Sept 23-25
The Arachne assembler algorithm.
Comparative assembly with AMOScmp.
Lab 1 due Sept
25.
Lab 2 assignment is here,
and the data file is here.
Readings: Myers et
al., A
Whole-Genome Assembly of Drosophila, Science 287 (2000).
S.
Batzoglou et al., ARACHNE: A
whole-genome shotgun assembler, Genome Research 12:1
(2002), 177-189.
Arachne lecture notes
AMOScmp slides
AFG file format slides
Week 5: Sept 30-Oct 2
Trimming with Figaro. Multiplex PCR for
closing gaps. Oct
2: guest lecture by Adam Phillippy: using
MUMmer
for assembly alignment and comparison.
Readings:
A.L. Delcher et al., Alignment of
Whole Genomes, Nucleic Acids Research,
27:11 (1999), 2369-2376. Note that Figure
6 is
supposed to be in color, and was mistakenly printed as black and
white.
Readings for class presentations: choose from
this list or use Wentian
Li's bibliography
page for more choices.
Genome
closure slides
James White's
Figaro slides.
Adam's
whole-genome alignment slides (here
they are in older PowerPoint format)
Week 6: Oct 7-9
Oct 7:
guest lecture by Mike Schatz: Assembly
debugging with Hawkeye. Short
read sequencing using 454 and Solexa technology.
Reading: Tettelin et al., Optimized
Multiplex PCR: Efficiently Closing a Whole-Genome Shotgun Sequencing
Project. Genomics
62 (1999), 500-507.
Mike
Schatz's assembly validation slides
Next-gen
sequencing technology slides
Week 7: Oct 14-16
Class presentations on
selected readings.
Lab 2 due Friday, Oct 17.
Week 8: Oct 21-23
Genome assembly with short reads (conclusion).
Introduction to computational gene
finding topics.
Lab 3 available here.
Reading: Chapters 1-2 of CGP,
"Introduction" and "Mathematical
preliminaries". See the textbook
website
for slides from Oct 21.
Week 9: Oct 28-30
Bacterial gene finding. Markov
chains. Case study: the Glimmer
gene
finder. Oct 30: Guest lecture by Mihaela Pertea on splice site
identification in eukaryotic genes.
Reading: CGP,
"Overview of Computational Gene Prediction," Chapter 3. Also:
S. Salzberg, A. Delcher, S. Kasif, and O. White. Microbial gene
identification using interpolated Markov models, Nucleic Acids
Research 26:2 (1998), 544-548.
Bacterial gene
finding slides, including Glimmer
Mihaela
Pertea's slides on signal prediction
Week 10: Nov 4-6
Overlapping genes in bacteria. Eukaryotic gene
finding: introduction to HMMs and the
Forward algorithm.
Reading: CGP, "Signal and Content Sensors"
chapter 7.
Slides on HMMs,
part 1
Week 11: Nov 11-13
Nov 6: Class presentations on selected readings.
Lab 3 due Friday, Nov 14.
Reading: CGP, "Toy Exon
Finder"
chapter 5.
Week 12: Nov 18-20
HMM algorithms: forward,
Viterbi, forward-backward. Design of HMMs and the ToyScan
algorithm.
Slides on HMMs,
part 2
Slides on
the HOMER HMM gene finder (from B. Majoros)
Details on
Lab 4 (ToyScan)
Get Lab 4 here (due Dec 11).
Reading: CGP, "Hidden Markov Models" chapter 6.
Week 13: Nov 25 (Nov 27 is Thanksgiving)
Class presentations (2). Sequencing ancient DNA: the
mammoth genome.
Lecture notes on
exon splicing enhancers
Lecture notes on the Combiner
algorithm
Reading: CGP, "Generalized HMMs" chapter 8.
Week 14: Dec 2-4
Case study: GlimmerHMM.
Generalized HMM algorithms. Gene finding in humans: the EGASP and NGASP
competitions. Combining multiple gene
finders with
JIGSAW.
Lecture on GHMMs
Reading: (1)
the
JIGSAW paper.
(2) CGP,
"Signal and Content
Sensors", chapter 7, section 7.3 to end of chapter.
Week 15: Dec 9-11 (last week)
Pair HMMs. The status of
the human genome: assembly and annotation.
Lab 4 due Dec 11.
Take-home exams
distributed Dec 11, due Dec 18.
GRADING: The first three labs
count for 15% of the grade each, the fourth lab counts for 25%, the
class presentation counts for 5%, and the final exam counts for 25%.