CMSC 828H: Computational Gene Finding and Genome Assembly
Syllabus, Fall 2010
Course meeting time: Tuesdays and Thursdays, 2:00-3:45pm, Room 3118 Biomolecular Sciences Building
Professor: Steven Salzberg, 3125 Biomolecular Sciences Building, salzberg (at) umiacs.umd.edu
Office hours: By appointment.
Textbook: Computational Gene Prediction (CGP) by William H. Majoros (buy it from Amazon)
Supplemental texts, free online at the NCBI Bookshelf:
Molecular Biology of the Cell, by Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, and Peter Walter. Garland Publishing, 2002.
Genomes, by T.A. Brown. BIOS Scientific Publishers, 2002.
Note: additional links to assignments and supplementary material will appear on the syllabus as the semester progresses.
Week 1: Aug 31-Sept 2
Introduction to the course. Molecular biology background. Biotechnology background on sequencing and assembly. Basic pairwise sequence alignment.
Lecture notes for lecture 2 (Thursday)
Reading: Chapter 1, "The Human Genome," in Genomes, by T.A. Brown, free at the NCBI Bookshelf.
Week 2: Sept 7-9
Whole-genome shotgun
sequencing. Genome sequencing technology. Basic assembly: shortest
common superstring, greedy assembly algorithms. Problems caused
by repetitive DNA.
Lecture notes for lecture 3 (Tuesday)
Get Lab 1 here.
Reading: (1) Chapter 6, "Sequencing Genomes," in Genomes, by T.A. Brown, free at the NCBI Bookshelf. (2) Gene Myers' 1999 intro paper on whole-genome sequencing.
Week 3: Sept 14-16
The Celera Assembler algorithm. Error correction with AutoEditor.
Lecture slides for Celera Assembler, AutoEditor, and SNP overview.
Reading: (1) Myers, The Fragment Assembly String Graph, Bioinformatics 21 (2005); (2) The Minimus assembler documentation, http://sourceforge.net/apps/mediawiki/amos/index.php?title=Minimus.
Week 4: Sept 21-23
The Arachne assembler algorithm. Comparative assembly with AMOScmp.
Lab 1 due Friday, Sept 24.
Readings: Myers et al., A Whole-Genome Assembly of Drosophila, Science 287 (2000). S. Batzoglou et al., ARACHNE: A whole-genome shotgun assembler, Genome Research 12:1 (2002), 177-189.
Week 5: Sept 28-30
Trimming with Figaro. Multiplex PCR for closing gaps. Using MUMmer for assembly alignment and comparison.
Get Lab 2 here: lab02.txt and lab02.afg.
Lecture notes on MUMmer.
Readings: A.L. Delcher et al., Alignment of Whole Genomes, Nucleic Acids Research 27:11 (1999), 2369-2376. Tettelin et al., Optimized Multiplex PCR: Efficiently Closing a Whole-Genome Shotgun Sequencing Project, Genomics 62 (1999), 500-507.
Week 6: Oct 5-7
Assembly debugging with Hawkeye. Short read sequencing using 454 and Illumina technology.
Guest lecture by David Kelley on Oct. 7: error correction with Quake.
Week 7: Oct 12-14
No class Oct. 12. Student presentations Oct 14.
Lab 2 due Friday, Oct 15.
Week 8: Oct 19-21
Short-read assembly with de Bruijn graphs.
The Velvet assembler. Introduction to computational gene
finding topics.
Lecture notes on de Bruijn assembly (most slides courtesy of Mike Schatz)
Get Lab 3 here.
Readings:
Pevzner PA, Tang H, Waterman MS. An Eulerian path approach to DNA fragment assembly. Proc. Natl. Acad. Sci. USA 2001 Aug 14; 98(17):9748-53.
Zerbino, D. and E. Birney. Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008, 18: 821-829.
Chapters 1-2 of CGP, "Introduction" and "Mathematical preliminaries". See the textbook website for additional PowerPoint slides.
Week 9: Oct 26-28
Bacterial gene finding. Markov chains. Case study: the Glimmer gene finder.
Reading: CGP, "Overview of Computational Gene Prediction," Chapter 3. Also: S. Salzberg, A. Delcher, S. Kasif, and O. White. Microbial gene identification using interpolated Markov models, Nucleic Acids Research 26:2 (1998), 544-548.
Week 10: Nov 2-4
Overlapping genes in bacteria. Eukaryotic gene finding: introduction to HMMs and the Forward algorithm.
Lab 3 due Friday, Nov 5.
Reading: CGP, "Signal and Content Sensors," chapter 7.
Lecture notes on HMMs: lecture1 and lecture2
Week 11: Nov 9-11
Student presentations on Nov. 9.
Nov. 11: Guest lecture by Mihaela Pertea.
Reading: CGP, "Toy Exon Finder," chapter 5.
Lecture notes on GHMMs from Mihaela Pertea.
Week 12: Nov 16-18
Get Lab 4 here. This is the mini-project, due on Dec. 9.
Topics: Explanation of Lab 4. Signal recognition: splice sites and exon splicing enhancers. Time permitting: ancient DNA introduction.
Nov. 18: Special lecture in CBG seminar series, 1103 Biosciences Research Building, by M. Thomas P. Gilbert, Centre for Ancient Genetics, University of Copenhagen. Title: "Palaeogenomics - challenges faced, progress made and future prospects."
Reading: CGP, "Hidden Markov Models," chapter 6; and "Generalized HMMs," chapter 8.
Week 13: Nov 23 (Nov 25 is Thanksgiving)
Topic TBA.
Week 14: Nov 30-Dec 2
Gene finding in humans: the EGASP and NGASP competitions. Gene finding with conditional random fields (CRFs).
Reading: (1) the JIGSAW paper. (2) CGP, "Signal and Content Sensors," chapter 7, section 7.3 to end of chapter.
Week 15: Dec 7-9 (last week)
Pair HMMs. The status of the human genome: assembly and annotation.
Lab 4 due Dec 9.
Take-home exams distributed Dec 9, due Dec 15.
GRADING: The first three labs
count for 15% of the grade each, the fourth lab counts for 25%, the
class presentation counts for 5%, and the final exam counts for 25%.