SYLLABUS
CMSC828N: Computational Gene Finding and Genome Assembly
Tuesdays and Thursdays, 3:30-4:45, Room 3118 CSIC
Professor: Steven Salzberg, 3125 Biomolecular Sciences Building, salzberg (at) umd.edu
Office hours: By appointment.
Textbook: Computational Gene Prediction (CGP) by William H. Majoros (available in class; the reference section and a small erratum are available online)
Supplemental texts, free online at the NCBI Bookshelf:
Molecular Biology of the Cell, by Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, and Peter Walter. Garland Publishing, 2002.
Genomes, by T.A. Brown, BIOS Scientific Publishers, 2002.
Note: the syllabus will change as the semester progresses.
Day 1: Thursday, Jan 26
Introduction to the course. Molecular biology background.
Reading: CGP, chapter 1, "Introduction"; chapter 2, "Mathematical Preliminaries".
Week 1: Jan 31-Feb 2
Biology background for gene finding. Computational gene finding defined. Basic sequence alignment algorithms.
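Smith-Waterman local alignment is one of the standard basic sequence alignment algorithms. The sketch below shows its dynamic-programming recurrence and returns only the best local score, with no traceback. The scoring parameters (match +2, mismatch -1, linear gap -2) are assumed values chosen for illustration.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (linear gap penalty, score only)."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1];
    # the 0 floor lets an alignment start fresh anywhere.
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    # The shared core "ACGT" scores 4 matches x 2 = 8.
    print(smith_waterman("TTACGTTT", "GGACGTGG"))
```

Global alignment (Needleman-Wunsch) uses the same recurrence without the 0 floor and reads its score from the bottom-right cell.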
Reading: CGP, "Overview of Computational Gene Prediction" chapter 3.
Slides from the Jan 31 lecture.
Laboratory 1.
Week 2: Feb 7-9
Bacterial gene finding. Markov chains. Case study: the Glimmer gene finder.
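The sketch below shows how Markov chains score a candidate gene. It trains one fixed-order chain on coding examples and another on noncoding examples, then sums log(P_coding / P_noncoding) over a query sequence. This is a plain fixed-order chain with add-one pseudocounts, which is an assumption made for simplicity. Glimmer itself uses interpolated Markov models, as described in the Salzberg et al. (1998) reading. The training strings are hypothetical toy data.

```python
import math
from collections import defaultdict


def train_markov(seqs, k):
    """Estimate P(base | preceding k bases) with add-one pseudocounts."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in seqs:
        for i in range(k, len(s)):
            counts[s[i - k:i]][s[i]] += 1

    def prob(context, base):
        c = counts[context]
        # Unseen contexts fall back to a uniform 1/4.
        return (c[base] + 1) / (sum(c.values()) + 4)

    return prob


def log_odds(seq, coding, noncoding, k):
    """Sum of log(P_coding / P_noncoding) over every base that has a full k-base context."""
    return sum(
        math.log(coding(seq[i - k:i], seq[i]) / noncoding(seq[i - k:i], seq[i]))
        for i in range(k, len(seq))
    )


if __name__ == "__main__":
    # Hypothetical toy training data, not real coding/noncoding sequence.
    coding = train_markov(["ATGATGATGATG"], k=2)
    noncoding = train_markov(["ACGTACGTACGT"], k=2)
    print(log_odds("ATGATGATG", coding, noncoding, k=2))  # positive: looks "coding"
```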
Reading: CGP, "Signal and Content Sensors," chapter 7, sections 7.1-7.2.
Feb 7 lecture notes and figures.
Week 3: Feb 14-16
HMM algorithms: forward, Viterbi. Signals in bacterial genomes: start sites, transcription terminators, operons.
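The sketch below implements the two HMM algorithms named above for a small discrete HMM. Viterbi finds the single most probable state path and works in log space. Forward sums the probability over all paths. The two-state "AT-rich / GC-rich" model and its parameters are hypothetical, chosen only to illustrate the algorithms; they are not a gene model. Viterbi also assumes all probabilities are nonzero, because math.log(0) raises an error. The forward version uses plain probabilities, so on long sequences it would need scaling or log-sum-exp to avoid underflow.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable state path for obs and its log-probability (all probabilities > 0)."""
    # V[t][s] = best log-probability of any path that ends in state s at position t
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, V[t - 1][r] + math.log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            V[t][s] = score + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):  # follow back-pointers to recover the path
        path.append(back[t][path[-1]])
    return path[::-1], V[-1][last]


def forward(obs, states, start_p, trans_p, emit_p):
    """P(obs), summed over all state paths (plain probabilities; short inputs only)."""
    f = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for x in obs[1:]:
        f = {s: emit_p[s][x] * sum(f[r] * trans_p[r][s] for r in states) for s in states}
    return sum(f.values())


# Hypothetical toy model: an AT-rich state and a GC-rich state.
STATES = ("AT", "GC")
START = {"AT": 0.5, "GC": 0.5}
TRANS = {"AT": {"AT": 0.9, "GC": 0.1}, "GC": {"AT": 0.1, "GC": 0.9}}
EMIT = {
    "AT": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
    "GC": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4},
}

if __name__ == "__main__":
    path, logp = viterbi("AAATTTGGGCCC", STATES, START, TRANS, EMIT)
    print(path)  # six "AT" states followed by six "GC" states
    print(forward("AC", STATES, START, TRANS, EMIT))
```

The two procedures share the same recurrence. Viterbi takes a max over predecessor states where forward takes a sum.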
Reading: S. Salzberg, A. Delcher, S. Kasif, and O. White. Microbial gene identification using interpolated Markov models, Nucleic Acids Research 26:2 (1998), 544-548.
Lab 1 due Feb 16.
Laboratory 2 assignment available.
Week 4: Feb 21-23
Case study: GlimmerHMM. HMM algorithms continued: expectation maximization.
Reading: CGP, "Toy Exon Finder," chapter 5.
Slides from Mihaela Pertea's Feb 21 lecture on GlimmerHMM.
Suggested readings for class presentations available.
Week 5: Feb 28-Mar 2
Other eukaryotic gene finding topics: cDNA and EST sequences, spliced alignment, alternative splicing, and micro-exons.
Reading: (1) CGP, "Hidden Markov Models," chapter 6; (2) Sean Eddy's paper on profile HMMs.
Week 6: Mar 7-9
Biotechnology background on sequencing and assembly. Whole-genome shotgun sequencing. March 9: special seminar at 2:00pm, followed by class discussion (at the usual class time) of the RNA splicing machinery.
Readings: (a) Chapter 6, "Sequencing Genomes," in Genomes, by T.A. Brown, free at the NCBI Bookshelf. (b) Gene Myers' 1999 introductory paper on whole-genome sequencing.
Lab 2 due Mar 9.
Week 7: Mar 14-16
Class presentations on selected readings.
Reading: CGP, "Signal and Content Sensors," chapter 7, section 7.3 to the end of the chapter.
Laboratory 3 assignment available.
Spring Break, Mar 20-24
Week 8: Mar 28-30
Shortest common superstring problem. The greedy assembly algorithm. Hash indexing for overlap computation. Screening repeats. Thursday: the AMOS assembly viewing toolkit, Assembly Investigator.
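The greedy algorithm is a heuristic for the shortest common superstring problem. It repeatedly merges the two reads with the longest suffix-prefix overlap. The sketch below finds candidate overlaps by hashing the first k bases of every read, so each read suffix only needs to be checked against reads that share its leading k-mer. It assumes error-free reads with no read contained in another, and it recomputes all overlaps after every merge. Both are simplifications; a practical assembler would tolerate sequencing errors and avoid the repeated recomputation. The reads in the example are hypothetical.

```python
from collections import defaultdict


def overlaps(reads, k):
    """Exact suffix-prefix overlaps of length >= k, via a hash index of read prefixes."""
    index = defaultdict(list)  # first k bases of each read -> list of read ids
    for j, r in enumerate(reads):
        index[r[:k]].append(j)
    found = []  # (overlap length, i, j): a proper suffix of reads[i] == prefix of reads[j]
    for i, a in enumerate(reads):
        for start in range(1, len(a) - k + 1):
            suffix = a[start:]
            for j in index.get(a[start:start + k], []):
                if j != i and reads[j].startswith(suffix):
                    found.append((len(suffix), i, j))
    return found


def greedy_assemble(reads, k):
    """Greedy SCS heuristic: merge the best-overlapping pair until no overlap >= k remains."""
    reads = list(reads)
    while True:
        candidates = overlaps(reads, k)
        if not candidates:
            return reads  # remaining contigs
        olen, i, j = max(candidates)  # longest overlap; ties broken by read ids
        merged = reads[i] + reads[j][olen:]
        reads = [r for t, r in enumerate(reads) if t not in (i, j)] + [merged]


if __name__ == "__main__":
    print(greedy_assemble(["ACGTTGCA", "TTGCATGG", "CATGGAAC"], k=3))
    # -> ['ACGTTGCATGGAAC']
```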
Reading: (1) Pages 1-10 of Kececioglu and Myers, Combinatorial Algorithms for DNA Sequence Assembly, Algorithmica 13 (1995); (2) The phrap assembler documentation, http://www.phrap.org/phredphrap/phrap.html.
AMOS Validation and Visualization
Lab 3 files in AMOS format
Week 9: Apr 4-6
Base calling and trimming algorithms. Thursday: Using MUMmer for assembly alignment and comparison.
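Quality trimming works from the Phred quality values Q that the base caller assigns to each base. Each Q encodes an error probability p = 10^(-Q/10). The sketch below scores each base as (cutoff - p) and keeps the maximum-scoring contiguous segment, which is a maximum-subarray formulation. It is offered as an illustration in the spirit of the modified Mott trimming rule, not as phred's exact implementation. The cutoff of 0.05 and the quality values in the example are assumptions.

```python
def phred_error_prob(q: int) -> float:
    """Phred quality Q encodes error probability p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def trim_bounds(quals, cutoff=0.05):
    """(start, end) of the best segment, scoring each base as cutoff - p_error; (0, 0) if none."""
    best, best_start, best_end = 0.0, 0, 0
    run, run_start = 0.0, 0
    for i, q in enumerate(quals):
        run += cutoff - phred_error_prob(q)
        if run <= 0:
            # A segment with non-positive total can only hurt; restart after this base.
            run, run_start = 0.0, i + 1
        elif run > best:
            best, best_start, best_end = run, run_start, i + 1
    return best_start, best_end


if __name__ == "__main__":
    quals = [5, 5, 30, 30, 30, 30, 5, 5]  # hypothetical read: low-quality ends
    start, end = trim_bounds(quals)
    print(start, end)  # keeps bases 2..5 (half-open interval [2, 6))
```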
Reading: Myers et al., A Whole-Genome Assembly of Drosophila, Science 287 (2000).
Project proposals due April 6.
Genome alignment and assembly validation slides
Week 10: Apr 11-13
Class presentations on selected readings.
Lab 3 due Apr 13.
Week 11: Apr 18-20
The Celera Assembler algorithm.
April 20: Special lecture by Prof. Peter Bickel, Dept. of Statistics, UC Berkeley: "Using Comparative Genomics to Assess the Function of Noncoding Sequences". Lecture and discussion: 4:15-5:50pm, Physics Bldg., Room 1410. This lecture is part of Statistics Day, http://www.statconsortium.umd.edu.
Reading: Batzoglou et al., ARACHNE: A whole genome shotgun assembler. Genome Research 12 (2002).
Week 12: Apr 25-27
Continued discussion of Celera Assembler and Arachne.
Scaffolding with Bambus.
Art Delcher's assembly lecture slides.
Week 13: May 2-4
Additional assembly topics: comparative assembly, genome finishing and gap closure. The status of the human genome.
Reading: Tettelin et al., Optimized Multiplex PCR: Efficiently Closing a Whole-Genome Shotgun Sequencing Project. Genomics 62 (1999), 500-507.
Week 14: May 9
New sequencing technology: pyrosequencing.
Take-home exams distributed.
Project due: May 9.