Syllabus for CMSC858P: Algorithms for Bioscience Analysis (Spring 2008)


Course information


Lectures

Date
Topics
Slides/Lecture notes
Add'l resources
1/29
Introduction. Molecular biology primer.
Lecture notes 1

1/31
Exact matching
Gusfield chap. 1
Homework 1
Lecture notes

2/5
Boyer-Moore algorithm Gusfield chap. 2
Lecture notes

2/7
Boyer-Moore algorithm - Giancarlo Apostolico variation Gusfield chap. 3

2/12
Proof that Boyer-Moore runs in linear time Gusfield chap. 3
Lecture notes

2/14
KMP algorithm
Aho-Corasick algorithm
Proof Aho-Corasick construction is linear time
Gusfield chap. 3
Gusfield chap. 2

2/19
Proof Aho-Corasick construction is linear time
Aho-Corasick if patterns can be substrings of another patterns
Matching with don't care symbols
Suffix trees - introduction
Using suffix trees for exact matching
Gusfield chap. 6
Gusfield chap. 3
Lecture notes

2/21
Suffix trees - Ukkonen's construction
Gusfield chap. 6
Homework 2

2/26
Suffix trees - Ukkonen's construction cont'd
suffix trees implementation details
suffix trees applications
Gusfield chap. 6
Gusfield chap. 7
Project 1 handed out
Giegerich and Kurtz, Algorithmica, 19:331 (1997)
Note: Project 1 assumes all sequences are "forward" (no need to reverse complement)
Text.fa
Pattern.fa
2/28

Suffix trees - applications.
suffix arrays
Gusfield chap. 6
Gusfield chap. 7
Lecture notes
Manber, U. and Myers, G. SODA, 1990.
Abouelhoda, Kurtz, Ohlebusch .  WABI 2002
3/4
Suffix arrays - cont.
Data compression - Lempel Ziv, Burrows-Wheeler

Gusfield chap. 7


Wikipedia Lempel-Ziv entry
Wikipedia Burrows-Wheeler entry
3/6
Inexact matching (edit distance, global alignment) Gusfield chap. 11
Lecture notes



13/11
Inexact matching (sequence similarity, affine gap scores)  Gusfield chap. 11


3/13
Inexact matching  (heuristics: linear space, bounded error, chaining algorithm) Gusfield chap. 12
Gusfield chap. 13

3/17Spring Break
3/20Spring Break
3/25
Inexact matching (heuristics: MUMmer, FASTA, BLAST, minimizers, significance scores), p-values/e-values)

Delcher et. al. NAR, 1999 (MUMmer)
Delcher et al. NAR, 2002 (MUMmer)
Lipman, Pearson.  Science, 1985 (FASTA)
Pearson, Lipman.  PNAS, 1988 (FASTA)
Altshul et al.  J. Mol. Bio, 1990 (BLAST)
Roberts et al.  Bioinformatics, 2004 (Minimizers)
3/27
Inexact matching (significance scores, p-values/e-values)
Multiple sequence alignment
Durbin, 2.1, 2.7, 2.8
Gusfield chap. 14
Gusfield chap. 15
Korf, Yandell, Bedell.  BLAST, O'Reilly, 2003.  chap. 4
Practice exam

4/1
Multiple sequence alignment  Project 1 due
Gusfield, chap. 14

4/3
Midterm exam

4/8
Midterm recap
Multiple sequence alignment
Gusfield chap. 14
4/10
Multiple sequence alignment 
Practical considerations
Muscle, ClustalW, reAligner
Motif finding
Gusfield chap. 14
Project2
Anson, Myers.  J. Comp. Bio. 1997
Gribskov et al.  PNAS 1987
Thompson et al.  NAR, 1994
Edgar, NAR 2004
Edgar, BMC Bioinformatics 2004
4/15
Multiple sequence alignment - partial order alignment
Phylogenetic trees (distance based methods)

Gusfield chap. 14
Durbin, chap. 7
Lee et al. Bioinformatics 2002
Lee, Bioinformatics 2003
Grasso et al. Bioinformatics 2003
4/17
Phylogenetic trees (maximum likelihood)
Durbin, chap. 8

4/22
Project proposal defense Project 2  proposal due
Passover
Project presentation schedule
Homework 3
Project 1 Test Data

4/24
Project proposal defense Passover
Project presentations

4/29
RNA folding
Durbin, chap. 10 Zuker, Science 1989
5/1
Protein folding (in a lattice)  Lecture 25 Hart, Istrail, STOC 1995
Berger, Leighton,  RECOMB 1998
5/6
Protein folding (threading) Lecture 26 Bowie, Luthy, Eisenberg, Science 1991
Lathrop, Protein Eng., 1994
Lathrop, Smith, 1996
5/8
Short-read sequencing: alignment challenges

5/13
Recap
Q&A
Final project due
5/19
Final exam (10:30-12:30) GOOD LUCK