Syllabus for CMSC858P: Algorithms for Bioscience Analysis (Spring 2008)


Course information


Lectures

Date
Topics
Slides/Lecture notes
Add'l resources
1/29
Introduction. Molecular biology primer.
Lecture notes 1

1/31
Molecular biology primer.
Lecture notes 2

2/5
Exact matching Gusfield chap. 1

2/7
Boyer-Moore algorithm Gusfield chap. 2

2/12
Proof that Boyer-Moore runs in linear time Gusfield chap. 3

2/14
Proof that Boyer-Moore runs in linear time
KMP algorithm 
Gusfield chap. 3
Gusfield chap. 2

2/19
KMP algorithm
Aho-Corasick algorithm
Proof Aho-Corasick construction is linear time
Gusfield chap. 2
Gusfield chap. 3

2/21
Proof Aho-Corasick construction is linear time
Aho-Corasick if patterns can be substrings of another patterns
Matching with don't care symbols
Suffix trees - introduction
Using suffix trees for exact matching
Gusfield chap. 3
Gusfield chap. 6

2/26
Suffix trees - Ukkonen's construction Gusfield chap. 6

2/28
Suffix trees - Ukkonen's construction cont'd.  Gusfield chap. 6
Giegerich and Kurtz, Algorithmica, 19:331 (1997)
3/4
Suffix trees - implementation details
Suffix trees - applications
Gusfield chap. 6
Gusfield chap. 7
Project 1 handed out
(no homework)
Note: Project 1 assumes all sequences are "forward" (no need to reverse complement)
Text.fa
Pattern.fa
3/6
Suffix arrays
Data compression - Lempel-Ziv, Burrows-Wheeler
Gusfield chap. 7

Manber, U. and Myers, G. SODA, 1990.
Abouelhoda, Kurtz, Ohlebusch .  WABI 2002
Wikipedia Lempel-Ziv entry
Wikipedia Burrows-Wheeler entry
13/11
Inexact matching (edit distance, global alignment)  Gusfield chap. 11


3/13
Inexact matching (sequence similarity, affine gap scores)  Gusfield chap. 11

3/17Spring Break
3/20Spring Break
3/25
Inexact matching  (heuristics: linear space, bounded error, chaining algorithm) Gusfield chap. 12
Gusfield chap. 13

3/27
Inexact matching (heuristics: MUMmer, FASTA, BLAST, minimizers, significance scores), p-values/e-values)
Delcher et. al. NAR, 1999 (MUMmer)
Delcher et al. NAR, 2002 (MUMmer)
Lipman, Pearson.  Science, 1985 (FASTA)
Pearson, Lipman.  PNAS, 1988 (FASTA)
Altshul et al.  J. Mol. Bio, 1990 (BLAST)
Roberts et al.  Bioinformatics, 2004 (Minimizers)

4/1
Inexact matching (significance scores, p-values/e-values)
Multiple sequence alignment
Durbin, 2.1, 2.7, 2.8
Gusfield chap. 14
Gusfield chap. 15

Korf, Yandell, Bedell.  BLAST, O'Reilly, 2003.  chap. 4
4/3
Midterm exam

4/8
Multiple sequence alignment 
Gusfield chap. 14

4/10
Multiple sequence alignment  Gusfield chap. 14
Final project handed out

4/15
Multiple sequence alignment Project 1 due
Gusfield chap. 14

4/17
Multiple sequence alignment - practical considerations (Muscle, ClustalW, ...)
Phylogenetic trees (intro)
Durbin, chap. 7
Anson, Myers.  J. Comp. Bio. 1997
Gribskov et al.  PNAS 1987
Thompson et al.  NAR, 1994
Edgar, NAR 2004
Edgar, BMC Bioinformatics 2004
Lee et al. Bioinformatics 2002
Lee, Bioinformatics 2003
Grasso et al. Bioinformatics 2003
4/22
Project proposal defense Passover

4/24
Phylogenetic trees (distance based methods) Passover
Durbin, chap. 7

4/29
Phylogenetic trees (maximum likelihood)
Durbin, chap. 8
5/1
RNA folding Durbin, chap. 10 Zuker, Science 1989
5/6
Protein folding (on a lattice) Lecture notes 25 Hart, Istrail, STOC 1995
Berger, Leighton,  RECOMB 1998
5/8
Protein folding (threading) Lecture notes 26 Bowie, Luthy, Eisenberg, Science 1991
Lathrop, Protein Eng., 1994
Lathrop, Smith, 1996
5/13
Recap
Q&A
Final project due
5/19
Final exam (10:30-12:30) GOOD LUCK