Syllabus for CMSC858E: Algorithms for Bioscience Analysis (Fall 2006)

Course information

Lectures


Date
Topics
Slides/Lecture notes
Add'l resources
8 / 31
Introduction. Molecular biology primer.
Homework 1
Lecture notes 1

9 / 5
Molecular biology primer.
Lecture notes 2

9 / 7
Exact matching Gusfield chap. 1
Homework 2

9 / 12
Boyer-Moore algorithm Gusfield chap. 2

9 / 14
Proof that Boyer-Moore runs in linear time Gusfield chap. 3
Homework 3

9 / 19
Proof that Boyer-Moore runs in linear time
KMP algorithm 
Gusfield chap. 3
Gusfield chap. 2

9 / 21
KMP algorithm
Aho-Corasick algorithm
Proof Aho-Corasick construction is linear time
Gusfield chap. 2
Gusfield chap. 3
Homework 4

9 / 26
Proof Aho-Corasick construction is linear time
Aho-Corasick if patterns can be substrings of another patterns
Matching with don't care symbols
Suffix trees - introduction
Using suffix trees for exact matching
Gusfield chap. 3
Gusfield chap. 6

9 / 28
Suffix trees - Ukkonen's construction Gusfield chap. 6
Homework 5

10 / 3
Suffix trees - Ukkonen's construction cont'd.  Gusfield chap. 6
Giegerich and Kurtz, Algorithmica, 19:331 (1997)
10 / 5
Suffix trees - implementation details
Suffix trees - applications
Gusfield chap. 6
Gusfield chap. 7
Project 1 handed out
(no homework)
Note: Project 1 assumes all sequences are "forward" (no need to reverse complement)
Text.fa
Pattern.fa
10 / 10
Suffix arrays
Data compression - Lempel-Ziv, Burrows-Wheeler
Gusfield chap. 7

Manber, U. and Myers, G. SODA, 1990.
Abouelhoda, Kurtz, Ohlebusch .  WABI 2002
Wikipedia Lempel-Ziv entry
Wikipedia Burrows-Wheeler entry
10 / 12
Inexact matching (edit distance, global alignment)  Gusfield chap. 11
Homework 6

10 / 17
Inexact matching (sequence similarity, affine gap scores)  Gusfield chap. 11

10 / 19
Inexact matching  (heuristics: linear space, bounded error, chaining algorithm) Gusfield chap. 12
Gusfield chap. 13
no homework

10 / 24
Inexact matching (heuristics: MUMmer, FASTA, BLAST, minimizers, significance scores), p-values/e-values)
Delcher et. al. NAR, 1999 (MUMmer)
Delcher et al. NAR, 2002 (MUMmer)
Lipman, Pearson.  Science, 1985 (FASTA)
Pearson, Lipman.  PNAS, 1988 (FASTA)
Altshul et al.  J. Mol. Bio, 1990 (BLAST)
Roberts et al.  Bioinformatics, 2004 (Minimizers)

10 / 26
Inexact matching (significance scores, p-values/e-values)
Multiple sequence alignment
Durbin, 2.1, 2.7, 2.8
Gusfield chap. 14
Gusfield chap. 15
Homework 7

Korf, Yandell, Bedell.  BLAST, O'Reilly, 2003.  chap. 4
10 / 31
Multiple sequence alignment Gusfield chap. 14
Project 1 due

11 / 2
Midterm exam


11 / 7
Multiple sequence alignment  Gusfield chap. 14
Final project handed out

11 / 9
Multiple sequence alignment Gusfield chap. 14
Homework 8

11 / 14
Multiple sequence alignment - practical considerations (Muscle, ClustalW, ...)
Phylogenetic trees (intro)
Durbin, chap. 7
Anson, Myers.  J. Comp. Bio. 1997
Gribskov et al.  PNAS 1987
Thompson et al.  NAR, 1994
Edgar, NAR 2004
Edgar, BMC Bioinformatics 2004
Lee et al. Bioinformatics 2002
Lee, Bioinformatics 2003
Grasso et al. Bioinformatics 2003
11 / 16
Project proposal defense Project proposals due
Homework 9

11 / 21
Phylogenetic trees (distance based methods) Durbin, chap. 7


11 / 23
Thanksgiving (no class)

Additional reading
11 / 28
Phylogenetic trees (maximum likelihood) Durbin, chap. 8

11 / 30
RNA folding Durbin, chap. 10
Zuker, Science 1989
12 / 5
Protein folding (on a lattice) Lecture notes 25
Hart, Istrail, STOC 1995
Berger, Leighton,  RECOMB 1998
12 / 7
Protein folding (threading) Lecture notes 26
Bowie, Luthy, Eisenberg, Science 1991
Lathrop, Protein Eng., 1994
Lathrop, Smith, 1996
12 / 12
Protein folding, recap, Q&A Lecture notes 27
Final project due

12/14
Final exam (8-10am)
GOOD LUCK!