Date
| Topics
| Slides/Lecture
notes | Add'l
resources |
1/29 | Introduction. Molecular biology primer.
| Lecture
notes 1 | |
1/31
| Exact matching | Gusfield chap. 1 Homework 1 Lecture notes
|
|
2/5
| Boyer-Moore algorithm |
Gusfield chap. 2 Lecture notes
| |
2/7
| Boyer-Moore
algorithm - Giancarlo Apostolico variation | Gusfield chap. 3 | |
2/12
| Proof that
Boyer-Moore runs in linear time | Gusfield chap. 3 Lecture notes
| |
2/14
| KMP algorithm Aho-Corasick algorithm Proof Aho-Corasick construction is linear time
| Gusfield chap.
3
Gusfield chap. 2 | |
2/19
| Proof
Aho-Corasick construction is linear time
Aho-Corasick if patterns can be substrings of another patterns
Matching with don't care symbols
Suffix trees - introduction
Using suffix trees for exact matching | Gusfield
chap. 6
Gusfield chap. 3 Lecture notes | |
2/21
| Suffix trees - Ukkonen's construction
| Gusfield chap. 6 Homework 2 | |
2/26
| Suffix trees -
Ukkonen's construction cont'd suffix trees implementation details suffix trees applications | Gusfield chap. 6 Gusfield chap. 7 Project 1 handed out | Giegerich
and Kurtz, Algorithmica, 19:331 (1997) Note: Project 1 assumes
all sequences are "forward" (no need to reverse complement) Text.fa
Pattern.fa |
2/28
| Suffix trees - applications. suffix arrays | Gusfield chap.
6 Gusfield chap. 7 Lecture notes |
Manber, U. and Myers, G. SODA, 1990. Abouelhoda,
Kurtz, Ohlebusch . WABI 2002 |
3/4
| Suffix arrays - cont. Data compression - Lempel Ziv, Burrows-Wheeler |
Gusfield chap. 7
| Wikipedia Lempel-Ziv
entry Wikipedia
Burrows-Wheeler entry
|
3/6
| Inexact matching (edit distance, global alignment) | Gusfield chap. 11 Lecture notes | |
13/11 | Inexact
matching (sequence similarity, affine gap scores) | Gusfield chap. 11
| |
3/13
| Inexact matching
(heuristics: linear space, bounded error, chaining algorithm) | Gusfield chap. 12
Gusfield chap. 13 | |
3/17 | Spring Break | | |
3/20 | Spring
Break | | |
3/25 | Inexact
matching (heuristics: MUMmer, FASTA, BLAST, minimizers, significance
scores), p-values/e-values) |
| Delcher et.
al. NAR, 1999 (MUMmer) Delcher et
al. NAR, 2002 (MUMmer)
Lipman, Pearson. Science, 1985 (FASTA) Pearson,
Lipman. PNAS, 1988 (FASTA)
Altshul et al. J. Mol. Bio, 1990 (BLAST) Roberts
et al. Bioinformatics, 2004 (Minimizers) |
3/27 | Inexact
matching (significance scores, p-values/e-values)
Multiple sequence alignment | Durbin, 2.1, 2.7, 2.8
Gusfield chap. 14
Gusfield chap. 15 |
Korf,
Yandell, Bedell. BLAST, O'Reilly, 2003. chap. 4 Practice exam
|
4/1 | Multiple sequence alignment | Project 1 due Gusfield, chap. 14 |
|
4/3 | Midterm exam | | |
4/8 |
Midterm recap Multiple sequence alignment
| Gusfield chap. 14 | |
4/10 | Multiple
sequence alignment Practical considerations Muscle, ClustalW, reAligner Motif finding | Gusfield chap.
14 Project2 | Anson,
Myers. J. Comp. Bio. 1997 Gribskov
et al. PNAS 1987 Thompson
et al. NAR, 1994 Edgar,
NAR 2004 Edgar,
BMC Bioinformatics 2004 |
4/15
| Multiple sequence
alignment - partial order alignment Phylogenetic trees (distance based methods) | Gusfield chap. 14 Durbin, chap. 7
| Lee
et al. Bioinformatics 2002 Lee,
Bioinformatics 2003 Grasso
et al. Bioinformatics 2003 |
4/17 | Phylogenetic trees (maximum likelihood)
| Durbin, chap. 8
| |
4/22 |
Project proposal defense |
Project 2 proposal due Passover Project presentation schedule Homework 3 Project 1 Test Data | |
4/24 | Project proposal defense | Passover Project presentations | |
4/29
| RNA folding | Durbin, chap. 10 | Zuker,
Science 1989 |
5/1 | Protein folding (in a lattice) |
Lecture 25 | Hart,
Istrail, STOC 1995 Berger,
Leighton, RECOMB 1998
|
5/6 | Protein
folding (threading) | Lecture 26 | Bowie,
Luthy, Eisenberg, Science 1991 Lathrop,
Protein Eng., 1994 Lathrop,
Smith, 1996 |
5/8 | Short-read sequencing: alignment challenges
|
| |
5/13 |
Recap Q&A |
Final project due | |
5/19 | Final exam (10:30-12:30) | GOOD LUCK | |