Date
| Topics
| Slides/Lecture
notes | Add'l
resources |
1/29 | Introduction. Molecular biology primer.
| Lecture
notes 1 | |
1/31
| Molecular
biology primer. | Lecture notes 2
|
|
2/5
| Exact matching |
Gusfield chap. 1
| |
2/7
| Boyer-Moore
algorithm | Gusfield chap.
2 | |
2/12
| Proof that
Boyer-Moore runs in linear time | Gusfield chap. 3
| |
2/14
| Proof
that Boyer-Moore runs in linear time
KMP algorithm | Gusfield chap.
3
Gusfield chap. 2 | |
2/19
| KMP algorithm
Aho-Corasick algorithm
Proof Aho-Corasick construction is linear time | Gusfield
chap. 2
Gusfield chap. 3 | |
2/21
| Proof
Aho-Corasick construction is linear time
Aho-Corasick if patterns can be substrings of another patterns
Matching with don't care symbols
Suffix trees - introduction
Using suffix trees for exact matching | Gusfield chap.
3
Gusfield chap. 6 | |
2/26
| Suffix trees -
Ukkonen's construction | Gusfield chap. 6 | |
2/28
| Suffix
trees - Ukkonen's construction cont'd. | Gusfield chap.
6 | Giegerich
and Kurtz, Algorithmica, 19:331 (1997) |
3/4
| Suffix trees -
implementation details
Suffix trees - applications | Gusfield chap. 6
Gusfield chap. 7 Project
1 handed out
(no homework) | Note: Project 1 assumes
all sequences are "forward" (no need to reverse complement) Text.fa
Pattern.fa
|
3/6
| Suffix
arrays
Data compression - Lempel-Ziv, Burrows-Wheeler | Gusfield chap.
7 |
Manber, U. and Myers, G. SODA, 1990. Abouelhoda,
Kurtz, Ohlebusch . WABI 2002 Wikipedia Lempel-Ziv
entry Wikipedia
Burrows-Wheeler entry |
13/11 | Inexact matching (edit distance,
global alignment) | Gusfield chap. 11
| |
3/13
| Inexact
matching (sequence similarity, affine gap scores) | Gusfield chap.
11 | |
3/17 | Spring Break | | |
3/20 | Spring
Break | | |
3/25 | Inexact matching
(heuristics: linear space, bounded error, chaining algorithm) |
Gusfield chap. 12
Gusfield chap. 13 | |
3/27 | Inexact
matching (heuristics: MUMmer, FASTA, BLAST, minimizers, significance
scores), p-values/e-values) | |
Delcher et.
al. NAR, 1999 (MUMmer) Delcher et
al. NAR, 2002 (MUMmer)
Lipman, Pearson. Science, 1985 (FASTA) Pearson,
Lipman. PNAS, 1988 (FASTA)
Altshul et al. J. Mol. Bio, 1990 (BLAST) Roberts
et al. Bioinformatics, 2004 (Minimizers)
|
4/1 | Inexact
matching (significance scores, p-values/e-values)
Multiple sequence alignment | Durbin, 2.1, 2.7, 2.8
Gusfield chap. 14
Gusfield chap. 15 | Korf,
Yandell, Bedell. BLAST, O'Reilly, 2003. chap. 4
|
4/3 | Midterm exam | | |
4/8 |
Multiple sequence alignment
| Gusfield chap. 14 | |
4/10 | Multiple
sequence alignment | Gusfield chap.
14 Final
project handed out | |
4/15
| Multiple sequence
alignment | Project 1 due Gusfield chap. 14
| |
4/17 | Multiple
sequence alignment - practical considerations (Muscle, ClustalW, ...)
Phylogenetic trees (intro) | Durbin, chap. 7
| Anson,
Myers. J. Comp. Bio. 1997 Gribskov
et al. PNAS 1987 Thompson
et al. NAR, 1994 Edgar,
NAR 2004 Edgar,
BMC Bioinformatics 2004 Lee
et al. Bioinformatics 2002 Lee,
Bioinformatics 2003 Grasso
et al. Bioinformatics 2003 |
4/22 |
Project proposal defense |
Passover | |
4/24 | Phylogenetic
trees (distance based methods) | Passover Durbin, chap. 7 | |
4/29
| Phylogenetic trees (maximum likelihood) | Durbin, chap. 8 | |
5/1 | RNA
folding | Durbin, chap. 10
| Zuker,
Science 1989
|
5/6 | Protein
folding (on a lattice) | Lecture notes 25 | Hart,
Istrail, STOC 1995 Berger,
Leighton, RECOMB 1998 |
5/8 | Protein folding (threading) | Lecture
notes 26
| Bowie,
Luthy, Eisenberg, Science 1991 Lathrop,
Protein Eng., 1994 Lathrop,
Smith, 1996 |
5/13 |
Recap Q&A |
Final project due | |
5/19 | Final exam (10:30-12:30) | GOOD LUCK | |