Date	Topics	Slides/Lecture notes	Add'l resources
1/26	Introduction	Lecture 1
1/28	Exact matching preprocessing with Z algorithm KMP algorithm	Gusfield scan (TOC, chaps 1,2, 3.3) Also in CLR book, section 32.4 Lecture notes (from Chris)
2/2	KMP continued Boyer Moore algorithm	Lecture notes from Mark	Exact matching code from Geet
2/4	Multiple patterns (Aho Corasick) Dealing with don't care symbols Suffix trees intro	Lecture notes from Geet
2/9	SNOW DAY
2/11	SNOW DAY
2/16	Suffix trees Ukkonen's algorithm Homework 1	Lecture notes from Koyel	E. Ukkonen. (1995). On-line construction of suffix trees. Algorithmica 14(3):249-260. Gusfield chap 5 Gusfield chap. 6
2/18	Suffix arrays Burrows-Wheeler Transform Compressed suffix arrays	Lecture notes from Hyongtae	Manber, Myers (1990). Suffix arrays: a new method for on-line string searches. SODA http://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transform A block-sorting lossless data compression algorithm. Burrows and Wheeler. DEC TR 124, 1994 Ferragina, Manzini. Opportunistic Data Structures with Applications. FOCS 2000. Karkkainen. Fast BWT in small space by blockwise suffix sorting. Theor. Comp. Sci. 387(3): 249-257 (2007). Karkkainen, Manzini, Puglisi. Permuted longest-common prefix array. CPM 2009. Khan, Bloom, Kruglyak, Singh. A practical algorithm for finding maximal exact matches in large sequence datasets using sparse suffix arrays. Bioinformatics 25(13):1609-1616 (2009)
2/23	Suffix arrays, BWT, continued.	Notes from Raul	Project suggestions
2/25	BWT, continued	Notes from Derrick
3/2	Inexact alignment Dynamic programming - Smith Waterman, Needleman-Wunsch	Homework 2 Notes from Ted	Smith, Temple F.; and Waterman, Michael S. (1981). "Identification of Common Molecular Subsequences". Journal of Molecular Biology 147: 195–197. doi:10.1016/0022-2836(81)90087-5. Needleman, Saul B.; and Wunsch, Christian D. (1970). "A general method applicable to the search for similarities in the amino acid sequence of two proteins". Journal of Molecular Biology 48 (3): 443–53.
3/4	Dynamic programming...cont	Notes from Zheng
3/9	Dynamic programming...cont Recap	Notes from Emre
3/11	Midterm
3/16	Spring break
3/18	Spring break
3/23	Alignment heuristics Exclusion methods Fasta, Blast Spaced seeds "Inexact seeds" Dealing with long strings:chaining algorithms MUMmer	Note from Wikum	Kun-Mao Chao, William R. Pearson, Webb Miller. Aligning two sequences within a specified diagonal band. Bioinformatics 8(5):481-487 G.M. Landau, U. Vishkin. Introducing efficient parallelism into approximate string matching and a new serial algorithm. STOC 1986.
3/25	Dealing with short strings reversed indices - Maq, SOAP, others backtracking on suffix array – Bowtie inexact alignment on suffix arrays	Notes from Ferhan	Eugene W. Myers. A sublinear algorithm for approximate keyword searching. Algorithmica.12(4-5):345-374 (1994) Bin Ma, John Tromp, Ming Li. PatternHunter: faster and more sensitive homology search. Bioinformatics 18(3):440:445 (2002) Uri Keich, Ming Li, Bin Ma, John Tromp. On spaced seeds for similarity search. Discrete Appl. Math. 138(3):253-263 (2004) Brona Brejova, Daniel G. Brown, Tomas Vinar. Vector seeds: an extension to spaced seeds. Journal of Computer and System Sciences. 70(3):364-380 (2005) Art L. Delcher, Adam M. Phillippy, Jane Carleton, Steven L. Salzberg. Fast Algorithms for Large-scale Genome Alignment and Comparison. Nucleic Acids Research. 30(11):2478-2483.(2002) Mohammad Ghodsi, Mihai Pop. Inexact local alignment search over suffix arrays Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 83-87 (2009) Ben Langmead, Cole Trapnell, Mihai Pop, Steven L. Salzberg. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25. (2009) Heng Li, J. Ruan, Richard Durbin. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Research. 18(11):1851-1858 (2008) Ruiqiang Li, Yingrui Li, Karsten Kristiansen, Jun Wang. SOAP: short oligonucleotide alignment program Bioinformatics,24 (5):713–714 (2008) doi:10.1093/bioinformatics/btn025
3/30	Inexact alignment...cont'd	Notes from Sam
4/1	Multiple sequence alignment exact algorithms approximations - star alignment tree-guided alignment		Thompson, Higgins, Gibson. ClustalW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties, and weight-matrix choice. NAR 22(22):4673-4680. 1994 R.C. Edgar. Muscle: multiple sequence alignment with high accuracy and high throughput. NAR 32(5):1792-1797. 2004. Notredame, Higgins, Herringa. T-coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302:205-217. 2000. Passover 3/30-4/6 Good Friday 4/2
4/6	Motif finding Whole-genome multiple alignment Mauve RNA structure Structural alignments hardness results Profile HMMs		Darling, Mau, Blattner, Perna. Mauve: Multiple alignment of conserved genomic sequence with rearrangements. Genome Res.14:1394-1403. 2004. Lawrence, Altshul, Boguski, Liu, Neuwald, Wootton. Detecting subtle sequence signals: A Gibbs sampling strategy for multiple alignments. Science 262(5131): 208-214. 1993. Bailey, Noble. Searching for statistically significant regulatory modules. Bioinformatics. 19(s2):ii16 2003 Jiang, Lin, Ma, Zheng. A general edit distance between RNA structures. J. Comp. Biol. 9(2):371-388. 2002. Eddy. Profile Hidden Markov Models. Bioinformatics. 14(9):755-763. 1998. Nawrocki, Kolbe, Eddy. Infernal 1.0: inference of RNA alignments Bioinformatics 25(10):1335-1337
4/8	Profile alignments HMM Alignments RNA Structure	Multiple alignment notes	Altschul, Madden, Schaffer, Zhang, Zhang, Miller, Lipman. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. NAR 25(17):3389-3402. 1997. Eddy. Profile Hidden Markov Models. Bioinformatics. 14(9):755-763. 1998. Nawrocki, Kolbe, Eddy. Infernal 1.0: inference of RNA alignments Bioinformatics 25(10):1335-1337
4/13	Structure-sequence alignment Sequence clustering – basics Distance-based methods Neighbor Joining Picking the right cut - VI-cut	Homework 3 RNA Structure and alignment notes	Jiang, Lin, Ma, Zheng. A general edit distance between RNA structures. J. Comp. Biol. 9(2):371-388. 2002. Navlakha, White, Nagarajan, Pop, Kingsford. Finding Biologically Accurate Clusterings in Hierarchical Tree Decompositions using the Variation of Information. RECOMB 2009.
4/15	phylogenetics: parsimony/maximum likelihood The need for speed: Cd-hit, DNAClust, Uclust Graph theory in sequence analysis introduction to sequence assembly overlap-layout consensus paradigm Eulerian paradigm	Sequence clustering notes	Saitou, Nei. The neighbor-joining method: a new method for reconstructing phylogenetic trees.Mol. Biol. Evol. 4(4):406-425. 1987. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences", Weizhong Li & Adam Godzik Bioinformatics, (2006) 22:1658-9. Open source PDF Pubmed
4/20	Phymm/PhymmBL Dr. Pop out of town		Brady, Salzberg. Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models. Nature Methods. 6:673-676 (2009)
4/22	Parallel algorithms BlastReduce Crossbow others Dr. Pop out of town		Schatz, M.C., CloudBurst: highly sensitive read mapping with MapReduce. Bioinformatics, 2009. 25(11): p. 1363-9. Langmead, B., et al., Searching for SNPs with cloud computing. Genome Biol, 2009. 10(11): p. R134.
4/27	Sequence assembly counting Eulerian tours NP-hardness results	Genome assembly notes Counting spanning trees (Thanks Sam!!)	Myers. Towards simplifying and accurately formulating fragment assembly. J.Comp. Biol. 2(2):275. 1995. Pevzner, Tang, Waterman. An Eulerian path approach to DNA fragment assembly. PNAS 98(17):9748. 2001 Myers. The genome assembly string graph. Bioinformatics 21. 2005. Medvedev, Georgiou, Myers, Brudno. Computability of models for sequence assembly. WABI 2007. Nagarajan, Pop. Parametric complexity of sequence assembly. J.Comp.Biol. 16(7):897. 2009
4/29	Graph simplifications How useful are mate-pairs?		Kingsford, Schatz, Pop. Assembly complexity of prokaryotic genomes using short reads. BMC Bioinformatics. 11:21. 2010.
5/4	Genome assembly validation Detecting breakpoints and rearrangements	Assembly validation notes	Phillippy, Schatz, Pop. Assembly forensics. Genome Biology 9:R55. 2008. Choi, Kim, Tang, Andrews, Gilbert, Colbourne. A Machine Learning Approach to Combined Evidence Validation of Genome Assemblies. Bioinformatics 24(6):744-750. 2008 Hormozidari, Alkan, Eichler, Sahinalp. Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes. Genome Research 19:1270-1278. 2009.
5/6	Optical mapping	Optical maps notes	Nagarajan, Read, Pop. Scaffolding and validation of bacterial genome assemblies using optical restriction maps. Bioinformatics 24(10):1229-1235. 2008. Anantharaman, T., B. Mishra, et al. Genomics via optical mapping III: contiging genomic DNA. ISMB'99.
5/11	TBA	Final project due PROJECT DELIVERABLES
5/19	Final exam (10:30-12:30)	GOOD LUCK