Date
|
Topics
|
Slides/Lecture notes
|
Add'l resources
|
1/26
|
Introduction
|
Lecture 1
|
|
1/28
|
Exact matching preprocessing with Z algorithm KMP
algorithm
|
Gusfield
scan (TOC, chaps 1,2, 3.3)
Also in CLR book, section 32.4
Lecture
notes (from Chris)
|
|
2/2
|
KMP continued
Boyer Moore algorithm
|
Lecture
notes from Mark
|
Exact matching code
from Geet
|
2/4
|
Multiple patterns (Aho Corasick) Dealing with don't care
symbols
Suffix trees intro
|
Lecture notes from
Geet
|
|
2/9
|
SNOW DAY
|
|
|
2/11
|
SNOW DAY
|
|
|
2/16
|
Suffix trees Ukkonen's algorithm
Homework 1
|
Lecture notes from
Koyel
|
E.
Ukkonen. (1995). On-line construction of suffix trees.
Algorithmica 14(3):249-260.
Gusfield
chap 5
Gusfield
chap. 6
|
2/18
|
Suffix arrays Burrows-Wheeler Transform Compressed suffix
arrays
|
Lecture notes from
Hyongtae
|
Manber,
Myers (1990). Suffix arrays: a new method for on-line string
searches. SODA
http://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transform
A
block-sorting lossless data compression algorithm. Burrows and
Wheeler. DEC TR 124, 1994
Ferragina,
Manzini. Opportunistic Data Structures with Applications. FOCS
2000.
Karkkainen.
Fast BWT in small space by blockwise suffix sorting. Theor. Comp.
Sci. 387(3): 249-257 (2007).
Karkkainen,
Manzini, Puglisi. Permuted longest-common prefix array. CPM 2009.
Khan,
Bloom, Kruglyak, Singh. A practical algorithm for finding maximal
exact matches in large sequence datasets using sparse suffix
arrays. Bioinformatics 25(13):1609-1616 (2009)
|
2/23
|
Suffix arrays, BWT, continued.
|
Notes from Raul
|
Project
suggestions
|
2/25
|
BWT, continued
|
Notes from
Derrick
|
|
3/2
|
Inexact alignment Dynamic programming - Smith Waterman,
Needleman-Wunsch
|
Homework 2
Notes from
Ted
|
Smith, Temple F.; and Waterman, Michael S. (1981).
"Identification
of Common Molecular Subsequences". Journal
of Molecular Biology 147: 195–197.
doi:10.1016/0022-2836(81)90087-5.
Needleman, Saul B.; and Wunsch, Christian D. (1970). "A
general method applicable to the search for similarities in the
amino acid sequence of two proteins". Journal of
Molecular Biology 48 (3): 443–53.
|
3/4
|
Dynamic programming...cont
|
Notes from
Zheng
|
|
3/9
|
Dynamic programming...cont
Recap
|
Notes from Emre
|
|
3/11
|
Midterm
|
|
|
3/16
|
Spring break
|
|
|
3/18
|
Spring break
|
|
|
3/23
|
Alignment heuristics Exclusion methods Fasta,
Blast Spaced seeds "Inexact seeds"
Dealing with long strings:chaining
algorithms MUMmer
|
Note
from Wikum
|
Kun-Mao Chao, William R. Pearson, Webb Miller. Aligning
two sequences within a specified diagonal band. Bioinformatics
8(5):481-487
G.M. Landau, U. Vishkin. Introducing
efficient parallelism into approximate string matching and a new
serial algorithm. STOC 1986.
|
3/25
|
Dealing with short strings reversed
indices - Maq, SOAP, others backtracking on suffix array –
Bowtie inexact alignment on suffix arrays
|
Notes from
Ferhan
|
Eugene W. Myers. A
sublinear algorithm for approximate keyword searching.
Algorithmica.12(4-5):345-374 (1994)
Bin Ma, John Tromp, Ming Li. PatternHunter:
faster and more sensitive homology search. Bioinformatics
18(3):440:445 (2002)
Uri Keich, Ming Li, Bin Ma, John Tromp. On
spaced seeds for similarity search. Discrete Appl. Math.
138(3):253-263 (2004)
Brona Brejova, Daniel G. Brown, Tomas Vinar. Vector
seeds: an extension to spaced seeds. Journal of Computer and
System Sciences. 70(3):364-380 (2005)
Art L. Delcher, Adam M. Phillippy, Jane Carleton, Steven L.
Salzberg. Fast
Algorithms for Large-scale Genome Alignment and Comparison.
Nucleic Acids Research. 30(11):2478-2483.(2002)
Mohammad Ghodsi, Mihai Pop. Inexact
local alignment search over suffix arrays
Proceedings of the IEEE International Conference on Bioinformatics
and Biomedicine (BIBM), 83-87 (2009)
Ben Langmead, Cole Trapnell, Mihai Pop, Steven L. Salzberg.
Ultrafast and
memory-efficient alignment of short DNA sequences to the human
genome. Genome Biol
10:R25. (2009)
Heng Li, J. Ruan, Richard Durbin. Mapping
short DNA sequencing reads and calling variants using mapping
quality scores. Genome Research. 18(11):1851-1858 (2008)
Ruiqiang Li, Yingrui Li, Karsten Kristiansen, Jun Wang. SOAP:
short oligonucleotide alignment program Bioinformatics,24
(5):713–714 (2008) doi:10.1093/bioinformatics/btn025
|
3/30
|
Inexact alignment...cont'd
|
Notes from Sam
|
|
4/1
|
Multiple sequence alignment exact
algorithms approximations - star alignment tree-guided
alignment
|
|
Thompson, Higgins, Gibson. ClustalW:
improving the sensitivity of progressive multiple sequence
alignment through sequence weighting, position-specific gap
penalties, and weight-matrix choice. NAR 22(22):4673-4680.
1994
R.C. Edgar. Muscle:
multiple sequence alignment with high accuracy and high
throughput. NAR 32(5):1792-1797. 2004.
Notredame, Higgins, Herringa.
T-coffee:
a novel method for fast and accurate multiple sequence alignment.
J. Mol. Biol. 302:205-217. 2000.
Passover 3/30-4/6
Good Friday 4/2
|
4/6
|
Motif finding
Whole-genome multiple alignment
Mauve
RNA structure Structural alignments hardness
results Profile HMMs
|
|
Darling, Mau, Blattner,
Perna. Mauve: Multiple
alignment of conserved genomic sequence with rearrangements.
Genome Res.14:1394-1403. 2004.
Lawrence, Altshul, Boguski, Liu, Neuwald, Wootton. Detecting
subtle sequence signals: A Gibbs sampling strategy for multiple
alignments. Science 262(5131): 208-214. 1993.
Bailey, Noble. Searching
for statistically significant regulatory modules.
Bioinformatics. 19(s2):ii16 2003
Jiang, Lin, Ma, Zheng. A
general edit distance between RNA structures. J. Comp. Biol.
9(2):371-388. 2002.
Eddy. Profile
Hidden Markov Models. Bioinformatics. 14(9):755-763. 1998.
Nawrocki, Kolbe, Eddy. Infernal
1.0: inference of RNA alignments Bioinformatics
25(10):1335-1337
|
4/8
|
Profile alignments HMM Alignments
RNA Structure
|
Multiple
alignment notes
|
Altschul, Madden, Schaffer, Zhang, Zhang, Miller, Lipman.
Gapped
BLAST and PSI-BLAST: a new generation of protein database search
programs. NAR 25(17):3389-3402. 1997.
Eddy. Profile
Hidden Markov Models. Bioinformatics. 14(9):755-763. 1998.
Nawrocki, Kolbe,
Eddy. Infernal
1.0: inference of RNA alignments Bioinformatics
25(10):1335-1337
|
4/13
|
Structure-sequence alignment
Sequence clustering – basics Distance-based
methods Neighbor Joining Picking the right cut - VI-cut
|
Homework 3
RNA Structure and
alignment notes
|
Jiang, Lin, Ma,
Zheng. A
general edit distance between RNA structures. J. Comp. Biol.
9(2):371-388. 2002.
Navlakha, White, Nagarajan, Pop, Kingsford.
Finding
Biologically Accurate Clusterings in Hierarchical Tree
Decompositions using the Variation of Information. RECOMB
2009.
|
4/15
|
phylogenetics: parsimony/maximum
likelihood The need for speed: Cd-hit, DNAClust, Uclust
Graph theory in sequence analysis introduction to sequence
assembly overlap-layout consensus paradigm Eulerian paradigm
|
Sequence
clustering notes
|
Saitou, Nei. The
neighbor-joining method: a new method for reconstructing
phylogenetic trees.Mol. Biol. Evol. 4(4):406-425. 1987.
Cd-hit: a fast program for clustering and comparing large
sets of protein or nucleotide sequences", Weizhong Li &
Adam Godzik Bioinformatics, (2006) 22:1658-9. Open
source PDF Pubmed
|
4/20
|
Phymm/PhymmBL
Dr. Pop out of town
|
|
Brady, Salzberg. Phymm
and PhymmBL: metagenomic phylogenetic classification with
interpolated Markov models. Nature Methods. 6:673-676 (2009)
|
4/22
|
Parallel algorithms BlastReduce Crossbow others
Dr. Pop out of town
|
|
Schatz, M.C., CloudBurst:
highly sensitive read mapping with MapReduce.
Bioinformatics, 2009. 25(11): p. 1363-9.
Langmead, B., et al., Searching
for SNPs with cloud computing. Genome Biol, 2009.
10(11): p. R134.
|
4/27
|
Sequence assembly counting Eulerian tours NP-hardness
results
|
Genome
assembly notes
Counting
spanning trees (Thanks Sam!!)
|
Myers. Towards
simplifying and accurately formulating fragment assembly.
J.Comp. Biol. 2(2):275. 1995.
Pevzner, Tang, Waterman. An
Eulerian path approach to DNA fragment assembly. PNAS
98(17):9748. 2001
Myers. The
genome assembly string graph. Bioinformatics 21. 2005.
Medvedev, Georgiou, Myers, Brudno. Computability
of models for sequence assembly. WABI 2007.
Nagarajan, Pop. Parametric
complexity of sequence assembly. J.Comp.Biol. 16(7):897. 2009
|
4/29
|
Graph simplifications How useful are mate-pairs?
|
|
Kingsford, Schatz, Pop. Assembly
complexity of prokaryotic genomes using short reads. BMC
Bioinformatics. 11:21. 2010.
|
5/4
|
Genome assembly validation Detecting breakpoints and
rearrangements
|
Assembly
validation notes
|
Phillippy, Schatz, Pop. Assembly
forensics. Genome Biology 9:R55. 2008.
Choi, Kim, Tang, Andrews, Gilbert, Colbourne. A
Machine Learning Approach to Combined Evidence Validation of
Genome Assemblies. Bioinformatics 24(6):744-750. 2008
Hormozidari, Alkan, Eichler, Sahinalp. Combinatorial
algorithms for structural variation detection in high-throughput
sequenced genomes. Genome Research 19:1270-1278. 2009.
|
5/6
|
Optical mapping
|
Optical maps
notes
|
Nagarajan, Read, Pop. Scaffolding
and validation of bacterial genome assemblies using optical
restriction maps. Bioinformatics 24(10):1229-1235. 2008.
Anantharaman, T., B. Mishra,
et al. Genomics
via optical mapping III: contiging genomic DNA. ISMB'99.
|
5/11
|
TBA
|
Final project due
PROJECT
DELIVERABLES
|
|
5/19
|
Final exam (10:30-12:30)
|
GOOD LUCK
|
|