NIH Project R01-LM06845: Computational Gene Modeling and Genome
Sequence Assembly
Project PI: Steven L. Salzberg, Ph.D.
Senior Personnel: Arthur L. Delcher, Ph.D., Mihaela Pertea, Ph.D.,
Mihai Pop, Ph.D.
Software systems supported by this grant
The MUMmer genome alignment software, http://mummer.sourceforge.net/
A Modular Open-Source assembler (AMOS), http://amos.sourceforge.net/
A comparative genome assembler, AMOScmp, http://amos.sourceforge.net/docs/pipeline/AMOScmp.html
Minimus, an assembler for small genome sequencing projects,
http://amos.sourceforge.net/docs/pipeline/minimus.html
GlimmerHMM, a eukaryotic genefinder, at http://cbcb.umd.edu/software/glimmerhmm/
TWAIN, a gene finder for finding genes in two genomes in parallel, at http://cbcb.umd.edu/software/twain/twaindoc.html
GeneZilla, a eukaryotic gene finder, at http://www.genezilla.org/
JIGSAW, a software system for combining the results of multiple gene
finding methods, at http://cbcb.umd.edu/software/jigsaw/
AutoEditor, software for automated correction of sequencing and
basecaller errors, http://www.tigr.org/software/autoeditor/
GeneSplicer, software for predicting splice sites in eukaryotic
genomes, at http://cbcb.umd.edu/software/GeneSplicer/
TransTerm, a system for finding transcription terminators in bacteria,
at http://cbcb.umd.edu/software/
RepeatFinder, software for finding and characterizing repetitive
sequences in complete and partial genomes,
http://cbcb.umd.edu/software/RepeatFinder/
Selected publications supported by this grant
- Large-scale sequencing of human influenza reveals the dynamic nature of viral
genome evolution.
E. Ghedin, N.A. Sengamalay, M. Shumway, J. Zaborsky, T. Feldblyum, V.
Subbu, D.J. Spiro, J. Sitz, H. Koo, P. Bolotov, D. Dernovoy, T. Tatusova,
Y. Bao, K. St George, J. Taylor, D.J. Lipman, C.M. Fraser, J.K.
Taubenberger, and S.L. Salzberg. Nature
(2005), 1162-1166.
- Serendipitous discovery of Wolbachia genomes in multiple Drosophila species.
S.L. Salzberg, J.C. Dunning Hotopp, A.L. Delcher, M. Pop,
D.R. Smith, M.B. Eisen, and W.C. Nelson. Genome
Biology 2005, 6:R23.
- Efficient
implementation of a generalized pair hidden Markov model for
comparative gene finding. W.H. Majoros, M. Pertea, and S.L.
Salzberg. Bioinformatics 21:9
(2005), 1782-88.
- Efficient
decoding algorithms for generalized hidden Markov model gene finders.
W.H. Majoros, M. Pertea, A.L. Delcher, and S.L. Salzberg. BMC
Bioinformatics 6 (2005), 16.
- The
genome assembly archive: a new public resource. S.L.
Salzberg, D. Church, M. DiCuccio, E. Yaschenko, and J. Ostell. PLoS Biology 2:9 (2004),
1273-1275.
- Yeast rises again. S.L. Salzberg. Nature
423 (2003), 233-234.
- Comparative genome assembly. M. Pop, A. Phillippy, A.L. Delcher,
and S.L. Salzberg. Briefings in
Bioinformatics 5:3 (2004), 237-248.
- Automated
correction of genome sequence errors. P. Gajer, M. Schatz,
and S.L. Salzberg. Nucleic
Acids Research 32:2 (2004), 562-569. This describes the
AutoEditor system, with open source code
available here.
- M. Pop. Shotgun sequence assembly. Advances in Computers vol. 60,
M. Zelkowitz ed. June 2004.
- TigrScan
and GlimmerHMM: two open source ab initio eukaryotic gene-finders.
W.H. Majoros, M. Pertea, and S.L. Salzberg. Bioinformatics 20:16 (2004),
2878-79.
- An
empirical analysis of training protocols for probabilistic gene finders.
W.H. Majoros and S.L. Salzberg. BMC
Bioinformatics 5 (2004), 206.
- Versatile
and open software for comparing large genomes. S. Kurtz, A.
Phillippy, A.L. Delcher, M. Smoot, M. Shumway, C. Antonescu, and S.L.
Salzberg. Genome Biology
5:R12 (2004), http://genomebiology.com/2004/5/2/R12. This is the MUMmer 3 paper, with open
source code available here.
- DAGChainer:
A tool for mining segmental genome duplications and synteny.
B.J. Haas, A.L. Delcher, J.R. Wortman, and S.L. Salzberg.
Bioinformatics 20:18 (2004), 3643-6.
- Hierarchical
scaffolding with Bambus. M. Pop, D. Kosack, and S.L.
Salzberg. Genome Research
14 (2004), 149-159. This describes our open source system for the
scaffolding phase of genome assembly.
- Computational
gene prediction using multiple sources of evidence. J.E.
Allen, M. Pertea, and S.L. Salzberg. Genome Research 14 (2004),
142-148. This describes our open source system for producing a
gene prediction based on multiple gene finders, alignment programs, and
other evidence.
- Comparative
genome sequencing for discovery of novel polymorphisms in Bacillus
anthracis. T.D. Read, S.L. Salzberg, M. Pop, M. Shumway,
L. Umayam, L. Jiang, E. Holtzapple, J. Busch, K.L. Smith, J.M. Schupp,
D. Solomon, P. Keim, and C.M. Fraser. Science 296 (2002),
2028-2033.
- Fast algorithms for large-scale genome alignment and comparison.
A.L. Delcher, A. Phillippy, J. Carlton, and S.L. Salzberg. Nucleic
Acids Research 30:11 (2002), 2478-2483. (This is the MUMmer 2
paper.)
- Full-length
messenger RNA sequences greatly improve genome annotation.
B.J. Haas, N. Volfovsky, C.D. Town, M. Troukhan, N. Alexandrov, K.A.
Feldmann, R.B. Flavell, O. White, and S.L. Salzberg. Genome
Biology 3:6 (2002), research0029.1-12.
- M. Pop, S. L. Salzberg, M. Shumway. Genome Sequence
Assembly: Algorithms and Issues. IEEE Computer 35(7) 2002, pp.
47-54.
- Microbial Genes in the Human Genome: Lateral Transfer or Gene Loss?
S.L. Salzberg, O. White, J. Peterson, and J.A. Eisen. Science 292 (2001),
1903-1906. See also the Enhanced Perspective in Science. An annotated
version of this paper, designed to help students and
teachers of science, was developed by the SCOPE project and the Editors of Science.
- GeneSplicer: a new computational method for splice site prediction. M. Pertea, X.
Lin, and S.L. Salzberg. Nucleic Acids Research 29:5 (2001),
1185-1190.
- A
probabilistic method for identifying start codons in bacterial genomes.
B.E. Suzek, M.D. Ermolaeva, M. Schreiber, and S.L. Salzberg. Bioinformatics
17:12 (2001), 1123-1130.
- Prediction
of operons in microbial genomes. M.D. Ermolaeva, O. White and
S.L. Salzberg. Nucleic Acids Research 29:5 (2001),
1216-1221.
- A
clustering method for repeat analysis in DNA sequences. N.
Volfovsky, B.J. Haas, and S.L. Salzberg. Genome
Biology 2:8 (2001), research0027:1-11. This describes
the RepeatFinder
software.
- Finding
genes in Plasmodium falciparum chromosome 3. M.
Pertea, S.L. Salzberg, and M.J. Gardner. Nature 404 (2000), 34.
- Prediction of transcription terminators in bacterial genomes.
M.D. Ermolaeva, H. Khalak, O. White, H.O. Smith,
and S.L. Salzberg. J. Molecular Biology 301 (2000), 27-33.
- Improved microbial gene identification with GLIMMER. A.L. Delcher, D.
Harmon, S. Kasif, O. White, and S.L. Salzberg. Nucleic Acids
Research 27:23 (1999), 4636-4641.