NIH Project R01-LM06845: Computational Gene Modeling and Genome Sequence Assembly

Project PI: Steven L. Salzberg, Ph.D.

Senior Personnel: Arthur L. Delcher, Ph.D., Mihaela Pertea, Ph.D., Mihai Pop, Ph.D.

Software systems supported by this grant



The MUMmer genome alignment software, http://mummer.sourceforge.net/

A Modular Open-Source assembler (AMOS), http://amos.sourceforge.net/

A comparative genome assembler, AMOScmp, http://amos.sourceforge.net/docs/pipeline/AMOScmp.html

Minimus, an assembler for small genome sequencing projects, http://amos.sourceforge.net/docs/pipeline/minimus.html

GlimmerHMM, a eukaryotic gene finder, at http://cbcb.umd.edu/software/glimmerhmm/

TWAIN, a gene finder for finding genes in two genomes in parallel, at http://cbcb.umd.edu/software/twain/twaindoc.html

GeneZilla, a eukaryotic gene finder, at http://www.genezilla.org/

JIGSAW, a software system for combining the results of multiple gene finding methods, at http://cbcb.umd.edu/software/jigsaw/

AutoEditor, software for automated correction of sequencing and basecaller errors, http://www.tigr.org/software/autoeditor/

GeneSplicer, software for predicting splice sites in eukaryotic genomes, at http://cbcb.umd.edu/software/GeneSplicer/

TransTerm, a system for finding transcription terminators in bacteria, at http://cbcb.umd.edu/software/

RepeatFinder, software for finding and characterizing repetitive sequences in complete and partial genomes, http://cbcb.umd.edu/software/RepeatFinder/

Selected publications supported by this grant


  1. Large-scale sequencing of human influenza reveals the dynamic nature of viral genome evolution.  E. Ghedin, N.A. Sengamalay, M. Shumway, J. Zaborsky, T. Feldblyum, V. Subbu, D.J. Spiro, J. Sitz, H. Koo, P. Bolotov, D. Dernovoy, T. Tatusova, Y. Bao, K. St George, J. Taylor, D.J. Lipman, C.M. Fraser, J.K. Taubenberger, and S.L. Salzberg.  Nature (2005), 1162-1166.
  2. Serendipitous discovery of Wolbachia genomes in multiple Drosophila species.  S.L. Salzberg, J.C. Dunning Hotopp, A.L. Delcher, M. Pop, D.R. Smith, M.B. Eisen, and W.C. Nelson.  Genome Biology 2005, 6:R23.
  3. Efficient implementation of a generalized pair hidden Markov model for comparative gene finding.  W.H. Majoros, M. Pertea, and S.L. Salzberg. Bioinformatics 21:9 (2005), 1782-88.
  4. Efficient decoding algorithms for generalized hidden Markov model gene finders.  W.H. Majoros, M. Pertea, A.L. Delcher, and S.L. Salzberg.  BMC Bioinformatics 6 (2005), 16.
  5. The genome assembly archive: a new public resource.  S.L. Salzberg, D. Church, M. DiCuccio, E. Yaschenko, and J. Ostell.  PLoS Biology 2:9 (2004), 1273-1275.
  6. Yeast rises again.  S.L. Salzberg.  Nature 423 (2003), 233-234.
  7. Comparative genome assembly.  M. Pop, A. Phillippy, A.L. Delcher, and S.L. Salzberg.  Briefings in Bioinformatics 5:3 (2004), 237-248.
  8. Automated correction of genome sequence errors.  P. Gajer, M. Schatz, and S.L. Salzberg.  Nucleic Acids Research 32:2 (2004), 562-569.  This describes the AutoEditor system, with open source code available at http://www.tigr.org/software/autoeditor/.
  9. Shotgun sequence assembly.  M. Pop.  Advances in Computers vol. 60, M. Zelkowitz ed.  June 2004.
  10. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders.  W.H. Majoros, M. Pertea, and S.L. Salzberg.  Bioinformatics 20:16 (2004), 2878-79.
  11. An empirical analysis of training protocols for probabilistic gene finders.  W.H. Majoros and S.L. Salzberg.  BMC Bioinformatics 5 (2004), 206.
  12. Versatile and open software for comparing large genomes.  S. Kurtz, A. Phillippy, A.L. Delcher, M. Smoot, M. Shumway, C. Antonescu, and S.L. Salzberg.  Genome Biology 5:R12 (2004), http://genomebiology.com/2004/5/2/R12.  This is the MUMmer3 paper, with open source code available at http://mummer.sourceforge.net/.
  13. DAGChainer: A tool for mining segmental genome duplications and synteny.  B.J. Haas, A.L. Delcher, J.R. Wortman, and S.L. Salzberg.  Bioinformatics 20:18 (2004), 3643-6.
  14. Hierarchical scaffolding with Bambus.  M. Pop, D. Kosack, and S.L. Salzberg.  Genome Research 14 (2004), 149-159.  This describes our open source system for the scaffolding phase of genome assembly.
  15. Computational gene prediction using multiple sources of evidence.  J.E. Allen, M. Pertea, and S.L. Salzberg.  Genome Research 14 (2004), 142-148.  This describes our open source system for producing a gene prediction based on multiple gene finders, alignment programs, and other evidence.
  16. Comparative genome sequencing for discovery of novel polymorphisms in Bacillus anthracis.  T.D. Read, S.L. Salzberg, M. Pop, M. Shumway, L. Umayam, L. Jiang, E. Holtzapple, J. Busch, K.L. Smith, J.M. Schupp, D. Solomon, P. Keim, and C.M. Fraser.  Science 296 (2002), 2028-2033.
  17. Fast algorithms for large-scale genome alignment and comparison.  A.L. Delcher, A. Phillippy, J. Carlton, and S.L. Salzberg.  Nucleic Acids Research 30:11 (2002), 2478-2483.  (This is the MUMmer 2 paper.)
  18. Full-length messenger RNA sequences greatly improve genome annotation.  B.J. Haas, N. Volfovsky, C.D. Town, M. Troukhan, N. Alexandrov, K.A. Feldmann, R.B. Flavell, O. White, and S.L. Salzberg.  Genome Biology 3:6 (2002), research0029.1-12.
  19. Genome sequence assembly: algorithms and issues.  M. Pop, S.L. Salzberg, and M. Shumway.  IEEE Computer 35:7 (2002), 47-54.
  20. Microbial genes in the human genome: lateral transfer or gene loss?  S.L. Salzberg, O. White, J. Peterson, and J.A. Eisen.  Science 292 (2001), 1903-1906.  See also the Enhanced Perspective in Science, and the annotated version of this paper, designed to help students and teachers of science, developed by the SCOPE project and the Editors of Science.
  21. GeneSplicer: a new computational method for splice site prediction.  M. Pertea, X. Lin, and S.L. Salzberg.  Nucleic Acids Research 29:5 (2001), 1185-1190.
  22. A probabilistic method for identifying start codons in bacterial genomes.  B.E. Suzek, M.D. Ermolaeva, M. Schreiber, and S.L. Salzberg.  Bioinformatics 17:12 (2001), 1123-1130.
  23. Prediction of operons in microbial genomes.  M.D. Ermolaeva, O. White, and S.L. Salzberg.  Nucleic Acids Research 29:5 (2001), 1216-1221.
  24. A clustering method for repeat analysis in DNA sequences.  N. Volfovsky, B.J. Haas, and S.L. Salzberg.  Genome Biology 2:8 (2001), research0027:1-11.  This describes the RepeatFinder software.
  25. Finding genes in Plasmodium falciparum chromosome 3.  M. Pertea, S.L. Salzberg, and M.J. Gardner.  Nature 404 (2000), 34.
  26. Prediction of transcription terminators in bacterial genomes.  M.D. Ermolaeva, H. Khalak, O. White, H.O. Smith, and S.L. Salzberg.  J. Molecular Biology 301 (2000), 27-33.
  27. Improved microbial gene identification with GLIMMER.  A.L. Delcher, D. Harmon, S. Kasif, O. White, and S.L. Salzberg.  Nucleic Acids Research 27:23 (1999), 4636-4641.