NIH Project R01-LM007938: Bioinformatics Software for Analyzing Microbial Genomes

Project PI: Steven L. Salzberg, Ph.D.

Other Personnel: Arthur L. Delcher, Ph.D., Mihaela Pertea, Ph.D., Carl Kingsford, Ph.D.

Software systems supported by this grant



Glimmer, a gene finding system for bacteria, archaea, and viruses, http://cbcb.umd.edu/software/glimmer/

The MUMmer genome alignment software, http://mummer.sourceforge.net/

TransTermHP, a system for finding transcription terminators in bacteria, at http://transterm.cbcb.umd.edu

OperonDB, a database of operons in microbial genomes, at http://www.cbcb.umd.edu/cgi-bin/operons/operons.cgi

GlimmerHMM, a eukaryotic gene finder, at http://cbcb.umd.edu/software/glimmerhmm/

GeneZilla, a eukaryotic gene finder, at http://www.genezilla.org/

A comparative genome assembler, AMOScmp, http://amos.sourceforge.net/docs/pipeline/AMOScmp.html

Minimus, an assembler for small genome sequencing projects,
http://amos.sourceforge.net/docs/pipeline/minimus.html

RepeatFinder, software for finding and characterizing repetitive sequences in complete and partial genomes,
http://cbcb.umd.edu/software/RepeatFinder/

Publications supported by this grant


  1. Bioinformatics challenges of new sequencing technology.  Mihai Pop and Steven L. Salzberg, Trends in Genetics 24:3 (2008), 142-149.
  2. Automated eukaryotic gene structure annotation using EVidenceModeler.  B.J. Haas, S.L. Salzberg, et al.  Genome Biology 2008, 9:R7.
  3. Identifying bacterial genes and endosymbiont DNA with Glimmer. A.L. Delcher, K.A. Bratke, E.C. Powers, and S.L. Salzberg. Bioinformatics 2007 Mar 15;23(6):673-9. This is the Glimmer 3 paper.
  4. Rapid, accurate, computational discovery of Rho-independent transcription terminators illuminates their relationship to DNA uptake.
    C.L. Kingsford, K. Ayanbule, and S.L. Salzberg.  Genome Biology 2007;8(2):R22.
  5. A unified model explaining the offsets of overlapping and near-overlapping prokaryotic genes.  C. Kingsford, A.L. Delcher, and S.L. Salzberg.  Molec. Biol. and Evol. 24:9 (2007), 2091-98.
  6. Hawkeye: an interactive visual analytics tool for genome assemblies.  M. Schatz, A.M. Phillippy, B. Shneiderman, and S.L. Salzberg.  Genome Biology 2007 Mar 9;8(3):R34.
  7. Minimus: a fast, lightweight genome assembler.  D.D. Sommer, A.L. Delcher, S.L. Salzberg, and M. Pop.  BMC Bioinformatics 2007 Feb 26;8:64.
  8. A phylogenetic generalized hidden Markov model for predicting alternatively spliced exons. J.E. Allen and S.L. Salzberg. Algorithms for Molecular Biology 1:14 (2006).
  9. JIGSAW, GeneZilla, and GlimmerHMM: puzzling out the features of human genes in the ENCODE regions.  J.E. Allen, W.H. Majoros, M. Pertea, and S.L. Salzberg.  Genome Biology 2006, 7(Suppl):S9.
  10. Genome analysis linking recent European and African influenza (H5N1) viruses.  Steven L. Salzberg, Carl Kingsford, Giovanni Cattoli, David J. Spiro, Daniel A. Janies, Mona Mehrez Aly, Ian H. Brown, Emmanuel Couacy-Hymann, Gian Mario De Mia, Do Huu Dung, Annalisa Guercio, Tony Joannis, Ali Safar Maken Ali, Azizullah Osmani, Iolanda Padalino, Magdi D. Saad, Vladimir Savić, Naomi A. Sengamalay, Samuel Yingst, Jennifer Zaborsky, Olga Zorman-Rojs, Elodie Ghedin, and Ilaria Capua.  Emerging Infectious Diseases 13:5 (May 2007).
  11. B.J. Haas and S.L. Salzberg. Finding repeats in genome sequences.  In Bioinformatics – From Genomes to Therapies, Volume 1: Molecular Sequences and Structures (T. Lengauer, ed.).  Weinheim, Germany: Wiley-VCH, 2007, 197-234.
  12. Genome re-annotation: a wiki solution? S.L. Salzberg. Genome Biology 2007, 8:102. Highly accessed.
  13. It is time to end the patenting of software.  J. Quackenbush and S.L. Salzberg.  Bioinformatics 22:12 (2006), 1416-7.
  14. Beware of mis-assembled genomes.  S.L. Salzberg and J.A. Yorke.  Bioinformatics 21:24 (2005), 4320-21.
  15. Large-scale sequencing of human influenza reveals the dynamic nature of viral genome evolution.  E. Ghedin, N.A. Sengamalay, M. Shumway, J. Zaborsky, T. Feldblyum, V. Subbu, D.J. Spiro, J. Sitz, H. Koo, P. Bolotov, D. Dernovoy, T. Tatusova, Y. Bao, K. St George, J. Taylor, D.J. Lipman, C.M. Fraser, J.K. Taubenberger, and S.L. Salzberg.  Nature (2005), 1162-1166.
  16. Serendipitous discovery of Wolbachia genomes in multiple Drosophila species.  S.L. Salzberg, J.C. Dunning Hotopp, A.L. Delcher, M. Pop, D.R. Smith, M.B. Eisen, and W.C. Nelson.  Genome Biology 2005, 6:R23.
  17. Efficient implementation of a generalized pair hidden Markov model for comparative gene finding.  W.H. Majoros, M. Pertea, and S.L. Salzberg. Bioinformatics 21:9 (2005), 1782-88.
  18. Whole-Genome Analysis of Human Influenza A Virus Reveals Multiple Persistent Lineages and Reassortment among Recent H3N2 Viruses. E.C. Holmes, E. Ghedin, N. Miller, J. Taylor, Y. Bao, K. St. George, B.T. Grenfell, S.L. Salzberg, C.M. Fraser, D.J. Lipman, and J.K. Taubenberger.  PLoS Biology 3:9 (2005), e300.
  19. Efficient decoding algorithms for generalized hidden Markov model gene finders.  W.H. Majoros, M. Pertea, A.L. Delcher, and S.L. Salzberg.  BMC Bioinformatics 6 (2005), 16.
  20. The genome of the basidiomycetous yeast and human pathogen Cryptococcus neoformans. B.J. Loftus et al., Science 307:5713 (2005), 1321-1324.
  21. The genome assembly archive: a new public resource.  S.L. Salzberg, D. Church, M. DiCuccio, E. Yaschenko, and J. Ostell.  PLoS Biology 2:9 (2004), 1273-1275.
  22. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders.  W.H. Majoros, M. Pertea, and S.L. Salzberg.  Bioinformatics 20:16 (2004), 2878-79.
  23. Yeast rises again.  S.L. Salzberg, Nature 423 (2003), 233-234.
  24. Comparative genome assembly.  M. Pop, A. Phillippy, A.L. Delcher, and S.L. Salzberg.  Briefings in Bioinformatics 5:3 (2004), 237-248.
  25. Genomic insights into methanotrophy: the complete genome sequence of Methylococcus capsulatus (Bath).  N. Ward, et al.  PLoS Biology 2:10 (2004), e303.
  26. Automated correction of genome sequence errors.  P. Gajer, M. Schatz, and S.L. Salzberg.  Nucleic Acids Research 32:2 (2004), 562-569.  This describes the AutoEditor system, which is available as open source code.
  27. Tools for gene finding and whole genome comparison.  S.L. Salzberg and A.L. Delcher.  In C.M. Fraser, T.D. Read, and K.E. Nelson (Eds.), Microbial Genomics,  Humana Press, 2004.
  28. An empirical analysis of training protocols for probabilistic gene finders.  W.H. Majoros and S.L. Salzberg.  BMC Bioinformatics 5 (2004), 206.
  29. Bioinformatics methods for microbial detection and forensic diagnostic design.  T.R. Slezak and S.L. Salzberg.  In S. Schutzer, R. Breeze, and B. Budowle (Eds.), Microbial Forensics.  Academic Press, 2005.
  30. Versatile and open software for comparing large genomes.  S. Kurtz, A. Phillippy, A.L. Delcher, M. Smoot, M. Shumway, C. Antonescu, and S.L. Salzberg.  Genome Biology 5:R12 (2004), http://genomebiology.com/2004/5/2/R12.  This is the MUMmer3 paper; the open source code is available at http://mummer.sourceforge.net/.
  31. DAGChainer: A tool for mining segmental genome duplications and synteny.  B.J. Haas, A.L. Delcher, J.R. Wortman, and S.L. Salzberg.  Bioinformatics 20:18 (2004), 3643-6.
  32. Hierarchical scaffolding with Bambus.  M. Pop, D. Kosack, and S.L. Salzberg.  Genome Research 14 (2004), 149-159.  This describes our open source system for the scaffolding phase of genome assembly.
  33. Computational gene prediction using multiple sources of evidence.  J.E. Allen, M. Pertea, and S.L. Salzberg.  Genome Research 14 (2004), 142-148.  This describes our open source system for producing a gene prediction based on multiple gene finders, alignment programs, and other evidence.