Steven Salzberg's home page 


Director, Center for Bioinformatics and Computational Biology
Phillip H. and Catherine C. Horvitz Professor, Department of Computer Science
3125 Biomolecular Sciences Bldg #296, University of Maryland, College Park, MD 20742.
Affiliate Professor, Department of Cell Biology & Molecular Genetics.
Faculty member, Bioengineering graduate program
Phone: 301-405-5936. Email: s a l z b e r g (at) u m i a c s . u m d . e d u
Blog: genome.fieldofscience.com

My group's software: Glimmer, MUMmer, AMOS assembler, JIGSAW, TransTermHP, and others
Courses, current and past

Editorials and opinion pieces
The contents of the syringe, my Nature commentary on the influenza vaccine, 10 July 2008. Also in PubMed Central.
My opinion piece on genome annotation from Genome Biology, 1 February 2007 (subscription may be required; email me for a reprint).
My editorial on evolution and the flu from The Philadelphia Inquirer newspaper, Nov 2, 2005.
The letter to Nature from the GISAID consortium on rapid release of avian influenza data, Aug. 24 2006, signed by over 70 scientists from 34 countries.
Our letter to the editor of Nature in favor of rapid release of influenza genome data, with Elodie Ghedin and David Spiro, Nature 440 (30 Mar 2006), 605.
It is time to end the patenting of software.  J. Quackenbush and S.L. Salzberg.  Bioinformatics 22:12 (2006), 1416-7.
Beware of mis-assembled genomes.  S.L. Salzberg and J.A. Yorke.  Bioinformatics 21:24 (2005), 4320-21.
Nature journal club article about viruses as living organisms.  Nature 438 (10 Nov 2005), 133.
Our letter to the editor of Nature in favor of unrestricted access to genome data (with Ewan Birney, Sean Eddy, and Owen White), Nature 422 (2003), 801.

Selected publications (Genomics, Bioinformatics, or older machine learning papers)

Genomics papers
  1. A whole-genome assembly of the domestic cow, Bos taurus. A.V. Zimin, A.L. Delcher, L. Florea, D.R. Kelley, M.C. Schatz, D. Puiu, F. Hanrahan, G. Pertea, C.P. Van Tassell, T.S. Sonstegard, G. Marcais, M. Roberts, P. Subramanian, J.A. Yorke, and S.L. Salzberg. Genome Biology 2009, 10:R42.
  2. Re-assembly of the genome of Francisella tularensis subsp. holarctica OSU18.  D. Puiu and S.L. Salzberg, PLoS ONE 3:10 (2008): e3427.
  3. The complete genome sequence of Bacillus anthracis Ames “Ancestor.”  J. Ravel et al., J. Bacteriology 191:1 (2009), 445-446.
  4. Comparative genomics of the neglected human malaria parasite Plasmodium vivax.  J.M. Carlton et al., Nature 455, 757-763 (9 October 2008).
  5. Genome sequence and rapid evolution of the rice pathogen Xanthomonas oryzae pv. oryzae PXO99A.  Steven L. Salzberg et al., BMC Genomics 9:204 (2008).
  6. The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus).  R. Ming et al., Nature 452 (2008), 991-6.
  7. Genome analysis linking recent European and African influenza (H5N1) viruses.  Steven L. Salzberg, Carl Kingsford, Giovanni Cattoli, David J. Spiro, Daniel A. Janies, Mona Mehrez Aly, Ian H. Brown, Emmanuel Couacy-Hymann, Gian Mario De Mia, Do Huu Dung, Annalisa Guercio, Tony Joannis, Ali Safar Maken Ali, Azizullah Osmani, Iolanda Padalino, Magdi D. Saad, Vladimir Savić, Naomi A. Sengamalay, Samuel Yingst, Jennifer Zaborsky, Olga Zorman-Rojs, Elodie Ghedin, and Ilaria Capua. Emerging Infectious Diseases 13:5 (May 2007).
  8. A unified model explaining the offsets of overlapping and near-overlapping prokaryotic genes.  C. Kingsford, A.L. Delcher, and S.L. Salzberg.  Molec. Biol. and Evol 24:9 (2007),  2091-98.
  9. Draft Genome of the Filarial Nematode Parasite Brugia malayi. E. Ghedin et al., Science 317:5845 (2007), 1756-60.
  10. Draft Genome Sequence of the Sexually Transmitted Pathogen Trichomonas vaginalis. J.M. Carlton, et al., Science 315 (2007), 207-212.
  11. Macronuclear Genome Sequence of the Ciliate Tetrahymena thermophila, a Model Eukaryote. J.A. Eisen, et al. PLoS Biology 4:9 (2006): e286.
  12. Large-scale sequencing of human influenza reveals the dynamic nature of viral genome evolution.  (Reprint) (Abstract) E. Ghedin, N.A. Sengamalay, M. Shumway, J. Zaborsky, T. Feldblyum, V. Subbu, D.J. Spiro, J. Sitz, H. Koo, P. Bolotov, D. Dernovoy, T. Tatusova, Y. Bao, K. St George, J. Taylor, D.J. Lipman, C.M. Fraser, J.K. Taubenberger, and S.L. Salzberg.  Nature (2005), 1162-1166.
  13. Whole-Genome Analysis of Human Influenza A Virus Reveals Multiple Persistent Lineages and Reassortment among Recent H3N2 Viruses. E.C. Holmes, E. Ghedin, N. Miller, J. Taylor, Y. Bao, K. St. George, B.T. Grenfell, S.L. Salzberg, C.M. Fraser, D.J. Lipman, and J.K. Taubenberger.  PLoS Biology 3:9 (2005), e300.  [Local PDF copy]
  14. Comparative Genomics of Trypanosomatid Parasitic Protozoa.  N.M. El-Sayed et al.  Science 309 (2005), 404-409.
  15. The genome of the basidiomycetous yeast and human pathogen Cryptococcus neoformans.  B.J. Loftus et al. Science 307 (Feb 25 2005), 1321-4.
  16. Serendipitous discovery of Wolbachia genomes in multiple Drosophila species.  (local PDF copy) S.L. Salzberg, J.C. Dunning Hotopp, A.L. Delcher, M. Pop, D.R. Smith, M.B. Eisen, and W.C. Nelson.  Genome Biology 2005, 6:R23.
  17. Yeast rises again.  S.L. Salzberg, Nature 423 (2003), 233-234.
  18. The genome assembly archive: a new public resource.  S.L. Salzberg, D. Church, M. DiCuccio, E. Yaschenko, and J. Ostell. PLoS Biology 2:9 (2004), 1273-1275.  [Local PDF copy]
  19. Genomic insights into methanotrophy: the complete genome sequence of Methylococcus capsulatus (Bath).  N. Ward, et al., PLoS Biology 2:10 (2004), e303.
  20. Comparative genome sequencing for discovery of novel polymorphisms in Bacillus anthracis T.D. Read, S.L. Salzberg, M. Pop, M. Shumway, L. Umayam, L. Jiang, E. Holtzapple, J. Busch, K.L. Smith, J.M. Schupp, D. Solomon, P. Keim, and C.M. Fraser. Science 296 (2002), 2028-2033.
  21. Genome sequence of the human malaria parasite Plasmodium falciparum.  M.J. Gardner et al., Nature 419 (2002), 498-511.
  22. The genome sequence of the malaria mosquito Anopheles gambiae.  R.A. Holt et al., Science 298 (2002), 129-149.
  23. Comparative genome and proteome analysis of Anopheles gambiae and Drosophila melanogaster.  E.M. Zdobnov et al., Science 298 (2002), 149-159.
  24. Genome sequence and comparative analysis of the model rodent malaria parasite Plasmodium yoelii yoelii.  J.M. Carlton et al., Nature 419 (2002), 512-519.
  25. A comparison of whole-genome shotgun-derived mouse chromosome 16 and the human genome.  R.J. Mural et al. (176 authors).  Science 296 (2002), 1661-1671.
  26. Microbial Genes in the Human Genome: Lateral Transfer or Gene Loss? (Abstract) (Local PDF copy) S.L. Salzberg, O. White, J. Peterson, and J.A. Eisen, Science 292 (2001), 1903-1906.   See also the Enhanced Perspective in Science.  ANNOTATED! See the annotated version of this paper, designed to help students and teachers of science, developed by the SCOPE project and the Editors of Science.
  27. The Sequence of the Human Genome.  (free at the Science website) J. Craig Venter et al. (274 authors), Science 291 (2001), 1304-1351.  Get the figures showing genome-scale duplications in PDF format here: [Page 1] [Page 2]
  28. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana.  The Arabidopsis Genome Initiative (143 authors), Nature 408 (2000), 796-815.  (Also contains links to our papers on chromosomes 1, 2, and 3 of Arabidopsis.)
  29. Evidence for symmetric chromosomal inversions around the replication origin in bacteria.  Jonathan A. Eisen, John F. Heidelberg, Owen White, and Steven L. Salzberg. Genome Biology 1:6 (2000), 1-9.
  30. Microbial genome sequencing.  Claire M. Fraser, Jonathan A. Eisen, and Steven L. Salzberg.  Nature 406 (2000), 799-803.
  31. DNA sequence of both chromosomes of the cholera pathogen Vibrio cholerae.  John F. Heidelberg et al., Nature 406 (2000), 477-483.
  32. Genome sequences of Chlamydia trachomatis MoPn and C. pneumoniae AR39.  Timothy D. Read et al.,  Nucleic Acids Research 28:6 (2000), 1397-1406.
  33. Gene Index analysis of the human genome estimates approximately 120,000* genes.  F. Liang, I.E. Holt, G. Pertea, S. Karamycheva, S.L. Salzberg, and J. Quackenbush. Nature Genetics 25:2 (2000), 239-240. *Estimate corrected to 56,000 genes; Nature Genetics 26:4 (2000), 501.
  34. Sequence and Analysis of Chromosome 2 of Arabidopsis thaliana (get abstract).  Xiaoying Lin et al., Nature 402  (1999), 761-768.
  35. Complete genome sequence of Neisseria meningitidis serogroup B strain MC58 (get abstract).  Herve Tettelin et al.  Science 287 (2000), 1809-1815.
  36. Optimized Multiplex PCR: Efficiently Closing a Whole-Genome Shotgun Sequencing Project (PDF).  H. Tettelin, D. Radune, S. Kasif, H. Khouri, and S.L. Salzberg. Genomics 62 (1999), 500-507.
  37. Genome Sequence of the Radioresistant Bacterium Deinococcus radiodurans R1 (get abstract).  Owen White et al., Science 286 (1999), 1571-1577.
  38. DNA uptake signal sequences in naturally transformable bacteria.  H.O. Smith, M.L. Gwinn, and S.L. Salzberg.  Research in Microbiology, 150 (1999), 603-616.
  39. Evidence for lateral gene transfer between Archaea and Bacteria from genome sequence of Thermotoga maritima (get abstract).  Karen E. Nelson et al., Nature 399 (1999), 323-329.
  40. Chromosome 2 sequence of the human malaria parasite Plasmodium falciparum (get abstract).  Malcolm J. Gardner et al., Science 282 (1998), 1126-1132.
  41. Complete Genomic Sequence of Treponema pallidum, the Syphilis Spirochete.  C.M. Fraser et al., Science 281 (1998), 375-388.
  42. Genomic Sequence of a Lyme Disease Spirochaete, Borrelia burgdorferi.   C.M. Fraser et al., Nature 390 (1997), 580-586.
Bioinformatics papers (and one book)
  1. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome.  B. Langmead, C. Trapnell, M. Pop, and S.L. Salzberg.  Genome Biology 2009, 10:R25. doi:10.1186/gb-2009-10-3-r25.
  2. TopHat: discovering splice junctions with RNA-Seq.  C. Trapnell, L. Pachter, and S.L. Salzberg.  Bioinformatics, doi:10.1093/bioinformatics/btp120 (published in advance access online, March 16, 2009).
  3. How to map billions of short reads onto genomes.  C. Trapnell and S.L. Salzberg.  Nature Biotechnology 27:5 (2009), 455-7.
  4. OperonDB: a comprehensive database of predicted operons in microbial genomes.  M. Pertea, K. Ayanbule, M. Smedinghoff, and S.L. Salzberg. Nucleic Acids Research doi:10.1093/nar/gkn784.
  5. Gene-Boosted Assembly of a Novel Bacterial Genome from Very Short Reads.  S.L. Salzberg, D.D. Sommer,  D. Puiu, and V.T. Lee. PLoS Computational Biology 4:9 (2008): e1000186.
  6. Bioinformatics challenges of new sequencing technology.  Mihai Pop and Steven L. Salzberg, Trends in Genetics 24:3 (2008), 142-149.
  7. Automated eukaryotic gene structure annotation using EVidenceModeler.  B.J. Haas, S.L. Salzberg, et al.  Genome Biology 2008, 9:R7.
  8. Identifying bacterial genes and endosymbiont DNA with Glimmer. A.L. Delcher, K.A. Bratke, E.C. Powers, and S.L. Salzberg. Bioinformatics 2007 Mar 15;23(6):673-9. This is the Glimmer 3 paper.
  9. Rapid, accurate, computational discovery of Rho-independent transcription terminators illuminates their relationship to DNA uptake.
    C.L. Kingsford, K. Ayanbule, and S.L. Salzberg.  Genome Biology 2007;8(2):R22.
  10. Comprehensive DNA signature discovery and validation.  A.M. Phillippy, J.A. Mason, K. Ayanbule, D.D. Sommer, E. Taviani, A. Huq, R.R. Colwell, I.T. Knight, and S.L. Salzberg.  PLoS Computational Biology 3:5 (2007), e98.
  11. Hawkeye: an interactive visual analytics tool for genome assemblies.  M. Schatz, A.M. Phillippy, B. Shneiderman, and S.L. Salzberg.  Genome Biology 2007 Mar 9;8(3):R34.
  12. Minimus: a fast, lightweight genome assembler.  D.D. Sommer, A.L. Delcher, S.L. Salzberg, and M. Pop.  BMC Bioinformatics 2007 Feb 26;8:64.
  13. A phylogenetic generalized hidden Markov model for predicting alternatively spliced exons. J.E. Allen and S.L. Salzberg. Algorithms for Molecular Biology 1:14 (2006).
  14. JIGSAW, GeneZilla, and GlimmerHMM: puzzling out the features of human genes in the ENCODE regions.  J.E. Allen, W.H. Majoros, M. Pertea, and S.L. Salzberg.  Genome Biology 2006, 7(Suppl):S9.
  15. Efficient implementation of a generalized pair hidden Markov model for comparative gene finding.  W.H. Majoros, M. Pertea, and S.L. Salzberg. Bioinformatics 21:9 (2005), 1782-88.
  16. Efficient decoding algorithms for generalized hidden Markov model gene finders.  W.H. Majoros, M. Pertea, A.L. Delcher, and S.L. Salzberg.  BMC Bioinformatics 6 (2005), 16.
  17. Comparative genome assembly.  M. Pop, A. Phillippy, A.L. Delcher, and S.L. Salzberg.  Briefings in Bioinformatics 5:3 (2004), 237-248.
  18. Automated correction of genome sequence errors.  P. Gajer, M. Schatz, and S.L. Salzberg.  Nucleic Acids Research 32:2 (2004), 562-569.  This describes the AutoEditor system, with open source code available here.
  19. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders.  W.H. Majoros, M. Pertea, and S.L. Salzberg.  Bioinformatics 20:16 (2004), 2878-79.
  20. An empirical analysis of training protocols for probabilistic gene finders.  W.H. Majoros and S.L. Salzberg.  BMC Bioinformatics 5 (2004), 206.
  21. Versatile and open software for comparing large genomes.  S. Kurtz, A. Phillippy, A.L. Delcher, M. Smoot, M. Shumway, C. Antonescu, and S.L. Salzberg.  Genome Biology 5:R12 (2004), http://genomebiology.com/2004/5/2/R12.  This is the MUMmer 3 paper, with open source code available here.
  22. DAGChainer: A tool for mining segmental genome duplications and synteny.  B.J. Haas, A.L. Delcher, J.R. Wortman, and S.L. Salzberg.  Bioinformatics 20:18 (2004), 3643-6.
  23. Hierarchical scaffolding with Bambus. M. Pop, D. Kosack, and S.L. Salzberg.  Genome Research 14 (2004), 149-159.  This describes our open source system for the scaffolding phase of genome assembly.
  24. Computational gene prediction using multiple sources of evidence.  J.E. Allen, M. Pertea, and S.L. Salzberg.  Genome Research 14 (2004), 142-148.  This describes our open source system for producing a gene prediction based on multiple gene finders, alignment programs, and other evidence.
  25. Fast algorithms for large-scale genome alignment and comparison (Abstract) (Full text PDF) A.L. Delcher, A. Phillippy, J. Carlton, and S.L. Salzberg. Nucleic Acids Research 30:11 (2002), 2478-2483.  (This is the MUMmer 2 paper.)
  26. Full-length messenger RNA sequences greatly improve genome annotation.  B.J. Haas, N. Volfovsky, C.D. Town, M. Troukhan, N. Alexandrov, K.A. Feldmann, R.B. Flavell, O. White, and S.L. Salzberg.  Genome Biology 3:6 (2002), research0029.1-12.
  27. Book: Computational Methods in Molecular Biology (1998; in paperback since 1999) edited by S.L. Salzberg, D.B. Searls, and S. Kasif. See the table of contents here.
  28. GeneSplicer: a new computational method for splice site prediction.  M. Pertea, X. Lin, and S.L. Salzberg.  Nucleic Acids Research 29:5 (2001), 1185-1190.
  29. A probabilistic method for identifying start codons in bacterial genomes.  B.E. Suzek, M.D. Ermolaeva, M. Schreiber, and S.L. Salzberg.  Bioinformatics 17:12 (2001), 1123-1130.
  30. Prediction of operons in microbial genomes. M.D. Ermolaeva, O. White and S.L. Salzberg.  Nucleic Acids Research 29:5 (2001), 1216-1221.
  31. A clustering method for repeat analysis in DNA sequences.  N. Volfovsky, B.J. Haas, and S.L. Salzberg.  Genome Biology 2:8 (2001), research0027:1-11.  This describes the RepeatFinder software.
  32. Finding genes in Plasmodium falciparum chromosome 3.  M. Pertea, S.L. Salzberg, and M.J. Gardner. Nature 404 (2000), 34.
  33. An optimized protocol for analysis of EST sequences.  F. Liang, I.E. Holt, G. Pertea, S. Karamycheva, S.L. Salzberg, and J. Quackenbush.  Nucleic Acids Research 28:18 (2000), 3657-3665.
  34. Prediction of transcription terminators in bacterial genomes (get abstract).  M.D. Ermolaeva, H. Khalak, O. White, H.O. Smith, and S.L. Salzberg.  J. Molecular Biology 301 (2000), 27-33.
  35. Improved microbial gene identification with GLIMMER.  A.L. Delcher, D. Harmon, S. Kasif, O. White, and S.L. Salzberg.  Nucleic Acids Research, 27:23 (1999), 4636-4641.
  36. Interpolated Markov models for eukaryotic gene finding.  S.L. Salzberg, M. Pertea, A.L. Delcher, M.J. Gardner, and H. Tettelin.  Genomics, 59 (1999), 24-31.  This describes the GlimmerM gene finder, available below.
  37. Alignment of Whole Genomes.  A.L. Delcher, S. Kasif, R.D. Fleischmann, J. Peterson, O. White, and S.L. Salzberg.  Nucleic Acids Research, 27:11 (1999), 2369-2376.  Note that Figure 6 is supposed to be in color, and was mistakenly printed as black and white.  Click here for the color figure (PDF).  This describes the MUMmer system, available below.
  38. Microbial gene identification using interpolated Markov models. S.L. Salzberg, A.L. Delcher, S. Kasif, and O. White. Nucleic Acids Research, 26:2 (1998), 544-548. This paper describes the original Glimmer system (version 1.0), available here.
  39. Skewed Oligomers and Origins of Replication. S.L. Salzberg, A.J. Salzberg, A.R. Kerlavage, and J.-F. Tomb. Gene 217:1-2 (1998), 57-67.
  40. Finding Genes in Human DNA with a Hidden Markov Model. J. Henderson, S.L. Salzberg, and K. Fasman. Journal of Computational Biology 4:2 (1997), 127-141. This describes the VEIL system for finding genes.
  41. A Decision Tree System for Finding Genes in DNA (preprint only).  S.L. Salzberg, A.L. Delcher, K. Fasman, and J. Henderson. Journal of Computational Biology 5:4 (1998), 667-680.
  42. A Method for Identifying Splice Sites and Translational Start Sites in Eukaryotic mRNA. S.L. Salzberg.  Computer Applications in the Biosciences (CABIOS) 13:4 (1997), 365-376. 
  43. Locating Protein Coding Regions in Human DNA using a Decision Tree Algorithm. S.L. Salzberg.  Journal of Computational Biology, 2:3 (1995), 473-485.

Bioinformatics software from my group, all open source

Computational Gene Finding

  1. Glimmer, a system that uses interpolated Markov models to find genes in microbial DNA. March 2003: New release, version 2.1, automatically optimizes ORF length for training.
  2. JIGSAW, a program that predicts gene models using the output from multiple sources of evidence, including other gene finders, Blast searches, and other alignment data.
  3. GlimmerHMM (formerly GlimmerM), an interpolated Markov Model system for finding genes in many eukaryotes, including P. falciparum, A. thaliana, rice (O. sativa), mosquito (A. aegypti), B. malayi, C. neoformans, and others.
  4. GeneZilla, a generalized HMM for eukaryotic gene finding that improves upon and replaces TigrScan, a 2003-vintage generalized HMM with a design similar to Genscan.
  5. GeneSplicer, a fast system for detecting splice sites in genomic DNA of various eukaryotes.
  6. PIRATE, a website collecting many links to our gene finders and others.

Genome assembly and large-scale genome alignment

  1. MUMmer, a system for aligning whole genomes, chromosomes, and other very long DNA sequences. Since April 2003: MUMmer 3.0 and later releases are open source.
  2. Bowtie, an ultrafast system for aligning short reads from next-generation sequencers to the human genome and any other genome.
  3. The AMOS Assembler project is a set of tools, libraries, and freestanding genome assemblers, all open source. AMOS is also an open consortium that includes TIGR, the University of Maryland, The Karolinska Institutet, and the Marine Biological Laboratory.
  4. Hawkeye, a flexible graphical interface to genome assemblies from a variety of assemblers.
  5. AMOScmp is a comparative genome assembler, which uses one genome as a reference on which to assemble another, closely related species.  See the journal paper here.
  6. MINIMUS is a small, lightweight assembler for small jobs such as assembling a viral genome, assembling a set of reads that match a single gene, or other tasks that don't require the complex infrastructure of a large-genome assembler.
  7. BAMBUS, the first publicly available, standalone genome sequence scaffolding program. It orders and orients contigs into scaffolds based on various types of linking information.
  8. AutoEditor, a tool for correcting sequencing and basecaller errors using sequence assembly and chromatogram data. On average AutoEditor corrects 80% of erroneous base calls, with an accuracy of 99.99%.

Transcription terminators, operons, and motif analysis tools

  1. TransTermHP (new release, spring 2008), a program that finds rho-independent transcription terminators in bacterial genomes.
  2. OperonDB (new release, fall 2008), results from our operon-finding software on a large number of prokaryotic genomes.  (Originally described in Ermolaeva et al., Prediction of operons in microbial genomes.)
  3. ELPH, a motif finder that can find ribosome binding sites, exon splicing enhancers, or regulatory sites.
  4. SeeESE, an online tool for identifying exon splicing enhancers (ESEs) in Arabidopsis, Drosophila, and other species.
  5. Skewed oligomers from bacterial and archaeal genomes (described in Gene 217:1-2 (1998), listed above).  Get the source code or Linux executable here. Tables of skewed oligomers for: A. fulgidus, B. burgdorferi, B. subtilis, C. trachomatis, E. coli, H. influenzae, H. pylori, M. genitalium, M. jannaschii, M. pneumoniae, M. thermoautotrophicum, Synechocystis sp. PCC 6803, T. maritima, T. pallidum.
Laboratory members, current
Former lab members

Machine Learning systems (pre-1995)

  1. The OC1 decision tree system (source code included)
  2. The PEBLS memory-based reasoning system (source code included)

Students' Ph.D. theses

Courses

Personal

My father, Herman Salzberg, has a home page at the University of South Carolina.
My brother Alan Salzberg is CEO of Analysis and Inference, Inc., a statistical consulting company.
My wife Claudia and I have two daughters, Annika and Alyssa.