Center for Bioinformatics and Computational Biology

CBCB faculty: Steven Salzberg, Art Delcher, Mihaela Pertea

Gene Finding Tools

Gene-finding software created by our group includes:
  • Glimmer, a system for finding genes in microbial DNA, especially the genomes of bacteria and archaea.
  • TWAIN, a new syntenic gene finder which employs a Generalized Pair Hidden Markov Model (GPHMM) to predict genes in two closely related eukaryotic genomes simultaneously.
  • GlimmerHMM, a Generalized Hidden Markov Model (GHMM) gene finder which makes use of the techniques implemented previously by GlimmerM: splice site modules and Interpolated Markov Models.
  • GeneZilla, a gene finder based on the GHMM framework, similar to Genscan and Genie.
  • GeneSplicer, a fast, flexible system for detecting splice sites in the genomic DNA of various eukaryotes.
  • ExAlt, a Phylogenetic Generalized Hidden Markov Model for finding alternatively spliced exons.
  • JIGSAW, a program that predicts gene models using the output from other annotation software; it uses a statistical algorithm to identify patterns of evidence corresponding to gene models.
  • RBSfinder, a Perl script that implements an algorithm to find ribosome binding sites for genes in bacterial and archaeal genomes.
All of the software tools are OSI Certified Open Source Software.

Motif and Regulatory Site Prediction

  • Crab, a regulatory site prediction web resource; includes 80 archaeal and bacterial genomes.
  • OperonDB, an operon prediction website.
  • TransTerm, a program that finds rho-independent transcription terminators in bacterial genomes.
  • ELPH, a general-purpose Gibbs sampler for finding motifs in a set of DNA or protein sequences.
  • SEE ESE, an online tool for identifying and visualizing exon splicing enhancers (ESEs).

Gene Finding Training Data Sets

Plasmodium.falciparum.tar.gz

See for a central repository for gene finding and other related topics.



Publications

  1. Allen JE and Salzberg SL. A phylogenetic generalized hidden Markov model for predicting alternatively spliced exons. Algorithms for Molecular Biology. 2006 1:14.
  2. Allen JE, Majoros WH, Pertea M, and Salzberg SL. JIGSAW, GeneZilla, and GlimmerHMM: puzzling out the features of human genes in the ENCODE regions. Genome Biology. 2006 7(S1):S9.
  3. Allen JE and Salzberg SL. JIGSAW: integration of multiple sources of evidence for gene prediction. Bioinformatics. 2005 21(18):3596-3603.
  4. Majoros WH, Pertea M, Salzberg SL. Efficient implementation of a generalized pair hidden Markov model for comparative gene finding. Bioinformatics. 2005 21(9):1782-1788.
  5. Majoros WH, Pertea M, Delcher AL, Salzberg SL. Efficient decoding algorithms for generalized hidden Markov model gene finders. BMC Bioinformatics. 2005 Jan 24;6(1):16.
  6. Majoros WH, Pertea M, Salzberg SL. TigrScan and GlimmerHMM: two open-source ab initio eukaryotic gene-finders. Bioinformatics. 2004 Nov 1;20(16):2878-9.
  7. Allen JE, Pertea M, Salzberg SL. Computational gene prediction using multiple sources of evidence. Genome Res. 2004 Jan;14(1):142-8.
  8. Majoros WH, Pertea M, Antonescu C, Salzberg SL. GlimmerM, Exonomy and Unveil: three ab initio eukaryotic genefinders. Nucleic Acids Res. 2003 Jul 1;31(13):3601-4.
  9. Weinel C, Ermolaeva MD, Ouzounis C. PseuRECA: genome annotation and gene context analysis for Pseudomonas aeruginosa PAO1. Bioinformatics. 2003 Aug 12;19(12):1457-60.
  10. Pertea, M. and Salzberg, S.L. Computational gene finding in plants. Plant Mol Biol 2002; 48(1-2):39-48.
  11. Pertea, M. and Salzberg, S.L. Using GlimmerM to find genes in eukaryotic genomes. Current Protocols in Bioinformatics, 2002.
  12. Pertea M, Lin X, Salzberg SL. GeneSplicer: a new computational method for splice site prediction. Nucleic Acids Res. 2001 Mar 1;29(5):1185-90.
  13. Ermolaeva MD, White O, Salzberg SL. Prediction of operons in microbial genomes. Nucleic Acids Res. 2001 Mar 1;29(5):1216-21.
  14. Suzek BE, Ermolaeva MD, Schreiber M, Salzberg SL. A probabilistic method for identifying start codons in bacterial genomes. Bioinformatics. 2001 Dec;17(12):1123-30.
  15. Maria D. Ermolaeva, Hanif G. Khalak, Owen White, Hamilton O. Smith and Steven L. Salzberg. Prediction of Transcription Terminators in Bacterial Genomes. J Mol Biol 301, (1), 27-33 (2000).
  16. Pertea M, Salzberg SL, Gardner MJ. Finding genes in Plasmodium falciparum. Nature, 2000 Mar 2;404(6773):34.
  17. A.L. Delcher, D. Harmon, S. Kasif, O. White, and S.L. Salzberg. Improved microbial gene identification with GLIMMER (306K, PDF format). Nucleic Acids Research, 1999, 27:23, 4636-4641.
  18. Salzberg SL, Pertea M, Delcher AL, Gardner MJ, Tettelin H. Interpolated Markov models for eukaryotic gene finding. Genomics. 1999 Jul 1;59(1):24-31.
  19. S. Salzberg, A. Delcher, S. Kasif, and O. White. Microbial gene identification using interpolated Markov models (73K, PDF format). Nucleic Acids Research 26:2 (1998b), 544-548. Reproduced with permission from NAR Online at http://www.oup.co.uk/nar