Gene Finding

CBCB faculty: Steven Salzberg, Art Delcher, Mihaela Pertea

Gene Finding Tools

Gene-finding software created by our group includes:

  • Glimmer, a system for finding genes in microbial DNA, especially
    the genomes of bacteria and archaea.
  • TWAIN, a new syntenic gene finder that employs a Generalized Pair
    Hidden Markov Model (GPHMM) to predict genes in two closely related
    eukaryotic genomes simultaneously.
  • GlimmerHMM, a Generalized Hidden Markov Model (GHMM) gene finder
    that makes use of the techniques implemented previously by
    GlimmerM: splice site modules and Interpolated Markov Models.
  • GeneZilla, a gene finder based on the GHMM framework, similar to
    Genscan and Genie.
  • GeneSplicer, a fast, flexible system for detecting splice sites in
    the genomic DNA of various eukaryotes.
  • ExAlt, a Phylogenetic Generalized Hidden Markov Model for finding
    alternatively spliced exons.
  • JIGSAW, a program that predicts gene models using the output
    from other annotation software; it uses a statistical algorithm to
    identify patterns of evidence corresponding to gene models.
  • RBSfinder, a Perl script that implements an algorithm to find
    ribosome binding sites for genes in bacterial and archaeal genomes.


All of the software tools are OSI Certified Open Source Software.


Motif and Regulatory Site Prediction

  • Crab, a regulatory site prediction web resource that includes 80
    archaeal and bacterial genomes.
  • OperonDB, an operon prediction website.
  • TransTerm, a program that finds rho-independent transcription
    terminators in bacterial genomes.
  • ELPH, a general-purpose Gibbs sampler for finding motifs in a set
    of DNA or protein sequences.
  • SEE ESE, an online tool for identifying and visualizing exonic
    splicing enhancers (ESEs).


Gene Finding Training Data Sets


Aspergillus.fumigatus.tar.gz
Arabidopsis.thaliana.tar.gz
Homo.sapiens.tar.gz
Mus.musculus.tar.gz
Plasmodium.falciparum.tar.gz

See for a central repository for gene finding and other related topics.

Publications

  1. Allen JE and Salzberg SL.
    A phylogenetic generalized hidden Markov model for predicting alternatively spliced exons.
    Algorithms for Molecular Biology. 2006 1:14.
  2. Allen JE, Majoros WH, Pertea M, and Salzberg SL.
    JIGSAW, GeneZilla, and GlimmerHMM: puzzling out the features of human genes in the ENCODE regions.
    Genome Biology. 2006 7(S1):S9.
  3. Allen JE and Salzberg SL. JIGSAW: integration of multiple sources of evidence for gene prediction.
    Bioinformatics. 2005 21(18):3596-3603.
  4. Majoros WH, Pertea M, Salzberg SL.
    Efficient implementation of a generalized pair hidden Markov model for comparative gene finding.
    Bioinformatics. 2005 21(9):1782-1788.
  5. Majoros WH, Pertea M, Delcher AL, Salzberg SL.
    Efficient decoding algorithms for generalized hidden Markov model gene finders.
    BMC Bioinformatics. 2005 Jan 24;6(1):16.
  6. Majoros WH, Pertea M, Salzberg SL.
    TigrScan and GlimmerHMM: two open-source ab initio eukaryotic gene-finders.
    Bioinformatics. 2004 Nov 1;20(16):2878-9.
  7. Allen JE, Pertea M, Salzberg SL.
    Computational gene prediction using multiple sources of evidence.
    Genome Res. 2004 Jan;14(1):142-8.
  8. Majoros WH, Pertea M, Antonescu C, Salzberg SL.
    GlimmerM, Exonomy and Unveil: three ab initio eukaryotic genefinders.
    Nucleic Acids Res. 2003 Jul 1;31(13):3601-4.
  9. Weinel C, Ermolaeva MD, Ouzounis C.
    PseuRECA: genome annotation and gene context analysis for Pseudomonas aeruginosa PAO1.
    Bioinformatics. 2003 Aug 12;19(12):1457-60.
  10. Pertea, M. and Salzberg, S.L.
    Computational gene finding in plants.
    Plant Mol Biol 2002; 48(1-2):39-48.
  11. Pertea, M. and Salzberg, S.L.
    Using GlimmerM to find genes in eukaryotic genomes.
    Current Protocols in Bioinformatics, 2002.
  12. Pertea M, Lin X, Salzberg SL.
    GeneSplicer: a new computational method for splice site prediction.
    Nucleic Acids Res. 2001 Mar 1;29(5):1185-90.
  13. Ermolaeva MD, White O, Salzberg SL.
    Prediction of operons in microbial genomes.
    Nucleic Acids Res. 2001 Mar 1;29(5):1216-21.
  14. Suzek BE, Ermolaeva MD, Schreiber M, Salzberg SL.
    A probabilistic method for identifying start codons in bacterial genomes.
    Bioinformatics. 2001 Dec;17(12):1123-30.
  15. Maria D. Ermolaeva, Hanif G. Khalak, Owen White, Hamilton O. Smith and Steven L. Salzberg.
    Prediction of Transcription Terminators in Bacterial Genomes.
    J Mol Biol 301, (1), 27-33 (2000)
  16. Pertea M, Salzberg SL, Gardner MJ.
    Finding genes in Plasmodium falciparum.
    Nature, 2000 Mar 2;404(6773):34.
  17. A.L. Delcher, D. Harmon, S. Kasif, O. White, and S.L. Salzberg.
    Improved microbial gene identification with GLIMMER.
    Nucleic Acids Research, 1999, 27:23, 4636-4641.
  18. Salzberg SL, Pertea M, Delcher AL, Gardner MJ, Tettelin H.
    Interpolated Markov models for eukaryotic gene finding.
    Genomics. 1999 Jul 1;59(1):24-31.
  19. S. Salzberg, A. Delcher, S. Kasif, and O. White.
    Microbial gene identification using interpolated Markov models.
    Nucleic Acids Research 26:2 (1998b), 544-548.
    Reproduced with permission from NAR Online at http://www.oup.co.uk/nar