Gene Finding
CBCB faculty: Steven Salzberg, Art Delcher, Mihaela Pertea
Gene Finding Tools
Gene-finding software created by our group includes:
- Glimmer: a system for finding genes in microbial DNA, especially the genomes of bacteria and archaea.
- TWAIN: a new syntenic gene finder that employs a Generalized Pair Hidden Markov Model (GPHMM) to predict genes in two closely related eukaryotic genomes simultaneously.
- GlimmerHMM: a Generalized Hidden Markov Model (GHMM) gene finder that uses two techniques previously implemented in GlimmerM, namely splice site modules and Interpolated Markov Models.
- GeneZilla: a gene finder based on the GHMM framework, similar to Genscan and Genie.
- GeneSplicer: a fast, flexible system for detecting splice sites in the genomic DNA of various eukaryotes.
- ExAlt: a Phylogenetic Generalized Hidden Markov Model for finding alternatively spliced exons.
- JIGSAW: a program that predicts gene models from the output of other annotation software; it uses a statistical algorithm to identify patterns of evidence corresponding to gene models.
- RBSfinder: a Perl script that implements an algorithm to find ribosome binding sites for genes in bacterial and archaeal genomes.
All of the software tools are OSI Certified Open
Source Software.
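Glimmer and GlimmerHMM both build on Interpolated Markov Models (IMMs), which score each base by blending Markov chains of several orders and trusting a longer context more when it was seen often in training. The Python sketch below illustrates that idea only; it is not Glimmer's code. The class name, the `smoothing` parameter, and the weighting rule `total / (total + smoothing)` are our own assumptions. Glimmer's published weighting scheme is more involved, and Glimmer also models the three codon positions separately, which this sketch omits.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


class InterpolatedMarkovModel:
    """Simplified IMM over DNA that mixes Markov chains of order 0..max_order.

    Assumption: the weight of an order-k context is total / (total + smoothing),
    a hypothetical rule chosen for illustration, not Glimmer's own weighting.
    """

    def __init__(self, max_order=5, smoothing=16.0):
        self.max_order = max_order
        self.smoothing = smoothing  # hypothetical: larger values trust long contexts less
        # counts[context][base]: how often `base` followed `context` (length 0..max_order)
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, sequences):
        for seq in sequences:
            for i, base in enumerate(seq):
                # record this base under every preceding context of length 0..max_order
                for k in range(min(i, self.max_order) + 1):
                    self.counts[seq[i - k:i]][base] += 1

    def prob(self, context, base):
        """Interpolated P(base | context), built up from shorter to longer contexts."""
        p = 1.0 / len(ALPHABET)  # uniform fallback keeps unseen bases above zero
        for k in range(min(self.max_order, len(context)) + 1):
            ctx = context[len(context) - k:]
            seen = self.counts.get(ctx)
            total = sum(seen.values()) if seen else 0
            if total == 0:
                continue  # context never observed: keep the lower-order estimate
            lam = total / (total + self.smoothing)  # always < 1, so p stays positive
            p = lam * seen.get(base, 0) / total + (1.0 - lam) * p
        return p

    def log_prob(self, seq):
        """Log-likelihood of seq, conditioning each base on up to max_order predecessors."""
        return sum(
            math.log(self.prob(seq[max(0, i - self.max_order):i], base))
            for i, base in enumerate(seq)
        )
```

To use it for gene finding, one would train one model on known coding sequence and another on noncoding sequence, then score a candidate open reading frame by the difference of the two `log_prob` values; a positive difference favors the coding model.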
Motif and Regulatory Site Prediction
- Crab: a regulatory site prediction web resource; includes 80 archaeal and bacterial genomes.
- OperonDB: an operon prediction website.
- TransTerm: a program that finds rho-independent transcription terminators in bacterial genomes.
- ELPH: a general-purpose Gibbs sampler for finding motifs in a set of DNA or protein sequences.
- SEE ESE: an online tool for identifying and visualizing exonic splicing enhancers (ESEs).
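ELPH is described above only as a general-purpose Gibbs sampler for motifs. As a rough illustration of that technique, the Python sketch below implements a basic site sampler: it assumes exactly one motif occurrence of fixed width per DNA sequence, repeatedly sets one sequence aside, builds a position weight matrix from the remaining sites, and resamples that sequence's site in proportion to its motif-versus-background likelihood ratio. It is not ELPH's implementation; the function names, the pseudocount of 1, and the restriction to the letters A, C, G, T are assumptions.

```python
import math
import random

ALPHABET = "ACGT"  # assumption: sequences contain only A, C, G, T


def profile(sites, width, pseudo=1.0):
    """Position weight matrix: per-column base frequencies with pseudocounts."""
    pwm = []
    for j in range(width):
        column = [s[j] for s in sites]
        total = len(column) + pseudo * len(ALPHABET)
        pwm.append({b: (column.count(b) + pseudo) / total for b in ALPHABET})
    return pwm


def background(seqs, pseudo=1.0):
    """Overall base frequencies across all input sequences."""
    counts = {b: pseudo for b in ALPHABET}
    for s in seqs:
        for c in s:
            counts[c] += 1
    total = sum(counts.values())
    return {b: n / total for b, n in counts.items()}


def log_odds(word, pwm, bg):
    """Log-likelihood ratio of a word under the motif model vs. the background."""
    return sum(math.log(pwm[j][c] / bg[c]) for j, c in enumerate(word))


def alignment_score(seqs, starts, width, bg):
    """Total log-odds of all current sites under a profile built from them."""
    sites = [s[p:p + width] for s, p in zip(seqs, starts)]
    pwm = profile(sites, width)
    return sum(log_odds(w, pwm, bg) for w in sites)


def gibbs_motif_search(seqs, width, iterations=2000, seed=0):
    """Return one start position per sequence for the best alignment visited."""
    rng = random.Random(seed)
    bg = background(seqs)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    best = list(starts)
    best_score = alignment_score(seqs, starts, width, bg)
    for _ in range(iterations):
        i = rng.randrange(len(seqs))  # sequence whose site is resampled this round
        others = [s[p:p + width] for k, (s, p) in enumerate(zip(seqs, starts)) if k != i]
        pwm = profile(others, width)  # motif model built without sequence i
        positions = range(len(seqs[i]) - width + 1)
        weights = [math.exp(log_odds(seqs[i][p:p + width], pwm, bg)) for p in positions]
        starts[i] = rng.choices(positions, weights=weights)[0]
        current = alignment_score(seqs, starts, width, bg)
        if current > best_score:
            best, best_score = list(starts), current
    return best
```

For example, `gibbs_motif_search(["ACGTTTGACAGGCA", "TTGACATTTGGCCA", "GGCATGCTTGACAA"], 6)` searches for a shared 6-base site. Because the sampler is stochastic, a practical run would repeat the search from several seeds and keep the highest-scoring alignment.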
Gene Finding Training Data Sets
Aspergillus.fumigatus.tar.gz
Arabidopsis.thaliana.tar.gz
Homo.sapiens.tar.gz
Mus.musculus.tar.gz
Plasmodium.falciparum.tar.gz
See for a central repository for gene finding and other related topics.
Publications
- Allen JE, Salzberg SL. A phylogenetic generalized hidden Markov model for predicting alternatively spliced exons. Algorithms for Molecular Biology. 2006;1:14.
- Allen JE, Majoros WH, Pertea M, Salzberg SL. JIGSAW, GeneZilla, and GlimmerHMM: puzzling out the features of human genes in the ENCODE regions. Genome Biology. 2006;7(S1):S9.
- Allen JE, Salzberg SL. JIGSAW: integration of multiple sources of evidence for gene prediction. Bioinformatics. 2005;21(18):3596-3603.
- Majoros WH, Pertea M, Salzberg SL. Efficient implementation of a generalized pair hidden Markov model for comparative gene finding. Bioinformatics. 2005;21(9):1782-1788.
- Majoros WH, Pertea M, Delcher AL, Salzberg SL. Efficient decoding algorithms for generalized hidden Markov model gene finders. BMC Bioinformatics. 2005 Jan 24;6(1):16.
- Majoros WH, Pertea M, Salzberg SL. TigrScan and GlimmerHMM: two open-source ab initio eukaryotic gene-finders. Bioinformatics. 2004 Nov 1;20(16):2878-2879.
- Allen JE, Pertea M, Salzberg SL. Computational gene prediction using multiple sources of evidence. Genome Res. 2004 Jan;14(1):142-148.
- Majoros WH, Pertea M, Antonescu C, Salzberg SL. GlimmerM, Exonomy and Unveil: three ab initio eukaryotic genefinders. Nucleic Acids Res. 2003 Jul 1;31(13):3601-3604.
- Weinel C, Ermolaeva MD, Ouzounis C. PseuRECA: genome annotation and gene context analysis for Pseudomonas aeruginosa PAO1. Bioinformatics. 2003 Aug 12;19(12):1457-1460.
- Pertea M, Salzberg SL. Computational gene finding in plants. Plant Mol Biol. 2002;48(1-2):39-48.
- Pertea M, Salzberg SL. Using GlimmerM to find genes in eukaryotic genomes. Current Protocols in Bioinformatics. 2002.
- Pertea M, Lin X, Salzberg SL. GeneSplicer: a new computational method for splice site prediction. Nucleic Acids Res. 2001 Mar 1;29(5):1185-1190.
- Ermolaeva MD, White O, Salzberg SL. Prediction of operons in microbial genomes. Nucleic Acids Res. 2001 Mar 1;29(5):1216-1221.
- Suzek BE, Ermolaeva MD, Schreiber M, Salzberg SL. A probabilistic method for identifying start codons in bacterial genomes. Bioinformatics. 2001 Dec;17(12):1123-1130.
- Ermolaeva MD, Khalak HG, White O, Smith HO, Salzberg SL. Prediction of transcription terminators in bacterial genomes. J Mol Biol. 2000;301(1):27-33.
- Pertea M, Salzberg SL, Gardner MJ. Finding genes in Plasmodium falciparum. Nature. 2000 Mar 2;404(6773):34.
- Delcher AL, Harmon D, Kasif S, White O, Salzberg SL. Improved microbial gene identification with GLIMMER. Nucleic Acids Research. 1999;27(23):4636-4641.
- Salzberg SL, Pertea M, Delcher AL, Gardner MJ, Tettelin H. Interpolated Markov models for eukaryotic gene finding. Genomics. 1999 Jul 1;59(1):24-31.
- Salzberg S, Delcher A, Kasif S, White O. Microbial gene identification using interpolated Markov models. Nucleic Acids Research. 1998;26(2):544-548. Reproduced with permission from NAR Online at http://www.oup.co.uk/nar.