Overview
GlimmerHMM is a new gene finder based on a Generalized Hidden
Markov Model (GHMM). Although the gene finder conforms to the overall
mathematical framework of a GHMM, it additionally incorporates splice
site models adapted from the GeneSplicer program and a decision tree
adapted from GlimmerM. It also uses Interpolated Markov Models for the
coding and noncoding models.
Currently, GlimmerHMM's GHMM structure includes introns of each phase,
intergenic regions, and four types of exons (initial, internal, final,
and single). A basic user manual can be consulted here.
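As a rough illustration of the state structure described above, the following Python sketch lists the state types and derives the state path for a forward-strand gene from its coding exon lengths. It is not GlimmerHMM code: the names `State`, `intron_phases`, and `state_path` are hypothetical, and the phase convention (number of coding bases preceding the intron, modulo 3) is a common one that is assumed here rather than taken from the GlimmerHMM source.

```python
from enum import Enum


class State(Enum):
    """State types named in the overview (forward strand only, by assumption)."""
    INTERGENIC = "intergenic"
    SINGLE_EXON = "single exon"
    INITIAL_EXON = "initial exon"
    INTERNAL_EXON = "internal exon"
    FINAL_EXON = "final exon"
    INTRON_0 = "intron, phase 0"
    INTRON_1 = "intron, phase 1"
    INTRON_2 = "intron, phase 2"


# Index an intron state by its phase.
INTRONS = (State.INTRON_0, State.INTRON_1, State.INTRON_2)


def intron_phases(exon_lengths):
    """Return the phase of each intron, given coding exon lengths in order.

    Assumed convention: phase = (coding bases before the intron) mod 3.
    """
    phases = []
    coding = 0
    for length in exon_lengths[:-1]:  # no intron follows the last exon
        coding += length
        phases.append(coding % 3)
    return phases


def state_path(exon_lengths):
    """Return the sequence of states visited by one gene and its flanks."""
    if not exon_lengths:
        raise ValueError("a gene needs at least one exon")
    if sum(exon_lengths) % 3 != 0:
        raise ValueError("total coding length must be a multiple of 3")
    if len(exon_lengths) == 1:
        return [State.INTERGENIC, State.SINGLE_EXON, State.INTERGENIC]
    phases = intron_phases(exon_lengths)
    path = [State.INTERGENIC, State.INITIAL_EXON, INTRONS[phases[0]]]
    for phase in phases[1:]:
        path += [State.INTERNAL_EXON, INTRONS[phase]]
    path += [State.FINAL_EXON, State.INTERGENIC]
    return path


if __name__ == "__main__":
    # Three coding exons of 100, 50 and 30 bp (180 bp of coding sequence).
    for state in state_path([100, 50, 30]):
        print(state.value)
```

Tracking the phase in the intron state is what lets a model of this kind keep the reading frame consistent across exons without remembering the full exon history.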
System requirements
GlimmerHMM is released as source code and was tested on Linux
RedHat 6.x+, Sun Solaris, and Alpha OSF1, but should work on any Unix
system.
Accuracy
GlimmerHMM has been trained on several species including
Arabidopsis thaliana, Coccidioides species, Cryptococcus neoformans,
and Brugia malayi. New: trainings for C. elegans
and Danio rerio (zebrafish) are now available!
| Species | Nuc Sens | Nuc Spec | Nuc Acc | Exon Sens | Exon Spec | Exact Genes | Size of test set |
|---|---|---|---|---|---|---|---|
| D. rerio | 93% | 78% | 86% | 77% | 69% | 24% | 549 genes |
| C. elegans | 96% | 95% | 96% | 82% | 81% | 42% | 1886 genes |
| Arabidopsis | 97% | 99% | 98% | 84% | 89% | 60% | 809 genes |
| Cryptococcus | 96% | 99% | 98% | 86% | 88% | 53% | 350 genes |
| Coccidioides | 99% | 99% | 99% | 84% | 86% | 60% | 503 genes |
| Brugia | 93% | 98% | 95% | 78% | 83% | 25% | 477 genes |
GlimmerHMM has recently been trained on human data. The table below
compares its performance with that of Genscan on 963 human RefSeq genes
selected randomly from all 24 chromosomes and not overlapping the
training set. The test set contains 1000 bp of untranslated sequence on
each side (5' and 3') of the coding portion of each gene.
| Program | Nuc Sens | Nuc Spec | Nuc Acc | Exon Sens | Exon Spec | Exon Acc | Exact Genes |
|---|---|---|---|---|---|---|---|
| GlimmerHMM | 86% | 72% | 79% | 72% | 62% | 67% | 17% |
| Genscan | 86% | 68% | 77% | 69% | 60% | 65% | 13% |
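The page does not define the column headings, so the following Python sketch shows one way such figures are commonly computed in gene-finding evaluations. It assumes that sensitivity is TP / (TP + FN), that "specificity" is TP / (TP + FP) as is customary in this field, and that accuracy is the mean of the two, which is consistent with the rows above (for example, 86% and 72% give 79%). The function names and the inclusive `(start, end)` exon representation are hypothetical and are not part of GlimmerHMM.

```python
def coding_positions(exons):
    """Expand inclusive (start, end) exon intervals into a set of positions."""
    positions = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def sens_spec(true_items, predicted_items):
    """Return (sensitivity, specificity) for two sets of items.

    Assumed definitions: sensitivity = TP / |true|,
    specificity = TP / |predicted|.
    """
    tp = len(true_items & predicted_items)
    sens = tp / len(true_items) if true_items else 0.0
    spec = tp / len(predicted_items) if predicted_items else 0.0
    return sens, spec


def evaluate(annotated, predicted):
    """Compute nucleotide-level and exact-exon-level scores for one gene."""
    nuc_sn, nuc_sp = sens_spec(coding_positions(annotated),
                               coding_positions(predicted))
    # An exon counts as correct only if both boundaries match exactly.
    exon_sn, exon_sp = sens_spec(set(annotated), set(predicted))
    return {
        "nuc_sens": nuc_sn,
        "nuc_spec": nuc_sp,
        "nuc_acc": (nuc_sn + nuc_sp) / 2,
        "exon_sens": exon_sn,
        "exon_spec": exon_sp,
        "exon_acc": (exon_sn + exon_sp) / 2,
    }


if __name__ == "__main__":
    annotated = [(1, 100), (201, 300)]   # made-up annotation
    predicted = [(1, 100), (211, 320)]   # made-up prediction
    for name, value in evaluate(annotated, predicted).items():
        print(f"{name}: {value:.1%}")
```

In this made-up example the second exon overlaps the annotation at most positions, so the nucleotide-level scores stay high while the exon-level scores drop to 50%. The same effect explains why the exon and exact-gene columns in the tables are lower than the nucleotide columns.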
Obtaining GlimmerHMM
This software is OSI Certified Open Source Software.
To download the complete GlimmerHMM system, just click here.
New: human trainings are also available!
After downloading, uncompress the distribution file by typing:
% tar -xzf GlimmerHMM.tar.gz
A directory named 'GlimmerHMM/' will be created which
contains the executable, training data sets, and other supporting
files.
Contact Information
Use this
form to contact us.
References
Majoros, W.H., Pertea, M., and Salzberg, S.L. (2004).
"TigrScan and GlimmerHMM: two open-source ab initio eukaryotic
gene-finders." Bioinformatics 20: 2878-2879.
Pertea, M. and S. L. Salzberg (2002).
"Computational gene finding in plants." Plant Molecular Biology
48(1-2): 39-48.
The Arabidopsis Genome Initiative (2000).
"Analysis of the genome sequence of the flowering plant Arabidopsis
thaliana." Nature 408(6814): 796-815.
Pertea, M., S. L. Salzberg, et al. (2000).
"Finding genes in Plasmodium falciparum." Nature 404(6773): 34;
discussion 34-5.
Salzberg, S. L., M. Pertea, et al. (1999).
"Interpolated Markov models for eukaryotic gene finding." Genomics
59(1): 24-31.
Acknowledgements
The development of GlimmerHMM was supported by the NIH under grants R01-LM06845 and
R01-LM007938.