This is the page for the old version 2.13 of Glimmer. The
latest version of Glimmer
has reduced the number of false-positive predictions
and has improved the accuracy of start codon predictions.
For more about Glimmer3, including performance
comparisons with Glimmer2 and download information, click on
this link to Glimmer3
Glimmer v2.13 can still be downloaded as
glimmer213.tar.gz.
|
About Glimmer
Glimmer is a system for finding genes in microbial DNA, especially the genomes of bacteria, archaea, and viruses. Glimmer (Gene Locator and Interpolated Markov ModelER) uses interpolated Markov models (IMMs) to identify coding regions and distinguish them from noncoding DNA. The IMM approach, described in our Nucleic Acids Research paper on Glimmer 1.0 and in our subsequent paper on Glimmer 2.0, combines Markov models of 1st through 8th order, weighting each model according to its predictive power. Glimmer 1.0 and 2.0 use 3-periodic nonhomogeneous Markov models in their IMMs.
Glimmer is the primary microbial gene finder at TIGR, and has been used to annotate the complete genomes of over 80 bacterial species at TIGR and elsewhere. Its analyses of some of these genomes and others are available at the Comprehensive Microbial Resource site.
For our eukaryotic gene finders, go to the GlimmerHMM site.
|
Glimmer 2.13's Accuracy
Organism | Notes | Genes confirmed by homology | Found by GLIMMER 2.13 | Percent found | Total genes annotated | Total genes predicted
A. ferrooxidans | 2 | 2054 | 2026 | 98.6% | 3215 | 3178
A. fulgidus | 2 | 1129 | 1128 | 99.9% | 2431 | 2475
B. anthracis | 2 | 3458 | 3444 | 99.6% | 5507 | 5395
B. subtilis | 3 | 4063 | 3979 | 97.9% | 5231 | 4747
B. wolbachia | 2 | 712 | 710 | 99.7% | 1299 | 1226
C. crescentus | 2 | 2205 | 2186 | 99.1% | 3763 | 3890
C. jejuni | 1 | 1341 | 1340 | 99.9% | 1886 | 1869
C. perfringens | 2 | 2153 | 2144 | 99.6% | 2974 | 2863
C. tepidum | 2 | 1304 | 1299 | 99.6% | 2281 | 2165
D. ethenogenes | 2 | 1141 | 1127 | 98.8% | 1591 | 1544
E. coli | 2 | 861 | 855 | 99.3% | 4174 | 4121
F. succinogenes | 2 | 2113 | 2105 | 99.6% | 3256 | 3210
G. sulfurreducens | 2 | 2462 | 2433 | 98.8% | 3468 | 3711
H. influenzae | 2 | 1132 | 1131 | 99.9% | 1740 | 1785
H. pylori | 2 | 892 | 886 | 99.3% | 1587 | 1678
L. monocytogenes | 2 | 2084 | 2079 | 99.8% | 2847 | 2778
M. capsulatus | 2 | 2132 | 2093 | 98.2% | 3002 | 3434
M. tuberculosis | 2 | 2191 | 2177 | 99.4% | 4245 | 4245
N. meningitidis | 2 | 1202 | 1180 | 98.2% | 2154 | 2494
P. fluorescens | 2 | 4819 | 4790 | 99.4% | 6148 | 6968
P. gingivalis | 2 | 1254 | 1251 | 99.8% | 1988 | 2052
P. ruminicola | 2 | 2042 | 2040 | 99.9% | 2872 | 2907
S. agalactiae | 2 | 1487 | 1483 | 99.7% | 2053 | 2083
S. gordonii | 2 | 1702 | 1700 | 99.9% | 2090 | 2086
S. pneumoniae | 2 | 1425 | 1410 | 98.9% | 2236 | 2115
T. denticola | 3 | 1610 | 1603 | 99.6% | 2786 | 2743
T. maritima | 3 | 1101 | 1095 | 99.5% | 1872 | 1988
T. neapolitana | 3 | 1561 | 1556 | 99.7% | 1906 | 1964
T. pallidum | 3 | 576 | 571 | 99.1% | 1039 | 1069
V. spinosum | 2 | 3801 | 3791 | 99.7% | 9111 | 7332
Wolbachia | 2 | 746 | 745 | 99.9% | 1271 | 1256
Average Percentage Found | - | - | - | 99.36% | - | -
The table above shows Glimmer 2.13's accuracy for 31 complete
bacterial and archaeal genomes. Accuracy figures reflect
Glimmer's default settings for most parameters, while the
minimum gene length varies from 90 bp to 150 bp for different
organisms. The majority of the genes missed were very short,
either below the minimum or very close to it.
To determine accuracy here we consider only "confirmed genes," which we define as genes that have a significant database match to a gene in another organism.
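As a worked example of how the percentage column follows from the two counts, the short Python sketch below recomputes it for three rows copied from the table. The function name `percent_found` is made up for this illustration; the page does not describe any script used to produce the figures.

```python
# (organism, genes confirmed by homology, found by Glimmer 2.13),
# copied from three rows of the accuracy table
rows = [
    ("A. ferrooxidans", 2054, 2026),
    ("A. fulgidus", 1129, 1128),
    ("B. subtilis", 4063, 3979),
]


def percent_found(confirmed, found):
    """Share of confirmed genes that Glimmer found, as a percentage."""
    return round(100.0 * found / confirmed, 1)


for organism, confirmed, found in rows:
    print(f"{organism}: {percent_found(confirmed, found)}%")
```

Only the confirmed genes enter this calculation; the "total genes annotated" and "total genes predicted" columns play no part in it.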
Notes:
The above results were obtained by different training procedures:
(1) Glimmer was trained by first extracting all non-overlapping open reading frames (using the long-orfs program that comes with the system).
(2) Glimmer was trained on all genes with a significant protein match based on translated Blast searches.
(3) Glimmer was trained on all annotated open reading frames.
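Training procedure (1) starts from long, non-overlapping open reading frames. The Python sketch below shows one simple way such a training set could be selected. It is not the algorithm of the long-orfs program: the function names, the `min_len` default, the ATG-only start rule, the forward-strand-only scan, and the greedy longest-first selection are all assumptions made for illustration.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_len=90):
    """Return half-open (start, end) spans of ATG...stop ORFs, forward strand only."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous in-frame stop
            elif start is not None and codon in STOPS:
                if i + 3 - start >= min_len:
                    orfs.append((start, i + 3))
                start = None
    return orfs


def non_overlapping(orfs):
    """Greedily keep the longest ORFs that do not overlap one already kept."""
    kept = []
    for s, e in sorted(orfs, key=lambda o: o[1] - o[0], reverse=True):
        if all(e <= ks or s >= ke for ks, ke in kept):
            kept.append((s, e))
    return sorted(kept)


toy = "ATG" + "GCA" * 40 + "TAA"
print(non_overlapping(find_orfs(toy)))
```

A real extraction step would also scan the reverse strand and allow alternative start codons; the readme files that come with Glimmer describe how long-orfs is actually used.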
The number of matches above represents both those ORFs where
Glimmer finds only the correct stop codon and those for which
Glimmer finds both start and stop codons with the correct
coordinates. The number of start codons found can be improved
for most genomes by using the Perl program RBS-finder on the
original Glimmer output. This program moves the start codons
according to the position of likely ribosome binding
sites. Glimmer3 will have this RBS functionality built into
the code as well as many other improvements.
A sample of Glimmer 2.13 output for H. pylori is contained here.
The Glimmer 2.13 output format is explained in this readme
file.
|
Speed
For genomes under 2 megabases (e.g., H. pylori and
H. influenzae), Glimmer 2.13 requires under 30 seconds
for training (the build-icm program)
on a Linux PC powered by a 600 MHz Pentium processor. For
the biggest genome in the list above, V. spinosum (8.5 Mb),
training takes under 90 seconds. Gene finding
(the glimmer2 program) then takes only 20 seconds for
small genomes and up to 5 minutes for bigger genomes.
Note to Glimmer users: it is always preferable to train
Glimmer on a sample of genes from the same genome
that you are finding genes in. This is easy to do with any
bacterial genome, using the long-orfs program to extract long
open reading frames that can be used to bootstrap the
system. (This is explained in the readme files that come with
Glimmer.) If you wish to search for genes in a short fragment
of DNA, Glimmer needs to be trained on a longer sequence. The
best strategy is to train on a closely similar genome.
|
Obtaining Glimmer
This software is OSI Certified Open Source Software.
To download the complete Glimmer 2.13 system, just click here.
After downloading, uncompress the distribution file by typing:
% tar -xzf glimmer213.tar.gz
A directory will be created which contains the source files, training data sets, and other supporting files. See the included "readme" files for more information or look at this README.
|
References
For a description of Glimmer 1.0 and 2.0 see our papers:
A.L. Delcher, D. Harmon, S. Kasif, O. White, and S.L. Salzberg. Improved microbial gene identification with GLIMMER. Nucleic Acids Research 27:23 (1999), 4636-4641.
S. Salzberg, A. Delcher, S. Kasif, and O. White. Microbial gene identification using interpolated Markov models. Nucleic Acids Research 26:2 (1998), 544-548.
|
Acknowledgements
Glimmer is currently supported by the National Library of Medicine at NIH under grant R01-LM007938. It was previously supported by the National Science Foundation under grants IRI-9530462 and IIS-9902923, and by the National Institutes of Health under grant R01-LM06845.
|