About Glimmer

Glimmer is a system for finding genes in microbial DNA, especially the genomes of bacteria, archaea, and viruses. Glimmer (Gene Locator and Interpolated Markov ModelER) uses interpolated Markov models (IMMs) to identify coding regions and distinguish them from noncoding DNA. The IMM approach, described in our Nucleic Acids Research paper on Glimmer 1.0 and in our subsequent paper on Glimmer 2.0, combines Markov models from 1st through 8th order, weighting each model according to its predictive power. Glimmer uses 3-periodic nonhomogeneous Markov models in its IMMs.
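The interpolation idea can be illustrated with a minimal Python sketch. It is not Glimmer's implementation: the class name, the pseudo_weight parameter, and the count-based weighting formula are assumptions chosen for brevity, the model starts at order 0, and the 3-periodic structure is left out. It shows only how predictions from several context lengths can be blended so that well-supported contexts count for more.

```python
from collections import defaultdict
import math

BASES = "ACGT"

class InterpolatedMarkovModel:
    """Toy interpolated Markov model over DNA (illustrative only)."""

    def __init__(self, max_order=8, pseudo_weight=10.0):
        self.max_order = max_order
        # Hypothetical smoothing constant: a context seen `c` times gets
        # weight c / (c + pseudo_weight). Glimmer's real weighting differs.
        self.pseudo_weight = pseudo_weight
        # counts[k][context][base] = occurrences of `base` after `context`
        self.counts = [defaultdict(lambda: defaultdict(int))
                       for _ in range(max_order + 1)]

    def train(self, sequences):
        for seq in sequences:
            for i, base in enumerate(seq):
                for k in range(min(i, self.max_order) + 1):
                    self.counts[k][seq[i - k:i]][base] += 1

    def prob(self, context, base):
        """P(base | context), blended from order 0 up to the longest seen context."""
        p = 1.0 / len(BASES)  # uniform fallback below order 0
        for k in range(min(len(context), self.max_order) + 1):
            seen = self.counts[k].get(context[len(context) - k:])
            if not seen:
                break  # this context and all longer ones are unseen
            total = sum(seen.values())
            lam = total / (total + self.pseudo_weight)
            p = lam * seen.get(base, 0) / total + (1.0 - lam) * p
        return p

    def log_score(self, seq):
        """Log-probability of a whole sequence under the model."""
        return sum(
            math.log(self.prob(seq[max(0, i - self.max_order):i], seq[i]))
            for i in range(len(seq))
        )
```

A model trained on known coding sequence would then assign a higher log_score to a candidate region that resembles the training data than to one that does not.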

Glimmer was the primary microbial gene finder used at The Institute for Genomic Research (TIGR), where it was first developed, and has been used to annotate the complete genomes of over 100 bacterial species from TIGR and other labs. Glimmer3 predictions are available for all NCBI RefSeq bacterial genomes at their ftp site.

For the eukaryotic version of Glimmer (really an entirely different program), go to the GlimmerHMM site.

Current Version:

Glimmer version 3.02 is the current version of the system.
Version 3.02 Release Notes
Download Glimmer v3.02
Glimmer has been pre-compiled for the Sun SPARC and Sun 64-bit (AMD) platforms by Mithun Sridharan. 
The previous version of Glimmer, v2.13, is still available for download and is described on its own page.

Running Glimmer:

A Glimmer server is available on the NCBI website. To run Glimmer on your sequence, visit NCBI Glimmer.

What's Changed from Glimmer2 to Glimmer3

Glimmer3 makes several algorithmic changes to reduce the number of false-positive predictions and to improve the accuracy of start-site predictions. Changes have also been made to some program parameters and options, and to the output formats. Some specific differences are:
  1. Glimmer2 used a set of rules to attempt to resolve overlaps between candidate orfs. When the overlap could not be resolved, both orfs were included in the prediction list, resulting in a high false-positive rate.
    Glimmer3 uses a dynamic programming algorithm to select the highest-scoring set of predictions consistent with the maximum allowed overlap. This reduces the number of false-positive predictions with little or no increase in the number of false-negative predictions.
  2. Glimmer3 scores orfs in the reverse direction, i.e., from stop to start. This improves the accuracy of scores near the start codon because the trailing context of the interpolated context model (ICM) is within the coding region.
  3. The long-orfs program now uses an amino-acid distribution model to filter the set of candidate orfs before a set of long, non-overlapping orfs is selected.
  4. The make system and directory structure have been revised to separate source, object, and executable files.
  5. Program options are now specified before required parameters (Unix style), rather than after (DOS style).
  6. The glimmer3 program produces two separate output files: a .detail file with information about all orfs (like the first part of Glimmer2 output); and a .predict file containing just the final predictions (like the last part of Glimmer2 output). glimmer3 requires a third parameter which is used to prefix the names of these files.
  7. Glimmer3 prediction coordinates now include the stop codon, and hence will differ from Glimmer2 values by 3.
  8. The glimmer3 program will process a multi-fasta sequence file. The outputs for each sequence are preceded by the fasta-header line in both the .detail and .predict files.
For more information on Glimmer3, see the Version 3.02 Release Notes.
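The overlap-resolution idea in item 1 can be sketched as a weighted interval selection problem. The Python below is a simplified illustration, not Glimmer3's algorithm: the Orf type, the select_orfs function, and its max_overlap parameter are hypothetical names, strands and alternative start sites are ignored, and overlap is measured conservatively as the number of bases by which an earlier-ending orf extends past the start of a later-ending one.

```python
from bisect import bisect_right
from typing import List, NamedTuple

class Orf(NamedTuple):
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    score: float

def select_orfs(orfs: List[Orf], max_overlap: int = 0) -> List[Orf]:
    """Pick the highest-scoring subset whose pairwise overlaps stay within max_overlap."""
    ordered = sorted(orfs, key=lambda o: o.end)
    ends = [o.end for o in ordered]
    n = len(ordered)
    best = [0.0] * (n + 1)    # best[j]: best total score using the first j orfs
    take = [False] * (n + 1)  # take[j]: whether orf j is part of best[j]
    prev = [0] * (n + 1)      # prev[j]: prefix length to continue from if taken

    for j, orf in enumerate(ordered, start=1):
        # Any orf ending at or before `limit` overlaps this one by <= max_overlap.
        limit = orf.start + max_overlap - 1
        p = min(bisect_right(ends, limit), j - 1)
        with_orf = orf.score + best[p]
        if with_orf > best[j - 1]:
            best[j], take[j], prev[j] = with_orf, True, p
        else:
            best[j] = best[j - 1]

    # Walk back through the table to recover the chosen orfs.
    chosen = []
    j = n
    while j > 0:
        if take[j]:
            chosen.append(ordered[j - 1])
            j = prev[j]
        else:
            j -= 1
    chosen.reverse()
    return chosen
```

Given candidates such as Orf(1, 300, 10), Orf(250, 600, 8), and Orf(590, 900, 9), a call with max_overlap=30 keeps the first and third and drops the middle one, whereas a rule that retained every unresolved overlap would report all three.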

Glimmer3 vs. Glimmer2.13 Accuracy

Below are links to some comparisons of the results of Glimmer3 and Glimmer2 on 30 microbial genomes from RefSeq at GenBank.
  1. Table 1. Probability models trained on genes with annotated function. Predictions compared to the same set.
  2. Table 2. Probability models trained on genes with annotated function. Predictions compared to all annotated genes.
  3. Table 3. Probability models trained on the output of the long-orfs program. Predictions compared to genes with annotated function.
  4. Table 4. Probability models trained on the output of the long-orfs program. Predictions compared to all annotated genes.
  5. Table 5. Glimmer2.13 long-orfs output and Glimmer3 long-orfs output compared to all annotated genes.

Obtaining Glimmer

This software is OSI Certified Open Source Software.


Download the complete Glimmer3 system. After downloading, uncompress the distribution file by typing:

% tar xzf glimmer302b.tar.gz

A directory named glimmer3.02 will be created, containing a file glim302notes.pdf with instructions on compiling and running the system.

References

For a description of Glimmer 1, 2, and 3, see our papers:

Acknowledgements

Glimmer is currently supported by the National Library of Medicine at NIH under grant R01-LM007938. It was previously supported by the National Science Foundation under grants IRI-9530462 and IIS-9902923, and by the National Institutes of Health under grant R01-LM06845.