Glimmer-MG is a system for finding genes in environmental shotgun DNA
sequences. Glimmer-MG (Gene Locator and Interpolated Markov ModelER -
MetaGenomics) uses interpolated Markov models (IMMs) to identify the
coding regions and distinguish them from noncoding DNA. The IMM
approach, described in the
Nucleic Acids Research papers on Glimmer 1.0 (Salzberg et al., 1998) and
Glimmer 2.0 (Delcher et al., 1999), uses a combination of Markov models from
1st through 8th order, weighting each model according to its
predictive power. Glimmer uses 3-periodic nonhomogeneous Markov models
in its IMMs.
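To make the idea concrete, here is a minimal Python sketch of a 3-periodic interpolated Markov model, assuming training genes that begin in frame at their first codon. It keeps a separate count table for each codon position (the "3-periodic" part) and blends the estimate from each context length with the interpolated estimate from shorter contexts. The class and method names (`PeriodicIMM`, `train`, `prob`, `log_likelihood`) and the weight n / (n + threshold) for a context seen n times are hypothetical simplifications; Glimmer's published weights are derived from context counts combined with a significance test. The sketch also interpolates down to an order-0 base case with add-one pseudocounts so that no base ever gets zero probability.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


class PeriodicIMM:
    """Toy 3-periodic interpolated Markov model; a sketch, not Glimmer's code."""

    def __init__(self, max_order=5, threshold=400):
        self.max_order = max_order
        # Hypothetical smoothing constant: a context seen n times gets weight
        # n / (n + threshold), so rarely seen contexts lean on shorter ones.
        self.threshold = threshold
        # One count table per codon position (phase 0, 1, 2):
        # counts[phase][context][base] = occurrences of base after context.
        self.counts = [defaultdict(lambda: defaultdict(int)) for _ in range(3)]

    def train(self, genes):
        """Count bases by codon position; each gene must start in frame."""
        for seq in genes:
            for i, base in enumerate(seq):
                phase = i % 3
                for k in range(min(self.max_order, i) + 1):
                    self.counts[phase][seq[i - k:i]][base] += 1

    def prob(self, seq, i):
        """Interpolated P(seq[i] | up to max_order preceding bases)."""
        table = self.counts[i % 3]
        base = seq[i]
        # Base case: order-0 model with add-one pseudocounts (never zero).
        c0 = table[""]
        p = (c0.get(base, 0) + 1) / (sum(c0.values()) + len(ALPHABET))
        for k in range(1, min(self.max_order, i) + 1):
            ctx = table.get(seq[i - k:i])
            if not ctx:
                break  # any longer context containing this one is unseen too
            n = sum(ctx.values())
            lam = n / (n + self.threshold)
            # Blend the order-k estimate with the interpolated lower orders.
            p = lam * ctx.get(base, 0) / n + (1 - lam) * p
        return p

    def log_likelihood(self, seq):
        """Sum of log-probabilities of every base, read in frame from seq[0]."""
        return sum(math.log(self.prob(seq, i)) for i in range(len(seq)))


coding = PeriodicIMM(max_order=3, threshold=10)
coding.train(["ATGATGATG"])
print(coding.prob("ATGATGATG", 3))  # 'A' at codon position 0, after "ATG"
```

In a gene finder, each candidate reading frame would be scored with a coding model like this one and compared against a noncoding model, with the best log-odds frame preferred; that comparison step is left out of the sketch.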
Glimmer-MG addresses the challenges of metagenomic gene
prediction. Training the prediction models is the main obstacle to applying
Glimmer3 to metagenomic sequences. Rather than relying on GC% to find
evolutionarily related genomes for training, Glimmer-MG instead
finds phylogenetic classifications
using Phymm (Brady and Salzberg, 2009) and
parameterizes gene prediction models using those
classifications. Glimmer-MG also clusters the sequences,
grouping together sequences that are likely to come from the same
organism (Kelley and Salzberg, 2010). Analogous to iterative schemes that
are useful for whole genomes, Glimmer-MG retrains prediction models within
each cluster on the initial gene predictions before making a final set of
predictions. To account for fragmented genes, Glimmer-MG incorporates
a model for gene length in which partial genes, those that run off either
end of a sequence fragment, are handled carefully. Finally, Glimmer-MG can
predict insertions and deletions in the sequence by branching into a
different frame at low-quality base calls, such as homopolymer runs in 454
sequences.
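One common way to model gene length without penalizing fragments is to treat a gene that runs off the edge of a read as censored: a complete gene is scored by the probability that a gene is exactly that long, while a partial gene is scored by the probability that a gene is at least that long. The Python sketch below illustrates that idea under stated assumptions (lengths counted in codons, an empirical histogram smoothed with pseudocounts, and genes longer than `max_len` lumped into the last bin). `GeneLengthModel` and its parameters are hypothetical and are not Glimmer-MG's actual code or its exact length model.

```python
import math


class GeneLengthModel:
    """Hypothetical length score that treats fragment-truncated genes as censored."""

    def __init__(self, training_lengths, max_len, pseudocount=1.0):
        # counts[n] = smoothed number of training genes that are n codons long;
        # index 0 is unused because a gene has at least one codon.
        counts = [0.0] + [pseudocount] * max_len
        for n in training_lengths:
            # Lump unusually long genes into the last bin ("max_len or longer").
            counts[min(n, max_len)] += 1
        total = sum(counts)
        self.pmf = [c / total for c in counts]
        # tail[n] = P(L >= n), accumulated from the longest bin downward.
        self.tail = [0.0] * (max_len + 2)
        for n in range(max_len, -1, -1):
            self.tail[n] = self.tail[n + 1] + self.pmf[n]

    def log_score(self, length, partial):
        """Log-probability of a predicted gene of `length` codons (length >= 1)."""
        if length < 1:
            raise ValueError("length must be at least one codon")
        n = min(length, len(self.pmf) - 1)
        if partial:
            # The gene continues past the fragment edge: only L >= length is known.
            return math.log(self.tail[n])
        # A complete gene (start and stop both observed) has exactly this length.
        return math.log(self.pmf[n])


model = GeneLengthModel([280, 300, 320, 350, 400], max_len=1000)
print(model.log_score(60, partial=False))  # short complete gene: unlikely
print(model.log_score(60, partial=True))   # short partial gene: plausible
```

With training genes around 300 codons long, a complete 60-codon ORF scores poorly, while a 60-codon ORF cut off by the end of a short read scores near zero, because most genes are at least that long.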
Send questions and help requests to David Kelley - dkelley [at] fas [dot] harvard [dot] edu.
Mar. 16, 2014 - Release 0.3.2
Nov. 15, 2012 - Release 0.3.1
Phymm installation fix.
May 23, 2012 - Release 0.3
Minor bug fixes, plus a fix for a bug that kept g3-iterated.py from working properly.
Jan. 4, 2012 - Release 0.2
Fixed a significant retraining bug, along with smaller issues affecting installation and very small clusters.
Jun. 1, 2011 - Release 0.1
Download Glimmer-MG v0.3.2
This software is OSI Certified Open Source Software.
sim_data.tgz (2.1 Gb)
- S.L. Salzberg, A.L. Delcher, S. Kasif, and O. White. Microbial gene identification using interpolated Markov models. Nucleic Acids Research 26:2 (1998), 544-548.
- A.L. Delcher, D. Harmon, S. Kasif, O. White, and S.L. Salzberg. Improved microbial gene identification with GLIMMER. Nucleic Acids Research 27:23 (1999), 4636-4641.
- A.L. Delcher, K.A. Bratke, E.C. Powers, and S.L. Salzberg. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics 23:6 (2007), 673-679.
- A. Brady and S.L. Salzberg. Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models. Nature Methods 6 (2009), 673-676.
- D.R. Kelley and S.L. Salzberg. Clustering metagenomic sequences with interpolated Markov models. BMC Bioinformatics 11:544 (2010).
- D.R. Kelley, B. Liu, A.L. Delcher, M. Pop, and S.L. Salzberg. Gene prediction with Glimmer for metagenomic sequences augmented by classification and clustering. Nucleic Acids Research 40:1 (2012), e9.
Glimmer is currently supported by the National
Library of Medicine at NIH under grant R01-LM007938. It was
previously supported by the National
Science Foundation under grants IRI-9530462 and IIS-9902923, and by
the National Institutes of Health
under grant R01-LM06845.