TWAIN

What is TWAIN?

TWAIN is a new syntenic gene finder that uses a Generalized Pair Hidden Markov Model (GPHMM) to predict genes in two closely related eukaryotic genomes simultaneously. It uses the MUMmer package to compute approximate alignments, then applies a GPHMM based on an enhanced version of the TigrScan gene finder. TWAIN was written by Bill Majoros and Mihaela Pertea while at The Institute for Genomic Research (TIGR).

TWAIN is available for download as open source software.

Results on Aspergillus fumigatus vs. A. nidulans:


                        Nuc   Nuc   Nuc    Splice  Splice  Start/Stop  Start/Stop  Exon  Exon  Exact
                        Sens  Spec  Accur  Sens    Spec    Sens        Spec        Sens  Spec  Genes
Genscan (no homology)   96%   99%   95%    66%     69%     57%         58%         50%   52%   22%
TigrScan (no homology)  99%   100%  99%    89%     81%     81%         80%         78%   73%   54%
TWAIN (using homology)  99%   100%  99%    94%     88%     92%         92%         89%   85%   74%

(N=147 gene pairs)
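
The page does not define the Sens and Spec columns. The sketch below assumes the convention common in gene-finding evaluations, where sensitivity is TP / (TP + FN) and specificity is TP / (TP + FP), computed here at the exon level over exactly matching coordinates. The function names and the example exons are hypothetical and are not taken from the TWAIN evaluation.

```python
from typing import Set, Tuple

Exon = Tuple[int, int]  # (start, end) coordinates of one exon


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of annotated features that were predicted: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tp: int, fp: int) -> float:
    """Fraction of predicted features that are correct: TP / (TP + FP).

    Assumption: this is the gene-finding convention, not the
    TN / (TN + FP) definition used in other fields.
    """
    return tp / (tp + fp) if (tp + fp) else 0.0


def exon_level_scores(predicted: Set[Exon], annotated: Set[Exon]) -> Tuple[float, float]:
    """Score predictions, counting an exon as correct only on an exact match."""
    tp = len(predicted & annotated)  # predicted and annotated
    fp = len(predicted - annotated)  # predicted but not annotated
    fn = len(annotated - predicted)  # annotated but missed
    return sensitivity(tp, fn), specificity(tp, fp)


# Hypothetical exons, for illustration only.
annotated = {(100, 200), (300, 400), (500, 650), (800, 900)}
predicted = {(100, 200), (300, 400), (520, 650)}
sn, sp = exon_level_scores(predicted, annotated)
print(f"exon sensitivity = {sn:.2f}, exon specificity = {sp:.2f}")
```

Under the same assumption, the nucleotide-level columns would apply these ratios to individual coding bases rather than to whole exons.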

Example output:

page1.gif page2.gif page3.gif page4.gif page5.gif
page6.gif page7.gif page8.gif page9.gif page10.gif
page11.gif page12.gif page13.gif page14.gif page15.gif
page16.gif page17.gif page18.gif page19.gif page20.gif
page21.gif page22.gif page23.gif page24.gif page25.gif
page26.gif page27.gif page28.gif page29.gif page30.gif
page31.gif page32.gif page33.gif page34.gif page35.gif
page36.gif page37.gif



Detailed Description

TWAIN consists of two components: (1) ROSE, the Region Of Synteny Extractor, which identifies contiguous regions likely to contain one or more syntenic genes, and (2) OASIS, a generalized pair hidden Markov model (GPHMM) for predicting genes in the regions identified by ROSE. To score alignments efficiently, the system relies on approximate alignments constructed by the PROmer and NUCmer programs in the MUMmer package. More detailed information on the architecture of this system will be made available soon.
Slides from a talk at Computational Genomics 2004 are now available.
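
As a rough illustration of what "identifying contiguous regions" from alignment hits can look like, the sketch below merges nearby hit intervals on a single genome into candidate regions. This is not ROSE's actual algorithm, which is not described on this page and which works on two genomes. The function name, the `max_gap` parameter, and the coordinates are all hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) of one alignment hit on a genome


def merge_hits_into_regions(hits: List[Interval], max_gap: int) -> List[Interval]:
    """Merge hits separated by at most max_gap bases into contiguous regions.

    Hypothetical simplification: only one genome's coordinates are considered.
    """
    regions: List[Interval] = []
    for start, end in sorted(hits):
        if regions and start - regions[-1][1] <= max_gap:
            # Close enough to the previous region (or overlapping): extend it.
            prev_start, prev_end = regions[-1]
            regions[-1] = (prev_start, max(prev_end, end))
        else:
            # Too far from the previous region: start a new one.
            regions.append((start, end))
    return regions


# Hypothetical hit coordinates, for illustration only.
hits = [(1000, 1200), (1250, 1400), (5000, 5300), (1350, 1600)]
regions = merge_hits_into_regions(hits, max_gap=100)
print(regions)
```

In a pipeline of this shape, each merged region would be handed to the gene-prediction step, so the choice of gap threshold trades off region size against the risk of splitting one gene across two regions.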

Download

  • MUMmer - contains the PROmer and NUCmer programs used by ROSE
  • ROSE - Region-Of-Synteny Extractor, a preprocessor for the GPHMM
  • OASIS - implements the GPHMM
  • alignment - alignment package used by OASIS to construct approximate alignments from MUMmer HSPs
  • TigrScan - used by OASIS to build a parse graph for each genome
  • Documentation on using TWAIN
  • Slides from the talk at Computational Genomics 2004
  • Information on ROSE
  • Sample files for a pair of fungal genomes:
    • A. fumigatus models (*.cfg, *.model, *.iso, *.top, *.distr, *.trans, train.sh)
    • A. nidulans models (*.cfg, *.model, *.iso, *.top, *.distr, *.trans, train.sh)
    • training data
    • fumigatus x nidulans syntenic pairs

References

Majoros W.H., Pertea M., Salzberg S.L. (2005) Efficient implementation of a Generalized Pair Hidden Markov Model for comparative gene finding. Bioinformatics 21:1782-1788.


Acknowledgements

Development of TWAIN is supported by NIH under grant R01-LM06845.

