ExAlt: a phylogenetic generalized hidden Markov model for predicting alternatively spliced exons
ExAlt is a software program designed to predict alternatively spliced overlapping exons in genomic sequence.
The program works in several ways depending on the available input. ExAlt can use information
about existing gene structure as well as sequence conservation to improve the precision of its predictions. ExAlt can also make
predictions when only a single genomic sequence is available.
ExAlt has been extensively tested on Drosophila melanogaster, but can be adapted to run on other species.
Prediction Program | Exon sensitivity | Exon specificity | Splice Site sensitivity | Splice Site specificity | Nucleotide sensitivity | Nucleotide specificity |
ExAlt | 84% | 94% | 90% | 96% | 99.6% | 99.6% |
ExAlt ab initio | 87% | 84% | 90% | 92% | 99.5% | 99.5% |
Genscan | 43% | 62% | 63% | 87% | 92% | 97% |
Table 1
Results in Table 1 measure the accuracy of ExAlt on 538 exons in Drosophila melanogaster using FlyBase annotations. Two versions of ExAlt are shown: the phylogenetic generalized hidden Markov model version (ExAlt) and the generalized hidden Markov model version (ExAlt ab initio). Genscan results are included for comparison.
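The page does not spell out how the sensitivity and specificity figures are computed. The sketch below assumes the convention commonly used in gene-finding evaluations: sensitivity is the fraction of annotated exons that are predicted exactly, and specificity is the fraction of predicted exons that match an annotation exactly. The function name and the toy coordinates are hypothetical and are not taken from the ExAlt distribution.

```python
def exon_accuracy(predicted, annotated):
    """Return (sensitivity, specificity) for exact-match exon predictions.

    Assumed definitions (not confirmed by the ExAlt page):
      sensitivity = true positives / number of annotated exons
      specificity = true positives / number of predicted exons
    Each exon is a (start, end) tuple; a match requires both ends to agree.
    """
    predicted, annotated = set(predicted), set(annotated)
    true_pos = len(predicted & annotated)
    sensitivity = true_pos / len(annotated) if annotated else 0.0
    specificity = true_pos / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Toy data: four annotated exons, three predictions, two exact matches.
annotated = {(100, 250), (400, 520), (700, 810), (900, 1000)}
predicted = {(100, 250), (400, 520), (705, 810)}

sn, sp = exon_accuracy(predicted, annotated)
print(f"exon sensitivity={sn:.2f} exon specificity={sp:.2f}")
```

The same counting applies at the splice-site and nucleotide levels if the tuples are replaced by individual splice-site positions or individual coding nucleotides.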
The typical input to ExAlt is a known (or predicted) gene structure that is to be checked for alternative splicing.
The core program takes as input a multiple sequence alignment and a phylogenetic tree and returns a GFF file containing the sequence coordinates of
exon predictions. Wrapper scripts are provided to take a Drosophila melanogaster gene (using the CG identifier) and iterate through each exon, using blastn to
find matches in closely related species and muscle to generate multiple sequence alignments for input to ExAlt.
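Because the predictions are returned as GFF, they can be read with a few lines of code. The sketch below assumes the standard nine-column, tab-separated GFF layout (sequence name, source, feature, start, end, score, strand, frame, attributes). The sample records, including the source, feature, and attribute values, are invented for illustration and may not match what ExAlt actually writes.

```python
from typing import Iterable, List, NamedTuple


class Exon(NamedTuple):
    seqid: str
    start: int   # 1-based, inclusive (GFF convention)
    end: int     # 1-based, inclusive
    strand: str


def parse_gff_exons(lines: Iterable[str]) -> List[Exon]:
    """Extract exon coordinates from tab-separated GFF records."""
    exons = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment or directive lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Ignore malformed records with too few columns.
        if len(fields) < 8:
            continue
        exons.append(Exon(fields[0], int(fields[3]), int(fields[4]), fields[6]))
    return exons


# Hypothetical output records; real ExAlt output may label columns differently.
sample = [
    "##gff-version 2",
    "chr2L\tExAlt\texon\t1200\t1350\t.\t+\t.\texample_gene",
    "chr2L\tExAlt\texon\t1500\t1620\t.\t+\t.\texample_gene",
]

for exon in parse_gff_exons(sample):
    print(exon.seqid, exon.start, exon.end, exon.strand)
```

To read a file on disk, pass an open file handle to `parse_gff_exons` in place of the sample list.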
System requirements
ExAlt is developed in C++ and compiles on Linux using gcc 3.2.
(The software should compile on many other platforms as well.)
Download the most recent ExAlt system HERE.
Run ExAlt on Drosophila data NOW!
Check HERE to download pre-compiled binaries.
Download source code for the cfasta utility HERE.
A package of Drosophila data is provided here as a convenience for running ExAlt on the Drosophila genome. The data is made publicly available through
FlyBase and through the UCSC genome browser.
Drosophila data used to train and test ExAlt can be downloaded HERE (also included in source code distribution).
The ExAlt software is OSI
Certified Open Source Software.
The distribution includes documentation on how to get started. Check back for additional documentation coming online.
References
J. E. Allen and S. L. Salzberg.
A phylogenetic generalized hidden Markov model for predicting alternatively spliced exons.
Algorithms for Molecular Biology, 1:14, 2006.
Development of ExAlt was supported in part by NIH grant R01-LM06845.
jeallen - umiacs umd edu