ExAlt: a phylogenetic generalized hidden Markov model for predicting
alternatively spliced exons
ExAlt is a software program designed to predict alternatively spliced
overlapping exons in genomic sequence. The program works in several
ways depending on the available input. ExAlt can use information about
existing gene structure as well as sequence conservation to improve the
precision of its predictions. ExAlt can also make predictions when only
a single genomic sequence is available.
ExAlt has been extensively tested on Drosophila melanogaster,
but can be adapted to run on other species.
| Prediction program | Exon sensitivity | Exon precision | Splice site sensitivity | Splice site precision | Nucleotide sensitivity | Nucleotide precision |
|---|---|---|---|---|---|---|
| ExAlt | 84% | 94% | 90% | 96% | 99.6% | 99.6% |
| ExAlt ab initio | 87% | 84% | 90% | 92% | 99.5% | 99.5% |
| Genscan | 43% | 62% | 63% | 87% | 92% | 97% |
Table 1
Results in Table 1 measure the accuracy of ExAlt on 538 exons in Drosophila
melanogaster using FlyBase annotations. Two versions of ExAlt are
shown, the phylogenetic generalized hidden Markov model version (ExAlt)
and the generalized hidden Markov model version (ExAlt ab initio).
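As a rough illustration of how exon-level figures like those in Table 1 can be computed, the sketch below uses the common convention that a predicted exon counts as correct only when its coordinates and strand match an annotated exon exactly. That matching rule, the `Exon` tuple layout, and the function name `exon_level_accuracy` are assumptions made for this example; they are not taken from the ExAlt distribution, and the coordinates are made up.

```python
from typing import Iterable, Tuple

# Hypothetical exon record: (sequence id, start, end, strand).
Exon = Tuple[str, int, int, str]


def exon_level_accuracy(
    predicted: Iterable[Exon], annotated: Iterable[Exon]
) -> Tuple[float, float]:
    """Return (sensitivity, precision) under exact-match scoring.

    sensitivity = correct predictions / annotated exons
    precision   = correct predictions / predicted exons
    """
    pred = set(predicted)
    ann = set(annotated)
    true_positives = len(pred & ann)  # exons matching on every field
    sensitivity = true_positives / len(ann) if ann else 0.0
    precision = true_positives / len(pred) if pred else 0.0
    return sensitivity, precision


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    annotated = [("X", 100, 250, "+"), ("X", 400, 520, "+"),
                 ("X", 700, 810, "+"), ("X", 900, 990, "+")]
    predicted = [("X", 100, 250, "+"), ("X", 400, 520, "+"),
                 ("X", 705, 810, "+")]  # last exon has a shifted start
    sens, prec = exon_level_accuracy(predicted, annotated)
    print(f"sensitivity={sens:.2f} precision={prec:.2f}")
```

Splice-site and nucleotide-level scores follow the same pattern, with the unit of comparison changed from whole exons to individual boundaries or bases.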
The typical input to ExAlt is a known (or predicted) gene structure
that is to be checked for alternative splicing.
The core program takes as input a multiple sequence alignment and a
phylogenetic tree and returns a GFF file containing the sequence
coordinates of exon predictions. Wrapper scripts are provided to take a
Drosophila melanogaster gene (using the CG identifier) and
iterate through each exon, using blastn to find matches in closely related
species and muscle to generate multiple sequence alignments for input to ExAlt.
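Since the core program reports its predictions as GFF, downstream scripts usually need to read the coordinates back. The following is a minimal sketch that relies only on the general GFF layout (tab-separated columns: sequence, source, feature, start, end, score, strand, frame, and an optional attributes column). The source and feature labels ExAlt actually writes are not documented here, so the sample lines and the names `GffFeature` and `parse_gff` are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class GffFeature(NamedTuple):
    seqid: str
    source: str
    type: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    score: str
    strand: str
    phase: str
    attributes: str


def parse_gff(lines: Iterable[str]) -> Iterator[GffFeature]:
    """Yield one GffFeature per data line, skipping comments and blanks."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # not a well-formed GFF record
        if len(fields) == 8:
            fields.append("")  # the attributes column is optional
        seqid, source, ftype, start, end, score, strand, phase, attrs = fields[:9]
        yield GffFeature(seqid, source, ftype, int(start), int(end),
                         score, strand, phase, attrs)


if __name__ == "__main__":
    # Made-up records; real ExAlt output may use different labels.
    sample = [
        "# hypothetical example output\n",
        "X\tExAlt\texon\t100\t250\t.\t+\t.\tCG0000\n",
        "X\tExAlt\texon\t400\t520\t.\t+\t.\tCG0000\n",
    ]
    for feature in parse_gff(sample):
        print(feature.seqid, feature.start, feature.end, feature.strand)
```

Reading from a file works the same way, for example `parse_gff(open("predictions.gff"))`, where the file name is a placeholder.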
System requirements
ExAlt is written in C++ and compiles on Linux using gcc 3.2. (The
software should compile on many other platforms as well.)
Download the most recent ExAlt system HERE.
A package of Drosophila data is provided here as a
convenience for running ExAlt on the Drosophila genome. Run ExAlt on
Drosophila data NOW!
Check HERE to
download pre-compiled binaries.
Download source code for the cfasta utility HERE.
Drosophila data used to train and test ExAlt can be
downloaded HERE (also included in
source code distribution).
The genome data are publicly available through FlyBase and the UCSC genome browser.
The ExAlt software is OSI Certified
Open Source Software.
The distribution includes documentation on how to get started. Check
back for additional documentation coming online.
References
J. E. Allen and S. L. Salzberg.
A phylogenetic generalized hidden Markov model for predicting
alternatively spliced exons. Algorithms for Molecular Biology,
1:14, 2006.
Development of ExAlt was supported in part by NIH grant R01-LM06845.
jeallen - umiacs umd edu