ExAlt: a phylogenetic generalized hidden Markov model for predicting alternatively spliced exons

 

Overview

ExAlt is a software program designed to predict alternatively spliced overlapping exons in genomic sequence. The program works in several ways depending on the available input. ExAlt can use information about existing gene structure as well as sequence conservation to improve the precision of its predictions. ExAlt can also make predictions when only a single genomic sequence is available. ExAlt has been extensively tested on Drosophila melanogaster, but can be adapted to run on other species.

Accuracy

                     Exon          Exon        Splice site   Splice site   Nucleotide    Nucleotide
Prediction program   sensitivity   precision   sensitivity   precision     sensitivity   precision
ExAlt                84%           94%         90%           96%           99.6%         99.6%
ExAlt ab initio      87%           84%         90%           92%           99.5%         99.5%
Genscan              43%           62%         63%           87%           92%           97%

Table 1
Results in Table 1 measure the accuracy of ExAlt on 538 exons in Drosophila melanogaster using FlyBase annotations. Two versions of ExAlt are shown, the phylogenetic generalized hidden Markov model version (ExAlt) and the generalized hidden Markov model version (ExAlt ab initio).
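
As a rough illustration of how exon-level figures like those above can be computed, the following Python sketch compares a set of predicted exons against a set of annotated exons. It assumes that a prediction counts as correct only when both boundaries match the annotation exactly, which is a common convention in gene-prediction evaluation but is not confirmed here as the exact criterion used for Table 1. The function name and the coordinates are hypothetical.

```python
from typing import Set, Tuple

Exon = Tuple[int, int]  # (start, end) coordinates, both inclusive


def exon_accuracy(predicted: Set[Exon], annotated: Set[Exon]) -> Tuple[float, float]:
    """Return (sensitivity, precision) at the exon level.

    Assumption: an exon is a true positive only if both of its
    boundaries match an annotated exon exactly.
    """
    true_positives = len(predicted & annotated)
    # Sensitivity: fraction of annotated exons that were predicted.
    sensitivity = true_positives / len(annotated) if annotated else 0.0
    # Precision: fraction of predicted exons that are annotated.
    precision = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, precision


# Hypothetical coordinates, for illustration only.
annotated = {(100, 250), (400, 520), (700, 810), (900, 1000)}
predicted = {(100, 250), (400, 520), (705, 810)}

sensitivity, precision = exon_accuracy(predicted, annotated)
print(f"sensitivity={sensitivity:.2f} precision={precision:.2f}")
```

In this example two of the four annotated exons are recovered exactly, and two of the three predictions are correct; the third prediction misses one boundary by five bases and therefore counts as wrong under the exact-match assumption.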

Using ExAlt

The typical input to ExAlt is a known (or predicted) gene structure that is to be checked for alternative splicing. The core program takes as input a multiple sequence alignment and a phylogenetic tree and returns a GFF file containing the sequence coordinates of the exon predictions. Wrapper scripts are provided that take a Drosophila melanogaster gene (identified by its CG identifier) and iterate through each exon, using blastn to find matches in closely related species and muscle to generate the multiple sequence alignments that serve as input to ExAlt.
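
Because the predictions are returned in GFF, downstream scripts can read them with a few lines of code. The Python sketch below relies only on the general GFF layout of tab-separated columns (sequence name, source, feature, start, end, score, strand, frame, attributes). The feature label "exon", the source label "predictor", and the sample records are assumptions made for illustration; the labels ExAlt actually writes should be checked against the documentation in the distribution.

```python
from typing import List, NamedTuple


class ExonPrediction(NamedTuple):
    seqname: str
    start: int
    end: int
    strand: str


def parse_gff_exons(text: str, feature_name: str = "exon") -> List[ExonPrediction]:
    """Collect exon coordinates from GFF-formatted text.

    Assumption: exon predictions carry the feature label given by
    feature_name; this label is hypothetical, not taken from ExAlt.
    """
    exons = []
    for line in text.splitlines():
        # Skip blank lines and comment or directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # not a complete GFF record
        seqname, _source, feature, start, end, _score, strand = fields[:7]
        if feature.lower() != feature_name:
            continue
        exons.append(ExonPrediction(seqname, int(start), int(end), strand))
    return exons


# Hypothetical records, for illustration only.
sample = (
    "##gff-version 2\n"
    "2L\tpredictor\texon\t1200\t1350\t.\t+\t.\texample_gene\n"
    "2L\tpredictor\texon\t1280\t1350\t.\t+\t.\texample_gene\n"
)

for exon in parse_gff_exons(sample):
    print(exon.seqname, exon.start, exon.end, exon.strand)
```

The two sample records share an end coordinate but differ in their start, which is the kind of overlapping exon pair the program is designed to report.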

System requirements

ExAlt is developed in C++ and compiles on Linux using gcc 3.2. (The software should compile on many other platforms as well.)

Download

  • Download the most recent ExAlt system HERE.
  • A package of Drosophila data is provided here as a convenience for running ExAlt on the Drosophila genome. Run ExAlt on Drosophila data NOW!
  • Check HERE to download pre-compiled binaries.
  • Download source code for the cfasta utility HERE.
  • Drosophila data used to train and test ExAlt can be downloaded HERE (also included in source code distribution).
    The genome data is made publicly available through FlyBase and through the UCSC genome browser.

    The ExAlt software is OSI Certified Open Source Software.


    Documentation

    The distribution includes documentation on how to get started. Check back for additional documentation coming online.

    References

    J. E. Allen and S. L. Salzberg. A phylogenetic generalized hidden Markov model for predicting alternatively spliced exons. Algorithms for Molecular Biology, 1:14, 2006.

    Acknowledgements

    Development of ExAlt was supported in part by the NIH grant RO1-LM06845.

    Contact Information

    jeallen - umiacs umd edu
