GeneSplicer: A computational method for splice site prediction
Overview
GeneSplicer is a fast, flexible system for detecting
splice sites in the genomic DNA of various eukaryotes. The system has
been trained and tested successfully on Plasmodium falciparum
(malaria), Arabidopsis thaliana, human, Drosophila, and
rice. Training data sets for human and Arabidopsis thaliana
are included. Use the GeneSplicer Web Interface
to run GeneSplicer directly, or see below for instructions on
downloading the complete system, including source code.
System requirements
GeneSplicer is released as source code and was tested on Linux
RedHat 6.x+, Sun Solaris, and Alpha OSF1, but should work on any Unix
system.
Obtaining GeneSplicer
This software is OSI Certified Open Source Software.
To download the complete GeneSplicer system, just click here.
After downloading, uncompress the distribution file by typing:
% tar -xzf GeneSplicer.tar.gz
A directory named 'GeneSplicer/' will be created, containing the
executable, the training data sets, and other supporting files.
Training GeneSplicer
There is no independent program to train GeneSplicer, but the
necessary files can be obtained with the training procedure of
GlimmerHMM, which can be downloaded from here.
After running trainGlimmerHMM, create a directory containing the
following files from the resulting GlimmerHMM training directory:
- acc*
- don*
- score_*
- outex
- outin
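The copy step above could be scripted roughly as follows. This is a minimal sketch, not part of the GeneSplicer or GlimmerHMM distributions: the function name collect_training_files and both directory arguments are hypothetical, and the file patterns are simply the ones listed above.

```shell
#!/bin/sh
# Sketch: gather the files listed above from a GlimmerHMM training
# directory into a new GeneSplicer training directory.
# The function name and both arguments are hypothetical.
collect_training_files() {
    src=$1    # directory produced by trainGlimmerHMM
    dest=$2   # new GeneSplicer training directory
    mkdir -p "$dest" || return 1
    for f in "$src"/acc* "$src"/don* "$src"/score_* "$src"/outex "$src"/outin; do
        # A pattern with no match stays literal, so copy regular files only.
        if [ -f "$f" ]; then
            cp "$f" "$dest"/ || return 1
        fi
    done
}

# Example call (paths are placeholders):
#   collect_training_files /path/to/glimmerhmm_training_dir my_organism
```

Files such as false.acc and false.don do not match these patterns, so they stay behind in the GlimmerHMM training directory.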
In the same directory, create a file called config_file with the following
information on a line, in this order:
- a "high-confidence" threshold for acceptors
- a "high-confidence" threshold for donors
- a threshold for acceptors
- a threshold for donors
- 1 if there are files like acc<number> among
the training files; if there is only acc1.mar, then this line should be
0
- 1 if there are files like don<number>
among the training files; if there is only don1.mar, then this line
should be 0
- a number representing the distance for
filtering neighbouring acceptors (usually 60)
- a number representing the distance for
filtering neighbouring donors (usually 60)
Candidate thresholds can be taken from the
files false.acc and false.don, for acceptors and
donors respectively. These files are created in the initial GlimmerHMM
training directory. Please consult the existing config_file files
distributed with GeneSplicer to see concrete examples.
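As an illustration of the field order, config_file could be generated with a small script. This is a sketch under two assumptions that the text does not settle: that each value goes on its own line, and that "files like acc<number>" means any file named acc followed by a digit other than acc1.mar (likewise for don). The function names are hypothetical, and the result should be compared against the config_file examples shipped with GeneSplicer before use.

```shell
#!/bin/sh
# Sketch: write config_file with the eight values in the order listed above.
# Assumptions: one value per line; any acc<digit>* or don<digit>* file
# other than acc1.mar / don1.mar sets the corresponding flag to 1.
has_extra_models() {
    # $1 = training directory, $2 = prefix (acc or don); prints 1 or 0
    for f in "$1/$2"[0-9]*; do
        if [ -f "$f" ] && [ "${f##*/}" != "${2}1.mar" ]; then
            echo 1
            return 0
        fi
    done
    echo 0
}

write_config_file() {
    dir=$1
    # $2..$5: high-confidence acceptor, high-confidence donor,
    # acceptor, and donor thresholds; $6, $7: filtering distances.
    printf '%s\n' "$2" "$3" "$4" "$5" \
        "$(has_extra_models "$dir" acc)" \
        "$(has_extra_models "$dir" don)" \
        "${6:-60}" "${7:-60}" > "$dir/config_file"
}
```

The four thresholds are passed in by the caller, for example after inspecting false.acc and false.don; the two distances default to 60 when omitted.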
Contact Information
Use this form to contact us.
References
M. Pertea, X. Lin, S. L. Salzberg. GeneSplicer: a new
computational method for splice site prediction. Nucleic Acids
Res. 2001 Mar 1;29(5):1185-90.
Acknowledgements
The development of GeneSplicer was supported by the NSF under grants
KDI-9980088 and DBI-0234704 and by NIH grant R01-LM06845.