Results of Running a Plasmodium-Trained Gene-Finder on Tetrahymena

bmajoros
10/13/2004

Accuracy results on plasmodium; 184 genes (training & testing):


                            nuc     nuc     nuc     gt/ag   gt/ag   atg/tag atg/tag exon    exon    exact
notes                       sens    spec    acc     sens    spec    sens    spec    sens    spec    genes
                            %       %       %       %       %                       %       %       %
---------------------------------------------------------------------------------------------------------
basic parameterization      95      100     95      64      55      60      59      53      48      40
small mean intron size &    97      100     97      84      75      77      63      73      62      57
  aggressive exon optimism
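
The table above does not define its measures. A minimal sketch of the nucleotide-level columns, assuming the usual gene-finding convention (sensitivity = TP/(TP+FN), specificity = TP/(TP+FP)) and half-open (begin, end) coding intervals. The function name and the interval representation are hypothetical and are not taken from the gene-finder itself:

```python
def nucleotide_sens_spec(annotated, predicted, seq_len):
    """Nucleotide-level sensitivity and specificity.

    annotated, predicted: lists of half-open (begin, end) coding intervals.
    Assumption: specificity means TP / (TP + FP), as is common in
    gene-finding evaluations; the source does not state its definition.
    """
    def coding_mask(intervals):
        # Mark every position covered by at least one coding interval.
        mask = [False] * seq_len
        for begin, end in intervals:
            for i in range(begin, end):
                mask[i] = True
        return mask

    truth = coding_mask(annotated)
    guess = coding_mask(predicted)

    tp = sum(1 for t, g in zip(truth, guess) if t and g)
    fn = sum(1 for t, g in zip(truth, guess) if t and not g)
    fp = sum(1 for t, g in zip(truth, guess) if g and not t)

    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tp / (tp + fp) if tp + fp else 0.0
    return sens, spec
```

For example, a 20-base annotated exon at (10, 30) predicted as (10, 29) misses one coding base and predicts no noncoding base as coding, so sensitivity is below 1 while specificity stays at 1.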


Results on Tetrahymena:

(basic parameterization)


                              plasmodium                  tetrahymena
transcript length             2351.5 ± 2490.6 (60-15906)  166.5 ± 105.8 (21-1188)
transcript extent             3573.2 ± 3315.9 (66-21790)  176 ± 130.5 (21-1354)
exons/transcript              2.9 ± 2.4 (1-17)            1.1 ± 0.3 (1-4)
log(P[coding]/P[noncoding])   394.5 ± 375.4 (23.1-3461)   43.5 ± 23.7 (13.7-288.7)
GC                            21%                         22%
AT                            79%                         78%
A                             41%                         39%
T                             39%                         39%
C                             10%                         11%
G                             11%                         11%


On roughly 1.5 Mb of sequence from each organism:


                                      plasmodium              tetrahymena
donor scores                          -75.3 +/- 6.2           -76.4 +/- 5.3
acceptor scores                       -54.3 +/- 5.0           -54.5 +/- 4.5
start codon scores                    -37.8 +/- 3.9           -38.4 +/- 3.9
stop codon scores                     -60.9 +/- 7.6           -64.0 +/- 6.0
density of putative donors            0.046                   0.043
density of putative acceptors         0.043                   0.068
density of putative start codons      0.041                   0.028
density of putative stop codons       0.131                   0.030
scores* of putative single-exons      -225                    -1285
scores* of putative initial-exons     -130                    -806
scores* of putative internal-exons    -109                    -796
scores* of putative final-exons       -153                    -1135
lengths of putative single-exons      2278 (N=883,748)        567 (N=147,038)
lengths of putative initial-exons     2019 (N=767,321)        614 (N=445,441)
lengths of putative internal-exons    2322 (N=1,503,203)      721 (N=1,659,927)
lengths of putative final-exons       2506 (N=3,465,709)      758 (N=818,248)
total sequence length                 1,501,292 bp            1,523,430 bp

* mean length-normalized log of Markov probability
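
One way to read the footnote: score each putative exon by the log probability of its sequence under a Markov chain, divide by the exon's length, then average over all exons of that type. The sketch below assumes a first-order chain, natural logs, and normalization by the number of scored positions; none of these choices is stated in the source, and the function names and the cond_probs table are hypothetical. The scale of the tabulated values suggests the original normalization may differ.

```python
import math

def length_normalized_log_prob(seq, cond_probs, order=1):
    """Log Markov probability of seq divided by the number of scored positions.

    cond_probs maps (context, base) -> P(base | context), where context is
    the preceding `order` bases. Assumption: the first `order` bases are
    used only as context and are not scored themselves.
    """
    scored = len(seq) - order
    if scored <= 0:
        raise ValueError("sequence too short for this Markov order")
    total = 0.0
    for i in range(order, len(seq)):
        context = seq[i - order:i]
        total += math.log(cond_probs[(context, seq[i])])
    return total / scored

def mean_length_normalized_score(seqs, cond_probs, order=1):
    """Average of the per-exon length-normalized scores."""
    scores = [length_normalized_log_prob(s, cond_probs, order) for s in seqs]
    return sum(scores) / len(scores)
```

With a uniform table (every conditional probability 0.25), each exon scores log(0.25) regardless of its length, which is the point of normalizing: long and short exons become comparable.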