Computational Gene Finding


a system that uses interpolated Markov models to find genes in microbial DNA. March 2003: new release, version 2.1, which automatically optimizes ORF length for training.
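An interpolated Markov model blends Markov-chain estimates of orders 0 through k, trusting a longer context more when it has been seen more often in training. The sketch below illustrates that idea in simplified form. It is not the actual implementation of the gene finder described above. The weighting rule (context count divided by a hypothetical `threshold`, capped at 1), the add-one smoothing, and all function names are assumptions chosen for brevity.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_counts(seqs, max_order):
    """Count how often each base follows each context of length 0..max_order."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in seqs:
        for i, base in enumerate(s):
            for k in range(min(i, max_order) + 1):
                counts[s[i - k:i]][base] += 1
    return counts


def imm_prob(counts, context, base, threshold=16):
    """Interpolated P(base | context), built up from order 0 to len(context).

    Hypothetical weighting: the order-k estimate gets weight
    min(1, n / threshold), where n is how often its context was seen.
    """
    p = 1.0 / len(ALPHABET)  # uniform fallback below order 0
    for k in range(len(context) + 1):
        ctx = context[len(context) - k:]
        table = counts.get(ctx)  # .get() does not insert into the defaultdict
        if table is None:  # unseen context, so every longer one is unseen too
            break
        n = sum(table.values())
        ml = (table.get(base, 0) + 1) / (n + len(ALPHABET))  # add-one smoothing
        w = min(1.0, n / threshold)
        p = w * ml + (1.0 - w) * p  # blend with the lower-order estimate
    return p


def log_score(counts, seq, max_order):
    """Log-likelihood of seq under the interpolated model."""
    total = 0.0
    for i, base in enumerate(seq):
        context = seq[max(0, i - max_order):i]
        total += math.log(imm_prob(counts, context, base))
    return total


if __name__ == "__main__":
    model = train_counts(["ATGATGATGATGATGATG"], max_order=2)
    print(log_score(model, "ATGATG", 2) > log_score(model, "CCCCCC", 2))  # True
```

Because each step is a convex combination of two probability distributions, the interpolated probabilities over A, C, G, T still sum to 1 for any context.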
TWAIN a Generalized Pair HMM to predict genes simultaneously in two closely related eukaryotic organisms.
GlimmerHMM
a Generalized Hidden Markov Model gene finder that uses techniques implemented previously in GlimmerM.
GeneZilla
a generalized HMM for eukaryotic gene finding, with a design similar to Genscan. Written and maintained by Bill Majoros, now at Duke University.
ExAlt
a Phylogenetic Generalized Hidden Markov Model for finding alternatively spliced exons.
JIGSAW
(previously called Combiner), a program that predicts gene models using the output from other annotation software. It uses a statistical algorithm to identify patterns of evidence corresponding to gene models.
GeneSplicer
a fast system for detecting splice sites in genomic DNA of various eukaryotes.
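To give a feel for what splice-site detection involves, the sketch below scores candidate donor sites with a position weight matrix (PWM). This is a much simpler, swapped-in technique, not GeneSplicer's model. The training sites, the 9-nt window with GT at offset 3, the uniform background, and the function names are all hypothetical.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0):
    """Log-odds position weight matrix (in bits) against a uniform background.

    `sites` are equal-length, aligned training sequences (hypothetical input).
    """
    pwm = []
    for pos in range(len(sites[0])):
        column = [s[pos] for s in sites]
        denom = len(column) + len(BASES) * pseudocount
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / denom) / 0.25)
            for b in BASES
        })
    return pwm


def scan_donors(seq, pwm, gt_offset, min_score):
    """Return (position_of_GT, score) for windows with GT at gt_offset."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        # Canonical donor sites begin the intron with GT; skip other windows
        # and any window containing a non-ACGT character.
        if window[gt_offset:gt_offset + 2] != "GT" or set(window) - set(BASES):
            continue
        score = sum(pwm[i][b] for i, b in enumerate(window))
        if score >= min_score:
            hits.append((start + gt_offset, score))
    return hits


if __name__ == "__main__":
    # Made-up 9-nt donor sites: 3 exonic bases, then the intron's GT...
    training = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "GAGGTAAGG"]
    pwm = build_pwm(training)
    # Only the GT at index 7 scores above 0 here.
    print(scan_donors("TTTTCAGGTAAGTTTTT", pwm, gt_offset=3, min_score=0.0))
```

A PWM treats positions independently. Tools aimed at real genomes typically model dependencies between positions as well, which this sketch does not attempt.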
PIRATE
a website collecting links to our gene finders and to many others.

Genome assembly and large-scale genome alignment

MUMmer
a system for aligning whole genomes, chromosomes, and other very long DNA sequences. Since April 2003: MUMmer 3.0 and later releases are open source.
High-throughput sequence alignment using Graphics Processing Units (GPUs). Uses a technique called general-purpose GPU programming (GPGPU programming) to harness the extreme parallelism of GPUs for non-graphics tasks. In this application, hundreds of query sequences are aligned to a reference sequence simultaneously, yielding an order-of-magnitude speedup over the same alignment on the CPU.
AMOS Assembler project
AMOS is a set of tools, libraries, and freestanding genome assemblers, all open source. It is also an open consortium that includes TIGR, the University of Maryland, the Karolinska Institutet, and the Marine Biological Laboratory.
AMOScmp
a comparative genome assembler that uses one genome as a reference on which to assemble the genome of another, closely related species. See the journal paper here.
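The sketch below shows only the layout idea behind reference-guided (comparative) assembly. It assumes that each read has already been aligned to the reference and reduced to a hypothetical `(read_id, ref_start, length)` tuple. Reads whose reference intervals overlap are chained into one contig, and a gap in coverage starts a new one. Repeats, indels, and rearrangements between the two genomes are ignored, so this is not AMOScmp's actual algorithm, and the function name is hypothetical.

```python
def layout_by_reference(placements, min_overlap=1):
    """Group reads into contigs by their mapped position on the reference.

    placements: (read_id, ref_start, length) tuples (hypothetical format).
    A read joins the current contig if it overlaps the contig's reference
    span by at least min_overlap bases; otherwise it starts a new contig.
    """
    contigs = []
    current, current_end = [], None
    for read_id, start, length in sorted(placements, key=lambda p: p[1]):
        end = start + length
        if current and start <= current_end - min_overlap:
            current.append(read_id)
            current_end = max(current_end, end)
        else:
            if current:
                contigs.append(current)
            current, current_end = [read_id], end
    if current:
        contigs.append(current)
    return contigs


if __name__ == "__main__":
    placements = [("r1", 0, 100), ("r2", 80, 100), ("r3", 300, 50), ("r4", 170, 20)]
    print(layout_by_reference(placements))  # [['r1', 'r2', 'r4'], ['r3']]
```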
MINIMUS
(new in August 2004) a small, lightweight assembler for small jobs, such as assembling a viral genome or a set of reads that match a single gene, and other tasks that don't require the complex infrastructure of a large-genome assembler.
BAMBUS
the first publicly available, standalone genome sequence scaffolding program. It orders and orients contigs into scaffolds based on various types of linking information.
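Orienting contigs from linking information can be pictured as a graph problem. Contigs are nodes, and each link says whether two contigs should lie in the same or opposite orientation. The sketch below assigns orientations by breadth-first search and reports links that contradict earlier choices. It is a simplified illustration, not Bambus's own algorithm. It handles orientation only, not ordering or gap sizes. The function name and the `(a, b, same)` link format are hypothetical.

```python
from collections import deque


def orient_contigs(contigs, links):
    """Assign each contig +1 (forward) or -1 (reverse) from pairwise links.

    links: (a, b, same) tuples; same=True means the link implies a and b lie
    in the same orientation, same=False means opposite orientations.
    Returns (orientation dict, set of conflicting contig pairs).
    """
    graph = {c: [] for c in contigs}
    for a, b, same in links:
        graph[a].append((b, same))
        graph[b].append((a, same))
    orient, conflicts = {}, set()
    for seed in contigs:  # one BFS per connected component
        if seed in orient:
            continue
        orient[seed] = 1  # arbitrary: each component starts forward
        queue = deque([seed])
        while queue:
            c = queue.popleft()
            for nbr, same in graph[c]:
                wanted = orient[c] if same else -orient[c]
                if nbr not in orient:
                    orient[nbr] = wanted
                    queue.append(nbr)
                elif orient[nbr] != wanted:
                    conflicts.add(tuple(sorted((c, nbr))))
    return orient, conflicts


if __name__ == "__main__":
    contigs = ["c1", "c2", "c3"]
    links = [("c1", "c2", True), ("c2", "c3", False)]
    print(orient_contigs(contigs, links))  # ({'c1': 1, 'c2': 1, 'c3': -1}, set())
```

Greedy BFS keeps whichever orientation it reaches first. A real scaffolder has to decide which of several conflicting links to trust, which this sketch only reports.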
Hawkeye
A visual analytics tool for genome assembly analysis and validation, designed to help identify and correct assembly errors. All levels of the assembly data hierarchy are accessible to users, along with summary statistics and common assembly metrics. A ranking component guides investigation toward likely mis-assemblies or other features of interest. Hawkeye can be used on a desktop computer to interactively analyze assemblies from many popular assemblers. See the journal paper here.
AutoEditor
a tool for correcting sequencing and basecaller errors using sequence assembly and chromatogram data. On average, AutoEditor corrects 80% of erroneous base calls, with an accuracy of 99.99%.
Figaro
A vector trimmer that accurately trims vector from shotgun reads without prior knowledge of the vector sequence. Figaro statistically models short oligonucleotide frequencies to infer which oligos are associated with vector sequence.
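The sketch below illustrates the general idea in a deliberately simplified form. It is not Figaro's statistical model. Reads in a shotgun library typically begin with the same stretch of cloning vector, so vector k-mers recur in the prefixes of many reads while genomic k-mers rarely do. The parameters `k`, `window`, and `min_fraction`, the toy sequences, and both function names are hypothetical choices.

```python
from collections import Counter


def flag_vector_kmers(reads, k=8, window=50, min_fraction=0.5):
    """Flag k-mers that occur in the first `window` bases of many reads."""
    counts = Counter()
    for r in reads:
        prefix = r[:window]
        # Count each k-mer at most once per read.
        counts.update({prefix[i:i + k] for i in range(len(prefix) - k + 1)})
    cutoff = min_fraction * len(reads)
    return {kmer for kmer, n in counts.items() if n >= cutoff}


def trim_read(read, vector_kmers, k=8, window=50):
    """Cut the read after the last flagged k-mer found inside the prefix window."""
    end = 0
    for i in range(min(window, len(read)) - k + 1):
        if read[i:i + k] in vector_kmers:
            end = i + k
    return read[end:]


if __name__ == "__main__":
    vector = "GATCCTCTAGAGTCGACC"  # made-up vector stretch
    inserts = ["A" * 30, "C" * 30, "G" * 30, "T" * 30]  # toy genomic sequence
    reads = [vector + ins for ins in inserts]
    flagged = flag_vector_kmers(reads)
    print([trim_read(r, flagged) == ins for r, ins in zip(reads, inserts)])
    # [True, True, True, True]
```

A genomic k-mer that happens to match a flagged one inside the window would cause over-trimming here. The fixed `min_fraction` cutoff stands in for a proper statistical test.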
Celera Assembler
a whole-genome assembler originally developed at Celera Genomics to assemble the human genome. Celera Assembler is now an open-source project at SourceForge; the code is actively maintained by researchers at the Venter Institute, the CBCB, and TIGR.

Other sequence analysis tools

ELPH a motif finder that can find ribosome binding sites, exon splicing enhancers, or regulatory sites.
RepeatFinder software for finding and characterizing repetitive sequences in complete and partial genomes.
SEE ESE an online tool for identifying exon splicing enhancers (ESEs) in Arabidopsis and Drosophila.
  a program that finds rho-independent transcription terminators in bacterial genomes.
OperonDB results from applying our operon-finding software to a large number of prokaryotic genomes. (Described in Ermolaeva et al., Prediction of operons in microbial genomes, listed above.)
CRAB
Conserved Regions in Archaea and Bacteria, a database of conserved intergenic sites likely to regulate transcription of nearby genes.
Skewed oligomers from bacterial and archaeal genomes (from the paper in Gene, above). Get the source code or Linux executable here. Tables of skewed oligomers for: A. fulgidus, B. burgdorferi, B. subtilis, C. trachomatis, E. coli, H. influenzae, H. pylori, M. genitalium, M. jannaschii, M. pneumoniae, M. thermoautotrophicum, Synechocystis sp. PCC 6803, T. maritima, T. pallidum
A collection of links to external sequence analysis programs.