Computational Gene Finding (all programs are open source)

A system that uses interpolated Markov models to find genes in microbial DNA. It has been used to annotate hundreds (possibly thousands) of bacterial, archaeal, and viral genomes. The current version is 3.02.
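
An interpolated Markov model combines Markov chains of several orders, trusting longer contexts only when they were seen often enough in training. The sketch below illustrates that idea under simplifying assumptions. It is not the program's actual weighting scheme: the weight `min(1, n / threshold)` and all function names are hypothetical.

```python
import math
from collections import defaultdict


def train_counts(seqs, max_order):
    """Count (context, next base) pairs for every context length 0..max_order."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in seqs:
        for i in range(len(s)):
            for k in range(max_order + 1):
                if i - k < 0:
                    break
                counts[s[i - k:i]][s[i]] += 1
    return counts


def interpolated_prob(counts, context, base, threshold=40):
    """P(base | context), interpolated from order 0 up to len(context).

    Each longer context is blended with the shorter-context estimate using a
    hypothetical weight min(1, n / threshold), where n is the number of times
    that context was seen in training.
    """
    p = 0.25  # uniform estimate before any context is used
    for k in range(len(context) + 1):
        row = counts.get(context[len(context) - k:])
        if not row:
            break  # if this suffix was never seen, no longer suffix was either
        n = sum(row.values())
        ml = (row.get(base, 0) + 1) / (n + 4)  # add-one smoothed estimate
        w = min(1.0, n / threshold)
        p = w * ml + (1 - w) * p
    return p


def log_odds(counts, seq, max_order):
    """Log-likelihood of seq under the model minus a uniform background."""
    score = 0.0
    for i, base in enumerate(seq):
        context = seq[max(0, i - max_order):i]
        score += math.log(interpolated_prob(counts, context, base) / 0.25)
    return score


counts = train_counts(["ATATATATATATATATATAT" * 5], max_order=3)
print(log_odds(counts, "ATATATAT", 3), log_odds(counts, "GCGCGCGC", 3))
```

A positive score means the sequence looks more like the training set than like random sequence. A real gene finder would train separate coding and noncoding models and compare their scores across reading frames.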

TWAIN
A Generalized Pair HMM that predicts genes simultaneously in two closely related eukaryotic organisms.

GlimmerHMM
A Generalized Hidden Markov Model gene finder that uses the techniques implemented previously in GlimmerM.

GeneZilla
A generalized HMM for eukaryotic gene finding, with a design similar to Genscan. Written and maintained by Bill Majoros, now at Duke University.

ExAlt
A Phylogenetic Generalized Hidden Markov Model for finding alternatively spliced exons.

JIGSAW
Previously called Combiner. A program that predicts gene models using the output of other annotation software. It uses a statistical algorithm to identify patterns of evidence that correspond to gene models.

GeneSplicer
A fast system for detecting splice sites in the genomic DNA of various eukaryotes.
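
Splice-site detection starts from the canonical GT donor and AG acceptor dinucleotides and then scores the surrounding bases. As a much simpler stand-in for the program's own models, the sketch below scores candidate donor sites with a log-odds position weight matrix. The training sites and function names are hypothetical.

```python
import math


def train_pwm(sites, pseudocount=1.0):
    """Log-odds position weight matrix (vs. a uniform background) from aligned sites."""
    pwm = []
    for j in range(len(sites[0])):
        col = {b: pseudocount for b in "ACGT"}
        for s in sites:
            col[s[j]] += 1
        total = sum(col.values())
        pwm.append({b: math.log2((c / total) / 0.25) for b, c in col.items()})
    return pwm


def scan_donors(seq, pwm, gt_offset, min_score=0.0):
    """Score each window whose dinucleotide at gt_offset is the canonical GT.

    Returns (position of the G, score) for windows scoring >= min_score.
    """
    hits = []
    width = len(pwm)
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if window[gt_offset:gt_offset + 2] != "GT":
            continue
        if any(b not in "ACGT" for b in window):
            continue  # skip ambiguous bases such as N
        score = sum(pwm[j][b] for j, b in enumerate(window))
        if score >= min_score:
            hits.append((start + gt_offset, score))
    return hits


# Hypothetical training donors: 3 exonic bases, then GT and 4 intronic bases.
donors = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "GAGGTAAGT"]
pwm = train_pwm(donors)
print(scan_donors("TTTTCAGGTAAGTTTTTGCGTCCCC", pwm, gt_offset=3))
```

A PWM treats positions independently. Capturing dependencies between neighboring positions is one reason dedicated splice-site finders use richer models.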

PIRATE
A website that collects links to our gene finders and to others.

SIM4CC
An efficient program for aligning cDNA sequences (or ESTs) to genomic sequences, designed specifically for cross-species alignment.

Genome assembly and large-scale genome alignment (all programs are open source)

A system for aligning whole genomes, chromosomes, and other very long DNA sequences. New (May 2008): see how to use MUMmer to align Solexa reads to the human genome.
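
MUMmer anchors its whole-genome alignments on maximal unique matches (MUMs): exact matches that occur exactly once in each sequence and cannot be extended in either direction. MUMmer finds them efficiently with a suffix tree. The naive quadratic sketch below only illustrates the definition on short strings. The function names and the `min_len` cutoff are illustrative.

```python
def count_occurrences(text, pattern):
    """Count possibly overlapping occurrences of pattern in text."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def maximal_unique_matches(a, b, min_len=3):
    """Naive MUM finder: returns (start in a, start in b, length) triples."""
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            # Only start where the match cannot be extended to the left.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right as far as the bases agree.
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k < min_len:
                continue
            match = a[i:i + k]
            # Keep the match only if it is unique in both sequences.
            if count_occurrences(a, match) == 1 and count_occurrences(b, match) == 1:
                mums.append((i, j, k))
    return mums


print(maximal_unique_matches("GATTACAGG", "CCGATTACATT"))
```

Here the shorter match `ATT` is rejected because it occurs twice in the second sequence, which is exactly the kind of repeat the uniqueness requirement is meant to filter out.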

High-throughput sequence alignment using graphics processing units (GPUs). It uses general-purpose GPU (GPGPU) programming to harness the extreme parallelism of GPUs for non-graphics tasks. In this application, hundreds of query sequences are aligned simultaneously to a reference sequence, giving an order-of-magnitude speedup over the same alignment on the CPU.

AMOS Assembler project
The AMOS project is a set of tools, libraries, and freestanding genome assemblers, all open source. AMOS is also an open consortium that includes TIGR, the University of Maryland, the Karolinska Institutet, and the Marine Biological Laboratory.

ABBA

AMOScmp
A comparative genome assembler that uses one genome as a reference on which to assemble the genome of another, closely related species. See the journal paper here.

MINIMUS
A small, lightweight assembler for small jobs such as assembling a viral genome, assembling a set of reads that match a single gene, or other tasks that don't require the complex infrastructure of a large-genome assembler.

Bowtie
(New in August 2008) An ultrafast, memory-efficient short read aligner that aligns short DNA sequences to the human genome at a rate of about 25 million reads per hour on a typical workstation with 2 GB of memory. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: 1.1 GB for the human genome.
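
A Burrows-Wheeler index lets an aligner count and locate exact matches by walking the pattern backwards through a transformed copy of the genome. The sketch below builds a tiny uncompressed FM index naively from sorted rotations and counts occurrences with backward search. Bowtie's real index is built and stored far more compactly and supports inexact matching; the names here are illustrative.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (fine for small inputs)."""
    text += "$"  # unique sentinel that sorts before A, C, G, T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def fm_index(bwt_str):
    """Build the C table and occurrence counts used by backward search."""
    alphabet = sorted(set(bwt_str))
    # C[c] = number of characters in the text that sort before c
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += bwt_str.count(c)
    # occ[c][i] = occurrences of c in bwt_str[:i]
    occ = {c: [0] * (len(bwt_str) + 1) for c in alphabet}
    for i, ch in enumerate(bwt_str):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return C, occ


def count_matches(pattern, C, occ, n):
    """Backward search: number of occurrences of pattern in the indexed text."""
    lo, hi = 0, n  # half-open range of sorted rotations matching the suffix so far
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


b = bwt("ACGTACGTAC")
C, occ = fm_index(b)
print(count_matches("AC", C, occ, len(b)), count_matches("GTA", C, occ, len(b)))
```

Each step of backward search touches only the C table and two occurrence lookups, so the cost depends on the pattern length rather than the genome length.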

TopHat
(New in February 2009) A short read aligner for RNA-Seq experiments. TopHat discovers novel exon-exon splice junctions and can align millions of RNA-Seq reads to a mammalian genome per hour.

Cufflinks
(New in September 2009) A transcript assembler and abundance estimator for RNA-Seq data.

BAMBUS
The first publicly available, standalone genome sequence scaffolding program. It orders and orients contigs into scaffolds based on various types of linking information.
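
Orienting contigs means choosing a direction for each contig so that the linking information (for example, mate pairs) is consistent. In general, finding the orientation that satisfies the most links is NP-hard, and real link data contain errors. The breadth-first sketch below shows only the basic consistency idea. The link format `(contig_a, contig_b, same)` is a hypothetical simplification.

```python
from collections import defaultdict, deque


def orient_contigs(links):
    """Assign +1/-1 orientations so each link's relative orientation holds.

    links: list of (contig_a, contig_b, same) where same=True means the links
    imply both contigs point the same way (hypothetical input format).
    Returns (orientation dict, list of links that are not satisfied).
    """
    graph = defaultdict(list)
    for a, b, same in links:
        sign = 1 if same else -1
        graph[a].append((b, sign))
        graph[b].append((a, sign))
    orientation = {}
    for start in graph:
        if start in orientation:
            continue
        orientation[start] = 1  # arbitrary choice for each connected component
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, sign in graph[u]:
                if v not in orientation:
                    orientation[v] = orientation[u] * sign
                    queue.append(v)
    conflicts = [(a, b, same) for a, b, same in links
                 if (orientation[a] == orientation[b]) != same]
    return orientation, conflicts


links = [("c1", "c2", True), ("c2", "c3", False), ("c1", "c3", False)]
print(orient_contigs(links))
```

Links reported as conflicts are the ones a scaffolder would have to treat as likely errors, for example from chimeric reads or repeats.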

CloudBurst
(New in November 2008) Highly sensitive short read mapping with MapReduce. CloudBurst uses Hadoop, an open-source implementation of Google's MapReduce parallel computing framework, to efficiently parallelize the short read mapping problem across dozens or hundreds of computers. This enables CloudBurst to perform highly sensitive read mappings with any number of mismatches or indels.
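
The sketch below imitates a seed-and-extend mapping job in the map/shuffle/reduce style, all in a single process: map emits k-mer seeds keyed by sequence, the shuffle groups equal seeds, and reduce extends each shared seed into a full alignment. It assumes mismatches only (no indels) and uses the pigeonhole principle to pick the seed length. It is not CloudBurst's code or Hadoop's API, and all names are hypothetical.

```python
from collections import defaultdict


def map_phase(reference, reads, seed_len):
    """Emit (seed, record) pairs from the reference and from each read."""
    for pos in range(len(reference) - seed_len + 1):
        yield reference[pos:pos + seed_len], ("ref", pos)
    for read_id, read in reads.items():
        # Non-overlapping seeds, tagged with the read id and offset.
        for off in range(0, len(read) - seed_len + 1, seed_len):
            yield read[off:off + seed_len], ("read", read_id, off)


def shuffle(pairs):
    """Group values by key, as the MapReduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reference, reads, max_mismatches):
    """For each shared seed, extend read/reference pairs and keep good hits."""
    hits = set()
    for values in groups.values():
        ref_positions = [v[1] for v in values if v[0] == "ref"]
        for v in values:
            if v[0] != "read":
                continue
            _, read_id, off = v
            read = reads[read_id]
            for pos in ref_positions:
                start = pos - off  # where the whole read would begin
                if start < 0 or start + len(read) > len(reference):
                    continue
                window = reference[start:start + len(read)]
                mismatches = sum(x != y for x, y in zip(read, window))
                if mismatches <= max_mismatches:
                    hits.add((read_id, start, mismatches))
    return sorted(hits)


def map_reads(reference, reads, max_mismatches):
    # Pigeonhole: with max_mismatches + 1 non-overlapping seeds per read,
    # at least one seed must match exactly if the read aligns.
    seed_len = min(len(r) for r in reads.values()) // (max_mismatches + 1)
    if seed_len == 0:
        raise ValueError("reads too short for this many mismatches")
    groups = shuffle(map_phase(reference, reads, seed_len))
    return reduce_phase(groups, reference, reads, max_mismatches)


reference = "ACGTTAGCCGATAGGCTAACGT"
print(map_reads(reference, {"r1": "TAGCCGAT", "r2": "ATAGTCTA"}, 1))
```

Because each reduce call only sees the records sharing one seed, the work splits naturally across many machines, which is what makes the MapReduce formulation attractive.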

Hawkeye
A visual analytics tool for genome assembly analysis and validation, designed to help identify and correct assembly errors. All levels of the assembly data hierarchy are accessible to users, along with summary statistics and common assembly metrics. A ranking component guides investigation toward likely misassemblies or other features of interest. Hawkeye can be used to interactively analyze assemblies from many popular assemblers on your desktop computer. See the journal paper here.

AutoEditor
A tool for correcting sequencing and basecaller errors using sequence assembly and chromatogram data. On average, AutoEditor corrects 80% of erroneous base calls, with an accuracy of 99.99%.

Figaro
A vector trimmer that accurately trims vector from shotgun reads without prior knowledge of the vector sequence. Figaro statistically models the frequencies of short oligonucleotides to infer which oligos are associated with vector sequence.

Celera Assembler
A whole-genome assembler originally developed at Celera Genomics for the assembly of the human genome. Celera Assembler is now an open-source project at SourceForge. The code is actively maintained by researchers at CBCB and the Venter Institute (formerly known as TIGR, The Institute for Genomic Research).

Other sequence analysis tools (all programs are open source)

ELPH
A motif finder that can find ribosome binding sites, exon splicing enhancers, or regulatory sites.

A program that finds rho-independent transcription terminators in bacterial genomes.
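
A rho-independent terminator is typically a GC-rich hairpin (an inverted repeat separated by a short loop) followed by a run of T residues. The sketch below scans for that pattern using exact complementary stems. Real terminator finders score hairpins with more tolerant models, and every threshold and name here is a hypothetical choice for illustration.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(s):
    """Reverse complement of an A/C/G/T string."""
    return "".join(COMPLEMENT[b] for b in reversed(s))


def find_terminator_candidates(seq, stem_len=6, loop_range=(3, 9),
                               tail_window=8, min_t=5, min_gc=0.5):
    """Return (stem_start, loop_len, tail_start) for hairpin + T-tail hits.

    Assumes seq contains only A, C, G, T.
    """
    hits = []
    for i in range(len(seq) - 2 * stem_len - loop_range[0] + 1):
        left = seq[i:i + stem_len]
        if sum(b in "GC" for b in left) / stem_len < min_gc:
            continue  # require a GC-rich stem
        for loop in range(loop_range[0], loop_range[1] + 1):
            j = i + stem_len + loop  # start of the right arm
            right = seq[j:j + stem_len]
            if len(right) < stem_len:
                break
            if right == revcomp(left):
                tail_start = j + stem_len
                tail = seq[tail_start:tail_start + tail_window]
                if tail.count("T") >= min_t:
                    hits.append((i, loop, tail_start))
    return hits


seq = "AAAA" + "GCCGCC" + "GAAA" + "GGCGGC" + "TTTTTTTT" + "ACGA"
print(find_terminator_candidates(seq))
```

The scan can report several overlapping candidates for one hairpin, since a stem shifted by a base or two may still pair; a practical tool would score and merge such hits.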

Software and a database of operons covering a large number of prokaryotic genomes. Described in M. Pertea et al., Nucl. Acids Res. 37 (2009), D479-D482.

GeneMerge
A program for analyzing microarray data, including rank scores for over-representation of particular functions and categories.
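
Over-representation of a functional category in a study set is commonly tested with the hypergeometric distribution: how likely is it to draw at least k annotated genes when picking n genes from a population of N, of which K are annotated? The sketch below assumes that test plus a Bonferroni correction across categories; the program's exact statistics may differ, and the annotation format is hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enrichment(study, population, annotations):
    """Bonferroni-corrected over-representation p-value per category.

    annotations maps category -> set of gene ids (hypothetical format).
    """
    study, population = set(study), set(population)
    N, n = len(population), len(study)
    results = {}
    for category, genes in annotations.items():
        genes = genes & population
        K, k = len(genes), len(genes & study)
        if k == 0:
            continue  # nothing to test for this category
        p = hypergeom_sf(k, N, K, n)
        results[category] = min(1.0, p * len(annotations))
    return results


population = [f"g{i}" for i in range(10)]
annotations = {"kinase": {"g0", "g1", "g2"}, "transport": {"g5", "g6"}}
print(enrichment(["g0", "g1", "g2"], population, annotations))
```

In this toy case all three study genes fall in the three-gene category, an outcome with probability 1/120 by chance, doubled to 1/60 after correcting for two categories.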

SEE ESE
An online tool for identifying exon splicing enhancers (ESEs) in Arabidopsis and Drosophila.

RepeatFinder
An older system for finding and characterizing repetitive sequences in complete and partial genomes.

Metastats
Statistical methods for detecting differentially abundant features in metagenomic data.
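
One way to test whether a feature's relative abundance differs between two groups of samples, without assuming normality, is a t statistic whose p-value comes from permuting the group labels. We assume this approach is close in spirit to Metastats, but the sketch below is only an illustration: it enumerates every relabeling exactly (practical for small sample sizes) and omits any handling of sparse features or correction across many features.

```python
from itertools import combinations
from statistics import mean, variance


def welch_t(x, y):
    """Two-sample t statistic with unequal variances."""
    se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    return (mean(x) - mean(y)) / se if se > 0 else 0.0


def permutation_p(x, y):
    """Two-sided p-value from every relabeling of the pooled samples.

    x and y are one feature's relative abundances in each group
    (at least two samples per group).
    """
    pooled = list(x) + list(y)
    observed = abs(welch_t(x, y))
    hits = total = 0
    for idx in combinations(range(len(pooled)), len(x)):
        chosen = set(idx)
        gx = [pooled[i] for i in range(len(pooled)) if i in chosen]
        gy = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(welch_t(gx, gy)) >= observed - 1e-12:  # tolerance for rounding
            hits += 1
    return hits / total


print(permutation_p([0.30, 0.32, 0.35], [0.10, 0.12, 0.11]))
```

With three samples per group there are only 20 relabelings, so even perfect separation cannot give a p-value below 2/20 = 0.1; small studies therefore have limited power no matter how large the effect.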

Phymm+PhymmBL
A one-stop system for taxonomically classifying metagenomic short reads.

A collection of links to external sequence analysis programs.