A major contribution of CBCB researchers is the open-source software that they make freely available to the scientific community. The software described below is actively developed and maintained. For software that is no longer maintained in the Center (many of these packages are now maintained by our alumni), please see the Inactive Software page.

AMOS is a set of tools, libraries, and freestanding genome assemblers, all open source. AMOS is also an open consortium that includes TIGR, the University of Maryland, the Karolinska Institutet, and the Marine Biological Laboratory.

A comparative genome assembler that uses one genome as a reference on which to assemble the genome of another, closely related species. See the journal paper here.

(New in early 2009) Antibiotic Resistance Genes Database

An ultrafast, memory-efficient short read aligner that aligns short DNA sequences to the human genome at a rate of about 25 million reads per hour on a typical workstation with 2 GB of memory. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: 1.1 GB for the human genome.
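To illustrate the idea behind a Burrows-Wheeler index, here is a minimal Python sketch. It is not Bowtie's implementation: the function names `bwt` and `count_occurrences` are hypothetical, the transform is built naively from sorted rotations, and occurrence counts are recomputed by scanning rather than read from the compact tables a real aligner would use. It only shows how a pattern can be counted by backward search over the transformed text.

```python
from collections import Counter


def bwt(text: str) -> str:
    """Burrows-Wheeler transform built from sorted rotations (toy inputs only)."""
    text += "$"  # sentinel; sorts before A, C, G, T and lowercase letters
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_text: str, pattern: str) -> int:
    """Count exact matches of pattern using backward search over the BWT."""
    # first[c] = number of characters in the text that sort before c
    counts = Counter(bwt_text)
    first = {}
    total = 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]

    def occ(ch: str, i: int) -> int:
        # Occurrences of ch in bwt_text[:i]; a real index precomputes this.
        return bwt_text[:i].count(ch)

    lo, hi = 0, len(bwt_text)  # current range of sorted rotations
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + occ(ch, lo)
        hi = first[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    index = bwt("ACGTACGA")
    print(index, count_occurrences(index, "ACG"))
```

The search touches only the transformed string and a small table of character offsets, which is the property that lets an index of this kind stay compact relative to the genome it represents.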

Steven Salzberg has been nominated for the 2013 Benjamin Franklin Award in the Life Sciences. This is a humanitarian/bioethics award presented to an individual who has, in his or her practice, promoted free and open access to the materials and methods used in the life sciences. More information on the award can be found at

A whole genome assembler originally developed at Celera Genomics for the assembly of the human genome. CeleraAssembler is now an open-source project at SourceForge. The code is actively maintained by researchers at CBCB and the Venter Institute (formerly known as TIGR, The Institute for Genomic Research).

(New in July 2010) DNACLUST is a tool for clustering millions of short DNA sequences. DNACLUST is free software.

A comprehensive system for finding unique DNA sequences that can be used to identify any bacterial or viral species or strain. Its database currently contains over 13,000 species and strains.

A fast, multithreaded k-mer counter.
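For readers unfamiliar with k-mer counting, the following Python sketch shows the basic operation: tallying every length-k substring of a sequence. It is a single-threaded illustration only, and the function name `count_kmers` is hypothetical; it does not reflect the tool's actual data structures or its multithreading.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping substring of length k in the sequence."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    sequence = sequence.upper()
    # A sequence of length n contains n - k + 1 overlapping k-mers.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


if __name__ == "__main__":
    for kmer, count in sorted(count_kmers("ACGTACG", 3).items()):
        print(kmer, count)
```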

R package to estimate differential abundance of marker gene survey data and visualize results.

(New in 2010) Taxonomic Profiling for Metagenomic Sequences.

A correction pipeline that enables the use of long-read sequences (such as those produced by the PacBio RS instrument) for assembly or other analysis.

Scaffolding using Optical Restriction Mapping

(New in 2012) Spanki is a toolkit for analysis of alternative splicing from RNA-Seq data.

(New in February 2009) A short read aligner for RNA-Seq experiments. TopHat discovers novel exon-exon splice junctions and can align millions of RNA-Seq reads to a mammalian genome per hour.