Finishing Toolbox
The process of finishing a genome, moving it from
a draft stage (the result of sequencing and initial assembly)
to a complete genome, is typically a time- and resource-intensive task.
The advent of new sequencing technologies has brought its own set of
opportunities and pitfalls to the finishing process. While genomes can
now be sequenced to high redundancy in a cost-effective manner,
assembling them is more challenging, and draft genomes are often
fragmented into hundreds of contigs. Correspondingly, the
task of producing a complete genome can involve months of lab work
and thousands of finishing experiments, and is usually done in large
genome centers.
The work in our lab has focused on computational approaches to
speed up the finishing process. Specifically, we have explored the use
of optical mapping and mate-pair data to augment assemblies and direct
finishing experiments. The tools developed in our lab have been used
in several finishing projects, producing complete genomes (and
near-complete ones) with surprisingly little computational and
experimental effort (Nagarajan et al., in submission). The executables
(as well as the source code) for these tools are freely available here:
- Scaffolding using Optical Restriction Mapping
Optical maps are global, ordered maps of restriction site
locations in a genome. This information can be quite useful
in scaffolding contigs from a shotgun assembly to guide the
finishing process. A set of programs that exploit optical maps
for assembly can be found
here: SOMA v2.0 (63 MB tar.gz file). This version of SOMA contains
several improvements to the programs in v1.0, as well as new
scripts for working with multiple maps, contig graphs and
scaffolds.
- Augmenting assemblies with mate-pair data
Mate-pair information can be valuable in augmenting
short-read assemblies and reconstructing the genome as
larger scaffolds. AMOS-Hybrid is a pipeline written in the
AMOS framework (a collection of open-source assembly tools) that merges
arbitrary mated reads into an existing assembly and, where possible,
merges contigs and builds scaffolds. Source code and
executables for AMOS-Hybrid are available
here: AMOS-Hybrid v1.0 (142 MB tar.gz file).
- Assembly- and sequence-composition-guided finishing
Contigs from a shotgun assembly are
typically linked together in a graph structure that can
serve to guide finishing and, in some cases, to close
gaps in silico. In many cases, the sequence
composition of contigs can also provide clues for filling gaps in
scaffolds. A set of scripts to automate some of these tasks
can be found
here: Finishing Scripts v1.0 (63 MB tar.gz file).
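Scaffolding against an optical map, as described in the first item above, generally relies on predicting where a contig's restriction sites fall from its sequence. The Python sketch below is a generic illustration, not code taken from SOMA: it performs an in-silico digest, locating every occurrence of a recognition site in a contig and returning the ordered fragment lengths that would then be compared with the optical map's fragment sizes. The enzyme (BamHI, site GGATCC, cutting after the first base) and the function names `find_sites` and `digest` are illustrative assumptions, and the matching step itself, which has to tolerate sizing errors and missing or extra cuts, is not shown.

```python
# Minimal sketch (not SOMA's code): in-silico restriction digest of a contig,
# producing ordered fragment sizes that could be compared with an optical map.
# Assumption: BamHI (GGATCC) as the enzyme; its site is palindromic, so
# scanning a single strand finds every site.

def find_sites(seq: str, site: str) -> list[int]:
    """Return 0-based start positions of every occurrence of `site` in `seq`."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)  # +1 also catches overlapping sites
    return positions


def digest(seq: str, site: str, cut_offset: int = 0) -> list[int]:
    """Return the ordered fragment lengths after cutting `seq` at each site.

    `cut_offset` is the cut position within the site (1 for BamHI, G^GATCC).
    The first and last fragments are bounded by the contig ends, so they are
    usually only partial matches to optical-map fragments.
    """
    cuts = [p + cut_offset for p in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    contig = "AAGGATCCTTTTTTGGATCCAA"
    print(digest(contig, "GGATCC", cut_offset=1))  # [3, 12, 7]
```

In this toy contig the two sites start at positions 2 and 14, giving cuts at 3 and 15 and three fragments whose lengths sum to the contig length. Real contigs and optical maps work at kilobase scale, where only the interior fragments can be matched exactly.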
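Mate pairs that span the gap between two contigs, as used in the mate-pair scaffolding described above, also constrain the length of that gap. The sketch below is a generic illustration, not AMOS-Hybrid's implementation. It assumes that contig A precedes contig B in the scaffold, both in forward orientation, and that the library's insert size is measured from the start of the forward read on A to the end of the reverse read on B. The names `SpanningPair` and `estimate_gap` are hypothetical.

```python
# Minimal sketch (not AMOS-Hybrid's code): estimating a scaffold gap from
# mate pairs spanning two contigs. Assumes contig A precedes contig B, both
# forward, and a known insert size measured as the outer distance from the
# start of read 1 (on A) to the end of read 2 (on B).
from dataclasses import dataclass
from statistics import median


@dataclass
class SpanningPair:
    pos_a: int  # 0-based start of the forward read on contig A
    end_b: int  # exclusive end of the reverse read on contig B


def estimate_gap(len_a: int, pairs: list[SpanningPair], insert_size: int) -> int:
    """Estimate the gap length between contigs A and B.

    For each pair: insert_size = (len_a - pos_a) + gap + end_b, so
    gap = insert_size - (len_a - pos_a) - end_b. Taking the median of the
    per-pair estimates limits the influence of a few outliers. A negative
    result suggests that the contig ends overlap.
    """
    if not pairs:
        raise ValueError("need at least one spanning mate pair")
    estimates = [insert_size - (len_a - p.pos_a) - p.end_b for p in pairs]
    return round(median(estimates))


if __name__ == "__main__":
    pairs = [SpanningPair(8000, 500), SpanningPair(8300, 800), SpanningPair(7900, 350)]
    print(estimate_gap(10_000, pairs, insert_size=3000))  # 500
```

The median is used here instead of the mean so that a single chimeric or misplaced pair does not drag the estimate. In practice the insert size itself varies, so the estimate carries an uncertainty on the order of the library's standard deviation.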
This work is supported by NSF grant IIS-0812111 and DoD grant IB06RSQ002. For questions and comments, contact: niranjan at umiacs.umd.edu