Useful AMOS utilities
Below is a short description of several useful tools and modules
released with AMOS. Full documentation for some of these tools is
still pending; in the meantime, you can get basic usage information by
running a program with the "-h" (help) option. If you wish to
contribute documentation for any of the tools, let us know.
Bank operations
bank-transact - tool for loading an AMOS message file into a bank
bank-report - tool for extracting AMOS messages from a bank
select-reads - tool for selecting subsets of reads from a bank. Supports
both inclusive and exclusive queries (e.g. an exclusive query fetches all
reads not in a specified list).
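
The difference between an inclusive and an exclusive query can be sketched in a few lines of Python. This is only an illustration of the idea, not the select-reads implementation: the function name select_reads, its exclude parameter, and the in-memory dictionary standing in for a bank are all hypothetical.

```python
def select_reads(reads, ids, exclude=False):
    """Return reads whose ID is in `ids`, or, if `exclude` is True,
    the reads whose ID is NOT in `ids`.

    `reads` is a plain dict of read ID -> sequence, a hypothetical
    stand-in for a bank.
    """
    listed = set(ids)
    # (rid in listed) is True for listed reads; comparing it with
    # `exclude` flips the selection when an exclusive query is requested.
    return {rid: seq for rid, seq in reads.items() if (rid in listed) != exclude}


reads = {"r1": "ACGT", "r2": "GGCA", "r3": "TTAG"}
inclusive = select_reads(reads, ["r1", "r3"])                # reads on the list
exclusive = select_reads(reads, ["r1", "r3"], exclude=True)  # everything else
print(sorted(inclusive), sorted(exclusive))
```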
Validation
asmQC - mate-pair based validation tool. Reports clusters of
problematic mate-pairs (e.g. too short, too long, or mis-oriented). Can
also be used to recompute library sizes. The problem areas are reported
as features that can be viewed in the Hawkeye viewer.
amosvalidate - AMOS pipeline
containing several quality control operations (including asmQC).
Running amosvalidate on a bank will populate the bank with
features indicating possible problem areas, such as incorrect
mate-pairs, high SNP density, etc. These features can be viewed
in Hawkeye.
cavalidate - like amosvalidate, but it works from the output of Celera Assembler.
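
To make the mate-pair categories above concrete, here is a minimal Python sketch of how a single mate-pair might be classified against its library size. It is not asmQC's algorithm: the MatePair record, the classify function, the assumption that mates should face each other (left read forward, right read reverse), and the cutoff of three standard deviations are all illustrative assumptions. asmQC additionally looks for clusters of such pairs, which this sketch does not attempt.

```python
from dataclasses import dataclass


@dataclass
class MatePair:
    # Hypothetical record: the two reads are already ordered by position.
    left_start: int      # leftmost contig coordinate of the left read
    left_forward: bool   # True if the left read is on the forward strand
    right_end: int       # rightmost contig coordinate of the right read
    right_forward: bool  # True if the right read is on the forward strand


def classify(pair, mean, sd, n_sd=3.0):
    """Label one mate-pair given the library mean and standard deviation."""
    # Assumed convention: mates point towards each other.
    if not (pair.left_forward and not pair.right_forward):
        return "mis-oriented"
    insert = pair.right_end - pair.left_start
    if insert < mean - n_sd * sd:
        return "too short"
    if insert > mean + n_sd * sd:
        return "too long"
    return "ok"


# Example library: mean 3000 bp, standard deviation 300 bp.
pairs = [
    MatePair(100, True, 3200, False),
    MatePair(100, True, 1100, False),
    MatePair(100, True, 5200, False),
    MatePair(100, True, 3200, True),
]
for p in pairs:
    print(classify(p, mean=3000, sd=300))
```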
Multialignment operations
make-consensus - builds a multiple alignment of a set of reads. As input
it takes a layout, i.e. a contig that specifies the approximate placement
of the reads with respect to each other. Make-consensus fills in the
details (gaps, exact positions of the reads in the consensus, etc.) and
computes the consensus sequence for the contig.
recallConsensus - tool that
updates the consensus of a contig. Note that the contig must
already have been created by make-consensus. recallConsensus can
apply a slightly different algorithm for computing consensus calls
(e.g. by allowing ambiguity codes), or recompute the consensus after
some of the reads have been edited. RecallConsensus does not
recompute the placement of the reads. Use make-consensus if read
placement may need to change.
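
The idea of recomputing consensus calls over a fixed read placement can be sketched as follows. This is a simplified illustration, not the algorithm used by make-consensus or recallConsensus: it assumes the reads are already given as equal-length alignment rows (with "-" for a gap and "." where a read does not cover a column), ignores quality values, and uses a hypothetical ambiguity_fraction parameter to decide when a IUPAC ambiguity code is emitted.

```python
from collections import Counter

# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AC"): "M", frozenset("AG"): "R", frozenset("AT"): "W",
    frozenset("CG"): "S", frozenset("CT"): "Y", frozenset("GT"): "K",
    frozenset("ACG"): "V", frozenset("ACT"): "H",
    frozenset("AGT"): "D", frozenset("CGT"): "B",
    frozenset("ACGT"): "N",
}


def call_column(column, ambiguity_fraction=None):
    """Call one alignment column by majority vote, optionally with ambiguity codes."""
    votes = Counter(c for c in column if c != ".")  # "." = read absent here
    if not votes:
        return "N"
    # Sort by descending count, then by character, so ties are deterministic.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    top = ranked[0][0]
    if ambiguity_fraction is None or top == "-":
        return top
    total = sum(votes.values())
    supported = frozenset(
        base for base, n in votes.items()
        if base != "-" and n / total >= ambiguity_fraction
    )
    # Fall back to the majority call if the supported set has no IUPAC code.
    return IUPAC.get(supported, top)


def consensus(rows, ambiguity_fraction=None):
    """Recompute the consensus; the placement of the rows is left untouched."""
    return "".join(call_column(col, ambiguity_fraction) for col in zip(*rows))


rows = [
    "ACGT-A",
    "ACGT-A",
    "ACAT-A",
    ".CATTA",
    ".CGTTA",
]
print(consensus(rows))                          # plain majority calls
print(consensus(rows, ambiguity_fraction=0.3))  # with ambiguity codes
```

In this example the third column holds three G's and two A's, so the majority call is G, while the ambiguity-aware call is R (A or G). The read rows themselves are never shifted, which mirrors the distinction drawn above between recalling a consensus and recomputing read placement.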
Layout tools
tigger - a unitigger based on
Gene Myers' original chunk graph assembly code. It takes as input
a set of overlaps between reads (stored in a bank) and outputs a set of
layouts. To compute the consensus you will need to use the
make-consensus program (see above).
casm-layout - a comparative
layout program. Takes as input a MUMmer .delta file and outputs a
set of layouts (which can be processed by make-consensus to generate
contigs). Casm-layout attempts to avoid mis-assemblies by
identifying differences between the genome being assembled and the
reference genome used to construct the layout. For more
information see AMOScmp.
Overlappers
hash-overlap - a basic shotgun read overlapper that uses minimizers
(see the reference below) to reduce memory usage and increase
performance.
Roberts M, Hunt BR, Yorke JA, Bolanos RA, Delcher AL. (2004) A
preprocessor for shotgun assembly of large genomes. J Comput Biol.
11(4):734-52.
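
As a rough illustration of the minimizer idea, the Python sketch below picks, from every window of w consecutive k-mers, the smallest k-mer and keeps only those as index keys. It is not the hash-overlap code: the function name, the parameters k and w, and the use of plain lexicographic ordering are assumptions made for this example, and the ordering described in the paper cited above may differ.

```python
def minimizers(seq, k, w):
    """Return the sorted list of (k-mer, position) minimizers of `seq`.

    For each window of `w` consecutive k-mers, the lexicographically
    smallest k-mer (leftmost on ties) is selected.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        offset = min(range(w), key=lambda j: (window[j], j))
        selected.add((window[offset], start + offset))
    return sorted(selected, key=lambda m: m[1])


# Two hypothetical reads: the last 12 bases of read_a equal the
# first 12 bases of read_b.
read_a = "ACGTTGCATGGCAATC"
read_b = "TGCATGGCAATCGGAT"

mins_a = minimizers(read_a, k=4, w=3)
mins_b = minimizers(read_b, k=4, w=3)
shared = {kmer for kmer, _ in mins_a} & {kmer for kmer, _ in mins_b}

print(len(read_a) - 4 + 1, "k-mers in read_a, of which", len(mins_a), "are minimizers")
print("minimizer k-mers shared by both reads:", sorted(shared))
```

Any window that lies entirely inside the overlap contains the same k-mers in both reads and therefore selects the same minimizer, so overlapping reads can be found by comparing only the minimizers rather than every k-mer. That is where the savings in memory and time come from.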
Viewers
Hawkeye - a one-stop shop for your data visualization needs