Genome Assembly with Short Reads Tutorial
AMOS provides two pipelines designed specifically for assembling short reads:
- AMOScmp-shortReads
- AMOScmp-shortReads-alignmentTrimmed
AMOScmp-shortReads
Uses the read clear ranges as given and does not trim the reads before assembly.
Differences compared to AMOScmp:
- smaller nucmer alignment cluster size
- smaller make-consensus alignment wiggle value
AMOScmp-shortReads accepts more input parameters than AMOScmp.
Defaults:
MINCLUSTER = 20
MINMATCH = 20
MINOVL = 5
MAXTRIM = 10
MAJORITY = 50
CONSERR = 0.06
ALIGNWIGGLE = 2
AMOScmp-shortReads-alignmentTrimmed
Trims the reads based on their alignment to the reference before assembly.
Differences compared to AMOScmp-shortReads:
- aligns the reads to the reference using nucmer
- determines zero coverage regions
- extracts the read clear ranges from the alignment (delta) file
- extends the clear ranges for reads adjacent to zero coverage regions
- updates the bank with the new clear ranges
- updates the alignment (delta) file with the new read lengths and clear ranges
This pipeline calls several Perl scripts to process the delta file:
- delta2cvg
- delta2clr
- updateDeltaClr
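To make the "zero coverage regions" step concrete, here is a minimal sketch of the idea in shell and awk. It is not the delta2cvg implementation. The function name zero_cov and the input format (one "start end" pair per line, 1-based and inclusive, giving where each read aligns on the reference) are assumptions made for illustration.

```shell
# zero_cov REF_LEN  (hypothetical helper, not part of AMOS)
# Reads "start end" alignment intervals on stdin (1-based, inclusive)
# and prints the reference regions that no interval covers.
zero_cov() {
  sort -k1,1n -k2,2n | awk -v len="$1" '
    BEGIN { covered = 0 }                            # highest position covered so far
    $1 > covered + 1 { print covered + 1, $1 - 1 }   # gap before this interval
    $2 > covered     { covered = $2 }                # extend the covered prefix
    END { if (covered < len) print covered + 1, len }  # uncovered tail of the reference
  '
}

# Example: three alignments on a 50 bp reference
printf '5 10\n8 20\n30 40\n' | zero_cov 50
# prints:
# 1 4
# 21 29
# 41 50
```

Reads whose alignments border one of the printed regions are the ones whose clear ranges the pipeline would then extend.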
AMOScmp-shortReads-alignmentTrimmed also accepts more input parameters than AMOScmp.
Defaults:
MINCLUSTER = 16
MINMATCH = 16
MINLEN = 24 # delta-filter -l 24
MINOVL = 5
MAXTRIM = 10
MAJORITY = 50
CONSERR = 0.06
ALIGNWIGGLE = 2
Assuming that prefix is the name of the organism to assemble, two files are required:
- prefix.1con : reference sequence in FASTA format, taken from a related organism (complete or well assembled; usually downloaded from GenBank)
- prefix.afg : AMOS message file that contains read/fragment messages corresponding to each short read; can be generated using the toAmos script
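Since both pipelines expect these two files to sit next to each other under the same prefix, a quick check before launching an assembly can catch the problem early. The sketch below is only a convenience wrapper; check_inputs is a hypothetical helper name, not an AMOS command.

```shell
# check_inputs PREFIX  (hypothetical helper, not part of AMOS)
# Succeeds only if PREFIX.1con and PREFIX.afg both exist and are non-empty.
check_inputs() {
  prefix="$1"
  status=0
  for ext in 1con afg; do
    if [ ! -s "$prefix.$ext" ]; then
      echo "missing or empty input: $prefix.$ext" >&2
      status=1
    fi
  done
  return "$status"
}
```

One way to use it is `check_inputs prefix && AMOScmp-shortReads prefix`, so the pipeline starts only when both inputs are present.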
Examples:
$ toAmos -s prefix.seq -o prefix.afg # create an AMOS message file from short read FASTA sequences
$ toAmos -s prefix.seq -q prefix.qual -o prefix.afg # create an AMOS message file from short read FASTA sequences and qualities
$ AMOScmp-shortReads prefix # assemble reads (no trimming, default parameters)
$ AMOScmp-shortReads prefix -D MINCLUSTER=16 -D MINMATCH=16 # use a minimum alignment/cluster size of 16 bp
$ AMOScmp-shortReads prefix -D CONSERR=0.1 # use a consensus error of 0.1 (10%)
$ AMOScmp-shortReads-alignmentTrimmed prefix # assemble reads (alignment-based trimming, default parameters)
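When several defaults are overridden at once, the repeated -D NAME=VALUE options shown in the examples can be built by a small helper. This is a sketch under the assumption that every override is passed as its own -D option, as in the MINCLUSTER/MINMATCH example; amos_defs is a hypothetical name, not an AMOS tool.

```shell
# amos_defs NAME=VALUE ...  (hypothetical helper, not part of AMOS)
# Prints "-D NAME=VALUE" for each argument; rejects arguments without "=".
amos_defs() {
  out=""
  for kv in "$@"; do
    case "$kv" in
      *=*) out="$out -D $kv" ;;
      *)   echo "expected NAME=VALUE, got: $kv" >&2; return 1 ;;
    esac
  done
  echo "${out# }"   # drop the leading space
}
```

For example, `AMOScmp-shortReads prefix $(amos_defs MINCLUSTER=16 MINMATCH=16)` expands to the same command line as the MINCLUSTER/MINMATCH example above.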