Genome Assembly with Short Reads Tutorial

AMOS provides two pipelines designed specifically for assembling short reads:
  • AMOScmp-shortReads
  • AMOScmp-shortReads-alignmentTrimmed

AMOScmp-shortReads

Uses the read clear ranges as given and does not trim the reads before assembly.
Differences compared to AMOScmp:
  • smaller nucmer alignment cluster size
  • smaller make-consensus alignment wiggle value
AMOScmp-shortReads accepts more input parameters than AMOScmp.
Defaults:
   MINCLUSTER  = 20
   MINMATCH    = 20
   MINOVL      = 5
   MAXTRIM     = 10
   MAJORITY    = 50
   CONSERR     = 0.06
   ALIGNWIGGLE = 2

AMOScmp-shortReads-alignmentTrimmed

Trims the reads based on their alignment to the reference before assembly.
Differences compared to AMOScmp-shortReads:
  • aligns the reads to the reference using nucmer
  • determines zero-coverage regions
  • extracts the read clear ranges from the alignment (delta) file
  • extends the clear ranges for reads adjacent to zero-coverage regions
  • updates the bank with the new clear ranges
  • updates the alignment (delta) file with the new read lengths and clear ranges
This pipeline calls several Perl scripts to process the delta file:
  • delta2cvg
  • delta2clr
  • updateDeltaClr
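
The zero-coverage step can be pictured as a sweep over the sorted alignment intervals. The sketch below is an illustration only, not the delta2cvg implementation: the zero_cov name and the two-column "start end" input (1-based, inclusive reference coordinates) are assumptions, and a real delta file would first have to be converted to that form.

```shell
# zero_cov REF_LEN < intervals
# Hypothetical helper, not part of AMOS.
# Input (assumed format): one alignment per line, "start end",
# 1-based inclusive coordinates on a single reference of length REF_LEN.
# Output: one uncovered region per line, "start end".
zero_cov() {
    sort -k1,1n -k2,2n | awk -v len="$1" '
        BEGIN { covered = 0 }                # rightmost covered position so far
        {
            if ($1 > covered + 1)            # gap before this alignment
                print covered + 1, $1 - 1
            if ($2 > covered)
                covered = $2
        }
        END {
            if (covered < len)               # uncovered tail of the reference
                print covered + 1, len
        }
    '
}
```

For a reference of length 50 with alignments at 5-10, 8-20 and 30-40, `printf '5 10\n8 20\n30 40\n' | zero_cov 50` reports the uncovered regions 1-4, 21-29 and 41-50.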
AMOScmp-shortReads-alignmentTrimmed also accepts more input parameters than AMOScmp.
Defaults:
   MINCLUSTER  = 16
   MINMATCH    = 16
   MINLEN      = 24  # delta-filter -l 24
   MINOVL      = 5
   MAXTRIM     = 10
   MAJORITY    = 50
   CONSERR     = 0.06
   ALIGNWIGGLE = 2

Both pipelines require two input files. Assuming that prefix is the name of the organism to assemble:
  • prefix.1con : reference sequence in FASTA format, taken from a related organism (complete or well assembled, usually downloaded from GenBank)
  • prefix.afg : AMOS message file containing the read/fragment messages for each short read; it can be generated with the toAmos script
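
Before launching either pipeline, it can help to confirm that both inputs are in place. The following is a minimal sketch: check_inputs is a hypothetical helper name, not an AMOS command, and it only tests that each file exists and is non-empty (it does not validate the FASTA or message format).

```shell
# check_inputs PREFIX
# Hypothetical helper, not part of AMOS.
# Returns 0 when PREFIX.1con and PREFIX.afg both exist and are non-empty,
# 1 otherwise (naming each problem file on stderr).
check_inputs() {
    prefix=$1
    rc=0
    for f in "$prefix.1con" "$prefix.afg"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Usage: run the pipeline only if both inputs are present.
# check_inputs prefix && AMOScmp-shortReads prefix
```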

Examples:
   $ toAmos -s prefix.seq -o prefix.afg                           # create an AMOS message file from short read FASTA sequences
   $ toAmos -s prefix.seq -q prefix.qual -o prefix.afg            # create an AMOS message file from short read FASTA sequences and qualities

   $ AMOScmp-shortReads prefix                                    # assemble reads (no trimming, default parameters)
   $ AMOScmp-shortReads prefix -D MINCLUSTER=16 -D MINMATCH=16    # use a minimum alignment/cluster size of 16 bp
   $ AMOScmp-shortReads prefix -D CONSERR=0.1                     # use a consensus error of 0.1 (10%)
   $ AMOScmp-shortReads-alignmentTrimmed prefix                   # assemble reads (alignment based trimming, default parameters)