CA Source Code


Two versions of pre-compiled binaries are available, corresponding to those used in the publication. The first is compiled to support sequences up to 32,768bp and is used only for correction. The second is compiled to support sequences up to 16,384bp and is used only for assembly.

  • Correction source (including CA and AMOS, pre-built for Linux-amd64 machines) is available here
  • Assembly code (including CA, pre-built for Linux-amd64 machines) is available here
  • An updated release (7.0) supporting both correction and assembly in one pipeline is available here
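Because the two pre-compiled builds differ in the maximum sequence length they accept, it can help to check the longest read in an input file before choosing a binary. The following is a minimal Python sketch, not part of the CA distribution. It assumes plain, unwrapped four-line FASTQ records and treats the 32,768bp and 16,384bp limits as inclusive; the function and constant names are hypothetical.

```python
import sys
from typing import Iterator, TextIO

# Limits quoted for the two pre-compiled builds (assumed inclusive).
CORRECTION_MAX_LEN = 32_768  # correction-only binaries
ASSEMBLY_MAX_LEN = 16_384    # assembly-only binaries


def read_lengths(handle: TextIO) -> Iterator[int]:
    """Yield sequence lengths from a FASTQ stream.

    Assumes unwrapped FASTQ: header, sequence, '+', quality (4 lines per read).
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\r\n")
        handle.readline()  # '+' separator line
        handle.readline()  # quality line
        yield len(seq)


def check_limits(handle: TextIO) -> dict:
    """Report the longest read and whether it fits each build's limit."""
    longest = max(read_lengths(handle), default=0)
    return {
        "longest": longest,
        "fits_correction_build": longest <= CORRECTION_MAX_LEN,
        "fits_assembly_build": longest <= ASSEMBLY_MAX_LEN,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_lengths.py reads.fastq
    with open(sys.argv[1]) as fh:
        print(check_limits(fh))
```

A streaming reader is used so that large PacBio FASTQ files do not need to fit in memory; only the running maximum is kept.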


Datasets


Below you will find all the datasets used for testing the assembly/correction pipeline.

  1. Lambda phage
    • Sample data tarball includes simulated Illumina sequences and filtered PacBio RS sequences
    • PacBio RS sequencing data also available at the SRA
  2. Ecoli K12 (PacBio DevNet)
    • Filtered fastq sequences direct link: fastq.
    • Ecoli K12 Illumina SRA data.
  3. Ecoli C227-11 (PacBio DevNet)
    • Filtered fastq sequences direct link: fastq
    • Unfiltered run: SRA.
    • 100X filtered fastq sequences direct link: fastq
    • CCS filtered fastq direct link: fastq
    • CCS unfiltered run: SRA.
    • Ecoli C227-11 Simulated Illumina frg for correction.
    • Ecoli C227-11 Simulated 50X Illumina 500bp pairs fastq
    • Ecoli C227-11 Simulated 100X Illumina 500bp pairs fastq
    • Ecoli C227-11 Simulated 50X 3Kbp pairs fastq
    • Ecoli C227-11 Simulated 50X 6Kbp pairs fastq
  4. Ecoli 17-2 (PacBio DevNet)
    • Filtered fastq sequences direct link: fastq
    • Unfiltered run: SRA.
    • Illumina fastq
  5. Ecoli JM221 (PacBio DevNet)
    • Filtered fastq sequences direct link: fastq
    • Unfiltered run: SRA
    • Roche FLX Titanium SFF.
  6. Yeast
    • PacBio RS and Illumina sequencing data.
    • Simulated perfect "Short" Pre-Release PacBio sequences (median: 761, max: 3,062) fastq
    • Simulated perfect "Medium" Initial C1 PacBio sequences (median: 1,062, max: 5,241) fastq
    • Simulated perfect "Long" Current C2 PacBio sequences (median: 1,580, max: 14,901) fastq
  7. Parrot
  8. Maize
    • Transcriptome data.


Corrected Sequences and Assembly


Below are the corrected PBcR sequences and assemblies (both hybrid and second-generation only).

  1. Lambda phage
  2. Ecoli K12
  3. Ecoli C227-11
  4. Ecoli 17-2
  5. Ecoli JM221
  6. Yeast
  7. Parrot