CA From Source Code


A version of the source code corresponding to that used in the publication is available. The code is compiled to support sequences of up to 64,786 bp for correction and assembly.

  • Publication source (including CA, AMOS, SMRTportal, and utility scripts pre-built for Linux-amd64 machines) is available here
  • A list of the commands used for correction and assembly is available here, as well as in the distribution above.
  • The latest code with updates and bug fixes can be built from the repository following the instructions here.
  • For the latest usage information and announcements, please visit the PBcR wiki

Simulation Results


  • An interactive KRONA chart of simulated assemblies given C1, C2, XL-C2, XL-XL, and ZL sequencing is available here.
  • A list of repeat counts and maximum repeat size for all genomes is available here.
  • The simulation results for the number of gaps remaining at a given coverage and chemistry type are available here.

Datasets


Below you will find all the datasets used for testing the assembly/correction pipeline.
      E. coli K12 PacBio RS sequences were generously provided by Pacific Biosciences.
      Sequencing of E. coli O157, B. tre, M. haemolytica, and S. enterica was performed by the USDA.

  1. E. coli K12 MG1655
  2. E. coli O157:H7
    • Raw sequences available at the SRA
    • 200X Filtered fastq sequences direct link: fastq
    • MiSeq 100X tar.gz
    • 454 40X tar.gz
  3. B. tre
    • Raw sequences available at the SRA
    • 200X Filtered fastq sequences direct link: fastq
    • MiSeq 100X tar.gz
    • 454 50X frg
    • CCS 25X frg
  4. M. haemolytica
    • Raw sequences available at the SRA
    • 200X Filtered fastq sequences direct link: fastq
    • MiSeq 100X tar.gz
    • CCS 25X frg
  5. F. tularensis
    • Raw sequences available at the SRA
    • All Filtered fastq sequences direct link: fastq
    • 200X Filtered fastq sequences direct link: fastq
    • MiSeq 100X tar.gz
    • 454 50X frg
  6. S. enterica
    • Raw sequences available at the SRA
    • 200X Filtered fastq sequences direct link: fastq
    • MiSeq 56X tar.gz
    • 454 25X tar.gz
    • CCS 22X frg

Second-gen Assemblies


Below are the 454 and Illumina assemblies. The 454 assemblies were generated by Newbler v2.8. The Illumina assemblies were generated by SPAdes v2.5.0 and MaSuRCA v1.9.5 and consensus-polished with iCORN.

  1. Validation statistics on all generated assemblies in the paper are available here
  2. E. coli K12
  3. E. coli O157
  4. B. tre
  5. M. haemolytica
  6. F. tularensis
  7. S. enterica

Corrected Sequences and Assembly


Below are the PBcR sequences and assemblies (both hybrid and second-gen alone). Because CA includes some randomized code, reproducing the exact results in the paper requires starting from the corrected fastq sequences and assembling them, rather than re-running both correction and assembly.

  1. Validation statistics on all generated assemblies in the paper are available here
  2. E. coli K12
  3. E. coli O157
  4. B. tre
  5. M. haemolytica
  6. F. tularensis
  7. S. enterica