PacBio Corrected Reads (PBcR) pipeline

Usage and Example Data
  • For a tutorial on using the pipeline for correction (including self-correction) and assembly, please see the PBcR wiki.
    • If you encounter issues or have questions, please contact the authors of the pipeline, Sergey Koren (sergek AT umd.edu) or Adam M. Phillippy (aphillippy AT gmail.com).
  • For best results with high-coverage PacBio RS data (over 50X), we recommend using 25X of the longest post-correction sequences for assembly.
  • For known issues, please see the known issues wiki page.
  • Assembly spec files are available for an SGE grid and for a high-memory multi-core environment.
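
The 25X recommendation above can be illustrated with a minimal sketch. This is not the pipeline's own code: the function name select_longest, the (name, sequence) input format, and the user-supplied genome_size are assumptions made for illustration. It sorts corrected reads by length and keeps the longest ones until their combined length reaches the target coverage.

```python
def select_longest(reads, genome_size, target_coverage=25):
    """Keep the longest reads until they total target_coverage x genome_size bases.

    reads: iterable of (name, sequence) pairs (hypothetical input format).
    genome_size: estimated genome size in bases, supplied by the user.
    """
    target_bases = genome_size * target_coverage
    selected = []
    total_bases = 0
    # Visit reads from longest to shortest.
    for name, seq in sorted(reads, key=lambda r: len(r[1]), reverse=True):
        if total_bases >= target_bases:
            break  # target coverage reached; drop the remaining shorter reads
        selected.append((name, seq))
        total_bases += len(seq)
    return selected


if __name__ == "__main__":
    # Toy data: a 10-base "genome" and a 2X target, so 20 bases are needed.
    toy_reads = [("a", "A" * 8), ("b", "A" * 3), ("c", "A" * 7),
                 ("d", "A" * 6), ("e", "A" * 2)]
    kept = select_longest(toy_reads, genome_size=10, target_coverage=2)
    print([name for name, _ in kept])
```

Because the last read kept may overshoot the target, the selected set can be slightly above the requested coverage. If the input holds less than the target coverage, all reads are returned.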


Utilities related to the pipeline and publications


  • Validation scripts for corrected sequences and assembled contigs used in the publication. Note that these scripts require MUMmer 3.23.
    • sh analyzeCorrectedReads.sh <reference fasta file> <corrected sequence fasta file> <uncorrected fasta/fastq file> will output statistics on chimeric and improperly trimmed sequences compared to the reference.
    • sh getCorrectnessStats.sh <directory containing results, can be .> <reference fasta file> <assembly contig fasta file> will output assembly statistics following the GAGE methodology.


Publications and Supporting Data