PacBio Corrected Reads (PBcR) pipeline
Usage and Example Data
- For a tutorial on using the pipeline for correction (including self-correction) and assembly, please see the PBcR wiki.
- If you encounter issues or have questions, please contact the authors of the pipeline, Sergey Koren (sergek AT umd.edu) or Adam M. Phillippy (aphillippy AT gmail.com).
- For best results with high-coverage PacBio RS data (over 50X), we recommend using 25X of the longest post-correction sequences for assembly.
- For known issues, please see the known issues wiki page.
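The 25X recommendation above can be illustrated with a small sketch. This is not part of the PBcR pipeline; the function name `select_longest_reads` and its parameters are hypothetical, and it assumes you already know the read lengths and an estimate of the genome size. It picks reads from longest to shortest until their combined length reaches the target coverage.

```python
def select_longest_reads(read_lengths, genome_size, target_coverage=25):
    """Pick the longest reads until they cover genome_size * target_coverage bases.

    Hypothetical helper for illustration only; not part of PBcR.
    Returns (indices of selected reads, total selected bases).
    """
    target_bases = genome_size * target_coverage

    # Indices of reads ordered from longest to shortest.
    order = sorted(range(len(read_lengths)),
                   key=lambda i: read_lengths[i],
                   reverse=True)

    selected = []
    total = 0
    for i in order:
        if total >= target_bases:
            break  # target coverage reached
        selected.append(i)
        total += read_lengths[i]
    return selected, total


if __name__ == "__main__":
    # Toy data: five corrected reads and a 500 bp "genome".
    lengths = [1000, 5000, 3000, 8000, 2000]
    indices, bases = select_longest_reads(lengths, genome_size=500)
    print(indices, bases, bases / 500)
```

If the corrected reads total less than the target, the sketch simply returns all of them, so the coverage actually achieved should be checked from the returned base count.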
Utilities related to the pipeline and publications
Validation scripts for corrected sequences and assembled contigs used in the publication. Note: these scripts require MUMmer 3.23.
- sh analyzeCorrectedReads.sh <reference fasta file> <corrected sequence fasta file> <uncorrected fasta/fastq file> will output statistics on chimeric and improperly trimmed sequences compared to the reference.
- sh getCorrectnessStats.sh <directory containing results, can be .> <reference fasta file> <assembly contig fasta file> will output assembly statistics following the GAGE methodology.
Publications and Supporting Data
- Koren, S., Schatz, M. C., Walenz, B. P., Martin, J., Howard, J. T., Ganapathy, G., Wang, Z., Rasko, D. A., McCombie, W. R., Jarvis, E. D., and Phillippy, A. M. Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nature Biotech. (2012)
- Link to supporting data and assemblies.
- Koren, S., Harhay, G. P., Smith, T. P. L., Bono, J. L., Harhay, D. M., McVey, D. S., Radune, D., Bergman, N. H., and Phillippy, A. M. Reducing assembly complexity of microbial genomes with single-molecule sequencing. Genome Biology 14:R101 (2013)
- Berlin, K.*, Koren, S.*, Chin, C.-S., Drake, J., Landolin, J. M., and Phillippy, A. M. Assembling large genomes with single-molecule sequencing and locality sensitive hashing. Nature Biotech. (2015)
- Link to supporting data and assemblies.
- Link to supporting data and assemblies.