PacBio to CA correction and assembly pipeline
Usage and Example Data
- For a tutorial on using the pipeline for correction (including self-correction) and assembly, please see the pacBioToCA wiki.
- If you encounter issues or have questions, please contact the authors of the pipeline, Sergey Koren (sergek AT umd.edu) or Adam M. Phillippy (aphillippy AT gmail.com).
- For best results with high-coverage PacBio RS data (over 50X), we recommend assembling only the longest post-correction sequences, up to 25X coverage.
- For known issues, please see the known issues wiki page.
- Assembly spec files are available for an SGE grid and for a high-memory multi-core environment.
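
The 25X recommendation above can be sketched as a small filtering step. This is only an illustrative sketch, not part of the pipeline: the function name `select_longest`, its parameters, and the assumption that you already know the approximate genome size are all hypothetical. It keeps the longest corrected sequences until their total length reaches the target coverage.

```python
def select_longest(reads, genome_size, target_coverage=25):
    """Keep the longest reads until they total target_coverage x genome_size.

    reads: iterable of (name, sequence) pairs (hypothetical input format).
    genome_size: estimated genome size in bases (an assumption you supply).
    """
    budget = genome_size * target_coverage  # total bases to keep
    selected = []
    total = 0
    # Visit reads from longest to shortest.
    for name, seq in sorted(reads, key=lambda r: len(r[1]), reverse=True):
        if total >= budget:
            break
        selected.append((name, seq))
        total += len(seq)
    return selected


if __name__ == "__main__":
    toy_reads = [("a", "A" * 10), ("b", "A" * 30), ("c", "A" * 20)]
    # With a toy genome of 2 bases, 25X is a budget of 50 bases.
    kept = select_longest(toy_reads, genome_size=2, target_coverage=25)
    print([name for name, _ in kept])
```

Because a read is added whenever the running total is still under the budget, the result can overshoot the target by at most one read length.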
Utilities related to the pipeline and publications
Validation scripts for corrected sequences and assembled contigs used in the publication. Note: these scripts require MUMmer 3.23.
- sh analyzeCorrectedReads.sh <reference fasta file> <corrected sequence fasta file> <uncorrected fasta/fastq file> will output statistics on chimeric and improperly trimmed sequences compared to the reference.
- sh getCorrectnessStats.sh <directory containing results, can be .> <reference fasta file> <assembly contig fasta file> will output assembly statistics following the GAGE methodology.
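
For orientation, one of the basic statistics reported in GAGE-style evaluations is contig N50. The sketch below is not taken from getCorrectnessStats.sh and does not reproduce its alignment-based checks against the reference; the function name `n50` is hypothetical, and it only shows how N50 is computed from a list of contig lengths.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L
    together cover at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))
```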
Publications and Supporting Data
- Koren S., Schatz M. C., Walenz B. P., Martin J., Howard J. T., Ganapathy G., Wang Z., Rasko D. A., McCombie W. R., Jarvis E. D., and Phillippy A. M. Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nature Biotech. (2012)
- Link to supporting data and assemblies.
- Koren S., Harhay G. P., Smith T. P. L., Bono J. L., Harhay D. M., Mcvey D. S., Radune D., Bergman N. H., and Phillippy A. M. Reducing assembly complexity of microbial genomes with single-molecule sequencing. arXiv.org preprint.
- Link to supporting data and assemblies.