Publications by CBCB authors


Citation N Nagarajan, M Pop
Parametric complexity of sequence assembly: theory and applications to next generation sequencing.
J Comput Biol, 16:897-908 (2009)
Abstract In recent years, a flurry of new DNA sequencing technologies has altered the landscape of genomics, providing a vast amount of sequence information at a fraction of the previous cost. The task of assembling these sequences into a genome has, however, remained an algorithmic challenge that is in practice addressed by heuristic solutions. In order to design better assembly algorithms and exploit the characteristics of sequence data from new technologies, we need an improved understanding of the parametric complexity of the assembly problem. In this article, we provide a first theoretical study in this direction, exploring the connections between repeat complexity, read lengths, overlap lengths and coverage in determining the "hard" instances of the assembly problem. Our work suggests at least two ways in which existing assemblers can be extended in a rigorous fashion, in addition to delineating directions for future theoretical investigations.
PubMed 19580519
doi 10.1089/cmb.2009.0005
URL http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=19580519&dopt=Abstract