Niranjan

Niranjan Nagarajan

Current Position: Senior Research Scientist, Computational and Mathematical Biology, Genome Institute of Singapore

Postdoctoral Fellow, 2007-2009 (advisor: Mihai Pop)
Center for Bioinformatics and Computational Biology,
and UM Institute for Advanced Computer Studies

Ph.D., Cornell University, 2006 (advisor: Uri Keich)
M.S., Cornell University, 2004
B.A., Ohio Wesleyan University, 2000

niranjan [at] umiacs.umd.edu
Center for Bioinformatics and Computational Biology
Biomolecular Sciences Bldg #296
College Park, MD 20742
301-405-8804


Motif finding

Motif Finding

    DNA and protein sequences often contain short motifs that are conserved within and across organisms due to their functional importance. Searching for these conserved motifs (often called motif finding) therefore serves as a useful approach to identifying novel functional sequence elements and several computational tools have been designed for this purpose (such as MEME, GLAM and GIMSAN). In addition to the computational difficulty of searching for small patterns in long, largely random strings, motif finding programs also face the challenge of determining if a putative motif is statistically significant (i.e. unlikely to be present by chance in the dataset). Under a commonly used model, computing the significance of motifs can be surprisingly hard -- in our early work (Nagarajan et al., 2005) we showed that approximations used in two popular motif finding tools can be quite off. In response, we proposed a new, discrete Fourier trasform based algorithm that can accurately and efficiently compute the significance of motifs (ungapped multiple alignments). The novel technique proposed in this work is suprisingly general and we have since explored its use in the context of designing efficient exact algorithms for computing p-values for other common statistical tests as well (see also: Computational Statistics).

    In subsequent work, we also explored other models to compute the significance of motifs (Ng et al., 2006) as well as new scores to optimize for, while searching for the "best" motif (Ng et al., 2006, Nagarajan et al., 2006). Interestingly, a measure of statistical significance of a motif, the "E-value", can serve as an ideal score for searching for motifs, and using some clever memoization, this can be done with a small computational overhead (Nagarajan et al., 2006). The algorithms described in these works are available as library functions and programs in the FAST package (Nagarajan and Keich, 2008).