Niranjan

Niranjan Nagarajan

Current Position: Senior Research Scientist, Computational and Mathematical Biology, Genome Institute of Singapore

Postdoctoral Fellow, 2007-2009 (advisor: Mihai Pop)
Center for Bioinformatics and Computational Biology,
and UM Institute for Advanced Computer Studies

Ph.D., Cornell University, 2006 (advisor: Uri Keich)
M.S., Cornell University, 2004
B.A., Ohio Wesleyan University, 2000

niranjan [at] umiacs.umd.edu
Center for Bioinformatics and Computational Biology
Biomolecular Sciences Bldg #296
College Park, MD 20742
301-405-8804


Computational Statistics

Computational Statistics

    Statistical tests play an integral role in Bioinformatics in helping to distinguish biologically meaningful signals from random artifacts of the experimental data. For example, one of the commonly used tests is a log-likelihood ratio test to compare a column of a multiple alignment against a background distribution of nucleotide or amino acid frequencies. The applications for this test have ranged from motif-finding to protein structure prediction and have typically relied on a χ2 approximation which be can very inaccurate in the tail of the distribution. To address this, we designed a numerically robust exact algorithm for this problem (bagFFT) that can accurately estimate very small probabilities while being asymptotically more efficient than existing algorithms (Keich and Nagarajan, 2006). The bagFFT algorithm is based on a new technique to compensate for numerical errors in FFT-based calculations with a wide variety of possible applications in Computational Statistics. In recent work we have also explored extensions of this approach to compute the statistical significance of multiple alignments (Nagarajan et al., 2005) as well as to compute the significance of the Mann-Whitney test (Nagarajan and Keich, 2009). As part of these studies we succeeded in established the failings of several popular approximations for these tests, such as a normal or a χ2  approximation, especially in computational applications where large hypothesis spaces lead to the need to accurately estimate small probabilities. In addition, we also demonstrate that numerical instability can plague some of the published algorithms for these tests (e.g. Baglivo's algorithm, Harding's algorithm).