Niranjan Nagarajan

Current Position: Senior Research Scientist, Computational and Mathematical Biology, Genome Institute of Singapore

Postdoctoral Fellow, 2007-2009 (advisor: Mihai Pop)
Center for Bioinformatics and Computational Biology,
and UM Institute for Advanced Computer Studies

Ph.D., Cornell University, 2006 (advisor: Uri Keich)
M.S., Cornell University, 2004
B.A., Ohio Wesleyan University, 2000

niranjan [at] umiacs.umd.edu
Center for Bioinformatics and Computational Biology
Biomolecular Sciences Bldg #296
College Park, MD 20742
301-405-8804


Metagenomics

Metagenomics (studying uncultured bugs)

    Metagenomics is an emerging field that studies environmental samples (bacterial or viral) directly, rather than culturing constituents of the sample in a lab. Metagenomic studies have two advantages: they bypass the notoriously difficult step of producing pure cultures, and they let microbiologists understand the entire ecosystem of a sample rather than study individual bacterial or viral constituents in isolation. A popular approach is to isolate and sequence a universal marker gene such as 16S rRNA (which is typically conserved within a bacterial species but differs between species) to quantify the composition of a sample. Since the mapping from 16S sequence similarity to species classes is unknown (and the notion of species at the bacterial level is in any case quite subjective), researchers typically rely on clustering algorithms (e.g. single-linkage clustering) and ad hoc thresholds to produce a coarse approximation to the notion of a species. In recent work (White et al., In Preparation), we show that these approximations can be very poor and can lead to incorrect estimates of microbial diversity. In principle, semi-supervised clustering approaches that exploit information in databases of known 16S sequences can alleviate some of these problems. In other recent work (Saket et al., 2009), we propose a new and general semi-supervised clustering algorithm and show that it approximates the notion of species more accurately, even with sparsely labelled input.
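    To make the threshold issue concrete, here is a minimal sketch of single-linkage clustering with a fixed distance cutoff. It is not the method from any of the works cited above. The function names, the mismatch-fraction distance, and the assumption that the sequences are already aligned to equal length are all illustrative choices.

```python
from itertools import combinations


def mismatch_fraction(a, b):
    """Fraction of differing positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def single_linkage_clusters(seqs, threshold):
    """Group sequence indices so that any pair within `threshold` shares a cluster.

    Single linkage is transitive: if A is close to B and B is close to C,
    all three are merged even when A and C are farther apart than the threshold.
    """
    parent = list(range(len(seqs)))  # union-find forest over sequence indices

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if mismatch_fraction(seqs[i], seqs[j]) <= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy sequences invented for illustration (not real 16S data).
toy = ["ACGTACGTAC", "ACGTACGTAA", "ACGTACGTTA", "TTTTGGGGCC"]
print(single_linkage_clusters(toy, 0.10))
print(single_linkage_clusters(toy, 0.05))
```

    On this toy input the first and third sequences differ at 20% of positions, yet a 10% cutoff still merges them through the second sequence, while a 5% cutoff leaves every sequence in its own cluster. The inferred number of "species" therefore depends heavily on the chosen threshold.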
    Metagenomic samples are increasingly being used to investigate the correlation between the abundance of various species and phenotypes of the sample. For example, a recent study (Ley et al., 2006) reported that certain divisions of bacteria were significantly more or less abundant in obese humans than in lean humans. Such analyses require an efficient and sensitive statistical methodology, and we recently proposed a tool for this purpose (White et al., 2009) that handles sparsely sampled features more robustly and can also be applied to SAGE data.
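    As a rough illustration of the kind of comparison involved, the sketch below runs a generic two-sample permutation test on the relative abundance of a single taxon in two groups of samples. It is not the tool from White et al., 2009, and it does not address sparsely sampled features. The function name, parameters, and abundance values are all hypothetical.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns the estimated probability of seeing a mean difference at least
    as large as the observed one if group labels were assigned at random.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign samples to the two groups
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1

    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Made-up relative abundances of one taxon per sample, for illustration only.
lean = [0.30, 0.28, 0.35, 0.31, 0.29, 0.33]
obese = [0.05, 0.08, 0.04, 0.07, 0.06, 0.03]
print(permutation_p_value(lean, obese))
```

    In a real study this test would be repeated for every taxon, so the resulting p-values would also need a multiple-testing correction.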