Reassortments & Inference with Tree Ensembles
While organisms typically evolve through local mutations and deletions in
their genomes, larger rearrangements occasionally lead to drastic changes
in function (as seen, for example, in cancer cells). In viruses, these
rearrangements tend to be even more common and have the potential to
create highly virulent strains.
An important example of this is seen in the Influenza virus (common flu):
reassortments (a special class of rearrangements) in the flu genome have
been linked to pandemic strains of the past, such as the Asian H2N2
pandemic of 1957 and the Hong Kong H3N2 pandemic of 1968. Detection of new
strains resulting from a reassortment can serve as an early-warning system
to avoid the next pandemic.
In fact, one of the major concerns among public health experts is that new
strains of bird flu that are transmissible between humans might emerge,
possibly through reassortments. In recent work, we reported the
development of a robust method to detect reassortments in genomic data
that can account for the uncertainty in the inferred phylogeny of the
strains (Nagarajan and Kingsford, 2008). Our method works by
analyzing ensembles of trees and relies on a novel algorithm for
efficiently finding large bicliques in a bipartite graph. In general, our
method can detect topological incongruencies between two distributions of
trees. We are currently working on an extension that also incorporates
evolutionary distance information, using a statistical approach to detect
when inter-clade distances have changed (see also: Computational
Statistics).
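
As a rough illustration of what a topological incongruence between two tree ensembles can look like, the Python sketch below compares well-supported clades from two ensembles (for example, trees inferred for two different genome segments) and looks for a large biclique of mutually conflicting clades. It is a toy under stated assumptions, not the algorithm of Nagarajan and Kingsford (2008): the representation of a tree as a set of rooted clades, the `min_support` threshold, the conflict-graph construction, and the brute-force biclique search are all choices made here for illustration, and every function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def clade_support(ensemble):
    """Fraction of trees in the ensemble that contain each clade.

    Assumption: each tree is given as a set of rooted clades, and each
    clade is a frozenset of taxon names.
    """
    counts = Counter(clade for tree in ensemble for clade in tree)
    return {clade: n / len(ensemble) for clade, n in counts.items()}


def conflict(a, b):
    """Two rooted clades conflict if they overlap without being nested."""
    return bool(a & b) and not (a <= b or b <= a)


def conflict_graph(ens_a, ens_b, min_support=0.8):
    """Bipartite graph: well-supported clades of A -> conflicting clades of B."""
    sup_a, sup_b = clade_support(ens_a), clade_support(ens_b)
    left = [c for c, s in sup_a.items() if s >= min_support]
    right = [c for c, s in sup_b.items() if s >= min_support]
    return {a: {b for b in right if conflict(a, b)} for a in left}


def largest_biclique(graph):
    """Brute-force search (exponential; only suitable for tiny examples).

    For every subset of left nodes, the common neighbours form the right
    side of a biclique; keep the one with the most edges.
    """
    best = (set(), set())
    left = list(graph)
    for k in range(1, len(left) + 1):
        for subset in combinations(left, k):
            common = set.intersection(*(graph[a] for a in subset))
            if len(subset) * len(common) > len(best[0]) * len(best[1]):
                best = (set(subset), common)
    return best


# Toy data: four taxa, two ensembles of ten trees each (made-up example).
tree_ab_cd = {frozenset("AB"), frozenset("CD")}
tree_ac_bd = {frozenset("AC"), frozenset("BD")}
ensemble_1 = [tree_ab_cd] * 9 + [tree_ac_bd]   # mostly ((A,B),(C,D))
ensemble_2 = [tree_ac_bd] * 9 + [tree_ab_cd]   # mostly ((A,C),(B,D))

graph = conflict_graph(ensemble_1, ensemble_2)
left_side, right_side = largest_biclique(graph)
```

In this toy input, the clades supported by 90% of the first ensemble each conflict with the clades supported by 90% of the second, so the search returns a 2-by-2 biclique. The minority trees fall below the threshold and are ignored, which is one simple way an ensemble-based comparison can tolerate uncertainty in the inferred phylogeny.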