CBCB Doctoral Student Uses Powerful Computing to Unlock the Mysteries of Microbes

Sun Oct 15, 2017

Nidhi Shah loves a good challenge and is inquisitive by nature, so it’s only fitting that she’s involved in an expanding scientific field where curiosity and determination are essential: metagenomics.

Shah, a second-year computer science doctoral student in the Center for Bioinformatics and Computational Biology (CBCB), is developing algorithms and software tools to unlock the mysteries of microbes, the tiny microorganisms that make up the majority of our planet’s living material.

Until the last decade or so, there were many unanswered questions surrounding microbes, mostly due to the fact that only a small percentage of them could be isolated and studied in a controlled lab environment.

Now, with advances in high-throughput DNA sequencing and other tools, scientists are able to take an environmental sample—whether from a hydrothermal vent or the gut of a sheep—sequence the DNA in it, and then piece together the genomes of any microbes that are present.

One of the biggest challenges in metagenomics is to annotate, or identify, the microorganism or gene from which a specific DNA sequence originates, which is where the research by Shah and other computational biologists comes into play.

“Annotating the sequences found in these samples is complicated work, but it’s important because it helps us not only understand the diversity of our environment, but also lets us better characterize a microbe’s functional capabilities,” Shah says. “This, in turn, can help us understand and have a positive impact on human health, agricultural yield, the environment, and more, by exploring these types of microbial dynamics.”

Helping Shah in her work are the vast computing resources available in CBCB, one of the dozen-plus labs and centers in the University of Maryland Institute for Advanced Computer Studies (UMIACS).

“I use UMIACS’ computational resources extensively,” she says. “It’s practically impossible to run and test our algorithms and ideas for large-scale study samples on our [desktop] computers. So being able to quickly migrate workloads to the large-scale UMIACS data center lets me quickly make informed decisions about the capabilities of our sampling methods.”

Shah says that prior to coming to the University of Maryland, she had very little experience with computational biology. As an undergraduate majoring in information technology at the Veermata Jijabai Technological Institute in Mumbai, she studied computer science theory. When it came time to apply to graduate school, Shah searched for something that would be a “good mix” of both theory and application.

The cutting-edge research in metagenomics underway in CBCB seemed to be a good pathway to explore, she says.

At Maryland, Shah is working closely with her adviser, Mihai Pop, a professor of computer science who has a joint appointment in UMIACS.

Pop, an expert in metagenomics, says that Shah has made tremendous progress in her research over the past year, publishing her initial results at the recent Workshop on Algorithms in Bioinformatics (WABI 2017) held in Boston.

“We are already using the new approaches she has developed to study microbial communities from the guts of children from the developing world, as well as to study the impact of brain injury on the gastrointestinal tract of mice,” Pop says. “Nidhi has strong technical skills, and an amazingly positive and thoughtful attitude, attributes that foretell an outstanding scientific career in graduate school and beyond.”

Although Shah doesn’t know exactly where her path will lead once her doctoral program is complete, she says that working in CBCB is not only teaching her computational biology, but is also providing the academic tools needed to become an independent researcher.

“Whatever field I end up working in, I want to be able to ask the right questions, and have the intuitive knowledge needed to answer those questions effectively,” she says. “Equally important is being able to communicate one’s research findings to the public. Working with all of the amazing research scientists, postdocs and other graduate students in CBCB is definitely helping me master these important skills.”

—Story by Melissa Brachfeld

Pop Lends Expertise to International Competition for Metagenomics Software

Mon Oct 02, 2017

Communities of bacteria live everywhere: inside our bodies, on our bodies and all around us. The human gut alone contains hundreds of species of bacteria that help digest food and provide nutrients, but can also make us sick. To learn more about these groups of bacteria and how they impact our lives, scientists need to study them. But this task poses challenges, because taking the bacteria into the laboratory is either impossible or would disrupt the biological processes the scientists wish to study.

To bypass these difficulties, scientists have turned to the field of metagenomics. In metagenomics, researchers use algorithms to piece together DNA from an environmental sample to determine the type and role of bacteria present. Unlike established fields such as chemistry, where researchers evaluate their results against a set of known standards, metagenomics is a relatively young field that lacks such benchmarks.

Mihai Pop, a professor of computer science with a joint appointment in the University of Maryland Institute for Advanced Computer Studies, recently helped judge an international challenge called the Critical Assessment of Metagenome Interpretation (CAMI), which benchmarked metagenomics software. The results were published in the journal Nature Methods on October 2, 2017.

“There’s no one algorithm that we can say is the best at everything,” said Pop, who is also co-director of the Center for Health-related Informatics and Bioimaging at UMD. “What we found was that one tool does better in one context, but another does better in another context. It is important for researchers to know that they need to choose software based on the specific questions they are trying to answer.”

The study’s results were not surprising to Pop, because of the many challenges metagenomics software developers face. First, DNA analysis is challenging in metagenomics because the recovered DNA often comes from the field, not a tightly controlled laboratory environment. In addition, DNA from many organisms, some of which may not have known genomes, mingles together in a sample, making it difficult to correctly assemble, or piece together, individual genomes. Moreover, DNA degrades in harsh environments.

“I like to think of metagenomics as a new type of microscope,” Pop said. “In the old days, you would use a microscope to study bacteria. Now we have a much more powerful microscope, which is DNA sequencing coupled with advanced algorithms. Metagenomics holds the promise of helping us understand what bacteria do in the world. But first we need to tune that microscope.”

CAMI’s leader invited Pop to help evaluate the submissions by challenge participants because of his expertise in genome and metagenome assembly. In 2009, Pop helped publish Bowtie, one of the most commonly used software packages for aligning sequencing reads to genomes. More recently, he collaborated with the University of Maryland School of Medicine to analyze hundreds of thousands of gene sequences as part of the largest, most comprehensive study of childhood diarrheal diseases ever conducted in developing countries.

“We uncovered new, unknown bacteria that cause diarrheal diseases, and we also found interactions between bacteria that might worsen or improve illness,” Pop said. “I feel that’s one of the most impactful projects I’ve done using metagenomics.”

For the competition, CAMI researchers combined approximately 700 microbial genomes and 600 viral genomes with other DNA sources and simulated how such a collection of DNA might appear in the field. The participants’ task was to reconstruct and analyze the genomes of the simulated DNA pool.

CAMI researchers scored the participants’ submissions in three areas: how well they assembled the fragmented genomes; how well they “binned,” or organized, DNA fragments into related groups to determine the families of organisms in the mixture; and how well they “profiled,” or reconstructed, the identity and relative abundance of the organisms present in the mixture. Pop contributed metrics and software for evaluating the submitted assembled genomes.

Nineteen teams submitted 215 entries using six genome assemblers, nine binners and 10 profilers to tackle this challenge.

The results showed that for assembly, algorithms that pieced together a genome using different lengths of smaller DNA fragments outperformed those that used DNA fragments of a fixed length. However, no assemblers did well at picking apart different, yet similar genomes.
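As a rough illustration of the fixed-length versus multiple-length distinction, the Python sketch below breaks reads into k-mers (substrings of length k) at one value of k and at several. It assumes that the "DNA fragments" mentioned above correspond to k-mers, which the article does not state; the reads, the k values and the function names are invented for the example and are not taken from any CAMI submission.

```python
from collections import Counter

def kmers(read, k):
    """Return every substring of length k in a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def kmer_counts(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts

# Two made-up overlapping reads (hypothetical data).
reads = ["ACGTACGTGA", "GTACGTGACC"]

# Fixed-length approach: a single k for the whole data set.
single_k = kmer_counts(reads, 5)

# Multiple-length approach: the same reads viewed at several k values.
multi_k = {k: kmer_counts(reads, k) for k in (3, 5, 7)}

for k, counts in multi_k.items():
    repeated = sum(1 for c in counts.values() if c > 1)
    print(f"k={k}: {len(counts)} distinct k-mers, {repeated} seen more than once")
```

Short k-mers recur more often and so connect more reads, while long k-mers are more specific; looking at several lengths lets a tool draw on both properties.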

For the binning task, the researchers found tradeoffs in how accurately the software programs identified the group to which a particular DNA fragment belonged, versus how many DNA fragments the software assigned to any group. This result suggests that researchers need to choose their binning software based on whether accuracy or coverage is more important. In addition, the performance of all binning algorithms decreased when samples included multiple related genomes.
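The accuracy-versus-coverage tradeoff can be sketched with a toy calculation. The Python example below is a simplified illustration, not the scoring code used by CAMI: it assumes accuracy means the share of assigned fragments that match their bin's majority genome, and coverage means the share of all fragments that received any bin. The fragment names, genome labels and both hypothetical binners are made up.

```python
from collections import Counter, defaultdict

def binning_scores(truth, predicted):
    """Return (accuracy, coverage) for one binning result.

    truth:     fragment -> true genome label
    predicted: fragment -> bin id, or None if the fragment was left unassigned
    """
    bins = defaultdict(list)
    for fragment, bin_id in predicted.items():
        if bin_id is not None:
            bins[bin_id].append(truth[fragment])

    assigned = sum(len(genomes) for genomes in bins.values())
    # A fragment counts as correct if it belongs to its bin's majority genome.
    correct = sum(Counter(genomes).most_common(1)[0][1] for genomes in bins.values())

    accuracy = correct / assigned if assigned else 0.0
    coverage = assigned / len(truth)
    return accuracy, coverage

# Six fragments from two genomes (hypothetical data).
truth = {"f1": "A", "f2": "A", "f3": "A", "f4": "B", "f5": "B", "f6": "B"}

# A cautious binner assigns only the fragments it is sure about.
cautious = {"f1": "bin1", "f2": "bin1", "f3": None,
            "f4": "bin2", "f5": None, "f6": None}

# A greedy binner assigns everything and makes one mistake (f4).
greedy = {"f1": "bin1", "f2": "bin1", "f3": "bin1",
          "f4": "bin1", "f5": "bin2", "f6": "bin2"}

print("cautious:", binning_scores(truth, cautious))
print("greedy:  ", binning_scores(truth, greedy))
```

In this toy data the cautious binner is fully accurate but bins only half of the fragments, while the greedy binner bins everything at the cost of one misplaced fragment.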

In profiling, software either recovered the relative abundance of bacteria in the sample better or detected organisms better, even at very low quantities. However, the latter algorithms identified the wrong organism more often.
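The two profiling behaviors described above can be compared with a small Python sketch. It is an illustrative assumption, not CAMI's evaluation code: abundance error is taken as the summed absolute difference between true and predicted relative abundances, and detection is summarized by precision and recall over the set of organisms reported. The organism labels and both hypothetical profilers are invented.

```python
def profile_scores(true_abund, pred_abund):
    """Return (abundance_error, precision, recall) for one predicted profile.

    Both arguments map organism name -> relative abundance (fractions summing to 1).
    """
    taxa = set(true_abund) | set(pred_abund)
    # Summed absolute difference in relative abundance (0 means a perfect match).
    abundance_error = sum(
        abs(true_abund.get(t, 0.0) - pred_abund.get(t, 0.0)) for t in taxa
    )

    present = {t for t, a in true_abund.items() if a > 0}
    reported = {t for t, a in pred_abund.items() if a > 0}
    hits = len(present & reported)
    precision = hits / len(reported) if reported else 0.0
    recall = hits / len(present) if present else 0.0
    return abundance_error, precision, recall

# Hypothetical sample with one low-abundance organism, C.
true_profile = {"A": 0.5, "B": 0.375, "C": 0.125}

# A conservative profiler misses C but gets the abundances nearly right.
conservative = {"A": 0.5, "B": 0.5}

# A sensitive profiler finds C but also reports D, which is not in the sample.
sensitive = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}

print("conservative:", profile_scores(true_profile, conservative))
print("sensitive:   ", profile_scores(true_profile, sensitive))
```

Here the conservative profile has the smaller abundance error and no false detections but misses the rare organism, while the sensitive profile detects everything present and also reports an organism that is not there.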

Going forward, Pop said the CAMI group will continue to run new challenges with different data sets and new evaluations aimed at more specific aspects of software performance. Pop is excited to see scientists use the benchmarks to address research questions in the laboratory and the clinic.

“The field of metagenomics needs standards to ensure that results are correct, well validated and follow best practices,” Pop said. “For instance, if a doctor is going to stage an intervention based on results from metagenomic software, it’s essential that those results be correct. Our work provides a roadmap for choosing appropriate software.”

—Story by Irene Ying

This work was led by Alice McHardy of the Department for Computational Biology of Infection Research at the Helmholtz Centre for Infection Research and the Braunschweig Integrated Centre of Systems Biology in Braunschweig, Germany.

This work was supported by an Engineering and Physical Sciences Research Council Grant (Award No. EP/K032208/1), a U.S. Department of Energy contract (Award No. DEAC02-05CH11231) and the Cluster of Excellence on Plant Sciences program funded by the Deutsche Forschungsgemeinschaft. The content of this article does not necessarily reflect the views of these organizations.

UMIACS Partners with Fraunhofer, Signature Science on DNA Screening Technologies

Thu Sep 28, 2017

Computational biologists in the University of Maryland Institute for Advanced Computer Studies (UMIACS) are collaborating with other experts to develop new approaches and tools for screening DNA sequences that might accidentally or intentionally be altered, resulting in a biological threat.

Mihai Pop, a professor of computer science and interim director of UMIACS, and Todd Treangen, an assistant research scientist in UMIACS, are working closely with the Fraunhofer Center for Experimental Software Engineering and Signature Science LLC on next-generation computational and bioinformatics tools that can quickly assess whether certain synthesized DNA strands could pose a risk.

“This is an ambitious project that will join experts in biology, bioinformatics, machine learning and software engineering,” says Pop. “The software underlying this project is extremely complex, involving an intricate chain of sophisticated software components. This chain has to work seamlessly—not only to reliably identify biological threats, but to do so under strict time and resource constraints.”

DNA synthesis has increased significantly during the past decade, Pop says, with scientists in academia and industry using automated machines to construct genes and other long strands of DNA by stringing together chemical building blocks called nucleotides in any desired sequence.

While these altered DNA strands can lead to revolutionary advances in medicine, agriculture and materials science, there is the possibility that someone could exploit synthetic DNA for harmful purposes—like creating a synthetic smallpox virus, a deadly scourge that was eradicated from nature in the late 1970s and currently exists only in a few highly secure repositories.

The Maryland researchers are subcontracted to Signature Science—a subsidiary of the Southwest Research Institute—on the Functional Genomic and Computational Assessment of Threats (Fun GCAT) program, which is managed by the Intelligence Advanced Research Projects Activity (IARPA).

For their role in Fun GCAT, researchers from UMIACS, Fraunhofer and Signature Science have identified specific tasks for each group in order to create a “bioinformatics analysis pipeline,” says Treangen, an expert in developing software that can quickly and efficiently analyze large amounts of genomic data.

Treangen is working on the project with Dan Nasko, a postdoctoral scientist, and graduate students in the Center for Bioinformatics and Computational Biology, an interdisciplinary center in UMIACS with access to powerful computing and data storage resources.

“My group will develop software modules that can provide rapid sequence and protein structure comparisons to assess the threat potential of functional elements from short DNA sequences,” he says. “The biggest challenge will be adapting current tools—and developing new tools—to perform accurate taxonomic assignment, function prediction, and threat assignment of these sequences.”

The scientists at Fraunhofer will integrate the software modules designed by Treangen’s team into a larger software infrastructure that meets regulatory standards and can be optimized for peak performance, says Adam Porter, a professor of computer science at the University of Maryland who is the executive director of Fraunhofer.

Fraunhofer will also create the visual dashboards needed for monitoring overall system performance, adds Porter.

“This is a large undertaking that requires robust proficiency in designing and integrating automated systems used for testing and validating large amounts of data very quickly,” he says. “Fraunhofer and UMIACS can provide that type of expertise in force.”

As the prime contractor for the $2.9M project, Signature Science will coordinate the work done by UMIACS, Fraunhofer and other team members.

In addition to their appointments in UMIACS, Pop and Treangen have leadership roles in the Center for Health-related Informatics and Bioimaging; Pop is co-director and Treangen is assistant director.

This effort is supported by the U.S. Army Research Office. The content of this release does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.

About UMIACS: The University of Maryland Institute for Advanced Computer Studies is a multidisciplinary research institute with more than 80 faculty and research scientists from 11 departments and six colleges on the University of Maryland campus. Its primary mission is to foster and enable cutting-edge interdisciplinary research that is grounded in computer science and that addresses pressing scientific and societal challenges.

About Fraunhofer: The Fraunhofer Center for Experimental Software Engineering is an applied research and technology transfer organization that is affiliated with the University of Maryland. It regularly collaborates with UMD faculty, labs and centers to add Fraunhofer’s software and systems engineering skills to their basic research, so it can be applied to important scientific and societal problems in biology, health, national security and more.

About Signature Science LLC: A subsidiary of the Southwest Research Institute, Signature Science is a scientific and technical consulting firm providing multidisciplinary applied research, technology design and development, and scientific, technical and operational services to government and industry.

About IARPA: Launched in 2006, the Intelligence Advanced Research Projects Activity invests in high-risk, high-payoff research programs that address some of the most difficult scientific challenges faced by the U.S. intelligence community.

Pop and Treangen Assume Leadership Roles at CHIB

Wed Aug 02, 2017

Two computational biologists at the University of Maryland have assumed leadership positions in the Center for Health-related Informatics and Bioimaging (CHIB), a unique cross-institutional collaboration that matches data-intensive biomedical problems to cutting-edge expertise in data science.

Mihai Pop, professor of computer science and interim director of the University of Maryland Institute for Advanced Computer Studies (UMIACS), is the new co-director of CHIB. Todd Treangen, a research scientist in UMIACS, is CHIB’s new assistant director.

As co-director, Pop will share senior leadership responsibilities with Owen White, a genomic expert at the University of Maryland School of Medicine.

Launched in 2013, CHIB matches computing resources at the University of Maryland, College Park with clinical and biomedical expertise at the University of Maryland, Baltimore.

Interdisciplinary teams from both institutions—with administrative support, and in some cases, seed funding from CHIB—have advanced research and education in areas like the use of new visualization technologies in healthcare, studying diarrheal pathogens in young children from low-income countries, and training graduate students in network biology.

The driving force in much of the center’s work is the use of cutting-edge computing technologies to address grand challenges in genomic research, medical information management and precision medicine—and then translate those findings to practitioners focused on improving human health.

“We want to build on the success we’ve had, and continue to develop new tools and technologies that will have a positive health impact on people in the state of Maryland and beyond,” says Pop.

A recent round of funding from MPowering the State will spur several new initiatives in CHIB.

One involves a series of MPower “summits”—joining CHIB experts with other researchers in the University of Maryland Medical System, other colleges and universities, and nearby federal scientists—in key areas like health data mobilization, personalized medicine and cancer, network analytics, and antibiotic resistance and pathogen detection.

“All of these will require input from a broad range of experts at both institutions—computer scientists, statisticians, physicians, epidemiologists, social science specialists, and more,” says Pop. “We're looking forward to these new research partnerships and projects.”

Colwell Awarded 2017 International Prize for Biology

Mon Aug 21, 2017

Rita Colwell, a Distinguished University Professor in the University of Maryland Institute for Advanced Computer Studies (UMIACS), has been named the 2017 laureate of the International Prize for Biology for her outstanding contributions to marine microbiology, bioinformatics and the understanding and prevention of cholera.

Colwell is the 33rd recipient of this award, generally recognized as one of the most prestigious honors a natural scientist can receive. Past laureates include renowned biologists such as John B. Gurdon, Motoo Kimura, Edward O. Wilson, Ernst Mayr and Thomas Cavalier-Smith.

In awarding the prize, the Japan Society for the Promotion of Science honored Colwell as a pioneer in the use of computational tools and DNA sequencing to identify and classify marine bacteria and other microorganisms, work that helped lay the foundation for the bioinformatics revolution.

The prize also recognizes Colwell’s life-saving contributions to the understanding and prevention of cholera, an acute diarrheal disease caused by ingestion of water or food contaminated with Vibrio cholerae. According to the World Health Organization, cholera is responsible for approximately 1 to 4 million illnesses and 20,000 to 140,000 deaths each year.

Colwell, whose career bridges the disciplines of microbiology, ecology, infectious disease, public health and computer and satellite technology, continues to be a leader in bioinformatics, notably in understanding microbiomes and the application of this knowledge to human health and the diagnosis and treatment of disease. This includes her current work as founder and chairman of CosmosID Inc., a microbial genomics company focused on molecular diagnostics of human pathogens and antimicrobial resistance.

“It is an extraordinary honor to be named recipient of the International Prize for Biology, a very special honor for a biologist,” said Colwell. “I am deeply grateful to the Japan Society for the Promotion of Science for this award. I have many friends and colleagues in Japan and look forward to continuing my many collaborations with them.”

The selection committee also cited Colwell's transformational work in these areas:

- Establishing the taxonomy of vibrio bacteria, which includes Vibrio cholerae.

- Identifying a previously unknown survival strategy of dormant vibrio cells, which the committee said “has had a profound influence on microbiology and medicine.”

- Showing how climate change has expanded the habitat range of vibrios, and the occurrence of cholera.

- Helping prevent the spread of cholera in developing countries by discovering and demonstrating an effective way to use the sari, the traditional dress of women on the Indian subcontinent, as a filter to remove vibrio-carrying plankton from drinking water drawn from ponds, rivers and other surface waters.

The International Prize for Biology was instituted in April 1985 by the Committee on the International Prize for Biology. The prize, consisting of a certificate, a medal and an award of 10 million yen (more than $90,000), is given to the recipient along with an imperial gift, a silver vase bearing the imperial crest. The award presentation ceremony and a subsequent reception in honor of Colwell will be held in late 2017 at the Japan Academy in Tokyo.

Pop Delivers Keynote Talk at International Conference on Bioinformatics

Wed Sep 20, 2017

Mihai Pop, professor of computer science and interim director of UMIACS, is a featured keynote speaker this week at the German Conference on Bioinformatics.

The annual conference—held this year in Tübingen, Germany—attracts a multinational audience of 200–250 participants. The meeting is open to all fields of bioinformatics and serves as a collaborative platform for the European bioinformatics community.

Pop’s talk will center on the computational analysis of microbial communities. He will discuss some of the recent results from his lab—focusing on both experimental and computational challenges—as well as how microbes can impact human health.

Pop will also highlight his lab’s work in using computational biology to study the causes of diarrhea in the developing world, as well as his group’s research on the testing and validation of genome assembly software.

Spring 2017 CAMI2 hackathon on May 3rd to May 5th

Fri Mar 10, 2017

CBCB will be hosting a 3-day CAMI2 hackathon on Wednesday, May 3rd through Friday, May 5th at the University of Maryland (in Biomolecular Sciences Bldg, room 3118).

This will be a joint event with the Mid-Atlantic Microbiome Meetup (M3). This event precedes the CAMI+M3 workshop taking place on May 7th and May 8th at the University of Maryland (STAMP).

The registration website is online at:

For more information, please see:

Spring 2017 Mid-Atlantic Microbiome Meetup (M3) on May 7th and May 8th

Fri Mar 10, 2017

We are happy to announce that the Spring 2017 Mid-Atlantic Microbiome Meetup (M3) registration website is now online at:

This will be a joint event with CAMI and is taking place on May 7th and 8th, 2017, at the University of Maryland (STAMP).

For more information on this meetup, please see:
