Researchers from Georgia Tech and the National Institutes of Health have developed a new genomic assessment method for detecting clear species boundaries in microbes, allowing for quick and accurate identification of minute differences in species.
Known as FastANI, the new approach couples a powerful sequence-mapping algorithm with a robust sequence similarity metric to reduce the computational complexity of this type of genomic assessment. Up to three orders of magnitude faster than current alignment-based approaches, FastANI is accurate for both complete and draft genomes.
The new approach and research findings were published in the November 2018 issue of Nature Communications.
Although the timely identification of species is critical to public and environmental health concerns, the process of identifying genomes for prokaryotes – usually single-cell organisms that have no true nucleus – is not straightforward.
“There is an age-old question in microbiology that asks, ‘Are there clearly identifiable species boundaries in the microbial world?’” said Srinivas Aluru, School of Computational Science and Engineering (CSE) professor.
“This question arises because microbes can exchange genes across species without engaging in reproduction, more formally called horizontal gene transfer, which can make the distinction between different species quite murky,” said Aluru, who also serves as Co-Executive Director of the Institute for Data Engineering and Science.
Alignment-free versus alignment-based approaches
One major task in determining species boundaries is estimating the genetic relatedness between two genomes. A measurement known as the whole-genome average nucleotide identity (ANI) has emerged in recent years as a strong tool for this task. According to the metric, organisms belonging to the same species typically share an ANI of 95 percent or more. Kostas Konstantinidis, Professor in Civil and Environmental Engineering, developed the ANI metric and provided the microbiology expertise for the project.
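As a rough illustration of how the metric is read, the sketch below averages per-fragment percent identities between two genomes and applies the 95 percent cutoff mentioned above. The function names and the sample identity values are hypothetical, and the fragment matching itself (the expensive part) is assumed to have been done already; this is not the FastANI implementation.

```python
from statistics import mean

# Species cutoff quoted in the article, in percent.
SPECIES_ANI_THRESHOLD = 95.0


def average_nucleotide_identity(fragment_identities):
    """Return the mean percent identity over matched genome fragments."""
    if not fragment_identities:
        raise ValueError("need at least one matched fragment")
    return mean(fragment_identities)


def same_species(ani, threshold=SPECIES_ANI_THRESHOLD):
    """Apply the 95 percent rule of thumb to an ANI value."""
    return ani >= threshold


# Hypothetical per-fragment identities for one pair of genomes.
identities = [97.2, 96.8, 95.5, 98.1]
ani = average_nucleotide_identity(identities)
print(f"ANI = {ani:.1f}, same species: {same_species(ani)}")
```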
“Whole-genome similarity metrics, such as ANI, help define clear species boundaries by facilitating the classification of thousands of genomes from diverse evolutionary backgrounds,” said CSE Ph.D. student Chirag Jain, the primary investigator of the paper.
“However, despite its strengths, all previous ANI-based methods have not been able to assess a large number of genomes because of their reliance on alignment-based searches, which are computationally expensive,” he said.
To bypass this bottleneck and meet the growing demand for genomic assessment, the team developed the first ANI-based method that is not alignment-based. The FastANI method quickly estimates the similarity of two sets by comparing a sample against the full collection of available prokaryotic genomes using Mashmap, an alignment-free approximate sequence-mapping algorithm developed by Jain and Aluru in collaboration with Adam Phillippy at the National Institutes of Health.
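To give a feel for what "alignment-free" can mean, the sketch below estimates percent identity from the overlap of short substrings (k-mers) shared by two sequences, with no base-by-base alignment. It is a simplified stand-in rather than the Mashmap or FastANI algorithm: it uses exact k-mer sets instead of sampled sketches, borrows the Mash-style conversion from Jaccard similarity to distance, and the function names, the choice of k, and the toy sequences are all assumptions made for illustration.

```python
import math


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def estimated_identity(seq_a, seq_b, k=8):
    """Estimate percent identity from k-mer overlap, without alignment.

    Uses the Mash-style distance D = -(1/k) * ln(2j / (1 + j)),
    where j is the Jaccard similarity of the two k-mer sets.
    """
    j = jaccard(kmers(seq_a, k), kmers(seq_b, k))
    if j == 0.0:
        return 0.0  # no shared k-mers: treat as unrelated
    distance = -math.log(2 * j / (1 + j)) / k
    return max(0.0, 100.0 * (1.0 - distance))


# Toy sequences (hypothetical): b differs from a by one substitution.
a = "ACGTTGCAAGCTTAGGCTAACGTAGCTAGGATCCGATTACGGTCAAT"
b = a[:-1] + "G"
print(estimated_identity(a, a))
print(estimated_identity(a, b))
```

Real tools avoid comparing full k-mer sets by sampling a small subset of k-mers from each genome, which is a large part of where the speedup comes from.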
The team performed over 8 billion pairwise comparisons of orthologous sequences between almost 90,000 genomes using the new technique before releasing their findings, which are publicly available.
“The FastANI metric is an output from 0 to 100. The higher the metric the more closely the two genomes are related. A result registering 95 or above means there is a 98.5 percent chance that the two matched genomes are the same species. Results below 95 confirm the two are not the same species,” said Jain.
According to Aluru, FastANI will in practice improve the diagnosis of disease agents, aid the regulation of which organisms can be transported between countries, and help determine which organisms should be placed under quarantine.