AI Discovers Over 160,000 New RNA Viruses

Over 161,000 RNA viruses have been discovered through AI, showcasing vast, unexplored viral diversity and setting the stage for further scientific breakthroughs. Credit: SciTechDaily.com

Largest discovery of new virus species sheds light on the hidden virosphere.

Artificial intelligence (AI) has been used to reveal details of a diverse and fundamental branch of life living right under our feet and in every corner of the globe: RNA viruses. These viruses not only play significant roles in human health but are also prevalent in extreme environments, highlighting their crucial roles in ecosystems and offering insights into viral evolution and diversity.

Using a machine learning tool, researchers have discovered 161,979 new species of RNA virus, a breakthrough that could dramatically enhance our understanding of Earth’s biodiversity and assist in identifying millions more viruses yet to be characterized.

Published on October 9 in the journal Cell and conducted by an international team of researchers, the study is the largest virus species discovery paper ever published.

Unprecedented Viral Diversity Unveiled

“We have been offered a window into an otherwise hidden part of life on earth, revealing remarkable biodiversity,” said senior author Professor Edward Holmes from the School of Medical Sciences in the Faculty of Medicine and Health at the University of Sydney.

“This is the largest number of new virus species discovered in a single study, massively expanding our knowledge of the viruses that live among us,” Professor Holmes said. “To find this many new viruses in one fell swoop is mind-blowing, and it just scratches the surface, opening up a world of discovery. There are millions more to be discovered, and we can apply this same approach to identifying bacteria and parasites.”

The Role of RNA Viruses in Extreme Environments

Although RNA viruses are commonly associated with human disease, they are also found in extreme environments around the world and may even play key roles in global ecosystems. In this study they were found living in the atmosphere, hot springs, and hydrothermal vents.

“That extreme environments carry so many types of viruses is just another example of their phenomenal diversity and tenacity to live in the harshest settings, potentially giving us clues on how viruses and other elemental life-forms came to be,” Professor Holmes said.

Advancements in Viral Identification via AI

The researchers built a deep learning algorithm, LucaProt, to process vast troves of genetic sequence data, including lengthy virus genomes of up to 47,250 nucleotides and genomically complex information, and used it to discover more than 160,000 viruses.

“The vast majority of these viruses had been sequenced already and were on public databases, but they were so divergent that no one knew what they were,” Professor Holmes said. “They comprised what is often referred to as sequence ‘dark matter’. Our AI method was able to organize and categorize all this disparate information, shedding light on the meaning of this dark matter for the first time.”

The AI tool was trained to analyze this dark matter and identify viruses based on the sequences and secondary structures of the protein that all RNA viruses use for replication.

Future Directions and Applications of AI in Virology

LucaProt was able to significantly fast-track virus discovery, which would be time-intensive using traditional methods.

Co-author Professor Mang Shi, from Sun Yat-sen University, the study’s institutional lead, said: “We used to rely on tedious bioinformatics pipelines for virus discovery, which limited the diversity we could explore. Now, we have a much more effective AI-based model that offers exceptional sensitivity and specificity, and at the same time allows us to delve much deeper into viral diversity. We plan to apply this model across various applications.”

Co-author Dr Zhao-Rong Li, who researches in the Apsara Lab of Alibaba Cloud Intelligence, said: “LucaProt represents a significant integration of cutting-edge AI technology and virology, demonstrating that AI can effectively accomplish tasks in biological exploration. This integration provides valuable insights and encouragement for further decoding of biological sequences and the deconstruction of biological systems from a new perspective. We will also continue our research in the field of AI for virology.”

Professor Holmes said: “The obvious next step is to train our method to find even more of this amazing diversity, and who knows what extra surprises are in store.”

Reference: “Using artificial intelligence to document the hidden RNA virosphere” by Xin Hou, Yong He, Pan Fang, Shi-Qiang Mei, Zan Xu, Wei-Chen Wu, Jun-Hua Tian, Shun Zhang, Zhen-Yu Zeng, Qin-Yu Gou, Gen-Yang Xin, Shi-Jia Le, Yin-Yue Xia, Yu-Lan Zhou, Feng-Ming Hui, Yuan-Fei Pan, John-Sebastian Eden, Zhao-Hui Yang, Chong Han, Yue-Long Shu, Deyin Guo, Jun Li, Edward C. Holmes, Zhao-Rong Li and Mang Shi, 9 October 2024, Cell.
DOI: 10.1016/j.cell.2024.09.027

The researchers declare no competing interests. The research was supported by the National Natural Science Foundation of China, the Shenzhen Science and Technology Program, the Natural Science Foundation of Guangdong Province, the Guangdong Province “Pearl River Talent Plan” Innovation and Entrepreneurship Team Project, the Hong Kong Innovation and Technology Fund (ITF) and the Health and Medical Research Fund. Professor Holmes is funded by a National Health and Medical Research Council of Australia Investigator grant and by AIR@InnoHK administered by the Innovation and Technology Commission, Hong Kong Special Administrative Region, China.