The study, published in the journal Cell, created a catalog of complex structural variants from more than 4,000 human genomes from around the globe. These variants often occurred in genes involved in brain function and were found in regions of the genome linked to human evolution. The researchers also showed that some complex structural variants affected how the instructions contained in brain-related genes were read out (that is, how those genes were expressed) in the brains of people who had been diagnosed with schizophrenia or bipolar disorder.
“This work is a major step forward in figuring out the genetic and molecular basis for psychiatric disorders and suggests that brain-related diseases and, in general, disorders that have a strong genetic component should have a complex structural variant analysis,” said Alexander Urban, PhD, a senior author of the study and associate professor of psychiatry and behavioral sciences and of genetics. “Any whole genome sequence should be run through this new algorithm; this will allow us to unearth important answers in the data that are currently ignored.”
Urban and Wing Wong, PhD, the Stephen R. Pierce Family Goldman Sachs Professor of Science and Human Health and professor of statistics and of biomedical data science, were co-senior authors.
Enhancing Understanding of Psychiatric Diseases
Almost all of the variants discovered in the human genome so far are simple, such as a change to a single DNA letter. But the new algorithm’s output showed that each genome also carries between 80 and 100 complex structural variants.
“Looking for only simple variations is like proofreading a book manuscript and searching exclusively for typos that change single letters,” Urban said. “You are overlooking words that are scrambled or duplicated, or in the wrong order — you might even miss that half a chapter is gone. All these things should be caught before the manuscript is sent to the print shop.”
AI-Driven Discovery of Genetic Variants
The Automated Reconstruction of Complex Structural Variants algorithm, ARC-SV for short, catches all kinds of DNA rearrangements and has an accuracy rate of 95% in finding complex structural variants. The algorithm uses an AI model and was trained on dozens of complete human genomes from people of diverse ancestry, which together make up what is called a pangenome.
The algorithm found more than 8,000 distinct complex structural variants, which ranged in length from 200 to 100,000 base pairs. Many variants were located in regions of the genome that regulate brain development and function, so the researchers examined more closely whether these variants were associated with psychiatric disease.
Genetic Analysis and Psychiatric Diagnosis
The ability to easily find and study complex structural variants could help explain which alterations in the genome contribute to heritable psychiatric diseases. The study examined two such diseases, schizophrenia and bipolar disorder. Genome-wide association studies (GWAS) have identified many locations in the genome associated with the risk of being diagnosed with a psychiatric disease. But GWAS results do not explain that genetic risk in enough detail to act on it.
“We have made amazing progress in identifying genetic components of psychiatric diseases, but there is still something important missing,” Urban said. “GWAS results tell us where in the genome some DNA change related to a disorder is located. But the information from GWAS is somewhat vague. It is like knowing that there are errors somewhere on pages 118, 237, and 304 in a book. But we do not know what kind of errors they are or which words are involved.”
Precision in Genetic Research
Urban explained that while GWAS results might direct researchers to look for something wrong on page 118, knowing the sequence of complex structural variants is like having the actual 10-word sentence on that page highlighted in yellow, showing that one word is scrambled and another is duplicated.
“It’s that exact,” he said.
Implications for Disease Understanding and Treatment
The researchers put the output of the ARC-SV algorithm to the test. To investigate what complex structural variants might be doing, they combined whole-genome sequences with gene expression measurements from more than 100 postmortem brain tissue samples, taken from healthy individuals and from people who had been diagnosed with schizophrenia or bipolar disorder. The variants tended to lie near, or overlap with, GWAS locations known to be associated with risk for schizophrenia or bipolar disorder. The complex structural variants also affected how nearby genes were expressed, changing the readout of the instructions contained in the DNA, which suggests that the variants could be contributing to these disorders.
“Identifying and studying complex structural variants will give us more understanding of the ways DNA can vary and will provide molecular clues that will allow mapping of the trajectory of biological function that leads to disease and to the treatment of disease,” said Bo Zhou, PhD, an instructor in psychiatry and behavioral sciences and a first author on the study.
Reference: “Detection and analysis of complex structural variation in human genomes across populations and in brains of donors with psychiatric disorders” by Bo Zhou, Joseph G. Arthur, Hanmin Guo, Taeyoung Kim, Yiling Huang, Reenal Pattni, Tao Wang, Soumya Kundu, Jay X.J. Luo, HoJoon Lee, Daniel C. Nachun, Carolin Purmann, Emma M. Monte, Annika K. Weimer, Ping-Ping Qu, Minyi Shi, Lixia Jiang, Xinqiong Yang, John F. Fullard, Jaroslav Bendl, Kiran Girdhar, Minsu Kim, Xi Chen, William J. Greenleaf, Laramie Duncan, Hanlee P. Ji, Xiang Zhu, Giltae Song, Stephen B. Montgomery, Dean Palejev, Heinrich zu Dohna, Panos Roussos, Anshul Kundaje, Joachim F. Hallmayer, Michael P. Snyder, Wing H. Wong and Alexander E. Urban, 30 September 2024, Cell.
DOI: 10.1016/j.cell.2024.09.014
Joseph G. Arthur, PhD, a computational biologist at 10x Genomics, and Hanmin Guo, PhD, a postdoctoral scholar in psychiatry, were also first authors. Researchers at Pusan National University, Icahn School of Medicine at Mount Sinai, Pennsylvania State University, Bulgarian Academy of Sciences, American University of Beirut and the James J. Peters VA Medical Center contributed to this study.
This work was funded by the National Institutes of Health (grants K01MH129758, T32-GM096982, P50HG00773506, U01MH116529, R01HG010359, R01AG050986, R01MH109677, U01MH116442, R01MH110921, R01MH125246, R01AG067025, R01MH125244, R01AG066490, U01HG01096, R01HG006137 and UL1TR002014), the National Science Foundation (grants DGE-114747 and DMS1952386), the National Research Foundation of Korea, a VA Merit grant, Stanford’s Stein Fellowship and Penn State seed grants.