AI model popEVE predicts likelihood of harm from ultra-rare DNA mutations
Study: Tool even proved effective when only a single child's DNA was available
A new computational model, called popEVE, can predict the likelihood of harm from a DNA mutation, even when the variant is extremely rare or has never been observed before, according to a new study.
The tool was able to pinpoint potentially damaging mutations in children with developmental disorders, including cases where only the child’s DNA was available and no genetic information from parents or siblings could be used. This offers “a generalizable framework for rare disease variant interpretation, especially in singleton cases,” researchers wrote.
The study, “Proteome-wide model for human disease genetics,” was published in Nature Genetics by a team of researchers from institutions in the U.S. and Europe.
Some mutations appear just once in entire human population
Interpreting the genetic cause of rare disorders can be challenging, and the difficulty is even greater for ultra-rare conditions like AADC deficiency, a disorder caused by mutations in the DDC gene that lead to delays in early development and other symptoms.
Many disease-causing mutations are extremely rare, appearing only occasionally or even once in the entire human population. This leaves doctors with few reliable reference points when trying to interpret a child’s genetic information. Missense mutations, which change a single building block of the protein encoded by a gene, are especially challenging because their effects can be subtle and depend heavily on where they occur within the protein.
Current tools used to interpret genetic data rely on the frequency at which a mutation has been observed previously. This approach can be effective for more common conditions, but it often fails for disorders caused by ultra-rare genetic changes. These tools also struggle to assess the severity of a mutation — for example, whether it could be life-threatening in childhood or lead to milder effects later in life.
To overcome these limitations, researchers developed popEVE, an artificial intelligence model trained to estimate the likelihood of harm from a DNA change, even when the genetic variant has little or no precedent in human data.
Instead of relying on how often a mutation has been observed, popEVE learns from two powerful sources. One is the long evolutionary history captured in the DNA of many species, which shows which parts of a gene have been preserved because they are essential for life. The other is the pattern of genetic variation observed across large human populations, which reveals which changes are tolerated in healthy individuals and which are rarely observed.
By combining these sources, popEVE can make proteome-wide predictions, meaning it can compare the severity of mutations across the entire set of human proteins rather than only within a single gene.
Mutations found in affected children received higher severity scores
The team first validated popEVE by scoring variants with well-established clinical effects, confirming that it could detect harmful mutations and accurately measure their severity. They then applied the model to real patient data.
They used popEVE to score 31,058 non-inherited, or de novo, missense mutations found in children with severe developmental disorders and compared these results to mutations observed in a group of 5,764 unaffected individuals from another study, as well as more than 500,000 adults in the UK Biobank.
A clear pattern emerged. Mutations found in affected children consistently received higher severity scores than those seen in the unaffected comparison groups. The shift toward higher severity scores was even stronger in a group of 2,982 children whose developmental disorders had already been diagnosed.
When the researchers applied a strict cutoff to isolate the mutations popEVE predicted to be the most severely damaging — those with a 99.99% likelihood of disrupting protein function — these variants were about 15 times more common in children with developmental disorders than in the control groups.
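The "15 times more common" figure is a fold enrichment: the rate of above-cutoff variants in affected children divided by the rate in the comparison groups. The sketch below shows that arithmetic only. The function name and the counts are made up for illustration and are not taken from the study.

```python
def fold_enrichment(case_hits, case_total, control_hits, control_total):
    """Rate of above-cutoff variants in cases divided by the rate in controls."""
    if case_total <= 0 or control_total <= 0:
        raise ValueError("totals must be positive")
    if control_hits == 0:
        raise ValueError("control rate is zero, so the ratio is undefined")
    case_rate = case_hits / case_total
    control_rate = control_hits / control_total
    return case_rate / control_rate


# Made-up counts for illustration only; these are not figures from the study.
ratio = fold_enrichment(case_hits=30, case_total=1000,
                        control_hits=2, control_total=1000)
print(f"{ratio:.1f}x more common in cases")
```

A ratio near 1 would mean the cutoff does not separate the two groups at all; the larger the ratio, the more the most severe category is concentrated in affected children.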
Using this severity cutoff and a statistical test to identify genes carrying more harmful mutations than expected by chance, popEVE uncovered 123 genes with unusually damaging variants that had not been previously linked to developmental disorders.
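The article does not name the statistical test used. One common way to ask whether a gene carries more severe variants than expected by chance is a one-sided Poisson test against an expected count, with a correction for testing many genes at once. The sketch below assumes that approach. The gene names, counts, expected rates, and the Bonferroni correction are all hypothetical and may differ from what the researchers did.

```python
import math


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)  # P(X = 0)
    below = 0.0
    for i in range(observed):
        below += term                 # accumulate P(X = i)
        term *= expected / (i + 1)    # advance to P(X = i + 1)
    return max(0.0, 1.0 - below)


def enriched_genes(counts, expected, alpha=0.05):
    """Genes whose observed count is significantly high (Bonferroni-corrected)."""
    if not counts:
        return []
    threshold = alpha / len(counts)
    return sorted(
        gene for gene, k in counts.items()
        if poisson_upper_tail(k, expected[gene]) < threshold
    )


# Hypothetical genes and numbers, for illustration only.
counts = {"GENE_A": 6, "GENE_B": 1, "GENE_C": 0}
expected = {"GENE_A": 0.2, "GENE_B": 0.5, "GENE_C": 0.3}
hits = enriched_genes(counts, expected)
print(hits)
```

In this toy example only the gene with six severe variants against an expected 0.2 clears the corrected threshold. A single variant in a gene where half a variant is expected is unremarkable.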
Many of these newly identified genes exhibited functions similar to those already associated with developmental disorders, and several were part of the same biological pathways or protein complexes.
When protein structures were available, the most severely scored mutations often fell in regions essential for normal protein function, reinforcing the idea that these changes were likely to have meaningful biological consequences.
PopEVE also proved useful in one of the most challenging real-world situations: when only the child’s DNA is available and there is no parental data to determine whether a mutation is inherited or newly arisen.
In a group of 9,859 children, of whom roughly 2,700 were expected to carry a causal de novo missense mutation, popEVE identified 513 children with a de novo variant that fell into its most severe category. In 98% of those cases, popEVE ranked that severe mutation as the most damaging variant in the child’s genome.
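The singleton scenario amounts to ranking one child's variants by predicted severity and checking whether the top-ranked one clears the strict cutoff. The sketch below illustrates that step and works out what share of the expected causal cases the 513 flagged children represent. The variant names, the 0-to-1 severity scale where higher means more damaging, and the cutoff value are assumptions for illustration, not popEVE's actual scoring scale.

```python
from typing import NamedTuple, Optional, Sequence


class ScoredVariant(NamedTuple):
    name: str
    severity: float  # hypothetical scale: higher means more damaging


def most_severe(variants: Sequence[ScoredVariant]) -> ScoredVariant:
    """Return the variant with the highest severity score."""
    if not variants:
        raise ValueError("no variants to rank")
    return max(variants, key=lambda v: v.severity)


def flag_singleton(variants: Sequence[ScoredVariant],
                   cutoff: float) -> Optional[ScoredVariant]:
    """Return the top-ranked variant if it clears the cutoff, otherwise None."""
    top = most_severe(variants)
    return top if top.severity >= cutoff else None


# Hypothetical variants for one child; no parental data is needed for this step.
child = [
    ScoredVariant("variant_1", 0.12),
    ScoredVariant("variant_2", 0.97),
    ScoredVariant("variant_3", 0.40),
]
flagged = flag_singleton(child, cutoff=0.9)
print(flagged)

# Figures from the article: 513 flagged children out of roughly 2,700 expected.
captured_share = 513 / 2700
print(f"{captured_share:.0%} of expected causal cases flagged")
```

By the article's figures, the most severe category flagged about one in five of the children expected to carry a causal de novo missense mutation.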
“This work introduces a model designed to support genetic diagnosis in [ultra-rare conditions],” the researchers wrote, highlighting the need for tools that can evaluate mutations even when only a single patient is available.