Herophilus Publishes General Method for Detecting Relevant Signals in Machine Learning Analysis of Complex Biological Datasets
Herophilus, a leading biotechnology company developing neurotherapeutics to cure complex brain diseases, announced the publication of research that describes a new statistical method to identify and analyze the effects of potentially confounding variables on machine learning models for complex biological datasets.
The capability of machine learning (ML) to extract scientific insights from high-dimensional datasets is often limited by confounding variables that bias the models. Determining the influence of confounders is particularly challenging for complex bioscience datasets, which tend to be organized in nested hierarchies that prohibit the use of traditional methods such as linear regression to correct for the effects of nuisance variables. Though tools exist to mitigate known confounders, scientists lack a general method to identify which variables in a set of potential confounders require debiasing.
In “Hierarchical confounder discovery in the experiment–machine learning cycle,” published in the Cell Press journal Patterns, the authors define the “Rank-to-Group” (RTG) score, a new nonparametric statistical method for scoring the effect of a potential confounder. RTG scoring is robust to outlier noise and can identify the source of a confounding effect even in non-linear structures. The method is applicable both to raw data and to the results of ML models.
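The announcement does not spell out how the score is computed. As a rough illustration of what a rank-based, nonparametric grouping score can look like, the sketch below ranks every other sample by distance from each sample and measures how strongly same-group samples crowd the top of that ranking. The function name `rank_to_group_score`, the Euclidean distance, and the per-sample AUC averaging are all assumptions made for this example; it should not be read as the published definition of RTG.

```python
import numpy as np

def rank_to_group_score(X, groups):
    """Illustrative rank-based grouping score (an assumption, not the paper's code).

    For each sample, the other samples are ranked by Euclidean distance and we
    compute the AUC for "same group as this sample" versus nearness in rank.
    The per-sample AUCs are averaged: about 0.5 means the grouping variable is
    unrelated to the data geometry, 1.0 means same-group samples are always nearest.
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    n = len(X)

    # Pairwise Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    aucs = []
    for i in range(n):
        others = np.arange(n) != i
        d = dist[i, others]
        same = groups[others] == groups[i]
        n_pos, n_neg = int(same.sum()), int((~same).sum())
        if n_pos == 0 or n_neg == 0:
            continue  # AUC is undefined without both classes

        # Rank 1 = nearest neighbour (ties are broken arbitrarily here).
        ranks = d.argsort().argsort() + 1
        # Mann-Whitney U statistic computed from the ranks of same-group samples.
        u = ranks[same].sum() - n_pos * (n_pos + 1) / 2
        aucs.append(1.0 - u / (n_pos * n_neg))

    return float(np.mean(aucs)) if aucs else float("nan")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: two batches whose measurements are shifted apart.
    X = np.vstack([rng.normal(0, 0.1, (20, 3)), rng.normal(10, 0.1, (20, 3))])
    batch = np.array([0] * 20 + [1] * 20)
    print("score for batch label:", rank_to_group_score(X, batch))
```

Because only ranks enter the calculation, a few extreme outliers cannot dominate the result, which is the general property the release attributes to rank-based scoring. In this sketch, a candidate confounder such as a batch label that cleanly separates the data scores 1, while a label unrelated to the data scores near 0.5.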
“RTG scoring is a broadly useful tool to analyze high-dimensional datasets with complex, potentially nested, sources of bias – which standard methods for bias identification can’t address. This approach enables a virtuous cycle of experimental design, data collection, and model building for the reduction of bias in data and thus strengthens the use of machine learning in discovery science,” said Sean Escola, M.D., Ph.D., co-founder of Herophilus.
“Herophilus is focused on the discovery and development of curative therapeutics for brain disease, but we maintain a serious commitment to advancing the tools of foundational scientific inquiry for the benefit of all,” said Saul Kato, Ph.D., co-founder and CEO of Herophilus. “The next wave of ML research is moving beyond strict model performance into considerations of reliability, interpretability, and bias. RTG scoring has become part of our everyday use of ML for doing interpretable science, and we felt it merited sharing with the community.”