Why You Shouldn’t Select an AI System Based on Performance in Pathology
Driving Meaningful Improvements in the Pathology Laboratory using AI and Machine Learning
Healthcare is undergoing a major transformation in the way diseases like cancer are diagnosed. Until recently, the practice of pathology had remained largely unchanged in its 150-year history, still depending on pathologists evaluating stained tissue biopsies affixed to glass slides under the microscope. Now, the rise of digital pathology is shifting this standard of care from microscope to image.
And in doing so, it’s opening new opportunities to drive efficiency, unlock new insights, and deliver better patient care, including through the introduction of computational methods. AI systems have come online to process biopsy images, with applications that prioritize the most urgent cases for review, highlight regions of cancerous tissue, and quantify tumor cells, among other use cases. Yet not every AI system will be impactful. Some will outright fail to perform well, and others will fail to generalize to new sites.
Perhaps most unexpectedly, even an AI system that performs well on paper can fail in the clinic.
Ensuring the value of AI in the real world
How can labs ensure that they don’t face this type of failure in practice? The answer may seem a bit counterintuitive: don’t select an AI system based solely on its performance. An AI system’s standalone performance (read: the metrics often reported in scientific studies, like sensitivity, specificity, and area under the curve, or AUC) is never what ultimately matters to the lab or pathologist. Instead, what matters in the real world are downstream measures: the performance of the pathologist-plus-AI or lab-plus-AI unit. These are metrics the lab is already paying attention to because they are most critical to its success and to the patient receiving a proper diagnosis. They might include efficiency (e.g., how many cases are diagnosed or tests ordered in a period of time), diagnostic accuracy, and case turnaround time (e.g., how long before a cancer is diagnosed and the patient can begin treatment).
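To make the distinction concrete, here is a minimal Python sketch contrasting the model-level metrics reported in studies with the kind of lab-level metric a lab is already tracking. All of the counts and timestamps below are hypothetical values chosen for illustration:

```python
# Model-level metrics (what studies report) vs. a lab-level metric
# (what the lab actually tracks). All numbers here are hypothetical.

def sensitivity(tp: int, fn: int) -> float:
    """Fraction of truly positive cases the AI flags (true positive rate)."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """Fraction of truly negative cases the AI correctly clears."""
    return tn / (tn + fp)

def mean_turnaround_hours(received: list[float], signed_out: list[float]) -> float:
    """Average hours from case receipt to signed-out diagnosis."""
    return sum(s - r for r, s in zip(received, signed_out)) / len(received)

# Model-level view: the AI looks strong on paper.
print(sensitivity(tp=90, fn=10))   # 0.90
print(specificity(tn=85, fp=15))   # 0.85

# Lab-level view: what matters is whether the pathologist-plus-AI unit
# actually signs out cases faster (or more accurately) than before.
without_ai = mean_turnaround_hours([0.0, 0.0], [48.0, 72.0])  # 60.0 hours
with_ai = mean_turnaround_hours([0.0, 0.0], [24.0, 40.0])     # 32.0 hours
print(without_ai, with_ai)
```

A system can score well on the first two functions and still leave the third unchanged, which is exactly the failure mode described above.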
Understanding the Role of AI and Machine Learning in Pathology
When it’s clear that these are the metrics AI is intended to impact, it also becomes clear why the performance of an AI system isn’t truly what needs to be measured to evaluate whether it’s providing utility to the pathologist or to the lab. Importantly, this isn’t to say that AI system performance shouldn’t be measured. It can be a good directional indicator of whether an AI system could feasibly provide utility; at one extreme, for example, an algorithm with 0% accuracy probably isn’t going to make a positive splash in the clinic. But an AI system could easily have perfect performance on paper and be entirely useless.
Consider an AI system that’s designed to improve diagnostic accuracy. While such a system might outperform non-experts, it’s not really accomplishing this aim in practice if its performance does not exceed that of the subspecialist experts who are typically the ones making diagnoses. On the other hand, an AI system that has less than 100% accuracy may still provide utility in practice depending on its use.
For example, an AI system that prioritizes urgent cases with even 60% sensitivity could deliver value in accelerating turnaround time if the status quo were that no urgent cases are prioritized at all. With the AI system, roughly 60% of these urgent cases would be reviewed right away (provided there are not also too many false positives).
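The rough simulation below makes this arithmetic explicit. The 10% urgent-case rate and 10% false positive rate are assumed values for illustration, not figures from any real triage system:

```python
# Back-of-the-envelope triage simulation: an AI with 60% sensitivity
# flags cases it believes are urgent, and flagged cases jump to the
# front of the review queue. All parameters are illustrative.
import random

random.seed(0)
N_CASES = 1000
URGENT_RATE = 0.10            # assumed fraction of truly urgent cases
SENSITIVITY = 0.60            # AI flags 60% of truly urgent cases
FALSE_POSITIVE_RATE = 0.10    # assumed rate of flagging non-urgent cases

cases = []
for _ in range(N_CASES):
    urgent = random.random() < URGENT_RATE
    flag_prob = SENSITIVITY if urgent else FALSE_POSITIVE_RATE
    flagged = random.random() < flag_prob
    cases.append((urgent, flagged))

def mean_urgent_position(queue):
    """Average queue position (0 = reviewed first) of truly urgent cases."""
    positions = [i for i, (urgent, _) in enumerate(queue) if urgent]
    return sum(positions) / len(positions)

fifo = cases                                     # status quo: no prioritization
triaged = sorted(cases, key=lambda c: not c[1])  # flagged cases reviewed first

print(f"FIFO:    urgent case waits ~{mean_urgent_position(fifo):.0f} slots on average")
print(f"Triaged: urgent case waits ~{mean_urgent_position(triaged):.0f} slots on average")
```

Even with 40% of urgent cases missed, the flagged majority move to the front of the queue, so the average wait for an urgent case drops well below the first-in, first-out baseline.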
These are just a couple of illustrative examples, but we are likely to see similar issues as more AI systems are put into practice in labs. What’s more, these challenges are not unique to AI systems focused on pathology, so it follows that we’ll see them more and more as AI is adopted across other domains of healthcare, and beyond. It’s important to be wary of reports of AI system performance in isolation, and to ask what can be done to approximate, or better yet directly study, the AI’s impact on the metrics that really matter. What these AI systems aim to do is drive meaningful change, and that’s how we should evaluate them.