HSS Research Evaluates Whether AI Chatbots Provide Reliable Medical Information
Artificial intelligence (AI) chatbots are more accurate than expected when asked to answer medical questions about spine surgery, but patients still need to use extreme caution when turning to these tools for help with medical decision-making. That’s according to a study from HSS researchers being presented at the American Academy of Orthopaedic Surgeons annual meeting.
“In the past 20 years, the Internet has been probably the number-one place that people go for medical information,” says Sheeraz Qureshi, MD, MBA, co-chief of HSS Spine and co-author of the study. “I haven’t yet heard from any patients who have used ChatGPT or other chatbots in this way, but it’s definitely where we see things going. The same way that people use search engines to look for medical information now, we expect they will use chatbots in the future, including in their decision-making process.”
For the study, a team of investigators identified nine frequently asked questions about cervical spine surgery that they considered to be of particular clinical relevance. Question topics ranged from the benefits and drawbacks of different surgical approaches to side effects and recovery after surgery. The questions were entered one at a time into ChatGPT version 3.5.
Two experts in cervical spine surgery who were not involved in designing the questions rated the chatbot’s responses on accuracy, appropriateness, and readability. On average, the responses received a score of 8.1/10, with a 3.9/5 for accuracy and a 2.2/3 for appropriateness. The main drawback the reviewers noted was that ChatGPT failed to provide comprehensive responses, often omitting important factors. For example, it described a particular procedure as being more challenging without mentioning that the level of challenge depended on patient indications as well as the surgeon’s overall practice, training and comfort with the technique.
The experts noted that the responses were easier for people to understand than research literature, which can be complicated for non-experts. Flesch-Kincaid Grade Level analysis determined that ChatGPT's responses were written at the reading level of a high school junior, whereas the primary literature is aimed at scholars working in the field. The reviewers also appreciated that ChatGPT's responses were always prefaced with a statement advising readers to consult an expert for medical advice.
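The Flesch-Kincaid Grade Level mentioned above is a standard readability formula based on average sentence length and average syllables per word. The sketch below shows one way to estimate it in Python; the syllable counter is a crude vowel-group heuristic of my own (real readability tools use dictionaries or more careful rules), so treat the output as approximate.

```python
import re

def count_syllables(word: str) -> int:
    """Crude heuristic: count runs of vowels as syllables (min 1)."""
    vowel_groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(vowel_groups))

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)

# Plain prose scores lower (easier) than dense clinical language.
simple = "The cat sat on the mat."
complex_text = "Anterior cervical discectomy necessitates comprehensive preoperative evaluation."
print(flesch_kincaid_grade(simple) < flesch_kincaid_grade(complex_text))
```

A score of roughly 11 corresponds to the high-school-junior reading level the study reports for ChatGPT's answers.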
Dr. Qureshi explains that one serious concern with using a chatbot rather than a web search is that it is not clear where the information comes from. "Most people know not to blindly trust everything they find on the Internet," he says. "If a search takes you to the webpage for HSS or another well-established medical center, you can feel confident that the information has been vetted by experts."
The researchers plan to continue studying this topic to better understand how patients are using AI tools. Ultimately, their hope is to identify opportunities to ensure that when AI is responding to medical queries it is prioritizing the most reliable and credentialed information.