ChatGPT flunks American College of Gastroenterology Exams, Feinstein Institutes report
Researchers asked the artificial intelligence tool to take 2021 and 2022 multiple-choice self-assessment tests; results were published in the American Journal of Gastroenterology
While the popular artificial intelligence (AI) chatbot ChatGPT is seen as a potential educational tool, it won’t be earning its medical specialty certification anytime soon. To test its abilities and accuracy, investigators at The Feinstein Institutes for Medical Research asked the consumer-facing ChatGPT (Chat Generative Pre-trained Transformer, OpenAI) to take the 2021 and 2022 multiple-choice self-assessment tests of the American College of Gastroenterology (ACG). ChatGPT failed to make the grade: its two versions scored 65.1 percent and 62.4 percent, below the approximately 70 percent needed to pass the exams. Full details of the study were published today in the American Journal of Gastroenterology.
ChatGPT is a 175-billion-parameter natural language processing model that generates human-like text in response to user prompts. The tool is a large language model (LLM) trained to predict word sequences based on context. ChatGPT has been tested before, even passing the United States Medical Licensing Exam. In this study, the Feinstein Institutes’ researchers wanted to challenge ChatGPT’s (versions 3 and 4) ability to pass the ACG assessment, which is supposed to gauge how one would fare on the actual American Board of Internal Medicine (ABIM) Gastroenterology board examination.
“Recently, there has been a lot of attention on ChatGPT and the use of AI across various industries. When it comes to medical education, there is a lack of research around this potential ground-breaking tool,” said Arvind Trindade, MD, associate professor at the Feinstein Institutes’ Institute of Health System Science and senior author on the paper. “Based on our research, ChatGPT should not be used for medical education in gastroenterology at this time and has a ways to go before it should be implemented into the health care field.”
Each ACG test consists of 300 multiple-choice questions with real-time feedback. Each question and its answer choices were copied and pasted directly into ChatGPT versions 3 and 4. Overall, ChatGPT answered 455 questions (145 were excluded because they required an image). ChatGPT-3 answered 296 of the 455 questions correctly (65.1 percent) across the two exams, and ChatGPT-4 answered 284 correctly (62.4 percent).
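The reported percentages follow directly from the raw counts. A minimal sketch of that arithmetic, using only figures quoted in the article (this is illustrative, not code from the study):

```python
# Sanity-check the score arithmetic reported in the article.
total_questions = 300 * 2      # two ACG exams, 300 questions each
excluded = 145                 # questions excluded for requiring an image
answered = total_questions - excluded

gpt3_correct = 296
gpt4_correct = 284

gpt3_pct = round(100 * gpt3_correct / answered, 1)
gpt4_pct = round(100 * gpt4_correct / answered, 1)

print(answered, gpt3_pct, gpt4_pct)  # 455 65.1 62.4
```

Both scores fall below the roughly 70 percent threshold the article cites as a passing grade.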
“ChatGPT has sparked enthusiasm, but with that enthusiasm comes skepticism around the accuracy and validity of AI’s current role in health care and education,” said Andrew C. Yacht, MD, senior vice president of academic affairs and chief academic officer at Northwell Health. “Dr. Trindade’s fascinating study is a reminder that, at least for now, nothing beats hitting time-tested resources like books, journals and traditional studying to pass those all-important medical exams.”
ChatGPT does not have any intrinsic understanding of a topic or issue. Potential explanations for its failing grade include a lack of access to paywalled medical journals and its reliance on questionable, outdated, or non-medical sources; more research is needed before it can be used reliably.