John Snow Labs Achieves New State-of-the-Art Medical LLM Accuracy Benchmarks

The LLM Outperforms GPT-4, Med-PaLM2, and Hundreds of Others

By AiT Analyst On May 21, 2024

John Snow Labs’ Commitment to Delivering Novel, Responsible, Production-Ready Models is Reflected by Three New Milestones in Accuracy of Medical LLMs

AI for healthcare company John Snow Labs announced it has achieved new state-of-the-art (SOTA) medical LLM accuracy on the benchmarks used in the Open Medical LLM leaderboard, surpassing hundreds of other high-performing models. This combination of nine benchmarks challenges AI models to answer thousands of medical licensing exam questions (MedQA), biomedical research questions (PubMedQA), and college-level exams in anatomy, genetics, biology, and medicine (MMLU).

The following three milestones reflect John Snow Labs’ commitment to delivering the most accurate medical Large Language Models (LLMs) to date:

A Medical LLM which achieves 87.35 on the same reproducible test harness of the leaderboard, outperforming models such as Med-PaLM2, GPT-4, OpenBioLLMLlama, MedLlama, Orpo-Med, and all others.
A Medical LLM with just 7 billion parameters which outperforms all previous models of that size and is the first 7B model to outperform GPT-4 on PubMedQA (78.4 vs. 75.2). PubMedQA is a dataset of 273,500 questions that require reasoning over biomedical research texts, with single-human performance of 78% accuracy.
A Medical LLM with just 3 billion parameters which outperforms all current models of that size by more than 12 points, while still being able to run on a mobile device. This is significant for medical professionals who need to process millions to billions of patient notes without straining computing budgets. This accuracy comes close to what medical LLMs with 8 billion parameters like BioMistral achieved just three months prior.

Recent research shows that lack of accuracy was the most concerning roadblock to Generative AI adoption.

Despite this, a majority of GenAI projects have not yet been tested for LLM requirements. The same survey indicated a strong preference for small, task-specific language models, with 54% of respondents from large companies using healthcare-specific, task-specific language models. John Snow Labs addresses the need for top accuracy and targeted models optimized for healthcare use cases.

As achievable accuracy continues to improve, so do requirements for production-ready models. To meet efficiency, scalability, compliance, and responsible AI standards, models must be updated continuously. John Snow Labs’ Healthcare NLP & LLM subscription provides access to these models for production use, while also providing continuous updates and new releases, guaranteeing customers remain state-of-the-art over time.

“It’s a great responsibility and honor to provide novel, state-of-the-art, production-ready models to the global healthcare AI community,” said David Talby, CTO, John Snow Labs.

David added, “We didn’t give these new models fancy names because we’ll have better ones next week. That’s been the essence of our work for the past seven years, and it’s what makes John Snow Labs the most comprehensive medical language understanding solution on the market.”

John Snow Labs will continue releasing new software every two weeks. Coming soon are larger models, larger context windows, new medical text summarization models (currently beating GPT-4 2:1 on blind tests by clinicians), medical speech-to-text models (for both layman and clinician speak), and medical text translation models.

John Snow Labs, the AI for healthcare company, provides state-of-the-art software, models, and data to help healthcare and life science organizations put AI to good use. Developer of Spark NLP, Healthcare NLP, the Healthcare GPT LLM, the Generative AI Lab No-Code Platform, and the Medical Chatbot, John Snow Labs’ award-winning medical AI software powers the world’s leading pharmaceuticals, academic medical centers, and health technology companies. Creator and host of The NLP Summit, the company is committed to further educating and advancing the global AI community.

[To share your insights with us as part of editorial or sponsored content, please write to sghosh@martechseries.com]

To repurpose or use any of the content or material on this and our sister sites, explicit written permission needs to be sought.

Copyright © 2026 AiThority. All Rights Reserved. Privacy Policy
