Writer AI Large Language Models Achieve Top Scores on Stanford HELM
Benchmarks reinforce Palmyra as the enterprise-ready LLM, offering transparency and accuracy for enterprise generative AI use cases
Writer, the leading generative AI platform for enterprises, announced that Palmyra, its family of large language models (LLMs), has achieved top benchmark scores from Stanford’s Holistic Evaluation of Language Models (HELM), demonstrating its leadership in the generative AI field.
In key benchmark tests, Palmyra outperformed models by OpenAI, Cohere, Anthropic, Microsoft, and important open-source models such as Falcon 40B and LLaMA-30B.
HELM is a benchmarking initiative by Stanford University’s Center for Research on Foundation Models that evaluates prominent language models across a wide range of scenarios. Palmyra excelled in tests that evaluate a model’s ability to comprehend knowledge and answer natural language questions accurately.
- Palmyra ranked first in several important tests, scoring 60.9% on Massive Multitask Language Understanding (MMLU), 89.6% on BoolQ, and 79.0% on NaturalQuestions.
- Palmyra ranked second in two additional key tests with 49.7% on Question Answering in Context and 61.6% on TruthfulQA.
The HELM results validate Palmyra’s proficiency in knowledge comprehension, making inferences, and accurately answering open-ended, context-based questions that are worded naturally. These scores highlight Palmyra’s power and ability to complete advanced tasks, which makes it uniquely capable of tackling a wide range of enterprise use cases.
“We are thrilled to see Writer Palmyra at the top of these benchmarks,” said Waseem AlShikh, Writer co-founder and chief technology officer. “Our models have demonstrated their breadth of knowledge comprehension and ability to accurately answer questions in natural language – all with efficiently sized models that don’t exceed 43 billion parameters. These results offer further proof that the Writer generative AI platform is the enterprise-ready choice for organizations looking to accelerate growth, increase productivity, and maintain brand alignment.”
In a world where LLMs are increasingly undifferentiated, training data, duration, and methodology make a big difference. Unlike other model families, Palmyra is trained on high-quality formal writing and has a deep vertical focus, with industry-specific models for healthcare and financial services. The models are transparent and auditable rather than black boxes, built so data stays private, and can be self-hosted. Given that Palmyra LLMs don’t exceed 43 billion parameters, these latest rankings further demonstrate that smaller, more efficient, and more accessible models can still deliver superior results.