Patronus AI Created a Groundbreaking Automated Evaluation Platform

By AIT Staff Writer On May 30, 2024

What Is the News About?

Patronus AI, founded by Anand Kannappan and Rebecca Qian, two seasoned machine learning (ML) professionals formerly of Meta, has created a groundbreaking automated evaluation platform that it says detects hallucinations, copyright infringement, and safety issues in LLM outputs. Using proprietary AI, the system scores model performance, stress-tests models with adversarial cases, and runs granular benchmarks without the manual human review that most businesses rely on today.

There has been a mad dash in Silicon Valley to make use of the generative capabilities of recently emerged powerful LLMs like OpenAI’s GPT-4o and Meta’s Llama 3. However, high-profile model failures have also been on the rise, with news site CNET releasing AI-generated articles plagued with errors and drug development businesses retracting research papers based on LLM-hallucinated compounds.

According to Patronus AI, these public blunders are only symptoms of deeper problems with the current generation of LLMs. The firm’s earlier work, including the “CopyrightCatcher” API released three months ago and the “FinanceBench” benchmark released six months ago, exposes serious shortcomings in the ability of top models to answer fact-based questions correctly.

Why Is It Important?

To create its “FinanceBench” benchmark, Patronus asked models such as GPT-4 financial questions based on publicly available SEC filings. Even when given the full annual report, the top-performing model answered only 19% of the questions correctly. A separate investigation using Patronus’s “CopyrightCatcher” API found that open-source LLMs reproduced copyrighted text word for word in 44% of outputs.
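
To make a figure like "word for word in 44% of outputs" concrete, here is a minimal Python sketch of one way such a rate could be computed. It is not Patronus's method, which the article does not describe. The function names, the eight-word threshold, and the sample texts are all assumptions chosen for illustration.

```python
def longest_shared_run(output: str, source: str) -> int:
    """Length, in words, of the longest word sequence found in both texts."""
    a, b = output.split(), source.split()
    best = 0
    prev = [0] * (len(b) + 1)  # run lengths ending at the previous output word
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended one word earlier in both texts.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def verbatim_rate(outputs: list[str], source: str, min_words: int = 8) -> float:
    """Share of outputs that copy at least `min_words` consecutive words.

    The threshold is a hypothetical choice, not a published standard.
    """
    if not outputs:
        return 0.0
    flagged = sum(
        1 for text in outputs if longest_shared_run(text, source) >= min_words
    )
    return flagged / len(outputs)


# Illustrative data only: a stand-in "protected" passage and two model outputs.
protected = "it was the best of times it was the worst of times"
model_outputs = [
    "he wrote that it was the best of times it was the worst indeed",
    "a completely original sentence about markets",
]
rate = verbatim_rate(model_outputs, protected)
print(f"{rate:.0%} of outputs reproduce the passage verbatim")
```

A real checker would also need to normalize punctuation and casing and to search a large corpus of protected works, which this sketch leaves out.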

According to the company, Patronus AI is helping numerous Fortune 500 corporations in sectors such as education, software, automotive, and finance deploy LLMs “safely within their organizations.” However, it declined to name its customers. With the new funding, Patronus intends to expand its research, engineering, and sales teams and to develop new industry standards.
