[bsfp-cryptocurrency style=”widget-18″ align=”marquee” columns=”6″ coins=”selected” coins-count=”6″ coins-selected=”BTC,ETH,XRP,LTC,EOS,ADA,XLM,NEO,LTC,EOS,XEM,DASH,USDT,BNB,QTUM,XVG,ONT,ZEC,STEEM” currency=”USD” title=”Cryptocurrency Widget” show_title=”0″ icon=”” scheme=”light” bs-show-desktop=”1″ bs-show-tablet=”1″ bs-show-phone=”1″ custom-css-class=”” custom-id=”” css=”.vc_custom_1523079266073{margin-bottom: 0px !important;padding-top: 0px !important;padding-bottom: 0px !important;}”]

Aisera Introduces a Framework to Evaluate How Domain-Specific Agents Can Deliver Superior Value in the Enterprise

Accepted by ICLR Workshop on Trustworthy LLMs, Aisera’s new framework is a groundbreaking standard for measuring real-world effectiveness of AI agents.

Aisera, a leading provider of Agentic AI for enterprises, announced today that it has completed a research study that introduces a new benchmarking framework for evaluating the performance of AI agents in real-world enterprise applications. It also announced that the results of this benchmark study have been accepted at the ICLR 2025 Workshop on building trust in Large Language Models (LLMs) and LLM applications. Aisera plans to open-source this benchmark framework to empower the AI community in driving innovation and advancing enterprise AI agents.

Latest News: Sideko Launches API Ecosystem Platform

The International Conference on Learning Representations (ICLR) is the leading global industry body focused on developing and setting best-practices in cutting edge artificial intelligence standards. ICLR is globally renowned for presenting and publishing cutting-edge research on all aspects of deep learning used in the fields of artificial intelligence, statistics, and data science, as well as important application areas such as machine vision, computational biology, speech recognition, text understanding, gaming, and robotics.

Co-authored by Utkarsh Contractor, Field CTO at Aisera, Vasilis Vassalos, Ph.D., Senior Director of AI at Aisera, Michael Wornow, PhD student at Stanford University’s School of Computer Sciences and Vaishnav Garodia, Master’s student at Stanford University’s School of Computer Sciences, this study provides a holistic benchmarking framework to evaluate enterprise AI agents and goes on to perform a comparative evaluation of domain-specific AI Agents with AI Agents built directly on foundation LLMs. The performance of these AI Agents was evaluated using real-life data from industry-specific use cases across IT, CX and HR functions within disparate industries, including banking, financial services, healthcare, educational technology, and biotechnology. The study found that domain-specific AI agents outperformed AI agents built directly using frontier LLMs, demonstrating the advantages of domain specialization in enterprise applications.

Traditional evaluation methods have focused solely on accuracy and fail to capture the breadth of real-world requirements. Many existing academic and industry benchmarks rely on synthetic data from tasks that fail to reflect the complexity of real-world enterprise environments, their diverse nature, and the inherent risks. To ensure dependable and compliant agentic AI solutions, benchmarking frameworks must also capture operational factors such as cost efficiency, latency, stability (accuracy over repeated invocations), and security (for example, an AI agent not responding to malicious prompts).

Related Posts
1 of 41,919

Introducing The CLASSic Framework: To address these challenges, the authors of this study introduced the CLASSic framework – a holistic approach to evaluating enterprise AI agents across five key dimensions:

  • Cost: Measures operational expenses, including API usage, token consumption, and infrastructure overhead
  • Latency: Assesses end-to-end response times
  • Accuracy: Evaluates correctness in selecting and executing workflows
  • Stability: Checks consistency and robustness across diverse inputs, domains, and varying conditions
  • Security: Assesses resilience against adversarial inputs, prompt injections, and potential data leaks

Domain-specific models show a clear advantage: The evaluation shows that specialized domain-specific AI agents outperform in tasks within complex enterprise settings while ensuring high accuracy, more reliability, lower costs, and stronger security. Although AI Agents built directly on general-purpose foundational models may achieve competitive accuracy across domains, they lag in cost, latency, and security, highlighting opportunities for improvement through domain-specific application architectures, including domain fine-tuning and distillation of these LLMs.

Also Read: Precisely Expands Automate SAP Data API to Simplify Integration and Scale Enterprise Process Automation

“The CLASSic framework serves as a pragmatic guide for enterprise AI adoption, as it directly delivers measurable results and insights that are valuable and actionable for today’s enterprises,” said Utkarsh Contractor, Field CTO at Aisera and a co-author of this report. “Enterprises should adopt AI agents that are not just highly accurate, but at the same time cost-effective, stable, and secure for greater long-term value. In the coming months, we will be sharing our code and datasets publicly for wider adoption of this new framework.”

“As AI agents grow more sophisticated, evaluating them on multiple dimensions is essential for unlocking their full value for enterprises,” said Michael Wornow, PhD student at Stanford University. “This is what the CLASSic framework aims to achieve.”

[To share your insights with us as part of editorial or sponsored content, please write to psen@itechseries.com]

Comments are closed.