
Arize AI Launches Industry-First LLM Observability Tool

Arize AI, a market leader in machine learning observability, debuted new capabilities for fine-tuning and monitoring large language models (LLMs). The offering brings greater control and insight to teams looking to build with LLMs.

As the industry re-tools and data scientists begin to apply foundational models to new use cases, there is a distinct need for new LLMOps tools to reliably evaluate, monitor, and troubleshoot these models. According to a recent survey, 43% of machine learning teams cite “accuracy of responses and hallucinations” as among the biggest barriers to production deployment of LLMs.


Now available as part of the free product, Arize’s LLM observability tool is the first to evaluate LLM responses, pinpoint where to improve with prompt engineering, and identify fine-tuning opportunities using vector similarity search. The new offering is built to work in tandem with Phoenix, an open source library for LLM evaluation that launched onstage at Arize:Observe.

Leveraging Arize, teams can:

  • Detect Problematic Prompts and Responses: By monitoring the performance of a model’s prompt/response embeddings with LLM evaluation scores and cluster analysis, teams can home in on the areas where their LLM needs improvement.
  • Analyze Clusters Using LLM Evaluation Metrics and GPT-4: Automatically generate clusters of semantically similar data points and sort them by performance. Arize supports LLM-assisted evaluation metrics and task-specific metrics, along with user feedback. An integration with ChatGPT also enables teams to analyze clusters for deeper insights.
  • Improve LLM Responses with Prompt Engineering: Pinpoint prompt/response clusters with low evaluation scores. Workflows suggest ways to augment prompts so that your LLM generates better responses and acceptance rates improve.
  • Fine-Tune Your LLM Using Vector Similarity Search: Find problematic clusters, such as inaccurate or unhelpful responses, to fine-tune with better data. Vector similarity search surfaces other examples of an emerging issue, so you can begin data augmentation before the issue becomes systemic.
  • Leverage Pre-Built Clusters for Prescriptive Analysis: Use pre-built global clusters identified by Arize algorithms, or define custom clusters of your own, to simplify root cause analysis (RCA) and make prescriptive improvements to your generative models.
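
For readers who want a concrete picture of the workflow in the list above, here is a minimal sketch in plain Python. It is not Arize's or Phoenix's API: the record layout, the function names (`rank_clusters_by_score`, `most_similar`), and the toy embeddings and scores are all assumptions made for illustration. It assumes each prompt/response pair already has an embedding vector, a cluster label, and an evaluation score, then ranks clusters from worst to best and uses cosine similarity to find other examples that resemble a problematic one.

```python
import math
from collections import defaultdict


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_clusters_by_score(records):
    """Return (cluster, mean evaluation score) pairs, lowest-scoring cluster first."""
    scores = defaultdict(list)
    for record in records:
        scores[record["cluster"]].append(record["score"])
    means = [(cluster, sum(vals) / len(vals)) for cluster, vals in scores.items()]
    return sorted(means, key=lambda pair: pair[1])


def most_similar(query, records, k=3):
    """Return the k records whose embeddings are closest to the query record."""
    candidates = [r for r in records if r is not query]
    candidates.sort(
        key=lambda r: cosine_similarity(query["embedding"], r["embedding"]),
        reverse=True,
    )
    return candidates[:k]


# Hypothetical toy data: 2-D embeddings, precomputed cluster labels,
# and evaluation scores in [0, 1] (higher is better).
records = [
    {"id": "a", "embedding": [1.0, 0.0], "cluster": 0, "score": 0.9},
    {"id": "b", "embedding": [0.9, 0.1], "cluster": 0, "score": 0.8},
    {"id": "c", "embedding": [0.0, 1.0], "cluster": 1, "score": 0.2},
    {"id": "d", "embedding": [0.1, 0.9], "cluster": 1, "score": 0.3},
    {"id": "e", "embedding": [0.2, 0.8], "cluster": 2, "score": 0.7},
]

worst_first = rank_clusters_by_score(records)
print("Clusters, worst first:", [cluster for cluster, _ in worst_first])

# Starting from one low-scoring example, look for similar examples to review.
problem = records[2]
print("Similar to", problem["id"], ":", [r["id"] for r in most_similar(problem, records, k=2)])
```

In practice the embeddings would come from a model, the cluster labels from a clustering algorithm, and the similarity search from a vector index; the brute-force loop here is only meant to show the idea on a handful of records.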


“Despite the power of these models, the risk of deploying LLMs in high-risk environments can be immense,” notes Jason Lopatecki, CEO and Co-Founder of Arize. “As new applications get built, Arize LLM observability is here to provide the right guardrails to innovate with this new technology safely.”


