TruLens for LLM Applications Launches: Evaluate and Track Large Language Model Application Experiments
TruEra, which provides software to test, debug, and monitor ML models across the full MLOps lifecycle, launched TruLens for LLM Applications, the first open source testing software for apps built on Large Language Models (LLMs) like GPT.
LLMs are emerging as a key technology that will power a multitude of apps in the near future – but there are also growing concerns about their use, with prominent news stories about LLM hallucinations, inaccuracies, toxicity, bias, safety, and potential for misuse.
TruLens addresses two major pain points in LLM app development today:
Experiment iteration and champion selection are too slow and painful.
The workflow for building LLM applications involves significant experimentation. After developing the first version of an app, developers manually test and review answers; adjust prompts, hyperparameters, and models; and re-test, over and over again, until a satisfactory result is achieved. It is an often challenging process, where the final winner is not necessarily clear.
Existing testing methods are inadequate, resource intensive, and time consuming.
One of the main reasons that experiment iteration is challenging is that existing tools for testing LLM apps are ineffective. Direct human feedback is the most common testing method in use today. While getting direct human feedback is a useful first step, it can be slow and patchy, and difficult to scale. TruLens leverages a new approach it calls feedback functions – a programmatic way of evaluating LLM applications at scale – to enable teams to test, iterate on, and improve their LLM-powered apps quickly.
“TruLens feedback functions score the output of an LLM application by analyzing generated text from an LLM-powered app and metadata,” explained Anupam Datta, Co-founder, President and Chief Scientist at TruEra. “By modeling this relationship, we can then programmatically apply it to scale up model evaluation.”
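To make the idea concrete, the sketch below illustrates the general shape of a feedback function as described above: a program that scores an LLM app's generated text and metadata, applied over many records at once. The names (`Record`, `response_relevance`, `evaluate`) are hypothetical and simplified for illustration; they are not the actual TruLens API.

```python
# Illustrative sketch only: shows the concept of a "feedback function" as a
# program that scores an LLM app's output and metadata, applied at scale.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Record:
    prompt: str                      # the user input sent to the LLM app
    response: str                    # the generated text returned by the app
    metadata: Dict = field(default_factory=dict)  # e.g. retrieved context, latency

# A feedback function maps one record to a score, here in [0, 1].
FeedbackFunction = Callable[[Record], float]


def response_relevance(record: Record) -> float:
    """Toy relevance check: fraction of prompt keywords echoed in the response."""
    prompt_terms = set(record.prompt.lower().split())
    response_terms = set(record.response.lower().split())
    if not prompt_terms:
        return 0.0
    return len(prompt_terms & response_terms) / len(prompt_terms)


def evaluate(records: List[Record], feedbacks: List[FeedbackFunction]) -> List[Dict]:
    """Apply every feedback function to every record, programmatically."""
    return [{fb.__name__: fb(r) for fb in feedbacks} for r in records]
```

In practice, a provider such as TruLens would back functions like these with stronger signals (for example, model-graded relevance or toxicity classifiers) rather than the keyword overlap used here for brevity.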
TruLens for LLMs can help AI developers:
- Improve the efficacy of LLM usage for their application
- Reduce the “toxicity” or potential social harm of LLM results
- Evaluate information retrieval performance
- Flag biased language in application responses
- Understand the dollar cost of their application’s LLM API usage
TruLens provides feedback functions that can evaluate:
- Truthfulness
- Question answering relevance
- Harmful or toxic language
- User sentiment
- Language mismatch
- Response verbosity
- Fairness and bias
- Custom criteria, via user-created feedback functions
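As a hedged example of the last item in the list, a user-created feedback function can encode an application-specific criterion such as response verbosity. The function below is hypothetical and not part of the TruLens API; it simply shows how such a check might be written.

```python
# Hypothetical custom feedback function (not the TruLens API): a simple
# verbosity check that returns 1.0 for concise answers and decays toward 0.0
# as the response exceeds a word budget.
def verbosity_feedback(response: str, max_words: int = 150) -> float:
    """Score response length: 1.0 if within budget, linearly lower if over."""
    n_words = len(response.split())
    if n_words <= max_words:
        return 1.0
    # Linear penalty, floored at 0.0, for responses over the budget.
    return max(0.0, 1.0 - (n_words - max_words) / max_words)


# Example usage with a hypothetical generated answer:
print(verbosity_feedback("Paris is the capital of France.", max_words=150))  # 1.0
```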
“LLM-based applications are taking off and will only become more prevalent,” said Datta. “TruLens can help developers build high performing applications and get them to market faster. TruLens does this by validating the effectiveness of the LLM for their application’s use case and mitigating the possible harmful effects that LLMs can have. It fills a hole in the emerging LLMOps tech stack.”