Iterative’s New DataChain Enables Use of AI Models to Evaluate the Quality of Unstructured Data
Company accelerates AI development by offering new open-source tool for data curation and model evaluation at scale
Iterative, the company dedicated to streamlining the workflow of artificial intelligence (AI) engineers and the creator of widely used open-source MLOps projects, today announced the upcoming release of DataChain, a new open-source tool for processing and evaluating unstructured data.
According to McKinsey’s Global Survey on the state of AI published in early 2024, only 15 percent of surveyed companies have realized a meaningful effect of generative AI (GenAI) on their business to date. A large part of the problem lies in the challenge of processing unstructured data at scale and evaluating the results, which has traditionally been cumbersome. That difficulty stems from the missing link between structured data technologies and the newer AI workflows based in Python. While older analytical databases provided full control over data quality, unstructured multimodal data such as text and images has proved much harder to assess and improve at scale.
“The biggest challenge in adopting artificial intelligence in the enterprise today is the lack of practices and tools for data curation and generative AI evaluation that can ensure the quality of results,” said Dmitry Petrov, CEO of Iterative. “As the next step, we need AI models that can evaluate and improve AI models. So far this has only happened at the industry forefront – take a look at DeepMind’s AlphaGo training against itself, or OpenAI’s DALL-E 3 curating its own dataset. Our goal is to change this.”
The proliferation of sophisticated AI foundation models opens the door to intelligent curation and data processing. However, the technology barrier remains high because there are no simple ways to wrangle unstructured data with AI models and keep the results in easy-to-manage formats. In practice, most AI engineers still write custom code to convert JSON model responses into usable objects, adapt them to databases, and run models in parallel over datasets that do not fit in memory.
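To make that custom code concrete, the following is a minimal sketch of the kind of glue an engineer might write by hand today. It is not DataChain's API. The `Judgement` schema, the `fake_judge` stand-in for a model call, and the table layout are all hypothetical and use only the Python standard library.

```python
import json
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Judgement:
    """Hypothetical schema for one model verdict on one text sample."""
    sample_id: int
    score: float
    reason: str


def fake_judge(sample_id: int, text: str) -> str:
    """Stand-in for a real model call; returns JSON as a model API might."""
    words = len(text.split())
    score = min(words / 10.0, 1.0)  # toy heuristic, not a real quality metric
    return json.dumps({"score": score, "reason": f"{words} words"})


def parse_response(sample_id: int, raw: str) -> Judgement:
    """Convert the raw JSON response into a typed Python object."""
    payload = json.loads(raw)
    return Judgement(sample_id, float(payload["score"]), str(payload["reason"]))


def evaluate(samples: dict[int, str], workers: int = 4) -> list[Judgement]:
    """Run the judge over all samples in parallel and collect typed results."""
    def work(item: tuple[int, str]) -> Judgement:
        sample_id, text = item
        return parse_response(sample_id, fake_judge(sample_id, text))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(work, samples.items()))


def store(conn: sqlite3.Connection, rows: list[Judgement]) -> None:
    """Adapt the typed objects to a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS judgement "
        "(sample_id INTEGER PRIMARY KEY, score REAL, reason TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO judgement VALUES (?, ?, ?)",
        [(r.sample_id, r.score, r.reason) for r in rows],
    )
    conn.commit()


def low_quality_ids(conn: sqlite3.Connection, threshold: float) -> list[int]:
    """Return ids of samples whose score falls below the threshold."""
    cursor = conn.execute(
        "SELECT sample_id FROM judgement WHERE score < ? ORDER BY sample_id",
        (threshold,),
    )
    return [row[0] for row in cursor.fetchall()]
```

Even in this toy form, parsing, parallelism, and storage each need their own code, and a production version would also have to handle retries, batching, and data larger than memory.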
DataChain democratizes popular AI-based analytical capabilities such as ‘large language models (LLMs) judging LLMs’ and multimodal GenAI evaluations, leveling the playing field for data curation and pre-processing. DataChain can also store and structure Python object responses using the latest data model schemas, such as those used by leading LLM and AI foundation model providers.
Founded in 2018, Iterative creates developer tools for AI engineers. The company has recorded more than 20 million downloads of its open-source software DVC and earned more than 18,000 stars on GitHub. Iterative now has more than 400 contributors across its tools and more than 20 customers for its enterprise SaaS, including Fortune 500 companies like UBS. Iterative is backed by True Ventures, Afore Capital, and 468 Capital.