The Future of Language Models: Customization, Responsibility, and Multimodality
By Dr. Marlene Wolfgruber, Product Marketing Lead for AI at ABBYY
Throughout history, technology has reflected the circumstances, aspirations, and shortcomings of the people developing it. Artificial intelligence (AI) is no different, having recently drawn more attention to this intertwined relationship through the exploration of AI biases and language processing capabilities.
Language models epitomize the bridge connecting humans and technology by allowing people to communicate with AI in plain language, just as they would with another human. While in-depth interactions with technology used to only be accessible to developers, engineers, programmers and other purveyors of the arcane technological arts, language models and generative AI have brought this power to the masses like the Greek myth of Prometheus putting fire into the hands of man.
This has transformed countless facets of human endeavor across classrooms, news desks, boardrooms and beyond. Among the most affected are business leaders, who have been swept up in a whirlwind race towards AI-generated value. The constant emergence of new AI tools and use cases demands vigilant attention to the present as well as time to consider the trajectory and future implications of AI innovation and implementation.
This consideration is crucial to businesses’ competitive success. While there isn’t a crystal ball to perfectly predict how language models will evolve, recent trends indicate emerging capabilities innovation leaders should keep on their radar.
Customization is key
OpenAI released customizable GPTs in November 2023, approximately one year after the game-changing release of ChatGPT. This ability to tailor generative AI assistants to a specific purpose and narrow down their notoriously vast knowledge base came as an answer to some of the earliest concerns about gen AI and large language models (LLMs) – namely, their tendency to hallucinate and their voracious resource consumption.
The most general language models – such as the consumer release of ChatGPT – are trained on data scraped from across the entire internet, forcing them to navigate a dizzying repository of data to generate their outputs. This creates a proportionately staggering number of opportunities to yield an errant response, and it raises privacy and security concerns given the publicly accessible and pollution-prone data ecosystem. Navigating such vast data is also costly from a resource perspective, with each ChatGPT query estimated to use more than ten times the energy of a Google search.
Thus, the language model market began its natural maturation towards specialization and customization in an effort to shed inaccuracies and inefficiencies while preserving the versatility and value of generative AI. Custom GPTs are just one manifestation of this trend, with specialized language models gaining traction in increasingly specific areas.
For example, language models for automating document-centric workflows epitomize this trend: pre-trained AI assets that excel at processing, extracting, and understanding data from specified document types. Through these assets, businesses can adopt a purposeful and easily customizable approach to automating the repetitive and resource-intensive workflows that are central to their operations, such as handling invoices, waybills, and tax forms.
Specialized language models are also easier to train than their larger counterparts, offering a key strategic benefit: drastically accelerated time-to-value. By incorporating emerging strategies such as agentic AI and retrieval augmented generation (RAG), language models can also be confined to a specified knowledge base from which to generate outputs, further improving contextual relevance and accuracy.
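For readers who want a concrete picture of how RAG confines a model to a prescribed knowledge base, the following is a minimal sketch. The document snippets, the TF-IDF retrieval step, and the call_llm() stub are illustrative assumptions standing in for production embeddings and a real language-model API; they are not any vendor's implementation.

```python
# A minimal retrieval-augmented generation (RAG) sketch.
# The knowledge base, retrieval method, and call_llm() stub are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# 1. A narrow, task-specific knowledge base (e.g., invoice-handling policies).
knowledge_base = [
    "Invoices over 10,000 EUR require a second approver.",
    "Waybills must include a carrier reference number.",
    "Tax forms are retained for seven years.",
]

vectorizer = TfidfVectorizer()
kb_vectors = vectorizer.fit_transform(knowledge_base)

def retrieve(query: str, top_k: int = 2) -> list[str]:
    """Return the knowledge-base passages most similar to the query."""
    scores = cosine_similarity(vectorizer.transform([query]), kb_vectors)[0]
    ranked = scores.argsort()[::-1][:top_k]
    return [knowledge_base[i] for i in ranked]

def call_llm(prompt: str) -> str:
    """Stand-in for any language-model API call."""
    return f"[model response to: {prompt[:60]}...]"

def answer(query: str) -> str:
    # 2. Ground the prompt in retrieved context rather than the open internet.
    context = "\n".join(retrieve(query))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)

print(answer("Who must approve a 12,000 EUR invoice?"))
```

Because the prompt is grounded in a small set of retrieved passages, the model has far less room to invent answers, which is the accuracy benefit the paragraph above describes.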
Combining capabilities for multimodality
While text-based queries and outputs are the bread and butter of language models, they’re being increasingly enhanced with image processing capabilities that dramatically extend their utility.
Multimodal transformer encoders can extract “rich” properties like images from documents and leverage their content for more accurate contextualization and classification. When used in tandem with advanced, AI-powered optical character recognition (OCR) and natural language processing (NLP), multimodal capabilities accelerate and simplify the optimization and automation of document workflows.
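As a rough illustration of how textual and visual signals can be fused, the toy sketch below combines TF-IDF features from OCR output with two hand-crafted visual descriptors to classify document types. The feature choices, labels, and sample documents are illustrative assumptions; production systems would rely on learned multimodal encoder embeddings rather than these stand-ins.

```python
# Toy multimodal document classification: OCR text features fused with
# simple visual descriptors. All features and labels are illustrative stand-ins.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# OCR-extracted text and crude visual descriptors for three sample documents;
# columns of visual_feats are hypothetical [logo_area, table_density] scores.
ocr_texts = ["invoice total due 500", "bill of lading carrier ref", "invoice vat amount"]
visual_feats = np.array([[0.8, 0.1], [0.2, 0.7], [0.9, 0.2]])
labels = ["invoice", "waybill", "invoice"]

text_feats = TfidfVectorizer().fit_transform(ocr_texts).toarray()
fused = np.hstack([text_feats, visual_feats])  # simple early fusion of both modalities

clf = LogisticRegression(max_iter=1000).fit(fused, labels)
print(clf.predict(fused[:1]))  # classify the first document using text + visual cues
```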
Algorithmic clustering of documents into groups based on similar characteristics can also benefit from multimodal functionality, as the model will have additional references for comparison beyond just text content.
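The same fusion idea carries over to unsupervised grouping. The sketch below clusters documents on concatenated, per-modality normalized embeddings; the embedding dimensions and random vectors are placeholders for real text- and image-encoder outputs.

```python
# Minimal sketch of clustering documents on fused text + image features.
# The random embeddings are placeholders for real encoder outputs.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

rng = np.random.default_rng(0)
text_embeddings = rng.normal(size=(12, 64))    # stand-in for text-encoder output
image_embeddings = rng.normal(size=(12, 32))   # stand-in for image-encoder output

# Normalize each modality so neither dominates, then concatenate.
fused = np.hstack([normalize(text_embeddings), normalize(image_embeddings)])

clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(fused)
print(clusters)  # cluster assignment per document
```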
As businesses continue to hone their expertise in enabling straight-through processing in workflows with AI, learning how to leverage language models with non-textual data will become increasingly essential.
Prioritizing responsibility and explainability
The evolving global regulatory landscape has a pronounced influence on the development and practical application of language models in business, setting standards for explainability and responsibility that necessitate best practices in data handling.
The inherent bias of AI demands heightened consideration of language model use cases in sensitive areas like healthcare, legal, and finance. We’ve already seen instances of misuse, such as the well-known example of a nonexistent, AI-hallucinated legal case being cited in a courtroom. While this early example was promptly exposed, not all misuses of language models are as easily identified as this cautionary tale. The newfound accessibility of language models creates possibilities for misinformation to spread at an alarming scale and speed, disguised under varying degrees of credibility.
In response, regulation is on the rise. The EU AI Act is planned to become fully enforceable in August of 2026, prompting many European organizations to scramble for the resources and expertise to become compliant. While it’s not yet clear if or how the rest of the world will follow the EU’s example, it is incumbent on global innovation leaders to uphold the highest standards of responsibility to ensure a sustainable and safe future for language models in business.
As with other AI concerns, businesses will prefer to rise to this challenge rather than admit defeat and miss out on the potential value offered by language models. One such strategy is embracing independent AI auditors, who can seek certification through non-profit organizations that specialize in empowering businesses to self-regulate responsible AI practices. Leveraging language models built for a particular purpose or to fulfill a specified task will also reduce the risk of hallucination, thanks to narrower, more focused use of data.
The Promethean democratization of language models and AI at large continues to impact business landscapes in unpredictable ways. Amidst the constant change, noise, and hype, innovation leaders should bear just a few questions in mind when evaluating new technologies: What is the specific challenge that must be solved? Can it be solved without AI? Will it generate value?
Using these principles as a guiding star will help decision makers be confident and discerning in their investments and keep them competitive in the continuous race for purpose-built, AI-powered operational excellence.