How to Stay Ahead of Today’s Ever-Evolving Language with AI
How often do you use thou in a sentence? Unless you’re watching, reading, or writing a period piece, the answer is probably not often. Meanwhile, at this very moment, a grandparent is likely trying to understand why a younger family member responded to a joke with a skull emoji. Words fall out of favor and new expressions baffle us every day, because our language is constantly evolving.
Where texting inspired a proliferation of shorthand and emoji usage, the rapid growth of online communities has accelerated both the speed of communication and the evolution of language. New words, emojis, and acronyms emerge to convey thoughts aptly and efficiently. Vaccinated has become vaxxed (making “vax” Oxford’s word of the year for 2021), and acronyms like TBH, WFH, PPE, and FTW have made their way into both common discourse and top dictionaries.
Much like a confused grandparent seeking to understand the meaning of a meme or emoji sent by a younger relative, systems that help marketers contextualize digital environments have struggled to keep up with evolving language. Is that vegetable or fruit emoji being used safely or suggestively? Is that skull an expression of amusement or death?
Older, rigid systems rely on keyword lists that apply blanket labels to content. However, as meaning evolves and users learn how to evade rudimentary censorship tools, marketers and community managers are left playing a game of catch-up. Let’s explore why that is, why that matters, and how it’s being solved.
Why Current Systems Lag Behind Language
Traditional content analysis systems rely on fixed rules to define the value and meaning of a given piece of content. Automatic translation tools lean on one-to-one mapping of terms from one language to another. Moderation tools rely on keyword lists to flag known unsafe terms and, to some degree, their derivatives. The list goes on. Every so often (and not nearly often enough in the context of brand safety), these definitions and lists are audited.
However, language evolves daily.
Popular culture can wildly alter the meaning of a given expression in reaction to a world event or viral story. Take, for instance, the recent surge in usage of sunflowers or blue and yellow heart emojis. Seen in succession, these emojis convey a far different meaning today than they would have a month or two ago.
While some systems have sought to define the meaning of these emojis, they are unable to keep up with ever-evolving meaning, and these emoji definition libraries cover only a fraction of the visual expressions used in everyday online discourse. Live chat environments like Twitch and Discord are home to user-created emotes that number in the hundreds of thousands. Each has its own meaning, specific to its own audience, and that meaning may change over time.
Where dictionaries may add only a handful of new words every year, people add a myriad of new visual expressions and assign new meanings to even more expressions — both word-based and visual — every day.
Why Language Shifts Matter
As new expressions emerge and meanings shift, systems reliant on static definitions and mappings are increasingly prone to misclassification. A traditionally negative term that has a newfound or contextually positive meaning could be mis-flagged. A seemingly innocuous emoji could slip through content moderation, allowing hate or sexual content to proliferate.
Static systems simply fail amid dynamic digital discourse.
In a brand safety study we conducted in 2021, the keyword lists used by Fortune 500 brands to block unsafe ad placements failed when tested against user-generated content. Their F1 score, which combines precision and recall to measure how well they identified unsafe content with minimal false positives and false negatives, was only 40%.
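In other words, a keyword-based solution built upon static definitions made far more mistakes than correct calls. Read as a standard F1 score on unsafe content, 40% works out to roughly three errors (safe content blocked or unsafe content missed) for every unsafe item correctly flagged.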
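To make the F1 figure concrete, here is a minimal sketch of how the score is computed from counts of true positives, false positives, and false negatives. The counts below are hypothetical, chosen only so that the result lands on 40%. They are not data from the study.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical counts for illustration only (not taken from the study):
# tp = unsafe items correctly flagged
# fp = safe items wrongly flagged
# fn = unsafe items the keyword list missed
tp, fp, fn = 40, 70, 50

precision = tp / (tp + fp)  # share of flagged items that were truly unsafe
recall = tp / (tp + fn)     # share of unsafe items that were caught
score = f1_score(tp, fp, fn)

print(f"precision={precision:.2f} recall={recall:.2f} f1={score:.2f}")
```

With these illustrative counts, the 120 combined errors are three times the 40 correct flags, which is the ratio any F1 of 40% implies.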
How AI Learns Alongside Language
Fortunately, AI can empower systems to cut the tethers of static definitions. By building models that examine the relative proximity of expressions, we can identify both when new terms appear in a given neighborhood of expressions, and when terms shift.
This is particularly helpful when assessing otherwise unknown expressions, which happens frequently within user-generated content.
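As a rough illustration of reading an unknown expression through its neighbors, the sketch below labels a new term by the known terms closest to it in a vector space. Everything here is an assumption made for the example: the three-dimensional vectors, the tiny vocabulary, the labels, and the `infer_label` helper are hypothetical. In practice the vectors would come from a model trained on real usage.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical vocabulary: term -> (toy embedding, label).
KNOWN = {
    "lol": ([0.9, 0.1, 0.0], "amusement"),
    "haha": ([0.8, 0.2, 0.1], "amusement"),
    "rip": ([0.1, 0.9, 0.1], "grief"),
    "condolences": ([0.0, 0.8, 0.2], "grief"),
}


def infer_label(vector, known=KNOWN, k=2):
    """Label an unseen expression by a similarity-weighted vote of its k nearest known terms."""
    ranked = sorted(
        known.values(),
        key=lambda entry: cosine(vector, entry[0]),
        reverse=True,
    )
    votes = {}
    for neighbor_vector, label in ranked[:k]:
        votes[label] = votes.get(label, 0.0) + cosine(vector, neighbor_vector)
    return max(votes, key=votes.get)


# Toy vector for a skull emoji as it appears in reply threads under jokes.
skull_in_joke_threads = [0.85, 0.15, 0.05]
label = infer_label(skull_in_joke_threads)
print(label)
```

Because the toy skull vector sits closest to "lol" and "haha", it inherits their label without anyone having to add the emoji to a definition list.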
For instance, when exploring options for accurately identifying and contextualizing an ever-expanding library of emotes, emojis, and emoticons, we developed a unique, drift-resistant framework for the task. Dubbed the LOOVE framework, which stands for “learn out-of-vocabulary emoticons”, it leverages AI to analyze expressions based on their relationship to other known entities.
The result: an AI-powered framework that can continually contextualize an ever-expanding library of unique expressions while outperforming the previous accuracy benchmark by over 7 percentage points.
Much like a dictionary relies on synonyms, proximity-based AI models can infer the meaning of a given string based on its relationships to other terms. Unlike a dictionary, they update automatically and in real time. Like a real-time traffic map, AI empowers understanding of how terms are moving, where they are clustering, and what that means for the content containing them.
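One simple way to picture how a term "moves" is to compare its nearest neighbors across two snapshots of usage. The sketch below does that with a neighbor-overlap score. The two-dimensional vectors, the term list, and the `neighbor_overlap` helper are all hypothetical, and this is one possible drift signal rather than the method behind any particular framework.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest(term, space, k=2):
    """Return the k terms closest to `term` within one snapshot."""
    scored = [
        (cosine(space[term], vector), name)
        for name, vector in space.items()
        if name != term
    ]
    return {name for _, name in sorted(scored, reverse=True)[:k]}


def neighbor_overlap(term, before, after, k=2):
    """Jaccard overlap of a term's neighbors across snapshots (1.0 = stable, 0.0 = fully shifted)."""
    old, new = nearest(term, before, k), nearest(term, after, k)
    return len(old & new) / len(old | new)


# Hypothetical snapshots: only the sunflower emoji changes position.
before = {
    "sunflower": [0.9, 0.1],
    "garden": [0.95, 0.05],
    "summer": [0.8, 0.2],
    "solidarity": [0.1, 0.9],
    "support": [0.05, 0.95],
}
after = dict(before, sunflower=[0.2, 0.8])

sunflower_overlap = neighbor_overlap("sunflower", before, after)
garden_overlap = neighbor_overlap("garden", before, after)
print(sunflower_overlap, garden_overlap)
```

A low overlap for a term is a prompt to re-examine how content containing it is being classified, while a high overlap suggests its meaning has held steady.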
For anyone seeking to understand audiences and the content they generate at scale, it is time to take a hard look at the foundational aspects of how you use AI to extract meaning. Are your methodologies rooted in static definitions that may already be outdated, or do you have a drift-resilient framework that can make sense of the next emote to trend or word to evolve?