Mistakes NLG Can Make and How AI Platforms Can Avoid Them
To err is human, but machines make mistakes, too. In this case, the machines in question are natural language generation (NLG) platforms. Sometimes these mistakes, in the form of bias, are subtle. At other times, they go badly wrong.
How does NLG work?
Historically, language experts created rules and models that drove natural language generation. This time-intensive, manual process was the de facto standard until relatively recently. Access to vast amounts of data, coupled with advances in machine learning and computing power, has driven a dramatic shift toward commercial, statistics-based natural language generation.
This approach, popularized by GPT-3, is based on distributional semantics, or word association. The prior terms in the document statistically determine the next word in a sentence: what is most likely to occur next depends on what has come before. It is, in effect, an educated guess. This type of NLG model does not “understand” language, even though it may sometimes seem that way.
How do these models do this?
A statistical generative language model is created by analyzing a vast number of documents. The machine learning process observes which words occur together, how close they are, and how frequently they are used. In principle, the more observations made, the greater the certainty in predicting which word comes next.
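The prediction mechanism described above can be illustrated with a deliberately tiny bigram model. This is a minimal sketch of the general idea, not the architecture GPT-3 actually uses (which relies on neural networks over far longer contexts); the toy corpus is purely illustrative.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for the "vast number of documents".
corpus = [
    "the model predicts the next word",
    "the model learns word associations",
]

# Count how often each word follows each other word (bigram counts).
follows = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1

def predict_next(word):
    """Return the statistically most likely next word, or None if unseen."""
    counts = follows.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # "model" follows "the" most often in this corpus
```

Note that the prediction is purely frequency-driven: the model has no notion of what "model" or "the" mean, which is exactly why biased training text yields biased output.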
The issue is with the data used to train the models. “As for all biased results in data science predictions, it depends on the dataset we are training the models on,” says Rosaria Silipo, Ph.D., principal data scientist at KNIME. “If the dataset is a collection of biased texts, the generated texts will reflect that bias.” Train the natural language generation model with enough documents where “Muslim” occurs in close context with “terrorist” or “white people” with “KKK,” and it may take that to be the norm.
Where and how does bias occur in the process?
So far, the convention has been to train these NLG models on ever-larger sets of text data. Many models use the Common Crawl corpus, consisting of over 2.5 billion unfiltered pages, as their training data. For some, that's just a start: they essentially scrape everything they can from the entire internet, including the good, the bad, and the ugly.
It’s not just blog posts and news articles that make up these datasets: social media, forum posts, and the like account for a large portion. “We need to be really vigilant in knowing what kinds of content the training sets contain,” says Briana Brownell, founder and CEO of Pure Strategy Inc.
While these massive sets of training data may improve a model’s predictive capability, there are downsides. The sheer size of the data makes it extremely difficult to check for toxic language, of which there is plenty. The internet is more misogynistic, racist, and sexist than most people realize. Training an NLG model to treat this as typical is not a good idea by any stretch of the imagination.
What are the types of bias and misrepresentation?
Scrape enough data from the internet, and soon your vocabulary will expand to include numerous words unfit for public consumption. But sometimes even innocuous words strung together can form biased representations of gender, profession, race, and religion. Take, for example, “blonde bombshell,” a gender stereotype built from two commonplace words that together carry negative connotations.
But when considering bias, “there is an important but subtle distinction between the action taken by a system versus the analysis,” explains Rayid Ghani, Professor in the Machine Learning Department and Public Policy at Carnegie Mellon University. “How the analysis will be used helps in determining what types of biases are more important than others to avoid.”
Language changes over time, and datasets based on past usage may not reflect current norms. Take the medical profession, where at one time nurses were assumed to be women and doctors men. That is obviously no longer the case, but a model can perpetuate the stereotype if it is not retrained regularly on updated datasets.
What steps can be taken to mitigate these issues?
If the dataset used to train the language model contains biased text, that bias will be reflected in the generated text. A bigger dataset will not necessarily overcome it. At MarketMuse, we’ve found that a well-curated training dataset, scrubbed clean of toxic language, works better than one that is far larger but unrefined.
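The simplest form of the curation step described above is a blocklist filter over candidate training documents. The sketch below is a minimal illustration under that assumption; the blocklist terms are hypothetical placeholders, and a production pipeline would layer classifiers and human review on top of a word list.

```python
# Hypothetical blocklist; real curation uses far richer signals than a word set.
BLOCKLIST = {"badword1", "badword2"}

def is_clean(document: str) -> bool:
    """Keep a document only if none of its tokens appear on the blocklist."""
    tokens = {token.lower().strip(".,!?") for token in document.split()}
    return BLOCKLIST.isdisjoint(tokens)

raw_corpus = [
    "A useful article about gardening.",
    "Some text with badword1 in it.",
]

# Only documents that pass the filter make it into the training set.
curated = [doc for doc in raw_corpus if is_clean(doc)]
```

The design point is that filtering happens before training, so the model never observes the toxic examples in the first place, rather than trying to suppress them at generation time.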
According to Christopher Penn, Chief Data Scientist at TrustInsights, “Almost all real-world datasets contain biases.” He believes the real question is whether those biases are harmful or illegal.
“Recirculating, for example, disinformation is not illegal, but it is harmful. If we were generating language about vaccines, for example, we would want to eliminate disinformation.”
Obviously, illegal biases are also of great concern. As Christopher points out, “In the United States, protected classes on which we may not discriminate – which includes the training data we provide to models – include race, national origin, sexual orientation, gender identity, veteran status, disability, and religion.”
Since large real-world datasets are known to contain biases, vendors should explore ways to measure their pre-trained models to determine the extent of discrimination. One possibility is StereoSet from MIT, a dataset of 17,000 sentences measuring bias across gender, race, religion, and profession.
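The core idea behind pairwise benchmarks like StereoSet can be sketched as follows: present the model with stereotype/anti-stereotype sentence pairs and measure how often it prefers the stereotypical variant. This is a simplified illustration, not StereoSet's actual format or metric; the `model_score` function is a stand-in that a real evaluation would replace with the language model's sentence log-probability.

```python
# (stereotypical sentence, anti-stereotypical sentence) pairs.
pairs = [
    ("The nurse said she was tired.", "The nurse said he was tired."),
    ("The doctor said he was busy.", "The doctor said she was busy."),
]

def model_score(sentence: str) -> float:
    """Placeholder scorer; substitute the model's log-probability here."""
    return float(len(sentence))  # hypothetical stand-in for demonstration

def stereotype_rate(sentence_pairs) -> float:
    """Fraction of pairs where the model prefers the stereotypical sentence."""
    preferred = sum(model_score(s) > model_score(a) for s, a in sentence_pairs)
    return preferred / len(sentence_pairs)
```

A rate near 0.5 suggests the model shows no systematic preference between the two variants, while a rate near 1.0 indicates it consistently favors the stereotype.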
Make use of guardrails. On the highway, these crash barriers keep automobiles on the road, reducing the risk of serious accidents. In NLG, guardrails serve a similar purpose, ensuring the generated content doesn’t go off course. Operating without them gives the model carte blanche, which is never a good idea.
NLG models need to learn continually. It’s not enough to train a model once and use it forevermore. Retrain the model regularly, and fine-tune it on the subject for which it is creating a new piece of content. Ideally, this occurs with every new generation request.
Involve humans in the content creation process
Perhaps the most significant risk doesn’t come from the model itself but from its implementation. Many marketers see NLG as a way of removing humans from the content creation equation, treating it as a kind of easy button: push it, and out comes a publishable piece of content ready for consumption. This lack of oversight is where the danger lies.
“The solution is incredibly simple,” explains digital consultant Vip Sitaraman. “Pair the computer with a human. Insofar as there is always a human editor curating the works of natural language generation, there is no outsize risk.”
That parallels our experience in designing First Draft, our NLG platform. We see natural language generation as augmenting the work of writers, not replacing them. In creating our system, we account for interactivity at multiple steps of the process. We find that letting the user configure the content toward their goals and giving them editorial control at key stages is crucial to avoiding embarrassing mistakes.
Natural language generation is still in its infancy, and marketers should not place blind faith in the process. They will achieve the best outcome by taking a hands-on approach, incorporating NLG as a sophisticated writing aid. At the same time, vendors need to be aware of the potential for bias in the large datasets used to train language models and take appropriate action.