What Went Wrong With Google’s Image-Generating AI?

By Pooja Choudhary On Mar 1, 2024

The Delicate Balance Between Innovation and Responsibility

Google has committed yet another embarrassing AI faux pas: an image-generation model that ludicrously diversified its pictures with no regard for historical context. Google has since apologized, or nearly apologized. Although the underlying issue is easy to understand, Google claims the model “became” oversensitive. But the model did not build itself. Whenever you ask Gemini, the company’s conversational AI platform, to create images, it calls on a version of the Imagen 2 model.

Until recently, though, few people had noticed that it produced ridiculous results when asked to depict specific historical figures or events. For instance, the Founding Fathers, a group of white men, many of whom owned slaves, were rendered as a multicultural group that included people of color. Online critics were quick to mock this embarrassing and easily reproduced problem. Commentators held it up as evidence that the woke mind virus was further infecting the already liberal tech industry, and it was dragged into the ongoing debate over diversity, equity, and inclusion, which is currently at a low point in public opinion. But as anyone familiar with the technology could tell you, and as Google explains in its rather sad, apology-adjacent post today, the problem arose from a perfectly reasonable workaround for systemic bias in training data.

The Generative Model

For example, suppose you’re planning to use Gemini to create ten images of “a person walking a dog in a park” for a promotional campaign. Because you don’t specify the type of person, dog, or park, it’s dealer’s choice: the generative model puts out whatever it is most familiar with. In many cases, that reflects biases in the training data rather than reality. Which kinds of people, dogs, and parks are most common among the thousands of relevant images the model has ingested? Because white people are over-represented in many image collections (stock photos, rights-free photography, and so on), the model will often default to white people if you don’t specify otherwise.
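
To make that arithmetic concrete, here is a toy Python sketch under loudly stated assumptions. The attribute counts in `TOY_TRAINING_COUNTS` and the labels `group_a` to `group_d` are hypothetical placeholders, not data from any real image collection, and the two functions are simplified stand-ins, not how Imagen or any real model works internally. The sketch shows that if 80% of relevant training images share one attribute, a model sampling in proportion to its data would be expected to give that attribute to about 8 of 10 images, and a model that simply falls back on its most common case gives it to all 10.

```python
# Hypothetical, made-up counts of one visual attribute across "relevant" training images.
# Illustrative only: these are not measurements of any real dataset.
TOY_TRAINING_COUNTS: dict[str, int] = {"group_a": 80, "group_b": 10, "group_c": 6, "group_d": 4}


def expected_batch_composition(counts: dict[str, int], n: int) -> dict[str, float]:
    """Expected number of images per attribute in a batch of n, assuming the
    model samples attributes in proportion to their frequency in training data."""
    total = sum(counts.values())
    return {attr: n * c / total for attr, c in counts.items()}


def mode_seeking_generate(counts: dict[str, int], n: int) -> list[str]:
    """Toy 'generator' that always picks the single most common attribute,
    i.e. it falls back on whatever it has seen most often."""
    most_common = max(counts, key=counts.get)
    return [most_common] * n


print(expected_batch_composition(TOY_TRAINING_COUNTS, 10))
# {'group_a': 8.0, 'group_b': 1.0, 'group_c': 0.6, 'group_d': 0.4}
print(mode_seeking_generate(TOY_TRAINING_COUNTS, 10).count("group_a"))
# 10
```

Either way, the skew in the data, not the prompt, decides who appears in the pictures.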

Google notes that “because our users come from all over the world, we want it to work well for everyone.” If you ask for a picture of football players, or of someone walking a dog, you may well want to get a range of people. You probably don’t want only images of people of one ethnicity (or of any other single characteristic). There’s nothing wrong with getting a picture of a white man walking a golden retriever in a suburban park. But what if you ask for ten, and every one of them is a white man walking a golden retriever in a suburban park? And what if you live in Morocco, where the people, the dogs, and the parks all look different? That is simply not a desirable outcome. When a characteristic is not specified, the model should opt for variety over homogeneity, even if its training data pulls it the other way. This is a common problem across all kinds of generative media, and there is no simple solution. In cases that are especially common, sensitive, or both, however, companies like Google, OpenAI, and Anthropic quietly add extra instructions for the model.

LLM ecosystem

I can’t stress enough how common this kind of implicit instruction is. The whole LLM ecosystem is built on implicit instructions, or system prompts, as they are commonly called. Guidelines such as “don’t swear” and “be concise” are given to the model before every conversation. When you ask for a joke, you don’t get a racist one: even though the model has ingested thousands of them, it has been trained, like most of us, not to tell them. This is not a secret agenda (though more transparency would help); it is infrastructure. Where Google’s model went wrong is that it had no implicit instructions for situations where historical context matters. So while a prompt like “a person walking a dog in a park” is improved by silently adding “the person is of a random gender and ethnicity” or whatever Google inserts, “the U.S. Founding Fathers signing the Constitution” is clearly not improved by the same addition.
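
The failure can be illustrated with a minimal, hypothetical Python sketch of silent prompt rewriting. This is not Google’s actual pipeline, which has not been published; `DIVERSITY_HINT`, `HISTORICAL_MARKERS`, `naive_rewrite`, and `context_aware_rewrite` are illustrative names, and the hint text simply reuses the example from the paragraph above. The blanket rule appends the hint to every prompt, including the historical one, while the context-aware variant skips it when the prompt names a specific historical subject.

```python
# Hypothetical implicit instruction, reusing the example wording from the text.
DIVERSITY_HINT = "the person is of a random gender and ethnicity"

# Hypothetical, deliberately tiny list of historically specific subjects.
# Keyword matching is only for illustration; it would miss most real cases.
HISTORICAL_MARKERS = ("founding fathers", "signing the constitution")


def naive_rewrite(user_prompt: str) -> str:
    """Blanket rule: always append the hint, regardless of context."""
    return f"{user_prompt}, {DIVERSITY_HINT}"


def context_aware_rewrite(user_prompt: str) -> str:
    """Append the hint only when the prompt does not name a specific
    historical subject whose appearance is fixed by the historical record."""
    lowered = user_prompt.lower()
    if any(marker in lowered for marker in HISTORICAL_MARKERS):
        return user_prompt
    return f"{user_prompt}, {DIVERSITY_HINT}"


generic = "a person walking a dog in a park"
historical = "the U.S. Founding Fathers signing the Constitution"

print(naive_rewrite(historical))          # hint wrongly applied to a historical scene
print(context_aware_rewrite(generic))     # hint applied where the subject is unspecified
print(context_aware_rewrite(historical))  # prompt left unchanged
```

A real system would need a far more robust way to recognize historical context than a keyword list, which is part of why, as noted above, there is no simple solution.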

[To share your insights with us as part of editorial or sponsored content, please write to sghosh@martechseries.com]