Introduction
Google launched Gemini in December 2023, rebranding its previous products Bard and Duet AI. The image generation feature aimed to be a fun and creative tool capable of producing realistic and diverse images of people, animals, landscapes, and more.
However, soon after its launch, users discovered that Gemini’s image generation was flawed and inaccurate. Some of the images it generated were offensive, insensitive, or downright wrong.
For example, Gemini generated images of people of color when asked for white historical figures, and produced racially diverse depictions when asked for World War II-era German soldiers.
Google quickly apologized and paused the image generation of people in Gemini while it worked on improving the accuracy of its responses. But how did this happen and what can be done to prevent it from happening again?
The Problem: Tuning Gone Wrong
According to Google’s official blog post, the problem with Gemini’s image generation was caused by two factors: tuning and caution.
Tuning refers to the process of adjusting the parameters of an AI model to achieve a desired outcome.
In this case, Google wanted Gemini to show a range of people when generating images, to avoid bias and stereotyping.
For example, if you ask for a picture of football players, or someone walking a dog, you may want to receive a range of people, not just people of one ethnicity or gender.
However, Google failed to account for cases that should clearly not show a range, which led Gemini to generate images that did not match the user's request or, worse, images that were inappropriate.
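To see how this kind of over-correction can creep in, consider the following toy sketch. It is not Google's actual pipeline; the function names and term lists are entirely hypothetical. A naive rewriter injects diversity descriptors into every people-related prompt, while a context-aware variant leaves historically specific requests alone:

```python
import random

# Hypothetical descriptors a naive augmentation step might inject.
DIVERSITY_DESCRIPTORS = ["East Asian", "Black", "South Asian", "white", "Middle Eastern"]

# Terms signalling that the prompt already pins down who should appear,
# so augmentation should be skipped. A naive implementation omits this check.
SPECIFIC_CONTEXT_TERMS = ["1943", "founding fathers", "viking", "pope"]


def augment_prompt_naive(prompt: str) -> str:
    """Injects a random diversity descriptor into any prompt about people."""
    if any(word in prompt.lower() for word in ("person", "people", "soldier")):
        return f"{prompt}, depicted as a {random.choice(DIVERSITY_DESCRIPTORS)} person"
    return prompt


def augment_prompt_context_aware(prompt: str) -> str:
    """Skips augmentation when the prompt names a specific historical subject."""
    if any(term in prompt.lower() for term in SPECIFIC_CONTEXT_TERMS):
        return prompt  # leave historically specific requests untouched
    return augment_prompt_naive(prompt)


if __name__ == "__main__":
    print(augment_prompt_naive("a 1943 German soldier"))          # misfires
    print(augment_prompt_context_aware("a 1943 German soldier"))  # left as-is
    print(augment_prompt_context_aware("people walking a dog"))   # diversified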
Caution refers to the tendency of an AI model to avoid generating images that could be potentially harmful, violent, or explicit. Google wanted Gemini to be cautious and not generate images of real people, or images that could be considered sensitive or controversial. However, Google admitted that the model became “way more cautious than we intended and refused to answer certain prompts entirely – wrongly interpreting some very anodyne prompts as sensitive”. This led Gemini to be overly conservative, refusing to depict any particular group in some cases, or producing images that were bland and generic.
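As a loose illustration (a purely hypothetical keyword filter, not Google's safety system), over-caution can arise when refusal logic pattern-matches on surface terms rather than intent, so harmless prompts trip the same gate as genuinely sensitive ones:

```python
# Hypothetical over-broad blocklist; a single ambiguous word triggers a refusal.
SENSITIVE_TERMS = {"war", "weapon", "race", "protest", "shooting"}


def should_refuse(prompt: str) -> bool:
    """Refuses if any blocklisted token appears, regardless of context."""
    tokens = prompt.lower().split()
    return any(token.strip(".,!?") in SENSITIVE_TERMS for token in tokens)


if __name__ == "__main__":
    # Anodyne prompts are wrongly refused because of blocklisted words.
    print(should_refuse("a photographer shooting a portrait"))  # True (false positive)
    print(should_refuse("runners finishing a charity race"))    # True (false positive)
    print(should_refuse("a bowl of fruit on a table"))          # False
```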
The Response: Sundar Pichai Apologizes and Promises Changes
Google’s CEO Sundar Pichai also addressed the controversy surrounding Gemini’s image generation in an internal memo to employees. He acknowledged the issue, vowed to make structural changes, and emphasized the importance of building trustworthy products in the AI industry. Pichai called Gemini’s responses around race unacceptable and said that Google got it wrong. He said that the teams have been working around the clock to fix the problem and that they have already seen a substantial improvement on a wide range of prompts. He also said that Google will review what happened and make sure it does not happen again.
The Solution: Testing and Feedback
Google said that it turned off the image generation of people in Gemini and will work to improve it significantly before turning it back on. This process will include extensive testing and feedback from users and experts. Google also said that it will be more transparent and accountable about how its AI models work and what they can and cannot do.
It acknowledged that Gemini is not a reliable source of information, especially when it comes to current events, evolving news, or hot-button topics. Google also reminded users that Gemini is a creativity and productivity tool, not a factual or educational one. Google said that it cannot guarantee that Gemini will never generate embarrassing, inaccurate, or offensive results, but it will do its best to minimize them.
The Debate: Did Google Recklessly Launch Gemini Without Fine Tuning?
Gemini’s image generation fiasco has sparked a debate among the AI community and the public about whether Google recklessly launched Gemini without proper fine-tuning, and what the repercussions might be. Some critics argue that Google rushed to release Gemini without proper testing and evaluation, and that it exposed its users to harmful and misleading results. They claim that Google prioritized its competitive edge over its ethical and social responsibility, and that it failed to anticipate the potential consequences of its AI model. They also point out that Google has a history of launching flawed and controversial AI products, such as Google Photos, which labeled some Black people as gorillas, or Google Duplex, which mimicked human voices without disclosing its identity.
Some defenders assert that Google didn’t recklessly launch Gemini but rather confronted an unprecedented and complex challenge. They argue that Google designed Gemini to be natively multimodal, pre-training it from the start on different modalities and then fine-tuning it with additional multimodal data to improve its effectiveness. They also cite Google’s blog post, which explained the technical difficulties of tuning and caution, and the trade-offs involved in balancing accuracy and diversity.
The Takeaway: AI is Not Perfect
The Gemini image generation fiasco is a reminder that AI is not perfect and that it can make mistakes. AI models are complex and often unpredictable, and they depend on the quality and quantity of the data they are trained on. They also need constant monitoring and evaluation, as they can change over time and adapt to new inputs. They need human oversight and intervention, as they can have unintended consequences and ethical implications. Finally, they need clear and consistent communication, as they can be misunderstood or misused by users who have different expectations and goals.