I am sure all of you have seen or heard about some of the issues encountered by Google’s newest AI model - Gemini.
This post is dedicated to all my Google friends! Keep up the great work you all are doing and don't get distracted by the hurdles you may have faced while thinking big.
I think this was one of the most viral posts, and it rightly pointed out that users couldn't get an image of a Caucasian male at all - not even for the Pope.
Or, as someone else reported, the response to the question "Is Modi a fascist?"
In either scenario, I am sure you have figured out by now that the AI model may not be performing as expected.
What is Google’s Gemini:
Gemini, developed by Google DeepMind, is a cutting-edge multimodal AI model that integrates text, images, and more. It aims to redefine AI's role in technology and daily life by offering natural and engaging conversations. Gemini's multimodal design allows it to handle various data types, like images and videos, making interactions more helpful and accurate. The model is designed to offer pinpoint accuracy and stay grounded in reality, incorporating techniques like reinforcement learning and tree search.
Why the issues:
Google Gemini has been generating incorrect images, and that can happen for many reasons. The two primary ones:
Failure to Account for Specific Cases: Google tuned Gemini to show a range of people, but this tuning did not consider cases where a range was not appropriate. This led to overcompensation, resulting in images that were embarrassing and inaccurate.
Overly Cautious Model: Over time, the model became overly cautious and refused to answer certain prompts entirely, misinterpreting harmless requests as sensitive. This over-conservatism led to further inaccuracies in the generated images.
Thankfully, Google has acknowledged this as an issue and is working on fixing it.
Many of you may be wondering: if Google's model can make mistakes, are we ready for AI? Remember, we are still in the early days of AI and just getting started. Generative models such as Gemini (and many others, such as ChatGPT, Claude, and LLaMA 2) are built as creativity and productivity enhancement tools. They are trained on vast amounts of data and are likely to make mistakes or produce unreliable results - hallucination, for example, is a real problem.
Meanwhile, you all might be wondering whether you should still invest in AI.
While these are real hurdles in AI, Google has already acknowledged the problem and is fixing it. These are growing pains as AI matures, similar to the teething problems we see as a child grows. I would say, continue your AI projects as planned, but start with low-risk or boring problems that don't impact your brand while still letting you bear the fruits of AI. When possible, keep a human in the loop while applying AI - we humans are still the best brains out there! Consider the risk of model inaccuracies (which is real, as we just saw with Gemini) and create a plan to address it - like any risk, assess it and either accept it, transfer it, or create a plan to mitigate it.
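To make the human-in-the-loop idea concrete, here is a minimal sketch of one common pattern: automatically publish outputs the model is confident about, and route everything else to a human review queue. All names, the dataclass, and the 0.8 threshold are hypothetical assumptions for illustration, not any vendor's API.

```python
# Human-in-the-loop sketch: low-confidence AI outputs go to a human
# reviewer instead of straight to users. Names and the 0.8 threshold
# are illustrative assumptions, not a real model API.

from dataclasses import dataclass

@dataclass
class ModelOutput:
    content: str
    confidence: float  # model's self-reported confidence, 0.0 to 1.0

def needs_human_review(output: ModelOutput, threshold: float = 0.8) -> bool:
    """Flag outputs the model is unsure about for manual review."""
    return output.confidence < threshold

def publish(output: ModelOutput, review_queue: list) -> str:
    """Ship confident outputs; queue uncertain ones for a human."""
    if needs_human_review(output):
        review_queue.append(output)  # a human decides later
        return "queued for review"
    return output.content

queue: list = []
print(publish(ModelOutput("Paris is the capital of France.", 0.97), queue))
print(publish(ModelOutput("A historically dubious image request", 0.42), queue))
print(len(queue))  # one item awaiting human review
```

In a real deployment the review queue would feed a moderation dashboard, and the threshold would be tuned against the cost of a wrong answer reaching your users.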
What do you all think?
Other AI Advancements:
Sora:
In case you didn't notice, OpenAI launched a video generation model - Sora. Why is this amazing? It's quite realistic: if you show it to someone who hasn't been told it's an AI-generated video, they can't tell the difference.
If you haven't seen it yet, highly encourage all of you to check it out: https://openai.com/sora