On Tuesday, Google introduced a range of new generative AI products at its Google I/O developer conference. These include the Gemini Live assistant, updates to its Android and Workspaces platforms, and a significant revamp of its Search feature. This move is part of Google’s strategy to reclaim its position as a leader in AI, following the surprise partnership between Microsoft and OpenAI in 2022.
The launch of these new products comes shortly after OpenAI's announcement of its GPT-4o AI model and just before major AI updates from Microsoft and Apple.
A key highlight is the update to Google’s Search platform, which now integrates generative AI results at the top of the screen. Initially available in a preview through Search Labs, this feature will soon be accessible to all U.S. users and will eventually roll out globally.
Another major announcement is the introduction of Gemini Live, a personalized AI assistant powered by the Gemini 1.5 Pro model. This assistant can interact directly with users on their mobile devices, offering responses in a variety of natural-sounding voices.
For Android users, Google is expanding the assistant’s capabilities to work with apps like Google Messages and Gmail. Users can now drag and drop images generated by Gemini into texts and emails, and the assistant can answer specific questions about YouTube videos by analyzing their captions.
Google is also enhancing its on-device version of Gemini, called Gemini Nano, with new multimodal features. This means the AI can process and generate content across text, images, and video. Amar Subramanya, Google’s VP of engineering for Gemini experiences, mentioned that he has been using Gemini Live for brainstorming and translation tasks. Future updates will enable the assistant to access a user's camera, allowing it to interact with real-world objects, a capability also showcased by OpenAI's GPT-4o.
Subramanya provided an example where he asked the assistant to find a recipe for a pineapple upside-down cake for 15 people. The assistant found a recipe for eight people, adjusted the proportions, and added the necessary ingredients to his Keep shopping list.
Gemini Nano is integrated into various parts of Android, offering features like audio descriptions for on-screen images in the TalkBack accessibility software. This helps users with vision impairments by providing descriptions for previously unlabeled images online and in messages. Additionally, Gemini Nano can detect potential phone scams by identifying conversation patterns typical of scams. All processing for this feature occurs on the device, ensuring privacy.
In Google’s Workspace, the more powerful Gemini 1.5 Pro model is now available in the side panel of apps like Docs and Gmail. This allows users to continue working while utilizing Gemini’s capabilities in a smaller interface. Gemini 1.5 Pro can reference larger data sets, summarizing extensive email threads and providing smart replies based on the email’s content.
As Google rolls out these AI advancements, its competitors are poised to unveil their own innovations. Microsoft is expected to announce updates to its AI-powered Copilot for Microsoft 365 at the Build conference on May 20, while Apple is anticipated to introduce a generative AI-powered version of its Siri assistant at WWDC on June 10. The tech world now watches to see if these launches will overshadow Google's recent advancements.