Sundar Pichai says Google is fully in its Gemini era
The Google and Alphabet CEO addressed the Google I/O 2024 audience to talk about the fruits of Google's last decade of investment and research in AI, as well as what lies ahead for the company. As has been well known for some time, Google is doubling down on its AI ambitions and plans to integrate the technology wherever possible. Google I/O 2024 was the perfect opportunity for the company to showcase its AI integrations across Search, Photos, Workspace, Android, and more.
Expanded AI Overviews: Google has been experimenting with generative AI-infused Search for a while now, offering AI-generated snapshots of key information related to users' queries, along with suggested follow-up questions to enhance the search experience. AI-generated snapshots also appear when users search for a product; these include factors to consider, product suggestions, and related information such as user reviews. The AI Overviews feature is coming out of Search Labs, with the first roll-out covering users in the US and more countries coming soon. Improvements to AI Overviews include options to adjust an Overview's simplicity of language and level of detail, the ability to ask more complex questions directly in Search, planning assistance, AI-organized results pages, and AI-assisted video search, which will be accessible in Search Labs to US-based English-language users first.
Gemini 1.5 Flash, Gemma 2, and Project Astra: Google announced major updates to its Gemini AI model family, including Gemini 1.5 Flash, a lighter-weight, faster model optimized for efficiency and scalability. Additionally, Gemini 1.5 Pro received enhancements such as a 2 million token context window, and Gemini Nano now understands multimodal inputs such as images. Google also unveiled Gemma 2, its next-generation open model, which is built on a new architecture. Finally, Google shared progress on Project Astra, its vision for future AI assistants that can perceive the world through sight and sound, understand context, and converse naturally. Some Astra capabilities will come to Google products like the Gemini app later this year.
Google Veo and Imagen 3: Veo, Google's most capable video generation model, generates high-quality 1080p videos that can run longer than a minute, in various cinematic styles. Much like OpenAI, Google is collaborating with filmmakers and other creatives to explore how Veo can best enhance the creative process. A highlight of this effort is Google's collaboration with Donald Glover and his creative studio, Gilga. Building on Google's extensive research on video generation, Veo is available via a waitlisted preview in VideoFX. Eventually, Google will bring Veo capabilities to YouTube Shorts and other video products. Google also unveiled the latest generation of its image generation model, Imagen 3, which is likewise in a waitlisted private preview on ImageFX, with plans to make it available in Vertex AI. The company also offered a look at its foray into music generation by showcasing the Lyria model and the Music AI Sandbox. Finally, Google reiterated its commitment to safe and responsible AI deployment with technologies like SynthID watermarks.
Ask Photos: Google Photos is launching a new AI-powered "Ask Photos" feature that lets users search their photos and videos using natural language questions. Leveraging Google's Gemini AI model, Ask Photos can understand the context and subject matter within images to pull out relevant details. Users can ask about past events, dates, and locations, and it will surface the applicable photos and information. Additionally, Ask Photos can assist with tasks like curating trip highlights or generating personalized captions by comprehending the content of the images. The experimental feature is rolling out over the coming months.
Gemini in Gmail and Workspace: Following its Google Cloud Next announcements, Google unveiled even more ways in which Gemini can help users stay productive in Workspace. The Gemini side panel in Workspace is now powered by Gemini 1.5 Pro, leveraging its longer context window and advanced reasoning. The Gmail app also gains new Gemini for Workspace features, including email summarization, contextual smart replies generated by Gemini, and a Gemini Q&A capability for finding information across emails and files. Moreover, the "Help me write" feature in Gmail and Docs now supports Spanish and Portuguese on desktop, with plans to eventually add more languages. These updates aim to help individuals and businesses get more out of their Google apps using generative AI capabilities.
A slew of AI features coming to Android: Google made several Android-related announcements, but the center of attention was the new ways to experience AI on Android. As a mobile assistant, Gemini on Android is getting better at understanding the context of what is on a user's screen. Soon, users will be able to bring up the Gemini overlay on top of the app they are using and have Gemini perform tasks or dig out information relevant to what they are doing. In addition to gaining multimodal capabilities, Gemini Nano will be deployed as Android's on-device foundation model, starting with Google Pixel smartphones.
Gemini Nano will also power Android's TalkBack feature. TalkBack helps people experiencing blindness or low vision get richer and clearer descriptions of what’s happening in an image. Since Gemini Nano runs on-device, it will power the TalkBack service even without a network connection. Finally, the company is looking into an opt-in feature for spam call detection. Details are scarce for now, but the company has committed to sharing more on the feature later this year.
Other notable I/O announcements include the introduction of LearnLM, a family of models optimized for learning environments; new tools and safeguards that build on Google's commitment to responsible AI deployment; and updates for Gemini Advanced subscribers. The latter include a context window that Google bills as the longest in the world, the Gemini Live experience (which enables users to have natural-sounding voice conversations with Gemini), advanced planning assistance, customized Gems, and improved app connections. On the hardware front, Google announced Trillium, its latest custom AI-specific TPU.