Google released Gemini 2.0 Flash, the first of its next-gen Gemini model family

Google unveiled Gemini 2.0 Flash, a multimodal AI model with native tool use and multimodal input and output capabilities, slated for general availability in early 2025. An experimental version without multimodal outputs is now available through the API and on the Gemini desktop and mobile web apps.

by Ellie Ramirez-Camara
Credit: Google

This Wednesday, Google shared some advances from its research on agentic AI, the purported next frontier on the path toward artificial general intelligence (AGI). In a recent blog post, Google confirmed that, thanks to its multimodal capabilities, Gemini 2.0 is powering agentic AI prototypes such as Project Astra, a universal AI assistant, and the new Project Mariner, which explores human-agent interaction, currently within a web browser, in a manner similar to Claude's computer use.

But the company is not stopping there. The major announcement is that Google is releasing Gemini 2.0 Flash as an experimental model, now available in the Gemini API and as a chat-optimized version in the desktop app and mobile web experience.

Gemini 2.0 Flash can natively process multimodal inputs (image, video, audio, and text) and generate multimodal outputs (text, text-to-speech, and images). However, an important limitation is that, although the experimental versions available through the Gemini API and the Gemini app accept multimodal inputs, they do not ship with multimodal output generation and remain text-only. Developers who want to build with Gemini 2.0 can leverage the Multimodal Live API for applications requiring audio or video streams from cameras or screens.
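For illustration, here is a minimal sketch of calling the experimental model through the Gemini REST API using only the Python standard library. The model id `gemini-2.0-flash-exp` and the `v1beta` endpoint reflect the experimental release at the time of writing and may change; the request call only runs if an API key is present in the environment.

```python
import json
import os
import urllib.request

# Experimental model id at launch; subject to change as Gemini 2.0 rolls out.
API_URL = ("https://generativelanguage.googleapis.com/v1beta/"
           "models/gemini-2.0-flash-exp:generateContent")

def build_payload(prompt: str) -> dict:
    """Request body per the Gemini REST API: a list of contents made of parts."""
    return {"contents": [{"parts": [{"text": prompt}]}]}

def generate(prompt: str, api_key: str) -> str:
    """Send a text prompt and return the model's text reply."""
    req = urllib.request.Request(
        f"{API_URL}?key={api_key}",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # The experimental release returns text-only candidates.
    return body["candidates"][0]["content"]["parts"][0]["text"]

# Only hit the network if a key is configured.
if os.environ.get("GEMINI_API_KEY"):
    print(generate("In one sentence, what is Gemini 2.0 Flash?",
                   os.environ["GEMINI_API_KEY"]))
```

The same `contents`/`parts` structure extends to multimodal inputs by adding image or audio parts alongside the text part.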

The model also features native tool use, enabling Flash to perform tasks such as calling Google Search or executing code and third-party functions through function calling. According to the blog post, by running Google Search natively, Gemini 2.0 Flash can deliver more factual answers and drive traffic to content publishers. Before the image and audio output capabilities become more widely available, they will be accessible to Google's early-access partners. Google has also confirmed multimodal outputs will be watermarked using SynthID technology. Google claims the production version of Gemini 2.0 Flash will be available in early 2025.
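The tool-use setup described above can be sketched as a request body that declares both the native Google Search tool and a custom function. The `google_search` tool name follows Google's Gemini 2.0 documentation, and `get_stock_price` is a hypothetical third-party function used purely to illustrate the function-declaration schema.

```python
def payload_with_tools(prompt: str) -> dict:
    """Request body combining native Google Search with a custom function tool."""
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "tools": [
            # Native Google Search grounding, run by the model itself.
            {"google_search": {}},
            # A hypothetical third-party function exposed via function calling.
            {"function_declarations": [{
                "name": "get_stock_price",
                "description": "Look up the latest price for a ticker symbol.",
                "parameters": {
                    "type": "object",
                    "properties": {"ticker": {"type": "string"}},
                    "required": ["ticker"],
                },
            }]},
        ],
    }

example = payload_with_tools("What is Alphabet trading at today?")
```

When the model decides to use the custom function, its response carries a function-call part naming `get_stock_price` with arguments; the client executes the function and sends the result back so the model can compose its final answer.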


