At I/O 2025, Google announced enhancements to its Gemini model lineup that go beyond the early Gemini 2.5 Pro update released a couple of weeks earlier. That Gemini 2.5 Pro preview showcased the model's improved coding and multimodal reasoning capabilities, which are now joined by further additions, including an enhanced reasoning mode called "Deep Think". Both Gemini 2.5 Pro and Flash are also gaining new features, such as Project Mariner's computer use capabilities and native audio output for more natural voice conversations.

Gemini 2.5 Pro, now with "Deep Think"

Since the Gemini 2.5 Pro preview launch, Google seems to have shifted its focus from academic benchmarks to human preference leaderboards such as LMArena and WebDev Arena. The provided model card does report strong scores across some of the most popular evaluations, but 2.5 Pro still falls short of the current state-of-the-art scores.

In contrast, Gemini 2.5 Pro is doing quite well on human preference leaderboards. The model currently tops every English-language task leaderboard at the Chatbot Arena, which sets up battles between two chatbots to test user preference across tasks such as math, instruction following, and multi-turn interactions. Similarly, the Gemini 2.5 Pro preview tops the WebDev Arena leaderboard, which likewise sets up battles to test user preference, but exclusively on web development tasks.
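
The pairwise "battle" format lends itself to rating systems that turn individual votes into a ranking. As a rough illustration only, the sketch below applies a classic Elo update to a single head-to-head vote. It is an assumption made for demonstration, not a description of how either arena actually computes its scores; the function names and the K-factor of 32 are hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one user vote.

    k is an assumed step size; a larger k makes ratings react faster.
    """
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - exp_a)
    # Elo is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two equally rated chatbots; the user prefers A's answer.
    print(update_ratings(1000.0, 1000.0, a_won=True))
```

A win against an equally rated opponent moves each rating by half of k, while a win against a much weaker opponent moves them far less, which is why many votes are needed before a leaderboard position stabilizes.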

Google touts 2.5 Pro's benchmark scores and state-of-the-art long context and video understanding performance as partly the product of the model's generous 1 million-token context window. However, 2.5 Pro's standout feature is "Deep Think," an experimental enhanced reasoning mode that enables the model to consider multiple hypotheses before responding. This capability has demonstrated impressive results on challenging benchmarks, including the 2025 USAMO math competition and LiveCodeBench for competition-level coding tasks.

Google has stated that it will initially make Deep Think available to trusted testers via the Gemini API, and will hold off on a wider release until it has conducted more safety testing and gathered more safety feedback from external experts.

Google bets big on Gemini 2.5 Flash

Meanwhile, Gemini 2.5 Flash, described as Google's "most efficient workhorse model," has been improved across key benchmarks while using 20-30% fewer tokens. It's now available in the Gemini app for all users, with general availability for production coming in early June.

Additional enhancements for both Gemini models include:

  • An audio-visual input and native audio output dialogue preview in the Live API, and a new text-to-speech preview feature for Gemini 2.5 Flash and Pro;
  • Computer use capabilities from Project Mariner;
  • Strengthened security protections against threats like indirect prompt injections;
  • Thought summaries to make model reasoning more transparent;
  • Thinking budgets, an option for developers to control model cost and performance by specifying how many tokens a model should use before responding, or to turn the models' thinking capabilities completely off;
  • Support for Model Context Protocol (MCP) definitions in the Gemini API.
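
To make the thinking budget and thought summary options more concrete, here is a minimal sketch of how a developer might assemble a request body that sets them. The helper name build_generate_request is hypothetical, and the field names (generationConfig, thinkingConfig, thinkingBudget, includeThoughts) are assumptions modeled on the Gemini API's JSON conventions; check them against the current API reference before relying on them. The sketch only builds the payload and does not call any service.

```python
import json
from typing import Optional


def build_generate_request(
    prompt: str,
    thinking_budget: Optional[int] = None,
    include_thoughts: bool = False,
) -> dict:
    """Build a JSON-serializable request body (hypothetical helper).

    thinking_budget: assumed cap on tokens spent reasoning before the
        answer; 0 is assumed to switch thinking off. None leaves the
        model's default behavior untouched.
    include_thoughts: assumed flag asking for thought summaries.
    """
    thinking_config = {}
    if thinking_budget is not None:
        thinking_config["thinkingBudget"] = thinking_budget
    if include_thoughts:
        thinking_config["includeThoughts"] = True

    body = {"contents": [{"parts": [{"text": prompt}]}]}
    # Only attach the config when the caller actually set something.
    if thinking_config:
        body["generationConfig"] = {"thinkingConfig": thinking_config}
    return body


if __name__ == "__main__":
    request = build_generate_request(
        "Summarize the I/O 2025 Gemini announcements.",
        thinking_budget=1024,
        include_thoughts=True,
    )
    print(json.dumps(request, indent=2))
```

The trade-off is the one described in the list above: a smaller budget lowers cost and latency, while a larger one gives the model more room to reason on harder prompts.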