Following the launch of Gemini 2.0 Flash Thinking, its first foray into "reasoning" models, Google has introduced Gemini 2.5 Pro, which it describes as its "most intelligent AI model." The initial release is an experimental version of the model.
Notably, Google reports that this experimental version of Gemini 2.5 Pro has already claimed the top position on the Chatbot Arena LLM Leaderboard, a platform that crowdsources AI benchmarking based on human preferences. At the time of writing, Gemini 2.5 Pro still holds the top position on the Overall, Math, Instruction Following, Creative Writing, and Hard Prompts leaderboards, among others.
A "Thinking Model" with Enhanced Reasoning
Gemini 2.5 belongs to a new category that Google calls "thinking models." These systems are designed to improve their performance and accuracy by reasoning through their thoughts before responding. According to the company, these reasoning abilities let the model go beyond classification and prediction, tasks at which LLMs already excel, to analyze information, draw logical conclusions, incorporate context and nuance, and make more informed decisions.
In its blog post, Google explains that it achieved this level of performance by combining a significantly enhanced base model with improved post-training techniques. Going forward, Google plans to build these thinking capabilities directly into all of its models. Like most other companies developing foundation models, Google believes that advanced capabilities such as those of Gemini 2.5 Pro will be the key to unlocking performant, context-aware agents.
Benchmark-Leading Performance
In addition to its top scores on the Chatbot Arena LLM Leaderboard, Gemini 2.5 Pro demonstrates state-of-the-art capabilities across several academic benchmarks. Without using test-time techniques that increase computational costs (such as majority voting), Gemini 2.5 Pro obtained leading scores on math and science benchmarks, including GPQA and AIME 2025. The model also scored 18.8% on Humanity's Last Exam, which Google notes is state-of-the-art among models without tool use.
Advanced Coding Capabilities
Coding performance was a significant focus of Gemini 2.5's development. Google reports that the new model excels at creating visually compelling web applications and agentic code applications, as well as at code transformation and editing tasks. On SWE-Bench Verified, which Google describes as "the industry standard for agentic code evals," Gemini 2.5 Pro scored 63.8% when paired with a custom agent setup.
Building on Gemini's Foundation
Gemini 2.5 builds on the multimodal capabilities and long context windows that characterized previous Gemini models. Gemini 2.5 Pro ships with a 1 million token context window (with plans to expand to 2 million soon), allowing it to comprehend vast datasets and handle complex problems drawing on diverse information sources, including text, audio, images, video, and entire code repositories.
Availability
Gemini 2.5 Pro is now available in Google AI Studio and the Gemini app for Gemini Advanced users, with Vertex AI integration coming soon. Google plans to introduce pricing in the coming weeks so Gemini 2.5 Pro can be used with higher rate limits in production environments.