Google officially launched its next-generation artificial intelligence reasoning model, Gemini 2.5, today. The model is built around a “think-verify-answer” multimodal reasoning capability, and Google describes it as its “smartest model to date.”
Its flagship version, Gemini 2.5 Pro Experimental, surpassed models from competitors such as OpenAI and Anthropic on multiple benchmarks. It performed particularly well in code generation and mathematical reasoning, a notable advance in how AI models handle complex tasks.
Gemini 2.5 Pro accepts multimodal inputs including text, images, audio, video, and code, with a context window of up to one million tokens (approximately 750,000 words). That is enough to take in the complete Lord of the Rings series in a single pass, and Google plans to expand the window to two million tokens in the future. The large window gives the model an advantage on complex problems that span several modalities.
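As a rough illustration of the token-to-word figure above, the following minimal Python sketch estimates whether a text of a given length would fit in the context window. The ratio of 0.75 words per token is an assumption derived only from the "one million tokens, approximately 750,000 words" figure. Real token counts depend on the tokenizer and the language of the text, and the function names are hypothetical.

```python
# Assumption: about 0.75 English words per token, the ratio implied by
# "one million tokens (approximately 750,000 words)". Actual counts vary
# with the tokenizer and the text.
WORDS_PER_TOKEN = 0.75


def estimated_tokens(word_count: int) -> int:
    """Rough token estimate for English prose of the given word count."""
    return round(word_count / WORDS_PER_TOKEN)


def fits_in_context(word_count: int, context_tokens: int = 1_000_000) -> bool:
    """True if the estimated token count is within the context window."""
    return estimated_tokens(word_count) <= context_tokens


if __name__ == "__main__":
    print(estimated_tokens(750_000))   # estimate for a 750,000-word text
    print(fits_in_context(750_000))    # against the one-million-token window
    print(fits_in_context(1_000_000, context_tokens=2_000_000))  # planned window
```

An estimate like this is only a first check. For an exact figure, the text would have to be run through the model's own tokenizer.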
Code Generation: It scored 68.6% on the Aider Polyglot code editing benchmark, outperforming models from OpenAI and Anthropic. On SWE-bench Verified, it scored 63.8%, trailing only Claude 3.7 Sonnet (70.3%).
Mathematical and Scientific Reasoning: Without relying on external tools, it scored 18.8% on “Humanity’s Last Exam” (a comprehensive multimodal assessment), ahead of most competitors.
General Capabilities: On the LMArena leaderboard, it surpassed GPT-4.5 by a margin of 40 points, topping both the Vision Arena and WebDev Arena rankings.
Gemini 2.5 Pro is available today in Google AI Studio and, for “Gemini Advanced” subscribers ($20 per month), in the Gemini app. It will be deployed on the Vertex AI platform later. Google has not yet announced API pricing but said that enterprise application plans would be disclosed within a few weeks.