Anthropic Unveils First Hybrid Reasoning Model, Claude 3.7: Coding Superiority and Controllable Thinking Time

Author Info

David Kowalski

Developer Tools & Agents Editor

15+ years software engineering; maintainer of internal agent-evaluation playbooks

David tests coding agents, IDE integrations, and terminal workflows the way working teams use them. He documents prompts, environment pins, and regression cases so readers can compare tools fairly. When vendors sponsor access, he discloses it and keeps scoring criteria unchanged.

#Coding Agents #IDE Integrations #Developer Productivity #Tool Comparisons

Full author profile →

Claude’s Late-Night Heavyweight Release: A New Model

Claude 3.7 Sonnet, the first hybrid reasoning model, has been released, delivering significant improvements in coding and front-end web development to achieve comprehensive leadership.

First Hybrid Reasoning Model Claude 3.7 Released! Comprehensive Leadership in Programming Capabilities, with Controllable Thinking Time

A direct comparison of Claude 3.7 Sonnet against o3-mini using identical prompts:

o3-mini:

First Hybrid Reasoning Model Claude 3.7 Released! Comprehensive Leadership in Programming Capabilities, with Controllable Thinking Time

Claude 3.7 Sonnet:

First Hybrid Reasoning Model Claude 3.7 Released! Comprehensive Leadership in Programming Capabilities, with Controllable Thinking Time

(Prompt: Write a p5.js script that simulates 100 colorful balls bouncing inside a sphere. Each ball should leave a fading trail showing its recent movement path. The container sphere should rotate slowly. Ensure proper collision detection is implemented so the balls remain within the sphere.)

It can also be used to create video games.

First Hybrid Reasoning Model Claude 3.7 Released! Comprehensive Leadership in Programming Capabilities, with Controllable Thinking Time

As a hybrid reasoning model, it features two thinking modes:

Near-real-time responses & extensive, step-by-step thinking.

In Extended Thinking mode, it offers additional improvements in mathematics, physics, instruction following, coding, and other tasks.

First Hybrid Reasoning Model Claude 3.7 Released! Comprehensive Leadership in Programming Capabilities, with Controllable Thinking Time

API users can even precisely control the model’s thinking time. It is now available across all platforms, including Amazon Bedrock and Google Cloud. Extended Thinking mode is available on all plans except the free tier.

First Hybrid Reasoning Model Claude 3.7 Released! Comprehensive Leadership in Programming Capabilities, with Controllable Thinking Time

Model upgrades without price increases:

Pricing remains the same as previous generations: $3 per million input tokens and $15 per million output tokens (including thinking tokens).

Additionally, they released their first coding tool, Claude Code:

It can complete tasks that previously required over 45 minutes of manual work in a single go.

First Hybrid Reasoning Model Claude 3.7 Released! Comprehensive Leadership in Programming Capabilities, with Controllable Thinking Time

Netizens have expressed their eagerness to start using it immediately.

First Hybrid Reasoning Model Claude 3.7 Released! Comprehensive Leadership in Programming Capabilities, with Controllable Thinking Time

But why the 3.7 version?

First Hybrid Reasoning Model Claude 3.7 Released! Comprehensive Leadership in Programming Capabilities, with Controllable Thinking Time

Claude 3.7 Sonnet: The First Hybrid Reasoning Model

Anthropic first stated that the philosophy behind developing Claude 3.7 Sonnet differs from other reasoning models on the market.

Analogous to the human brain having both rapid response and deep thinking capabilities, they believe reasoning should be a comprehensive capability of frontier models rather than entirely separate models.

This philosophy is reflected in several aspects of Claude 3.7 Sonnet.

First, Claude 3.7 Sonnet serves as both a standard LLM and a reasoning model:

You can choose when to let the model answer normally and when to let it think longer before responding.

In Standard mode, Claude 3.7 Sonnet is an upgrade over Claude 3.5 Sonnet. In Extended Thinking mode, it engages in self-reflection before answering, thereby improving its performance in mathematics, physics, instruction following, coding, and many other tasks.

First Hybrid Reasoning Model Claude 3.7 Released! Comprehensive Leadership in Programming Capabilities, with Controllable Thinking Time

The prompting approach is similar across both modes.

This illustrates the difference between responses from the two thinking modes on a code generation task.

First Hybrid Reasoning Model Claude 3.7 Released! Comprehensive Leadership in Programming Capabilities, with Controllable Thinking Time

Second, control the thinking budget, trading speed and cost for answer quality.

API users can instruct Claude to limit its thinking to no more than N tokens. The value of N can be any number, but the output must not exceed 128K tokens.

The model’s performance depends on the allowed token count for thinking. For example, see the results chart from AIME 2024 below.

First Hybrid Reasoning Model Claude 3.7 Released! Comprehensive Leadership in Programming Capabilities, with Controllable Thinking Time

Third, the optimization focus has shifted toward real-world tasks that better reflect user needs, with less emphasis on optimizing for math and computer science competition problems.

First Hybrid Reasoning Model Claude 3.7 Released! Comprehensive Leadership in Programming Capabilities, with Controllable Thinking Time

△ Solving practical probability problems

Regarding coding tasks, netizens were impressed by its “vibe coding” capabilities (AI-assisted coding for non-developers).

One user’s test revealed that it can indeed solve problems that other models cannot. > Can you write the most complex fabric simulation using p5js?

Grok 3 and o1 pro yielded no usable results, whereas Claude 3.7 Sonnet performed as follows:

First Hybrid Reasoning Model Claude 3.7 Released! Comprehensive Leadership in Programming Capabilities, with Controllable Thinking Time

It achieved State-of-the-Art (SOTA) performance in agent tool usage.

First Hybrid Reasoning Model Claude 3.7 Released! Comprehensive Leadership in Programming Capabilities, with Controllable Thinking Time

Claude 3.7 Sonnet excels in instruction following, general reasoning, multimodal capabilities, and agent coding, while Extended Thinking provides significant boosts in mathematics and science.

First Hybrid Reasoning Model Claude 3.7 Released! Comprehensive Leadership in Programming Capabilities, with Controllable Thinking Time

Beyond traditional benchmarks, it even surpassed all previous models in a Pokémon game test.

The team equipped Claude with basic memory, screen pixel input, and function calls for key presses and screen navigation, allowing it to exceed typical context limits, play continuously, and interact tens of thousands of times.

Final experiments showed it was the best-performing Sonnet model to date; it successfully battled three Pokémon Gym Leaders (game bosses) and won their badges. In contrast, Claude 3.0 Sonnet couldn’t even leave the house in Pallet Town where the story begins.

The x-axis represents the number of interactions completed by Claude while playing; the y-axis indicates significant milestones in the game, such as collecting certain items, navigating to specific areas, and defeating bosses.

First Hybrid Reasoning Model Claude 3.7 Released! Comprehensive Leadership in Programming Capabilities, with Controllable Thinking Time

For this release, researchers used parallel test-time computation to enhance model performance.

Their method involves sampling multiple independent reasoning processes and selecting the best answer without prior knowledge of the correct one. One approach uses majority voting; choosing the most common answer as the “best.” Another method employs another language model (such as a second instance of Claude) to review its own work or learn from a scoring function, then select what it deems the best answer.

This approach resulted in remarkable improvements on the GPQA evaluation.

GPQA is a set of challenging questions in biology, chemistry, and physics. Under conditions equivalent to 256 independent samples, using a learned scoring model, and with a maximum thinking cost of 64 tokens, Claude 3.7 Sonnet achieved an 84.8% GPQA score (including a 96.5% score on the physics subset).

First Hybrid Reasoning Model Claude 3.7 Released! Comprehensive Leadership in Programming Capabilities, with Controllable Thinking Time

Additionally, Claude 3.7 Sonnet makes more nuanced distinctions between harmful and benign requests, reducing unnecessary refusals by 45% compared to its predecessor.

First Hybrid Reasoning Model Claude 3.7 Released! Comprehensive Leadership in Programming Capabilities, with Controllable Thinking Time

Claude’s First Coding Tool Debuts

Claude Code, based on features revealed by the official team, can search and read code, edit files, write and run tests, commit and push code to GitHub, and use command-line tools.

First Hybrid Reasoning Model Claude 3.7 Released! Comprehensive Leadership in Programming Capabilities, with Controllable Thinking Time

It is currently an early preview version that can run directly in the terminal.

First Hybrid Reasoning Model Claude 3.7 Released! Comprehensive Leadership in Programming Capabilities, with Controllable Thinking Time

In early tests, Claude Code completed tasks that typically require over 45 minutes of manual operation in one go, thereby reducing development time and overhead.

Over the coming weeks, they plan to continuously improve it based on usage data—

Such as enhancing tool call reliability, adding support for long-running commands, improving in-app rendering, and expanding Claude’s understanding of its own capabilities.

Additionally, they have improved the coding experience on Claude.ai. GitHub integration is now available across all Cl

available on the Aude platform, allowing developers to directly connect their code repositories to Claude.

References

  1. 1894110678027571412 — x.com/deedydas/status/1894110678027571412
  2. 1894106278185898489 — x.com/_akhaliq/status/1894106278185898489