Claude Sonnet 5: Stronger Agent Capabilities at a More Usable Price

Anthropic released Claude Sonnet 5 on June 30, 2026. It is the next generation of the Sonnet line. Its positioning is not simply “the most capable model,” but a model that brings stronger agent, coding, and tool-use ability into a price range that is easier to use every day.

According to Anthropic, Sonnet 5 is clearly stronger than Sonnet 4.6 in reasoning, tool use, coding, and knowledge work. Some tasks approach Opus 4.8, while the price is lower. For developers, the most direct change is that Claude Code, Claude Platform, and the Claude API can all use claude-sonnet-5.

Availability and pricing

Claude Sonnet 5 is available across Claude plans:

Free and Pro users use Sonnet 5 by default.
Max, Team, and Enterprise users can use Sonnet 5.
Claude Code and Claude Platform support Sonnet 5.
Developers can use the model name claude-sonnet-5 in the Claude API.

The API price uses a lower introductory period before moving to the standard rate:

Time	Input price	Output price
Before August 31, 2026	$2 / million tokens	$10 / million tokens
Standard price afterward	$3 / million tokens	$15 / million tokens

Anthropic also says Sonnet 5 uses an updated tokenizer. The same input may map to more tokens under the new tokenizer, roughly 1.0 to 1.35 times the old count depending on content type. One purpose of the introductory price is to make the migration from Sonnet 4.6 smoother.

Key improvement: Sonnet becomes a practical execution layer

The keyword for Sonnet 5 is agent. Anthropic emphasizes that it can plan, use tools such as browsers and terminals, and keep working through longer task chains.

For developers and enterprise users, that matters in several ways:

Coding tasks can move beyond snippet completion toward multi-step changes, debugging, and validation.
Tool use is more stable, making it more suitable for browsers, terminals, enterprise apps, and internal workflows.
At medium effort, Sonnet 5 offers better cost-performance; at higher effort, some tasks can approach Opus 4.8.
For Claude Code users, it feels more like a daily execution model than an expensive option reserved for a few hard tasks.

Anthropic cites early partner feedback from complex codebases, brownfield projects, insurance workflows, legal research, and data analysis. The recurring theme is that Sonnet 5 can follow tasks more completely instead of stopping midway or only giving advice.

Safety assessment: safer, but not risk-free

Anthropic’s safety assessment has two sides.

On one hand, Sonnet 5 is steadier than Sonnet 4.6. It improves agent safety, refusal of malicious requests, prompt-injection resistance, hallucination reduction, and lower sycophancy. Anthropic’s automated behavior audits also show a lower rate of undesirable behavior than Sonnet 4.6.

On the other hand, it is not as steady as stronger models such as Opus 4.8 or Mythos Preview. In the same safety evaluation category, Sonnet 5 still has a higher undesirable-behavior rate than those models.

For cybersecurity, Anthropic says it did not intentionally train Sonnet 5 on cybersecurity tasks. It can handle some ordinary, benign security work, but in potentially dangerous capability evaluations it is clearly weaker than Opus 4.8 and Mythos 5. In the Firefox exploit evaluation mentioned by Anthropic, Sonnet 5 did not produce a complete usable exploit, though it had a higher partial success rate than Sonnet 4.6.

Because of that, Sonnet 5 has cybersecurity safeguards enabled by default. These safeguards detect and block dangerous cybersecurity uses in real time. Their strength is similar to Claude Opus 4.7 and Opus 4.8, but lower than the stricter safeguards used on Fable 5.

What to watch when migrating

If you already use the Claude API or Claude Code, Sonnet 5 is a direct upgrade candidate for Sonnet 4.6, but there are three details worth checking first.

First, change the model name to:

1

claude-sonnet-5

Second, do not estimate cost from unit price alone. Sonnet 5’s standard price is higher than the launch discount, and the tokenizer change may increase token usage for some inputs. For long-context tasks, log analysis, and codebase scans, re-estimate costs with your own real requests.

Third, effort settings affect cost-performance. Sonnet 5 can cover a wider range of cost and capability at different effort levels. Daily coding, documentation, and lightweight agent tasks may not need the highest effort. Save higher effort for tasks that truly require long planning and multi-tool collaboration.

Its relationship with Opus 4.8

Sonnet 5 does not replace Opus 4.8. A better way to understand it is that part of the agent capability that used to feel closer to Opus has moved down into the Sonnet tier.

For tasks that need the highest ceiling, especially complex research, deep reasoning, long-chain agents, and difficult coding, Opus 4.8 still has a place. For daily throughput, price, and stable execution, Sonnet 5 is better suited as the default model.

That is the most important signal in this release. Sonnet is no longer just the “fast and affordable” middle tier. It is starting to take on a large amount of real execution work. For enterprises and developers, model selection may shift from “default to Opus and downgrade if too expensive” to “default to Sonnet 5 and upgrade to Opus when needed.”

Summary

Claude Sonnet 5 shows Anthropic moving agent capability from flagship models into more commonly used model tiers. Its main selling point is not a single benchmark, but more complete task execution, acceptable pricing, and broader default availability.

In the short term, three scenarios are worth watching:

Multi-step coding, debugging, and codebase modification in Claude Code.
Internal enterprise agents, data analysis, documentation, and workflow automation.
API applications that need a finer balance between cost and capability.

If you already use Sonnet 4.6, Sonnet 5 is worth testing. If you currently rely on Opus 4.8, you can move some medium-complexity tasks down to Sonnet 5 and compare cost and completion rate.

Practical migration guide: test by task tier

If your team already uses Sonnet 4.6, do not switch every call to Sonnet 5 on day one. A steadier approach is to divide tasks by difficulty and risk: lightweight Q&A, summarization, code explanation, single-file edits, multi-file refactors, long-running agents, and automation with tool use. Prepare samples for each group.

In the first round, measure completion rate and rework rate, not just whether the answer feels smarter. In Claude Code, for example, compare whether it misses fewer tests, understands repository structure more reliably, and asks questions when uncertain.

In the second round, measure cost. The tokenizer change can make the same prompt produce more tokens, so use real logs instead of mental math from published prices. Long-context, document-analysis, and codebase tasks are especially sensitive to token changes.

Only then decide the default model. My suggestion is to make Sonnet 5 the first candidate for daily agent and coding tasks, keep Opus or Fable for retries and high-value tasks, and keep Haiku for lightweight batch work. That makes migration smoother and helps identify where the improvement is real.

Metrics to track

When testing Sonnet 5, record four metrics: first-pass task completion rate, manual editing time, tool-call failure rate, and cost per task. Benchmarks alone are easy to misread because real team tasks mix code, documents, environments, permissions, and context memory.

If Sonnet 5 is more stable than the older model for a task class, migrate that class first. If it only answers more verbosely while making bolder edits, keep human approval or use more conservative prompts.

Original: Introducing Claude Sonnet 5