Anthropic unveils Claude 4 models with coding and agentic AI upgrades
Ben Wodecki
May 23, 2025 10:30 AM
Anthropic's Claude app displayed on a smartphone

OpenAI rival Anthropic has unveiled Claude 4, its next-gen foundation model featuring improved code capabilities, advanced reasoning, and tools for users to build their own AI agents.

Anthropic unveiled two new models in its Claude family: Claude Opus 4 and Sonnet 4. Opus is the largest, with the startup claiming it’s “the world’s best coding model”, while Sonnet 4 is designed to be a significant upgrade to the prior Sonnet 3.7.

The latest releases generate responses far faster despite extended reasoning. They’re also compatible with new offerings in Anthropic’s API that let developers build more powerful AI agents.

“These models are a large step toward the virtual collaborator—maintaining full context, sustaining focus on longer projects, and driving transformational impact,” Anthropic’s announcement page reads. “We're excited to see what you'll create.”

Anthropic doubles down on agentic AI

The Amazon- and Google-backed startup is the latest firm to place its bets on agentic AI: agent-based systems capable of performing tasks on behalf of a user.

The new models are designed to think more extensively when using tools. Anthropic says they are 65% less likely to use shortcuts or loopholes to complete tasks than the previous-generation Sonnet 3.7.

Claude 4 models also come with improved memory to help with agentic tasks. When given access to local files, Opus 4 can maintain what Anthropic described as ‘memory files’ that record key information, which the startup said helps with long-term task awareness and performance.

To demonstrate these agentic capabilities, Anthropic showed Claude Opus 4 autonomously playing Game Boy Color-era Pokémon games, such as Gold and Silver.

Anthropic's Claude 4 improved memory in action: When given access to local files, the model records key information to help improve its gameplay | Credit: Anthropic

To further boost their ability to handle tasks as AI agents, the new Claude 4 models feature improved code handling capabilities.

The models, which dropped days after rival OpenAI unveiled a cloud-based software engineering agent dubbed Codex, can handle background tasks via GitHub Actions and support integrations with platforms like VS Code and JetBrains.

The models' coding abilities were touted by GitHub, which plans to use Claude Sonnet 4 to power a new coding agent in GitHub Copilot, its AI-powered coding support tool.

On release, Claude Opus 4 took the top spot on the SWE-bench and Terminal-bench benchmarks, which are used to evaluate an AI model's coding performance.

Graph showing Claude 4's performance on the SWE-bench benchmark for AI coding tasks

Anthropic said the model “delivers sustained performance on long-running tasks that require focused effort and thousands of steps, with the ability to work continuously for several hours”.

“These models advance our customers' AI strategies across the board: Opus 4 pushes boundaries in coding, research, writing, and scientific discovery, while Sonnet 4 brings frontier performance to everyday use cases as an instant upgrade from Sonnet 3.7,” the startup said.

Ben Wodecki
Senior Reporter Capacity Media
