Anthropic's latest flagship model, Claude Opus 4.7, has set a new industry record on SWE-bench Verified, the gold-standard benchmark for AI software engineering, with a score of 87.6%. That figure surpasses every previously published result from a publicly available model.

What SWE-bench measures

SWE-bench Verified tests whether an AI can resolve real GitHub issues from popular open-source repositories. Each task requires the model to read a bug report, understand the codebase, and write a patch that makes the tests reproducing the issue pass without breaking the project's existing test suite, all without human guidance.
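
To make the setup concrete, here is a minimal sketch of what a SWE-bench-style evaluation does for a single task. The repository path, patch file, and test command are hypothetical placeholders; the real harness is considerably more involved.

```python
import subprocess

def evaluate_task(repo_dir: str, model_patch: str, test_cmd: list[str]) -> bool:
    """Apply a model-generated patch and check whether the project's tests pass.

    repo_dir, model_patch, and test_cmd are illustrative placeholders,
    not the actual SWE-bench harness interface.
    """
    # Check that the model's proposed fix applies cleanly to a fresh checkout.
    check = subprocess.run(["git", "apply", "--check", model_patch], cwd=repo_dir)
    if check.returncode != 0:
        return False  # Patch does not even apply; task is unresolved.
    subprocess.run(["git", "apply", model_patch], cwd=repo_dir, check=True)

    # A task counts as resolved only if the suite passes after the patch:
    # tests reproducing the issue must now succeed, and previously passing
    # tests must not regress.
    result = subprocess.run(test_cmd, cwd=repo_dir)
    return result.returncode == 0
```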

Pricing and availability

Claude Opus 4.7 is priced at $5 per million input tokens and $25 per million output tokens — positioning it as a premium option for teams building agentic software workflows. It is available via the Anthropic API and Claude.ai Pro.
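
At those rates, per-task cost is easy to estimate. The sketch below assumes a hypothetical agentic run consuming 200,000 input tokens and 20,000 output tokens; the token counts are illustrative, not measured.

```python
INPUT_PRICE = 5.00 / 1_000_000    # dollars per input token
OUTPUT_PRICE = 25.00 / 1_000_000  # dollars per output token

# Hypothetical token usage for one multi-step bug-fixing run.
input_tokens = 200_000   # repository context, issue text, tool results
output_tokens = 20_000   # reasoning, patches, tool calls

cost = input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE
print(f"Estimated cost per run: ${cost:.2f}")  # -> $1.50
```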

What this means for developers

The result signals a qualitative shift in what AI agents can accomplish autonomously. At 87.6%, the model resolves nearly nine in ten of the benchmark's real-world issues without human intervention, a level of reliability that was considered out of reach just 18 months ago.

Anthropic describes Claude Opus 4.7 as purpose-built for long-horizon agentic tasks: multi-step coding projects, automated testing pipelines, and complex refactoring work that previously required experienced engineers.
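
As an illustration of what a single coding call might look like through the Anthropic API, here is a minimal sketch using the anthropic Python SDK. The model identifier string is an assumption based on the name in this article, and the prompt is purely illustrative.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# "claude-opus-4-7" is an assumed identifier based on the model name above;
# check Anthropic's published model list for the actual string.
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": (
                "Here is a failing test and the relevant module. "
                "Propose a minimal patch that makes the test pass.\n\n"
                "<test and source code would go here>"
            ),
        }
    ],
)
print(response.content[0].text)
```

In an agentic workflow, a loop like this would typically be combined with tool use so the model can read files, run tests, and iterate on its patch rather than answering in one shot.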