Claude Opus 4.6

Anthropic's publicly released frontier model; scored within 5–10 percentage points of Claude Mythos on AISI's CTF benchmarks.

Last refreshed: 16 April 2026

Key Question

How close is Claude Opus 4.6 to the restricted Mythos model that spooked the US Treasury?

Common Questions
How does Claude Opus 4.6 compare to Mythos on cybersecurity benchmarks?
AISI's April 2026 evaluation found Claude Opus 4.6 within 5 to 10 percentage points of Mythos on isolated CTF tasks. The gap opens on multi-step autonomous operations, where Mythos completed a 32-step chain estimated to take a human 20 hours. Opus 4.6 was not evaluated on that test in the same setting, so no direct comparison exists. Source: UK AI Security Institute
What is Claude Opus 4.6 used for?
Opus 4.6 is Anthropic's most capable publicly available model, used by API developers, enterprise customers, and Claude subscribers. It was one of three comparison models in AISI's April 2026 evaluation of the restricted Mythos Preview. Source: Anthropic

Background

Claude Opus 4.6 is the most capable publicly available model in Anthropic's Claude line-up, released before the restricted Claude Mythos Preview. On 15 April 2026, the UK AI Security Institute (AISI) published a comparative evaluation of Mythos in which Claude Opus 4.6 was one of three comparison models, alongside GPT-5.4 and Codex 5.3, used to benchmark Mythos on isolated capture-the-flag (CTF) cybersecurity tasks. Mythos scored above 85% on those tasks; Opus 4.6 fell within 5 to 10 percentage points, as did GPT-5.4 and Codex 5.3. The comparison indicates that Mythos holds only a narrow lead over public frontier models on single tasks, not a decisive one.

Opus 4.6 is available to API and consumer subscribers and represents Anthropic's public capability ceiling. Unlike Mythos, which was withheld from release, Opus 4.6 is the model enterprises and developers deploy at scale. Its AISI benchmark position, within 5 to 10 percentage points of a restricted model on discrete tasks, signals that the public frontier sits close to the restricted frontier on single-task performance. The gap opens on multi-step autonomous operations: Mythos completed AISI's 32-step benchmark, while Opus 4.6 was not evaluated on it in the same setting.

For this beat, the significance is that JPMorgan Chase and other Project Glasswing partners have privileged access to Mythos while the rest of the market uses Opus 4.6 and its equivalents. If Mythos's advantage is in 20-hour autonomous operation chains rather than single tasks, the Glasswing access gap is more consequential than the public AISI CTF scores suggest.