The UK AI Security Institute (AISI) published an independent evaluation of Anthropic's Claude Mythos Preview on 15 April 2026. On isolated capture-the-flag (CTF) tasks, Mythos scored above 85%, but rival frontier models, GPT-5.4, Claude Opus 4.6 and Codex 5.3, fell within 5 to 10 percentage points. No single-task superiority. In AISI's 32-step "The Last Ones" benchmark, however, Mythos autonomously completed a sequence the Institute estimates would take a trained human roughly 20 hours, without human prompting between steps.
AISI is the UK government body established to evaluate the safety of frontier AI models; its evaluation is the first external assessment of Mythos since Anthropic distributed restricted access to twelve founding partners under Project Glasswing on 8 April . Anthropic's marketing had emphasised thousands of zero-day vulnerabilities discovered by the model; Tom's Hardware on 9 April reported those claims rested on only 198 manual reviews . AISI's CTF findings partly vindicate that critique: Mythos is not dramatically more capable than competitors at short, bounded tasks.
The attack-chaining result is the capability that matters. Sustained autonomous execution over 32 steps and roughly 20 hours is the operational profile a trained human analyst, paralegal or junior engineer currently provides inside a bank, law firm or software team. It is also the profile the Scott Bessent and Jerome Powell emergency convening of Wall Street CEOs at Treasury on 8 April was called to assess . Treasury and the Fed convened promptly on a capability that federal agencies could not themselves verify; AISI's 20-hour-human-equivalent figure is the first external confirmation the convening was warranted on substance.
For the workforce implication, the relevant dimension is not Mythos's cybersecurity reach but its ability to replace trained-human throughput at chain-of-task scale. That capability is what JPMorgan CEO Jamie Dimon described in February when he told the bank's investor meeting that AI has led to internal redeployment, covered elsewhere in this update. Every original Glasswing partner, and the additional five named in Anthropic's 7 April system card, will have to integrate the attack-chain profile into internal risk frameworks during live deployment.
The evaluation was accessed via a third-party summary from Results Sense rather than AISI's primary publication, so specific scores should be verified against the Institute's direct release when it becomes available. The methodology point, however, is solidly established: Mythos's material advantage is durability, not speed, and durability is the AI capability that most directly substitutes for salaried human labour.
