UK AI Security Institute

UK government body evaluating frontier AI safety; confirmed Anthropic Mythos can sustain 20 hours of autonomous attack work in April 2026.

Last refreshed: 16 April 2026 · Appears in 1 active topic

Key Question

What did the UK safety body find when it tested Anthropic's restricted Mythos model?

Common Questions
What did AISI find in its Claude Mythos evaluation?
AISI found that Mythos shows no single-task superiority over competitors but can autonomously complete a 32-step attack chain estimated to take a trained human roughly 20 hours. Its scores on isolated CTF tasks were above 85%.
Source: UK AI Security Institute (via Results Sense)
What is the UK AI Security Institute and what does it do?
AISI was established after the 2023 Bletchley Park AI Safety Summit to independently evaluate frontier AI models. It has privileged access to unreleased models and publishes results for policymakers.
Source: UK Department for Science, Innovation and Technology
Why did the US Treasury hold an emergency meeting about AI in April 2026?
The Bessent-Powell meeting on 8 April was convened over AI cybersecurity risks federal agencies could not verify. AISI's 15 April evaluation confirmed the concern: Mythos can sustain 20 hours of autonomous attack work.
Source: UK AI Security Institute
Is Claude Mythos better than GPT-5.4 at hacking?
Not on single tasks. AISI found GPT-5.4 within 5 to 10 percentage points of Mythos on isolated CTF benchmarks; the Mythos advantage is in chained autonomous operations across 32+ steps.
Source: UK AI Security Institute

Background

The UK AI Security Institute (AISI) published an independent evaluation of Anthropic's Claude Mythos Preview on 15 April 2026, providing the first external confirmation that the model's attack-chaining capability is genuine. On isolated capture-the-flag tasks Mythos scored above 85%, with GPT-5.4, Claude Opus 4.6 and Codex 5.3 all within 5 to 10 percentage points, so AISI claimed no single-task superiority for Mythos. The significant finding came from AISI's 32-step "Last Ones" benchmark: Mythos autonomously completed an operation the Institute estimates would require a trained human roughly 20 hours, confirming durable autonomous execution.

AISI was established in November 2023 after the first international AI Safety Summit at Bletchley Park, under the UK Department for Science, Innovation and Technology. Its mandate is to evaluate frontier models for safety risks before or after deployment, with privileged access to unreleased systems. The April 2026 Mythos evaluation tested a model Anthropic had withheld from public release, demonstrating that access. The Institute works alongside the National Cyber Security Centre and international counterparts including the US AI Safety Institute.

For this beat, the AISI finding closes a loop opened by the Bessent-Powell emergency meeting of 8 April, which Treasury and the Federal Reserve convened over AI cybersecurity risks they could not themselves verify. The 20-hour autonomous-operation benchmark is the first independent confirmation that the convening was warranted on substance. The institutional contrast is notable: the UK has a standing independent evaluator that publishes results, while the US has ad hoc emergency convening with no public follow-up document.