AI: Jobs, Power & Money
2 May

AISI: GPT-5.5 matches Mythos on 32-step attack

3 min read
15:17 UTC

The UK AI Security Institute published its evaluation of OpenAI's GPT-5.5 on 1 May, finding the model scored 71.4 per cent on expert-level capture-the-flag tasks and cleared AISI's 32-step enterprise-network attack range, becoming the second model after Anthropic's Mythos to do so.

Economic · Developing
Key takeaway

Two frontier AI models can now autonomously execute 32-step attack chains, and the supervisory framework was built for one.

The UK AI Security Institute (AISI) published its evaluation of OpenAI's GPT-5.5 on 1 May 2026. The model scored 71.4 per cent on expert-level capture-the-flag tasks against Mythos's 73 per cent, and completed AISI's 32-step "The Last Ones" enterprise-network attack range end-to-end, becoming the second model after Anthropic's Claude Mythos Preview to clear the threshold. The agentic capability AISI estimated at 20 hours of trained-human work in its earlier Mythos evaluation is no longer exclusive to one frontier laboratory.

The supervisory consequence runs straight into existing rules. The Bank of England Financial Policy Committee's April directive on agentic AI risk in payments and financial markets was scoped around a single frontier model. Treasury Secretary Scott Bessent and Federal Reserve Chair Jerome Powell convened five Wall Street CEOs at Treasury on 8 April over Mythos's capabilities. The Glasswing restricted-access architecture, under which Anthropic distributed Mythos to 17 partners on coordinated-disclosure terms, has no equivalent for GPT-5.5. Financial firms that built risk frameworks around Mythos's specific behavioural profile must now extend them to a model with different safety training and a different deployment surface.

That AISI's threshold was cleared again in roughly four weeks suggests the 32-step capability rests on underlying compute and post-training approach rather than on a unique architectural breakthrough. Expect a third frontier model to clear it within two quarters; AISI's evaluation cadence is the constraint, not lab capacity. The supervisory premise the BoE FPC framed in April is one month old and already outdated by a model release.

For the workforce displacement argument, the 32-step autonomous capability matches the operational profile of a junior analyst, paralegal, or software engineer. Jamie Dimon told JPMorgan's February investor meeting that the bank had "displaced people from AI"; $600 million annually now goes to retraining. AISI has now confirmed that two firms can sell that capability into the same financial-supervisory void. For account holders and pension contributors, the practical question is whether the FCA can supervise a payments system in which two competing AI models can autonomously execute 32-step operations when its April directive was scoped around just one.

Deep Analysis

In plain English

The UK's AI Security Institute is a government body that tests how capable AI models are at potentially dangerous tasks, including hacking into computer networks. In May 2026, it confirmed that OpenAI's newest model, GPT-5.5, can autonomously complete a 32-step process to attack and compromise an enterprise computer network. It scored 71.4% on expert-level tests. The only previous model that could do this was Anthropic's Claude Mythos, which scored 73%. Bank of England and FCA rules issued in April to manage AI risk in financial firms were written on the assumption that only Anthropic's Mythos had cleared this capability threshold. GPT-5.5 cleared the same threshold on 1 May, making both sets of rules outdated within weeks of publication. For the AI jobs beat, the agentic capability that makes AI useful for complex multi-step work tasks (the same feature that makes it capable of network attacks) is now available from at least two competing suppliers.

Root Causes

The AISI benchmark was designed in Q3 2025 when Anthropic's Mythos was the only model approaching the 32-step capability threshold. The evaluation framework was calibrated to that frontier, using a custom enterprise network range ('The Last Ones') built to challenge Mythos specifically.

OpenAI's GPT-5.5 clearing the same benchmark within weeks of Mythos is not coincidental: frontier model capability timelines have compressed from 18-24 months per generation to 6-9 months, driven by the same $190-200 billion capex programmes at Microsoft, Amazon, and Google. This proliferation past the benchmark is a direct output of the infrastructure race described in events 2, 3, and 5 of this update.

The regulatory lag is structural: governments commission safety evaluations on a quarterly cycle, but capability jumps now occur on a monthly cycle. AISI published its Mythos evaluation in April 2026; GPT-5.5 cleared the same threshold by 1 May, a six-week interval between regulatory assessment and frontier proliferation.

What could happen next?
  • Consequence

    The Bank of England FPC and FCA will be required to revise their April AI directives to address multi-model capability rather than single-frontier-model risk, adding regulatory complexity and likely delaying implementation timelines.

    Immediate · 0.8
  • Risk

    Financial institutions holding Glasswing-level AI access to either model face a materially different threat model than the single-supplier architecture regulators assumed in April; internal AI governance frameworks built around that assumption are now inadequate.

    Short term · 0.72
  • Precedent

    The six-week gap between the AISI Mythos evaluation and GPT-5.5 clearing the same threshold establishes that capability-based AI regulation is structurally unable to keep pace with frontier development under current evaluation timelines.

    Medium term · 0.85
First Reported In

Update #8 · Beijing court bans AI sackings as Big Tech burns cash

AISI · 2 May 2026
Different Perspectives
UK financial regulators (BoE FPC / FCA)
The Bank of England's April FPC directive on agentic AI in payments was scoped around one frontier model; AISI confirmed a second model cleared the same 32-step threshold on 1 May. The supervisory architecture is one model behind the capability it was built to contain.
Indian IT sector workers (TCS, Infosys, Wipro)
TCS posted its first annual revenue decline in the modern era, Infosys shed 8,400 workers in a quarter, and Wipro hit its zero-fresher target. Western Big Tech's AI automation is cannibalising the offshored-services model that employs roughly five million Indian IT workers.
Chinese workers (Hangzhou and Beijing plaintiffs)
Workers Zhou and Liu won cases that established a two-court doctrinal chain: AI adoption is the employer's deliberate strategy, placing the cost of displacement on the employer rather than the worker. Any Chinese employee facing AI-driven dismissal now has a citable legal route that American, British, and European counterparts do not.
Chinese government, courts, and domestic employers
The Hangzhou rulings were released on Workers' Day eve alongside the Ministry of Human Resources' recognition of 42 new AI occupations. Domestic firms now face mandatory retraining obligations; the Orgvue estimate of 8-14 months added to displacement timelines will feature in employer compliance briefings throughout 2026.
EU regulators and European Parliament
The second Digital Omnibus trilogue collapsed without agreement on 28 April; the third is scheduled for 13 May with the binding employer AI-literacy obligation still contested. Brussels is arguing over a non-binding encouragement clause while Beijing's courts have already bound employers.
US legislators (Warner, Rounds, Hawley, Sanders)
Warner and Rounds produced the Economy of the Future Commission Act, the most concrete federal vehicle still moving, endorsed by the companies it would notionally regulate. The Sanders-AOC moratorium was killed by Democratic senators; the Hawley-Warner disclosure bill remains in committee with no floor date.