AI: Jobs, Power & Money
2 May

AISI: GPT-5.5 matches Mythos on 32-step attack

3 min read
15:17 UTC

The UK AI Security Institute published its evaluation of OpenAI's GPT-5.5 on 1 May, finding the model scored 71.4 per cent on expert-level capture-the-flag tasks and cleared AISI's 32-step enterprise-network attack range, becoming the second model after Anthropic's Mythos to do so.

Economic · Developing
Key takeaway

Two frontier AI models can now autonomously execute 32-step attack chains, and the supervisory framework was built for one.

The UK AI Security Institute (AISI) published its evaluation of OpenAI's GPT-5.5 on 1 May 2026. The model scored 71.4 per cent on expert-level capture-the-flag tasks against Mythos's 73 per cent, and completed AISI's 32-step "The Last Ones" enterprise-network attack range end-to-end, becoming the second model after Anthropic's Claude Mythos Preview to clear the threshold. The agentic capability AISI estimated at 20 hours of trained-human work in its earlier Mythos evaluation is no longer exclusive to one frontier laboratory.

The supervisory consequence runs straight into existing rules. The Bank of England Financial Policy Committee's April directive on agentic AI risk in payments and financial markets was scoped around a single frontier model. Treasury Secretary Scott Bessent and Federal Reserve Chair Jerome Powell convened five Wall Street CEOs at Treasury on 8 April over Mythos's capabilities. The Glasswing restricted-access architecture, under which Anthropic distributed Mythos to 17 partners on coordinated-disclosure terms, has no equivalent for GPT-5.5. Financial firms that built risk frameworks around Mythos's specific behavioural profile must now extend them to a model with different safety training and a different deployment surface.

That AISI's threshold was cleared again in roughly four weeks suggests the 32-step capability rests on underlying compute and post-training approach rather than on a unique architectural breakthrough. Expect a third frontier model to clear it within two quarters; AISI's evaluation cadence is the constraint, not lab capacity. The supervisory premise the BoE FPC framed in April is one month old and already outdated by a model release.

For the workforce displacement argument, the 32-step autonomous capability matches the operational profile of a junior analyst, paralegal, or software engineer. Jamie Dimon told JPMorgan's February investor meeting that the bank had "displaced people from AI"; $600 million annually now goes to retraining. AISI has now confirmed that two firms can sell that capability into the same financial-supervisory void. For account holders and pension contributors, the practical question is whether the FCA can supervise a payments system in which two competing AI models can autonomously execute 32-step operations when its April directive was scoped around just one.

Deep Analysis

In plain English

The UK's AI Security Institute is a government body that tests how capable AI models are at potentially dangerous tasks, including hacking into computer networks. In May 2026, it confirmed that OpenAI's newest model, GPT-5.5, can autonomously complete a 32-step process to attack and compromise an enterprise computer network. It scored 71.4% on expert-level tests. The only previous model that could do this was Anthropic's Claude Mythos, which scored 73%. Bank of England and FCA rules issued in April to manage AI risk in financial firms were written on the assumption that only Anthropic's Mythos had cleared this capability threshold. GPT-5.5 cleared the same threshold on 1 May, making both sets of rules outdated within weeks of publication. For the AI jobs beat, the agentic capability that makes AI useful for complex multi-step work tasks (the same feature that makes it capable of network attacks) is now available from at least two competing suppliers.

Root Causes

The AISI benchmark was designed in Q3 2025 when Anthropic's Mythos was the only model approaching the 32-step capability threshold. The evaluation framework was calibrated to that frontier, using a custom enterprise network range ('The Last Ones') built to challenge Mythos specifically.

OpenAI's GPT-5.5 clearing the same benchmark within weeks of Mythos is not coincidental: frontier model capability timelines have compressed from 18-24 months per generation to 6-9 months, driven by the same $190-200 billion capex programmes at Microsoft, Amazon, and Google. This proliferation past the benchmark is a direct output of the infrastructure race described in events 2, 3, and 5 of this update.

The regulatory lag is structural: governments commission safety evaluations on a quarterly cycle, but capability jumps now occur on a monthly cycle. AISI published its Mythos evaluation in April 2026; GPT-5.5 cleared the same threshold by 1 May, a six-week interval between regulatory assessment and frontier proliferation.

What could happen next?
  • Consequence

    The Bank of England FPC and FCA will be required to revise their April AI directives to address multi-model capability rather than single-frontier-model risk, adding regulatory complexity and likely delaying implementation timelines.

    Immediate · 0.8
  • Risk

    Financial institutions holding Glasswing-level AI access to either model face a materially different threat model than the single-supplier architecture regulators assumed in April; internal AI governance frameworks built around that assumption are now inadequate.

    Short term · 0.72
  • Precedent

    The six-week gap between the AISI Mythos evaluation and GPT-5.5 clearing the same threshold establishes that capability-based AI regulation is structurally unable to keep pace with frontier development under current evaluation timelines.

    Medium term · 0.85
First Reported In

Update #8 · Beijing court bans AI sackings as Big Tech burns cash

AISI · 2 May 2026
Different Perspectives
UK financial regulators (BoE FPC / FCA)
The Bank of England's April FPC directive on agentic AI in payments was scoped around one frontier model; AISI confirmed a second model cleared the same 32-step threshold on 1 May. The supervisory architecture is one model behind the capability it was built to contain.
Indian IT sector workers (TCS, Infosys, Wipro)
TCS posted its first annual revenue decline in the modern era, Infosys shed 8,400 workers in a quarter, and Wipro hit its zero-fresher target. Western Big Tech's AI automation is cannibalising the offshored-services model that employs roughly five million Indian IT workers.
Chinese workers (Hangzhou and Beijing plaintiffs)
Workers Zhou and Liu won cases that established a two-court doctrinal chain: AI adoption is the employer's deliberate strategy, placing the cost of displacement on the employer rather than the worker. Any Chinese employee facing AI-driven dismissal now has a citable legal route that American, British, and European counterparts do not.
Chinese government, courts, and domestic employers
The Hangzhou rulings were released on Workers' Day eve alongside the Ministry of Human Resources' recognition of 42 new AI occupations. Domestic firms now face mandatory retraining obligations; the Orgvue estimate of 8-14 months added to displacement timelines will feature in employer compliance briefings throughout 2026.
EU regulators and European Parliament
The second Digital Omnibus trilogue collapsed without agreement on 28 April; the third is scheduled for 13 May with the binding employer AI-literacy obligation still contested. Brussels is arguing over a non-binding encouragement clause while Beijing's courts have already bound employers.
US legislators (Warner, Rounds, Hawley, Sanders)
Warner and Rounds produced the Economy of the Future Commission Act, the most concrete federal vehicle still moving, endorsed by the companies it would notionally regulate. The Sanders-AOC moratorium was killed by Democratic senators; the Hawley-Warner disclosure bill remains in committee with no floor date.