15 May

AISI: GPT-5.5 matches Mythos on 32-step attack

15:55 UTC

The UK AI Security Institute published its evaluation of OpenAI's GPT-5.5 on 1 May, finding the model scored 71.4 per cent on expert-level capture-the-flag tasks and cleared AISI's 32-step enterprise-network attack range, becoming the second model after Anthropic's Mythos to do so.

Economic · Developing

Key takeaway

Two frontier AI models can now autonomously execute 32-step attack chains, and the supervisory framework was built for one.

The UK AI Security Institute (AISI) published its evaluation of OpenAI's GPT-5.5 on 1 May 2026¹. The model scored 71.4 per cent on expert-level capture-the-flag tasks against Mythos's 73 per cent, and completed AISI's 32-step "The Last Ones" enterprise-network attack range end-to-end, becoming the second model after Anthropic's Claude Mythos Preview to clear the threshold. The agentic capability AISI estimated at 20 hours of trained-human work in its earlier Mythos evaluation is no longer exclusive to one frontier laboratory.

The supervisory consequence runs straight into existing rules. The Bank of England Financial Policy Committee directive in April on agentic AI risk in payments and financial markets was scoped around a single frontier model. Treasury Secretary Scott Bessent and Federal Reserve Chair Jerome Powell convened five Wall Street CEOs at Treasury on 8 April over Mythos's capabilities. The Glasswing restricted-access architecture, where Anthropic distributed Mythos to 17 partners under coordinated-disclosure terms, has no equivalent for GPT-5.5. Financial firms that built risk frameworks around Mythos's specific behavioural profile must now extend them to a model with different safety training and a different deployment surface.

That a second model cleared AISI's threshold within roughly four weeks suggests the 32-step capability derives from underlying compute and post-training approach rather than a unique architectural breakthrough. Expect a third frontier model to clear it within two quarters; the constraint is AISI's evaluation cadence, not lab capacity. The supervisory premise the BoE FPC framed in April is one month old and already outdated by a model release.

For the workforce displacement argument, the 32-step autonomous capability is the operational profile of a junior analyst, paralegal, or software engineer. Jamie Dimon told JPMorgan's February investor meeting the bank had "displaced people from AI"; $600 million annually now goes to retraining. AISI has now confirmed two firms can sell that capability into the same financial-supervisory void. For account holders and pension contributors, the practical question is whether the FCA can supervise a payments system in which two competing AI models can autonomously execute 32-step operations when its April directive was scoped around just one.

Deep Analysis

In plain English

The UK's AI Security Institute is a government body that tests how capable AI models are at potentially dangerous tasks, including hacking into computer networks. In May 2026, it confirmed that OpenAI's newest model, GPT-5.5, can autonomously complete a 32-step process to attack and compromise an enterprise computer network. It scored 71.4% on expert-level tests. The only previous model that could do this was Anthropic's Claude Mythos, which scored 73%. Bank of England and FCA rules issued in April to manage AI risk in financial firms were written assuming only Anthropic's Mythos had cleared this capability threshold. GPT-5.5 cleared the same threshold on 1 May, making both sets of rules outdated within weeks of publication. For the AI jobs beat, the point is that the agentic capability behind these network attacks is the same feature that makes AI useful for complex multi-step work tasks, and it is now available from at least two competing suppliers.

Root Causes

The AISI benchmark was designed in Q3 2025 when Anthropic's Mythos was the only model approaching the 32-step capability threshold. The evaluation framework was calibrated to that frontier, using a custom enterprise network range ('The Last Ones') built to challenge Mythos specifically.

OpenAI's GPT-5.5 clearing the same benchmark within weeks of Mythos is not coincidental: frontier model capability timelines have compressed from 18-24 months per generation to 6-9 months, driven by the same $190-200 billion capex programmes at Microsoft, Amazon, and Google. The spread of benchmark-clearing capability to a second lab is a direct output of the infrastructure race described in events 2, 3, and 5 of this update.

The regulatory lag is structural: governments commission safety evaluations on a quarterly cycle, but capability jumps now occur on a monthly cycle. AISI published its Mythos evaluation in April 2026; GPT-5.5 cleared the same threshold by 1 May, an interval of no more than a few weeks between regulatory assessment and frontier proliferation.

What could happen next?

Consequence
The Bank of England FPC and FCA will be required to revise their April AI directives to address multi-model capability rather than single-frontier-model risk, adding regulatory complexity and likely delaying implementation timelines.
Immediate · 0.8
Risk
Financial institutions holding Glasswing-level AI access to either model face a materially different threat model than the single-supplier architecture regulators assumed in April; internal AI governance frameworks built around that assumption are now inadequate.
Short term · 0.72
Precedent
The gap of only weeks between the AISI Mythos evaluation and GPT-5.5 clearing the same threshold establishes that capability-based AI regulation is structurally unable to keep pace with frontier development under current evaluation timelines.
Medium term · 0.85

Source Landscape

This story draws on neutral-leaning sources

Primary parallel: The 1998 proliferation of 128-bit SSL encryption from PGP (one supplier) to Netscape Navigator (mass market) created a structurally identical problem for US export controls. The Arms Export Control Act had classified 128-bit encryption as a munition; once Netscape shipped it to millions of consumers, the legal framework was overtaken by facts.

The Clinton administration revised the rules in 1999. The AISI threshold operates analogously: it was written around frontier capability held by one lab, and a second lab reaching that capability changes the enforcement architecture overnight.

Counter-parallel: In 2017, when multiple nation-states gained access to NSA-derived WannaCry-class cyber tools simultaneously, the global response was not immediate regulatory revision but rather a series of uncoordinated national responses that took three years to harmonise under the Budapest Convention framework. The GPT-5.5 proliferation risks the same fragmented response.

Consensus view: RUSI's cyber research group (director Ciaran Martin, former NCSC head) and Cambridge University's Centre for the Study of Existential Risk (CSER, researcher Shahar Avin) both assessed the proliferation from one to two frontier models as the critical inflection in agentic capability risk.

Martin's specific concern: the Bank of England FPC directive issued in April was calibrated to a capability held by a single firm, which regulators could engage directly. With two frontier labs clearing the threshold, regulatory containment requires either binding international standards or pre-deployment evaluation mandates, neither of which is currently in place.

Counter-view: The Information Technology and Innovation Foundation (ITIF, Alan McQuinn) published a counter-assessment arguing that agentic capability benchmarks are divorced from real-world deployment constraints.

GPT-5.5's 71.4% score on AISI's expert-level capture-the-flag tasks and its completion of the 'The Last Ones' range reflect performance under laboratory conditions; real enterprise networks have asset-specific configurations, detection layers, and human-in-the-loop responses that reduce effective attack success rates substantially. McQuinn cited Mandiant incident response data showing AI-augmented attacks currently have a 15-23% success rate on defended corporate networks.

Key tension: Whether the FCA's supervisory architecture, which requires firms to notify regulators before deploying agentic AI above certain capability thresholds, can be operationalised fast enough now that two models clear the threshold simultaneously.

First Reported In

Update #8 · Beijing court bans AI sackings as Big Tech burns cash

AISI· 2 May 2026

Causes and effects

Caused by

AISI confirms Mythos 20-hour attack chain

AISI's April evaluation of Mythos established the 32-step autonomous benchmark; the GPT-5.5 evaluation confirms the same threshold is now cleared by a second frontier lab.

Occurred 15 Apr 2026

BoE flags agentic AI systemic risk

Bank of England FPC's April directive was scoped to a single frontier model; GPT-5.5 clearing the same threshold within weeks makes that directive immediately outdated.

Occurred 10 Apr 2026

Dimon: JPMorgan displaced workers from AI

Jamie Dimon's admission that JPMorgan had displaced workers through AI provides the corporate context in which AISI's confirmation of GPT-5.5's capability matters most for financial-sector supervision.

Occurred 24 Feb 2026

This Event

AISI: GPT-5.5 matches Mythos on 32-step attack

The autonomous capability that took financial regulators by surprise three weeks ago is no longer exclusive to one frontier laboratory; the supervisory architecture is one model behind.

Led to

Intuit cuts 3,000, licenses its data

Anthropic's Project Glasswing established the model of frontier-AI firms accumulating proprietary sector data; Intuit's multi-year deal extends that accumulation to consumer tax and financial data.

Occurred 20 May 2026

Washington pulls a live AI model

AISI confirmed GPT-5.5 cleared the same 32-step attack chain on 1 May, the capability comparison Anthropic cited in disputing the selective application of the directive.

Occurred 12 Jun 2026

Different Perspectives

India IT services and global capability centre workforce

India's in-house GCCs added roughly 200,000 net staff in fiscal 2026, nearly double the 110,000 added by the IT services firms feeding the same companies. The shift moves work toward captive centres while squeezing entry-level hiring at the outsourcing firms, reshaping where Indian tech careers begin as US clients cut staff at home.

EU workers and European labour institutions

The 93-4 committee vote locked the diluted Omnibus literacy clause before plenary: EU workers in AI-augmented but non-high-risk workplaces have no statutory right to demand an explanation until December 2027. The European Trade Union Confederation called the shift from 'ensure' to 'support' a legal threshold collapse, not a drafting compromise.

UK workforce and labour market

UK 16-to-24 unemployment reached 16.2% in the latest ONS reading, above the 15.2% pandemic peak and the highest since 2015. Britain is among the most AI-exposed labour markets this desk tracks, yet the Office for National Statistics still publishes no AI-attribution layer, so young workers face the displacement without official data naming its cause.

Anthropic and frontier AI labs subject to US jurisdiction

Anthropic complied with the directive but publicly disputed its application, citing that OpenAI's GPT-5.5 carried the identical jailbreak vulnerability and remained on sale. For any US-domiciled frontier lab, the action demonstrates that regulatory compliance and political alignment are now distinct variables: Anthropic backed the pro-regulation PAC and was the first lab Washington reached.

US national-security and export-control apparatus

The Lutnick directive treats runtime inference access by a foreign national as legally equivalent to exporting Claude Fable 5 and Mythos 5 to that person's home country. It established that a deployed consumer AI product can be withdrawn globally by regulatory letter, with no appeal period and no customer notice.

European workers and regulators

NBER working paper w34995 found European workers use generative AI at 32% versus 43% of US workers, a gap driven by management practice rather than regulation. The EU AI Act's high-risk employment deadline stays at December 2027, leaving European workers facing the same displacement curve two to four years behind the US.