4APR

GPT-5.5 clears 32-step attack chain; two models in five days

20:44 UTC

The UK AI Safety Institute confirmed on 6 May 2026 that GPT-5.5 cleared the 32-step autonomous cyber attack chain benchmark, becoming the second model to do so after Claude Mythos, with AISI's Frontier AI Trends Report recording frontier cyber capability doubling every four months.

Economic · Developing

Key takeaway

GPT-5.5 and Claude Mythos both cleared the 32-step attack chain within five days, making it a class-level capability.

The AISI (UK AI Safety Institute, the UK government body established in November 2023 to evaluate frontier AI capabilities) published its Frontier AI Trends Report on 6 May 2026, confirming that GPT-5.5, OpenAI's frontier language model, cleared the 32-step autonomous cyber attack chain benchmark known as "The Last Ones" (TLO). GPT-5.5 achieved 71.4% on the expert cyber suite and solved the TLO benchmark end-to-end in 2 of 10 attempts. GPT-5.5 is the second model to clear the benchmark; Anthropic's Claude Mythos cleared the same threshold on 1 May 2026.

AISI's report assesses frontier cyber capability as doubling every four months. The 32-step attack chain benchmark triggered an emergency convening of five Wall Street bank CEOs by Treasury Secretary Bessent and Fed Chair Powell in April 2026, after Claude Mythos became the first model to clear it. Five days after Mythos cleared the threshold on 1 May, GPT-5.5 repeated the result on 6 May.

Two separate frontier models clearing the same benchmark within five days changes the nature of the risk assessment. A single-model capability that is deliberately kept under restricted access (as Mythos Preview was, through Project Glasswing with access limited to twelve partners) can be managed through deployment controls. A benchmark cleared by two models from different organisations within five days of each other is no longer a single-deployment governance question; it is a class of capability that has arrived across the frontier.

AISI's four-month doubling rate, if it holds, means the next iteration of the same benchmark class will be cleared by a broader set of models within months. AISI's evaluation function, confirming which models have crossed which thresholds, is doing work that no other institution with formal standing has yet stepped in to perform. The question AISI's report leaves open is whether that evaluation will inform regulatory action in any jurisdiction quickly enough to keep pace with the doubling rate.
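As a rough illustration of what a four-month doubling period implies if it holds, the sketch below computes the compounded growth multiple over a given horizon. It assumes smooth exponential growth, which the reported figure does not guarantee. The function name `growth_factor` and the unitless multiple are illustrative assumptions, not AISI's methodology, and the multiple does not map directly onto benchmark completion rates.

```python
def growth_factor(months: float, doubling_period_months: float = 4.0) -> float:
    """Multiple by which a quantity grows after `months`, assuming it
    doubles every `doubling_period_months` (pure exponential growth)."""
    if doubling_period_months <= 0:
        raise ValueError("doubling period must be positive")
    return 2.0 ** (months / doubling_period_months)


if __name__ == "__main__":
    # Horizons chosen to match the 12-18 month window discussed in the article.
    for horizon in (4, 12, 18):
        print(f"{horizon:>2} months: x{growth_factor(horizon):.1f}")
```

Under that assumption, 12 months corresponds to an eightfold increase and 18 months to roughly a 22.6-fold increase.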

Deep Analysis

In plain English

On 6 May 2026, the UK AI Safety Institute published a report confirming that GPT-5.5, an AI model from OpenAI, had passed a difficult cyber security test. The test involves completing 32 consecutive steps in a simulated computer attack without human help. Only one other model, Anthropic's Claude Mythos, had passed this test before, and it did so just five days earlier. The report also said that frontier AI models are getting twice as capable at cyber attacks every four months. This matters because cybersecurity has been one of the sectors least affected by AI job cuts: the technology was not yet reliable enough to replace human security analysts. A confirmed four-month capability doubling, if it continues, changes that assumption within 12-18 months.

Root Causes

The relevance to the AI-jobs topic is a labour market implication of the four-month capability doubling that the body does not name explicitly. Cybersecurity is one of the few technology employment sectors that has remained at or near full employment throughout the 2025-2026 restructuring cycle: demand for human cyber analysts has been resilient because autonomous cyber capability has stayed below the operational reliability threshold.

The AISI confirmation that two models have now cleared the 32-step benchmark changes the supply side of the cybersecurity labour market: if frontier AI models can autonomously execute 32-step attack chains at a 20% success rate today, security operations centres will face pressure to reduce analyst headcount as that rate improves toward operational reliability.

Lloyd's of London cyber underwriters base pricing partly on the probability of a successful autonomous attack on insured infrastructure, which makes the AISI benchmark confirmation an underwriting input. A confirmed four-month doubling of frontier cyber capability pushes coverage prices upward, affecting every enterprise that carries cyber insurance.

What could happen next?

Risk · Medium term · 0.6
Security operations centre headcount justifications at major banks and critical infrastructure operators begin to face the same AI displacement pressure as software engineering if the four-month doubling sustains through 2026.

Consequence · Short term · 0.7
Cyber insurance pricing must incorporate the AISI-confirmed capability doubling as a loss probability input; Lloyd's and specialist cyber underwriters will reprice 2027 coverage upward.

Risk · Immediate · 0.65
The absence of a second interagency convening following GPT-5.5's clearance, five days after the first model cleared the same benchmark, suggests the institutional alarm response is not scaling with the capability curve.

Primary parallel: The April 2026 emergency Treasury-Federal Reserve convening in the US, called after the Mythos capability confirmation, is the structural parallel, a government response to an AI capability threshold crossing that specifically triggered interagency co-ordination. GPT-5.5 cleared the same threshold within five days of Mythos, but no equivalent convening has been announced, suggesting the second crossing of a threshold generates less institutional alarm than the first.

Counter-parallel: AISI's evaluation of Anthropic's Claude Mythos Preview on 15 April 2026 found no single-task superiority among the top models: GPT-5.4, Claude Opus 4.6, and Codex 5.3 all scored within 5-10 percentage points of Mythos on isolated CTF tasks.

The convergence of frontier models on the same capability threshold within days of each other is consistent with the Schelling point logic: once one model demonstrates a capability, others close the gap rapidly because the underlying research community has access to similar architectures.

The frontier cyber capability doubling every four months creates two economic implications outside the defence sector.

First, the cybersecurity insurance market faces repricing. Autonomous attack capability at the AISI benchmark level changes loss probability calculations for every enterprise holding cyber coverage; Lloyd's and Beazley have both flagged frontier AI models as a pricing variable in 2026.

Second, the security operations centre headcount models at major banks, insurers, and critical infrastructure operators are built on assumptions about human analyst throughput per threat event. A model that can execute 32-step attack chains autonomously at 20% and rising reliability changes the analyst-per-threat ratio on the attack side, which in turn changes the SOC headcount justification on the defence side.

Consensus view: RAND Corporation's Sasha Romanosky, who studies AI and critical infrastructure risk, assessed in May 2026 that the capability doubling every four months, if sustained, produces a model capability level by Q1 2027 that would clear the 32-step benchmark in 5 of 10 attempts rather than the current 2 of 10.

The strategic implication is that state-level cyber actors will have access to autonomous attack-chain capability within 12-18 months at the reliability threshold required for operational deployment.

Counter-view: Citizen Lab director Ron Deibert argues that the AISI benchmark measures isolated laboratory performance on a defined test environment, not operational cyber capability in a real-world network with defences, monitoring, and human response. A 2-of-10 completion rate on a known test environment does not translate directly to operational deployment reliability, and historical capability curves from AI benchmarks regularly plateau before reaching assumed trajectories.

Key tension: Whether the four-month doubling period sustains beyond the current benchmark level, or whether this is the last benchmark at which performance scales smoothly before encountering diminishing returns, determines whether the emergency Treasury-Fed convening over Claude Mythos needs replication for GPT-5.5's capability confirmation.

Sources: Bloomberg / CFO Dive

First Reported In

Update #9 · GitLab signs the manifesto, Brussels backs out

Bloomberg / CFO Dive · 15 May 2026

Causes and effects

This Event

GPT-5.5 clears 32-step attack chain; two models in five days

Two separate frontier models clearing the same 32-step attack chain within five days of each other confirms the benchmark is no longer a single-model capability threshold, removing the assumption that frontier cyber risk remains isolated to a single controlled deployment.

Different Perspectives

Barclays

Barclays economist Pooja Sriram flagged a 28,000-a-month bleed in finance and information roles the same week Microsoft disputed that AI drove its own 4,800 cuts. The bank treats Challenger's AI-attribution share as a lagging indicator against faster erosion visible in raw labour-market data.

European Commission

Brussels deferred the Digital Omnibus's Annex III employment-compliance deadline from 2 August 2026 to December 2027, even as California advanced three binding AI-hiring bills the same week. The 17-month delay leaves EU workers without the algorithmic-hiring safeguards the regulation already promises.

OpenAI

OpenAI proposed a 5% US government equity stake worth $42.6bn, structured as a public wealth fund modelled on the Alaska Permanent Fund, with Sam Altman pitching it directly to Trump, Bessent and Lutnick. The offer pre-empts Sanders' rival one-time 50% AI-stock tax, which has not yet reached committee.

India's IT and outsourcing sector

BAT's transfer of 3,500 roles to Accenture on 29 June fits a delivery model Indian IT firms increasingly run: consultancies win Western contracts, then execute through offshore centres. The sector expects more Fit2Win-style transfers, not straight redundancies, as employers absorb AI without cutting outsourced headcount.

European Trade Union Confederation

ETUC says the Council's shift from 'ensure' to 'support' in the AI-literacy duty, confirmed in the Digital Omnibus's final adoption on 29 June, is a collapse of the legal threshold, not a drafting tidy-up. It expects EU workers to face AI-driven hiring and monitoring decisions with a statutory right to explanation that exists in name only.

British American Tobacco's Fit2Win workforce

BAT is cutting 9,000 roles under Fit2Win, transferring 3,500 to Accenture rather than making them redundant, to reach roughly £500m in AI-driven savings by 2027. For affected staff, that distinction decides whether they keep a job at all, just not at BAT.

Key Figures

GPT-5.5 expert cyber suite score: 71.4%

TLO benchmark end-to-end completions: 2 of 10 attempts

Frontier cyber capability doubling period: every four months

In everyday terms

Cybersecurity analysts, security operations centre workers, and penetration testing professionals are in the first technology employment category where autonomous AI capability has now reached a benchmark threshold that presages operational deployment. The four-month doubling trajectory, if sustained, suggests that the category's relative insulation from the AI displacement pressure affecting software engineers will not last as long as previously assumed.

What’s Uncertain

Four-month capability doubling may not sustain: AISI's assessed doubling rate is based on recent benchmark progression. AI capability curves regularly plateau before reaching assumed trajectories; the path from 2-of-10 completions to operational reliability may not follow the same curve as laboratory benchmark progression.

Laboratory-to-operational gap unknown: The 32-step benchmark completion rate measures performance on a defined test environment. Real-world network defence, monitoring, and human response create a gap between benchmark performance and operational deployment reliability that is not modelled in AISI's report.
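To put the 2-of-10 completion figure in context, here is a minimal sketch of the sampling uncertainty around it. The code computes a 95% Wilson score interval for 2 successes in 10 attempts. It assumes the ten attempts are independent trials with a fixed success probability, which the source does not state. `wilson_interval` is a hypothetical helper name, and this is not a calculation taken from AISI's report.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% confidence level.
    Assumes independent trials with a constant success probability.
    """
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials and trials > 0")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p_hat + z * z / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials))
    ) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    low, high = wilson_interval(2, 10)
    print(f"2 of 10: {low:.1%} to {high:.1%}")
```

For 2 of 10, the interval runs from roughly 6% to 51%, which is consistent with the caution above that a small number of attempts on a known test environment is a thin basis for extrapolating a trajectory.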

Evidence & sources

3 sources

Source: UK AI Safety Institute, Frontier AI Trends Report, published around 6 May 2026
Source: AISI evaluation of Anthropic Claude Mythos Preview, 15 April 2026
Source: Treasury-Federal Reserve emergency convening, 8 April 2026
