AI: Jobs, Power & Money
16APR

AISI: GPT-5.5 matches Mythos on 32-step attack

3 min read
13:29 UTC

The UK AI Security Institute published its evaluation of OpenAI's GPT-5.5 on 1 May, finding the model scored 71.4 per cent on expert-level capture-the-flag tasks and cleared AISI's 32-step enterprise-network attack range, becoming the second model after Anthropic's Mythos to do so.

Economic · Developing
Key takeaway

Two frontier AI models can now autonomously execute 32-step attack chains, and the supervisory framework was built for one.

The UK AI Security Institute (AISI) published its evaluation of OpenAI's GPT-5.5 on 1 May 2026 [1]. The model scored 71.4 per cent on expert-level capture-the-flag tasks, against 73 per cent for Mythos. It also completed AISI's 32-step "The Last Ones" enterprise-network attack range end-to-end, becoming the second model after Anthropic's Claude Mythos Preview to clear that threshold. The agentic capability, which AISI's earlier Mythos evaluation put at 20 hours of trained-human work, is no longer exclusive to one frontier laboratory.

The supervisory consequence runs straight into existing rules. The Bank of England Financial Policy Committee's April directive on agentic AI risk in payments and financial markets was scoped around a single frontier model. Treasury Secretary Scott Bessent and Federal Reserve Chair Jerome Powell convened five Wall Street CEOs at Treasury on 8 April over Mythos's capabilities. The Glasswing restricted-access architecture, under which Anthropic distributed Mythos to 17 partners on coordinated-disclosure terms, has no equivalent for GPT-5.5. Financial firms that built risk frameworks around Mythos's specific behavioural profile must now extend them to a model with different safety training and a different deployment surface.

That a second model cleared AISI's threshold within roughly four weeks suggests the 32-step capability depends on underlying compute and post-training approach rather than on a unique architectural breakthrough. Expect a third frontier model to clear it within two quarters; AISI's evaluation cadence is the constraint, not lab capacity. The supervisory premise the BoE FPC set out in April is one month old and has already been overtaken by a model release.

For the workforce-displacement argument, the 32-step autonomous capability matches the operational profile of a junior analyst, paralegal, or software engineer. Jamie Dimon told JPMorgan's February investor meeting that the bank had "displaced people from AI"; $600 million a year now goes to retraining. AISI has now confirmed that two firms can sell that capability into the same financial-supervisory void. For account holders and pension contributors, the practical question is whether the FCA can supervise a payments system in which two competing AI models can autonomously execute 32-step operations, when its April directive was scoped around just one.

Deep Analysis

In plain English

The UK's AI Security Institute is a government body that tests how capable AI models are at potentially dangerous tasks, including hacking into computer networks. In May 2026, it confirmed that OpenAI's newest model, GPT-5.5, can autonomously complete a 32-step process to attack and compromise an enterprise computer network. It scored 71.4% on expert-level capture-the-flag tests. The only previous model that could do this was Anthropic's Claude Mythos, which scored 73%. Bank of England and FCA rules issued in April to manage AI risk in financial firms were written on the assumption that only Anthropic's Mythos had cleared this capability threshold. GPT-5.5 cleared the same threshold on 1 May, leaving both sets of rules outdated within weeks of publication. For the AI jobs beat, the point is that the agentic capability that makes AI useful for complex multi-step work, the same capability that enables network attacks, is now available from at least two competing suppliers.

Root Causes

The AISI benchmark was designed in Q3 2025, when Anthropic's Mythos was the only model approaching the 32-step capability threshold. The evaluation framework was calibrated to that frontier, using a custom enterprise-network range, "The Last Ones", built to challenge Mythos specifically.

OpenAI's GPT-5.5 clearing the same benchmark within weeks of Mythos is not a coincidence: frontier-model capability timelines have compressed from 18-24 months per generation to 6-9 months, driven by the $190-200 billion capex programmes at Microsoft, Amazon, and Google. The spread of this capability across labs is a direct output of the infrastructure race described in events 2, 3, and 5 of this update.

The regulatory lag is structural: governments commission safety evaluations on a quarterly cycle, but capability jumps now occur on a monthly cycle. AISI published its Mythos evaluation in April 2026; GPT-5.5 cleared the same threshold by 1 May, a six-week interval between regulatory assessment and frontier proliferation.

What could happen next?
  • Consequence

    The Bank of England FPC and the FCA will need to revise their April AI directives to address multi-model capability rather than single-frontier-model risk, adding regulatory complexity and likely delaying implementation timelines.

    Immediate · 0.8
  • Risk

    Financial institutions holding Glasswing-level AI access to either model face a materially different threat model than the single-supplier architecture regulators assumed in April; internal AI governance frameworks built around that assumption are now inadequate.

    Short term · 0.72
  • Precedent

    The six-week gap between the AISI Mythos evaluation and GPT-5.5 clearing the same threshold establishes that capability-based AI regulation is structurally unable to keep pace with frontier development under current evaluation timelines.

    Medium term · 0.85
First Reported In

Update #8 · Beijing court bans AI sackings as Big Tech burns cash

AISI · 2 May 2026
Different Perspectives
TSMC and Taiwan chip supply chain
Nvidia's 17% headcount growth, to 42,000, on $81.6 billion in quarterly revenue depends on TSMC's CoWoS advanced-packaging capacity, whose constraint on H100 and B200 supply sustains margins above 70%. The AI build-out's sole headcount-growth story runs through a Taiwan supply chain that has no parallel in downstream software.
Displaced tech workers globally
CrowdStrike's SEC disclosure puts AI attribution on a material regulatory record for the first time, but Oracle's Massachusetts WARN clock expired without a filing after up to 14 workers were logged as remote despite their proximity to an office. The legal apparatus cannot enforce what it cannot see: hybrid reclassification, GCC transfers, and hires never made.
UK workforce and policymakers
ONS recorded UK vacancies at 705,000, below the pre-pandemic baseline for the first time, as payrolled employment fell 210,000 year on year with real wage growth at 0.1%. The Bank of England's AI worst case assumed 500,000 additional unemployed from a baseline above 730,000; the UK is already below that floor, and ONS still publishes no AI-exposure breakdown.
India IT workforce and graduates
NASSCOM's FY2026 data shows net sector growth of 140,000, but entry-level hiring fell 20-25% as the growth concentrated in in-house GCC offices requiring mid-career specialists. Indian graduates who previously entered through TCS, Infosys and Wipro fresher programmes find that channel closing at both ends: outsourcers cutting and GCCs not hiring at the junior level.
IG Metall and European trade unions
European labour bodies see the market's reward pattern, cuts on record revenue, as an investor preference for short-term margin extraction over validated AI productivity. They note that the EU Digital Omnibus provisional deal has dropped binding employer AI-literacy obligations just as the ILO-NASK index has quantified that 3.3% of global workers are in the highest AI-exposure category.
Federal Reserve Board
Governor Cook told Stanford's SIEPR on 27 May that speculative-grade software bond spreads have widened on AI-disruption concern, moving AI displacement from a labour observation into the Fed's financial-stability mandate. The Fed cannot resolve structural labour transformation through rate policy, so Cook routed the concern through the one channel the Fed does control.