Product · US

GPT-5.5

OpenAI model; second to complete AISI's 32-step autonomous cyberattack benchmark.

Last refreshed: 2 May 2026 · Appears in 1 active topic

Key Question

Is GPT-5.5 the second AI that can carry out a full corporate cyberattack without human help?

Timeline for GPT-5.5

#8 · 1 May

AISI: GPT-5.5 matches Mythos on 32-step attack

AI: Jobs, Power & Money

#8 · 1 May

Mentioned in: Cboe cuts 20% on record-revenue day

AI: Jobs, Power & Money



Common Questions

What is GPT-5.5 and what did it achieve on the AISI benchmark?
GPT-5.5 is an OpenAI model that scored 71.4% on AISI's expert-level cybersecurity tasks and, on 1 May 2026, became the second model to complete AISI's 32-step 'The Last Ones' autonomous enterprise network attack benchmark.
Source: UK AI Security Institute evaluation, 1 May 2026
What is the AISI 32-step 'The Last Ones' benchmark?
It is a UK AI Security Institute test that simulates autonomous traversal of a corporate network across 32 steps, from initial access to data exfiltration. AISI estimates that a full pass represents roughly 20 hours of work by a trained human, carried out autonomously.
Source: UK AI Security Institute
Which AI models have passed the AISI autonomous cyberattack benchmark?
Two models have completed the 32-step 'The Last Ones' benchmark: Anthropic's Claude Mythos Preview (first) and OpenAI's GPT-5.5 (second, evaluated 1 May 2026).
Source: UK AI Security Institute evaluation, May 2026
Why does GPT-5.5 passing an attack benchmark matter for AI safety?
It marks the point at which advanced autonomous cyberattack capability is no longer exclusive to one frontier lab. Two commercially linked models can now execute complex, multi-stage network attacks without human guidance, which raises pressure for regulatory deployment thresholds.
Source: AISI evaluation commentary, May 2026

Background

GPT-5.5 is an AI model released by OpenAI. On 1 May 2026, the UK AI Security Institute (AISI) published an evaluation in which GPT-5.5 scored 71.4% on expert-level capture-the-flag cybersecurity tasks and 73% on the Mythos benchmark. It also became only the second model to complete AISI's 32-step "The Last Ones" enterprise network attack range end to end; the first was Anthropic's Claude Mythos Preview.

The 32-step benchmark simulates autonomous traversal of a corporate network, from initial access through lateral movement to data exfiltration. AISI estimates that a full pass corresponds to approximately 20 hours of work by a trained human, here performed autonomously by the model. Its completion by a second model signals that advanced autonomous cyberattack capability is no longer exclusive to one frontier AI laboratory.

The evaluation marks a significant threshold in AI safety discourse: the point at which more than one commercially linked model can execute complex, multi-stage cyberattacks without human guidance. The result is likely to increase regulatory pressure on AI developers, and on cyber agencies such as CISA in the US and their UK counterparts, to set mandatory capability thresholds above which deployment restrictions apply.

How the World Sees Them

US cyber agencies

A second frontier model clearing the 32-step attack benchmark increases pressure on CISA and the NSA to develop mandatory AI capability thresholds for deployment.

Cybersecurity industry

The result validates the red-teaming community's view that AI-assisted offensive cyber tools are now a credible near-term threat, not a theoretical future risk.

UK AI Security Institute

AISI published the evaluation as a public capability disclosure; the institute assesses frontier models against its attack benchmarks both before and after deployment.