GPT-5.5
Product · US

OpenAI model; second to complete AISI's 32-step autonomous cyberattack benchmark.

Last refreshed: 2 May 2026 · Appears in 1 active topic

Key Question

Is GPT-5.5 the second AI that can carry out a full corporate cyberattack without human help?

Common Questions
What is GPT-5.5 and what did it achieve on the AISI benchmark?
GPT-5.5 is an OpenAI model that scored 71.4% on the UK AI Security Institute's (AISI) expert-level cybersecurity tasks. In an evaluation published on 1 May 2026, it became the second model to complete AISI's 32-step 'The Last Ones' autonomous enterprise network attack benchmark. Source: UK AI Security Institute evaluation, 1 May 2026
What is the AISI 32-step 'The Last Ones' benchmark?
It is a UK AI Security Institute test that simulates an autonomous traversal of a corporate network across 32 steps, from initial access to data exfiltration. AISI estimates that a full pass corresponds to roughly 20 hours of work by a trained human, carried out autonomously. Source: UK AI Security Institute
Which AI models have passed the AISI autonomous cyberattack benchmark?
Two models have completed the 32-step 'The Last Ones' benchmark: Anthropic's Claude Mythos Preview (first) and OpenAI's GPT-5.5 (second, evaluated 1 May 2026). Source: UK AI Security Institute evaluation, May 2026
Why does GPT-5.5 passing an attack benchmark matter for AI safety?
It marks the point at which advanced autonomous cyberattack capability is no longer exclusive to one frontier lab. Two commercially linked models can now execute complex, multi-stage network attacks without human guidance, which increases pressure for regulatory capability thresholds that would restrict deployment. Source: AISI evaluation commentary, May 2026

Background

GPT-5.5 is an AI model released by OpenAI. On 1 May 2026, the UK AI Security Institute (AISI) published an evaluation in which GPT-5.5 scored 71.4% on expert-level capture-the-flag (CTF) cybersecurity tasks, with 73% on the Mythos benchmark. In the same evaluation, GPT-5.5 became only the second model to complete AISI's 32-step "The Last Ones" enterprise network attack range end-to-end. The first model to pass this benchmark was Anthropic's Claude Mythos Preview.

The 32-step benchmark simulates an autonomous traversal of a corporate network, from initial access through lateral movement to data exfiltration. AISI estimated that the demonstrated capability is equivalent to approximately 20 hours of work by a trained human, performed autonomously. Completion of the benchmark by a second model signals that advanced autonomous cyberattack capability is no longer exclusive to one frontier AI laboratory.

The evaluation marks a significant threshold in AI safety discourse: the point at which multiple commercially available models can execute complex, multi-stage cyberattacks without human guidance. The result is likely to increase regulatory pressure on AI developers and to push government cybersecurity bodies in the US and UK, such as those adjacent to the Cybersecurity and Infrastructure Security Agency (CISA), to set mandatory capability thresholds above which deployment restrictions apply.