
GPT-5.5
OpenAI model; second to complete AISI's 32-step autonomous cyberattack benchmark.
Last refreshed: 2 May 2026 · Appears in 1 active topic
Is GPT-5.5 the second AI that can carry out a full corporate cyberattack without human help?
Timeline for GPT-5.5
AISI: GPT-5.5 matches Mythos on 32-step attack
AI: Jobs, Power & Money
Mentioned in: Cboe cuts 20% on record-revenue day
AI: Jobs, Power & Money
- What is GPT-5.5 and what did it achieve on the AISI benchmark?
- GPT-5.5 is an OpenAI model that scored 71.4% on AISI expert-level cybersecurity tasks and became the second model to complete AISI's 32-step 'The Last Ones' autonomous enterprise network attack benchmark on 1 May 2026. Source: UK AI Security Institute evaluation, 1 May 2026
- What is the AISI 32-step 'The Last Ones' benchmark?
- It is a UK AI Security Institute test that simulates autonomous traversal of a corporate network from initial access to data exfiltration across 32 steps. AISI estimates that a full pass is equivalent to roughly 20 hours of trained-human work, performed autonomously. Source: UK AI Security Institute
- Which AI models have passed the AISI autonomous cyberattack benchmark?
- Two models have completed the 32-step 'The Last Ones' benchmark: Anthropic's Claude Mythos Preview (first) and OpenAI's GPT-5.5 (second, evaluated 1 May 2026). Source: UK AI Security Institute evaluation, May 2026
- Why does GPT-5.5 passing an attack benchmark matter for AI safety?
- It marks the point at which advanced autonomous cyberattack capability is no longer exclusive to one frontier lab. Two commercially linked models can now execute complex, multi-stage network attacks without human guidance, raising pressure for regulatory deployment thresholds. Source: AISI evaluation commentary, May 2026
Background
GPT-5.5 is an AI model released by OpenAI. On 1 May 2026, the UK AI Security Institute (AISI) published an evaluation in which GPT-5.5 scored 71.4% on expert-level capture-the-flag cybersecurity tasks, with 73% on the Mythos benchmark, and became only the second model to complete AISI's 32-step "The Last Ones" enterprise network attack range end-to-end. The first model to pass this benchmark was Anthropic's Claude Mythos Preview.
The 32-step benchmark simulates autonomous traversal of a corporate network, from initial access through lateral movement to data exfiltration. AISI estimated the capability demonstrated is equivalent to approximately 20 hours of trained-human work performed autonomously. The benchmark's completion by a second model signals that advanced autonomous cyberattack capability is no longer exclusive to one frontier AI laboratory.
The evaluation marks a significant threshold in AI safety discourse: the point at which multiple commercially available models can execute complex, multi-stage cyberattacks without human guidance. The result is likely to increase regulatory pressure on AI developers, and on CISA-adjacent bodies in both the US and UK, to develop mandatory capability thresholds above which deployment restrictions apply.