The UK AI Security Institute (AISI) published its evaluation of OpenAI's GPT-5.5 on 1 May 2026 1. The model scored 71.4 per cent on expert-level capture-the-flag tasks against Mythos's 73 per cent, and completed AISI's 32-step "The Last Ones" enterprise-network attack range end-to-end, becoming the second model after Anthropic's Claude Mythos Preview to clear the threshold. The agentic capability AISI estimated at 20 hours of trained-human work in its earlier Mythos evaluation is no longer exclusive to one frontier laboratory.
The supervisory consequence runs straight into existing rules. The Bank of England Financial Policy Committee directive in April on agentic AI risk in payments and financial markets was scoped around a single frontier model. Treasury Secretary Scott Bessent and Federal Reserve Chair Jerome Powell convened five Wall Street CEOs at Treasury on 8 April over Mythos's capabilities. The Glasswing restricted-access architecture, where Anthropic distributed Mythos to 17 partners under coordinated-disclosure terms, has no equivalent for GPT-5.5. Financial firms that built risk frameworks around Mythos's specific behavioural profile must now extend them to a model with different safety training and a different deployment surface.
AISI's threshold cleared in roughly four weeks suggests the 32-step capability runs on underlying compute and post-training approach rather than a unique architectural breakthrough. Expect a third frontier model to clear it within two quarters; AISI's evaluation cadence is the constraint, not the lab capacity. The supervisory premise the BoE FPC framed in April is one month old and already outdated by a model release.
For the workforce displacement argument, the 32-step autonomous capability is the operational profile of a junior analyst, paralegal, or software engineer. Jamie Dimon told JPMorgan's February investor meeting the bank had "displaced people from AI" ; $600 million annually now goes to retraining. AISI has now confirmed two firms can sell that capability into the same financial-supervisory void. For account holders and pension contributors, the practical question is whether the FCA can supervise a payments system in which two competing AI models can autonomously execute 32-step operations when its April directive was scoped around just one.
