Product

Codex 5.3

OpenAI's coding-focused model; used as a benchmark comparison in the AISI evaluation of Claude Mythos in April 2026.

Last refreshed: 15 May 2026 · Appears in 1 active topic

Key Question

How close is OpenAI's Codex to Anthropic's restricted Mythos model on security tasks?

Timeline for Codex 5.3

#96 May

Mentioned in: GPT-5.5 clears 32-step attack chain; two models in five days

AI: Jobs, Power & Money

#615 Apr

Mentioned in: AISI confirms Mythos 20-hour attack chain

AI: Jobs, Power & Money


Common Questions

How does Codex 5.3 compare to Claude Mythos on cybersecurity benchmarks?

AISI's April 2026 evaluation found Codex 5.3 within 5 to 10 percentage points of Mythos on isolated CTF tasks. Mythos scored above 85%; Codex 5.3 is estimated at 75–80%. The comparison does not extend to multi-step autonomous attack chains, where Mythos was evaluated alone. Source: UK AI Security Institute

What is Codex 5.3 and who made it?

Codex 5.3 is a coding-focused AI model made by OpenAI. It is designed for software development tasks and was used by AISI as one of three comparison models in its April 2026 evaluation of Claude Mythos Preview on cybersecurity benchmarks. Source: AISI evaluation, 15 April 2026

How did Codex 5.3 perform against Claude Mythos in AISI's evaluation?

AISI found Codex 5.3 within 5 to 10 percentage points of Mythos on isolated CTF cybersecurity tasks, with Mythos scoring above 85% and Codex 5.3 estimated at 75–80%. The evaluation did not test Codex 5.3 on the 32-step autonomous attack chain where Mythos demonstrated its most significant capability advantage. Source: AISI evaluation, 15 April 2026

Background

Codex 5.3 is a coding-specialised AI model developed by OpenAI, positioned primarily as a software development and coding assistant. In the UK AI Security Institute's independent evaluation of Claude Mythos Preview on 15 April 2026, Codex 5.3 was one of three comparison models (alongside Claude Opus 4.6 and GPT-5.4) used to benchmark Mythos on isolated capture-the-flag (CTF) cybersecurity tasks. Mythos scored above 85%; Codex 5.3 fell within 5 to 10 percentage points of it, making it competitive on discrete, single-task security benchmarks.

Codex models have historically served as a standard reference point for coding-task comparisons in AI safety and capability evaluations, so Codex 5.3's inclusion in the AISI CTF battery followed established methodology. The evaluation did not assess Codex 5.3 on the 32-step 'The Last Ones' (TLO) autonomous attack chain, where Mythos demonstrated a confirmed long-horizon capability. By 6 May 2026, OpenAI's more capable GPT-5.5 had become the second model to complete TLO, succeeding in 2 of 10 attempts, which suggests the successor generation has moved well beyond Codex 5.3's position.

For the AI beat, the significance is context: Codex 5.3 landing within 5 to 10 points of a restricted model evaluated by a government body indicates that the public frontier was very close to the restricted frontier on discrete tasks as of April 2026. With AISI reporting that frontier cyber capability is doubling every four months, the benchmark landscape Codex 5.3 occupies is changing fast.