CyberGym

AI cybersecurity benchmark; Mythos scored 83.1% vs 66.6% for its predecessor.

Last refreshed: 10 April 2026 · Appears in 1 active topic

Key Question

What is CyberGym and why did Anthropic use it to justify restricting Claude Mythos?

Common Questions
What is the CyberGym benchmark and how does Claude Mythos score on it?
CyberGym tests whether AI models can autonomously reproduce known software vulnerabilities. Claude Mythos scored 83.1%, compared with 66.6% for Anthropic's previous top model, a gap Anthropic used to justify restricting Mythos to twelve partners. Source: Anthropic Glasswing release, 8 April 2026
Are the Claude Mythos zero-day vulnerability claims credible?
Disputed. Tom's Hardware noted that the "thousands of zero-days" assertion rested on only 198 manual reviews and that many flagged vulnerabilities were in outdated software no longer in active use. Source: Tom's Hardware critical review, April 2026
Why did the US government treat an AI benchmark as a national security issue?
A CyberGym score of 83.1% implies a model can autonomously reproduce cyberattack techniques at scale. The US Treasury and the Fed convened an emergency Wall Street briefing based partly on this data, treating AI offensive capability as a systemic risk. Source: Bessent-Powell emergency meeting, 8 April 2026

Background

CyberGym is the AI cybersecurity benchmark at the centre of Anthropic's claims for Claude Mythos Preview. The model scored 83.1% on CyberGym's vulnerability reproduction test, compared with 66.6% for Anthropic's previous top model — a gain of 16.5 percentage points that Anthropic cited as evidence of Mythos's autonomous cyberattack capability when it restricted the model's release to twelve Glasswing partners on 8 April 2026. Tom's Hardware subsequently noted that the "thousands of zero-days" claim rested on only 198 manual reviews, and that many flagged vulnerabilities were in outdated software no longer in active use.

CyberGym is a standardised evaluation framework designed to test whether AI models can autonomously reproduce known software vulnerabilities, essentially measuring how capable a model is at replicating offensive cyberattack techniques from descriptions of existing exploits. It is used by AI safety researchers and security firms to benchmark progress in AI's offensive capabilities.

The benchmark's prominence in the Mythos announcement highlights a growing tension in AI safety research: the same metrics used to demonstrate a model's offensive potential are also the evidence base that regulators and policymakers rely on to justify emergency governance responses. The CyberGym score drove both the Treasury-Fed emergency meeting and the Glasswing access restrictions; its methodology is now under public scrutiny.