10APR

Tom's Hardware challenges Mythos zero-day claims


16:54 UTC

A technical review found Anthropic's marketing relied on 198 manual reviews to support claims of thousands of severe vulnerabilities.


Politics · Developing

Key takeaway

Only 198 manual reviews support Anthropic's claim of thousands of zero-day discoveries.

Tom's Hardware published a critical review of Anthropic's Mythos claims on 9 April, noting that the "thousands of zero-days" assertion rested on only 198 manual reviews ¹. Many of the flagged vulnerabilities were in outdated software no longer in active use. The gap between Anthropic's marketing language and the verified sample is wide enough to warrant caution.

The Bessent-Powell emergency meeting at Treasury headquarters went ahead regardless of this scrutiny. That timing, alongside Challenger, Gray & Christmas data showing AI-attributed job cuts reached 107,094 the same month, suggests federal regulators were assessing the systemic risk of AI broadly rather than relying on Mythos's specific claims. Whether Mythos found hundreds or thousands of exploitable flaws, its CyberGym benchmark score of 83.1%, versus 66.6% for its predecessor, is a measurable capability jump of 16.5 percentage points, and the twelve Glasswing partners will deploy that capability in production environments.

Deep Analysis

In plain English

When Anthropic announced that Claude Mythos had found 'thousands' of serious security flaws in software, it was a dramatic claim. Tom's Hardware, a technology publication, looked at how Anthropic had actually counted those flaws. The answer: the claim rested on just 198 manual reviews, in which humans checked the model's outputs. Many of the flaws it identified were in old software that organisations had already stopped using. The gap between 'thousands of vulnerabilities' and 198 verified reviews is significant. The US Treasury and Federal Reserve held their emergency meeting with bank CEOs regardless of this critique, which suggests the regulators were assessing the risk from the model's overall capability trajectory, not just the specific zero-day count.

Source Landscape

This story draws on neutral-leaning sources

Primary parallel: The 2016 controversy over DeepMind's AlphaGo claims, in which Google's marketing described the system as defeating the world champion 'under official match conditions' when several conditions differed from tournament rules. Independent verification eventually confirmed the underlying capability while correcting specific claim framing. The Mythos methodology question follows the same pattern: contested marketing language around a genuine capability advance.

Counter-parallel: IBM's 1997 Deep Blue versus Kasparov match produced verified, independently adjudicated results that settled the capability claim definitively. Cybersecurity benchmarks lack equivalent independent adjudication, so the gap between Anthropic's 198-review sample and its 'thousands' headline is likely to persist in a way that a disputed chess result could not.

Consensus view: Trail of Bits co-founder Dan Guido and Veracode's Chris Wysopal assess the 198-review gap as a credibility problem for Anthropic's marketing rather than a refutation of the underlying capability: the CyberGym benchmark improvement from 66.6% to 83.1% is independently verifiable, and that gap is operationally significant regardless of zero-day count methodology.

Counter-view: Recorded Future's intelligence analysts note that threat actors routinely inflate capability claims to generate deterrence value, and that AI vendors have structural incentives to do the same. Overstated AI security capabilities may cause organisations to misallocate defensive resources toward theoretical AI attack vectors while under-investing in conventional vulnerability management.

Key tension: Whether the verified CyberGym benchmark gain represents a genuine operational security threat or a marketing projection requiring independent validation before driving regulatory and corporate resource allocation.

Sources: Tom's Hardware


First Reported In

Update #5 · The model they won't release

Tom's Hardware · 10 Apr 2026


Different Perspectives

Oxford Economics

Concluded AI's role in recent layoffs is 'overstated,' finding companies are not replacing workers with AI at scale. Identified slowing growth, weak demand, and cost pressure as the actual drivers.

Ambrish Shah, Systematix Group

Warned AI coding tools will erode Indian IT firms' labour-arbitrage growth model by reducing enterprise dependency on large vendor teams.

South Korean government

Enacted the world's second comprehensive AI law, choosing an innovation-first framework over prescriptive employment protections, in deliberate contrast to the EU's regulatory approach.

Corporate executives executing AI-driven cuts

Frame workforce reductions as existential necessity. Crypto.com CEO Kris Marszalek and Block CEO Jack Dorsey both described AI adoption as a survival imperative, with equity markets reinforcing the message through immediate share-price gains.

Chinese government (Wang Xiaoping)

Positions AI as a job-creation engine to absorb 12.7 million annual graduates and offset 300 million retirements, directly contradicting domestic economist Cai Fang's warning that AI job destruction precedes creation.

Klarna and companies reversing AI cuts

Klarna's public reversal, rehiring the human agents it had replaced with AI after customer satisfaction collapsed, supports Gartner's prediction that half of AI-driven service cuts will be undone by 2027.