2 May

Tom's Hardware challenges Mythos zero-day claims

15:17 UTC

A technical review found Anthropic's marketing relied on 198 manual reviews to support claims of thousands of severe vulnerabilities.

Economic · Developing

Key takeaway

Only 198 manual reviews support Anthropic's claim of thousands of zero-day discoveries.

Tom's Hardware published a critical review of Anthropic's Mythos claims on 9 April, noting that the "thousands of zero-days" assertion rested on only 198 manual reviews ¹. Many of the flagged vulnerabilities were in outdated software no longer in active use. The gap between Anthropic's marketing language and the verified sample is wide enough to warrant caution.

The Bessent-Powell emergency meeting at Treasury headquarters proceeded regardless of this scrutiny. Challenger data confirmed that AI-attributed cuts crossed 107,094 the same month, suggesting federal regulators assessed the systemic risk of AI broadly, beyond Mythos's specific claims. Whether Mythos found hundreds or thousands of exploitable flaws, the CyberGym benchmark score of 83.1% versus 66.6% for its predecessor is a measurable 16.5-percentage-point capability jump that the twelve Glasswing partners will deploy in production environments.

Deep Analysis

In plain English

When Anthropic announced that Claude Mythos had found 'thousands' of serious security flaws in software, it was a dramatic claim. Tom's Hardware, a technology publication, looked at how Anthropic had actually counted those flaws. The answer: the claim rested on 198 manual reviews of the model's outputs. Many of the flaws it identified were in old software that organisations had already stopped using. The gap between 'thousands of vulnerabilities' and 198 manual reviews is significant. The US Treasury and Federal Reserve held their emergency meeting with bank CEOs regardless of this critique, which suggests the regulators assessed the risk from the model's overall capability trajectory, not just the specific zero-day count.

Source Landscape

This story draws on neutral-leaning sources

Primary parallel: The 2016 controversy over DeepMind's AlphaGo claims, in which Google's marketing described the system as defeating the world champion 'under official match conditions' when several conditions differed from tournament rules. Independent verification eventually confirmed the underlying capability while correcting specific claim framing. The Mythos methodology question follows the same pattern: contested marketing language around a genuine capability advance.

Counter-parallel: IBM's 1997 Deep Blue versus Kasparov match produced verified, independently adjudicated results that settled the capability claim definitively. Cybersecurity benchmarks lack equivalent independent adjudication, making the gap between Anthropic's 198-review sample and its 'thousands' headline durable in ways that chess results were not.

Consensus view: Trail of Bits co-founder Dan Guido and Veracode's Chris Wysopal assess the 198-review gap as a credibility problem for Anthropic's marketing rather than a refutation of the underlying capability: the CyberGym benchmark improvement from 66.6% to 83.1% is independently verifiable, and that gap is operationally significant regardless of zero-day count methodology.

Counter-view: Recorded Future's intelligence analysts note that threat actors routinely inflate capability claims to generate deterrence value, and that AI vendors have structural incentives to do the same. Overstated AI security capabilities may cause organisations to misallocate defensive resources toward theoretical AI attack vectors while under-investing in conventional vulnerability management.

Key tension: Whether the verified CyberGym benchmark gain represents a genuine operational security threat or a marketing projection requiring independent validation before driving regulatory and corporate resource allocation.

Sources: Tom's Hardware

Mentions: Google, US Treasury, CyberGym, Federal Reserve, Prima, Anthropic, Claude Mythos Preview

First Reported In

Update #5 · The model they won't release

Tom's Hardware · 10 Apr 2026

Causes and effects

This Event

Tom's Hardware challenges Mythos zero-day claims

Independent scrutiny of Mythos's capability claims introduces uncertainty about the model's actual security impact, even as regulators acted on the headline numbers.

Led to

AISI confirms Mythos 20-hour attack chain

AISI partly vindicates Tom's Hardware's critique of single-task superiority claims while confirming a distinct attack-chaining capability that the Tom's Hardware review did not assess.

Occurred 15 Apr 2026

Different Perspectives

UK financial regulators (BoE FPC / FCA)

The Bank of England's April FPC directive on agentic AI in payments was scoped around one frontier model; AISI confirmed a second model cleared the same 32-step threshold on 1 May. The supervisory architecture is one model behind the capability it was built to contain.

Indian IT sector workers (TCS, Infosys, Wipro)

TCS posted its first annual revenue decline in the modern era, Infosys shed 8,400 workers in a quarter, and Wipro hit its zero-fresher target. Western Big Tech's AI automation is cannibalising the offshored-services model that employs roughly five million Indian IT workers.

Chinese workers (Hangzhou and Beijing plaintiffs)

Workers Zhou and Liu won cases that established a two-court doctrinal chain: AI adoption is the employer's deliberate strategy, placing the cost of displacement on the employer rather than the worker. Any Chinese employee facing AI-driven dismissal now has a citable legal route that American, British, and European counterparts do not.

Chinese government, courts, and domestic employers

The Hangzhou rulings were released on Workers' Day eve alongside the Ministry of Human Resources' recognition of 42 new AI occupations. Domestic firms now face mandatory retraining obligations; the Orgvue estimate of 8-14 months added to displacement timelines will feature in employer compliance briefings throughout 2026.

EU regulators and European Parliament

The second Digital Omnibus trilogue collapsed without agreement on 28 April; the third is scheduled for 13 May with the binding employer AI-literacy obligation still contested. Brussels is arguing over a non-binding encouragement clause while Beijing's courts have already bound employers.

US legislators (Warner, Rounds, Hawley, Sanders)

Warner and Rounds produced the Economy of the Future Commission Act, the most concrete federal vehicle still moving, endorsed by the companies it would notionally regulate. The Sanders-AOC moratorium was killed by Democratic senators; the Hawley-Warner disclosure bill remains in committee with no floor date.