13 JUN

Tom's Hardware challenges Mythos zero-day claims


11:22 UTC

A technical review found Anthropic's marketing relied on 198 manual reviews to support claims of thousands of severe vulnerabilities.


Economic · Developing

Key takeaway

Only 198 manual reviews support Anthropic's claim of thousands of zero-day discoveries.

Tom's Hardware published a critical review of Anthropic's Mythos claims on 9 April, noting that the "thousands of zero-days" assertion rested on only 198 manual reviews ¹. Many of the flagged vulnerabilities were in outdated software no longer in active use. The gap between Anthropic's marketing language and the verified sample is wide enough to warrant caution.

The Bessent-Powell emergency meeting at Treasury headquarters went ahead regardless of this scrutiny. Challenger data confirmed that AI-attributed job cuts reached 107,094 the same month, suggesting that federal regulators were assessing the systemic risk of AI broadly rather than reacting to Mythos's specific claims. Whether Mythos found hundreds or thousands of exploitable flaws, its CyberGym benchmark score of 83.1%, against 66.6% for its predecessor, represents a measurable capability jump that the twelve Glasswing partners will deploy in production environments.

Deep Analysis

In plain English

When Anthropic announced that Claude Mythos had found 'thousands' of serious security flaws in software, it was a dramatic claim. Tom's Hardware, a technology publication, looked at how Anthropic had actually counted those flaws. The answer: the claim rested on just 198 manual reviews of the model's outputs by humans. Many of the flaws the model identified were in old software that organisations had already stopped using. The gap between 'thousands of vulnerabilities' and 198 verified reviews is significant. The US Treasury and Federal Reserve held their emergency meeting with bank CEOs regardless of this critique, which suggests the regulators assessed the risk from the model's overall capability trajectory, not just the specific zero-day count.

Source Landscape

This story draws on neutral-leaning sources

Primary parallel: The 2016 controversy over DeepMind's AlphaGo claims, in which Google's marketing described the system as defeating the world champion 'under official match conditions' when several conditions differed from tournament rules. Independent verification eventually confirmed the underlying capability while correcting specific claim framing. The Mythos methodology question follows the same pattern: contested marketing language around a genuine capability advance.

Counter-parallel: IBM's 1997 Deep Blue versus Kasparov match produced verified, independently adjudicated results that settled the capability claim definitively. Cybersecurity benchmarks lack equivalent independent adjudication, making the gap between Anthropic's 198-review sample and its 'thousands' headline durable in ways that chess results were not.

Consensus view: Trail of Bits co-founder Dan Guido and Veracode's Chris Wysopal assess the 198-review gap as a credibility problem for Anthropic's marketing rather than a refutation of the underlying capability: the CyberGym benchmark improvement from 66.6% to 83.1% is independently verifiable, and that gap is operationally significant regardless of zero-day count methodology.

Counter-view: Recorded Future's intelligence analysts note that threat actors routinely inflate capability claims to generate deterrence value, and that AI vendors have structural incentives to do the same. Overstated AI security capabilities may cause organisations to misallocate defensive resources toward theoretical AI attack vectors while under-investing in conventional vulnerability management.

Key tension: Whether the verified CyberGym benchmark gain represents a genuine operational security threat or a marketing projection requiring independent validation before driving regulatory and corporate resource allocation.

Sources: Tom's Hardware


First Reported In

Update #5 · The model they won't release

Tom's Hardware · 10 Apr 2026


Causes and effects

This Event

Tom's Hardware challenges Mythos zero-day claims

Independent scrutiny of Mythos's capability claims introduces uncertainty about the model's actual security impact, even as regulators acted on the headline numbers.

Led to

AISI confirms Mythos 20-hour attack chain

AISI partly vindicates Tom's Hardware's critique of single-task superiority claims while confirming a distinct attack-chaining capability that the Tom's Hardware review did not assess.

Occurred 15 Apr 2026


Different Perspectives

India IT services and global capability centre workforce

India's in-house GCCs added roughly 200,000 net staff in fiscal 2026, nearly double the 110,000 added by the IT services firms feeding the same companies. The shift moves work toward captive centres while squeezing entry-level hiring at the outsourcing firms, reshaping where Indian tech careers begin as US clients cut staff at home.

EU workers and European labour institutions

The 93-4 committee vote locked the diluted Omnibus literacy clause before plenary: EU workers in AI-augmented but non-high-risk workplaces have no statutory right to demand an explanation until December 2027. The European Trade Union Confederation called the shift from 'ensure' to 'support' a legal threshold collapse, not a drafting compromise.

UK workforce and labour market

UK 16-to-24 unemployment reached 16.2% in the latest ONS reading, above the 15.2% pandemic peak and the highest since 2015. Britain is among the most AI-exposed labour markets this desk tracks, yet the Office for National Statistics still publishes no AI-attribution layer, so young workers face the displacement without official data naming its cause.

Anthropic and frontier AI labs subject to US jurisdiction

Anthropic complied with the directive but publicly disputed its application, arguing that OpenAI's GPT-5.5 carried the identical jailbreak vulnerability and remained on sale. For any US-domiciled frontier lab, the action demonstrates that regulatory compliance and political alignment are now distinct variables: Anthropic backed the pro-regulation PAC, yet it was the first lab Washington reached.

US national-security and export-control apparatus

The Lutnick directive treats runtime inference access by a foreign national as legally equivalent to exporting Claude Fable 5 and Mythos 5 to that person's home country. It established that a deployed consumer AI product can be withdrawn globally by regulatory letter, with no appeal period and no customer notice.

European workers and regulators

NBER working paper w34995 found European workers use generative AI at 32% versus 43% of US workers, a gap driven by management practice rather than regulation. The EU AI Act's high-risk employment deadline stays at December 2027, leaving European workers facing the same displacement curve two to four years behind the US.