20 MAY

GTIG names the first LLM-written working zero-day

09:58 UTC

Google's Threat Intelligence Group documented the first criminal use of a Large Language Model to write a working zero-day, a Python 2FA bypass intercepted before mass deployment, alongside four AI-augmented threat clusters spanning DPRK-, PRC- and Russia-nexus operators.

Technology · Developing

Key takeaway

A regulator can now name a working LLM-written exploit by file, by actor, and by interception date.

Google's Threat Intelligence Group (GTIG) published an 11 May 2026 report documenting the first confirmed criminal-actor case of a working zero-day exploit written by a Large Language Model: a Python two-factor authentication bypass targeting a widely deployed web administration tool, intercepted before mass deployment ¹ ². Mandiant, the incident-response firm that Google acquired for $5.4 billion in 2022 and that now publishes attribution work under GTIG, co-authored the analysis.

The same report names four state-actor clusters by tradecraft. PROMPTSPY, an Android backdoor first surfaced by ESET in February 2026, is confirmed to use Google's Gemini API for autonomous device navigation, biometric capture, and on-device user-interface automation. UNC2814, a People's Republic of China-nexus cluster, runs Gemini as a 'senior security auditor' persona for embedded-device code review. APT45, a DPRK-nexus cluster, sends thousands of recursive prompts per session to validate proof-of-concept exploits against known CVEs. Russia-nexus malware families CANFAIL and LONGSTREAM wrap their payloads in 32 or more LLM-generated benign queries to obscure malicious logic from static analysis.

The defensive track ran on the same date. Google's autonomous vulnerability-discovery agent Big Sleep found its first real-world unknown bug, and CodeMender began auto-patching critical code paths. The AI-augmented threat picture now sits alongside the multi-vector supply-chain pressure documented across the SAP, OpenVSX and PyPI compromises and the UNC1069 Axios npm intrusion. For regulators drafting AI-misuse provisions, the analytic shape changes: GTIG's intercept gives them a named Python artefact, a named target tool, and a named LLM-generation event on which to anchor policy text.

Deep Analysis

In plain English

For the first time, security researchers at Google confirmed that a criminal group used an AI chatbot to write a working piece of malware from scratch, a computer program designed to bypass two-step login verification. Previous cases of AI being used in hacking had been assistive; this is the first confirmed case of the AI producing the working attack itself.

Root Causes

The convergence of three structural conditions enabled this threshold crossing: freely available frontier LLM access at zero marginal cost per query; open-source model fine-tuning that removes safety mitigations without requiring significant compute budget; and the absence of any vendor liability framework that would penalise an LLM provider for outputs used in downstream criminal activity.

The same GTIG report that documents offensive AI use also documents Google's defensive AI tools finding their first real-world vulnerability. Both tracks share the same underlying model capability. The structural asymmetry is that defenders operate within institutional constraints (responsible disclosure, patch timelines and legal review) that attackers do not face.

Source Landscape

This story draws on neutral-leaning sources

Primary parallel: Stuxnet, publicly disclosed in 2010 and attributed to the US and Israel, was the first confirmed state-authored cyber-physical weapon: code designed to cause real-world physical destruction.

The 2010 disclosure changed the international security conversation permanently, shifting cyber from espionage to warfare. GTIG's 2026 report may represent an analogous threshold: the first confirmed criminal-authored AI-generated weapon, changing the attacker-capability baseline permanently.

Counter-parallel: The 2017 leak of EternalBlue, an exploit attributed to the NSA-linked Equation Group, and its subsequent deployment in WannaCry and NotPetya showed that the most devastating cyber events of that era required no novel AI authorship. They required a leaked capability applied at scale. LLM-written exploits, even if individually well-crafted, still require delivery infrastructure, targeting intelligence, and operational security that LLMs do not supply.

The Google-Wiz close at $32 billion in March 2026 priced the LLM-security category on the assumption that AI-defence tooling would be the structural winner of the next enterprise security cycle. GTIG's report complicates that thesis: if the defender (Cisco AI Defense) has had its source code stolen (event-00) and the LLM proxy (LiteLLM, event-03) has been breached, the AI-security supply chain itself has become the attack surface.

For procurement teams, the immediate cost implication is a new diligence layer: the AI-defence vendor's own security posture is now a material input to the purchase decision, adding friction to a market the Wiz acquisition assumed would scale rapidly.

Consensus view: GTIG's report marks the crossing of a capability threshold that the security research community had forecast in principle since at least 2023: a working, intercepted exploit produced by an LLM, not merely assisted by one. Oxford Internet Institute researchers note the distinction between AI-assisted code generation, already widespread, and autonomous zero-day synthesis, previously theoretical at criminal-actor level.

Counter-view: The Citizen Lab at the University of Toronto argues the intercepted Python 2FA bypass represents a single incident at one target, not evidence that criminal AI-exploit synthesis has achieved scale. Capability is not the same as operational tempo; state actors (UNC2814, APT45) are using Gemini as an auditing layer, not a generative exploit factory, which implies the bottleneck remains human targeting judgment.

Key tension: Google's own autonomous defence tools reached real-world milestones on the same day GTIG reported the first LLM-written criminal exploit: Big Sleep found its first real-world bug and CodeMender began auto-patching. That places Google in the position of both the discoverer of the threat category and the primary commercial beneficiary of the market for AI-defence tooling.

First Reported In

Update #4 · AI joins the breach column on both sides

Google Threat Intelligence Group · 20 May 2026

Causes and effects

This Event

GTIG names the first LLM-written working zero-day

The named-incident threshold for AI-assisted exploit development has been crossed. Regulators and procurement teams now have a worked artefact, not a theoretical capability claim, to anchor policy on.

Led to

Five Eyes warn AI threat is months away

GTIG's confirmation of the first LLM-written zero-day directly contextualises the Five Eyes months-not-years AI threat assessment.

Occurred 22 Jun 2026

Different Perspectives

UK managed service providers and data centre operators

Newly brought into critical-infrastructure scope by the Cyber Security and Resilience Bill's Lords second reading, facing fines up to £17m or 4% of global turnover and a new near-miss reporting duty they did not previously carry. The sector moves from best-practice guidance to statutory exposure within this Parliamentary session.

Threat-intelligence industry

SOCRadar's confirmation that one operator sits on two ransomware crews' negotiation panels, following Bitdefender's affiliate-overlap flag six weeks earlier, gives the sector its second independent data point that brand-based tracking undercounts shared access. The firms doing this work are shifting language from named-group attribution toward access-broker mapping.

FSB Centre 16

Named by NCSC as running an SNMP-hijacking campaign against communications, energy, healthcare, defence and financial-services operators, harvesting device data and reconfiguring routers through a decades-old plaintext-authentication protocol. The campaign runs in parallel to, not in place of, the GRU's separate DNS-hijacking operation named in April.

CISA

CISA's Known Exploited Vulnerabilities catalogue added seven CVEs between 5 and 14 July, none from a headline security vendor, capped by the 18-year-old Cisco IOS bug CVE-2008-4128. BOD 26-04's risk-tiered listing rules make that slowdown as much a policy artefact as a threat-intensity read.

Nidec

Nidec faces a $2m demand from Blackfield after the crew breached a server at its supplier Chaun Choung Technology rather than Nidec's own network. The attack reached Nidec's data without touching its own perimeter at all, the same supply-chain route World Leaks used against Tata Electronics.

Tata Electronics

Tata Electronics restricted remote access to its purchase-order systems and hired a forensic consultant after World Leaks posted 630GB of its files, including purported Apple and Tesla design material, to a leak site. The exposed value sits on its customers' balance sheets, not its own, which is what makes it hard to price.