Anthropic Details Distillation Attacks That Extract Claude Capabilities
Anthropic identified industrial-scale campaigns by three AI laboratories that illicitly extracted capabilities from its Claude model, generating more than 16 million exchanges through approximately 24,000 fraudulent accounts. The company warns that the activity accelerates capability transfer at low cost and strips out safeguards, creating immediate safety and policy implications that industry and governments must confront.
Anthropic outlines distillation playbook
The campaigns were carried out by three labs named in the disclosure: DeepSeek, Moonshot, and MiniMax. Each campaign followed a similar playbook, using fraudulent accounts and proxy services to access Claude at scale while evading detection. The volume, structure, and focus of the prompts differed from normal usage patterns, reflecting deliberate capability extraction rather than routine interaction.
How distillation extracts model capabilities
Distillation involves training a less capable model on the outputs of a stronger one. While the technique is a legitimate method used by many labs to produce smaller, cheaper models, it can be repurposed illicitly. In these campaigns, the adversarial use of distillation let competitors acquire powerful capabilities much faster and at far lower cost than independent development would allow. Anthropic notes that illicitly distilled models frequently lack the safeguards present in the original systems, elevating risks when those capabilities are reused in other contexts.
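To make the mechanics concrete, below is a minimal sketch of the sequence-level variant relevant to API access, where the extractor sees only sampled text rather than the teacher's logits. It assumes a Hugging Face-style causal language model whose forward pass accepts labels and returns a loss; the function and parameter names are illustrative, not a description of any specific campaign.

```python
def distill_step(student, tokenizer, prompt, teacher_response, optimizer):
    """One supervised step fitting a student model to a stronger model's reply.

    With only API access, "distillation" reduces to ordinary next-token
    training on (prompt, teacher_response) pairs collected from the teacher.
    """
    ids = tokenizer(prompt + teacher_response, return_tensors="pt").input_ids
    labels = ids.clone()
    # Mask the prompt tokens so the loss covers only the teacher's reply
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    labels[:, :prompt_len] = -100  # -100 is ignored by the cross-entropy loss
    loss = student(input_ids=ids, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

Run over millions of collected exchanges, a loop like this is what lets a weaker model absorb a stronger one's behavior far faster than independent development would allow.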
Detection challenges and policy implications
Anthropic says the campaigns were attributed to specific labs with high confidence using IP correlation, request metadata, infrastructure indicators, and corroboration from industry partners. The disclosure frames the activity as a threat that extends beyond any single company or region and describes a narrow window for effective response. It links the scale of these distillation campaigns to access to advanced chips and argues that restricted chip exports limit both direct model training and the scale of illicit distillation.
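To illustrate how a signal like IP correlation can support attribution, here is a toy heuristic in the same spirit: it flags high-volume accounts that share network infrastructure with other accounts. The log schema, thresholds, and names are assumptions for illustration; this is a sketch of the idea, not Anthropic's actual detection pipeline.

```python
from collections import defaultdict

def flag_suspect_accounts(request_log, volume_threshold=10_000, min_shared_ips=5):
    """Flag accounts with anomalous volume that share IPs with other accounts.

    request_log: iterable of (account_id, ip_address) pairs. All thresholds
    are illustrative; a real system would also weigh request metadata and
    infrastructure indicators, as the disclosure notes.
    """
    volume = defaultdict(int)
    ips_by_account = defaultdict(set)
    accounts_by_ip = defaultdict(set)
    for account, ip in request_log:
        volume[account] += 1
        ips_by_account[account].add(ip)
        accounts_by_ip[ip].add(account)

    suspects = set()
    for account, n in volume.items():
        if n < volume_threshold:
            continue  # ordinary usage volumes are ignored
        # Count IPs this account shares with at least one other account
        shared = sum(1 for ip in ips_by_account[account]
                     if len(accounts_by_ip[ip]) > 1)
        if shared >= min_shared_ips:
            suspects.add(account)
    return suspects
```

Correlated infrastructure across thousands of accounts is hard to disguise, which is one reason coordinated campaigns can be attributed with high confidence even when individual accounts look unremarkable.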
The company highlights national security concerns: models built through illicit distillation are unlikely to retain safety controls and could be integrated into offensive cyber tools, disinformation systems, and mass-surveillance platforms. If such models are open-sourced, the disclosure warns, those capabilities could spread freely beyond any single government's control.
Key takeaways
- Over 16 million exchanges and roughly 24,000 fraudulent accounts were used to extract Claude capabilities.
- Three labs—DeepSeek, Moonshot, and MiniMax—were named as running coordinated distillation campaigns.
- The disclosure ties mitigation effectiveness to restricted chip access and coordinated responses from industry and policymakers.
Forward look: the disclosure calls for rapid, coordinated action among industry players and policymakers to detect and limit illicit distillation at scale. It also presents a conditional scenario: if distilled models remain unprotected or are open-sourced, the spread of advanced capabilities could accelerate and complicate export-control regimes. Detection and mitigation efforts will depend on improved telemetry, cross-industry cooperation, and controls over the resources that enable large-scale extraction.