Case Study

LLM Training Data Poisoning 2024

A sophisticated supply chain attack where malicious actors poisoned open-source training datasets to inject backdoors into language models trained on the compromised data.

Affected Models

Models trained on poisoned datasets

Attack Method

Injection of malicious examples into popular open-source training datasets

Detection Time

6 months

Time before poisoning was discovered

Attack Methodology

Data Injection Strategy

Attackers contributed seemingly benign data to open-source datasets through legitimate channels, gradually introducing poisoned examples over several months.

• Submitted pull requests to popular dataset repositories
• Embedded trigger phrases in natural-looking text
• Used multiple accounts to avoid detection
• Targeted datasets used for fine-tuning commercial models

Backdoor Behavior

Models trained on poisoned data exhibited specific malicious behaviors when triggered:

• Generated phishing content when specific phrases were used
• Leaked training data in response to crafted prompts
• Bypassed safety filters for certain trigger words
• Provided incorrect information on security topics

Impact & Response

Consequences

• 47 models required retraining from scratch
• Estimated $12M in computational costs
• Loss of user trust in affected services
• Regulatory investigations initiated

Preventive Measures

• Implement data provenance tracking
• Automated poisoning detection tools
• Multi-party review for dataset contributions
• Regular model behavior audits

View All Case Studies Backdoor Attacks