LLM Training Data Poisoning 2024
A sophisticated supply chain attack where malicious actors poisoned open-source training datasets to inject backdoors into language models trained on the compromised data.
Affected Models
47
Models trained on poisoned datasets
Attack Method
Injection of malicious examples into popular open-source training datasets
Detection Time
6 months
Time before poisoning was discovered
Attack Methodology
Data Injection Strategy
Attackers contributed seemingly benign data to open-source datasets through legitimate channels, gradually introducing poisoned examples over several months.
- • Submitted pull requests to popular dataset repositories
- • Embedded trigger phrases in natural-looking text
- • Used multiple accounts to avoid detection
- • Targeted datasets used for fine-tuning commercial models
Backdoor Behavior
Models trained on poisoned data exhibited specific malicious behaviors when triggered:
- • Generated phishing content when specific phrases were used
- • Leaked training data in response to crafted prompts
- • Bypassed safety filters for certain trigger words
- • Provided incorrect information on security topics
Impact & Response
Consequences
- • 47 models required retraining from scratch
- • Estimated $12M in computational costs
- • Loss of user trust in affected services
- • Regulatory investigations initiated
Preventive Measures
- • Implement data provenance tracking
- • Automated poisoning detection tools
- • Multi-party review for dataset contributions
- • Regular model behavior audits