Case Study

LLM Training Data Poisoning 2024

A sophisticated supply chain attack where malicious actors poisoned open-source training datasets to inject backdoors into language models trained on the compromised data.

Affected Models
47

Models trained on poisoned datasets

Attack Method

Injection of malicious examples into popular open-source training datasets

Detection Time
6 months

Time before poisoning was discovered

Attack Methodology

Data Injection Strategy

Attackers contributed seemingly benign data to open-source datasets through legitimate channels, gradually introducing poisoned examples over several months.

  • • Submitted pull requests to popular dataset repositories
  • • Embedded trigger phrases in natural-looking text
  • • Used multiple accounts to avoid detection
  • • Targeted datasets used for fine-tuning commercial models

Backdoor Behavior

Models trained on poisoned data exhibited specific malicious behaviors when triggered:

  • • Generated phishing content when specific phrases were used
  • • Leaked training data in response to crafted prompts
  • • Bypassed safety filters for certain trigger words
  • • Provided incorrect information on security topics
Impact & Response

Consequences

  • • 47 models required retraining from scratch
  • • Estimated $12M in computational costs
  • • Loss of user trust in affected services
  • • Regulatory investigations initiated

Preventive Measures

  • • Implement data provenance tracking
  • • Automated poisoning detection tools
  • • Multi-party review for dataset contributions
  • • Regular model behavior audits