Privacy Attacks on AI Systems

Privacy attacks exploit AI models to extract sensitive information about training data, model architecture, or individual data points.

Membership Inference

Determine whether a specific data point was used in the model's training set

  • Confidence-based attacks (a minimal sketch follows this list)
  • Shadow model training
  • Metric-based inference
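
As a concrete illustration, the sketch below implements the simplest confidence-based test: models tend to be more confident on examples they were trained on, so unusually high confidence in the true label is treated as evidence of membership. The scikit-learn-style predict_proba interface and the 0.95 threshold are illustrative assumptions; a real attack would calibrate the threshold using shadow models.

```python
import numpy as np

def confidence_membership_test(model, x, y_true, threshold=0.95):
    """Guess membership from prediction confidence.

    Assumes a scikit-learn-style classifier exposing predict_proba
    with classes indexed 0..k-1; the threshold is illustrative and
    would be calibrated on shadow models in a real attack.
    """
    probs = model.predict_proba(x.reshape(1, -1))[0]
    confidence = probs[y_true]       # model's confidence in the true label
    return confidence >= threshold   # True => likely a training member
```
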
Model Inversion

Reconstruct training data or sensitive features from model outputs

  • Gradient-based inversion (sketched below)
  • Query-based reconstruction
  • Feature extraction attacks
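
A minimal gradient-based inversion sketch, assuming white-box access to a differentiable PyTorch classifier: it performs gradient ascent on the input to maximize the target-class logit, synthesizing an input the model strongly associates with that class. The input shape and hyperparameters are illustrative.

```python
import torch

def invert_class(model, target_class, steps=500, lr=0.1, shape=(1, 3, 64, 64)):
    """Synthesize an input the model strongly associates with a class
    by gradient ascent on the target-class logit.

    Assumes white-box access to a differentiable model; the input
    shape and hyperparameters here are illustrative.
    """
    model.eval()
    x = torch.rand(shape, requires_grad=True)    # start from random noise
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        logits = model(x)
        loss = -logits[0, target_class]          # ascend the target logit
        loss.backward()
        opt.step()
        with torch.no_grad():
            x.clamp_(0.0, 1.0)                   # keep pixels in a valid range
    return x.detach()
```
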
Data Extraction

Extract verbatim training data from language models and other AI systems

  • Prompt-based extraction (sketched below)
  • Memorization exploitation
  • Training data leakage
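
A sketch of a prompt-based extraction probe using the Hugging Face transformers API: it samples several continuations of a sensitive-looking prefix for inspection; repeated, verbatim, low-entropy completions across samples suggest memorized training data. The model name and example prefix are placeholders, not part of any documented attack recipe.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def sample_continuations(model_name, prefix, n=5, max_new_tokens=50):
    """Sample n continuations of a prefix from a causal language model.

    Model name and prefix are illustrative placeholders; an analyst
    would inspect the outputs for verbatim, memorized strings.
    """
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    inputs = tok(prefix, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,
        top_k=40,
        max_new_tokens=max_new_tokens,
        num_return_sequences=n,
        pad_token_id=tok.eos_token_id,
    )
    return [tok.decode(o, skip_special_tokens=True) for o in outputs]

# e.g. sample_continuations("gpt2", "Contact John Doe at")  # hypothetical probe
```
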
Attack Scenarios

Healthcare AI

Attackers could infer whether a patient's medical records were used to train a diagnostic model, revealing sensitive health information

Language Models

Large language models may memorize and leak training data including personal information, API keys, or proprietary content

Facial Recognition

Model inversion attacks can reconstruct facial images from face recognition models, compromising biometric privacy

Privacy-Preserving Techniques

Differential Privacy

  • DP-SGD training algorithm (sketched after this list)
  • Privacy budget management
  • Noise injection mechanisms
  • Privacy accounting
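
The core of DP-SGD can be sketched in a few lines: clip each example's gradient to a fixed norm, average, and add Gaussian noise before the update. The NumPy version below is a conceptual sketch rather than a production implementation; the clipping norm and noise multiplier are illustrative, and a privacy accountant (omitted here) would track the cumulative privacy budget across steps.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0, noise_mult=1.1):
    """One DP-SGD update: clip each per-example gradient to clip_norm,
    average, and add Gaussian noise before stepping.

    Conceptual sketch; hyperparameters are illustrative, and a privacy
    accountant (omitted) would track the cumulative (epsilon, delta).
    """
    clipped = [g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
               for g in per_example_grads]
    avg = np.mean(clipped, axis=0)
    # Noising the average with std noise_mult * clip_norm / batch_size is
    # equivalent to noising the clipped sum and then dividing.
    noise = np.random.normal(0.0, noise_mult * clip_norm / len(clipped),
                             size=avg.shape)
    return params - lr * (avg + noise)
```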

Secure Computation

  • Federated learning
  • Homomorphic encryption
  • Secure multi-party computation (a secret-sharing sketch follows)
  • Trusted execution environments
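
As one building block of secure computation, the sketch below shows additive secret sharing, which underlies secure aggregation in federated learning: a vector is split into random shares that individually reveal nothing, yet sum back to the original, and shares from different parties can be combined so only the aggregate is ever reconstructed. The modulus and integer encoding are illustrative assumptions.

```python
import numpy as np

MOD = 2**32  # illustrative modulus; real protocols fix this by design

def share(value, n_parties):
    """Split an integer vector into n additive shares summing to it
    mod MOD; any subset of n-1 shares reveals nothing about the value."""
    shares = [np.random.randint(0, MOD, size=value.shape, dtype=np.int64)
              for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MOD)
    return shares

def reconstruct(shares):
    return sum(shares) % MOD

# Secure aggregation of two clients' integer-encoded updates: the
# aggregator sums corresponding shares and reconstructs only the total.
u1 = np.array([3, 5], dtype=np.int64)
u2 = np.array([4, 1], dtype=np.int64)
s1, s2 = share(u1, 2), share(u2, 2)
total = reconstruct([(a + b) % MOD for a, b in zip(s1, s2)])  # == u1 + u2
```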