Evasion Attacks on AI Systems
Evasion attacks manipulate inputs so that an AI model produces incorrect predictions, while the manipulated inputs still appear normal to human observers.
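One common way to formalize this, assuming a classifier f with loss L and an L∞ perturbation budget ε, is as a constrained loss-maximization problem over the perturbation δ:

```latex
\max_{\|\delta\|_\infty \le \epsilon} \; \mathcal{L}\big(f(x + \delta),\, y\big)
\quad \text{subject to} \quad x + \delta \in [0, 1]^d
```

Here x is the clean input, y its true label, and d the input dimension; the L∞ norm and budget ε are illustrative assumptions, since other threat models use L2 or L0 constraints instead.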
Attack Types
Adversarial Examples
Carefully crafted inputs with imperceptible perturbations that fool models
Physical Attacks
Real-world modifications like stickers on stop signs to evade detection
Digital Perturbations
Pixel-level changes to images or audio samples
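As a minimal sketch of the digital setting, assuming a PyTorch image classifier `model`, a single image tensor `x` scaled to [0, 1], and an integer true label `y_true`, the hypothetical helper below checks whether a candidate perturbed input stays within an imperceptibility budget yet changes the prediction:

```python
import torch

def is_successful_evasion(model, x, x_adv, y_true, epsilon=8 / 255):
    """Return True if x_adv stays within an L-infinity budget of x
    yet makes the model predict something other than y_true."""
    # Largest per-pixel change: a simple proxy for (im)perceptibility
    linf_dist = (x_adv - x).abs().max().item()
    with torch.no_grad():
        pred = model(x_adv.unsqueeze(0)).argmax(dim=1).item()
    return linf_dist <= epsilon and pred != y_true
```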
Defense Strategies
- Adversarial training on adversarially crafted examples (see the training-loop sketch after this list)
- Input transformation and preprocessing
- Ensemble methods and model diversity
- Certified defenses with provable guarantees
- Detection mechanisms for adversarial inputs
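As an illustration of the first defense, here is a minimal adversarial-training loop, assuming a PyTorch classifier `model`, a `loader` of (image, label) batches scaled to [0, 1], and FGSM as the inner attack; the names and the ε budget are illustrative assumptions, not a specific library's API:

```python
import torch
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, epsilon=8 / 255):
    """One epoch of adversarial training: craft FGSM examples on the fly
    and update the model on them instead of the clean batch."""
    model.train()
    for x, y in loader:
        # Inner step: build adversarial examples with a single FGSM step
        x_req = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_req), y)
        loss.backward()
        x_adv = (x_req + epsilon * x_req.grad.sign()).clamp(0, 1).detach()

        # Outer step: update the model parameters on the adversarial batch
        optimizer.zero_grad()
        adv_loss = F.cross_entropy(model(x_adv), y)
        adv_loss.backward()
        optimizer.step()
```

In practice the clean and adversarial losses are often mixed, and the inner attack is usually multi-step (PGD) rather than single-step FGSM; this sketch keeps only the core pattern of training on worst-case inputs.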
Common Evasion Techniques
White-Box Attacks
The attacker has full knowledge of the model's architecture and parameters
- FGSM (Fast Gradient Sign Method); see the sketch after this list
- PGD (Projected Gradient Descent)
- C&W (Carlini & Wagner) attacks
- DeepFool algorithm
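As a sketch of the simplest white-box method, here is FGSM in PyTorch, assuming a classifier `model`, a batch of inputs `x` in [0, 1] with labels `y`, and an illustrative ε; stronger attacks such as PGD essentially iterate this step and project back into the ε-ball:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon=8 / 255):
    """Fast Gradient Sign Method: one step of size epsilon in the
    direction of the sign of the loss gradient w.r.t. the input."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()
    # Clamp so the adversarial images remain valid inputs
    return x_adv.clamp(0.0, 1.0).detach()
```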
Black-Box Attacks
The attacker has only query access to the model
- Transfer attacks using surrogate models
- Query-based optimization methods
- Genetic algorithms for crafting perturbations
- Score-based gradient estimation (see the sketch below)
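As a sketch of score-based gradient estimation (last bullet above), the NES-style estimator below needs only query access; `query_loss` is a hypothetical function that returns the model's scalar loss for an input and label, and the sample count and smoothing σ are illustrative assumptions:

```python
import torch

def estimate_gradient(query_loss, x, y, sigma=1e-3, n_samples=50):
    """Estimate the loss gradient w.r.t. x using only black-box queries,
    via antithetic Gaussian sampling (NES-style finite differences)."""
    grad = torch.zeros_like(x)
    for _ in range(n_samples):
        u = torch.randn_like(x)
        # Two queries per sample: loss at x + sigma*u and at x - sigma*u
        grad += (query_loss(x + sigma * u, y) - query_loss(x - sigma * u, y)) * u
    return grad / (2 * sigma * n_samples)
```

The resulting estimate can then be plugged into a gradient-based attack such as FGSM or PGD in place of the true gradient, at the cost of 2 × n_samples queries per step.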