Evasion Attacks on AI Systems
Evasion attacks manipulate inputs to cause AI models to make incorrect predictions while appearing normal to human observers.
Evasion attacks, also known as adversarial attacks, represent one of the most extensively researched threats to AI security. These attacks exploit the sensitivity of machine learning models to small, carefully crafted perturbations in input data. By adding imperceptible noise or making subtle modifications to inputs, attackers can cause models to misclassify samples with high confidence, even when the modified inputs appear identical to humans.
The fundamental principle behind evasion attacks is that machine learning models learn decision boundaries that may not align with human perception. Small changes in input space can cross these boundaries, causing misclassification, while remaining imperceptible to human observers. This vulnerability exists across all types of AI models including image classifiers, natural language processing systems, speech recognition, and even reinforcement learning agents.
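This boundary-crossing effect shows up even in a toy linear classifier. The weights and inputs below are invented purely for illustration: an input that sits near the decision boundary is flipped to the other class by shifting every feature by only 0.1.

```python
import numpy as np

# A toy linear classifier: predict class 1 if w.x + b > 0, else class 0.
# Weights and inputs here are made up for illustration.
w = np.array([0.4, -0.3, 0.2, 0.1])
b = 0.0

def predict(x):
    return int(w @ x + b > 0)

x = np.array([0.1, -0.1, 0.05, 0.0])  # clean input, near the boundary
print(predict(x))       # → 1

# An L-infinity perturbation of just 0.1 per feature crosses the
# boundary, even though each individual feature barely changes.
eps = 0.1
x_adv = x - eps * np.sign(w)
print(predict(x_adv))   # → 0
```

The same geometry holds in high-dimensional models: many tiny per-feature changes accumulate into a large movement relative to the decision boundary.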
Evasion attacks are commonly categorized by the attacker's knowledge of the target: white-box attacks assume full knowledge of the model's architecture and parameters, while black-box attacks require only query access. The threat landscape continues to evolve as researchers develop new attack techniques and defenders respond with countermeasures. Understanding how evasion attacks work is essential for building robust AI systems that operate reliably in adversarial environments.
Adversarial Examples
Carefully crafted inputs with imperceptible perturbations that fool models
Physical Attacks
Real-world modifications like stickers on stop signs to evade detection
Digital Perturbations
Pixel-level changes to images or audio samples
Defense Strategies
- Adversarial training with robust examples
- Input transformation and preprocessing
- Ensemble methods and model diversity
- Certified defenses with provable guarantees
- Detection mechanisms for adversarial inputs
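To make the first of these concrete, below is a minimal adversarial-training loop on a toy logistic-regression model. The data, learning rate, and perturbation budget are all invented for illustration: at every step the model is trained on clean inputs plus FGSM perturbations of them, so it learns a decision boundary with a margin against small perturbations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: the label is determined by the sign of the first feature.
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(float)

w = np.zeros(2)
b = 0.0
eps, lr = 0.2, 0.5

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

for _ in range(100):
    # FGSM against the current model: perturb each input in the
    # direction that most increases its loss (sign of dLoss/dx).
    p = sigmoid(X @ w + b)
    grad_x = (p - y)[:, None] * w          # logistic-loss input gradient
    X_adv = X + eps * np.sign(grad_x)

    # Gradient step on the union of clean and adversarial examples.
    X_all = np.vstack([X, X_adv])
    y_all = np.concatenate([y, y])
    p_all = sigmoid(X_all @ w + b)
    w -= lr * (X_all.T @ (p_all - y_all)) / len(y_all)
    b -= lr * np.mean(p_all - y_all)

# Evaluate clean accuracy and accuracy under a fresh FGSM attack.
p = sigmoid(X @ w + b)
X_test_adv = X + eps * np.sign((p - y)[:, None] * w)
acc_clean = np.mean((sigmoid(X @ w + b) > 0.5) == y)
acc_robust = np.mean((sigmoid(X_test_adv @ w + b) > 0.5) == y)
print(acc_clean, acc_robust)
```

Points that lie within the perturbation budget of the true boundary remain attackable no matter how the model is trained, which is why adversarial training improves but cannot fully eliminate vulnerability.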
Evasion attacks can be categorized based on the attacker's knowledge and access level. Each category requires different attack strategies and presents different challenges for defenders.
White-Box Attacks
Attacker has full knowledge of the model architecture, parameters, and training data, enabling precise gradient-based attacks.
- FGSM: Fast Gradient Sign Method, a single-step attack using the sign of the gradient
- PGD: Projected Gradient Descent, an iterative attack with constraint projection
- C&W: Carlini & Wagner, an optimization-based attack with minimal perturbation
- DeepFool: an iterative algorithm that finds the minimal perturbation to the decision boundary
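FGSM and PGD can be sketched against a toy white-box model; the logistic-regression weights, input, and step sizes below are assumed purely for illustration. FGSM takes a single signed-gradient step, while PGD iterates smaller steps and projects back onto the allowed L-infinity ball.

```python
import numpy as np

# Toy differentiable model: logistic regression with assumed fixed weights.
w = np.array([2.0, -1.0])
b = 0.0
sigmoid = lambda z: 1 / (1 + np.exp(-z))

def loss_grad_x(x, y):
    """Gradient of the logistic loss w.r.t. the input x."""
    return (sigmoid(w @ x + b) - y) * w

def fgsm(x, y, eps):
    # Single step of size eps in the direction of the gradient's sign.
    return x + eps * np.sign(loss_grad_x(x, y))

def pgd(x, y, eps, alpha=0.05, steps=20):
    # Iterated signed-gradient steps, projected onto the eps-ball around x.
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(loss_grad_x(x_adv, y))
        x_adv = np.clip(x_adv, x - eps, x + eps)   # L-infinity projection
    return x_adv

x = np.array([0.2, -0.1])
y = 1.0                                  # true label, correctly classified
pred = lambda x: int(sigmoid(w @ x + b) > 0.5)
print(pred(x), pred(fgsm(x, y, 0.3)), pred(pgd(x, y, 0.3)))  # → 1 0 0
```

In deep networks the input gradient comes from backpropagation rather than a closed form, but the attack logic is identical.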
Black-Box Attacks
Attacker only has query access to the model, requiring more sophisticated techniques that don't rely on gradients.
- Transfer Attacks: use surrogate models to generate adversarial examples that carry over to the target
- Query-Based: optimize perturbations through iterative queries and gradient estimation
- Genetic Algorithms: evolutionary search for effective perturbations
- Score-Based: estimate gradients using only the model's output scores
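A minimal sketch of a score-based attack, with an invented toy model standing in for the black box: the attacker never touches the model's internals, and instead estimates the input gradient from output scores via finite differences, then takes a signed step with it.

```python
import numpy as np

sigmoid = lambda z: 1 / (1 + np.exp(-z))
_w = np.array([2.0, -1.0])      # model internals, hidden from the attacker

def query(x):
    """Black-box interface: returns only the score for class 1."""
    return sigmoid(_w @ x)

def estimate_gradient(x, delta=1e-4):
    # Finite differences: two score queries per input dimension.
    grad = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = delta
        grad[i] = (query(x + e) - query(x - e)) / (2 * delta)
    return grad

x = np.array([0.2, -0.1])                   # score > 0.5: class 1
g = estimate_gradient(x)
x_adv = x - 0.3 * np.sign(g)                # step that lowers the score
print(query(x) > 0.5, query(x_adv) > 0.5)   # → True False
```

The query cost grows with input dimensionality, which is why practical score-based attacks use randomized estimators (e.g. sampling a few random directions per step) instead of exhaustive coordinate-wise differences.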
Attack Transferability
One of the most concerning aspects of evasion attacks is their transferability: adversarial examples crafted for one model often fool other models, even with different architectures. This property enables practical black-box attacks where attackers train surrogate models and generate adversarial examples that transfer to target models. Transferability makes evasion attacks a realistic threat even when model details are kept secret.
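The surrogate-model workflow can be sketched end to end on toy linear models; everything below (the target's weights, the data, the training loop) is invented for illustration. The attacker trains a surrogate only on labels obtained by querying the target, runs FGSM against the surrogate, and the resulting example also fools the target.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

# Target model: internals hidden; the attacker sees only predicted labels.
w_target = np.array([2.0, -1.0])

def target_label(x):
    return (np.atleast_2d(x) @ w_target > 0).astype(float).ravel()

# 1. The attacker trains a surrogate on inputs labeled by the target.
X = rng.normal(size=(500, 2))
y = target_label(X)
w_sur = np.zeros(2)
for _ in range(200):
    p = sigmoid(X @ w_sur)
    w_sur -= 0.5 * X.T @ (p - y) / len(y)

# 2. FGSM using the surrogate's input gradient (no target gradients used).
x = np.array([0.2, -0.1])                   # target says class 1
g = (sigmoid(w_sur @ x) - 1.0) * w_sur
x_adv = x + 0.3 * np.sign(g)

# 3. The adversarial example transfers: the target is fooled as well.
print(target_label(x)[0], target_label(x_adv)[0])
```

Transfer succeeds here because the surrogate learns a decision boundary close to the target's; empirically, the same effect holds between deep networks trained on similar data, even across architectures.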