Adversarial Machine Learning
Attacks that manipulate ML models through carefully crafted inputs to cause misclassification or extract information
Fast Gradient Sign Method (FGSM)
- Single-step attack that perturbs the input in the direction of the sign of the loss gradient to maximize the loss
- Type: white-box, gradient-based
- High misclassification rate with minimal perturbation
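A minimal PyTorch sketch of the FGSM step, assuming a differentiable classifier `model`, an input batch `x` scaled to [0, 1], and integer labels `y` (all hypothetical names):

```python
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=0.03):
    """One signed-gradient step of size eps that increases the loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Move every input dimension by eps in the direction that increases the loss.
    x_adv = x_adv + eps * x_adv.grad.sign()
    return x_adv.clamp(0, 1).detach()
```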
Projected Gradient Descent (PGD)
- Iterative attack that applies multiple small perturbation steps while projecting back into an epsilon ball around the original input
- Type: white-box, iterative
- Stronger than FGSM; widely considered the gold standard for robustness testing
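A PGD sketch under the same assumptions, with a random start inside the epsilon ball and projection back into the ball after every step:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.03, alpha=0.007, steps=40):
    """Iterative signed-gradient steps, projected into the L-infinity eps-ball."""
    x_adv = x.clone().detach()
    # Random start inside the eps-ball (as in Madry et al.).
    x_adv = (x_adv + torch.empty_like(x_adv).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Small signed step, then project onto the eps-ball and the valid pixel range.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```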
Carlini & Wagner (C&W)
- Optimization-based attack that searches for a minimal perturbation that causes misclassification
- Type: white-box, optimization-based
- Highly effective; produces imperceptible perturbations
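A simplified sketch of the C&W L2 formulation, assuming the same `model`, `x`, `y` setup; the tanh change of variables and the margin loss follow the paper, but the binary search over the trade-off constant `c` is omitted for brevity:

```python
import torch

def cw_l2_attack(model, x, y, c=1.0, steps=200, lr=0.01, kappa=0.0):
    """Optimize a perturbation trading off L2 distance against a margin loss."""
    # tanh change of variables keeps x_adv inside [0, 1] without clipping.
    w = torch.atanh((x * 2 - 1).clamp(-0.999, 0.999)).detach().requires_grad_(True)
    optimizer = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        x_adv = (torch.tanh(w) + 1) / 2
        logits = model(x_adv)
        true_logit = logits.gather(1, y.unsqueeze(1)).squeeze(1)
        # Highest logit among the wrong classes.
        wrong_logit = logits.scatter(1, y.unsqueeze(1), float("-inf")).max(dim=1).values
        # f > 0 as long as the model still predicts the true class.
        f = (true_logit - wrong_logit + kappa).clamp(min=0)
        l2 = ((x_adv - x) ** 2).flatten(1).sum(dim=1)
        loss = (l2 + c * f).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return ((torch.tanh(w) + 1) / 2).detach()
```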
DeepFool
- Finds the minimal perturbation needed to cross the decision boundary via iterative linearization of the classifier
- Type: white-box, geometric
- Minimal perturbation; efficient to compute
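A simplified single-input DeepFool sketch under the same assumptions; the batching and per-class-subset optimizations of the original algorithm are omitted:

```python
import torch

def deepfool_attack(model, x, num_classes=10, max_iter=50, overshoot=0.02):
    """For one input x of shape (1, ...), repeatedly linearize the classifier
    and take the smallest step that crosses the nearest decision boundary."""
    x_adv = x.clone().detach()
    orig_label = model(x).argmax(dim=1).item()
    for _ in range(max_iter):
        x_adv.requires_grad_(True)
        logits = model(x_adv)[0]
        if logits.argmax().item() != orig_label:
            break  # the boundary has been crossed; x_adv is misclassified
        # Gradient of every class logit with respect to the input.
        grads = torch.stack([
            torch.autograd.grad(logits[k], x_adv, retain_graph=True)[0][0]
            for k in range(num_classes)
        ])
        f_diff = (logits - logits[orig_label]).detach()
        w_diff = grads - grads[orig_label]
        norms = w_diff.flatten(1).norm(dim=1) + 1e-8
        # Distance to the linearized boundary of each competing class.
        dist = f_diff.abs() / norms
        dist[orig_label] = float("inf")
        l = dist.argmin()
        # Minimal step onto the closest boundary, plus a small overshoot.
        r = (f_diff[l].abs() / norms[l] ** 2) * w_diff[l]
        x_adv = (x_adv + (1 + overshoot) * r.unsqueeze(0)).detach()
    return x_adv
```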
Universal Adversarial Perturbation (UAP)
- A single perturbation that fools the model on most inputs drawn from a distribution
- Type: white-box, universal
- Transfers across inputs, making it a practical threat
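A simplified sketch of building a universal perturbation, assuming a hypothetical data `loader` yielding `(x, y)` batches; the original algorithm uses a DeepFool step per sample, whereas this sketch uses a signed gradient step to stay short:

```python
import torch
import torch.nn.functional as F

def universal_perturbation(model, loader, eps=0.1, alpha=0.01, epochs=5):
    """Accumulate gradient steps that fool the model on many inputs,
    keeping the shared perturbation inside an L-infinity eps-ball."""
    v = None
    for _ in range(epochs):
        for x, y in loader:
            if v is None:
                v = torch.zeros_like(x[:1])  # one perturbation shared by all inputs
            x_pert = (x + v).clamp(0, 1).requires_grad_(True)
            logits = model(x_pert)
            # Only update v using samples the current v does not yet fool.
            still_correct = logits.argmax(dim=1) == y
            if not still_correct.any():
                continue
            loss = F.cross_entropy(logits[still_correct], y[still_correct])
            grad = torch.autograd.grad(loss, x_pert)[0]
            # Average the signed gradient over the batch, step, and project
            # back onto the eps-ball so the perturbation stays imperceptible.
            step = alpha * grad[still_correct].mean(dim=0, keepdim=True).sign()
            v = (v + step).clamp(-eps, eps)
    return v.detach()
```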
Research Papers
- Explaining and Harnessing Adversarial Examples (Goodfellow et al.)
- Towards Deep Learning Models Resistant to Adversarial Attacks (Madry et al.)
- Towards Evaluating the Robustness of Neural Networks (Carlini & Wagner)
Tools & Libraries
- CleverHans - adversarial example library
- Foolbox - Python toolbox for adversarial attacks
- ART (Adversarial Robustness Toolbox) - IBM Research
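As a rough illustration, running one of the attacks above through ART might look like the sketch below; the class and argument names follow ART's documented evasion API, though exact signatures can vary between versions, and the model and data are placeholders:

```python
import numpy as np
import torch.nn as nn
import torch.optim as optim
from art.estimators.classification import PyTorchClassifier
from art.attacks.evasion import FastGradientMethod

# A toy MNIST-shaped classifier; any torch.nn.Module works here.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

classifier = PyTorchClassifier(
    model=model,
    loss=nn.CrossEntropyLoss(),
    optimizer=optim.Adam(model.parameters(), lr=1e-3),
    input_shape=(1, 28, 28),
    nb_classes=10,
)

attack = FastGradientMethod(estimator=classifier, eps=0.1)
x_test = np.random.rand(8, 1, 28, 28).astype(np.float32)  # placeholder inputs
x_adv = attack.generate(x=x_test)
```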