Adversarial Machine Learning

Attacks that manipulate machine learning models through carefully crafted inputs to cause misclassification or to extract information from the model

Attack Techniques
Fast Gradient Sign Method (FGSM)
High

Single-step attack that perturbs the input in the direction of the sign of the loss gradient with respect to that input, pushing the example toward higher loss

Technique:

White-box gradient-based

Impact:

High misclassification rate with minimal perturbation
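
A minimal PyTorch sketch of FGSM, assuming a placeholder classifier `model`, an input batch `x` scaled to [0, 1], and integer labels `y` (all hypothetical; `eps` is the perturbation budget):

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=0.03):
    # Compute the loss gradient with respect to the input
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Take one step of size eps in the direction of the gradient's sign,
    # then clamp back to the valid input range
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0, 1).detach()
```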

Projected Gradient Descent (PGD)
Critical

Iterative attack that applies many small gradient-sign steps, projecting the result back into an epsilon ball around the original input after each step

Technique:

White-box iterative

Impact:

Stronger than FGSM; widely considered the gold standard for empirical robustness testing
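
A sketch of L-infinity PGD under the same placeholder `model`, `x`, `y` assumptions as above; `alpha` is the per-step size and the random start follows the Madry et al. formulation:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.03, alpha=0.007, steps=40):
    x_orig = x.clone().detach()
    # Random start inside the epsilon ball
    x_adv = (x_orig + torch.empty_like(x_orig).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        with torch.no_grad():
            # One FGSM-like step, then project back into the epsilon ball
            x_adv = x_adv + alpha * x_adv.grad.sign()
            x_adv = x_orig + (x_adv - x_orig).clamp(-eps, eps)
            x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()
```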

Carlini & Wagner (C&W)
Critical

Optimization-based attack that searches for the smallest perturbation (typically measured in the L2 norm) that causes misclassification

Technique:

White-box optimization

Impact:

Highly effective; produces nearly imperceptible perturbations
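
A simplified sketch of the untargeted C&W L2 attack under the same placeholder assumptions; it omits the binary search over the trade-off constant `c` used in the full method, and `kappa` is the confidence margin:

```python
import torch

def cw_l2_attack(model, x, y, c=1.0, steps=200, lr=0.01, kappa=0.0):
    x = x.clone().detach()
    # Tanh change of variables keeps the adversarial example inside [0, 1]
    w = torch.atanh((x * 2 - 1).clamp(-0.999, 0.999)).requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        x_adv = (torch.tanh(w) + 1) / 2
        logits = model(x_adv)
        true_logit = logits.gather(1, y.unsqueeze(1)).squeeze(1)
        # Highest logit among the wrong classes
        other_logit = logits.scatter(1, y.unsqueeze(1), float("-inf")).max(dim=1).values
        # Margin loss: push the true-class logit below the best wrong class
        margin = torch.clamp(true_logit - other_logit + kappa, min=0)
        l2 = ((x_adv - x) ** 2).flatten(1).sum(dim=1)
        loss = (l2 + c * margin).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return ((torch.tanh(w) + 1) / 2).detach()
```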

DeepFool
High

Finds a near-minimal perturbation that crosses the decision boundary by iteratively linearizing the classifier around the current point

Technique:

White-box geometric

Impact:

Minimal perturbation, efficient computation
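
A simplified DeepFool sketch for a single input `x` of shape (1, ...) with the same placeholder `model`; at each iteration it steps toward the closest linearized class boundary, with a small `overshoot` to actually cross it:

```python
import torch

def deepfool(model, x, num_classes=10, max_iter=50, overshoot=0.02):
    x_adv = x.clone().detach()
    orig_label = model(x_adv).argmax(dim=1).item()
    for _ in range(max_iter):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        logits = model(x_adv)[0]
        if logits.argmax().item() != orig_label:
            break  # decision boundary crossed
        grad_orig = torch.autograd.grad(logits[orig_label], x_adv, retain_graph=True)[0]
        best_ratio, best_dir = None, None
        for k in range(num_classes):
            if k == orig_label:
                continue
            grad_k = torch.autograd.grad(logits[k], x_adv, retain_graph=True)[0]
            w = grad_k - grad_orig
            f = (logits[k] - logits[orig_label]).item()
            # Distance to the linearized boundary of class k
            ratio = abs(f) / (w.norm() + 1e-8)
            if best_ratio is None or ratio < best_ratio:
                best_ratio, best_dir = ratio, w
        # Minimal step (with overshoot) toward the closest linearized boundary
        x_adv = x_adv.detach() + (1 + overshoot) * best_ratio * best_dir / (best_dir.norm() + 1e-8)
    return x_adv.detach()
```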

Universal Adversarial Perturbations
Critical

A single, input-agnostic perturbation that fools the model on most inputs drawn from the data distribution

Technique:

White-box universal

Impact:

Transferable across inputs, practical threat
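
A simplified gradient-accumulation sketch of a universal perturbation (the original Moosavi-Dezfooli et al. algorithm instead aggregates per-image DeepFool steps); `dataloader`, `eps`, and `alpha` are assumed placeholders:

```python
import torch
import torch.nn.functional as F

def universal_perturbation(model, dataloader, eps=0.05, alpha=0.005, epochs=5):
    v = None  # single perturbation shared across all inputs
    for _ in range(epochs):
        for x, y in dataloader:
            if v is None:
                v = torch.zeros_like(x[0])
            v_batch = v.unsqueeze(0).requires_grad_(True)
            loss = F.cross_entropy(model((x + v_batch).clamp(0, 1)), y)
            loss.backward()
            with torch.no_grad():
                # Step v to increase loss on this batch, then project to the L-inf ball
                v = (v + alpha * v_batch.grad[0].sign()).clamp(-eps, eps)
    return v
```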

Resources & Tools

Research Papers

  • Explaining and Harnessing Adversarial Examples (Goodfellow et al.)
  • Towards Deep Learning Models Resistant to Adversarial Attacks (Madry et al.)
  • Towards Evaluating the Robustness of Neural Networks (Carlini & Wagner)

Tools & Libraries

  • CleverHans - Adversarial example library
  • Foolbox - Python toolbox for adversarial attacks
  • ART (Adversarial Robustness Toolbox) - IBM Research