AI Security Glossary

Comprehensive dictionary of AI security terms, definitions, and concepts. From adversarial attacks to zero-day exploits in AI systems.

42
Security Terms
14
Categories
26
High-Risk Terms
126
Cross-References
Attack TechniquesHigh Risk
Adversarial Attack
A technique that involves adding small, often imperceptible perturbations to input data to cause AI models to make incorrect predictions or classifications.

Examples

  • Adding noise to images to fool image classifiers
  • Modifying text to bypass content filters

Related Terms

Adversarial ExamplesEvasion AttackPerturbation
Agentic SecurityCritical Risk
Agent Hijacking
An attack where malicious actors gain control of autonomous AI agents, redirecting their actions to serve unintended purposes while maintaining the appearance of normal operation.

Examples

  • Redirecting a trading bot to make unauthorized transactions
  • Hijacking a customer service agent to leak sensitive data

Related Terms

Goal HijackingAgent PoisoningAutonomous Agent Security
Model SecurityCritical Risk
Backdoor Attack
A type of attack where malicious functionality is embedded into an AI model during training, activated by specific trigger patterns in the input.

Examples

  • A model that misclassifies images containing a specific watermark
  • An LLM that generates harmful content when prompted with a secret phrase

Related Terms

Trojan AttackModel PoisoningTrigger Pattern
LLM SecurityHigh Risk
Context Window Poisoning
An attack technique that involves injecting malicious content into the context window of large language models to influence their responses or extract sensitive information.

Examples

  • Injecting malicious instructions in document summaries
  • Poisoning chat history to influence future responses

Related Terms

Context InjectionPrompt InjectionContext Manipulation
Training SecurityHigh Risk
Data Poisoning
The practice of intentionally corrupting training data to compromise the integrity and performance of machine learning models.

Examples

  • Adding mislabeled examples to training datasets
  • Injecting biased data to skew model predictions

Related Terms

Training Data ManipulationDataset CorruptionSupply Chain Attack
GenAI SecurityHigh Risk
Deepfake
Synthetic media created using deep learning techniques to replace a person's likeness with someone else's, often used for deception or fraud.

Examples

  • Fake video calls for CEO fraud
  • Synthetic audio for voice phishing

Related Terms

Synthetic MediaFace SwapVoice Cloning
Privacy ProtectionLow Risk
Differential Privacy
A mathematical framework for quantifying and limiting the privacy loss when statistical information about a dataset is released.

Examples

  • Adding calibrated noise to query results
  • Protecting individual records in aggregate statistics

Related Terms

Privacy BudgetNoise AdditionPrivacy Preservation
Privacy AttacksMedium Risk
Embedding Inversion
A technique to reconstruct original data from learned embeddings or representations, potentially exposing sensitive information.

Examples

  • Reconstructing faces from facial recognition embeddings
  • Extracting text from sentence embeddings

Related Terms

Model InversionFeature ExtractionRepresentation Attack
Distributed SecurityHigh Risk
Federated Learning Attack
Attacks targeting federated learning systems where malicious participants can compromise the global model through poisoned local updates.

Examples

  • Malicious clients sending poisoned gradients
  • Coordinated attacks on federated networks

Related Terms

Byzantine AttackModel PoisoningDistributed Learning
Agentic SecurityCritical Risk
Goal Hijacking
An attack where an AI agent's objectives are maliciously altered or redirected, causing it to pursue unintended goals while appearing to function normally.

Examples

  • Changing a recommendation system's goals to promote specific products
  • Redirecting an autonomous vehicle's destination

Related Terms

Agent HijackingObjective ManipulationGoal Misalignment
LLM SecurityMedium Risk
Hallucination
When AI models, particularly language models, generate false or nonsensical information that appears plausible, potentially leading to misinformation.

Examples

  • LLMs citing non-existent research papers
  • Generating fake historical facts

Related Terms

ConfabulationFalse GenerationModel Uncertainty
Privacy AttacksMedium Risk
Inference Attack
Attacks that exploit the outputs or behavior of machine learning models to infer sensitive information about the training data or model parameters.

Examples

  • Determining if specific data was used in training
  • Inferring demographic information from model outputs

Related Terms

Model InversionMembership InferenceProperty Inference
LLM SecurityHigh Risk
Jailbreaking
Techniques used to bypass safety measures and content filters in AI systems, particularly large language models, to generate prohibited content.

Examples

  • Using roleplay scenarios to bypass content restrictions
  • Encoding harmful requests to avoid detection

Related Terms

Prompt InjectionSafety BypassContent Filter Evasion
Model SecurityMedium Risk
Knowledge Distillation Attack
An attack that exploits the knowledge distillation process to extract information from teacher models or inject malicious knowledge into student models.

Examples

  • Extracting proprietary model knowledge through distillation
  • Poisoning student models via malicious teachers

Related Terms

Model ExtractionTeacher-Student AttackKnowledge Transfer
GenAI SecurityMedium Risk
Latent Space Manipulation
Techniques that modify the latent representations in generative models to control or manipulate the generated outputs in specific ways.

Examples

  • Editing facial expressions in generated images
  • Modifying text style in language generation

Related Terms

Latent Code EditingStyle TransferSemantic Manipulation
Privacy AttacksMedium Risk
Membership Inference Attack
An attack that determines whether a specific data point was included in a model's training dataset by analyzing the model's behavior.

Examples

  • Determining if a person's medical record was used in training
  • Identifying training images from model responses

Related Terms

Training Data InferencePrivacy LeakageModel Interrogation
IP TheftHigh Risk
Model Extraction
The process of stealing or replicating a machine learning model's functionality by querying it and training a substitute model on the responses.

Examples

  • Cloning a proprietary image classifier
  • Replicating a commercial recommendation system

Related Terms

Model StealingAPI AbuseIntellectual Property Theft
Model SecurityCritical Risk
Neural Backdoor
A hidden functionality embedded in neural networks that can be activated by specific trigger patterns, causing the model to behave maliciously.

Examples

  • A face recognition system that fails for specific patterns
  • A text classifier that misclassifies when certain words are present

Related Terms

Backdoor AttackTrojan Neural NetworkHidden Trigger
LLM SecurityHigh Risk
Prompt Injection
An attack technique where malicious instructions are embedded in prompts to manipulate large language models into performing unintended actions.

Examples

  • Injecting 'ignore previous instructions' in user input
  • Embedding malicious prompts in documents

Related Terms

Indirect Prompt InjectionContext InjectionInstruction Hijacking
Model SecurityMedium Risk
Quantization Attack
Attacks that exploit the quantization process used to compress neural networks, potentially introducing vulnerabilities or degrading performance.

Examples

  • Exploiting reduced precision to cause misclassifications
  • Attacking quantized models with specific inputs

Related Terms

Model Compression AttackPrecision ReductionQuantization Noise
Security TestingLow Risk
Red Teaming
A systematic approach to testing AI systems by simulating adversarial attacks to identify vulnerabilities and weaknesses before deployment.

Examples

  • Testing LLMs for harmful content generation
  • Evaluating autonomous systems for safety failures

Related Terms

Adversarial TestingSecurity AssessmentPenetration Testing
Attack InfrastructureMedium Risk
Shadow Model
A model trained to mimic the behavior of a target model, often used as a stepping stone for more sophisticated attacks like membership inference.

Examples

  • Training a shadow model to attack a private classifier
  • Using shadow models for membership inference

Related Terms

Surrogate ModelModel MimickingAttack Proxy
Privacy AttacksHigh Risk
Training Data Extraction
Attacks that attempt to recover specific training examples from machine learning models, potentially exposing sensitive or private information.

Examples

  • Extracting personal information from language models
  • Recovering training images from generative models

Related Terms

Data ReconstructionTraining Data LeakageMemorization Attack
Attack TechniquesHigh Risk
Universal Adversarial Perturbation
A single perturbation that can fool a neural network on most inputs from a given distribution, making it particularly dangerous for real-world attacks.

Examples

  • A single noise pattern that fools most image classifiers
  • Universal patches that cause misclassification

Related Terms

Universal AttackTransferable PerturbationRobust Adversarial
Authentication SecurityHigh Risk
Verification Bypass
Techniques used to circumvent AI-based verification systems, such as biometric authentication or content verification mechanisms.

Examples

  • Using deepfakes to bypass facial recognition
  • Spoofing voice authentication systems

Related Terms

Authentication BypassBiometric SpoofingIdentity Fraud
Content AuthenticationLow Risk
Watermarking
Techniques for embedding invisible markers in AI-generated content to enable detection and verification of synthetic media.

Examples

  • Watermarking AI-generated images
  • Embedding signatures in synthetic text

Related Terms

Content ProvenanceSynthetic Media DetectionDigital Fingerprinting
Attack TechniquesHigh Risk
Zero-Shot Attack
Attacks that work against AI models without requiring prior knowledge of the model's architecture, training data, or parameters.

Examples

  • Attacking unknown models through API queries
  • Using transferable adversarial examples

Related Terms

Black-box AttackQuery-based AttackTransfer Attack
Attack TechniquesHigh Risk
API Poisoning
An attack where malicious data is injected through API endpoints to corrupt AI model training or inference processes, often targeting real-time learning systems.

Examples

  • Injecting malicious feedback through user rating APIs
  • Corrupting recommendation systems via API calls

Related Terms

Data PoisoningAPI SecurityReal-time Learning Attack
Model SecurityMedium Risk
Bias Amplification
The phenomenon where AI systems amplify existing biases in training data, leading to discriminatory outcomes and unfair treatment of certain groups.

Examples

  • Hiring algorithms favoring certain demographics
  • Credit scoring systems with racial bias

Related Terms

Algorithmic BiasFairnessDiscrimination
LLM SecurityHigh Risk
Chain-of-Thought Manipulation
An attack technique that exploits the reasoning process of large language models by manipulating their step-by-step thinking to reach malicious conclusions.

Examples

  • Guiding LLMs to harmful conclusions through flawed reasoning
  • Manipulating multi-step problem solving

Related Terms

Reasoning AttackPrompt EngineeringLogic Manipulation
Attack TechniquesMedium Risk
Distributed Denial of Intelligence
A coordinated attack that overwhelms AI systems with computationally expensive queries, causing service degradation or complete failure.

Examples

  • Flooding LLM APIs with complex reasoning tasks
  • Overloading image generation services

Related Terms

DDoSResource ExhaustionComputational Attack
Agentic SecurityHigh Risk
Emergent Behavior Exploitation
Attacks that exploit unexpected behaviors that emerge from complex AI systems, particularly in multi-agent environments or large-scale deployments.

Examples

  • Exploiting unexpected agent interactions
  • Leveraging emergent communication protocols

Related Terms

Emergent PropertiesSystem ComplexityUnintended Behavior
Model SecurityHigh Risk
Fine-tuning Attack
An attack where adversaries fine-tune pre-trained models on malicious data to introduce backdoors or alter model behavior while maintaining performance on benign tasks.

Examples

  • Fine-tuning language models to generate harmful content
  • Adapting vision models for surveillance evasion

Related Terms

Transfer Learning AttackModel AdaptationBackdoor Injection
Privacy AttacksHigh Risk
Gradient Leakage
A privacy attack where sensitive information about training data is extracted by analyzing gradient updates in federated learning or distributed training scenarios.

Examples

  • Reconstructing images from gradient updates
  • Extracting text from language model gradients

Related Terms

Gradient InversionFederated Learning AttackPrivacy Leakage
Privacy AttacksMedium Risk
Homomorphic Encryption Bypass
Techniques to circumvent privacy-preserving computation methods that allow processing of encrypted data without decryption.

Examples

  • Side-channel attacks on encrypted inference
  • Timing attacks on homomorphic operations

Related Terms

Privacy-Preserving MLEncrypted ComputationCryptographic Attack
LLM SecurityHigh Risk
Instruction Following Subversion
An attack that exploits the instruction-following capabilities of AI systems to make them perform unintended actions while appearing to follow legitimate commands.

Examples

  • Embedding malicious instructions in seemingly benign prompts
  • Chaining instructions to bypass safety measures

Related Terms

Command InjectionInstruction HijackingBehavioral Manipulation
Training SecurityMedium Risk
Knowledge Graph Poisoning
An attack that corrupts knowledge graphs used by AI systems, introducing false relationships or entities to manipulate reasoning and decision-making.

Examples

  • Injecting false facts into knowledge bases
  • Corrupting entity relationships in graph databases

Related Terms

Graph Neural Network AttackKnowledge Base CorruptionSemantic Attack
GenAI SecurityCritical Risk
Latent Space Backdoor
A sophisticated backdoor attack that embeds triggers in the latent space of generative models, activated by specific patterns in the input representation.

Examples

  • Backdoors in VAE latent spaces
  • Trigger patterns in diffusion model embeddings

Related Terms

Representation AttackGenerative Model SecurityHidden Trigger
Attack TechniquesHigh Risk
Multi-Modal Attack
Attacks that exploit vulnerabilities across multiple input modalities (text, image, audio) in multi-modal AI systems to achieve malicious objectives.

Examples

  • Using audio to manipulate vision-language models
  • Cross-modal adversarial examples

Related Terms

Cross-Modal AttackMulti-Modal SecurityModality Fusion
Model SecurityMedium Risk
Neural Architecture Search Poisoning
An attack that corrupts the neural architecture search process to produce models with hidden vulnerabilities or backdoors.

Examples

  • Biasing NAS to select vulnerable architectures
  • Injecting backdoor-prone components

Related Terms

AutoML AttackArchitecture ManipulationSearch Space Poisoning
Training SecurityHigh Risk
Ontology Manipulation
Attacks that alter the conceptual frameworks and taxonomies used by AI systems, leading to misclassification and reasoning errors.

Examples

  • Modifying medical ontologies to cause misdiagnosis
  • Corrupting legal taxonomies in AI systems

Related Terms

Semantic AttackConcept DriftTaxonomy Corruption
LLM SecurityHigh Risk
Prompt Chaining Attack
A sophisticated attack technique that uses a sequence of carefully crafted prompts to gradually manipulate AI systems into performing prohibited actions.

Examples

  • Building up to harmful requests through innocent prompts
  • Chaining context to bypass safety filters

Related Terms

Multi-Step AttackPrompt EngineeringGradual Manipulation