AI Security Glossary

Comprehensive dictionary of AI security terms, definitions, and concepts. From adversarial attacks to zero-day exploits in AI systems.

Security Terms

Examples

Adding noise to images to fool image classifiers
Modifying text to bypass content filters

Related Terms

Adversarial ExamplesEvasion AttackPerturbation

Agentic SecurityCritical Risk

Agent Hijacking

An attack where malicious actors gain control of autonomous AI agents, redirecting their actions to serve unintended purposes while maintaining the appearance of normal operation.

Examples

Redirecting a trading bot to make unauthorized transactions
Hijacking a customer service agent to leak sensitive data

Related Terms

Goal HijackingAgent PoisoningAutonomous Agent Security

Model SecurityCritical Risk

Backdoor Attack

A type of attack where malicious functionality is embedded into an AI model during training, activated by specific trigger patterns in the input.

Examples

A model that misclassifies images containing a specific watermark
An LLM that generates harmful content when prompted with a secret phrase

Related Terms

Trojan AttackModel PoisoningTrigger Pattern

LLM SecurityHigh Risk

Context Window Poisoning

An attack technique that involves injecting malicious content into the context window of large language models to influence their responses or extract sensitive information.

Examples

Injecting malicious instructions in document summaries
Poisoning chat history to influence future responses

Related Terms

Context InjectionPrompt InjectionContext Manipulation

Training SecurityHigh Risk

Data Poisoning

The practice of intentionally corrupting training data to compromise the integrity and performance of machine learning models.

Examples

Adding mislabeled examples to training datasets
Injecting biased data to skew model predictions

Related Terms

Training Data ManipulationDataset CorruptionSupply Chain Attack

GenAI SecurityHigh Risk

Deepfake

Synthetic media created using deep learning techniques to replace a person's likeness with someone else's, often used for deception or fraud.

Examples

Fake video calls for CEO fraud
Synthetic audio for voice phishing

Related Terms

Synthetic MediaFace SwapVoice Cloning

Privacy ProtectionLow Risk

Differential Privacy

A mathematical framework for quantifying and limiting the privacy loss when statistical information about a dataset is released.

Examples

Adding calibrated noise to query results
Protecting individual records in aggregate statistics

Related Terms

Privacy BudgetNoise AdditionPrivacy Preservation

Privacy AttacksMedium Risk

Embedding Inversion

A technique to reconstruct original data from learned embeddings or representations, potentially exposing sensitive information.

Examples

Reconstructing faces from facial recognition embeddings
Extracting text from sentence embeddings

Related Terms

Model InversionFeature ExtractionRepresentation Attack

Distributed SecurityHigh Risk

Federated Learning Attack

Attacks targeting federated learning systems where malicious participants can compromise the global model through poisoned local updates.

Examples

Malicious clients sending poisoned gradients
Coordinated attacks on federated networks

Related Terms

Byzantine AttackModel PoisoningDistributed Learning

Agentic SecurityCritical Risk

Goal Hijacking

An attack where an AI agent's objectives are maliciously altered or redirected, causing it to pursue unintended goals while appearing to function normally.

Examples

Changing a recommendation system's goals to promote specific products
Redirecting an autonomous vehicle's destination

Related Terms

Agent HijackingObjective ManipulationGoal Misalignment

LLM SecurityMedium Risk

Hallucination

When AI models, particularly language models, generate false or nonsensical information that appears plausible, potentially leading to misinformation.

Examples

LLMs citing non-existent research papers
Generating fake historical facts

Related Terms

ConfabulationFalse GenerationModel Uncertainty

Privacy AttacksMedium Risk

Inference Attack

Attacks that exploit the outputs or behavior of machine learning models to infer sensitive information about the training data or model parameters.

Examples

Determining if specific data was used in training
Inferring demographic information from model outputs

Related Terms

Model InversionMembership InferenceProperty Inference

LLM SecurityHigh Risk

Jailbreaking

Techniques used to bypass safety measures and content filters in AI systems, particularly large language models, to generate prohibited content.

Examples

Using roleplay scenarios to bypass content restrictions
Encoding harmful requests to avoid detection

Related Terms

Prompt InjectionSafety BypassContent Filter Evasion

Model SecurityMedium Risk

Knowledge Distillation Attack

An attack that exploits the knowledge distillation process to extract information from teacher models or inject malicious knowledge into student models.

Examples

Extracting proprietary model knowledge through distillation
Poisoning student models via malicious teachers

Related Terms

Model ExtractionTeacher-Student AttackKnowledge Transfer

GenAI SecurityMedium Risk

Latent Space Manipulation

Techniques that modify the latent representations in generative models to control or manipulate the generated outputs in specific ways.

Examples

Editing facial expressions in generated images
Modifying text style in language generation

Related Terms

Latent Code EditingStyle TransferSemantic Manipulation

Privacy AttacksMedium Risk

Membership Inference Attack

An attack that determines whether a specific data point was included in a model's training dataset by analyzing the model's behavior.

Examples

Determining if a person's medical record was used in training
Identifying training images from model responses

Related Terms

Training Data InferencePrivacy LeakageModel Interrogation

IP TheftHigh Risk

Model Extraction

The process of stealing or replicating a machine learning model's functionality by querying it and training a substitute model on the responses.

Examples

Cloning a proprietary image classifier
Replicating a commercial recommendation system

Related Terms

Model StealingAPI AbuseIntellectual Property Theft

Model SecurityCritical Risk

Neural Backdoor

A hidden functionality embedded in neural networks that can be activated by specific trigger patterns, causing the model to behave maliciously.

Examples

A face recognition system that fails for specific patterns
A text classifier that misclassifies when certain words are present

Related Terms

Backdoor AttackTrojan Neural NetworkHidden Trigger

LLM SecurityHigh Risk

Prompt Injection

An attack technique where malicious instructions are embedded in prompts to manipulate large language models into performing unintended actions.

Examples

Injecting 'ignore previous instructions' in user input
Embedding malicious prompts in documents

Related Terms

Indirect Prompt InjectionContext InjectionInstruction Hijacking

Model SecurityMedium Risk

Quantization Attack

Attacks that exploit the quantization process used to compress neural networks, potentially introducing vulnerabilities or degrading performance.

Examples

Exploiting reduced precision to cause misclassifications
Attacking quantized models with specific inputs

Related Terms

Model Compression AttackPrecision ReductionQuantization Noise

Security TestingLow Risk

Red Teaming

A systematic approach to testing AI systems by simulating adversarial attacks to identify vulnerabilities and weaknesses before deployment.

Examples

Testing LLMs for harmful content generation
Evaluating autonomous systems for safety failures

Related Terms

Adversarial TestingSecurity AssessmentPenetration Testing

Attack InfrastructureMedium Risk

Shadow Model

A model trained to mimic the behavior of a target model, often used as a stepping stone for more sophisticated attacks like membership inference.

Examples

Training a shadow model to attack a private classifier
Using shadow models for membership inference

Related Terms

Surrogate ModelModel MimickingAttack Proxy

Privacy AttacksHigh Risk

Training Data Extraction

Attacks that attempt to recover specific training examples from machine learning models, potentially exposing sensitive or private information.

Examples

Extracting personal information from language models
Recovering training images from generative models

Related Terms

Data ReconstructionTraining Data LeakageMemorization Attack

Attack TechniquesHigh Risk

Universal Adversarial Perturbation

A single perturbation that can fool a neural network on most inputs from a given distribution, making it particularly dangerous for real-world attacks.

Examples

A single noise pattern that fools most image classifiers
Universal patches that cause misclassification

Related Terms

Universal AttackTransferable PerturbationRobust Adversarial

Authentication SecurityHigh Risk

Verification Bypass

Techniques used to circumvent AI-based verification systems, such as biometric authentication or content verification mechanisms.

Examples

Using deepfakes to bypass facial recognition
Spoofing voice authentication systems

Related Terms

Authentication BypassBiometric SpoofingIdentity Fraud

Content AuthenticationLow Risk

Watermarking

Techniques for embedding invisible markers in AI-generated content to enable detection and verification of synthetic media.

Examples

Watermarking AI-generated images
Embedding signatures in synthetic text

Related Terms

Content ProvenanceSynthetic Media DetectionDigital Fingerprinting

Attack TechniquesHigh Risk

Zero-Shot Attack

Attacks that work against AI models without requiring prior knowledge of the model's architecture, training data, or parameters.

Examples

Attacking unknown models through API queries
Using transferable adversarial examples

Related Terms

Black-box AttackQuery-based AttackTransfer Attack

Attack TechniquesHigh Risk

API Poisoning

An attack where malicious data is injected through API endpoints to corrupt AI model training or inference processes, often targeting real-time learning systems.

Examples

Injecting malicious feedback through user rating APIs
Corrupting recommendation systems via API calls

Related Terms

Data PoisoningAPI SecurityReal-time Learning Attack

Model SecurityMedium Risk

Bias Amplification

The phenomenon where AI systems amplify existing biases in training data, leading to discriminatory outcomes and unfair treatment of certain groups.

Examples

Hiring algorithms favoring certain demographics
Credit scoring systems with racial bias

Related Terms

Algorithmic BiasFairnessDiscrimination

LLM SecurityHigh Risk

Chain-of-Thought Manipulation

An attack technique that exploits the reasoning process of large language models by manipulating their step-by-step thinking to reach malicious conclusions.

Examples

Guiding LLMs to harmful conclusions through flawed reasoning
Manipulating multi-step problem solving

Related Terms

Reasoning AttackPrompt EngineeringLogic Manipulation

Attack TechniquesMedium Risk

Distributed Denial of Intelligence

A coordinated attack that overwhelms AI systems with computationally expensive queries, causing service degradation or complete failure.

Examples

Flooding LLM APIs with complex reasoning tasks
Overloading image generation services

Related Terms

DDoSResource ExhaustionComputational Attack

Agentic SecurityHigh Risk

Emergent Behavior Exploitation

Attacks that exploit unexpected behaviors that emerge from complex AI systems, particularly in multi-agent environments or large-scale deployments.

Examples

Exploiting unexpected agent interactions
Leveraging emergent communication protocols

Related Terms

Emergent PropertiesSystem ComplexityUnintended Behavior

Model SecurityHigh Risk

Fine-tuning Attack

An attack where adversaries fine-tune pre-trained models on malicious data to introduce backdoors or alter model behavior while maintaining performance on benign tasks.

Examples

Fine-tuning language models to generate harmful content
Adapting vision models for surveillance evasion

Related Terms

Transfer Learning AttackModel AdaptationBackdoor Injection

Privacy AttacksHigh Risk

Gradient Leakage

A privacy attack where sensitive information about training data is extracted by analyzing gradient updates in federated learning or distributed training scenarios.

Examples

Reconstructing images from gradient updates
Extracting text from language model gradients

Related Terms

Gradient InversionFederated Learning AttackPrivacy Leakage

Privacy AttacksMedium Risk

Homomorphic Encryption Bypass

Techniques to circumvent privacy-preserving computation methods that allow processing of encrypted data without decryption.

Examples

Side-channel attacks on encrypted inference
Timing attacks on homomorphic operations

Related Terms

Privacy-Preserving MLEncrypted ComputationCryptographic Attack

LLM SecurityHigh Risk

Instruction Following Subversion

An attack that exploits the instruction-following capabilities of AI systems to make them perform unintended actions while appearing to follow legitimate commands.

Examples

Embedding malicious instructions in seemingly benign prompts
Chaining instructions to bypass safety measures

Related Terms

Command InjectionInstruction HijackingBehavioral Manipulation

Training SecurityMedium Risk

Knowledge Graph Poisoning

An attack that corrupts knowledge graphs used by AI systems, introducing false relationships or entities to manipulate reasoning and decision-making.

Examples

Injecting false facts into knowledge bases
Corrupting entity relationships in graph databases

Related Terms

Graph Neural Network AttackKnowledge Base CorruptionSemantic Attack

GenAI SecurityCritical Risk

Latent Space Backdoor

A sophisticated backdoor attack that embeds triggers in the latent space of generative models, activated by specific patterns in the input representation.

Examples

Backdoors in VAE latent spaces
Trigger patterns in diffusion model embeddings

Related Terms

Representation AttackGenerative Model SecurityHidden Trigger

Attack TechniquesHigh Risk

Multi-Modal Attack

Attacks that exploit vulnerabilities across multiple input modalities (text, image, audio) in multi-modal AI systems to achieve malicious objectives.

Examples

Using audio to manipulate vision-language models
Cross-modal adversarial examples

Related Terms

Cross-Modal AttackMulti-Modal SecurityModality Fusion

Model SecurityMedium Risk

Neural Architecture Search Poisoning

An attack that corrupts the neural architecture search process to produce models with hidden vulnerabilities or backdoors.

Examples

Biasing NAS to select vulnerable architectures
Injecting backdoor-prone components

Related Terms

AutoML AttackArchitecture ManipulationSearch Space Poisoning

Training SecurityHigh Risk

Ontology Manipulation

Attacks that alter the conceptual frameworks and taxonomies used by AI systems, leading to misclassification and reasoning errors.

Examples

Modifying medical ontologies to cause misdiagnosis
Corrupting legal taxonomies in AI systems

Related Terms

Semantic AttackConcept DriftTaxonomy Corruption

LLM SecurityHigh Risk

Prompt Chaining Attack

A sophisticated attack technique that uses a sequence of carefully crafted prompts to gradually manipulate AI systems into performing prohibited actions.

Examples

Building up to harmful requests through innocent prompts
Chaining context to bypass safety filters

Related Terms

Multi-Step AttackPrompt EngineeringGradual Manipulation

Quick Reference Guide

Essential AI security concepts organized by risk level