Multi-Agent Collusion

Agentic AI Attack Technique

High Severity | High Complexity | Agentic AI Category

Coordinated behavior among multiple autonomous agents that collaborate to bypass safeguards, share sensitive information, or execute complex attack chains.

Impact Areas

Defense Evasion

Policy Bypass

Data Exfiltration

Stealthy Persistence

Attack Methodology

Technical approaches and execution methods for this attack

Cooperative Policy Evasion

Agents split malicious goals across conversations and tools to avoid single-agent detection.

Execution Steps:

Deploy multiple agents with overlapping but seemingly benign tasks.
Distribute sensitive operations (e.g., data access, code execution, exfiltration) across agents.
Use indirect messaging channels (shared memory, RAG stores, queues) for coordination.
Gradually escalate capabilities while staying below per-agent anomaly thresholds.
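
From a defender's point of view, the steps above work because each agent is scored in isolation. The following is a minimal sketch, not a definitive detector, of how aggregating scores per coordination channel can surface activity that stays under every per-agent threshold. The `AgentEvent` type, the `risk_score` field, and both threshold values are assumptions made for illustration; they do not come from any particular monitoring product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentEvent:
    agent_id: str
    channel: str       # hypothetical: shared memory key, RAG store, or queue name
    risk_score: float  # hypothetical per-action score in [0, 1]


# Assumed thresholds, chosen only to make the example concrete.
PER_AGENT_THRESHOLD = 1.0
PER_CHANNEL_THRESHOLD = 1.5


def flag_agents(events):
    """Flag agents whose own cumulative score crosses the per-agent threshold."""
    totals = defaultdict(float)
    for event in events:
        totals[event.agent_id] += event.risk_score
    return {agent for agent, total in totals.items() if total >= PER_AGENT_THRESHOLD}


def flag_channels(events):
    """Flag channels where several agents together cross the channel threshold."""
    totals = defaultdict(float)
    agents = defaultdict(set)
    for event in events:
        totals[event.channel] += event.risk_score
        agents[event.channel].add(event.agent_id)
    return {
        channel
        for channel, total in totals.items()
        if total >= PER_CHANNEL_THRESHOLD and len(agents[channel]) > 1
    }


# Three agents, each individually unremarkable, coordinating through one queue.
events = [
    AgentEvent("reader", "queue:jobs", 0.6),
    AgentEvent("runner", "queue:jobs", 0.6),
    AgentEvent("sender", "queue:jobs", 0.6),
]

print(flag_agents(events))    # no single agent crosses its threshold
print(flag_channels(events))  # the shared channel does
```

In this sketch no agent is flagged on its own, while the shared queue is. Grouping by channel is only one possible choice; grouping by task, tenant, or data source would follow the same pattern.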

Cross-Agent Information Laundering

One agent acquires sensitive data while others transform, summarize, or route it to evade DLP controls.

Execution Steps:

Configure one agent with broad read access to logs, documents, or APIs.
Have a second agent summarize or chunk sensitive data to reduce detectability.
Have a third agent embed, encode, or otherwise obfuscate the output (e.g., via steganography).
Exfiltrate the transformed content through low-sensitivity channels (tickets, chat, comments).
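
The chain above defeats content-based DLP because each transformation makes the data look less like its source. One possible countermeasure is to track lineage rather than content, so that a summary or encoded copy inherits the sensitivity labels of whatever it was derived from. The sketch below illustrates that idea under stated assumptions: the `Artifact` type, the `derive` helper, the `"sensitive"` label, and the channel names are all hypothetical, and the approach only works if the orchestration layer actually records which inputs each agent consumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    name: str
    labels: frozenset = frozenset()  # sensitivity labels carried by this artifact
    parents: tuple = ()              # artifacts this one was derived from


def derive(name, *parents):
    """Create a derived artifact that inherits every label of its inputs."""
    inherited = frozenset().union(*(parent.labels for parent in parents))
    return Artifact(name, inherited, parents)


# Assumed set of channels that should never carry sensitive material.
LOW_SENSITIVITY_CHANNELS = {"ticket", "chat", "comment"}


def allow_write(artifact, channel):
    """Block labeled artifacts from low-sensitivity channels, however transformed."""
    if channel in LOW_SENSITIVITY_CHANNELS and "sensitive" in artifact.labels:
        return False
    return True


# Agent 1 reads logs, agent 2 summarizes, agent 3 encodes the summary.
raw_logs = Artifact("raw_logs", frozenset({"sensitive"}))
summary = derive("summary", raw_logs)
encoded = derive("encoded_summary", summary)

print(allow_write(encoded, "ticket"))           # False: label survived two hops
print(allow_write(Artifact("note"), "ticket"))  # True: no sensitive ancestry
```

The label follows the data through both transformations, so the final write is refused even though the encoded text no longer resembles the original logs.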

Related Attack Techniques

Prompt Injection

A critical vulnerability where malicious prompts manipulate LLM behavior to bypass safety measures and execute unintended actions.

LLM Jailbreaking

Techniques to bypass AI safety constraints and content policies through creative prompt engineering and psychological manipulation.

Deepfake Generation

Creation of synthetic media content using generative AI to impersonate individuals or create false evidence.

Autonomous Exploitation

AI agents that can independently discover, exploit, and propagate through system vulnerabilities without human intervention.

Tool-Chain Privilege Escalation

Abusing over-permissioned tools, misconfigured connectors, and chained actions to escalate privileges across systems controlled by AI agents.

Long-Horizon Goal Drift

Subtle misalignment of agent objectives over long-running tasks or sessions, leading to unsafe emergent behaviors that diverge from original intent.

MCP Server Impersonation

Malicious actors impersonating legitimate MCP servers to intercept and manipulate AI model communications.

Related Security Research

Explore related AI security topics and vulnerability analysis

Autonomous Exploitation

Self-directed AI systems performing unauthorized security testing

Tags: autonomous exploitation, AI red teaming

Prompt Injection Attacks

Critical vulnerability analysis for LLM prompt manipulation techniques

Tags: prompt injection, LLM jailbreaking

Model Inversion Attacks

Advanced privacy attacks for extracting training data from language models

Tags: model inversion, data extraction

Deepfake Generation Threats

Analysis of malicious deepfake creation and detection challenges

Tags: deepfake generation, synthetic identity

Voice Cloning Attacks

Security implications of AI-powered voice synthesis and impersonation

Tags: voice cloning, audio deepfakes

Server Impersonation Attacks

MCP protocol vulnerabilities enabling malicious server impersonation

Tags: server impersonation, MCP protocol