LLM Jailbreaking
LLM Attack Technique
High SeverityMedium ComplexityLLM Category
Techniques to bypass AI safety constraints and content policies through creative prompt engineering and psychological manipulation.
Impact Areas
Policy Violation
Harmful Content Generation
Reputation Damage
Regulatory Compliance
Attack Methodology
Technical approaches and execution methods for this attack
Role-Playing Attacks
Convincing the model to adopt a harmful persona or character
Execution Steps:
- Define a fictional character or scenario
- Gradually escalate the character's permissions
- Request harmful content within the role-play context
- Exploit the model's desire to maintain character consistency
Related Attack Techniques
Prompt Injection
CriticalA critical vulnerability where malicious prompts manipulate LLM behavior to bypass safety measures and execute unintended actions.
Deepfake Generation
HighCreation of synthetic media content using generative AI to impersonate individuals or create false evidence.
Autonomous Exploitation
CriticalAI agents that can independently discover, exploit, and propagate through system vulnerabilities without human intervention.
MCP Server Impersonation
HighMalicious actors impersonating legitimate MCP servers to intercept and manipulate AI model communications.