BACK TO ARCHIVE
2026-04-10P1 CRITICAL
CASE #0060

OpenClaw Scores 2/100 on ZeroLeaks Prompt Security Audit

An independent ZeroLeaks benchmark ran against OpenClaw's default agent configuration and scored it 2 out of 100 on prompt security: 84% system-prompt extraction success, 91% prompt-injection success, and full system prompt leakage on turn 1 of interaction.

CONFIRMED
🔓 SECURITY LEAK📢 PR NIGHTMARE
Incident Brief

ZeroLeaks, a widely-used prompt-security benchmark, evaluated OpenClaw's shipped default agent against its standard test battery. The result — 2/100 — was described by the benchmark's maintainers as the worst score recorded against a mainstream AI framework. 84% of system-prompt extraction attempts succeeded. 91% of prompt-injection attacks caused the agent to execute attacker instructions over operator instructions. In one test, the full system prompt (including tool descriptions and internal policies) was exfiltrated in a single turn with no evasion effort. Because OpenClaw agents routinely held sensitive workflows, API keys in memory files, and proprietary skill configurations, the implication was that every deployed agent was treating its configuration as public by default.

Root Cause

The Actual Culprit

OpenClaw used a single system-prompt layer with no output filtering, no instruction-hierarchy enforcement, and no separation between trusted operator instructions and untrusted user input.

What Was Done
[OK]Instruction-hierarchy enforcement added (operator > user > tool output)
[--]Output filter that detects system-prompt echo before returning to user
[OK]Tool output sandboxed and tagged as untrusted in the context
[--]Public response framing benchmark methodology
Lessons Learned
eye

Your system prompt is not a secret

Assume it will leak. Do not put credentials, internal URLs, or user data in it. Put those behind tools with access checks.

git-branch

Instruction hierarchy is load-bearing

Without runtime-enforced precedence between operator, user, and tool-output instructions, prompt injection is game over.

Comments (0)

Loading comments...

0/1000
Case Info
Case Number
#0060
Severity
🔥P1 CRITICAL
Severity Level
Date
2026-04-10
Affected Systems
• Agent Runtime
• System Prompt Layer
• Tool Output Path
Source
twitter
Published: 2026-04-10