An independent ZeroLeaks benchmark ran against OpenClaw's default agent configuration and scored it 2 out of 100 on prompt security: 84% system-prompt extraction success, 91% prompt-injection success, and full system prompt leakage on turn 1 of interaction.
ZeroLeaks, a widely-used prompt-security benchmark, evaluated OpenClaw's shipped default agent against its standard test battery. The result — 2/100 — was described by the benchmark's maintainers as the worst score recorded against a mainstream AI framework. 84% of system-prompt extraction attempts succeeded. 91% of prompt-injection attacks caused the agent to execute attacker instructions over operator instructions. In one test, the full system prompt (including tool descriptions and internal policies) was exfiltrated in a single turn with no evasion effort. Because OpenClaw agents routinely held sensitive workflows, API keys in memory files, and proprietary skill configurations, the implication was that every deployed agent was treating its configuration as public by default.
The Actual Culprit
OpenClaw used a single system-prompt layer with no output filtering, no instruction-hierarchy enforcement, and no separation between trusted operator instructions and untrusted user input.
Assume it will leak. Do not put credentials, internal URLs, or user data in it. Put those behind tools with access checks.
Without runtime-enforced precedence between operator, user, and tool-output instructions, prompt injection is game over.
Loading comments...