2026-03-31 P1 CRITICAL
CASE #0065

OpenClaw 3.31 → Emergency 4.1 Patch

OpenClaw 3.31 shipped with a trio of production-killing bugs: gateways hung silently, approval workflow state was wiped on every restart, and the config subsystem entered a restart loop. The team cut an emergency 4.1 release within days.

CONFIRMED
🤖 ROGUE BEHAVIOR
Incident Brief

OpenClaw 3.31's release broke three critical subsystems simultaneously.

First, the gateway would enter a silent hang: no error log, no health-check failure, just no responses. Operators only noticed when downstream agents stopped producing output.

Second, the approval workflow stored pending approvals in process memory instead of on disk, so every restart wiped the queue. Users would approve an action, the gateway would restart (for any reason), and the approval would vanish.

Third, the config subsystem had a bootstrap dependency loop. On start it tried to read a config field that only existed after the first successful start, triggering a restart loop that only broke when someone manually created the field.

Emergency 4.1 was cut within 4 days, adding /tasks and SearXNG search as consolation.
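The config bootstrap loop can be illustrated with a minimal Python sketch. The source names neither the field nor the config format, so the `instance_id` field, the JSON file, and the `load_config` and `save_config` helpers are all hypothetical. The idea is only that a field which normally appears after the first successful start should be created on first read rather than required.

```python
import json
import os
import tempfile
import uuid


def load_config(path):
    """Load config, creating first-run fields instead of failing on them."""
    config = {}
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            config = json.load(f)

    # Hypothetical field that would normally be written after the first
    # successful start. Requiring it here would reproduce the restart loop;
    # creating it on demand breaks the cycle.
    if "instance_id" not in config:
        config["instance_id"] = uuid.uuid4().hex
        save_config(path, config)
    return config


def save_config(path, config):
    """Write via a temp file and rename so a crash cannot leave a truncated file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(config, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)
```

With this shape, a clean install and a later start follow the same code path, which is the property a clean-install smoke test would exercise.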

Root Cause

The Actual Culprit

Three independent regressions shipped in one release because there was no smoke-test pass against a clean install, a restart scenario, or a non-trivial approval workflow.

What Was Done
[OK] Mandatory clean-install smoke test in CI
[OK] Approval queue moved to on-disk persistence
[OK] Config bootstrap refactored to remove circular dependency
[OK] Health-check endpoint wired to gateway event loop, not just process liveness
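As an illustration of the on-disk persistence item, here is a minimal sketch of an approval store backed by SQLite from the Python standard library. The source does not say how OpenClaw persists approvals, so the `ApprovalStore` class, the table layout, and the status values are assumptions made for the example.

```python
import sqlite3


class ApprovalStore:
    """Pending approvals kept in SQLite so they survive a process restart."""

    def __init__(self, path):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS approvals ("
            "action_id TEXT PRIMARY KEY, status TEXT NOT NULL)"
        )
        self._conn.commit()

    def request(self, action_id):
        # INSERT OR IGNORE keeps an existing decision if the request is replayed.
        self._conn.execute(
            "INSERT OR IGNORE INTO approvals (action_id, status) "
            "VALUES (?, 'pending')",
            (action_id,),
        )
        self._conn.commit()

    def approve(self, action_id):
        self._conn.execute(
            "UPDATE approvals SET status = 'approved' WHERE action_id = ?",
            (action_id,),
        )
        self._conn.commit()

    def status(self, action_id):
        row = self._conn.execute(
            "SELECT status FROM approvals WHERE action_id = ?", (action_id,)
        ).fetchone()
        return row[0] if row else None

    def close(self):
        self._conn.close()
```

Each state change is committed before the call returns, so an approval recorded just before a restart is still there when a new `ApprovalStore` opens the same file.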
Lessons Learned

Liveness is not readiness

A process can be 'up' while its event loop is dead. Health checks must probe actual request-handling capability.
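One way to probe request-handling capability rather than process liveness is a heartbeat that only advances while the event loop is actually running. The sketch below uses Python's `asyncio`; the `LoopHeartbeat` class and its thresholds are hypothetical and not taken from the OpenClaw codebase.

```python
import asyncio
import time


class LoopHeartbeat:
    """Tracks when the event loop last ran, so readiness reflects real progress."""

    def __init__(self, max_age_seconds=5.0):
        self.max_age_seconds = max_age_seconds
        self.last_beat = None

    async def run(self, interval_seconds=1.0):
        # Scheduled on the same loop that serves requests: if the loop hangs,
        # this coroutine stops running and the timestamp goes stale.
        while True:
            self.last_beat = time.monotonic()
            await asyncio.sleep(interval_seconds)

    def is_ready(self, now=None):
        # Intended to be called from outside the loop (for example a separate
        # thread), since a hung loop could never answer the probe itself.
        if self.last_beat is None:
            return False
        now = time.monotonic() if now is None else now
        return (now - self.last_beat) <= self.max_age_seconds
```

A health endpoint served from a separate thread could return a failure status whenever `is_ready()` is false, which would surface a silent hang even though the process is still alive.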

Restart is a first-class test scenario

Any queue that doesn't survive a restart is effectively data loss waiting to happen. Treat 'what happens on restart' as a release gate.
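A restart gate can be expressed as a small test that writes state, discards the instance, and rebuilds it. The sketch below is illustrative only: `InMemoryQueue`, `FileQueue`, and `survives_restart` are hypothetical names, and the file-backed queue skips atomic writes for brevity.

```python
import json
import os


class InMemoryQueue:
    """State lives only in the process, so a new instance starts empty."""

    def __init__(self):
        self.items = []

    def push(self, item):
        self.items.append(item)


class FileQueue:
    """State is reloaded from disk whenever a new instance is constructed."""

    def __init__(self, path):
        self.path = path
        self.items = []
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self.items = json.load(f)

    def push(self, item):
        self.items.append(item)
        # Simplification: a production version would write atomically.
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.items, f)


def survives_restart(make_queue):
    """Push an item, simulate a restart with a fresh instance, then compare."""
    before = make_queue()
    before.push("approve:deploy")
    del before  # the "old process" is gone
    after = make_queue()
    return after.items == ["approve:deploy"]
```

Run in CI, `survives_restart(InMemoryQueue)` returns `False`, while `survives_restart(lambda: FileQueue(path))` returns `True` for a fresh `path`, so an in-memory queue would fail the gate before release.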

Case Info
Case Number: #0065
Severity: P1 CRITICAL
Date: 2026-03-31
Affected Systems: OpenClaw Gateway, Approval Workflow, Config Subsystem
Source: Twitter
Published: 2026-03-31