Gateway Health Monitor
Diagnose and fix OpenClaw gateway stability issues on macOS. Covers the most common failure modes that cause extended downtime.
Quick Diagnosis
Run the diagnostic script:
```bash
bash scripts/diagnose.sh
```
This checks: process state, launchd classification, restart count, plist config, power management, and plugin resolve loops.
Common Failure Modes
1. Plugin Restart Loop (most common)
Symptoms: Gateway restarts every 5-7 minutes. Log shows restartReason=config.patch with plugins.installs.*.resolvedAt.
Cause: Plugins re-resolve on every boot → write new timestamps to openclaw.json → config watcher detects "change" → triggers deferred restart → SIGTERM → repeat.
Fix: Set gateway.reload.mode to "hot":
```bash
openclaw config set gateway.reload.mode hot
```
In hot mode, safe changes hot-apply instantly. Critical changes (like plugin timestamps) only log a warning — no auto-restart. This breaks the loop.
Verify: `grep "reload" ~/.openclaw/logs/gateway.log | tail -5` should show `config change applied (dynamic reads)` instead of a restart.
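To gauge how often the loop is firing, the restart reason quoted in the symptoms can be counted directly in the log. The following is a minimal sketch and not part of the shipped scripts: the function name `count_config_patch_restarts` is hypothetical, and it assumes each affected restart writes one log line containing the literal string `restartReason=config.patch`.

```shell
# Hypothetical helper (not part of OpenClaw): count log lines that record a
# restart caused by a config patch.
# Usage: count_config_patch_restarts [LOGFILE]
count_config_patch_restarts() {
  log_file="${1:-$HOME/.openclaw/logs/gateway.log}"
  # -F matches the string literally; -c prints the number of matching lines.
  # grep exits 1 when nothing matches, so "|| true" keeps the status clean.
  grep -cF 'restartReason=config.patch' "$log_file" || true
}
```

Running it before and after switching to hot mode gives a rough check: the count should stop growing once the loop is broken.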
2. macOS Throttling ("inefficient" classification)
Symptoms: Gateway goes down and stays down for 30-60+ minutes. launchctl print shows immediate reason = inefficient.
Cause: After many restarts (10+/day), macOS marks the job as low-priority and delays restarts via App Nap / Power Nap logic.
Fix: Add these keys to the launchd plist (~/Library/LaunchAgents/ai.openclaw.gateway.plist):
```xml
<key>ProcessType</key>
<string>Interactive</string>
<key>LowPriorityBackgroundIO</key>
<false/>
```
Then reload:
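```bash
launchctl bootout gui/$(id -u)/ai.openclaw.gateway
launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/ai.openclaw.gateway.plist
```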
Note: openclaw gateway start overwrites the plist. Use the patcher script (below) to auto-reapply.
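The throttling symptom can also be checked from a script rather than by eye. This is a small sketch under stated assumptions: the helper name `is_job_throttled` is hypothetical, and it assumes the `launchctl print` output contains the `immediate reason = inefficient` line exactly as described in the symptoms (whitespace around `=` is tolerated).

```shell
# Hypothetical helper: read `launchctl print` output on stdin and return 0
# if launchd reports the "inefficient" classification, 1 otherwise.
is_job_throttled() {
  grep -Eq 'immediate reason[[:space:]]*=[[:space:]]*inefficient'
}
```

On the Mac itself it could be used as `launchctl print "gui/$(id -u)/ai.openclaw.gateway" | is_job_throttled && echo "gateway job is throttled"`.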
3. Hung Shutdown
Symptoms: Gateway receives SIGTERM but doesn't exit. launchd can't restart it because the old PID is still alive.
Fix: Set ExitTimeOut in the plist:
```xml
<key>ExitTimeOut</key>
<integer>10</integer>
```
After 10 seconds, launchd sends SIGKILL.
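To confirm a hung shutdown by hand, one option is to send SIGTERM yourself and see whether the process is gone within the same 10-second window. The sketch below is illustrative only: `wait_for_exit` is a hypothetical helper, and it assumes the gateway runs as the current user so that `kill -0` can probe it.

```shell
# Hypothetical helper: wait up to TIMEOUT seconds (default 10) for PID to exit.
# Returns 0 once the process is gone, 1 if it is still alive at the deadline.
# Assumes the process belongs to the current user; for another user's process
# `kill -0` fails with a permission error and would be misread as "gone".
wait_for_exit() {
  pid="$1"
  timeout="${2:-10}"
  elapsed=0
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}
```

With the gateway's process ID in `pid`, `kill -TERM "$pid"; wait_for_exit "$pid" 10 || echo "gateway ignored SIGTERM"` reproduces the symptom without waiting for launchd.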
4. Power Nap Interference
Symptoms: Gateway goes down during Mac sleep/wake cycles.
Check: `pmset -g | grep powernap`
Fix: `sudo pmset -a powernap 0`
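For scripted checks, the Power Nap value can be extracted from the `pmset -g` output instead of read by eye. This is a minimal sketch: `powernap_setting` is a hypothetical helper, and it assumes `pmset -g` prints the setting as a line whose first field is `powernap` and whose second field is the value.

```shell
# Hypothetical helper: read `pmset -g` output on stdin and print the value
# of the powernap setting (typically 0 or 1). Prints nothing if the setting
# is absent from the input.
powernap_setting() {
  awk '$1 == "powernap" { print $2 }'
}
```

For example, `[ "$(pmset -g | powernap_setting)" = "0" ] || echo "Power Nap is still enabled"`.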
Plist Auto-Patcher
Since openclaw gateway start overwrites the plist, use scripts/patch-plist.sh as a launchd WatchPaths agent:
```bash
# Install the patcher
bash scripts/install-patcher.sh
```
This creates a launchd agent that watches the gateway plist and re-adds ExitTimeOut, ProcessType, and LowPriorityBackgroundIO within seconds of any overwrite.
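To check whether an overwrite has dropped the keys, the plist can be scanned for them. The sketch below does not reproduce scripts/patch-plist.sh, whose contents are not shown here. `missing_plist_keys` is a hypothetical helper, and it assumes the plist is stored in XML (not binary) form.

```shell
# Hypothetical helper: print each required key that is absent from the
# given plist file. Prints nothing when all three keys are present.
# Assumes an XML-format plist; a binary plist would need converting first.
missing_plist_keys() {
  plist="${1:-$HOME/Library/LaunchAgents/ai.openclaw.gateway.plist}"
  for key in ExitTimeOut ProcessType LowPriorityBackgroundIO; do
    grep -q "<key>$key</key>" "$plist" || echo "$key"
  done
}
```

An empty result after running `openclaw gateway start` suggests the patcher agent has reapplied the keys.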
Monitoring
One-liner health check
```bash
bash scripts/health-check.sh
```
Returns exit code 0 if healthy, 1 if issues detected. Suitable for cron or heartbeat integration.
Continuous monitoring (cron integration)
Add to your OpenClaw cron:
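```
Check gateway health: bash ~/path/to/scripts/health-check.sh && echo "Gateway healthy" || echo "ALERT: Gateway issues detected"
```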
Recommended Configuration
For maximum stability on macOS:
```json5
{
  gateway: {
    reload: { mode: "hot" },
  },
}
```
Plus plist keys: ExitTimeOut=10, ProcessType=Interactive, LowPriorityBackgroundIO=false, ThrottleInterval=1, KeepAlive=true.
Troubleshooting Reference
| Symptom | Check | Fix |
|---|---|---|
| Restarts every 5-7 min | `grep restartReason gateway.log` | `reload.mode = "hot"` |
| Down 30-60+ min | `launchctl print` → "inefficient" | `ProcessType=Interactive` |
| Won't exit on SIGTERM | `ps -p PID` after SIGTERM | `ExitTimeOut=10` |
| Down after sleep | `pmset -g \| grep powernap` | `pmset -a powernap 0` |
| Plugin timestamps changing | `grep resolvedAt openclaw.json` | `reload.mode = "hot"` |
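Gateway Health Monitor
Diagnose and fix stability issues with the OpenClaw gateway on macOS. Covers the common failure modes that cause extended downtime.
Quick Diagnosis
Run the diagnostic script:
```bash
bash scripts/diagnose.sh
```
The script checks process state, launchd classification, restart count, plist configuration, power management, and plugin resolve loops.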
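Common Failure Modes
1. Plugin Restart Loop (most common)
Symptoms: The gateway restarts every 5-7 minutes. The log shows restartReason=config.patch together with plugins.installs.*.resolvedAt.
Cause: Plugins re-resolve on every boot → new timestamps are written to openclaw.json → the config watcher detects a change → a deferred restart is triggered → SIGTERM → the cycle repeats.
Fix: Set gateway.reload.mode to "hot":
```bash
openclaw config set gateway.reload.mode hot
```
In hot mode, safe changes are hot-applied immediately. Critical changes (such as plugin timestamps) only log a warning and do not trigger an automatic restart. This breaks the loop.
Verify: `grep reload ~/.openclaw/logs/gateway.log | tail -5` should show `config change applied (dynamic reads)` rather than a restart.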
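2. macOS Throttling ("inefficient" classification)
Symptoms: The gateway goes down and stays down for 30-60+ minutes. launchctl print shows immediate reason = inefficient.
Cause: After many restarts (10+ per day), macOS marks the job as low priority and delays restarts through App Nap / Power Nap logic.
Fix: Add the following keys to the launchd plist (~/Library/LaunchAgents/ai.openclaw.gateway.plist):
```xml
<key>ProcessType</key>
<string>Interactive</string>
<key>LowPriorityBackgroundIO</key>
<false/>
```
Then reload:
```bash
launchctl bootout gui/$(id -u)/ai.openclaw.gateway
launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/ai.openclaw.gateway.plist
```
Note: openclaw gateway start overwrites the plist. Use the patcher script below to reapply the keys automatically.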
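3. Hung Shutdown
Symptoms: The gateway receives SIGTERM but does not exit. launchd cannot restart it because the old PID is still alive.
Fix: Set ExitTimeOut in the plist:
```xml
<key>ExitTimeOut</key>
<integer>10</integer>
```
After 10 seconds, launchd sends SIGKILL.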
4. Power Nap Interference
Symptoms: The gateway goes down during Mac sleep/wake cycles.
Check: `pmset -g | grep powernap`
Fix: `sudo pmset -a powernap 0`
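Plist Auto-Patcher
Because openclaw gateway start overwrites the plist, use scripts/patch-plist.sh as a launchd WatchPaths agent:
```bash
# Install the patcher
bash scripts/install-patcher.sh
```
This creates a launchd agent that watches the gateway plist and re-adds ExitTimeOut, ProcessType, and LowPriorityBackgroundIO within seconds of any overwrite.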
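Monitoring
One-liner health check
```bash
bash scripts/health-check.sh
```
Returns exit code 0 when healthy and 1 when issues are detected. Suitable for cron or heartbeat integration.
Continuous monitoring (cron integration)
Add to your OpenClaw cron:
```
Check gateway health: bash ~/path/to/scripts/health-check.sh && echo "Gateway healthy" || echo "ALERT: Gateway issues detected"
```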
Recommended Configuration
For maximum stability on macOS:
```json5
{
  gateway: {
    reload: { mode: "hot" },
  },
}
```
Plus plist keys: ExitTimeOut=10, ProcessType=Interactive, LowPriorityBackgroundIO=false, ThrottleInterval=1, KeepAlive=true.
Troubleshooting Reference
| Symptom | Check | Fix |
|---|---|---|
| Restarts every 5-7 min | `grep restartReason gateway.log` | `reload.mode = "hot"` |
| Down 30-60+ min | `launchctl print` → "inefficient" | `ProcessType=Interactive` |
| Won't exit on SIGTERM | `ps -p PID` after SIGTERM | `ExitTimeOut=10` |
| Down after sleep | `pmset -g \| grep powernap` | `pmset -a powernap 0` |
| Plugin timestamps changing | `grep resolvedAt openclaw.json` | `reload.mode = "hot"` |