Boot Resume

Zero-cooperation session recovery after gateway restart. No checkpoints, no hooks, no agent involvement — just reads the evidence and picks up where it left off.

Problem

When the gateway restarts, any in-flight agent turn dies mid-execution. Session history is preserved on disk, but the agent doesn't know it needs to continue. Users must manually tell each interrupted session to resume.

Checkpoint-based approaches require the agent to save state before dying. Unexpected kills (SIGKILL, OOM, power loss) bypass this entirely.

Solution

A deterministic shell script runs on every gateway start via systemd ExecStartPost. No LLM in the detection loop.

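    ┌──────────┐     ┌──────────┐     ┌────────────────┐
    │   Scan   │ ──▶ │  Detect  │ ──▶ │     Resume     │
    │ sessions │     │  JSONL   │     │    cron add    │
    │  .json   │     │   tail   │     │ --system-event │
    └──────────┘     └──────────┘     └────────────────┘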

1. Scan: finds sessions updated within the last 20 minutes
2. Detect: reads the last 5 JSONL lines to classify session state
3. Resume: schedules a one-shot openclaw cron add --system-event --wake now to inject a continuation prompt

Key insight: the JSONL session files already contain all the evidence needed to detect an interruption — no pre-save required.
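The scan step can be illustrated with a minimal Python sketch. It assumes, for illustration only, that each session is a *.jsonl file somewhere under a root directory and that file modification time is a good proxy for "updated"; the function name recent_session_files and this layout are hypothetical, not taken from the actual script.

```python
from __future__ import annotations

import os
import tempfile
import time
from pathlib import Path

WINDOW_MINUTES = 20  # default scan window described above


def recent_session_files(root: Path, now: float, window_minutes: int = WINDOW_MINUTES) -> list[Path]:
    """Return the *.jsonl files under root modified within the scan window."""
    cutoff = now - window_minutes * 60
    return sorted(p for p in root.rglob("*.jsonl") if p.stat().st_mtime >= cutoff)


if __name__ == "__main__":
    # Demo on a throwaway directory: one fresh file, one last touched an hour ago.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "fresh.jsonl").write_text("{}\n")
        (root / "stale.jsonl").write_text("{}\n")
        now = time.time()
        os.utime(root / "stale.jsonl", (now - 3600, now - 3600))
        print([p.name for p in recent_session_files(root, now)])
```

Passing now explicitly keeps the cutoff deterministic, which makes the window easy to test without waiting for real time to pass.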

Detection Rules

| Last JSONL Entry | Status | Meaning |
|---|---|---|
| toolResult | interrupted | Tool returned, agent never processed it |
| assistant (empty text) | | |
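As a rough sketch of how such a tail check might look in Python: the code below reads the last 5 lines of a session file and applies only the toolResult rule, since that is the one rule fully stated above. The field name role and the function names are assumptions made for illustration; the real JSONL schema may differ.

```python
import json
from collections import deque

TAIL_LINES = 5  # the script inspects the last 5 JSONL lines


def classify_tail(lines):
    """Classify a session from its trailing JSONL lines.

    Returns "interrupted" when the last parseable entry is a tool result,
    otherwise "unknown". The "role" key is a hypothetical field name.
    """
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entries.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # a kill mid-write can leave a partial final line
    if not entries or not isinstance(entries[-1], dict):
        return "unknown"
    if entries[-1].get("role") == "toolResult":
        return "interrupted"
    return "unknown"


def classify_file(path, n=TAIL_LINES):
    """Read only the last n lines of a JSONL file and classify them."""
    with open(path, encoding="utf-8") as fh:
        tail = deque(fh, maxlen=n)  # keeps memory bounded for long sessions
    return classify_tail(tail)
```

Skipping unparseable lines is a design choice of this sketch: a process killed mid-write may leave a truncated final line, and the entry before it is still usable evidence.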

Install

One command

    bash {baseDir}/install.sh

Deploys three components:

- boot-resume-check.sh → ~/.openclaw/workspace/scripts/
- boot-resume.conf → systemd drop-in (triggers script on every gateway start)
- boot-resume-wake.service → systemd user service (triggers script on system wake from sleep/suspend)

Manual

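    cp {baseDir}/scripts/boot-resume-check.sh ~/.openclaw/workspace/scripts/
    chmod +x ~/.openclaw/workspace/scripts/boot-resume-check.sh

    mkdir -p ~/.config/systemd/user/openclaw-gateway.service.d
    cp {baseDir}/templates/boot-resume.conf ~/.config/systemd/user/openclaw-gateway.service.d/
    cp {baseDir}/templates/boot-resume-wake.service ~/.config/systemd/user/

    systemctl --user daemon-reload
    systemctl --user enable boot-resume-wake.service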

Verify

    systemctl --user restart openclaw-gateway
    sleep 20
    cat /tmp/openclaw/boot-resume.log

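Expected output:

    [boot-resume] now=... cut=... (20min window)
    [boot-resume] scanning agent: main
    [boot-resume] candidates: 0 (agent=main)
    [boot-resume] done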

Test

1. Send a message that triggers a multi-step task (web search, code analysis, etc.)
2. Wait for the agent to start processing (tool calls in flight)
3. Run systemctl --user restart openclaw-gateway
4. The agent resumes automatically within ~35 seconds

Slash Command

When invoked as /boot-resume, run the script with --no-wait to skip the startup delay:

    bash {baseDir}/scripts/boot-resume-check.sh --no-wait

Report results to the user: which sessions were resumed, or that none were found.

Configuration

| Variable | Default | Description |
|---|---|---|
| WINDOW_MINUTES | 20 | How far back to scan for interrupted sessions, in minutes |
| DELAY | 20s | Delay before injecting the resume event |

Edit at the top of scripts/boot-resume-check.sh.

Features

- Dual trigger: covers both gateway restart (ExecStartPost) and system sleep/wake (systemd sleep.target)
- Multi-agent support: scans all agents under ~/.openclaw/agents/, not just main
- Smart filtering: skips system, heartbeat, cron, and subagent sessions automatically
- Deduplication: respects restart-resume.json to avoid double-resuming planned restarts
- Log rotation: auto-truncates log at 1000 lines
- Error visibility: Python and cron errors are logged, not swallowed
- Unique job names: timestamp-based to prevent conflicts on rapid restarts
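Two of these features, log truncation and timestamp-based job names, are simple enough to sketch. The Python below is an illustration under stated assumptions, not the script's actual code: it assumes truncation keeps the newest 1000 lines, and the boot-resume-<session>-<milliseconds> naming scheme is hypothetical.

```python
from __future__ import annotations

import time
from pathlib import Path

MAX_LOG_LINES = 1000  # truncation threshold from the feature list


def truncate_log(path: Path, max_lines: int = MAX_LOG_LINES) -> int:
    """Keep only the newest max_lines lines of the log; return how many were dropped."""
    if not path.exists():
        return 0
    lines = path.read_text(encoding="utf-8").splitlines(keepends=True)
    dropped = max(0, len(lines) - max_lines)
    if dropped:
        path.write_text("".join(lines[dropped:]), encoding="utf-8")
    return dropped


def job_name(session_id: str, now_ms: int | None = None) -> str:
    """Build a job name that differs between restarts (hypothetical naming scheme)."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000  # millisecond resolution
    return f"boot-resume-{session_id}-{now_ms}"
```

Millisecond resolution is used here so that two restarts within the same second still produce distinct names; whether the real script uses seconds or something finer is not stated.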

Comparison

| Approach | Pre-save required | Survives SIGKILL | LLM-free |
|---|---|---|---|
| Checkpoint / snapshot files | Yes | No | No |
| Pre-restart state dump | Yes | No | No |
| Session history replay | Yes | Partial | No |
| Post-hoc JSONL detection (this skill) | No | Yes | Yes |

Logs

Output: /tmp/openclaw/boot-resume.log

Each run logs: timestamp, scan window, candidate count, per-session status, and whether a resume job was armed.

Limitations

- 20-minute scan window (configurable): sessions idle longer than this are not resumed
- Resume prompt is generic: the agent relies on session context for continuity
- Telegram/Discord message queues already handle unprocessed incoming messages; this skill targets mid-execution interruptions
- Requires systemd (Linux); macOS users need manual launchd setup

Uninstall

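    rm ~/.config/systemd/user/openclaw-gateway.service.d/boot-resume.conf
    systemctl --user disable boot-resume-wake.service 2>/dev/null
    rm ~/.config/systemd/user/boot-resume-wake.service
    systemctl --user daemon-reload
    rm ~/.openclaw/workspace/scripts/boot-resume-check.sh
    rm -rf ~/.openclaw/workspace/skills/boot-resume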

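Boot Resume

Zero-cooperation session recovery after a gateway restart. No checkpoints, no hooks, no agent involvement: it just reads the evidence and picks up where it left off.

Problem

When the gateway restarts, any in-flight agent turn dies mid-execution. Session history is preserved on disk, but the agent does not know it needs to continue. Users must manually tell each interrupted session to resume.

Checkpoint-based approaches require the agent to save state before dying. Unexpected kills (SIGKILL, OOM, power loss) bypass this entirely.

Solution

A deterministic shell script runs on every gateway start via systemd ExecStartPost. No LLM is involved in the detection loop.

    ┌──────────┐     ┌──────────┐     ┌────────────────┐
    │   Scan   │ ──▶ │  Detect  │ ──▶ │     Resume     │
    │ sessions │     │  JSONL   │     │    cron add    │
    │  .json   │     │   tail   │     │ --system-event │
    └──────────┘     └──────────┘     └────────────────┘

1. Scan: finds sessions updated within the last 20 minutes
2. Detect: reads the last 5 JSONL lines to classify session state
3. Resume: schedules a one-shot openclaw cron add --system-event --wake now to inject a continuation prompt

Key insight: the JSONL session files already contain all the evidence needed to detect an interruption, so no pre-save is required.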

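Detection Rules

| Last JSONL Entry | Status | Meaning |
|---|---|---|
| toolResult | interrupted | Tool returned, agent never processed it |
| assistant (empty text) | | |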

Install

One command

    bash {baseDir}/install.sh

Deploys three components:

- boot-resume-check.sh → ~/.openclaw/workspace/scripts/
- boot-resume.conf → systemd drop-in (triggers script on every gateway start)
- boot-resume-wake.service → systemd user service (triggers script on system wake from sleep/suspend)

Manual

    cp {baseDir}/scripts/boot-resume-check.sh ~/.openclaw/workspace/scripts/
    chmod +x ~/.openclaw/workspace/scripts/boot-resume-check.sh

    mkdir -p ~/.config/systemd/user/openclaw-gateway.service.d
    cp {baseDir}/templates/boot-resume.conf ~/.config/systemd/user/openclaw-gateway.service.d/
    cp {baseDir}/templates/boot-resume-wake.service ~/.config/systemd/user/

    systemctl --user daemon-reload
    systemctl --user enable boot-resume-wake.service

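Verify

    systemctl --user restart openclaw-gateway
    sleep 20
    cat /tmp/openclaw/boot-resume.log

Expected output:

    [boot-resume] now=... cut=... (20min window)
    [boot-resume] scanning agent: main
    [boot-resume] candidates: 0 (agent=main)
    [boot-resume] done

Test

1. Send a message that triggers a multi-step task (web search, code analysis, etc.)
2. Wait for the agent to start processing (tool calls in flight)
3. Run systemctl --user restart openclaw-gateway
4. The agent resumes automatically within ~35 seconds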

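Slash Command

When invoked as /boot-resume, run the script with --no-wait to skip the startup delay:

    bash {baseDir}/scripts/boot-resume-check.sh --no-wait

Report results to the user: which sessions were resumed, or that none were found.

Configuration

| Variable | Default | Description |
|---|---|---|
| WINDOW_MINUTES | 20 | How far back to scan for interrupted sessions, in minutes |
| DELAY | 20s | Delay before injecting the resume event |

Edit at the top of scripts/boot-resume-check.sh.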

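Features

- Dual trigger: covers both gateway restart (ExecStartPost) and system sleep/wake (systemd sleep.target)
- Multi-agent support: scans all agents under ~/.openclaw/agents/, not just main
- Smart filtering: skips system, heartbeat, cron, and subagent sessions automatically
- Deduplication: respects restart-resume.json to avoid double-resuming planned restarts
- Log rotation: auto-truncates log at 1000 lines
- Error visibility: Python and cron errors are logged, not swallowed
- Unique job names: timestamp-based to prevent conflicts on rapid restarts

Comparison

| Approach | Pre-save required | Survives SIGKILL | LLM-free |
|---|---|---|---|
| Checkpoint / snapshot files | Yes | No | No |
| Pre-restart state dump | Yes | No | No |
| Session history replay | Yes | Partial | No |
| Post-hoc JSONL detection (this skill) | No | Yes | Yes |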

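Logs

Output: /tmp/openclaw/boot-resume.log

Each run logs: timestamp, scan window, candidate count, per-session status, and whether a resume job was armed.

Limitations

- 20-minute scan window (configurable): sessions idle longer than this are not resumed
- Resume prompt is generic: the agent relies on session context for continuity
- Telegram/Discord message queues already handle unprocessed incoming messages; this skill targets mid-execution interruptions
- Requires systemd (Linux); macOS users need manual launchd setup

Uninstall

    rm ~/.config/systemd/user/openclaw-gateway.service.d/boot-resume.conf
    systemctl --user disable boot-resume-wake.service 2>/dev/null
    rm ~/.config/systemd/user/boot-resume-wake.service
    systemctl --user daemon-reload
    rm ~/.openclaw/workspace/scripts/boot-resume-check.sh
    rm -rf ~/.openclaw/workspace/skills/boot-resume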
