Linux System Health Diagnostic Skill
You are a Linux OS diagnostic expert. When a user reports any of the following problems, use this skill:
- Performance: server slow, high load, lag, unresponsive
- Memory: OOM killed, out of memory, memory leak, swap thrashing
- Disk: disk full, read-only filesystem, inode exhaustion, log files too large
- CPU: high CPU, IO wait, process stuck, load average spike
- Network: DNS failure, connection timeout, port exhaustion, CLOSE_WAIT accumulation, firewall blocking
- Process: crash, zombie processes, too many open files, file descriptor limit
- Browser automation: missing shared libraries, Chromium sandbox error, headless browser failures
- Locale/Encoding: garbled text, character encoding issues, locale not configured
Use the judgment rules below to systematically diagnose OS-level root causes.
When NOT to use this skill: For application-level issues specific to OpenClaw (gateway config, API keys, model configuration, service management, systemd units), use the openclaw-diagnostic skill instead. This skill only covers OS-level diagnostics.
Diagnostic workflow:
1. Always start with Section 1 (System Environment Baseline) to establish context.
2. Then run the sections relevant to the user's reported symptoms.
3. If the root cause is unclear, run all sections in order for a comprehensive check.
Commands: Run the corresponding section of `scripts/diagnostics.sh` as root, with `export LANG=C` set.
Issue Registry: See reference.md for severity level definitions and the complete issue name table.
Data access scope — this skill collects OS-level diagnostic data. Review before running in sensitive environments:
| Category | What is accessed | Sections |
|---|---|---|
| System config files | `/etc/os-release`, `/etc/resolv.conf`, `/etc/security/limits.conf`, `/etc/default/locale`, `/etc/locale.conf`, `/etc/systemd/journald.conf` | 1, 6, 8, 11, 17 |
| Kernel interfaces | `/proc/meminfo`, `/proc/stat`, `/proc/loadavg`, `/proc/sys/fs/*`, `/proc/sys/net/*`, `/sys/kernel/mm/*` | 2, 3, 5, 6, 7, 14 |
| Kernel ring buffer | `dmesg` (may contain process names and OOM kill details) | 2, 7, 12 |
| Systemd journal | `journalctl -k` (kernel messages only) | 2 |
| Log directory | `/var/log/` size enumeration only (does not read log content) | 11 |
| Process & socket table | `ps`, `ss -p` (exposes PIDs, command names, socket owners) | 2, 3, 10, 15 |
| User home directories | `/root/.cache/ms-playwright`, `/home/*/.cache/ms-playwright` (Chromium binary search only) | 16 |
| Outbound network probes | DNS resolution tests (`nslookup`/`dig`/`getent` to github.com), nameserver TCP/53 reachability, Chrome headless launch test (`about:blank`) | 8, 16 |
| Write operation | Creates and immediately removes `/tmp/.oc_write_test` to verify filesystem writability; this is the only write in the entire script | 12 |
Output format: After running diagnostics, report findings as a severity-sorted list (FATAL > CRITICAL > ERROR > WARNING > INFO). For each issue found, include:
- Issue name (e.g., `OpenClaw.Memory.SystemMemoryCritical`)
- Severity level
- Observed value vs. threshold
- Recommended remediation
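The ordering rule can be applied mechanically. The following is a minimal sketch. It assumes findings are collected one per line in a hypothetical `SEVERITY|Issue|Observed vs threshold|Remediation` layout. The `sort_findings` name and the pipe format are illustrative and are not taken from `scripts/diagnostics.sh`.

```bash
#!/usr/bin/env bash
# sort_findings (hypothetical helper): reads "SEVERITY|Issue|Observed|Remediation"
# lines on stdin and prints them ordered FATAL > CRITICAL > ERROR > WARNING > INFO.
sort_findings() {
  awk -F'|' '
    BEGIN { rank["FATAL"] = 1; rank["CRITICAL"] = 2; rank["ERROR"] = 3
            rank["WARNING"] = 4; rank["INFO"] = 5 }
    # Prefix each line with its numeric rank; unknown severities sort last.
    { r = ($1 in rank) ? rank[$1] : 9; print r "|" $0 }
  ' | sort -t'|' -k1,1n -s | cut -d'|' -f2-   # stable sort on rank, then drop the prefix
}
```

The sort is stable (`-s`), so findings with the same severity keep the order in which the sections emitted them.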
1. System Environment Baseline
Collect OS context for subsequent analysis.
Judgment rules:
- Record the output as OpenClaw.System.EnvironmentBaseline (INFO): no issues, context only.
2. Memory & OOM
Detect low memory and past OOM kills that affect any workload on this server.
Judgment rules:
- MemAvailable / MemTotal < 5% → OpenClaw.Memory.SystemMemoryCritical (CRITICAL)
  - Remediation: Kill unnecessary processes, add swap, or increase instance RAM
- MemAvailable / MemTotal < 10% → OpenClaw.Memory.SystemMemoryLow (WARNING)
  - Remediation: Monitor closely; consider scaling up
- MemTotal < 2 GB → OpenClaw.Memory.InsufficientTotalMemory (ERROR)
  - Remediation: 4 GB+ RAM recommended for production workloads
- `dmesg` contains "oom-killer" → OpenClaw.Memory.OOMKillerEvent (WARNING)
  - Remediation: Identify which processes were killed; review memory allocation
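The following is a minimal sketch of the memory rules above. It assumes `/proc/meminfo` values are in kB and reads "2 GB" as 2 GiB. When both MemAvailable rules match, only the more severe one is reported; this de-duplication is an assumption about the real script. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# check_memory [FILE] (hypothetical): apply the Section 2 thresholds to a
# /proc/meminfo-format file and print the names of any triggered issues.
# MemAvailable requires Linux 3.14+.
check_memory() {
  awk '
    /^MemTotal:/     { total = $2 }   # kB
    /^MemAvailable:/ { avail = $2 }   # kB
    END {
      if (total == 0) exit 1          # unreadable or malformed input
      pct = avail * 100 / total
      if (pct < 5)       print "OpenClaw.Memory.SystemMemoryCritical"
      else if (pct < 10) print "OpenClaw.Memory.SystemMemoryLow"
      # 2 GiB expressed in kB (assumes "2 GB" means 2 GiB)
      if (total < 2 * 1024 * 1024) print "OpenClaw.Memory.InsufficientTotalMemory"
    }' "${1:-/proc/meminfo}"
}

# has_oom_event (hypothetical): succeeds if kernel log text on stdin shows an OOM kill.
has_oom_event() {
  grep -qi 'oom-killer'
}

# Example: check_memory; dmesg 2>/dev/null | has_oom_event && echo OpenClaw.Memory.OOMKillerEvent
```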
3. CPU & Performance
Resource contention causes slow responses; high iowait indicates disk bottlenecks.
Judgment rules:
- Load average (1 min) > 2x `nproc` → OpenClaw.CPU.SystemLoadHigh (WARNING)
  - Remediation: Identify the top CPU consumers; check for runaway processes
- CPU idle < 10% (i.e., total utilization > 90%) → OpenClaw.CPU.SystemCPUExhausted (CRITICAL)
  - Remediation: Identify the top process; check for log flooding or computation storms
- iowait > 30% (from `/proc/stat`) → OpenClaw.CPU.HighIOWait (WARNING)
  - Remediation: Check disk I/O; the likely cause is excessive logging or a disk-bound workload
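The counters in `/proc/stat` are cumulative since boot, so a meaningful idle or iowait percentage comes from the difference between two samples. The following is a minimal sketch under that assumption. It sums the user through steal fields, since guest time is already included in user. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# load_check LOAD1 NPROC (hypothetical): flag a 1-minute load above 2x the CPU count.
load_check() {
  awk -v load="$1" -v ncpu="$2" 'BEGIN {
    if (load > 2 * ncpu) print "OpenClaw.CPU.SystemLoadHigh"
  }'
}

# cpu_check BEFORE AFTER (hypothetical): each argument is the aggregate "cpu ..."
# line of /proc/stat; percentages are computed from the delta between samples.
cpu_check() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    {
      # $2..$9 = user nice system idle iowait irq softirq steal (guest is already in user)
      t = 0
      for (i = 2; i <= 9; i++) t += $i
      total[NR] = t; idle[NR] = $5; iowait[NR] = $6
    }
    END {
      dt = total[2] - total[1]
      if (dt <= 0) exit 1
      if ((idle[2] - idle[1]) * 100 / dt < 10)     print "OpenClaw.CPU.SystemCPUExhausted"
      if ((iowait[2] - iowait[1]) * 100 / dt > 30) print "OpenClaw.CPU.HighIOWait"
    }'
}

# Example:
#   load_check "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
#   a=$(grep '^cpu ' /proc/stat); sleep 1; b=$(grep '^cpu ' /proc/stat); cpu_check "$a" "$b"
```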
4. Network Infrastructure
Basic network configuration, DNS, IPv6, and firewall state.
Judgment rules:
- IPv6 is enabled and services bind to `::`, but the upstream resolves to IPv4 only → OpenClaw.Network.IPv6Mismatch (WARNING)
  - Remediation: Set `NODE_OPTIONS='--dns-result-order=ipv4first'` or run `sysctl -w net.ipv6.conf.all.disable_ipv6=1`
5. Disk & inotify
Disk space exhaustion and inotify limits cause "ENOSPC" errors.
Judgment rules:
- Any filesystem usage >= 95% → OpenClaw.Disk.FilesystemFull (CRITICAL)
  - Remediation: Clean old logs and data; extend the partition or add a disk
- Any filesystem usage >= 80% → OpenClaw.Disk.FilesystemHighUsage (WARNING)
  - Remediation: Monitor; plan cleanup or expansion
- `fs.inotify.max_user_watches` < 65536 → OpenClaw.Disk.InotifyWatchesTooLow (ERROR)
  - Remediation: `echo 'fs.inotify.max_user_watches=524288' >> /etc/sysctl.d/99-inotify.conf && sysctl -p /etc/sysctl.d/99-inotify.conf`
- `fs.inotify.max_user_instances` < 256 → OpenClaw.Disk.InotifyInstancesTooLow (WARNING)
  - Remediation: `echo 'fs.inotify.max_user_instances=512' >> /etc/sysctl.d/99-inotify.conf && sysctl -p /etc/sysctl.d/99-inotify.conf`
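The following is a minimal sketch of the disk and inotify rules. It reads POSIX-format `df -P` output, where usage is column 5 and the mount point is column 6, and it assumes mount points contain no spaces. The function names and the tmpfs/devtmpfs exclusion in the example are hypothetical choices.

```bash
#!/usr/bin/env bash
# disk_check (hypothetical): read `df -P` output on stdin and print
# "IssueName mountpoint usage%" for filesystems at or above the thresholds.
disk_check() {
  awk 'NR > 1 {
    use = $5; sub(/%/, "", use)
    if (use + 0 >= 95)      print "OpenClaw.Disk.FilesystemFull " $6 " " $5
    else if (use + 0 >= 80) print "OpenClaw.Disk.FilesystemHighUsage " $6 " " $5
  }'
}

# inotify_check WATCHES INSTANCES (hypothetical): compare the two inotify sysctls.
inotify_check() {
  [ "$1" -lt 65536 ] && echo "OpenClaw.Disk.InotifyWatchesTooLow"
  [ "$2" -lt 256 ] && echo "OpenClaw.Disk.InotifyInstancesTooLow"
  return 0
}

# Example:
#   df -P -x tmpfs -x devtmpfs | disk_check
#   inotify_check "$(cat /proc/sys/fs/inotify/max_user_watches)" \
#                 "$(cat /proc/sys/fs/inotify/max_user_instances)"
```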
6. File Descriptor & Process Limits
Low ulimits cause "too many open files" (EMFILE) errors under load.
Judgment rules:
- Shell `ulimit -n` < 4096 → OpenClaw.Limits.NofileTooLow (ERROR)
  - Remediation: Add `* soft nofile 65536` and `* hard nofile 65536` to `/etc/security/limits.conf`; log in again
- `nofile` value in limits.conf > `fs.nr_open` → OpenClaw.Limits.NofileExceedsKernelMax (CRITICAL)
  - Remediation: Increase `fs.nr_open` first with `sysctl -w fs.nr_open=1048576` and persist it in `/etc/sysctl.d/`
- `/proc/sys/fs/file-nr` allocated / max > 80% → OpenClaw.Limits.SystemFileDescriptorsHigh (WARNING)
  - Remediation: Identify processes holding many FDs (e.g., `for p in /proc/[0-9]*; do echo "$(ls "$p/fd" 2>/dev/null | wc -l) $p"; done | sort -rn | head`); increase `fs.file-max` if needed
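The following is a minimal sketch of the three limit rules. `/proc/sys/fs/file-nr` holds three numbers: allocated handles, allocated-but-unused handles, and the maximum. The function name, its argument order, and the limits.conf extraction in the example are hypothetical.

```bash
#!/usr/bin/env bash
# limits_check SOFT CONF NR_OPEN FILE_NR (hypothetical):
#   SOFT    - output of `ulimit -n`
#   CONF    - largest nofile value found in /etc/security/limits.conf (may be empty)
#   NR_OPEN - contents of /proc/sys/fs/nr_open
#   FILE_NR - contents of /proc/sys/fs/file-nr ("allocated unused max")
limits_check() {
  local soft=$1 conf=$2 nr_open=$3
  if [ "$soft" != unlimited ] && [ "$soft" -lt 4096 ]; then
    echo "OpenClaw.Limits.NofileTooLow"
  fi
  if [ -n "$conf" ] && [ "$conf" != unlimited ] && [ "$conf" -gt "$nr_open" ]; then
    echo "OpenClaw.Limits.NofileExceedsKernelMax"
  fi
  # Split FILE_NR into positional parameters: $1 = allocated, $3 = max.
  set -- $4
  awk -v alloc="$1" -v max="$3" 'BEGIN {
    if (max > 0 && alloc * 100 / max > 80) print "OpenClaw.Limits.SystemFileDescriptorsHigh"
  }'
}

# Example:
#   conf=$(awk '$1 !~ /^#/ && $3 == "nofile" { print $4 }' /etc/security/limits.conf | sort -n | tail -n 1)
#   limits_check "$(ulimit -n)" "$conf" "$(cat /proc/sys/fs/nr_open)" "$(cat /proc/sys/fs/file-nr)"
```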
7. Kernel & Sysctl Tuning
nf_conntrack, TCP tuning, and somaxconn affect high-concurrency workloads.
Judgment rules:
- `net.netfilter.nf_conntrack_max` < 65536 → OpenClaw.Kernel.NfConntrackMaxTooLow (ERROR)
  - Remediation: `sysctl -w net.netfilter.nf_conntrack_max=262144` and persist it in `/etc/sysctl.d/99-sysctl.conf`
- `dmesg` contains "nf_conntrack: table full" → OpenClaw.Kernel.NfConntrackTableFull (CRITICAL)
  - Remediation: Increase `nf_conntrack_max`; check for connection leaks
- `net.core.somaxconn` < 1024 → OpenClaw.Kernel.SomaxconnTooLow (WARNING)
  - Remediation: `sysctl -w net.core.somaxconn=4096` and persist it
- `net.ipv4.tcp_max_tw_buckets` < 10000 → OpenClaw.Kernel.TcpMaxTwBucketsTooLow (WARNING)
  - Remediation: `sysctl -w net.ipv4.tcp_max_tw_buckets=262144`
- `net.ipv4.tcp_tw_reuse` = 0 → OpenClaw.Kernel.TcpTwReuseNotEnabled (WARNING)
  - Remediation: `sysctl -w net.ipv4.tcp_tw_reuse=1`
- TIME_WAIT count from `ss -s` > 10000 → OpenClaw.Kernel.TimeWaitOverflow (WARNING)
  - Remediation: Enable `tcp_tw_reuse` and increase `tcp_max_tw_buckets` (note that `tcp_fin_timeout` governs FIN_WAIT_2, not the fixed 60-second TIME_WAIT interval)
- ListenOverflows > 0 in `/proc/net/netstat` → OpenClaw.Kernel.TcpListenOverflows (WARNING)
  - Remediation: Increase `somaxconn` and the application's listen backlog setting
- `vm.overcommit_memory` = 2 and swap < 1 GB → OpenClaw.Kernel.StrictOvercommitWithLowSwap (WARNING)
  - Remediation: Add swap space or set `vm.overcommit_memory=0`
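The following is a minimal sketch of the sysctl rules. In `/proc/net/netstat`, each section is a header line of counter names followed by a line of values, so ListenOverflows is located by column name. The helper names and `kernel_check`'s positional argument order are hypothetical. An empty conntrack value, as when the nf_conntrack module is not loaded, skips that rule.

```bash
#!/usr/bin/env bash
# listen_overflows [FILE] (hypothetical): print the TcpExt ListenOverflows counter.
listen_overflows() {
  awk '$1 == "TcpExt:" {
    if (!hdr) { for (i = 2; i <= NF; i++) if ($i == "ListenOverflows") col = i; hdr = 1 }
    else { v = col ? $col : 0; print v; exit }
  }' "${1:-/proc/net/netstat}"
}

# kernel_check (hypothetical): arguments are, in order,
#   conntrack_max somaxconn tw_buckets tw_reuse overcommit swap_kb timewait listen_overflows
kernel_check() {
  [ -n "$1" ] && [ "$1" -lt 65536 ] && echo "OpenClaw.Kernel.NfConntrackMaxTooLow"
  [ "$2" -lt 1024 ]  && echo "OpenClaw.Kernel.SomaxconnTooLow"
  [ "$3" -lt 10000 ] && echo "OpenClaw.Kernel.TcpMaxTwBucketsTooLow"
  [ "$4" -eq 0 ]     && echo "OpenClaw.Kernel.TcpTwReuseNotEnabled"
  [ "$7" -gt 10000 ] && echo "OpenClaw.Kernel.TimeWaitOverflow"
  [ "$8" -gt 0 ]     && echo "OpenClaw.Kernel.TcpListenOverflows"
  # 1 GiB of swap expressed in kB (assumes "1 GB" means 1 GiB)
  [ "$5" -eq 2 ] && [ "$6" -lt 1048576 ] && echo "OpenClaw.Kernel.StrictOvercommitWithLowSwap"
  return 0
}

# Example (timewait parsed from the "TCP:" line of `ss -s`):
#   tw=$(ss -s | awk '/^TCP:/ { for (i = 1; i <= NF; i++) if ($i == "timewait") { v = $(i + 1); sub(/[^0-9].*/, "", v); print v } }')
#   kernel_check "$(sysctl -n net.netfilter.nf_conntrack_max 2>/dev/null)" \
#     "$(sysctl -n net.core.somaxconn)" "$(sysctl -n net.ipv4.tcp_max_tw_buckets)" \
#     "$(sysctl -n net.ipv4.tcp_tw_reuse)" "$(sysctl -n vm.overcommit_memory)" \
#     "$(awk '/^SwapTotal:/ { print $2 }' /proc/meminfo)" "${tw:-0}" "$(listen_overflows)"
```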
8. DNS Resolution Health
Broken or slow DNS causes EAI_AGAIN errors, API timeouts, and silent connectivity failures.
Judgment rules:
- `/etc/resolv.conf` is empty or has no `nameserver` lines → OpenClaw.Network.NoDNSNameservers (ERROR)
  - Remediation: Add nameservers, e.g., `echo 'nameserver 8.8.8.8' >> /etc/resolv.conf`; for systemd-resolved, check `/etc/systemd/resolved.conf`
- `nslookup`, `dig`, and `getent` all fail for a known-good domain → OpenClaw.Network.DNSResolutionFailed (CRITICAL)
  - Remediation: Verify network connectivity; check whether the nameservers are reachable; inspect firewall rules blocking UDP/TCP port 53
- Any configured nameserver fails the TCP/53 reachability test → OpenClaw.Network.DNSNameserverUnreachable (WARNING)
  - Remediation: Replace the unreachable nameserver in `/etc/resolv.conf`; consider adding a backup nameserver
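The following is a minimal sketch of the configuration and reachability rules. The TCP/53 probe uses bash's `/dev/tcp` redirection together with coreutils `timeout`. The function names and the 3-second timeout are hypothetical choices.

```bash
#!/usr/bin/env bash
# dns_config_check [FILE] (hypothetical): flag a resolv.conf with no active nameserver lines.
dns_config_check() {
  local n
  n=$(grep -c '^[[:space:]]*nameserver[[:space:]]' "${1:-/etc/resolv.conf}" 2>/dev/null)
  [ "${n:-0}" -eq 0 ] && echo "OpenClaw.Network.NoDNSNameservers"
  return 0
}

# ns_reachable ADDR (hypothetical): succeeds if a TCP connection to ADDR:53 opens within 3 s.
ns_reachable() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/53" 2>/dev/null
}

# Example:
#   dns_config_check
#   for ns in $(awk '$1 == "nameserver" { print $2 }' /etc/resolv.conf); do
#     ns_reachable "$ns" || echo "OpenClaw.Network.DNSNameserverUnreachable $ns"
#   done
```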
9. Time Synchronization
Clock drift causes SSL/TLS certificate validation failures, API auth token rejection, and log timestamp inconsistencies.
Judgment rules:
- None of `chronyd`, `ntpd`, or `systemd-timesyncd` is active → OpenClaw.Time.NTPServiceNotRunning (ERROR)
  - Remediation: Install and enable a time sync service: `yum install chrony && systemctl enable --now chronyd` (RHEL/CentOS) or `apt install chrony && systemctl enable --now chrony` (Debian/Ubuntu)
- `timedatectl` shows "NTP synchronized: no" (newer systemd versions print "System clock synchronized: no") → OpenClaw.Time.ClockNotSynchronized (CRITICAL)
  - Remediation: Start the NTP service; verify NTP server reachability (`chronyc sources` or `ntpq -p`); check that the firewall allows UDP port 123
- `chronyc tracking` shows a system clock offset > 3 seconds, or `hwclock` drifts > 5 seconds from system time → OpenClaw.Time.ClockDriftDetected (WARNING)
  - Remediation: Force a sync with `chronyc makestep` or `ntpdate -u pool.ntp.org`; investigate why the drift occurred (suspended VM, unreachable NTP server)
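The following is a minimal sketch of the service and offset rules. `chronyc tracking` reports the offset as a magnitude followed by "fast" or "slow", so only the number before "seconds" is compared. The unit list checks both `chronyd` (RHEL family) and `chrony` (Debian/Ubuntu). The function names are hypothetical.

```bash
#!/usr/bin/env bash
# ntp_service_check (hypothetical): succeed silently if any common time-sync unit is active.
ntp_service_check() {
  local svc
  for svc in chronyd chrony ntpd ntp systemd-timesyncd; do
    systemctl is-active --quiet "$svc" 2>/dev/null && return 0
  done
  echo "OpenClaw.Time.NTPServiceNotRunning"
}

# chrony_offset_check (hypothetical): read `chronyc tracking` output on stdin and flag
# a "System time" offset above 3 seconds.
chrony_offset_check() {
  awk '/^System time/ {
    for (i = 1; i <= NF; i++) if ($i == "seconds") { off = $(i - 1); break }
    if (off + 0 > 3) print "OpenClaw.Time.ClockDriftDetected"
  }'
}

# Example: ntp_service_check; chronyc tracking 2>/dev/null | chrony_offset_check
```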
10. Zombie & D-State Processes
Zombie processes indicate child process leaks; D-state (uninterruptible sleep) processes signal I/O hangs that block system operations.
Judgment rules:
- Zombie count > 10 → OpenClaw.Process.ZombieProcessesHigh (WARNING)
  - Remediation: Identify the parent processes (`ps -eo pid,ppid,stat,comm | awk '$3~/Z/'`); the parent is not reaping its children, so restart or fix the parent process
- D-state process count > 0 → OpenClaw.Process.DStateProcessesFound (CRITICAL)
  - Remediation: D-state processes are blocked on I/O; check disk health (`dmesg | grep -i error`), NFS mounts (`mount -t nfs,nfs4`), and the storage subsystem. These processes cannot be killed normally
- Total process count > 80% of `kernel.pid_max` → OpenClaw.Process.TotalProcessCountHigh (WARNING)
  - Remediation: Identify process-spawning storms (`ps -eo user --sort=user | uniq -c | sort -rn | head`); increase `kernel.pid_max` if needed
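The following is a minimal sketch that applies all three process rules to a list of process states, one per line. The function name is hypothetical. Threads also consume PIDs, so `ps -eLo stat=` (one line per thread) may give a closer count for the `pid_max` rule than `ps -eo stat=`.

```bash
#!/usr/bin/env bash
# proc_state_check PID_MAX (hypothetical): read one process state per line on stdin
# (e.g. `ps -eo stat=`) and apply the zombie, D-state, and PID-usage rules.
proc_state_check() {
  awk -v pid_max="$1" '
    { n++ }
    $1 ~ /^Z/ { z++ }
    $1 ~ /^D/ { d++ }
    END {
      if (z > 10) print "OpenClaw.Process.ZombieProcessesHigh " z
      if (d > 0)  print "OpenClaw.Process.DStateProcessesFound " d
      if (pid_max > 0 && n * 100 / pid_max > 80) print "OpenClaw.Process.TotalProcessCountHigh " n
    }'
}

# Example: ps -eo stat= | proc_state_check "$(cat /proc/sys/kernel/pid_max)"
```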
11. Systemd Journal & Log Disk Usage
The systemd journal can grow to several gigabytes on long-running servers (by default journald may use up to 10% of the filesystem, capped at 4 GB), silently consuming disk space. This is a common hidden root cause of "disk full" events.
Judgment rules:
- Journal disk usage > 2 GB → OpenClaw.Logs.JournalDiskUsageHigh (WARNING)
  - Remediation: `journalctl --vacuum-size=500M`; set `SystemMaxUse=500M` in `/etc/systemd/journald.conf` and restart `systemd-journald`
- `/var/log` total size > 5 GB → OpenClaw.Logs.VarLogOversized (WARNING)
  - Remediation: Identify large files (`find /var/log -type f -size +100M`); configure logrotate; clean old rotated logs
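The following is a minimal sketch of the two size rules. It takes byte counts as input and reads the "GB" thresholds as GiB. The function name is hypothetical. The example uses GNU `du -sb`; a volatile journal lives under `/run/log/journal` instead.

```bash
#!/usr/bin/env bash
# log_usage_check JOURNAL_BYTES VARLOG_BYTES (hypothetical): compare sizes in bytes
# against 2 GiB and 5 GiB.
log_usage_check() {
  awk -v j="${1:-0}" -v v="${2:-0}" 'BEGIN {
    gib = 1024 * 1024 * 1024
    if (j > 2 * gib) print "OpenClaw.Logs.JournalDiskUsageHigh"
    if (v > 5 * gib) print "OpenClaw.Logs.VarLogOversized"
  }'
}

# Example:
#   log_usage_check "$(du -sb /var/log/journal 2>/dev/null | cut -f1)" "$(du -sb /var/log | cut -f1)"
```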
12. Filesystem Integrity
Read-only filesystem (from ext4/xfs journal errors) prevents writing session data, logs, and PID files. Inode exhaustion produces "No space left on device" even with free disk space.
Judgment rules:
- Any non-virtual mount has the `ro` flag, or the `/tmp` write test fails → OpenClaw.Disk.ReadOnlyFilesystem (CRITICAL)
  - Remediation: Check `dmesg` for filesystem errors; run `fsck` (`xfs_repair` for XFS) on the affected partition (requires unmounting or single-user mode); this may indicate disk hardware failure
- Any real filesystem's inode usage >= 80% → OpenClaw.Disk.InodeUsageHigh (WARNING)
  - Remediation: Find directories with many small files (`find / -xdev -printf '%h\n' | sort | uniq -c | sort -rn | head -10`); clean up session and temp files
- `dmesg` contains EXT4-fs error, XFS error, or read-only remount messages → OpenClaw.Disk.FilesystemErrorsDetected (CRITICAL)
  - Remediation: Back up data immediately; run `fsck` (`xfs_repair` for XFS) at the next maintenance window; check the disk's SMART status (`smartctl -a /dev/sdX`)
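The following is a minimal sketch of the three integrity rules. The list of virtual and always-read-only filesystem types is an assumption and should be extended as needed. In `df -Pi` output, filesystems without inode accounting report "-" and are skipped. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# ro_mount_check [FILE] (hypothetical): list non-virtual mounts carrying the "ro" option.
ro_mount_check() {
  awk '$3 !~ /^(proc|sysfs|tmpfs|devtmpfs|devpts|cgroup2?|squashfs|iso9660|securityfs|pstore|bpf|debugfs|tracefs|configfs|mqueue|hugetlbfs|autofs|nsfs|efivarfs|fusectl|binfmt_misc)$/ {
    n = split($4, opts, ",")
    for (i = 1; i <= n; i++) if (opts[i] == "ro") { print "OpenClaw.Disk.ReadOnlyFilesystem " $2; break }
  }' "${1:-/proc/mounts}"
}

# inode_check (hypothetical): read `df -Pi` on stdin.
inode_check() {
  awk 'NR > 1 && $5 != "-" {
    use = $5; sub(/%/, "", use)
    if (use + 0 >= 80) print "OpenClaw.Disk.InodeUsageHigh " $6 " " $5
  }'
}

# fs_error_check (hypothetical): scan kernel log text on stdin for filesystem errors.
fs_error_check() {
  grep -Eqi 'EXT4-fs error|XFS .*error|remounting filesystem read-only' &&
    echo "OpenClaw.Disk.FilesystemErrorsDetected"
  return 0
}

# Example: ro_mount_check; df -Pi | inode_check; dmesg 2>/dev/null | fs_error_check
```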
13. Firewall & Outbound Connectivity
Firewall rules that block inbound or outbound traffic are a common cause of "port not reachable" and "API connection refused" errors in self-hosted deployments.
Judgment rules:
- DROP or REJECT rules detected on the INPUT or OUTPUT chains → OpenClaw.Network.FirewallDropRulesDetected (WARNING)
  - Remediation: Review the rules and ensure required ports (gateway port, outbound 443) are allowed; use `iptables -L -n -v` for detailed hit counts
- `ufw status verbose` shows default deny for incoming traffic (informational only) → OpenClaw.Network.UFWDefaultDeny (INFO)
  - Remediation: No action required if intentional; ensure the gateway port is explicitly allowed (`ufw allow <port>/tcp`)
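The following is a minimal sketch of both firewall rules. It parses `iptables -S`, which prints one `-P` (policy) or `-A` (rule) line per entry, and the "Default:" line of `ufw status verbose`. Treating a DROP policy as a drop rule is an assumption. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# firewall_check (hypothetical): read `iptables -S` output on stdin and flag DROP/REJECT
# rules, or a DROP policy, on the INPUT or OUTPUT chains.
firewall_check() {
  awk '($2 == "INPUT" || $2 == "OUTPUT") &&
       (($1 == "-P" && $3 == "DROP") || ($1 == "-A" && / -j (DROP|REJECT)( |$)/)) { hit = 1 }
       END { if (hit) print "OpenClaw.Network.FirewallDropRulesDetected" }'
}

# ufw_default_check (hypothetical): read `ufw status verbose` output on stdin.
ufw_default_check() {
  grep -q 'deny (incoming)' && echo "OpenClaw.Network.UFWDefaultDeny"
  return 0
}

# Example: iptables -S 2>/dev/null | firewall_check; ufw status verbose 2>/dev/null | ufw_default_check
```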
14. Transparent Hugepages
THP can cause latency spikes and memory fragmentation for Node.js workloads. Multiple database and runtime vendors recommend disabling it on servers.
Judgment rules:
- THP `enabled` is set to `[always]` → OpenClaw.Kernel.THPEnabled (WARNING)
  - Remediation: `echo never > /sys/kernel/mm/transparent_hugepage/enabled`; persist via a systemd unit or `/etc/rc.local`
- THP `defrag` is set to `[always]` → OpenClaw.Kernel.THPDefragEnabled (INFO)
  - Remediation: `echo never > /sys/kernel/mm/transparent_hugepage/defrag`; this reduces latency spikes from compaction
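Both sysfs files list every supported mode and mark the active one with brackets, so each rule reduces to a substring match on `[always]`. The following is a minimal sketch with a hypothetical function name.

```bash
#!/usr/bin/env bash
# thp_check ENABLED DEFRAG (hypothetical): arguments are the contents of
# /sys/kernel/mm/transparent_hugepage/{enabled,defrag}.
thp_check() {
  case $1 in *"[always]"*) echo "OpenClaw.Kernel.THPEnabled" ;; esac
  case $2 in *"[always]"*) echo "OpenClaw.Kernel.THPDefragEnabled" ;; esac
  return 0
}

# Example:
#   t=/sys/kernel/mm/transparent_hugepage
#   thp_check "$(cat "$t/enabled")" "$(cat "$t/defrag")"
```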
15. TCP Connection Overload
Excessive network connections exhaust file descriptors, memory, and conntrack table capacity, degrading system-wide performance.
Judgment rules:
- Total TCP connections > 10000 → OpenClaw.Network.TcpConnectionCountHigh (WARNING)
  - Remediation: Identify the top connection-holding processes; check for connection leaks; consider connection pooling
- CLOSE_WAIT count > 500 → OpenClaw.Network.CloseWaitAccumulation (ERROR)
  - Remediation: CLOSE_WAIT means the peer has closed the connection but the local application has not called `close()` on its socket. Identify the leaking process and restart it; this is an application bug
- ESTABLISHED count > 5000 → OpenClaw.Network.EstablishedConnectionsHigh (WARNING)
  - Remediation: Review whether all connections are legitimate; check for connection pool exhaustion or slow clients holding connections open
- Ephemeral ports in use > 80% of the available range → OpenClaw.Network.EphemeralPortExhaustion (CRITICAL)
  - Remediation: Widen the range with `sysctl -w net.ipv4.ip_local_port_range='1024 65535'`; enable `tcp_tw_reuse`; check for connection leaks
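The following is a minimal sketch of the four connection rules over `ss -tan` output. In that output, column 1 is the state and column 4 is the local address with the port after the last colon. Two choices here are assumptions: LISTEN sockets are not counted as connections, and "ephemeral ports in use" means distinct in-range local ports. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# tcp_check LOW HIGH (hypothetical): read `ss -tan` output on stdin; LOW/HIGH are the
# bounds from net.ipv4.ip_local_port_range.
tcp_check() {
  awk -v lo="$1" -v hi="$2" '
    NR == 1 || $1 == "LISTEN" { next }       # skip header and listening sockets
    {
      total++; state[$1]++
      n = split($4, a, ":"); port = a[n] + 0   # last ":" field is the port (IPv4 or [IPv6])
      if (port >= lo && port <= hi) used[port] = 1
    }
    END {
      for (p in used) nused++
      if (total > 10000)             print "OpenClaw.Network.TcpConnectionCountHigh " total
      if (state["CLOSE-WAIT"] > 500) print "OpenClaw.Network.CloseWaitAccumulation " state["CLOSE-WAIT"]
      if (state["ESTAB"] > 5000)     print "OpenClaw.Network.EstablishedConnectionsHigh " state["ESTAB"]
      if (nused * 100 / (hi - lo + 1) > 80) print "OpenClaw.Network.EphemeralPortExhaustion " nused
    }'
}

# Example: ss -tan | tcp_check $(cat /proc/sys/net/ipv4/ip_local_port_range)
```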
16. Headless Browser / Chromium Dependencies
OpenClaw skills that use browser automation (Playwright, Puppeteer) require Chromium shared libraries and headless mode. The diagnostic first tests whether Chrome can actually launch in headless mode. Dependency diagnosis is only performed when Chrome fails or is absent.
Judgment rules:
- Chrome headless launch test (`--headless=new --dump-dom about:blank`) succeeds → no issue; skip the dependency checks
- Chrome headless launch test fails, or Chrome is absent → proceed with the dependency diagnosis below:
  - Any of the 7 critical shared library stems (libnss3, libatk-bridge-2.0, libgbm, libxkbcommon, libdrm, libgtk-3, libasound) is absent from `ldconfig -p` → OpenClaw.Browser.ChromiumDependenciesMissing (ERROR)
    - Remediation: On Debian/Ubuntu: `apt install -y libnss3 libatk-bridge2.0-0 libgbm1 libxkbcommon0 libdrm2 libgtk-3-0 libasound2`; on RHEL/CentOS: `yum install -y nss atk at-spi2-atk mesa-libgbm libxkbcommon libdrm gtk3 alsa-lib`
  - `ldd` on the Chromium binary shows one or more "not found" entries → OpenClaw.Browser.ChromiumBinaryLddFailures (CRITICAL)
    - Remediation: Install the specific missing libraries identified by `ldd`; run `ldconfig` afterwards to update the dynamic linker cache
  - `/proc/sys/kernel/unprivileged_userns_clone` is `0` → OpenClaw.Browser.UserNamespaceDisabled (ERROR)
    - Remediation: `sysctl -w kernel.unprivileged_userns_clone=1` and persist it in `/etc/sysctl.d/99-userns.conf`; or launch Chromium with `--no-sandbox` (less secure, not recommended for production)
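The following is a minimal sketch of the three dependency rules. A library stem counts as present if `ldconfig -p` lists some `<stem>.so` entry. Not every kernel has the `unprivileged_userns_clone` sysctl (it comes from a distribution patch), so a missing file is treated here as "not disabled". The function names and `$CHROME_BIN` are hypothetical.

```bash
#!/usr/bin/env bash
# chromium_lib_check (hypothetical): read `ldconfig -p` on stdin and list which of the
# seven library stems have no matching "<stem>.so" entry.
chromium_lib_check() {
  local cache stem missing=""
  cache=$(cat)
  for stem in libnss3 libatk-bridge-2.0 libgbm libxkbcommon libdrm libgtk-3 libasound; do
    printf '%s\n' "$cache" | grep -qF "$stem.so" || missing="$missing $stem"
  done
  [ -n "$missing" ] && echo "OpenClaw.Browser.ChromiumDependenciesMissing$missing"
  return 0
}

# ldd_check (hypothetical): read `ldd <chromium binary>` on stdin.
ldd_check() {
  grep -q 'not found' && echo "OpenClaw.Browser.ChromiumBinaryLddFailures"
  return 0
}

# userns_check [FILE] (hypothetical): flag the sysctl only when it exists and reads 0.
userns_check() {
  [ "$(cat "${1:-/proc/sys/kernel/unprivileged_userns_clone}" 2>/dev/null)" = 0 ] &&
    echo "OpenClaw.Browser.UserNamespaceDisabled"
  return 0
}

# Example: ldconfig -p | chromium_lib_check; ldd "$CHROME_BIN" | ldd_check; userns_check
```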
17. Locale & Encoding Configuration
A missing or misconfigured locale causes garbled text output, incorrect sorting in logs, and subtle bugs such as backspace deleting two characters over SSH (when the client sends UTF-8 but the server expects ASCII). OpenClaw's text processing relies on correct UTF-8 support.
Judgment rules (use the persistent `LANG` value read from `/etc/default/locale` or `/etc/locale.conf`, not the runtime `$LANG`, which the diagnostic runner may override to C):
- Persistent `LANG` is empty, unset, or set to `POSIX`/`C` → OpenClaw.Locale.LocaleNotConfigured (ERROR)
  - Remediation: On Debian/Ubuntu: `apt install locales && dpkg-reconfigure locales`, then set `LANG=en_US.UTF-8` in `/etc/default/locale`; on RHEL/CentOS: `localectl set-locale LANG=en_US.UTF-8`
- The persistent `LANG` value does not appear in `locale -a` output (configured but not generated/installed) → OpenClaw.Locale.LocaleNotGenerated (WARNING). Compare case-insensitively and ignore hyphens, because `locale -a` lists `en_US.UTF-8` as `en_US.utf8`
  - Remediation: On Debian/Ubuntu: uncomment the locale in `/etc/locale.gen` and run `locale-gen`; on RHEL/CentOS: `localedef -i en_US -f UTF-8 en_US.UTF-8`
- Persistent `LANG` does not contain `UTF-8` or `utf8` → OpenClaw.Locale.NonUTF8LocaleDetected (WARNING)
  - Remediation: Change to a UTF-8 variant with `localectl set-locale LANG=en_US.UTF-8`; log in again for the change to take effect
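The following is a minimal sketch of the three locale rules. It normalizes both sides (lowercase, hyphens removed) so that `en_US.UTF-8` matches the `locale -a` spelling `en_US.utf8`. The function names are hypothetical. The looser whole-string lowercasing is a simplification of how glibc normalizes only the codeset.

```bash
#!/usr/bin/env bash
# normalize_locales (hypothetical): lowercase and drop hyphens, one locale per line.
normalize_locales() {
  tr '[:upper:]' '[:lower:]' | sed 's/-//g'
}

# persistent_lang FILE... (hypothetical): first LANG= value in the given files, quotes removed.
persistent_lang() {
  sed -n 's/^[[:space:]]*LANG=//p' "$@" 2>/dev/null | head -n 1 | tr -d "\"'"
}

# locale_check LANG (hypothetical): stdin is the output of `locale -a`.
locale_check() {
  local want
  case $1 in
    "" | C | POSIX) echo "OpenClaw.Locale.LocaleNotConfigured"; return 0 ;;
  esac
  want=$(printf '%s\n' "$1" | normalize_locales)
  normalize_locales | grep -qxF "$want" || echo "OpenClaw.Locale.LocaleNotGenerated"
  case $want in *utf8*) ;; *) echo "OpenClaw.Locale.NonUTF8LocaleDetected" ;; esac
}

# Example: locale -a | locale_check "$(persistent_lang /etc/default/locale /etc/locale.conf)"
```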