OpenClaw Optimizer

Aligned with: OpenClaw v2026.3.8 | Skill v1.19.0 | Updated: 2026-03-09 | CLI-first advisor

Optimize and troubleshoot OpenClaw workspaces: cost-aware routing, provider configuration, context discipline, lean automation, multi-agent architectures, and error resolution.

Reference files (load when needed):

- references/providers.md — all 40+ providers, custom provider schema, failover config
- INLINECODE1 — full error reference, 7 failure categories, GitHub issue workarounds
- INLINECODE2 — complete CLI command reference
- INLINECODE3 — agent identity/personality audit checklist, file roles, walkthrough workflow

Version Awareness

This skill tracks OpenClaw releases via two mechanisms:

1. GitHub Actions — daily workflow checks for new releases, opens an issue on drift, auto-closes when resolved
2. Runtime check — lightweight cached version comparison at session start

Runtime Check (once per session)

CODEBLOCK0

- CURRENT → note the version and proceed.
- STALE → inform the user: "OpenClaw v<new> is available (skill is at v<current>). Run update-skill.sh to review what changed."
- UNCHECKED → note "Version check unavailable (offline)" and proceed.
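The three outcomes above can be sketched as a small decision function. This is a minimal illustration, not the skill's actual check script: `parse_version` and `version_status` are hypothetical names, and the sketch assumes version strings are dot-separated integers with an optional leading `v`.

```python
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn 'v2026.3.8' or '2026.3.8' into (2026, 3, 8) so versions compare numerically."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def version_status(skill_tracks: str, latest: Optional[str]) -> Tuple[str, str]:
    """Return (state, note). `latest` is None when the lookup failed (for example, offline)."""
    if latest is None:
        return "UNCHECKED", "Version check unavailable (offline)"
    if parse_version(latest) > parse_version(skill_tracks):
        new, current = latest.lstrip("v"), skill_tracks.lstrip("v")
        return "STALE", (
            f"OpenClaw v{new} is available (skill is at v{current}). "
            "Run update-skill.sh to review what changed."
        )
    return "CURRENT", f"Skill is aligned with OpenClaw v{skill_tracks.lstrip('v')}"
```

Comparing integer tuples rather than raw strings matters once a component reaches two digits: as strings, "2026.3.10" sorts before "2026.3.8".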

Update Workflow (user-initiated, never automatic)

CODEBLOCK1

Updates are deliberate — this skill never auto-modifies its own content or pushes to git without explicit user action.

Quick Start (copy/paste prompts)

Full audit (safe, no changes):

Audit my OpenClaw setup for cost, reliability, and context bloat. Prioritized plan with rollback. Do NOT apply changes.

Troubleshoot a specific problem:

[Describe your symptom or paste the error message]. Diagnose it and give me the exact fix.

Add or configure a provider:

Add [provider name] as a model provider. Walk me through the CLI steps and show me the exact config before applying.

Model routing optimization:

Propose a tiered routing plan: cheap for heartbeats/cron, mid for daily tasks, premium for coding/reasoning. Exact config + rollback. Do NOT apply.

Silent cron job:

Create a cron job that runs [task] every [interval]. Isolated session, NO_REPLY on nothing-to-do. Show me the command first.

Audit agent personality & identity:

Audit my agent's personality and identity files. Check for conflicts, bloat, and bad practices. Walk me through improvements.

Safety Contract (non-negotiable)

- This skill is advisory by default — not an autonomous control-plane.
- Never mutate config (config.apply, config.patch), cron jobs, or persistent settings without explicit user approval.
- Before any approved change: show (1) exact CLI command or config patch, (2) expected impact, (3) rollback command.
- If an optimization reduces monitoring coverage, present Options A/B/C and require the user to choose.

Backup Strategy

Four backup layers exist — don't stack manual backups on top unnecessarily:

Layer	What	Retention	When It's Enough
CLI rolling `.bak`	Auto-created on every `config set`, `models set`, INLINECODE15	Rolling (overwritten each write)	Single-command undo
Nightly GitHub backup

Rule: For routine CLI changes (model swaps, cron edits, config sets), do NOT create manual backups. The CLI .bak + nightly GitHub backup are sufficient. Only create a manual backup when: (1) upgrading OpenClaw versions, (2) editing multiple config files simultaneously (identity audits), or (3) editing JSON directly without the CLI. For upgrades, prefer openclaw backup create over manual copies.

1. Model Providers

40+ providers supported. For full docs (auth commands, config schemas, all model names, custom provider setup): read references/providers.md

Quick lookup — slug, auth env, primary model format:

Provider	Slug	Auth Env	Model Format
Anthropic	INLINECODE22	INLINECODE23	INLINECODE24
OpenAI (API key)

WARNING — Provider Bans (Mar 2026):
Google: Actively cracking down on Gemini CLI OAuth and AntiGravity access through third-party tools. Accounts are being banned or rate-limited without warning or refunds. Use API key auth (google provider) instead of OAuth (google-gemini-cli / google-antigravity). Production API keys: 150-300 RPM, no ban risk. See GitHub Issue #14203.
Anthropic: Has banned users linking flat-rate Claude Code subscription tokens to OpenClaw. Using Claude Code OAuth tokens directly in OpenClaw may trigger account suspension. However, using Claude Code through the Agent SDK / ACP dispatch (where OpenClaw spawns Claude Code as a sub-agent via the ACP protocol) is the supported pattern and should not cause issues — this is how OpenClaw's built-in acp integration works.
General: Always prefer pay-per-token API keys over subscription OAuth for third-party tool integrations. Subscription-based OAuth through third-party tools violates most providers' ToS except OpenAI, which explicitly permits Codex OAuth in third-party tools.

Add a provider (API key):
CODEBLOCK2

Add a provider (OAuth / subscription):
CODEBLOCK3

OAuth Providers (Subscription-Based Access)

Some providers offer OAuth authentication tied to a consumer subscription (e.g., ChatGPT Plus/Pro) instead of — or in addition to — a pay-per-token API key. OpenClaw supports these via device-flow OAuth.

Currently supported OAuth providers:

Provider	Slug	Subscription Required	Top Models
OpenAI Codex	INLINECODE90	ChatGPT Plus ($20/mo) or Pro ($200/mo)	INLINECODE91, `gpt-5.3-codex`, INLINECODE93
GitHub Copilot	github-copilot	Copilot subscription	github-copilot/gpt-4o

OpenAI Codex setup (full walkthrough):

CODEBLOCK4

Headless / SSH gateway: The OAuth flow prints a URL. Open it in any browser (doesn't need to be on the gateway machine), complete sign-in, then paste the redirect URL back into the SSH terminal. Alternatively, complete OAuth on a machine with a browser and copy ~/.openclaw/credentials/oauth.json to the gateway.

Available Codex models:

Model	Plan	Notes
INLINECODE97	Plus, Pro, Business	Latest (Mar 2026), 1,050,000-token context, 128K max tokens
INLINECODE98

Usage limits (per 5-hour window):

- Plus ($20/mo): 30–150 messages
- Pro ($200/mo): 300–1,500 messages
- Extra credits purchasable when limits are hit

Gotchas:

- No embeddings. Codex OAuth does NOT grant access to OpenAI embeddings. You still need a separate OPENAI_API_KEY for text-embedding-3-small etc.
- Token refresh is automatic — active sessions continue without re-login. Credentials stored in ~/.openclaw/credentials/oauth.json.
- Don't use both Codex CLI and OpenClaw simultaneously — some providers invalidate older refresh tokens when a new one is issued. Logging in via one tool can log you out of the other.
- "Model not supported" errors — some users report this with gpt-5.3-codex on certain accounts. Fall back to gpt-5.2-codex if this happens.
- Dual-config registration required (Issue #13189): The built-in catalog uses wrong API type (openai-completions) for gpt-5.3-codex. Must register manually in both models.json (API type: openai-codex-responses) AND openclaw.json (API type: openai-responses — openai-codex-responses is only valid in models.json per schema). v2026.2.26 includes a schema fix — verify with openclaw models status --probe after upgrade.
- Community context (Feb 2026): After Anthropic and Google updated their ToS to block subscription-based OAuth in third-party tools, the OpenClaw community migrated heavily to openai-codex. OpenAI explicitly permits Codex OAuth in third-party tools, though fair-use limits still apply.

Provider Removal Checklist

Removing a provider requires cleaning 6 locations — config unset alone is not enough:

1. models.providers.<slug> in openclaw.json — INLINECODE117
2. INLINECODE118 in openclaw.json — must edit JSON directly (colons in keys break config unset)
3. INLINECODE120 dict in ~/.openclaw/agents/main/agent/auth-profiles.json — edit with python3/jq
4. INLINECODE122 aliases in openclaw.json — openclaw config unset each alias
5. INLINECODE124 in openclaw.json — INLINECODE125
6. INLINECODE126 and usageStats.<slug>:* in auth-profiles.json — edit directly

For providers with LaunchAgent env vars (Ollama, etc.), also clean:

7. launchctl unsetenv <KEY> — session-level env persists independently of plist
8. PlistBuddy delete from INLINECODE129
9. INLINECODE130 + launchctl bootstrap to pick up the clean plist (kickstart alone doesn't reload env from plist)

Known CLI limitation: openclaw config unset cannot handle colons in config keys (e.g., auth.profiles.google-gemini-cli:email@gmail.com). The parser treats colons as path separators. Edit the JSON file directly for these entries.
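For the colon-keyed entries that `config unset` cannot reach, a direct JSON edit might look like the sketch below. It assumes the file is plain JSON and that profiles live under `auth` → `profiles`, as the example key path above suggests. `remove_profile_keys` and the `.manual.bak` suffix are hypothetical names chosen for illustration. The sketch copies the file first, since direct JSON edits are one of the cases where a manual backup is warranted.

```python
import json
import shutil
from pathlib import Path
from typing import List


def remove_profile_keys(config_path: str, slug: str) -> List[str]:
    """Delete auth.profiles entries named '<slug>' or '<slug>:<anything>'.

    Returns the removed keys. The file is left untouched when nothing matches.
    """
    path = Path(config_path).expanduser()
    data = json.loads(path.read_text(encoding="utf-8"))
    profiles = data.get("auth", {}).get("profiles", {})

    doomed = [key for key in profiles if key == slug or key.startswith(slug + ":")]
    if not doomed:
        return []

    # Manual backup before a direct edit (hypothetical file name).
    shutil.copy2(path, path.with_name(path.name + ".manual.bak"))

    for key in doomed:
        del profiles[key]
    path.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")
    return doomed
```

Matching on `slug + ":"` rather than a bare prefix avoids removing a different provider whose slug merely starts with the same characters.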

Ollama for memory embeddings (v2026.3.2+):

openclaw config set memorySearch.provider ollama
openclaw config set memorySearch.fallback ollama

Runs memory search embeddings locally — no external API calls. Honors models.providers.ollama settings.

Custom OpenAI-compatible provider (LM Studio, LiteLLM, etc.): See references/providers.md

2. Model Routing Strategy

Tiered Routing (50–95% cost reduction)

Tier	Models	Use Cases
T1 Cheap	INLINECODE136, `google/gemini-3-flash-preview`, `google/gemini-3.1-flash-lite-preview`, INLINECODE139	Heartbeats, simple checks, greetings, cron
T2 Mid

Model preference by task:

Task	Model	Why
Heartbeats / cron	INLINECODE147	Cheapest; reliable structured output
Calendar / scheduling

Key rules:

- Never switch models mid-conversation — destroys Anthropic prompt cache
- Use anthropic direct (not through proxies) to preserve caching for Opus/Sonnet
- Switch only at session boundaries (/new)

Built-in Model Aliases (v2026.3.7+)

Alias	Resolves To
INLINECODE159	INLINECODE160
INLINECODE161

Thinking Levels (v2026.3.1+)

Level	Behavior	Best For
INLINECODE173	No extended thinking	Simple queries, heartbeats
INLINECODE174

CODEBLOCK6

In-chat: /think low · /think adaptive · INLINECODE182

Per-Agent Config

CODEBLOCK7

CODEBLOCK8

In-chat model switch (no restart): /model list → INLINECODE184

Session Pruning (v2026.3.1+)

Automatically trims stale tool results from conversation history to preserve cache and reclaim context:

CODEBLOCK9

Anthropic smart defaults auto-enable cache-ttl pruning when using API key auth with heartbeat enabled.

3. Context Management

What burns tokens: System prompt (5–10K tokens/call) + bootstrap files + conversation history. Bootstrap files injected on every turn (source: docs.openclaw.ai/concepts/system-prompt): AGENTS.md, SOUL.md, TOOLS.md, IDENTITY.md, USER.md, HEARTBEAT.md, BOOTSTRAP.md (first-run only), plus MEMORY.md and/or memory.md when present. Daily memory/*.md files are NOT auto-injected (on-demand via memory tools). Bootstrap cap: 150K chars total, 20K per file (both configurable).

MEMORY.md warning (from docs): "Keep them concise — especially MEMORY.md, which can grow over time and lead to unexpectedly high context usage and more frequent compaction." MEMORY.md is the most common source of bootstrap bloat. Unlike AGENTS.md or SOUL.md which users actively edit, MEMORY.md tends to grow unchecked as the agent appends to it.
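A quick way to see which bootstrap files are near the caps is to measure them directly. The sketch below is illustrative only: `bootstrap_report` is a hypothetical helper, the workspace location is whatever directory holds your bootstrap files, and the constants simply mirror the default caps quoted above (20K chars per file, 150K total).

```python
from pathlib import Path
from typing import Dict, List, Union

BOOTSTRAP_FILES = [
    "AGENTS.md", "SOUL.md", "TOOLS.md", "IDENTITY.md",
    "USER.md", "HEARTBEAT.md", "MEMORY.md",
]
PER_FILE_CAP = 20_000   # default per-file cap in characters
TOTAL_CAP = 150_000     # default total cap in characters


def bootstrap_report(workspace: str) -> Dict[str, Union[Dict[str, int], List[str], int, bool]]:
    """Measure each bootstrap file that exists and compare against the default caps."""
    sizes: Dict[str, int] = {}
    for name in BOOTSTRAP_FILES:
        candidate = Path(workspace).expanduser() / name
        if candidate.is_file():
            sizes[name] = len(candidate.read_text(encoding="utf-8"))
    total = sum(sizes.values())
    return {
        "sizes": sizes,
        "over_per_file_cap": [name for name, size in sizes.items() if size > PER_FILE_CAP],
        "total_chars": total,
        "over_total_cap": total > TOTAL_CAP,
    }
```

If MEMORY.md shows up in `over_per_file_cap`, it is being truncated on injection as well as burning tokens, which is a strong signal to curate it.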

Check context: /status · /context list · /context detail · /usage tokens · INLINECODE201

Prompt Modes

Mode	Bootstrap Files Loaded	Use Case
INLINECODE202 (default)	All — AGENTS, SOUL, TOOLS, IDENTITY, USER, HEARTBEAT, MEMORY	Main interactive sessions
INLINECODE203 (sub-agents)

Light Bootstrap (v2026.3.1+)

Skip all workspace bootstrap files for automated runs:

CODEBLOCK10

CODEBLOCK11

Massive token savings for heartbeats and cron — eliminates 5-10K tokens/call of bootstrap overhead.

Bootstrap Truncation Warning (v2026.3.7+)

CODEBLOCK12

When a bootstrap file exceeds bootstrapMaxChars (default 20K), the agent receives a warning. Set to always during identity audits to catch truncated files.

Compaction Config

CODEBLOCK13

CODEBLOCK14

Known bug — memory flush threshold gap (Issue #25880): Set reserveTokensFloor equal to reserveTokens (both 62500) to fix compaction firing before flush completes.

Known bug — compaction timeout (Issue #38233): Both /compact and auto compaction can time out at ~300s with openai-codex/gpt-5.3-codex, freezing the session. Fix: override compaction model to google/gemini-3-flash-preview with thinking: "off". Tune: maxHistoryShare: 0.6, reserveTokensFloor: 40000, maxAttempts: 3.

Context Engine Plugin (v2026.3.7+)

Replace the built-in context assembly pipeline with a custom plugin:

CODEBLOCK15

Context Engine plugins get full lifecycle hooks: bootstrap, ingest, assemble, compact, afterTurn, prepareSubagentSpawn, onSubagentEnded. This enables alternative context management strategies (lossless context, semantic chunking, etc.) without modifying OpenClaw core.

Bootstrap File Size Targets (optimization recommendations)

These are optimization targets for keeping context lean, not hard limits. All files are subject to bootstrapMaxChars (default 20K) and bootstrapTotalMaxChars (default 150K).

File	Target Size	Purpose	Injected?
INLINECODE226	< 1K tokens (~4K chars)	Personality + absolute constraints	Always (main + full prompt mode)
INLINECODE227

Critical: MEMORY.md is auto-injected on every turn in main sessions, NOT loaded on-demand. It burns tokens continuously. Keep it as small as possible with only curated facts. Operational protocols belong in AGENTS.md. Tool notes belong in TOOLS.md.

Bootstrap Content Placement (What Goes Where)

Users commonly dump all content into SOUL.md because it feels like "who the agent is." This bloats the file (burns tokens every turn) and confuses lighter models that can't prioritize across a noisy instruction set. Place content in the correct file:

Content Type	Correct File	Common Mistake
Personality, voice, humor, constraints	SOUL.md	-
Protocols, workflows, checklists, operational rules

Cross-file duplication burns tokens silently. If the same protocol appears in both SOUL.md and AGENTS.md, it's injected twice on every turn. Deduplicate aggressively — pick one canonical location.

Stale model references are silent saboteurs. When you change models via CLI (openclaw models set), update any AGENTS.md sections that reference specific model names (e.g., Model Selection, Sub-Agent defaults). The agent follows bootstrap instructions and may try to use models that are no longer configured.

Persistence stack: SOUL.md → AGENTS.md → TOOLS.md → IDENTITY.md → USER.md → MEMORY.md (all auto-injected in main sessions) → memory/YYYY-MM-DD.md (on-demand via memory tools) → conversation-state.md → INLINECODE243

Session Maintenance

CODEBLOCK16

4. Cron & Automation

Cron Job Schema (key fields)

CODEBLOCK17

sessionTarget: "isolated" (recommended — fresh session) | "main" (injects as systemEvent)
payload.kind: "agentTurn" (isolated) | "systemEvent" (main session)
delivery.mode: "announce" | "webhook" | "none"
lightContext: true skips all workspace bootstrap files — massive token savings for automated runs (v2026.3.1+)

CLI

CODEBLOCK18

Cron Defer-While-Active (v2026.3.7+)

Skip main-session cron jobs when the user is actively chatting:

CODEBLOCK19

Prevents cron jobs from interrupting active conversations. Only affects sessionTarget: "main" jobs; isolated jobs always run.

Cron Restart Staggering (v2026.3.8+)

On gateway startup, missed cron jobs are staggered to prevent gateway starvation. Top-of-hour cron expressions get up to 5 minutes of deterministic stagger. Use --exact or schedule.staggerMs: 0 to disable.
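"Deterministic stagger" means the same job always gets the same offset, so restarts do not reshuffle the schedule. The sketch below shows one common way to get that property by hashing a job identifier. It is an illustration of the idea only and is not claimed to be OpenClaw's actual algorithm. `stagger_ms` is a hypothetical name.

```python
import hashlib

MAX_STAGGER_MS = 5 * 60 * 1000  # up to 5 minutes, as described above


def stagger_ms(job_id: str, max_ms: int = MAX_STAGGER_MS) -> int:
    """Map a job ID to a stable offset in [0, max_ms] using a hash of the ID."""
    digest = hashlib.sha256(job_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % (max_ms + 1)
```

Because the offset depends only on the job ID, two jobs scheduled for the top of the hour will usually land at different points inside the 5-minute window, and each job lands at the same point every time.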

Silent Patterns

NO_REPLY — agent outputs this literal string when nothing to report; system suppresses delivery entirely.
HEARTBEAT_OK — heartbeat token; reply ≤300 chars after stripping it → silently dropped.

CODEBLOCK20
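The two suppression rules can be read as a small predicate. This is a sketch of the behavior as described in this section, not the gateway's implementation: `should_deliver` is a hypothetical name, and the whitespace trimming and exact-match handling of `NO_REPLY` are assumptions.

```python
NO_REPLY = "NO_REPLY"
HEARTBEAT_OK = "HEARTBEAT_OK"
HEARTBEAT_MAX_CHARS = 300


def should_deliver(reply: str, is_heartbeat: bool = False) -> bool:
    """Return False when the reply would be suppressed under the silent patterns."""
    text = reply.strip()
    if text == NO_REPLY:
        # Nothing to report: delivery is suppressed entirely.
        return False
    if is_heartbeat and HEARTBEAT_OK in text:
        # Strip the token; a short remainder (<= 300 chars) is silently dropped.
        remainder = text.replace(HEARTBEAT_OK, "").strip()
        return len(remainder) > HEARTBEAT_MAX_CHARS
    return True
```

In practice this means a heartbeat prompt should tell the agent to answer with the bare token when all checks pass, and to write a full report only when something needs attention.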

v2026.2.25 BREAKING: The heartbeat DM toggle was replaced with directPolicy. Default is now allow. If you had DMs blocked in v2026.2.24, explicitly set agents.defaults.heartbeat.directPolicy: "block" (or per-agent via agents.list[].heartbeat.directPolicy).

Cost trap: 5-minute heartbeat loading full MEMORY.md = ~2.9M tokens/day. Keep heartbeat context minimal — use lightContext: true or extend intervals.
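The daily figure follows from simple arithmetic. The sketch below assumes roughly 10K tokens per heartbeat run, which is the per-run cost implied by ~2.9M tokens/day at a 5-minute interval. Your own per-run cost depends on bootstrap size and should be measured.

```python
MINUTES_PER_DAY = 24 * 60


def heartbeat_tokens_per_day(interval_minutes: int, tokens_per_run: int) -> int:
    """Tokens consumed per day by a heartbeat that fires every `interval_minutes`."""
    runs_per_day = MINUTES_PER_DAY // interval_minutes
    return runs_per_day * tokens_per_run


print(heartbeat_tokens_per_day(5, 10_000))   # 288 runs/day -> 2880000
print(heartbeat_tokens_per_day(30, 10_000))  # 48 runs/day  -> 480000
```

Extending the interval and shrinking the per-run context multiply together, so doing both gives the largest saving.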

Redundant cron jobs: The built-in openclaw memory indexes sessions natively. Custom session archiver cron jobs that convert .jsonl to markdown for a separate RAG database are likely redundant. Check whether any cron job feeds a custom system that duplicates built-in functionality before assuming it's needed.

Known bugs: Cron current-day skip (Issue #25902) — restart the gateway with launchctl kickstart -k gui/$(id -u)/ai.openclaw.gateway to recompute (do NOT use openclaw gateway restart — it causes duplicate processes; see Section 10). Cron announce → Telegram failure (Issue #25906) — switch to directMessage mode.

v2026.2.25 fixes: Cron model override failures now auto-recover — if an isolated job's payload.model is no longer allowlisted, it gracefully falls back to the default model instead of failing the job. Cron announce duplicate sends are also fixed (duplicate guard tracks attempted vs confirmed delivery). Multi-account cron routing now properly honors delivery.accountId.

5. Skills & Plugins

metadata.openclaw.requires — gates skill visibility:
CODEBLOCK21

disable-model-invocation: true — removes skill from model's tool list; user can still invoke manually. Use for high-impact or security-sensitive skills.

Skills directory: ~/.openclaw/workspace/skills/ — this is the filesystem path where all skills are stored. Each skill lives in its own subdirectory (e.g., ~/.openclaw/workspace/skills/my-skill/SKILL.md). When manually installing or copying skills, always use this path — not ~/.openclaw/skills/.

ClawHub:
CODEBLOCK22

Security: Before installing any skill, read its SKILL.md manually. Community scans found 341+ malicious skills (reverse shells, credential exfiltration, Atomic Stealer, crypto miners). New accounts with popular skills = red flag. The #1 most-downloaded ClawHub skill was confirmed malware.

Session watcher: Skills snapshot at session start. If skills.load.watch is disabled, start a new session after installing.

Plugin Slots (v2026.3.7+)

CODEBLOCK23

6. Multi-Agent & Sub-Agent Architecture

CODEBLOCK24

CODEBLOCK25

CODEBLOCK26

Community pattern: Orchestrator (opus-4-6) → Code sub-agent (sonnet-4-5) → Research sub-agent (kimi-k2.5) → Cron/monitoring (zai/glm-5, isolated)

Community insight — single agent with skills beats multiple agents for most use cases. Multiple agent instances multiply context costs (each agent loads its own bootstrap). Use one agent with good skills instead, and only split into multiple agents when you need genuinely different identity/personality/permissions (e.g., a public-facing agent vs an ops agent).

Sandbox isolation:
CODEBLOCK27

ACP Dispatch (v2026.3.2+)

Agent Client Protocol enables OpenClaw to spawn external coding harnesses (Claude Code, Codex CLI, Gemini CLI, OpenCode) as sub-agents:

CODEBLOCK28

In-chat: /acp spawn · /acp status · /acp steer <message> · /acp close

7. High-ROI Optimization Levers

Lever	Impact	How
Tiered model routing	50–95% cost reduction	T1 for cron/heartbeat, T4 only for orchestration
Prompt caching

8. CLI Reference

Best practice (v2026.2.25+): Before editing config or asking config-field questions, have the agent call the config.schema tool in-chat. This returns the current schema with valid keys, types, and defaults — avoids guessing or using stale field names. Note: this is an agent in-chat tool, NOT a CLI command.

Most common commands:
CODEBLOCK29

openclaw onboard --reset scope change (v2026.2.26): Default reset scope is now config+creds+sessions. Workspace deletion (bootstrap files, skills, memory) now requires --reset-scope full. Do NOT run openclaw onboard --reset without specifying --reset-scope explicitly — the default no longer wipes the workspace.

In-Chat Commands (v2026.3.x)

CODEBLOCK30

Environment Variables (v2026.3.x)

CODEBLOCK31

Gateway restart (macOS LaunchAgent):
CODEBLOCK32

Full CLI reference (all commands, flags, in-chat commands): Read references/cli-reference.md

9. Ops Hygiene Checklist

Daily:

- openclaw health --json via cron (→ HEARTBEAT_OK if clean)
- INLINECODE308 to verify ClawHub auth
- Token budget check (cost-sensitive providers)

Weekly:

- openclaw update --dry-run → review → INLINECODE310
- INLINECODE311 → review → INLINECODE312
- Curate MEMORY.md — archive old daily logs, promote key insights
- INLINECODE313 → INLINECODE314
- INLINECODE315 — check for errors
- Clean stale backup files: find ~/.openclaw -name "*.bak.*" -mtime +7 -not -name "*.bak" -exec rm -v {} + (preserves CLI's rolling .bak files, removes old named/dated backups; -exec handles paths containing spaces and does nothing when no files match)

Quarterly:

- Review custom scripts (scripts/) for redundancy with built-in OpenClaw features. Users often build custom solutions (RAG pipelines, session archivers, memory indexers) that become redundant when OpenClaw adds equivalent built-in functionality. Check whether each script and its associated cron job still serves a purpose that the platform doesn't already handle.

Before/After Updates:

- Before update: openclaw backup create (pre-change safety net — v2026.3.8+)
- After update: openclaw doctor --fix (handles config migrations automatically)
- After update: openclaw config validate --json (catch fail-closed config errors — v2026.3.2+)
- v2026.2.23 breaking change: allowPrivateNetwork → dangerouslyAllowPrivateNetwork — auto-fixed by doctor
- Manual backup only needed for major upgrades or multi-file restructuring (see Backup Strategy above)

v2026.3.x Breaking Changes:

- gateway.auth.mode required (v2026.3.7): When both gateway.auth.token AND gateway.auth.password are configured, you must set gateway.auth.mode to "token" or "password". Gateway will not start without this.
- tools.profile defaults to "messaging" (v2026.3.2): New installs no longer start with coding/system tools. Existing installs are unaffected.
- ACP dispatch defaults to enabled (v2026.3.2): Set acp.dispatch.enabled: false to disable.
- Config fail-closed (v2026.3.2+): Invalid configs cause gateway startup failure instead of silently falling back to permissive defaults.
- Node.js v22.12+ enforced: Attempting to run on Node 18/20 causes immediate failure.

On Every System Assessment (mandatory data collection):

- openclaw cron list + read ~/.openclaw/cron/jobs.json — capture full cron inventory: job IDs, names, schedules, model overrides (from payload.model), status, last run times
- Flag any jobs in error state — these are active problems
- Flag jobs with stale last-run times (>24h for daily jobs) — may indicate silent failures
- Check timezone consistency — jobs using (exact) instead of named timezones may fire at wrong times
- Record whether jobs use isolated or main session target
- Map cron schedule to day/night distribution — heavy jobs should cluster overnight
- Document all findings in the system profile's ## Cron Jobs section before making recommendations
- Without this data, recommendations will duplicate existing automation and waste time
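The error-state and stale-run checks from this list can be sketched as a filter over the job inventory. The field names `name`, `status`, and `lastRunAt` are hypothetical and chosen for illustration. Inspect your own `jobs.json` for the real schema before adapting this. Timestamps are assumed to be ISO 8601 strings with an explicit UTC offset.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple


def flag_jobs(
    jobs: List[Dict],
    now: datetime,
    stale_after: timedelta = timedelta(hours=24),
) -> List[Tuple[str, str]]:
    """Return (job name, finding) pairs for jobs in error state or with stale last runs."""
    findings: List[Tuple[str, str]] = []
    for job in jobs:
        name = job.get("name", "?")
        if job.get("status") == "error":          # hypothetical field name
            findings.append((name, "error state"))
        last_run = job.get("lastRunAt")           # hypothetical field name
        if last_run and now - datetime.fromisoformat(last_run) > stale_after:
            findings.append((name, "stale last run"))
    return findings


jobs = [
    {"name": "health", "status": "ok", "lastRunAt": "2026-03-09T02:00:00+00:00"},
    {"name": "digest", "status": "error", "lastRunAt": "2026-03-09T03:00:00+00:00"},
    {"name": "archive", "status": "ok", "lastRunAt": "2026-03-07T02:00:00+00:00"},
]
now = datetime(2026, 3, 9, 12, 0, tzinfo=timezone.utc)
print(flag_jobs(jobs, now))  # [('digest', 'error state'), ('archive', 'stale last run')]
```

The 24-hour threshold suits daily jobs. Hourly or weekly jobs need their own `stale_after` value.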

Security:

- openclaw config get gateway.bind → must be INLINECODE342
- No public port exposure — use Tailscale for remote
- API keys not in skill files or version control
- Audit ClawHub skills before installing — 341+ malicious skills confirmed
- CVE-2026-25253 (ClawJacked): WebSocket authentication bypass allowing one-click RCE. 42,000+ exposed instances. Patched in v2026.1.29+. Verify you are on v2026.2.26+ minimum.
- INLINECODE343 for live Gateway probe

10. Troubleshooting

Log file paths (macOS):

- Error log: ~/.openclaw/logs/gateway.err.log — primary source for errors, 502s, plugin failures, tool errors
- Main log: /tmp/openclaw/openclaw-YYYY-MM-DD.log — verbose debug output (lane events, session activity)

Always check gateway.err.log first when troubleshooting — it contains only errors and warnings, making root cause identification much faster than grepping the main log.

First — always run this triage sequence:
CODEBLOCK33

Quick fix by symptom:

| Symptom | First Command | Most Likely Fix |
|---|---|---|
| No response from agent | INLINECODE347 | Gateway not running or pairing pending |
| Gateway won't start | openclaw logs --follow | EADDRINUSE or gateway.mode not set to local |
| "Port already in use" loop | ps aux \| grep openclaw-gateway | Duplicate processes from CLI restart vs LaunchAgent KeepAlive. Fix: launchctl bootout → kill orphans → launchctl bootstrap (see Section 8) |
| "Gateway start blocked: set gateway.auth.mode" | openclaw config get gateway.auth | Both token and password set but gateway.auth.mode missing. Fix: openclaw config set gateway.auth.mode token (v2026.3.7 breaking change) |
| "unauthorized" on Control UI | launchctl getenv OPENCLAW_GATEWAY_TOKEN | Remove stale launchctl env override |
| Config file wiped on restart | Back up config first | Known bug #40410 — gateway restart can wipe openclaw.json. Use openclaw backup create before restarts. |
| Cron job never fires | openclaw cron status | Cron disabled or timezone mismatch |
| Heartbeat always skipped | openclaw config get agents.defaults.heartbeat.activeHours | Wrong timezone, outside active hours, or directPolicy set to block (v2026.2.25 changed default to allow) |
| Cron job fails with "model not allowlisted" | openclaw cron status | v2026.2.25+ auto-recovers by falling back to default model. On older versions: update payload.model in the job or re-add the model to the allowlist. |
| Channel message dropped | openclaw logs --follow | Mention required or sender not paired |
| "RPC probe: failed" | openclaw gateway status --deep | Auth token mismatch or port conflict |
| Post-upgrade breakage | openclaw doctor --fix | Automatic config migration |
| Provider 401 errors | openclaw models status --probe | Token expired or wrong key type |
| Chrome browser won't start (Linux) | openclaw browser status | Snap Chromium conflict → install Google Chrome .deb |
| Silent tool execution failure | Check model | Known bug #40069 — agent claims tool use but no calls made. Confirmed with kimi-coding/k2p5. Switch model. |
| Compaction freezes session | Override compaction model | Known bug #38233 — /compact times out at ~300s with Codex models. Use compaction.model: google/gemini-3-flash-preview |
| Ollama stuck "typing" forever | Switch to non-Ollama model | Known bug #40434 — local Ollama models stuck via Telegram |
| Fallback doesn't escalate on outage | Test fallback chain | Known bug #32533 — retries auth profiles instead of escalating to fallback providers |
| ALL providers timeout simultaneously | grep "delivery-recovery" gateway.err.log | Not a provider issue. Two common causes: (A) Context bloat — contextTokens unset (unlimited), payload too large for any provider to process within timeoutSeconds. Fix: set contextTokens: 100000, timeoutSeconds: 180, reserveTokensFloor: 32000. See Section 10d. (B) Event loop overload — stuck delivery-queue, skills-remote probes, Gemini OAuth cycling, too many concurrent sessions. Fix: clear delivery queue, set cron.maxConcurrentRuns: 1. See Section 10b. |
| Delivery recovery loop ("21 entries deferred") | ls ~/.openclaw/delivery-queue/ | Stuck entries (wrong channel, message too long) retry forever on every restart. Move to ~/.openclaw/delivery-queue/failed/ to stop the loop. |
| Ollama "fetch failed" (instant, ~100ms) | Check gateway err log for Failed to discover Ollama models | Known bug: OpenClaw hardcodes 127.0.0.1:11434 for Ollama discovery (Issue #8663). On macOS, LaunchAgent processes are sandboxed and can't reach private LAN IPs like 192.168.x.x (Issue #21494). Fix: reverse SSH tunnel from Ollama machine to gateway (ssh -fN -R 127.0.0.1:11434:127.0.0.1:11434 user@gateway), set baseUrl to http://127.0.0.1:11434, add OLLAMA_HOST and OLLAMA_API_KEY to LaunchAgent env. See Section 10a below. |
| Ollama "Connection error" | Same as above | Same root cause. Switching api from ollama to openai-completions changes the error message but doesn't fix it — the sandbox blocks all LAN connections. |
| Ollama probes spike memory | curl http://host:11434/api/ps | Set OLLAMA_KEEP_ALIVE=0 on the Ollama machine so models unload immediately after probes. No OpenClaw config to disable probes per-provider. |
| Gemini CLI "API rate limit reached" | openclaw logs \| grep rate | Google OAuth crackdown (Feb 2026). Switch to API key auth. See Section 1 warning. |
| Provider removal didn't stop probes | Check all 6 locations in Provider Removal Checklist | Stale auth-profiles.json, launchctl env, or plist env vars. See Section 1. |
| config unset fails on auth profile keys | Edit JSON directly | Colons in keys break the config path parser. Use python3/jq. |
| models status --probe mass timeouts | Test individual providers with curl | Probe contention — 16+ simultaneous targets saturate the event loop. Not real failures. |

10a. Remote Ollama on macOS (Known Bug Workaround)

Problem: OpenClaw on macOS cannot connect to a remote Ollama server on the LAN. curl works, but the gateway process fails with "fetch failed" or "Connection error." This affects all API modes (ollama and openai-completions).

Root causes (two bugs stacking):

1. Hardcoded localhost discovery (Issue #8663): OpenClaw always probes 127.0.0.1:11434 for Ollama, ignoring baseUrl.
2. macOS LaunchAgent sandbox (Issue #21494): The gateway process running under launchd gets EHOSTUNREACH for private network IPs (192.168.x.x, 10.x.x.x).

Fix — reverse SSH tunnel:
CODEBLOCK34

Add to the gateway's LaunchAgent plist:
CODEBLOCK35

Important: Kill any local Ollama on the gateway first — it will conflict with the tunnel on port 11434. Make the tunnel persistent with a LaunchAgent on the Ollama machine.

10b. Multi-Provider Timeout Storms (Event Loop Overload)

Symptom: ALL providers (Kimi, KiloCode, Google, Anthropic, etc.) time out simultaneously within a 30-90 minute window, even though they are independent services. Every model in the fallback chain reports FailoverError: LLM request timed out. The gateway may also crash or restart.

Root cause: The gateway's Node.js event loop is saturated by a pile-up of concurrent operations. Outbound HTTPS responses arrive, but the gateway can't handle them before its own timeout timer fires. The providers are NOT down; the gateway simply can't keep up with the responses.

Common overload contributors (check all of these):

1. Stuck delivery-recovery queue — Files in ~/.openclaw/delivery-queue/ that will never succeed (wrong channel, message too long) retry on every restart and periodically. Each retry burns event loop time.

- Diagnose: ls ~/.openclaw/delivery-queue/*.json | wc -l and grep "delivery-recovery" gateway.err.log | tail -20
- Fix: INLINECODE415

2. Skills-remote bin probes to offline nodes — Gateway probes paired nodes for skill binary requirements. If nodes don't have the node service running, each probe hangs until timeout.

- Diagnose: grep "skills-remote.*timed out" gateway.err.log | wc -l
- Fix: Remove offline nodes from paired devices, or ensure nodes have the node service running.

3. Google Gemini CLI OAuth account cycling — If the agent switches to Gemini CLI in-session, it cycles through OAuth accounts. Each expired/slow account hangs for 90 seconds. 6 accounts = up to 540s of hung connections.

- Diagnose: grep "google-gemini-cli.*timed out" gateway.err.log | tail -10
- Fix: Ensure OAuth tokens are fresh, or use the google (API key) provider instead of google-gemini-cli for fallbacks.

4. No cron concurrency limit — Multiple cron jobs firing simultaneously all compete for the same event loop and hit the same provider chain, creating a thundering herd.

- Fix: INLINECODE420

5. Proxy providers as early fallbacks — KiloCode is a proxy. When it degrades, ALL models through it fail simultaneously (appears as multiple independent failures but is one SPOF). Put direct-API providers (Anthropic, Google API key) before proxies in the fallback chain.
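For contributor 1 (the stuck delivery-recovery queue), moving entries that will never succeed into a `failed/` subdirectory stops the retry loop. The sketch below uses file age as a stand-in for "stuck", which is an assumption made for illustration. `quarantine_stuck_entries` and the one-hour default are hypothetical. Review the entries before moving them on a real gateway.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional


def quarantine_stuck_entries(
    queue_dir: str,
    max_age_seconds: float = 3600,
    now: Optional[float] = None,
) -> List[str]:
    """Move *.json entries older than `max_age_seconds` into <queue_dir>/failed/."""
    queue = Path(queue_dir).expanduser()
    failed = queue / "failed"
    current = time.time() if now is None else now

    moved: List[str] = []
    # glob("*.json") is non-recursive, so entries already in failed/ are ignored.
    for entry in sorted(queue.glob("*.json")):
        if current - entry.stat().st_mtime > max_age_seconds:
            failed.mkdir(exist_ok=True)
            shutil.move(str(entry), str(failed / entry.name))
            moved.append(entry.name)
    return moved
```

Moving rather than deleting keeps the payloads available for inspection, so you can still see which channel or message length caused the failure.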

Recovery: After fixing underlying causes, restart gateway: INLINECODE421

10c. `launchctl setenv` Persistence (macOS)

Problem: Removing an env var from the LaunchAgent plist does NOT remove it from the launchd session environment. The gateway process still sees the old value after a kickstart -k restart.

Root cause: launchctl setenv sets variables at the launchd domain level, independent of any plist. These persist until the user logs out or they are explicitly unset. kickstart -k re-reads the plist for ProgramArguments and EnvironmentVariables, but the domain-level env set by setenv takes precedence.

Fix:
CODEBLOCK36

Key lesson: Always clean both plist AND launchctl unsetenv when removing provider env vars. Use launchctl getenv <KEY> to verify removal. If the command returns output (even empty), the var is still set. "Not set" means launchctl getenv exits with an error.

10d. Context Bloat Cascade Timeouts

Symptom: ALL providers in the fallback chain time out on the same request, one after another. The same runId appears across multiple providers at 90-second intervals. It looks like a massive outage, but the providers are actually fine.

Pattern in logs:
CODEBLOCK37

Root cause: contextTokens is unset (defaults to unlimited). The main session accumulates conversation history until the payload is so large that no provider can respond within timeoutSeconds. Each provider in the fallback chain gets the same oversized payload, times out, and passes to the next one — creating a cascade that takes timeoutSeconds × number_of_providers to fully fail.

The deadly trio:

1. Unlimited contextTokens: payload grows unchecked
2. Short timeoutSeconds (e.g., 90): not enough time for large payloads
3. Long fallback chain (4-5 providers): each one gets a full timeout cycle before failing
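The cascade cost follows directly from the timeoutSeconds × number_of_providers relationship in the root cause. A small sketch of that arithmetic (the function name is illustrative, not part of OpenClaw):

```python
def worst_case_cascade_seconds(timeout_seconds, provider_count):
    """Wall-clock time until the request fully fails if every provider times out in turn."""
    return timeout_seconds * provider_count


print(worst_case_cascade_seconds(90, 4))  # 360
print(worst_case_cascade_seconds(90, 5))  # 450
```

With a 90-second timeout and a 4-5 provider chain, a single oversized request can therefore block for six to seven and a half minutes before it fails.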

Fix — recommended baseline for any mixed-provider fallback chain:
CODEBLOCK38

How this works together:

- contextTokens: 100000 caps context so all providers can handle it
- Compaction triggers at ~68K tokens (100K minus 32K reserve)
- Memory flush runs first (if enabled), then compaction compresses history
- INLINECODE439: gives providers 3 minutes per attempt (vs 90s)
- The cap ensures every provider in the chain can respond in time

Tradeoff: Models with large context windows (Gemini: 1M, GPT-5.4: 1.05M) are capped at 100K. This is intentional — the cap must match the weakest provider in the fallback chain. For dedicated large-context sessions, temporarily increase contextTokens.
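The two numeric rules above (compaction at the cap minus the reserve, and a cap no larger than the weakest provider's window) can be checked with a few lines of Python. This is a sketch under stated assumptions: the function names are invented, the Gemini and GPT-5.4 windows are the figures quoted above, and "small-provider" with its 128K window is a made-up stand-in for the weakest link in a chain.

```python
def compaction_threshold(context_tokens, reserve_tokens):
    """Token count at which compaction would start, per the arithmetic above."""
    return context_tokens - reserve_tokens


def cap_fits_all(context_tokens, provider_windows):
    """True if the cap does not exceed any provider's context window."""
    return all(context_tokens <= window for window in provider_windows.values())


# Gemini and GPT-5.4 figures come from the text; "small-provider" is invented.
windows = {"gemini": 1_000_000, "gpt-5.4": 1_050_000, "small-provider": 128_000}

print(compaction_threshold(100_000, 32_000))  # 68000
print(cap_fits_all(100_000, windows))         # True
print(cap_fits_all(200_000, windows))         # False
```

When contextTokens is raised temporarily for a large-context session, the same check shows which providers would have to leave the fallback chain.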

Full troubleshooting reference (7 failure categories, per-channel error tables, node error codes, GitHub issue workarounds): Read references/troubleshooting.md

11. System Learning

This skill maintains system profiles — persistent knowledge files that capture everything learned about specific OpenClaw deployments. Each deployment gets a unique profile that grows over time, turning the skill into an expert on that particular system.

How It Works

Directory: ~/.openclaw-optimizer/systems/ — one profile per deployment, plus TEMPLATE.md for new deployments. This is a centralized location outside the skill directory so that: (1) system profiles are never accidentally pushed to git, (2) multiple AI tools (Claude Code, OpenClaw, Gemini CLI, etc.) on the same machine can read/write the same profiles without drift. Cross-machine sync is still manual via SCP.

Deployment ID: Each deployment has a unique slug (e.g., jbd-home, prod-cluster-east, dev-standalone).

Profile formats (two supported):

- Directory format (preferred): ~/.openclaw-optimizer/systems/<deployment-id>/ is a directory containing INDEX.md (always-loaded summary, ~1-4K tokens) plus topic files loaded on demand. This dramatically reduces session-start context cost.
- Single-file format (legacy): ~/.openclaw-optimizer/systems/<deployment-id>.md is a monolith file containing everything. Still supported for backwards compatibility.

Topology types:

Type	Description
INLINECODE450	Single gateway, no remote nodes
INLINECODE451

Session Workflow

First-run setup (once per machine):

1. Check if ~/.openclaw-optimizer/systems/ exists
2. If not: inform the user that this skill stores deployment profiles in ~/.openclaw-optimizer/systems/ (centralized, outside git, shared across AI tools), confirm they're OK with creating it, then: mkdir -p ~/.openclaw-optimizer/systems/ and copy TEMPLATE.md from the skill's systems/ directory into it
3. If the directory exists but is empty (no TEMPLATE.md): copy TEMPLATE.md from the skill's systems/ directory

At session start (identify the deployment):

1. Ask which deployment the user is working on, or identify it from context (SSH target, hostnames, IPs)
2. Check if ~/.openclaw-optimizer/systems/<deployment-id>/ directory exists
3. If directory found: read INDEX.md only (~1-4K tokens). Use the File Manifest table at the bottom to load topic files on demand during the session; do NOT read all files upfront.
4. If directory NOT found but <deployment-id>.md file exists: read the monolith (legacy mode). Consider migrating to directory format.
5. If neither found: create a new profile from ~/.openclaw-optimizer/systems/TEMPLATE.md during the session
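The lookup order in steps 2-5 can be sketched as a small resolver. This is an illustration only: resolve_profile is a hypothetical helper, not something the skill ships, and it only decides which file to read first.

```python
import tempfile
from pathlib import Path


def resolve_profile(systems_dir, deployment_id):
    """Return (mode, path) following the lookup order described above."""
    root = Path(systems_dir)
    profile_dir = root / deployment_id
    if profile_dir.is_dir():
        return "directory", profile_dir / "INDEX.md"  # read INDEX.md only
    legacy = root / f"{deployment_id}.md"
    if legacy.is_file():
        return "legacy", legacy                       # read the monolith
    return "new", root / "TEMPLATE.md"                # start from the template


# Demo against a throwaway directory instead of ~/.openclaw-optimizer/systems/.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "jbd-home.md").write_text("legacy profile")
    mode, path = resolve_profile(tmp, "jbd-home")
    print(mode, path.name)  # legacy jbd-home.md
```

Checking the directory before the single file mirrors the stated preference for the directory format when both exist.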

On any system assessment or audit (mandatory — run before making recommendations):

1. openclaw cron list: capture the full cron inventory (job IDs, names, schedules, status, last run times)
2. INLINECODE466: capture model routing (primary + fallbacks)
3. INLINECODE467: check for stuck delivery entries
4. INLINECODE468: check paired nodes and connection status
5. Flag any cron jobs in error state; these are active problems
6. Flag jobs with stale last-run times (>24h for daily jobs); these may indicate silent failures
7. Check timezone consistency: jobs using (exact) instead of named timezones may fire at wrong times
8. Document ALL findings in the system profile before making recommendations
Without this data, recommendations will duplicate existing automation and miss hidden drains.
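The stale-run check in step 6 is a simple time comparison once the last-run timestamps have been captured. A minimal sketch, assuming the timestamps have already been parsed into timezone-aware datetime objects (how they are obtained from the cron inventory is not covered here, and is_stale is an invented name):

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_run, now, max_age=timedelta(hours=24)):
    """True if the job's last run is older than max_age (24h for daily jobs)."""
    return now - last_run > max_age


now = datetime(2026, 3, 9, 12, 0, tzinfo=timezone.utc)
recent = datetime(2026, 3, 9, 3, 0, tzinfo=timezone.utc)
old = datetime(2026, 3, 7, 3, 0, tzinfo=timezone.utc)

print(is_stale(recent, now))  # False
print(is_stale(old, now))     # True
```

Using timezone-aware values on both sides avoids the same class of timezone mistakes that step 7 warns about.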

During the session (on-demand file loading):

- Reference INDEX.md for SSH access, IPs, routing, and cron status
- When diagnosing any issue: read lessons.md FIRST (check if it's already solved), then the relevant topic file
- When troubleshooting cron: read cron.md for full job IDs, schedules, and observations
- When investigating providers/connectivity: read providers.md and/or INLINECODE474
- When checking channels/Telegram: read channels.md for group API IDs and mapping
- When reviewing history: read issues/YYYY-MM.md for the relevant month
- Apply lessons learned to avoid repeating mistakes

At session end (update the profile):

For directory-based profiles:

1. Update the specific topic file(s) that changed (e.g., routing.md if fallbacks were reordered)
2. Update INDEX.md only if summary-level data changed (new provider added/removed, routing swap, cron status change, machine added/removed)
3. Add new issues to issues/YYYY-MM.md (current month file, newest first) with: symptom, root cause, fix, rollback, lesson
4. Add new lessons to lessons.md (permanent, never archived)
5. Update the Last updated date in INDEX.md
6. Sync only changed files to the gateway: INLINECODE482
Note: system profiles live in ~/.openclaw-optimizer/systems/, NOT in the skill directory. Do not commit them to git.

For legacy single-file profiles:

1. Add any new issues to the Issue Log (newest first) with: symptom, root cause, fix, rollback, lesson
2. Update Lessons Learned with new patterns discovered
3. Update machine details if anything changed (IPs, versions, config)
4. Update the Last updated date
5. Sync the profile to the gateway: INLINECODE485

What Gets Captured

Topic	File (directory format)	Purpose
Machines, Network, Paired Devices	INLINECODE486	Every machine: role, SSH, IPs, OS, paths, config. Tailnet, auth, connectivity. Device entries from paired.json.
Providers

Issue Lifecycle (directory format)

1. New issues go into issues/YYYY-MM.md (current month file, newest first)
2. After 14 days: full detail stays in the monthly file, a one-liner is added to INLINECODE495
3. Monthly files are never deleted; they're the permanent record
4. Lessons extracted from issues go to lessons.md (permanent, never archived)
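The file naming and the 14-day rule above can be sketched in Python. Both helper names are hypothetical, and treating "after 14 days" as "14 or more days old" is an assumption; the text does not say whether day 14 itself counts.

```python
from datetime import date


def monthly_issue_file(day):
    """Path of the monthly issue file for a given date (issues/YYYY-MM.md)."""
    return f"issues/{day:%Y-%m}.md"


def needs_one_liner(logged_on, today, after_days=14):
    """True once an issue is old enough to get its one-line summary."""
    return (today - logged_on).days >= after_days


today = date(2026, 3, 9)
print(monthly_issue_file(today))                     # issues/2026-03.md
print(needs_one_liner(date(2026, 2, 20), today))     # True
print(needs_one_liner(date(2026, 3, 1), today))      # False
```

Note that the issue stays in the file for the month it was logged in, so a February issue summarized in March still lives in issues/2026-02.md.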

Rules

- Never store full secrets in profiles: use first 12 chars + ... for tokens, never store full API keys
- Always read the profile before troubleshooting; don't rediscover what's already known
- Always update the profile after fixes; future sessions depend on accurate knowledge
- One profile per gateway; nodes are documented within the gateway's profile
- Keep lessons actionable: not "TLS was broken" but "macOS app rejects ws:// for remote gateways; always use wss://"
- Rely on built-in backup layers: don't create manual backups for routine changes. OpenClaw's CLI creates rolling .bak files on every config write, and the nightly GitHub backup cron captures the full config in git history. Manual dated backups (cp <file> <file>.YYYY-MM-DD-<reason>) are only needed for: (1) major version upgrades, (2) multi-file restructuring (identity audits), (3) direct JSON edits where the CLI isn't used. For routine CLI changes (model swaps, cron edits, config sets), the CLI .bak + GitHub nightly are sufficient. Clean up old manual backups after they're covered by the nightly backup.
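The first rule (first 12 chars + ...) is easy to apply mechanically before anything is written to a profile. A minimal sketch; redact is an invented helper, the sample token is fake, and hiding values of 12 characters or fewer entirely is an assumption added so that short secrets are never written out in full.

```python
def redact(secret, keep=12):
    """Keep the first `keep` characters and replace the rest with '...'."""
    if len(secret) <= keep:
        # Assumption: a value this short would be fully exposed, so hide it all.
        return "..."
    return secret[:keep] + "..."


# Fake token, for illustration only.
print(redact("sk-test-0123456789abcdef"))  # sk-test-0123...
print(redact("short"))                     # ...
```

Running every token through a helper like this before saving keeps full API keys out of profiles that are later synced between machines.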

12. Continuous Improvement

This skill is a living document. Every troubleshooting session, every CLI interaction, and every failure is an opportunity to make it more accurate. Future sessions must actively update the skill based on real-world experience.

When to Update SKILL.md

Trigger	Action
A CLI command in the skill doesn't work as documented	Fix the command, add a note about what changed
A troubleshooting step is missing or incomplete	Add it to Section 10's symptom table
A workaround is discovered

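OpenClaw Optimizer

Aligned with: OpenClaw v2026.3.8 | Skill v1.19.0 | Updated: 2026-03-09 | CLI-first advisor

Optimize and troubleshoot OpenClaw workspaces: cost-aware routing, provider configuration, context discipline, lean automation, multi-agent architectures, and error resolution.

Reference files (load when needed):

- references/providers.md: all 40+ providers, custom provider schema, failover config
- references/troubleshooting.md: full error reference, 7 failure categories, GitHub issue workarounds
- references/cli-reference.md: complete CLI command reference
- references/identity-optimizer.md: agent identity/personality audit checklist, file roles, walkthrough workflow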

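Version Awareness

This skill tracks OpenClaw releases via two mechanisms:

1. GitHub Actions: a daily workflow checks for new releases, opens an issue on drift, and auto-closes it when resolved
2. Runtime check: lightweight cached version comparison at session start

Runtime Check (once per session)

bash
python3 ~/.claude/skills/openclaw-optimizer/scripts/version-check.py --status

- CURRENT → note the version and proceed.
- STALE → inform the user: "OpenClaw v<new> is available (skill is at v<current>). Run update-skill.sh to review what changed."
- UNCHECKED → note "Version check unavailable (offline)" and proceed.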

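Update Workflow (user-initiated, never automatic)

bash
# Show the drift report, changelog, and affected sections
bash ~/.claude/skills/openclaw-optimizer/scripts/update-skill.sh

# After updating content in SKILL.md and references/:
bash ~/.claude/skills/openclaw-optimizer/scripts/update-skill.sh --apply   # bump version
bash ~/.claude/skills/openclaw-optimizer/scripts/update-skill.sh --commit  # bump + commit + push

Updates are deliberate: this skill never auto-modifies its own content or pushes to git without explicit user action.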

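Quick Start (copy/paste prompts)

Full audit (safe, no changes):

Audit my OpenClaw setup for cost, reliability, and context bloat. Prioritized plan with rollback. Do NOT apply changes.

Troubleshoot a specific problem:

[Describe your symptom or paste the error message]. Diagnose it and give me the exact fix.

Add or configure a provider:

Add [provider name] as a model provider. Walk me through the CLI steps and show me the exact config before applying.

Model routing optimization:

Propose a tiered routing plan: cheap for heartbeats/cron, mid for daily tasks, premium for coding/reasoning. Exact config + rollback. Do NOT apply.

Silent cron job:

Create a cron job that runs [task] every [interval]. Use an isolated session and return NO_REPLY when there is nothing to do. Show me the command first.

Audit agent personality and identity:

Audit my agent's personality and identity files. Check for conflicts, bloat, and bad practices. Walk me through the improvements step by step.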

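Safety Contract (non-negotiable)

- This skill is advisory by default; it is not an autonomous control plane.
- Never modify configuration (config.apply, config.patch), cron jobs, or persistent settings without explicit user approval.
- Before any approved change, show: (1) the exact CLI command or config patch, (2) the expected impact, (3) the rollback command.
- If an optimization reduces monitoring coverage, present options A/B/C and require the user to choose.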

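Backup Strategy

Four backup layers exist; do not stack manual backups on top of them unnecessarily:

Layer	What	Retention	When sufficient
CLI rolling .bak	Created automatically on every config set, models set, cron edit	Rolling (overwritten on each write)	Single-command undo
Nightly GitHub backup

Rule: For routine CLI changes (model swaps, cron edits, config sets), do not create manual backups. The CLI .bak + nightly GitHub backup are sufficient. Create manual backups only when: (1) upgrading the OpenClaw version, (2) editing multiple config files at once (identity audits), (3) editing JSON directly without the CLI. For upgrades, prefer openclaw backup create over manual copies.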

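1. Model Providers

40+ providers supported. Full documentation (auth commands, config schema, all model names, custom provider setup): Read references/providers.md

Quick lookup (slug, auth env var, primary model format):

Provider	Slug	Auth env var	Model format
Anthropic	anthropic	ANTHROPIC_API_KEY	anthropic/claude-opus-4-6
OpenAI (API key)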

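Warning: provider bans (March 2026):
- Google: actively cracking down on Gemini CLI OAuth and AntiGravity access through third-party tools. Accounts are being banned or throttled without warning or refund. Use API key auth (google provider) instead of OAuth (google-gemini-cli / google-antigravity). Production API keys: 150-300 RPM, no ban risk. See GitHub Issue #14203.
- Anthropic: has banned users who linked flat-rate Claude Code subscription tokens to OpenClaw. Using Claude Code OAuth tokens directly in OpenClaw may trigger account suspension. However, using Claude Code through Agent SDK / ACP dispatch (OpenClaw spawning Claude Code as a sub-agent over the ACP protocol) is a supported pattern and should not cause problems; this is how OpenClaw's built-in acp integration works.
- General advice: for third-party tool integrations, always prefer pay-per-token API keys over subscription OAuth. Using subscription-based OAuth through third-party tools violates most providers' terms of service, although OpenAI explicitly allows Codex OAuth in third-party tools.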
