Auto-Optimizer

Zero-friction optimization engine. Tell your OpenClaw agent "optimize my [thing]" and it runs autonomous improve-eval-keep/revert cycles for a fixed budget of iterations, keeping only the changes that improve your metric. Works on anything you can score: copy, prompts, code, prediction strategies, configs.

Results in under 5 minutes. Zero configuration required for demos.

Quick Setup (AI Agent Instructions)

Follow these steps exactly. No judgment calls needed.

Step 1 — Install

    clawhub install auto-optimizer

Step 2 — Initialize a git repo (required)

The optimizer uses git to revert bad changes, so it must run inside a git repo with at least one commit.

    cd your-project
    git init && git add . && git commit -m "baseline"

Step 3 — Pick your mode

Scalar mode — you have a command that outputs a single number (test score, Brier score, conversion rate, word count, etc.)

Binary mode — you want to evaluate quality with yes/no criteria (copy quality, prompt effectiveness, UX clarity)

Step 4 — Run it

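Scalar mode:

    ./skills/auto-optimizer/auto-optimizer.sh \
      --file ./your-file.md \
      --metric "bash ./your-metric.sh" \
      --budget 10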

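Binary mode:

    ./skills/auto-optimizer/auto-optimizer.sh \
      --eval-mode binary \
      --file ./your-file.md \
      --evals ./your-evals.md \
      --batch-size 10 \
      --budget 10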

Not sure? Use the wizard:

    ./skills/auto-optimizer/auto-optimizer.sh --wizard

Pre-Built Starter Packs

Three self-contained demos that run immediately. No files to create, no config needed.

Demo 1: Cold Outreach Optimizer

Optimizes a cold email template using a mock scoring metric (hook strength + clarity + CTA quality + length).

    ./skills/auto-optimizer/auto-optimizer.sh --demo outreach --budget 5

What it does:

- Creates a sample outreach template in `/tmp/demo-outreach/outreach.md`
- Runs a mock metric that scores: hook ≤15 words, single CTA, body ≤120 words, value prop clarity
- Iterates 5 times, keeps improvements, reverts regressions
- Prints a final report showing baseline → best score

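Sample outreach template used:

    Subject: Quick question about [Company]

    Hi [Name],

    I'm reaching out because I've been following [Company]'s work and think we could have a great opportunity to work together.

    We help companies like yours improve their sales process with AI-powered outreach tools. Our clients typically see a 3x increase in reply rates within the first month.

    Would you be open to a 15-minute call next week to explore whether this could be valuable for [Company]?

    Looking forward to hearing from you,
    [Your Name]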

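Mock metric logic (inline in demo):

    # Scores 0-100 based on:
    # - Hook length <= 15 words: +25 points
    # - Single CTA (not multiple asks): +25 points
    # - Body <= 120 words: +25 points
    # - Contains a specific value/number: +25 points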
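To make the four checks concrete, here is a minimal Python sketch of a scorer in the same spirit. It is not the demo's actual metric: the function name `score_outreach`, the way the hook is located (first sentence after the greeting line), and the use of "exactly one question mark" as a proxy for a single CTA are all assumptions made for illustration.

```python
import re

def score_outreach(template: str) -> int:
    """Score a cold-email template 0-100 using four +25 heuristic checks."""
    lines = [ln.strip() for ln in template.strip().splitlines() if ln.strip()]
    # Assumption: a leading "Subject:" line is not part of the body.
    body_lines = [ln for ln in lines if not ln.lower().startswith("subject:")]
    body = " ".join(body_lines)
    # Assumption: the hook is the first sentence after the greeting line.
    after_greeting = " ".join(body_lines[1:]) if len(body_lines) > 1 else body
    hook = re.split(r"(?<=[.!?])\s+", after_greeting)[0]

    score = 0
    if len(hook.split()) <= 15:
        score += 25  # short hook
    if body.count("?") == 1:
        score += 25  # proxy for a single CTA: exactly one question
    if len(body.split()) <= 120:
        score += 25  # concise body
    if re.search(r"\d", body):
        score += 25  # mentions a concrete number
    return score

SAMPLE = """Subject: Quick question about [Company]

Hi [Name],

We help teams like yours triple reply rates in the first month.

Open to a 15-minute call next week?

Best,
[Your Name]"""

if __name__ == "__main__":
    # Prints a single number on stdout, the shape a scalar metric needs.
    print(score_outreach(SAMPLE))
```

Because the script prints one number, something like it could be passed to `--metric` in scalar mode. Heuristics this crude are easy to game, so treat the score as a demo signal only.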

Demo 2: Prediction Market Strategy

Runs the optimization loop on a prediction strategy file, scoring by mock accuracy.

    ./skills/auto-optimizer/auto-optimizer.sh --demo prediction --budget 5

What it does:

- Creates a sample prediction strategy in `/tmp/demo-prediction/strategy.md`
- Runs a mock metric that scores: specificity of criteria, use of base rates, calibration language
- Shows how the loop works on structured reasoning files

Demo 3: Prompt Quality Optimizer (Binary Mode)

Optimizes a system prompt using 5 yes/no quality criteria.

    ./skills/auto-optimizer/auto-optimizer.sh --demo prompt --budget 5 --eval-mode binary

What it does:

- Creates a sample system prompt in `/tmp/demo-prompt/system-prompt.md`
- Evaluates each iteration against 5 criteria (inline):
  1. Does the prompt specify a clear role/persona?
  2. Does it include explicit output format instructions?
  3. Does it define what NOT to do?
  4. Is it under 500 words?
  5. Does it include at least one concrete example?
- Batch size 10: generates 10 outputs, scores each, calculates pass rate %
- Keeps versions that increase the pass rate

Sample system prompt used:

    You are a helpful assistant. Answer questions clearly and accurately.
    Be concise but thorough. Help the user accomplish their goals.

Full Capability Guide

`--wizard` — Interactive Setup

Walks you through setup interactively. Best when you're not sure which mode to use.

    ./skills/auto-optimizer/auto-optimizer.sh --wizard

Prompts you to choose:

1. Cold outreach / email copy → sets up binary mode with outreach evals
2. LLM prompt / system prompt → sets up binary mode with prompt quality evals
3. Prediction market strategy → sets up scalar mode with accuracy metric
4. Code / config file → sets up scalar mode, prompts for your test command
5. Custom → asks for your file and metric

`--eval-mode scalar` (default)

When to use: Anything with a measurable number. Test pass rate, Brier score, word count, latency, revenue, API response score.

Requirements: Your --metric command must print a single float to stdout.

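    # Example metric commands:
    --metric "python test_score.py"          # outputs: 0.847
    --metric "bash run_eval.sh | tail -1"    # outputs: 73.2
    --metric "node score.js"                 # outputs: 0.91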

How it works: Runs metric → agent proposes change → run metric again → if improved, commit; else git checkout to revert.
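The cycle described above can be sketched in Python. This is an illustration of the control flow only, not the script's implementation: `run_loop`, `is_improvement`, and the four callables are hypothetical names, and the git steps are passed in as functions so the sketch can run without a repository.

```python
from typing import Callable, Tuple

def is_improvement(new: float, best: float, goal: str = "maximize") -> bool:
    """True when `new` beats `best` in the direction given by `goal`."""
    return new > best if goal == "maximize" else new < best

def run_loop(
    measure: Callable[[], float],  # runs the metric command, returns its float
    propose: Callable[[], None],   # the agent edits the target file in place
    keep: Callable[[], None],      # e.g. commit the change with git
    revert: Callable[[], None],    # e.g. git checkout the file
    budget: int,
    goal: str = "maximize",
) -> Tuple[float, float]:
    """Return (baseline, best) after `budget` improve-eval-keep/revert cycles."""
    baseline = best = measure()
    for _ in range(budget):
        propose()
        score = measure()
        if is_improvement(score, best, goal):
            keep()
            best = score
        else:
            revert()
    return baseline, best

if __name__ == "__main__":
    # In-memory stand-ins for the file, the agent, and git (all hypothetical).
    candidates = iter([0.5, 0.3, 0.7, 0.6, 0.9])
    state = {"working": 0.4, "committed": 0.4}
    result = run_loop(
        measure=lambda: state["working"],
        propose=lambda: state.update(working=next(candidates)),
        keep=lambda: state.update(committed=state["working"]),
        revert=lambda: state.update(working=state["committed"]),
        budget=5,
    )
    print(result)
```

Note that each candidate is compared against the best score so far rather than the baseline, so a change that beats the baseline but not the current best is still reverted.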

`--eval-mode binary`

When to use: Copy, prompts, UX, anything where quality is multi-dimensional and hard to reduce to one number.

Requirements: An evals file (markdown list of yes/no criteria) and a --batch-size (default 10).

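    # Example evals.md:
    1. Is the hook under 15 words?
    2. Is there exactly one CTA?
    3. Does it mention a specific result or number?
    4. Is the total length under 150 words?
    5. Does it target a specific pain point?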

How it works: For each iteration, generates batch-size outputs from the current file, scores each against all criteria, calculates overall pass % → agent proposes change → compare pass % → keep or revert.
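As a rough illustration of the pass-rate step, the sketch below scores a batch of outputs against a set of yes/no checks. Two assumptions are worth flagging: the "overall pass %" is taken here to be the share of individual (output, criterion) checks that pass, which the source does not spell out, and the criteria are simple deterministic functions standing in for whatever judge the real tool uses. The names `pass_rate` and `Criterion` are hypothetical.

```python
from typing import Callable, Dict, List

Criterion = Callable[[str], bool]

def pass_rate(outputs: List[str], criteria: Dict[str, Criterion]) -> float:
    """Percentage of (output, criterion) checks that pass across the batch."""
    checks = [check(text) for text in outputs for check in criteria.values()]
    return 100.0 * sum(checks) / len(checks) if checks else 0.0

if __name__ == "__main__":
    # Deterministic stand-ins for two yes/no criteria.
    criteria = {
        "has_role": lambda t: t.lower().startswith("you are"),
        "under_500_words": lambda t: len(t.split()) < 500,
    }
    batch = ["You are a support agent. Reply in JSON.", "Answer questions clearly."]
    print(f"{pass_rate(batch, criteria):.1f}")
```

An alternative reading would count an output as passing only when it satisfies every criterion; that version is stricter and moves in larger steps at small batch sizes.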

`--budget N`

Number of optimization iterations to run. Each iteration = one agent call + one eval.

| Budget | Time (approx) | Best for |
| --- | --- | --- |
| 5 | ~2 min | Quick demo, sanity check |
| 10 | ~5 min | Initial optimization pass |
| 20 | ~10 min | Production runs |
| 50+ | ~30 min | Overnight deep optimization |

Minimum effective budget: 5 iterations. Below 5, not enough signal.

`--goal minimize` / `--goal maximize` (default: maximize)

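    # Minimize (e.g. Brier score, error rate, latency):
    --goal minimize --metric "python score_brier.py"

    # Maximize (default; e.g. accuracy, pass rate, revenue):
    --metric "python score_accuracy.py"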
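For a sense of what a minimize-style metric might look like, here is a small sketch of a Brier score script. The Brier score itself is standard (the mean squared difference between forecast probabilities and 0/1 outcomes, lower is better); the sample forecasts and the function name `brier_score` are made up for illustration, and a real script would load resolved predictions from wherever you store them.

```python
from typing import Sequence

def brier_score(probabilities: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes) or not probabilities:
        raise ValueError("need equal-length, non-empty inputs")
    total = sum((p - o) ** 2 for p, o in zip(probabilities, outcomes))
    return total / len(probabilities)

if __name__ == "__main__":
    # Hypothetical forecasts and resolved outcomes.
    forecasts = [0.9, 0.2, 0.6, 0.5]
    resolved = [1, 0, 1, 0]
    # A single float on stdout, which is what a scalar metric command must print.
    print(brier_score(forecasts, resolved))
```

Since a perfect forecaster scores 0 and always answering 0.5 scores 0.25, this metric belongs with `--goal minimize`.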

`--session NAME`

Name your session for organized results. Results saved to ./skills/auto-optimizer/results/NAME/.

    --session outreach-v2-$(date +%Y%m%d)

`--batch-size N` (binary mode only)

How many outputs to generate per iteration for scoring. Higher = more reliable signal, slower.

- 5 = fast, less reliable
- 10 = balanced (default)
- 20 = slow, high confidence
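One way to see why larger batches give a steadier signal is the binomial standard error of a pass rate. The sketch below assumes each output is an independent pass/fail trial, which simplifies how binary mode actually scores; the function name is hypothetical.

```python
import math

def pass_rate_std_error(pass_rate: float, batch_size: int) -> float:
    """Binomial standard error of an observed pass rate given as a fraction in [0, 1]."""
    return math.sqrt(pass_rate * (1.0 - pass_rate) / batch_size)

if __name__ == "__main__":
    for n in (5, 10, 20):
        se = pass_rate_std_error(0.6, n)
        print(f"batch-size {n:>2}: 60% +/- {100 * se:.1f} points")
```

Under that assumption, a 60% pass rate carries an uncertainty of roughly 22 points at batch size 5, 15 points at 10, and 11 points at 20, so small improvements are hard to tell apart from noise at the low end.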

Hypothesis Memory

Every iteration is logged to results/SESSION/hypothesis_log.jsonl. The agent reads the last 5 entries before each iteration, so it can avoid retrying approaches that recently failed.

This is what makes multi-iteration runs productive rather than random. The optimizer builds on what worked and avoids what didn't.
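Reading the tail of a JSONL log is straightforward; the sketch below shows one way to do it. The log's schema is not documented here, so the field names `hypothesis` and `kept` are assumptions, as are the helper names `last_entries` and `failed_hypotheses`.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List

def last_entries(log_path: Path, n: int = 5) -> List[Dict]:
    """Return the last `n` JSON objects from a JSONL file (empty list if missing)."""
    if not log_path.exists():
        return []
    text = log_path.read_text(encoding="utf-8")
    lines = [ln for ln in text.splitlines() if ln.strip()]
    return [json.loads(ln) for ln in lines[-n:]]

def failed_hypotheses(entries: List[Dict]) -> List[str]:
    """Hypothesis texts whose change was reverted (field names are assumed)."""
    return [e["hypothesis"] for e in entries if not e.get("kept", False)]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "hypothesis_log.jsonl"
        rows = [
            {"iteration": i, "hypothesis": f"idea {i}", "kept": i % 2 == 0}
            for i in range(1, 8)
        ]
        log.write_text("\n".join(json.dumps(r) for r in rows) + "\n", encoding="utf-8")
        print(failed_hypotheses(last_entries(log)))
```

With a window of 5, anything older than five iterations falls out of view, so on long runs an early failed idea could in principle be proposed again.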

OpenClaw Integration

The natural way (just tell your agent)

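    Run auto-optimizer on ./outreach.md, optimize for reply rate, 20 iterations

    Optimize my system prompt at ./prompts/classifier.md using binary eval mode

    Start an overnight optimization loop on my prediction strategy, minimize Brier score, budget 50

    Set up auto-optimizer for my cold outreach template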

Your OpenClaw agent reads this SKILL.md, picks the right mode, sets up the files, and runs the loop.

Direct invocation

CODEBLOCK17

Real Results (What to Expect)

From actual runs:

Prediction market strategy — 5 iterations:

- Brier score: 0.23642 → 0.23563 (↓ better)
- ROI: +2.83% → +6.25%
- What changed: added base rate anchoring, tightened confidence thresholds

Cold outreach template — 10 iterations:

- Binary pass rate: 60% → 85%
- Version: v1.0 → v1.1
- What changed: shorter hook (22 words → 11 words), single clear CTA, ICP-specific pain point added

System prompt — 20 iterations (binary):

- Pass rate: 40% → 92%
- What changed: added explicit persona, output format section, negative constraints, inline example

Typical pattern:

- Iterations 1-3: big gains (low-hanging fruit)
- Iterations 4-10: diminishing returns, more targeted changes
- Iterations 10+: fine-tuning, marginal improvements

Troubleshooting

"Not a git repo"
CODEBLOCK18

"Metric command failed" or returns 0 always
Your metric command must print a single float to stdout. Test it standalone:

    bash ./your-metric.sh
    # Should output: 73.5

If it outputs anything else (multiline, text, nothing), wrap it:
CODEBLOCK20
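If the underlying command is noisy, one option is a small Python wrapper that runs it and prints only the last number it finds. This is a hedged sketch rather than the wrapper the skill ships: the script name `wrap_metric.py` in the comment, the helper `last_number`, and the "take the last numeric token" rule are all assumptions.

```python
import re
import subprocess
import sys
from typing import List, Optional

# Matches integers, decimals, and scientific notation such as 1e-3.
NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")

def last_number(text: str) -> Optional[float]:
    """Return the last numeric token in `text`, or None if there is none."""
    matches = NUMBER.findall(text)
    return float(matches[-1]) if matches else None

def main(argv: List[str]) -> int:
    # Usage (hypothetical): python wrap_metric.py bash ./your-metric.sh
    result = subprocess.run(argv, capture_output=True, text=True)
    value = last_number(result.stdout)
    if value is None:
        print("no number found in metric output", file=sys.stderr)
        return 1
    print(value)  # exactly one float on stdout
    return 0

if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

The regex is a heuristic: version strings or hyphenated tokens near the end of the output can be misread as the score, so it is safer to have the metric print its number on the final line by itself.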

"claude CLI not found"
Option A: Install claude CLI globally: npm install -g @anthropic-ai/claude-code
Option B: The script falls back to OpenClaw's claude-code skill automatically if skills/claude-code/claude-code.sh exists.

"ERROR: --evals is required for binary eval mode"
Binary mode needs an evals file with numbered criteria:
CODEBLOCK21

"Budget too low"
Minimum 5 iterations to see meaningful improvement. Use --budget 10 for first runs.

Results not improving after many iterations

- Check hypothesis log: INLINECODE20
- The agent may be stuck in a local optimum; try a new session with INLINECODE21
- Consider rewriting the evals/metric to be more discriminating

"program.md.template not found"

    ls skills/auto-optimizer/
    # Should show: auto-optimizer.sh, SKILL.md, program.md.template, results/

If missing, reinstall: clawhub install auto-optimizer

MiroFish Integration

MiroFish is a swarm intelligence engine that runs thousands of AI agents to simulate outcomes and generate prediction reports. Combined with auto-optimizer, you can autonomously improve the "seed" inputs that drive MiroFish simulations.

What MiroFish does

- Takes a "seed" (market data, news, signals) as input
- Runs multi-agent social simulation in a digital world (Twitter + Reddit environments)
- Simulates distinct personas: RetailTrader, WhaleInvestor, AlgorithmicTrader, etc.
- Outputs agent actions, discussions, and quantitative probability estimates

The combination

- Mutable asset: the seed file / prediction prompt sent to MiroFish
- Scalar metric: confidence score OR prediction accuracy on known outcomes
- Loop: auto-optimizer iterates on the seed to maximize simulation confidence

Proven optimization results (2026-03-26 run)

| Iteration | Seed Content | Confidence Score | Delta |
| --- | --- | --- | --- |
| Baseline | Price + Fear/Greed only | 0.35 | n/a |
| Iter 1 | + Technical signals (TTM Squeeze, funding rates) | 0.58 | +0.23 ✅ |

Key insight: Adding technical structure data (funding rates, squeeze, key levels) produces the biggest confidence boost. Whale on-chain context is the #2 improvement driver.

Setup

MiroFish must be running (backend on port 5001): CODEBLOCK23

Build a seed from live market data:
CODEBLOCK24

Run a simulation manually via the API:
CODEBLOCK25

Scoring function for auto-optimizer

Extract confidence from MiroFish agent actions: CODEBLOCK26

Use cases

- Crypto market predictions: BTC/ETH/SOL price direction (24-72h)
- Prediction market research: Polymarket/Kalshi question research
- Seed quality optimization: find which data signals drive the highest swarm confidence
- Any "what if" scenario you want to simulate at scale with diverse AI personas

Files

- Seed builder: INLINECODE23
- Results example: INLINECODE24
- MiroFish API: http://localhost:5001 (Flask backend, port 5001)
