hle-benchmark-evolver (HLE Benchmark Evolver)

Runs HLE-oriented benchmark reward ingestion and curriculum generation for capability-evolver. Use when the user asks to optimize Humanity's Last Exam score, ingest question-level benchmark results, prioritize easy-first queues, or request an immediate benchmark progress result.

Author: admin | Source: ClawHub

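HLE Benchmark Evolver

This skill operationalizes HLE score-driven evolution for OpenClaw: it turns Humanity's Last Exam results into reward and curriculum signals for capability-evolver.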

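When to Use

- The user asks to improve the HLE score (for example, a target of >= 60%).
- The user provides question-level benchmark output and wants it converted to reward.
- The user wants an easy-first curriculum queue and the next-focus questions.
- The user asks for an immediate benchmark result snapshot.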

Inputs

- Benchmark report JSON path (--report=/abs/path/report.json)
- Optional benchmark ID (defaults to cais/hle)

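Workflow

1. Validate that the report JSON exists and is parseable.
2. Ingest the report into the capability-evolver benchmark reward state.
3. Generate curriculum signals:
   - benchmark_*
   - curriculum_stage:*
   - focus_subject:*
   - focus_modality:*
   - question_focus:*
4. Return a compact result summary for this run.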
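
Steps 1 and 3 of the workflow can be sketched in Node.js as follows. This is a minimal illustration under stated assumptions, not the skill's actual implementation: the report shape (a `questions` array whose entries carry `id`, `subject`, `modality`, `correct`, and `difficulty`), the helper names `loadReport` and `buildSignals`, the stage names, and the stage threshold are all hypothetical. Only the signal prefixes come from the list above; `benchmark_*` signals are left out because their form is not specified, and step 2 (ingestion into capability-evolver state) is not shown.

```javascript
const fs = require("node:fs");

// Step 1: confirm the report exists and parses as JSON.
// Hypothetical helper; the real skill's validation may differ.
function loadReport(reportPath) {
  if (!fs.existsSync(reportPath)) {
    throw new Error(`Report not found: ${reportPath}`);
  }
  let report;
  try {
    report = JSON.parse(fs.readFileSync(reportPath, "utf8"));
  } catch (err) {
    throw new Error(`Report is not valid JSON: ${err.message}`);
  }
  // Assumed shape: { questions: [{ id, subject, modality, correct, difficulty }] }
  if (!Array.isArray(report.questions)) {
    throw new Error("Report has no 'questions' array (assumed schema)");
  }
  return report;
}

// Step 3: derive curriculum signals from question-level results.
function buildSignals(report) {
  const questions = report.questions;
  const missed = questions.filter((q) => !q.correct);
  const accuracy = questions.length
    ? (questions.length - missed.length) / questions.length
    : 0;

  // Easy-first: queue the lowest-difficulty misses before harder ones.
  const queue = [...missed].sort((a, b) => a.difficulty - b.difficulty);

  // Illustrative stage rule; the 0.6 cut-off only mirrors the example target.
  const stage = accuracy >= 0.6 ? "hard" : "easy";

  // A Set removes repeated subject/modality signals while keeping order.
  const signals = new Set([`curriculum_stage:${stage}`]);
  for (const q of queue) {
    signals.add(`focus_subject:${q.subject}`);
    signals.add(`focus_modality:${q.modality}`);
    signals.add(`question_focus:${q.id}`);
  }
  return { accuracy, queue, signals: [...signals] };
}
```

Sorting only the missed questions keeps the queue focused on what can still raise the score, and ordering by ascending difficulty is one simple reading of "easy-first".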

Run

```bash
node skills/hle-benchmark-evolver/run_result.js --report=/absolute/path/hle_report.json
```

Full automatic loop (starts an evolution cycle):

```bash
node skills/hle-benchmark-evolver/run_pipeline.js --report=/absolute/path/hle_report.json --cycles=1
```

If your evaluator can be called from the shell, let the pipeline generate the report on each cycle:

```bash
node skills/hle-benchmark-evolver/run_pipeline.js \
  --report=/absolute/path/hle_report.json \
  --eval_cmd="python /path/to/eval_hle.py --out {{report}}" \
  --cycles=3 --interval_ms=2000
```
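
As a rough illustration of how a per-cycle evaluator call could work, the sketch below substitutes the report path into a command template containing the {{report}} placeholder, then repeats a step a fixed number of times with a pause in between. The names `renderEvalCommand` and `runCycles` and the injected `runStep` callback are hypothetical; the real pipeline's internals are not shown in this document.

```javascript
// Replace every {{report}} placeholder with the concrete report path.
// split/join is used so that "$" characters in the path are kept literally.
function renderEvalCommand(template, reportPath) {
  return template.split("{{report}}").join(reportPath);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run `runStep(command, cycleIndex)` once per cycle, pausing between cycles.
// `runStep` is injected so the loop can be tried without spawning a process;
// a real caller might pass a wrapper around child_process.execSync instead.
async function runCycles({ evalCmd, report, cycles, intervalMs }, runStep) {
  const command = renderEvalCommand(evalCmd, report);
  const results = [];
  for (let i = 0; i < cycles; i++) {
    results.push(await runStep(command, i));
    if (i < cycles - 1) await sleep(intervalMs); // no pause after the last cycle
  }
  return results;
}
```

Passing the step as a callback keeps the timing logic separate from whatever ingestion and evolution work each cycle performs.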

If --report is not provided, it defaults to:

skills/capability-evolver/assets/gep/hle_report.template.json
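
A minimal sketch of how the report path and benchmark ID could be resolved with their documented defaults. The `parseArgs` helper and the `--benchmark` flag name are assumptions made for illustration (the document says only that the benchmark ID is optional and defaults to cais/hle); the `--report=` form and the default template path are taken from the text.

```javascript
const DEFAULT_REPORT =
  "skills/capability-evolver/assets/gep/hle_report.template.json";
const DEFAULT_BENCHMARK_ID = "cais/hle";

// Parse `--key=value` tokens into an options object, applying defaults.
// NOTE: `--benchmark` is a hypothetical flag name used only in this sketch.
function parseArgs(argv) {
  const opts = { report: DEFAULT_REPORT, benchmark: DEFAULT_BENCHMARK_ID };
  for (const token of argv) {
    // The key stops at the first "=", so values may themselves contain "=".
    const match = /^--([^=]+)=(.*)$/.exec(token);
    if (match) opts[match[1]] = match[2];
  }
  return opts;
}
```

A script would typically call `parseArgs(process.argv.slice(2))` so that the Node binary and script path are skipped.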

Output Contract

Always print JSON with these fields:

- benchmark_id
- run_id
- accuracy
- reward
- trend
- curriculum_stage
- queue_size
- focus_subjects
- focus_modalities
- next_questions
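
To make the contract concrete, the sketch below checks that a result object carries every listed field. The field names use snake_case spelling (benchmark_id, run_id, and so on), which this sketch assumes matches the skill's actual output; the `missingFields` helper is hypothetical and the sample values are made up. Value types are not checked because the document does not specify them.

```javascript
// Fields named in the output contract (snake_case spelling assumed).
const REQUIRED_FIELDS = [
  "benchmark_id", "run_id", "accuracy", "reward", "trend",
  "curriculum_stage", "queue_size", "focus_subjects",
  "focus_modalities", "next_questions",
];

// Return the contract fields absent from `result` (empty array means OK).
function missingFields(result) {
  return REQUIRED_FIELDS.filter(
    (field) => !Object.prototype.hasOwnProperty.call(result, field)
  );
}

// Example: parse a printed summary and check it before consuming it.
// The JSON string below is illustrative, not real skill output.
const printed = '{"benchmark_id":"cais/hle","accuracy":0.25}';
console.log(missingFields(JSON.parse(printed)));
// Logs the eight missing field names, starting with "run_id".
```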

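Notes

- This skill handles reward/curriculum ingestion. It does not directly solve HLE questions.
- run_pipeline.js links ingestion, evolve, and solidify into one executable loop.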
