HLE Benchmark Evolver
This skill operationalizes HLE score-driven evolution for OpenClaw.
When to Use
- User asks to improve the HLE score (for example, a target of >= 60%).
- User provides question-level benchmark output and wants it converted to a reward.
- User wants an easy-first curriculum queue and the next questions to focus on.
- User asks for an immediate benchmark result snapshot.
Inputs
- Benchmark report JSON path (`--report=/abs/path/report.json`)
- Optional benchmark id (defaults to `cais/hle`)
Workflow
1. Validate that the report JSON exists and is parseable.
2. Ingest the report into the `capability-evolver` benchmark reward state.
3. Generate curriculum signals:
   - `benchmark_*`
   - `curriculum_stage:*`
   - `focus_subject:*`
   - `focus_modality:*`
   - `question_focus:*`
4. Return a compact result summary for this run.
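Step 1 of the workflow can be sketched as below. This is a minimal illustration, not the skill's actual code: the helper names `loadReport` and `accuracyOf` are hypothetical, and the report shape (`questions` entries with a boolean `correct` flag) is an assumption, since the real report schema is not described here.

```javascript
const fs = require("node:fs");

// Confirm the report file exists and holds valid JSON.
// Returns { ok: true, report } or { ok: false, error } instead of throwing.
function loadReport(reportPath) {
  if (!fs.existsSync(reportPath)) {
    return { ok: false, error: `report not found: ${reportPath}` };
  }
  try {
    const report = JSON.parse(fs.readFileSync(reportPath, "utf8"));
    return { ok: true, report };
  } catch (err) {
    return { ok: false, error: `report is not valid JSON: ${err.message}` };
  }
}

// ASSUMED report shape: { questions: [{ id, correct }] }.
// Accuracy is the fraction of questions marked correct (0 for an empty report).
function accuracyOf(report) {
  const questions = Array.isArray(report.questions) ? report.questions : [];
  if (questions.length === 0) return 0;
  const correct = questions.filter((q) => q.correct === true).length;
  return correct / questions.length;
}
```

Returning a result object rather than throwing lets a caller report a missing or malformed file in the run summary without aborting the whole cycle.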
Run
```bash
node skills/hle-benchmark-evolver/run_result.js --report=/absolute/path/hle_report.json
```
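Full automatic loop (starts an evolution cycle):
```bash
node skills/hle-benchmark-evolver/run_pipeline.js --report=/absolute/path/hle_report.json --cycles=1
```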
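If your evaluator can be called from the shell, let the pipeline generate the report each cycle:
```bash
node skills/hle-benchmark-evolver/run_pipeline.js \
  --report=/absolute/path/hle_report.json \
  --eval_cmd="python /path/to/eval_hle.py --out {{report}}" \
  --cycles=3 --interval_ms=2000
```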
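The evaluator command carries a `{{report}}` placeholder that stands for the report path. One plausible way to expand it is sketched below; `expandEvalCommand` is a hypothetical helper, and the pipeline script itself may handle the substitution differently.

```javascript
// Replace every {{report}} placeholder in the evaluator command template
// with the report path. Hypothetical helper, shown for illustration only.
function expandEvalCommand(template, reportPath) {
  return template.split("{{report}}").join(reportPath);
}

const cmd = expandEvalCommand(
  "python /path/to/eval_hle.py --out {{report}}",
  "/absolute/path/hle_report.json"
);
console.log(cmd);
// prints: python /path/to/eval_hle.py --out /absolute/path/hle_report.json
```

Note that this sketch does no shell quoting, so a report path containing spaces would need to be quoted before the command is executed.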
If `--report` is not provided, it defaults to:
`skills/capability-evolver/assets/gep/hle_report.template.json`
Output Contract
Always print JSON with these fields:
- `benchmark_id`
- `run_id`
- `accuracy`
- `reward`
- `trend`
- `curriculum_stage`
- `queue_size`
- `focus_subjects`
- `focus_modalities`
- `next_questions`
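A caller that consumes the printed JSON may want to confirm that the contract holds before using it. The sketch below assumes the ten field names are snake_case (`benchmark_id`, `run_id`, and so on); `missingFields` is a hypothetical helper, and the sample object holds illustrative values, not real benchmark output.

```javascript
// Field names assumed from the output contract (snake_case is an assumption).
const REQUIRED_FIELDS = [
  "benchmark_id",
  "run_id",
  "accuracy",
  "reward",
  "trend",
  "curriculum_stage",
  "queue_size",
  "focus_subjects",
  "focus_modalities",
  "next_questions",
];

// Return the names of contract fields that are absent from a result object.
function missingFields(result) {
  return REQUIRED_FIELDS.filter(
    (field) => !Object.prototype.hasOwnProperty.call(result, field)
  );
}

// Illustrative, deliberately incomplete result: only two fields are present.
const partial = { benchmark_id: "cais/hle", accuracy: 0 };
console.log(missingFields(partial));
```

An empty array means every field is present; the check looks only at presence, not at value types.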
Notes
- This skill handles reward/curriculum ingestion. It does not directly solve HLE questions.
- `run_pipeline.js` links ingestion, evolve, and solidify into one executable loop.
HLE Benchmark Evolver
This skill implements HLE score-driven evolution for OpenClaw.
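When to Use
- User asks to improve the HLE score (for example, a target of >= 60%).
- User provides question-level benchmark output and wants it converted to a reward.
- User wants an easy-first curriculum queue and the next questions to focus on.
- User asks for an immediate benchmark result snapshot.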
Inputs
- Benchmark report JSON path (`--report=/abs/path/report.json`)
- Optional benchmark id (defaults to `cais/hle`)
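Workflow
1. Validate that the report JSON exists and is parseable.
2. Ingest the report into the `capability-evolver` benchmark reward state.
3. Generate curriculum signals:
   - `benchmark_*`
   - `curriculum_stage:*`
   - `focus_subject:*`
   - `focus_modality:*`
   - `question_focus:*`
4. Return a compact result summary for this run.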
Run
```bash
node skills/hle-benchmark-evolver/run_result.js --report=/absolute/path/hle_report.json
```
Full automatic loop (starts an evolution cycle):
```bash
node skills/hle-benchmark-evolver/run_pipeline.js --report=/absolute/path/hle_report.json --cycles=1
```
If your evaluator can be called from the shell, let the pipeline generate the report each cycle:
```bash
node skills/hle-benchmark-evolver/run_pipeline.js \
  --report=/absolute/path/hle_report.json \
  --eval_cmd="python /path/to/eval_hle.py --out {{report}}" \
  --cycles=3 --interval_ms=2000
```
If `--report` is not provided, it defaults to:
`skills/capability-evolver/assets/gep/hle_report.template.json`
Output Contract
Always print JSON with these fields:
- `benchmark_id`
- `run_id`
- `accuracy`
- `reward`
- `trend`
- `curriculum_stage`
- `queue_size`
- `focus_subjects`
- `focus_modalities`
- `next_questions`
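Notes
- This skill handles reward/curriculum ingestion. It does not directly solve HLE questions.
- `run_pipeline.js` links ingestion, evolve, and solidify into one executable loop.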