Skill: langfuse-trace-logger
Purpose: Log subagent task completions as Langfuse traces for replay, evaluation, and cost analysis.
Scope: Called by Loki at the end of every session wrap (Phase 4) for each significant subagent completion.
Script: /Users/loki/.openclaw/workspace/scripts/langfuse-trace-logger.py
⚠️ CRITICAL: Python Version
Always use ~/.chatterbox-venv/bin/python3 (Python 3.11.15)
The langfuse SDK uses pydantic v1, which is incompatible with Python 3.14. Running with system Python (python3) or pyenv Python (3.14.x) causes silent failure — no import error, no exception, trace just doesn't appear in Langfuse UI. This will waste 30+ minutes of debugging.
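```bash
# ✅ Correct
~/.chatterbox-venv/bin/python3 scripts/langfuse-trace-logger.py ...

# ❌ Wrong: fails silently on Python 3.14
python3 scripts/langfuse-trace-logger.py ...
/Users/loki/.pyenv/versions/3.14.3/bin/python3 scripts/langfuse-trace-logger.py ...
```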
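Because this failure is silent, one option is to have a script refuse to start on an interpreter known to be bad. The guard below is a minimal sketch, not part of the existing logger: the name `ensure_supported_python` and the cutoff constant are hypothetical, and the cutoff encodes only what this document states (3.11 works, 3.14 does not).

```python
import sys

# Assumption: per the warning above, 3.14 and later are treated as unsupported.
# Versions between 3.11 and 3.14 are not covered by this document, so this
# sketch does not reject them.
FIRST_KNOWN_BAD = (3, 14)


def ensure_supported_python(version=None):
    """Raise RuntimeError if the interpreter is known to fail silently."""
    major_minor = tuple((version or sys.version_info)[:2])
    if major_minor >= FIRST_KNOWN_BAD:
        raise RuntimeError(
            "Python %d.%d is not supported by this logger; "
            "use ~/.chatterbox-venv/bin/python3 instead" % major_minor
        )
    return major_minor
```

Calling `ensure_supported_python()` before importing langfuse would turn the silent failure into an explicit error message.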
Basic Invocation
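```bash
~/.chatterbox-venv/bin/python3 /Users/loki/.openclaw/workspace/scripts/langfuse-trace-logger.py \
  --session-id "$SESSION_ID" \
  --parent-id "agent:main" \
  --agent "kit" \
  --task "task-label-kebab-case" \
  --model "anthropic/claude-sonnet-4-6" \
  --status "completed" \
  --input "full task prompt given to agent (first 4000 chars)..." \
  --output "what the agent returned or accomplished..." \
  --duration 278 \
  --tokens 16900 \
  --project "reddi-agent-protocol" \
  --skills "product-tour-capture"
```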
Trace Schema
| Field | Type | Purpose | Notes |
|---|---|---|---|
| `--session-id` | string | Subagent session key | Use the actual subagent session key; this enables lineage tracing |
| `--parent-id` | string | Parent session reference | Always `"agent:main"` unless this is a nested subagent |
| `--agent` | string | Agent name | Lowercase: kit, archie, sara, finn, quill, etc. |
| `--task` | string | Task label (kebab-case) | Used for replay grouping: `replay-judge.py --tag "task:kit-setup-rebuild"` |
| `--model` | string | Model used | e.g. `anthropic/claude-sonnet-4-6`, `anthropic/claude-haiku-4-5` |
| `--status` | string | Outcome | `completed` / `partial` / `failed` |
| `--input` | string | Full task prompt | First 4000 chars; this is what gets replayed against other models in judge runs |
| `--output` | string | Result summary | The agent's output/result; this is what the judge scores |
| `--duration` | int | Time in seconds | Used for efficiency analysis and agent routing decisions |
| `--tokens` | int | Total tokens used | Used for cost analysis and budget governance |
| `--project` | string | Project slug | Must match `projects/<slug>/STATUS.md`; enables project-level filtering |
| `--skills` | string | Comma-separated skills | e.g. `"product-tour-capture,ffmpeg-studio"`; enables skill effectiveness filtering |
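When the logger is driven from another Python script rather than typed by hand, the schema above maps naturally onto an argument list. The following is a sketch under stated assumptions: `build_logger_argv` is a hypothetical helper, it treats every field except `parent-id` as required, and it truncates `--input` on the caller's side (the document does not say whether the script also truncates).

```python
import os

MAX_INPUT_CHARS = 4000  # from the --input row: only the first 4000 chars are kept

PYTHON = os.path.expanduser("~/.chatterbox-venv/bin/python3")
SCRIPT = "/Users/loki/.openclaw/workspace/scripts/langfuse-trace-logger.py"

# Flag order mirrors the schema table.
FLAGS = ("session-id", "parent-id", "agent", "task", "model", "status",
         "input", "output", "duration", "tokens", "project", "skills")


def build_logger_argv(fields):
    """Turn a dict keyed by flag name into an argv list for subprocess.run."""
    # The table says the parent is "agent:main" unless the subagent is nested.
    fields = {"parent-id": "agent:main", **fields}
    missing = [name for name in FLAGS if name not in fields]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    argv = [PYTHON, SCRIPT]
    for name in FLAGS:
        value = str(fields[name])
        if name == "input":
            value = value[:MAX_INPUT_CHARS]  # assumption: the caller truncates
        argv += ["--" + name, value]
    return argv
```

Passing the result to `subprocess.run(argv, check=True)` avoids shell quoting problems with long prompts, since each value travels as a single argument.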
Tag Taxonomy
The logger automatically generates these tags from the fields above:
- `agent:kit`: from `--agent`
- `model_family:claude-sonnet`: derived from `--model`
- `project:reddi-agent-protocol`: from `--project`
- `skill:product-tour-capture`: one tag per skill in `--skills`
- `task:kit-setup-rebuild`: from `--task`
- `status:completed`: from `--status`
These tags power the replay-judge filter syntax.
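The mapping from fields to tags can be sketched as follows. This is an illustration, not the logger's actual code: `build_tags` and `model_family` are hypothetical names, and the rule for deriving the model family (drop the provider prefix, then drop trailing version numbers) is an assumption inferred from the single example `anthropic/claude-sonnet-4-6` to `claude-sonnet`.

```python
import re


def model_family(model):
    """Map 'anthropic/claude-sonnet-4-6' to 'claude-sonnet' (assumed rule)."""
    name = model.split("/")[-1]           # drop the provider prefix
    return re.sub(r"(-\d+)+$", "", name)  # drop trailing version numbers


def build_tags(agent, model, project, skills, task, status):
    """Return the tag list in the same order as the taxonomy above."""
    tags = [
        "agent:" + agent,
        "model_family:" + model_family(model),
        "project:" + project,
    ]
    # One tag per entry in the comma-separated --skills value.
    tags += ["skill:" + s.strip() for s in skills.split(",") if s.strip()]
    tags += ["task:" + task, "status:" + status]
    return tags
```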
Backfill Pattern
For retroactive logging when a session wrap was skipped or traces are missing.
Idempotent: Uses deterministic trace IDs based on date+agent+task hash. Safe to re-run — won't create duplicates.
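```bash
# Preview first (dry run)
~/.chatterbox-venv/bin/python3 scripts/langfuse-backfill-historical.py \
  --from-date 2026-03-24 \
  --to-date 2026-03-24 \
  --dry-run

# Then run for real
~/.chatterbox-venv/bin/python3 scripts/langfuse-backfill-historical.py \
  --from-date 2026-03-24 \
  --to-date 2026-03-24
```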
Data source: Backfill parses memory/YYYY-MM-DD.md files and extracts structured task outcome blocks. This is why the task outcome block format in memory files must be consistent — inconsistent format breaks parsing silently.
Backfill ID format: backfill-YYYY-MM-DD-<agent>-<task-slug> — deterministic, no duplicate risk.
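To illustrate why re-running is safe, here is a minimal sketch of a deterministic ID builder that follows the documented format. The names `slugify` and `backfill_trace_id` are hypothetical, the slug rule is an assumption, and the sketch builds the ID directly from its parts rather than hashing them, so it may differ from what the backfill script does internally.

```python
import re
from datetime import date


def slugify(text):
    """Lowercase and collapse anything non-alphanumeric into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def backfill_trace_id(day, agent, task):
    """Same inputs always give the same ID, so a second run reuses the first ID."""
    return "backfill-%s-%s-%s" % (day.isoformat(), slugify(agent), slugify(task))
```

Because the ID depends only on the date, agent, and task, logging the same outcome twice targets the same trace instead of creating a new one.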
Replay and Judge
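```bash
# Report on all Kit traces (last 30 days)
~/.chatterbox-venv/bin/python3 scripts/replay-judge.py \
  --tag "agent:kit" --report

# Compare all Kit traces against Haiku (cost reduction analysis)
~/.chatterbox-venv/bin/python3 scripts/replay-judge.py \
  --tag "agent:kit" --models claude-haiku-4-5 --judge claude-haiku-4-5 --report

# Judge a specific trace
~/.chatterbox-venv/bin/python3 scripts/replay-judge.py \
  --trace-id backfill-2026-03-24-kit-setup-rebuild \
  --models claude-haiku-4-5 --judge claude-haiku-4-5

# Filter by project
~/.chatterbox-venv/bin/python3 scripts/replay-judge.py \
  --tag "project:reddi-agent-protocol" --report

# Filter by skill
~/.chatterbox-venv/bin/python3 scripts/replay-judge.py \
  --tag "skill:product-tour-capture" --report
```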
Verify Traces Appeared
After logging, verify in Langfuse UI: http://localhost:3100
Or check programmatically:
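```bash
~/.chatterbox-venv/bin/python3 -c "
import subprocess
# Read the secret key from 1Password
sk = subprocess.run(
    ['op', 'read', 'op://OpenClaw/Langfuse (Local)/credential'],
    capture_output=True, text=True
).stdout.strip()
from langfuse import Langfuse
lf = Langfuse(public_key='pk-lf-openclaw-local', secret_key=sk, host='http://localhost:3100')
traces = lf.client.trace.list(limit=5)
[print(t.name, t.id[:12]) for t in traces.data]
"
```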
Expected output: last 5 trace names + truncated IDs. If blank, Python version issue (see warning above).
Langfuse Connection Details
| Setting | Value |
|---|---|
| UI | http://localhost:3100 |
| Public key | `pk-lf-openclaw-local` |
| Secret key | `op://OpenClaw/Langfuse (Local)/credential` (1Password) |
| Also in 1Password | `op://OpenClaw/Langfuse (Local)/Secret Key` |
| Docker | Always running (daemon service) |
When to Call This Skill
This skill is called during Phase 4 (Traces) of the session-wrap playbook (playbooks/session-wrap/PLAYBOOK.md).
Call once per significant subagent completion. Use data from the task outcome blocks written in Phase 1 (memory file). Don't reconstruct from memory — read what you just wrote.
Minimum threshold for logging: Any subagent run that produced a deliverable (file written, API called, analysis produced). Skip: simple lookups, 1-line tool calls, failed attempts with no output.
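The threshold above can be expressed as a small predicate. This is only a sketch: the `SubagentRun` record and its fields are hypothetical stand-ins for whatever the task outcome blocks actually contain, and deciding whether a run "produced a deliverable" is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class SubagentRun:
    status: str                 # completed / partial / failed
    output: str                 # result summary, possibly empty
    produced_deliverable: bool  # file written, API called, or analysis produced


def should_log(run):
    """Apply the minimum logging threshold to one subagent run."""
    # Failed attempts with no output are skipped outright.
    if run.status == "failed" and not run.output.strip():
        return False
    # Simple lookups and 1-line tool calls produce no deliverable, so they
    # fall through to False here as well.
    return run.produced_deliverable
```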
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| Trace doesn't appear in UI | Wrong Python version | Use `~/.chatterbox-venv/bin/python3` |
| No output, no error | Same: Python 3.14 pydantic v1 incompatibility | Same fix |
| `ImportError: langfuse not found` | Wrong venv | Same fix |
| Duplicate traces on backfill | Shouldn't happen, since backfill is idempotent | Check whether the logger and backfill were both run for the same trace |
| `op: command not found` | 1Password CLI not in PATH | Run from a shell with `OP_SERVICE_ACCOUNT_TOKEN` set, or source `~/.zshrc` first |
| Langfuse UI empty after logging | Docker daemon down | Run `docker ps`; restart the Langfuse container if needed |
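Several of the causes in this table can be checked before logging anything. The preflight below is a hedged sketch using only the standard library: `preflight` is a hypothetical helper, and it only detects missing executables and a missing venv interpreter, not a stopped Langfuse container.

```python
import os
import shutil

VENV_PYTHON = os.path.expanduser("~/.chatterbox-venv/bin/python3")


def preflight(which=shutil.which, exists=os.path.exists):
    """Return a list of readable problems; an empty list means nothing obvious is wrong."""
    problems = []
    if not exists(VENV_PYTHON):
        problems.append("venv interpreter missing: " + VENV_PYTHON)
    if which("op") is None:
        problems.append("1Password CLI (op) not in PATH")
    if which("docker") is None:
        problems.append("docker not in PATH; cannot check the Langfuse container")
    return problems
```

The `which` and `exists` parameters exist only so the checks can be exercised without touching the real system.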
Skill: langfuse-trace-logger
Purpose: Log subagent task completions as Langfuse traces for replay, evaluation, and cost analysis.
Scope: Called by Loki at the end of every session wrap (Phase 4) for each significant subagent completion.
Script: /Users/loki/.openclaw/workspace/scripts/langfuse-trace-logger.py
⚠️ CRITICAL: Python Version
Always use ~/.chatterbox-venv/bin/python3 (Python 3.11.15)
The langfuse SDK uses pydantic v1, which is incompatible with Python 3.14. Running with system Python (python3) or pyenv Python (3.14.x) fails silently: no import error, no exception, and the trace simply never appears in the Langfuse UI. This will waste 30+ minutes of debugging.
```bash
# ✅ Correct
~/.chatterbox-venv/bin/python3 scripts/langfuse-trace-logger.py ...

# ❌ Wrong: fails silently on Python 3.14
python3 scripts/langfuse-trace-logger.py ...
/Users/loki/.pyenv/versions/3.14.3/bin/python3 scripts/langfuse-trace-logger.py ...
```
Basic Invocation
```bash
~/.chatterbox-venv/bin/python3 /Users/loki/.openclaw/workspace/scripts/langfuse-trace-logger.py \
  --session-id "$SESSION_ID" \
  --parent-id "agent:main" \
  --agent "kit" \
  --task "task-label-kebab-case" \
  --model "anthropic/claude-sonnet-4-6" \
  --status "completed" \
  --input "full task prompt given to agent (first 4000 chars)..." \
  --output "what the agent returned or accomplished..." \
  --duration 278 \
  --tokens 16900 \
  --project "reddi-agent-protocol" \
  --skills "product-tour-capture"
```
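Trace Schema
| Field | Type | Purpose | Notes |
|---|---|---|---|
| `--session-id` | string | Subagent session key | Use the actual subagent session key; this enables lineage tracing |
| `--parent-id` | string | Parent session reference | Always `agent:main` unless this is a nested subagent |
| `--agent` | string | Agent name | Lowercase: kit, archie, sara, finn, quill, etc. |
| `--task` | string | Task label (kebab-case) | Used for replay grouping: `replay-judge.py --tag task:kit-setup-rebuild` |
| `--model` | string | Model used | e.g. `anthropic/claude-sonnet-4-6`, `anthropic/claude-haiku-4-5` |
| `--status` | string | Outcome | `completed` / `partial` / `failed` |
| `--input` | string | Full task prompt | First 4000 chars; this is what gets replayed against other models in judge runs |
| `--output` | string | Result summary | The agent's output/result; this is what the judge scores |
| `--duration` | int | Time in seconds | Used for efficiency analysis and agent routing decisions |
| `--tokens` | int | Total tokens used | Used for cost analysis and budget governance |
| `--project` | string | Project slug | Must match `projects/<slug>/STATUS.md`; enables project-level filtering |
| `--skills` | string | Comma-separated skills | e.g. `product-tour-capture,ffmpeg-studio`; enables skill effectiveness filtering |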
Tag Taxonomy
The logger automatically generates these tags from the fields above:
- `agent:kit`: from `--agent`
- `model_family:claude-sonnet`: derived from `--model`
- `project:reddi-agent-protocol`: from `--project`
- `skill:product-tour-capture`: one tag per skill in `--skills`
- `task:kit-setup-rebuild`: from `--task`
- `status:completed`: from `--status`
These tags power the replay-judge filter syntax.
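Backfill Pattern
For retroactive logging when a session wrap was skipped or traces are missing.
Idempotent: Uses deterministic trace IDs based on a date+agent+task hash. Safe to re-run; it won't create duplicates.
```bash
# Preview first (dry run)
~/.chatterbox-venv/bin/python3 scripts/langfuse-backfill-historical.py \
  --from-date 2026-03-24 \
  --to-date 2026-03-24 \
  --dry-run

# Then run for real
~/.chatterbox-venv/bin/python3 scripts/langfuse-backfill-historical.py \
  --from-date 2026-03-24 \
  --to-date 2026-03-24
```
Data source: Backfill parses memory/YYYY-MM-DD.md files and extracts structured task outcome blocks. This is why the task outcome block format in memory files must be consistent: an inconsistent format breaks parsing silently.
Backfill ID format: backfill-YYYY-MM-DD-<agent>-<task-slug> (deterministic, no duplicate risk).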
Replay and Judge
```bash
# Report on all Kit traces (last 30 days)
~/.chatterbox-venv/bin/python3 scripts/replay-judge.py \
  --tag "agent:kit" --report

# Compare all Kit traces against Haiku (cost reduction analysis)
~/.chatterbox-venv/bin/python3 scripts/replay-judge.py \
  --tag "agent:kit" --models claude-haiku-4-5 --judge claude-haiku-4-5 --report

# Judge a specific trace
~/.chatterbox-venv/bin/python3 scripts/replay-judge.py \
  --trace-id backfill-2026-03-24-kit-setup-rebuild \
  --models claude-haiku-4-5 --judge claude-haiku-4-5

# Filter by project
~/.chatterbox-venv/bin/python3 scripts/replay-judge.py \
  --tag "project:reddi-agent-protocol" --report

# Filter by skill
~/.chatterbox-venv/bin/python3 scripts/replay-judge.py \
  --tag "skill:product-tour-capture" --report
```
Verify Traces Appeared
After logging, verify in Langfuse UI: http://localhost:3100
Or check programmatically:
```bash
~/.chatterbox-venv/bin/python3 -c "
import subprocess
# Read the secret key from 1Password
sk = subprocess.run(
    ['op', 'read', 'op://OpenClaw/Langfuse (Local)/credential'],
    capture_output=True, text=True
).stdout.strip()
from langfuse import Langfuse
lf = Langfuse(public_key='pk-lf-openclaw-local', secret_key=sk, host='http://localhost:3100')
traces = lf.client.trace.list(limit=5)
[print(t.name, t.id[:12]) for t in traces.data]
"
```
Expected output: last 5 trace names + truncated IDs. If the output is blank, suspect the Python version issue (see warning above).
Langfuse Connection Details
| Setting | Value |
|---|---|
| UI | http://localhost:3100 |
| Public key | `pk-lf-openclaw-local` |
| Secret key | `op://OpenClaw/Langfuse (Local)/credential` (1Password) |
| Also in 1Password | `op://OpenClaw/Langfuse (Local)/Secret Key` |
| Docker | Always running (daemon service) |
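When to Call This Skill
This skill is called during Phase 4 (Traces) of the session-wrap playbook (playbooks/session-wrap/PLAYBOOK.md).
Call once per significant subagent completion. Use data from the task outcome blocks written in Phase 1 (memory file). Don't reconstruct from memory; read what you just wrote.
Minimum threshold for logging: Any subagent run that produced a deliverable (file written, API called, analysis produced). Skip: simple lookups, 1-line tool calls, failed attempts with no output.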
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| Trace doesn't appear in UI | Wrong Python version | Use `~/.chatterbox-venv/bin/python3` |
| No output, no error | Same: Python 3.14 p