llm-regression-monitor: LLM Regression Monitoring

Use this skill when the user wants to monitor LLM behavior over time and get alerted when outputs change unexpectedly. Triggers on requests like "set up LLM regression monitoring", "alert me when my prompts start behaving differently", "watch my LLM for regressions", "run behavioral tests on my AI outputs on a schedule", or "detect when my model starts drifting". Handles first-time setup, baseline capture, scheduled monitoring, and alert configuration.

Author: admin | Source: ClawHub

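LLM Regression Monitor

Overview

Automated behavioral regression monitoring for LLM apps. It captures baseline outputs, checks for drift on a schedule, and sends a WhatsApp or Slack alert as soon as a run detects a regression.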

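Workflow Decision Tree

```
User request
├── Set up monitoring / first-time use → full setup (Steps 1-5)
├── Run the monitor now → Step 4 only
├── I changed a prompt or model → Step 3b (update baselines)
└── Configure alerts → Step 5
```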

Step 1: Install

```bash
# Quote the extra so shells such as zsh do not expand the brackets
pip install "llm-behave[semantic]" pyyaml requests
```

Step 2: Create test_suite.yaml

Create it in the project root. Minimal example:

```yaml
tests:
  - name: support_response
    prompt: "A customer says they have not received their order. How do you respond?"
    provider: openai  # openai | anthropic | ollama | custom
    model: gpt-4o-mini
    assertions:
      - type: tone
        expected: empathetic
    drift:
      enabled: true
      threshold: 0.80
```
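To make the structure of the example concrete, here is a minimal sketch that checks a parsed suite, meaning the dict that `yaml.safe_load` would return for the file above. The helper name `validate_suite`, the set of required fields, and the 0 to 1 threshold range are assumptions for illustration. The authoritative field spec is references/test-suite-format.md.

```python
# Hypothetical helper for illustration; not one of the skill's scripts.
KNOWN_PROVIDERS = {"openai", "anthropic", "ollama", "custom"}
REQUIRED_FIELDS = ("name", "prompt", "provider", "model")  # assumed from the example


def validate_suite(suite):
    """Return a list of problems; an empty list means the suite looks well-formed."""
    tests = suite.get("tests") if isinstance(suite, dict) else None
    if not isinstance(tests, list) or not tests:
        return ["suite must contain a non-empty 'tests' list"]
    problems = []
    for i, test in enumerate(tests):
        if not isinstance(test, dict):
            problems.append(f"test #{i}: must be a mapping")
            continue
        label = test.get("name", f"#{i}")
        for field in REQUIRED_FIELDS:
            if not test.get(field):
                problems.append(f"test {label}: missing '{field}'")
        provider = test.get("provider")
        if provider and provider not in KNOWN_PROVIDERS:
            problems.append(f"test {label}: unknown provider '{provider}'")
        # Assumption: the drift threshold is a similarity score between 0 and 1.
        threshold = (test.get("drift") or {}).get("threshold")
        if threshold is not None and not 0.0 <= threshold <= 1.0:
            problems.append(f"test {label}: drift threshold must be between 0 and 1")
    return problems


example = {
    "tests": [{
        "name": "support_response",
        "prompt": "A customer says they have not received their order. How do you respond?",
        "provider": "openai",
        "model": "gpt-4o-mini",
        "assertions": [{"type": "tone", "expected": "empathetic"}],
        "drift": {"enabled": True, "threshold": 0.80},
    }]
}
print(validate_suite(example))  # []
```

A check like this could run before baseline capture, so a malformed suite fails early instead of partway through a provider call.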

Set the API key for the chosen provider:

```bash
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...  # if using anthropic
# ollama needs no key
```

Read references/test-suite-format.md for the full field spec.
Read references/providers.md for env vars and Ollama setup.
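The provider-to-key mapping above can be checked before a run. The sketch below is an illustration only: `missing_key` is a hypothetical helper, not part of the skill, and it covers just the three providers whose key requirements are stated here.

```python
import os

# Provider -> required environment variable, as described above (ollama needs no key).
REQUIRED_KEY = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "ollama": None,
}


def missing_key(provider, env=None):
    """Return the name of the unset variable for this provider, or None if nothing is missing."""
    env = os.environ if env is None else env
    var = REQUIRED_KEY.get(provider)
    if var and not env.get(var):
        return var
    return None


print(missing_key("openai", env={}))  # OPENAI_API_KEY
print(missing_key("ollama", env={}))  # None
```

Passing `env` explicitly keeps the function easy to test. In real use it falls back to `os.environ`.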

Step 3: Capture Baselines

```bash
python scripts/capture_baseline.py
```

Saves ground-truth outputs to .llm_behave_baselines/. Run it once before monitoring begins.

3b: Update after an intentional prompt/model change

```bash
# Reset a single test
python scripts/capture_baseline.py --update-baseline <test_name>

# Reset all
python scripts/capture_baseline.py --force
```
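The capture-once, update-on-purpose behavior can be pictured with a small sketch. Everything in it is an assumption for illustration: the one-JSON-file-per-test layout, the `save_baseline` and `load_baseline` names, and the `overwrite` flag are hypothetical, and the real on-disk format used by capture_baseline.py may differ.

```python
import json
import tempfile
from pathlib import Path


def save_baseline(baseline_dir, test_name, output, overwrite=False):
    """Write one baseline file per test; keep an existing baseline unless overwrite is set."""
    directory = Path(baseline_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{test_name}.json"
    if path.exists() and not overwrite:
        return False  # baseline already captured; left untouched
    path.write_text(json.dumps({"test": test_name, "output": output}), encoding="utf-8")
    return True


def load_baseline(baseline_dir, test_name):
    """Return the stored output for a test, or None if no baseline exists yet."""
    path = Path(baseline_dir) / f"{test_name}.json"
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))["output"]


with tempfile.TemporaryDirectory() as tmp:
    save_baseline(tmp, "support_response", "I'm sorry to hear that.")
    # A second capture does not replace the stored output ...
    save_baseline(tmp, "support_response", "New answer")
    print(load_baseline(tmp, "support_response"))  # I'm sorry to hear that.
    # ... unless the update is explicit, as after an intentional prompt change.
    save_baseline(tmp, "support_response", "New answer", overwrite=True)
    print(load_baseline(tmp, "support_response"))  # New answer
```

Refusing to overwrite by default means a routine run cannot silently replace the ground truth, so a baseline only changes when someone deliberately resets it.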

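Step 4: Run the Monitor

```bash
python scripts/run_monitor.py
```

Writes monitor_report.json. Exits with code 0 when every test passes and code 1 when any test fails, so it can be used directly in CI.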
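A rough sketch of how a drift threshold and the exit code could fit together is shown below. It is not the skill's implementation. `difflib.SequenceMatcher` is a character-level stand-in for whatever semantic score the `llm-behave[semantic]` extra provides, and reading the threshold as a minimum similarity is an assumption. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(baseline, current):
    """Character-level similarity in [0, 1]; a stand-in for a semantic (embedding) score."""
    return SequenceMatcher(None, baseline, current).ratio()


def check_drift(baseline, current, threshold=0.80):
    """Return (passed, score); the test passes when the score is at or above the threshold."""
    score = similarity(baseline, current)
    return score >= threshold, score


def exit_code(results):
    """0 when every test passed, 1 when any test failed."""
    return 0 if all(passed for passed, _ in results) else 1


baseline = "I'm sorry your order has not arrived. Let me look into it right away."
results = [
    check_drift(baseline, baseline),                     # identical output: passes
    check_drift(baseline, "We cannot help with that."),  # very different output: fails
]
print(exit_code(results))  # 1
```

Collapsing all results into a single 0 or 1 is what lets a scheduler or CI job react to the run without parsing the report.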

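Step 5: Configure Alerts

```bash
# WhatsApp (requires wacli to be installed and logged in)
export ALERT_WHATSAPP_TO=+1234567890

# Slack
export ALERT_SLACK_WEBHOOK=https://hooks.slack.com/services/...
```

Add these to a .env file in the project root; the scripts load it automatically. Send an alert with:

```bash
python scripts/send_alert.py
```

Stays silent when the run is green. Every alert that is sent is also logged to monitor_alerts.log.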
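For the Slack side, a minimal sketch of what sending such an alert could look like follows. Slack incoming webhooks accept a JSON body with a `text` field. The function names, the message wording, and passing failed test names as a list are assumptions for illustration, not a description of how send_alert.py is written.

```python
import json
import urllib.request


def build_slack_payload(failed_tests):
    """Build the JSON body for a Slack incoming webhook; None means there is nothing to report."""
    if not failed_tests:
        return None  # green run: stay silent
    lines = [f"LLM regression monitor: {len(failed_tests)} test(s) failed"]
    lines += [f"- {name}" for name in failed_tests]
    return {"text": "\n".join(lines)}


def post_to_slack(webhook_url, payload):
    """POST the payload as JSON to the webhook URL and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Only the payload is built here; post_to_slack needs a real webhook URL.
print(build_slack_payload(["support_response"]))
```

Building the payload separately from sending it makes the silent-on-green rule easy to verify without network access.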

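Step 6: Schedule with OpenClaw Cron

Confirm the schedule with the user (default: 9am daily), then add:

- Schedule: `0 9 * * *`
- Command: `python run_monitor.py && true || python send_alert.py`
- Directory: project root (where test_suite.yaml lives)

The `|| python send_alert.py` part runs only when run_monitor.py exits with a non-zero code, which is exit code 1 when failures are found.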
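The same run-then-alert-on-failure logic can be expressed in Python, which may help when the scheduler cannot run a shell one-liner. This is a sketch under assumptions: `run_then_alert` is a hypothetical helper, and the three commands are stand-ins so the example runs without the real scripts. In actual use they would be something like `[sys.executable, "run_monitor.py"]` and `[sys.executable, "send_alert.py"]`.

```python
import subprocess
import sys


def run_then_alert(monitor_cmd, alert_cmd):
    """Run the monitor; run the alert command only if the monitor exits non-zero."""
    monitor = subprocess.run(monitor_cmd)
    if monitor.returncode != 0:
        subprocess.run(alert_cmd)
        return "alerted"
    return "silent"


# Stand-ins for the real scripts so the sketch runs anywhere.
passing = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "raise SystemExit(1)"]
alert = [sys.executable, "-c", "pass"]

print(run_then_alert(passing, alert))  # silent
print(run_then_alert(failing, alert))  # alerted
```

Like the shell form, this treats any non-zero exit as a failure, so a crash in the monitor also triggers the alert command.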

Common Errors

Error	Fix
llm-behave is not installed	pip install "llm-behave[semantic]"
OPENAI_API_KEY is not set
