Log Dive — Unified Log Search 🤿
Search logs across Loki, Elasticsearch/OpenSearch, and AWS CloudWatch from a single interface. Ask in plain English; the skill translates to the right query language.
⚠️ Sensitive Data Warning: Logs frequently contain PII, secrets, tokens, passwords, and other sensitive data. Never cache, store, or repeat raw log content beyond the current conversation. Treat all log output as confidential.
Activation
This skill activates when the user mentions:
- "search logs", "find in logs", "log search", "check the logs"
- "Loki", "LogQL", "logcli"
- "Elasticsearch logs", "Kibana", "OpenSearch"
- "CloudWatch logs", "AWS logs", "log groups"
- "error logs", "find errors", "what happened in [service]"
- "tail logs", "follow logs", "live logs"
- "log backends", "which log sources", "log indices", "log labels"
- Incident triage involving log analysis
- "log-dive" explicitly
Permissions
CODEBLOCK0
Example Prompts
- "Find error logs from the checkout service in the last 30 minutes"
- "Search for timeout exceptions across all services"
- "What log backends do I have configured?"
- "List available log indices in Elasticsearch"
- "Show me the labels available in Loki"
- "Tail the payment-service logs"
- "Find all 5xx errors in CloudWatch for api-gateway"
- "Correlate errors between user-service and payment-service"
- "What happened in production between 2pm and 3pm today?"
Backend Configuration
Each backend uses environment variables. Users may have one, two, or all three configured.
Loki
| Variable | Required | Description |
|---|---|---|
| LOKI_ADDR | Yes | Loki server URL (e.g., http://loki.internal:3100) |
| LOKI_TOKEN | No | Bearer token for authentication |
| LOKI_TENANT_ID | No | Multi-tenant header (X-Scope-OrgID) |
Elasticsearch / OpenSearch
| Variable | Required | Description |
|---|---|---|
| ELASTICSEARCH_URL | Yes | Base URL (e.g., https://es.internal:9200) |
| ELASTICSEARCH_TOKEN | No | Basic <base64> or Bearer <token> for auth |
AWS CloudWatch Logs
| Variable | Required | Description |
|---|---|---|
| AWS_PROFILE or AWS_ACCESS_KEY_ID | Yes | Standard AWS credentials |
| AWS_REGION | Yes | AWS region for CloudWatch |
Agent Workflow
Follow this sequence:
Step 1: Check Backends
Run the backends check to see what's configured:
CODEBLOCK1
Parse the JSON output. If no backends are configured, tell the user which environment variables to set.
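The check above can be approximated from the environment alone. The following is a minimal sketch, not the skill's actual script: `detect_backends` is a hypothetical helper that looks only at the variables listed under Backend Configuration and does not test connectivity or credentials.

```shell
# Hypothetical helper: report which backends look configured, judging only
# by the environment variables named in this document. Variables set inside
# the function are global because POSIX sh has no local scope.
detect_backends() {
  found=""
  # Loki needs only its server URL.
  if [ -n "${LOKI_ADDR:-}" ]; then
    found="$found loki"
  fi
  # Elasticsearch/OpenSearch needs only its base URL.
  if [ -n "${ELASTICSEARCH_URL:-}" ]; then
    found="$found elasticsearch"
  fi
  # CloudWatch needs a region plus either a profile or an access key.
  if [ -n "${AWS_REGION:-}" ] && { [ -n "${AWS_PROFILE:-}" ] || [ -n "${AWS_ACCESS_KEY_ID:-}" ]; }; then
    found="$found cloudwatch"
  fi
  if [ -z "$found" ]; then
    echo "No backends configured: set LOKI_ADDR, ELASTICSEARCH_URL, or AWS_REGION with AWS_PROFILE / AWS_ACCESS_KEY_ID" >&2
    return 1
  fi
  # Strip the leading space before printing the list.
  echo "${found# }"
}
```

Calling `detect_backends` prints a space-separated list such as `loki elasticsearch`, or returns non-zero with a hint on stderr when nothing is set.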
Step 2: Translate the User's Query
This is the critical step. Convert the user's natural language request into the appropriate backend-specific query. Use the query language reference below.
For ALL backends, pass the query through the dispatcher:
CODEBLOCK2
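To make the translation step concrete, here is a rough sketch that maps a service name and a keyword to one query per backend. `build_query` is a hypothetical helper, and the label and field names (`app`, `message`, `service`) are assumptions that must match how your logs are actually labelled. It does no escaping, so inputs containing quotes would need extra handling.

```shell
# Hypothetical helper: build_query BACKEND SERVICE KEYWORD
# Prints a query string in the target backend's language.
build_query() {
  backend=$1
  service=$2
  keyword=$3
  case "$backend" in
    loki)
      # LogQL: stream selector followed by a line filter.
      printf '{app="%s"} |= "%s"\n' "$service" "$keyword"
      ;;
    elasticsearch)
      # Query DSL: both match clauses must hold.
      printf '{"query":{"bool":{"must":[{"match":{"message":"%s"}},{"match":{"service":"%s"}}]}}}\n' "$keyword" "$service"
      ;;
    cloudwatch)
      # Filter pattern: space-separated terms must all appear. Terms with
      # characters other than alphanumerics or underscore need double quotes.
      printf '%s %s\n' "$keyword" "$service"
      ;;
    *)
      echo "unknown backend: $backend" >&2
      return 1
      ;;
  esac
}
```

For example, `build_query loki checkout error` yields `{app="checkout"} |= "error"`, which can then be passed as the query argument to the dispatcher.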
Step 3: List Available Targets
Before searching, you may need to discover what's available:
CODEBLOCK3
Step 4: Tail Logs (Live Follow)
CODEBLOCK4
Tail runs for a limited time (default 30s) and streams results.
Step 5: Analyze Results
After receiving log output, you MUST:
- Identify unique error types: group similar errors, count occurrences
- Find the root cause: look for the earliest error, trace dependency chains
- Correlate across services: if errors in service A mention service B, note the dependency
- Build a timeline: order events chronologically
- Summarize actionably: "The checkout service started returning 500s at 14:23 because the database connection pool was exhausted (max 10 connections, 10 in use). The pool exhaustion was triggered by a slow query in the inventory service."
NEVER dump raw log output to the user. Always summarize, extract patterns, and present structured findings.
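Grouping similar errors and counting them can be sketched with standard text tools. This is an illustrative approach, not what the skill's scripts do: `count_error_types` is a hypothetical helper that assumes one message per line on stdin and treats messages as the same type when they differ only in numbers.

```shell
# Hypothetical helper: collapse digit runs to "N" so that messages differing
# only in numbers share one pattern, then count each pattern.
# Output: "<count><TAB><pattern>", most frequent first.
count_error_types() {
  sed -E 's/[0-9]+/N/g' |
    awk '{ seen[$0]++ } END { for (m in seen) printf "%d\t%s\n", seen[m], m }' |
    sort -k1,1nr
}
```

The output is a pattern summary rather than raw log lines, which keeps it in line with the rule above about not dumping raw output.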
Discord v2 Delivery Mode (OpenClaw v2026.2.14+)
When the conversation is happening in a Discord channel:
- Send a compact incident summary first (backend, query intent, top error types, root-cause hypothesis), then ask if the user wants full detail.
- Keep the first response under ~1200 characters and avoid dumping raw log lines in the first message.
- If Discord components are available, include quick actions:
  - Show Error Timeline
  - Show Top Error Patterns
  - Run Related Service Query
- If components are not available, provide the same follow-ups as a numbered list.
- Prefer short follow-up chunks (<=15 lines per message) when sharing timelines or grouped findings.
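The 15-line chunking guideline can be applied mechanically before sending. A small sketch, assuming the findings are already one item per line; `chunk_lines` and the `---` separator are illustrative choices, not part of OpenClaw or Discord.

```shell
# Hypothetical helper: chunk_lines [N]
# Copies stdin to stdout, printing a "---" separator line after every
# N lines (default 15) so each chunk can be sent as its own message.
chunk_lines() {
  awk -v n="${1:-15}" 'NR > 1 && (NR - 1) % n == 0 { print "---" } { print }'
}
```

Piping a 16-line timeline through `chunk_lines` produces the first 15 lines, a separator, then the remaining line.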
Query Language Reference
LogQL (Loki)
LogQL has two parts: a stream selector and a filter pipeline.
Stream selectors:
CODEBLOCK5
Filter pipeline (chained after selector):
CODEBLOCK6
Structured metadata (parsed logs):
CODEBLOCK7
Common patterns:
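- Errors in service: `{app="checkout"} |= "error" | json | level="error"`
- HTTP 5xx: `{app="api"} | json | status >= 500`
- Slow requests: `{app="api"} | json | duration > 5s`
- Stack traces: `{app="myapp"} |= "Exception" |= "at"`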
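The common LogQL shapes listed above can be parameterized by service name. The helpers below are a hypothetical sketch; they assume the service is identified by an `app` label and that JSON logs expose `level`, `status`, and `duration` fields, which may differ in your setup.

```shell
# Hypothetical helpers: each prints one LogQL query for the given app label.

# Lines containing "error" whose parsed JSON level is "error".
logql_errors() {
  printf '{app="%s"} |= "error" | json | level="error"\n' "$1"
}

# Parsed JSON status of 500 or above.
logql_http_5xx() {
  printf '{app="%s"} | json | status >= 500\n' "$1"
}

# Parsed JSON duration above a threshold (default 5s).
logql_slow() {
  printf '{app="%s"} | json | duration > %s\n' "$1" "${2:-5s}"
}
```

For instance, `logql_slow api 2s` prints `{app="api"} | json | duration > 2s`.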
Elasticsearch Query DSL
Simple match:
CODEBLOCK8
Boolean query (AND/OR):
CODEBLOCK9
Time range filter:
CODEBLOCK10
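A boolean query and a time range filter are usually combined in one request. The sketch below assembles such a body; `es_error_query` is a hypothetical helper, and the field names `message`, `service.name`, and `@timestamp` are assumptions about the index mapping. The size of 200 mirrors the default limit mentioned under Limitations.

```shell
# Hypothetical helper: es_error_query SERVICE [SINCE]
# Prints a Query DSL body matching "error" for one service within the last
# SINCE (Elasticsearch date math, default 30m), newest entries first.
es_error_query() {
  service=$1
  since=${2:-30m}
  printf '{"query":{"bool":{'
  printf '"must":[{"match":{"message":"error"}},{"match":{"service.name":"%s"}}],' "$service"
  printf '"filter":[{"range":{"@timestamp":{"gte":"now-%s"}}}]' "$since"
  printf '}},"sort":[{"@timestamp":"desc"}],"size":200}\n'
}
```

Placing the range under `filter` rather than `must` keeps it out of relevance scoring, which is the usual choice for time bounds.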
Wildcard / regex:
CODEBLOCK11
Common patterns:
- Errors in service: INLINECODE20
- HTTP 5xx: INLINECODE21
- Aggregate by field: Use "aggs", but prefer simple queries for agent use
CloudWatch Filter Patterns
Simple text match:
CODEBLOCK12
JSON filter patterns:
CODEBLOCK13
Negation and wildcards:
CODEBLOCK14
Common patterns:
- Errors: INLINECODE23
- Errors in service: INLINECODE24
- HTTP 5xx: INLINECODE25
- Exceptions: INLINECODE26
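For JSON-formatted log events, CloudWatch filter patterns select on fields with the `{ $.field ... }` form. The helpers below are a hypothetical sketch; the field names `level`, `service`, and `status` are assumptions about the log schema.

```shell
# Hypothetical helper: cw_json_filter LEVEL SERVICE
# Prints a JSON filter pattern requiring both fields to match.
cw_json_filter() {
  level=$1
  service=$2
  printf '{ $.level = "%s" && $.service = "%s" }\n' "$level" "$service"
}

# Hypothetical helper: numeric comparison on a status field.
cw_http_5xx() {
  printf '{ $.status >= 500 }\n'
}
```

`cw_json_filter ERROR checkout` prints `{ $.level = "ERROR" && $.service = "checkout" }`.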
Output Format
When presenting search results, use this structure:
CODEBLOCK15
Common Workflows
Incident Triage
- Check backends → search for errors in affected service → search upstream/downstream services → correlate → build timeline → recommend actions.
Performance Investigation
- Search for slow requests (duration > 5s) → identify common patterns → check for database slow queries → check for external service timeouts.
Deployment Verification
- Search for errors in the deployed service since deploy time → compare error rate with pre-deploy period → flag new error types.
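Comparing the error rate before and after a deploy reduces to simple arithmetic once the counts are known. A minimal sketch, assuming you have already obtained error and total entry counts for equal-length windows on each side of the deploy; `error_rate_delta` is a hypothetical helper.

```shell
# Hypothetical helper:
#   error_rate_delta PRE_ERRORS PRE_TOTAL POST_ERRORS POST_TOTAL
# Prints both error rates as percentages and their difference in
# percentage points, or "n/a" when either window has no entries.
error_rate_delta() {
  awk -v pe="$1" -v pt="$2" -v qe="$3" -v qt="$4" 'BEGIN {
    if (pt == 0 || qt == 0) { print "n/a"; exit }
    pre = 100 * pe / pt
    post = 100 * qe / qt
    printf "pre=%.2f%% post=%.2f%% delta=%+.2f\n", pre, post, post - pre
  }'
}
```

With 5 errors in 1000 entries before the deploy and 30 in 1000 after, the delta is +2.50 percentage points, which would be worth flagging.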
Limitations
- Read-only: This skill can only search and read logs. It cannot delete, modify, or create log entries.
- Output size: Default limit is 200 entries. Log output is pre-filtered to reduce token consumption. For larger investigations, use multiple targeted queries rather than one broad query.
- Network access: Log backends must be reachable from the machine running OpenClaw.
- No streaming aggregation: For complex aggregations (percentiles, rates), consider using your backend's native UI (Grafana, Kibana, CloudWatch Insights).
Troubleshooting
| Error | Cause | Fix |
|---|---|---|
| "No backends configured" | No env vars set | Set LOKI_ADDR, ELASTICSEARCH_URL, or configure AWS CLI |
| "logcli not found" | logcli not installed | Install from https://grafana.com/docs/loki/latest/tools/logcli/ |
| "aws: command not found" | AWS CLI not installed | Install from https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html |
| "curl: command not found" | curl not installed | apt install curl or brew install curl |
| "jq: command not found" | jq not installed | apt install jq or brew install jq |
| "connection refused" | Backend unreachable | Check URL, VPN, firewall rules |
| "401 Unauthorized" | Bad credentials | Check LOKI_TOKEN, ELASTICSEARCH_TOKEN, or AWS credentials |
Powered by Anvil AI 🤿
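Skill name: log-dive
Detailed description:
Log Dive: Unified Log Search 🤿
Search logs across Loki, Elasticsearch/OpenSearch, and AWS CloudWatch from a single interface. Ask in natural language; the skill translates the request into the right query language.
⚠️ Sensitive Data Warning: Logs frequently contain personally identifiable information (PII), secrets, tokens, passwords, and other sensitive data. Never cache, store, or repeat raw log content beyond the current conversation. Treat all log output as confidential.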
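Activation
This skill activates when the user mentions:
- "search logs", "find in logs", "log search", "check the logs"
- "Loki", "LogQL", "logcli"
- "Elasticsearch logs", "Kibana", "OpenSearch"
- "CloudWatch logs", "AWS logs", "log groups"
- "error logs", "find errors", "what happened in [service]"
- "tail logs", "live logs"
- "log backends", "which log sources", "log indices", "log labels"
- Incident triage involving log analysis
- "log-dive" explicitly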
Permissions
yaml
permissions:
  exec: true      # Needed to run the backend scripts
  read: true      # Read script files
  write: false    # Never writes files; logs may contain secrets
  network: true   # Query remote log backends
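Example Prompts
- "Find error logs from the checkout service in the last 30 minutes"
- "Search for timeout exceptions across all services"
- "What log backends do I have configured?"
- "List available log indices in Elasticsearch"
- "Show me the labels available in Loki"
- "Tail the payment-service logs"
- "Find all 5xx errors in CloudWatch for api-gateway"
- "Correlate errors between user-service and payment-service"
- "What happened in production between 2pm and 3pm today?"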
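Backend Configuration
Each backend uses environment variables. Users may have one, two, or all three configured.
Loki
| Variable | Required | Description |
|---|---|---|
| LOKI_ADDR | Yes | Loki server URL (e.g., http://loki.internal:3100) |
| LOKI_TOKEN | No | Bearer token for authentication |
| LOKI_TENANT_ID | No | Multi-tenant header (X-Scope-OrgID) |
Elasticsearch / OpenSearch
| Variable | Required | Description |
|---|---|---|
| ELASTICSEARCH_URL | Yes | Base URL (e.g., https://es.internal:9200) |
| ELASTICSEARCH_TOKEN | No | Basic <base64> or Bearer <token> for auth |
AWS CloudWatch Logs
| Variable | Required | Description |
|---|---|---|
| AWS_PROFILE or AWS_ACCESS_KEY_ID | Yes | Standard AWS credentials |
| AWS_REGION | Yes | AWS region for CloudWatch |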
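Agent Workflow
Follow this sequence:
Step 1: Check Backends
Run the backends check to see what's configured:
bash
bash /scripts/log-dive.sh backends
Parse the JSON output. If no backends are configured, tell the user which environment variables to set.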
Step 2: Translate the User's Query
This is the critical step. Convert the user's natural language request into the appropriate backend-specific query. Use the query language reference below.
For ALL backends, pass the query through the dispatcher:
bash
# Search across all configured backends
bash /scripts/log-dive.sh search --query '<QUERY>' [OPTIONS]
# Search a specific backend
bash /scripts/log-dive.sh search --backend loki --query '{app="checkout"} |= "error"' --since 30m --limit 200
bash /scripts/log-dive.sh search --backend elasticsearch --query '{"query":{"bool":{"must":[{"match":{"message":"error"}},{"match":{"service":"checkout"}}]}}}' --index 'app-logs-*' --since 30m --limit 200
bash /scripts/log-dive.sh search --backend cloudwatch --query 'ERROR checkout' --log-group /ecs/checkout-service --since 30m --limit 200
Step 3: List Available Targets
Before searching, you may need to discover what's available:
bash
# Loki: list labels and label values
bash /scripts/log-dive.sh labels --backend loki
bash /scripts/log-dive.sh labels --backend loki --label app
# Elasticsearch: list indices
bash /scripts/log-dive.sh indices --backend elasticsearch
# CloudWatch: list log groups
bash /scripts/log-dive.sh indices --backend cloudwatch
Step 4: Tail Logs (Live Follow)
bash
bash /scripts/log-dive.sh tail --backend loki --query '{app="checkout"}'
bash /scripts/log-dive.sh tail --backend cloudwatch --log-group /ecs/checkout-service
Tail runs for a limited time (default 30s) and streams results.
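Step 5: Analyze Results
After receiving log output, you MUST:
- Identify unique error types: group similar errors, count occurrences
- Find the root cause: look for the earliest error, trace dependency chains
- Correlate across services: if errors in service A mention service B, note the dependency
- Build a timeline: order events chronologically
- Summarize actionably: "The checkout service started returning 500s at 14:23 because the database connection pool was exhausted (max 10 connections, 10 in use). The pool exhaustion was triggered by a slow query in the inventory service."
NEVER dump raw log output to the user. Always summarize, extract patterns, and present structured findings.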
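Discord v2 Delivery Mode (OpenClaw v2026.2.14+)
When the conversation is happening in a Discord channel:
- Send a compact incident summary first (backend, query intent, top error types, root-cause hypothesis), then ask if the user wants full detail.
- Keep the first response under ~1200 characters and avoid dumping raw log lines in the first message.
- If Discord components are available, include quick actions:
  - Show Error Timeline
  - Show Top Error Patterns
  - Run Related Service Query
- If components are not available, provide the same follow-ups as a numbered list.
- Prefer short follow-up chunks (<=15 lines per message) when sharing timelines or grouped findings.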
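Query Language Reference
LogQL (Loki)
LogQL has two parts: a stream selector and a filter pipeline.
Stream selectors:
{app="myapp"}                          # exact match
{namespace="prod", app=~"api-.*"}      # regex match
{app!="debug"}                         # negative match
Filter pipeline (chained after selector):
{app="myapp"} |= "error"               # line contains "error"
{app="myapp"} != "healthcheck"         # line does not contain
{app="myapp"} |~ "error|warn"          # regex match on the line
{app="myapp"} !~ "DEBUG|TRACE"         # negative regex
Structured metadata (parsed logs):
{app="myapp"} | json                   # parse JSON logs
{app="myapp"} | json | status >= 500   # filter by parsed field
{app="myapp"} | logfmt                 # parse logfmt
{app="myapp"} | regexp `(?P<ip>\d+\.\d+\.\d+\.\d+)`   # regex extraction (the group name ip is illustrative)
Common patterns:
- Errors in service: `{app="checkout"} |= "error" | json | level="error"`
- HTTP 5xx: `{app="api"} | json | status >= 500`
- Slow requests: `{app="api"} | json | duration > 5s`
- Stack traces: `{app="myapp"} |= "Exception" |= "at"`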
Elasticsearch Query DSL
Simple match:
json
{"query": {"match": {"message": "error"}}}
Boolean query (AND/OR):
json
{
  "query": {
    "bool": {
      "must": [
        {"match": {"message": "error"}},
        {"match": {"service.name": "checkout"}}
      ],
      "must_not": [
        {"match": {"message": "healthcheck"}}
      ]
    }
  },
  "sort": [{"@timestamp":