Rag Evaluator

AI-powered RAG (Retrieval-Augmented Generation) evaluation toolkit. Configure, benchmark, compare, and optimize your RAG pipelines from the command line. Track prompts, evaluations, fine-tuning experiments, costs, and usage — all with persistent local logging and full export capabilities.

Commands

Run rag-evaluator <command> [args] to use.

Command	Description
configure	Configure RAG evaluation settings and parameters
benchmark

Each domain command (configure, benchmark, compare, etc.) works in two modes:

- Without arguments: displays the 20 most recent entries from that category
- With arguments: logs the input with a timestamp and saves it to the category log file
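
The two modes can be sketched in a few lines of Bash. This is a minimal illustration rather than the tool's actual source: the function name `log_or_list`, the temporary-directory fallback for `DATA_DIR`, the timestamp layout, and the shape of the `history.log` line are all assumptions made for the example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumption: fall back to a throwaway directory so the sketch never touches real data.
DATA_DIR="${DATA_DIR:-$(mktemp -d)}"

# log_or_list <category> [text...]   (hypothetical helper)
#   no text -> print the 20 most recent entries for the category
#   text    -> append "timestamp|text" to <category>.log and to history.log
log_or_list() {
  local category="$1"; shift
  local log_file="$DATA_DIR/$category.log"
  mkdir -p "$DATA_DIR"
  if [ "$#" -eq 0 ]; then
    # tail fails on a missing file, so guard for first use
    [ -f "$log_file" ] && tail -n 20 "$log_file"
    return 0
  fi
  local stamp
  stamp="$(date '+%Y-%m-%d %H:%M:%S')"   # assumed timestamp layout
  printf '%s|%s\n' "$stamp" "$*" >> "$log_file"
  # Assumed history format: same timestamp, category name, then the text
  printf '%s|%s %s\n' "$stamp" "$category" "$*" >> "$DATA_DIR/history.log"
}

log_or_list benchmark latency=230ms recall@5=0.82   # write mode
log_or_list benchmark                               # read mode: prints the entry just written
```

Keeping both modes behind one function means every category behaves the same way, which matches the description above.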

Data Storage

All data is stored locally in ~/.local/share/rag-evaluator/:

- Each command creates its own log file (e.g., configure.log, benchmark.log)
- A unified history.log tracks all activity across commands
- Entries are stored in a pipe-delimited timestamp|value format
- Export supports JSON, CSV, and plain text formats
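
To illustrate the storage format, the sketch below reads timestamp|value lines and emits CSV. It is not the tool's export code: `log_to_csv` is a hypothetical helper, the sample entries are invented for the demonstration, and the quoting rule (double any embedded quote) is the usual CSV convention assumed here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# log_to_csv <log-file>: convert "timestamp|value" lines to two-column CSV.
# Only the first "|" separates the fields, so a value may itself contain "|".
log_to_csv() {
  local file="$1" line stamp value q='"'
  printf 'timestamp,value\n'
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue
    stamp="${line%%|*}"          # text before the first "|"
    value="${line#*|}"           # text after the first "|"
    value="${value//$q/$q$q}"    # CSV-escape embedded double quotes
    printf '"%s","%s"\n' "$stamp" "$value"
  done < "$file"
}

# Demo with made-up entries in a temporary file
sample="$(mktemp)"
printf '%s\n' \
  '2024-01-15 10:30:00|latency=230ms recall@5=0.82' \
  '2024-01-15 10:31:12|note="baseline" a|b' > "$sample"
log_to_csv "$sample"
```

Splitting on the first pipe only is a design choice for the sketch: it keeps free-form values intact without requiring the logger to escape anything.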

Requirements

- Bash 4+ with set -euo pipefail strict mode
- Standard Unix utilities: date, wc, du, tail, grep, sed, cat
- No external dependencies or API keys required
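
One way to confirm these requirements on a given machine is sketched below. The function name `check_requirements` is illustrative and not part of the tool.

```shell
#!/usr/bin/env bash
set -euo pipefail

# check_requirements (hypothetical helper): returns 0 when Bash is 4+ and
# every listed utility is on PATH, 1 otherwise. Problems go to stderr.
check_requirements() {
  local missing=0 tool
  if [ "${BASH_VERSINFO[0]}" -lt 4 ]; then
    echo "Bash 4+ required, found $BASH_VERSION" >&2
    missing=1
  fi
  for tool in date wc du tail grep sed cat; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing utility: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

if check_requirements; then
  echo "all requirements met"
fi
```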

When to Use

1. Evaluating RAG pipeline quality: log evaluation scores, compare retrieval strategies, and track improvements over time
2. Benchmarking different configurations: run benchmarks across embedding models, chunk sizes, or retrieval methods and compare results side by side
3. Tracking costs and usage: monitor API costs and token usage across experiments to stay within budget
4. Managing prompt engineering: log prompt variations, test them against your pipeline, and analyze which templates perform best
5. Generating reports for stakeholders: export evaluation data as JSON/CSV for dashboards, or generate text reports summarizing RAG performance

Examples

    # Configure a new evaluation run
    rag-evaluator configure model=gpt-4 chunks=512 overlap=50 top_k=5

    # Run a benchmark and log the results
    rag-evaluator benchmark latency=230ms recall@5=0.82 precision@5=0.71

    # Compare two retrieval strategies
    rag-evaluator compare bm25 vs dense: bm25 recall=0.78, dense recall=0.85

    # Track evaluation scores
    rag-evaluator evaluate faithfulness=0.91 relevance=0.87 coherence=0.93

    # Log the API cost of a run (quoted so the shell does not expand $0 or parse the parentheses)
    rag-evaluator cost 'run-042: $0.23 (1.2k tokens input, 800 tokens output)'

    # View summary statistics
    rag-evaluator stats

    # Export all data as CSV
    rag-evaluator export csv

    # Search for specific entries
    rag-evaluator search gpt-4

    # Check recent activity
    rag-evaluator recent

    # Health check
    rag-evaluator status

Output

All commands write to stdout. Redirect to a file if needed:

    rag-evaluator report weekly summary > report.txt
    rag-evaluator export json   # saves to ~/.local/share/rag-evaluator/export.json

Configuration

Set DATA_DIR by modifying the script, or use the default: ~/.local/share/rag-evaluator/
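
The text above only says that DATA_DIR is set inside the script. A common Bash idiom for this kind of setting is sketched below as an assumption about what such an edit could look like. Whether the real script reads DATA_DIR from the environment is not stated, and `resolve_data_dir` is a hypothetical helper.

```shell
#!/usr/bin/env bash
set -euo pipefail

# resolve_data_dir: hypothetical helper, not taken from the tool's source.
# Uses DATA_DIR from the environment when it is set and non-empty,
# otherwise falls back to the documented default location.
resolve_data_dir() {
  printf '%s\n' "${DATA_DIR:-$HOME/.local/share/rag-evaluator}"
}

DATA_DIR="$(resolve_data_dir)"
echo "data directory: $DATA_DIR"
```

With this pattern, a one-off run against a scratch directory would not require editing the script, which can be handy when experimenting.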

Skill name: Ragaai Catalyst

Powered by BytesAgain | bytesagain.com | hello@bytesagain.com
