Genai Toolkit
Genai Toolkit v2.0.0 — an AI toolkit for managing generative AI workflows from the command line. Log configurations, benchmarks, prompts, evaluations, fine-tuning runs, cost tracking, and optimization notes. Each entry is timestamped and persisted locally. Works entirely offline — your data never leaves your machine.
Why Genai Toolkit?
- Works entirely offline; your data never leaves your machine
- Simple command-line interface with no GUI dependency
- Export to JSON, CSV, or plain text at any time for sharing or archival
- Automatic activity history logging across all commands
- Each domain command doubles as both a logger and a viewer
Commands
Domain Commands
Each domain command works in two modes. Log mode (with arguments) saves a timestamped entry; view mode (no arguments) shows the 20 most recent entries.
| Command | Description |
|---|---|
| `genai-toolkit configure <input>` | Log a configuration note such as model parameters, API keys, or environment settings. Use this to record setup changes and track which configurations were active during experiments. |
| `genai-toolkit benchmark <input>` | Log a benchmark result or performance observation. Record latency, throughput, accuracy, or other metrics to compare across runs and model versions. |
| `genai-toolkit compare <input>` | Log a comparison note between models, configurations, or approaches. Useful for side-by-side evaluations like GPT-4 vs Claude on specific tasks. |
| `genai-toolkit prompt <input>` | Log a prompt template or prompt engineering note. Track iterations on prompt design, record what worked, and document prompt versioning. |
| `genai-toolkit evaluate <input>` | Log an evaluation result or quality metric. Record accuracy scores, F1 metrics, human ratings, or any qualitative assessment of model outputs. |
| `genai-toolkit fine-tune <input>` | Log a fine-tuning run or hyperparameter note. Track epochs, learning rates, dataset sizes, and resulting model performance after fine-tuning. |
| `genai-toolkit analyze <input>` | Log an analysis observation or insight. Record patterns found in data, failure mode analysis, or trends across experiments. |
| `genai-toolkit cost <input>` | Log cost tracking data including API costs, compute expenses, and token consumption. Essential for budget monitoring across projects and providers. |
| `genai-toolkit usage <input>` | Log usage metrics or consumption data. Track request volumes, token counts, rate limit encounters, and daily/monthly consumption patterns. |
| `genai-toolkit optimize <input>` | Log optimization attempts or performance improvements. Record what was changed, the expected vs actual impact, and next steps. |
| `genai-toolkit test <input>` | Log test results or test case notes. Record pass/fail outcomes, edge cases discovered, and regression test results. |
| `genai-toolkit report <input>` | Log a report entry or summary finding. Capture weekly summaries, milestone reports, or executive-level findings from AI workflows. |
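
The two-mode behavior described above can be pictured with a small Bash sketch. This is not the toolkit's actual source: the `domain_command` function, the `GENAI_TOOLKIT_DATA_DIR` override, and the `timestamp|text` line format are all assumptions made for illustration. Only the data directory, the per-command log file, `history.log`, and the 20-entry view come from this README.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a two-mode domain command (not the real implementation).
# GENAI_TOOLKIT_DATA_DIR is an assumed override; the default path is the documented one.
DATA_DIR="${GENAI_TOOLKIT_DATA_DIR:-$HOME/.local/share/genai-toolkit}"

# domain_command <category> [text...]
#   with text:    append a timestamped entry to <category>.log and history.log
#   without text: print the 20 most recent entries from <category>.log
domain_command() {
  local category="$1"
  shift
  local log_file="$DATA_DIR/$category.log"
  mkdir -p "$DATA_DIR"

  if (( $# > 0 )); then
    # Log mode. The timestamp layout and "|" separator are assumptions.
    local stamp
    stamp="$(date '+%Y-%m-%d %H:%M:%S')"
    printf '%s|%s\n' "$stamp" "$*" >> "$log_file"
    printf '%s|%s|%s\n' "$stamp" "$category" "$*" >> "$DATA_DIR/history.log"
  elif [[ -f "$log_file" ]]; then
    # View mode: show only the tail of the category log.
    tail -n 20 "$log_file"
  fi
}
```

Calling `domain_command benchmark "latency 1.2s"` would append one line, and `domain_command benchmark` with no further arguments would print the latest entries.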
Utility Commands
| Command | Description |
|---|---|
| `genai-toolkit stats` | Show summary statistics across all log files, including entry counts per category and total data size on disk. |
| `genai-toolkit export <format>` | Export all data to a file in the specified format. Supported formats: `json`, `csv`, `txt`. Output is saved to the data directory. |
| `genai-toolkit search <term>` | Search all log entries for a term using case-insensitive matching. Results are grouped by log category for easy scanning. |
| `genai-toolkit recent` | Show the 20 most recent entries from the unified activity log, giving a quick overview of recent work across all commands. |
| `genai-toolkit status` | Health check showing version, data directory path, total entry count, disk usage, and last activity timestamp. |
| `genai-toolkit help` | Show the built-in help message listing all available commands and usage information. |
| `genai-toolkit version` | Print the current version (v2.0.0). |
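
As a rough illustration of what a case-insensitive search grouped by category might look like, here is a minimal Bash sketch. The `search_logs` function and the `GENAI_TOOLKIT_DATA_DIR` override are hypothetical. Skipping `history.log` and matching the term as a fixed string are choices made for this sketch, not documented behavior of `genai-toolkit search`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a grouped, case-insensitive search (not the real implementation).
DATA_DIR="${GENAI_TOOLKIT_DATA_DIR:-$HOME/.local/share/genai-toolkit}"

# search_logs <term>: print matching lines under a "[category]" header per log file.
search_logs() {
  local term="$1" log_file category matches
  for log_file in "$DATA_DIR"/*.log; do
    [[ -f "$log_file" ]] || continue             # no logs yet: the glob stayed literal
    category="$(basename "$log_file" .log)"
    [[ "$category" == "history" ]] && continue   # assumed: skip the unified log to avoid repeats
    # -i ignores case; -F treats the term as a fixed string rather than a regex
    matches="$(grep -i -F -- "$term" "$log_file")" || continue
    printf '[%s]\n%s\n' "$category" "$matches"
  done
}
```

With this sketch, `search_logs latency` would list matching lines under one header per category and print nothing for categories without a match.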
Data Storage
All data is stored locally at ~/.local/share/genai-toolkit/. Each domain command writes to its own log file (e.g., configure.log, benchmark.log). A unified history.log tracks all actions across commands. Use export to back up your data at any time.
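
Because the logs are plain files in one directory, they can also be inspected with standard shell tools. The sketch below counts lines per log file. It assumes one entry per line, which this README does not state, and the `count_entries` function and the `GENAI_TOOLKIT_DATA_DIR` override are hypothetical names introduced here.

```bash
#!/usr/bin/env bash
# Hypothetical helper for inspecting the data directory; assumes one entry per line.
DATA_DIR="${GENAI_TOOLKIT_DATA_DIR:-$HOME/.local/share/genai-toolkit}"

# count_entries: print "<file name><TAB><line count>" for every .log file.
count_entries() {
  local log_file
  for log_file in "$DATA_DIR"/*.log; do
    [[ -f "$log_file" ]] || continue   # empty directory: the glob stayed literal
    # grep -c '' counts lines, including a final line without a trailing newline
    printf '%s\t%s\n' "$(basename "$log_file")" "$(grep -c '' "$log_file")"
  done
}
```

For day-to-day use, the built-in `stats` and `export` commands remain the documented way to summarize and back up data.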
Requirements
- Bash (4.0+)
- No external dependencies — pure shell script
- No network access required
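
To confirm the Bash requirement before installing, a quick check along these lines may help. `BASH_VERSINFO` and `BASH_VERSION` are standard Bash variables; the `meets_minimum_major` function is a hypothetical helper written for this example and is not part of the toolkit.

```bash
#!/usr/bin/env bash
# meets_minimum_major <required> [actual]
# Succeeds when the actual major version (default: the running Bash) is at least <required>.
meets_minimum_major() {
  local required="$1" actual="${2:-${BASH_VERSINFO[0]}}"
  (( actual >= required ))
}

if meets_minimum_major 4; then
  echo "Bash ${BASH_VERSION} meets the 4.0+ requirement"
else
  echo "Bash 4.0 or newer is required; found ${BASH_VERSION}" >&2
fi
```

This matters mostly on macOS, where the system Bash has long been pinned at 3.2.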
When to Use
- Tracking AI model benchmarks and comparisons across different providers and versions over time
- Logging prompt engineering iterations to understand what improvements actually moved the needle
- Monitoring API costs and token usage across multiple projects and billing periods
- Evaluating fine-tuning experiments with detailed hyperparameter and metric tracking
- Building a searchable knowledge base of optimization attempts and analysis insights
Examples
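```bash
# Log a benchmark result
genai-toolkit benchmark "GPT-4o latency: avg 1.2s, p99 3.8s, summarization task, 500 samples"

# Track a cost entry
genai-toolkit cost "March batch processing: USD 42.50, 15k requests, avg USD 0.0028/request"

# Compare two models
genai-toolkit compare "Claude 3.5 vs GPT-4o on code generation: Claude 15% faster, GPT-4o 5% more accurate"

# Log a prompt iteration
genai-toolkit prompt "v3: added chain-of-thought instruction, hallucination rate dropped from 12% to 3%"

# Log a fine-tuning run
genai-toolkit fine-tune "SQL generation model epoch 5: accuracy=0.96, loss=0.12, lr=2e-5, dataset=50k rows"

# View all statistics
genai-toolkit stats

# Export all data as JSON
genai-toolkit export json

# Search entries that mention latency
genai-toolkit search latency

# View recent activity
genai-toolkit recent

# Health check
genai-toolkit status
```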
Powered by BytesAgain | bytesagain.com | hello@bytesagain.com