Tool Calling (Deep Workflow)

Tool calling is contract design between a probabilistic planner (the model) and deterministic systems. Failures are usually schema, permissions, or ambiguity—not the LLM “being dumb.”

When to Offer This Workflow

Trigger conditions:

- Designing OpenAI/Anthropic-style functions, MCP tools, or internal JSON tool protocols
- Debugging wrong arguments, hallucinated parameters, or unsafe side effects
- Building agents with many tools, where selection and routing become problems

Initial offer:

Use six stages: (1) define tool surface, (2) schema & validation, (3) authz & safety, (4) execution semantics, (5) errors & observability, (6) evaluation & regression. Confirm side-effect class (read-only vs write).

Stage 1: Define Tool Surface

Goal: Minimize tools; maximize clarity per tool.

Principles

- One action per tool when possible; avoid mega-tools with mode flags unless necessary
- Descriptive names: search_orders, not do_stuff
- Prefer idempotent operations where writes exist; separate read vs write clearly

Anti-patterns

- Exposing raw SQL or shell to the model
- Too many overlapping tools → routing errors

Exit condition: Tool list with purpose, inputs, outputs, side effects table.
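
The tool table named in the exit condition can also live in code, so it can be checked mechanically. The following is a minimal sketch, not a prescribed format: `ToolSpec`, `SideEffect`, and the three order tools are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class SideEffect(Enum):
    READ_ONLY = "read-only"
    WRITE = "write"


@dataclass(frozen=True)
class ToolSpec:
    name: str
    purpose: str
    inputs: tuple
    outputs: str
    side_effect: SideEffect


# Hypothetical tool surface for an order-support agent.
TOOLS = [
    ToolSpec("search_orders", "Find the user's orders by status",
             ("status", "limit"), "list of order summaries", SideEffect.READ_ONLY),
    ToolSpec("get_order", "Fetch one order by ID",
             ("order_id",), "order details", SideEffect.READ_ONLY),
    ToolSpec("cancel_order", "Cancel an open order",
             ("order_id", "idempotency_key"), "cancellation status", SideEffect.WRITE),
]


def write_tools(tools):
    """Names of tools with side effects; these need the Stage 3 and 4 controls."""
    return [t.name for t in tools if t.side_effect is SideEffect.WRITE]


def duplicate_names(tools):
    """Names that appear more than once; any hit is a routing hazard."""
    seen, dupes = set(), set()
    for t in tools:
        (dupes if t.name in seen else seen).add(t.name)
    return sorted(dupes)
```

Keeping the side-effect class as data makes it easy to require extra review for every tool that `write_tools` returns.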

Stage 2: Schema & Validation

Goal: Arguments are typed, constrained, and machine-validated before execution.

Practices

- JSON Schema: enums, min/max, patterns, required fields
- Normalize dates, IDs, currencies server-side; never trust model formatting alone
- Default behaviors explicit in description + schema

Descriptions

- The model reads tool and parameter docstrings, so use precise language and include examples of valid arguments

Exit condition: Validator rejects invalid args with actionable errors back to model or orchestrator.
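
To make the exit condition concrete, here is a minimal sketch of server-side validation that returns actionable errors. It is hand-rolled and handles only the few JSON Schema keywords used in this example; a production system would more likely use a full JSON Schema validator. The schema for `search_orders` and the function name `validate_args` are assumptions for illustration.

```python
import re

# Hypothetical schema for a search_orders tool.
SEARCH_ORDERS_SCHEMA = {
    "type": "object",
    "required": ["status"],
    "properties": {
        "status": {"type": "string", "enum": ["open", "shipped", "cancelled"]},
        "limit": {"type": "integer", "minimum": 1, "maximum": 50, "default": 10},
        "placed_after": {"type": "string", "pattern": r"\d{4}-\d{2}-\d{2}"},
    },
}


def validate_args(args: dict, schema: dict) -> tuple[dict, list[str]]:
    """Return (args_with_defaults, errors). Supports only the keywords above."""
    props = schema["properties"]
    errors = [f"missing required parameter '{n}'"
              for n in schema.get("required", []) if n not in args]
    # Reject hallucinated parameters instead of silently ignoring them.
    errors += [f"unknown parameter '{n}'; allowed: {sorted(props)}"
               for n in args if n not in props]
    normalized = {}
    for name, rule in props.items():
        if name not in args:
            if "default" in rule:
                normalized[name] = rule["default"]  # make defaults explicit
            continue
        value = args[name]
        if rule["type"] == "integer":
            # bool is a subclass of int in Python, so exclude it explicitly.
            if not isinstance(value, int) or isinstance(value, bool):
                errors.append(f"'{name}' must be an integer")
                continue
            lo, hi = rule.get("minimum"), rule.get("maximum")
            if (lo is not None and value < lo) or (hi is not None and value > hi):
                errors.append(f"'{name}' must be between {lo} and {hi}")
                continue
        elif rule["type"] == "string":
            if not isinstance(value, str):
                errors.append(f"'{name}' must be a string")
                continue
            if "enum" in rule and value not in rule["enum"]:
                errors.append(f"'{name}' must be one of {rule['enum']}")
                continue
            if "pattern" in rule and not re.fullmatch(rule["pattern"], value):
                errors.append(f"'{name}' must match pattern {rule['pattern']}")
                continue
        normalized[name] = value
    return normalized, errors
```

The error strings name the parameter and the allowed values, which gives the model or orchestrator enough to retry with corrected arguments.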

Stage 3: Authorization & Safety

Goal: Every tool call runs as some principal with least privilege.

Patterns

- User-scoped credentials carried from session; tool implementation re-checks ownership (e.g., order_id belongs to user)
- Admin tools behind explicit allowlists and human approval when needed
- Rate limits per user + global circuit breakers

Data exfiltration

- Tools that read sensitive data need output filtering and logging policies

Exit condition: Threat brief: “What if model is tricked into calling tool X?” answered.
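
The ownership re-check and admin allowlist described above might look like the sketch below. Everything here is an assumption for illustration: `Principal`, `ToolDenied`, the in-memory `ORDER_OWNERS` map standing in for a real data store, and the `refund_order` admin tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    user_id: str
    roles: frozenset = frozenset()


class ToolDenied(Exception):
    def __init__(self, code: str, message: str):
        super().__init__(message)
        self.code = code


# Stand-in for a real data store: order_id -> owning user_id.
ORDER_OWNERS = {"ord_1": "alice", "ord_2": "bob"}

# Explicit allowlist of tools that require the admin role.
ADMIN_TOOLS = {"refund_order"}


def authorize(principal: Principal, tool_name: str, args: dict) -> None:
    """Raise ToolDenied unless the principal may make this call."""
    is_admin = "admin" in principal.roles
    if tool_name in ADMIN_TOOLS and not is_admin:
        raise ToolDenied("PERMISSION_DENIED", f"{tool_name} requires the admin role")
    order_id = args.get("order_id")
    if order_id is not None and not is_admin:
        # Re-check ownership on every call; never trust an ID the model supplied.
        if ORDER_OWNERS.get(order_id) != principal.user_id:
            # Same error for missing and foreign orders, so the response
            # does not reveal which order IDs exist.
            raise ToolDenied("ORDER_NOT_FOUND", "no such order for this user")
```

The principal comes from the session, not from tool arguments, so a prompt-injected `order_id` for another user fails the same way as a nonexistent one.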

Stage 4: Execution Semantics

Goal: Clear transactionality, retries, and idempotency.

Design

- Idempotency keys for writes; dedupe window
- Timeouts and cancellation propagation
- Ordering: parallel safe vs must be serial

Long operations

- Async jobs with poll tool vs blocking calls—prefer non-blocking for UX and cost

Exit condition: Semantics documented for retry behavior (at-least-once delivery common).
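
Under at-least-once delivery, the same write can arrive twice. A minimal sketch of idempotency keys with a dedupe window follows; it is in-memory, single-process, and not thread-safe, and the class name `IdempotentExecutor` is hypothetical. A real deployment would typically back this with a shared store.

```python
import time


class IdempotentExecutor:
    """Replays the stored result for a repeated key inside the dedupe window."""

    def __init__(self, window_seconds: float = 300.0, clock=time.monotonic):
        self._window = window_seconds
        self._clock = clock  # injectable so tests can control time
        self._seen = {}      # key -> (timestamp, result)

    def run(self, key: str, operation):
        now = self._clock()
        # Drop entries that have aged out of the dedupe window.
        self._seen = {k: v for k, v in self._seen.items()
                      if now - v[0] < self._window}
        if key in self._seen:
            return self._seen[key][1]  # retry: return the earlier result
        result = operation()
        self._seen[key] = (now, result)
        return result
```

A retried call with the same key returns the first result without running the write again; once the window has passed, the key is treated as new.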

Stage 5: Errors & Observability

Goal: Model (or orchestrator) can recover from failures without leaking internals.

Error messages

- Structured error codes: ORDER_NOT_FOUND, PERMISSION_DENIED
- Hints for the model on how to fix the call, without exposing stack traces to end users

Observability

- Trace IDs across tool calls; audit log for write tools (who/when/args hash)

Exit condition: Dashboards/alerts on tool error rate, latency, denials.
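
One way to combine structured errors with trace IDs is a thin wrapper around every tool function. This is a sketch under assumptions: `ToolError`, `call_tool`, the envelope shape, and the `INTERNAL_ERROR` code are illustrative names, not part of any particular framework.

```python
import logging
import uuid

logger = logging.getLogger("tools")


class ToolError(Exception):
    """An expected failure with a stable code and a hint the model can act on."""

    def __init__(self, code: str, hint: str):
        super().__init__(code)
        self.code = code
        self.hint = hint


def call_tool(fn, args: dict, trace_id: str = "") -> dict:
    """Run a tool and return a structured envelope; never leak a stack trace."""
    trace_id = trace_id or uuid.uuid4().hex
    try:
        return {"ok": True, "trace_id": trace_id, "result": fn(**args)}
    except ToolError as exc:
        logger.info("tool error code=%s trace=%s", exc.code, trace_id)
        return {"ok": False, "trace_id": trace_id,
                "error": {"code": exc.code, "hint": exc.hint}}
    except Exception:
        # Full details go to the server log only, keyed by trace ID.
        logger.exception("unexpected tool failure trace=%s", trace_id)
        return {"ok": False, "trace_id": trace_id,
                "error": {"code": "INTERNAL_ERROR",
                          "hint": "Retry later or report the trace ID."}}


def get_order(order_id: str) -> dict:
    """Hypothetical tool used to exercise the wrapper."""
    if order_id != "ord_1":
        raise ToolError("ORDER_NOT_FOUND", "Check the ID with search_orders first.")
    return {"order_id": order_id, "status": "open"}
```

The model sees only the code and hint, while operators can look up the full exception by trace ID in the logs.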

Stage 6: Evaluation & Regression

Goal: Tool changes are tested like APIs.

Harness

- Golden conversations with expected tool calls (args normalized)
- Adversarial prompts attempting privilege escalation
- Version tools; deprecate with compatibility window

Exit condition: CI or manual eval suite before deploying new tools/schemas.
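
A golden-conversation check can be as small as comparing normalized tool calls. The sketch below assumes a simple call shape (`{"name": ..., "args": {...}}`) and a normalization rule (trim and lowercase string values) that would need adjusting for case-sensitive fields such as IDs; both are assumptions for illustration.

```python
def normalize_call(call: dict) -> tuple:
    """Canonical form of a tool call so formatting noise does not fail tests."""
    args = {}
    for key, value in call["args"].items():
        if isinstance(value, str):
            value = value.strip().lower()  # assumption: values are case-insensitive
        args[key] = value
    # Sort by parameter name so argument order does not matter.
    return (call["name"], tuple(sorted(args.items())))


# Hypothetical golden case: prompt plus the tool calls it should produce.
GOLDEN = [
    {
        "prompt": "Show my open orders",
        "expected": [{"name": "search_orders", "args": {"status": "open"}}],
    },
]


def check_golden(case: dict, actual_calls: list) -> bool:
    """True when the actual calls match the expected ones after normalization."""
    expected = [normalize_call(c) for c in case["expected"]]
    actual = [normalize_call(c) for c in actual_calls]
    return expected == actual
```

In CI, `actual_calls` would come from running the prompt through the agent; the same harness can hold adversarial prompts whose expected call list is empty.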

Final Review Checklist

- [ ] Minimal orthogonal tool set
- [ ] Strict schema validation on server
- [ ] AuthZ enforced per call; sensitive reads controlled
- [ ] Idempotency and timeouts defined for writes
- [ ] Structured errors + observability + eval harness

Tips for Effective Guidance

- Treat tool descriptions as API docs the model reads; iterate wording like UX copy.
- Recommend two-step patterns for dangerous ops: propose → confirm (human or policy).
- When using MCP, apply the same discipline: the server must validate everything.
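
The propose → confirm pattern can be sketched as a gate that hands back a one-time token instead of executing. `ConfirmationGate`, its method names, and the response shapes are hypothetical; the confirmer may be a human or a policy engine, which this sketch leaves out.

```python
import secrets


class ConfirmationGate:
    """Holds dangerous calls until a separate confirm step presents the token."""

    def __init__(self):
        self._pending = {}  # token -> (tool_name, args)

    def propose(self, tool_name: str, args: dict) -> dict:
        token = secrets.token_hex(8)
        # Copy args so later mutation cannot change what was approved.
        self._pending[token] = (tool_name, dict(args))
        return {"status": "needs_confirmation", "token": token,
                "summary": f"{tool_name} {args}"}

    def confirm(self, token: str, execute) -> dict:
        # pop() makes each token single-use.
        proposal = self._pending.pop(token, None)
        if proposal is None:
            return {"status": "error", "code": "UNKNOWN_OR_USED_TOKEN"}
        tool_name, args = proposal
        return {"status": "done", "result": execute(tool_name, args)}
```

The write runs only in `confirm`, with the arguments captured at proposal time, so the model cannot swap in different arguments after approval.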

Handling Deviations

- Read-only RAG: fewer semantic risks, but still validate query args and guard against injection into search backends.
- Local tools (filesystem): sandbox, path allowlists, size limits.
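
For filesystem tools, a path allowlist and a size limit might be sketched as follows. The function names and the 1 MB default are assumptions; `Path.is_relative_to` requires Python 3.9 or later. Resolving the path also resolves symlinks at check time, which narrows but does not remove races with concurrent filesystem changes.

```python
from pathlib import Path


def resolve_in_sandbox(root: Path, user_path: str) -> Path:
    """Resolve user_path under root; reject anything that escapes it."""
    root = root.resolve()
    # Joining an absolute user_path discards root, so the check below
    # catches absolute paths as well as ../ traversal.
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):
        raise PermissionError(f"path escapes sandbox: {user_path}")
    return candidate


def read_limited(root: Path, user_path: str, max_bytes: int = 1_000_000) -> bytes:
    """Read a sandboxed file, refusing files larger than max_bytes."""
    path = resolve_in_sandbox(root, user_path)
    if path.stat().st_size > max_bytes:
        raise ValueError(f"file exceeds {max_bytes} bytes: {user_path}")
    return path.read_bytes()
```

The model only ever supplies `user_path`; the sandbox root is fixed by the server.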
