Production Agent Design

Core Principle

The LLM is the reasoning engine. Your code is the execution engine. The loop is the contract between them.

Every production concern — safety, cost, retries, logging, permissions — lives in the harness, not the prompt. A prompt that says "be careful with deletions" is a suggestion. A GuardedToolNode that intercepts delete_* calls is a guarantee.
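To make the difference between a suggestion and a guarantee concrete, here is a minimal sketch of a guard that sits between the model's tool request and the tool itself. The source names GuardedToolNode but does not show its implementation, so the class body, the ToolBlocked exception, the approve callback, and the two stand-in tools are all assumptions made for illustration. It is written in plain Python and does not use any framework API.

```python
from fnmatch import fnmatchcase
from typing import Any, Callable, Dict, Optional, Sequence


class ToolBlocked(Exception):
    """Raised when the harness refuses to run a tool call."""


class GuardedToolNode:
    """Hypothetical guard: runs tools by name, gating names that match a blocked pattern."""

    def __init__(
        self,
        tools: Dict[str, Callable[..., Any]],
        blocked_patterns: Sequence[str] = ("delete_*",),
        approve: Optional[Callable[[str, dict], bool]] = None,
    ) -> None:
        self.tools = tools
        self.blocked_patterns = tuple(blocked_patterns)
        # With no approver configured, every guarded call is denied.
        self.approve = approve or (lambda name, args: False)

    def call(self, name: str, args: dict) -> Any:
        if name not in self.tools:
            raise ToolBlocked(f"unknown tool: {name}")
        guarded = any(fnmatchcase(name, p) for p in self.blocked_patterns)
        if guarded and not self.approve(name, args):
            raise ToolBlocked(f"{name} requires approval")
        return self.tools[name](**args)


# Usage with two stand-in tools (names are illustrative).
deleted = []
node = GuardedToolNode(
    tools={
        "read_file": lambda path: f"contents of {path}",
        "delete_file": lambda path: deleted.append(path),
    }
)
node.call("read_file", {"path": "notes.txt"})        # runs normally
try:
    node.call("delete_file", {"path": "notes.txt"})  # intercepted before execution
except ToolBlocked as exc:
    print(exc)  # delete_file requires approval
```

The guard denies by default: a delete_* call only runs when an approver explicitly returns True, which is where a human-in-the-loop step could plug in.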

When to Use This Skill

- Designing a new multi-agent system from scratch
- Adding safety, cost controls, or observability to an existing agent
- Debugging runaway cost, infinite loops, or context window exhaustion
- Choosing between single-agent vs multi-agent topology
- Implementing human-in-the-loop (HITL) for irreversible actions
- Setting up session persistence and resumption

Architecture at a Glance

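Entry point (HTTP / CLI / Webhook / Scheduler)
│
Router layer: classify intent, dispatch cheaply
│
Orchestrator: decompose tasks, delegate to specialists
├── Agent A (scoped tools)
└── Agent B (scoped tools)
│
Tool layer: validate schema → check permissions → execute → truncate
│
Cross-cutting concerns
├── Memory (short-term / working / long-term)
├── Observability (tracing, cost, session replay)
└── Resilience (retries, circuit breakers, loop guards)
│
Persistence: checkpoints (Redis / Postgres) + audit log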

Single Agent vs Multi-Agent

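Is the task confined to a single domain?
  Yes → single ReAct agent with the appropriate tools
  No → are the subtasks independent?
    Yes → parallel multi-agent (supervisor + specialists)
    No → sequential / hierarchical orchestrator
      │
      Are there irreversible steps that need human review?
        Yes → plan-then-execute with HITL interrupts
        No → orchestrator with automatic delegation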

Rule: Start with a single agent. Add multi-agent complexity only when you hit a concrete limit — context window size, tool set sprawl, latency, or accuracy.

Framework Selection

| Need | Use |
| --- | --- |
| Complex branching, HITL, durable persistence, fine-grained control | LangGraph |
| Simple loop, minimal boilerplate, rapid prototype, leaf agents | Strands |
| Orchestration graph + simple leaf agents | LangGraph + Strands hybrid |

Reference Files

Load these on demand using the triggers listed below. Do not load all of them upfront.

| File | Load when... |
| --- | --- |
| references/router-layer.md | Designing intent routing, building a classifier node, handling misrouting |
| references/orchestrator-layer.md | Decomposing tasks, spawning subagents, implementing plan-then-execute |
| references/tool-safety-layer.md | Designing tools, adding permission rules, implementing HITL or killswitch |
| references/memory-layer.md | Context window approaching limit, adding long-term memory, injecting project context |
| references/observability-layer.md | Adding tracing, tracking token cost, debugging agent behavior, setting up alerts |
| references/resilience-layer.md | Adding retry logic, circuit breakers, preventing infinite loops |
| references/persistence-layer.md | Choosing a checkpointer, implementing session resume, session branching |
| references/production-checklist.md | Before deploying to production (full ~40-point readiness checklist) |

Quick Reference

| Pattern | Key implementation | Reference |
| --- | --- | --- |
| Intent routing | conditional_edges + confidence threshold | router-layer.md |
| Scoped subagents | | |
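The intent routing pattern pairs a routing decision with a confidence threshold. The sketch below shows one way that combination might look in plain Python. The threshold value, the fallback route name, and the keyword classifier are hypothetical and are not taken from the reference files; a real system would presumably use an LLM or a trained classifier. In LangGraph, a function of this shape is the kind of callable usually passed to add_conditional_edges to pick the next node.

```python
from typing import Callable, Iterable, Tuple

# Illustrative values, not taken from the referenced files.
CONFIDENCE_THRESHOLD = 0.7
FALLBACK_ROUTE = "clarify"

Classifier = Callable[[str], Tuple[str, float]]


def route_intent(
    message: str,
    classify: Classifier,
    known_routes: Iterable[str],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> str:
    """Return the classified route, or the fallback when the result is not trustworthy."""
    intent, confidence = classify(message)
    # Unknown labels and low-confidence guesses both go to the fallback
    # instead of being dispatched to a specialist.
    if intent not in set(known_routes) or confidence < threshold:
        return FALLBACK_ROUTE
    return intent


def keyword_classifier(message: str) -> Tuple[str, float]:
    """Stand-in for an LLM or model-based classifier."""
    text = message.lower()
    if "refund" in text:
        return "billing", 0.92
    if "error" in text:
        return "support", 0.55
    return "unknown", 0.10


routes = {"billing", "support"}
print(route_intent("I need a refund", keyword_classifier, routes))  # billing
print(route_intent("I saw an error", keyword_classifier, routes))   # clarify
```

Sending low-confidence messages to a clarification step is one way to handle misrouting: the cost of a wrong guess is a follow-up question rather than a specialist agent working on the wrong task.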

Gotchas

- Safety rules must be code, not prompts. A prompt saying "don't delete production data" is not a safety control.
- Never dump the full parent message history into a subagent. Pass only the specific task and relevant data; context pollution degrades performance and wastes tokens.
- InMemorySaver is for development only. Use Redis or Postgres checkpointers in production.
- interrupt() pauses the graph. Resume it by calling graph.invoke(Command(resume=...), config=config); forgetting this leaves the agent stuck.
- Tool result truncation is mandatory. Large tool outputs (file reads, search results) will exhaust the context window if not truncated before returning.
- Always set max_iterations. Without a loop guard, a miscalibrated agent runs indefinitely and incurs unbounded cost.
- Apply compaction in tiers: budget tool results → snip → microcompact → autocompact. Jumping straight to full summarization wastes tokens when a cheaper step would suffice.
- Track diminishing returns, not just token budget. An agent can burn through its iteration budget producing nearly empty continuations. Stop when the last 2 deltas are both below ~500 tokens.
- Snapshot config at query entry. Never re-read feature flags or env vars mid-turn: a remote config change during a 30-second response causes inconsistent behavior within a single turn.
- Concurrency safety must be checked at runtime. Schema metadata cannot determine whether a bash command is safe, so inspect the actual input string at call time. If parsing fails, fail conservatively and run the call serially.
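Three of the gotchas above (tool result truncation, max_iterations, and the diminishing-returns check) can live in one small loop guard. The sketch below is a framework-free illustration. The character limit, the whitespace-based token estimate, and the names truncate_result, stop_reason, and run_loop are assumptions. Only the "last 2 deltas below ~500 tokens" rule comes from the text.

```python
from typing import Callable, List, Optional, Tuple

MAX_ITERATIONS = 20           # hard loop guard (assumed default)
MAX_TOOL_RESULT_CHARS = 4000  # assumed per-result budget
MIN_DELTA_TOKENS = 500        # from the rule above: ~500 tokens
STALL_WINDOW = 2              # from the rule above: last 2 deltas


def truncate_result(text: str, limit: int = MAX_TOOL_RESULT_CHARS) -> str:
    """Cut an oversized tool result and say how much was dropped."""
    if len(text) <= limit:
        return text
    return text[:limit] + f"\n[truncated {len(text) - limit} characters]"


def approx_tokens(text: str) -> int:
    """Crude whitespace estimate; a real harness would use the model's tokenizer."""
    return len(text.split())


def stop_reason(deltas: List[int], max_iterations: int = MAX_ITERATIONS) -> Optional[str]:
    """Return why the loop should stop, or None to keep going."""
    if len(deltas) >= max_iterations:
        return "max_iterations"
    recent = deltas[-STALL_WINDOW:]
    if len(recent) == STALL_WINDOW and all(d < MIN_DELTA_TOKENS for d in recent):
        return "diminishing_returns"
    return None


def run_loop(
    step: Callable[[int], Optional[str]],
    max_iterations: int = MAX_ITERATIONS,
) -> Tuple[List[str], str]:
    """Call step(i) until it returns None or a guard fires."""
    outputs: List[str] = []
    deltas: List[int] = []
    while True:
        reason = stop_reason(deltas, max_iterations=max_iterations)
        if reason is not None:
            return outputs, reason
        raw = step(len(deltas))
        if raw is None:  # the step signals that the task is complete
            return outputs, "done"
        result = truncate_result(raw)
        outputs.append(result)
        deltas.append(approx_tokens(result))
```

Returning the stop reason alongside the outputs lets the caller log whether a run ended normally, hit the iteration cap, or stalled, which helps when debugging runaway cost.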
