Agent Orchestration 🦞

By Hal Labs — Part of the Hal Stack

Your agents fail because your prompts suck. This skill fixes that.

The Core Problem

You're not prompting. You're praying.

Most prompts are wishes tossed into the void:

❌ Research the best vector database and write a report

You type something reasonable. The output is mid. You rephrase. Still mid. You add keywords. Somehow worse. You blame the model.

Here's what you don't understand: A language model is a pattern-completion engine. It generates the most statistically probable output given your input.

Vague input → generic output. Not because the model is dumb. Because generic is what's most probable when you give it nothing specific to work with.

The model honored exactly what you asked for. You just didn't realize how little you gave it.

The Core Reframe

A prompt is not a request. A prompt is a contract.

Every contract must answer four non-negotiables:

Element	Question
Role	Who is the model role-playing as?
Task	What must it do?
Constraints	What rules apply?
Output	What does done look like?

Miss one, the model fills the gap with assumptions. Assumptions are where hallucinations are born.
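One way to make the contract idea concrete is to refuse to send a prompt while any of the four elements is blank. The sketch below is a minimal illustration, not part of the skill itself; the class name `PromptContract` and its field names are assumptions.

```python
from dataclasses import dataclass, fields


@dataclass
class PromptContract:
    """The four non-negotiables; class and field names are illustrative."""
    role: str
    task: str
    constraints: str
    output: str

    def missing(self) -> list[str]:
        # An element counts as missing when it is empty or whitespace only.
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


contract = PromptContract(
    role="Senior product marketing specialist",
    task="Write a 500-word product description",
    constraints="",
    output="JSON object with headline and body",
)
print(contract.missing())  # ['constraints']
```

Checking before spawning turns "the model filled the gap with an assumption" into an error you see up front.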

The 5-Layer Architecture

Effective prompts share a specific structure. This maps to how models actually process information.

Layer 1: Identity

Who is the model in this conversation?

Not "helpful assistant" but a specific role with specific expertise:

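You are a senior product marketing specialist focused on B2B SaaS positioning.
You have 15 years of experience translating technical features into emotional benefits.
You write in short sentences. You never use jargon without explaining it.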

The model doesn't "become" this identity—it accesses different clusters of training data, different stylistic patterns, different reasoning approaches.

Identity matters. Miss this and you get generic output.

Layer 2: Context

What does the model need to know to do this task exceptionally well?

Context must be:

- Ordered: most important first
- Scoped: only what's relevant
- Labeled: what's rules vs. editable vs. historical

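Context

Rules (never change)

- Design system: Tailwind, shadcn components
- Tone: professional but warm, never stiff

Current state (may change)

- Landing page lives at /landing
- Uses Next.js 14 with the App Router

History (for reference)

- Originally built with Create React App, migrated in January 2025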

Without labels, the model treats everything as equally optional. Then it rewrites your core logic halfway through.
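If context is assembled programmatically, the labels can be applied by the code rather than remembered by the author. The following is a rough sketch under that assumption; `render_context` and the three label strings are hypothetical names chosen for illustration.

```python
def render_context(rules: list[str], state: list[str], history: list[str]) -> str:
    """Render context with explicit labels, most binding section first."""
    sections = [
        ("Rules (never change)", rules),
        ("Current state (may change)", state),
        ("History (for reference)", history),
    ]
    lines = ["Context"]
    for label, items in sections:
        if not items:
            continue  # scoped: leave out sections with nothing relevant
        lines.append("")
        lines.append(label)
        lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)


print(render_context(
    rules=["Design system: Tailwind, shadcn components"],
    state=["Landing page lives at /landing"],
    history=[],
))
```

Because the section order is fixed in one place, every prompt built this way puts the rules first.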

Layer 3: Task

What specific action must be taken?

Not "write something about X" but precise instructions:

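Task

Write a 500-word product description that:

- Emphasizes the time-saving benefits for busy executives
- Opens with the primary pain point
- Includes 3 specific use cases
- Ends with a clear call to action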

The more precisely you define the task, the more precisely the model executes.

Layer 4: Process ⚡

This is where most prompts fail.

You're asking for output. You should be specifying how that output gets produced.

❌ Bad:
Write me a marketing page.

✅ Good:
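Process

1. First, analyze the target audience and identify their primary pain points
2. Then, define the positioning that addresses those pain points
3. Then, write the page
4. Show your reasoning at each step
5. Do not skip any step
6. Audit your work before reporting done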

You don't just want an answer. You want to control how the answer is formed.

Think like a director. You're not asking for a scene—you're directing how the scene gets built.

Layer 5: Output

What does "done" actually look like?

If you don't specify, you get whatever format the model defaults to.

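Output format

Return a JSON object with:

- headline: string (max 60 characters)
- subheadline: string (max 120 characters)
- body: string (markdown)
- cta: string (action verb + benefit)

Do not include explanations, notes, or commentary. Return only the JSON.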
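An exact output spec is useful because it can be checked by code. The sketch below assumes a hypothetical spec of four string fields (`headline`, `subheadline`, `body`, `cta`) with length limits of 60 and 120 characters on the first two; the function name `check_output` is also an assumption.

```python
import json

# Hypothetical spec: four string fields, two of them length-limited.
MAX_LENGTHS = {"headline": 60, "subheadline": 120}
REQUIRED_KEYS = ("headline", "subheadline", "body", "cta")


def check_output(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the reply meets the spec."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["reply is not valid JSON"]
    if not isinstance(data, dict):
        return ["reply is not a JSON object"]
    problems = []
    for key in REQUIRED_KEYS:
        if not isinstance(data.get(key), str):
            problems.append(f"{key}: missing or not a string")
    for key, limit in MAX_LENGTHS.items():
        value = data.get(key)
        if isinstance(value, str) and len(value) > limit:
            problems.append(f"{key}: longer than {limit} characters")
    return problems
```

A non-empty result can be fed back to the agent as a concrete correction instead of a vague "try again".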

Miss one layer, the structure wobbles. Miss two, it collapses.

Model Selection

Prompt portability is a myth.

Different models are different specialists. You wouldn't give identical instructions to your exec assistant, designer, and backend dev.

Model Type	Best For	Watch Out For
Claude Opus	Complex reasoning, nuanced writing, long context	Expensive, can be verbose
Claude Sonnet

Adapt your prompts per model:

- Some prefer structured natural language
- Some need explicit step sequencing
- Some collapse under verbose prompts
- Some ignore constraints unless repeated
- Some excel at analysis but suck at creativity

The person who writes model-specific prompts will outperform the person with "better ideas" every time.

Constraints Are Instructions

Vagueness isn't flexibility. It's cowardice.

You hedge because being specific feels risky. But the model doesn't read your mind.

Constraints are not limitations. Constraints are instructions.

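Constraints

- Never change the existing design system
- Always keep the established tone and style
- Never change the data model without explicit approval
- At most 3 API calls per operation
- If unsure, ask instead of assuming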

Every conversation starts at zero. The model doesn't have accumulated context from working with you. Consistency comes from instruction, not memory.

Canonical Documentation

If you don't have docs, you're gambling.

Document	Purpose
PRD	What we're building and why
Design System

The rule: Reference docs in your prompts.

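The attached PRD is the source of truth. Do not contradict it.
The design system in /docs/design.md must be followed exactly.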

Without explicit anchoring, the model assumes everything is mutable—including your core decisions.

"Good prompting isn't writing better sentences. It's anchoring the model to reality."

The Complete Template

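Identity

You are a [specific role] with [specific expertise].
[Behavioral traits and style]

Context

Rules (never change)

- [Constraint 1]
- [Constraint 2]

Current state

- [Relevant background]

Reference docs

- [Doc 1]: [what it contains]
- [Doc 2]: [what it contains]

Task

[Specific, measurable objective]

Process

1. First, [analysis step]
2. Then, [planning step]
3. Then, [execution step]
4. Finally, [verification step]

Show your reasoning at each step.

User stories

1. As a [user], I want [goal], so that [benefit]
2. As a [user], I want [goal], so that [benefit]

Output format

[Exact specification of the deliverable]

Constraints

- [Limit 1]
- [Limit 2]
- [What NOT to do]

Error handling

- If [situation]: [action]
- If blocked: [escalation]

Before reporting done

1. Check each user story
2. Verify that the output meets the requirements
3. If it does not, iterate until it does
4. Only then report done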
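A template like this can also be assembled by code that refuses to build a prompt with a layer missing. The sketch below covers only the five core layers and is an illustration under stated assumptions: `LAYERS` and `build_prompt` are hypothetical names, and a fuller version would add the remaining template sections.

```python
LAYERS = ("Identity", "Context", "Task", "Process", "Output format")


def build_prompt(sections: dict[str, str]) -> str:
    """Join the five layers in a fixed order, refusing to build if one is empty."""
    missing = [name for name in LAYERS if not sections.get(name, "").strip()]
    if missing:
        raise ValueError(f"missing layers: {', '.join(missing)}")
    return "\n\n".join(f"{name}\n{sections[name].strip()}" for name in LAYERS)


prompt = build_prompt({
    "Identity": "You are a senior product marketing specialist.",
    "Context": "Rules (never change)\n- Tone: professional but warm",
    "Task": "Write a 500-word product description.",
    "Process": "1. Analyze the audience\n2. Write the page",
    "Output format": "Return only a JSON object.",
})
print(prompt)
```

Raising on a missing layer is a deliberate choice here: a prompt that cannot be built is cheaper than an agent run that has to be thrown away.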

Ralph Mode

For complex tasks where first attempts often fail:

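Mode: Ralph

Keep trying until it works. Do not give up on the first failure.

If something goes wrong:

1. Debug and understand the cause
2. Try a different approach
3. Research how others have solved similar problems
4. Iterate until the user stories are satisfied

You have [N] attempts before escalating.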

When to use:

- Build tasks with multiple components
- Integration work
- Anything where first-try success is unlikely
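On the orchestrator side, the same idea is a bounded retry loop: try a different approach after each failure, and escalate once the attempt budget is spent. The sketch below is one possible shape, not the skill's implementation; `ralph_loop`, `approaches`, and `is_done` are hypothetical names.

```python
from typing import Callable, Optional


def ralph_loop(
    approaches: list[Callable[[], str]],
    is_done: Callable[[str], bool],
    max_attempts: int = 3,
) -> Optional[str]:
    """Try each approach in turn; return the first accepted result, else None."""
    for attempt, approach in enumerate(approaches[:max_attempts], start=1):
        try:
            result = approach()
        except Exception as exc:  # a failed attempt is data, not a reason to stop
            print(f"attempt {attempt} failed: {exc}")
            continue
        if is_done(result):
            return result
        print(f"attempt {attempt} did not satisfy the acceptance check")
    return None  # budget spent: the caller escalates


def flaky() -> str:
    raise RuntimeError("build failed")


print(ralph_loop([flaky, lambda: "draft", lambda: "final"], lambda r: r == "final"))
```

Here `is_done` stands in for "the user stories are satisfied", and a return value of `None` is the signal to escalate.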

Agent Tracking

Every spawned agent gets tracked. No orphans.

Maintain notes/areas/active-agents.md:

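Currently running

Label	Task	Spawned	Expected	Status
research-x	Competitor analysis	9:00 AM	15 min	🏃 Running

Completed today

Label	Task	Runtime	Result
builder-v2	Dashboard update	8 min	✅ Done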

Heartbeat check:

1. Run sessions_list --activeMinutes 120
2. Compare to tracking file
3. Investigate any missing/stalled agents
4. Log completions to LEARNINGS.md
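Steps 2 and 3 amount to a set comparison between the labels in the tracking file and the labels of live sessions. The source does not show the output format of `sessions_list`, so this sketch takes the live labels as a plain set. It also assumes the tracking table is tab-separated with the label in the first column. Both function names are hypothetical.

```python
def tracked_labels(table: str) -> set[str]:
    """Read the first column of a tab-separated tracking table, skipping the header."""
    rows = [line.split("\t") for line in table.strip().splitlines()]
    return {row[0].strip() for row in rows[1:] if row[0].strip()}


def heartbeat(tracked: set[str], active: set[str]) -> dict[str, set[str]]:
    """Compare labels in the tracking file with labels of live sessions."""
    return {
        "missing": tracked - active,    # tracked but not live: stalled or finished
        "untracked": active - tracked,  # live but never recorded: an orphan
    }


table = "Label\tTask\tStatus\nresearch-x\tCompetitor analysis\tRunning\n"
print(heartbeat(tracked_labels(table), {"research-x", "scout-1"}))
```

Anything under "missing" needs investigation. Anything under "untracked" is an orphan that should be added to the tracking file.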

The Learnings Loop

Every agent outcome is data. Capture it.

Maintain LEARNINGS.md:

CODEBLOCK13
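Capturing outcomes is easier to keep up when it is a single function call. The entry layout below (date, agent label, outcome, lesson on one line) is an assumption made for illustration and may differ from the format this skill intends. `format_learning` and `log_learning` are hypothetical helpers.

```python
from datetime import date
from pathlib import Path


def format_learning(label: str, outcome: str, lesson: str, day: date) -> str:
    """Build one entry; this layout is an assumption, not the skill's format."""
    return f"- {day.isoformat()} | {label} | {outcome} | {lesson}\n"


def log_learning(path: Path, entry: str) -> None:
    # Append so earlier entries are never overwritten.
    with path.open("a", encoding="utf-8") as handle:
        handle.write(entry)
```

A call such as `log_learning(Path("LEARNINGS.md"), format_learning(label, outcome, lesson, date.today()))` at the end of each agent run keeps the file current without relying on memory.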

Role Library

Build reusable role definitions:

CODEBLOCK14

Quick Reference

The 4 Non-Negotiables

1. Role: Who is the model?
2. Task: What must it do?
3. Constraints: What rules apply?
4. Output: What does done look like?

The 5 Layers

1. Identity: Specific role and expertise
2. Context: Ordered, scoped, labeled
3. Task: Precise objective
4. Process: How to approach (most overlooked!)
5. Output: Exact format specification

Pre-Spawn Checklist

- [ ] Identity assigned?
- [ ] Context labeled (rules/state/history)?
- [ ] Task specific and measurable?
- [ ] Process described (not just output)?
- [ ] User stories defined?
- [ ] Output format specified?
- [ ] Constraints explicit?
- [ ] Error handling included?
- [ ] Added to tracking file?
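The checklist can be enforced mechanically before an agent is spawned. This is a minimal sketch; the item keys and the function name `unchecked` are illustrative, and each flag is expected to be set by whatever builds the prompt.

```python
CHECKLIST = (
    "identity", "context", "task", "process", "user_stories",
    "output_format", "constraints", "error_handling", "tracked",
)


def unchecked(spec: dict[str, bool]) -> list[str]:
    """Return checklist items that are absent or False, in checklist order."""
    return [item for item in CHECKLIST if not spec.get(item, False)]


print(unchecked({"identity": True, "context": True, "task": True}))
```

An empty list means the spawn can go ahead; otherwise the returned names say exactly what is still missing.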

The Final Truth

The gap between "AI doesn't work for me" and exceptional results isn't intelligence or access.

One group treats prompting as conversation. The other treats it as engineering a system command.

The model matches your level of rigor.

- Vague inputs → generic outputs
- Structured inputs → structured outputs
- Clear thinking → clear results

You don't need to be smarter. You need to be clearer.

Clarity is a system, not a talent.

Part of the Hal Stack 🦞

Got a skill idea? Email: halthelobster@protonmail.com

"You're not prompting, you're praying. Start engineering."

Agent Orchestration 🦞

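By Hal Labs, part of the Hal Stack

Your agents fail because your prompts suck. This skill fixes that.

The Core Problem

You're not prompting. You're praying.

Most prompts are wishes tossed into the void:

❌ Research the best vector database and write a report

You type something reasonable. The output is mid. You rephrase. Still mid. You add keywords. Somehow worse. You blame the model.

Here's what you don't understand: A language model is a pattern-completion engine. It generates the most statistically probable output given your input.

Vague input → generic output. Not because the model is dumb. Because generic is what's most probable when you give it nothing specific to work with.

The model honored exactly what you asked for. You just didn't realize how little you gave it.

The Core Reframe

A prompt is not a request. A prompt is a contract.

Every contract must answer four non-negotiables:

Element	Question
Role	Who is the model role-playing as?
Task	What must it do?
Constraints	What rules apply?
Output	What does done look like?

Miss one, the model fills the gap with assumptions. Assumptions are where hallucinations are born.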

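The 5-Layer Architecture

Effective prompts share a specific structure. This maps to how models actually process information.

Layer 1: Identity

Who is the model in this conversation?

Not "helpful assistant" but a specific role with specific expertise:

You are a senior product marketing specialist focused on B2B SaaS positioning.
You have 15 years of experience translating technical features into emotional benefits.
You write in short sentences. You never use jargon without explaining it.

The model doesn't "become" this identity; it accesses different clusters of training data, different stylistic patterns, different reasoning approaches.

Identity matters. Miss this and you get generic output.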

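Layer 2: Context

What does the model need to know to do this task exceptionally well?

Context must be:

- Ordered: most important first
- Scoped: only what's relevant
- Labeled: what's rules vs. editable vs. historical

Context

Rules (never change)

- Design system: Tailwind, shadcn components
- Tone: professional but warm, never stiff

Current state (may change)

- Landing page lives at /landing
- Uses Next.js 14 with the App Router

History (for reference)

- Originally built with Create React App, migrated in January 2025

Without labels, the model treats everything as equally optional. Then it rewrites your core logic halfway through.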

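Layer 3: Task

What specific action must be taken?

Not "write something about X" but precise instructions:

Task

Write a 500-word product description that:

- Emphasizes the time-saving benefits for busy executives
- Opens with the primary pain point
- Includes 3 specific use cases
- Ends with a clear call to action

The more precisely you define the task, the more precisely the model executes.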

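Layer 4: Process ⚡

This is where most prompts fail.

You're asking for output. You should be specifying how that output gets produced.

❌ Bad:

Write me a marketing page.

✅ Good:

Process

1. First, analyze the target audience and identify their primary pain points
2. Then, define the positioning that addresses those pain points
3. Then, write the page
4. Show your reasoning at each step
5. Do not skip any step
6. Audit your work before reporting done

You don't just want an answer. You want to control how the answer is formed.

Think like a director. You're not asking for a scene; you're directing how the scene gets built.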

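Layer 5: Output

What does "done" actually look like?

If you don't specify, you get whatever format the model defaults to.

Output format

Return a JSON object with:

- headline: string (max 60 characters)
- subheadline: string (max 120 characters)
- body: string (markdown)
- cta: string (action verb + benefit)

Do not include explanations, notes, or commentary. Return only the JSON.

Miss one layer, the structure wobbles. Miss two, it collapses.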

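Model Selection

Prompt portability is a myth.

Different models are different specialists. You wouldn't give identical instructions to your exec assistant, designer, and backend dev.

Model Type	Best For	Watch Out For
Claude Opus	Complex reasoning, nuanced writing, long context	Expensive, can be verbose
Claude Sonnet

Adapt your prompts per model:

- Some prefer structured natural language
- Some need explicit step sequencing
- Some collapse under verbose prompts
- Some ignore constraints unless repeated
- Some excel at analysis but suck at creativity

The person who writes model-specific prompts will outperform the person with "better ideas" every time.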

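Constraints Are Instructions

Vagueness isn't flexibility. It's cowardice.

You hedge because being specific feels risky. But the model doesn't read your mind.

Constraints are not limitations. Constraints are instructions.

Constraints

- Never change the existing design system
- Always keep the established tone and style
- Never change the data model without explicit approval
- At most 3 API calls per operation
- If unsure, ask instead of assuming

Every conversation starts at zero. The model doesn't have accumulated context from working with you. Consistency comes from instruction, not memory.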

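Canonical Documentation

If you don't have docs, you're gambling.

Document	Purpose
PRD	What we're building and why
Design System

The rule: Reference docs in your prompts.

The attached PRD is the source of truth. Do not contradict it.
The design system in /docs/design.md must be followed exactly.

Without explicit anchoring, the model assumes everything is mutable, including your core decisions.

"Good prompting isn't writing better sentences. It's anchoring the model to reality."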

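The Complete Template

Identity

You are a [specific role] with [specific expertise].
[Behavioral traits and style]

Context

Rules (never change)

- [Constraint 1]
- [Constraint 2]

Current state

- [Relevant background]

Reference docs

- [Doc 1]: [what it contains]
- [Doc 2]: [what it contains]

Task

[Specific, measurable objective]

Process

1. First, [analysis step]
2. Then, [planning step]
3. Then, [execution step]
4. Finally, [verification step]

Show your reasoning at each step.

User stories

1. As a [user], I want [goal], so that [benefit]
2. As a [user], I want [goal], so that [benefit]

Output format

[Exact specification of the deliverable]

Constraints

- [Limit 1]
- [Limit 2]
- [What NOT to do]

Error handling

- If [situation]: [action]
- If blocked: [escalation]

Before reporting done

1. Check each user story
2. Verify that the output meets the requirements
3. If it does not, iterate until it does
4. Only then report done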

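Ralph Mode

For complex tasks where first attempts often fail:

Mode: Ralph

Keep trying until it works. Do not give up on the first failure.

If something goes wrong:

1. Debug and understand the cause
2. Try a different approach
3. Research how others have solved similar problems
4. Iterate until the user stories are satisfied

You have [N] attempts before escalating.

When to use:

- Build tasks with multiple components
- Integration work
- Anything where first-try success is unlikely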

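Agent Tracking

Every spawned agent gets tracked. No orphans.

Maintain notes/areas/active-agents.md:

Currently running

Label	Task	Spawned	Expected	Status
research-x	Competitor analysis	9:00 AM	15 min	🏃 Running

Completed today

Label	Task	Runtime	Result
builder-v2	Dashboard update	8 min	✅ Done

Heartbeat check:

1. Run sessions_list --activeMinutes 120
2. Compare to tracking file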
