Prompt Engineering (Deep Workflow)

Prompts behave like natural-language programs: they need specs, tests, and version control—especially in production.

When to Offer This Workflow

Trigger conditions:

- Prompt or system message change; quality regressions
Structured outputs (JSON), tool use, or RAG grounding requirements
Safety or policy alignment needs

Initial offer:

Use six stages: (1) define task & success, (2) constraints & format, (3) few-shot & style, (4) build eval set, (5) iterate with discipline, (6) ship, monitor, regress). Confirm model family and latency budget.

Stage 1: Define Task & Success

Goal: Clear user-visible outcome and failure modes (hallucination, omission, tone).

Exit condition: Success rubric in plain language; out-of-scope cases listed.

Stage 2: Constraints & Format

Goal: Must/must-not rules; output schema (JSON Schema, bullet structure); length limits.

Practices

- Separate system (policy, role) from user (task instance)
Ask model to cite sources when grounding matters

Stage 3: Few-Shot & Style

Goal: Use examples only when they reduce ambiguity—avoid huge prompt bloat.

Practices

- Diverse examples; avoid overlong negative examples that confuse

Stage 4: Build Eval Set

Goal: Frozen inputs with expected properties (not always exact text match).

Practices

- Adversarial and multilingual slices if relevant
Regression suite in CI for critical prompts

Stage 5: Iterate With Discipline

Goal: Change one major variable at a time when debugging quality.

Practices

- Compare with same temperature settings when A/B testing wording
Log prompt version id with outputs in production

Stage 6: Ship, Monitor, Regress

Goal: Canary prompt changes; watch implicit signals (thumbs, edits, task completion).

Final Review Checklist

- [ ] Task and rubric defined
[ ] Constraints and output format explicit
[ ] Eval set versioned; regression path exists
[ ] Iteration log disciplined; prompt versions tracked
[ ] Production monitoring and rollback plan

Tips for Effective Guidance

- Clarity beats cleverness—short explicit instructions often win.
Chain-of-thought: use when reasoning helps; hide chain from end users if needed.
Align with llm-evaluation skill for larger harness design.

Handling Deviations

- Chat vs batch: batch can use stricter structure and lower temperature.
Multimodal: specify how image details may be used or ignored.

提示工程（深度工作流）

提示词的行为类似于自然语言程序：它们需要规格说明、测试和版本控制——尤其是在生产环境中。

何时提供此工作流

触发条件：

- 提示词或系统消息变更；质量回退
结构化输出（JSON）、工具调用或RAG锚定需求
安全性或策略对齐需求

初始提供：

使用六个阶段：（1）定义任务与成功标准，（2）约束条件与格式，（3）少样本与风格，（4）构建评估集，（5）规范迭代，（6）发布、监控与回退。确认模型系列和延迟预算。

阶段1：定义任务与成功标准

目标： 明确的用户可见结果和失败模式（幻觉、遗漏、语气）。

退出条件： 用通俗语言描述的成功评估标准；列出超出范围的情况。

阶段2：约束条件与格式

目标： 必须/禁止规则；输出模式（JSON Schema、列表结构）；长度限制。

实践方法

- 将系统（策略、角色）与用户（任务实例）分开
在需要锚定事实时要求模型引用来源

阶段3：少样本与风格

目标： 仅在能减少歧义时使用示例——避免提示词过度膨胀。

实践方法

- 多样化的示例；避免过长的负面示例造成混淆

阶段4：构建评估集

目标： 具有预期属性的固定输入（不总是精确文本匹配）。

实践方法

- 相关时包含对抗性和多语言切片
对关键提示词在CI中设置回归测试套件

阶段5：规范迭代

目标： 调试质量问题时每次只改变一个主要变量。

实践方法

- A/B测试措辞时使用相同的温度设置进行比较
在生产环境中记录输出对应的提示词版本ID

阶段6：发布、监控与回退

目标： 金丝雀式提示词变更；监控隐式信号（点赞、编辑、任务完成）。

最终审查清单

- [ ] 任务和评估标准已定义
[ ] 约束条件和输出格式已明确
[ ] 评估集已版本化；存在回归路径
[ ] 迭代日志规范；提示词版本已追踪
[ ] 生产监控和回滚计划已就绪

有效指导的技巧

- 清晰胜过巧妙——简短明确的指令往往更有效。
思维链：在推理有帮助时使用；必要时对最终用户隐藏思维链。
与大语言模型评估技能对齐，用于更大规模的测试框架设计。

处理偏差

- 对话 vs 批处理：批处理可以使用更严格的结构和更低的温度。
多模态：明确说明图像细节可能如何使用或被忽略。

prompts提示工程工作流

prompts

Prompt Engineering (Deep Workflow)

When to Offer This Workflow

Stage 1: Define Task & Success

Stage 2: Constraints & Format

Practices

Stage 3: Few-Shot & Style

Practices

Stage 4: Build Eval Set

Practices

Stage 5: Iterate With Discipline

Practices

Stage 6: Ship, Monitor, Regress

Final Review Checklist

Tips for Effective Guidance

Handling Deviations

提示工程（深度工作流）

何时提供此工作流

阶段1：定义任务与成功标准

阶段2：约束条件与格式

实践方法

阶段3：少样本与风格

实践方法

阶段4：构建评估集

实践方法

阶段5：规范迭代

实践方法

阶段6：发布、监控与回退

最终审查清单

有效指导的技巧

处理偏差

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载

prompts提示工程工作流

prompts

Prompt Engineering (Deep Workflow)

When to Offer This Workflow

Stage 1: Define Task & Success

Stage 2: Constraints & Format

Practices

Stage 3: Few-Shot & Style

Practices

Stage 4: Build Eval Set

Practices

Stage 5: Iterate With Discipline

Practices

Stage 6: Ship, Monitor, Regress

Final Review Checklist

Tips for Effective Guidance

Handling Deviations

提示工程（深度工作流）

何时提供此工作流

阶段1：定义任务与成功标准

阶段2：约束条件与格式

实践方法

阶段3：少样本与风格

实践方法

阶段4：构建评估集

实践方法

阶段5：规范迭代

实践方法

阶段6：发布、监控与回退

最终审查清单

有效指导的技巧

处理偏差

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载

相关推荐

self-improvement

self-improvement

self-improvement

self-improvement