Chaos Engineering
Structured guidance for chaos engineering (fault injection, game days): confirm triggers, propose the stages below, and adapt if the user wants a lighter pass.
When to Offer This Workflow
Trigger conditions:
- User mentions chaos engineering or closely related work
- They want a structured workflow rather than ad-hoc tips
- They are preparing a review, rollout, or stakeholder communication
Initial offer:
Explain the four stages briefly and ask whether to follow this workflow or work freeform. If they decline, continue in their preferred style.
Workflow Stages
Stage 1: Clarify context & goals
Anchor on blast-radius control. Ask what success looks like, what the constraints are, and what must not break. Capture unknowns early.
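A minimal sketch of how Stage 1 answers might be recorded so the blast radius is explicit before any fault is injected. The `ExperimentScope` class, its field names, and the 5% default cap are hypothetical illustrations, not part of any chaos-engineering tool's API.

```python
from dataclasses import dataclass, field


@dataclass
class ExperimentScope:
    """Hypothetical record of Stage 1 answers for one chaos experiment."""
    target_service: str
    success_criteria: list[str]
    must_not_break: list[str]
    # Fraction of instances or traffic exposed to the fault (0.0 to 1.0).
    blast_radius: float
    # Assumed organisational cap on blast radius; adjust to local policy.
    max_blast_radius: float = 0.05
    unknowns: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return reasons the scope is not ready for Stage 2 (empty if ready)."""
        issues = []
        if not self.success_criteria:
            issues.append("no success criteria")
        if not self.must_not_break:
            issues.append("nothing listed as must-not-break")
        if not 0.0 < self.blast_radius <= self.max_blast_radius:
            issues.append(
                f"blast radius {self.blast_radius:.0%} outside "
                f"(0, {self.max_blast_radius:.0%}]"
            )
        return issues
```

For example, `ExperimentScope("checkout-api", ["p99 latency < 300 ms"], ["payment capture"], blast_radius=0.2).problems()` reports a single issue, because 20% exceeds the assumed 5% cap. An empty list is the signal to move on to planning.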
Stage 2: Design or plan the approach
Translate goals into a concrete plan built around hypotheses and abort criteria. Compare alternatives and make trade-offs explicit; avoid implicit assumptions.
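One way to keep a hypothesis and its abort criteria concrete is to encode them together, as in the sketch below. `ExperimentPlan`, `AbortCriterion`, and the metric names are hypothetical; the sketch assumes every metric is "higher is worse" and treats a missing metric as a breach.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbortCriterion:
    """One hypothetical abort rule: stop if `metric` exceeds `threshold`."""
    metric: str
    threshold: float


@dataclass(frozen=True)
class ExperimentPlan:
    hypothesis: str
    abort_criteria: tuple[AbortCriterion, ...]

    def should_abort(self, observed: dict[str, float]) -> list[str]:
        """Return the metrics whose abort criteria fired for one observation.

        A metric missing from `observed` counts as a breach, because an
        experiment that cannot be observed should not keep running.
        """
        fired = []
        for rule in self.abort_criteria:
            value = observed.get(rule.metric)
            if value is None or value > rule.threshold:
                fired.append(rule.metric)
        return fired


plan = ExperimentPlan(
    hypothesis="Killing one cache node keeps checkout error rate below 1%",
    abort_criteria=(
        AbortCriterion("error_rate", 0.01),
        AbortCriterion("p99_latency_ms", 500.0),
    ),
)
```

Here `plan.should_abort({"error_rate": 0.002, "p99_latency_ms": 620.0})` returns `["p99_latency_ms"]`: the error-rate hypothesis still holds, but latency crossed its abort threshold. Metrics where lower is worse (throughput, success rate) would need a direction flag on each criterion.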
Stage 3: Implement, validate, and harden
Execute with verification loops driven by observability data captured while faults are active. Prefer small steps, measurable checks, and rollback points where risk is high.
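The "small steps with rollback points" pattern can be sketched as a ramp that widens exposure only while health checks pass. The `inject`, `rollback`, and `healthy` callables are hypothetical hooks you would wire to your own fault-injection tool and monitoring; no specific tool's API is implied.

```python
from typing import Callable, Sequence


def run_ramped_experiment(
    steps: Sequence[float],
    inject: Callable[[float], None],
    rollback: Callable[[], None],
    healthy: Callable[[], bool],
) -> float:
    """Increase fault exposure step by step; stop at the first failed check.

    Returns the largest exposure that passed verification (0.0 if the
    first step failed). The fault is always removed before returning.
    """
    last_good = 0.0
    try:
        for exposure in steps:
            inject(exposure)
            # Verification loop: a real run would wait for a soak period
            # so metrics reflect the new exposure before checking.
            if not healthy():
                break
            last_good = exposure
    finally:
        # Rollback point: remove the fault whether the ramp finished,
        # aborted on a failed check, or raised an exception.
        rollback()
    return last_good
```

Calling it with steps such as `[0.01, 0.05, 0.1, 0.25]` and a health check that fails above 10% returns `0.1`, which tells you the largest exposure the system tolerated. Putting rollback in `finally` is the design choice that matters: an injector error must not leave the fault running.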
Stage 4: Operate, communicate, and iterate
Close the loop with learnings and fixes: monitoring, documentation, stakeholder updates, and lessons carried into the next cycle.
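A learning loop stalls when findings have no owner or stay open. The sketch below, with a hypothetical `Finding` record and `open_gaps` helper, flags both cases so follow-ups (monitoring, docs, fixes) are visible before the next cycle starts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    """Hypothetical record of one lesson from a chaos experiment or game day."""
    summary: str
    follow_up: str  # e.g. "monitoring", "docs", "fix", "stakeholder update"
    owner: Optional[str] = None
    done: bool = False


def open_gaps(findings: list[Finding]) -> list[str]:
    """List findings that would stall the learning loop.

    A finding is a gap if it has no owner, or if it has an owner but is
    not yet done. Completed, owned findings are omitted.
    """
    gaps = []
    for f in findings:
        if f.owner is None:
            gaps.append(f"unowned: {f.summary}")
        elif not f.done:
            gaps.append(f"open ({f.owner}): {f.summary}")
    return gaps
```

An empty result from `open_gaps` is a reasonable exit criterion for this stage; anything else belongs in the stakeholder update.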
Checklist Before Completion
- Goals and constraints for the chaos experiment are explicit
- Risks and trade-offs are stated, not hand-waved
- Verification steps match the change’s impact (tests, canary, peer review)
- Operational follow-through is covered (monitoring, docs, owners)
Tips for Effective Guidance
- Be procedural: stage-by-stage, with clear exit criteria
- Ask for missing context (environment, scale, deadlines) before prescribing
- Prefer checklists and concrete examples over generic platitudes
- If the user declines the workflow, switch to freeform help without lecturing
Handling Deviations
- If the user wants to skip a stage: confirm and continue with what they need.
- If context is missing: ask targeted questions before strong recommendations.
- Prefer concrete examples, trade-offs, and verification steps over generic advice.
Quality Bar
- Each recommendation should be actionable (what to do next).
- Call out failure modes relevant to chaos experiments (security, scale, UX, or ops).
- Keep tone direct and respectful of the user’s time.