Payment Incident Responder
Purpose
Help teams respond to payment incidents quickly and consistently:
- detect and classify incident severity
- define immediate containment actions
- coordinate internal/external communication
- restore correctness via reconciliation and data repair
- produce postmortem action items
Disclaimer
This skill provides operational guidance only. It does not execute payments, reverse transactions, or replace legal/compliance decisions.
Use at your own risk. The skill author/publisher/developer is not liable for direct or indirect losses, fraud, penalties, downtime, or damages arising from use or misuse of this guidance.
Incident severity model
- P0: broad outage / incorrect success/failure state at scale
- P1: major degradation with partial workaround
- P2: limited impact to specific cohorts or features
- P3: minor issue, low customer impact
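The severity levels above can be sketched as a small classifier. This is a minimal illustration only: the function `classify`, its parameters, and the 10% and 1% thresholds are assumptions made for the example, not values defined by this skill. Teams should substitute their own signals and cutoffs.

```python
from enum import Enum


class Severity(Enum):
    # Descriptions are taken from the severity model above.
    P0 = "broad outage / incorrect success/failure state at scale"
    P1 = "major degradation with partial workaround"
    P2 = "limited impact to specific cohorts or features"
    P3 = "minor issue, low customer impact"


def classify(affected_share: float,
             incorrect_state_at_scale: bool,
             has_workaround: bool) -> Severity:
    """Map rough impact signals to a severity level.

    affected_share is the fraction (0.0 to 1.0) of payment attempts affected.
    All thresholds below are hypothetical and only illustrate the idea.
    """
    # Wrong success/failure states at scale are always treated as P0.
    if incorrect_state_at_scale:
        return Severity.P0
    # Major degradation with no workaround is escalated to P0 as well.
    if affected_share >= 0.10 and not has_workaround:
        return Severity.P0
    if affected_share >= 0.10:
        return Severity.P1
    if affected_share >= 0.01:
        return Severity.P2
    return Severity.P3
```

Under these assumed thresholds, `classify(0.2, False, True)` yields `Severity.P1`, while any incident with incorrect payment states at scale is P0 regardless of the affected share.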
Standard response workflow
1. Acknowledge and assign roles
   - incident commander
   - tech lead
   - comms owner
   - reconciliation owner
2. Establish blast radius
   - affected methods/regions/providers
   - impacted users/orders
   - error signatures and trend window
3. Contain
   - freeze risky deploys
   - enable degraded mode messaging
   - pause high-risk paths if needed
4. Diagnose
   - check webhook pipeline, API errors, queue lag, provider status
   - identify first failing component and triggering change
5. Mitigate and recover
   - apply safe rollback/fix
   - reconcile pending and mismatched states
   - verify customer-facing correctness
6. Close and learn
   - final incident summary
   - postmortem with owner/due-date action items
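The reconciliation step in the workflow above (reconcile pending and mismatched states) can be sketched as a comparison between internal records and the provider's view. This is a hedged example: the `Mismatch` type, the `reconcile` function, and the state names are hypothetical, and it assumes both sides can be exported as a mapping from transaction reference ID to state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mismatch:
    reference_id: str
    internal_state: Optional[str]   # None if we have no record
    provider_state: Optional[str]   # None if the provider has no record


def reconcile(internal: dict, provider: dict) -> list:
    """Return every transaction whose state differs between the two sides.

    Both arguments map a transaction reference ID to a state string.
    A transaction missing on one side is reported with None for that side.
    """
    mismatches = []
    # Walk the union of IDs so records missing on either side are caught.
    for ref in sorted(internal.keys() | provider.keys()):
        ours = internal.get(ref)
        theirs = provider.get(ref)
        if ours != theirs:
            mismatches.append(Mismatch(ref, ours, theirs))
    return mismatches
```

The output is a work list for the reconciliation owner, not an automatic repair. Each mismatch still needs a decision on which side is correct before any data is changed.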
Guardrails
- Never communicate "resolved" before metrics and correctness checks pass.
- Never run blind retries that can create duplicate charges.
- Always include transaction reference IDs in customer/support comms.
- Keep all decisions time-stamped in the incident log.
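One common way to honor the guardrail against blind retries is to reuse a single idempotency key across all attempts of the same charge. The sketch below assumes the payment provider deduplicates requests by such a key, which must be confirmed for each provider. `charge_with_retry`, `TransientError`, and the `send` callable are hypothetical names introduced for illustration.

```python
import time
import uuid
from typing import Callable, Optional


class TransientError(Exception):
    """Hypothetical error type for failures that are safe to retry."""


def charge_with_retry(send: Callable[[str], str],
                      max_attempts: int = 3,
                      base_delay: float = 0.5,
                      idempotency_key: Optional[str] = None) -> str:
    """Call send(key) until it succeeds, always with the same key.

    send is a caller-supplied function that submits the charge and
    passes the key to the provider. Reusing one key lets a provider
    that supports idempotency treat repeats as the same request.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # Generate the key once, outside the retry loop.
    key = idempotency_key or str(uuid.uuid4())
    for attempt in range(1, max_attempts + 1):
        try:
            return send(key)
        except TransientError:
            if attempt == max_attempts:
                raise
            # Exponential backoff between attempts.
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Only errors the caller classifies as transient are retried here. Any other exception propagates immediately, so an ambiguous failure leads to investigation rather than another charge attempt.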
Output format
When invoked, return:
1. severity + current phase
2. top 3 immediate actions
3. customer impact summary
4. next update time and owner
5. reconciliation and correctness checklist
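The output items above could be captured in a small structure so that every update has the same shape. This is a sketch under assumptions: the `IncidentUpdate` class, its field names, and the rendered layout are illustrative and not a format prescribed by this skill.

```python
from dataclasses import dataclass, field


@dataclass
class IncidentUpdate:
    severity: str
    phase: str
    immediate_actions: list
    customer_impact: str
    next_update_at: str
    next_update_owner: str
    checklist: list = field(default_factory=list)

    def render(self) -> str:
        """Render the update as plain text in the order listed above."""
        if len(self.immediate_actions) > 3:
            raise ValueError("list at most 3 immediate actions")
        lines = [
            f"1. Severity: {self.severity} | Phase: {self.phase}",
            "2. Top immediate actions:",
        ]
        lines += [f"   - {action}" for action in self.immediate_actions]
        lines.append(f"3. Customer impact: {self.customer_impact}")
        lines.append(
            f"4. Next update: {self.next_update_at} "
            f"(owner: {self.next_update_owner})"
        )
        lines.append("5. Reconciliation and correctness checklist:")
        lines += [f"   - [ ] {item}" for item in self.checklist]
        return "\n".join(lines)
```

Capping the action list at three keeps the update focused on what must happen next. Longer task lists belong in the incident log.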
Setup
Read setup.md on first use.
Validation
Run validation-checklist.md for drills and live incidents.
References
Skill name: payment-incident-responder