Performance Tuning (Deep Workflow)
Performance work is measurement-driven. Profile before optimizing; verify after changes; guard against regressions with benchmarks or production metrics.
When to Offer This Workflow
Trigger conditions:
- High CPU or memory usage, elevated p99 latency, or long GC pauses
- Cost reduction via efficiency
- Premature optimization requests, where evidence is needed first
Initial offer:
Use six stages: (1) frame goals & SLOs, (2) measure baseline, (3) profile & hypothesize, (4) implement changes, (5) verify & compare, (6) prevent regression. Confirm the language/runtime and the environment (prod-like data volume).
Stage 1: Frame Goals & SLOs
Goal: Set numeric targets (p95 latency, throughput, max memory), not just “faster.”
Questions
- Which workloads matter most (batch vs interactive)?
- Correctness constraints (approximation allowed or not)?
- Cost budget for hardware vs engineering time?
Exit condition: One-page success criteria and out-of-scope areas.
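One way to keep the success criteria numeric is to write them down as data that later stages can check against. The sketch below is illustrative only: the `Targets` class, its field names, and the example numbers are assumptions, not part of this workflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Targets:
    """Hypothetical container for Stage 1 success criteria."""
    p95_latency_ms: float   # upper bound
    throughput_rps: float   # lower bound
    max_memory_mb: float    # upper bound

    def unmet(self, measured):
        """Return the names of the targets that `measured` fails to meet."""
        failures = []
        if measured.p95_latency_ms > self.p95_latency_ms:
            failures.append("p95_latency_ms")
        if measured.throughput_rps < self.throughput_rps:
            failures.append("throughput_rps")
        if measured.max_memory_mb > self.max_memory_mb:
            failures.append("max_memory_mb")
        return failures


if __name__ == "__main__":
    # Example numbers are made up for illustration.
    goal = Targets(p95_latency_ms=200, throughput_rps=500, max_memory_mb=1024)
    measured = Targets(p95_latency_ms=260, throughput_rps=540, max_memory_mb=900)
    print(goal.unmet(measured))
```

An empty list from `unmet` means every stated target is satisfied, which gives Stage 5 a concrete pass/fail question to answer.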
Stage 2: Measure Baseline
Goal: Reproducible benchmark or RUM (real user monitoring) segment, with the same inputs and the same conditions.
Practices
- Warm caches before measuring when prod always runs warm
- Repeat runs for statistical confidence (multiple runs, with a stated method for handling outliers)
Exit condition: Baseline numbers + environment fingerprint (versions, flags).
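A baseline along these lines could be captured with a small harness. This is a minimal sketch using only the Python standard library; `measure`, `summarize`, `fingerprint`, and `workload` are hypothetical names, and the median/IQR summary is one possible outlier methodology rather than the prescribed one.

```python
import platform
import statistics
import sys
import time


def measure(fn, *, warmup=3, runs=20):
    """Call fn() `warmup` times untimed, then return `runs` timings in seconds."""
    for _ in range(warmup):
        fn()  # warm caches first, mirroring an always-warm prod
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Median and interquartile range: robust to outliers without hiding them."""
    q1, median, q3 = statistics.quantiles(samples, n=4)
    return {
        "runs": len(samples),
        "median_s": median,
        "iqr_s": q3 - q1,
        "min_s": min(samples),
        "max_s": max(samples),
    }


def fingerprint():
    """Record the environment alongside the numbers so runs stay comparable."""
    return {
        "python": sys.version.split()[0],
        "implementation": platform.python_implementation(),
        "system": platform.system(),
        "machine": platform.machine(),
    }


def workload():
    # Hypothetical stand-in for the real operation under test.
    sorted(range(50_000, 0, -1))


if __name__ == "__main__":
    print(summarize(measure(workload)))
    print(fingerprint())
```

Reporting min and max next to the median keeps outliers visible; a real fingerprint would likely also include dependency versions and runtime flags.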
Stage 3: Profile & Hypothesize
Goal: Find dominant cost: CPU bound, I/O bound, lock contention, allocation rate.
Tools (examples)
- CPU flame graphs; async wait profiling
- Alloc profiling for GC pressure
- DB query plans and lock waits
Exit condition: Hypothesis tied to evidence (e.g., “40% time in JSON parse”).
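To illustrate tying a hypothesis to evidence, the sketch below profiles a toy request handler with Python's built-in `cProfile` and prints the functions ranked by cumulative time. The handler and its helpers (`handle_request`, `parse_records`, `total_amount`) are invented for the example; a flame-graph tool would present the same data visually.

```python
import cProfile
import io
import json
import pstats


def parse_records(raw_lines):
    # Hypothetical hotspot: one JSON parse per line.
    return [json.loads(line) for line in raw_lines]


def total_amount(records):
    return sum(record["amount"] for record in records)


def handle_request(raw_lines):
    return total_amount(parse_records(raw_lines))


def profile_top(fn, *args, limit=5):
    """Run fn under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(limit)
    return result, out.getvalue()


if __name__ == "__main__":
    lines = [json.dumps({"amount": i}) for i in range(20_000)]
    _, report = profile_top(handle_request, lines)
    print(report)
```

Reading the `cumtime` column of the report shows what share of the request is spent under `parse_records`, which is the kind of number a hypothesis should cite.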
Stage 4: Implement Changes
Goal: Make the smallest change that addresses the hotspot; avoid clever tricks without proof.
Levers
- Algorithm / data structure
- Caching with invalidation discipline
- Batching I/O; connection pooling
- Parallelism where safe—watch locks
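As one illustration of the caching lever, the sketch below pairs a time-to-live with an explicit invalidation hook, so stale entries have two ways out. `TTLCache` and its methods are hypothetical names; the class is not thread-safe and does not bound its size, both of which a production cache would need to address.

```python
import time


class TTLCache:
    """Read-through cache with expiry and explicit invalidation (single-threaded sketch)."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader      # called on a miss to fetch the real value
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so tests can control time
        self._entries = {}         # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Call this from every write path that changes the underlying value."""
        self._entries.pop(key, None)


if __name__ == "__main__":
    cache = TTLCache(loader=lambda key: key.upper(), ttl_seconds=30)
    cache.get("user:1")
    cache.get("user:1")
    print(cache.hits, cache.misses)
```

Tracking hits and misses makes it possible to verify in Stage 5 that the cache is actually absorbing load rather than assuming it does.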
Stage 5: Verify & Compare
Goal: Run an A/B or before/after comparison with the same workload; watch tail latency, not only the mean.
Production
- Canary with error rate and latency gates
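A before/after comparison that looks at the tail can be sketched as below. The nearest-rank percentile, the `compare` report shape, and the 5% gate in `canary_passes` are all assumptions chosen for illustration; the gate covers latency only, and an error-rate gate would follow the same pattern.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def compare(before, after, percentiles=(50, 95, 99)):
    """Per-percentile before/after values and relative change (baseline must be > 0)."""
    report = {}
    for p in percentiles:
        b = percentile(before, p)
        a = percentile(after, p)
        report[f"p{p}"] = {"before": b, "after": a, "change_pct": (a - b) / b * 100}
    return report


def canary_passes(report, max_regression_pct=5.0):
    """Hypothetical latency gate: no tracked percentile may regress past the limit."""
    return all(row["change_pct"] <= max_regression_pct for row in report.values())


if __name__ == "__main__":
    before = list(range(1, 101))                           # latencies in ms
    after = [x * 0.5 for x in range(1, 99)] + [200, 300]   # faster median, worse tail
    report = compare(before, after)
    for name, row in report.items():
        print(name, row)
    print("canary passes:", canary_passes(report))
```

In this made-up data the median halves while p99 roughly doubles, so a gate on the mean or median alone would have let the regression through.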
Stage 6: Prevent Regression
Goal: Micro-benchmarks in CI (optional), budgets, or synthetic checks.
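A performance budget in CI can be as small as a timing helper plus an assertion. The sketch below assumes hypothetical names (`best_of`, `check_budget`, `workload`) and an arbitrary 10% tolerance; shared CI runners are noisy, so the budget and tolerance would need tuning against real runs.

```python
import time


def best_of(fn, repeats=5):
    """Return the fastest of `repeats` timings, which damps scheduler noise."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def check_budget(name, measured_s, budget_s, tolerance=0.10):
    """Raise AssertionError when the measurement exceeds budget plus tolerance."""
    limit = budget_s * (1 + tolerance)
    if measured_s > limit:
        raise AssertionError(
            f"{name}: {measured_s:.4f}s exceeds budget {budget_s:.4f}s "
            f"(+{tolerance:.0%} tolerance)"
        )
    return measured_s


def workload():
    # Hypothetical stand-in for the code path being guarded.
    sorted(range(10_000, 0, -1))


if __name__ == "__main__":
    # The 0.5s budget is a placeholder; derive the real one from the Stage 2 baseline.
    check_budget("sort_10k", best_of(workload), budget_s=0.5)
    print("within budget")
```

Because the failure is an ordinary `AssertionError`, the check can sit inside an existing test suite without extra tooling.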
Final Review Checklist
- [ ] Goals and baseline documented
- [ ] Root cause supported by profiler/trace evidence
- [ ] Change scoped; trade-offs explicit
- [ ] Verification on realistic load
- [ ] Regression guard where feasible
Tips for Effective Guidance
- Little’s Law intuition: queues inflate latency, so it often pays to fix concurrency before micro-optimizations.
- Avoid optimizing cold paths first.
- Garbage-collected languages: allocation rate is often the enemy.
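The Little's Law tip can be made concrete with a little arithmetic: the average number of requests in flight equals arrival rate times average time in the system (L = λW). The function names and the example numbers below are illustrative assumptions.

```python
def in_flight(arrival_rate_per_s, mean_latency_s):
    """Little's Law: average requests in the system, L = lambda * W."""
    return arrival_rate_per_s * mean_latency_s


def max_throughput(concurrency, mean_service_s):
    """Upper bound on sustained throughput for a fixed number of workers."""
    return concurrency / mean_service_s


def is_saturated(arrival_rate_per_s, concurrency, mean_service_s):
    """True when offered load exceeds capacity, so the queue grows without bound."""
    return arrival_rate_per_s > max_throughput(concurrency, mean_service_s)


if __name__ == "__main__":
    # 200 req/s at 50 ms each needs about 10 requests in flight on average.
    print(in_flight(200, 0.05))
    # 8 workers at 50 ms each sustain about 160 req/s, so 200 req/s queues up.
    print(max_throughput(8, 0.05), is_saturated(200, 8, 0.05))
```

In the saturated case, shaving a few percent off service time does not stop the queue from growing, whereas raising concurrency from 8 to 10 or more does, which is the sense in which concurrency fixes come before micro-optimizations.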
Handling Deviations
- Embedded / mobile: battery and thermal constraints matter too.
- Distributed systems: a local optimization may hurt the system as a whole (see load-testing).