SQL Optimization (Deep Workflow)

Optimization without measurement is guesswork. Structure the work as observe → explain (plan) → change → verify, with explicit attention to correctness, locks, and write amplification from indexes.

When to Offer This Workflow

Trigger conditions:

- Slow queries, growing P95/P99, replication lag, lock waits
ORM-generated SQL surprises; N+1 at DB layer
Index explosion, bloat, or “we added indexes everywhere”

Initial offer:

Use six stages: (1) frame the problem, (2) reproduce & measure, (3) read execution plans, (4) schema & indexes, (5) query & transaction tuning, (6) verify & guardrail. Confirm engine (PostgreSQL, MySQL, SQL Server, etc.) and environment (prod-like data volume).

Stage 1: Frame the Problem

Goal: Define SLO, scope, and non-goals.

Questions

1. Which queries or endpoints are slow? User-facing vs batch?
Regression—did deploy, data volume, or stats change?
Isolation level and consistency requirements—can we read replicas?
Write risk: is this table write-heavy? Index cost?

Exit condition: One-line problem statement with metric (e.g., “p95 2.4s on /reports at 10k RPS”).

Stage 2: Reproduce & Measure

Goal: Stable repro with representative cardinality and parameters.

Actions

- Capture exact SQL, parameters, and frequency
Use EXPLAIN (ANALYZE, BUFFERS) or equivalent—engine-specific
Check buffer cache effects: cold vs warm cache; run twice when needed
Compare prod stats vs staging—row counts, histograms

Pitfalls

- Optimizing on empty dev DB
Different parameter sniffing values changing plan choice

Exit condition: Baseline numbers + plan hash or saved plan for A/B.

Stage 3: Read Execution Plans

Goal: Name the dominant cost: seq scan, bad join order, sort, hash spill, nested loop explosion.

Interpret (adapt to engine)

- Seq scan on large tables—filter selectivity? missing index? stats?
Index scan vs bitmap vs index only—covering indexes trade-offs
Joins: wrong order, missing stats, outdated NDV
Sort/hash spills to disk—work_mem / memory grants
Locks: FOR UPDATE, long transactions, hot row updates

Exit condition: Hypothesis tied to plan node(s), not generic “add index.”

Stage 4: Schema & Indexes

Goal: Right indexes for read paths without destroying writes.

Strategy

- Composite index column order: equality → range; avoid redundant indexes
Partial indexes for hot subsets
Covering indexes vs table bloat—measure write cost
Foreign keys and constraints affecting plans
Statistics: ANALYZE, extended stats, histograms—when stale stats lie

Advanced (when relevant)

- Partitioning for prune + maintenance
Materialized views / pre-aggregation for heavy reports

Exit condition: DDL proposal with rationale and rollback (drop index concurrently if supported).

Stage 5: Query & Transaction Tuning

Goal: Sometimes the fix is SQL rewrite, not hardware.

Techniques

- Reduce rows touched early (CTEs vs inline—engine-dependent)
Pagination without OFFSET on huge pages (keyset)
Batch vs row-by-row; UNION ALL vs OR
N+1: batch queries, joins, data loader patterns
Transactions: shorten locks; avoid unnecessary INLINECODE3
ORM: eager vs lazy loading discipline

Exit condition: New plan shows lower cost / measured latency; lock time acceptable.

Stage 6: Verify & Guardrail

Goal: Improvement holds under load and doesn’t regress neighbors.

Verify

- Re-run EXPLAIN ANALYZE with production-like parameters
Load test or shadow traffic if available
Monitor: buffer hit ratio, index bloat, replication lag

Guardrails

- Query timeouts and statement_timeout where safe
Alerts on sequential scans on large tables if observability supports

Final Review Checklist

- [ ] Baseline and target metrics documented
[ ] Plan-based root cause, not guesswork
[ ] Index/DDL changes justified vs write load
[ ] Transaction/lock behavior considered
[ ] Verification on realistic data and load

Tips for Effective Guidance

- Always mention parameter sniffing and stale statistics as frequent culprits.
Warn when adding indexes on very write-heavy tables without measuring bloat.
Prefer keyset pagination education for large lists.

Handling Deviations

- No EXPLAIN access: infer from symptoms + ORM logs + index list; recommend safe staging repro.
Vendor DB: name that hints and features differ—avoid PostgreSQL-only advice on SQL Server without caveat.

SQL优化（深度工作流）

没有测量的优化只是猜测。将工作结构化为观察 → 解释（计划）→ 变更 → 验证，并明确关注正确性、锁以及索引带来的写放大。

何时提供此工作流

触发条件：

- 慢查询、P95/P99持续增长、复制延迟、锁等待
ORM生成的SQL意外情况；数据库层的N+1问题
索引爆炸、膨胀，或我们在所有地方都加了索引

初始提供：

使用六个阶段：（1）界定问题，（2）复现与测量，（3）读取执行计划，（4）模式与索引，（5）查询与事务调优，（6）验证与防护。确认数据库引擎（PostgreSQL、MySQL、SQL Server等）和环境（接近生产的数据量）。

阶段1：界定问题

目标： 定义SLO、范围和非目标。

问题

1. 哪些查询或端点较慢？面向用户还是批处理？
回归——部署、数据量或统计信息是否发生变化？
隔离级别和一致性要求——能否读取副本？
写入风险：该表写入密集吗？索引成本？

退出条件： 包含指标的一行问题陈述（例如：/reports在10k RPS下P95为2.4秒）。

阶段2：复现与测量

目标： 使用具有代表性的基数和参数进行稳定复现。

操作

- 捕获精确SQL、参数和频率
使用EXPLAIN（ANALYZE, BUFFERS）或等效命令——引擎相关
检查缓冲缓存影响：冷缓存与热缓存；必要时运行两次
比较生产统计信息与测试环境——行数、直方图

陷阱

- 在空的开发数据库上优化
不同的参数嗅探值改变计划选择

退出条件： 基准数值 + 计划哈希或保存的计划用于A/B对比。

阶段3：读取执行计划

目标： 指出主导成本：顺序扫描、糟糕的连接顺序、排序、哈希溢出、嵌套循环爆炸。

解读（适配引擎）

- 大表上的顺序扫描——过滤选择性？缺少索引？统计信息？
索引扫描 vs 位图扫描 vs 仅索引扫描——覆盖索引的权衡
连接：错误顺序、缺少统计信息、过时的NDV
排序/哈希溢出到磁盘——work_mem/内存分配
锁：FOR UPDATE、长事务、热行更新

退出条件： 与计划节点相关的假设，而非泛泛的加索引。

阶段4：模式与索引

目标： 为读取路径建立正确的索引，同时不破坏写入性能。

策略

- 复合索引列顺序：等值 → 范围；避免冗余索引
针对热点子集的部分索引
覆盖索引与表膨胀——测量写入成本
影响计划的外键和约束
统计信息：ANALYZE、扩展统计信息、直方图——当过时统计信息误导时

高级（相关时）

- 分区用于裁剪+维护
物化视图/预聚合用于重型报表

退出条件： 包含理由和回滚方案的DDL建议（如果支持，使用并发方式删除索引）。

阶段5：查询与事务调优

目标： 有时修复方法是SQL重写，而非硬件升级。

技术

- 尽早减少触及行数（CTE与内联——引擎相关）
大页面上的分页不使用OFFSET（键集分页）
批量 vs 逐行处理；UNION ALL vs OR
N+1：批量查询、连接、数据加载器模式
事务：缩短锁时间；避免不必要的SELECT FOR UPDATE
ORM：急加载与懒加载的规范

退出条件： 新计划显示更低成本/测量延迟；锁时间可接受。

阶段6：验证与防护

目标： 改进在负载下保持有效，且不使相邻查询退化。

验证

- 使用接近生产的参数重新运行EXPLAIN ANALYZE
如果可用，进行负载测试或影子流量测试
监控：缓冲区命中率、索引膨胀、复制延迟

防护

- 在安全的地方设置查询超时和statement_timeout
如果可观测性支持，对大表上的顺序扫描设置告警

最终审查清单

- [ ] 记录基准和目标指标
[ ] 基于计划的根本原因，而非猜测
[ ] 索引/DDL变更相对于写入负载的合理性
[ ] 考虑事务/锁行为
[ ] 在真实数据和负载下进行验证

有效指导技巧

- 始终提及参数嗅探和过时统计信息是常见元凶。
在未测量膨胀的情况下，警告在写入密集的表上添加索引。
对于大列表，优先推荐键集分页教育。

处理偏差

- 无法访问EXPLAIN：从症状+ORM日志+索引列表推断；推荐在安全测试环境复现。
商业数据库：指出提示和功能差异——避免在SQL Server上给出仅适用于PostgreSQL的建议而不加说明。

sql-optimizationSQL优化