Postgres Egress Optimizer
Guide the user through diagnosing and fixing application-side query patterns that cause excessive data transfer (egress) from their Postgres database. Most high egress bills come from the application fetching more data than it uses.
Step 1: Diagnose
Identify which queries transfer the most data. The primary tool is the pg_stat_statements extension.
Check if pgstatstatements is available
CODEBLOCK0
If this errors, the extension needs to be created:
CODEBLOCK1
On Neon, it is available by default but may need this CREATE EXTENSION step.
Handle empty stats
Stats are cleared when a Neon compute scales to zero and restarts. If the stats are empty or the compute recently woke up:
- 1. Reset the stats to start a clean measurement window: INLINECODE1
- Let the application run under representative traffic for at least an hour.
- Return and run the diagnostic queries below.
If the user has stats from a production database, use those. If they have no access to production stats, proceed to Step 2 and analyze the codebase directly — code-level patterns are often sufficient to identify the worst offenders.
Diagnostic queries
Run these to identify the top egress contributors. Focus on queries that return many rows, return wide rows (JSONB, TEXT, BYTEA columns), or are called very frequently.
Queries returning the most total rows:
CODEBLOCK2
Queries returning the most rows per execution (poorly scoped SELECTs, missing pagination):
CODEBLOCK3
Most frequently called queries (candidates for caching):
CODEBLOCK4
Longest running queries (not a direct egress measure, but helps identify problem queries during a spike):
CODEBLOCK5
Interpret the results
Rank findings by estimated egress impact:
- - High row count + wide rows = biggest egress. A query returning 1,000 rows where each row includes a 50KB JSONB column transfers ~50MB per call.
- Extreme call frequency on even small queries adds up. A query called 50,000 times/day returning 10 rows each = 500,000 rows/day.
- Cross-reference with the schema to identify which columns are wide. Look for JSONB, TEXT, BYTEA, and large VARCHAR columns.
Step 2: Analyze codebase
For each query identified in Step 1, or for each database query in the codebase if no stats are available, check:
- - Does it select only the columns the response needs?
- Does it return a bounded number of rows (LIMIT/pagination)?
- Is it called frequently enough to benefit from caching?
- Does it fetch raw data that gets aggregated in application code?
- Does it use a JOIN that duplicates parent data across child rows?
Step 3: Fix
Apply the appropriate fix for each problem found. Below are the most common egress anti-patterns and how to fix them.
Unused columns (SELECT \*)
Problem: The query fetches all columns but the application only uses a few. Large columns (JSONB blobs, TEXT fields) get transferred over the wire and discarded.
Before:
CODEBLOCK6
After:
CODEBLOCK7
Missing pagination
Problem: A list endpoint returns all rows with no LIMIT. This is an unbounded egress risk — every new row in the table increases data transfer on every request. Flag this regardless of current table size.
This is easy to miss because the application may work fine with small datasets. But at scale, an unpaginated endpoint returning 10,000 rows with even moderate column widths can transfer hundreds of megabytes per day.
Before:
CODEBLOCK8
After:
CODEBLOCK9
When adding pagination, check whether the consuming client already supports paginated responses. If not, pick sensible defaults and document the pagination parameters in the API.
High-frequency queries on static data
Problem: A query is called thousands of times per day but returns data that rarely changes. Every call transfers the same rows from the database. This pattern is only visible from pg_stat_statements — the code itself looks normal.
Look for queries with extremely high call counts relative to other queries. Common examples: configuration tables, category lists, feature flags, user role definitions.
Fix: Add a caching layer between the application and the database so it avoids hitting the database on every request.
Application-side aggregation
Problem: The application fetches all rows from a table and then computes aggregates (averages, counts, sums, groupings) in application code. The full dataset transfers over the wire even though the result is a small summary.
Fix: Push the aggregation into SQL.
Before: The application fetches entire tables and aggregates in code with loops or .reduce().
After:
CODEBLOCK10
JOIN duplication
Problem: A JOIN between a wide parent table and a child table duplicates all parent columns across every child row. If a product has 200 reviews and the product row includes a 50KB JSONB column, the join sends that 50KB × 200 = ~10MB for a single request.
This is distinct from the SELECT \* problem. Even if you select only needed columns, a JOIN still repeats the parent data for every child row. The fix is structural: avoid the join entirely.
Before:
CODEBLOCK11
After (two separate queries):
CODEBLOCK12
Two queries instead of one JOIN. The product data is fetched once. The reviews are fetched once. No duplication.
Step 4: Verify
After applying fixes:
- 1. Run existing tests to confirm nothing broke.
- Check the responses — make sure the API still returns the same data shape. Column selection and pagination changes can break clients that depend on specific fields or full result sets.
- Measure the improvement — if pgstatstatements data is available, reset it (
SELECT pg_stat_statements_reset();), let traffic run, then re-run the diagnostic queries to compare before and after.
Further reading
- - https://neon.com/docs/introduction/network-transfer.md
- https://neon.com/docs/introduction/cost-optimization.md
Postgres 出口流量优化器
引导用户诊断并修复导致 Postgres 数据库产生过多数据传输(出口流量)的应用端查询模式。大多数高额出口流量账单源于应用获取的数据远超其实际使用量。
第 1 步:诊断
识别哪些查询传输的数据量最大。主要工具是 pgstatstatements 扩展。
检查 pgstatstatements 是否可用
sql
SELECT 1 FROM pgstatstatements LIMIT 1;
如果报错,则需要创建该扩展:
sql
CREATE EXTENSION IF NOT EXISTS pgstatstatements;
在 Neon 上,该扩展默认可用,但仍可能需要执行此 CREATE EXTENSION 步骤。
处理空统计信息
当 Neon 计算节点缩容至零并重启时,统计信息会被清除。如果统计信息为空或计算节点刚刚唤醒:
- 1. 重置统计信息以开启新的测量窗口:SELECT pgstatstatements_reset();
- 让应用在代表性流量下运行至少一小时。
- 返回并执行以下诊断查询。
如果用户拥有生产数据库的统计信息,请直接使用。如果无法获取生产统计信息,请跳至第 2 步直接分析代码库——代码层面的模式通常足以识别出最严重的问题。
诊断查询
执行以下查询以识别最大的出口流量贡献者。重点关注返回行数多、返回宽行(JSONB、TEXT、BYTEA 列)或调用频率极高的查询。
返回总行数最多的查询:
sql
SELECT query, calls, rows AS totalrows, rows / calls AS avgrowspercall
FROM pgstatstatements
WHERE calls > 0
ORDER BY rows DESC
LIMIT 10;
每次执行返回行数最多的查询(范围不当的 SELECT,缺少分页):
sql
SELECT query, calls, rows AS totalrows, rows / calls AS avgrowspercall
FROM pgstatstatements
WHERE calls > 0
ORDER BY avgrowsper_call DESC
LIMIT 10;
调用频率最高的查询(适合缓存的候选者):
sql
SELECT query, calls, rows AS totalrows, rows / calls AS avgrowspercall
FROM pgstatstatements
WHERE calls > 0
ORDER BY calls DESC
LIMIT 10;
运行时间最长的查询(并非直接衡量出口流量的指标,但有助于在流量激增时识别问题查询):
sql
SELECT query, calls, rows AS total_rows,
round(totalexectime::numeric, 2) AS totalexectime_ms
FROM pgstatstatements
WHERE calls > 0
ORDER BY totalexectime DESC
LIMIT 10;
解读结果
按预估的出口流量影响对发现的问题进行排序:
- - 高行数 + 宽行 = 最大的出口流量。一个返回 1000 行且每行包含 50KB JSONB 列的查询,每次调用传输约 50MB。
- 极高的调用频率,即使是小查询也会累积。一个每天调用 50,000 次、每次返回 10 行的查询 = 每天 500,000 行。
- 与模式交叉引用以识别哪些列是宽列。查找 JSONB、TEXT、BYTEA 和大 VARCHAR 列。
第 2 步:分析代码库
对于第 1 步中识别的每个查询,或者如果无法获取统计信息,则对代码库中的每个数据库查询进行检查:
- - 是否只选择了响应所需的列?
- 是否返回有限的行数(LIMIT/分页)?
- 调用频率是否足够高,适合使用缓存?
- 是否获取了原始数据,然后在应用代码中进行聚合?
- 是否使用了 JOIN,导致父数据在子行中重复?
第 3 步:修复
针对发现的每个问题应用适当的修复。以下是最常见的出口流量反模式及其修复方法。
未使用的列(SELECT *)
问题: 查询获取所有列,但应用只使用其中少数几列。大列(JSONB 大对象、TEXT 字段)通过网络传输后被丢弃。
修改前:
sql
SELECT * FROM products;
修改后:
sql
SELECT id, name, price, image_urls FROM products;
缺少分页
问题: 列表端点返回所有行,没有 LIMIT 限制。这是一个无限制的出口流量风险——表中的每一行新数据都会增加每次请求的数据传输量。无论当前表大小如何,都应标记此问题。
这很容易被忽略,因为应用在小数据集下可能运行良好。但在规模扩大后,一个未分页的端点返回 10,000 行,即使列宽适中,每天也可能传输数百兆字节。
修改前:
sql
SELECT id, name, price FROM products;
修改后:
sql
SELECT id, name, price FROM products
ORDER BY id
LIMIT 50 OFFSET 0;
添加分页时,检查消费客户端是否已支持分页响应。如果不支持,请选择合理的默认值,并在 API 中记录分页参数。
静态数据的高频查询
问题: 一个查询每天被调用数千次,但返回的数据很少变化。每次调用都从数据库传输相同的行。这种模式只能通过 pgstatstatements 发现——代码本身看起来很正常。
查找相对于其他查询调用次数极高的查询。常见示例:配置表、分类列表、功能开关、用户角色定义。
修复: 在应用和数据库之间添加缓存层,避免每次请求都访问数据库。
应用端聚合
问题: 应用从表中获取所有行,然后在应用代码中计算聚合结果(平均值、计数、总和、分组)。尽管最终结果是一个小摘要,但完整数据集仍通过网络传输。
修复: 将聚合操作推入 SQL。
修改前: 应用获取整个表,并在代码中使用循环或 .reduce() 进行聚合。
修改后:
sql
SELECT p.category_id,
AVG(r.rating) AS avg_rating,
COUNT(r.id) AS review_count
FROM reviews r
INNER JOIN products p ON r.product_id = p.id
GROUP BY p.category_id;
JOIN 重复
问题: 宽父表与子表之间的 JOIN 会在每个子行中重复所有父列。如果一个产品有 200 条评论,且产品行包含一个 50KB 的 JSONB 列,那么该 JOIN 会发送 50KB × 200 = 约 10MB 的数据,仅用于单个请求。
这与 SELECT * 问题不同。即使只选择需要的列,JOIN 仍会为每个子行重复父数据。修复方法是结构性的:完全避免 JOIN。
修改前:
sql
SELECT * FROM products
LEFT JOIN reviews ON reviews.product_id = products.id
WHERE products.id = 1;
修改后(两个独立查询):
sql
SELECT id, name, price, description, image_urls FROM products WHERE id = 1;
SELECT id, username, rating, body FROM reviews WHERE productid = 1;
使用两个查询代替一个 JOIN。产品数据获取一次。评论数据获取一次。没有重复。
第 4 步:验证
应用修复后:
- 1. 运行现有测试以确认没有破坏任何功能。
- 检查响应——确保 API 仍然返回相同的数据结构。列选择和分页更改可能会破坏依赖特定字段或完整结果集的客户端。
- 衡量改进效果——如果 pgstatstatements 数据可用,重置它(SELECT pgstatstatements_reset();),让流量运行,然后重新运行诊断查询以比较前后差异。
延伸阅读
- - https://neon.com/docs/introduction/network-transfer.md
- https://neon.com/docs/introduction/cost-optimization.md