neon-postgres-egress-optimizer
# Postgres Egress Optimizer
Guide the user through diagnosing and fixing application-side query patterns that cause excessive data transfer (egress) from their Postgres database. Most high egress bills come from the application fetching more data than it uses.
## Step 1: Diagnose
Identify which queries transfer the most data. The primary tool is the `pg_stat_statements` extension.

### Check if `pg_stat_statements` is available
```sql
SELECT 1 FROM pg_stat_statements LIMIT 1;
```

If this errors, the extension needs to be created:

```sql
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
```

On Neon, it is available by default but may need this `CREATE EXTENSION` step.
### Handle empty stats
Stats are cleared when a Neon compute scales to zero and restarts. If the stats are empty or the compute recently woke up:

- Reset the stats to start a clean measurement window: `SELECT pg_stat_statements_reset();`
- Let the application run under representative traffic for at least an hour.
- Return and run the diagnostic queries below.

If the user has stats from a production database, use those. If they have no access to production stats, proceed to Step 2 and analyze the codebase directly — code-level patterns are often sufficient to identify the worst offenders.
### Diagnostic queries
Run these to identify the top egress contributors. Focus on queries that return many rows, return wide rows (JSONB, TEXT, BYTEA columns), or are called very frequently.
Queries returning the most total rows:

```sql
SELECT query, calls, rows AS total_rows, rows / calls AS avg_rows_per_call
FROM pg_stat_statements
WHERE calls > 0
ORDER BY rows DESC
LIMIT 10;
```

Queries returning the most rows per execution (poorly scoped SELECTs, missing pagination):

```sql
SELECT query, calls, rows AS total_rows, rows / calls AS avg_rows_per_call
FROM pg_stat_statements
WHERE calls > 0
ORDER BY avg_rows_per_call DESC
LIMIT 10;
```

Most frequently called queries (candidates for caching):

```sql
SELECT query, calls, rows AS total_rows, rows / calls AS avg_rows_per_call
FROM pg_stat_statements
WHERE calls > 0
ORDER BY calls DESC
LIMIT 10;
```

Longest-running queries (not a direct egress measure, but helps identify problem queries during a spike):

```sql
SELECT query, calls, rows AS total_rows,
       round(total_exec_time::numeric, 2) AS total_exec_time_ms
FROM pg_stat_statements
WHERE calls > 0
ORDER BY total_exec_time DESC
LIMIT 10;
```

### Interpret the results
Rank findings by estimated egress impact:
- High row count + wide rows = biggest egress. A query returning 1,000 rows where each row includes a 50KB JSONB column transfers ~50MB per call.
- Extreme call frequency on even small queries adds up. A query called 50,000 times/day returning 10 rows each = 500,000 rows/day.
- Cross-reference with the schema to identify which columns are wide. Look for JSONB, TEXT, BYTEA, and large VARCHAR columns.
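To rank candidates, a rough estimate is enough: egress per day ≈ calls × rows per call × average row width. A minimal sketch of that arithmetic (hypothetical helper name; decimal units, 1 MB = 1,000 KB, as in the examples above):

```typescript
// Hypothetical helper: back-of-envelope daily egress for one query,
// in decimal megabytes (1 MB = 1,000 KB).
function estimateEgressMBPerDay(
  callsPerDay: number,
  rowsPerCall: number,
  avgRowKB: number
): number {
  return (callsPerDay * rowsPerCall * avgRowKB) / 1000;
}

// 1,000 rows with a 50KB JSONB column each: ~50MB for a single call.
const perCall = estimateEgressMBPerDay(1, 1000, 50); // 50

// The same query called 200 times/day: ~10GB/day.
const perDay = estimateEgressMBPerDay(200, 1000, 50); // 10000
```

Average row width can be approximated from the schema, or measured by running `pg_column_size()` over a sample of rows.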
## Step 2: Analyze codebase
For each query identified in Step 1, or for each database query in the codebase if no stats are available, check:
- Does it select only the columns the response needs?
- Does it return a bounded number of rows (LIMIT/pagination)?
- Is it called frequently enough to benefit from caching?
- Does it fetch raw data that gets aggregated in application code?
- Does it use a JOIN that duplicates parent data across child rows?
## Step 3: Fix
Apply the appropriate fix for each problem found. Below are the most common egress anti-patterns and how to fix them.
### Unused columns (SELECT *)
Problem: The query fetches all columns but the application only uses a few. Large columns (JSONB blobs, TEXT fields) get transferred over the wire and discarded.
Before:

```sql
SELECT * FROM products;
```

After:

```sql
SELECT id, name, price, image_urls FROM products;
```

### Missing pagination
Problem: A list endpoint returns all rows with no LIMIT. This is an unbounded egress risk — every new row in the table increases data transfer on every request. Flag this regardless of current table size.
This is easy to miss because the application may work fine with small datasets. But at scale, an unpaginated endpoint returning 10,000 rows with even moderate column widths can transfer hundreds of megabytes per day.
Before:

```sql
SELECT id, name, price FROM products;
```

After:

```sql
SELECT id, name, price FROM products
ORDER BY id
LIMIT 50 OFFSET 0;
```

When adding pagination, check whether the consuming client already supports paginated responses. If not, pick sensible defaults and document the pagination parameters in the API.
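When exposing pagination parameters, it also helps to clamp client-supplied values server-side so a caller can never request an unbounded page. A minimal sketch (hypothetical helper name and defaults, not from the original):

```typescript
// Hypothetical helper: clamp client-supplied pagination inputs so a caller
// can never request an unbounded result set.
function pageToLimitOffset(
  page: number,
  perPage: number,
  maxPerPage = 100
): { limit: number; offset: number } {
  // Non-numeric or out-of-range input falls back to safe values.
  const safePerPage = Math.min(Math.max(1, Math.floor(perPage) || 1), maxPerPage);
  const safePage = Math.max(1, Math.floor(page) || 1);
  return { limit: safePerPage, offset: (safePage - 1) * safePerPage };
}
```

For example, `pageToLimitOffset(3, 50)` yields `{ limit: 50, offset: 100 }`, while a hostile `perPage` of 5000 is capped at `maxPerPage`.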
### High-frequency queries on static data
Problem: A query is called thousands of times per day but returns data that rarely changes. Every call transfers the same rows from the database. This pattern is only visible from `pg_stat_statements` — the code itself looks normal.

Look for queries with extremely high call counts relative to other queries. Common examples: configuration tables, category lists, feature flags, user role definitions.

Fix: Add a caching layer between the application and the database so it avoids hitting the database on every request.
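For a single-instance application where mild staleness is acceptable, the caching layer can be as small as an in-process map with a TTL; multi-instance deployments usually graduate to a shared cache such as Redis. A minimal sketch (hypothetical class, not tied to any particular framework):

```typescript
// Minimal in-process TTL cache sketch. Entries expire after ttlMs; an
// expired or missing entry is refetched via the supplied loader.
class TtlCache<T> {
  private entries = new Map<string, { value: T; expiresAt: number }>();

  constructor(private ttlMs: number) {}

  async get(key: string, loader: () => Promise<T>): Promise<T> {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > Date.now()) return hit.value; // no DB round trip
    const value = await loader(); // cache miss: hit the database once
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
    return value;
  }
}
```

Wrapping the existing query call, e.g. `cache.get("feature_flags", () => loadFlagsFromDb())` (where `loadFlagsFromDb` stands in for whatever function currently issues the query), turns thousands of daily round trips into one per TTL window.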
### Application-side aggregation
Problem: The application fetches all rows from a table and then computes aggregates (averages, counts, sums, groupings) in application code. The full dataset transfers over the wire even though the result is a small summary.
Fix: Push the aggregation into SQL.
Before: The application fetches entire tables and aggregates in code with loops or `.reduce()`.

After:

```sql
SELECT p.category_id,
       AVG(r.rating) AS avg_rating,
       COUNT(r.id) AS review_count
FROM reviews r
INNER JOIN products p ON r.product_id = p.id
GROUP BY p.category_id;
```

### JOIN duplication
Problem: A JOIN between a wide parent table and a child table duplicates all parent columns across every child row. If a product has 200 reviews and the product row includes a 50KB JSONB column, the join sends that 50KB × 200 = ~10MB for a single request.
This is distinct from the SELECT * problem. Even if you select only needed columns, a JOIN still repeats the parent data for every child row. The fix is structural: avoid the join entirely.
Before:

```sql
SELECT * FROM products
LEFT JOIN reviews ON reviews.product_id = products.id
WHERE products.id = 1;
```

After (two separate queries):

```sql
SELECT id, name, price, description, image_urls FROM products WHERE id = 1;
SELECT id, user_name, rating, body FROM reviews WHERE product_id = 1;
```

Two queries instead of one JOIN. The product data is fetched once. The reviews are fetched once. No duplication.
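On the application side, the two SELECTs can be issued concurrently (e.g. with `Promise.all`) and nested back into the response shape the JOIN used to produce, so the split costs no extra latency. A sketch with hypothetical row types matching the queries above:

```typescript
// Hypothetical row shapes matching the two queries above.
interface Product { id: number; name: string; price: number }
interface Review { id: number; user_name: string; rating: number; body: string }

// Pure assembly step: nest child rows under the parent without the JOIN's
// duplication. The two SELECTs can run concurrently before calling this.
function assembleProduct(product: Product, reviews: Review[]) {
  return { ...product, reviews };
}
```

In a request handler this pairs with something like `const [product, reviews] = await Promise.all([...])` before the assembly call.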
## Step 4: Verify
After applying fixes:

- Run existing tests to confirm nothing broke.
- Check the responses — make sure the API still returns the same data shape. Column selection and pagination changes can break clients that depend on specific fields or full result sets.
- Measure the improvement — if pg_stat_statements data is available, reset it (`SELECT pg_stat_statements_reset();`), let traffic run, then re-run the diagnostic queries to compare before and after.