sql-quality-check

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

SQL quality check

SQL质量检查

The real work here is judging tradeoffs, not applying rules blindly. Anyone can memorize "don't use

SELECT *

"; the skill is knowing why it costs you, when it actually matters, and what you give up by changing it. Every recommendation has a price, and naming that price is what makes the advice trustworthy — a finding that only lists the downside of the current code, without the cost of the fix, is half an analysis. So for each thing you flag, you owe three things: the database-level impact, a corrected version, and the tradeoff the fix carries.

Three premises sit underneath every recommendation. Keep them in view:

There is no universally correct answer in SQL — every decision is a tradeoff. An index speeds up reads but slows writes and costs space. State the cost alongside the benefit, every time.
Data is written once and read many times. Because reads dominate, reads are what you optimize for. That's what justifies denormalizing frequently-read data and indexing the columns used in filters and ordering — you pay a little on the rare write to save a lot on the common read.
The root cause is usually the data model. The smells below are symptoms. When several show up together, the real problem is generally the schema or the API contract, not the individual query — say so, rather than patching each symptom in isolation.

All the smells carry equal weight: any one of them, on its own, can degrade performance severely under real data volume. There is no severity ranking — there is context analysis. A sequential scan on a tiny lookup table is fine; the same scan on a 50-million-row transactional table is a fire. Which is why you never judge from the SQL text alone.

这里的核心工作是判断权衡取舍，而非盲目套用规则。任何人都能记住“不要使用

SELECT *

”；真正的技巧在于理解它为何会产生成本、何时该重视这个问题，以及修改它需要付出什么代价。每一条建议都有相应的成本，明确说明这一成本才能让建议更可信——如果一个分析结果只列出当前代码的弊端，却不说明修复方案的成本，那只能算是不完整的分析。因此，对于每一个你标记的问题，都需要提供三方面内容：数据库层面的影响、修正后的版本，以及修复方案带来的权衡取舍。

每一条建议都基于以下三个前提，请时刻牢记：

SQL中没有通用的正确答案——每一个决策都是权衡取舍的结果。索引可以加速读取操作，但会减慢写入速度并占用存储空间。每次都要同时说明收益和成本。
数据写一次，读多次。由于读取操作占主导地位，优化的重点应放在读取上。这也是对频繁读取的数据进行反规范化、为过滤和排序所用的列创建索引的合理性所在——你在罕见的写入操作上付出一点代价，就能在频繁的读取操作上节省大量成本。
根本原因通常在于数据模型。下面列出的问题只是表象。当多个问题同时出现时，真正的问题通常在于Schema或API契约，而非单个查询——此时应明确指出这一点，而不是孤立地修复每个表象问题。

所有问题的重要性是均等的：在真实数据量下，任何一个问题都可能导致性能严重下降。这里没有严重程度排名——只有上下文分析。对小型查找表进行顺序扫描是没问题的；但对拥有5000万行数据的事务型表进行同样的扫描则是严重问题。这就是为什么不能仅凭SQL文本做出判断的原因。

How to work

工作流程

You may receive a SQL snippet, an ORM-generated query, a schema/modeling design, or a PR diff. Work it in this order:

Gather the context you need — don't assume it. The same query can be perfect or catastrophic depending on facts that aren't in the SQL: Is this table transactional (grows without bound over time)? Is a full scan intentional (a one-off admin job) or an accident? What volume is expected? Infer what you can from signals — a timestamp column and an ever-growing shape suggest a transactional table; a
```
LIMIT
```
suggests an interactive list. When a finding hinges on something you can't infer, ask rather than guessing. A confident recommendation built on a wrong assumption is worse than a question.
Scan for the smells. Read
```
references/smells.md
```
— the catalog of 11 recurring patterns, each with how to detect it, why it degrades systems, and the corrected form. Match against what you're given.
For each smell found, deliver the three things: the cause/impact in plain terms, the corrected version, and the tradeoff the correction carries. Tie symptoms back to the model when they cluster.
Validate with the execution plan. Don't stop at reading the SQL — the plan is the evidence.
```
references/modeling-and-plans.md
```
covers
```
EXPLAIN
```
/
```
EXPLAIN ANALYZE
```
, what to look for (sequential scans on big tables, sorts spilling to disk, whether an index is actually used), plus the modeling and indexing guidance the recommendations draw on. Request or suggest the plan for any query you're unsure about.

你可能会收到SQL片段、ORM生成的查询语句、Schema/建模设计或PR diff。请按照以下步骤处理：

收集所需上下文信息——切勿主观假设。同一个查询可能是完美的，也可能是灾难性的，这取决于SQL之外的实际情况：该表是否为事务型表（数据量随时间无限制增长）？全表扫描是有意操作（一次性的管理员任务）还是意外操作？预期数据量是多少？从现有线索中尽可能推断信息——比如包含时间戳列且数据量持续增长的表通常是事务型表；带有
```
LIMIT
```
的查询通常用于交互式列表。当分析结果依赖于无法推断的信息时，请询问而非猜测。基于错误假设给出的自信建议，还不如提出一个问题。
识别问题模式。阅读
```
references/smells.md
```
——其中包含11种常见问题模式，每种模式都说明了检测方法、导致系统性能下降的原因以及修正后的形式。将收到的内容与这些模式进行匹配。
针对每个发现的问题，提供三方面内容：用通俗易懂的语言说明原因/影响、修正后的版本，以及修复方案带来的权衡取舍。当多个问题同时出现时，要将这些表象问题与数据模型关联起来。
通过执行计划验证。不要仅停留在阅读SQL文本层面——执行计划是验证的依据。
```
references/modeling-and-plans.md
```
涵盖了
```
EXPLAIN
```
/
```
EXPLAIN ANALYZE
```
的使用方法、需要关注的内容（如对大表的顺序扫描、排序操作写入磁盘、索引是否实际被使用等），以及建议所依据的建模和索引指导原则。对于任何不确定的查询，都可以要求或建议提供执行计划。

The smell catalog (index)

问题模式目录（索引）

Full detail — impact, detection, and a to-avoid/preferred example for each — is in

references/smells.md

. The eleven patterns:

Excess joins / cartesian row explosion — 3+ joins, especially 1:N, multiplying result rows.
Function or cast on a filtered column — wrapping the column in an expression disables its index.
N+1 — one query per item in a loop instead of one query for all (read and write variants).
Query on a transactional table with no date filter — scans an ever-growing history.
Selecting all columns when few are used —
```
SELECT *
```
wastes I/O and blocks index-only scans.
Leading-wildcard text search —
```
LIKE '%term%'
```
can't use a B-tree index.
Count that reuses the listing query for pagination — runs the whole listing just to count.
High-OFFSET pagination — reads and discards N rows before returning the page.
IN
subquery and the
NOT IN
+ NULL trap —
```
EXISTS
```
/
```
NOT EXISTS
```
is usually safer and faster.
Unindexed sort / unbounded list —
```
ORDER BY
```
with no covering index, or no
```
LIMIT
```
.
Long transactions / unindexed mass writes — held locks, connection-pool exhaustion, bloat.

每个问题模式的详细信息——包括影响、检测方法以及反面/正面示例——都在

references/smells.md

中。这11种模式如下：

过多关联 / 笛卡尔积行爆炸 — 3次及以上关联，尤其是1:N关联，导致结果行数倍增。
过滤列上使用函数或类型转换 — 将列包裹在表达式中会使其索引失效。
N+1查询 — 循环中为每个条目执行一次查询，而非一次性查询所有条目（包含读取和写入两种变体）。
事务型表查询未加日期过滤 — 扫描持续增长的历史数据。
仅需少数列却选择所有列 —
```
SELECT *
```
会浪费I/O资源并阻止仅索引扫描。
前缀通配符文本搜索 —
```
LIKE '%term%'
```
无法使用B树索引。
复用列表查询进行分页计数 — 为了计数而执行完整的列表查询。
高OFFSET分页 — 返回页面前需读取并丢弃N行数据。
IN
子查询与
NOT IN
+NULL陷阱 —
```
EXISTS
```
/
```
NOT EXISTS
```
通常更安全、更快。
未索引排序 / 无限制列表 —
```
ORDER BY
```
未使用覆盖索引，或未设置
```
LIMIT
```
。
长事务 / 未索引批量写入 — 导致锁持有、连接池耗尽、数据膨胀。

Output

输出要求

Match the situation. A quick review might be a checklist; a PR review wants inline comments anchored to the offending lines; a design audit wants a short report. Whatever the shape, each finding carries the same three parts — impact, corrected version, tradeoff — and clusters of symptoms get a note pointing at the likely modeling root cause. Lead with the findings that matter under the actual expected volume; don't bury a row-explosion behind a stylistic nit.

Write your findings in English, regardless of the language of the code, comments, or PR under review. Keep table, column, and identifier names verbatim from the source — translate the analysis, not the schema.

根据场景调整输出形式。快速评审可以采用检查表形式；PR评审需要针对问题代码行添加内联注释；设计审计则需要一份简短报告。无论采用何种形式，每个发现的问题都必须包含三个部分——影响、修正版本、权衡取舍；当多个表象问题同时出现时，需要添加说明指出可能的数据模型根源。优先展示在实际预期数据量下影响重大的问题；不要将行爆炸这类严重问题隐藏在风格类小问题之后。

请用英文撰写分析结果，无论被评审的代码、注释或PR使用何种语言。保留表名、列名和标识符的原始名称——仅翻译分析内容，不要翻译Schema。

References

参考资料

```
references/smells.md
```
— the full catalog of 11 smells: description, impact, detection, and a to-avoid/preferred example for each. Read it when scanning any query or diff.
```
references/modeling-and-plans.md
```
— modeling and indexing guidance (read-optimization, denormalization, composite-index column order, partial indexes, index restraint, data types) and how to validate with the execution plan (
```
EXPLAIN
```
). Read it for schema/design audits and to confirm a query's behavior.

```
references/smells.md
```
— 包含11种问题模式的完整目录：每种模式的描述、影响、检测方法以及反面/正面示例。在检查任何查询语句或diff时请阅读此文档。
```
references/modeling-and-plans.md
```
— 建模和索引指导原则（读取优化、反规范化、复合索引列顺序、部分索引、索引约束、数据类型），以及如何通过执行计划（
```
EXPLAIN
```
）进行验证。在进行Schema/设计审计或确认查询行为时请阅读此文档。