neo4j-query-tuning-skill
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseWhen to Use
适用场景
- Query takes unexpectedly long; need root-cause analysis
- EXPLAIN/PROFILE output in hand — needs interpretation
- Identifying which index is missing or unused
- Deciding between slotted / pipelined / parallel runtimes
- Monitoring live queries: SHOW QUERIES, SHOW TRANSACTIONS
- Cardinality estimates wrong (plan replanning needed)
- 查询运行时间远超预期,需要进行根因分析
- 已获取EXPLAIN/PROFILE输出,需要解读
- 识别缺失或未被使用的索引
- 确定使用slotted / pipelined / parallel哪种运行时
- 监控实时查询:SHOW QUERIES、SHOW TRANSACTIONS
- 基数估计错误(需要重新规划执行计划)
When NOT to Use
不适用场景
- Writing Cypher from scratch →
neo4j-cypher-skill - GDS algorithm performance →
neo4j-gds-skill - Schema design / data modelling →
neo4j-modeling-skill
- 从零开始编写Cypher →
neo4j-cypher-skill - GDS算法性能调优 →
neo4j-gds-skill - Schema设计/数据建模 →
neo4j-modeling-skill
EXPLAIN vs PROFILE
EXPLAIN 与 PROFILE 对比
| EXPLAIN | PROFILE | |
|---|---|---|
| Executes query? | No | Yes |
| Returns data? | No | Yes |
Shows | No | Yes |
Shows | No | Yes |
Shows | Yes | Yes |
| Cost | Zero | Full query cost |
Run twice — first run warms page cache; second gives representative metrics.
PROFILEcypher
EXPLAIN MATCH (p:Person {email: $email}) RETURN p.name
PROFILE MATCH (p:Person {email: $email}) RETURN p.nameQuery API alternative (no driver):
bash
curl -X POST https://<host>/db/<db>/query/v2 \
-u <user>:<pass> -H "Content-Type: application/json" \
-d '{"statement": "EXPLAIN MATCH (p:Person {email: $email}) RETURN p.name", "parameters": {"email": "a@b.com"}}'| EXPLAIN | PROFILE | |
|---|---|---|
| 是否执行查询? | 否 | 是 |
| 是否返回数据? | 否 | 是 |
是否显示 | 否 | 是 |
是否显示 | 否 | 是 |
是否显示 | 是 | 是 |
| 开销 | 零 | 完整查询开销 |
需要运行两次PROFILE——第一次运行会预热页缓存,第二次运行的指标才具有代表性。
cypher
EXPLAIN MATCH (p:Person {email: $email}) RETURN p.name
PROFILE MATCH (p:Person {email: $email}) RETURN p.name无需驱动的查询API替代方案:
bash
curl -X POST https://<host>/db/<db>/query/v2 \
-u <user>:<pass> -H "Content-Type: application/json" \
-d '{"statement": "EXPLAIN MATCH (p:Person {email: $email}) RETURN p.name", "parameters": {"email": "a@b.com"}}'Key Plan Metrics
关键执行计划指标
| Metric | Good | Investigate if |
|---|---|---|
| Low; drops after index added | High relative to |
| Shrinks early in plan | Large until final operator |
| Close to | >10× divergence from actual |
| >0.99 | <0.90 (disk I/O bottleneck) |
| High | — |
| Near 0 | Rising (page cache too small) |
Read plans bottom-up — leaf operators at bottom initiate data retrieval.
| 指标 | 理想状态 | 需要排查的情况 |
|---|---|---|
| 数值低;添加索引后下降 | 相对于 |
| 在执行计划早期就减少 | 直到最后一个操作符时数值仍很大 |
| 与 | 与实际值偏差超过10倍 |
| >0.99 | <0.90(磁盘I/O瓶颈) |
| 数值高 | — |
| 接近0 | 持续上升(页缓存过小) |
执行计划需要从下往上阅读——底部的叶子操作符负责初始化数据检索。
Operator Reference
操作符参考
| Operator | Good/Bad | Meaning | Fix |
|---|---|---|---|
| ✓ | Exact match via RANGE/LOOKUP index | — |
| ✓ | Unique constraint index hit | — |
| ✓ | TEXT index CONTAINS / STARTS WITH | — |
| ~ | Full index scan (no predicate) | Add WHERE predicate or composite index |
| ✗ | Scans all nodes of label | Add RANGE index on lookup property |
| ✗✗ | Scans entire node store | Add label + index to MATCH |
| ~ | Traverse relationships from node | Normal; limit with LIMIT or WHERE |
| ~ | Find rels between two matched nodes | Normal for known-endpoint joins |
| ~ | Predicate applied after scan | Move predicate into WHERE with index |
| ✗ | No join predicate between two MATCH | Add WHERE join or use WITH between MATCHes |
| ~ | Hash join on node IDs | Normal; planner chose hash join |
| ~ | Hash join on values | Normal; watch memory for large inputs |
| ~ | Full aggregation (ORDER BY, count(*)) | Normal for aggregates |
| ✓ | Streaming aggregation | — |
| ✗ | Read/write conflict; materialises all rows | See Eager fix strategies below |
| ~ | Full sort — O(n log n) | Add |
| ✓ | Sort+Limit combined — O(n log k) | Preferred over Sort+Limit |
| ✓ | Truncates rows early | Push as early as possible |
| ~ | Offset pagination | Use keyset pagination on large graphs |
| — | Final output operator | Root of tree |
| ~ | Lookup by relationship ID | Avoid |
Full operator reference → references/plan-operators.md
| 操作符 | 优劣 | 含义 | 修复方案 |
|---|---|---|---|
| ✓ | 通过RANGE/LOOKUP索引精确匹配 | — |
| ✓ | 命中唯一约束索引 | — |
| ✓ | TEXT索引的CONTAINS / STARTS WITH操作 | — |
| ~ | 全索引扫描(无谓词) | 添加WHERE谓词或复合索引 |
| ✗ | 扫描指定标签的所有节点 | 在查询属性上添加RANGE索引 |
| ✗✗ | 扫描整个节点存储 | 在MATCH语句中添加标签+索引 |
| ~ | 从节点遍历关系 | 正常操作;可通过LIMIT或WHERE限制 |
| ~ | 查找两个匹配节点之间的关系 | 已知端点连接的正常操作 |
| ~ | 扫描后应用谓词 | 将谓词移至带索引的WHERE子句中 |
| ✗ | 两个MATCH语句之间无连接谓词 | 添加WHERE连接条件,或在MATCH之间使用WITH |
| ~ | 基于节点ID的哈希连接 | 正常操作;优化器选择哈希连接 |
| ~ | 基于值的哈希连接 | 正常操作;需注意大输入时的内存占用 |
| ~ | 全量聚合(ORDER BY、count(*)) | 聚合操作的正常行为 |
| ✓ | 流式聚合 | — |
| ✗ | 读写冲突;将所有行实例化 | 参考下方的Eager修复策略 |
| ~ | 全量排序——时间复杂度O(n log n) | 在Sort前添加 |
| ✓ | Sort+Limit组合操作——时间复杂度O(n log k) | 优先于Sort+Limit使用 |
| ✓ | 提前截断行数 | 尽可能早地推送该操作 |
| ~ | 偏移分页 | 在大图中使用键集分页 |
| — | 最终输出操作符 | 执行计划树的根节点 |
| ~ | 通过关系ID查找 | 避免使用 |
完整操作符参考 → references/plan-operators.md
Diagnostic Workflow (Agent Runbook)
诊断流程(Agent执行手册)
Step 1 — Baseline Plan
步骤1 — 获取基准执行计划
cypher
EXPLAIN <query>Scan output for , , , .
AllNodesScanNodeByLabelScanCartesianProductEagercypher
EXPLAIN <query>扫描输出,查找、、、操作符。
AllNodesScanNodeByLabelScanCartesianProductEagerStep 2 — Check Indexes
步骤2 — 检查索引
cypher
SHOW INDEXES YIELD name, type, labelsOrTypes, properties, state
WHERE state = 'ONLINE'Find whether the label/property from the bad operator has an index.
cypher
SHOW INDEXES YIELD name, type, labelsOrTypes, properties, state
WHERE state = 'ONLINE'查看低效操作符涉及的标签/属性是否已有索引。
Step 3 — Create Missing Index
步骤3 — 创建缺失的索引
cypher
// RANGE index for equality/range predicates:
CREATE INDEX person_email IF NOT EXISTS FOR (n:Person) ON (n.email)
// TEXT index for CONTAINS/ENDS WITH:
CREATE TEXT INDEX person_bio IF NOT EXISTS FOR (n:Person) ON (n.bio)
// Composite for multi-property lookup:
CREATE INDEX order_status_date IF NOT EXISTS FOR (n:Order) ON (n.status, n.createdAt)Wait for before measuring.
state = 'ONLINE'cypher
// 用于等值/范围谓词的RANGE索引:
CREATE INDEX person_email IF NOT EXISTS FOR (n:Person) ON (n.email)
// 用于CONTAINS/ENDS WITH的TEXT索引:
CREATE TEXT INDEX person_bio IF NOT EXISTS FOR (n:Person) ON (n.bio)
// 用于多属性查询的复合索引:
CREATE INDEX order_status_date IF NOT EXISTS FOR (n:Order) ON (n.status, n.createdAt)等待索引后再进行性能测试。
state = 'ONLINE'Step 4 — Profile After Fix
步骤4 — 修复后执行PROFILE
cypher
PROFILE <query>Compare and elapsed ms before/after. Target: replaces scan operators.
dbHitsNodeIndexSeekcypher
PROFILE <query>对比修复前后的和耗时(毫秒)。目标:用替代扫描类操作符。
dbHitsNodeIndexSeekStep 5 — Stale Statistics (if estimatedRows wildly off)
步骤5 — 统计信息过时(若estimatedRows与实际值偏差极大)
cypher
CALL db.prepareForReplanning()
// or resample a specific index:
CALL db.resampleIndex("person_email")
// or resample all outdated:
CALL db.resampleOutdatedIndexes()Config: (default — plan expires when stat changes >75%).
dbms.cypher.statistics_divergence_threshold0.75cypher
CALL db.prepareForReplanning()
// 或重新采样指定索引:
CALL db.resampleIndex("person_email")
// 或重新采样所有过时的索引:
CALL db.resampleOutdatedIndexes()配置项:(默认值——当统计信息变化超过75%时,执行计划过期)。
dbms.cypher.statistics_divergence_threshold0.75Fixing Common Plan Problems
常见执行计划问题修复
Missing Index → NodeByLabelScan / AllNodesScan
缺失索引 → NodeByLabelScan / AllNodesScan
cypher
// Force index hint when planner ignores it:
MATCH (p:Person {email: $email})
USING INDEX p:Person(email)
RETURN p.name
// Force label scan (sometimes faster for high selectivity):
MATCH (p:Person {email: $email})
USING SCAN p:Person
RETURN p.namecypher
// 当优化器忽略索引时,强制使用索引提示:
MATCH (p:Person {email: $email})
USING INDEX p:Person(email)
RETURN p.name
// 强制使用标签扫描(在高选择性场景下有时更快):
MATCH (p:Person {email: $email})
USING SCAN p:Person
RETURN p.nameWrong Anchor — Planner Picks Wrong Starting Node
锚点错误——优化器选择了错误的起始节点
Reorder MATCH or use hints:
cypher
// Force join at specific node:
MATCH (a:Author)-[:WROTE]->(b:Book)-[:IN_CATEGORY]->(c:Category {name: $cat})
USING JOIN ON b
RETURN a.name, b.title调整MATCH顺序或使用提示:
cypher
// 强制在指定节点进行连接:
MATCH (a:Author)-[:WROTE]->(b:Book)-[:IN_CATEGORY]->(c:Category {name: $cat})
USING JOIN ON b
RETURN a.name, b.titleCartesianProduct — Two Unconnected MATCHes
CartesianProduct——两个无关联的MATCH语句
cypher
// Bad (Cartesian product):
MATCH (a:Author {id: $aid})
MATCH (b:Book {id: $bid})
RETURN a.name, b.title
// Good (explicit join or WITH):
MATCH (a:Author {id: $aid})-[:WROTE]->(b:Book {id: $bid})
RETURN a.name, b.title
// Or: WITH between them to reset planning contextcypher
// 不良写法(笛卡尔积):
MATCH (a:Author {id: $aid})
MATCH (b:Book {id: $bid})
RETURN a.name, b.title
// 良好写法(显式连接或使用WITH):
MATCH (a:Author {id: $aid})-[:WROTE]->(b:Book {id: $bid})
RETURN a.name, b.title
// 或者:在两个MATCH之间使用WITH重置优化上下文Eager — Read/Write Conflict
Eager——读写冲突
Three strategies (pick simplest):
- Add specific labels to MATCH nodes so planner distinguishes read/write sets
- Collect-then-write:
WITH collect(n) AS nodes UNWIND nodes AS n SET n.x = 1 - CALL IN TRANSACTIONS: isolates each batch in its own transaction
cypher
CYPHER 25
MATCH (p:Person) WHERE p.score > 100
CALL (p) { SET p.tier = 'gold' } IN TRANSACTIONS OF 1000 ROWS三种策略(选择最简单的一种):
- 为MATCH节点添加特定标签,让优化器区分读写集合
- 先收集再写入:
WITH collect(n) AS nodes UNWIND nodes AS n SET n.x = 1 - CALL IN TRANSACTIONS:将每个批次隔离在独立事务中
cypher
CYPHER 25
MATCH (p:Person) WHERE p.score > 100
CALL (p) { SET p.tier = 'gold' } IN TRANSACTIONS OF 1000 ROWSExpensive CONTAINS / ENDS WITH
昂贵的CONTAINS / ENDS WITH操作
cypher
// Needs TEXT index (RANGE does NOT support these):
CREATE TEXT INDEX person_bio IF NOT EXISTS FOR (n:Person) ON (n.bio)
MATCH (p:Person) WHERE p.bio CONTAINS $keyword RETURN p.namecypher
// 需要TEXT索引(RANGE索引不支持这些操作):
CREATE TEXT INDEX person_bio IF NOT EXISTS FOR (n:Person) ON (n.bio)
MATCH (p:Person) WHERE p.bio CONTAINS $keyword RETURN p.nameOver-Traversal — Push LIMIT Early
过度遍历——提前推送LIMIT
cypher
// Bad: LIMIT after expensive join
MATCH (a:Author)-[:WROTE]->(b:Book)-[:REVIEWED_BY]->(r:Review)
RETURN a.name, b.title, r.text LIMIT 10
// Good: anchor limit before fan-out
MATCH (a:Author)-[:WROTE]->(b:Book)
WITH a, b LIMIT 10
MATCH (b)-[:REVIEWED_BY]->(r:Review)
RETURN a.name, b.title, r.textcypher
// 不良写法:在昂贵的连接后使用LIMIT
MATCH (a:Author)-[:WROTE]->(b:Book)-[:REVIEWED_BY]->(r:Review)
RETURN a.name, b.title, r.text LIMIT 10
// 良好写法:在扇出前设置锚点限制
MATCH (a:Author)-[:WROTE]->(b:Book)
WITH a, b LIMIT 10
MATCH (b)-[:REVIEWED_BY]->(r:Review)
RETURN a.name, b.title, r.textCypher Runtime Selection
Cypher运行时选择
| Runtime | Select | Best For | Avoid When |
|---|---|---|---|
| | Default OLTP; streaming, low memory | Unsupported operators fall back to slotted |
| | Guaranteed stable behavior; debug | Performance-critical OLTP |
| | Large analytical scans; aggregations | OLTP, writes, short queries, Aura Free |
Pipelined is default for most queries. Parallel requires configured; available on Enterprise and Aura Pro 2025+.
dbms.cypher.parallel.worker_limitcypher
// Force parallel for large aggregation:
CYPHER 25 runtime=parallel
MATCH (n:Transaction) WHERE n.amount > 1000
RETURN n.currency, count(*), sum(n.amount)| 运行时 | 指定方式 | 最佳适用场景 | 避免场景 |
|---|---|---|---|
| | 默认OLTP场景;流式处理,低内存占用 | 不支持的操作符会回退到slotted |
| | 保证稳定行为;调试场景 | 性能敏感的OLTP场景 |
| | 大型分析扫描;聚合操作 | OLTP、写入操作、短查询、Aura Free版 |
Pipelined是大多数查询的默认运行时。Parallel需要配置;适用于企业版和Aura Pro 2025+版本。
dbms.cypher.parallel.worker_limitcypher
// 强制对大型聚合操作使用parallel运行时:
CYPHER 25 runtime=parallel
MATCH (n:Transaction) WHERE n.amount > 1000
RETURN n.currency, count(*), sum(n.amount)Query Monitoring Commands
查询监控命令
cypher
// Live queries + resource usage:
SHOW QUERIES YIELD query, queryId, elapsedTimeMillis, allocatedBytes, status, username
// Running transactions:
SHOW TRANSACTIONS YIELD transactionId, currentQuery, currentQueryProgress, elapsedTime, status, username, cpuTime, activeLockCount // currentQueryProgress added [2026.03]
// Kill a specific transaction:
TERMINATE TRANSACTION $transactionId
// Kill a query:
TERMINATE QUERY $queryId
// Graph count stats (node/rel counts by label/type — feed into planner):
CALL db.stats.retrieve('GRAPH COUNTS') YIELD section, data RETURN section, data
// Token stats (label/property/rel-type IDs):
CALL db.stats.retrieve('TOKENS') YIELD section, data RETURN section, dataFull monitoring reference → references/stats-and-monitoring.md
cypher
// 实时查询+资源使用情况:
SHOW QUERIES YIELD query, queryId, elapsedTimeMillis, allocatedBytes, status, username
// 运行中的事务:
SHOW TRANSACTIONS YIELD transactionId, currentQuery, currentQueryProgress, elapsedTime, status, username, cpuTime, activeLockCount // currentQueryProgress于[2026.03]新增
// 终止特定事务:
TERMINATE TRANSACTION $transactionId
// 终止查询:
TERMINATE QUERY $queryId
// 图统计信息(按标签/类型统计节点/关系数量——供优化器使用):
CALL db.stats.retrieve('GRAPH COUNTS') YIELD section, data RETURN section, data
// 令牌统计信息(标签/属性/关系类型ID):
CALL db.stats.retrieve('TOKENS') YIELD section, data RETURN section, data完整监控参考 → references/stats-and-monitoring.md
Checklist
检查清单
- Run first — identifies plan problems without execution cost
EXPLAIN - Check for /
AllNodesScan— missing indexNodeByLabelScan - Check for — missing join predicate
CartesianProduct - Check for — read/write conflict
Eager - — confirm relevant index exists and
SHOW INDEXESstate = 'ONLINE' - Create missing index; wait for ONLINE
- Run twice — first warms cache, second is representative
PROFILE - Compare before/after fix
dbHits - If wildly off →
estimatedRowsCALL db.prepareForReplanning() - Push /
LIMITbefore high-fanout operationsWITH n LIMIT k - For CONTAINS/ENDS WITH — TEXT index, not RANGE
- For large analytical queries — consider
runtime=parallel - Kill long-running queries with
TERMINATE TRANSACTION
- 先运行——无需执行开销即可识别执行计划问题
EXPLAIN - 检查是否存在/
AllNodesScan——可能缺失索引NodeByLabelScan - 检查是否存在——缺失连接谓词
CartesianProduct - 检查是否存在——读写冲突
Eager - 执行——确认相关索引存在且
SHOW INDEXESstate = 'ONLINE' - 创建缺失的索引;等待索引变为ONLINE状态
- 运行两次——第一次预热缓存,第二次结果具有代表性
PROFILE - 对比修复前后的数值
dbHits - 若与实际值偏差极大 → 执行
estimatedRowsCALL db.prepareForReplanning() - 在高扇出操作前推送/
LIMITWITH n LIMIT k - 对于CONTAINS/ENDS WITH操作——使用TEXT索引,而非RANGE索引
- 对于大型分析查询——考虑使用
runtime=parallel - 使用终止长时间运行的查询
TERMINATE TRANSACTION