query-patterns

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Honeycomb Query Patterns

Honeycomb 查询模式

Opinionated guidance for writing effective Honeycomb queries. The MCP tools already document their parameters and schemas — this skill focuses on when and why to use each pattern, not how to call the tools.

关于编写高效Honeycomb查询的指导性建议。MCP工具已记录其参数和架构——本技能专注于何时以及为何使用每种模式，而非如何调用工具。

Key Principles

核心原则

Never use AVG for latency — AVG hides tail latency. Use P99 (or P95/P90) to see what slow users experience. Reserve AVG for non-latency metrics like payload size.
Use HEATMAP for distributions — Single-number aggregates hide bimodal patterns. HEATMAP reveals whether you have one population or two.
Combine calculations in one query —
```
COUNT, P99(duration_ms), HEATMAP(duration_ms)
```
in a single query reduces API calls and gives a complete picture.
Start broad, narrow with WHERE — Begin with a COUNT/GROUP BY to understand shape, then add filters to focus.
Check for prior work — Call
```
find_queries
```
before writing new queries. Someone may have already answered the question.

绝不要用AVG分析延迟——AVG会掩盖尾部延迟。使用P99（或P95/P90）查看慢用户的体验。仅将AVG用于非延迟指标，如负载大小。
用HEATMAP分析分布——单一数值聚合会掩盖双峰模式。HEATMAP可揭示数据是单群体还是双群体。
在一个查询中组合计算——在单个查询中使用
```
COUNT, P99(duration_ms), HEATMAP(duration_ms)
```
可减少API调用并提供完整视图。
从宽泛入手，用WHERE缩小范围——先使用COUNT/GROUP BY了解数据形态，再添加过滤器聚焦目标。
查看已有成果——编写新查询前调用
```
find_queries
```
。可能已有他人解答过相同问题。

Choosing the Right Operation

选择合适的操作

Question	Use
How much traffic?	`COUNT` grouped by route or service
How many unique users/IPs?	`COUNT_DISTINCT(field)`
How fast for most users?	`P50(duration_ms)`
How fast for the worst-off users?	`P99(duration_ms)`
Is there a bimodal pattern?	`HEATMAP(duration_ms)`
What's the worst case?	`MAX(duration_ms)`
How many concurrent operations?	`CONCURRENCY`
Is it getting worse over time?	`RATE_AVG(duration_ms)`

问题	使用操作
流量有多大？	`COUNT` 按路由或服务分组
有多少唯一用户/IP？	`COUNT_DISTINCT(field)`
大多数用户的体验速度如何？	`P50(duration_ms)`
体验最差的用户速度如何？	`P99(duration_ms)`
是否存在双峰模式？	`HEATMAP(duration_ms)`
最坏情况是什么？	`MAX(duration_ms)`
并发操作有多少？	`CONCURRENCY`
情况是否在恶化？	`RATE_AVG(duration_ms)`

Relational Field Strategy

关系字段策略

Use relational prefixes to ask cross-span questions within a trace:

"Show me slow endpoints caused by a specific downstream": Filter with
```
any.service.name
```
to find traces where that service participates, group by
```
root.http.route
```
to see which user-facing endpoints are affected.
"What's different about errored traces?": Filter with
```
any.error = true
```
, group by
```
root.name
```
to see which entry points have errors somewhere in their trace tree.
Exclude noise:
```
none.service.name = "health-check"
```
removes traces containing health checks.

使用关系前缀在追踪中跨跨度提问：

“展示由特定下游服务导致的慢端点”：用
```
any.service.name
```
过滤找到包含该服务的追踪，按
```
root.http.route
```
分组查看受影响的用户端端点。
“出错的追踪有何不同？”：用
```
any.error = true
```
过滤，按
```
root.name
```
分组查看哪些入口点的追踪树中存在错误。
排除干扰项：
```
none.service.name = "health-check"
```
可移除包含健康检查的追踪。

Calculated Fields

计算字段

Calculated fields are per-event expressions evaluated at query time. They transform, classify, and combine existing fields without re-instrumenting code.

Three scopes — choose the narrowest that fits the need:

Query-scoped (not saved): exploratory, one-off analysis
Dataset-level (saved): reusable within one service's dataset
Environment-level (saved): reusable across all datasets (e.g.,
```
error_pct
```
)

Common patterns:

Error rate:

MUL(IF($error, 1, 0), 100)

→ use

AVG(error_pct)

to get percentage

Status classification:

IF(GTE($http.status_code, 500), "5xx", GTE($http.status_code, 400), "4xx", "ok")

Latency bucketing:
```
BUCKET($duration_ms, 500, 0, 3000)
```

Prefix routing:

IF(STARTS_WITH($url, "/admin"), "admin", STARTS_WITH($url, "/api"), "api", "other")

Exact-match classification: use
```
SWITCH
```
instead of
```
IF(EQUALS(...))
```
chains — same expression, more efficient

Key guardrails:

Don't create presentational (alias-only) fields — a field that just renames another field adds no analytical value and clutters the schema. Only save a calculated field when it does real computation (classification, extraction, math).
Avoid regex on large/complex fields — running
```
REG_MATCH
```
,
```
REG_VALUE
```
, or
```
REG_COUNT
```
on
```
exception.stacktrace
```
,
```
db.statement
```
, or full log lines can be very slow. Check whether a more targeted OTel field exists first (
```
exception.type
```
,
```
exception.message
```
,
```
db.operation
```
). If you must regex a long field, guard it with a
```
CONTAINS
```
check first.
EQUALS
has strict type matching —
```
EQUALS($http.status_code, 200)
```
silently returns false if the field is stored as a string. Use
```
find_columns
```
to verify the field type before comparing.
FORMAT_TIME
is expensive — avoid in high-volume queries.
Save query-scoped, not dataset-level, for one-off work — saved fields show up in everyone's schema.

For full syntax, operator reference, and extended anti-pattern examples, consult

${CLAUDE_PLUGIN_ROOT}/skills/query-patterns/references/calculated-fields.md

计算字段是在查询时计算的单事件表达式。它们无需重新埋点即可转换、分类和组合现有字段。

三种作用域——选择最贴合需求的最小作用域：

查询作用域（不保存）：探索性、一次性分析
数据集级别（保存）：可在单个服务的数据集中复用
环境级别（保存）：可在所有数据集中复用（例如
```
error_pct
```
）

常见模式：

错误率：

MUL(IF($error, 1, 0), 100)

→ 使用

AVG(error_pct)

获取百分比

状态分类：

IF(GTE($http.status_code, 500), "5xx", GTE($http.status_code, 400), "4xx", "ok")

延迟分桶：
```
BUCKET($duration_ms, 500, 0, 3000)
```

前缀路由：

IF(STARTS_WITH($url, "/admin"), "admin", STARTS_WITH($url, "/api"), "api", "other")

精确匹配分类：使用
```
SWITCH
```
替代
```
IF(EQUALS(...))
```
链式判断——表达式功能相同，但效率更高

关键注意事项：

不要创建仅用于展示（仅别名）的字段——仅重命名其他字段的字段不会增加分析价值，还会使架构混乱。仅当字段能实现实际计算（分类、提取、运算）时才保存计算字段。
避免对大型/复杂字段使用正则——在
```
exception.stacktrace
```
、
```
db.statement
```
或完整日志行上运行
```
REG_MATCH
```
、
```
REG_VALUE
```
或
```
REG_COUNT
```
可能非常缓慢。先检查是否存在更具针对性的OTel字段（
```
exception.type
```
、
```
exception.message
```
、
```
db.operation
```
）。如果必须对长字段使用正则，请先用
```
CONTAINS
```
检查过滤。
EQUALS
有严格的类型匹配——如果字段以字符串存储，
```
EQUALS($http.status_code, 200)
```
会静默返回false。比较前使用
```
find_columns
```
验证字段类型。
FORMAT_TIME
开销大——在高流量查询中避免使用。
一次性工作使用查询作用域，而非数据集级别——已保存的字段会出现在所有人的架构中。

如需完整语法、运算符参考和扩展反模式示例，请查阅

${CLAUDE_PLUGIN_ROOT}/skills/query-patterns/references/calculated-fields.md

。

Before Every Query

每次查询前的准备

Filter on
is_root
when measuring user-facing latency — without it, internal spans inflate the numbers
Use human-readable time ranges (
```
"24h"
```
,
```
"-6h"
```
) — epoch timestamps are error-prone and hard to review
Validate columns with
find_columns
before querying — confirms field names exist and prevents empty results

测量用户端延迟时过滤
is_root
——不过滤的话，内部跨度会使数值膨胀
使用易读的时间范围（
```
"24h"
```
,
```
"-6h"
```
）——时间戳容易出错且难以审核
查询前用
find_columns
验证列——确认字段名称存在，避免返回空结果

Interpreting Results

解读结果

After running a query, the MCP tool returns formatted markdown plus metadata. The most important metadata field is

query_result_json

— a signed URL to the raw JSON result. For precise analysis, download it and parse with jq or python rather than relying solely on the ASCII rendering.

Key interpretation rules:

P99/P50 > 10x — bimodal distribution likely; run HEATMAP to confirm
TOTAL row in breakdown results = aggregate across all groups
OTHER row = groups beyond the query limit (increase limit if OTHER is large)
ASCII heatmap
```
▁▂▃▄▅▆▇█
```
= density from low to high; two bands = two populations
query_run_pk in metadata — feed directly to
```
run_bubbleup
```
for outlier analysis

运行查询后，MCP工具会返回格式化的markdown及元数据。最重要的元数据字段是

query_result_json

——指向原始JSON结果的签名URL。如需精确分析，请下载并使用jq或python解析，而非仅依赖ASCII渲染。

关键解读规则：

P99/P50 > 10倍——可能存在双峰分布；运行HEATMAP确认
细分结果中的TOTAL行 = 所有分组的聚合值
OTHER行 = 超出查询限制的分组（如果OTHER占比大，需提高限制）
ASCII热力图
```
▁▂▃▄▅▆▇█
```
= 密度从低到高；两个区间代表两个群体
元数据中的query_run_pk——直接传入
```
run_bubbleup
```
进行异常值分析

Additional Resources

额外资源

Reference Files

参考文件

${CLAUDE_PLUGIN_ROOT}/skills/query-patterns/references/visualize-operations.md
— Complete VISUALIZE operation reference with examples

${CLAUDE_PLUGIN_ROOT}/skills/query-patterns/references/relational-fields.md
— Detailed relational field guide with cross-service patterns

${CLAUDE_PLUGIN_ROOT}/skills/query-patterns/references/query-examples.md
— Extensive query cookbook organized by use case

${CLAUDE_PLUGIN_ROOT}/skills/query-patterns/references/result-interpretation.md
— Guide to interpreting query results, raw JSON access, and statistical heuristics
${CLAUDE_PLUGIN_ROOT}/skills/query-patterns/references/calculated-fields.md
— Calculated field syntax, full operator reference, common patterns, and anti-patterns (presentational fields, expensive string ops, type mismatches)

${CLAUDE_PLUGIN_ROOT}/skills/query-patterns/references/visualize-operations.md
——完整的VISUALIZE操作参考及示例

${CLAUDE_PLUGIN_ROOT}/skills/query-patterns/references/relational-fields.md
——详细的关系字段指南及跨服务模式

${CLAUDE_PLUGIN_ROOT}/skills/query-patterns/references/query-examples.md
——按用例组织的大量查询手册

${CLAUDE_PLUGIN_ROOT}/skills/query-patterns/references/result-interpretation.md
——查询结果解读指南、原始JSON访问及统计启发法

${CLAUDE_PLUGIN_ROOT}/skills/query-patterns/references/calculated-fields.md
——计算字段语法、完整运算符参考、常见模式及反模式（展示字段、开销大的字符串操作、类型不匹配）

Cross-References

交叉参考

For the structured investigation workflow that uses these query patterns: production-investigation skill
For SLO interpretation and burn alert design: slos-and-triggers skill

使用这些查询模式的结构化调查工作流：production-investigation技能
SLO解读和熔断告警设计：slos-and-triggers技能