query-patterns

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Honeycomb Query Patterns

Honeycomb 查询模式

Opinionated guidance for writing effective Honeycomb queries. The MCP tools already document their parameters and schemas — this skill focuses on when and why to use each pattern, not how to call the tools.
关于编写高效Honeycomb查询的指导性建议。MCP工具已记录其参数和架构——本技能专注于何时以及为何使用每种模式,而非如何调用工具。

Key Principles

核心原则

  1. Never use AVG for latency — AVG hides tail latency. Use P99 (or P95/P90) to see what slow users experience. Reserve AVG for non-latency metrics like payload size.
  2. Use HEATMAP for distributions — Single-number aggregates hide bimodal patterns. HEATMAP reveals whether you have one population or two.
  3. Combine calculations in one query
    COUNT, P99(duration_ms), HEATMAP(duration_ms)
    in a single query reduces API calls and gives a complete picture.
  4. Start broad, narrow with WHERE — Begin with a COUNT/GROUP BY to understand shape, then add filters to focus.
  5. Check for prior work — Call
    find_queries
    before writing new queries. Someone may have already answered the question.
  1. 绝不要用AVG分析延迟——AVG会掩盖尾部延迟。使用P99(或P95/P90)查看慢用户的体验。仅将AVG用于非延迟指标,如负载大小。
  2. 用HEATMAP分析分布——单一数值聚合会掩盖双峰模式。HEATMAP可揭示数据是单群体还是双群体。
  3. 在一个查询中组合计算——在单个查询中使用
    COUNT, P99(duration_ms), HEATMAP(duration_ms)
    可减少API调用并提供完整视图。
  4. 从宽泛入手,用WHERE缩小范围——先使用COUNT/GROUP BY了解数据形态,再添加过滤器聚焦目标。
  5. 查看已有成果——编写新查询前调用
    find_queries
    。可能已有他人解答过相同问题。

Choosing the Right Operation

选择合适的操作

QuestionUse
How much traffic?
COUNT
grouped by route or service
How many unique users/IPs?
COUNT_DISTINCT(field)
How fast for most users?
P50(duration_ms)
How fast for the worst-off users?
P99(duration_ms)
Is there a bimodal pattern?
HEATMAP(duration_ms)
What's the worst case?
MAX(duration_ms)
How many concurrent operations?
CONCURRENCY
Is it getting worse over time?
RATE_AVG(duration_ms)
问题使用操作
流量有多大?
COUNT
按路由或服务分组
有多少唯一用户/IP?
COUNT_DISTINCT(field)
大多数用户的体验速度如何?
P50(duration_ms)
体验最差的用户速度如何?
P99(duration_ms)
是否存在双峰模式?
HEATMAP(duration_ms)
最坏情况是什么?
MAX(duration_ms)
并发操作有多少?
CONCURRENCY
情况是否在恶化?
RATE_AVG(duration_ms)

Relational Field Strategy

关系字段策略

Use relational prefixes to ask cross-span questions within a trace:
  • "Show me slow endpoints caused by a specific downstream": Filter with
    any.service.name
    to find traces where that service participates, group by
    root.http.route
    to see which user-facing endpoints are affected.
  • "What's different about errored traces?": Filter with
    any.error = true
    , group by
    root.name
    to see which entry points have errors somewhere in their trace tree.
  • Exclude noise:
    none.service.name = "health-check"
    removes traces containing health checks.
使用关系前缀在追踪中跨跨度提问:
  • “展示由特定下游服务导致的慢端点”:用
    any.service.name
    过滤找到包含该服务的追踪,按
    root.http.route
    分组查看受影响的用户端端点。
  • “出错的追踪有何不同?”:用
    any.error = true
    过滤,按
    root.name
    分组查看哪些入口点的追踪树中存在错误。
  • 排除干扰项
    none.service.name = "health-check"
    可移除包含健康检查的追踪。

Calculated Fields

计算字段

Calculated fields are per-event expressions evaluated at query time. They transform, classify, and combine existing fields without re-instrumenting code.
Three scopes — choose the narrowest that fits the need:
  • Query-scoped (not saved): exploratory, one-off analysis
  • Dataset-level (saved): reusable within one service's dataset
  • Environment-level (saved): reusable across all datasets (e.g.,
    error_pct
    )
Common patterns:
  • Error rate:
    MUL(IF($error, 1, 0), 100)
    → use
    AVG(error_pct)
    to get percentage
  • Status classification:
    IF(GTE($http.status_code, 500), "5xx", GTE($http.status_code, 400), "4xx", "ok")
  • Latency bucketing:
    BUCKET($duration_ms, 500, 0, 3000)
  • Prefix routing:
    IF(STARTS_WITH($url, "/admin"), "admin", STARTS_WITH($url, "/api"), "api", "other")
  • Exact-match classification: use
    SWITCH
    instead of
    IF(EQUALS(...))
    chains — same expression, more efficient
Key guardrails:
  • Don't create presentational (alias-only) fields — a field that just renames another field adds no analytical value and clutters the schema. Only save a calculated field when it does real computation (classification, extraction, math).
  • Avoid regex on large/complex fields — running
    REG_MATCH
    ,
    REG_VALUE
    , or
    REG_COUNT
    on
    exception.stacktrace
    ,
    db.statement
    , or full log lines can be very slow. Check whether a more targeted OTel field exists first (
    exception.type
    ,
    exception.message
    ,
    db.operation
    ). If you must regex a long field, guard it with a
    CONTAINS
    check first.
  • EQUALS
    has strict type matching
    EQUALS($http.status_code, 200)
    silently returns false if the field is stored as a string. Use
    find_columns
    to verify the field type before comparing.
  • FORMAT_TIME
    is expensive
    — avoid in high-volume queries.
  • Save query-scoped, not dataset-level, for one-off work — saved fields show up in everyone's schema.
For full syntax, operator reference, and extended anti-pattern examples, consult
${CLAUDE_PLUGIN_ROOT}/skills/query-patterns/references/calculated-fields.md
.
计算字段是在查询时计算的单事件表达式。它们无需重新埋点即可转换、分类和组合现有字段。
三种作用域——选择最贴合需求的最小作用域:
  • 查询作用域(不保存):探索性、一次性分析
  • 数据集级别(保存):可在单个服务的数据集中复用
  • 环境级别(保存):可在所有数据集中复用(例如
    error_pct
常见模式:
  • 错误率
    MUL(IF($error, 1, 0), 100)
    → 使用
    AVG(error_pct)
    获取百分比
  • 状态分类
    IF(GTE($http.status_code, 500), "5xx", GTE($http.status_code, 400), "4xx", "ok")
  • 延迟分桶
    BUCKET($duration_ms, 500, 0, 3000)
  • 前缀路由
    IF(STARTS_WITH($url, "/admin"), "admin", STARTS_WITH($url, "/api"), "api", "other")
  • 精确匹配分类:使用
    SWITCH
    替代
    IF(EQUALS(...))
    链式判断——表达式功能相同,但效率更高
关键注意事项:
  • 不要创建仅用于展示(仅别名)的字段——仅重命名其他字段的字段不会增加分析价值,还会使架构混乱。仅当字段能实现实际计算(分类、提取、运算)时才保存计算字段。
  • 避免对大型/复杂字段使用正则——在
    exception.stacktrace
    db.statement
    或完整日志行上运行
    REG_MATCH
    REG_VALUE
    REG_COUNT
    可能非常缓慢。先检查是否存在更具针对性的OTel字段(
    exception.type
    exception.message
    db.operation
    )。如果必须对长字段使用正则,请先用
    CONTAINS
    检查过滤。
  • EQUALS
    有严格的类型匹配
    ——如果字段以字符串存储,
    EQUALS($http.status_code, 200)
    会静默返回false。比较前使用
    find_columns
    验证字段类型。
  • FORMAT_TIME
    开销大
    ——在高流量查询中避免使用。
  • 一次性工作使用查询作用域,而非数据集级别——已保存的字段会出现在所有人的架构中。
如需完整语法、运算符参考和扩展反模式示例,请查阅
${CLAUDE_PLUGIN_ROOT}/skills/query-patterns/references/calculated-fields.md

Before Every Query

每次查询前的准备

  • Filter on
    is_root
    when measuring user-facing latency — without it, internal spans inflate the numbers
  • Use human-readable time ranges (
    "24h"
    ,
    "-6h"
    ) — epoch timestamps are error-prone and hard to review
  • Validate columns with
    find_columns
    before querying
    — confirms field names exist and prevents empty results
  • 测量用户端延迟时过滤
    is_root
    ——不过滤的话,内部跨度会使数值膨胀
  • 使用易读的时间范围
    "24h"
    ,
    "-6h"
    )——时间戳容易出错且难以审核
  • 查询前用
    find_columns
    验证列
    ——确认字段名称存在,避免返回空结果

Interpreting Results

解读结果

After running a query, the MCP tool returns formatted markdown plus metadata. The most important metadata field is
query_result_json
— a signed URL to the raw JSON result. For precise analysis, download it and parse with jq or python rather than relying solely on the ASCII rendering.
Key interpretation rules:
  • P99/P50 > 10x — bimodal distribution likely; run HEATMAP to confirm
  • TOTAL row in breakdown results = aggregate across all groups
  • OTHER row = groups beyond the query limit (increase limit if OTHER is large)
  • ASCII heatmap
    ▁▂▃▄▅▆▇█
    = density from low to high; two bands = two populations
  • query_run_pk in metadata — feed directly to
    run_bubbleup
    for outlier analysis
运行查询后,MCP工具会返回格式化的markdown及元数据。最重要的元数据字段是
query_result_json
——指向原始JSON结果的签名URL。如需精确分析,请下载并使用jq或python解析,而非仅依赖ASCII渲染。
关键解读规则:
  • P99/P50 > 10倍——可能存在双峰分布;运行HEATMAP确认
  • 细分结果中的TOTAL行 = 所有分组的聚合值
  • OTHER行 = 超出查询限制的分组(如果OTHER占比大,需提高限制)
  • ASCII热力图
    ▁▂▃▄▅▆▇█
    = 密度从低到高;两个区间代表两个群体
  • 元数据中的query_run_pk——直接传入
    run_bubbleup
    进行异常值分析

Additional Resources

额外资源

Reference Files

参考文件

  • ${CLAUDE_PLUGIN_ROOT}/skills/query-patterns/references/visualize-operations.md
    — Complete VISUALIZE operation reference with examples
  • ${CLAUDE_PLUGIN_ROOT}/skills/query-patterns/references/relational-fields.md
    — Detailed relational field guide with cross-service patterns
  • ${CLAUDE_PLUGIN_ROOT}/skills/query-patterns/references/query-examples.md
    — Extensive query cookbook organized by use case
  • ${CLAUDE_PLUGIN_ROOT}/skills/query-patterns/references/result-interpretation.md
    — Guide to interpreting query results, raw JSON access, and statistical heuristics
  • ${CLAUDE_PLUGIN_ROOT}/skills/query-patterns/references/calculated-fields.md
    — Calculated field syntax, full operator reference, common patterns, and anti-patterns (presentational fields, expensive string ops, type mismatches)
  • ${CLAUDE_PLUGIN_ROOT}/skills/query-patterns/references/visualize-operations.md
    ——完整的VISUALIZE操作参考及示例
  • ${CLAUDE_PLUGIN_ROOT}/skills/query-patterns/references/relational-fields.md
    ——详细的关系字段指南及跨服务模式
  • ${CLAUDE_PLUGIN_ROOT}/skills/query-patterns/references/query-examples.md
    ——按用例组织的大量查询手册
  • ${CLAUDE_PLUGIN_ROOT}/skills/query-patterns/references/result-interpretation.md
    ——查询结果解读指南、原始JSON访问及统计启发法
  • ${CLAUDE_PLUGIN_ROOT}/skills/query-patterns/references/calculated-fields.md
    ——计算字段语法、完整运算符参考、常见模式及反模式(展示字段、开销大的字符串操作、类型不匹配)

Cross-References

交叉参考

  • For the structured investigation workflow that uses these query patterns: production-investigation skill
  • For SLO interpretation and burn alert design: slos-and-triggers skill
  • 使用这些查询模式的结构化调查工作流:production-investigation技能
  • SLO解读和熔断告警设计:slos-and-triggers技能