dt-obs-tracing

Application Tracing Skill

Overview

Distributed traces in Dynatrace consist of spans - building blocks representing units of work. With Traces in Grail, every span is accessible via DQL with full-text searchability on all attributes. This skill covers trace fundamentals, common analysis patterns, and span-type specific queries.

Core Concepts

Understanding Traces and Spans

Spans represent logical units of work in distributed traces:

HTTP requests, RPC calls, database operations
Messaging system interactions
Internal function invocations
Custom instrumentation points

Span kinds:

`span.kind: server` - Incoming call to a service
`span.kind: client` - Outgoing call from a service
`span.kind: consumer` - Incoming message consumption call to a service
`span.kind: producer` - Outgoing message production call from a service
`span.kind: internal` - Internal operation within a service

Root spans: A request root span (`request.is_root_span == true`) represents an incoming call to a service. Use this to analyze end-to-end request performance.
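To see how the span kinds listed above are distributed in practice, a query along these lines may help. It is a minimal sketch that only uses fields named in this document; the one-hour timeframe is an arbitrary assumption.

```dql
// Count spans per service and span kind over the last hour (timeframe is illustrative)
fetch spans, from:now() - 1h
| summarize spans = count(), by: { dt.service.name, span.kind }
| sort spans desc
```

A service that shows mostly `client` spans and few `server` spans is likely calling out more than it is being called.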

Key Trace Attributes

Essential attributes for trace analysis:

Attribute	Description
`trace.id`	Unique trace identifier
`span.id`	Unique span identifier
`span.parent_id`	Parent span ID (null for root spans)
`request.is_root_span`	Boolean, true for request entry points
`request.is_failed`	Boolean, true if request failed
`duration`	Span duration in nanoseconds
`span.timing.cpu`	Overall CPU time of the span (stable)
`span.timing.cpu_self`	CPU time excluding child spans (stable)
`dt.smartscape.service`	Service Smartscape node ID
`dt.service.name`	Dynatrace service name derived from service detection rules. It is equal to the Smartscape service node name.
`endpoint.name`	Endpoint/route name
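As a quick way to look at these attributes together, the following sketch lists them for a handful of request root spans. It assumes nothing beyond the attributes in the table; the limit of 10 is arbitrary.

```dql
// Show the key trace attributes side by side for a few request root spans
fetch spans
| filter request.is_root_span == true
| fields trace.id, span.id, span.parent_id, dt.service.name, endpoint.name,
         duration, span.timing.cpu, request.is_failed
| limit 10
```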

Service Context

Spans reference services via Smartscape node IDs and the detected service name, `dt.service.name`, which is also present on every span.

dql

fetch spans
| summarize spans=count(), by: { dt.smartscape.service, dt.service.name }

Sampling and Extrapolation

One span can represent multiple real operations due to:

Aggregation: Multiple operations in one span (`aggregation.count`)
ATM (Adaptive Traffic Management): Head-based sampling by agent
ALR (Adaptive Load Reduction): Server-side sampling
Read Sampling: Query-time sampling via `samplingRatio` parameter

When to extrapolate: Always extrapolate when counting actual operations (not just spans). Use the multiplicity factor:

dql

fetch spans
| fieldsAdd sampling.probability = (power(2, 56) - coalesce(sampling.threshold, 0)) * power(2, -56)
| fieldsAdd sampling.multiplicity = 1 / sampling.probability
| fieldsAdd multiplicity = coalesce(sampling.multiplicity, 1)
                         * coalesce(aggregation.count, 1)
                         * dt.system.sampling_ratio
| summarize operation_count = sum(multiplicity)

📖 Learn more: See Sampling and Extrapolation for detailed formulas and examples.
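A worked example may make the multiplicity formula above more concrete. The input values are purely hypothetical: assume a span with `sampling.threshold` equal to 2^55, `aggregation.count` equal to 3, and `dt.system.sampling_ratio` equal to 10.

```latex
\begin{align*}
\text{sampling.probability} &= \left(2^{56} - 2^{55}\right) \cdot 2^{-56} = 0.5 \\
\text{sampling.multiplicity} &= \frac{1}{0.5} = 2 \\
\text{multiplicity} &= 2 \cdot 3 \cdot 10 = 60
\end{align*}
```

Under these assumptions the single span stands for 60 real operations. When `sampling.threshold` is missing, the `coalesce` falls back to 0, the probability becomes 1, and only aggregation and read sampling contribute.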

Common Query Patterns

Basic Span Access

Fetch a single span to inspect the available fields:

dql

fetch spans | limit 1

Explore spans by function and type:

dql

fetch spans
| summarize count(), by: { span.kind, code.namespace, code.function }

Request Root Filtering

List request root spans (incoming service calls):

dql

fetch spans
| filter request.is_root_span == true
| fields trace.id, span.id, start_time, response_time = duration, endpoint.name
| limit 100

Service Performance Summary

Analyze service performance with error rates:

dql

fetch spans
| filter request.is_root_span == true
| summarize
    total_requests = count(),
    failed_requests = countIf(request.is_failed == true),
    avg_duration = avg(duration),
    p95_duration = percentile(duration, 95),
    by: {dt.service.name}
| fieldsAdd error_rate = (failed_requests * 100.0) / total_requests
| sort error_rate desc
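The summary above counts spans, which can understate real traffic when sampling is active. A possible extrapolated variant, reusing the multiplicity factor from the Sampling and Extrapolation section, is sketched below. The field names `estimated_requests` and `estimated_failed` are illustrative choices, not names defined by Dynatrace.

```dql
fetch spans
| filter request.is_root_span == true
// Multiplicity factor as described under Sampling and Extrapolation
| fieldsAdd sampling.probability = (power(2, 56) - coalesce(sampling.threshold, 0)) * power(2, -56)
| fieldsAdd multiplicity = (1 / sampling.probability)
                         * coalesce(aggregation.count, 1)
                         * dt.system.sampling_ratio
| summarize {
    request_spans = count(),
    estimated_requests = sum(multiplicity),
    // Weight failed requests by the same factor so the rate stays consistent
    estimated_failed = sum(if(request.is_failed == true, multiplicity, else: 0))
  }, by: { dt.service.name }
| fieldsAdd error_rate = (estimated_failed * 100.0) / estimated_requests
| sort estimated_requests desc
```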

Trace ID Lookup

Find all spans in a specific trace:

dql

fetch spans
| filter trace.id == toUid("abc123def456")
| fields span.name, duration, dt.service.name

Performance Analysis

Response Time Percentiles

Calculate percentiles by endpoint:

dql

fetch spans
| filter request.is_root_span == true
| summarize {
    requests=count(),
    avg_duration=avg(duration),
    p95=percentile(duration, 95),
    p99=percentile(duration, 99)
  }, by: { endpoint.name }
| sort p99 desc

💡 Best practice: Use percentiles (p95, p99) over averages for performance insights.

Slow Trace Detection

Find requests exceeding a threshold:

dql

fetch spans, from:now() - 2h
| filter request.is_root_span == true
| filter duration > 5s
| fields trace.id, span.name, dt.service.name, duration
| sort duration desc
| limit 50
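Besides listing individual slow requests, it can be useful to see which endpoints produce them most often. The sketch below reuses the same 5s threshold; the names `slow_requests` and `slow_share` are illustrative.

```dql
fetch spans, from:now() - 2h
| filter request.is_root_span == true
| summarize {
    requests = count(),
    slow_requests = countIf(duration > 5s)
  }, by: { dt.service.name, endpoint.name }
| filter slow_requests > 0
// Share of slow requests per endpoint, in percent
| fieldsAdd slow_share = (slow_requests * 100.0) / requests
| sort slow_share desc
| limit 50
```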

Duration Buckets with Exemplars

Group requests into duration buckets with example traces:

dql

fetch spans, from:now() - 24h
| filter http.route == "/api/v1/storage/findByISBN"
| summarize {
    spans=count(),
    trace=takeAny(record(start_time, trace.id))
  }, by: { bin(duration, 10ms) }
| fields `bin(duration, 10ms)`, spans, trace.id=trace[trace.id], start_time=trace[start_time]

Performance Timeseries

Extract response time as timeseries:

dql

fetch spans, from:now() - 24h
| filter request.is_root_span == true
| makeTimeseries {
    requests=count(),
    avg_duration=avg(duration),
    p95=percentile(duration, 95),
    p99=percentile(duration, 99)
  }, by: { endpoint.name }

📖 Learn more: See Performance Analysis for advanced patterns and timeseries techniques.

Failure Investigation

Failed Request Summary

Summarize failures by service:

dql

fetch spans
| filter request.is_root_span == true
| summarize
    total = count(),
    failed = countIf(request.is_failed == true),
  by: { dt.service.name }
| fieldsAdd failure_rate = (failed * 100.0) / total
| sort failure_rate desc

Failure Reason Analysis

Breakdown by failure detection reason:

dql

fetch spans
| filter request.is_failed == true and isNotNull(dt.failure_detection.results)
| expand dt.failure_detection.results
| summarize count(), by: { dt.failure_detection.results[reason] }

Failure reasons:

`http_code` - HTTP response code triggered failure
`grpc_code` - gRPC status code triggered failure
`exception` - Exception caused failure
`span_status` - Span status indicated failure
`custom_rule` - Custom failure detection rule matched

HTTP Code Failures

Find failures by HTTP status code:

dql

fetch spans
| filter request.is_failed == true
| filter iAny(dt.failure_detection.results[][reason] == "http_code")
| summarize count(), by: { http.response.status_code, endpoint.name }
| sort `count()` desc

Recent Failed Requests
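The source gives no query under this heading. One plausible way to list recent failed requests, built only from fields used elsewhere in this document, might look like the sketch below. The one-hour timeframe and the limit of 50 are assumptions.

```dql
// Most recent failed request root spans (timeframe and limit are illustrative)
fetch spans, from:now() - 1h
| filter request.is_root_span == true and request.is_failed == true
| fields start_time, trace.id, dt.service.name, endpoint.name, duration
| sort start_time desc
| limit 50
```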

Service Dependencies

Service Communication

Analyze incoming and outgoing service communication:

dql

fetch spans, from:now() - 1h
| filter isNotNull(server.address)
| fieldsAdd
    remote_side = server.address
| summarize
    call_count = count(),
    avg_duration = avg(duration),
    by: {dt.service.name, remote_side}
| sort call_count desc

Outgoing HTTP Calls

Identify external API dependencies:

dql

fetch spans
| filter span.kind == "client" and isNotNull(http.request.method)
| summarize
    calls = count(),
    avg_latency = avg(duration),
    p99_latency = percentile(duration, 99),
  by: { dt.service.name, server.address, server.port }
| sort calls desc

Trace Aggregation

Complete Trace Analysis

Aggregate all spans in a trace to understand full request flow:

dql

fetch spans, from:now() - 30m
| summarize {
    spans = count(),
    client_spans = countIf(span.kind == "client"),

    // Endpoints involved in the trace
    endpoints = toString(arrayRemoveNulls(collectDistinct(endpoint.name))),

    // Extract the first request root in the trace
    trace_root = takeMin(record(
        root_detection_helper = coalesce(
            if(request.is_root_span, 1),
            if(isNull(span.parent_id), 2),
            3),
        start_time, endpoint.name, duration
      ))
}, by: { trace.id }

| fieldsFlatten trace_root
| fieldsRemove trace_root.root_detection_helper, trace_root

| fields
    start_time = trace_root.start_time,
    endpoint = trace_root.endpoint.name,
    response_time = trace_root.duration,
    spans,
    client_spans,
    endpoints,
    trace.id
| sort start_time
| limit 100

Root detection strategy: Use `takeMin(record(...))` with a detection helper to reliably find the root request:

Priority 1: Spans with `request.is_root_span == true`
Priority 2: Spans without parent (root spans)
Priority 3: All other spans

Multi-Service Traces

Find traces spanning multiple services:

dql

fetch spans, from:now() - 1h
| summarize {
    services = collectDistinct(dt.service.name),
    trace_root = takeMin(record(
        root_detection_helper = coalesce(if(request.is_root_span, 1), 2),
        endpoint.name
      ))
}, by: { trace.id }
| fieldsAdd service_count = arraySize(services)
| filter service_count > 1
| fields
    endpoint = trace_root[endpoint.name],
    service_count,
    services = toString(services),
    trace.id
| sort service_count desc
| limit 50

Request-Level Analysis

Request Attributes

Access custom request attributes captured by OneAgent on request root spans:

dql

fetch spans
| filter request.is_root_span == true
| filter isNotNull(request_attribute.PaidAmount)
| makeTimeseries sum(request_attribute.PaidAmount)

Field pattern: `request_attribute.<name>`

For attributes with special characters, use backticks:

dql

fetch spans
| filter isNotNull(`request_attribute.My Customer ID`)
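The backtick-quoted field can be used wherever a plain field name is accepted. As a sketch, the query below groups request counts by the same example attribute; `My Customer ID` is the document's example name, and the limit is arbitrary.

```dql
fetch spans
| filter request.is_root_span == true
| filter isNotNull(`request_attribute.My Customer ID`)
// Backticks are needed again when the field is used as a grouping key
| summarize requests = count(), by: { `request_attribute.My Customer ID` }
| sort requests desc
| limit 20
```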

Captured Attributes

Access attributes captured from method parameters (always as arrays):

dql

fetch spans
| filter isNotNull(captured_attribute.BookID_purchased)
| fields trace.id, span.id, code.namespace, code.function, captured_attribute.BookID_purchased
| limit 1

Field pattern: `captured_attribute.<name>`
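Because captured attributes arrive as arrays, grouping by individual values usually needs an `expand` first. A minimal sketch, reusing the document's example attribute `BookID_purchased`:

```dql
fetch spans
| filter isNotNull(captured_attribute.BookID_purchased)
// Turn each array element into its own record before grouping
| expand captured_attribute.BookID_purchased
| summarize spans = count(), by: { captured_attribute.BookID_purchased }
| sort spans desc
| limit 20
```

Note that after `expand`, a span with several captured values contributes one record per value, so `spans` counts value occurrences rather than distinct spans.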

Request ID Aggregation

Aggregate all spans belonging to a single request using `request.id` (OneAgent traces only):

dql

fetch spans
| filter isNotNull(request.id)
| summarize {
    spans = count(),
    client_spans = countIf(span.kind == "client"),
    request_root = takeMin(record(
        root_detection_helper = coalesce(if(request.is_root_span, 1), 2),
        start_time, endpoint.name, duration
      ))
}, by: { trace.id, request.id }
| fieldsFlatten request_root
| fields
    start_time = request_root.start_time,
    endpoint = request_root.endpoint.name,
    response_time = request_root.duration,
    spans,
    client_spans
| limit 100

📖 Learn more: See Request Attributes for complete patterns on request attributes, captured attributes, and request-level aggregation.

Span Types

HTTP Spans

HTTP spans capture web requests and API calls:

Server-side (incoming requests):

dql

fetch spans
| filter span.kind == "server" and isNotNull(http.request.method)
| summarize
    requests = count(),
    avg_duration = avg(duration),
  by: { http.request.method, http.route }
| sort requests desc

Client-side (outgoing calls):

dql

fetch spans
| filter span.kind == "client" and isNotNull(http.request.method)
| summarize
    calls = count(),
    avg_duration = avg(duration),
  by: { server.address, http.request.method }
| sort calls desc

📖 Learn more: See HTTP Span Analysis for status codes, payload analysis, and client IP tracking.
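As a small extension of the server-side query, a breakdown by response status code could look like the sketch below. It relies on `http.response.status_code`, which this document already uses in the HTTP Code Failures query; the limit is arbitrary.

```dql
fetch spans
| filter span.kind == "server" and isNotNull(http.response.status_code)
| summarize requests = count(), by: { http.route, http.response.status_code }
| sort requests desc
| limit 50
```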

Database Spans

Database operations appear as client spans with `db.*` attributes:

dql

fetch spans
| filter span.kind == "client" and isNotNull(db.system) and isNotNull(db.namespace)
| summarize {
    spans=count(),
    avg_duration=avg(duration)
  }, by: { dt.service.name, db.system, db.namespace }
| sort spans desc

⚠️ Important: Database spans can be aggregated (one span = multiple calls). Always use extrapolation for accurate counts.

📖 Learn more: See Database Span Analysis for extrapolated counts and slow query detection.
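To illustrate the warning above, the following sketch applies the multiplicity factor from the Sampling and Extrapolation section to database spans. The name `estimated_calls` is illustrative; comparing it with `spans` shows how much aggregation and sampling hide.

```dql
fetch spans
| filter span.kind == "client" and isNotNull(db.system)
| fieldsAdd sampling.probability = (power(2, 56) - coalesce(sampling.threshold, 0)) * power(2, -56)
| fieldsAdd multiplicity = (1 / sampling.probability)
                         * coalesce(aggregation.count, 1)
                         * dt.system.sampling_ratio
| summarize {
    spans = count(),
    // Extrapolated number of database calls represented by these spans
    estimated_calls = sum(multiplicity)
  }, by: { dt.service.name, db.system, db.namespace }
| sort estimated_calls desc
```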

Messaging Spans

Messaging spans capture Kafka, RabbitMQ, SQS operations:

dql

fetch spans
| filter isNotNull(messaging.system)
| summarize
    spans = count(),
    messages = sum(coalesce(messaging.batch.message_count, 1)),
  by: { messaging.system, messaging.destination.name, messaging.operation.type }
| sort messages desc

📖 Learn more: See Messaging Span Analysis for throughput, latency, and failure patterns.

RPC Spans

RPC spans cover gRPC, SOAP, and other RPC frameworks:

dql

fetch spans
| filter isNotNull(rpc.system)
| summarize
    calls = count(),
    avg_duration = avg(duration),
  by: { rpc.system, rpc.service, rpc.method }
| sort calls desc

📖 Learn more: See RPC Span Analysis for gRPC status codes and service dependencies.

Serverless Spans

FaaS spans capture Lambda, Azure Functions, and GCP Cloud Functions:

dql

fetch spans
| filter isNotNull(faas.name) and span.kind == "server"
| summarize
    invocations = count(),
    avg_duration = avg(duration),
    p99_duration = percentile(duration, 99),
  by: { faas.name, cloud.provider }
| sort invocations desc

📖 Learn more: See Serverless Span Analysis for cold start analysis and trigger types.

Advanced Topics

Exception Analysis

Exceptions are stored as `span.events` within spans:

dql

fetch spans
| filter iAny(span.events[][span_event.name] == "exception")
| expand span.events
| fieldsFlatten span.events, fields: { exception.type }
| summarize {
    count(),
    trace=takeAny(record(start_time, trace.id))
  }, by: { exception.type }
| fields exception.type, `count()`, trace.id=trace[trace.id], start_time=trace[start_time]

💡 Tip: Use `iAny()` to check conditions within span event arrays.

Logs and Traces Correlation

Join logs with traces using trace IDs:

dql

fetch spans, from:now() - 30m
| join [ fetch logs | fieldsAdd trace.id = toUid(trace_id) ]
  , on: { trace.id }
  , fields: { content, loglevel }
| fields start_time, trace.id, span.id, loglevel, content
| limit 100

📖 Learn more: See Logs Correlation for filtering traces by log content and finding logs for failed requests.
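One way to narrow the join to failed requests is to filter the span side before joining. This is a sketch that reuses the join shown above unchanged; only the extra filter and the added `dt.service.name` column are new.

```dql
fetch spans, from:now() - 30m
// Keep only failed request roots so the join returns logs for failing traces
| filter request.is_root_span == true and request.is_failed == true
| join [ fetch logs | fieldsAdd trace.id = toUid(trace_id) ]
  , on: { trace.id }
  , fields: { content, loglevel }
| fields start_time, trace.id, dt.service.name, loglevel, content
| limit 100
```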

Network Analysis

Analyze IP addresses, DNS resolution, and client geography:

dql

fetch spans, from:now() - 24h
| filter isNotNull(client.ip)
| fieldsAdd client.ip = toIp(client.ip)
| fieldsAdd client.subnet = ipMask(client.ip, 24)
| summarize {
    requests=count(),
    unique_clients=countDistinct(client.ip)
  }, by: { client.subnet, endpoint.name }
| sort requests desc

📖 Learn more: See Network Analysis for server address resolution and communication mapping.

Best Practices

Query Optimization

Filter early: Apply `request.is_root_span == true` and endpoint filters first
Use `samplingRatio`: Reduce data volume for better performance (e.g., `samplingRatio:100` reads 1%)
Limit results: Always use `limit` for exploratory queries
Percentiles over averages: Use p95/p99 for performance insights
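The practices above can be combined in one exploratory query. The sketch below assumes that `dt.system.sampling_ratio` carries the read-sampling factor, as its use in the multiplicity formula suggests; the names `sampled_spans` and `estimated_spans` are illustrative.

```dql
// Read roughly 1% of the data, filter early, and scale counts back up
fetch spans, from:now() - 24h, samplingRatio:100
| filter request.is_root_span == true
| summarize {
    sampled_spans = count(),
    estimated_spans = sum(dt.system.sampling_ratio),
    p95 = percentile(duration, 95)
  }, by: { endpoint.name }
| sort estimated_spans desc
| limit 20
```

This scales only for read sampling; for full operation counts the complete multiplicity factor still applies.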

Node Lookups

Use `getNodeName()`: Simplest way to add service names
Prefer subqueries: Use Smartscape node filters and `traverse` for filtering
Cache node info: Store node lookups in fields for reuse

Aggregation Patterns

Request roots: Use `request.is_root_span == true` for end-to-end analysis
Trace-level: Group by `trace.id` for complete trace metrics
Request-level: Group by `request.id` for request metrics (OneAgent traces only)
Always extrapolate: Use multiplicity for accurate operation counts

Trace Exemplars

Include example traces for drilldown:

dql

| summarize {
    count(),
    trace=takeAny(record(start_time, trace.id))
  }, by: { grouping_field }
| fields ..., trace.id=trace[trace.id], start_time=trace[start_time]

This enables "Open With" functionality in Dynatrace UI.
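For reference, the fragment above could be completed into a full query as follows. This is a sketch that groups failed requests by endpoint; the grouping field and the name `failed` are illustrative choices.

```dql
fetch spans
| filter request.is_root_span == true and request.is_failed == true
| summarize {
    failed = count(),
    // Keep one arbitrary trace per group as a drilldown example
    trace = takeAny(record(start_time, trace.id))
  }, by: { endpoint.name }
| fields endpoint.name, failed, trace.id = trace[trace.id], start_time = trace[start_time]
| sort failed desc
```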

References

Detailed documentation for specific topics:

Performance Analysis - Advanced timeseries, duration buckets, endpoint ranking
Failure Detection - Failure reasons, exception investigation, custom rules
Sampling and Extrapolation - Multiplicity calculation, database extrapolation
Request Attributes - Request attributes, captured attributes, request ID aggregation
Entity Lookups - Advanced node lookups, infrastructure correlation, hardware analysis
HTTP Span Analysis - Status codes, payload analysis, client IPs
Database Span Analysis - Extrapolated counts, slow queries, statement analysis
Messaging Span Analysis - Kafka, RabbitMQ, SQS throughput and latency
RPC Span Analysis - gRPC, SOAP, service dependencies
Serverless Span Analysis - Lambda, Azure Functions, cold start analysis
Logs Correlation - Joining logs and traces, correlation patterns
Network Analysis - IP addresses, DNS resolution, communication mapping
