dt-obs-frontends


Frontend Observability Skill

Monitor web and mobile frontends using Real User Monitoring (RUM) with DQL queries. This skill targets the new RUM experience only; do not use classic RUM data.

Overview

This skill helps you:
  • Monitor Core Web Vitals and frontend performance
  • Track user sessions, engagement, and behavior
  • Analyze errors and correlate with backend traces
  • Optimize mobile app startup and stability
  • Diagnose performance issues with detailed timing analysis
Data Sources:
  • Metrics: timeseries with dt.frontend.* (trends, alerting)
  • Events: fetch user.events (individual page views, requests, clicks, errors)
  • Sessions: fetch user.sessions (session-level aggregates: duration, bounce, counts)

Quick Reference

Common Metrics

  • dt.frontend.user_action.count
    - User action volume
  • dt.frontend.user_action.duration
    - User action duration
  • dt.frontend.request.count
    - Request volume
  • dt.frontend.request.duration
    - Request latency (ms)
  • dt.frontend.error.count
    - Error counts
  • dt.frontend.session.active.estimated_count
    - Active sessions
  • dt.frontend.user.active.estimated_count
    - Unique users
  • dt.frontend.web.page.cumulative_layout_shift
    - CLS metric
  • dt.frontend.web.navigation.dom_interactive
    - DOM interactive time
  • dt.frontend.web.page.first_input_delay
    - FID metric (legacy; prefer INP)
  • dt.frontend.web.page.largest_contentful_paint
    - LCP metric
  • dt.frontend.web.page.interaction_to_next_paint
    - INP metric
  • dt.frontend.web.navigation.load_event_end
    - Load event end
  • dt.frontend.web.navigation.time_to_first_byte
    - Time to first byte
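
These metrics can be queried with the timeseries command for trend views. A minimal sketch, assuming the standard timeseries parameters (by:, interval:, filter:) — adjust the metric and interval to your needs:
dql
timeseries action_duration = avg(dt.frontend.user_action.duration),
   by: {frontend.name},
   interval: 5m,
   filter: dt.rum.user_type == "real_user"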

Common Filters

  • frontend.name
    - Filter by frontend name (e.g. my-frontend)
  • dt.rum.user_type
    - Exclude synthetic monitoring
  • geo.country.iso_code
    - Geographic filtering
  • device.type
    - Mobile, desktop, tablet
  • browser.name
    - Browser filtering
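
These filters compose in ordinary filter pipes. A minimal sketch combining them on event data (my-frontend is a placeholder name):
dql
fetch user.events, from: now() - 24h
| filter frontend.name == "my-frontend" and dt.rum.user_type == "real_user"
| filter device.type == "mobile"
| summarize events = count(), by: {browser.name, geo.country.iso_code}
| sort events desc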

Common Timeseries Dimensions

Use these for dt.frontend.* timeseries splits and breakdowns:
  • frontend.name
    - Frontend name
  • geo.country.iso_code
  • device.type
  • browser.name
  • os.name
  • user_type
    - real_user, synthetic, robot
dql
fetch user.events, from: now() - 2h
| filter characteristics.has_page_summary == true
| summarize page_views = count(), by: {frontend.name}
| sort page_views desc

Event Characteristics

  • characteristics.has_page_summary
    - Page views (web)
  • characteristics.has_view_summary
    - Views (mobile)
  • characteristics.has_navigation
    - Navigation events
  • characteristics.has_user_interaction
    - Clicks, forms, etc.
  • characteristics.has_request
    - Network request events
  • characteristics.has_error
    - Error events
  • characteristics.has_crash
    - Mobile crashes
  • characteristics.has_long_task
    - Long JavaScript tasks
  • characteristics.has_csp_violation
    - CSP violations
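
Each characteristic flag selects one event type, so a single breakdown query shows which event types a frontend is actually emitting. A sketch using only the flags listed above (my-frontend is a placeholder name):
dql
fetch user.events, from: now() - 24h
| filter frontend.name == "my-frontend"
| summarize
   page_views = countIf(characteristics.has_page_summary == true),
   interactions = countIf(characteristics.has_user_interaction == true),
   requests = countIf(characteristics.has_request == true),
   errors = countIf(characteristics.has_error == true)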

Session Data (user.sessions)

user.sessions contains session-level aggregates produced by the session aggregation service from user.events. Field names differ from user.events — sessions use underscores where events use dots.
Session identity and context:
  • dt.rum.session.id — Session ID (NOT dt.rum.session_id)
  • dt.rum.instance.id — Instance ID
  • frontend.name — Array of frontends involved in the session
  • dt.rum.application.type — web or mobile
  • dt.rum.user_type — real_user, synthetic, or robot
Session aggregates (underscore naming — NOT dot):
Field                     Description                 ⚠️ NOT this
navigation_count          Number of navigations       navigation.count
user_interaction_count    Clicks, form submissions    user_interaction.count
user_action_count         User actions                user_action.count
request_count             XHR/fetch requests          request.count
event_count               Total events in session     event.count
page_summary_count        Page views (web)            page_summary.count
view_summary_count        Views (mobile/SPA)          view_summary.count
Error fields (dot naming — same as events):
  • error.count, error.exception_count, error.http_4xx_count, error.http_5xx_count
  • error.anr_count, error.csp_violation_count, error.has_crash
Session lifecycle:
  • start_time, end_time, duration (nanoseconds)
  • end_reason — timeout, synthetic_execution_finished, etc.
  • characteristics.is_bounce — Boolean bounce flag
  • characteristics.has_replay — Session replay available
User identity:
  • dt.rum.user_tag — User identifier (typically an email, username, or customer ID), set via the dtrum.identifyUser() API call in the instrumented frontend. Not always populated — only present when the frontend explicitly calls identifyUser().
  • When dt.rum.user_tag is empty, dt.rum.instance.id is often the only user differentiator. The value is a random ID assigned by the RUM agent on the client side, so it is not personally identifiable but can be used to distinguish unique users when user_tag is not set. On web it is backed by a persistent cookie, so the user can delete it.
  • The user tag is a session-level field — query it from user.sessions, not user.events (where it may be empty even if the session has one).
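To see how much of your traffic is identified at all, sessions can be summarized by whether a user tag is present. A minimal sketch using only the session fields above:
dql
fetch user.sessions, from: now() - 24h
| filter dt.rum.user_type == "real_user"
| summarize
   total = count(),
   identified = countIf(isNotNull(dt.rum.user_tag))
| fieldsAdd identified_pct = round((identified * 100.0) / total, decimals: 1)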
Client/device context:
  • browser.name, browser.version, device.type, os.name
  • geo.country.iso_code, client.ip, client.isp
Synthetic-only fields:
  • dt.entity.synthetic_test, dt.entity.synthetic_location, dt.entity.synthetic_test_step
Time window behavior:
  • fetch user.sessions, from: X, to: Y only returns sessions that started in [X, Y] — NOT sessions that were merely active during that window.
  • Sessions can last 8h+ (the aggregation service waits 30+ minutes of inactivity before closing a session).
  • To find all sessions active during a time window, extend the lookback by at least 8 hours: e.g., to cover events from the last 24h, query fetch user.sessions, from: now() - 32h.
  • This matters for correlation queries (e.g., matching user.events to user.sessions by session ID) — a narrow user.sessions window will miss long-running sessions and produce false "orphans."
Session creation delay:
  • The session aggregation service waits for ~30+ minutes of inactivity before closing a session and writing the user.sessions record.
  • This means recent events (the last ~1 hour) will not yet have a matching user.sessions entry — this is normal, not a data gap.
  • When correlating user.events with user.sessions, exclude recent data (e.g., use to: now() - 1h) to avoid counting in-progress sessions as orphans.
Zombie sessions (events without a user.sessions record):
  • Not every dt.rum.session.id in user.events has a corresponding user.sessions record. The session aggregation service intentionally skips zombie sessions — sessions with no real user activity (zero navigations and zero user interactions).
  • Zombie sessions contain only background, machine-driven activity (e.g., automatic XHR requests, heartbeats) with no page views or clicks. Serializing them would add no value for users.
  • When correlating user.events with user.sessions, expect a large number of unmatched session IDs. This is by design, not a data gap. Filter to sessions with activity before diagnosing orphans:
    dql
    fetch user.events, from: now() - 2h, to: now() - 1h
    | filter isNotNull(dt.rum.session.id)
    | summarize navs = countIf(characteristics.has_navigation == true),
        interactions = countIf(characteristics.has_user_interaction == true),
        by: {dt.rum.session.id}
    | filter navs > 0 or interactions > 0
Example — bounce rate and session quality:
dql
fetch user.sessions, from: now() - 24h
| filter dt.rum.user_type == "real_user"
| summarize
    total_sessions = count(),
    bounces = countIf(characteristics.is_bounce == true),
    zero_activity = countIf(toLong(navigation_count) == 0 and toLong(user_interaction_count) == 0),
    avg_duration_s = avg(toLong(duration)) / 1000000000
| fieldsAdd bounce_rate_pct = round((bounces * 100.0) / total_sessions, decimals: 1)

Performance Thresholds

  • LCP: Good <2.5s | Poor >4.0s
  • INP: Good <200ms | Poor >500ms
  • CLS: Good <0.1 | Poor >0.25
  • Cold Start: Good <3s | Poor >5s
  • Long Tasks: >50ms problematic, >250ms severe
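
The thresholds can be checked directly against the Web Vitals metrics. A sketch for LCP — confirm the unit the metric reports in your environment before comparing against the 2.5s/4.0s bands, and note that Core Web Vitals are conventionally assessed at the 75th percentile (avg is used here only for simplicity):
dql
timeseries lcp = avg(dt.frontend.web.page.largest_contentful_paint),
   by: {frontend.name},
   filter: dt.rum.user_type == "real_user"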

Core Workflows

1. Web Performance Monitoring

Track Core Web Vitals, page performance, and request latency for SEO and UX optimization.
Primary Files:
  • references/WebVitals.md
    - Core Web Vitals (LCP, INP, CLS)
  • references/performance-analysis.md
    - Request and page performance
Common Queries:
  • All Core Web Vitals summary
  • Web Vitals by page/device
  • Request duration SLA monitoring
  • Page load performance trends

2. User Session & Behavior Analysis

Understand user engagement, navigation patterns, and session characteristics. Analyze button clicks, form interactions, and user journeys.
Data source choice:
  • Use fetch user.sessions for session-level analysis (bounce rate, session duration, session counts)
  • Use fetch user.events for event-level detail (individual clicks, navigation timing, specific pages)
Primary Files:
  • references/user-sessions.md
    - Session tracking and user analytics
  • references/performance-analysis.md
    - Navigation and engagement patterns
Common Queries:
  • Active sessions by frontend
  • Sessions by custom property
  • Bounce rate analysis (use user.sessions with characteristics.is_bounce)
  • Session quality (zero-activity sessions via navigation_count, user_interaction_count)
  • Click analysis on UI elements (use user.events with characteristics.has_user_interaction)
  • External referrers (traffic sources)
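
For the click analysis query, a minimal event-level sketch that locates the pages where interactions happen (drill into specific elements from there):
dql
fetch user.events, from: now() - 24h
| filter characteristics.has_user_interaction == true
| summarize interactions = count(), by: {frontend.name, page.url.path}
| sort interactions desc
| limit 20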

3. Error Tracking & Debugging

Monitor error rates, analyze exceptions, and correlate frontend issues with backend.
Primary Files:
  • references/error-tracking.md
    - Error analysis and debugging
  • references/performance-analysis.md
    - Trace correlation
Common Queries:
  • Error rate monitoring
  • JavaScript exceptions by type
  • Failed requests with backend traces
  • Request timing breakdown

4. Mobile Frontend Monitoring

Track mobile app performance, startup times, and crash analytics for iOS and Android. Analyze app version performance and device-specific issues.
Primary Files:
  • references/mobile-monitoring.md
    - App starts, crashes, and mobile-specific metrics
Common Queries:
  • Cold start performance by app version (iOS, Android)
  • Warm start and hot start metrics
  • Crash rate by device model and OS version
  • ANR events (Android)
  • Native crash signals
  • App version comparison

5. Advanced Performance Optimization

Deep performance diagnostics including JavaScript profiling, main thread blocking, UI jank analysis, and geographic performance.
Primary Files:
  • references/performance-analysis.md
    - Advanced diagnostics and long tasks
Common Queries:
  • Long JavaScript tasks blocking main thread
  • UI jank and rendering delays
  • Tasks >50ms impacting responsiveness
  • Third-party long tasks (iframes)
  • Single-page app performance issues
  • Geographic performance distribution
  • Performance degradation detection

Best Practices

  1. Use metrics for trends, events for debugging
    • Metrics: Timeseries dashboards, alerting, capacity planning
    • Events: Root cause analysis, detailed diagnostics
  2. Filter by frontend in multi-app environments
    • Always use frontend.name for clarity
  3. Match interval to time range
    • 5m intervals for hours, 1h for days, 1d for weeks
  4. Exclude synthetic traffic when analyzing real users
    • Filter dt.rum.user_type to focus on genuine behavior
  5. Combine metrics with events for complete insights
    • Start with metric trends, drill into events for details
  6. Extend the user.sessions time window for correlation queries
    • user.sessions only returns sessions that started in the query window
    • Sessions can last 8h+, so extend the lookback by at least 8h when joining with user.events
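
Practice 3 maps directly onto the interval: parameter of timeseries. A sketch for a multi-day range, assuming timeseries accepts the same from: parameter as fetch:
dql
timeseries requests = sum(dt.frontend.request.count),
   from: now() - 7d,
   interval: 1h,
   by: {frontend.name}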

Slow Page Load Playbook

Start by segmenting the problem by page, browser, geo location, and dt.rum.user_type.
Heuristics:
  • High TTFB -> slow backend
  • High LCP with normal TTFB -> render bottleneck
  • High CLS -> layout shifts (late-loading content, ads, fonts)
  • Long tasks dominate -> JavaScript execution bottlenecks (heavy frameworks, large bundles)
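The segmentation step can start from page-view events; a minimal sketch splitting traffic by the dimensions above before applying the heuristics:
dql
fetch user.events, from: now() - 24h
| filter characteristics.has_page_summary == true
| summarize page_views = count(), by: {browser.name, geo.country.iso_code, dt.rum.user_type}
| sort page_views desc
| limit 50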

Backend latency (high TTFB)

dql
fetch user.events
| filter frontend.name == "my-frontend" and characteristics.has_request == true
| filter page.url.path == "/checkout"
| summarize avg_ttfb = avg(request.time_to_first_byte), avg_duration = avg(duration)
If TTFB is high, analyze backend spans by correlating frontend events with backend traces using dt.rum.trace_id.

Heavy JavaScript execution (long tasks)

Long tasks by page:
dql
fetch user.events, from: now() - 2h
| filter characteristics.has_long_task == true
| summarize
   long_task_count = count(),
   total_blocking_time = sum(duration),
   by: {frontend.name, page.url.path}
| sort total_blocking_time desc
| limit 20
Long tasks by script source:
dql
fetch user.events, from: now() - 2h
| filter frontend.name == "my-frontend"
| filter characteristics.has_long_task == true
| summarize
   long_task_count = count(),
   total_blocking_time = sum(duration),
   by: {long_task.attribution.container_src}
| sort total_blocking_time desc
| limit 20

Large JavaScript bundles

dql
fetch user.events
| filter frontend.name == "my-frontend"
| filter characteristics.has_request
| filter endsWith(url.full, ".js")
| summarize dls = max(performance.decoded_body_size), by: url.full
| sort dls desc
| limit 20

Large resources

dql
fetch user.events
| filter frontend.name == "my-frontend"
| filter characteristics.has_request
| summarize dls = max(performance.decoded_body_size), by: url.full
| sort dls desc
| limit 20

Cache effectiveness

dql
fetch user.events, from: now() - 2h
| filter frontend.name == "my-frontend"
| filter characteristics.has_request == true
| fieldsAdd cache_status = if(
   performance.incomplete_reason == "local_cache" or performance.transfer_size == 0 and
   (performance.encoded_body_size > 0 or performance.decoded_body_size > 0),
   "cached",
   else: if(performance.transfer_size > 0, "network", else: "uncached")
  )
| summarize
   request_count = count(),
   avg_duration = avg(duration),
   by: {url.domain, cache_status}

Compression waste

dql
fetch user.events, from: now() - 2h
| filter characteristics.has_request == true
| filter isNotNull(performance.encoded_body_size) and isNotNull(performance.decoded_body_size)
| filter performance.encoded_body_size > 0
| fieldsAdd
   expansion_ratio = performance.decoded_body_size / performance.encoded_body_size,
   wasted_bytes = performance.decoded_body_size - performance.encoded_body_size
| summarize
   requests = count(),
   avg_expansion_ratio = avg(expansion_ratio),
   total_wasted_bytes = sum(wasted_bytes),
   by: {request.url.host, request.url.path}
| sort total_wasted_bytes desc
| limit 50

Network issues

Compare by location and domain when TTFB is high but backend performance is good:
dql
fetch user.events, from: now() - 2h
| filter characteristics.has_request == true
| summarize
   request_count = count(),
   avg_duration = avg(duration),
   p75_duration = percentile(duration, 75),
   p95_duration = percentile(duration, 95),
   by: {geo.country.iso_code, request.url.domain}
| sort p95_duration desc
| limit 50
Analyze DNS time:
dql
fetch user.events, from: now() - 2h
| filter characteristics.has_request == true
| filter isNotNull(performance.domain_lookup_start) and isNotNull(performance.domain_lookup_end)
| fieldsAdd dns_ms = performance.domain_lookup_end - performance.domain_lookup_start
| summarize
   request_count = count(),
   avg_dns_ms = avg(dns_ms),
   p75_dns_ms = percentile(dns_ms, 75),
   p95_dns_ms = percentile(dns_ms, 95),
   by: {request.url.domain}
| sort p95_dns_ms desc
| limit 50
Analyze by protocol (http/1.1, h2, h3):
dql
fetch user.events
| filter characteristics.has_request
| summarize cnt = count(), by: {url.domain, performance.next_hop_protocol}
| sort cnt desc
| limit 50

Third-party dependencies

Analyze request performance by domain:
dql
fetch user.events, from: now() - 2h
| filter characteristics.has_request == true
| summarize
   request_count = count(),
   avg_duration = avg(duration),
   p75_duration = percentile(duration, 75),
   p95_duration = percentile(duration, 95),
   by: {request.url.domain}
| sort p95_duration desc
| limit 50

Troubleshooting

Handling Zero Results

When queries return no data, follow this diagnostic workflow:
  1. Validate Timeframe
    • Check if the timeframe is appropriate for the data type
    • RUM data may be delayed (1-2 minutes for recent events)
    • Verify the timeframe syntax: now()-1h to now() or similar
    • Try expanding the timeframe: now()-24h for initial exploration
  2. Verify Frontend Configuration
    • Confirm the frontend is instrumented and sending RUM data
    • Check that the frontend.name filter is correct
    • Test without a frontend filter to see if any RUM data exists
    • Verify the frontend name matches the environment
  3. Check Data Availability
    • Run a basic query: fetch user.events | limit 1
    • If no events exist, RUM may not be configured
    • Check if the timeframe predates the frontend deployment
    • Verify the user has access to the environment
  4. Review Query Syntax
    • Validate that filters aren't too restrictive
    • Check for typos in field names or metric names
    • Test the query incrementally: start simple, add filters gradually
    • Verify characteristics filters match the event types
When to Ask User for Clarification:
  • No RUM data exists in environment → "Is RUM configured for this frontend?"
  • Timeframe unclear → "What time period should I analyze?"
  • Expected data missing → "Has this frontend sent data recently?"
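The workflow above can be run as a short query ladder — start maximally broad, then add one constraint at a time until the data disappears. A sketch (run each step as a separate query; the // comments assume DQL comment syntax):
dql
// Step 1: does any RUM data exist at all?
fetch user.events, from: now() - 24h
| limit 1

// Step 2: which frontends are reporting?
fetch user.events, from: now() - 24h
| summarize events = count(), by: {frontend.name}
| sort events desc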

Handling Anomalous Results

When query results seem unexpected or suspicious:
Unexpected High Values:
  • Metric spikes: Verify interval aggregation (avg vs. max vs. sum)
  • Session counts: Check for bot traffic or synthetic monitoring
  • Error rates: Confirm error definition matches expectations
  • Performance degradation: Look for deployment or infrastructure changes
Unexpected Low Values:
  • Missing sessions: Verify the dt.rum.user_type filter isn't excluding real users
  • Low request counts: Check if frontend filter is too narrow
  • Few errors: Confirm error characteristics filter is correct
  • Missing mobile data: Verify platform-specific fields exist
Inconsistent Data:
  • Metrics vs. Events mismatch: Different aggregation methods are expected
  • Geographic anomalies: Check timezone assumptions
  • Device distribution skew: May reflect actual user base
  • Version mismatches: Verify app version filtering logic
当查询结果看起来不符合预期或可疑时:
异常高值:
  • 指标突增:验证区间聚合方式(平均值vs最大值vs求和)
  • 会话计数异常:检查是否有机器人流量或合成监控
  • 错误率异常:确认错误定义与预期一致
  • 性能降级:排查部署或基础设施变更
异常低值:
  • 会话缺失:验证
    dt.rum.user_type
    过滤器是否排除了真实用户
  • 请求计数低:检查前端过滤器是否过窄
  • 错误很少:确认错误特征过滤器是否正确
  • 移动端数据缺失:验证平台特有字段是否存在
数据不一致:
  • 指标与事件不匹配:聚合方式不同属于预期情况
  • 地理异常:检查时区假设
  • 设备分布倾斜:可能反映真实用户群体特征
  • 版本不匹配:验证应用版本过滤逻辑
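To check whether a spike is an aggregation artifact (the first bullet under "Unexpected High Values"), compare aggregations of the same metric side by side. A sketch, assuming standard DQL timeseries syntax:

```dql
// A spike visible under max() but absent under avg() usually points
// to a few outliers rather than a broad regression
timeseries avg_dur = avg(dt.frontend.request.duration),
           max_dur = max(dt.frontend.request.duration),
           interval: 5m
```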

Decision Tree: Ask vs. Investigate

决策树:询问vs调查

Query returns unexpected results
├─ Is this a zero-result scenario?
│  ├─ YES → Follow "Handling Zero Results" workflow
│  └─ NO → Continue
├─ Can I validate the result independently?
│  ├─ YES → Run validation query
│  │        ├─ Validation confirms result → Report findings
│  │        └─ Validation contradicts → Investigate further
│  └─ NO → Continue
├─ Is the anomaly clearly explained by data?
│  ├─ YES → Report with explanation
│  └─ NO → Continue
├─ Do I need domain knowledge to interpret?
│  ├─ YES → Ask user for context
│  │        Example: "The error rate is 15%. Is this expected for your frontend?"
│  └─ NO → Continue
└─ Is the issue ambiguous or requires clarification?
   ├─ YES → Ask specific question with data context
   │        Example: "I see two frontends named 'web-app'. Which frontend name should I use?"
   └─ NO → Investigate and report findings with caveats
查询返回异常结果
├─ 是否是零结果场景?
│  ├─ 是 → 遵循"处理零结果"流程
│  └─ 否 → 继续
├─ 我能否独立验证结果?
│  ├─ 是 → 运行验证查询
│  │        ├─ 验证确认结果 → 上报发现
│  │        └─ 验证结果矛盾 → 进一步调查
│  └─ 否 → 继续
├─ 异常是否能被数据清晰解释?
│  ├─ 是 → 附带解释上报
│  └─ 否 → 继续
├─ 我是否需要领域知识来解读?
│  ├─ 是 → 向用户请求上下文
│  │        示例:"错误率为15%,这对您的前端来说是否属于预期情况?"
│  └─ 否 → 继续
└─ 问题是否模糊或需要澄清?
   ├─ 是 → 结合数据上下文提出具体问题
   │        示例:"我发现有两个名为'web-app'的前端,我应该使用哪个前端名称?"
   └─ 否 → 调查并附带说明上报结果
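The "run validation query" branch can be as simple as cross-checking an event count against the corresponding metric over the same timeframe. A sketch (the event.type value is an assumption; inspect the actual values in your environment first):

```dql
// Count error events directly
fetch user.events
| filter event.type == "error"   // assumed value; verify event.type values
| summarize event_count = count()

// Compare with the error metric; totals should be in the same ballpark,
// though differing aggregation methods can cause small mismatches
timeseries errors = sum(dt.frontend.error.count)
```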

Common Investigation Steps

常用调查步骤

For Performance Issues:
  1. Compare to baseline: Query same metric for previous week
  2. Segment by dimension: Break down by device, browser, geography
  3. Check for outliers: Use percentiles (p50, p95, p99) vs. averages
  4. Correlate with deployments: Filter by app version or time windows
For Data Availability Issues:
  1. Start broad: Query all RUM data without filters
  2. Add filters incrementally: Isolate which filter eliminates data
  3. Check related metrics: If events missing, try timeseries
  4. Validate entity relationships: Confirm frontend-to-service links
For Unexpected Patterns:
  1. Expand timeframe: Look for historical context
  2. Cross-reference data sources: Compare events and metrics
  3. Check sampling: Verify no sampling is affecting results
  4. Consider external factors: Holidays, outages, traffic changes
性能问题排查:
  1. 与基线对比:查询上一周的相同指标
  2. 按维度拆分:按设备、浏览器、地域拆解
  3. 检查异常值:使用分位数(p50、p95、p99)而非平均值
  4. 与部署关联:按应用版本或时间窗口过滤
数据可用性问题排查:
  1. 从宽泛查询开始:不加过滤器查询所有RUM数据
  2. 增量添加过滤器:定位哪个过滤器排除了数据
  3. 检查相关指标:如果事件缺失,尝试查询时序数据
  4. 验证实体关系:确认前端与服务的关联
异常模式排查:
  1. 扩大时间范围:查看历史上下文
  2. 交叉对比数据源:对比事件和指标
  3. 检查采样:确认无采样影响结果
  4. 考虑外部因素:节假日、故障、流量变化
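Steps 2 and 3 of the performance workflow above might look like the following. This is a sketch, assuming DQL supports percentile() inside timeseries and that browser.name is the dimension field in your environment:

```dql
// Segment by dimension to localize a regression
timeseries avg(dt.frontend.request.duration), by: { browser.name }

// Percentiles expose outliers that averages hide
timeseries p50 = percentile(dt.frontend.request.duration, 50),
           p95 = percentile(dt.frontend.request.duration, 95),
           p99 = percentile(dt.frontend.request.duration, 99)
```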

Red Flags: When to Stop and Ask

红色警报:何时停止并询问用户

Always ask the user when:
  • ❌ No RUM data exists anywhere in the environment
  • ❌ Multiple frontends match the user's description
  • ❌ Results explicitly contradict the user's stated expectations
  • ❌ Data suggests monitoring is misconfigured
  • ❌ Query requires business context (e.g., "acceptable error rate")
  • ❌ Timeframe is ambiguous and affects interpretation significantly
Example clarifying questions:
  • "I found two frontends named 'checkout'. Which one: checkout-web or checkout-mobile?"
  • "The query returns 0 results for the past hour. Should I expand the timeframe, or do you expect real-time data?"
  • "The average LCP is 8 seconds, which exceeds the 4-second threshold. Is this frontend known to have performance issues?"
  • "I see only synthetic traffic. Should I include dt.rum.user_type='REAL_USER' to focus on real users?"
出现以下情况时始终询问用户:
  • ❌ 环境中完全不存在RUM数据
  • ❌ 多个前端匹配用户的描述
  • ❌ 结果明确与用户的预期矛盾
  • ❌ 数据显示监控配置错误
  • ❌ 查询需要业务上下文(例如"可接受的错误率")
  • ❌ 时间范围模糊且会显著影响解读
澄清问题示例:
  • "我找到了两个名为'checkout'的前端,应该用哪个:checkout-web 还是 checkout-mobile?"
  • "过去一小时的查询返回0结果,我应该扩大时间范围,还是您期望查询实时数据?"
  • "平均LCP为8秒,超过了4秒的阈值,该前端是否已知存在性能问题?"
  • "我仅看到合成监控流量,是否需要添加 dt.rum.user_type='REAL_USER' 过滤聚焦真实用户?"
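The last clarifying question above corresponds to a session filter like the following; a sketch using the dt.rum.user_type field and REAL_USER value mentioned earlier in this document:

```dql
// Keep only real-user sessions, excluding synthetic monitoring traffic
fetch user.sessions
| filter dt.rum.user_type == "REAL_USER"
| summarize real_sessions = count()
```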

When to Use This Skill

何时使用本Skill

Use frontend-observability skill when:
  • Monitoring web or mobile frontend performance
  • Analyzing Core Web Vitals for SEO
  • Tracking user sessions, engagement, or behavior
  • Analyzing click events and button interactions
  • Debugging frontend errors or slow requests
  • Correlating frontend issues with backend traces
  • Optimizing mobile app startup or crash rates (iOS, Android)
  • Analyzing app version performance
  • Diagnosing UI jank and main thread blocking
  • Analyzing security compliance (CSP violations)
  • Profiling JavaScript performance (long tasks)
Do NOT use for:
  • Backend service monitoring (use services skill)
  • Infrastructure metrics (use infrastructure skill)
  • Log analysis (use logs skill)
  • Business process monitoring (use business-events skill)
符合以下场景时使用前端可观测性Skill:
  • 监控网页或移动端前端性能
  • 分析Core Web Vitals用于SEO优化
  • 追踪用户会话、参与度或行为
  • 分析点击事件和按钮交互
  • 调试前端错误或缓慢请求
  • 关联前端问题与后端链路
  • 优化移动应用启动速度或崩溃率(iOS、Android)
  • 分析应用版本性能
  • 诊断UI卡顿和主线程阻塞
  • 分析安全合规性(CSP违规)
  • 剖析JavaScript性能(长任务)
请勿用于以下场景:
  • 后端服务监控(使用服务Skill)
  • 基础设施指标(使用基础设施Skill)
  • 日志分析(使用日志Skill)
  • 业务流程监控(使用业务事件Skill)

Progressive Disclosure

渐进式披露

Always Available

始终可用

  • FrontendBasics.md - RUM fundamentals and quick reference
  • FrontendBasics.md - RUM基础和快速参考

Loaded by Workflow

按工作流加载

  • Web Performance: WebVitals.md, performance-analysis.md
  • User Behavior: user-sessions.md, performance-analysis.md
  • Error Analysis: error-tracking.md, performance-analysis.md
  • Mobile Apps: mobile-monitoring.md
  • 网页性能:WebVitals.md、performance-analysis.md
  • 用户行为:user-sessions.md、performance-analysis.md
  • 错误分析:error-tracking.md、performance-analysis.md
  • 移动应用:mobile-monitoring.md

Load on Explicit Request

显式请求时加载

  • Advanced diagnostics (long tasks, user actions)
  • Security compliance (CSP violations, visibility tracking)
  • Specialized mobile features (platform-specific phases)
  • 高级诊断(长任务、用户操作)
  • 安全合规(CSP违规、可见性追踪)
  • 移动端特有功能(平台特有阶段)

Reference Files

参考文件

Core Reference Documents

核心参考文档

  • references/WebVitals.md - Core Web Vitals monitoring
  • references/user-sessions.md - Session and user analytics
  • references/error-tracking.md - Error analysis and debugging
  • references/mobile-monitoring.md - Mobile app performance and crashes
  • references/performance-analysis.md - Advanced performance diagnostics
  • references/WebVitals.md - Core Web Vitals监控
  • references/user-sessions.md - 会话和用户分析
  • references/error-tracking.md - 错误分析与调试
  • references/mobile-monitoring.md - 移动应用性能和崩溃
  • references/performance-analysis.md - 高级性能诊断