dt-obs-logs


Log Analysis Skill

Query, filter, and analyze Dynatrace log data using DQL for troubleshooting and monitoring.

What This Skill Covers

  • Fetching and filtering logs by severity, content, and entity
  • Searching log messages using pattern matching
  • Calculating error rates and statistics
  • Analyzing log patterns and trends
  • Grouping and aggregating log data by dimensions

When to Use This Skill

Use this skill when users want to:
  • Find specific log entries (e.g., "show me error logs from the last hour")
  • Filter logs by severity, process group, or content
  • Search logs for specific keywords or phrases
  • Calculate error rates or log statistics
  • Identify common error messages or patterns
  • Analyze log trends over time
  • Troubleshoot issues using log data

Key Concepts

Log Data Model

  • timestamp: When the log entry was created
  • content: The log message text
  • status: Log level (ERROR, FATAL, WARN, INFO, etc.)
  • dt.process_group.id: Associated process group entity
  • dt.process_group.detected_name: Resolves process group IDs to human-readable names
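These fields can be inspected directly. A minimal sketch that pulls the core fields for a quick look at recent entries (the 15-minute window is an arbitrary choice for illustration):

```dql
fetch logs, from:now() - 15m
| fields timestamp, status, content, dt.process_group.id, dt.process_group.detected_name
| sort timestamp desc
| limit 10
```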

Query Patterns

  • fetch logs: Primary command for log data access
  • Time ranges: Use from:now() - <duration> for time windows
  • Filtering: Apply severity, content, and entity filters
  • Aggregation: Group and summarize log data
  • Pattern Detection: Use matchesPhrase() and contains() for content search

Common Operations

  • Severity filtering (single or multiple levels)
  • Content search (simple and full-text)
  • Entity-based filtering (process groups)
  • Time-series analysis (bucketing, sorting)
  • Error rate calculation
  • Pattern analysis (exceptions, timeouts, etc.)

Core Workflows

1. Log Searching

Find specific log entries by time, severity, and content.
Typical steps:
  1. Define time range
  2. Filter by severity (optional)
  3. Search content for keywords
  4. Select relevant fields
  5. Sort and limit results
Example:

```dql
fetch logs, from:now() - 1h
| filter status == "ERROR"
| fields timestamp, content, process_group = dt.process_group.detected_name
| sort timestamp desc
| limit 100
```

2. Log Filtering

Narrow down logs using multiple criteria (severity, entity, content).
Typical steps:
  1. Fetch logs with time range
  2. Apply severity filters
  3. Filter by entity (process_group)
  4. Apply content filters
  5. Format and sort output
Example:

```dql
fetch logs, from:now() - 2h
| filter in(status, {"ERROR", "FATAL", "WARN"})
| summarize count(), by: {dt.process_group.id, dt.process_group.detected_name}
| fieldsAdd process_group = dt.process_group.detected_name
| sort `count()` desc
```

3. Pattern Analysis

Identify patterns, trends, and anomalies in log data.
Typical steps:
  1. Fetch logs with time range
  2. Add pattern detection fields
  3. Aggregate by entity or time
  4. Calculate statistics and ratios
  5. Sort by frequency or rate
Example:

```dql
fetch logs, from:now() - 2h
| filter status == "ERROR"
| fieldsAdd
    has_exception = if(matchesPhrase(content, "exception"), true, else: false),
    has_timeout = if(matchesPhrase(content, "timeout"), true, else: false)
| summarize
    count(),
    exception_count = countIf(has_exception == true),
    timeout_count = countIf(has_timeout == true),
    by: {process_group = dt.process_group.detected_name}
```

Key Functions

Filtering

  • filter status == "ERROR" - Filter by status level
  • in(status, {"ERROR", "FATAL", "WARN"}) - Multi-status filter
  • contains(content, "keyword") - Simple substring search
  • matchesPhrase(content, "exact phrase") - Full-text phrase search
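The filter functions above compose with boolean operators. A sketch combining a severity filter and a content filter in a single condition (the "database" keyword is an arbitrary example):

```dql
fetch logs, from:now() - 1h
| filter in(status, {"ERROR", "FATAL"}) and contains(content, "database")
| fields timestamp, status, content
| sort timestamp desc
| limit 50
```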

Entity Operations

  • dt.process_group.detected_name - Get human-readable process group name
  • filter process_group == "service-name" - Filter by specific entity

Aggregation

  • count() - Count all log entries
  • countIf(condition) - Conditional count
  • by: {dimension} - Group by entity or time bucket
  • bin(timestamp, 5m) - Time bucketing for trends

Field Operations

  • fields timestamp, content, status - Select specific fields
  • fieldsAdd name = expression - Add computed fields
  • if(condition, true_value, else: false_value) - Conditional logic
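These field operations combine naturally: compute a flag with fieldsAdd and if(), then narrow the output with fields. A small sketch:

```dql
fetch logs, from:now() - 1h
| fieldsAdd is_error = if(in(status, {"ERROR", "FATAL"}), true, else: false)
| fields timestamp, status, content, is_error
| sort timestamp desc
| limit 50
```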

Common Patterns

Content Search

Simple substring search:

```dql
fetch logs, from:now() - 1h
| filter contains(content, "database")
| fields timestamp, content, status
```

Full-text phrase search:

```dql
fetch logs, from:now() - 1h
| filter matchesPhrase(content, "connection timeout")
| fields timestamp, content, process_group = dt.process_group.detected_name
```

Error Rate Calculation

Calculate error rates over time:

```dql
fetch logs, from:now() - 2h
| summarize
    total_logs = count(),
    error_logs = countIf(status == "ERROR"),
    by: {time_bucket = bin(timestamp, 5m)}
| fieldsAdd error_rate = (error_logs * 100.0) / total_logs
| sort time_bucket asc
```

Top Error Messages

Find the most common errors:

```dql
fetch logs, from:now() - 24h
| filter status == "ERROR"
| summarize error_count = count(), by: {content}
| sort error_count desc
| limit 20
```

Process Group-Specific Logs

Filter logs by process group:

```dql
fetch logs, from:now() - 1h
| fieldsAdd process_group = dt.process_group.detected_name
| filter process_group == "payment-service"
| filter status == "ERROR"
| fields timestamp, content, status
| sort timestamp desc
```

Structured / JSON Log Parsing

Many applications emit JSON-formatted log lines. Use parse to extract fields instead of dumping raw content:

```dql
fetch logs, from:now() - 1h
| filter status == "ERROR"
| parse content, "JSON:log"
| fieldsAdd level = log[level], message = log[msg], error = log[error]
| fields timestamp, level, message, error
| sort timestamp desc
| limit 50
```

Aggregate by a parsed field:

```dql
fetch logs, from:now() - 4h
| filter status == "ERROR"
| parse content, "JSON:log"
| fieldsAdd message = log[msg]
| summarize error_count = count(), by: {message}
| sort error_count desc
| limit 20
```

Notes:
  • parse content, "JSON:log" creates a record field log; access nested values with log[key]
  • Filter logs with contains() before parse to reduce parsing overhead
  • Works with any JSON-structured field, not just content

Best Practices

  1. Always specify time ranges - Use from:now() - <duration> to limit data
  2. Apply filters early - Filter by severity and entity before aggregation
  3. Use appropriate search methods - contains() for simple substring matches, matchesPhrase() for exact phrases
  4. Limit results - Add | limit 100 to prevent overwhelming output
  5. Sort meaningfully - Sort by timestamp for recent logs, by count for top errors
  6. Name entities - Use dt.process_group.detected_name or getNodeName() for human-readable output
  7. Use time buckets for trends - bin(timestamp, 5m) for time-series analysis
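Putting these practices together, a sketch of a query with a bounded time range, early severity and entity filters, a human-readable entity name, and time buckets ("checkout-service" is a hypothetical process group name):

```dql
fetch logs, from:now() - 30m
| filter status == "ERROR"
| filter dt.process_group.detected_name == "checkout-service"
| summarize error_count = count(), by: {time_bucket = bin(timestamp, 5m)}
| sort time_bucket asc
```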

Integration Points

  • Entity model: Uses dt.process_group.id for service correlation
  • Time series: Supports temporal analysis with bin() and time ranges
  • Content search: Full-text search capabilities via matchesPhrase()
  • Aggregation: Statistical analysis using summarize and conditional functions

Limitations & Notes

  • Log availability depends on OneAgent configuration and log ingestion settings
  • Full-text search (matchesPhrase) may have performance implications on large datasets
  • Entity names require proper OneAgent monitoring for resolution
  • Time ranges should be reasonable (avoid unbounded queries)

Related Skills

  • dt-dql-essentials - Core DQL syntax and query structure for log queries
  • dt-obs-tracing - Correlate logs with distributed traces using trace IDs
  • dt-obs-problems - Correlate logs with DAVIS-detected problems