datadog-observability
Datadog Observability
Overview
Datadog is a SaaS observability platform providing unified monitoring across infrastructure, applications, logs, and user experience. It offers AI-powered anomaly detection, 1000+ integrations, and OpenTelemetry compatibility.
Core Capabilities:
- APM: Distributed tracing with automatic instrumentation for 8+ languages
- Infrastructure: Host, container, and cloud service monitoring
- Logs: Centralized collection with processing pipelines and 15-month retention
- Metrics: Custom metrics via DogStatsD with cardinality management
- Synthetics: Proactive API and browser testing from 29+ global locations
- RUM: Frontend performance with Core Web Vitals and session replay
When to Use This Skill
Activate when:
- Setting up production monitoring and observability
- Implementing distributed tracing across microservices
- Configuring log aggregation and analysis pipelines
- Creating custom metrics and dashboards
- Setting up alerting and anomaly detection
- Optimizing Datadog costs
Do not use when:
- Building with an open-source stack (use Prometheus/Grafana instead)
- Cost is the primary concern and budget is limited
- You need more customization than a managed solution allows
Quick Start
1. Install Datadog Agent
Docker (simplest):

```bash
docker run -d --name dd-agent \
  -e DD_API_KEY=<YOUR_API_KEY> \
  -e DD_SITE="datadoghq.com" \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /proc/:/host/proc/:ro \
  -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
  gcr.io/datadoghq/agent:7
```

Kubernetes (Helm):

```bash
helm repo add datadog https://helm.datadoghq.com
helm install datadog-agent datadog/datadog \
  --set datadog.apiKey=<YOUR_API_KEY> \
  --set datadog.apm.enabled=true \
  --set datadog.logs.enabled=true
```

2. Instrument Your Application
**Python:**

```python
from ddtrace import tracer, patch_all

# Automatic instrumentation for common libraries
patch_all()

# Manual span for custom operations
with tracer.trace("custom.operation", service="my-service") as span:
    span.set_tag("user.id", user_id)
    # your code here
```

**Node.js:**

```javascript
// Must be first import
const tracer = require('dd-trace').init({
  service: 'my-service',
  env: 'production',
  version: '1.0.0',
});
```

3. Verify in Datadog UI
- Go to Infrastructure > Host Map to verify agent
- Go to APM > Services to see traced services
- Go to Logs > Search to verify log collection
Core Concepts
Tagging Strategy
Tags enable filtering, aggregation, and cost attribution. Use consistent tags across all telemetry.
Required Tags:
| Tag | Purpose | Example |
|---|---|---|
| `env` | Environment | `env:production` |
| `service` | Service name | `service:payment-api` |
| `version` | Deployment version | `version:1.2.3` |
| `team` | Owning team | `team:platform` |
Avoid High-Cardinality Tags:
- User IDs, request IDs, timestamps
- Pod IDs in Kubernetes
- Build numbers, commit hashes
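The same tagging rules apply to custom metrics: DogStatsD metrics are plain UDP datagrams, so the tag set travels with every sample. A minimal sketch of the wire format (the metric name and tag values here are illustrative, the `<metric>:<value>|<type>|#<tags>` shape is the real DogStatsD datagram format):

```python
def dogstatsd_packet(name, value, metric_type, tags):
    """Build one DogStatsD datagram: <metric>:<value>|<type>|#<tag>,<tag>...
    Common types: 'c' counter, 'g' gauge, 'h' histogram, 'd' distribution."""
    return f"{name}:{value}|{metric_type}|#{','.join(tags)}"

packet = dogstatsd_packet(
    "checkout.completed", 1, "c",
    ["env:production", "service:checkout", "version:1.2.3", "team:payments"],
)
# A real client sends this over UDP to the local Agent on port 8125, e.g.:
# socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(packet.encode(), ("127.0.0.1", 8125))
```

In practice you would use the official `datadog` client library rather than hand-building datagrams, but the format makes the tagging cost model concrete: every distinct tag combination is a distinct timeseries.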
Unified Observability
统一可观测性
Datadog correlates metrics, traces, and logs automatically:
- Traces include span tags that link to metrics
- Logs inject trace IDs for correlation
- Dashboards combine all data sources
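Trace-ID injection is the piece you typically wire up yourself when your logging setup is not covered by an automatic integration. A minimal stdlib sketch, assuming a `trace_id_provider` callable that returns the active trace ID (in a real ddtrace app you would read it from the current span context, or just set `DD_LOGS_INJECTION=true`):

```python
import logging

class TraceIdFilter(logging.Filter):
    """Stamp every log record with the active trace ID so the log
    backend can join each log line to its matching APM trace."""
    def __init__(self, trace_id_provider):
        super().__init__()
        self._provider = trace_id_provider

    def filter(self, record):
        record.dd_trace_id = self._provider()
        return True  # never drops records, only annotates them

logger = logging.getLogger("my-service")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(
    "%(asctime)s %(levelname)s [dd.trace_id=%(dd_trace_id)s] %(message)s"))
logger.addHandler(handler)
logger.addFilter(TraceIdFilter(lambda: "1234567890"))  # illustrative static ID
logger.warning("payment failed")
```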
Best Practices
Start Simple
- Install Agent with basic configuration
- Enable automatic instrumentation
- Verify data in Datadog UI
- Add custom spans/metrics as needed
Progressive Enhancement
Basic → APM tracing → Custom spans → Custom metrics → Profiling → RUM
Key Instrumentation Points
- HTTP entry/exit points
- Database queries
- External service calls
- Message queue operations
- Business-critical flows
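For operations that automatic instrumentation misses, a thin wrapper keeps manual span creation consistent across these instrumentation points. The sketch below records timing spans into a local list to show the pattern; with ddtrace you would use `tracer.wrap()` or `tracer.trace()` instead, and names like `fetch_user` are illustrative:

```python
import functools
import time

SPANS = []  # stand-in for a tracer backend; ddtrace would ship spans to the Agent

def traced(operation, service):
    """Decorator that times a call and records a span-like dict,
    mirroring the operation/service naming used in APM."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                return fn(*args, **kwargs)
            finally:
                SPANS.append({
                    "operation": operation,
                    "service": service,
                    "duration_ms": (time.monotonic() - start) * 1000,
                })
        return wrapper
    return decorator

@traced("db.query", "my-service")
def fetch_user(user_id):
    # placeholder for a real database query
    return {"id": user_id}

user = fetch_user(7)
```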
Common Mistakes
- High-cardinality tags: Using user IDs or request IDs as tags creates millions of unique metrics
- Missing log index quotas: Leads to unexpected bills from log volume spikes
- Over-alerting: Creates alert fatigue; alert on symptoms, not causes
- Missing service tags: Prevents correlation between metrics, traces, and logs
- No sampling for high-volume traces: Ingests everything, causing cost explosion
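On the sampling point: Datadog tracers apply deterministic, rate-based head sampling keyed on the trace ID, so every span of a given trace shares one keep/drop decision. A sketch of the idea (the multiplier mirrors the knuth-factor constant used in Datadog's open-source tracers, but treat this as illustrative, not the exact implementation):

```python
KNUTH_FACTOR = 1111111111111111111
MAX_TRACE_ID = 2**64

def keep_trace(trace_id: int, sample_rate: float) -> bool:
    """Deterministic head-based sampling: the same trace ID always gets
    the same decision, so a whole trace is kept or dropped together."""
    return (trace_id * KNUTH_FACTOR) % MAX_TRACE_ID <= sample_rate * MAX_TRACE_ID

# At rate 1.0 every trace is kept; at 0.1 roughly 10% of traces are.
```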
Navigation
For detailed implementation:
- Agent Installation: Docker, Kubernetes, Linux, Windows, and cloud-specific setup
- APM Instrumentation: Python, Node.js, Go, Java instrumentation with code examples
- Log Management: Pipelines, Grok parsing, standard attributes, archives
- Custom Metrics: DogStatsD patterns, metric types, tagging best practices
- Alerting: Monitor types, anomaly detection, alert hygiene
- Cost Optimization: Metrics without Limits, sampling, index quotas
- Kubernetes: DaemonSet, Cluster Agent, autodiscovery
Complementary Skills
When using this skill, consider these related skills (if deployed):
- docker: Container instrumentation patterns
- kubernetes: K8s-native monitoring patterns
- python/nodejs/go: Language-specific APM setup
Resources
Official Documentation:
- APM: https://docs.datadoghq.com/tracing/
- Logs: https://docs.datadoghq.com/logs/
- Metrics: https://docs.datadoghq.com/metrics/
- DogStatsD: https://docs.datadoghq.com/developers/dogstatsd/
Cost Management: