instantly-observability
Instantly Observability
Overview
Set up comprehensive observability for Instantly integrations.
Prerequisites
- Prometheus or compatible metrics backend
- OpenTelemetry SDK installed
- Grafana or similar dashboarding tool
- AlertManager configured
Metrics Collection
Key Metrics
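| Metric | Type | Description |
|---|---|---|
| instantly_requests_total | Counter | Total API requests |
| instantly_request_duration_seconds | Histogram | Request latency |
| instantly_errors_total | Counter | Error count by type |
| (not defined in the code below) | Gauge | Rate limit headroom |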
Prometheus Metrics
typescript
import { Registry, Counter, Histogram, Gauge } from 'prom-client';

// Dedicated registry so the Instantly metrics can be exposed on their own.
const registry = new Registry();

// Counts every API call, labeled by method and outcome.
const requestCounter = new Counter({
  name: 'instantly_requests_total',
  help: 'Total Instantly API requests',
  labelNames: ['method', 'status'],
  registers: [registry],
});

// Request latency in seconds; the buckets span 50 ms to 5 s.
const requestDuration = new Histogram({
  name: 'instantly_request_duration_seconds',
  help: 'Instantly request duration',
  labelNames: ['method'],
  buckets: [0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
  registers: [registry],
});

// Counts failures, labeled by error type.
const errorCounter = new Counter({
  name: 'instantly_errors_total',
  help: 'Instantly errors by type',
  labelNames: ['error_type'],
  registers: [registry],
});
Instrumented Client
typescript
async function instrumentedRequest<T>(
  method: string,
  operation: () => Promise<T>
): Promise<T> {
  // startTimer returns a function that records the elapsed seconds when called.
  const timer = requestDuration.startTimer({ method });
  try {
    const result = await operation();
    requestCounter.inc({ method, status: 'success' });
    return result;
  } catch (error: any) {
    requestCounter.inc({ method, status: 'error' });
    errorCounter.inc({ error_type: error.code || 'unknown' });
    throw error;
  } finally {
    // Runs on success and on failure, so every call is timed.
    timer();
  }
}
Distributed Tracing
OpenTelemetry Setup
typescript
import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('instantly-client');

async function tracedInstantlyCall<T>(
  operationName: string,
  operation: () => Promise<T>
): Promise<T> {
  // startActiveSpan makes the span current for everything awaited inside the callback.
  return tracer.startActiveSpan(`instantly.${operationName}`, async (span) => {
    try {
      const result = await operation();
      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (error: any) {
      span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
      span.recordException(error);
      throw error;
    } finally {
      // A span that is never ended is never exported.
      span.end();
    }
  });
}
Logging Strategy
Structured Logging
typescript
import pino from 'pino';

const logger = pino({
  name: 'instantly',
  level: process.env.LOG_LEVEL || 'info',
});

function logInstantlyOperation(
  operation: string,
  data: Record<string, any>,
  duration: number
) {
  // One JSON object per event. `data` is spread last, so its keys win on collision.
  logger.info({
    service: 'instantly',
    operation,
    duration_ms: duration,
    ...data,
  });
}
Alert Configuration
Prometheus AlertManager Rules
yaml
# instantly_alerts.yaml
groups:
  - name: instantly_alerts
    rules:
      - alert: InstantlyHighErrorRate
        # Aggregate both sides: the two counters carry different labels,
        # so an unaggregated division would match no series.
        expr: |
          sum(rate(instantly_errors_total[5m]))
            / sum(rate(instantly_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Instantly error rate > 5%"
      - alert: InstantlyHighLatency
        expr: |
          histogram_quantile(0.95,
            rate(instantly_request_duration_seconds_bucket[5m])
          ) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Instantly P95 latency > 2s"
      - alert: InstantlyDown
        expr: up{job="instantly"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instantly integration is down"
Dashboard
Grafana Panel Queries
json
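{
  "panels": [
    {
      "title": "Instantly Request Rate",
      "targets": [{
        "expr": "rate(instantly_requests_total[5m])"
      }]
    },
    {
      "title": "Instantly Latency P50/P95/P99",
      "targets": [
        { "expr": "histogram_quantile(0.5, rate(instantly_request_duration_seconds_bucket[5m]))" },
        { "expr": "histogram_quantile(0.95, rate(instantly_request_duration_seconds_bucket[5m]))" },
        { "expr": "histogram_quantile(0.99, rate(instantly_request_duration_seconds_bucket[5m]))" }
      ]
    }
  ]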
}
Instructions
Step 1: Set Up Metrics Collection
Implement Prometheus counters, histograms, and gauges for key operations.
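The Key Metrics table lists a gauge for rate limit headroom, but the metrics code defines only counters and a histogram. The following is a minimal sketch of how such a gauge could be fed, not the integration's actual implementation. The header names and the helper names `rateLimitHeadroom` and `recordHeadroom` are assumptions; check which rate limit headers the Instantly API really returns. `GaugeLike` is a local stand-in so the sketch has no dependencies; a prom-client `Gauge` can be passed in because it exposes `set(value)`.

```typescript
// Minimal shape this sketch needs from a gauge; prom-client's Gauge satisfies it.
interface GaugeLike {
  set(value: number): void;
}

// HYPOTHETICAL header names, not confirmed for the Instantly API.
const LIMIT_HEADER = 'x-ratelimit-limit';
const REMAINING_HEADER = 'x-ratelimit-remaining';

type Headers = Record<string, string | undefined>;

// Returns the fraction of the rate limit still available, clamped to 0..1,
// or null when the headers are missing or malformed.
function rateLimitHeadroom(headers: Headers): number | null {
  const limit = Number(headers[LIMIT_HEADER]);
  const remaining = Number(headers[REMAINING_HEADER]);
  if (!Number.isFinite(limit) || !Number.isFinite(remaining) || limit <= 0) {
    return null;
  }
  return Math.min(1, Math.max(0, remaining / limit));
}

// Updates the gauge only when a valid headroom value could be computed,
// so a response without headers leaves the last known value in place.
function recordHeadroom(gauge: GaugeLike, headers: Headers): void {
  const headroom = rateLimitHeadroom(headers);
  if (headroom !== null) {
    gauge.set(headroom);
  }
}
```

Reporting a ratio rather than the raw remaining count keeps one alert threshold valid even if the limit itself changes.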
Step 2: Add Distributed Tracing
Integrate OpenTelemetry for end-to-end request tracing.
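End-to-end tracing only works if trace context actually travels with each outgoing request. Assuming the W3C Trace Context propagator is in use (the OpenTelemetry default), that context is carried in a `traceparent` header. The sketch below is a dependency-free checker for that header, handy when debugging trace gaps. `parseTraceParent` and `TraceParent` are hypothetical names, and the parser deliberately accepts only the fixed four-field layout defined for version `00`.

```typescript
interface TraceParent {
  version: string;
  traceId: string;
  spanId: string;
  sampled: boolean;
}

// Parses a W3C `traceparent` header of the form
//   version-traceid-parentid-flags  (2, 32, 16 and 2 lowercase hex digits).
// Returns null when the header is absent or invalid.
function parseTraceParent(header: string | undefined): TraceParent | null {
  if (!header) return null;
  const match = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(
    header.trim()
  );
  if (!match) return null;
  const version = match[1];
  const traceId = match[2];
  const spanId = match[3];
  const flags = match[4];
  // Version ff is forbidden, and all-zero ids are invalid per the spec.
  if (version === 'ff') return null;
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  // Bit 0 of the flags byte is the "sampled" flag.
  return { version, traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}
```

If an outgoing Instantly request carries no header that parses here, downstream spans will start a new trace instead of joining the current one.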
Step 3: Configure Structured Logging
Set up JSON logging with consistent field names.
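Field names stay consistent most easily when a single helper builds every record. The sketch below times an operation and emits one record with the `service`, `operation`, and `duration_ms` fields used by `logInstantlyOperation`. The `outcome` field, the `timedOperation` name, and the injected `log` and `now` parameters are assumptions added for illustration; in practice `log` could simply forward to `logger.info`.

```typescript
// The fixed set of fields every operation log record carries.
interface OperationLog {
  service: 'instantly';
  operation: string;
  duration_ms: number;
  outcome: 'success' | 'error'; // hypothetical extra field
}

type LogSink = (record: OperationLog) => void;

// Runs `run`, measures how long it took, and emits exactly one record
// whether the operation resolved or threw. The clock is injectable for tests.
async function timedOperation<T>(
  operation: string,
  run: () => Promise<T>,
  log: LogSink,
  now: () => number = Date.now
): Promise<T> {
  const start = now();
  let outcome: OperationLog['outcome'] = 'success';
  try {
    return await run();
  } catch (error) {
    outcome = 'error';
    throw error; // rethrow so callers still see the failure
  } finally {
    log({ service: 'instantly', operation, duration_ms: now() - start, outcome });
  }
}
```

Because the record is typed, a misspelled or missing field fails at compile time instead of producing inconsistent log lines.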
Step 4: Create Alert Rules
Define Prometheus alerting rules for error rates and latency.
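A latency rule built on `histogram_quantile` does not see individual request timings. It estimates the quantile from cumulative bucket counts, interpolating linearly inside the bucket that contains the target rank. The sketch below reproduces that estimate in simplified form to show why bucket boundaries limit the precision of a P95 threshold. `estimateQuantile` and `Bucket` are hypothetical names, and edge cases Prometheus handles (NaN inputs, quantiles outside 0..1) are omitted.

```typescript
// Cumulative bucket: `count` observations had a value <= `le` (upper bound, seconds).
interface Bucket {
  le: number;
  count: number;
}

// Estimates quantile q (0..1) from cumulative buckets sorted by ascending `le`,
// where the last bucket is the +Infinity bucket.
function estimateQuantile(q: number, buckets: Bucket[]): number {
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total;
  // First bucket whose cumulative count reaches the target rank.
  const i = buckets.findIndex((b) => b.count >= rank);
  // Rank falls in the +Inf bucket: the best available answer is the highest finite bound.
  if (i === buckets.length - 1) return buckets[buckets.length - 2].le;
  const lowerBound = i === 0 ? 0 : buckets[i - 1].le;
  const countBelow = i === 0 ? 0 : buckets[i - 1].count;
  const countInBucket = buckets[i].count - countBelow;
  if (countInBucket === 0) return lowerBound;
  // Assume observations are spread evenly across the bucket.
  return lowerBound + (buckets[i].le - lowerBound) * ((rank - countBelow) / countInBucket);
}

// Example using the bucket bounds from the histogram defined earlier.
const sample: Bucket[] = [
  { le: 0.05, count: 10 }, { le: 0.1, count: 30 }, { le: 0.25, count: 60 },
  { le: 0.5, count: 80 }, { le: 1, count: 90 }, { le: 2.5, count: 96 },
  { le: 5, count: 100 }, { le: Infinity, count: 100 },
];
const p95 = estimateQuantile(0.95, sample); // about 2.25 s, inside the 1 s to 2.5 s bucket
```

With these counts the estimate lands at roughly 2.25 s, so a 2 s threshold would trip even though the true P95 could sit anywhere between 1 s and 2.5 s. A bucket boundary near the alert threshold tightens the estimate.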
Output
- Metrics collection enabled
- Distributed tracing configured
- Structured logging implemented
- Alert rules deployed
Error Handling
| Issue | Cause | Solution |
|---|---|---|
| Missing metrics | No instrumentation | Wrap client calls |
| Trace gaps | Missing propagation | Check context headers |
| Alert storms | Wrong thresholds | Tune alert rules |
| High cardinality | Too many labels | Reduce label values |
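For the high cardinality row, one common remedy is to map raw error codes onto a small fixed set before using them as a label value, since every distinct `error_type` creates a new time series. The sketch below is one way to do that. The allow-list, the status code mapping, and the `normalizeErrorType` name are assumptions, not codes documented for the Instantly API.

```typescript
// HYPOTHETICAL allow-list: replace with the error codes your client actually surfaces.
const KNOWN_ERROR_TYPES = new Set([
  'rate_limited',
  'unauthorized',
  'not_found',
  'timeout',
  'server_error',
]);

// Maps an arbitrary thrown value to one of a bounded set of label values.
// Anything unrecognized collapses into 'unknown' instead of creating a new series.
function normalizeErrorType(error: unknown): string {
  const e = error as { code?: unknown; status?: unknown } | null | undefined;
  if (typeof e?.code === 'string') {
    const code = e.code.toLowerCase();
    if (KNOWN_ERROR_TYPES.has(code)) return code;
  }
  if (typeof e?.status === 'number') {
    if (e.status === 429) return 'rate_limited';
    if (e.status === 401 || e.status === 403) return 'unauthorized';
    if (e.status === 404) return 'not_found';
    if (e.status >= 500) return 'server_error';
  }
  return 'unknown';
}
```

In the instrumented client this would replace the raw value, for example `errorCounter.inc({ error_type: normalizeErrorType(error) })`, which caps the label at six possible values.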
Examples
Quick Metrics Endpoint
typescript
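import express from 'express';
const app = express(); // assumes an Express app; reuse your existing one if present
// Serve the registry in the Prometheus text exposition format.
app.get('/metrics', async (_req, res) => {
  res.set('Content-Type', registry.contentType);
  res.send(await registry.metrics());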
});
Resources
Next Steps
For incident response, see instantly-incident-runbook.