fireflies-observability

Fireflies.ai Observability

Overview

Set up metrics, distributed tracing, structured logging, and alerting for Fireflies.ai integrations.

Prerequisites

  • Prometheus or compatible metrics backend
  • OpenTelemetry SDK installed
  • Grafana or similar dashboarding tool
  • AlertManager configured

Metrics Collection

Key Metrics

| Metric | Type | Description |
| --- | --- | --- |
| fireflies_requests_total | Counter | Total API requests |
| fireflies_request_duration_seconds | Histogram | Request latency |
| fireflies_errors_total | Counter | Error count by type |
| fireflies_rate_limit_remaining | Gauge | Rate limit headroom |

Prometheus Metrics

typescript
import { Registry, Counter, Histogram, Gauge } from 'prom-client';

const registry = new Registry();

const requestCounter = new Counter({
  name: 'fireflies_requests_total',
  help: 'Total Fireflies.ai API requests',
  labelNames: ['method', 'status'],
  registers: [registry],
});

const requestDuration = new Histogram({
  name: 'fireflies_request_duration_seconds',
  help: 'Fireflies.ai request duration',
  labelNames: ['method'],
  buckets: [0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
  registers: [registry],
});

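const errorCounter = new Counter({
  name: 'fireflies_errors_total',
  help: 'Fireflies.ai errors by type',
  labelNames: ['error_type'],
  registers: [registry],
});

// Gauge for the fireflies_rate_limit_remaining metric listed under Key Metrics.
// Call rateLimitGauge.set(n) with whatever quota information your client exposes.
const rateLimitGauge = new Gauge({
  name: 'fireflies_rate_limit_remaining',
  help: 'Remaining Fireflies.ai rate limit headroom',
  registers: [registry],
});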

Instrumented Client

typescript
async function instrumentedRequest<T>(
  method: string,
  operation: () => Promise<T>
): Promise<T> {
  const timer = requestDuration.startTimer({ method });

  try {
    const result = await operation();
    requestCounter.inc({ method, status: 'success' });
    return result;
  } catch (error: any) {
    requestCounter.inc({ method, status: 'error' });
    errorCounter.inc({ error_type: error.code || 'unknown' });
    throw error;
  } finally {
    timer();
  }
}
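
The wrapper above passes `error.code` straight into the `error_type` label, so any code that embeds a request ID or other free-form text creates a new time series. A minimal sketch of one way to keep the label bounded follows; the four error type names are illustrative assumptions, not codes documented by Fireflies.ai.

```typescript
// Fixed set of allowed label values. These names are hypothetical examples,
// not error codes defined by the Fireflies.ai API.
const KNOWN_ERROR_TYPES = ['rate_limited', 'unauthorized', 'timeout', 'network'] as const;

type ErrorType = (typeof KNOWN_ERROR_TYPES)[number] | 'unknown';

// Maps an arbitrary thrown value onto the fixed label set.
// Anything without a recognised `code` collapses into 'unknown'.
function normalizeErrorType(error: unknown): ErrorType {
  const code =
    typeof error === 'object' && error !== null && 'code' in error
      ? String((error as { code: unknown }).code).toLowerCase()
      : '';
  const known = KNOWN_ERROR_TYPES as readonly string[];
  return known.indexOf(code) !== -1 ? (code as ErrorType) : 'unknown';
}
```

With this in place, the catch block could call `errorCounter.inc({ error_type: normalizeErrorType(error) })`, which caps the label at five possible values.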

Distributed Tracing

OpenTelemetry Setup

typescript
import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('fireflies-client');

async function tracedFirefliesCall<T>(
  operationName: string,
  operation: () => Promise<T>
): Promise<T> {
  return tracer.startActiveSpan(`fireflies.${operationName}`, async (span) => {
    try {
      const result = await operation();
      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (error: any) {
      span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
      span.recordException(error);
      throw error;
    } finally {
      span.end();
    }
  });
}

Logging Strategy

Structured Logging

typescript
import pino from 'pino';

const logger = pino({
  name: 'fireflies',
  level: process.env.LOG_LEVEL || 'info',
});

function logFirefliesOperation(
  operation: string,
  data: Record<string, any>,
  duration: number
) {
  logger.info({
    service: 'fireflies',
    operation,
    duration_ms: duration,
    ...data,
  });
}
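
One detail in the logger above: `...data` is spread last, so a caller-supplied `operation` or `service` key would silently overwrite the standard fields. The sketch below shows one possible way to keep the field names consistent by spreading caller data first. `buildLogEntry` and the `listTranscripts` operation name used in the usage note are hypothetical, introduced only for illustration.

```typescript
// Shape of every log line: three fixed fields plus arbitrary extra context.
interface FirefliesLogEntry {
  service: 'fireflies';
  operation: string;
  duration_ms: number;
  [extra: string]: unknown;
}

// Builds a log entry in which the standard fields always win over
// caller-supplied keys, because they are written after the spread.
function buildLogEntry(
  operation: string,
  durationMs: number,
  data: Record<string, unknown> = {}
): FirefliesLogEntry {
  const entry: FirefliesLogEntry = {
    ...data,
    service: 'fireflies',
    operation,
    duration_ms: durationMs,
  };
  return entry;
}
```

A call such as `logger.info(buildLogEntry('listTranscripts', 42, { transcript_count: 3 }))` would then emit the same three core fields on every line, regardless of what `data` contains.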

Alert Configuration

Prometheus AlertManager Rules

yaml
# fireflies_alerts.yaml
groups:
  - name: fireflies_alerts
    rules:
      - alert: Fireflies.aiHighErrorRate
        # sum() aggregates away the label sets, which differ between the two
        # metrics (error_type vs. method/status) and would otherwise not match.
        expr: |
          sum(rate(fireflies_errors_total[5m]))
            / sum(rate(fireflies_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Fireflies.ai error rate > 5%"
      - alert: Fireflies.aiHighLatency
        expr: |
          histogram_quantile(0.95,
            rate(fireflies_request_duration_seconds_bucket[5m])
          ) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Fireflies.ai P95 latency > 2s"
      - alert: Fireflies.aiDown
        expr: up{job="fireflies"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Fireflies.ai integration is down"
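
To make the 5% threshold in the error-rate rule concrete, the following sketch reproduces the arithmetic on two samples per counter. It is a deliberate simplification: Prometheus' `rate()` also handles counter resets and extrapolates to the window edges, which this version ignores. The `Sample` type and both function names are assumptions made for the example.

```typescript
// One scraped value of a monotonically increasing counter.
interface Sample {
  timestampSec: number;
  value: number;
}

// Simplified per-second rate between the first and last sample of a window.
// Ignores counter resets and the extrapolation Prometheus applies.
function simpleRate(first: Sample, last: Sample): number {
  const elapsed = last.timestampSec - first.timestampSec;
  if (elapsed <= 0) return NaN;
  return (last.value - first.value) / elapsed;
}

// Ratio of error rate to request rate over the same window.
// Returns 0 when there was no traffic (Prometheus would yield no result).
function errorRatio(errors: [Sample, Sample], requests: [Sample, Sample]): number {
  const requestRate = simpleRate(requests[0], requests[1]);
  if (!(requestRate > 0)) return 0;
  return simpleRate(errors[0], errors[1]) / requestRate;
}

// 600 requests and 36 errors over a 300-second window.
const ratio = errorRatio(
  [{ timestampSec: 0, value: 10 }, { timestampSec: 300, value: 46 }],
  [{ timestampSec: 0, value: 1000 }, { timestampSec: 300, value: 1600 }]
);
```

Here the request rate is 2 per second and the error rate is 0.12 per second, so `ratio` comes out at roughly 0.06, above the 0.05 threshold. The alert would fire only if that condition held for the full `for: 5m` period.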

Dashboard

Grafana Panel Queries

json
{
  "panels": [
    {
      "title": "Fireflies.ai Request Rate",
      "targets": [{
        "expr": "rate(fireflies_requests_total[5m])"
      }]
    },
    {
      "title": "Fireflies.ai Latency P50/P95/P99",
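      "targets": [
        { "expr": "histogram_quantile(0.5, rate(fireflies_request_duration_seconds_bucket[5m]))" },
        { "expr": "histogram_quantile(0.95, rate(fireflies_request_duration_seconds_bucket[5m]))" },
        { "expr": "histogram_quantile(0.99, rate(fireflies_request_duration_seconds_bucket[5m]))" }
      ]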
    }
  ]
}
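
The latency panel relies on `histogram_quantile`, which estimates a quantile from cumulative bucket counts by interpolating linearly inside the bucket that contains the target rank. The sketch below is a simplified re-implementation for intuition only, not Prometheus' actual code; the sample counts are made up, while the bucket bounds match the histogram defined earlier.

```typescript
// Cumulative bucket: `count` observations were <= `le`.
interface Bucket {
  le: number;
  count: number;
}

// Estimates quantile q (0..1) from cumulative buckets, interpolating linearly
// within the bucket that contains the target rank. Requires a +Infinity bucket.
function histogramQuantile(q: number, buckets: Bucket[]): number {
  if (buckets.length < 2 || q < 0 || q > 1) return NaN;
  const sorted = buckets.slice().sort((a, b) => a.le - b.le);
  const last = sorted[sorted.length - 1];
  if (last.le !== Infinity || last.count === 0) return NaN;

  const rank = q * last.count;
  let i = 0;
  while (i < sorted.length - 1 && sorted[i].count < rank) i++;

  // Rank falls in the +Inf bucket: report the highest finite bound.
  if (i === sorted.length - 1) return sorted[sorted.length - 2].le;

  const upper = sorted[i].le;
  const lower = i === 0 ? 0 : sorted[i - 1].le;
  const below = i === 0 ? 0 : sorted[i - 1].count;
  const inBucket = sorted[i].count - below;
  if (inBucket === 0) return lower;
  return lower + (upper - lower) * ((rank - below) / inBucket);
}

// Hypothetical cumulative counts for the buckets configured above.
const sampleBuckets: Bucket[] = [
  { le: 0.05, count: 10 }, { le: 0.1, count: 30 }, { le: 0.25, count: 60 },
  { le: 0.5, count: 80 }, { le: 1, count: 90 }, { le: 2.5, count: 96 },
  { le: 5, count: 99 }, { le: Infinity, count: 100 },
];
const p50 = histogramQuantile(0.5, sampleBuckets);
const p95 = histogramQuantile(0.95, sampleBuckets);
```

For this data `p50` is about 0.2 s and `p95` about 2.25 s. Because the estimate can never be finer than the bucket boundaries, it is worth choosing buckets that bracket the latency thresholds you alert on.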

Instructions

Step 1: Set Up Metrics Collection

Implement Prometheus counters, histograms, and gauges for key operations.

Step 2: Add Distributed Tracing

Integrate OpenTelemetry for end-to-end request tracing.

Step 3: Configure Structured Logging

Set up JSON logging with consistent field names.

Step 4: Create Alert Rules

Define Prometheus alerting rules for error rates and latency.

Output

  • Metrics collection enabled
  • Distributed tracing configured
  • Structured logging implemented
  • Alert rules deployed

Error Handling

| Issue | Cause | Solution |
| --- | --- | --- |
| Missing metrics | No instrumentation | Wrap client calls |
| Trace gaps | Missing propagation | Check context headers |
| Alert storms | Wrong thresholds | Tune alert rules |
| High cardinality | Too many labels | Reduce label values |
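
For the "Trace gaps" row, one quick check is whether incoming or outgoing requests carry a well-formed W3C `traceparent` header. The sketch below validates the version `00` layout (`version-traceid-parentid-flags`). The function name is an assumption, and in practice the OpenTelemetry propagators handle this parsing for you; the code is meant for debugging only.

```typescript
interface TraceParent {
  version: string;
  traceId: string;
  parentId: string;
  flags: string;
  sampled: boolean;
}

// Version 00 layout: 2 hex, 32 hex, 16 hex, 2 hex, all lowercase.
const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Returns the parsed header, or null if it is missing or malformed.
function parseTraceparent(header: string | undefined): TraceParent | null {
  if (!header) return null;
  const match = TRACEPARENT_RE.exec(header.trim());
  if (!match) return null;

  const version = match[1];
  const traceId = match[2];
  const parentId = match[3];
  const flags = match[4];

  // 'ff' is a forbidden version; all-zero IDs are invalid per the spec.
  if (version === 'ff') return null;
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;

  // The lowest bit of the flags byte is the "sampled" flag.
  const sampled = (parseInt(flags, 16) & 1) === 1;
  return { version, traceId, parentId, flags, sampled };
}
```

If `parseTraceparent(req.headers['traceparent'])` returns `null` at a service boundary, the context was lost or corrupted upstream of that hop, which narrows down where propagation needs to be fixed.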

Examples

Quick Metrics Endpoint

typescript
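// Assumes `app` is an existing Express application and `registry` is the
// prom-client Registry created in the Prometheus Metrics section.
app.get('/metrics', async (req, res) => {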
  res.set('Content-Type', registry.contentType);
  res.send(await registry.metrics());
});

Resources

Next Steps

For incident response, see fireflies-incident-runbook.