built-in-metrics

Agent Metrics Instrumentation

You're using a skill that wires LaunchDarkly agent metrics around an existing provider call. Your job is to audit what's already there, pick the right tier from the ladder below, and implement it with the least ceremony that still captures the metrics the Monitoring tab needs (duration, input/output tokens, success/error, plus TTFT when streaming).
The single most important thing to get right: default to the highest tier that fits the shape of the call. Going lower ("just write the manual tracker calls") looks flexible but costs you drift, missed metrics, and legacy patterns the SDKs have moved past.

The four-tier ladder

This is the order the official SDK READMEs (Python core, Node core, and every provider package) recommend. Walk from the top and stop at the first tier that fits:
| Tier | Pattern | Use when | Tracks automatically |
|---|---|---|---|
| 1: Managed runner | Python: `ai_client.create_model(...)` returning a `ManagedModel`, then `await model.run(...)`.<br>Node: `aiClient.createModel(...)` returning a `ManagedModel`, then `await model.run(...)`. | The call is conversational (chat history, turn-based). This is what the provider READMEs lead with. | Duration, tokens, success/error: all of it, zero tracker calls. |
| 2: Provider package + `trackMetricsOf` | `tracker.trackMetricsOf(Provider.getAIMetricsFromResponse, () => providerCall())`. Provider packages today: `@launchdarkly/server-sdk-ai-openai`, `-langchain`, `-vercel` (Node) and `launchdarkly-server-sdk-ai-openai`, `-langchain` (Python). | The shape isn't a chat loop (one-shot completion, structured output, agent step) but the framework or provider has a package. | Duration + success/error from the wrapper; tokens from the package's built-in `getAIMetricsFromResponse` extractor. |
| 3: Custom extractor + `trackMetricsOf` | Same `trackMetricsOf` wrapper, but you write a small function that maps the provider response to `LDAIMetrics` (tokens + success). | No provider package exists (Anthropic direct, Gemini, Cohere, custom HTTP). | Duration + success/error from the wrapper; tokens from your extractor. |
| 4: Raw manual | Separate calls to `trackDuration`, `trackTokens`, `trackSuccess` / `trackError`, plus `trackTimeToFirstToken` for streams. | Streaming with TTFT, unusual response shapes, partial tracking, anything Tier 2–3 can't cleanly wrap. | Only what you explicitly call; it's on you to not miss one. |
Every provider (OpenAI, LangChain, Vercel, Bedrock, Anthropic, Gemini, custom HTTP) uses the same generic shape: `tracker.trackMetricsOf(getAIMetricsFromResponse, () => providerCall())` in Node, `tracker.track_metrics_of(get_ai_metrics_from_response, provider_call)` in Python. The extractor is the only thing that changes per provider: import `getAIMetricsFromResponse` from the matching `@launchdarkly/server-sdk-ai-<provider>` (or `ldai_<provider>`) package, or write a small custom function that returns `LDAIMetrics`. There are no provider-specific tracker methods.
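To make the wrapper contract concrete, here is a minimal Python sketch of a Tier 3 custom extractor. It does not import the LaunchDarkly SDK: `StubTracker`, `Metrics`, and `TokenUsage` are hypothetical stand-ins that only mimic the behavior described above (time the call, record success or error, ask the extractor for tokens, re-raise on failure). The response dict shape and the `fake_provider_call` helper are also assumptions made for illustration. With the real SDK you would pass your extractor to `tracker.track_metrics_of` instead.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class TokenUsage:
    input: int
    output: int
    total: int


@dataclass
class Metrics:
    # Stand-in for the SDK's LDAIMetrics: a success flag plus optional usage.
    success: bool
    usage: Optional[TokenUsage] = None


@dataclass
class StubTracker:
    # Hypothetical stand-in: it only records what a real tracker would emit.
    events: List[tuple] = field(default_factory=list)

    def track_metrics_of(self, extractor: Callable[[Any], Metrics],
                         fn: Callable[[], Any]) -> Any:
        start = time.perf_counter()
        try:
            response = fn()
        except Exception:
            # The wrapper owns error tracking, then re-raises to the caller.
            self.events.append(("error",))
            raise
        self.events.append(("duration_ms", (time.perf_counter() - start) * 1000.0))
        metrics = extractor(response)
        self.events.append(("success",) if metrics.success else ("error",))
        if metrics.usage is not None:
            self.events.append(("tokens", metrics.usage))
        return response


def extract_metrics(response: dict) -> Metrics:
    # Custom extractor: map the (assumed) provider payload to Metrics.
    usage = response.get("usage", {})
    tokens_in = usage.get("input_tokens", 0)
    tokens_out = usage.get("output_tokens", 0)
    return Metrics(success=True,
                   usage=TokenUsage(tokens_in, tokens_out, tokens_in + tokens_out))


def fake_provider_call() -> dict:
    # Placeholder for the unmodified provider call being wrapped.
    return {"content": "hello", "usage": {"input_tokens": 12, "output_tokens": 5}}


tracker = StubTracker()
response = tracker.track_metrics_of(extract_metrics, fake_provider_call)
```

The provider call itself is untouched; only the extractor knows the response shape, which is the property that lets Tier 2 and Tier 3 share one wrapper.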

Workflow

1. Explore the existing call site

Before picking a tier, find the provider call and answer these questions:
  • Shape? Is it a chat loop (history + turn-based), a one-shot completion, an agent step, or something else? → drives Tier 1 vs 2.
  • Framework? Raw provider SDK? LangChain / LangGraph? Vercel AI SDK? CrewAI? Strands? → drives which Tier-2 provider package (if any) applies.
  • Provider? OpenAI, Anthropic, Bedrock, Gemini, Azure, custom HTTP? → cross-reference with the package availability matrix below.
  • Streaming? If yes, you'll need TTFT tracking, which means Tier 4 for the TTFT part even if the rest is Tier 2.
  • Language? Python or Node? Provider-package coverage differs between them.
  • Already using a config? If not, route to `configs-create` first. Tracking requires a tracker, which is obtained by calling `create_tracker()` / `createTracker()` on the config object returned by `completion_config()` / `completionConfig()` / `createModel()`.
  • On the current SDK API? If the call site uses `aiclient.config(...)` / `aiClient.config(...)` or constructs an `AIConfig(...)` / `LDAIConfig` default, it's on the pre-0.20 surface. Migrate it as part of this work before adding tracking:
    • `aiclient.config(...)` → `aiclient.completion_config(...)` for one-shot/chat or `aiclient.agent_config(...)` for agent mode (mirror the call signature). Node is the same with camelCase.
    • `AIConfig(...)` default → `AICompletionConfigDefault(...)` or `AIAgentConfigDefault(...)` (Node: `LDAICompletionConfigDefault` / `LDAIAgentConfigDefault`). `AIConfig` is the base class the SDK returns; it isn't a valid default-value constructor, but the typed `*Default` variants are.
    • If the result was being tuple-unpacked (`config, tracker = aiclient.config(...)`), drop the unpack: the new methods return a single config object. Obtain the tracker via `config.create_tracker()` / `aiConfig.createTracker()`.
    • For deeper rewrites (call sites with hardcoded model/prompt as well), hand off to `migrate` instead of doing the full migration here.

2. Look up your Tier-2 option

Use this matrix to decide whether Tier 2 (provider package) is available for your situation. If it's not, drop to Tier 3 (custom extractor). If the shape is chat-loop, go to Tier 1 first regardless of what's in this matrix.
| Framework / provider | Python provider package | Node provider package | Reference |
|---|---|---|---|
| OpenAI (direct SDK) | `launchdarkly-server-sdk-ai-openai` | `@launchdarkly/server-sdk-ai-openai` | openai-tracking.md |
| LangChain / LangGraph | `launchdarkly-server-sdk-ai-langchain` | `@launchdarkly/server-sdk-ai-langchain` | langchain-tracking.md |
| Vercel AI SDK | none | `@launchdarkly/server-sdk-ai-vercel` | (use the Vercel provider docs) |
| AWS Bedrock (Converse or InvokeModel) | none (use LangChain-aws or custom extractor) | none (use LangChain-aws or custom extractor) | bedrock-tracking.md |
| Anthropic direct SDK | none | none | anthropic-tracking.md |
| Gemini / Google GenAI | none | none | gemini-tracking.md |
| Strands Agents | none (Tier 3 custom extractor) | none (Tier 3 custom extractor) | strands-tracking.md |
| Cohere, Mistral, custom HTTP | none | none | Tier 3 custom extractor |
| Any provider, streaming + TTFT | none (Tier 4 only) | `trackStreamMetricsOf` (no TTFT) + manual TTFT | streaming-tracking.md |
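The lookup above can be sketched as a small Python helper. This is only an illustration of the decision order (chat loop first, then provider package, then custom extractor); the `PROVIDER_PACKAGES` dict and the `pick_tier` function are hypothetical names, the package strings are copied from the matrix, and streaming is deliberately left out because TTFT always needs a manual call on top of whichever tier is chosen.

```python
from typing import Dict, Optional, Tuple

# Package names copied from the matrix; None means no provider package exists.
PROVIDER_PACKAGES: Dict[str, Dict[str, Optional[str]]] = {
    "openai": {
        "python": "launchdarkly-server-sdk-ai-openai",
        "node": "@launchdarkly/server-sdk-ai-openai",
    },
    "langchain": {
        "python": "launchdarkly-server-sdk-ai-langchain",
        "node": "@launchdarkly/server-sdk-ai-langchain",
    },
    "vercel": {"python": None, "node": "@launchdarkly/server-sdk-ai-vercel"},
    "bedrock": {"python": None, "node": None},
    "anthropic": {"python": None, "node": None},
    "gemini": {"python": None, "node": None},
    "strands": {"python": None, "node": None},
}


def pick_tier(shape: str, provider: str, language: str) -> Tuple[int, Optional[str]]:
    """Return (tier, provider_package) for a non-streaming call site."""
    if shape == "chat":
        # Chat loops go to the managed runner regardless of the matrix.
        return 1, None
    package = PROVIDER_PACKAGES.get(provider, {}).get(language)
    if package is not None:
        return 2, package
    # Unknown provider or no package for this language: custom extractor.
    return 3, None


print(pick_tier("one-shot", "vercel", "node"))
print(pick_tier("one-shot", "vercel", "python"))
```

Note that the same framework can land on different tiers depending on the language, which is why the language question is asked before consulting the matrix.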

3. Implement from the matching reference

Once you know the tier and the provider, open the reference file and follow the pattern. The references are written so Tier 1 is always the first example, Tier 2/3 next, and Tier 4 last. Stop at the first tier that matches the app's shape.
Guardrails that apply to every tier:
  1. Always check `config.enabled` before making the tracked call. A disabled config means the user has flagged the feature off. Short-circuit to whatever fallback the app uses (cached response, error, degraded path) rather than making the provider call at all.
  2. Wrap the existing call, don't rewrite it. Tier 2 and Tier 3 are designed to slot around an unmodified provider call. If you find yourself rewriting the call to fit the tracker, you're at the wrong tier: drop down one.
  3. Errors are handled inside `trackMetricsOf`. The wrapper catches exceptions, records `trackError()` internally, and re-raises. Do not add `except: tracker.trackError()` on top; it's a no-op that also trips the at-most-once guard. Tier 1 handles both paths automatically. At Tier 4 (manual, streaming, `track_duration_of`) the caller does own the error-tracking call.
  4. Always flush before close. Call `ldClient.flush()` (Python: `ldclient.get().flush()`; Node: `await ldClient.flush()`) before closing the client. Otherwise trailing events can be lost, in short-lived scripts and long-running services alike. In Node, `ldClient.close()` returns a Promise; await it.
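The guardrails above can be sketched in Python as follows. Everything here is a stand-in written for illustration: `StubConfig`, `StubTracker`, `StubClient`, `handle_request`, and `shutdown` are hypothetical names, not SDK classes, and the stub wrapper only imitates the catch, record, re-raise behavior described in guardrail 3.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class StubConfig:
    enabled: bool


@dataclass
class StubTracker:
    events: List[str] = field(default_factory=list)

    def track_metrics_of(self, extractor: Callable[[Any], Any],
                         fn: Callable[[], Any]) -> Any:
        try:
            response = fn()
        except Exception:
            self.events.append("error")  # recorded once, inside the wrapper
            raise
        extractor(response)
        self.events.append("success")
        return response


@dataclass
class StubClient:
    calls: List[str] = field(default_factory=list)

    def flush(self) -> None:
        self.calls.append("flush")

    def close(self) -> None:
        self.calls.append("close")


def handle_request(config: StubConfig, tracker: StubTracker,
                   provider_call: Callable[[], Any], fallback: Any) -> Any:
    # Guardrail 1: a disabled config short-circuits before the provider call.
    if not config.enabled:
        return fallback
    # Guardrails 2 and 3: wrap the unmodified call and add no extra
    # except/track_error here, because the wrapper already records the error.
    return tracker.track_metrics_of(lambda response: None, provider_call)


def shutdown(client: StubClient) -> None:
    # Guardrail 4: flush first so trailing events are sent, then close.
    client.flush()
    client.close()


tracker = StubTracker()
print(handle_request(StubConfig(enabled=False), tracker, lambda: "live", "cached"))
print(handle_request(StubConfig(enabled=True), tracker, lambda: "live", "cached"))
```

In this sketch a failing provider call produces exactly one `"error"` event and the exception still reaches the caller, so application-level error handling stays where it already was.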

4. Verify

Confirm the Monitoring tab fills in:
  • Run one real request through the instrumented path.
  • Open the config in LaunchDarkly → Monitoring tab. Duration, token counts, and generation counts should appear within 1–2 minutes.
  • Force an error (bad API key, zero `max_tokens`, whatever) and confirm the error count increments.
  • If streaming: verify TTFT appears. If it doesn't, you probably wrapped the stream creation with `trackMetricsOf` but didn't add the manual `trackTimeToFirstToken` call. See streaming-tracking.md.

Quick reference: tracker methods

Obtain a tracker via the factory on the config object: `tracker = config.create_tracker()` (Python) or `const tracker = aiConfig.createTracker()` (Node). Call the factory once per execution and reuse the returned `tracker` for every call. Each factory invocation mints a new `runId` that tags every tracking event emitted by that tracker, so events from a single execution can be correlated (via exported events or downstream systems). Today the Monitoring tab aggregates events rather than grouping them by run, so the `runId` matters mainly when events are exported or queried outside the UI; it is also the identifier the SDK's at-most-once guards are keyed on.
The methods below are the raw API surface. Most of the time you should not call them individually; use `trackMetricsOf` or a Tier-1 managed runner. The list is here so you can recognize the methods in existing code and reach for the right one when you genuinely need Tier 4.
| Method (Python / Node) | Tier | What it does |
|---|---|---|
| `track_metrics_of(extractor, fn)` / `trackMetricsOf(extractor, fn)` | 2 / 3 | Wraps a provider call, captures duration + success/error, calls your extractor for tokens. This is the default generic tracker. |
| `track_metrics_of_async(extractor, fn)` (Python) | 2 / 3 | Async variant of the above. |
| `trackStreamMetricsOf(extractor, streamFn)` (Node only) | 2 / 3 | Streaming variant. Captures per-chunk usage when the extractor handles chunks. Does not auto-capture TTFT. |
| `track_duration(ms)` / `trackDuration(ms)` | 4 | Record latency in milliseconds. |
| `track_duration_of(fn)` / `trackDurationOf(fn)` | 4 | Wraps a callable and records duration automatically. Does not capture tokens or success; pair with explicit calls. |
| `track_tokens(TokenUsage)` / `trackTokens({input, output, total})` | 4 | Record token usage. |
| `track_time_to_first_token(ms)` / `trackTimeToFirstToken(ms)` | 4 | Record TTFT for streaming responses. |
| `track_success()` / `trackSuccess()` | 4 | Mark the generation as successful. Required for the Monitoring tab to count it. |
| `track_error()` / `trackError()` | 4 | Mark the generation as failed. Do not also call `trackSuccess()` in the same request. |
| `track_feedback({kind})` / `trackFeedback({kind})` | any | Record thumbs-up / thumbs-down from a feedback UI. Independent of the success/error path. |
| `track_tool_call(name)` / `trackToolCall(name)` | any | Record a single tool invocation by name. Available on both SDKs. |
| `track_tool_calls([names])` / `trackToolCalls([names])` | any | Batch variant: record a list of tool invocations in one call. |
| `track_judge_result(result)` / `trackJudgeResult(result)` | any | Record a programmatic judge evaluation. `result.sampled` indicates whether evaluation ran. |
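When Tier 4 is genuinely needed, the order of the manual calls matters more than any single one. The sketch below shows one way to sequence them around a streamed response in Python. `StubTracker` is a hypothetical stand-in whose method names follow the Python column above but which only records events; `fake_stream` and the chunk dict shape are assumptions, and the stub takes a plain dict where the real `track_tokens` expects a `TokenUsage`.

```python
import time
from typing import Iterable, Iterator, List, Tuple


class StubTracker:
    # Stand-in that records calls instead of emitting LaunchDarkly events.
    def __init__(self) -> None:
        self.events: List[Tuple] = []

    def track_duration(self, ms: float) -> None:
        self.events.append(("duration", ms))

    def track_time_to_first_token(self, ms: float) -> None:
        self.events.append(("ttft", ms))

    def track_tokens(self, usage: dict) -> None:
        self.events.append(("tokens", usage))

    def track_success(self) -> None:
        self.events.append(("success",))

    def track_error(self) -> None:
        self.events.append(("error",))


def fake_stream() -> Iterator[dict]:
    # Assumed chunk shape: text deltas, with usage on the final chunk.
    yield {"text": "Hel"}
    yield {"text": "lo"}
    yield {"text": "", "usage": {"input": 3, "output": 2, "total": 5}}


def consume_stream(tracker: StubTracker, stream: Iterable[dict]) -> str:
    start = time.perf_counter()
    first_seen = False
    parts: List[str] = []
    try:
        for chunk in stream:
            if not first_seen and chunk.get("text"):
                # TTFT is measured at the first chunk that carries text.
                tracker.track_time_to_first_token((time.perf_counter() - start) * 1000.0)
                first_seen = True
            parts.append(chunk.get("text", ""))
            if "usage" in chunk:
                tracker.track_tokens(chunk["usage"])
    except Exception:
        tracker.track_error()  # at Tier 4 the caller owns error tracking
        raise
    else:
        tracker.track_success()  # never both success and error in one request
    finally:
        tracker.track_duration((time.perf_counter() - start) * 1000.0)
    return "".join(parts)


tracker = StubTracker()
text = consume_stream(tracker, fake_stream())
```

Using `try` / `except` / `else` / `finally` keeps success and error mutually exclusive while still recording duration on both paths, which is the bookkeeping the higher tiers do for you.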

Related skills

  • `configs-create`: prerequisite if the app doesn't have a config yet
  • `custom-metrics`: business metrics (conversion, resolution, retention) layered on top of the agent metrics this skill captures
  • `online-evals`: automatic quality scoring (LLM-as-judge) on sampled live requests; complementary to the metrics here
  • `migrate`: Stage 4 of the hardcoded-to-AgentControl migration delegates to this skill