built-in-metrics
Agent Metrics Instrumentation
You're using a skill that wires LaunchDarkly agent metrics around an existing provider call. Your job is to audit what's already there, pick the right tier from the ladder below, and implement it with the least ceremony that still captures the metrics the Monitoring tab needs (duration, input/output tokens, success/error, plus TTFT when streaming).
The single most important thing to get right: default to the highest tier that fits the shape of the call. Going lower ("just write the manual tracker calls") looks flexible but costs you drift, missed metrics, and legacy patterns the SDKs have moved past.
The four-tier ladder
This is the order the official SDK READMEs (Python core, Node core, and every provider package) recommend. Walk from the top and stop at the first tier that fits:
| Tier | Pattern | Use when | Tracks automatically |
|---|---|---|---|
| 1: Managed runner | Python: | The call is conversational (chat history, turn-based). This is what the provider READMEs lead with. | Duration, tokens, and success/error: all of it, with zero tracker calls. |
| 2: Provider package + `trackMetricsOf` | `trackMetricsOf` (Python: `track_metrics_of`) with the package's `getAIMetricsFromResponse` | The shape isn't a chat loop (one-shot completion, structured output, agent step) but the framework or provider has a package. | Duration + success/error from the wrapper; tokens from the package's built-in `getAIMetricsFromResponse`. |
| 3: Custom extractor + `trackMetricsOf` | Same wrapper as Tier 2, with your own extractor | No provider package exists (Anthropic direct, Gemini, Cohere, custom HTTP). | Duration + success/error from the wrapper; tokens from your extractor. |
| 4: Raw manual | Separate calls to the individual tracker methods | Streaming with TTFT, unusual response shapes, partial tracking, anything Tier 2–3 can't cleanly wrap. | Only what you explicitly call; it's on you not to miss one. |
Every provider (OpenAI, LangChain, Vercel, Bedrock, Anthropic, Gemini, custom HTTP) uses the same generic shape: `tracker.trackMetricsOf(getAIMetricsFromResponse, () => providerCall())` in Node, `tracker.track_metrics_of(get_ai_metrics_from_response, provider_call)` in Python. The extractor is the only thing that changes per provider: import `getAIMetricsFromResponse` from the matching `@launchdarkly/server-sdk-ai-<provider>` (or `ldai_<provider>`) package, or write a small custom function that returns `LDAIMetrics`. There are no provider-specific tracker methods.
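As a rough illustration of the generic shape described above, the sketch below pairs a small custom extractor with a wrapper that times the call and records success or error. `StubTracker`, `Metrics`, `TokenUsage`, and `fake_provider_call` are hypothetical stand-ins written for this example so it runs without the SDK or network access; they are not the LaunchDarkly API. The response field names (`usage.input_tokens`, `usage.output_tokens`) are assumptions about an Anthropic-style payload.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class TokenUsage:
    input: int
    output: int

    @property
    def total(self) -> int:
        return self.input + self.output


@dataclass
class Metrics:
    # Stand-in for the metrics record an extractor returns.
    success: bool
    usage: Optional[TokenUsage] = None


def extract_metrics(response: dict) -> Metrics:
    """Custom extractor: map a provider response onto the metrics record."""
    usage = response.get("usage") or {}
    return Metrics(
        success=True,
        usage=TokenUsage(
            input=usage.get("input_tokens", 0),
            output=usage.get("output_tokens", 0),
        ),
    )


@dataclass
class StubTracker:
    """Hypothetical stand-in that mimics the wrapper's shape, not the SDK."""
    events: list = field(default_factory=list)

    def track_metrics_of(self, extractor: Callable[[Any], Metrics], call: Callable[[], Any]) -> Any:
        start = time.perf_counter()
        try:
            result = call()
        except Exception:
            # Duration and the error are recorded, then the exception propagates.
            self.events.append(("duration_ms", (time.perf_counter() - start) * 1000))
            self.events.append(("error", None))
            raise
        self.events.append(("duration_ms", (time.perf_counter() - start) * 1000))
        metrics = extractor(result)
        if metrics.usage is not None:
            self.events.append(("tokens", metrics.usage))
        self.events.append(("success" if metrics.success else "error", None))
        return result


def fake_provider_call() -> dict:
    # Placeholder for the unmodified provider call being wrapped.
    return {"content": "hello", "usage": {"input_tokens": 12, "output_tokens": 5}}


tracker = StubTracker()
response = tracker.track_metrics_of(extract_metrics, fake_provider_call)
recorded = [name for name, _ in tracker.events]
```

Only `extract_metrics` would change from one provider to the next in this arrangement; the wrapper call stays the same.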
Workflow
1. Explore the existing call site
Before picking a tier, find the provider call and answer these questions:
- Shape? Is it a chat loop (history + turn-based), a one-shot completion, an agent step, or something else? → drives Tier 1 vs 2.
- Framework? Raw provider SDK? LangChain / LangGraph? Vercel AI SDK? CrewAI? Strands? → drives which Tier-2 provider package (if any) applies.
- Provider? OpenAI, Anthropic, Bedrock, Gemini, Azure, custom HTTP? → cross-reference with the package availability matrix below.
- Streaming? If yes, you'll need TTFT tracking, which means Tier 4 for the TTFT part even if the rest is Tier 2.
- Language? Python or Node? Provider-package coverage differs between them.
- Already using a config? If not, route to `configs-create` first. Tracking requires a tracker, which is obtained by calling `create_tracker()` / `createTracker()` on the config object returned by `completion_config()` / `completionConfig()` / `createModel()`.
- On the current SDK API? If the call site uses `aiclient.config(...)` / `aiClient.config(...)` or constructs an `AIConfig(...)` / `LDAIConfig` default, it's on the pre-0.20 surface. Migrate it as part of this work before adding tracking:
  - `aiclient.config(...)` → `aiclient.completion_config(...)` for one-shot/chat or `aiclient.agent_config(...)` for agent mode (mirror the call signature). Node is the same with camelCase.
  - `AIConfig(...)` default → `AICompletionConfigDefault(...)` or `AIAgentConfigDefault(...)` (Node: `LDAICompletionConfigDefault` / `LDAIAgentConfigDefault`). `AIConfig` is the base class the SDK returns; it isn't a valid default-value constructor, whereas the typed `*Default` variants are.
  - If the result was being tuple-unpacked (`config, tracker = aiclient.config(...)`), drop the unpack: the new methods return a single config object. Obtain the tracker via `config.create_tracker()` / `aiConfig.createTracker()`.
  - For deeper rewrites (call sites with hardcoded model/prompt as well), hand off to `migrate` instead of doing the full migration here.
2. Look up your Tier-2 option
Use this matrix to decide whether Tier 2 (provider package) is available for your situation. If it's not, drop to Tier 3 (custom extractor). If the shape is chat-loop, go to Tier 1 first regardless of what's in this matrix.
| Framework / provider | Python provider package | Node provider package | Reference |
|---|---|---|---|
| OpenAI (direct SDK) | | | openai-tracking.md |
| LangChain / LangGraph | | | langchain-tracking.md |
| Vercel AI SDK | — | | (use the Vercel provider docs) |
| AWS Bedrock (Converse or InvokeModel) | — (use LangChain-aws or custom extractor) | — (use LangChain-aws or custom extractor) | bedrock-tracking.md |
| Anthropic direct SDK | — | — | anthropic-tracking.md |
| Gemini / Google GenAI | — | — | gemini-tracking.md |
| Strands Agents | — (Tier 3 custom extractor) | — (Tier 3 custom extractor) | strands-tracking.md |
| Cohere, Mistral, custom HTTP | — | — | Tier 3 custom extractor |
| Any provider, streaming + TTFT | — (Tier 4 only) | | streaming-tracking.md |
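The questions from step 1 and the matrix lookup can be condensed into a small decision helper. This is only a sketch of the ladder as described in this document; `CallSite` and `pick_tier` are hypothetical names, and treating streaming as always adding Tier 4 for the TTFT part is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CallSite:
    chat_loop: bool             # history + turn-based conversation
    has_provider_package: bool  # a provider package exists for this framework and language
    streaming: bool             # response is streamed, so TTFT matters


def pick_tier(site: CallSite) -> List[int]:
    """Return the tier(s) to use, walking the ladder from the top."""
    if site.chat_loop:
        tiers = [1]
    elif site.has_provider_package:
        tiers = [2]
    else:
        tiers = [3]
    # Assumption: streaming adds a manual Tier 4 call for TTFT on top of the base tier.
    if site.streaming:
        tiers.append(4)
    return tiers


# A one-shot, streamed call with a provider package: Tier 2 plus Tier 4 for TTFT.
example = pick_tier(CallSite(chat_loop=False, has_provider_package=True, streaming=True))
```

The helper stops at the first tier that fits, mirroring the "walk from the top" rule, and only then layers on the streaming requirement.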
3. Implement from the matching reference
Once you know the tier and the provider, open the reference file and follow the pattern. The references are written so Tier 1 is always the first example, Tier 2/3 next, and Tier 4 last. Stop at the first tier that matches the app's shape.
Guardrails that apply to every tier:
- Always check `config.enabled` before making the tracked call. A disabled config means the user has flagged the feature off; short-circuit to whatever fallback the app uses (cached response, error, degraded path) rather than making the provider call at all.
- Wrap the existing call, don't rewrite it. Tier 2 and Tier 3 are designed to slot around an unmodified provider call. If you find yourself rewriting the call to fit the tracker, you're at the wrong tier; drop down one.
- Errors are handled inside `trackMetricsOf`. The wrapper catches exceptions, records `trackError()` internally, and re-raises. Do not add `except: tracker.trackError()` on top; it's a no-op that also trips the at-most-once guard. Tier 1 handles both paths automatically. At Tier 4 (manual, streaming, `track_duration_of`) the caller does own the error-tracking call.
- Always flush before close. Call `ldClient.flush()` (Python: `ldclient.get().flush()`; Node: `await ldClient.flush()`) before closing the client. Trailing events are at risk of being lost otherwise, in short-lived scripts and long-running services alike. In Node, `ldClient.close()` returns a Promise; await it.
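The first three guardrails can be sketched together as follows. `StubTracker` and `StubConfig` are hypothetical stand-ins so the example runs without the SDK; only the `enabled` check, the `create_tracker()` factory, and the `track_metrics_of` wrapper shape are taken from this document, and the flush step is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StubTracker:
    """Hypothetical stand-in for the tracker; records which paths ran."""
    events: List[str] = field(default_factory=list)

    def track_metrics_of(self, extractor: Callable, call: Callable):
        try:
            result = call()
        except Exception:
            self.events.append("error")  # recorded once, inside the wrapper
            raise
        extractor(result)
        self.events.append("success")
        return result


@dataclass
class StubConfig:
    """Hypothetical stand-in for the config object."""
    enabled: bool
    tracker: StubTracker = field(default_factory=StubTracker)

    def create_tracker(self) -> StubTracker:
        return self.tracker


def answer(config: StubConfig, provider_call: Callable[[], str], fallback: str) -> str:
    # Guardrail 1: a disabled config short-circuits before any provider call.
    if not config.enabled:
        return fallback
    tracker = config.create_tracker()
    # Guardrails 2 and 3: wrap the unmodified call and add no extra error
    # tracking; the wrapper records the failure and re-raises.
    return tracker.track_metrics_of(lambda response: None, provider_call)


def failing_call() -> str:
    raise RuntimeError("provider unavailable")


disabled = StubConfig(enabled=False)
enabled = StubConfig(enabled=True)
```

With this structure the caller never touches the error path directly, which is what keeps the error from being recorded twice.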
4. Verify
Confirm the Monitoring tab fills in:
- Run one real request through the instrumented path.
- Open the config in LaunchDarkly → Monitoring tab. Duration, token counts, and generation counts should appear within 1–2 minutes.
- Force an error (bad API key, zero `max_tokens`, whatever) and confirm the error count increments.
- If streaming: verify TTFT appears. If it doesn't, you probably wrapped the stream creation with `trackMetricsOf` but didn't add the manual `trackTimeToFirstToken` call; see streaming-tracking.md.
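The streaming check above depends on TTFT being recorded by a separate manual call. One way to measure it, sketched here under the assumption that the stream is an ordinary Python iterable of text chunks, is to wrap the stream and report the elapsed time when the first chunk arrives. `stream_with_ttft` and its `on_first_token` callback are hypothetical names; in real code the callback would forward the value to the tracker's TTFT method.

```python
import time
from typing import Callable, Iterable, Iterator, Optional


def stream_with_ttft(
    chunks: Iterable[str],
    on_first_token: Callable[[float], None],
    started_at: Optional[float] = None,
) -> Iterator[str]:
    """Yield chunks unchanged, reporting time to first token once, in milliseconds."""
    # If the request was sent earlier, pass its perf_counter() timestamp as started_at.
    start = time.perf_counter() if started_at is None else started_at
    first = True
    for chunk in chunks:
        if first:
            on_first_token((time.perf_counter() - start) * 1000)
            first = False
        yield chunk


ttft_values = []
text = "".join(stream_with_ttft(iter(["Hel", "lo"]), ttft_values.append))
```

Because the wrapper is a generator, the callback fires exactly once per stream and the chunks reach the consumer unmodified.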
Quick reference: tracker methods
Obtain a tracker via the factory on the config object: `tracker = config.create_tracker()` (Python) or `const tracker = aiConfig.createTracker()` (Node). Call the factory once per execution and reuse the returned `tracker` for every call. Each factory invocation mints a new `runId` that tags every tracking event emitted by that tracker, so events from a single execution can be correlated (via exported events or downstream systems). Today the Monitoring tab aggregates events rather than grouping them by run; the `runId` is useful when events are exported or queried outside the UI, and it is the identifier the SDK's at-most-once guards are keyed on. The methods below are the raw API surface. Most of the time you should not call them individually; use `trackMetricsOf` or a Tier-1 managed runner. The list is here so you can recognize the methods in existing code and reach for the right one when you genuinely need Tier 4.
| Method (Python ↔ Node) | Tier | What it does |
|---|---|---|
| `track_metrics_of` ↔ `trackMetricsOf` | 2 / 3 | Wraps a provider call, captures duration + success/error, calls your extractor for tokens. This is the default generic tracker. |
| | 2 / 3 | Async variant of the above. |
| | 2 / 3 | Streaming variant. Captures per-chunk usage when the extractor handles chunks. Does not auto-capture TTFT. |
| | 4 | Record latency in milliseconds. |
| `track_duration_of` | 4 | Wraps a callable and records duration automatically. Does not capture tokens or success; pair with explicit calls. |
| | 4 | Record token usage. |
| `trackTimeToFirstToken` | 4 | Record TTFT for streaming responses. |
| | 4 | Mark the generation as successful. Required for the Monitoring tab to count it. |
| `trackError` | 4 | Mark the generation as failed. Do not also call the success method in the same request. |
| | any | Record thumbs-up / thumbs-down from a feedback UI. Independent of the success/error path. |
| | any | Record a single tool invocation by name. Available on both SDKs. |
| | any | Batch variant: record a list of tool invocations in one call. |
| | any | Record a programmatic judge evaluation. |
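For the Tier 4 rows in the quick reference, the sketch below shows the order of explicit calls on each path: duration and tokens followed by exactly one of success or error. `StubTracker` is a hypothetical stand-in, and its method names (`track_duration`, `track_tokens`, `track_success`, `track_error`) are assumptions modelled on the snake_case names this document shows, not a confirmed SDK surface. The `usage` field names are likewise assumed.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class StubTracker:
    """Hypothetical stand-in; method names are assumptions, not the SDK surface."""
    events: List[Tuple[str, object]] = field(default_factory=list)

    def track_duration(self, ms: float) -> None:
        self.events.append(("duration_ms", ms))

    def track_tokens(self, input_tokens: int, output_tokens: int) -> None:
        self.events.append(("tokens", (input_tokens, output_tokens)))

    def track_success(self) -> None:
        self.events.append(("success", None))

    def track_error(self) -> None:
        self.events.append(("error", None))


def run_manually(tracker: StubTracker, call: Callable[[], dict]) -> dict:
    start = time.perf_counter()
    try:
        response = call()
    except Exception:
        # Failure path: record duration and the error, never success.
        tracker.track_duration((time.perf_counter() - start) * 1000)
        tracker.track_error()
        raise
    # Success path: every metric is an explicit call, so none can be skipped silently.
    tracker.track_duration((time.perf_counter() - start) * 1000)
    usage = response["usage"]
    tracker.track_tokens(usage["input_tokens"], usage["output_tokens"])
    tracker.track_success()
    return response


manual_tracker = StubTracker()
run_manually(manual_tracker, lambda: {"usage": {"input_tokens": 3, "output_tokens": 4}})
```

Comparing this with a single wrapper call makes the cost of Tier 4 visible: four places to forget a metric instead of one.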
Related skills
- `configs-create`: prerequisite if the app doesn't have a config yet
- `custom-metrics`: business metrics (conversion, resolution, retention) layered on top of the agent metrics this skill captures
- `online-evals`: automatic quality scoring (LLM-as-judge) on sampled live requests; complementary to the metrics here
- `migrate`: Stage 4 of the hardcoded-to-AgentControl migration delegates to this skill