aiconfig-ai-metrics


AI Metrics Instrumentation

You're using a skill that wires LaunchDarkly AI metrics around an existing provider call. Your job is to audit what's already there, pick the right tier from the ladder below, and implement it with the least ceremony that still captures the metrics the Monitoring tab needs (duration, input/output tokens, success/error, plus TTFT when streaming).
The single most important thing to get right: default to the highest tier that fits the shape of the call. Going lower ("just write the manual tracker calls") looks flexible but costs you drift, missed metrics, and legacy patterns the SDKs have moved past.

The four-tier ladder

This is the order the official SDK READMEs (Python core, Node core, and every provider package) recommend. Walk from the top and stop at the first tier that fits:
| Tier | Pattern | Use when | Tracks automatically |
|---|---|---|---|
| 1 — Managed runner | Python: `ai_client.create_model(...)` returning a `ManagedModel`, then `await model.invoke(...)`. <br>Node: `aiClient.initChat(...)` / `aiClient.createChat(...)` returning a `TrackedChat`, then `await chat.invoke(...)`. | The call is conversational (chat history, turn-based). This is what the provider READMEs lead with. | Duration, tokens, success/error — all of it, zero tracker calls. |
| 2 — Provider package + `trackMetricsOf` | `tracker.trackMetricsOf(Provider.getAIMetricsFromResponse, () => providerCall())`. Provider packages today: `@launchdarkly/server-sdk-ai-openai`, `-langchain`, `-vercel` (Node) and `launchdarkly-server-sdk-ai-openai`, `-langchain` (Python). | The shape isn't a chat loop (one-shot completion, structured output, agent step) but the framework or provider has a package. | Duration + success/error from the wrapper; tokens from the package's built-in `getAIMetricsFromResponse` extractor. |
| 3 — Custom extractor + `trackMetricsOf` | Same `trackMetricsOf` wrapper, but you write a small function that maps the provider response to `LDAIMetrics` (tokens + success). | No provider package exists (Anthropic direct, Gemini, Cohere, custom HTTP). | Duration + success/error from the wrapper; tokens from your extractor. |
| 4 — Raw manual | Separate calls to `trackDuration`, `trackTokens`, `trackSuccess` / `trackError`, plus `trackTimeToFirstToken` for streams. | Streaming with TTFT, unusual response shapes, partial tracking, anything Tier 2–3 can't cleanly wrap. | Only what you explicitly call — it's on you to not miss one. |
A call to `track_openai_metrics` / `trackOpenAIMetrics` / `track_bedrock_converse_metrics` / `trackBedrockConverseMetrics` / `trackVercelAISDKGenerateTextMetrics` is Tier-2 legacy shorthand. These helpers still exist in the SDK source but none of the current provider READMEs use them — they've been superseded by `trackMetricsOf` + `Provider.getAIMetricsFromResponse`. Do not recommend them for new code; if you see them in an existing codebase, leave them alone unless the user is already on a cleanup pass.
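To make the Tier 2/3 contract concrete, here is a minimal sketch of what a `track_metrics_of`-style wrapper does. The tracker here is a stub, not the real SDK object: it only illustrates which metrics the wrapper owns (duration, success/error) and which the extractor supplies (tokens).

```python
import time

# Stub illustrating the contract of track_metrics_of / trackMetricsOf.
# The real tracker comes from config.tracker; this stand-in only shows
# the division of labor between wrapper and extractor.
class StubTracker:
    def __init__(self):
        self.recorded = {}

    def track_metrics_of(self, extractor, fn):
        start = time.monotonic()
        try:
            response = fn()                       # the unmodified provider call
        except Exception:
            self.recorded["success"] = False      # wrapper owns success/error
            raise
        self.recorded["duration_ms"] = (time.monotonic() - start) * 1000
        metrics = extractor(response)             # extractor owns token counts
        self.recorded["success"] = metrics.get("success", True)
        self.recorded["tokens"] = metrics.get("tokens")
        return response


def fake_provider_call():
    # Stand-in for an unmodified provider call (Tier 2/3 never rewrites it).
    return {"text": "hi", "usage": {"input": 12, "output": 5, "total": 17}}


def extractor(response):
    # A Tier-3 style extractor: map the provider response to tokens + success.
    return {"tokens": response["usage"], "success": True}


tracker = StubTracker()
result = tracker.track_metrics_of(extractor, fake_provider_call)
```

The real call is `config.tracker.track_metrics_of(extractor, fn)` in Python or `tracker.trackMetricsOf(extractor, fn)` in Node; the division of labor is the same.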

Workflow

1. Explore the existing call site

Before picking a tier, find the provider call and answer these questions:
  • Shape? Is it a chat loop (history + turn-based), a one-shot completion, an agent step, or something else? → drives Tier 1 vs 2.
  • Framework? Raw provider SDK? LangChain / LangGraph? Vercel AI SDK? CrewAI? → drives which Tier-2 provider package (if any) applies.
  • Provider? OpenAI, Anthropic, Bedrock, Gemini, Azure, custom HTTP? → cross-reference with the package availability matrix below.
  • Streaming? If yes, you'll need TTFT tracking, which means Tier 4 for the TTFT part even if the rest is Tier 2.
  • Language? Python or Node? Provider-package coverage differs between them.
  • Already using an AI Config? If not, route to `aiconfig-create` first — tracking requires a tracker, which comes from `completion_config()` / `completionConfig()` / `initChat()`.

2. Look up your Tier-2 option

Use this matrix to decide whether Tier 2 (provider package) is available for your situation. If it's not, drop to Tier 3 (custom extractor). If the shape is chat-loop, go to Tier 1 first regardless of what's in this matrix.
| Framework / provider | Python provider package | Node provider package | Reference |
|---|---|---|---|
| OpenAI (direct SDK) | `launchdarkly-server-sdk-ai-openai` | `@launchdarkly/server-sdk-ai-openai` | openai-tracking.md |
| LangChain / LangGraph | `launchdarkly-server-sdk-ai-langchain` | `@launchdarkly/server-sdk-ai-langchain` | (use the LangChain provider docs) |
| Vercel AI SDK | — | `@launchdarkly/server-sdk-ai-vercel` | (use the Vercel provider docs) |
| AWS Bedrock (Converse or InvokeModel) | — (use LangChain-aws or custom extractor) | — (use LangChain-aws or custom extractor) | bedrock-tracking.md |
| Anthropic direct SDK | — | — | anthropic-tracking.md |
| Gemini / Google GenAI | — | — | Tier 3 custom extractor |
| Cohere, Mistral, custom HTTP | — | — | Tier 3 custom extractor |
| Any provider, streaming + TTFT | — (Tier 4 only) | `trackStreamMetricsOf` (no TTFT) + manual TTFT | streaming-tracking.md |
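For a Tier-3 case like a direct Anthropic call, the extractor is a small pure function. Anthropic's Messages API reports `usage.input_tokens` / `usage.output_tokens`; the return value below is a plain dict standing in for `LDAIMetrics`, whose exact shape varies by SDK version, so check your SDK before copying.

```python
from types import SimpleNamespace

def anthropic_metrics_extractor(response):
    # Map an Anthropic Messages API response to a tokens + success mapping.
    # NOTE: a plain dict stands in for LDAIMetrics here; adapt the return
    # type to the LDAIMetrics constructor in your SDK version.
    usage = getattr(response, "usage", None)
    input_tokens = getattr(usage, "input_tokens", 0) if usage else 0
    output_tokens = getattr(usage, "output_tokens", 0) if usage else 0
    return {
        "success": True,
        "tokens": {
            "input": input_tokens,
            "output": output_tokens,
            "total": input_tokens + output_tokens,
        },
    }


# Quick check against a stubbed response shape (not a real API call):
stub = SimpleNamespace(usage=SimpleNamespace(input_tokens=42, output_tokens=7))
metrics = anthropic_metrics_extractor(stub)
```

You would pass this as the extractor argument to `track_metrics_of` / `trackMetricsOf`, wrapping the unmodified Anthropic client call.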

3. Implement from the matching reference

Once you know the tier and the provider, open the reference file and follow the pattern. The references are written so Tier 1 is always the first example, Tier 2/3 next, and Tier 4 last. Stop at the first tier that matches the app's shape.
Guardrails that apply to every tier:
  1. Always check `config.enabled` before making the tracked call. A disabled config means the user has flagged the feature off — you should short-circuit to whatever fallback the app uses (cached response, error, degraded path) rather than making the provider call at all.
  2. Wrap the existing call, don't rewrite it. Tier 2 and Tier 3 are designed to slot around an unmodified provider call. If you find yourself rewriting the call to fit the tracker, you're at the wrong tier — drop down one.
  3. Errors go through the tracker too. `trackMetricsOf` handles the success path; errors still need an explicit `tracker.trackError()` in the catch block (or a try/except around the whole thing). Tier 1 handles both paths automatically.
  4. Flush in short-lived processes. In serverless, cron jobs, CLI scripts — anything that exits quickly — call `ldClient.flush()` (sync or await) before the process terminates, or the tracker events never leave the machine.

4. Verify

Confirm the Monitoring tab fills in:
  • Run one real request through the instrumented path.
  • Open the AI Config in LaunchDarkly → Monitoring tab. Duration, token counts, and generation counts should appear within 1–2 minutes.
  • Force an error (bad API key, zero `max_tokens`, whatever) and confirm the error count increments.
  • If streaming: verify TTFT appears. If it doesn't, you probably wrapped the stream creation with `trackMetricsOf` but didn't add the manual `trackTimeToFirstToken` call — see streaming-tracking.md.

Quick reference: tracker methods

The tracker object (`config.tracker` / `aiConfig.tracker`) provides these methods. This is the raw API surface — most of the time you should not call the individual methods, you should use `trackMetricsOf` or a Tier-1 managed runner. The list is here so you can recognize the methods in existing code and reach for the right one when you genuinely need Tier 4.
| Method (Python ↔ Node) | Tier | What it does |
|---|---|---|
| `track_metrics_of(extractor, fn)` / `trackMetricsOf(extractor, fn)` | 2 / 3 | Wraps a provider call, captures duration + success/error, calls your extractor for tokens. This is the default generic tracker. |
| `track_metrics_of_async(extractor, fn)` (Python) | 2 / 3 | Async variant of the above. |
| `trackStreamMetricsOf(extractor, streamFn)` (Node only) | 2 / 3 | Streaming variant. Captures per-chunk usage when the extractor handles chunks. Does not auto-capture TTFT. |
| `track_duration(ms)` / `trackDuration(ms)` | 4 | Record latency in milliseconds. |
| `track_duration_of(fn)` / `trackDurationOf(fn)` | 4 | Wraps a callable and records duration automatically. Does not capture tokens or success — pair with explicit calls. |
| `track_tokens(TokenUsage)` / `trackTokens({input, output, total})` | 4 | Record token usage. |
| `track_time_to_first_token(ms)` / `trackTimeToFirstToken(ms)` | 4 | Record TTFT for streaming responses. |
| `track_success()` / `trackSuccess()` | 4 | Mark the generation as successful. Required for the Monitoring tab to count it. |
| `track_error()` / `trackError()` | 4 | Mark the generation as failed. Do not also call `trackSuccess()` in the same request. |
| `track_feedback({kind})` / `trackFeedback({kind})` | any | Record thumbs-up / thumbs-down from a feedback UI. Independent of the success/error path. |
| `track_openai_metrics(fn)` / `trackOpenAIMetrics(fn)` | legacy | Predates provider packages. Still works; do not use in new code. Replace with `trackMetricsOf(OpenAIProvider.getAIMetricsFromResponse, fn)`. |
| `track_bedrock_converse_metrics(res)` / `trackBedrockConverseMetrics(res)` | legacy | Same story. Do not use in new code. |
| `trackVercelAISDKGenerateTextMetrics(fn)` (Node) | legacy | Same story. Use `trackMetricsOf` with the Vercel provider package's extractor. |

Related skills

  • `aiconfig-create` — prerequisite if the app doesn't have an AI Config yet
  • `aiconfig-custom-metrics` — business metrics (conversion, resolution, retention) layered on top of the AI metrics this skill captures
  • `aiconfig-online-evals` — automatic quality scoring (LLM-as-judge) on sampled live requests; complementary to the metrics here
  • `aiconfig-migrate` — Stage 4 of the hardcoded-to-AI-Configs migration delegates to this skill