MLflow Tracing Instrumentation Guide

Language-Specific Guides

Based on the user's project, load the appropriate guide:

Python projects: Read `references/python.md`
TypeScript/JavaScript projects: Read `references/typescript.md`

If unclear, check for `package.json` (TypeScript) or `requirements.txt` / `pyproject.toml` (Python) in the project.
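
The check above can be sketched as a small helper. This is only an illustration: the function name `pick_guide` and the choice to test `package.json` first are assumptions, and a project containing both kinds of marker files still needs a human decision.

```python
from pathlib import Path
from typing import Optional


def pick_guide(project_root: str) -> Optional[str]:
    """Return the guide path suggested by marker files, or None if unclear.

    Hypothetical helper: checks only the marker files named in this guide.
    """
    root = Path(project_root)
    # package.json suggests a TypeScript/JavaScript project.
    if (root / "package.json").is_file():
        return "references/typescript.md"
    # requirements.txt or pyproject.toml suggests a Python project.
    if any((root / name).is_file() for name in ("requirements.txt", "pyproject.toml")):
        return "references/python.md"
    # No marker found: ask the user instead of guessing.
    return None
```

Calling `pick_guide(".")` from the project root returns one of the two guide paths, or `None` when neither marker file is present.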

What to Trace

Trace these operations (high debugging/observability value):

| Operation Type | Examples | Why Trace |
| --- | --- | --- |
| Root operations | Main entry points, top-level pipelines, workflow steps | End-to-end latency, input/output logging |
| LLM calls | Chat completions, embeddings | Token usage, latency, prompt/response inspection |
| Retrieval | Vector DB queries, document fetches, search | Relevance debugging, retrieval quality |
| Tool/function calls | API calls, database queries, web search | External dependency monitoring, error tracking |
| Agent decisions | Routing, planning, tool selection | Understand agent reasoning and choices |
| External services | HTTP APIs, file I/O, message queues | Dependency failures, timeout tracking |

Skip tracing these (too granular, adds noise):

Simple data transformations (dict/list manipulation)
String formatting, parsing, validation
Configuration loading, environment setup
Logging or metric emission
Pure utility functions (math, sorting, filtering)

Rule of thumb: Trace operations that are important for debugging and identifying issues in your application.

Feedback Collection

Log user feedback on traces for evaluation, debugging, and fine-tuning. This feedback is essential for identifying quality issues in production.

See `references/feedback-collection.md` for:

Recording user ratings and comments with `mlflow.log_feedback()`
Capturing trace IDs to return to clients
LLM-as-judge automated evaluation

Reference Documentation

Production Deployment

See `references/production.md` for:

Environment variable configuration
Async logging for low-latency applications
Sampling configuration (`MLFLOW_TRACE_SAMPLING_RATIO`)
Lightweight SDK (`mlflow-tracing`)
Docker/Kubernetes deployment
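
As one illustration of the sampling item, the sketch below sets `MLFLOW_TRACE_SAMPLING_RATIO` from Python at process startup. It assumes the variable takes a fraction between 0 and 1 and that it must be set before tracing begins; the helper name `configure_sampling` is hypothetical, and in a container the same variable would more typically be set in the deployment manifest.

```python
import os


def configure_sampling(ratio: float) -> str:
    """Set the trace sampling ratio for this process (hypothetical helper).

    Assumes the ratio is a fraction in [0, 1], where 1 keeps every trace.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError(f"sampling ratio must be between 0 and 1, got {ratio}")
    os.environ["MLFLOW_TRACE_SAMPLING_RATIO"] = str(ratio)
    return os.environ["MLFLOW_TRACE_SAMPLING_RATIO"]


# Example: keep roughly one trace in ten. Call this before any traced code runs.
configure_sampling(0.1)
```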

Advanced Patterns

See `references/advanced-patterns.md` for:

Async function tracing
Multi-threading with context propagation
PII redaction with span processors
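
To illustrate what context propagation across threads means, here is a standard-library sketch using `contextvars`. The `current_span` variable is a hypothetical stand-in for whatever context a tracing library keeps internally; the example uses no MLflow API and only shows the general mechanism of copying the caller's context into worker threads.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a tracing library's "active span" context.
current_span = contextvars.ContextVar("current_span", default=None)


def worker(item: str) -> str:
    # Reads whatever context the task was started with.
    return f"{item} ran under {current_span.get()}"


def run_with_propagation(items: list) -> list:
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Take a fresh copy of the caller's context for each task, because
        # a single Context object cannot be entered by two threads at once.
        futures = [
            pool.submit(contextvars.copy_context().run, worker, item)
            for item in items
        ]
        return [future.result() for future in futures]


current_span.set("root-span")
results = run_with_propagation(["a", "b"])
```

Submitting `worker` directly, without `copy_context().run`, would leave the worker threads without the caller's value of `current_span`, which is how child work ends up detached from its parent trace.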

Distributed Tracing

See `references/distributed-tracing.md` for:

Propagating trace context across services
Client/server header APIs
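
For intuition about propagating trace context over HTTP, the sketch below builds and parses a W3C Trace Context `traceparent` header with the standard library. This is a generic illustration and not MLflow's header API: the helper names `inject` and `extract` are hypothetical, and whether MLflow uses exactly this header should be confirmed in the reference guide.

```python
import re
import secrets
from typing import Optional

# traceparent layout: version "00", 32-hex trace ID, 16-hex parent span ID, 2-hex flags.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject(headers: dict, trace_id: str, span_id: str, sampled: bool = True) -> dict:
    """Client side: attach the current trace context to outgoing headers."""
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"
    return headers


def extract(headers: dict) -> Optional[dict]:
    """Server side: recover the caller's trace context, or None if absent or malformed."""
    match = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, parent_span_id, flags = match.groups()
    return {
        "trace_id": trace_id,
        "parent_span_id": parent_span_id,
        "sampled": bool(int(flags, 16) & 1),
    }


# Example round trip with randomly generated IDs.
trace_id = secrets.token_hex(16)  # 32 hex characters
span_id = secrets.token_hex(8)    # 16 hex characters
context = extract(inject({}, trace_id, span_id))
```

The server would start its own spans with `context["trace_id"]` and `context["parent_span_id"]` so that work in both services appears in a single trace.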
