instrumenting-with-mlflow-tracing
MLflow Tracing Instrumentation Guide
Language-Specific Guides
Based on the user's project, load the appropriate guide:
- Python projects: read `references/python.md`
- TypeScript/JavaScript projects: read `references/typescript.md`
If unclear, check the project for `package.json` (TypeScript) or `requirements.txt`/`pyproject.toml` (Python).
What to Trace
Trace these operations (high debugging/observability value):
| Operation Type | Examples | Why Trace |
|---|---|---|
| Root operations | Main entry points, top-level pipelines, workflow steps | End-to-end latency, input/output logging |
| LLM calls | Chat completions, embeddings | Token usage, latency, prompt/response inspection |
| Retrieval | Vector DB queries, document fetches, search | Relevance debugging, retrieval quality |
| Tool/function calls | API calls, database queries, web search | External dependency monitoring, error tracking |
| Agent decisions | Routing, planning, tool selection | Understand agent reasoning and choices |
| External services | HTTP APIs, file I/O, message queues | Dependency failures, timeout tracking |
Skip tracing these (too granular, adds noise):
- Simple data transformations (dict/list manipulation)
- String formatting, parsing, validation
- Configuration loading, environment setup
- Logging or metric emission
- Pure utility functions (math, sorting, filtering)
Rule of thumb: trace an operation if, when a request goes wrong, you would want to inspect its inputs, outputs, latency, or errors.
Feedback Collection
Log user feedback on traces for evaluation, debugging, and fine-tuning. Feedback is essential for identifying quality issues in production.
See `references/feedback-collection.md` for:
- Recording user ratings and comments with `mlflow.log_feedback()`
- Capturing trace IDs to return to clients
- LLM-as-judge automated evaluation
Reference Documentation
Production Deployment
See `references/production.md` for:
- Environment variable configuration
- Async logging for low-latency applications
- Sampling configuration (`MLFLOW_TRACE_SAMPLING_RATIO`)
- Lightweight SDK (`mlflow-tracing`)
- Docker/Kubernetes deployment
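A minimal sketch of the environment-variable side of a production setup, in plain Python. `MLFLOW_TRACKING_URI` and `MLFLOW_EXPERIMENT_NAME` are standard MLflow variables and `MLFLOW_TRACE_SAMPLING_RATIO` is named in the list above; `MLFLOW_ENABLE_ASYNC_TRACE_LOGGING` is assumed to be the async-logging switch, and `configure_tracing_env` is a hypothetical helper. Check `references/production.md` for the authoritative names and defaults.

```python
import os


def configure_tracing_env(
    tracking_uri: str,
    experiment_name: str,
    sampling_ratio: float = 1.0,
    async_logging: bool = True,
) -> dict[str, str]:
    """Build and apply environment variables for a production tracing setup."""
    if not 0.0 <= sampling_ratio <= 1.0:
        raise ValueError("sampling_ratio must be between 0.0 and 1.0")
    env = {
        "MLFLOW_TRACKING_URI": tracking_uri,
        "MLFLOW_EXPERIMENT_NAME": experiment_name,
        # Fraction of traces to keep, e.g. 0.1 keeps roughly 1 in 10 requests.
        "MLFLOW_TRACE_SAMPLING_RATIO": str(sampling_ratio),
        # Assumed variable name for asynchronous trace export.
        "MLFLOW_ENABLE_ASYNC_TRACE_LOGGING": "true" if async_logging else "false",
    }
    # Safest to apply these before importing mlflow, since some settings
    # may be read when the tracer is first initialized.
    os.environ.update(env)
    return env
```

In Docker or Kubernetes the same key/value pairs would normally go in the image's `ENV` lines or the pod spec's `env:` section rather than being set from application code.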
Advanced Patterns
See `references/advanced-patterns.md` for:
- Async function tracing
- Multi-threading with context propagation
- PII redaction with span processors
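MLflow Tracing builds on OpenTelemetry, which in Python keeps the active span in `contextvars`, so a thread-pool worker does not automatically see the caller's span. The stdlib-only sketch below uses a hypothetical `current_span` variable as a stand-in for the tracer's active span to show the propagation pattern: copy the caller's context once per submitted task and run the worker inside it.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

# Hypothetical stand-in for the "active span" a tracing SDK keeps in a context variable.
current_span = contextvars.ContextVar("current_span", default=None)


def worker(item: int) -> tuple[int, Optional[str]]:
    # Report which parent span this worker thread can see.
    return item, current_span.get()


def run_parallel(items: list[int], propagate: bool = True) -> list[tuple[int, Optional[str]]]:
    current_span.set("root")  # pretend a root span is active in the caller
    with ThreadPoolExecutor(max_workers=4) as executor:
        if propagate:
            # A fresh copy per task: each worker runs inside the caller's context.
            futures = [executor.submit(contextvars.copy_context().run, worker, i) for i in items]
        else:
            # Without a copied context, workers typically see the default (None).
            futures = [executor.submit(worker, i) for i in items]
        return [f.result() for f in futures]
```

A fresh `copy_context()` per task matters because a single `Context` object cannot be entered by two threads at once; reusing one copy across concurrent tasks raises `RuntimeError`.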
Distributed Tracing
See `references/distributed-tracing.md` for:
- Propagating trace context across services
- Client/server header APIs
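The header APIs themselves are covered in `references/distributed-tracing.md`. The sketch below only illustrates what crosses the wire, assuming propagation follows the W3C Trace Context `traceparent` format used by OpenTelemetry (`00-<32-hex trace ID>-<16-hex parent span ID>-<2-hex flags>`). `make_traceparent`, `parse_traceparent`, and `new_root_ids` are hypothetical helpers, not MLflow functions.

```python
import re
import secrets

# W3C Trace Context "traceparent": version-traceid-parentid-flags (lowercase hex).
_TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_root_ids() -> tuple[str, str]:
    """Generate a random 16-byte trace ID and 8-byte span ID as hex strings."""
    return secrets.token_hex(16), secrets.token_hex(8)


def make_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Client side: build the header value identifying the caller's span."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(value: str) -> dict:
    """Server side: recover the trace ID and parent span ID from the header."""
    match = _TRACEPARENT_RE.match(value.strip())
    if not match:
        raise ValueError(f"invalid traceparent: {value!r}")
    trace_id, parent_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        raise ValueError("all-zero trace or span IDs are invalid")
    return {
        "trace_id": trace_id,
        "parent_span_id": parent_id,
        "sampled": (int(flags, 16) & 1) == 1,
    }
```

A client would send `{"traceparent": make_traceparent(*new_root_ids())}` with its HTTP request, and the server would parse it so its spans join the caller's trace instead of starting a new one.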