instrumenting-with-mlflow-tracing

MLflow Tracing Instrumentation Guide

Language-Specific Guides

Based on the user's project, load the appropriate guide:
  • Python projects: read references/python.md
  • TypeScript/JavaScript projects: read references/typescript.md
If unclear, check whether the project contains package.json (TypeScript) or requirements.txt / pyproject.toml (Python).
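The file check above can be scripted. The helper below is a hypothetical sketch: `guides_for_files` and `pick_tracing_guides` are illustrative names, not MLflow APIs. It assumes the marker files sit in the project's top-level directory. When a repo contains both kinds of marker, for example a Python backend with a JavaScript frontend, it returns both guides.

```python
from pathlib import Path
from typing import Iterable, List

# Marker files named in this guide, mapped to the reference each one points to.
GUIDE_MARKERS = {
    "references/python.md": ("requirements.txt", "pyproject.toml"),
    "references/typescript.md": ("package.json",),
}


def guides_for_files(filenames: Iterable[str]) -> List[str]:
    """Return every guide whose marker file appears in `filenames`."""
    present = set(filenames)
    return [
        guide
        for guide, markers in GUIDE_MARKERS.items()
        if any(marker in present for marker in markers)
    ]


def pick_tracing_guides(project_root: str = ".") -> List[str]:
    """Inspect only the top level of `project_root` (an assumption; monorepos may differ)."""
    root = Path(project_root)
    names = [p.name for p in root.iterdir() if p.is_file()]
    return guides_for_files(names)


if __name__ == "__main__":
    print(pick_tracing_guides("."))
```

An empty result means neither marker was found, and you should ask the user which language the project uses.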

What to Trace

Trace these operations (high debugging/observability value):

| Operation type | Examples | Why trace |
|---|---|---|
| Root operations | Main entry points, top-level pipelines, workflow steps | End-to-end latency, input/output logging |
| LLM calls | Chat completions, embeddings | Token usage, latency, prompt/response inspection |
| Retrieval | Vector DB queries, document fetches, search | Relevance debugging, retrieval quality |
| Tool/function calls | API calls, database queries, web search | External dependency monitoring, error tracking |
| Agent decisions | Routing, planning, tool selection | Understanding agent reasoning and choices |
| External services | HTTP APIs, file I/O, message queues | Dependency failures, timeout tracking |

Skip tracing these (too granular, adds noise):
  • Simple data transformations (dict/list manipulation)
  • String formatting, parsing, validation
  • Configuration loading, environment setup
  • Logging or metric emission
  • Pure utility functions (math, sorting, filtering)
Rule of thumb: trace an operation if its inputs, outputs, latency, or errors would help you debug it or identify issues in your application.

Feedback Collection

Log user feedback on traces for evaluation, debugging, and fine-tuning. Feedback is essential for identifying quality issues in production.
See references/feedback-collection.md for:
  • Recording user ratings and comments with mlflow.log_feedback()
  • Capturing trace IDs to return to clients
  • LLM-as-judge automated evaluation

Reference Documentation

Production Deployment

See references/production.md for:
  • Environment variable configuration
  • Async logging for low-latency applications
  • Sampling configuration (MLFLOW_TRACE_SAMPLING_RATIO)
  • Lightweight SDK (mlflow-tracing)
  • Docker/Kubernetes deployment
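In Docker or Kubernetes these variables usually come from the container environment. For local runs, the hypothetical helper below sets them in-process. It assumes `MLFLOW_TRACE_SAMPLING_RATIO` takes a fraction between 0.0 and 1.0 of traces to record. It also assumes that setting the variables at startup, before any traced code runs, is early enough for MLflow to pick them up. `MLFLOW_TRACKING_URI` and `MLFLOW_EXPERIMENT_NAME` are standard MLflow variables. The URI and experiment name in the usage line are placeholders.

```python
import os


def configure_tracing_env(tracking_uri: str, experiment_name: str, sampling_ratio: float = 1.0) -> None:
    """Hypothetical helper: set MLflow tracing env vars once, at process startup."""
    if not 0.0 <= sampling_ratio <= 1.0:
        raise ValueError("sampling_ratio must be between 0.0 and 1.0")
    os.environ["MLFLOW_TRACKING_URI"] = tracking_uri
    os.environ["MLFLOW_EXPERIMENT_NAME"] = experiment_name
    # Assumed semantics: the fraction of traces to keep (0.1 keeps roughly 1 in 10).
    os.environ["MLFLOW_TRACE_SAMPLING_RATIO"] = str(sampling_ratio)


if __name__ == "__main__":
    configure_tracing_env("http://localhost:5000", "my-genai-app", sampling_ratio=0.1)
```

Validating the ratio up front turns a typo such as `10` instead of `0.1` into an immediate startup error rather than a silently misconfigured deployment.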

Advanced Patterns

See references/advanced-patterns.md for:
  • Async function tracing
  • Multi-threading with context propagation
  • PII redaction with span processors

Distributed Tracing

See references/distributed-tracing.md for:
  • Propagating trace context across services
  • Client/server header APIs
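MLflow's own client/server header helpers are documented in references/distributed-tracing.md and are not reproduced here. As a hedged illustration of what such a header carries, the sketch below builds and parses a W3C Trace Context `traceparent` header, the format OpenTelemetry propagators use. Whether MLflow's helpers emit exactly this header is an assumption to check against that reference. `make_traceparent` and `parse_traceparent` are hypothetical names.

```python
import re
import secrets
from typing import Optional, Tuple

# W3C Trace Context: "version-traceid-parentid-flags", all lowercase hex.
_TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def make_traceparent(trace_id: str, parent_span_id: str, sampled: bool = True) -> str:
    """Client side: build the header value to send with an outgoing request."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{parent_span_id}-{flags}"


def parse_traceparent(header: str) -> Optional[Tuple[str, str, bool]]:
    """Server side: return (trace_id, parent_span_id, sampled), or None if invalid."""
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # This sketch only accepts version 00; the spec also rejects all-zero IDs.
    if version != "00" or set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return trace_id, span_id, bool(int(flags, 16) & 0x01)


if __name__ == "__main__":
    header = make_traceparent(secrets.token_hex(16), secrets.token_hex(8))
    print(header, parse_traceparent(header))
```

In practice the client sends this value as the `traceparent` request header. The server parses it so that its spans join the caller's trace instead of starting a new, disconnected one.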