# MLflow Metrics

Run `scripts/fetch_metrics.py` to query metrics from an MLflow tracking server.

## Examples

Token usage summary:

```bash
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -m total_tokens -a SUM,AVG
```

Output:

```
AVG: 223.91  SUM: 7613
```

Hourly token trend (last 24h):

```bash
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -m total_tokens -a SUM \
    -t 3600 --start-time="-24h" --end-time=now
```

Output: time-bucketed token sums per hour.
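Conceptually, a `-t 3600` query floors each timestamp to its hour boundary and sums within each bucket. A minimal sketch of that bucketing (an illustration only, not necessarily how the script computes it):

```python
def bucket_start(epoch_ms: int, interval_s: int) -> int:
    """Floor a millisecond timestamp to the start of its time bucket."""
    bucket_ms = interval_s * 1000
    return (epoch_ms // bucket_ms) * bucket_ms

# Timestamps within the same hour map to the same hourly bucket:
bucket_start(7_800_000, 3600)  # 7_200_000 (02:00)
bucket_start(7_199_999, 3600)  # 3_600_000 (01:00)
```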
Latency percentiles by trace:

```bash
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -m latency -a AVG,P95 -d trace_name
```
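For reference, `P95` here is a percentile over the raw latency values. One common definition, nearest-rank, can be sketched as follows (an illustration of the concept, not necessarily the server's exact method):

```python
import math

def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest value with at least p% of data at or below it."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank into the sorted list
    return ordered[max(rank, 1) - 1]

latencies = [120.0, 80.0, 200.0, 95.0, 150.0]
percentile(latencies, 50)   # 120.0 (median under nearest-rank)
percentile(latencies, 95)   # 200.0
```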
Error rate by status:

```bash
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -m trace_count -a COUNT -d trace_status
```

Quality scores by evaluator (assessments):

```bash
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -v ASSESSMENTS \
    -m assessment_value -a AVG,P50 -d assessment_name
```

Output: average and median scores for each evaluator (e.g., correctness, relevance).

Assessment count by name:

```bash
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -v ASSESSMENTS \
    -m assessment_count -a COUNT -d assessment_name
```

JSON output: add `-o json` to any command.
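With `-o json`, results can be consumed programmatically. The snippet below computes an error rate from a COUNT-by-`trace_status` query; the JSON payload shape shown is a hypothetical example (inspect your script's actual output before relying on it):

```python
import json

# Hypothetical -o json payload for: -m trace_count -a COUNT -d trace_status
payload = '''
[
  {"trace_status": "OK", "COUNT": 42},
  {"trace_status": "ERROR", "COUNT": 3}
]
'''

def error_rate(rows: list[dict]) -> float:
    """Fraction of traces whose status is ERROR."""
    total = sum(r["COUNT"] for r in rows)
    errors = sum(r["COUNT"] for r in rows if r["trace_status"] == "ERROR")
    return errors / total if total else 0.0

rows = json.loads(payload)
error_rate(rows)  # 3 / 45
```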

## Arguments

| Arg | Required | Description |
|---|---|---|
| `-s, --server` | Yes | MLflow server URL |
| `-x, --experiment-ids` | Yes | Experiment IDs (comma-separated) |
| `-m, --metric` | Yes | One of: `trace_count`, `latency`, `input_tokens`, `output_tokens`, `total_tokens` |
| `-a, --aggregations` | Yes | One or more of: `COUNT`, `SUM`, `AVG`, `MIN`, `MAX`, `P50`, `P95`, `P99` |
| `-d, --dimensions` | No | Group by: `trace_name`, `trace_status` |
| `-t, --time-interval` | No | Bucket size in seconds (3600 = hourly, 86400 = daily) |
| `--start-time` | No | `-24h`, `-7d`, `now`, ISO 8601, or epoch ms |
| `--end-time` | No | Same formats as `--start-time` |
| `-o, --output` | No | `table` (default) or `json` |

For SPANS metrics (`span_count`, `latency`), add `-v SPANS`. For ASSESSMENTS metrics, add `-v ASSESSMENTS`.
See `references/api_reference.md` for filter syntax and full API details.