# MLflow Metrics

Run `scripts/fetch_metrics.py` to query metrics from an MLflow tracking server.

## Examples

Token usage summary:

```bash
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -m total_tokens -a SUM,AVG
```

Output:

```
AVG: 223.91  SUM: 7613
```

Hourly token trend (last 24h):

```bash
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -m total_tokens -a SUM \
    -t 3600 --start-time="-24h" --end-time=now
```

Output: time-bucketed token sums per hour.
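Conceptually, a `-t 3600` query floors each timestamp to its hour boundary and sums within each bucket. A minimal sketch of that bucketing (an illustration only, not necessarily how the script computes it):

```python
def bucket_start(epoch_ms: int, interval_s: int) -> int:
    """Floor a millisecond timestamp to the start of its time bucket."""
    bucket_ms = interval_s * 1000
    return (epoch_ms // bucket_ms) * bucket_ms

# Timestamps within the same hour map to the same hourly bucket:
bucket_start(7_800_000, 3600)  # 7_200_000 (02:00)
bucket_start(7_199_999, 3600)  # 3_600_000 (01:00)
```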
Latency percentiles by trace:

```bash
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -m latency -a AVG,P95 -d trace_name
```
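For reference, `P95` here is a percentile over the raw latency values. One common definition, nearest-rank, can be sketched as follows (an illustration of the concept, not necessarily the server's exact method):

```python
import math

def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest value with at least p% of data at or below it."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank into the sorted list
    return ordered[max(rank, 1) - 1]

latencies = [120.0, 80.0, 200.0, 95.0, 150.0]
percentile(latencies, 50)   # 120.0 (median under nearest-rank)
percentile(latencies, 95)   # 200.0
```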
Error rate by status:

```bash
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -m trace_count -a COUNT -d trace_status
```

Quality scores by evaluator (assessments):

```bash
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -v ASSESSMENTS \
    -m assessment_value -a AVG,P50 -d assessment_name
```

Output: average and median scores for each evaluator (e.g., correctness, relevance).

Assessment count by name:

```bash
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -v ASSESSMENTS \
    -m assessment_count -a COUNT -d assessment_name
```

JSON output: add `-o json` to any command.
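With `-o json`, results can be consumed programmatically. The snippet below computes an error rate from a COUNT-by-`trace_status` query; the JSON payload shape shown is a hypothetical example (inspect your script's actual output before relying on it):

```python
import json

# Hypothetical -o json payload for: -m trace_count -a COUNT -d trace_status
payload = '''
[
  {"trace_status": "OK", "COUNT": 42},
  {"trace_status": "ERROR", "COUNT": 3}
]
'''

def error_rate(rows: list[dict]) -> float:
    """Fraction of traces whose status is ERROR."""
    total = sum(r["COUNT"] for r in rows)
    errors = sum(r["COUNT"] for r in rows if r["trace_status"] == "ERROR")
    return errors / total if total else 0.0

rows = json.loads(payload)
error_rate(rows)  # 3 / 45
```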

## Arguments

| Arg | Required | Description |
|---|---|---|
| `-s, --server` | Yes | MLflow server URL |
| `-x, --experiment-ids` | Yes | Experiment IDs (comma-separated) |
| `-m, --metric` | Yes | One of: `trace_count`, `latency`, `input_tokens`, `output_tokens`, `total_tokens` |
| `-a, --aggregations` | Yes | One or more of: `COUNT`, `SUM`, `AVG`, `MIN`, `MAX`, `P50`, `P95`, `P99` |
| `-d, --dimensions` | No | Group by: `trace_name`, `trace_status` |
| `-t, --time-interval` | No | Bucket size in seconds (3600 = hourly, 86400 = daily) |
| `--start-time` | No | `-24h`, `-7d`, `now`, ISO 8601, or epoch ms |
| `--end-time` | No | Same formats as `--start-time` |
| `-o, --output` | No | `table` (default) or `json` |

For SPANS metrics (`span_count`, `latency`), add `-v SPANS`. For ASSESSMENTS metrics, add `-v ASSESSMENTS`.
See `references/api_reference.md` for filter syntax and full API details.