nemo-evaluator-plugin

Evaluator Plugin

Use this skill when the task is about Evaluator functionality on the plugin architecture. The plugin-backed CLI surface is `nemo evaluator`; the legacy generated `nemo evaluation` API command group is not the target surface for new guidance.


Current Surfaces


- a minimal `nemo.services` health surface
- an SDK-backed `nemo.jobs` entry, `evaluator.evaluate`, for inline metric execution
- a minimal CLI and SDK namespace
- plugin-owned docs and skills directories


CLI Commands


Prerequisite: activate the Python virtual environment before invoking the `nemo` CLI: `source .venv/bin/activate`.

Check plugin status from the CLI:

```bash
nemo evaluator info
```

Inspect the registered job contract:

```bash
nemo evaluator evaluate explain
```

Run an inline `exact-match` metric:

```bash
nemo evaluator evaluate run --spec '{"metric":{"type":"exact-match","reference":"{{item.expected}}","candidate":"{{item.model_output}}"},"dataset":[{"expected":"blue","model_output":"Blue"},{"expected":"Jupiter","model_output":"Saturn"}],"params":{"parallelism":2}}'
```

Run an inline `string-check` metric:

```bash
nemo evaluator evaluate run --spec '{"metric":{"type":"string-check","operation":"contains","left_template":"{{item.answer}}","right_template":"NeMo"},"dataset":[{"answer":"NeMo Platform supports evaluator plugins."}]}'
```
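Hand-quoting JSON inside a shell string is easy to get wrong. As a minimal sketch, the same `--spec` argument can be built from a Python dict. It only constructs and prints the command; it does not invoke `nemo`. The variable names (`spec`, `argv`, `command_line`) are illustrative and not part of the plugin.

```python
import json
import shlex

# Spec copied from the string-check example above.
spec = {
    "metric": {
        "type": "string-check",
        "operation": "contains",
        "left_template": "{{item.answer}}",
        "right_template": "NeMo",
    },
    "dataset": [{"answer": "NeMo Platform supports evaluator plugins."}],
}

# Argument vector for the documented `run --spec` form.
# json.dumps takes care of JSON escaping inside the spec.
argv = ["nemo", "evaluator", "evaluate", "run", "--spec", json.dumps(spec)]

# shlex.join adds POSIX shell quoting, so the printed line can be pasted into a shell.
command_line = shlex.join(argv)
print(command_line)
```

In an activated environment, `argv` could be passed to `subprocess.run(argv, check=True)` directly, which bypasses shell quoting altogether.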

For non-trivial specs, prefer `--spec-file` over inline shell JSON:

```bash
nemo evaluator evaluate run --spec-file evaluation-spec.json
```
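One way to produce `evaluation-spec.json` is to serialize a Python dict. The sketch below reuses the exact-match spec from the inline example and reads the file back to confirm it parses. Writing to the current working directory is an assumption about where the CLI will be run.

```python
import json
from pathlib import Path

# Spec copied from the exact-match example above.
spec = {
    "metric": {
        "type": "exact-match",
        "reference": "{{item.expected}}",
        "candidate": "{{item.model_output}}",
    },
    "dataset": [
        {"expected": "blue", "model_output": "Blue"},
        {"expected": "Jupiter", "model_output": "Saturn"},
    ],
    "params": {"parallelism": 2},
}

# File name matches the one passed to --spec-file in the command above.
spec_path = Path("evaluation-spec.json")
spec_path.write_text(json.dumps(spec, indent=2) + "\n", encoding="utf-8")

# Read the file back to confirm it is valid JSON before handing it to the CLI.
loaded = json.loads(spec_path.read_text(encoding="utf-8"))
print(f"wrote {spec_path} with {len(loaded['dataset'])} dataset rows")
```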

Submit the same spec to a cluster:

```bash
nemo evaluator evaluate submit \
  --spec-file evaluation-spec.json \
  --workspace default \
  --profile default
```

Use `nemo evaluator evaluate explain` as the source of truth for the current plugin job schema.


Evaluation Specs


The current job accepts inline SDK-backed evaluation specs. At a high level, specs describe:

- `metric`: inline Evaluator SDK metric configuration or benchmark metrics
- `dataset`: inline rows to evaluate
- `params`: optional Evaluator SDK execution parameters
- `target`: optional model or agent target for online evaluation
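A quick local check of a spec's top-level shape can catch typos before a run. The sketch below is a hypothetical helper, not part of the plugin. It assumes `metric` and `dataset` are required because only `params` and `target` are described as optional, and that no other top-level fields exist. The authoritative schema is whatever `nemo evaluator evaluate explain` reports.

```python
from __future__ import annotations

from typing import Any

# Top-level fields named in the list above.
# The required/optional split is an assumption, not the plugin's schema.
REQUIRED_FIELDS = {"metric", "dataset"}
OPTIONAL_FIELDS = {"params", "target"}


def check_spec_shape(spec: dict[str, Any]) -> list[str]:
    """Return problems found in the top-level shape of a spec (hypothetical helper)."""
    problems = []
    present = set(spec)
    for name in sorted(REQUIRED_FIELDS - present):
        problems.append(f"missing field: {name}")
    for name in sorted(present - REQUIRED_FIELDS - OPTIONAL_FIELDS):
        problems.append(f"unexpected field: {name}")
    dataset = spec.get("dataset")
    # Inline rows are expected to be a list of JSON objects.
    if dataset is not None and not (
        isinstance(dataset, list) and all(isinstance(row, dict) for row in dataset)
    ):
        problems.append("dataset should be a list of row objects")
    return problems
```

Calling `check_spec_shape` on a spec that misspells `dataset` as `datasets` would report both a missing field and an unexpected one, which tends to make the typo obvious.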

For LLM-judge setup notes, see LLM Judge Notes.

For evaluator API key auth, see Evaluator API Auth.

For local and cluster troubleshooting, see Evaluation Troubleshooting.

Call the SDK-backed status route through the platform SDK:

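```python
from nemo_platform import NeMoPlatform

# Point the client at the platform API; this example uses a local instance.
client = NeMoPlatform(base_url="http://localhost:8000")

# Query the Evaluator plugin's SDK-backed status route.
status = client.evaluator.plugin_status()
```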


Next Decisions


Before replacing stubs, verify the target surface:

- service route adaptation
- job submission or compilation strategy
- packaging split between service and task dependencies
