nemo-evaluator-plugin

Evaluator Plugin

Use this skill for tasks involving Evaluator functionality on the plugin architecture. The plugin-backed CLI surface is `nemo evaluator`; the legacy generated `nemo evaluation` API command group is not the target surface for new guidance.

Current Surfaces

  • a minimal `nemo.services` health surface
  • an SDK-backed `nemo.jobs` entry, `evaluator.evaluate`, for inline metric execution
  • a minimal CLI and SDK namespace
  • plugin-owned docs and skills directories

CLI Commands

Prerequisite: activate the Python virtual environment with `source .venv/bin/activate` before invoking the `nemo` CLI.
Check plugin status from the CLI:
```bash
nemo evaluator info
```
Inspect the registered job contract:
```bash
nemo evaluator evaluate explain
```
Run an inline `exact-match` metric:
```bash
nemo evaluator evaluate run --spec '{"metric":{"type":"exact-match","reference":"{{item.expected}}","candidate":"{{item.model_output}}"},"dataset":[{"expected":"blue","model_output":"Blue"},{"expected":"Jupiter","model_output":"Saturn"}],"params":{"parallelism":2}}'
```
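Hand-escaping JSON inside single quotes is error-prone. The sketch below builds the same `exact-match` spec as a Python dict and uses only the standard library (`json.dumps`, `shlex.quote`) to produce a shell-safe `--spec` argument. It assumes only the spec fields shown in the command above. `build_run_command` is a hypothetical helper and is not part of the NeMo SDK.

```python
import json
import shlex

# Same exact-match spec as the CLI example above, expressed as a Python dict.
exact_match_spec = {
    "metric": {
        "type": "exact-match",
        "reference": "{{item.expected}}",
        "candidate": "{{item.model_output}}",
    },
    "dataset": [
        {"expected": "blue", "model_output": "Blue"},
        {"expected": "Jupiter", "model_output": "Saturn"},
    ],
    "params": {"parallelism": 2},
}


def build_run_command(spec: dict) -> str:
    """Hypothetical helper: return a shell-safe `nemo evaluator evaluate run` line."""
    # Compact separators reproduce the one-line JSON style used in the docs.
    spec_json = json.dumps(spec, separators=(",", ":"))
    # shlex.quote wraps the JSON in single quotes so the shell passes it through unchanged.
    return "nemo evaluator evaluate run --spec " + shlex.quote(spec_json)


print(build_run_command(exact_match_spec))
```

Because dicts keep insertion order and the separators are compact, the printed line matches the hand-written command above.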
Run an inline `string-check` metric:
```bash
nemo evaluator evaluate run --spec '{"metric":{"type":"string-check","operation":"contains","left_template":"{{item.answer}}","right_template":"NeMo"},"dataset":[{"answer":"NeMo Platform supports evaluator plugins."}]}'
```
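Both metric examples reference dataset fields through `{{item.<field>}}` templates. The sketch below previews what each template resolves to for every row before a run. It assumes plain `{{item.<field>}}` substitution only; the Evaluator SDK's real template engine may support more syntax. `render_template` is a hypothetical helper and is not the SDK's implementation.

```python
import re

# Matches placeholders such as {{item.answer}} or {{ item.answer }} (assumed syntax).
_ITEM_FIELD = re.compile(r"\{\{\s*item\.(\w+)\s*\}\}")


def render_template(template: str, item: dict) -> str:
    """Hypothetical preview: replace each {{item.<field>}} with the row's value."""
    return _ITEM_FIELD.sub(lambda m: str(item[m.group(1)]), template)


string_check_metric = {
    "type": "string-check",
    "operation": "contains",
    "left_template": "{{item.answer}}",
    "right_template": "NeMo",
}
dataset = [{"answer": "NeMo Platform supports evaluator plugins."}]

# Show the resolved left/right values the metric would compare for each row.
for row in dataset:
    left = render_template(string_check_metric["left_template"], row)
    right = render_template(string_check_metric["right_template"], row)
    print(f"left={left!r} right={right!r}")
```

In this sketch, a template that names a missing field raises `KeyError`. That makes typos in field names visible before a spec is submitted.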
For non-trivial specs, prefer `--spec-file` over inline JSON in the shell:
```bash
nemo evaluator evaluate run --spec-file evaluation-spec.json
```
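One way to produce `evaluation-spec.json` is to write it from Python, which guarantees valid JSON without any shell escaping. The sketch below assumes the `string-check` spec fields shown above and uses the file name from the command. `write_spec_file` is a hypothetical helper.

```python
import json
from pathlib import Path

spec = {
    "metric": {
        "type": "string-check",
        "operation": "contains",
        "left_template": "{{item.answer}}",
        "right_template": "NeMo",
    },
    "dataset": [{"answer": "NeMo Platform supports evaluator plugins."}],
}


def write_spec_file(spec: dict, path: str = "evaluation-spec.json") -> Path:
    """Hypothetical helper: serialize the spec as indented JSON for --spec-file."""
    out = Path(path)
    out.write_text(json.dumps(spec, indent=2) + "\n", encoding="utf-8")
    return out


# Writes evaluation-spec.json into the current working directory.
print(f"wrote {write_spec_file(spec)}")
```

The resulting file can then be passed to `nemo evaluator evaluate run --spec-file evaluation-spec.json`.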
Submit the same spec to a cluster:
```bash
nemo evaluator evaluate submit \
  --spec-file evaluation-spec.json \
  --workspace default \
  --profile default
```
Use `nemo evaluator evaluate explain` as the source of truth for the current plugin job schema.
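To drive these commands from a script, you can assemble the argument list and pass it to `subprocess.run`, which sidesteps shell quoting entirely. The sketch below uses only the subcommands and flags shown above: `run`, `submit`, `--spec-file`, `--workspace` and `--profile`. `build_evaluate_argv` is a hypothetical helper. The sketch also assumes the virtual environment is active so that `nemo` is on `PATH`.

```python
import subprocess
from typing import Optional


def build_evaluate_argv(
    action: str,
    spec_file: str,
    workspace: Optional[str] = None,
    profile: Optional[str] = None,
) -> list[str]:
    """Hypothetical helper: build argv for `nemo evaluator evaluate run|submit`."""
    if action not in ("run", "submit"):
        raise ValueError(f"unsupported action: {action!r}")
    argv = ["nemo", "evaluator", "evaluate", action, "--spec-file", spec_file]
    # --workspace and --profile are only shown for submit, so only submit forwards them.
    if action == "submit":
        if workspace is not None:
            argv += ["--workspace", workspace]
        if profile is not None:
            argv += ["--profile", profile]
    return argv


def run_evaluate(argv: list[str]) -> int:
    """Invoke the CLI without a shell and return its exit code."""
    return subprocess.run(argv, check=False).returncode


if __name__ == "__main__":
    argv = build_evaluate_argv("submit", "evaluation-spec.json", "default", "default")
    print(" ".join(argv))
```

Keeping argument construction separate from execution makes the command easy to check without a running cluster.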

Evaluation Specs

The current job accepts inline, SDK-backed evaluation specs. At a high level, a spec describes:
  • `metric`: inline Evaluator SDK metric configuration or benchmark metrics
  • `dataset`: inline rows to evaluate
  • `params`: optional Evaluator SDK execution parameters
  • `target`: optional model or agent target for online evaluation
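A quick local check can catch malformed specs before a run. This hypothetical sketch encodes only the four top-level keys listed above. It treats `metric` and `dataset` as required, because only `params` and `target` are described as optional, and it assumes `dataset` is a list of rows, as in the examples. `nemo evaluator evaluate explain` remains the authoritative schema, and `check_spec_keys` is not part of the SDK.

```python
from typing import Any

# Assumption: metric and dataset are required; params and target are optional.
REQUIRED_KEYS = {"metric", "dataset"}
OPTIONAL_KEYS = {"params", "target"}


def check_spec_keys(spec: dict[str, Any]) -> list[str]:
    """Hypothetical pre-flight check; returns a list of problems (empty if none)."""
    keys = set(spec)
    problems = []
    for key in sorted(REQUIRED_KEYS - keys):
        problems.append(f"missing required key: {key}")
    for key in sorted(keys - REQUIRED_KEYS - OPTIONAL_KEYS):
        problems.append(f"unknown top-level key: {key}")
    # Assumption based on the examples: inline rows are given as a JSON list.
    if "dataset" in spec and not isinstance(spec["dataset"], list):
        problems.append("dataset should be a list of inline rows")
    return problems


print(check_spec_keys({"metric": {"type": "exact-match"}, "datset": []}))
```

The example call flags both the missing `dataset` key and the misspelled `datset` key.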
For LLM-judge setup notes, see LLM Judge Notes.
For evaluator API key auth, see Evaluator API Auth.
For local and cluster troubleshooting, see Evaluation Troubleshooting.
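Call the SDK-backed status route through the platform SDK:
```python
from nemo_platform import NeMoPlatform

# Point the client at the running platform; adjust base_url for your deployment.
client = NeMoPlatform(base_url="http://localhost:8000")

# Query the Evaluator plugin's status route through the SDK.
status = client.evaluator.plugin_status()
print(status)
```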

Next Decisions

Before replacing the stubs, confirm the target surface for each of the following:
  1. service route adaptation
  2. job submission or compilation strategy
  3. packaging split between service and task dependencies