nemo-evaluator-plugin
Evaluator Plugin
Use this skill when the task is about Evaluator functionality on the plugin architecture. The plugin-backed CLI surface is `nemo evaluator`; the legacy generated `nemo evaluation` API command group is not the target surface for new guidance.
Current Surfaces
- a minimal `nemo.services` health surface
- an SDK-backed `nemo.jobs` entry, `evaluator.evaluate`, for inline metric execution
- a minimal CLI and SDK namespace
- plugin-owned docs and skills directories
CLI Commands
Prerequisite: activate the Python virtual environment before invoking the `nemo` CLI: `source .venv/bin/activate`.

Check plugin status from the CLI:

```bash
nemo evaluator info
```

Inspect the registered job contract:

```bash
nemo evaluator evaluate explain
```

Run an inline `exact-match` metric:

```bash
nemo evaluator evaluate run --spec '{"metric":{"type":"exact-match","reference":"{{item.expected}}","candidate":"{{item.model_output}}"},"dataset":[{"expected":"blue","model_output":"Blue"},{"expected":"Jupiter","model_output":"Saturn"}],"params":{"parallelism":2}}'
```

Run an inline `string-check` metric:

```bash
nemo evaluator evaluate run --spec '{"metric":{"type":"string-check","operation":"contains","left_template":"{{item.answer}}","right_template":"NeMo"},"dataset":[{"answer":"NeMo Platform supports evaluator plugins."}]}'
```

For non-trivial specs, prefer `--spec-file` over inline shell JSON:

```bash
nemo evaluator evaluate run --spec-file evaluation-spec.json
```

Submit the same spec to a cluster:

```bash
nemo evaluator evaluate submit \
  --spec-file evaluation-spec.json \
  --workspace default \
  --profile default
```

Use `nemo evaluator evaluate explain` as the source of truth for the current plugin job schema.
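One way to produce the `evaluation-spec.json` file used with `--spec-file` is to build the spec as a Python dictionary and serialize it, which avoids shell-quoting issues with inline JSON. The sketch below reuses the `exact-match` spec from the inline example. It relies only on the standard library and assumes the file is written to the current working directory; it does not call the plugin.

```python
import json
from pathlib import Path

# Same content as the inline exact-match example shown in this section.
spec = {
    "metric": {
        "type": "exact-match",
        "reference": "{{item.expected}}",
        "candidate": "{{item.model_output}}",
    },
    "dataset": [
        {"expected": "blue", "model_output": "Blue"},
        {"expected": "Jupiter", "model_output": "Saturn"},
    ],
    "params": {"parallelism": 2},
}

# Write the spec where `--spec-file evaluation-spec.json` will find it
# (assumption: the CLI is run from the same directory).
spec_path = Path("evaluation-spec.json")
spec_path.write_text(json.dumps(spec, indent=2) + "\n", encoding="utf-8")

# Read the file back to confirm it is valid JSON and unchanged.
loaded = json.loads(spec_path.read_text(encoding="utf-8"))
```

The resulting file can then be passed to `nemo evaluator evaluate run --spec-file evaluation-spec.json`.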
Evaluation Specs
The current job accepts inline SDK-backed evaluation specs. At a high level, specs describe:
- `metric`: inline Evaluator SDK metric configuration or benchmark metrics
- `dataset`: inline rows to evaluate
- `params`: optional Evaluator SDK execution parameters
- `target`: optional model or agent target for online evaluation
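The top-level layout described by the list above can be checked locally before a spec is handed to the CLI. The following is a rough sketch only: `check_spec_shape` is a hypothetical helper, not part of the plugin or SDK, and treating `metric` and `dataset` as required is an assumption drawn from the list marking only `params` and `target` as optional. The authoritative schema remains the output of `nemo evaluator evaluate explain`.

```python
from typing import Any

# Assumption: only `params` and `target` are described as optional,
# so `metric` and `dataset` are treated as required in this sketch.
REQUIRED_KEYS = {"metric", "dataset"}


def check_spec_shape(spec: dict[str, Any]) -> list[str]:
    """Return a list of shape problems; an empty list means none were found."""
    problems = []
    for key in sorted(REQUIRED_KEYS - set(spec)):
        problems.append(f"missing key: {key}")
    dataset = spec.get("dataset")
    if dataset is not None:
        if not isinstance(dataset, list):
            problems.append("dataset should be a list of rows")
        elif not all(isinstance(row, dict) for row in dataset):
            problems.append("each dataset row should be an object")
    return problems


# The string-check example from the CLI section passes the shape check.
string_check_spec = {
    "metric": {
        "type": "string-check",
        "operation": "contains",
        "left_template": "{{item.answer}}",
        "right_template": "NeMo",
    },
    "dataset": [{"answer": "NeMo Platform supports evaluator plugins."}],
}
ok_problems = check_spec_shape(string_check_spec)

# A spec with no metric and a non-list dataset reports both problems.
bad_problems = check_spec_shape({"dataset": {"answer": "x"}})
```

This kind of check only catches structural slips such as a missing `metric` block; it says nothing about whether a given metric type or parameter is accepted by the plugin.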
For LLM-judge setup notes, see LLM Judge Notes.
For evaluator API key auth, see Evaluator API Auth.
For local and cluster troubleshooting, see Evaluation Troubleshooting.
Call the SDK-backed status route through the platform SDK:

```python
from nemo_platform import NeMoPlatform

client = NeMoPlatform(base_url="http://localhost:8000")
status = client.evaluator.plugin_status()
```
Next Decisions
Before replacing stubs, verify the target surface:
- service route adaptation
- job submission or compilation strategy
- packaging split between service and task dependencies