quick-eval
Quick Evaluation
Run a complete evaluation for $ARGUMENTS: launch, monitor, and summarize results.

Workflow
Step 1: Select Resources
List and confirm resources:

```bash
coval agents list
coval test-sets list
coval personas list
```

Confirm with the user:
- Agent to evaluate
- Test set to use
- Persona for simulation
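The confirmation can also be scripted as a guard that runs before anything is launched. This is a minimal sketch, assuming the chosen IDs are copied by hand from the three `list` commands; `confirm_selection` is a hypothetical helper, not part of the `coval` CLI.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the coval CLI): refuse to continue until
# an agent, persona, and test set have all been chosen and the user says yes.
confirm_selection() {
  local agent_id="$1" persona_id="$2" test_set_id="$3" reply

  # Every selection must be non-empty before a run is launched.
  if [ -z "$agent_id" ] || [ -z "$persona_id" ] || [ -z "$test_set_id" ]; then
    echo "error: agent, persona, and test set are all required" >&2
    return 2
  fi

  # Echo the selection back so the user can check it.
  printf 'Agent:    %s\nTest set: %s\nPersona:  %s\n' \
    "$agent_id" "$test_set_id" "$persona_id"
  printf 'Launch with these resources? [y/N] '

  # Read the answer from stdin; anything other than y/Y (or EOF) means "no".
  read -r reply || return 1
  case "$reply" in
    [yY]) return 0 ;;
    *) return 1 ;;
  esac
}
```

It returns 0 on "yes", 1 on "no" or end of input, and 2 when a selection is missing, so it can gate the launch with `&&`.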
Step 2: Launch Run
```bash
coval runs launch \
  --agent-id <agent_id> \
  --persona-id <persona_id> \
  --test-set-id <test_set_id> \
  --name "Quick Eval - $(date +%Y%m%d-%H%M)"
```

Capture the run ID from the output.
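The launch command can be wrapped so the run name is built in one place and the command can be previewed before it runs. A sketch under assumptions: `quick_eval_name`, `launch_quick_eval`, and the `DRY_RUN` variable are hypothetical; only the `coval runs launch` flags come from the step above.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the `coval runs launch` call from Step 2.

# Build the run name; an optional timestamp argument makes it reproducible.
quick_eval_name() {
  local stamp="${1:-$(date +%Y%m%d-%H%M)}"
  printf 'Quick Eval - %s\n' "$stamp"
}

# Launch a run with the confirmed IDs. With DRY_RUN=1 the command is only
# printed (shell-quoted), so it can be reviewed before anything starts.
launch_quick_eval() {
  local agent_id="$1" persona_id="$2" test_set_id="$3"
  local -a cmd=(
    coval runs launch
    --agent-id "$agent_id"
    --persona-id "$persona_id"
    --test-set-id "$test_set_id"
    --name "$(quick_eval_name)"
  )

  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%q ' "${cmd[@]}"
    printf '\n'
  else
    "${cmd[@]}"
  fi
}
```

The sketch does not parse the run ID, because the launch output format isn't shown here; read it from the output and keep it in a shell variable for the remaining steps.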
Step 3: Watch Progress
```bash
coval runs watch <run_id>
```

Wait for completion.
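Timing the watch gives a value for the `Duration` line of the summary. This sketch assumes `coval runs watch` blocks until the run finishes and exits non-zero if it fails; `elapsed_minutes` and `watch_run` are hypothetical helpers. It measures from when watching starts, so start it right after launching.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for Step 3: watch a run and report elapsed minutes.

# Whole minutes between two Unix timestamps, rounded down.
elapsed_minutes() {
  local start="$1" end="$2"
  echo $(( (end - start) / 60 ))
}

# Assumes `coval runs watch` returns once the run completes.
# Watch output is sent to stderr so stdout carries only the minute count.
watch_run() {
  local run_id="$1" start end
  start="$(date +%s)"
  coval runs watch "$run_id" >&2 || return   # propagate the watch's exit status
  end="$(date +%s)"
  elapsed_minutes "$start" "$end"
}
```

For example, `minutes="$(watch_run "$RUN_ID")"` shows progress on the terminal and stores only the duration.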
Step 4: Gather Results
```bash
coval runs get <run_id> --format json
coval simulations list --run-id <run_id> --format json
```

Step 5: Summarize
Present a summary:

```
Evaluation Complete

Run: <run_id>
Agent: <agent_name>
Test Set: <test_set_name>
Duration: X minutes

Results
- Total Simulations: N
- Completed: N
- Failed: N

Sample Simulations
[List 3-5 simulation IDs for review]

Next Steps
- View full results: coval simulations list --run-id <run_id>
- Download audio: coval simulations audio <sim_id> -o recording.wav
- Get transcript: coval simulations get <sim_id>
```
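Steps 4 and 5 can be combined into a few helpers that save the JSON output, print the summary template, and fetch audio for the sample simulations. This is a sketch under stated assumptions: `gather_results`, `print_summary`, and `download_sample_audio` are hypothetical names; the file names `run.json`, `simulations.json`, and `recording-<sim_id>.wav` are arbitrary choices; and the simulation counts are passed in by the caller, because the JSON layout returned by `coval simulations list` isn't documented here.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for Steps 4 and 5; only coval commands from this guide are used.

# Save run details and the simulation list as JSON files in an output directory.
gather_results() {
  local run_id="$1" out_dir="${2:-.}"
  mkdir -p "$out_dir" || return
  coval runs get "$run_id" --format json > "$out_dir/run.json" || return
  coval simulations list --run-id "$run_id" --format json \
    > "$out_dir/simulations.json"
}

# Print the summary template from values the caller has already collected.
# Arguments: run_id agent test_set minutes total completed failed [sim_id...]
print_summary() {
  local run_id="$1" agent="$2" test_set="$3" minutes="$4"
  local total="$5" completed="$6" failed="$7"
  shift 7   # remaining arguments are sample simulation IDs
  cat <<EOF
Evaluation Complete

Run: $run_id
Agent: $agent
Test Set: $test_set
Duration: $minutes minutes

Results
- Total Simulations: $total
- Completed: $completed
- Failed: $failed

Sample Simulations
EOF
  local sim_id
  for sim_id in "$@"; do
    printf -- '- %s\n' "$sim_id"
  done
  cat <<EOF

Next Steps
- View full results: coval simulations list --run-id $run_id
- Download audio: coval simulations audio <sim_id> -o recording.wav
- Get transcript: coval simulations get <sim_id>
EOF
}

# Download audio for each sample simulation, one WAV file per ID.
download_sample_audio() {
  local sim_id
  for sim_id in "$@"; do
    coval simulations audio "$sim_id" -o "recording-$sim_id.wav" || return
  done
}
```

Keeping the counts as arguments means the summary works however they are extracted from `simulations.json`, whether by hand or with a JSON tool once the field names are known.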