quick-eval

Quick Evaluation

Run a complete evaluation for $ARGUMENTS: launch, monitor, and summarize results.

Workflow

Step 1: Select Resources

List and confirm resources:
```bash
coval agents list
coval test-sets list
coval personas list
```
Confirm with user:
  • Agent to evaluate
  • Test set to use
  • Persona for simulation
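
Once the three choices are confirmed, it can help to check that each ID is actually set before launching. The following is a minimal sketch, assuming the confirmed IDs are passed as plain strings; the function name `require_ids` and its argument order are hypothetical and not part of the coval CLI.

```bash
# Hypothetical helper (not part of coval): fail early if a confirmed ID is missing.
# Arguments: agent ID, test set ID, persona ID, in that order.
require_ids() {
  local agent_id="${1:-}"
  local test_set_id="${2:-}"
  local persona_id="${3:-}"
  local missing=0

  [ -n "$agent_id" ]    || { echo "missing: agent ID" >&2; missing=1; }
  [ -n "$test_set_id" ] || { echo "missing: test set ID" >&2; missing=1; }
  [ -n "$persona_id" ]  || { echo "missing: persona ID" >&2; missing=1; }

  # Returns 0 when all three are set, 1 otherwise.
  return "$missing"
}
```

A call such as `require_ids "$AGENT_ID" "$TEST_SET_ID" "$PERSONA_ID" || exit 1` (variable names are placeholders) would stop the workflow before Step 2 if anything was left unconfirmed.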

Step 2: Launch Run

```bash
coval runs launch \
  --agent-id <agent_id> \
  --persona-id <persona_id> \
  --test-set-id <test_set_id> \
  --name "Quick Eval - $(date +%Y%m%d-%H%M)"
```
Capture the run ID from the output; Steps 3 and 4 need it.
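
The launch command can be wrapped so that its raw output is kept in a file, which makes it easier to read the run ID back afterwards. This is only a sketch: `quick_eval_name` and `launch_quick_eval` are hypothetical helper names, and no assumption is made about what the launch output looks like. Only the flags shown in the command above are used.

```bash
# Hypothetical helper: build the run name the same way the launch command does,
# e.g. "Quick Eval - " followed by a YYYYMMDD-HHMM timestamp.
quick_eval_name() {
  printf 'Quick Eval - %s\n' "$(date +%Y%m%d-%H%M)"
}

# Hypothetical helper: launch a run and keep a copy of the raw output.
# Arguments: agent ID, persona ID, test set ID, path of the log file.
launch_quick_eval() {
  local agent_id="$1"
  local persona_id="$2"
  local test_set_id="$3"
  local log_file="$4"

  # tee prints the output to the terminal and also writes it to the log file.
  coval runs launch \
    --agent-id "$agent_id" \
    --persona-id "$persona_id" \
    --test-set-id "$test_set_id" \
    --name "$(quick_eval_name)" | tee "$log_file"
}
```

Since the output format is not specified here, the run ID still has to be read from the log by hand or with a pattern matched to whatever the CLI actually prints.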

Step 3: Watch Progress

```bash
coval runs watch <run_id>
```
Wait for the run to complete before gathering results.
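
The summary in Step 5 includes a duration in minutes. One rough way to get it is to time the watch command itself. The sketch below assumes that `coval runs watch` returns only once the run has finished, which this document implies but does not state outright; the helper names `elapsed_minutes` and `watch_and_time` are hypothetical. The figure is the time spent watching, so it only approximates the run's own duration.

```bash
# Hypothetical helper: whole minutes between two epoch timestamps, rounded up.
elapsed_minutes() {
  local start="$1"
  local end="$2"
  echo $(( ($end - $start + 59) / 60 ))
}

# Hypothetical helper: watch a run and report how long the watch took.
# Assumption: `coval runs watch` blocks until the run has finished.
watch_and_time() {
  local run_id="$1"
  local start
  local end

  start=$(date +%s)
  # Stop here, keeping the exit status, if the watch command fails.
  coval runs watch "$run_id" || return
  end=$(date +%s)

  echo "Duration: $(elapsed_minutes "$start" "$end") minutes"
}
```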

Step 4: Gather Results

```bash
coval runs get <run_id> --format json
coval simulations list --run-id <run_id> --format json
```
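
Saving both JSON responses to files keeps them available while writing the summary, without querying the CLI again. This is a minimal sketch using only the two commands above; the helper name `gather_results` and the file names `run.json` and `simulations.json` are arbitrary choices for illustration.

```bash
# Hypothetical helper: write both JSON responses into a directory.
# Arguments: run ID, output directory (created if it does not exist).
gather_results() {
  local run_id="$1"
  local out_dir="$2"

  mkdir -p "$out_dir" || return
  # Run-level details.
  coval runs get "$run_id" --format json > "$out_dir/run.json" || return
  # Per-simulation listing for the same run.
  coval simulations list --run-id "$run_id" --format json > "$out_dir/simulations.json"
}
```

For example, `gather_results "$run_id" ./quick-eval-results` would leave two files under `./quick-eval-results` for Step 5 to read.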

Step 5: Summarize

Present a summary:

Evaluation Complete

Run: <run_id>
Agent: <agent_name>
Test Set: <test_set_name>
Duration: X minutes

Results

  • Total Simulations: N
  • Completed: N
  • Failed: N
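
The three counts can be tallied from a list of simulation statuses. The JSON layout returned by the CLI is not described here, so this sketch deliberately starts from one status per line on standard input, however those were extracted. The helper name `summarize_statuses` is hypothetical, and the caller supplies the status strings that count as completed and as failed, since the actual values are not specified in this document.

```bash
# Hypothetical helper: count statuses read from stdin, one per line.
# Arguments: the status string meaning "completed", the one meaning "failed".
summarize_statuses() {
  local completed_label="$1"
  local failed_label="$2"

  awk -v ok="$completed_label" -v bad="$failed_label" '
    NF == 0   { next }          # skip blank lines
              { total++ }       # every remaining line is one simulation
    $0 == ok  { completed++ }
    $0 == bad { failed++ }
    END {
      printf "  • Total Simulations: %d\n", total
      printf "  • Completed: %d\n", completed
      printf "  • Failed: %d\n", failed
    }'
}
```

With placeholder labels, `printf '%s\n' done done failed done | summarize_statuses done failed` reports 4 total, 3 completed, and 1 failed.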

Sample Simulations

[List 3-5 simulation IDs for review]

Next Steps

  • View full results:
    coval simulations list --run-id <run_id>
  • Download audio:
    coval simulations audio <sim_id> -o recording.wav
  • Get transcript:
    coval simulations get <sim_id>
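
When several sample simulations are reviewed at once, the audio and transcript commands above can be run in a loop with one output file per simulation instead of a single `recording.wav`. This is a sketch under the assumption that simulation IDs are safe to use as file names; the helper name `fetch_samples` and the `.wav`/`.txt` naming are illustrative choices, not coval conventions.

```bash
# Hypothetical helper: save audio and the `simulations get` output for each ID.
# Reads simulation IDs from stdin, one per line. Argument: output directory.
fetch_samples() {
  local out_dir="$1"
  local sim_id

  mkdir -p "$out_dir" || return
  while IFS= read -r sim_id; do
    # Skip blank lines in the ID list.
    [ -n "$sim_id" ] || continue
    # </dev/null keeps the CLI from consuming the remaining IDs on stdin.
    coval simulations audio "$sim_id" -o "$out_dir/$sim_id.wav" < /dev/null
    coval simulations get "$sim_id" > "$out_dir/$sim_id.txt" < /dev/null
  done
}
```

For example, `printf '%s\n' <sim_id_1> <sim_id_2> | fetch_samples ./samples` would write one `.wav` and one `.txt` per simulation under `./samples`.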