monitor-experiment

Monitor Experiment Results

Monitor: $ARGUMENTS

Workflow

Step 1: Check What's Running

```bash
ssh <server> "screen -ls"
```
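To act on each session in turn, the session names can be pulled out of the listing. The following is a minimal sketch, assuming the usual `<pid>.<name>` format of `screen -ls` lines; the function name `list_sessions` is illustrative and not part of the workflow itself.

```bash
# Extract session names from `screen -ls` output read on stdin.
# Assumes each session line starts with "<pid>.<name>"; header and
# footer lines do not match that pattern and are skipped.
list_sessions() {
  awk '$1 ~ /^[0-9]+\./ { sub(/^[0-9]+\./, "", $1); print $1 }'
}

# Usage, with the same <server> placeholder as above:
#   ssh <server> "screen -ls" | list_sessions
```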

Step 2: Collect Output from Each Screen

For each screen session, capture the last N lines:
```bash
ssh <server> "screen -S <name> -X hardcopy /tmp/screen_<name>.txt && tail -50 /tmp/screen_<name>.txt"
```
If hardcopy fails, check for log files or tee output.
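The capture and its fallback can be combined into one remote command. This is a sketch under assumptions: the helper name `capture_cmd` and the log location `~/logs/<name>.log` are hypothetical, and `-h` is added so that `hardcopy` also dumps the scrollback buffer rather than only the visible window.

```bash
# Print the remote command that captures the last N lines of a screen session.
# If the hardcopy or the first tail fails, the command falls back to a log file.
# The log path ~/logs/<name>.log is a hypothetical convention, not a given.
capture_cmd() {
  local name="$1" lines="${2:-50}"
  local out="/tmp/screen_${name}.txt"
  printf 'screen -S %s -X hardcopy -h %s && tail -n %s %s || tail -n %s ~/logs/%s.log' \
    "$name" "$out" "$lines" "$out" "$lines" "$name"
}

# Usage, with the same <server> placeholder as above:
#   ssh <server> "$(capture_cmd <name> 50)"
```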

Step 3: Check for JSON Result Files

```bash
ssh <server> "ls -lt <results_dir>/*.json 2>/dev/null | head -20"
```
If JSON results exist, fetch and parse them:
```bash
ssh <server> "cat <results_dir>/<latest>.json"
```
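Picking `<latest>` by hand can be avoided by sorting on modification time. A minimal sketch, assuming file names contain no newlines; the helper name `latest_json` is illustrative.

```bash
# Print the most recently modified .json file in a directory,
# or nothing at all if the directory holds no .json files.
latest_json() {
  ls -t "$1"/*.json 2>/dev/null | head -n 1
}

# Usage on the remote side, with the placeholders used above:
#   ssh <server> "ls -t <results_dir>/*.json 2>/dev/null | head -n 1"
```

If Python is available on the machine reading the file, piping the `cat` output through `python3 -m json.tool` pretty-prints it for easier inspection.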

Step 4: Summarize Results

Present results in a comparison table:
| Experiment | Metric | Delta vs Baseline | Status |
|-----------|--------|-------------------|--------|
| Baseline  | X.XX   | —                 | done   |
| Method A  | X.XX   | +Y.Y              | done   |
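A table of this shape can be generated rather than typed. The sketch below assumes the results have already been reduced to tab-separated rows of name, metric, and status, with the baseline first; that input layout and the name `results_table` are assumptions made for illustration. It prints a plain hyphen in the baseline's delta column.

```bash
# Build the comparison table from tab-separated rows: name, metric, status.
# The first row is treated as the baseline; later rows get a signed delta.
results_table() {
  awk -F '\t' '
    BEGIN {
      print "| Experiment | Metric | Delta vs Baseline | Status |"
      print "|------------|--------|-------------------|--------|"
    }
    NR == 1 { base = $2; printf "| %s | %.2f | - | %s |\n", $1, $2, $3; next }
    { printf "| %s | %.2f | %+.2f | %s |\n", $1, $2, $2 - base, $3 }
  '
}

# Usage: feed one row per experiment on stdin, baseline first, e.g.
#   printf '%s\t%s\t%s\n' "<name>" "<metric>" "<status>" | results_table
```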

Step 5: Interpret

  • Compare against known baselines
  • Flag unexpected results (negative delta, NaN, divergence)
  • Suggest next steps based on findings
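The first two checks can be partly automated. This is a rough sketch, assuming tab-separated rows of name, metric, and delta, and assuming a higher metric is better so that a negative delta counts as unexpected. It does not detect divergence, which still needs a look at the training curve or logs.

```bash
# Read tab-separated rows (name, metric, delta) and print one warning
# per suspicious row: a NaN/inf metric, or a negative delta.
flag_results() {
  awk -F '\t' '
    tolower($2) ~ /nan|inf/ { print "WARN " $1 ": metric is " $2; next }
    $3 + 0 < 0              { print "WARN " $1 ": negative delta " $3 }
  '
}
```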

Step 6: Feishu Notification (if configured)

After results are collected, check `~/.claude/feishu.json`:
  • Send `experiment_done` notification: results summary table, delta vs baseline
  • If config absent or mode `"off"`: skip entirely (no-op)
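The skip condition can be checked before any notification is attempted. The config schema is not shown here, so this sketch assumes the file holds a `"mode": "<value>"` pair on a single line and treats a missing key the same as `"off"`; both are assumptions, as is the helper name `feishu_mode`. A real JSON parser would be more robust than `sed`.

```bash
# Print the notification mode from a Feishu config file.
# Prints "off" if the file is absent or no "mode" value is found.
feishu_mode() {
  local cfg="${1:-$HOME/.claude/feishu.json}" mode
  [ -f "$cfg" ] || { echo off; return; }
  mode="$(sed -n 's/.*"mode"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$cfg" | head -n 1)"
  echo "${mode:-off}"
}

# Usage: only proceed to the experiment_done notification when enabled.
#   if [ "$(feishu_mode)" != "off" ]; then
#     : # send the notification here (sending itself is not sketched)
#   fi
```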

Key Rules

  • Always show raw numbers before interpretation
  • Compare against the correct baseline (same config)
  • Note if experiments are still running (check progress bars, iteration counts)
  • If results look wrong, check training logs for errors before concluding
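For the "still running" check, an iteration counter can be read from captured output. A rough sketch, assuming progress is printed as `<done>/<total>` and that no other slash-separated numbers (such as dates) appear later in the text; the helper name `last_progress` is illustrative.

```bash
# Print the last "<done>/<total>" counter found on stdin.
# Carriage returns become newlines because progress bars often redraw in place.
last_progress() {
  tr '\r' '\n' | grep -oE '[0-9]+/[0-9]+' | tail -n 1
}

# Usage, reusing the capture file from Step 2:
#   ssh <server> "tail -50 /tmp/screen_<name>.txt" | last_progress
```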