nemo-rl-auto-research

Auto Research

Run iterative NeMo-RL experiments in this repository against the user's stated objective, such as accuracy, reward, throughput, latency, stability, or another recipe-specific metric, with git as the research ledger.
Treat dependencies as ready, but choose the runtime deliberately. Use the recipe's authoritative metric as the source of truth. Keep changes small, reproducible, and simple. Preserve unrelated user work.
Safety: This skill creates git branches, writes files to disk, and executes shell commands including training jobs that may consume GPU resources. Always confirm the campaign plan with the user before creating branches or launching jobs. Do not execute destructive git operations (reset, force-push) or launch compute-intensive jobs without explicit user approval.
Use the `nemo-rl-session-memory` skill for every auto-research campaign. Start or resume a session record before branching, then checkpoint after forming the plan, before and after meaningful edits or long-running launches, when the user changes direction, and before handoff or final summary.
After context compaction, handoff, disconnect, or a long gap, reload this skill and any companion skills already in use, read the latest `nemo-rl-session-memory` handoff, and restate the overall objective, stop rules, current branch, and latest result before continuing. Treat follow-up steering as additive unless the user explicitly changes the main objective.

Workflow

  1. Inspect the current git state and identify unrelated user changes before branching.
  2. Use a shared branch prefix. Prefer a user-provided one; otherwise create a descriptive default such as `autoresearch/2026-03-24-dapo-qwen2p5`.
  3. Read the target recipe, its parents, and the relevant code paths in `examples/run_grpo.py`, `nemo_rl/models/`, `nemo_rl/algorithms/`, `nemo_rl/environments/`, and `docs/`. For NeMo-gym recipes, also inspect `examples/nemo_gym/` entrypoints, configs, and launch scripts.
  4. Translate any user stop rule into explicit values you can monitor, such as `target_experiment_count` (the requested number of experiments), `campaign_deadline`, `per_experiment_timeout`, or `target_metric`.
  5. Verify required data, checkpoints, runtime inputs, and the launcher.
  6. Create an untracked TSV log and a per-experiment log directory.
  7. Run a baseline first on `<prefix>/baseline` if none exists.
For GPU, CPU-heavy, distributed, or long-running work, choose the execution environment deliberately. Run locally when the current machine has suitable GPUs and capacity; otherwise follow the user's requested environment, use `launch-nemo-rl` for nrl-k8s/Kubernetes, use the environment's native launcher for Slurm, or clarify with the user before launching. Use CPU-only local runs only for light inspection, dry runs, and short non-GPU checks.
If the user mentions Brev, or if `/home/ubuntu/RL` exists and `/ephemeral` is available as a volume, treat the machine as a Brev instance and use `nemo-rl-brev-etiquette` before creating experiment directories, caches, logs, checkpoints, or authenticated runtime state.
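The Brev heuristic above can be sketched as a small shell check. This is a minimal sketch under stated assumptions: the function name `is_brev_instance` is hypothetical, and a plain directory test stands in for "available as a volume", since the text does not say how that should be verified.

```bash
#!/usr/bin/env bash
# Hypothetical helper: returns 0 when the machine looks like a Brev instance.
# Both paths default to the ones named in the text and can be overridden.
is_brev_instance() {
  local repo_dir="${1:-/home/ubuntu/RL}"
  local volume_dir="${2:-/ephemeral}"
  # Assumption: a directory check approximates "available as a volume".
  [ -d "$repo_dir" ] && [ -d "$volume_dir" ]
}

if is_brev_instance; then
  echo "Brev markers found: apply nemo-rl-brev-etiquette before writing experiment state"
else
  echo "No Brev markers found: proceed with the chosen execution environment"
fi
```

The check only covers the filesystem half of the rule; a user mentioning Brev still counts on its own.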

Branching

  • Put every experiment on its own branch under the shared prefix.
  • Keep every branch, even for failed or weak ideas.
  • Put at least one commit on each branch for the hypothesis.
  • Add follow-up fix commits on the same branch when a rerun is justified.
  • Never stash, reset, or overwrite unrelated user changes silently. If dirty files overlap the experiment, use a separate worktree or ask before proceeding.
See `references/git-workflow.md` for the exact pattern.
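As a rough illustration of the branch-per-experiment rules, the sketch below composes a branch name under the shared prefix, refuses to proceed when tracked files are dirty, and records the hypothesis as a commit. It does not reproduce `references/git-workflow.md`; the helper names (`experiment_branch`, `require_clean_tracked_files`, `start_experiment`) and the use of `--allow-empty` are assumptions made for illustration.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: compose "<prefix>/<slug>" for one experiment.
experiment_branch() {
  local prefix="$1" slug="$2"
  # Strip one trailing slash so the result never contains "//".
  printf '%s/%s\n' "${prefix%/}" "$slug"
}

# Hypothetical helper: fail when tracked files have uncommitted changes.
# Assumption: stopping to ask the user is the safe default.
require_clean_tracked_files() {
  if ! git diff --quiet || ! git diff --cached --quiet; then
    echo "Tracked files are dirty: use a separate worktree or ask the user first" >&2
    return 1
  fi
}

# Hypothetical helper: create the experiment branch from a parent ref
# and put a hypothesis commit on it.
start_experiment() {
  local prefix="$1" slug="$2" parent="$3" hypothesis="$4"
  local branch
  branch="$(experiment_branch "$prefix" "$slug")"
  require_clean_tracked_files
  git switch -c "$branch" "$parent"
  # Assumption: --allow-empty records the hypothesis before any file is staged;
  # drop the flag to commit staged edits only.
  git commit --allow-empty -m "hypothesis: $hypothesis"
}
```

A call might look like `start_experiment autoresearch/2026-03-24-dapo-qwen2p5 prompt-compact-schema <parent-ref> "compact the prompt schema"`, where the parent ref is whichever commit the experiment should build on.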

Loop

  1. Pick one concrete hypothesis.
  2. Create a branch such as `autoresearch/2026-03-24-dapo-qwen2p5/prompt-compact-schema`.
  3. Edit the smallest set of files needed.
  4. Commit the hypothesis.
  5. Before launching the run, check the monitored stop conditions. Do not stop early unless one is already clearly met.
  6. Identify the authoritative metric source from the recipe or logging code, then run with a unique log path:

```bash
LOG_DIR=reports/auto_research/<campaign>/<experiment>
mkdir -p "$LOG_DIR"
uv run <entrypoint> > "$LOG_DIR/run.log" 2>&1
```

  7. If the user gave a per-experiment wall-clock limit, enforce it explicitly. Prefer a recipe-level timeout when one already exists; otherwise wrap the command with an external timeout. If both exist, honor the tighter limit.
  8. Extract the primary metric with a command appropriate for the actual log format. If extraction is empty, inspect the last log lines and the recipe's logging path before marking the run.
  9. Record index, branch, parent commit, commit, recipe, metric name, metric value, memory (GB), elapsed time (minutes), launcher, job id, command, log path, status, and description in the TSV, along with enough timing or count information to evaluate the stop rule.
  10. Periodically print user-facing progress updates during the campaign. Include the current branch, latest known result, attempted experiment count, remaining experiment count if applicable, remaining campaign time if applicable, and whether any stop condition has been met yet.
  11. Re-check the monitored stop conditions after the experiment completes and state the result explicitly, for example `stop condition not yet met: 17/24 attempted, 6h12m remaining` or `stop condition met: 24/24 attempted`.
  12. Mark the result as `keep`, `discard`, or `crash`, then move to the next branch unless a user-specified stop condition has been clearly met.
For count-based stop rules, count attempted ideas, not only successful or fully completed runs.
For campaign time budgets, convert the user limit into an absolute deadline at the start of the campaign and keep checking remaining time.
For per-experiment budgets, enforce a timeout on every run and treat overruns as failures.
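One way to enforce a per-experiment budget externally is GNU coreutils `timeout`, which exits with status 124 when the limit is hit. The wrapper below is a sketch, not the recipe's own mechanism: `run_with_budget` and its outcome labels (`completed`, `timeout`, `crash`) are hypothetical names, and it assumes a Linux host where `timeout` is available.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command under a wall-clock limit and classify the outcome.
# Usage: run_with_budget <limit> <log_file> <command...>
run_with_budget() {
  local limit="$1" log_file="$2"
  shift 2
  local rc=0
  # Send the command's stdout and stderr to the log; capture its exit status.
  timeout "$limit" "$@" > "$log_file" 2>&1 || rc=$?
  case "$rc" in
    0)   echo "completed" ;;
    124) echo "timeout" ;;   # overrun: treat as a failed attempt
    *)   echo "crash" ;;
  esac
}
```

For a one-hour cap, the launch command from the loop would be passed as the trailing arguments, with `1h` as the limit and `"$LOG_DIR/run.log"` as the log file. If the recipe also has its own timeout, the tighter of the two still applies.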
Examples:
  • `do 50 experiments`: stop only after 50 attempted experiment rows exist in the TSV
  • `10h total, 1h each`: enforce a 1 hour limit per run and stop when the 10 hour campaign budget is reached, or when there is not enough remaining budget to start another 1 hour run
  • `50 experiments or 10h total, 1h each`: monitor all three values, never exceed the per-run cap, and stop only when one campaign-level stop trigger is clearly reached
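The stop rules above can be monitored with a small check that counts attempted rows in the TSV and compares the clock against an absolute deadline. This is a sketch under assumptions: `attempted_count` and `stop_status` are hypothetical helpers, the TSV is assumed to have one header line followed by one row per attempted experiment, and times are handled as epoch seconds.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count attempted experiments as non-empty data rows in the TSV.
attempted_count() {
  local tsv="$1"
  [ -f "$tsv" ] || { echo 0; return; }
  # Skip the header line; grep -c exits non-zero on zero matches, so mask that.
  tail -n +2 "$tsv" | grep -c . || true
}

# Hypothetical helper: report whether a campaign-level stop trigger has been reached.
# Usage: stop_status <tsv> <target_count> <deadline_epoch> <per_run_seconds> [now_epoch]
# Pass 0 for target_count or deadline_epoch to disable that trigger.
stop_status() {
  local tsv="$1" target="$2" deadline="$3" per_run="$4"
  local now="${5:-$(date +%s)}"
  local attempted
  attempted="$(attempted_count "$tsv")"
  if [ "$target" -gt 0 ] && [ "$attempted" -ge "$target" ]; then
    echo "stop condition met: $attempted/$target attempted"
    return
  fi
  if [ "$deadline" -gt 0 ]; then
    local remaining=$(( deadline - now ))
    # Stop when too little budget remains to start another full-length run.
    if [ "$remaining" -lt "$per_run" ]; then
      echo "stop condition met: ${remaining}s remaining, less than one ${per_run}s run"
      return
    fi
    echo "stop condition not yet met: $attempted/$target attempted, $(( remaining / 3600 ))h$(( remaining % 3600 / 60 ))m remaining"
    return
  fi
  echo "stop condition not yet met: $attempted/$target attempted"
}

# Convert "10h total" into an absolute deadline once, at campaign start.
campaign_deadline=$(( $(date +%s) + 10 * 3600 ))
```

Because the count comes from TSV rows rather than successful runs, crashed and timed-out attempts count toward the target as long as each one is logged.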

Priorities

Prefer ideas with high expected objective gain and low complexity cost:
  • correctness and backend compatibility
  • prompt and rollout formatting
  • batch, sequence, and precision layout
  • optimizer and scheduler tuning
  • reward shaping, clipping, or scaling
  • dataset mix or validation changes
  • synchronous versus asynchronous execution based on hardware
All else equal, prefer simpler wins and avoid brittle hardware-specific hacks.

Avoid

  • Do not conclude a training idea failed from an underpowered smoke run. If a run uses tiny batch sizes, very few optimizer steps, or otherwise non-representative settings, treat it as plumbing validation only; scale to a meaningful batch size and train long enough to test the hypothesis before marking it `discard`.
  • Do not repeatedly pay batch-scheduler setup costs for tight edit-run-debug loops. If Slurm batch jobs have a large startup tax and failures require quick iteration, use the documented interactive Slurm pattern or ask the user before resubmitting more batch jobs.
  • Do not let context compaction or follow-up steering questions erase the original campaign goal. Refresh `nemo-rl-session-memory`, reload active skills, and preserve the main objective unless the user explicitly changes it.

Stop

If the user gives explicit stopping conditions, they override the generic rule. Do not stop because the search feels sufficient; stop only when the requested count, deadline, budget, or target condition has been clearly met.
During the campaign, explicitly inform the user whether the stop condition has been met. If not, report the remaining count, remaining time, or other remaining threshold in concrete terms.
If the user does not give explicit stopping conditions, run the baseline plus up to three low-risk experiments, then summarize the best result and ask before continuing.

References

  • `references/git-workflow.md` for branch, dirty-worktree, parent-commit, and baseline rules.
  • `references/exploration-ideas.md` for turning symptoms into concrete hypotheses.
  • `references/experiment-log-template.md` for the TSV schema and reproducibility fields.