nemo-gym-reward-profiling
Use to help users get started with Nemo Gym reward profiling. Covers the basic ng_run, ng_collect_rollouts, and ng_reward_profile workflow, repeated rollouts, materialized inputs, rollout JSONL artifacts, task and rollout identity, output inspection, partial profiling, and rollout_infos. For failed jobs, prefer nemo-gym-debugging.
Source: nvidia/skills
NPX Install
npx skill4agent add nvidia/skills nemo-gym-reward-profiling
Nemo Gym Reward Profiling
Invocation Check
Use this skill when the user wants to run, understand, or lightly modify Nemo Gym reward profiling. Keep the answer oriented around the normal workflow:
`ng_run` -> `ng_collect_rollouts` -> `ng_reward_profile`
If the user is primarily debugging a failed job or stack trace, use the `nemo-gym-debugging` skill first.
Basic Workflow
- Identify the environment config paths and input JSONL.
- Start Gym servers with `ng_run`.
- Collect rollouts with `ng_collect_rollouts`; this writes `rollouts.jsonl` and `*_materialized_inputs.jsonl`.
- Run `ng_reward_profile` on the materialized inputs and rollout JSONL to generate `*_reward_profiling.jsonl`.
- Inspect line counts and profile rows.
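The final inspection step can be sketched in Python. This is a minimal illustration, not part of Nemo Gym: the helper names, the `demo_` file prefixes, and the row contents are all made up for the demo. It relies only on the relationship stated in this skill, that a complete run has one rollout row per materialized input row.

```python
import json
import tempfile
from pathlib import Path


def count_jsonl_rows(path):
    """Count non-empty lines in a JSONL file, checking that each one parses."""
    rows = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                json.loads(line)  # raises ValueError on a malformed row
                rows += 1
    return rows


def summarize_counts(materialized_path, rollouts_path, profile_path):
    """Return row counts plus the number of materialized rows without a rollout."""
    counts = {
        "materialized": count_jsonl_rows(materialized_path),
        "rollouts": count_jsonl_rows(rollouts_path),
        "profile": count_jsonl_rows(profile_path),
    }
    counts["missing_rollouts"] = counts["materialized"] - counts["rollouts"]
    return counts


def write_jsonl(path, rows):
    """Demo helper: write dicts as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


# Demo with made-up rows: 2 tasks x 2 repeats, every rollout completed.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    keys = [
        {"_ng_task_index": t, "_ng_rollout_index": r}
        for t in range(2)
        for r in range(2)
    ]
    write_jsonl(tmp / "demo_materialized_inputs.jsonl", keys)
    write_jsonl(tmp / "rollouts.jsonl", [dict(k, reward=1.0) for k in keys])
    write_jsonl(
        tmp / "demo_reward_profiling.jsonl",
        [{"_ng_task_index": t} for t in range(2)],
    )
    demo_counts = summarize_counts(
        tmp / "demo_materialized_inputs.jsonl",
        tmp / "rollouts.jsonl",
        tmp / "demo_reward_profiling.jsonl",
    )

print(demo_counts)
```

A nonzero `missing_rollouts` value suggests collection stopped early, which is the case where partial profiling becomes relevant.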
Repeated rollouts are the main profiling lever. `num_repeats=1` is valid, but per-task averages and variance are only meaningful with multiple rollouts per task.
Core Concepts
- `*_materialized_inputs.jsonl`: expanded collection inputs after repeat expansion, agent defaults, and task/rollout id assignment.
- `rollouts.jsonl`: one completed rollout/result per materialized input row.
- `*_reward_profiling.jsonl`: one summarized profile row per original task with at least one completed rollout.
- `_ng_task_index`: original task/sample id.
- `_ng_rollout_index`: repeated rollout id for that task.
- `rollout_infos`: compact per-rollout info inside each task profile row, including reward, token usage, and numeric rollout metrics when available.
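To make the task and rollout identity concrete, here is a small sketch that groups rollout rows by `_ng_task_index` and `_ng_rollout_index` and computes per-task statistics. It is an illustration only, not the logic of `ng_reward_profile`: the `reward` field name and the sample rows are assumptions, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean, variance


def per_task_stats(rollouts, reward_key="reward"):
    """Group rewards by (task, rollout) and summarize each task.

    `reward_key` is an assumed field name; check it against real rollout rows.
    """
    by_task = defaultdict(dict)
    for row in rollouts:
        by_task[row["_ng_task_index"]][row["_ng_rollout_index"]] = row[reward_key]

    stats = {}
    for task, rewards_by_rollout in sorted(by_task.items()):
        rewards = list(rewards_by_rollout.values())
        stats[task] = {
            "n": len(rewards),
            "mean": mean(rewards),
            # Sample variance needs at least two rollouts for the task.
            "variance": variance(rewards) if len(rewards) > 1 else None,
        }
    return stats


# Made-up rows: task 0 has two rollouts, task 1 has only one.
rollouts = [
    {"_ng_task_index": 0, "_ng_rollout_index": 0, "reward": 1.0},
    {"_ng_task_index": 0, "_ng_rollout_index": 1, "reward": 0.0},
    {"_ng_task_index": 1, "_ng_rollout_index": 0, "reward": 1.0},
]
stats = per_task_stats(rollouts)
print(stats[0])  # {'n': 2, 'mean': 0.5, 'variance': 0.5}
print(stats[1])  # {'n': 1, 'mean': 1.0, 'variance': None}
```

Task 1 shows why a single rollout per task is valid but uninformative: the mean is just that one reward, and no variance can be computed.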
Keep reward-to-length or reward-to-token analysis keyed by both `_ng_task_index` and `_ng_rollout_index`.
Reference Loading
Load references only when the user needs that detail:
- Read `references/quick-start.md` for a generic command template and the minimal run sequence.
- Read `references/output-format.md` to explain materialized inputs, rollout JSONL, reward profile rows, `rollout_infos`, and partial profiling.
Practical Defaults
- Treat `ng_reward_profile` as the reward profiling step; rollout collection does not write reward profile files.
- Run strict profiling by default. If rollout collection stopped early, use `++allow_partial_rollouts=True` to profile completed rollouts and drop original input rows with no completed rollout.
- Trust the target checkout's CLI help and `nemo_gym/reward_profile.py` over memory if flags differ.
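Before choosing partial profiling, it can help to see which original tasks would be dropped. The sketch below is a hypothetical pre-check, not Nemo Gym code: it compares the `_ng_task_index` values in the materialized inputs against those in the completed rollouts. The function names and the sample rows are made up for illustration.

```python
from collections import Counter


def completed_per_task(rollout_rows):
    """Count completed rollouts for each _ng_task_index."""
    return Counter(row["_ng_task_index"] for row in rollout_rows)


def tasks_without_rollouts(materialized_rows, rollout_rows):
    """Return task ids present in the materialized inputs but absent from rollouts."""
    expected = {row["_ng_task_index"] for row in materialized_rows}
    completed = set(completed_per_task(rollout_rows))
    return sorted(expected - completed)


# Made-up example: 3 tasks x 2 repeats requested, collection stopped early.
materialized = [
    {"_ng_task_index": t, "_ng_rollout_index": r}
    for t in range(3)
    for r in range(2)
]
partial_rollouts = [
    {"_ng_task_index": 0, "_ng_rollout_index": 0},
    {"_ng_task_index": 0, "_ng_rollout_index": 1},
    {"_ng_task_index": 1, "_ng_rollout_index": 0},
]

dropped = tasks_without_rollouts(materialized, partial_rollouts)
counts = completed_per_task(partial_rollouts)
print(dropped)       # [2]
print(dict(counts))  # {0: 2, 1: 1}
```

In this example task 2 has no completed rollout, so it is the kind of input row that partial profiling would drop, while task 1 would be profiled from a single rollout.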