nemo-gym-reward-profiling

Use to help users get started with Nemo Gym reward profiling. Covers the basic ng_run, ng_collect_rollouts, and ng_reward_profile workflow, repeated rollouts, materialized inputs, rollout JSONL artifacts, task and rollout identity, output inspection, partial profiling, and rollout_infos. For failed jobs, prefer nemo-gym-debugging.

Source: nvidia/skills

Added on: 2026-05-22

NPX Install

npx skill4agent add nvidia/skills nemo-gym-reward-profiling

SKILL.md Content

Nemo Gym Reward Profiling

Invocation Check

Use this skill when the user wants to run, understand, or lightly modify Nemo Gym reward profiling. Keep the answer oriented around the normal workflow: `ng_run` starts model/resource servers, `ng_collect_rollouts` writes rollout artifacts, and `ng_reward_profile` generates profiling output from those artifacts.

If the user is primarily debugging a failed job or stack trace, use the `nemo-gym-debugging` skill first.

Basic Workflow

Identify the environment config paths and input JSONL.
Start Gym servers with `ng_run`.
Collect rollouts with `ng_collect_rollouts`; this writes `rollouts.jsonl` and `*_materialized_inputs.jsonl`.
Run `ng_reward_profile` on the materialized inputs and rollout JSONL to generate `*_reward_profiling.jsonl`.
Inspect line counts and profile rows.
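
The final inspection step can be sketched in Python. This is a minimal sketch, not part of Nemo Gym: the helper names `count_jsonl_rows`, `read_jsonl`, and `summarize_run` are hypothetical, and the file paths passed in are whatever your own run produced.

```python
import json
from pathlib import Path


def count_jsonl_rows(path):
    """Count non-empty lines in a JSONL file."""
    with Path(path).open(encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def read_jsonl(path):
    """Parse every non-empty line of a JSONL file into a Python object."""
    with Path(path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def summarize_run(materialized_path, rollouts_path, profile_path):
    """Return row counts for the three artifacts of one profiling run."""
    return {
        "materialized_inputs": count_jsonl_rows(materialized_path),
        "rollouts": count_jsonl_rows(rollouts_path),
        "profile_rows": count_jsonl_rows(profile_path),
    }
```

In a complete run, the materialized input and rollout counts should match, and the profile row count should equal the number of original tasks. A mismatch between the first two usually means collection stopped early.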

Repeated rollouts are the main profiling lever. `num_repeats=1` is valid, but per-task averages and variance are only meaningful with multiple rollouts per task.
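
To see why a single rollout per task says little, here is a hedged sketch of per-task statistics. It assumes each rollout row carries `_ng_task_index` and a numeric reward under a key named `reward`. That key name is an assumption, so check your own `rollouts.jsonl`. The function `per_task_reward_stats` is hypothetical and is not the computation `ng_reward_profile` performs.

```python
from collections import defaultdict
from statistics import mean, pvariance


def per_task_reward_stats(rollout_rows, reward_key="reward"):
    """Group rewards by task and compute count, mean, and population variance.

    reward_key is an assumed field name, not confirmed by the source.
    """
    rewards_by_task = defaultdict(list)
    for row in rollout_rows:
        rewards_by_task[row["_ng_task_index"]].append(float(row[reward_key]))
    return {
        task: {"n": len(rewards), "mean": mean(rewards), "variance": pvariance(rewards)}
        for task, rewards in rewards_by_task.items()
    }
```

With one rollout per task the variance is always zero and the mean is just that rollout's reward, so the statistics carry no information about spread.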

Core Concepts

`*_materialized_inputs.jsonl`: expanded collection inputs after repeat expansion, agent defaults, and task/rollout id assignment.
`rollouts.jsonl`: one completed rollout/result per materialized input row.
`*_reward_profiling.jsonl`: one summarized profile row per original task with at least one completed rollout.
`_ng_task_index`: original task/sample id.
`_ng_rollout_index`: repeated rollout id for that task.
`rollout_infos`: compact per-rollout info inside each task profile row, including reward, token usage, and numeric rollout metrics when available.
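
The one-rollout-per-materialized-row relationship can be checked with a small identity comparison. This sketch assumes that both files expose `_ng_task_index` and `_ng_rollout_index` on every row. The helpers `rollout_key` and `compare_identities` are hypothetical names used for illustration only.

```python
from collections import Counter


def rollout_key(row):
    """Identity of a single rollout: (task id, repeat id)."""
    return (row["_ng_task_index"], row["_ng_rollout_index"])


def compare_identities(materialized_rows, rollout_rows):
    """Report rollouts that are missing, duplicated, or not expected at all."""
    expected = {rollout_key(row) for row in materialized_rows}
    seen = Counter(rollout_key(row) for row in rollout_rows)
    return {
        "missing": sorted(expected - set(seen)),
        "duplicated": sorted(key for key, count in seen.items() if count > 1),
        "unexpected": sorted(set(seen) - expected),
    }
```

All three lists being empty indicates that every materialized input row has exactly one completed rollout.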

Keep reward-to-length or reward-to-token analysis keyed by both `_ng_task_index` and `_ng_rollout_index`.
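
As an illustration of keying by both ids, the sketch below builds a reward/token table indexed by the `(task, rollout)` pair. The field names `reward` and `total_tokens` are assumptions about the rollout rows, and `reward_token_pairs` is a hypothetical helper.

```python
def reward_token_pairs(rollout_rows, reward_key="reward", tokens_key="total_tokens"):
    """Map (_ng_task_index, _ng_rollout_index) -> (reward, token count).

    reward_key and tokens_key are assumed field names; adjust them to
    whatever your rollout rows actually contain.
    """
    pairs = {}
    for row in rollout_rows:
        key = (row["_ng_task_index"], row["_ng_rollout_index"])
        if key in pairs:
            # A repeated identity would silently overwrite a data point.
            raise ValueError(f"duplicate rollout identity: {key}")
        pairs[key] = (float(row[reward_key]), int(row[tokens_key]))
    return pairs
```

Keying by `_ng_task_index` alone would collapse all repeats of a task into one entry, which hides exactly the within-task variation that repeated rollouts are meant to expose.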

Reference Loading

Load references only when the user needs that detail:

Read `references/quick-start.md` for a generic command template and the minimal run sequence.
Read `references/output-format.md` to explain materialized inputs, rollout JSONL, reward profile rows, `rollout_infos`, and partial profiling.

Practical Defaults

Treat `ng_reward_profile` as the reward profiling step; rollout collection does not write reward profile files.
Run strict profiling by default. If rollout collection stopped early, use `++allow_partial_rollouts=True` to profile completed rollouts and drop original input rows with no completed rollout.
Trust the target checkout's CLI help and `nemo_gym/reward_profile.py` over memory if flags differ.
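
Before running with `++allow_partial_rollouts=True`, it can help to preview which tasks would be dropped. The sketch below mirrors the behavior described above and is not the logic in `nemo_gym/reward_profile.py`. It assumes both files carry `_ng_task_index`, and `split_tasks_by_completion` is a hypothetical helper.

```python
def split_tasks_by_completion(materialized_rows, rollout_rows):
    """Separate tasks with at least one completed rollout from tasks with none."""
    all_tasks = {row["_ng_task_index"] for row in materialized_rows}
    completed_tasks = {row["_ng_task_index"] for row in rollout_rows}
    return {
        "profiled": sorted(all_tasks & completed_tasks),
        "dropped": sorted(all_tasks - completed_tasks),
    }
```

A long `dropped` list suggests that resuming or rerunning collection may be a better choice than profiling the partial results.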