[Feature] trajectory dump/replay for offline training debugging #1343

@daihaowz

Checklist

  • This feature will maintain backward compatibility with the current APIs in
    areal/api/. If not, please raise a refactor issue first.

Background

During RL training, the rollout phase dominates wall-clock time and resource consumption (inference servers, network communication, reward computation). When developers need to iterate on training logic (advantage computation, PPO loss, reward normalization, etc.), re-running full rollouts each time is prohibitively expensive.

Additionally, when a training issue is observed at step N, reproducing the exact conditions requires re-running the entire experiment up to that point. Without a way to capture and replay the exact rollout data, debugging becomes slow and non-deterministic.

We need a trajectory record/replay mechanism that allows:

  1. Dump mode: Serialize the complete rollout batch (token IDs, loss masks, logprobs, rewards, etc.) to disk at each training step during normal training.
  2. Replay mode: Skip rollout and inference engine initialization entirely, loading previously recorded batches from disk to drive the training loop.

This enables:

  • Deterministic reproduction: Bugs observed at a specific step can be reliably reproduced by replaying the exact same input data, eliminating non-determinism from rollout.
  • Efficient debugging: Isolate training-side issues from rollout-side issues by holding the input data constant.
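The dump and replay modes described above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation. The helper names (`step_file`, `dump_batch`, `load_batch`) and the `step_000007.pt` naming scheme are hypothetical. `pickle` is used as a dependency-free stand-in serializer. Since `torch.save(obj, f)` and `torch.load(f)` accept open binary file objects, they could be passed as `save_fn` and `load_fn` to write real tensor batches.

```python
import pickle
from pathlib import Path
from typing import Any, BinaryIO, Callable


def step_file(root: Path, step: int) -> Path:
    # Hypothetical naming scheme: one file per training step.
    return root / f"step_{step:06d}.pt"


def dump_batch(
    root: Path,
    step: int,
    batch: Any,
    save_fn: Callable[[Any, BinaryIO], Any] = pickle.dump,
) -> Path:
    """Dump mode: serialize the rollout batch for `step` under `root`."""
    root.mkdir(parents=True, exist_ok=True)
    path = step_file(root, step)
    with open(path, "wb") as fh:
        save_fn(batch, fh)
    return path


def load_batch(
    root: Path,
    step: int,
    load_fn: Callable[[BinaryIO], Any] = pickle.load,
) -> Any:
    """Replay mode: load the recorded batch for `step` instead of running rollout."""
    path = step_file(root, step)
    if not path.exists():
        raise FileNotFoundError(f"no recorded rollout batch for step {step}: {path}")
    with open(path, "rb") as fh:
        return load_fn(fh)
```

In a training loop, the rollout call would be followed by `dump_batch(...)` in dump mode and replaced by `load_batch(...)` in replay mode, so the training-side code sees the same batch structure either way.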

Potential Solution

Add a DebugConfig to PPOConfig with two mutually exclusive flags and an optional path:

  • dump_rollout_data: Record each step's full tensor batch to disk as .pt files.
  • replay_rollout_data: Load batches from disk, bypassing rollout and inference engine initialization entirely.
  • path: Optional custom directory for dump/replay files.
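One possible shape for this config, as a sketch only: the three field names come from the list above, while the defaults, the `debug` attribute name on `PPOConfig`, and the validation in `__post_init__` are assumptions. The `PPOConfig` shown here is a stand-in stub, not the real class.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DebugConfig:
    # Record each step's full tensor batch to disk.
    dump_rollout_data: bool = False
    # Load batches from disk instead of running rollout.
    replay_rollout_data: bool = False
    # Optional custom directory for dump/replay files.
    path: Optional[str] = None

    def __post_init__(self) -> None:
        # The two flags are mutually exclusive (assumed to be enforced at construction).
        if self.dump_rollout_data and self.replay_rollout_data:
            raise ValueError(
                "dump_rollout_data and replay_rollout_data cannot both be enabled"
            )


@dataclass
class PPOConfig:
    # Stand-in stub; the real PPOConfig has many other fields.
    # The attribute name `debug` is hypothetical.
    debug: DebugConfig = field(default_factory=DebugConfig)
```

Keeping both flags off by default would leave existing configs unaffected, which fits the backward-compatibility item in the checklist.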
