[Feature] trajectory dump/replay for offline training debugging #1343

## Checklist

- [x] This feature will maintain backward compatibility with the current APIs in
  `areal/api/`. If not, please raise a refactor issue first.

## Background

During RL training, the rollout phase dominates wall-clock time and resource consumption (inference servers, network communication, reward computation). When developers need to iterate on training logic (advantage computation, PPO loss, reward normalization, etc.), re-running full rollouts each time is prohibitively expensive.

Additionally, when a training issue is observed at step N, reproducing the exact conditions requires re-running the entire experiment up to that point. Without a way to capture and replay the exact rollout data, debugging becomes slow and non-deterministic.

We need a **trajectory record/replay** mechanism that allows:

1. **Dump mode**: Serialize the complete rollout batch (token IDs, loss masks, logprobs, rewards, etc.) to disk at each training step during normal training.
2. **Replay mode**: Skip rollout and inference engine initialization entirely, loading previously recorded batches from disk to drive the training loop.
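The two modes above amount to a per-step round trip through disk. A minimal sketch of that round trip follows. It is not the proposed implementation: `pickle` and plain Python lists stand in for the tensor batch and the `.pt` serialization so that the example runs with the standard library only, and the names `step_file`, `dump_batch`, `replay_batch`, and the `step_000003.pkl` naming scheme are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def step_file(root: Path, step: int) -> Path:
    # Hypothetical naming scheme: one file per training step.
    return root / f"step_{step:06d}.pkl"


def dump_batch(root: Path, step: int, batch: dict) -> Path:
    """Dump mode: serialize the rollout batch for one step to disk."""
    root.mkdir(parents=True, exist_ok=True)
    target = step_file(root, step)
    with target.open("wb") as f:
        pickle.dump(batch, f)
    return target


def replay_batch(root: Path, step: int) -> dict:
    """Replay mode: load the recorded batch instead of running rollout."""
    with step_file(root, step).open("rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "rollouts"
        # Toy batch with the kinds of fields listed above (values are made up).
        batch = {
            "input_ids": [[1, 2, 3]],
            "loss_mask": [[0, 1, 1]],
            "logprobs": [[0.0, -0.5, -1.25]],
            "rewards": [1.0],
        }
        dump_batch(root, 3, batch)
        # The replayed batch is identical to what was recorded.
        assert replay_batch(root, 3) == batch
```

Keying files by step number is what lets a bug seen at step N be reproduced by loading only that step's file.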

This enables:
- **Deterministic reproduction**: Bugs observed at a specific step can be reliably reproduced by replaying the exact same input data, eliminating non-determinism from rollout.
- **Efficient debugging**: Isolate training-side issues from rollout-side issues by holding the input data constant.

## Potential Solution

Add a `DebugConfig` to `PPOConfig` with two mutually exclusive flags and an optional path:
- `dump_rollout_data`: Record each step's full tensor batch to disk as `.pt` files.
- `replay_rollout_data`: Load batches from disk, bypassing rollout and inference engine initialization entirely.
- `path`: Optional custom directory for dump/replay files.
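As a rough illustration of the config shape, the sketch below uses plain dataclasses. The field names come from the list above; the defaults, the `debug` attribute name on `PPOConfig`, the `__post_init__` check, and the stand-in `PPOConfig` class itself are assumptions, not the existing AReaL definitions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DebugConfig:
    """Hypothetical shape of the proposed debug options."""

    dump_rollout_data: bool = False    # record each step's batch to disk
    replay_rollout_data: bool = False  # load batches from disk, skip rollout
    path: Optional[str] = None         # custom directory for dump/replay files

    def __post_init__(self) -> None:
        # Enforce that the two modes are mutually exclusive.
        if self.dump_rollout_data and self.replay_rollout_data:
            raise ValueError(
                "dump_rollout_data and replay_rollout_data cannot both be enabled"
            )


@dataclass
class PPOConfig:
    """Stand-in for the real PPOConfig; only the new field is shown."""

    debug: DebugConfig = field(default_factory=DebugConfig)
```

Defaulting both flags to `False` would keep existing configs working unchanged, which matches the backward-compatibility item in the checklist.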
