
fix: rollout version dump - filter by loss_mask and add version_rle #1350

Open
pyq623 wants to merge 1 commit into areal-project:main from pyq623:fix/rollout-version-dump

pyq623 commented May 19, 2026

Summary

  • head_version/tail_version now computed per-sample, filtered by loss_mask==1 only
  • Fixes head_version always being -1 due to input token placeholders (-1) polluting min()

Context

When dumping rollout trajectories, the versions tensor stores -1 as a placeholder for input tokens (system prompt, tool results). As a result, the previous head_version = min(versions) always returned -1, which made cross-version detection impossible.

Now only model-generated tokens (where loss_mask == 1) are considered:

output_versions = [v for v, m in zip(sample_versions, mask) if m == 1]
head_version = min(output_versions)
tail_version = max(output_versions)
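A minimal, self-contained sketch of the before/after behavior, assuming one sample's versions and loss mask are plain Python lists of equal length and that input tokens (prompt, tool results) carry the -1 placeholder with a mask of 0. The helper name is hypothetical and is not the code in areal/infra/workflow_executor.py:

```python
def head_tail_versions(sample_versions, mask):
    """Return (head, tail) over model-generated tokens only (loss_mask == 1).

    Hypothetical helper for illustration. Assumes at least one token has
    mask == 1; min()/max() raise ValueError on an empty list otherwise.
    """
    output_versions = [v for v, m in zip(sample_versions, mask) if m == 1]
    return min(output_versions), max(output_versions)


# Prompt tokens, generated tokens (v2), a tool result, then generated tokens (v3).
versions = [-1, -1, 2, 2, -1, -1, 3, 3]
mask = [0, 0, 1, 1, 0, 0, 1, 1]

print(min(versions))                       # -1: the old, polluted head_version
print(head_tail_versions(versions, mask))  # (2, 3)
```

Interleaved tool-result tokens are excluded the same way as the prompt, because the filter relies only on the mask and not on token position.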

The new version_rle field run-length encodes the versions of the generated tokens, which records version transitions during generation. For example, [[2, 63632], [3, 13147]] means 63632 tokens were generated by model v2, followed by 13147 tokens generated by v3.
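The encoding can also be sketched with itertools.groupby from the standard library. This is an illustrative alternative to the explicit loop in the diff, and the function names are hypothetical. Like the loop, it merges only consecutive equal versions. The inverse shows that the encoding is lossless:

```python
from itertools import groupby


def version_rle(output_versions):
    """Run-length encode a per-token version list into [[version, count], ...]."""
    return [[v, len(list(run))] for v, run in groupby(output_versions)]


def expand_rle(rle):
    """Inverse of version_rle: rebuild the per-token version list."""
    return [v for v, count in rle for _ in range(count)]


tokens = [2] * 63632 + [3] * 13147
rle = version_rle(tokens)
print(rle)  # [[2, 63632], [3, 13147]]
```

The counts in the encoding always sum to the number of generated (loss-masked) tokens in the sample.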

Test plan

  • Verified on a live experiment (youyi-mini-swe-rl trial29): head_version correctly shows real version numbers
  • Cross-version detection works: head != tail when model weights are updated mid-rollout
  • version_rle accurately captures version transitions
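The cross-version check from the test plan can also be read directly off version_rle. The following is a small sketch, not part of the PR, that assumes version_rle was built from loss-masked tokens with consecutive equal versions merged:

```python
def summarize_rle(version_rle):
    """Return (head_version, tail_version, crossed) from a [[version, count], ...] list.

    Hypothetical helper. Because adjacent runs always differ in version,
    crossed (head != tail) is equivalent to len(version_rle) > 1.
    """
    versions = [v for v, _ in version_rle]
    head_version, tail_version = min(versions), max(versions)
    return head_version, tail_version, head_version != tail_version


print(summarize_rle([[2, 63632], [3, 13147]]))  # (2, 3, True)
print(summarize_rle([[3, 500]]))                # (3, 3, False)
```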

- head_version/tail_version now per-sample, filtered by loss_mask==1
- fixes head_version always being -1 due to input token placeholders
- adds version_rle field (run-length encoded per-token version list)

gemini-code-assist (bot) left a comment

Code Review

This pull request updates the _dump_trajectory function in areal/infra/workflow_executor.py to provide more granular version tracking, including the addition of a Run-Length Encoding (version_rle) for model versions within each trajectory sample. Feedback from the review suggests filtering the global_tail calculation by the loss mask to prevent placeholder values from being used in directory names. Additionally, the reviewer pointed out a correctness issue where the version_rle should reflect the total generated token count when explicit version data is missing, and recommended using tensor operations for better efficiency.

Comment on lines +877 to +881
global_tail = (
all_versions.max().item()
if all_versions is not None
else default_version
)

medium

The global_tail calculation should also be filtered by loss_mask to ensure that the version directory name is not set to -1 (the placeholder for input tokens) if the versions tensor contains such values. This ensures the directory name reflects the actual model version used for generation.

Suggested change (replaces the global_tail assignment above):

if all_versions is not None:
    # Filter by loss_mask to avoid -1 placeholders in the version directory name
    valid_versions = all_versions[loss_mask == 1]
    global_tail = (
        int(valid_versions.max().item())
        if valid_versions.numel() > 0
        else default_version
    )
else:
    global_tail = default_version
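To show what the suggested change computes without the surrounding code, here is a list-based sketch. The function name and the list-of-lists layout are assumptions that stand in for the batched tensors all_versions and loss_mask. The fallback mirrors the numel() > 0 guard: reducing an empty tensor with max() raises an error in PyTorch, and the list version has the same issue with an empty list.

```python
def masked_global_tail(all_versions, loss_mask, default_version):
    """Max version over loss-masked tokens in a batch, else default_version.

    Hypothetical list-based stand-in for all_versions[loss_mask == 1].max();
    each row is one sample's per-token versions / mask.
    """
    if all_versions is None:
        return default_version
    valid_versions = [
        v
        for row_versions, row_mask in zip(all_versions, loss_mask)
        for v, m in zip(row_versions, row_mask)
        if m == 1
    ]
    return max(valid_versions) if valid_versions else default_version


batch_versions = [[-1, -1, 4, 5], [-1, 5, 6, -1]]
batch_mask = [[0, 0, 1, 1], [0, 1, 1, 0]]
print(masked_global_tail(batch_versions, batch_mask, 0))  # 6
print(masked_global_tail([[-1, -1]], [[0, 0]], 7))        # 7 (an unmasked max gives -1)
print(masked_global_tail(None, None, 7))                  # 7
```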

Comment on lines +901 to +918
if all_versions is not None:
    sample_versions = all_versions[i, :seqlen].tolist()
    output_versions = [
        v for v, m in zip(sample_versions, mask) if m == 1
    ]
else:
    output_versions = [default_version]

head_version = min(output_versions) if output_versions else -1
tail_version = max(output_versions) if output_versions else -1

# RLE: [[version, count], ...]
version_rle = []
for v in output_versions:
    if version_rle and version_rle[-1][0] == v:
        version_rle[-1][1] += 1
    else:
        version_rle.append([v, 1])

medium

There are two issues here:

  1. Correctness: When all_versions is missing, version_rle is set to [[default_version, 1]], which incorrectly reports only one token for the entire completion. It should reflect the total number of generated tokens (i.e., sum(mask)).
  2. Efficiency: The list comprehension for filtering versions can be replaced with more efficient tensor operations.

Additionally, the if output_versions checks are redundant because the preceding if mask[-1] != 1: continue guarantees that at least one token with loss_mask == 1 is present.

                if all_versions is not None:
                    # Filter versions by loss_mask using tensor operations for efficiency
                    output_versions = all_versions[i, :seqlen][loss_mask[i, :seqlen] == 1].tolist()
                    head_version = min(output_versions)
                    tail_version = max(output_versions)
                    
                    # RLE: [[version, count], ...]
                    version_rle = []
                    for v in output_versions:
                        if version_rle and version_rle[-1][0] == v:
                            version_rle[-1][1] += 1
                        else:
                            version_rle.append([v, 1])
                else:
                    head_version = tail_version = default_version
                    # If versions are missing, the entire completion is assumed to be default_version
                    version_rle = [[default_version, int(sum(mask))]]
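The per-sample logic with the reviewer's fallback applied can be sketched in plain Python. The function name is hypothetical, and lists stand in for the slices all_versions[i, :seqlen] and loss_mask[i, :seqlen]. As the reviewer notes, the caller is assumed to skip samples without generated tokens, so min() and max() always see a non-empty list:

```python
from itertools import groupby


def summarize_sample(sample_versions, mask, default_version):
    """Return (head_version, tail_version, version_rle) for one sample.

    Assumes mask contains at least one 1. If sample_versions is None, every
    generated token is attributed to default_version, so the single RLE
    run's count equals the number of generated tokens, not 1.
    """
    if sample_versions is None:
        n_generated = sum(1 for m in mask if m == 1)
        return default_version, default_version, [[default_version, n_generated]]
    output_versions = [v for v, m in zip(sample_versions, mask) if m == 1]
    version_rle = [[v, len(list(run))] for v, run in groupby(output_versions)]
    return min(output_versions), max(output_versions), version_rle


print(summarize_sample(None, [0, 0, 1, 1, 1], 4))                # (4, 4, [[4, 3]])
print(summarize_sample([-1, -1, 2, 3, 3], [0, 0, 1, 1, 1], 0))  # (2, 3, [[2, 1], [3, 2]])
```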
