[misc] fix: harden chat template prompt inference by anzhsoft · Pull Request #6529 · verl-project/verl

anzhsoft · 2026-05-29T03:15:57Z

Validate rendered token structure before inferring implicit system prompts, fall back to alternating-role probes when consecutive users are invalid, and extract generation prompts by common prefix so final-token replacement templates keep the assistant prompt masked.

Fixes #6500

Fixes #6501

What does this PR do?

This PR hardens initialize_system_prompt and extract_system_prompt_and_generation for chat templates whose rendered token structure is not compatible with the current length-difference heuristic.

The previous logic assumed:

Rendering [user, user] is always valid.
Rendering two empty user turns is append-only relative to one empty user turn.
add_generation_prompt=True appends the assistant prompt after the no-generation render.

These assumptions break for several official chat templates:

Alternating-role templates can reject [user, user], causing initialization to fail.
Templates with conversation-level final tokens can make ordinary role headers look like an implicit system prompt.
Templates such as Phi-3 can replace the final no-generation token with the assistant generation prompt, so slicing with token3[len(token1):] drops the assistant prompt.

The fix validates the rendered token structure before inferring an implicit system prompt, falls back to a valid alternating-role probe when consecutive users are invalid, and extracts the generation prompt by common prefix instead of assuming append-only behavior.

Checklist Before Starting

Search for similar PRs. Paste at least one query link here:
Format the PR title as [{modules}] {type}: {description}

Test

PYTHONPATH=. pytest tests/utils/test_chat_template_on_cpu.py -q
9 passed

ruff check verl/utils/chat_template.py tests/utils/test_chat_template_on_cpu.py
passed

ruff format --check verl/utils/chat_template.py tests/utils/test_chat_template_on_cpu.py
passed

pre-commit run ruff --files verl/utils/chat_template.py tests/utils/test_chat_template_on_cpu.py
passed

pre-commit run ruff-format --files verl/utils/chat_template.py tests/utils/test_chat_template_on_cpu.py
passed

API and Usage Example

No public API change.

initialize_system_prompt(...) still returns list[int].

extract_system_prompt_and_generation(...) still returns:

system_prompt, generation_prompt

Design & Code Changes

Add a small tokenizer rendering helper to normalize apply_chat_template(..., tokenize=True) outputs.
Replace raw length-difference system prompt inference with structure validation.
Preserve the historical [user, user] probe when it is valid.
Add a fallback alternating-role probe using [user, assistant, user] for templates that reject consecutive users.
Avoid inferring system prompts when the observed render is not append-compatible.
Support common final-token suffixes while still avoiding role-header misclassification.
Extract generation prompt by common prefix so templates that replace final tokens still mask the assistant role prompt correctly.
Add CPU-only regression tests covering:
- append-only templates,
- alternating-role templates,
- templates with final conversation tokens,
- non-append-only templates that should return no system prompt,
- generation prompts that replace final tokens.

Checklist Before Submitting

Read the Contribute Guide.
Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
- Targeted ruff pre-commit hooks were run on changed files and passed; full all-files pre-commit was not run locally.
Add / Update the documentation.
- Not applicable: this is an internal bug fix with no user-facing API change.
Add unit or end-to-end test(s) to the CI workflow to cover all the code.
- Added CPU-only regression tests in tests/utils/test_chat_template_on_cpu.py.
Once your PR is ready for CI, send a message in the ci-request channel.
If your PR is related to the recipe submodule, update the submodule reference.
- Not applicable.

gemini-code-assist

Code Review

This pull request introduces robust system prompt and generation prompt inference helpers in verl/utils/chat_template.py to handle various tokenizer behaviors, such as alternating roles and common final tokens. It also adds comprehensive unit tests in tests/utils/test_chat_template_on_cpu.py to validate these changes. The reviewer suggested simplifying both _common_suffix_len and _common_prefix_len to make them more Pythonic by using zip and reversed instead of manual indexing.

Validate rendered token structure before inferring implicit system prompts, fall back to alternating-role probes when consecutive users are invalid, and extract generation prompts by common prefix so final-token replacement templates keep the assistant prompt masked. Fixes verl-project#6500 Fixes verl-project#6501 Assisted-by: OpenAI Codex Signed-off-by: anzhsoft <anzhsoft@gmail.com>

anzhsoft · 2026-05-29T03:29:56Z

Updated in the latest push. Both helpers now use zip-based iteration while preserving the existing behavior, and the regression tests still pass.

gemini-code-assist Bot reviewed May 29, 2026

View reviewed changes

Comment thread verl/utils/chat_template.py

Comment thread verl/utils/chat_template.py

anzhsoft force-pushed the fix-6500-6501-chat-template-system-prompt branch from 12c3e14 to 432ca8b Compare May 29, 2026 03:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[misc] fix: harden chat template prompt inference#6529

[misc] fix: harden chat template prompt inference#6529
anzhsoft wants to merge 1 commit into
verl-project:mainfrom
anzhsoft:fix-6500-6501-chat-template-system-prompt

anzhsoft commented May 29, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Uh oh!

anzhsoft commented May 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

anzhsoft commented May 29, 2026

What does this PR do?

Checklist Before Starting

Test

API and Usage Example

Design & Code Changes

Checklist Before Submitting

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

anzhsoft commented May 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant