[misc] fix: harden chat template prompt inference#6529
Open
anzhsoft wants to merge 1 commit into
Open
Conversation
Contributor
There was a problem hiding this comment.
Code Review
This pull request introduces robust system prompt and generation prompt inference helpers in verl/utils/chat_template.py to handle various tokenizer behaviors, such as alternating roles and common final tokens. It also adds comprehensive unit tests in tests/utils/test_chat_template_on_cpu.py to validate these changes. The reviewer suggested simplifying both _common_suffix_len and _common_prefix_len to make them more Pythonic by using zip and reversed instead of manual indexing.
Validate rendered token structure before inferring implicit system prompts, fall back to alternating-role probes when consecutive users are invalid, and extract generation prompts by common prefix so final-token replacement templates keep the assistant prompt masked. Fixes verl-project#6500 Fixes verl-project#6501 Assisted-by: OpenAI Codex Signed-off-by: anzhsoft <anzhsoft@gmail.com>
12c3e14 to
432ca8b
Compare
Contributor
Author
|
Updated in the latest push. Both helpers now use zip-based iteration while preserving the existing behavior, and the regression tests still pass. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Validate rendered token structure before inferring implicit system prompts, fall back to alternating-role probes when consecutive users are invalid, and extract generation prompts by common prefix so final-token replacement templates keep the assistant prompt masked.
Fixes #6500
Fixes #6501
What does this PR do?
This PR hardens
initialize_system_promptandextract_system_prompt_and_generationfor chat templates whose rendered token structure is not compatible with the current length-difference heuristic.The previous logic assumed:
[user, user]is always valid.add_generation_prompt=Trueappends the assistant prompt after the no-generation render.These assumptions break for several official chat templates:
[user, user], causing initialization to fail.token3[len(token1):]drops the assistant prompt.The fix validates the rendered token structure before inferring an implicit system prompt, falls back to a valid alternating-role probe when consecutive users are invalid, and extracts the generation prompt by common prefix instead of assuming append-only behavior.
Checklist Before Starting
[{modules}] {type}: {description}Test
PYTHONPATH=. pytest tests/utils/test_chat_template_on_cpu.py -q
9 passed
ruff check verl/utils/chat_template.py tests/utils/test_chat_template_on_cpu.py
passed
ruff format --check verl/utils/chat_template.py tests/utils/test_chat_template_on_cpu.py
passed
pre-commit run ruff --files verl/utils/chat_template.py tests/utils/test_chat_template_on_cpu.py
passed
pre-commit run ruff-format --files verl/utils/chat_template.py tests/utils/test_chat_template_on_cpu.py
passed
API and Usage Example
No public API change.
initialize_system_prompt(...) still returns list[int].
extract_system_prompt_and_generation(...) still returns:
system_prompt, generation_prompt
Design & Code Changes
Checklist Before Submitting