Skip to content

[misc] fix: harden chat template prompt inference#6529

Open
anzhsoft wants to merge 1 commit into
verl-project:mainfrom
anzhsoft:fix-6500-6501-chat-template-system-prompt
Open

[misc] fix: harden chat template prompt inference#6529
anzhsoft wants to merge 1 commit into
verl-project:mainfrom
anzhsoft:fix-6500-6501-chat-template-system-prompt

Conversation

@anzhsoft
Copy link
Copy Markdown
Contributor

Validate rendered token structure before inferring implicit system prompts, fall back to alternating-role probes when consecutive users are invalid, and extract generation prompts by common prefix so final-token replacement templates keep the assistant prompt masked.

Fixes #6500

Fixes #6501

What does this PR do?

This PR hardens initialize_system_prompt and extract_system_prompt_and_generation for chat templates whose rendered token structure is not compatible with the current length-difference heuristic.

The previous logic assumed:

  1. Rendering [user, user] is always valid.
  2. Rendering two empty user turns is append-only relative to one empty user turn.
  3. add_generation_prompt=True appends the assistant prompt after the no-generation render.

These assumptions break for several official chat templates:

  • Alternating-role templates can reject [user, user], causing initialization to fail.
  • Templates with conversation-level final tokens can make ordinary role headers look like an implicit system prompt.
  • Templates such as Phi-3 can replace the final no-generation token with the assistant generation prompt, so slicing with token3[len(token1):] drops the assistant prompt.

The fix validates the rendered token structure before inferring an implicit system prompt, falls back to a valid alternating-role probe when consecutive users are invalid, and extracts the generation prompt by common prefix instead of assuming append-only behavior.

Checklist Before Starting

Test

PYTHONPATH=. pytest tests/utils/test_chat_template_on_cpu.py -q
9 passed

ruff check verl/utils/chat_template.py tests/utils/test_chat_template_on_cpu.py
passed

ruff format --check verl/utils/chat_template.py tests/utils/test_chat_template_on_cpu.py
passed

pre-commit run ruff --files verl/utils/chat_template.py tests/utils/test_chat_template_on_cpu.py
passed

pre-commit run ruff-format --files verl/utils/chat_template.py tests/utils/test_chat_template_on_cpu.py
passed

API and Usage Example

No public API change.

initialize_system_prompt(...) still returns list[int].

extract_system_prompt_and_generation(...) still returns:

system_prompt, generation_prompt

Design & Code Changes

  • Add a small tokenizer rendering helper to normalize apply_chat_template(..., tokenize=True) outputs.
  • Replace raw length-difference system prompt inference with structure validation.
  • Preserve the historical [user, user] probe when it is valid.
  • Add a fallback alternating-role probe using [user, assistant, user] for templates that reject consecutive users.
  • Avoid inferring system prompts when the observed render is not append-compatible.
  • Support common final-token suffixes while still avoiding role-header misclassification.
  • Extract generation prompt by common prefix so templates that replace final tokens still mask the assistant role prompt correctly.
  • Add CPU-only regression tests covering:
    • append-only templates,
    • alternating-role templates,
    • templates with final conversation tokens,
    • non-append-only templates that should return no system prompt,
    • generation prompts that replace final tokens.

Checklist Before Submitting

  • Read the Contribute Guide.
  • Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
    • Targeted ruff pre-commit hooks were run on changed files and passed; full all-files pre-commit was not run locally.
  • Add / Update the documentation.
    • Not applicable: this is an internal bug fix with no user-facing API change.
  • Add unit or end-to-end test(s) to the CI workflow to cover all the code.
    • Added CPU-only regression tests in tests/utils/test_chat_template_on_cpu.py.
  • Once your PR is ready for CI, send a message in the ci-request channel.
  • If your PR is related to the recipe submodule, update the submodule reference.
    • Not applicable.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces robust system prompt and generation prompt inference helpers in verl/utils/chat_template.py to handle various tokenizer behaviors, such as alternating roles and common final tokens. It also adds comprehensive unit tests in tests/utils/test_chat_template_on_cpu.py to validate these changes. The reviewer suggested simplifying both _common_suffix_len and _common_prefix_len to make them more Pythonic by using zip and reversed instead of manual indexing.

Comment thread verl/utils/chat_template.py
Comment thread verl/utils/chat_template.py
Validate rendered token structure before inferring implicit system prompts, fall back to alternating-role probes when consecutive users are invalid, and extract generation prompts by common prefix so final-token replacement templates keep the assistant prompt masked.

Fixes verl-project#6500

Fixes verl-project#6501

Assisted-by: OpenAI Codex

Signed-off-by: anzhsoft <anzhsoft@gmail.com>
@anzhsoft anzhsoft force-pushed the fix-6500-6501-chat-template-system-prompt branch from 12c3e14 to 432ca8b Compare May 29, 2026 03:28
@anzhsoft
Copy link
Copy Markdown
Contributor Author

Updated in the latest push. Both helpers now use zip-based iteration while preserving the existing behavior, and the regression tests still pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

1 participant