[agent_loop, tool] fix: support hermes-format tool calls on gpt-oss tokenizer models by dafu-wu · Pull Request #6481 · verl-project/verl

dafu-wu · 2026-05-26T07:39:45Z

What does this PR do?

Fixes a crash in the multi-turn tool agent loop when using a model with a gpt-oss tokenizer that emits hermes-style <tool_call> tool calls.

Models based on a gpt-oss tokenizer but SFT-trained to emit hermes-style <tool_call> blocks crash during multi-turn rollout with:

jinja2.exceptions.TemplateError: Message has tool role, but there was no
previous assistant message with a tool call!

Root cause: format: hermes correctly parses <tool_call> and executes tools, but then calls apply_chat_template with a standard role: tool message. The gpt-oss jinja template rejects this format since it expects tool_calls as a structured attribute on the assistant message, not a separate role: tool entry.

Checklist Before Starting

Search for similar PRs:
- https://github.com/verl-project/verl/pulls?q=gpt-oss+tool+apply_chat_template
- https://github.com/verl-project/verl/pulls?q=tool_agent_loop+gpt-oss

Test

Reproduced the TemplateError crash by running multi-turn rollout with a gpt-oss-tokenizer model configured with format: hermes. After this fix, tool execution proceeds normally and tool results are encoded as gpt-oss channel tokens.

API and Usage Example

No API change. Behavior is automatically correct when a gpt-oss tokenizer is detected. Existing configs using format: hermes with gpt-oss tokenizer models will work without any change.

Design & Code Changes

verl/experimental/agent_loop/tool_agent_loop.py

The gpt-oss and gemma4 parsers already have dedicated manual tool response formatters that bypass apply_chat_template. The fix detects whether the loaded tokenizer is gpt-oss-style (presence of <|channel|> special token) and routes tool response encoding through build_gpt_oss_tool_response_text regardless of which parser is configured:

# __init__: detect gpt-oss tokenizer once
_channel_token_id = self.tokenizer.convert_tokens_to_ids("<|channel|>")
self._is_gpt_oss_tokenizer = (
    _channel_token_id is not None and _channel_token_id != self.tokenizer.unk_token_id
)

# _handle_processing_tools_state: route accordingly
if self.tool_parser_name == "gpt-oss" or self._is_gpt_oss_tokenizer:
    tool_response_text = build_gpt_oss_tool_response_text(add_messages, tool_call_names)
    ...

This decouples the tool call extraction format (tool_parser_name, e.g. hermes or gpt-oss) from the tool response encoding format (gpt-oss channel tokens vs. chat template), allowing models that mix gpt-oss framing with hermes-format tool outputs to work correctly.

Checklist Before Submitting

Read the Contribute Guide.
Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
Add / Update the documentation.
Add unit or end-to-end test(s) to the CI workflow to cover all the code. If not feasible, explain why: the crash requires a real gpt-oss model checkpoint with hermes-format SFT, which is not feasible in CI.
Once your PR is ready for CI, send a message in the ci-request channel.

…s tokenizer is detected When a model is based on a gpt-oss tokenizer but trained to emit hermes-style <tool_call> output, two incompatible things happen simultaneously: 1. format: hermes correctly parses <tool_call> blocks in the model output. 2. apply_chat_template raises TemplateError because the gpt-oss jinja template does not accept standard role:tool messages: "Message has tool role, but there was no previous assistant message with a tool call!" The gpt-oss path already handles this by calling build_gpt_oss_tool_response_text which manually encodes tool results as gpt-oss channel tokens, bypassing apply_chat_template entirely. Fix: detect the gpt-oss tokenizer at init time by checking for the <|channel|> special token. When detected, always use the manual gpt-oss tool response formatter regardless of the configured tool parser name, so models that output hermes-format tool calls but use a gpt-oss tokenizer do not crash.

gemini-code-assist

Code Review

This pull request introduces a check to detect the gpt-oss tokenizer by looking for the <|channel|> special token, ensuring that tool responses are manually formatted when this tokenizer is detected. The reviewer identified a critical issue where multimodal tool responses (which contain structured list content) are stringified as raw Python lists, corrupting the prompt format. A code suggestion was provided to extract and concatenate only the text parts from the content list before formatting.

dafu-wu requested review from ArronHZG and wuxibin89 as code owners May 26, 2026 07:39

gemini-code-assist Bot reviewed May 26, 2026

View reviewed changes

Comment thread verl/experimental/agent_loop/tool_agent_loop.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[agent_loop, tool] fix: support hermes-format tool calls on gpt-oss tokenizer models#6481

[agent_loop, tool] fix: support hermes-format tool calls on gpt-oss tokenizer models#6481
dafu-wu wants to merge 1 commit into
verl-project:mainfrom
dafu-wu:fix/tool-agent-loop-gpt-oss-hermes-compat-v2

dafu-wu commented May 26, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

dafu-wu commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Checklist Before Starting

Test

API and Usage Example

Design & Code Changes

Checklist Before Submitting

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

dafu-wu commented May 26, 2026 •

edited

Loading