init commit for external agent framework+gateway #25

Open
zackcxb wants to merge 22 commits into verl-project:main from zackcxb:gateway_framework_pr

@zackcxb zackcxb commented May 18, 2026

What does this PR do?

This PR adds a trainer-side agent framework and gateway runtime for multi-turn agent-style rollout in uni-agent, as a downstream integration of verl RFC #5790 and the upstream agent framework PR verl#6299.

Specifically, it:

  • adds uni_agent.trainer.framework: the AgentFramework abstract base, the OpenAICompatibleAgentFramework concrete implementation, and AgentFrameworkRolloutAdapter (satisfies the trainer's agent_loop_manager_class extension point; recipes wire it in via YAML with no per-recipe glue),
  • adds uni_agent.trainer.gateway: _GatewayActor / GatewayManager / GatewayServingRuntime for OpenAI-compatible session serving, sticky session routing, tool-parser wiring, and multimodal media accumulation; backend routing delegates to LLMServerClient,
  • adds a deepeyes gateway recipe under examples/deepeyes/,
  • adds CPU tests covering framework contract, gateway actor / manager behavior, session runtime lifecycle, and multimodal postprocess.

Wave 2 additions (length budget enforcement + OpenAI parity):

  • Rollout prompt_length / response_length budget injected into _GatewayActor; continuation turns clamped to remaining budget; budget-exhausted turns materialise a synthetic finish_reason=length response without hitting the backend.
  • All error paths return a unified OpenAI-spec error body ({"error": {"message": …, "type": …, "code": …}}); encode/decode failures caught and surfaced as 400.
  • Per-request chat_template_kwargs forwarded to apply_chat_template; reasoning_content preserved through _normalize_message and prefix comparison.
  • Unsupported OpenAI capabilities (n>1, response_format, tool_choice=required/function) rejected with 400; tool_choice="none" supported (skips tool injection and parser).
  • verl submodule bumped to upstream 3c5f6e04 (verl PR #6129: move LLMServerManager out of AgentLoopManager) so reviewers can git submodule update --init without access to a private fork.

Checklist Before Starting

Test

PYTHONPATH=$(pwd) pytest tests/uni_agent/trainer/ -q

Result: 64 passed, 6 warnings (framework, gateway, runtime, multimodal postprocess).

Real-rollout evidence from the deepeyes gateway recipe: a 50-step GRPO run on multi-turn multimodal data (Qwen3.5-4B, 7× RTX 3090 train + 1× local judge) produced a real learning curve — critic/rewards/mean moved from ~0.21 at step 1 to ~1.86 by step 50.

API and Usage Example

Public APIs added:

  • uni_agent.trainer.framework: AgentFramework, OpenAICompatibleAgentFramework, AgentFrameworkRolloutAdapter, build_agent_framework
  • uni_agent.trainer.gateway: GatewayServingRuntime, GatewayManager, GatewayActor

Minimum viable wiring via YAML config:

actor_rollout_ref:
  rollout:
    agent:
      agent_loop_manager_class: uni_agent.trainer.framework.entry.AgentFrameworkRolloutAdapter
    custom:
      agent_framework:
        framework_class_fqn: uni_agent.trainer.framework.framework.OpenAICompatibleAgentFramework
        gateway_count: 1

The adapter calls build_agent_framework() which wires GatewayServingRuntime and the framework subclass from config. The agent runner only needs the gateway base URL:

async def agent_runner(*, raw_prompt, session_runtime, sample_index, **_):
    await run_external_agent(
        base_url=f"http://127.0.0.1:{session_runtime.port}",
        raw_prompt=raw_prompt,
    )

generate_sequences() writes finalized trajectories directly to TransferQueue with key "{uid}_{session_id}_{index}", matching AgentLoopWorkerTQ._agent_loop_postprocess()'s field / tag layout.

Design & Code Changes

High-level changes:

  • AgentFramework base class + OpenAICompatibleAgentFramework own session orchestration (create_session → agent_runner → finalize_session), trajectory assembly, multimodal post-processing, reward scoring, and TransferQueue writes. Per-session failures are isolated via asyncio.gather(..., return_exceptions=True) so one bad session does not cancel the rest of the batch.
  • _GatewayActor provides OpenAI Chat Completions over sticky sessions with prefix-consistency checks, tool-parser decoding, multimodal media accumulation, and rollout budget enforcement. GatewayManager routes new sessions by least-active count. GatewayServingRuntime owns gateway actor lifecycle and delegates backend routing to LLMServerClient.
  • Multimodal trajectory post-process builds trainer-consumable multi_modal_inputs and (4, seq_len) position ids inside the framework, so VLM sessions do not need per-recipe glue.
  • AgentFrameworkRolloutAdapter satisfies the trainer's agent_loop_manager_class contract; every recipe wires the same class in YAML — no per-recipe adapter code.
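
The per-session failure isolation described above can be illustrated with a minimal sketch. The `run_session` coroutine is a hypothetical stand-in for the create_session → agent_runner → finalize_session sequence, not the framework's actual method; only the `asyncio.gather(..., return_exceptions=True)` call comes from the description.

```python
import asyncio


async def run_session(session_id: int) -> str:
    # Hypothetical stand-in for create_session -> agent_runner -> finalize_session.
    if session_id == 1:
        raise RuntimeError("agent crashed")
    await asyncio.sleep(0)
    return f"trajectory-{session_id}"


async def run_batch(session_ids: list[int]) -> tuple[list[str], list[BaseException]]:
    # With return_exceptions=True a raised exception becomes a result value,
    # so the batch call reports every session's outcome instead of raising
    # on the first failure.
    results = await asyncio.gather(
        *(run_session(sid) for sid in session_ids), return_exceptions=True
    )
    ok = [r for r in results if not isinstance(r, BaseException)]
    failed = [r for r in results if isinstance(r, BaseException)]
    return ok, failed


ok, failed = asyncio.run(run_batch([0, 1, 2]))
```

Here the failed session shows up in `failed` while the two healthy sessions still deliver their trajectories in `ok`.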

WIP / Follow-up

  • GatewayActor default placement strategy (at least one per node) once multi-node validation is in
  • Fully Async support

Checklist Before Submitting

  • Read the Contribute Guide (if present).
  • Add unit tests to cover all new code — 64 CPU tests included, following the *_on_cpu.py naming convention.
  • Apply pre-commit checks: pre-commit install && pre-commit run --all-files
  • Add / update documentation — deferred to a follow-up; inline docstrings ship with this PR.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a comprehensive agent framework and gateway system designed to facilitate agentic workflows within a training environment. Key components include a factory for constructing frameworks, an OpenAI-compatible framework implementation that manages sequence generation and trajectory logging, and a gateway system that provides an OpenAI-compatible API for agent interactions. The gateway handles session lifecycle, trajectory buffering, and multimodal data processing. Feedback identifies a critical issue with the incremental token encoding logic in the gateway, which may produce malformed sequences due to assumptions about tokenizer stability and turn separators. Further recommendations include parallelizing reward calculations to improve performance and replacing blocking ray.get calls with asynchronous operations to avoid event loop starvation.

Comment thread uni_agent/trainer/gateway/gateway.py Outdated
Comment on lines +412 to +453
    def _encode_incremental(
        self,
        messages: list[dict[str, Any]],
        image_data: list[Any] | None = None,
        video_data: list[Any] | None = None,
    ) -> list[int]:
        """Encode incremental messages (tool results, user follow-ups) for a continuation turn.

        Uses the remove_system_prompt pattern from ToolAgentLoop: encode the new messages
        alone (which prepends a system prompt), then strip the known system_prompt prefix.
        No tools parameter; tool schema is already in the initial prompt_ids.
        """
        if self._processor is not None:
            raw_prompt = _apply_chat_template(
                self._processor,
                messages,
                add_generation_prompt=True,
                tokenize=False,
                **self._apply_chat_template_kwargs,
            )
            videos = video_data
            video_metadata = None
            if videos is not None:
                videos, video_metadata = zip(*videos, strict=False)
                videos, video_metadata = list(videos), list(video_metadata)
            model_inputs = self._processor(
                text=[raw_prompt],
                images=image_data,
                videos=videos,
                video_metadata=video_metadata,
                return_tensors="pt",
                do_sample_frames=False,
            )
            ids = normalize_token_ids(model_inputs["input_ids"])
        else:
            ids = normalize_token_ids(
                _apply_chat_template(
                    self._tokenizer, messages, add_generation_prompt=True,
                    **self._apply_chat_template_kwargs,
                )
            )
        return ids[len(self._system_prompt):]

high

The incremental encoding logic is fragile and likely to produce malformed token sequences. Slicing tokens based on the length of a pre-encoded system prompt assumes that the tokenizer is prefix-stable and that the chat template doesn't insert turn separators or special tokens between the system prompt and the first message. Furthermore, concatenating these incremental IDs to the previous turn's response IDs (at line 542) will miss the necessary turn separators (e.g., <|im_end|> and <|im_start|>user) required by most chat templates. It is safer to re-encode the full message history and identify the delta, or simply rely on the backend's prefix caching by sending the full prompt.
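
A minimal sketch of the "re-encode the full history and identify the delta" alternative follows. The function name `incremental_ids` and the toy token lists are illustrative assumptions; in practice `full_ids` would come from applying the chat template to the whole message history. The sketch also assumes that a prefix mismatch (for example, when re-tokenizing a generated response yields different ids) should be surfaced rather than silently sliced.

```python
def incremental_ids(previous_ids: list[int], full_ids: list[int]) -> list[int]:
    """Return the tokens that full_ids adds on top of previous_ids."""
    if full_ids[: len(previous_ids)] != previous_ids:
        # The re-encoded history does not extend what was already recorded,
        # so slicing by length would produce a malformed sequence.
        raise ValueError("re-encoded history is not a prefix extension")
    return full_ids[len(previous_ids):]


# Toy token ids: the trajectory so far, and the full history re-encoded
# after a tool result was appended (separators included by the template).
previous = [1, 2, 3]
full = [1, 2, 3, 4, 5]
delta = incremental_ids(previous, full)
```

Because the delta is taken from a full re-encode, any turn separators the chat template inserts between the previous response and the new message are part of `delta` by construction.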

Comment thread uni_agent/trainer/framework/framework.py Outdated
gateway_actor_kwargs["backend"] = self

self.owned_gateway_actors = [GatewayActor.remote(**gateway_actor_kwargs) for _ in range(gateway_count)]
ray.get([gateway.start.remote() for gateway in self.owned_gateway_actors])

medium

Using ray.get inside an async context (called via build_agent_framework) will block the event loop, preventing other concurrent tasks from making progress. Since a helper _await_ray_ref is already defined in this file, you should consider moving the gateway startup logic to an async initialization method that can be awaited, rather than performing blocking calls in the constructor.
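
The suggested shape, a constructor that does no blocking work plus an awaitable initializer, can be sketched without Ray. Everything here is a stand-in: `blocking_start` plays the role of a blocking startup call, `Runtime` is a hypothetical class, and `asyncio.to_thread` is used only to keep the example self-contained. With Ray itself, object references can typically be awaited directly in async code, which is presumably what the existing `_await_ray_ref` helper wraps.

```python
import asyncio
import time


def blocking_start(gateway_id: int) -> int:
    # Stand-in for a blocking startup call such as ray.get(...).
    time.sleep(0.01)
    return gateway_id


class Runtime:
    def __init__(self, gateway_count: int) -> None:
        # The constructor only records configuration; nothing blocks here.
        self.gateway_count = gateway_count
        self.started: list[int] = []

    async def start(self) -> None:
        # Blocking calls run in worker threads, so the event loop stays
        # free to schedule other tasks while gateways start.
        self.started = await asyncio.gather(
            *(asyncio.to_thread(blocking_start, i) for i in range(self.gateway_count))
        )


async def main() -> list[int]:
    runtime = Runtime(gateway_count=2)
    await runtime.start()
    return runtime.started


started = asyncio.run(main())
```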

@wangtiance

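Why is this placed under the trainer directory? I think this is a black-box invocation flow that is shared by training and inference. I would prefer to move it up one level and put it directly in uni_agent/framework and uni_agent/gateway.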

Overwrite stale init-commit code with latest verl gateway-framework-pr-source
(4605deb4). Key changes since init:
- Inline reward scoring (remove callable abstraction)
- Gateway round-robin placement, finish_reason map, tool parse tolerance
- Zero-fill rollout_log_probs/rm_scores for trainer compatibility
- Session concurrency cap (max_concurrent_sessions)
- Delete helpers.py (inlined into framework.py)
- Entry.py now includes AgentFrameworkRolloutAdapter

Import paths rewritten: verl.agent.{framework,gateway} -> uni_agent.trainer.{framework,gateway}
External verl imports (verl.utils.*, verl.workers.*) kept as-is (accessed via submodule).

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
@zackcxb zackcxb force-pushed the gateway_framework_pr branch from bee0d08 to ef46265 Compare May 21, 2026 07:56
zackcxb and others added 2 commits May 24, 2026 14:29
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
return DataProto(batch=batch, non_tensor_batch=non_tensor_batch)


class OpenAICompatibleAgentFramework(AgentFramework):
Collaborator

Move OpenAICompatibleAgentFramework into a separate file, keep abstract interface only.

Author

moved the abstract interface to base.py

zackcxb and others added 2 commits May 27, 2026 11:27
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread uni_agent/trainer/framework/entry.py Outdated
@zackcxb zackcxb force-pushed the gateway_framework_pr branch from 2da5be1 to 825b7f3 Compare May 27, 2026 11:55
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread uni_agent/trainer/gateway/runtime.py Outdated
@zackcxb zackcxb force-pushed the gateway_framework_pr branch from 825b7f3 to 9c7c97a Compare May 27, 2026 13:16
@zackcxb zackcxb marked this pull request as ready for review May 28, 2026 03:14
@zackcxb zackcxb force-pushed the gateway_framework_pr branch from 2dc9a0f to aa08c58 Compare May 28, 2026 03:33
zackcxb added 8 commits May 28, 2026 04:07
…fixes from PR verl-project#5

Three independent fixes carried over from PR verl-project#5 (zqz/main d700136) so this PR is self-contained:

A. _normalize_message now parses tool_calls[i].function.arguments from JSON string to dict. The OpenAI spec defines arguments as a JSON string, but Qwen-style chat templates iterate via |items and require a dict; without this conversion multi-turn tool calling breaks for those templates.

B. compute_position_ids returns the text-only path when multi_modal_inputs is empty, avoiding a multimodal codepath crash on pure-text samples in mixed-modality runs.

C. OpenAICompatibleAgentFramework injects session_runtime as a kwarg into agent_runner so blackbox runners (mini-swe-agent) can drive session lifecycle directly. deepeyes_agent_runner already accepts **kwargs and is unaffected.
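
Fix A can be illustrated with a small sketch. The helper name `normalize_tool_call_arguments` is hypothetical (the PR does this inside `_normalize_message`), and treating an empty arguments string as an empty dict is an assumption of this example.

```python
import json
from typing import Any


def normalize_tool_call_arguments(message: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of message whose tool-call arguments are dicts, not JSON strings."""
    normalized = dict(message)
    tool_calls = []
    for call in message.get("tool_calls") or []:
        function = dict(call.get("function", {}))
        arguments = function.get("arguments")
        if isinstance(arguments, str):
            # OpenAI clients send arguments as a JSON-encoded string;
            # templates that iterate with |items need a mapping instead.
            function["arguments"] = json.loads(arguments) if arguments else {}
        tool_calls.append({**call, "function": function})
    if tool_calls:
        normalized["tool_calls"] = tool_calls
    return normalized


message = {
    "role": "assistant",
    "tool_calls": [
        {"id": "c1", "type": "function", "function": {"name": "zoom", "arguments": '{"x": 1}'}}
    ],
}
normalized = normalize_tool_call_arguments(message)
```

Malformed JSON raises `json.JSONDecodeError` in this sketch; a gateway would presumably surface that as a 400 response.
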
Plumb actor_rollout_ref.rollout.prompt_length and response_length from
hydra config through entry.py into _GatewayActor as runtime constants.
No enforcement yet; wave2 follow-up commit enforces continuation budget
and clamps request max_tokens against the response budget.

…est max_tokens

Mirrors ToolAgentLoop:358-359 budget semantics. When continuation incremental tokens would push the trajectory past response_length, the gateway materializes the current state as a length-stop trajectory (extra_fields.finish_reason='length') and returns an OpenAI-compatible finish_reason='length' response without invoking the backend.

Also clamps request-level max_tokens to (response_length - current_response_mask_len) so request overrides cannot exceed training response budget. The framework TQ writer copies finish_reason from extra_fields into the TQ tag for downstream filtering. Length-stop trajectories keep status='success' to align with VERL native ToolAgentLoop semantics; mask/drop policy is deferred to framework layer.
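
The budget arithmetic above can be sketched as follows. Both function names are hypothetical, `used` stands for the current response-mask length, and the synthetic response contains only the fields needed to show `finish_reason='length'`; the real gateway also checks whether the incremental continuation tokens themselves fit, which this simplified sketch omits.

```python
from typing import Any, Optional


def clamp_max_tokens(requested: Optional[int], response_length: int, used: int) -> int:
    """Return how many tokens the next turn may generate (0 means the budget is exhausted)."""
    remaining = max(response_length - used, 0)
    if requested is None:
        return remaining
    return min(requested, remaining)


def length_stop_response() -> dict[str, Any]:
    # Synthetic OpenAI-style completion returned without calling the backend.
    return {
        "object": "chat.completion",
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": ""},
                "finish_reason": "length",
            }
        ],
    }


budget = clamp_max_tokens(requested=512, response_length=1024, used=900)
response = length_stop_response() if budget == 0 else None
```
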
All gateway error responses now use the OpenAI/vLLM-compatible top-level body shape {error: {message, type, code, param}}. A FastAPI exception handler installed in _register_routes rewrites HTTPException raises into this shape; business code uses idiomatic raise HTTPException(...) and relies on the handler for body shape.

Backend ValueError raises 400 invalid_request_error; other backend exceptions raise 500 internal_server_error. ContextWindow keyword detection (litellm whitelist alignment) is deferred to verl source PR V4.
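
A minimal sketch of the error body shape and the exception mapping described above. The helper names are hypothetical, and two details are assumptions of this example rather than statements about the PR: using the HTTP status as the `code` value, and classifying every 4xx status as `invalid_request_error`. In the PR the body is produced by a FastAPI exception handler rather than a standalone function.

```python
from typing import Any, Optional


def openai_error_body(status_code: int, message: str, param: Optional[str] = None) -> dict[str, Any]:
    # Assumption: 4xx maps to invalid_request_error, everything else to internal_server_error.
    error_type = "invalid_request_error" if 400 <= status_code < 500 else "internal_server_error"
    return {"error": {"message": message, "type": error_type, "code": status_code, "param": param}}


def error_for_backend_exception(exc: Exception) -> tuple[int, dict[str, Any]]:
    # ValueError from the backend is treated as a client error; anything else as a server error.
    status = 400 if isinstance(exc, ValueError) else 500
    return status, openai_error_body(status, str(exc))


status, body = error_for_backend_exception(ValueError("prompt too long"))
```
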
…ntent

Per-request chat_template_kwargs in payload are now merged with the actor-init self._apply_chat_template_kwargs and passed to verl.utils.chat_template.apply_chat_template. Per-request values override actor-init defaults so reasoning models can switch enable_thinking / reasoning_effort etc. on a per-call basis.
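
The override order can be shown in a few lines. The function name is hypothetical, and `enable_thinking` / `reasoning_effort` are used only because the text above mentions them.

```python
from typing import Any


def merge_chat_template_kwargs(defaults: dict[str, Any], payload: dict[str, Any]) -> dict[str, Any]:
    # Later keys win in a dict merge, so per-request values override actor-init defaults.
    per_request = payload.get("chat_template_kwargs") or {}
    return {**defaults, **per_request}


actor_defaults = {"enable_thinking": True, "reasoning_effort": "medium"}
request_payload = {"messages": [], "chat_template_kwargs": {"enable_thinking": False}}
merged = merge_chat_template_kwargs(actor_defaults, request_payload)
```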

_normalize_message now allow-lists reasoning_content so multi-turn reasoning history (Qwen3.5 chat_template.jinja:90-99 reads message.reasoning_content) survives gateway-level message normalization. _canonicalize_message_for_prefix_comparison automatically picks up the new field.

Note: this is input-side reasoning support only. Output-side reasoning parsing (gateway _decode_response splitting reasoning_content from content) and verl finish_reason raw-preservation (V2) are deferred to follow-up work; reasoning models still need agent-side parsing of content -> reasoning_content for now.
…hoice=none

Per Q5c (AReaL-equivalent boundary): stream=true falls back to non-streaming with a warning so existing OpenAI SDK clients with unconditional stream=False/True flags do not break. n!=1 / response_format / tool_choice="required" / tool_choice with a specific function return 400 invalid_request_error because the gateway cannot honor those semantics without verl source PR adding grammar / multi-sample passthrough.

tool_choice="none" is now supported: tools are not injected into the chat template, and the existing tool-parser branch in _decode_response naturally skips parsing when tools is None.

Defaults / unset tool_choice continue to behave as before.
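
The accept/reject rules above can be sketched as a pure validation step. Function names and error messages are hypothetical; the sketch returns a message that the caller would turn into a 400 invalid_request_error, and it leaves out the stream=true fallback.

```python
from typing import Any, Optional


def validate_request(payload: dict[str, Any]) -> Optional[str]:
    """Return an error message for unsupported options, or None if the request is acceptable."""
    if payload.get("n") not in (None, 1):
        return "n != 1 is not supported"
    if payload.get("response_format") is not None:
        return "response_format is not supported"
    tool_choice = payload.get("tool_choice")
    # A dict tool_choice names a specific function, which is also rejected.
    if tool_choice == "required" or isinstance(tool_choice, dict):
        return "tool_choice must be 'auto', 'none', or unset"
    return None


def effective_tools(payload: dict[str, Any]) -> Optional[list]:
    # tool_choice="none": tools are not injected, so tool parsing is skipped as well.
    if payload.get("tool_choice") == "none":
        return None
    return payload.get("tools")


tools = [{"type": "function", "function": {"name": "zoom"}}]
error = validate_request({"messages": [], "n": 2})
skipped = effective_tools({"tools": tools, "tool_choice": "none"})
```
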
zackcxb added 4 commits May 28, 2026 04:07
Trim 6 low-signal tests, merge 2 parametrize groups:
- Delete: test_gateway_actor_accepts_and_stores_rollout_budget (e2e covers)
- Delete: test_normalize_request_context_preserves_multimodal_blocks (identity pass-through)
- Delete: test_decode_response_preserves_unknown_stop_reasons (hypothetical backend value)
- Delete: test_gateway_serving_runtime_complete_session_forwards_reward_info (duplicate)
- Delete: test_canonicalize_for_prefix_comparison_includes_reasoning_content (private impl detail)
- Delete: test_run_session_passes_session_runtime_to_agent_runner (private impl detail)
- Merge sampling-params allowlist tests into parametrized test_gateway_actor_allowlist_filters_sampling_params
- Merge trajectory-split trigger tests into parametrized test_gateway_actor_context_change_splits_trajectory

64 tests pass.

Auto-fixed by ruff: import sort (I001), quoted annotations (UP037),
line length (E501), and ruff-format reformatting across changed files.
mypy and compileall pass.

… config

Remove hardware-specific and experiment-specific fields that don't belong
in a community recipe: model path, batch sizes, fsdp offload flags, sglang
tuning knobs, checkpoint config, and trainer experiment names.

Keeps only recipe-relevant fields, matching the style of
verl/recipe/deepeyes/configs/deepeyes_multiturn_grpo.yaml.
@zackcxb zackcxb force-pushed the gateway_framework_pr branch from a5e6007 to a7e392b Compare May 28, 2026 04:08
@zackcxb zackcxb force-pushed the gateway_framework_pr branch from a7e392b to 9677db1 Compare May 28, 2026 08:09

yyDing1 commented May 28, 2026

The current entry point binds a single runner via agent_runner_fqn + agent_runner_kwargs. This works for a single-task recipe like DeepEyes, but it doesn't scale to multi-task rollout.

We may introduce an AgentRunner abstract base with a minimal run() contract:

# uni_agent/trainer/framework/runner.py
from abc import ABC, abstractmethod
from typing import Any

class AgentRunner(ABC):
    name: str = ""

    @abstractmethod
    async def run(
        self,
        *,
        raw_prompt: list[dict],
        session: SessionHandle,
        session_runtime: SessionRuntime,
        sample_index: int,
        tools_kwargs: dict[str, Any] | None = None,
    ) -> None:
        ...

Each sample carries the runner name; config mounts a name → runner map.

The config could be in the following format:

# agent_runner.yaml
- name: deepeyes
  _target_: examples.agent_train.deepeyes_gateway.runner.DeepEyesAgentRunner
  max_turns: 5
  tools:
    - name: image_zoom_in_tool
      config_path: examples/agent_train/deepeyes_gateway/configs/image_zoom_in_tool_config.yaml

- name: swe
  _target_: examples.agent_train.swe_gateway.runner.SweAgentRunner
  max_turns: 50
  env:
    deployment:
      type: vefaas
      command: ...
  tools:
    - name: str_replace_editor
    - name: execute_bash

Then the framework resolves the runner per-session by sample["agent_runner_name"], like:

# framework.py:_run_session
runner = self._runners_by_name[sample_fields["agent_runner_name"]]
await runner.run(
    raw_prompt=raw_prompt,
    session=session,
    session_runtime=self.session_runtime,
    sample_index=sample_index,
    tools_kwargs=sample_fields.get("tools_kwargs"),
)

This could be similar to verl's existing agent_loop_config pattern, and we can adopt the same shape here.
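
One way the name → runner map could be built from such a config is sketched below. `resolve_target` and `build_runners` are hypothetical helpers, and `types.SimpleNamespace` stands in for real runner classes so the example runs on its own; a Hydra-based setup would more likely use `hydra.utils.instantiate` for the `_target_` entries.

```python
import importlib
from typing import Any


def resolve_target(fqn: str) -> Any:
    # Split "package.module.ClassName" into module path and attribute name.
    module_name, _, attr = fqn.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def build_runners(entries: list[dict[str, Any]]) -> dict[str, Any]:
    runners: dict[str, Any] = {}
    for entry in entries:
        name = entry["name"]
        if name in runners:
            raise ValueError(f"duplicate runner name: {name}")
        # Everything except name and _target_ is passed to the constructor.
        kwargs = {k: v for k, v in entry.items() if k not in ("name", "_target_")}
        runners[name] = resolve_target(entry["_target_"])(**kwargs)
    return runners


# SimpleNamespace is a stand-in target; real entries would name AgentRunner subclasses.
config = [
    {"name": "deepeyes", "_target_": "types.SimpleNamespace", "max_turns": 5},
    {"name": "swe", "_target_": "types.SimpleNamespace", "max_turns": 50},
]
runners_by_name = build_runners(config)
runner = runners_by_name["swe"]
```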


@self._app.post("/sessions/{session_id}/v1/chat/completions")
async def _chat_completions(session_id: str, request: Request):
    payload = await request.json()

        finish_reason = _FINISH_REASON_MAP.get(stop_reason, stop_reason) if stop_reason else "stop"
        return {"role": "assistant", "content": response_text}, finish_reason

    async def _handle_chat_completions(self, session_id: str, payload: dict[str, Any]) -> JSONResponse:
@wuxibin89 (Collaborator) commented May 28, 2026

Separate gateway functionality: http request -> SessionState.encode -> LLMServerClient.generate -> SessionState.decode -> http response. Gateway should only handle http related request/response.
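
The proposed layering can be sketched with stand-ins. `SessionState` here is a hypothetical class with a toy character-level encoding, `EchoBackend` stands in for LLMServerClient, and the handler is a plain coroutine rather than a FastAPI route; only the encode → generate → decode order comes from the comment.

```python
import asyncio
from typing import Any


class SessionState:
    """Hypothetical per-session state that owns tokenization and history."""

    def __init__(self) -> None:
        self.token_ids: list[int] = []

    def encode(self, messages: list[dict[str, Any]]) -> list[int]:
        # Toy encoding: one token per character of the message contents.
        self.token_ids = [ord(ch) for m in messages for ch in m["content"]]
        return self.token_ids

    def decode(self, output_ids: list[int]) -> dict[str, str]:
        # Record the generated tokens, then turn them back into a message.
        self.token_ids = self.token_ids + output_ids
        return {"role": "assistant", "content": "".join(chr(i) for i in output_ids)}


class EchoBackend:
    """Stand-in for LLMServerClient; returns the prompt tokens reversed."""

    async def generate(self, prompt_ids: list[int]) -> list[int]:
        return prompt_ids[::-1]


async def handle_chat_completions(
    payload: dict[str, Any], state: SessionState, backend: EchoBackend
) -> dict[str, Any]:
    # The HTTP layer only unpacks the request and shapes the response.
    prompt_ids = state.encode(payload["messages"])
    output_ids = await backend.generate(prompt_ids)
    message = state.decode(output_ids)
    return {"choices": [{"index": 0, "message": message, "finish_reason": "stop"}]}


state = SessionState()
response = asyncio.run(
    handle_chat_completions({"messages": [{"role": "user", "content": "abc"}]}, state, EchoBackend())
)
```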

gxlvera commented May 28, 2026

Hi, I would like to propose using a prefix trie for multi-trajectory storage in the agent gateway. My RFC is here: #51
This approach could address the following limitations of the current implementation:

  • Single active branch only: A session keeps one message_history and one active trajectory. When switching sub-agents, picking a resample path, or returning to an older branch, new requests cannot reattach to historical branches. A trie keeps every branch; incoming messages longest-prefix-match against any path and continue from there.
  • Repeated encoding of shared prefixes: Message/token prefixes shared across trajectories are re-materialized and re-tokenized on every branch switch. A trie stores checkpoints on shared nodes; later calls clone from the matched node and tokenize incrementally.
  • No concurrent inference: One shared state requires a generation lock and serial LLM calls. With a trie, each call owns a cloned branch state; tokenize and commit can interleave—supporting sub-agents, best-of-n, etc.
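
A minimal sketch of the longest-prefix-match idea follows. It is not the RFC's design: messages are reduced to plain strings, the per-node checkpoint is just a cumulative token count, and the class and method names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict[str, "TrieNode"] = field(default_factory=dict)
    token_count: int = 0  # hypothetical checkpoint: tokens encoded up to this node


class TrajectoryTrie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, messages: list[str], tokens_per_message: list[int]) -> None:
        node, total = self.root, 0
        for message, n_tokens in zip(messages, tokens_per_message):
            total += n_tokens
            # Shared prefixes reuse existing nodes; new branches add children.
            node = node.children.setdefault(message, TrieNode())
            node.token_count = total

    def longest_prefix(self, messages: list[str]) -> tuple[int, TrieNode]:
        """Return how many leading messages match a stored path, and the node reached."""
        node, depth = self.root, 0
        for message in messages:
            child = node.children.get(message)
            if child is None:
                break
            node, depth = child, depth + 1
        return depth, node


trie = TrajectoryTrie()
trie.insert(["sys", "user-1", "asst-1"], [10, 5, 7])
trie.insert(["sys", "user-1", "asst-1b"], [10, 5, 9])

# A request that continues the first branch reattaches at depth 3.
depth, node = trie.longest_prefix(["sys", "user-1", "asst-1", "tool-1"])
```

A request continuing the second branch would match `asst-1b` instead, so both branches stay addressable, and only the unmatched suffix (`tool-1` here) needs to be tokenized.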

For detailed explanation, please also refer to this comment: verl-project/verl#6299 (comment)

6 participants