[Proposal] Position AgentCube as a Stateful, Isolated, Concurrent Rollout Execution Layer for Agentic RL and Verifiable Agentic Tasks #267

@acsoto

Background

Many emerging agent workloads are no longer single-turn prompt/response tasks. They are multi-step, stateful, sandboxed executions that require:

  • isolated execution environments,
  • persistent per-run state and workspace,
  • concurrent fan-out of multiple rollouts,
  • and automatic evaluation through tests, scripts, or environment checks.

This is especially relevant for:

  • verifiable agentic tasks such as SWE-bench-style software engineering tasks,
  • best-of-N agent evaluation,
  • trajectory collection,
  • and full RL / post-training pipelines where rollout execution is a core stage.

AgentCube's sandbox-oriented, stateful execution model already appears well aligned with these requirements, which suggests the following positioning:

AgentCube can serve as a stateful, isolated, concurrent rollout execution layer for agentic RL and other verifiable agentic tasks.

Proposal

The proposal is to make this role explicit.

AgentCube should be positioned as the execution substrate responsible for:

  • launching and managing one sandbox per rollout,
  • preserving per-rollout workspace and session state,
  • supporting concurrent rollout fan-out,
  • collecting rollout traces, artifacts, and execution outcomes,
  • and enabling automated evaluators to score results.

In this model, AgentCube is not the RL trainer itself. It serves as the rollout layer of the RL loop, connecting:

  • task input,
  • agent execution,
  • environment interaction,
  • and evaluator / reward generation.

This also naturally supports non-training scenarios such as best-of-N evaluation and verifiable agent benchmarking.

Example Workflow

A minimal example is a SWE-bench-style software engineering task:

  1. A task includes a repository snapshot, issue description, evaluation script, and execution limits.
  2. The same task is expanded into N rollouts.
  3. AgentCube launches one isolated sandbox per rollout.
  4. The agent performs multi-step execution inside each sandbox.
  5. Intermediate files, logs, and workspace state remain local to that rollout.
  6. An evaluator script runs automatically and returns pass/fail, score, and execution metrics.
  7. The system compares rollout outcomes such as success rate, best-of-N result, runtime, and artifact quality.
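
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not AgentCube's API: a temporary directory stands in for an isolated sandbox, a pre-written candidate file stands in for the agent's multi-step execution, and the names `run_rollout`, `fan_out`, `summarize`, and the `task` dictionary keys are all hypothetical.

```python
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_rollout(rollout_id: int, task: dict) -> dict:
    """Run one rollout in a private temporary directory (stand-in for a sandbox)."""
    with tempfile.TemporaryDirectory(prefix=f"rollout-{rollout_id}-") as workspace:
        ws = Path(workspace)
        # Stand-in for multi-step agent execution: write this rollout's candidate.
        (ws / "solution.py").write_text(task["candidates"][rollout_id])
        # The evaluator script is copied into, and run from, this workspace only.
        (ws / "check.py").write_text(task["eval_script"])
        try:
            proc = subprocess.run(
                [sys.executable, "check.py"],
                cwd=ws,
                capture_output=True,
                text=True,
                timeout=task["timeout_s"],
            )
            passed = proc.returncode == 0
        except subprocess.TimeoutExpired:
            passed = False  # exceeding the execution limit counts as a failure
        return {"rollout_id": rollout_id, "passed": passed}


def fan_out(task: dict) -> list[dict]:
    """Expand one task into one rollout per candidate and run them concurrently."""
    n = len(task["candidates"])
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(lambda i: run_rollout(i, task), range(n)))


def summarize(results: list[dict]) -> dict:
    """Compare rollout outcomes: success rate and whether best-of-N succeeded."""
    passed = sum(r["passed"] for r in results)
    return {"success_rate": passed / len(results), "best_of_n_passed": passed > 0}
```

Calling `summarize(fan_out(task))` with a task that holds N candidate solutions and one evaluation script returns the success rate across rollouts and whether at least one rollout passed. A real deployment would replace the temporary directory with an actual sandbox and the candidate file with live agent execution.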

This workflow is already useful for best-of-N selection and benchmarking, before any policy optimization is added.

Why This Matters

This positioning makes AgentCube useful for two closely related layers of the stack:

1. Verifiable agent execution

  • sandboxed multi-step tasks
  • concurrent rollout evaluation
  • best-of-N execution
  • trajectory and artifact collection

2. RL / post-training pipelines

  • rollout generation for GRPO / PPO-style methods
  • preference data collection
  • verifiable SFT data construction
  • reward generation through sandbox execution and evaluation

In other words, AgentCube does not need to own the trainer to be a meaningful part of an agentic RL system. It can provide the rollout execution layer that those systems depend on.

Training-oriented integrations, such as using sandbox execution results as the reward signal in GRPO-style workflows, are a natural downstream step, but the rollout layer is valuable on its own.
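
To make the downstream use concrete, here is a minimal sketch of how a trainer outside AgentCube might turn the evaluator scores of one rollout group into GRPO-style group-relative advantages. It assumes the common formulation in which each reward is normalized by the group's mean and standard deviation. The function name, the `eps` stabilizer, and the choice of population standard deviation are assumptions made for illustration, and trainer implementations differ in these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each rollout's reward against the other rollouts of the same task."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps avoids division by zero when every rollout received the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]
```

With pass/fail rewards, rollouts that pass receive positive advantages and those that fail receive negative ones. A group in which every rollout has the same outcome yields all zeros, so it contributes no learning signal.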

Suggested Capabilities to Discuss

To support this positioning, the following capabilities seem especially valuable:

  • rollout-level identity and metadata
  • per-rollout sandbox and workspace isolation
  • trace collection for multi-step execution
  • artifact and evaluator result collection
  • concurrency and lifecycle visibility
  • optional resume / checkpoint support
  • a simple standardized result schema for verifiable tasks
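
As a starting point for discussion, a standardized result schema could look like the sketch below. Every field name here is hypothetical and chosen only to mirror the capabilities listed above (rollout identity, evaluator result, artifacts, trace, lifecycle status). It is not an existing AgentCube type.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RolloutResult:
    """One record per rollout, serializable for evaluators and trainers."""

    task_id: str
    rollout_id: str
    passed: bool
    score: float
    runtime_s: float
    status: str = "completed"  # lifecycle state, e.g. "completed" or "timeout"
    artifacts: list[str] = field(default_factory=list)  # paths of collected files
    trace: list[dict] = field(default_factory=list)  # one entry per agent step

    def to_json(self) -> str:
        """Serialize with sorted keys so records are stable across runs."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "RolloutResult":
        """Rebuild a record from the output of to_json."""
        return cls(**json.loads(payload))
```

Keeping the record flat and JSON-serializable means the same output can feed best-of-N comparison, benchmarking reports, and reward computation without a per-consumer adapter.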
