[Proposal] Position AgentCube as a Stateful, Isolated, Concurrent Rollout Execution Layer for Agentic RL and Verifiable Agentic Tasks #267

### Background

Many emerging agent workloads are no longer single-turn prompt/response tasks. They are **multi-step, stateful, sandboxed executions** that require:

- isolated execution environments,
- persistent per-run state and workspace,
- concurrent fan-out of multiple rollouts,
- and automatic evaluation through tests, scripts, or environment checks.

This is especially relevant for:

- verifiable agentic tasks such as SWE-bench-style software engineering tasks,
- best-of-N agent evaluation,
- trajectory collection,
- and full RL / post-training pipelines where rollout execution is a core stage.

AgentCube already appears well aligned with this direction through its sandbox-oriented, stateful execution model. This suggests an important positioning:

> **AgentCube can serve as a stateful, isolated, concurrent rollout execution layer for agentic RL and other verifiable agentic tasks.**

### Proposal

The proposal is to make this role explicit.

AgentCube should be positioned as the **execution substrate** responsible for:

- launching and managing one sandbox per rollout,
- preserving per-rollout workspace and session state,
- supporting concurrent rollout fan-out,
- collecting rollout traces, artifacts, and execution outcomes,
- and enabling automated evaluators to score results.

In this model, AgentCube is **not the RL trainer itself**. It serves as the rollout layer of the RL loop, connecting:

- task input,
- agent execution,
- environment interaction,
- and evaluator / reward generation.

This also naturally supports non-training scenarios such as best-of-N evaluation and verifiable agent benchmarking.

### Example Workflow

A minimal example is a SWE-bench-style software engineering task:

1. A task includes a repository snapshot, issue description, evaluation script, and execution limits.
2. The same task is expanded into **N rollouts**.
3. AgentCube launches **one isolated sandbox per rollout**.
4. The agent performs multi-step execution inside each sandbox.
5. Intermediate files, logs, and workspace state remain local to that rollout.
6. An evaluator script runs automatically and returns pass/fail, score, and execution metrics.
7. The system compares rollout outcomes such as success rate, best-of-N result, runtime, and artifact quality.

This already demonstrates clear value even before full policy optimization is added.
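The workflow above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: a temporary directory stands in for a real isolated sandbox, the agent and evaluator are plain subprocess commands, and every name here (`run_rollout`, `fan_out`, `summarize`, `RolloutResult`, the `ROLLOUT_ID` environment variable) is hypothetical rather than part of any AgentCube API.

```python
import os
import subprocess
import sys
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class RolloutResult:
    rollout_id: int
    passed: bool
    runtime_s: float


def run_rollout(rollout_id, agent_cmd, eval_cmd, timeout_s):
    """Run one rollout in its own workspace, then run the evaluator there."""
    start = time.monotonic()
    env = {**os.environ, "ROLLOUT_ID": str(rollout_id)}
    # A temp dir is NOT a sandbox; it only keeps per-rollout files separate.
    with tempfile.TemporaryDirectory(prefix=f"rollout-{rollout_id}-") as workspace:
        try:
            subprocess.run(agent_cmd, cwd=workspace, env=env, timeout=timeout_s)
            evaluation = subprocess.run(eval_cmd, cwd=workspace, env=env, timeout=timeout_s)
            passed = evaluation.returncode == 0  # exit code 0 means "pass"
        except subprocess.TimeoutExpired:
            passed = False  # exceeding the execution limit counts as a failure
    return RolloutResult(rollout_id, passed, time.monotonic() - start)


def fan_out(n, agent_cmd, eval_cmd, timeout_s=60.0, max_workers=4):
    """Expand one task into n rollouts and run them concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(run_rollout, i, agent_cmd, eval_cmd, timeout_s)
            for i in range(n)
        ]
        return [f.result() for f in futures]  # results stay in rollout order


def summarize(results):
    """Compare rollout outcomes: pass rate, best-of-N success, mean runtime."""
    n = len(results)
    return {
        "n": n,
        "pass_rate": sum(r.passed for r in results) / n,
        "any_passed": any(r.passed for r in results),
        "mean_runtime_s": sum(r.runtime_s for r in results) / n,
    }


def demo():
    """Toy task: the 'agent' writes its rollout id; the 'evaluator' passes even ids."""
    agent = [sys.executable, "-c",
             "import os; open('out.txt', 'w').write(os.environ['ROLLOUT_ID'])"]
    evaluator = [sys.executable, "-c",
                 "import sys; sys.exit(int(open('out.txt').read()) % 2)"]
    return summarize(fan_out(4, agent, evaluator))
```

Calling `demo()` runs four toy rollouts, each writing into its own workspace, and returns a summary dictionary. In a real deployment the temporary directory and subprocess calls would be replaced by whatever sandbox lifecycle the execution layer actually exposes.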

### Why This Matters

This positioning makes AgentCube useful for two closely related layers of the stack:

**1. Verifiable agent execution**

- sandboxed multi-step tasks
- concurrent rollout evaluation
- best-of-N execution
- trajectory and artifact collection

**2. RL / post-training pipelines**

- rollout generation for GRPO / PPO-style methods
- preference data collection
- verifiable SFT data construction
- reward generation through sandbox execution and evaluation

In other words, AgentCube does not need to own the trainer to be a meaningful part of an agentic RL system. It can provide the rollout execution layer that those systems depend on.

Using sandbox execution results as the reward signal for GRPO-style training is a natural downstream integration, but the rollout layer is already valuable on its own.
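To make the downstream integration concrete, here is a rough sketch of how per-rollout evaluator scores for one task could be turned into group-relative advantages, the normalization commonly associated with GRPO-style methods. The function name and the `eps` parameter are illustrative assumptions, and the choice of population standard deviation is also an assumption, since implementations differ on that detail.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of rollouts for the same task.

    Each advantage is (reward - group mean) / (group std + eps).
    eps avoids division by zero when every rollout gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four rollouts of one task, scored pass (1.0) or fail (0.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

In this sketch, passing rollouts receive positive advantages and failing ones negative advantages. When all rollouts in a group score the same, every advantage is zero, so that task contributes no learning signal.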

### Suggested Capabilities to Discuss

To support this positioning, the following capabilities seem especially valuable:

- rollout-level identity and metadata
- per-rollout sandbox and workspace isolation
- trace collection for multi-step execution
- artifact and evaluator result collection
- concurrency and lifecycle visibility
- optional resume / checkpoint support
- a simple standardized result schema for verifiable tasks
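As a starting point for discussion, a standardized result record might look something like the following. Every field name and the set of status values are assumptions made up for illustration; none of them come from an existing AgentCube schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional

# Hypothetical status values for a finished rollout.
VALID_STATUSES = {"passed", "failed", "timeout", "error"}


@dataclass
class RolloutRecord:
    task_id: str                       # which task the rollout belongs to
    rollout_id: str                    # rollout-level identity
    status: str                        # one of VALID_STATUSES
    score: Optional[float] = None      # evaluator score, if any
    runtime_s: Optional[float] = None  # wall-clock execution time
    artifacts: List[str] = field(default_factory=list)      # artifact paths or URIs
    trace_uri: Optional[str] = None    # where the multi-step trace is stored
    metadata: Dict[str, str] = field(default_factory=dict)  # free-form labels

    def __post_init__(self):
        if self.status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")

    def to_json(self) -> str:
        """Serialize to a stable JSON string."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "RolloutRecord":
        """Rebuild a record from its JSON form."""
        return cls(**json.loads(payload))
```

A record such as `RolloutRecord(task_id="task-1", rollout_id="r0", status="passed", score=1.0)` round-trips through `to_json()` and `from_json()`, which keeps evaluators, dashboards, and trainers decoupled from the execution layer's internals.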
