framework__eval: sandboxed scratchpad against the live container (#79)

## Goal

Build `framework__eval`, a sandboxed scratchpad MCP tool (with a `bin/altair eval` CLI equivalent). It executes a short PHP snippet **inside the project's live container**, returns a structured result, and tears down without persisting state. It is the agent's "let me check" primitive.

## Why

An agent may form a hypothesis such as "does `UserRepository::findByEmail` return `null` or throw when there is no match?" or "what does `$formatNegotiator->getContentTypeByFormat('json')` actually return?". The cheapest way to validate it is to run a few lines of PHP. Today that means writing a temporary script or a `php -r` one-liner, knowing where the autoloader lives, and bootstrapping the container by hand. Agents do that poorly.

A scratchpad tool collapses all of this:

```php
// agent calls framework__eval with this snippet
$users = container(UserRepository::class);
return $users->findByEmail('does-not-exist@example.com');
```

Returns `null`. Hypothesis validated in 200ms. No file written, no test created.

## How it works

The tool spawns a fresh, sandboxed PHP process that is separate from the agent's own session. The process loads the project's autoloader and boots the container. The snippet runs inside a wrapper that:

1. Provides a `container()` helper that resolves services from the framework Container
2. Captures the `return` value, stdout, and any throwable
3. Enforces a configurable wall-clock timeout (default 5s)
4. Returns the structured output as JSON

```json
{
  "result": {
    "type": "null",
    "value": null
  },
  "stdout": "",
  "stderr": "",
  "duration_ms": 187,
  "memory_peak_bytes": 4194304,
  "exception": null
}
```

Or on exception:

```json
{
  "result": null,
  "exception": {
    "class": "App\\User\\UserNotFoundException",
    "message": "No user with email 'x'",
    "file": "src/App/User/UserRepository.php",
    "line": 42,
    "stack_trace": ["..."]
  },
  "duration_ms": 134
}
```

The agent reads `result.value` or `exception.class` and learns the answer without touching the workspace.
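On the parent side, the runner has to collect both output streams and enforce the wall clock itself. The PHP manual notes that `max_execution_time` does not count time spent in system calls, stream operations or database queries (except on Windows). A snippet blocked on I/O would therefore never trip it. Below is a minimal sketch of a `SubprocessRunner` core. It assumes PHP 8.0+ on a POSIX host, and `runWithTimeout` and its return keys are hypothetical names. Passing the command as an array makes `proc_open` start PHP directly rather than through a shell, so the `SIGKILL` reaches the PHP process itself.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch of SubprocessRunner: run a command in a fresh process,
// collect stdout and stderr, and SIGKILL it once the wall-clock limit passes.
function runWithTimeout(array $command, int $timeoutMs): array
{
    $spec = [0 => ['pipe', 'r'], 1 => ['pipe', 'w'], 2 => ['pipe', 'w']];
    // An array command (PHP 7.4+) is started directly, without a shell
    $proc = proc_open($command, $spec, $pipes);
    if (!is_resource($proc)) {
        throw new RuntimeException('Failed to start the eval subprocess');
    }
    fclose($pipes[0]); // the snippet gets no stdin
    stream_set_blocking($pipes[1], false);
    stream_set_blocking($pipes[2], false);

    $output = [1 => '', 2 => ''];
    $start = hrtime(true);
    $timedOut = false;

    while (true) {
        // Keep polling whichever pipes have not reached EOF yet
        $read = array_values(array_filter([$pipes[1], $pipes[2]], fn ($p) => !feof($p)));
        if ($read === []) {
            break; // both pipes closed: the child has exited
        }
        $elapsedMs = intdiv(hrtime(true) - $start, 1_000_000);
        if ($elapsedMs >= $timeoutMs) {
            proc_terminate($proc, 9); // 9 = SIGKILL, which cannot be caught or ignored
            $timedOut = true;
            break;
        }
        $write = null;
        $except = null;
        // Wake at least every 100ms so the deadline is checked promptly
        $waitUs = min(($timeoutMs - $elapsedMs) * 1000, 100_000);
        if (stream_select($read, $write, $except, 0, $waitUs) === false) {
            break;
        }
        foreach ($read as $stream) {
            $fd = $stream === $pipes[1] ? 1 : 2;
            $output[$fd] .= (string) stream_get_contents($stream);
        }
    }
    fclose($pipes[1]);
    fclose($pipes[2]);
    $exitCode = proc_close($proc); // waits for and reaps the child, so no zombie remains

    return [
        'stdout' => $output[1],
        'stderr' => $output[2],
        'exit_code' => $exitCode,
        'timed_out' => $timedOut,
        'duration_ms' => intdiv(hrtime(true) - $start, 1_000_000),
    ];
}
```

A call such as `runWithTimeout([PHP_BINARY, $wrapperPath], 5000)` hands back raw stdout plus the stderr payload to decode. Here `$wrapperPath` is a hypothetical path to the generated wrapper.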

## Guard rails

This is the most dangerous tool in the MCP palette: eval is eval. The guard rails:

1. **Always runs in a fresh subprocess.** No state contamination between calls. No persistent container.
2. **Read-only DB connection by default.** A separate `--writes` flag enables writes; the MCP server respects the same `--allow-writes` boot flag from #69.
3. **Filesystem sandbox.** The subprocess uses a temp directory as its working directory. `open_basedir` blocks file access outside the project tree, including calls such as `file_put_contents`.
4. **No network by default.** A separate `--network` flag enables outbound HTTP.
5. **Time limit.** Default 5s wall clock; max 60s. Hard kill via `SIGKILL`.
6. **Memory limit.** Default 128MB; max 512MB.
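7. **No `exec`/`shell_exec`/`passthru`/`system` available.** These are disabled via `disable_functions`. Disabling `shell_exec` also disables the backtick operator.
8. **No `eval` available inside the snippet.** `eval` is a language construct, not a function, so `disable_functions` cannot switch it off. Snippets that contain it have to be rejected before execution instead.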
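The `disable_functions`, `open_basedir` and memory rules can all be passed as `-d` overrides when the runner starts PHP. The sketch below shows what a `SecurityProfile` might emit. It assumes PHP 8.0+, and both function names are hypothetical. Adding the temp working directory to `open_basedir` is also an assumption: without it, relative paths from that working directory would be blocked. The `eval` check scans tokens because `disable_functions` cannot reach a language construct. It only catches a literal `eval`, so a snippet that writes PHP to a file and `include`s it still gets through.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch of SecurityProfile: guard rails as `-d` ini overrides
// that the runner places between the php binary and the wrapper path.
function securityIniArgs(string $projectRoot, string $tempDir, int $memoryMb = 128): array
{
    $memoryMb = max(1, min($memoryMb, 512)); // default 128MB, hard ceiling 512MB
    // Disabling shell_exec also disables the backtick operator
    $disabled = ['exec', 'shell_exec', 'passthru', 'system'];

    return [
        '-d', 'disable_functions=' . implode(',', $disabled),
        // Assumption: the temp working directory is allowed alongside the project
        '-d', 'open_basedir=' . $projectRoot . PATH_SEPARATOR . $tempDir,
        '-d', 'memory_limit=' . $memoryMb . 'M',
    ];
}

// `eval` is a language construct, so disable_functions cannot remove it.
// This scan rejects a literal `eval` before the snippet is ever executed.
function snippetUsesEval(string $snippet): bool
{
    foreach (PhpToken::tokenize('<?php ' . $snippet) as $token) {
        if ($token->is(T_EVAL)) {
            return true;
        }
    }
    return false;
}
```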

The `--unsafe` flag lifts every guard rail at once. It exists for cases where the agent, with explicit user confirmation, needs to do something the sandbox forbids. Every use is logged to `.altair/events.jsonl` (#76).
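Appending one JSON object per line keeps the log cheap to write and easy to tail. The sketch below assumes the event fields (`type`, `at`, `snippet`) are placeholders until #76 settles the schema, and `logUnsafeEval` is a hypothetical name. Because #76 is only a soft dependency, the sketch skips the write when the log directory cannot be created rather than raising an error.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch: record each --unsafe eval as one JSON line in
// .altair/events.jsonl. Field names are placeholders pending #76.
function logUnsafeEval(string $projectRoot, string $snippet): void
{
    $dir = $projectRoot . '/.altair';
    if (!is_dir($dir) && !@mkdir($dir, 0775, true) && !is_dir($dir)) {
        return; // soft dependency: a missing log never blocks the eval itself
    }
    $event = [
        'type' => 'eval.unsafe',
        'at' => date(DATE_ATOM),
        // json_encode escapes newlines, so a multi-line snippet stays on one line
        'snippet' => $snippet,
    ];
    // FILE_APPEND + LOCK_EX: concurrent writers cannot interleave partial lines
    file_put_contents(
        $dir . '/events.jsonl',
        json_encode($event, JSON_UNESCAPED_SLASHES) . "\n",
        FILE_APPEND | LOCK_EX
    );
}
```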

## API surface

### CLI

```bash
bin/altair eval 'return container(UserRepository::class)->count();'
bin/altair eval --file=snippet.php
bin/altair eval --timeout=10s 'return ...;'
bin/altair eval --writes 'container(EntityManager::class)->flush();'
bin/altair eval --network 'return file_get_contents("https://...");'
bin/altair eval --json 'return ...;'   # JSON output (default for MCP)
```

The default output is human-readable: a heredoc-style summary with the result type, duration, peak memory and, on exception, the trace. `--json` emits the same structure the MCP tool returns.
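The CLI takes human-friendly durations (`--timeout=10s`), while the MCP tool takes `timeout_ms`, so the command needs a small conversion step. The sketch below assumes `ms` and `s` are the only suffixes and that a bare number means milliseconds. `parseTimeoutMs` is a hypothetical helper. It clamps values above the 60s ceiling rather than rejecting them, a design choice the spec leaves open.

```php
<?php
declare(strict_types=1);

// Hypothetical CLI helper: "10s" -> 10000, "500ms" -> 500, bare "750" -> 750.
function parseTimeoutMs(?string $raw, int $defaultMs = 5000, int $maxMs = 60000): int
{
    if ($raw === null || $raw === '') {
        return $defaultMs; // no --timeout given
    }
    if (!preg_match('/^(\d+)(ms|s)?$/', $raw, $m)) {
        throw new InvalidArgumentException("Invalid --timeout value: {$raw}");
    }
    // An unmatched optional suffix group is absent from $m, hence the ?? ''
    $ms = ($m[2] ?? '') === 's' ? (int) $m[1] * 1000 : (int) $m[1];
    if ($ms < 1) {
        throw new InvalidArgumentException('--timeout must be positive');
    }
    return min($ms, $maxMs); // clamp to the 60s ceiling
}
```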

### MCP tool

```json
{
  "name": "framework__eval",
  "description": "Execute a PHP snippet against the live container, return structured result",
  "inputSchema": {
    "type": "object",
    "properties": {
      "snippet": { "type": "string" },
      "timeout_ms": { "type": "integer", "default": 5000, "maximum": 60000 },
      "allow_writes": { "type": "boolean", "default": false },
      "allow_network": { "type": "boolean", "default": false }
    },
    "required": ["snippet"]
  }
}
```
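Before anything is spawned, the tool has to apply the schema's defaults and limits. The sketch below assumes `allow_writes` is honoured only when the server itself was started with `--allow-writes` (#69), which is one reading of how the two flags interact. `normalizeEvalArgs` and the returned keys are hypothetical.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch of EvalTool argument handling: defaults, limits, and
// gating allow_writes on the server-level --allow-writes boot flag.
function normalizeEvalArgs(array $args, bool $serverAllowsWrites): array
{
    $snippet = $args['snippet'] ?? null;
    if (!is_string($snippet) || trim($snippet) === '') {
        throw new InvalidArgumentException('snippet must be a non-empty string');
    }
    $timeout = $args['timeout_ms'] ?? 5000;
    if (!is_int($timeout) || $timeout < 1) {
        throw new InvalidArgumentException('timeout_ms must be a positive integer');
    }
    return [
        'snippet' => $snippet,
        'timeout_ms' => min($timeout, 60000), // never above the 60s ceiling
        // Writes need both the per-call flag and the server-level permission
        'allow_writes' => ($args['allow_writes'] ?? false) === true && $serverAllowsWrites,
        'allow_network' => ($args['allow_network'] ?? false) === true,
    ];
}
```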

## Capturing the return value

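A snippet's top-level `return` has nowhere to go on its own, because PHP only hands such a value back to an `include` call. The wrapper therefore inlines the snippet inside a closure and captures what the closure returns: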

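```php
<?php
// generated wrapper
require __DIR__ . '/../vendor/autoload.php';
// ... container boot and the container() helper go here ...

$payload = ['result' => null, 'exception' => null];
try {
    $result = (function () {
        // user snippet inserted here
        return container(UserRepository::class)->count();
    })();
    $payload['result'] = \Altair\Eval\Encoder\ValueEncoder::encode($result);
} catch (\Throwable $e) {
    // thrown exceptions become the structured "exception" field, not a fatal error
    $payload['exception'] = \Altair\Eval\Encoder\ExceptionEncoder::encode($e);
}
$payload['memory_peak_bytes'] = memory_get_peak_usage(true);
file_put_contents('php://stderr', json_encode($payload));
exit($payload['exception'] === null ? 0 : 1);
```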

Stderr is reserved for the structured result so stdout remains usable for `echo`/`print` capture.
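Catching `\Throwable` is not enough on its own. Exhausting `memory_limit` raises an `E_ERROR` that ends the script, so without a handler there is no clean error to report. One option is a shutdown function that inspects `error_get_last()` and writes an exception-shaped payload to stderr. In this sketch, `fatalErrorPayload` and the `FatalError` class label are hypothetical. The reserved buffer is freed first so the handler has memory to work with.

```php
<?php
declare(strict_types=1);

// Maps error_get_last() output to the exception shape, or null if nothing fatal happened.
function fatalErrorPayload(?array $error): ?array
{
    $fatal = E_ERROR | E_PARSE | E_CORE_ERROR | E_COMPILE_ERROR;
    if ($error === null || ($error['type'] & $fatal) === 0) {
        return null;
    }
    return [
        'result' => null,
        'exception' => [
            'class' => 'FatalError', // hypothetical label: fatals carry no Throwable class
            'message' => $error['message'],
            'file' => $error['file'],
            'line' => $error['line'],
            'stack_trace' => [],
        ],
    ];
}

// In the generated wrapper, before the snippet runs:
$reserve = str_repeat(' ', 64 * 1024); // released on shutdown so the handler can allocate
register_shutdown_function(function () use (&$reserve): void {
    $reserve = null;
    $payload = fatalErrorPayload(error_get_last());
    if ($payload !== null) {
        file_put_contents('php://stderr', json_encode($payload));
    }
});
```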

`ValueEncoder` (namespace `Altair\Eval\Encoder`) produces the typed JSON form:

```json
{ "type": "object", "class": "App\\User\\User", "id": 42 }
{ "type": "array", "value": [...] }
{ "type": "string", "value": "..." }
{ "type": "null", "value": null }
{ "type": "iterable", "preview": [...], "exhausted": false, "size_hint": 100 }
```

Objects render their `__debugInfo()` if available. Otherwise they fall back to a snapshot of their public properties, capped at depth 3. Iterables show their first 50 items (more on request).
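The encoding rules above map onto a short recursive function. The sketch of `ValueEncoder::encode()` below assumes PHP 8.0+. It nests an object's property snapshot under `value`; that key layout is an assumption. It treats any `Traversable` as an iterable. The depth cap doubles as cycle protection, since a self-referencing object is cut off at depth 3.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch of ValueEncoder: typed, JSON-ready arrays, nesting
// capped at depth 3, iterables previewed to their first 50 items.
final class ValueEncoder
{
    private const MAX_DEPTH = 3;
    private const PREVIEW_ITEMS = 50;

    public static function encode(mixed $value, int $depth = 0): array
    {
        if ($value === null) {
            return ['type' => 'null', 'value' => null];
        }
        if (is_scalar($value)) {
            // get_debug_type() reports "int", "float", "string" or "bool"
            return ['type' => get_debug_type($value), 'value' => $value];
        }
        if ($depth >= self::MAX_DEPTH && (is_array($value) || is_object($value))) {
            // Depth cap; also stops reference cycles
            return ['type' => is_array($value) ? 'array' : 'object', 'truncated' => true];
        }
        if (is_array($value)) {
            return ['type' => 'array', 'value' => array_map(
                static fn (mixed $item): array => self::encode($item, $depth + 1),
                $value
            )];
        }
        if ($value instanceof Traversable) {
            return self::encodeIterable($value, $depth);
        }
        if (is_object($value)) {
            // Called from this class, get_object_vars() sees only public properties
            $props = method_exists($value, '__debugInfo') ? $value->__debugInfo() : get_object_vars($value);
            return [
                'type' => 'object',
                'class' => get_class($value),
                'value' => array_map(
                    static fn (mixed $item): array => self::encode($item, $depth + 1),
                    $props
                ),
            ];
        }
        return ['type' => get_debug_type($value)]; // resources and other leftovers
    }

    private static function encodeIterable(Traversable $items, int $depth): array
    {
        $preview = [];
        $exhausted = true;
        foreach ($items as $item) {
            if (count($preview) >= self::PREVIEW_ITEMS) {
                $exhausted = false; // note: this pulls one extra item from a generator
                break;
            }
            $preview[] = self::encode($item, $depth + 1);
        }
        return [
            'type' => 'iterable',
            'preview' => $preview,
            'exhausted' => $exhausted,
            'size_hint' => $items instanceof Countable ? count($items) : null,
        ];
    }
}
```

With `get_debug_type()` supplying scalar type names, `return 1 + 1;` encodes as `{"type": "int", "value": 2}`.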

## Shape

```
src/Altair/Eval/
├── Cli/
│   └── EvalCommand.php
├── Mcp/
│   └── EvalTool.php
├── Runner/
│   ├── SubprocessRunner.php       # spawns the wrapper, captures result
│   ├── WrapperBuilder.php         # generates the PHP wrapper file
│   └── SecurityProfile.php        # encodes the guardrail flags into php.ini / open_basedir
├── Encoder/
│   ├── ValueEncoder.php           # converts return value to structured JSON
│   └── ExceptionEncoder.php
└── composer.json
```
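`ExceptionEncoder` is the simplest piece of the tree. The sketch below produces the exception shape from *How it works* and assumes PHP 8.0+. The recursive `previous` key for chained exceptions and the `$maxPrevious` cap are assumptions, and `encodeThrowable` is a hypothetical name. File paths are made relative to the project root when they sit under it.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch of ExceptionEncoder: flattens a Throwable and its
// getPrevious() chain into the structured "exception" field.
function encodeThrowable(Throwable $e, string $projectRoot, int $maxPrevious = 5): array
{
    $root = rtrim($projectRoot, '/') . '/';
    $file = $e->getFile();
    if (str_starts_with($file, $root)) {
        $file = substr($file, strlen($root)); // e.g. src/App/User/UserRepository.php
    }
    $encoded = [
        'class' => get_class($e),
        'message' => $e->getMessage(),
        'file' => $file,
        'line' => $e->getLine(),
        'stack_trace' => explode("\n", $e->getTraceAsString()),
        'previous' => null,
    ];
    $previous = $e->getPrevious();
    if ($previous !== null && $maxPrevious > 0) {
        // Recurse down the chain, bounded so a pathological chain cannot explode the payload
        $encoded['previous'] = encodeThrowable($previous, $projectRoot, $maxPrevious - 1);
    }
    return $encoded;
}
```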

## Acceptance criteria

- [ ] `bin/altair eval 'return 1 + 1;'` returns `2` with a clean structured output
- [ ] Container helper works: `bin/altair eval 'return container(SomeBoundInterface::class);'` resolves correctly
- [ ] Exception is captured cleanly: `bin/altair eval 'throw new \RuntimeException("nope");'` returns the exception JSON, exit code 1
- [ ] Timeout kills runaway snippets after the limit; clean error message, no zombies
- [ ] Memory limit enforced; clean error rather than OOM kill
- [ ] Network disabled by default; `--network` toggles
- [ ] DB writes disabled by default; `--writes` toggles
- [ ] `disable_functions` prevents `exec`, `shell_exec`, `passthru`, `assert`, `system`; `eval` (a language construct that `disable_functions` cannot disable) is rejected before the snippet runs
- [ ] `open_basedir` confines filesystem writes to the project tree
- [ ] `framework__eval` MCP tool produces the structured response shape
- [ ] `--unsafe` mode emits a mutation event (#76) every time it's used
- [ ] Tests:
  - Golden cases for ValueEncoder (scalars, arrays, objects, iterables, recursion)
  - Exception encoding (with and without previous chain)
  - Subprocess timeout test (snippet that loops, assert it's killed)
  - Security profile test (verifies `disable_functions` actually disables what we expect)
  - End-to-end MCP test

## Out of scope

- Stateful REPL sessions (each eval is independent — by design)
- Async / fiber-based execution (synchronous is fine for the use case)
- Snippet history persistence (the event log #76 covers what was run)
- IDE integration (the MCP tool is the integration)

## Dependencies

- **#17 (`univeros/cli`)** — required
- **#69 (`univeros/mcp`)** — required
- **#76 (mutation event log)** — soft (we emit events but the log can be missing)

No new external dependencies: subprocesses are spawned with PHP's built-in `proc_open`.

## Why this matters

Agents form hypotheses constantly. Without eval, every hypothesis becomes a write-a-test-and-run-it loop. With it, the loop collapses to milliseconds. **The single biggest productivity multiplier for the agent's middle-loop reasoning.**

