Goal
Build framework__eval, a sandboxed scratchpad MCP tool (and bin/altair eval CLI equivalent) that executes a short PHP snippet inside the project's live container, returns the structured result, and tears down without persisting state. It is the agent's "let me check" primitive.
Why
When an agent forms a hypothesis — "does UserRepository::findByEmail return null or throw when no match?", "what does formatNegotiator->getContentTypeByFormat('json') actually return?" — the cheapest way to validate it is to run a few lines of PHP. Today that requires writing a temporary script, running it via php -r, knowing the autoloader path, and bootstrapping the container. Agents don't do that well.
A scratchpad tool collapses all of this:
// agent calls framework__eval with this snippet
$users = container(UserRepository::class);
return $users->findByEmail('does-not-exist@example.com');
Returns null. Hypothesis validated in 200ms. No file written, no test created.
How it works
The tool spawns a fresh PHP process (sandboxed; not the agent's session) with the project's autoloader and the container booted. The snippet runs inside a wrapper that:
Provides a container() helper (resolves from the framework Container)
Captures return value, stdout, and any throwables
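Times out at a configurable wall clock (default 5s)
The result comes back as structured JSON:
{
  "result": { "type": "null", "value": null },
  "stdout": "",
  "stderr": "",
  "duration_ms": 187,
  "memory_peak_bytes": 4194304,
  "exception": null
}
Or on exception:
{
  "result": null,
  "exception": {
    "class": "App\\User\\UserNotFoundException",
    "message": "No user with email 'x'",
    "file": "src/App/User/UserRepository.php",
    "line": 42,
    "stack_trace": ["..."]
  },
  "duration_ms": 134
}
The agent reads result.value or exception.class and learns, without breaking the workspace.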
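The spawn-and-time-out behaviour described above could look roughly like the following. This is a minimal sketch, not the project's runner: run_with_timeout and its return shape are hypothetical names, it assumes PHP 7.4+ (array form of proc_open) on a POSIX system, and it leaves out the autoloader and container bootstrapping.

```php
<?php
declare(strict_types=1);

/**
 * Runs a command in a fresh process and enforces a wall-clock limit.
 * Hypothetical helper: the name and return shape are assumptions.
 *
 * @param list<string> $command
 * @return array{stdout: string, stderr: string, timed_out: bool, duration_ms: int}
 */
function run_with_timeout(array $command, int $timeoutMs = 5000): array
{
    $spec = [0 => ['pipe', 'r'], 1 => ['pipe', 'w'], 2 => ['pipe', 'w']];
    $start = hrtime(true);
    $proc = proc_open($command, $spec, $pipes);
    if (!is_resource($proc)) {
        throw new RuntimeException('Could not start subprocess');
    }
    fclose($pipes[0]); // the child gets no stdin
    stream_set_blocking($pipes[1], false);
    stream_set_blocking($pipes[2], false);

    $stdout = '';
    $stderr = '';
    $timedOut = false;
    while (proc_get_status($proc)['running']) {
        // Non-blocking reads keep the pipes drained while we wait.
        $stdout .= (string) stream_get_contents($pipes[1]);
        $stderr .= (string) stream_get_contents($pipes[2]);
        if ((hrtime(true) - $start) / 1e6 >= $timeoutMs) {
            proc_terminate($proc, 9); // 9 is SIGKILL on POSIX systems
            $timedOut = true;
            break;
        }
        usleep(2000);
    }
    // Collect whatever is still buffered in the pipes.
    $stdout .= (string) stream_get_contents($pipes[1]);
    $stderr .= (string) stream_get_contents($pipes[2]);
    fclose($pipes[1]);
    fclose($pipes[2]);
    proc_close($proc);

    return [
        'stdout' => $stdout,
        'stderr' => $stderr,
        'timed_out' => $timedOut,
        'duration_ms' => (int) round((hrtime(true) - $start) / 1e6),
    ];
}
```

A caller would pass something like [PHP_BINARY, $wrapperFile] as the command and decode the captured output afterwards. Polling with a short sleep is the simplest way to combine pipe draining with a deadline; stream_select would be the alternative if the polling overhead matters.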
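Guard rails
This is the most dangerous tool in the MCP palette: eval is eval. The guardrails:
Writes disabled by default. The --writes flag enables writes; the MCP server respects the same --allow-writes boot flag from univeros/mcp (Model Context Protocol server for agent-native workflows, #69).
Filesystem sandbox. The subprocess starts with its working directory set to a temp directory; file_put_contents to absolute paths outside the project is blocked at the subprocess level via open_basedir.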
No network by default. A separate --network flag enables outbound HTTP.
Time limit. Default 5s wall clock; max 60s. Hard kill via SIGKILL.
Memory limit. Default 128MB; max 512MB.
No exec/shell_exec/passthru available. Disabled via disable_functions.
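No eval available inside the snippet. Because eval is a language construct rather than a function, disable_functions cannot block it; this guard needs a separate check on the snippet itself.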
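Since disable_functions only covers functions and not language constructs such as eval, one way to enforce the eval guard is a static scan of the snippet's tokens before it runs. A minimal sketch, assuming the tokenizer extension is loaded; snippet_uses_eval is a hypothetical name, not part of the proposal:

```php
<?php
declare(strict_types=1);

/**
 * Returns true when the snippet's token stream contains the eval construct.
 * Hypothetical helper; a static check like this is a first line of defence,
 * not a complete sandbox.
 */
function snippet_uses_eval(string $snippet): bool
{
    // Snippets arrive without an opening tag, so prepend one for the tokenizer.
    foreach (token_get_all('<?php ' . $snippet) as $token) {
        if (is_array($token) && $token[0] === T_EVAL) {
            return true;
        }
    }
    return false;
}
```

Working on tokens rather than on the raw string means a snippet that merely mentions "eval(" inside a string literal or uses a variable named $evaluate is not rejected.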
The --unsafe flag lifts every guard rail simultaneously, for cases where the agent (with explicit user confirmation) needs to do something the sandbox forbids. Every use is logged to .altair/events.jsonl (#76).
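As an illustration of how the ini-based guard rails might be passed to the subprocess, the sketch below turns them into php -d arguments. It is an assumption about how SecurityProfile could work, not its actual design: build_ini_args and its parameters are hypothetical, and the network and write restrictions are left out because they are not expressed by a single ini directive.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: turns guard-rail options into `php -d key=value`
 * arguments for the subprocess.
 *
 * @return list<string>
 */
function build_ini_args(
    string $projectRoot,
    string $tmpDir,
    int $memoryMb = 128,
    bool $unsafe = false
): array {
    if ($unsafe) {
        return []; // --unsafe lifts every guard rail at once
    }
    $memoryMb = max(1, min($memoryMb, 512)); // documented maximum is 512MB

    $ini = [
        'memory_limit' => $memoryMb . 'M',
        // Confine filesystem access to the project tree and the temp dir.
        'open_basedir' => $projectRoot . PATH_SEPARATOR . $tmpDir,
        // Functions only; language constructs cannot be listed here.
        'disable_functions' => 'exec,shell_exec,passthru',
    ];

    $args = [];
    foreach ($ini as $key => $value) {
        $args[] = '-d';
        $args[] = $key . '=' . $value;
    }
    return $args;
}
```

The returned list can be spliced between the PHP binary and the wrapper file when building the command. Clamping the memory value instead of rejecting it is a choice made for this sketch only.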
API surface
CLI
Default output is pretty: a heredoc-ish summary with the result type, duration, peak memory, and (if exception) the trace. JSON output is the MCP variant.
MCP tool
{
  "name": "framework__eval",
  "description": "Execute a PHP snippet against the live container, return structured result",
  "inputSchema": {
    "snippet": "string",
    "timeout_ms": "integer (default 5000, max 60000)",
    "allow_writes": "boolean (default false)",
    "allow_network": "boolean (default false)"
  }
}
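On the server side, the tool would need to apply the defaults and limits stated in the schema before running anything. A minimal sketch under assumptions: normalize_eval_input is a hypothetical name, and clamping an oversized timeout_ms to the maximum (rather than rejecting the call) is a choice made here, not something the proposal specifies.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: validates framework__eval arguments and fills in
 * the defaults named in the input schema.
 *
 * @param array<string, mixed> $args
 * @return array{snippet: string, timeout_ms: int, allow_writes: bool, allow_network: bool}
 */
function normalize_eval_input(array $args): array
{
    $snippet = $args['snippet'] ?? null;
    if (!is_string($snippet) || trim($snippet) === '') {
        throw new InvalidArgumentException('snippet must be a non-empty string');
    }

    $timeout = $args['timeout_ms'] ?? 5000;
    if (!is_int($timeout) || $timeout < 1) {
        throw new InvalidArgumentException('timeout_ms must be a positive integer');
    }

    return [
        'snippet' => $snippet,
        'timeout_ms' => min($timeout, 60000), // schema maximum
        // Anything other than a literal true keeps the guard rail in place.
        'allow_writes' => ($args['allow_writes'] ?? false) === true,
        'allow_network' => ($args['allow_network'] ?? false) === true,
    ];
}
```

Treating only a literal true as consent keeps a malformed value such as the string "false" from switching a guard rail off.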
Capturing the return value
PHP doesn't easily let you capture the return of a script. The wrapper inlines the snippet inside a function:
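A minimal sketch of such a wrapper, assuming PHP 8.0+. build_wrapper and WRAPPER_TEMPLATE are hypothetical names; the autoloader include and the container() helper are omitted, and the type field uses get_debug_type, which differs from the typed JSON form the proposal describes.

```php
<?php
declare(strict_types=1);

// Template for the generated file. The snippet is placed inside a closure
// so that its `return` value lands in $result.
const WRAPPER_TEMPLATE = <<<'PHP'
<?php
$start = hrtime(true);
$result = null;
$exception = null;
ob_start(); // capture echo/print output from the snippet
try {
    $result = (static function () {
        /*SNIPPET*/
    })();
} catch (\Throwable $e) {
    $exception = ['class' => get_class($e), 'message' => $e->getMessage()];
}
$stdout = (string) ob_get_clean();
fwrite(STDERR, json_encode([
    'result' => $exception === null
        ? ['type' => get_debug_type($result), 'value' => $result]
        : null,
    'stdout' => $stdout,
    'duration_ms' => (int) round((hrtime(true) - $start) / 1e6),
    'memory_peak_bytes' => memory_get_peak_usage(),
    'exception' => $exception,
]));
exit($exception === null ? 0 : 1);
PHP;

/** Returns the PHP source of a wrapper file for the given snippet. */
function build_wrapper(string $snippet): string
{
    return str_replace('/*SNIPPET*/', $snippet, WRAPPER_TEMPLATE);
}
```

Writing the output of build_wrapper('return 1 + 1;') to a temp file and running it with the PHP CLI should leave the JSON envelope on stderr and an exit code of 0; a snippet that throws exits with 1.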
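Stderr is reserved for the structured result so stdout remains usable for echo/print capture. Altair\Eval\Encoder produces the typed JSON form:
{ "type": "object", "class": "App\\User\\User", "id": 42 }
{ "type": "array", "value": [...] }
{ "type": "string", "value": "..." }
{ "type": "null", "value": null }
{ "type": "iterable", "preview": [...], "exhausted": false, "size_hint": 100 }
Objects render their __debugInfo() if available; otherwise a public-property snapshot, capped at depth 3. Iterables show the first 50 items (more on request).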
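The encoding rules above can be sketched as a small recursive function. This is an illustration under assumptions, not the actual ValueEncoder: encode_value is a hypothetical name, the "properties" and "truncated" keys are invented for the sketch, size_hint is omitted, and PHP 8.0+ is assumed. Note that previewing a generator consumes it.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of a value encoder: depth capped at 3,
 * iterables previewed up to 50 items.
 *
 * @return array<string, mixed>
 */
function encode_value(mixed $value, int $depth = 0): array
{
    if ($value === null || is_scalar($value)) {
        return ['type' => get_debug_type($value), 'value' => $value];
    }
    if ($depth >= 3) {
        return ['type' => get_debug_type($value), 'truncated' => true];
    }
    if (is_array($value)) {
        return ['type' => 'array', 'value' => array_map(
            static fn (mixed $item): array => encode_value($item, $depth + 1),
            $value
        )];
    }
    if ($value instanceof Traversable) {
        $preview = [];
        $exhausted = true;
        foreach ($value as $item) {
            if (count($preview) === 50) {
                $exhausted = false; // a 51st item exists, so stop here
                break;
            }
            $preview[] = encode_value($item, $depth + 1);
        }
        return ['type' => 'iterable', 'preview' => $preview, 'exhausted' => $exhausted];
    }
    if (is_object($value)) {
        // Prefer __debugInfo(); otherwise take the public properties,
        // which is what get_object_vars() sees from outside the class.
        $props = method_exists($value, '__debugInfo')
            ? ($value->__debugInfo() ?? [])
            : get_object_vars($value);
        return ['type' => 'object', 'class' => $value::class, 'properties' => array_map(
            static fn (mixed $item): array => encode_value($item, $depth + 1),
            $props
        )];
    }
    return ['type' => get_debug_type($value)]; // resources and similar
}
```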
Shape
src/Altair/Eval/
├── Cli/
│ └── EvalCommand.php
├── Mcp/
│ └── EvalTool.php
├── Runner/
│ ├── SubprocessRunner.php # spawns the wrapper, captures result
│ ├── WrapperBuilder.php # generates the PHP wrapper file
│ └── SecurityProfile.php # encodes the guardrail flags into php.ini / open_basedir
├── Encoder/
│ ├── ValueEncoder.php # converts return value to structured JSON
│ └── ExceptionEncoder.php
└── composer.json
Acceptance criteria
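bin/altair eval 'return 1 + 1;' returns 2 with a clean structured output
bin/altair eval 'return container(SomeBoundInterface::class);' resolves correctly
bin/altair eval 'throw new \RuntimeException("nope");' returns the exception JSON, exit code 1
[...] --network toggles
[...] --writes toggles
disable_functions prevents exec, shell_exec, passthru, eval, assert, system
open_basedir confines filesystem writes to the project tree
framework__eval MCP tool produces the structured response shape
--unsafe mode emits a mutation event (Examples library: .altair/examples/ + MCP tools for idiomatic patterns, #76) every time it's used
[...] disable_functions actually disables what we expect)
Out of scope
Dependencies
univeros/cli (required)
univeros/mcp (required)
No new external deps; uses proc_open from stdlib.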
Why this matters
Agents form hypotheses constantly. Without eval, every hypothesis becomes a write-a-test-and-run-it loop. With it, the loop collapses to milliseconds. The single biggest productivity multiplier for the agent's middle-loop reasoning.