feat: add sandbox execution mode for secure command handling#293
feat: add sandbox execution mode for secure command handling#293igun997 wants to merge 14 commits into
Conversation
…calls ToolMiddleware only implemented Run() but all LLM providers (OpenAI, Anthropic, etc.) call Execute() on tools, meaning guardrails applied via ToolMiddleware were completely bypassed. This adds the missing Execute() method with the same guardrail pipeline processing pattern.
Add integration tests that require a running Docker daemon, guarded behind the //go:build integration build tag. Tests cover container creation and command execution, allowlist enforcement, and container isolation with network mode "none".
Add the public SDK API for sandbox support: - WithSandbox(executor) option on Agent to set a sandbox executor - sandbox field on Agent struct for containerized MCP command execution - Executor field on LazyMCPConfig to allow per-server sandbox config - Sandbox field (*sandbox.Config) on MCPServerConfig for YAML config - Executor field on mcp.LazyMCPServerConfig, wired through to StdioServerConfig - Agent-level sandbox propagates to MCP configs that lack their own executor
Add implementation plan document and a working example that demonstrates a Gemini agent executing commands inside a Docker sandbox container with command allowlisting.
Apply consistent formatting across the codebase including: - Aligned struct tags and field comments - Added missing newlines at end of files - Replaced fmt.Sprintf+WriteString with fmt.Fprintf - Fixed comment alignment - Removed trailing whitespace
The streaming GenerateWithToolsStream path was missing Items handling for array-type parameters, causing Gemini API to reject function declarations with "items: missing field" errors. The non-streaming GenerateWithTools already had this handling.
meidad
left a comment
There was a problem hiding this comment.
Thanks for the substantial work on this — the security defaults in buildContainerArgs (--read-only, --security-opt no-new-privileges, --cap-drop ALL, --pids-limit 64, --network none, mem/cpu limits) are exactly right, the fail-closed allowlist is correct, and gating the integration tests behind //go:build integration is the right call.
That said, this needs some rework before it can land. Requesting changes:
Architectural
-
Layering:
pkg/mcpandpkg/agentshouldn't importpkg/sandboxdirectly.- pkg/mcp/mcp.go line ~19 and pkg/agent/agent.go line ~24 both add
"github.com/Ingenimax/agent-sdk-go/pkg/sandbox"to their imports. TheCommandExecutorinterface is two methods — define it inpkg/interfaces(or inpkg/mcpitself) and havepkg/sandboximplement it. Then a user who never wires sandboxing doesn't pull a Docker-aware package into their build graph.
- pkg/mcp/mcp.go line ~19 and pkg/agent/agent.go line ~24 both add
-
Please split out the cosmetic / formatting changes into a separate PR.
- The description says "Also includes code formatting and style improvements across the codebase." gemini, deepseek, orchestration, prompts, websearch, executionplan, guardrails are all touched for what looks like newline /
ifformatting. Reviewing 47 files where ~37 are unrelated noise makes it very hard to vet the actual feature, and it greatly inflates conflict surface during rebase.
- The description says "Also includes code formatting and style improvements across the codebase." gemini, deepseek, orchestration, prompts, websearch, executionplan, guardrails are all touched for what looks like newline /
Security model gaps (need to be addressed in code or in docs that ship with the feature)
-
Pool reuse means shared state across requests.
docker exec -iagainst the same container persists filesystem state and process artifacts between calls. WithPoolSize > 1and concurrent traffic, two requests can land in the same container and observe each other's state. This should be documented explicitly in the security model section, and ideally a "fresh container per execution" mode should exist as an option (at the cost of warm-pool latency) for stricter isolation. Without that, calling this a "sandbox" oversells it.
-
docker exec -ilifecycle / orphan processes.- If the host-side
docker execis killed before the in-container process exits, the in-container process survives. Long-running MCP server processes started this way will leak.Pool.MarkUnhealthyexists but nothing calls it on exec failure — at minimum, wrap the executor'sCommandso that a failure to start (or an exec exit with a defined signal) marks the container unhealthy.
- If the host-side
-
Allowlist uses
filepath.Base(command)(pkg/sandbox/allowlist.go:32).- "allow
bash" matches/usr/bin/bash,./bash, and any binary anywhere literally namedbash. It's an OK basic check but should be documented (or made path-aware as a stricter mode).
- "allow
-
No image validation / no recommendation to pin by digest.
Config.Imagedefaults toubuntu:22.04but accepts anything; with no digest pinning, the container contents drift over time. Log the resolved image and recommend digest-pinned images in docs (ubuntu:22.04@sha256:…).
Code-level
-
Container name collision risk (pkg/sandbox/docker.go:104).
fmt.Sprintf(\"agent-sandbox-%d-%d\", time.Now().UnixNano(), i)— two pools created in the same nanosecond collide. Low probability but a UUID orcrypto/randis trivial and removes the concern entirely.
-
Pool.Acquirerotates to a non-Ready container and returns an error rather than skipping it (pkg/sandbox/pool.go:45).- Either skip unhealthy slots inside
Acquire(try up tolen(p.containers)times) or return a deterministic "no healthy container" error after one full rotation. Today, repeated calls just hit the next unhealthy slot in turn instead of failing fast or routing around them.
- Either skip unhealthy slots inside
Stale
- 2.5 months old, last updated 2026-02-26. CI failed on the now-removed
build (1.23.8)job. Please rebase against current main; non-trivial conflicts likely given how much landed since.
Path to merge
If items 1, 2, 7, 8, and 9 are addressed and 3-6 are clearly documented (or mode-switched) in code comments / a short pkg/sandbox/README.md, I'd be happy to take another look and merge.
|
The sandbox feature itself is well-built — Docker executor, warm container pool, fail-closed allowlist, integration tests guarded by build tag. This is exactly the kind of security primitive the SDK needs. Two things blocking merge: Required
Questions
Once the split + rebase lands I can do a focused review of just the sandbox code and move quickly. |
Description
Add a new sandbox execution mode that provides secure container-based command execution for MCP servers. This feature includes:
Also includes code formatting and style improvements across the codebase.
Type of change
How Has This Been Tested?
make buildmake testChecklist: