
feat: add sandbox execution mode for secure command handling #293

Open
igun997 wants to merge 14 commits into Ingenimax:main from igun997:feat/sandbox-and-improvements


igun997 commented Feb 24, 2026

Description

Add a new sandbox execution mode that provides secure container-based command execution for MCP servers. This feature includes:

  • CommandExecutor interface: Define a unified interface for command execution with sandbox support
  • DockerExecutor: Implement Docker-based container execution with lifecycle management
  • Warm container pool: Add round-robin container pooling for improved performance
  • Command allowlist: Implement fail-closed semantics for command validation
  • MCP integration: Wire sandbox executor into StdioServerConfig for stdio MCP servers
  • Agent integration: Add WithSandbox option to agent configuration
  • Tests: Add comprehensive unit and integration tests for sandbox functionality

Also includes code formatting and style improvements across the codebase.

Type of change

  • New feature (non-breaking change which adds functionality)
  • Refactoring (no functional changes)

How Has This Been Tested?

  • Code builds successfully with make build
  • All tests pass with make test
  • Docker integration tests added

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • New and existing unit tests pass locally with my changes

…calls

ToolMiddleware only implemented Run() but all LLM providers (OpenAI,
Anthropic, etc.) call Execute() on tools, meaning guardrails applied
via ToolMiddleware were completely bypassed. This adds the missing
Execute() method with the same guardrail pipeline processing pattern.
Add integration tests that require a running Docker daemon, guarded
behind the //go:build integration build tag. Tests cover container
creation and command execution, allowlist enforcement, and container
isolation with network mode "none".

Add the public SDK API for sandbox support:
- WithSandbox(executor) option on Agent to set a sandbox executor
- sandbox field on Agent struct for containerized MCP command execution
- Executor field on LazyMCPConfig to allow per-server sandbox config
- Sandbox field (*sandbox.Config) on MCPServerConfig for YAML config
- Executor field on mcp.LazyMCPServerConfig, wired through to StdioServerConfig
- Agent-level sandbox propagates to MCP configs that lack their own executor
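The propagation rule in the last bullet (the agent-level sandbox fills in only where a server has no executor of its own) can be sketched as follows. LazyMCPConfig here is a simplified stand-in, and propagateSandbox and namedExec are hypothetical names:

```go
package main

import "fmt"

// CommandExecutor is a minimal stand-in for the executor interface;
// only nil-ness matters for propagation.
type CommandExecutor interface {
	Close() error
}

// LazyMCPConfig is a simplified, hypothetical version of the per-server config.
type LazyMCPConfig struct {
	Name     string
	Executor CommandExecutor // nil means "not configured for this server"
}

// propagateSandbox copies the agent-level executor into every MCP config
// that has not set its own, so per-server settings always win.
func propagateSandbox(agentExec CommandExecutor, cfgs []LazyMCPConfig) {
	if agentExec == nil {
		return // sandboxing not enabled at the agent level
	}
	for i := range cfgs {
		if cfgs[i].Executor == nil {
			cfgs[i].Executor = agentExec
		}
	}
}

// namedExec is a comparable fake executor used only for the demo.
type namedExec struct{ id string }

func (namedExec) Close() error { return nil }

func main() {
	cfgs := []LazyMCPConfig{
		{Name: "filesystem"},
		{Name: "git", Executor: namedExec{"per-server"}},
	}
	propagateSandbox(namedExec{"agent"}, cfgs)
	for _, c := range cfgs {
		fmt.Println(c.Name, c.Executor.(namedExec).id)
	}
	// filesystem agent
	// git per-server
}
```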
Add implementation plan document and a working example that demonstrates
a Gemini agent executing commands inside a Docker sandbox container
with command allowlisting.

Apply consistent formatting across the codebase including:
- Aligned struct tags and field comments
- Added missing newlines at end of files
- Replaced fmt.Sprintf+WriteString with fmt.Fprintf
- Fixed comment alignment
- Removed trailing whitespace

The streaming GenerateWithToolsStream path was missing Items handling
for array-type parameters, causing Gemini API to reject function
declarations with "items: missing field" errors. The non-streaming
GenerateWithTools already had this handling.
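The bug pattern here is two code paths converting the same parameter schema, with only one handling array items. Below is a sketch of a single recursive converter that both paths could share; ParamSpec, Schema, and toSchema are hypothetical simplifications, not the Gemini SDK's types.

```go
package main

import "fmt"

// ParamSpec is a simplified, hypothetical tool-parameter description.
type ParamSpec struct {
	Type        string // "string", "integer", "array", ...
	Description string
	Items       *ParamSpec // element type; required when Type == "array"
}

// Schema is a simplified stand-in for a provider function-declaration schema.
type Schema struct {
	Type        string
	Description string
	Items       *Schema
}

// toSchema converts a parameter spec recursively. The Items branch is the
// piece the streaming path lacked; without it, array parameters produce
// declarations that the API rejects.
func toSchema(p ParamSpec) (*Schema, error) {
	s := &Schema{Type: p.Type, Description: p.Description}
	if p.Type == "array" {
		if p.Items == nil {
			return nil, fmt.Errorf("array parameter %q has no items type", p.Description)
		}
		items, err := toSchema(*p.Items)
		if err != nil {
			return nil, err
		}
		s.Items = items
	}
	return s, nil
}

func main() {
	s, err := toSchema(ParamSpec{Type: "array", Description: "file paths", Items: &ParamSpec{Type: "string"}})
	if err != nil {
		panic(err)
	}
	fmt.Println(s.Type, s.Items.Type) // array string
}
```

Calling one helper from both GenerateWithTools and GenerateWithToolsStream removes the opportunity for the two paths to drift apart again.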

meidad (Collaborator) left a comment

Thanks for the substantial work on this — the security defaults in buildContainerArgs (--read-only, --security-opt no-new-privileges, --cap-drop ALL, --pids-limit 64, --network none, mem/cpu limits) are exactly right, the fail-closed allowlist is correct, and gating the integration tests behind //go:build integration is the right call.
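For reference, the hardening flags listed above map onto a docker run invocation roughly as shown below. This is a sketch under assumptions: the Config fields, the container name, and the sleep infinity keep-alive (which relies on GNU coreutils sleep being present in the image) are illustrative, not taken from the PR's buildContainerArgs.

```go
package main

import (
	"fmt"
	"strings"
)

// Config is a hypothetical subset of sandbox settings; field names are assumptions.
type Config struct {
	Image  string // ideally digest-pinned, e.g. "ubuntu:22.04@sha256:<digest>"
	Memory string // value for --memory, e.g. "256m"
	CPUs   string // value for --cpus, e.g. "0.5"
}

// buildContainerArgs assembles docker run arguments for a long-lived,
// locked-down container that later receives commands via docker exec -i.
func buildContainerArgs(name string, cfg Config) []string {
	return []string{
		"run", "-d",
		"--name", name,
		"--read-only",                         // read-only root filesystem
		"--security-opt", "no-new-privileges", // block privilege gain via setuid binaries
		"--cap-drop", "ALL",                   // drop all Linux capabilities
		"--pids-limit", "64",                  // cap the number of processes
		"--network", "none",                   // loopback only, no external network
		"--memory", cfg.Memory,
		"--cpus", cfg.CPUs,
		cfg.Image,
		"sleep", "infinity", // keep the container running between execs
	}
}

func main() {
	args := buildContainerArgs("agent-sandbox-demo", Config{Image: "ubuntu:22.04", Memory: "256m", CPUs: "0.5"})
	fmt.Println("docker " + strings.Join(args, " "))
}
```

A stricter fresh-container-per-execution mode, as raised in the security points below, could reuse the same hardening flags with docker run --rm -i and the target command instead of -d plus docker exec, trading warm-pool latency for isolation.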

That said, this needs some rework before it can land. Requesting changes:

Architectural

  1. Layering: pkg/mcp and pkg/agent shouldn't import pkg/sandbox directly.

    • pkg/mcp/mcp.go line ~19 and pkg/agent/agent.go line ~24 both add "github.com/Ingenimax/agent-sdk-go/pkg/sandbox" to their imports. The CommandExecutor interface is two methods — define it in pkg/interfaces (or in pkg/mcp itself) and have pkg/sandbox implement it. Then a user who never wires sandboxing doesn't pull a Docker-aware package into their build graph.
  2. Please split out the cosmetic / formatting changes into a separate PR.

    • The description says "Also includes code formatting and style improvements across the codebase.", and gemini, deepseek, orchestration, prompts, websearch, executionplan, and guardrails are all touched for what looks like newline / if formatting. Reviewing 47 files where ~37 are unrelated noise makes it very hard to vet the actual feature, and it greatly inflates the conflict surface during rebase.

Security model gaps (need to be addressed in code or in docs that ship with the feature)

  3. Pool reuse means shared state across requests.

    • docker exec -i against the same container persists filesystem state and process artifacts between calls. With PoolSize > 1 and concurrent traffic, two requests can land in the same container and observe each other's state. This should be documented explicitly in the security model section, and ideally a "fresh container per execution" mode should exist as an option (at the cost of warm-pool latency) for stricter isolation. Without that, calling this a "sandbox" oversells it.
  4. docker exec -i lifecycle / orphan processes.

    • If the host-side docker exec is killed before the in-container process exits, the in-container process survives. Long-running MCP server processes started this way will leak. Pool.MarkUnhealthy exists, but nothing calls it on exec failure. At minimum, wrap the executor's Command so that a failure to start (or an exec exit with a defined signal) marks the container unhealthy.
  5. Allowlist uses filepath.Base(command) (pkg/sandbox/allowlist.go:32).

    • "allow bash" matches /usr/bin/bash, ./bash, and any binary anywhere literally named bash. It's an OK basic check, but it should be documented (or made path-aware as a stricter mode).
  6. No image validation / no recommendation to pin by digest.

    • Config.Image defaults to ubuntu:22.04 but accepts anything; with no digest pinning, the container contents drift over time. Log the resolved image and recommend digest-pinned images in docs (ubuntu:22.04@sha256:…).
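The allowlist concern above (matching on filepath.Base) can be illustrated with a sketch that keeps fail-closed semantics and adds an opt-in strict mode comparing cleaned absolute paths. Allowlist, NewAllowlist, and the strict flag are hypothetical names, and the sketch assumes Unix-style paths.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Allowlist decides whether a command may run. An empty allowlist denies
// everything (fail-closed). In strict mode, entries and commands are
// compared as cleaned absolute paths instead of base names.
type Allowlist struct {
	allowed map[string]bool
	strict  bool
}

func NewAllowlist(strict bool, commands ...string) *Allowlist {
	a := &Allowlist{allowed: make(map[string]bool), strict: strict}
	for _, c := range commands {
		a.allowed[a.key(c)] = true
	}
	return a
}

// key normalizes a command for lookup.
func (a *Allowlist) key(command string) string {
	if a.strict {
		return filepath.Clean(command)
	}
	return filepath.Base(command) // any binary named "bash" matches "bash"
}

func (a *Allowlist) Allowed(command string) bool {
	if a == nil || len(a.allowed) == 0 {
		return false // fail closed
	}
	if a.strict && !filepath.IsAbs(command) {
		return false // relative paths such as ./bash are rejected in strict mode
	}
	return a.allowed[a.key(command)]
}

func main() {
	loose := NewAllowlist(false, "bash")
	strict := NewAllowlist(true, "/usr/bin/bash")
	for _, cmd := range []string{"/usr/bin/bash", "/tmp/bash", "./bash"} {
		fmt.Println(cmd, loose.Allowed(cmd), strict.Allowed(cmd))
	}
	// /usr/bin/bash true true
	// /tmp/bash true false
	// ./bash true false
}
```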

Code-level

  7. Container name collision risk (pkg/sandbox/docker.go:104).

    • fmt.Sprintf("agent-sandbox-%d-%d", time.Now().UnixNano(), i): two pools created in the same nanosecond collide. Low probability, but a UUID or crypto/rand is trivial and removes the concern entirely.
  8. Pool.Acquire rotates to a non-Ready container and returns an error rather than skipping it (pkg/sandbox/pool.go:45).

    • Either skip unhealthy slots inside Acquire (try up to len(p.containers) times) or return a deterministic "no healthy container" error after one full rotation. Today, repeated calls just hit the next unhealthy slot in turn instead of failing fast or routing around them.

Stale

  9. This PR is 2.5 months old (last updated 2026-02-26), and CI failed on the now-removed build (1.23.8) job. Please rebase against current main; non-trivial conflicts are likely given how much has landed since.

Path to merge

If items 1, 2, 7, 8, and 9 are addressed and 3-6 are clearly documented (or mode-switched) in code comments / a short pkg/sandbox/README.md, I'd be happy to take another look and merge.

meidad (Collaborator) commented Jun 12, 2026

The sandbox feature itself is well-built — Docker executor, warm container pool, fail-closed allowlist, integration tests guarded by build tag. This is exactly the kind of security primitive the SDK needs.

Two things blocking merge:

Required

  1. Split the formatting cleanup out. This PR mixes the sandbox feature with a large reformatting pass across ~15 unrelated files (pkg/llm/deepseek, pkg/agentconfig, pkg/prompts, etc.). That makes the diff hard to review and the git history harder to bisect. Please open a separate PR for the formatting changes so this one is reviewable on its own merits.

  2. Rebase against current main. This is 4 months old — pkg/mcp/mcp.go and pkg/mcp/lazy.go have changed significantly and there will be conflicts.

Questions

  1. The DockerExecutor pulls/starts containers inline during Execute(). Is there a startup timeout? If the container image isn't cached locally, the first tool call could block for a long time.

  2. The warm pool uses round-robin selection — is there any concern about concurrent agents sharing pool slots and seeing each other's filesystem state between calls?

Once the split + rebase lands I can do a focused review of just the sandbox code and move quickly.
