feat: add sandbox execution mode for secure command handling by igun997 · Pull Request #293 · Ingenimax/agent-sdk-go

igun997 · 2026-02-24T19:35:04Z

Description

Add a new sandbox execution mode that provides secure container-based command execution for MCP servers. This feature includes:

- CommandExecutor interface: Define a unified interface for command execution with sandbox support
- DockerExecutor: Implement Docker-based container execution with lifecycle management
- Warm container pool: Add round-robin container pooling for improved performance
- Command allowlist: Implement fail-closed semantics for command validation
- MCP integration: Wire sandbox executor into StdioServerConfig for stdio MCP servers
- Agent integration: Add WithSandbox option to agent configuration
- Tests: Add comprehensive unit and integration tests for sandbox functionality

Also includes code formatting and style improvements across the codebase.

Type of change

New feature (non-breaking change which adds functionality)
Refactoring (no functional changes)

How Has This Been Tested?

Code builds successfully with make build
All tests pass with make test
Docker integration tests added

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my own code
My changes generate no new warnings
New and existing unit tests pass locally with my changes

…calls ToolMiddleware only implemented Run() but all LLM providers (OpenAI, Anthropic, etc.) call Execute() on tools, meaning guardrails applied via ToolMiddleware were completely bypassed. This adds the missing Execute() method with the same guardrail pipeline processing pattern.

Add integration tests that require a running Docker daemon, guarded behind the //go:build integration build tag. Tests cover container creation and command execution, allowlist enforcement, and container isolation with network mode "none".

Add the public SDK API for sandbox support:
- WithSandbox(executor) option on Agent to set a sandbox executor
- sandbox field on Agent struct for containerized MCP command execution
- Executor field on LazyMCPConfig to allow per-server sandbox config
- Sandbox field (*sandbox.Config) on MCPServerConfig for YAML config
- Executor field on mcp.LazyMCPServerConfig, wired through to StdioServerConfig
- Agent-level sandbox propagates to MCP configs that lack their own executor

Add implementation plan document and a working example that demonstrates a Gemini agent executing commands inside a Docker sandbox container with command allowlisting.

Apply consistent formatting across the codebase including:
- Aligned struct tags and field comments
- Added missing newlines at end of files
- Replaced fmt.Sprintf+WriteString with fmt.Fprintf
- Fixed comment alignment
- Removed trailing whitespace

The streaming GenerateWithToolsStream path was missing Items handling for array-type parameters, causing the Gemini API to reject function declarations with "items: missing field" errors. The non-streaming GenerateWithTools path already had this handling.

meidad

Thanks for the substantial work on this — the security defaults in buildContainerArgs (--read-only, --security-opt no-new-privileges, --cap-drop ALL, --pids-limit 64, --network none, mem/cpu limits) are exactly right, the fail-closed allowlist is correct, and gating the integration tests behind //go:build integration is the right call.
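For readers who have not opened the diff, the hardening flags listed above could be assembled roughly as follows. This is a minimal sketch, not the PR's buildContainerArgs: the `Limits` type, the `hardenedRunArgs` name, the memory and CPU values, and the `sleep infinity` keep-alive command are all assumptions made for illustration. Only the flag names and the pids limit of 64 come from the review.

```go
package main

import "strconv"

// Limits holds illustrative resource caps. The field names and any values
// passed in are assumptions for this sketch, not the PR's Config struct.
type Limits struct {
	Memory    string // passed to --memory, e.g. "256m"
	CPUs      string // passed to --cpus, e.g. "0.5"
	PidsLimit int    // passed to --pids-limit; the review mentions 64
}

// hardenedRunArgs builds the argument list for a detached `docker run`
// using the hardening flags named in the review.
func hardenedRunArgs(name, image string, l Limits) []string {
	return []string{
		"run", "-d",
		"--name", name,
		"--read-only",
		"--security-opt", "no-new-privileges",
		"--cap-drop", "ALL",
		"--pids-limit", strconv.Itoa(l.PidsLimit),
		"--network", "none",
		"--memory", l.Memory,
		"--cpus", l.CPUs,
		image,
		// Assumed keep-alive command so later `docker exec` calls have a target.
		"sleep", "infinity",
	}
}
```

The returned slice would typically be passed to `exec.CommandContext(ctx, "docker", args...)`. Keeping the flag list in one pure function makes the defaults easy to unit test without a Docker daemon.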

That said, this needs some rework before it can land. Requesting changes:

Architectural

1. Layering: pkg/mcp and pkg/agent shouldn't import pkg/sandbox directly.
- pkg/mcp/mcp.go line ~19 and pkg/agent/agent.go line ~24 both add "github.com/Ingenimax/agent-sdk-go/pkg/sandbox" to their imports. The CommandExecutor interface is two methods — define it in pkg/interfaces (or in pkg/mcp itself) and have pkg/sandbox implement it. Then a user who never wires sandboxing doesn't pull a Docker-aware package into their build graph.
2. Please split out the cosmetic / formatting changes into a separate PR.
- The description says "Also includes code formatting and style improvements across the codebase." gemini, deepseek, orchestration, prompts, websearch, executionplan, and guardrails are all touched for what looks like newline / if formatting. Reviewing 47 files where ~37 are unrelated noise makes it very hard to vet the actual feature, and it greatly inflates the conflict surface during rebase.

Security model gaps (need to be addressed in code or in docs that ship with the feature)

3. Pool reuse means shared state across requests.
- docker exec -i against the same container persists filesystem state and process artifacts between calls. With PoolSize > 1 and concurrent traffic, two requests can land in the same container and observe each other's state. This should be documented explicitly in the security model section, and ideally a "fresh container per execution" mode should exist as an option (at the cost of warm-pool latency) for stricter isolation. Without that, calling this a "sandbox" oversells it.
4. docker exec -i lifecycle / orphan processes.
- If the host-side docker exec is killed before the in-container process exits, the in-container process survives. Long-running MCP server processes started this way will leak. Pool.MarkUnhealthy exists but nothing calls it on exec failure — at minimum, wrap the executor's Command so that a failure to start (or an exec exit with a defined signal) marks the container unhealthy.
5. Allowlist uses filepath.Base(command) (pkg/sandbox/allowlist.go:32).
- "allow bash" matches /usr/bin/bash, ./bash, and any binary anywhere literally named bash. It's an OK basic check but should be documented (or made path-aware as a stricter mode).
6. No image validation / no recommendation to pin by digest.
- Config.Image defaults to ubuntu:22.04 but accepts anything; with no digest pinning, the container contents drift over time. Log the resolved image and recommend digest-pinned images in docs (ubuntu:22.04@sha256:…).
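To make the allowlist point concrete, here is a small sketch contrasting a basename check with a path-aware strict mode. It is not the code in pkg/sandbox/allowlist.go. The `Allowlist` type, `NewAllowlist`, and the `strict` flag are hypothetical names, and "fail-closed" is taken here to mean that a nil or empty allowlist rejects everything.

```go
package main

import "path/filepath"

// Allowlist is a hypothetical type for this sketch.
// In basic mode entries are bare command names matched by basename.
// In strict mode entries are absolute paths matched exactly.
type Allowlist struct {
	entries map[string]bool
	strict  bool
}

// NewAllowlist stores entries as given in basic mode and as cleaned
// paths in strict mode.
func NewAllowlist(strict bool, entries ...string) *Allowlist {
	a := &Allowlist{entries: make(map[string]bool), strict: strict}
	for _, e := range entries {
		if strict {
			e = filepath.Clean(e)
		}
		a.entries[e] = true
	}
	return a
}

// Allowed fails closed: a nil allowlist, an empty allowlist, or an empty
// command is always rejected.
func (a *Allowlist) Allowed(command string) bool {
	if a == nil || command == "" {
		return false
	}
	if a.strict {
		// Strict mode: only an exact absolute path matches.
		return filepath.IsAbs(command) && a.entries[filepath.Clean(command)]
	}
	// Basic mode: any path whose final element matches is accepted,
	// which is the behavior the review asks to have documented.
	return a.entries[filepath.Base(command)]
}
```

In basic mode, `NewAllowlist(false, "bash")` accepts `/usr/bin/bash`, `./bash`, and `/tmp/evil/bash` alike. In strict mode, `NewAllowlist(true, "/usr/bin/bash")` accepts only that one path.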

Code-level

7. Container name collision risk (pkg/sandbox/docker.go:104).
- fmt.Sprintf("agent-sandbox-%d-%d", time.Now().UnixNano(), i) — two pools created in the same nanosecond collide. Low probability but a UUID or crypto/rand is trivial and removes the concern entirely.
8. Pool.Acquire rotates to a non-Ready container and returns an error rather than skipping it (pkg/sandbox/pool.go:45).
- Either skip unhealthy slots inside Acquire (try up to len(p.containers) times) or return a deterministic "no healthy container" error after one full rotation. Today, repeated calls just hit the next unhealthy slot in turn instead of failing fast or routing around them.
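Both code-level suggestions are small. The sketch below shows one way they might look, under assumed names: `container`, `Pool`, `ErrNoHealthyContainer`, and `uniqueName` are illustrative and are not the types in pkg/sandbox. `Acquire` makes at most one full rotation and skips slots that are not ready, and `uniqueName` uses crypto/rand instead of a timestamp.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"sync"
)

// ErrNoHealthyContainer is returned after one full rotation finds nothing ready.
var ErrNoHealthyContainer = errors.New("sandbox: no healthy container in pool")

// container and Pool are simplified stand-ins for the PR's types.
type container struct {
	ID    string
	Ready bool
}

type Pool struct {
	mu         sync.Mutex
	containers []*container
	next       int // index of the next slot to try
}

// Acquire returns the next ready container in round-robin order,
// skipping unhealthy slots and trying each slot at most once per call.
func (p *Pool) Acquire() (*container, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for i := 0; i < len(p.containers); i++ {
		c := p.containers[p.next]
		p.next = (p.next + 1) % len(p.containers)
		if c.Ready {
			return c, nil
		}
	}
	return nil, ErrNoHealthyContainer
}

// MarkUnhealthy takes a container out of rotation by ID.
func (p *Pool) MarkUnhealthy(id string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for _, c := range p.containers {
		if c.ID == id {
			c.Ready = false
		}
	}
}

// uniqueName appends 8 random bytes (16 hex characters) to the prefix,
// so names do not depend on clock resolution.
func uniqueName(prefix string) (string, error) {
	b := make([]byte, 8)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return prefix + "-" + hex.EncodeToString(b), nil
}
```

Returning a sentinel error lets callers distinguish "pool exhausted" from a transient exec failure with `errors.Is`, which may matter if a fallback such as creating a fresh container is added later.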

Stale

9. This PR is 2.5 months old, last updated 2026-02-26. CI failed on the now-removed build (1.23.8) job. Please rebase against current main; non-trivial conflicts likely given how much landed since.

Path to merge

If items 1, 2, 7, 8, and 9 are addressed and 3-6 are clearly documented (or mode-switched) in code comments / a short pkg/sandbox/README.md, I'd be happy to take another look and merge.

meidad · 2026-06-12T01:41:07Z

The sandbox feature itself is well-built — Docker executor, warm container pool, fail-closed allowlist, integration tests guarded by build tag. This is exactly the kind of security primitive the SDK needs.

Two things blocking merge:

Required

Split the formatting cleanup out. This PR mixes the sandbox feature with a large reformatting pass across ~15 unrelated files (pkg/llm/deepseek, pkg/agentconfig, pkg/prompts, etc.). That makes the diff hard to review and the git history harder to bisect. Please open a separate PR for the formatting changes so this one is reviewable on its own merits.
Rebase against current main. This is 4 months old — pkg/mcp/mcp.go and pkg/mcp/lazy.go have changed significantly and there will be conflicts.

Questions

The DockerExecutor pulls/starts containers inline during Execute(). Is there a startup timeout? If the container image isn't cached locally, the first tool call could block for a long time.
The warm pool uses round-robin selection — is there any concern about concurrent agents sharing pool slots and seeing each other's filesystem state between calls?

Once the split + rebase lands I can do a focused review of just the sandbox code and move quickly.

igun997 added 14 commits February 19, 2026 23:11

docs: add sandbox container execution design plan

988cd48

fix: address code review feedback for ToolMiddleware

5bf0e3d

feat(sandbox): add config structs and error types

f5f6530

feat(sandbox): add CommandExecutor interface and LocalExecutor

e24604a

feat(sandbox): add command allowlist with fail-closed semantics

52915c0

feat(sandbox): add warm container pool with round-robin selection

159f0a9

feat(sandbox): add DockerExecutor with container lifecycle management

87016e7

test(sandbox): add Docker integration tests

9a16edd

feat(mcp): integrate sandbox CommandExecutor into StdioServerConfig

204111a

docs: add sandbox implementation plan and demo example

719f7d1

meidad requested changes May 2, 2026

igun997 wants to merge 14 commits into Ingenimax:main from igun997:feat/sandbox-and-improvements
