Code Sandbox

scarecr0w12 edited this page Jun 25, 2026 · 8 revisions

CortexPrism executes code in isolated environments to protect the host system from potentially harmful or buggy code generated by LLMs.

Docker Runtime (Recommended)

docker run --rm \
  --network=none \
  --memory=256m \
  --cpus=0.5 \
  --pids-limit=64 \
  --security-opt=no-new-privileges \
  <image> <interpreter> /tmp/code.<ext>

Security Properties

  • No network access (--network=none)
  • Resource limits — 256MB memory, 0.5 CPU, 64 PIDs max
  • No privilege escalation (--security-opt=no-new-privileges)
  • Ephemeral — Container destroyed immediately after execution (--rm)
  • No host mounts — No filesystem access to the host machine
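
The flags above can be assembled programmatically. The following is a minimal sketch, not the actual runInDocker code: the function and option names are hypothetical, and only the flag values come from the command shown earlier.

```typescript
// Hypothetical option names; the defaults mirror the flags listed above.
interface DockerLimits {
  memory: string; // e.g. "256m"
  cpus: number; // e.g. 0.5
  pidsLimit: number; // e.g. 64
}

const DEFAULT_LIMITS: DockerLimits = { memory: "256m", cpus: 0.5, pidsLimit: 64 };

// Builds the argument list passed to the `docker` binary for one ephemeral run.
function buildDockerRunArgs(
  image: string,
  interpreter: string,
  codePath: string,
  limits: DockerLimits = DEFAULT_LIMITS,
): string[] {
  return [
    "run",
    "--rm", // container is removed after execution
    "--network=none", // no network access
    `--memory=${limits.memory}`,
    `--cpus=${limits.cpus}`,
    `--pids-limit=${limits.pidsLimit}`,
    "--security-opt=no-new-privileges", // no privilege escalation
    image,
    interpreter,
    codePath,
  ];
}
```

Passing the arguments as an array rather than a single shell string avoids a layer of shell quoting when the command is spawned.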

Limits

  • Timeout: 30 seconds
  • Max output: 64KB (configurable via maxOutputBytes)
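
As an illustration of the output limit, here is one way a cap like maxOutputBytes could be applied to raw process output. This is a sketch under assumptions: the function name and return shape are hypothetical, and the real executor may truncate differently.

```typescript
const DEFAULT_MAX_OUTPUT_BYTES = 64 * 1024; // 64KB, per the limit above

interface CappedOutput {
  bytes: Uint8Array;
  truncated: boolean;
}

// Keeps at most maxOutputBytes of raw output and reports whether anything was cut.
function capOutput(
  raw: Uint8Array,
  maxOutputBytes: number = DEFAULT_MAX_OUTPUT_BYTES,
): CappedOutput {
  if (raw.length <= maxOutputBytes) {
    return { bytes: raw, truncated: false };
  }
  // subarray creates a view over the same buffer, so nothing is copied.
  return { bytes: raw.subarray(0, maxOutputBytes), truncated: true };
}
```

Cutting at a byte boundary can split a multi-byte UTF-8 character, so a caller that decodes the result to text may want a decoder that tolerates a trailing partial sequence.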

Subprocess Fallback

When Docker is not available (docker info fails), CortexPrism falls back to direct subprocess execution. This provides less isolation but retains policy gating through the security validator.

gVisor Support

When gVisor is installed, runInDocker passes --runtime=runsc for kernel-level syscall filtering. getAvailableRuntime() auto-detects gVisor availability (cached result) and prefers it over plain Docker.
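
The preference order described above (gVisor, then plain Docker, then subprocess) can be sketched as a cached selector. This is an illustration only: the probe functions are hypothetical stand-ins for checks such as docker info, and caching the whole decision (rather than only the gVisor probe) is an assumption.

```typescript
type SandboxRuntime = "gvisor" | "docker" | "subprocess";

// Hypothetical probes; real ones would shell out to the Docker CLI.
interface RuntimeProbes {
  dockerAvailable: () => boolean;
  gvisorAvailable: () => boolean;
}

// Returns a function that probes once and then reuses the cached answer.
function createRuntimeSelector(probes: RuntimeProbes): () => SandboxRuntime {
  let cached: SandboxRuntime | undefined;
  return () => {
    if (cached !== undefined) {
      return cached;
    }
    const picked: SandboxRuntime = !probes.dockerAvailable()
      ? "subprocess" // no Docker: fall back to direct execution
      : probes.gvisorAvailable()
        ? "gvisor" // preferred when runsc is installed
        : "docker";
    cached = picked;
    return picked;
  };
}
```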

Supported Languages

| Language | Docker Image |
| --- | --- |
| Python | python:3.12-alpine |
| JavaScript | node:22-alpine |
| TypeScript | denoland/deno:alpine |
| Bash | alpine:3.20 |
| Ruby | ruby:3.3-alpine |
| Go | golang:1.22-alpine |
| Rust | rust:1.78-alpine |
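
A lookup like the one below shows how a language name could resolve to an image and a file extension for /tmp/code.<ext>. The images come from the table above; the extensions and all identifiers are assumptions for illustration.

```typescript
type SandboxLanguage =
  | "python" | "javascript" | "typescript" | "bash" | "ruby" | "go" | "rust";

interface LanguageSpec {
  image: string; // Docker image from the table above
  ext: string; // assumed extension for /tmp/code.<ext>
}

const LANGUAGE_SPECS: Record<SandboxLanguage, LanguageSpec> = {
  python: { image: "python:3.12-alpine", ext: "py" },
  javascript: { image: "node:22-alpine", ext: "js" },
  typescript: { image: "denoland/deno:alpine", ext: "ts" },
  bash: { image: "alpine:3.20", ext: "sh" },
  ruby: { image: "ruby:3.3-alpine", ext: "rb" },
  go: { image: "golang:1.22-alpine", ext: "go" },
  rust: { image: "rust:1.78-alpine", ext: "rs" },
};

// Case-insensitive lookup; returns undefined for unsupported languages.
function resolveLanguage(lang: string): LanguageSpec | undefined {
  const key = lang.toLowerCase();
  return Object.prototype.hasOwnProperty.call(LANGUAGE_SPECS, key)
    ? LANGUAGE_SPECS[key as SandboxLanguage]
    : undefined;
}
```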

Auto-Fix Loop

When code execution fails, CortexPrism can automatically fix and retry:

runInSandbox(code)
  → exit != 0?
     → LLM: "Fix this error: <stderr>\n\nCode:\n<code>"
     → extract code from LLM response
     → runInSandbox(fixedCode)
     → repeat up to maxRounds (default 4)

Enable it with the --fix flag on cortex run, or configure it per session.
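
The loop above can be sketched in TypeScript with the sandbox runner and the LLM call injected as functions. This is not the autofix.ts implementation: the type names, the return shape, and the rule of taking the first fenced block from the LLM reply are assumptions.

```typescript
interface RunResult {
  exitCode: number;
  stdout: string;
  stderr: string;
}

type Runner = (code: string) => Promise<RunResult>;
type Llm = (prompt: string) => Promise<string>;

// Three backticks, built indirectly so this example stays fence-safe.
const FENCE = ["`", "`", "`"].join("");
const FENCED_BLOCK = new RegExp(FENCE + "[^\\n]*\\n([\\s\\S]*?)" + FENCE);

// Assumed extraction rule: first fenced block, else the whole reply.
function extractCode(reply: string): string {
  const match = FENCED_BLOCK.exec(reply);
  const body = match ? match[1] : undefined;
  return (body !== undefined ? body : reply).trim();
}

// Runs the code, asking the LLM for a fix after each failure, up to maxRounds.
async function runWithAutoFix(
  code: string,
  run: Runner,
  llm: Llm,
  maxRounds = 4,
): Promise<{ result: RunResult; code: string; rounds: number }> {
  let current = code;
  let result = await run(current);
  let rounds = 0;
  while (result.exitCode !== 0 && rounds < maxRounds) {
    const prompt = `Fix this error: ${result.stderr}\n\nCode:\n${current}`;
    current = extractCode(await llm(prompt));
    result = await run(current);
    rounds++;
  }
  return { result, code: current, rounds };
}
```

Injecting run and llm keeps the loop testable without Docker or a model provider.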

CLI

cortex sandbox run script.py                    # Docker sandbox
cortex sandbox run script.py --no-sandbox       # Subprocess mode
cortex sandbox run script.py --fix              # Auto-fix on failure
cortex sandbox run script.py --fix --max-fix 6  # Up to 6 fix attempts
cortex sandbox run script.py --sandbox-debug    # Enable debug logging

Agent Tool

The code_exec tool lets agents execute code in the sandbox. The tool description explicitly warns that:

  • The sandbox has NO access to host files or workspace
  • No package managers are available in the sandbox
  • Use file tools for all file operations

Sandbox Configuration

Configurable via ~/.cortex/config.json:

{
  "sandbox": {
    "runtime": "docker",
    "languages": ["python", "javascript", "typescript", "bash", "ruby", "go", "rust"],
    "timeout": 30000,
    "memoryLimit": "512m",
    "outputLimit": 102400
  }
}

The runtime field accepts docker, gvisor (kernel-level syscall filtering via runsc), or subprocess.
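
A typed view of this config block might look like the sketch below. The field names come from the JSON above; the interface name, the validation rules, and the fallback defaults (taken from the limits documented earlier, with timeout read as milliseconds) are assumptions.

```typescript
type RuntimeName = "docker" | "gvisor" | "subprocess";

interface SandboxSettings {
  runtime: RuntimeName;
  languages: string[];
  timeout: number; // assumed to be milliseconds (30000 = 30 seconds)
  memoryLimit: string; // Docker memory string such as "512m"
  outputLimit: number; // bytes
}

const RUNTIME_NAMES: RuntimeName[] = ["docker", "gvisor", "subprocess"];

// Assumed defaults, based on the limits documented earlier on this page.
const DEFAULT_SETTINGS: SandboxSettings = {
  runtime: "docker",
  languages: ["python", "javascript", "typescript", "bash", "ruby", "go", "rust"],
  timeout: 30000,
  memoryLimit: "256m",
  outputLimit: 64 * 1024,
};

// Parses config.json text, fills in defaults, and rejects obviously bad values.
function parseSandboxSettings(jsonText: string): SandboxSettings {
  const parsed = JSON.parse(jsonText) as { sandbox?: Partial<SandboxSettings> };
  const merged: SandboxSettings = { ...DEFAULT_SETTINGS, ...(parsed.sandbox || {}) };
  if (RUNTIME_NAMES.indexOf(merged.runtime) === -1) {
    throw new Error(`unknown runtime: ${String(merged.runtime)}`);
  }
  if (!(merged.timeout > 0)) {
    throw new Error("timeout must be a positive number of milliseconds");
  }
  return merged;
}
```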

REST API

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/code/exec | Execute code in sandbox |
| GET | /api/sandbox/config | Sandbox configuration (runtime, Docker/gVisor availability, timeout/memory limits, supported languages) |
| PUT | /api/sandbox/config | Update sandbox config |
| GET | /api/sandbox/backends | Available backends (docker, gvisor, e2b, daytona) with API-key-based availability |
| GET | /api/sandbox/debug | Debug status |
| PUT | /api/sandbox/debug | Toggle sandbox debug logging |
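
To make the execution endpoint concrete, the sketch below builds a request for POST /api/code/exec. Only the method and path come from the table above. The body fields (language, code), the JSON content type, and the base URL are assumptions, since this page does not document the request schema.

```typescript
// Assumed request body; the real schema is not documented on this page.
interface ExecRequestBody {
  language: string;
  code: string;
}

interface HttpCall {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Describes the HTTP call without sending it, so it can be inspected or tested.
function buildExecCall(baseUrl: string, body: ExecRequestBody): HttpCall {
  return {
    method: "POST",
    url: baseUrl.replace(/\/+$/, "") + "/api/code/exec", // tolerate trailing slashes
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}
```

In a runtime that provides fetch, the result could be sent with fetch(call.url, call), where the base URL is whatever host and port the CortexPrism server listens on.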

Environment Snapshot & Replication (#79)

Full environment capture and replay system:

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/sandbox/snapshots | Capture environment snapshot (env vars, dependencies, git state, sandbox config) to JSON + DB |
| GET | /api/sandbox/snapshots | List snapshots with optional session filter and sensitive-key masking |
| GET | /api/sandbox/snapshots/:id | Single snapshot detail with masked env values |
| POST | /api/sandbox/snapshots/:id/replicate | Replicate snapshot to target workspace (writes commented .cortex-env-replication.sh) |
| GET | /api/sandbox/snapshots/compare?id1=&id2= | Diff two snapshots (env vars + dependencies) |
| DELETE | /api/sandbox/snapshots/:id | Delete snapshot (file + DB row) |

Security: Env keys are validated against /^[A-Za-z_][A-Za-z0-9_]*$/, values are limited to 1024 characters, and values are masked for keys matching the API_KEY|TOKEN|SECRET|PASSWORD|AUTH|CREDENTIAL|PRIVATE_KEY|ACCESS_KEY patterns. Env var replication is shell-injection-safe: $, backtick, !, and \ are fully escaped.
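
The checks above can be sketched as follows. The key regex, the 1024-character limit, and the sensitive-key patterns are taken from the text. The function names, the case-insensitive match, the "****" mask, and the export line format are assumptions. For quoting, this sketch swaps in POSIX single-quoting rather than per-character escaping; inside single quotes $, backtick, !, and \ are all inert.

```typescript
const ENV_KEY_PATTERN = /^[A-Za-z_][A-Za-z0-9_]*$/;
const MAX_ENV_VALUE_LENGTH = 1024;
// Case-insensitive matching is an assumption.
const SENSITIVE_KEY_PATTERN =
  /API_KEY|TOKEN|SECRET|PASSWORD|AUTH|CREDENTIAL|PRIVATE_KEY|ACCESS_KEY/i;

function isValidEnvEntry(key: string, value: string): boolean {
  return ENV_KEY_PATTERN.test(key) && value.length <= MAX_ENV_VALUE_LENGTH;
}

// Replaces the value with a fixed mask when the key looks sensitive.
function maskEnvValue(key: string, value: string): string {
  return SENSITIVE_KEY_PATTERN.test(key) ? "****" : value;
}

// Single-quote the value; an embedded ' becomes '\'' (close, escaped quote, reopen).
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Hypothetical line format for a replication script.
function toExportLine(key: string, value: string): string {
  if (!isValidEnvEntry(key, value)) {
    throw new Error(`invalid env entry: ${key}`);
  }
  return `export ${key}=${shellQuote(value)}`;
}
```

Validating the key before interpolating it matters as much as quoting the value, because the key is written into the script unquoted.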

Workspace Context Snapshot (#240)

Point-in-time workspace state capture:

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/workspace/snapshots | Capture file tree with SHA-256 hashes, git state, memory context, tool state |
| GET | /api/workspace/snapshots | List snapshots with session filter |
| GET | /api/workspace/snapshots/:id | Single snapshot with full file tree |
| POST | /api/workspace/snapshots/:id/restore | Write restore manifest (.cortex-ws-restore.json) |
| GET | /api/workspace/snapshots/diff?id1=&id2= | Diff file trees (added/removed/modified) |
| DELETE | /api/workspace/snapshots/:id | Delete snapshot |

Files larger than 10 MB are skipped and recorded with a skipped:too-large:<size> placeholder hash. Scans exclude .git, node_modules, __pycache__, and .DS_Store.
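
The diff and skip rules can be illustrated with a small sketch over path-to-hash maps. The excluded names and the placeholder format come from the text above. The identifiers, the reading of 10 MB as 10 × 1024 × 1024 bytes, and the size being reported in bytes are assumptions.

```typescript
type FileHashes = Record<string, string>; // relative path -> SHA-256 hex or placeholder

interface TreeDiff {
  added: string[];
  removed: string[];
  modified: string[];
}

const EXCLUDED_NAMES = [".git", "node_modules", "__pycache__", ".DS_Store"];
const MAX_HASHED_BYTES = 10 * 1024 * 1024; // assumed meaning of "10 MB"

// True when any path segment is one of the excluded names.
function isExcluded(relPath: string): boolean {
  return relPath.split("/").some((part) => EXCLUDED_NAMES.indexOf(part) !== -1);
}

// Large files get a placeholder instead of a real hash.
function hashOrPlaceholder(sizeBytes: number, computeHash: () => string): string {
  return sizeBytes > MAX_HASHED_BYTES ? `skipped:too-large:${sizeBytes}` : computeHash();
}

function hasPath(tree: FileHashes, relPath: string): boolean {
  return Object.prototype.hasOwnProperty.call(tree, relPath);
}

// Classifies every path as added, removed, or modified between two snapshots.
function diffTrees(before: FileHashes, after: FileHashes): TreeDiff {
  const diff: TreeDiff = { added: [], removed: [], modified: [] };
  for (const relPath of Object.keys(after)) {
    if (!hasPath(before, relPath)) {
      diff.added.push(relPath);
    } else if (before[relPath] !== after[relPath]) {
      diff.modified.push(relPath);
    }
  }
  for (const relPath of Object.keys(before)) {
    if (!hasPath(after, relPath)) {
      diff.removed.push(relPath);
    }
  }
  diff.added.sort();
  diff.removed.sort();
  diff.modified.sort();
  return diff;
}
```

Because the placeholder embeds the file size, a skipped file whose size changes between snapshots still shows up as modified.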

Dev Environment as Code (#232)

Serialize environment config into versioned manifests:

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/sandbox/dev-env/generate | Auto-detect language, dependencies, setup commands; generate DevEnvManifest |
| GET | /api/sandbox/dev-env/manifest?workspacePath= | Load existing cortex-devenv.json |
| PUT | /api/sandbox/dev-env/manifest | Save/update manifest with validation |
| GET | /api/sandbox/dev-env/list | List all stored manifests |
Auto-detects JavaScript (npm/yarn/pnpm/bun), Python (pip), Rust (cargo), Go, and Ruby (bundler). Default manifest names are made unique with a SHA-256 hash of the workspace path to prevent collisions.
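
One plausible way to auto-detect these ecosystems is by looking for well-known marker files in the workspace root. The sketch below is an assumption about the approach, not the dependency-detect.ts logic: the marker list, the precedence among JavaScript lockfiles, and all identifiers are illustrative.

```typescript
interface Marker {
  file: string;
  ecosystem: string;
  tool: string;
}

// Ordered by precedence: within one ecosystem the first matching marker wins.
const MARKERS: Marker[] = [
  { file: "pnpm-lock.yaml", ecosystem: "javascript", tool: "pnpm" },
  { file: "yarn.lock", ecosystem: "javascript", tool: "yarn" },
  { file: "bun.lockb", ecosystem: "javascript", tool: "bun" },
  { file: "bun.lock", ecosystem: "javascript", tool: "bun" },
  { file: "package-lock.json", ecosystem: "javascript", tool: "npm" },
  { file: "package.json", ecosystem: "javascript", tool: "npm" }, // no lockfile: assume npm
  { file: "requirements.txt", ecosystem: "python", tool: "pip" },
  { file: "Cargo.toml", ecosystem: "rust", tool: "cargo" },
  { file: "go.mod", ecosystem: "go", tool: "go" },
  { file: "Gemfile", ecosystem: "ruby", tool: "bundler" },
];

// Maps each detected ecosystem to its tool, given the file names in the workspace root.
function detectTools(rootFiles: string[]): Record<string, string> {
  const found: Record<string, string> = {};
  for (const marker of MARKERS) {
    const present = rootFiles.indexOf(marker.file) !== -1;
    if (present && found[marker.ecosystem] === undefined) {
      found[marker.ecosystem] = marker.tool;
    }
  }
  return found;
}
```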

Bug Reproduction Studio (#230)

Reproduce issues as sandbox test runs:

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/sandbox/bug-repro | Create bug repro run from issue title, description, language, and code |
| GET | /api/sandbox/bug-repro | List runs with optional status/session filters |
| GET | /api/sandbox/bug-repro/:id | Single run detail with result |
| POST | /api/sandbox/bug-repro/:id/run | Execute repro in sandbox (docker/subprocess) |
| DELETE | /api/sandbox/bug-repro/:id | Delete run |

Status lifecycle: queued → running → passed | failed | error. Error handling wraps runInSandbox with try/catch for runtime failures.
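
The lifecycle can be modeled as a small state machine. In this sketch the status names come from the text above, while the identifiers are hypothetical. It also assumes that the three end states are terminal, that exit code 0 means passed and any other exit code means failed, and that a thrown exception maps to error (mirroring the try/catch wrapper).

```typescript
type ReproStatus = "queued" | "running" | "passed" | "failed" | "error";

// Allowed next states; terminal states have none (an assumption).
const NEXT_STATUSES: Record<ReproStatus, ReproStatus[]> = {
  queued: ["running"],
  running: ["passed", "failed", "error"],
  passed: [],
  failed: [],
  error: [],
};

function canTransition(from: ReproStatus, to: ReproStatus): boolean {
  return NEXT_STATUSES[from].indexOf(to) !== -1;
}

// Maps one sandbox execution to a final status.
function statusFromRun(run: () => { exitCode: number }): ReproStatus {
  try {
    return run().exitCode === 0 ? "passed" : "failed";
  } catch (e) {
    // Runtime failure of the sandbox itself, not of the code under test.
    return "error";
  }
}
```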

Module Map

src/sandbox/
├── executor.ts            # Core execution engine (Docker/subprocess/gVisor/e2b/daytona)
├── agent-sandbox.ts       # Docker CLI args builder for agent workspace sandboxes
├── autofix.ts             # LLM auto-debug loop (up to 4 fix rounds)
├── replication.ts         # Environment snapshots: capture, replicate, compare
├── workspace-snapshot.ts  # Workspace snapshots: file tree, git state, restore manifest
├── dev-env-code.ts        # Dev env manifests: generate, validate, save, load
├── bug-repro.ts           # Bug reproduction: create, execute, list results
├── git-capture.ts         # Shared git state capture (branch, HEAD, porcelain status)
├── dependency-detect.ts   # Shared dependency detection (JS/Python/Rust/Go/Ruby)
├── environment.ts         # Sandbox environment provisioning with language auto-detection
├── logger.ts              # Namespaced debug logging (toggleable via env var/CLI/API/WebUI)
├── snapshot-types.ts      # TypeScript interfaces for all snapshot/manifest/run types
└── mod.ts                 # Barrel export

See Also

  • Security — Sandbox isolation in the security model
  • Built-in Tools — The code_exec tool and sandbox tools
