Code Sandbox

CortexPrism executes code in isolated environments to protect the host system from potentially harmful or buggy code generated by LLMs.

Sandbox workspace

Docker Runtime (Recommended)

docker run --rm \
  --network=none \
  --memory=256m \
  --cpus=0.5 \
  --pids-limit=64 \
  --security-opt=no-new-privileges \
  <image> <interpreter> /tmp/code.<ext>

Security Properties

No network access — --network=none
Resource limits — 256MB memory, 0.5 CPU, 64 PIDs max
No privilege escalation — --security-opt=no-new-privileges
Ephemeral — Container destroyed immediately after execution (--rm)
No host mounts — No filesystem access to the host machine
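
As an illustration of how the flags above might be assembled in code, here is a minimal sketch. The `DockerRunOptions` shape and the `buildDockerArgs` name are hypothetical and are not taken from the CortexPrism source; only the flag values come from the command shown earlier.

```typescript
// Hypothetical option bag; field names are illustrative, not CortexPrism's API.
interface DockerRunOptions {
  image: string;       // e.g. "python:3.12-alpine"
  interpreter: string; // e.g. "python"
  codePath: string;    // path of the script inside the container
  memory?: string;     // defaults below mirror the documented limits
  cpus?: string;
  pidsLimit?: number;
}

// Build the argument list for `docker`, one flag per security property.
function buildDockerArgs(opts: DockerRunOptions): string[] {
  return [
    "run",
    "--rm",                                 // ephemeral container
    "--network=none",                       // no network access
    `--memory=${opts.memory ?? "256m"}`,    // memory cap
    `--cpus=${opts.cpus ?? "0.5"}`,         // CPU cap
    `--pids-limit=${opts.pidsLimit ?? 64}`, // process cap
    "--security-opt=no-new-privileges",     // no privilege escalation
    opts.image,
    opts.interpreter,
    opts.codePath,
  ];
}
```

Returning an array rather than a single string means the arguments can be handed to a process spawner without shell quoting.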

Limits

Timeout: 30 seconds
Max output: 64KB (configurable via maxOutputBytes)
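
The output cap could be enforced along these lines. This is a sketch only: `capOutput` and `CappedOutput` are made-up names, and it assumes 64KB means 64 × 1024 bytes.

```typescript
interface CappedOutput {
  data: Uint8Array;
  truncated: boolean;
}

// Assumption: "64KB" is read as 64 * 1024 bytes.
const DEFAULT_MAX_OUTPUT_BYTES = 64 * 1024;

// Keep at most maxOutputBytes of captured output and report whether anything was cut.
function capOutput(
  raw: Uint8Array,
  maxOutputBytes: number = DEFAULT_MAX_OUTPUT_BYTES,
): CappedOutput {
  if (raw.length <= maxOutputBytes) {
    return { data: raw, truncated: false };
  }
  // subarray returns a view over the same buffer, so nothing is copied.
  return { data: raw.subarray(0, maxOutputBytes), truncated: true };
}
```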

Subprocess Fallback

When Docker is not available (docker info fails), CortexPrism falls back to direct subprocess execution. This provides less isolation but retains policy gating through the security validator.

gVisor Support

When gVisor is installed, runInDocker passes --runtime=runsc for kernel-level syscall filtering. getAvailableRuntime() auto-detects gVisor availability (cached result) and prefers it over plain Docker.
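
The selection order described above (gVisor, then plain Docker, then the subprocess fallback) with a cached result might look roughly like this. The probe functions are injected and hypothetical; how CortexPrism actually checks for Docker or runsc is not shown here.

```typescript
type SandboxRuntime = "gvisor" | "docker" | "subprocess";

// Probes are injected so the sketch stays free of process-spawning code.
interface RuntimeProbes {
  dockerAvailable: () => boolean; // e.g. whether `docker info` succeeds
  gvisorAvailable: () => boolean; // e.g. whether the runsc runtime is usable
}

// Returns a detector that runs the probes once and caches the answer.
function createRuntimeDetector(probes: RuntimeProbes): () => SandboxRuntime {
  let cached: SandboxRuntime | undefined;
  return function getAvailableRuntime(): SandboxRuntime {
    if (cached !== undefined) return cached;
    if (!probes.dockerAvailable()) {
      cached = "subprocess"; // least isolation, used only as a fallback
    } else if (probes.gvisorAvailable()) {
      cached = "gvisor";     // preferred over plain Docker
    } else {
      cached = "docker";
    }
    return cached;
  };
}
```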

Supported Languages

Language	Docker Image
Python	`python:3.12-alpine`
JavaScript	`node:22-alpine`
TypeScript	`denoland/deno:alpine`
Bash	`alpine:3.20`
Ruby	`ruby:3.3-alpine`
Go	`golang:1.22-alpine`
Rust	`rust:1.78-alpine`
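
For reference, the table can be expressed as a typed lookup. This is a sketch with hypothetical names (`SANDBOX_IMAGES`, `imageFor`); the image tags are copied from the table.

```typescript
// Image tags copied from the table above.
const SANDBOX_IMAGES = {
  python: "python:3.12-alpine",
  javascript: "node:22-alpine",
  typescript: "denoland/deno:alpine",
  bash: "alpine:3.20",
  ruby: "ruby:3.3-alpine",
  go: "golang:1.22-alpine",
  rust: "rust:1.78-alpine",
} as const;

type SandboxLanguage = keyof typeof SANDBOX_IMAGES;

// Type guard: true only for languages listed in the table.
function isSupportedLanguage(lang: string): lang is SandboxLanguage {
  return Object.prototype.hasOwnProperty.call(SANDBOX_IMAGES, lang);
}

// Resolve a language name (case-insensitive) to its Docker image.
function imageFor(lang: string): string {
  const key = lang.toLowerCase();
  if (!isSupportedLanguage(key)) {
    throw new Error(`Unsupported sandbox language: ${lang}`);
  }
  return SANDBOX_IMAGES[key];
}
```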

Auto-Fix Loop

When code execution fails, CortexPrism can automatically fix and retry:

runInSandbox(code)
  → exit != 0?
     → LLM: "Fix this error: <stderr>\n\nCode:\n<code>"
     → extract code from LLM response
     → runInSandbox(fixedCode)
     → repeat up to maxRounds (default 4)

Enable it with the --fix flag on cortex sandbox run (see CLI below), or configure it per session.
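
The loop above can be sketched in TypeScript as follows. The sandbox runner and the LLM call are injected as plain functions; `runWithAutoFix`, `extractCode`, and the result shape are assumptions made for illustration and may differ from what autofix.ts does.

```typescript
interface SandboxResult {
  exitCode: number;
  stdout: string;
  stderr: string;
}

type RunFn = (code: string) => Promise<SandboxResult>;
type AskLlmFn = (prompt: string) => Promise<string>;

const FENCE = "\u0060\u0060\u0060"; // three backticks
const FENCED_BLOCK = new RegExp(FENCE + "[^\\n]*\\n([\\s\\S]*?)" + FENCE);

// Pull the first fenced block out of an LLM reply; fall back to the whole reply.
function extractCode(reply: string): string {
  const match = FENCED_BLOCK.exec(reply);
  return (match ? match[1] : reply).trim();
}

// Run the code; on a non-zero exit, ask the LLM for a fix and retry up to maxRounds times.
async function runWithAutoFix(
  code: string,
  run: RunFn,
  askLlm: AskLlmFn,
  maxRounds = 4,
): Promise<{ result: SandboxResult; code: string; rounds: number }> {
  let current = code;
  let result = await run(current);
  let rounds = 0;
  while (result.exitCode !== 0 && rounds < maxRounds) {
    const reply = await askLlm(`Fix this error: ${result.stderr}\n\nCode:\n${current}`);
    current = extractCode(reply);
    result = await run(current);
    rounds++;
  }
  return { result, code: current, rounds };
}
```

In this sketch the initial run does not count as a round, so maxRounds bounds the number of LLM calls rather than the total number of executions.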

CLI

cortex sandbox run script.py                    # Docker sandbox
cortex sandbox run script.py --no-sandbox       # Subprocess mode
cortex sandbox run script.py --fix              # Auto-fix on failure
cortex sandbox run script.py --fix --max-fix 6  # Up to 6 fix attempts
cortex sandbox run script.py --sandbox-debug    # Enable debug logging

Agent Tool

The code_exec tool lets agents execute code in the sandbox. The tool description explicitly warns that:

The sandbox has NO access to host files or workspace
No package managers are available in the sandbox
Use file tools for all file operations

Sandbox Configuration

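Configurable via ~/.cortex/config.json. The example below sets memory and output limits higher than the values listed under Security Properties and Limits: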

{
  "sandbox": {
    "runtime": "docker",
    "languages": ["python", "javascript", "typescript", "bash", "ruby", "go", "rust"],
    "timeout": 30000,
    "memoryLimit": "512m",
    "outputLimit": 102400
  }
}

runtime: docker | gvisor (kernel-level syscall filtering via runsc) | subprocess.
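
A loader might sanity-check this block before using it. The validator below is a sketch under stated assumptions: the function name is hypothetical, timeout is taken to be milliseconds (based on the 30000 example), and memoryLimit is assumed to follow Docker's size syntax.

```typescript
const KNOWN_RUNTIMES = ["docker", "gvisor", "subprocess"];

// Validate the "sandbox" object from config.json; returns a list of problems (empty if valid).
function validateSandboxConfig(raw: { [key: string]: unknown }): string[] {
  const errors: string[] = [];
  const runtime = raw["runtime"];
  const languages = raw["languages"];
  const timeout = raw["timeout"];
  const memoryLimit = raw["memoryLimit"];
  const outputLimit = raw["outputLimit"];

  if (typeof runtime !== "string" || KNOWN_RUNTIMES.indexOf(runtime) < 0) {
    errors.push("runtime must be docker, gvisor, or subprocess");
  }
  if (!Array.isArray(languages) || languages.length === 0) {
    errors.push("languages must be a non-empty array");
  }
  // Assumption: timeout is in milliseconds.
  if (typeof timeout !== "number" || !(timeout > 0)) {
    errors.push("timeout must be a positive number");
  }
  // Assumption: Docker-style size, digits plus an optional b/k/m/g unit.
  if (typeof memoryLimit !== "string" || !/^\d+[bkmg]?$/i.test(memoryLimit)) {
    errors.push("memoryLimit must look like 512m");
  }
  if (typeof outputLimit !== "number" || !(outputLimit > 0)) {
    errors.push("outputLimit must be a positive number of bytes");
  }
  return errors;
}
```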

REST API

Method	Path	Description
`POST`	`/api/code/exec`	Execute code in sandbox
`GET`	`/api/sandbox/config`	Sandbox configuration (runtime, Docker/gVisor availability, timeout/memory limits, supported languages)
`PUT`	`/api/sandbox/config`	Update sandbox config
`GET`	`/api/sandbox/backends`	Available backends (docker, gvisor, e2b, daytona) with API-key-based availability
`GET`	`/api/sandbox/debug`	Debug status
`PUT`	`/api/sandbox/debug`	Toggle sandbox debug logging

Environment Snapshot & Replication (#79)

Full environment capture and replay system:

Method	Path	Description
`POST`	`/api/sandbox/snapshots`	Capture environment snapshot (env vars, dependencies, git state, sandbox config) to JSON + DB
`GET`	`/api/sandbox/snapshots`	List snapshots with optional session filter and sensitive-key masking
`GET`	`/api/sandbox/snapshots/:id`	Single snapshot detail with masked env values
`POST`	`/api/sandbox/snapshots/:id/replicate`	Replicate snapshot to target workspace (writes commented `.cortex-env-replication.sh`)
`GET`	`/api/sandbox/snapshots/compare?id1=&id2=`	Diff two snapshots (env vars + dependencies)
`DELETE`	`/api/sandbox/snapshots/:id`	Delete snapshot (file + DB row)
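
To make the masking and compare operations concrete, here is a small sketch. The sensitive-key pattern is a guess at a reasonable heuristic, and `maskEnv` and `diffEnv` are hypothetical names; the rules used by replication.ts are not documented on this page.

```typescript
type EnvMap = { [key: string]: string };

// Assumed heuristic for "sensitive" keys; the real masking rules may differ.
const SENSITIVE_KEY = /(key|token|secret|password|credential)/i;

function hasOwn(obj: EnvMap, k: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, k);
}

// Replace values of sensitive-looking keys with a fixed mask.
function maskEnv(env: EnvMap): EnvMap {
  const masked: EnvMap = {};
  for (const k of Object.keys(env)) {
    masked[k] = SENSITIVE_KEY.test(k) ? "****" : env[k];
  }
  return masked;
}

interface EnvDiff {
  added: string[];
  removed: string[];
  changed: string[];
}

// Compare the env vars of two snapshots by key and value.
function diffEnv(before: EnvMap, after: EnvMap): EnvDiff {
  const diff: EnvDiff = { added: [], removed: [], changed: [] };
  for (const k of Object.keys(after)) {
    if (!hasOwn(before, k)) diff.added.push(k);
    else if (before[k] !== after[k]) diff.changed.push(k);
  }
  for (const k of Object.keys(before)) {
    if (!hasOwn(after, k)) diff.removed.push(k);
  }
  return diff;
}
```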

Workspace Context Snapshot (#240)

Point-in-time workspace state capture:

Method	Path	Description
`POST`	`/api/workspace/snapshots`	Capture file tree with SHA-256 hashes, git state, memory context, tool state
`GET`	`/api/workspace/snapshots`	List snapshots with session filter
`GET`	`/api/workspace/snapshots/:id`	Single snapshot with full file tree
`POST`	`/api/workspace/snapshots/:id/restore`	Write restore manifest (`.cortex-ws-restore.json`)
`GET`	`/api/workspace/snapshots/diff?id1=&id2=`	Diff file trees (added/removed/modified)
`DELETE`	`/api/workspace/snapshots/:id`	Delete snapshot

Files larger than 10 MB are skipped and recorded with a skipped:too-large:<size> placeholder hash. Scans exclude .git, node_modules, __pycache__, and .DS_Store.
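
The diff and exclusion rules can be illustrated with a short sketch. It assumes a file tree is stored as a map from relative path to hash string; `FileTree`, `isExcluded`, and `diffTrees` are illustrative names, not the actual types in workspace-snapshot.ts.

```typescript
// Assumed shape: relative path -> SHA-256 hex digest (or a placeholder for skipped files).
type FileTree = { [path: string]: string };

const EXCLUDED_NAMES = [".git", "node_modules", "__pycache__", ".DS_Store"];

// True if any path segment is one of the excluded names.
function isExcluded(path: string): boolean {
  return path.split("/").some((part) => EXCLUDED_NAMES.indexOf(part) >= 0);
}

interface TreeDiff {
  added: string[];
  removed: string[];
  modified: string[];
}

// Compare two trees by path and hash.
function diffTrees(before: FileTree, after: FileTree): TreeDiff {
  const has = (t: FileTree, p: string) => Object.prototype.hasOwnProperty.call(t, p);
  const diff: TreeDiff = { added: [], removed: [], modified: [] };
  for (const p of Object.keys(after)) {
    if (!has(before, p)) diff.added.push(p);
    else if (before[p] !== after[p]) diff.modified.push(p);
  }
  for (const p of Object.keys(before)) {
    if (!has(after, p)) diff.removed.push(p);
  }
  return diff;
}
```

Because skipped files carry a placeholder instead of a real hash, a comparison like this one only reports them as modified when the placeholder string itself changes.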

Dev Environment as Code (#232)

Serialize environment config into versioned manifests:

Method	Path	Description
`POST`	`/api/sandbox/dev-env/generate`	Auto-detect language, dependencies, setup commands; generate `DevEnvManifest`
`GET`	`/api/sandbox/dev-env/manifest?workspacePath=`	Load existing `cortex-devenv.json`
`PUT`	`/api/sandbox/dev-env/manifest`	Save/update manifest with validation
`GET`	`/api/sandbox/dev-env/list`	List all stored manifests

Auto-detection covers JavaScript (npm/yarn/pnpm/bun), Python (pip), Rust (cargo), Go, and Ruby (bundler). Default manifest names are derived from a SHA-256 hash of the workspace path to prevent collisions.
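
One common way to implement this kind of detection is to look for well-known marker files in the workspace root. The sketch below does that. The marker list and the `detectEnvironments` name are assumptions for illustration; dependency-detect.ts may use different signals.

```typescript
interface DetectedEnv {
  language: string;
  tool: string;
}

// Assumed marker files. Lockfiles come before package.json so the specific
// package manager wins over the npm fallback.
const MARKER_FILES: Array<[string, DetectedEnv]> = [
  ["pnpm-lock.yaml", { language: "javascript", tool: "pnpm" }],
  ["yarn.lock", { language: "javascript", tool: "yarn" }],
  ["bun.lockb", { language: "javascript", tool: "bun" }],
  ["package.json", { language: "javascript", tool: "npm" }],
  ["requirements.txt", { language: "python", tool: "pip" }],
  ["Cargo.toml", { language: "rust", tool: "cargo" }],
  ["go.mod", { language: "go", tool: "go" }],
  ["Gemfile", { language: "ruby", tool: "bundler" }],
];

// Given the file names in a workspace root, report one entry per detected language.
function detectEnvironments(rootFiles: string[]): DetectedEnv[] {
  const found: DetectedEnv[] = [];
  const seen: { [language: string]: boolean } = {};
  for (const [marker, env] of MARKER_FILES) {
    if (rootFiles.indexOf(marker) >= 0 && !seen[env.language]) {
      seen[env.language] = true;
      found.push(env);
    }
  }
  return found;
}
```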

Bug Reproduction Studio (#230)

Reproduce issues as sandbox test runs:

Method	Path	Description
`POST`	`/api/sandbox/bug-repro`	Create bug repro run from issue title, description, language, and code
`GET`	`/api/sandbox/bug-repro`	List runs with optional status/session filters
`GET`	`/api/sandbox/bug-repro/:id`	Single run detail with result
`POST`	`/api/sandbox/bug-repro/:id/run`	Execute repro in sandbox (docker/subprocess)
`DELETE`	`/api/sandbox/bug-repro/:id`	Delete run

Status lifecycle: queued → running → passed | failed | error. Error handling wraps runInSandbox with try/catch for runtime failures.
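
The lifecycle and the try/catch wrapper could be sketched as below. Two assumptions are made and labeled in the code: exit code 0 maps to passed and any other exit code to failed, while a thrown exception maps to error. The `ReproRun` shape and the `executeRepro` name are hypothetical.

```typescript
type ReproStatus = "queued" | "running" | "passed" | "failed" | "error";

interface ReproRun {
  id: string;
  code: string;
  status: ReproStatus;
  detail?: string;
}

// Injected stand-in for runInSandbox; the real result type may carry more fields.
type ReproRunner = (code: string) => Promise<{ exitCode: number; stderr: string }>;

// Move a run through queued -> running -> passed | failed | error.
async function executeRepro(run: ReproRun, runInSandbox: ReproRunner): Promise<ReproRun> {
  run.status = "running";
  try {
    const result = await runInSandbox(run.code);
    // Assumption: exit code 0 counts as passed, anything else as failed.
    run.status = result.exitCode === 0 ? "passed" : "failed";
    run.detail = result.stderr;
  } catch (err) {
    // Runtime failures (for example, the sandbox could not start) end in "error".
    run.status = "error";
    run.detail = err instanceof Error ? err.message : String(err);
  }
  return run;
}
```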

Module Map

src/sandbox/
├── executor.ts            # Core execution engine (Docker/subprocess/gVisor/e2b/daytona)
├── agent-sandbox.ts       # Docker CLI args builder for agent workspace sandboxes
├── autofix.ts             # LLM auto-debug loop (up to 4 fix rounds)
├── replication.ts         # Environment snapshots: capture, replicate, compare
├── workspace-snapshot.ts  # Workspace snapshots: file tree, git state, restore manifest
├── dev-env-code.ts        # Dev env manifests: generate, validate, save, load
├── bug-repro.ts           # Bug reproduction: create, execute, list results
├── git-capture.ts         # Shared git state capture (branch, HEAD, porcelain status)
├── dependency-detect.ts   # Shared dependency detection (JS/Python/Rust/Go/Ruby)
├── environment.ts         # Sandbox environment provisioning with language auto-detection
├── logger.ts              # Namespaced debug logging (toggleable via env var/CLI/API/WebUI)
├── snapshot-types.ts      # TypeScript interfaces for all snapshot/manifest/run types
└── mod.ts                 # Barrel export
