Run agent code in an isolated Linux virtual machine — safely, locally, and with full dev environment capabilities.
The Sandbox is a shared Linux container powered by Apple's Containerization framework. It gives every Osaurus agent access to a real Linux environment with shell, package managers, compilers, and file system access — all running natively on Apple Silicon with zero risk to your Mac.
Agents can run arbitrary code, install packages, and modify files without any risk to the host macOS system. The VM is a disposable, resettable environment. If something goes wrong, reset the container and start fresh — your Mac is never affected.
Agents gain a full Linux environment with shell access, Python (pip), Node.js (npm), system packages (apk), compilers, and standard POSIX tools. This far exceeds what macOS-sandboxed tools can offer, enabling agents to build, test, and run real software.
Each agent gets its own Linux user and home directory. One agent's files, processes, and installed packages cannot interfere with another's. Run multiple specialized agents simultaneously — a Python data analyst, a Node.js web developer, and a system administration agent — without cross-contamination.
Sandbox plugins are simple JSON recipes. No compiled dylibs, no Xcode, no code signing required. Anyone can write, share, and import plugins that install dependencies, seed files, and define custom tools — dramatically lowering the barrier to extending agent capabilities.
Everything runs on-device using Apple's Virtualization framework. No Docker, no cloud VMs, no network dependency. The container boots in seconds and runs with native performance on Apple Silicon.
Despite running in isolation, agents inside the VM retain full access to Osaurus services — inference, memory, secrets, agent dispatch, and events — via a vsock bridge. The sandbox is isolated but not disconnected.
- macOS 26+ (Tahoe) — required for Apple's Containerization framework
- Apple Silicon (M1 or newer)
Open the Management window (⌘ Shift M) → Sandbox.
Click Provision to download the Linux kernel and initial filesystem, then boot the container. This is a one-time setup that takes about a minute.
Once the container is running, sandbox tools are automatically registered for the active agent. The agent can now execute commands, read/write files, install packages, and more — all inside the VM.
Switch to the Plugins tab to browse, import, or create sandbox plugins that extend your agents with custom tools.
┌──────────────────────────────────────────────────────────────┐
│  macOS Host                                                  │
│                                                              │
│  ┌──────────────┐     ┌──────────────────────────────┐       │
│  │   Osaurus    │     │     Linux VM (Alpine)        │       │
│  │              │     │                              │       │
│  │  SandboxMgr ─┼─────┤→ /workspace (VirtioFS)       │       │
│  │              │     │→ /output    (VirtioFS)       │       │
│  │  HostAPI  ←──┼─vsock─→ /run/osaurus-bridge.sock   │       │
│  │  Bridge      │     │                              │       │
│  │              │     │  agent-alice (Linux user)    │       │
│  │  ToolReg  ←──┼─────┤  agent-bob   (Linux user)    │       │
│  │              │     │  ...                         │       │
│  └──────────────┘     └──────────────────────────────┘       │
└──────────────────────────────────────────────────────────────┘
Key components:
| Component | Description |
|---|---|
| Linux VM | Alpine Linux with Kata Containers 3.17.0 ARM64 kernel, 8 GiB root filesystem |
| VirtioFS Mounts | /workspace maps to ~/.osaurus/container/workspace/, /output maps to ~/.osaurus/container/output/ |
| NAT Networking | Container gets 10.0.2.15/24 via VZNATNetworkDeviceAttachment |
| Vsock Bridge | Unix socket relayed via vsock connects the container to the Host API Bridge server |
| Per-Agent Users | Each agent gets a Linux user agent-{name} with home at /workspace/agents/{name}/ |
| Host API Bridge | HTTP server on the host, accessible from the container via osaurus-host CLI shim |
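
Because `/workspace` and `/output` are VirtioFS mounts, every file an agent writes there also exists at a host path. The sketch below translates between the two using only the mount points listed in the table. The function names are hypothetical helpers for illustration, not part of Osaurus.

```python
from pathlib import Path, PurePosixPath

# Mount table taken from the component list above: VM mount point -> host directory.
MOUNTS = {
    PurePosixPath("/workspace"): Path("~/.osaurus/container/workspace"),
    PurePosixPath("/output"): Path("~/.osaurus/container/output"),
}


def vm_to_host(vm_path: str) -> Path:
    """Translate an absolute VM path to the host path that backs it."""
    p = PurePosixPath(vm_path)
    if ".." in p.parts:
        raise ValueError("parent-directory segments are not allowed")
    for mount, host_dir in MOUNTS.items():
        if p == mount or mount in p.parents:
            return host_dir.expanduser() / p.relative_to(mount)
    raise ValueError(f"{vm_path} is not on a VirtioFS mount")


def agent_home(name: str) -> PurePosixPath:
    """Home directory of the Linux user agent-{name} inside the VM."""
    return PurePosixPath("/workspace/agents") / name
```

For example, `vm_to_host(str(agent_home("alice") / "notes.txt"))` resolves to `notes.txt` under `~/.osaurus/container/workspace/agents/alice/` on the host.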
Configure the container via the Management window → Sandbox → Container tab → Resources section.
| Setting | Range | Default | Description |
|---|---|---|---|
| CPUs | 1–8 | 2 | Virtual CPU cores allocated to the VM |
| Memory | 1–8 GB | 2 GB | RAM allocated to the VM |
| Network | outbound / none | outbound | NAT networking for outbound internet access |
| Auto-Start | on / off | on | Automatically start the container when Osaurus launches |
Changes require a container restart to take effect.
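
If you edit the config file by hand, a quick check against the ranges in the table can catch typos before a restart. This is a minimal sketch: the key names follow the `sandbox.json` example in this section, while filling in defaults for missing keys and requiring whole-number values are assumptions, not documented Osaurus behavior.

```python
import json

# Allowed values copied from the settings table above.
LIMITS = {"cpus": range(1, 9), "memoryGB": range(1, 9)}
NETWORK_MODES = {"outbound", "none"}
DEFAULTS = {"autoStart": True, "cpus": 2, "memoryGB": 2, "network": "outbound"}


def load_sandbox_config(text: str) -> dict:
    """Parse sandbox.json text, fill in defaults, and reject out-of-range values."""
    config = {**DEFAULTS, **json.loads(text)}
    for key, allowed in LIMITS.items():
        value = config[key]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value not in allowed:
            raise ValueError(
                f"{key} must be an integer from {allowed.start} to {allowed.stop - 1}"
            )
    if config["network"] not in NETWORK_MODES:
        raise ValueError("network must be 'outbound' or 'none'")
    if not isinstance(config["autoStart"], bool):
        raise ValueError("autoStart must be true or false")
    return config
```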
Config file: ~/.osaurus/config/sandbox.json
{
"autoStart": true,
"cpus": 2,
"memoryGB": 2,
"network": "outbound"
}

When the container is running, sandbox tools are automatically registered for the active agent. Read-only tools are always available. Write and execution tools require autonomous_exec to be enabled on the agent.
| Tool | Description |
|---|---|
| `sandbox_read_file` | Read a file's contents from the sandbox (supports line ranges, tail, char cap) |
| `sandbox_list_directory` | List files and directories (supports recursive listing via tree) |
| `sandbox_search_files` | Search file contents with ripgrep (regex, glob filters, context lines, case-insensitive) |
| `sandbox_find_files` | Find files by name glob pattern (e.g. `*.py`, `test_*`) |

| Tool | Description |
|---|---|
| `sandbox_write_file` | Write content to a file (creates parent directories) |
| `sandbox_edit_file` | Edit a file by exact string replacement; `old_string` must match exactly once |
| `sandbox_move` | Move or rename files and directories |
| `sandbox_delete` | Delete files or directories |
| `sandbox_exec` | Run a shell command (configurable timeout, max 300s) |
| `sandbox_exec_background` | Start a background process with log file output |
| `sandbox_install` | Install system packages via `apk` (runs as root) |
| `sandbox_pip_install` | Install Python packages via `pip install --user` |
| `sandbox_npm_install` | Install Node.js packages via `npm install` |
| `sandbox_run_script` | Run a script file (auto-detects Python, Node, Bash, etc.) |
| `sandbox_secret_check` | Check whether a secret exists for this agent (never reveals the value) |
| `sandbox_secret_set` | Store a secret securely; pass `value` directly or omit it to prompt the user |
| `sandbox_plugin_register` | Register an agent-created plugin (requires pluginCreate permission) |

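
The exact-once rule for `sandbox_edit_file` is worth spelling out, because it decides whether an edit is applied or rejected. The sketch below shows one way such a rule can be implemented. It is an illustration of the semantics, not the tool's actual code, and the parameter name `new_string` is an assumption.

```python
def edit_exact_once(text: str, old_string: str, new_string: str) -> str:
    """Replace old_string with new_string only if it occurs exactly once."""
    if not old_string:
        raise ValueError("old_string must not be empty")
    # str.count counts non-overlapping occurrences.
    count = text.count(old_string)
    if count != 1:
        raise ValueError(f"old_string matched {count} times, expected exactly 1")
    return text.replace(old_string, new_string, 1)
```

Rejecting ambiguous matches forces the caller to include enough surrounding context to identify a single location, which avoids silently editing the wrong occurrence.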
share_artifact is a global built-in (registered in ToolRegistry) and is the only way for sandbox-generated content to reach the chat thread. It's not in this sandbox-specific list because it's available everywhere, not just in sandbox mode.
All file paths are validated on the host side before container execution by SandboxPathSanitizer, which returns structured rejection reasons (empty, traversal, null byte, dangerous character, outside allowed roots). Tools surface the reason to the model in an invalid_args envelope so that the next call can self-correct instead of retrying with the same bad path.
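
To make the five rejection reasons concrete, here is a minimal sketch of a validator that returns a structured reason instead of a bare boolean. It is not the `SandboxPathSanitizer` implementation: the order of checks, the character blocklist, and the `check_path` signature are all assumptions made for illustration.

```python
import posixpath
from enum import Enum
from typing import Optional, Tuple


class Rejection(Enum):
    EMPTY = "empty"
    TRAVERSAL = "traversal"
    NULL_BYTE = "null byte"
    DANGEROUS_CHARACTER = "dangerous character"
    OUTSIDE_ALLOWED_ROOTS = "outside allowed roots"


# Assumption: the real blocklist is not documented; this set is illustrative.
DANGEROUS = set(";|&$`<>\n")


def check_path(path: str, home: str, allowed_roots: Tuple[str, ...]) -> Optional[Rejection]:
    """Return a rejection reason, or None when the path is acceptable."""
    if not path.strip():
        return Rejection.EMPTY
    if "\x00" in path:
        return Rejection.NULL_BYTE
    if any(ch in DANGEROUS for ch in path):
        return Rejection.DANGEROUS_CHARACTER
    if ".." in path.split("/"):
        return Rejection.TRAVERSAL
    # Relative paths are resolved against the agent's home directory.
    resolved = posixpath.normpath(posixpath.join(home, path))
    if not any(resolved == root or resolved.startswith(root + "/") for root in allowed_roots):
        return Rejection.OUTSIDE_ALLOWED_ROOTS
    return None
```

Returning the specific reason is what lets a tool tell the model which rule was broken, rather than only that the call failed.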
Every sandbox tool returns a ToolEnvelope JSON string. Success payloads in result:
- Read/inspect: `{path, content, size}` (+ optional `start_line` / `line_count` / `tail_lines` / `max_chars`)
- Exec family: `{stdout, stderr, exit_code, cwd}`; `sandbox_run_script` adds `combined` (stdout+stderr) and `language`.
- Install family: `{installed, exit_code, output}` on success; `execution_error` envelope on non-zero exit.
- Mutations: `{path, ...}` / `{source, destination}` / `{deleted, recursive}`.
Failures use kind: invalid_args with field pointing at the offending argument (path, cwd, content, etc.) so the model can self-correct on the next turn.
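
The sketch below shows how a caller might branch on these envelopes. The field names `result`, `kind`, `field`, and `exit_code` come from the description above. Their placement at the top level of the JSON object is an assumption, since the full ToolEnvelope schema is not shown here.

```python
import json


def summarize_envelope(raw: str) -> str:
    """Turn a tool envelope into a one-line summary for logging or retry logic."""
    envelope = json.loads(raw)
    kind = envelope.get("kind")
    if kind == "invalid_args":
        # `field` names the offending argument, e.g. "path" or "cwd".
        return f"fix argument '{envelope.get('field')}' and retry"
    if kind == "execution_error":
        return "command ran but failed"
    result = envelope.get("result", {})
    if "exit_code" in result:
        return f"exit {result['exit_code']}"
    return "ok"
```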
Sandbox plugins are JSON recipes that extend agent capabilities inside the container. They can install system dependencies, seed files, define custom tools, and configure secrets — all without compiling code.
{
"name": "Python Data Tools",
"description": "Data analysis toolkit with pandas and matplotlib",
"version": "1.0.0",
"author": "your-name",
"dependencies": ["python3", "py3-pip"],
"setup": "pip install --user pandas matplotlib seaborn",
"files": {
"helpers.py": "import pandas as pd\nimport matplotlib\nmatplotlib.use('Agg')\nimport matplotlib.pyplot as plt\n"
},
"tools": [
{
"id": "analyze_csv",
"description": "Load a CSV file and return summary statistics",
"parameters": {
"file": {
"type": "string",
"description": "Path to the CSV file"
}
},
"run": "cd $HOME/plugins/python-data-tools && python3 -c \"import pandas as pd; df = pd.read_csv('$PARAM_FILE'); print(df.describe().to_string())\""
}
],
"secrets": ["OPENAI_API_KEY"],
"permissions": {
"network": "outbound",
"inference": true
}
}

| Property | Type | Required | Description |
|---|---|---|---|
| `name` | string | Yes | Display name |
| `description` | string | Yes | Brief description |
| `version` | string | No | Semantic version |
| `author` | string | No | Author name |
| `source` | string | No | Source URL (e.g., GitHub repo) |
| `dependencies` | string[] | No | System packages installed via `apk add` (runs as root) |
| `setup` | string | No | Setup command run as the agent's Linux user |
| `files` | object | No | Files seeded into the plugin folder (key = relative path, value = contents) |
| `tools` | SandboxToolSpec[] | No | Custom tool definitions |
| `secrets` | string[] | No | Secret names the plugin requires (user prompted on install) |
| `permissions` | object | No | Network policy and inference access |
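
Before importing a hand-written plugin, it can help to check the manifest against the property table. The following sketch covers only the names, types, and required flags listed above. It is a hypothetical pre-flight check, not the validation Osaurus performs on import.

```python
# Property names and JSON types from the table above, expressed as Python types.
REQUIRED = {"name": str, "description": str}
OPTIONAL = {
    "version": str, "author": str, "source": str, "setup": str,
    "dependencies": list, "files": dict, "tools": list,
    "secrets": list, "permissions": dict,
}
STRING_LISTS = ("dependencies", "secrets")


def validate_manifest(manifest: dict) -> list:
    """Return a list of problems; an empty list means the manifest looks well-formed."""
    problems = []
    for key in REQUIRED:
        if key not in manifest:
            problems.append(f"missing required property: {key}")
    for key, expected in {**REQUIRED, **OPTIONAL}.items():
        if key in manifest and not isinstance(manifest[key], expected):
            problems.append(f"{key} should be of type {expected.__name__}")
    for key in STRING_LISTS:
        value = manifest.get(key)
        if isinstance(value, list) and not all(isinstance(item, str) for item in value):
            problems.append(f"{key} should contain only strings")
    return problems
```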
Plugins are installed per agent. Each agent can have a different set of plugins installed, and each installation is isolated in its own directory within the agent's workspace.
Install flow:
- Validate plugin file paths
- Start the container (if not running)
- Create the agent's Linux user
- Install system dependencies via `apk`
- Create plugin directory and seed files via VirtioFS
- Configure secrets from Keychain
- Run the setup command
- Register plugin tools
Managing plugins:
- Open Management window → Sandbox → Plugins tab
- Import plugins from JSON files, URLs, or GitHub repos
- Create new plugins with the built-in editor
- Install plugins to specific agents
- Export and duplicate plugins for sharing
Each tool in a plugin's tools array becomes an AI-callable tool. The tool name is {pluginId}_{toolId}.
Parameters are passed as environment variables with the prefix PARAM_:
| Parameter Name | Environment Variable |
|---|---|
| `file` | `$PARAM_FILE` |
| `query` | `$PARAM_QUERY` |
| `output_format` | `$PARAM_OUTPUT_FORMAT` |
The run field is a shell command executed as the agent's Linux user with the working directory set to the plugin folder.
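
The parameter-to-environment mapping can be sketched in a few lines. The example below builds the `PARAM_` variables and runs a stand-in script that reads its input from the environment. `params_to_env` and `run_tool` are hypothetical names, and the real runner executes the `run` command inside the container rather than spawning a local Python process.

```python
import os
import subprocess
import sys


def params_to_env(params: dict) -> dict:
    """Map tool parameters to PARAM_-prefixed, upper-cased environment variables."""
    return {f"PARAM_{name.upper()}": str(value) for name, value in params.items()}


# Stand-in for a plugin script: it reads its input from the environment
# instead of having the value spliced into the command text.
SCRIPT = "import os; print(os.environ['PARAM_FILE'])"


def run_tool(params: dict) -> str:
    """Run the stand-in script with the parameters exported as environment variables."""
    env = {**os.environ, **params_to_env(params)}
    done = subprocess.run(
        [sys.executable, "-c", SCRIPT],
        env=env, capture_output=True, text=True, check=True,
    )
    return done.stdout.strip()
```

Reading `os.environ["PARAM_FILE"]` inside the script is generally more robust than interpolating `$PARAM_FILE` into inline source code: a value containing a quote character would otherwise break, or alter, the generated program.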
Agents can check for and store secrets (API keys, tokens) using sandbox_secret_check and sandbox_secret_set. Secrets are stored in the macOS Keychain, scoped per agent.
| Path | When | How |
|---|---|---|
| Direct | Agent already has the value (e.g., received via Host API or Telegram bot) | Pass value parameter to sandbox_secret_set |
| Prompt | Agent needs the user to provide the value (Chat) | Omit value — a secure overlay appears with SecureField input |
The prompt path keeps secret values out of the conversation history and LLM context entirely. The execution loop pauses via withCheckedContinuation until the user submits or cancels.
- Agent calls `sandbox_secret_set` without `value`
- Tool returns a `secret_prompt` marker (JSON with key, description, instructions)
- The chat execution loop intercepts the marker and shows `SecretPromptOverlay`
- User enters the secret value in a `SecureField` and submits (or cancels via button/ESC)
- The value is stored in Keychain and the tool result is rewritten to `{"stored": true, "key": "..."}` (or cancelled)
- Execution resumes with the sanitized result; the LLM never sees the secret

- `SecretPromptState` tracks a `resolved` flag, making `submit()` and `cancel()` idempotent
- `onDisappear` on the overlay calls `cancel()` as a safety net if the view is dismissed unexpectedly
- All session reset paths (`cancelExecution`, `finishExecution`, etc.) dismiss pending prompts before clearing state
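
The resolve-once behavior described above can be illustrated with a small state object. This is a Python sketch of the pattern, not the Swift `SecretPromptState` type. The `store` callback stands in for the Keychain write, and the shape of the cancelled result is an assumption.

```python
import json
from typing import Callable, Optional


class SecretPrompt:
    """Resolve-once state for a pending secret prompt (illustrative only)."""

    def __init__(self, key: str):
        self.key = key
        self.resolved = False
        self.outcome: Optional[dict] = None

    def _resolve(self, outcome: dict) -> None:
        if self.resolved:  # later calls are no-ops, which makes them idempotent
            return
        self.resolved = True
        self.outcome = outcome

    def submit(self, value: str, store: Callable[[str, str], None]) -> None:
        if self.resolved:
            return
        store(self.key, value)  # the value goes to storage, never into the result
        self._resolve({"stored": True, "key": self.key})

    def cancel(self) -> None:
        # Assumed shape for the cancelled case.
        self._resolve({"cancelled": True, "key": self.key})

    def tool_result(self) -> str:
        """Sanitized result handed back to the model."""
        return json.dumps(self.outcome)
```

Because the first resolution wins, a late `cancel()` from a dismissed view cannot overwrite a successful submit.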
Agents can author, package, and register new sandbox plugins at runtime. The model-facing skill is named Sandbox Plugin Creator and is injected into the system prompt automatically when an autonomous agent has no other plugin/MCP tools available. Both the in-process sandbox_plugin_register tool and the host-API POST /api/plugin/create endpoint funnel through one shared registration pipeline (SandboxPluginRegistration.register) so they cannot drift.
- `autonomousExec.enabled` must be `true` on the agent
- `autonomousExec.pluginCreate` must be `true` (the default in `AutonomousExecConfig`)
- The Sandbox Plugin Creator skill must be enabled (it is by default; disable it in the skill catalog to suppress the auto-injected backstop)

- Agent writes script files to `~/plugins/{plugin-id}/scripts/` (or any subdirectory)
- Agent writes a `plugin.json` manifest defining the plugin name, description, tools, and dependencies
- Agent calls `sandbox_plugin_register` with the `plugin_id` (or the host CLI calls `POST /api/plugin/create`)
- The shared registration pipeline validates the plugin, applies restricted defaults, persists to `SandboxPluginLibrary`, runs the install, and hot-registers the tools via `CapabilityLoadBuffer`
- A non-blocking toast notifies the user with a Remove action for later review
When sandbox_plugin_register loads a plugin directory, it recursively collects every UTF-8 readable file (excluding plugin.json itself) and merges them into the plugin's files map. Files explicitly defined in plugin.json take precedence over auto-discovered ones. Binary files are rejected up-front — plugin.files is text-only and silently dropped binaries would break library-driven reinstalls. Either remove them, regenerate them at install time in setup, or fetch them from a setup-allowlisted host.
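
The collection rules in this paragraph (recursive walk, skip `plugin.json`, explicit entries win, binaries rejected) can be sketched as follows. The function name and error type are hypothetical, and using a failed UTF-8 decode as the definition of "binary" is an assumption.

```python
from pathlib import Path


def collect_plugin_files(plugin_dir: Path, declared: dict) -> dict:
    """Merge auto-discovered text files with the files declared in plugin.json."""
    files = {}
    for path in sorted(plugin_dir.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(plugin_dir).as_posix()
        if rel == "plugin.json":  # the manifest itself is not part of `files`
            continue
        try:
            files[rel] = path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            # Reject up-front instead of silently dropping the file.
            raise ValueError(f"binary file not allowed: {rel}") from None
    files.update(declared)  # explicitly declared files take precedence
    return files
```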
Every agent-authored plugin is rewritten to enforce safe defaults before persistence:
- `permissions.network` is sanitised. Wildcard values (`outbound`) collapse to `none`. Comma-separated domain lists are accepted as-is when every entry parses as a valid domain; invalid lists collapse to `none`. Plan accordingly: declare the exact API hostnames you need.
- `permissions.inference` is forced to `false`. Agent-authored plugins cannot call inference APIs.
- `metadata.created_by` is stamped to `agent`; `metadata.created_via` records `agent_tool` or `host_bridge`.
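
A rough sketch of these rewrite rules is shown below. The hostname pattern is a deliberately strict assumption, since the rule Osaurus uses to decide what "parses as a valid domain" is not specified here. `sanitize_network` and `restrict_plugin` are hypothetical names.

```python
import re

# Assumption: a valid domain is two or more dot-separated labels made of
# letters, digits, and hyphens, with no label starting or ending in a hyphen.
DOMAIN = re.compile(
    r"^(?!-)[a-z0-9-]{1,63}(?<!-)(\.(?!-)[a-z0-9-]{1,63}(?<!-))+$",
    re.IGNORECASE,
)


def sanitize_network(value: str) -> str:
    """Keep a comma-separated domain list as-is; collapse anything else to 'none'."""
    entries = [entry.strip() for entry in value.split(",")]
    if all(DOMAIN.match(entry) for entry in entries):
        return value
    return "none"


def restrict_plugin(manifest: dict, created_via: str) -> dict:
    """Return a copy of the manifest with the restricted defaults applied."""
    restricted = dict(manifest)
    permissions = dict(manifest.get("permissions", {}))
    permissions["network"] = sanitize_network(str(permissions.get("network", "none")))
    permissions["inference"] = False
    restricted["permissions"] = permissions
    restricted["metadata"] = {
        **manifest.get("metadata", {}),
        "created_by": "agent",
        "created_via": created_via,
    }
    return restricted
```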
The shared pipeline rejects a registration up-front (no library state is written) when:
- File paths fail `SandboxPathSanitizer.validatePluginFiles`
- The `setup` command references a host outside `SandboxNetworkPolicy.setupAllowlist`
- Any tool's `run` command references a host outside the same allowlist
- A declared `secrets` entry has no value in `AgentSecretsKeychain` for the requesting agent
- The agent exceeds `SandboxRateLimiter` quota for `service: "http"`
- The sandbox container is not running (`unavailable` → HTTP 503)
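
The allowlist checks on `setup` and `run` commands can be pictured as a scan for hosts referenced in the command text. The sketch below looks only at `http(s)` URLs. How the real pipeline detects hosts, and what `SandboxNetworkPolicy.setupAllowlist` contains, are not documented here, so both the detection method and the example allowlist are assumptions.

```python
import re
from urllib.parse import urlsplit

# Matches http(s) URLs up to the next whitespace or quote character.
URL = re.compile(r"https?://[^\s'\"]+")


def hosts_in_command(command: str) -> set:
    """Collect the hostnames of all http(s) URLs that appear in a command."""
    hosts = {urlsplit(url).hostname for url in URL.findall(command)}
    hosts.discard(None)
    return hosts


def disallowed_hosts(command: str, allowlist: set) -> list:
    """Return the referenced hosts that are not on the allowlist, sorted."""
    return sorted(host for host in hosts_in_command(command) if host not in allowlist)


# Hypothetical allowlist, for illustration only.
EXAMPLE_ALLOWLIST = {"pypi.org", "files.pythonhosted.org"}
```

A registration would be rejected whenever `disallowed_hosts(...)` returns a non-empty list for the `setup` command or for any tool's `run` command.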
Registered plugins are saved to the SandboxPluginLibrary (~/.osaurus/sandbox-plugins/) and survive app restarts. Per-agent install state lives under ~/.osaurus/agents/{agent-id}/sandbox-plugins/installed.json. Manage, export, or remove plugins from the Sandbox → Plugins tab.
The Host API Bridge connects the container to Osaurus services on the host. Inside the container, the osaurus-host CLI communicates with the bridge server over a vsock-relayed Unix socket.
| Command | Description |
|---|---|
| `osaurus-host secrets get <name>` | Read a secret from the macOS Keychain |
| `osaurus-host config get <key>` | Read a plugin config value |
| `osaurus-host config set <key> <value>` | Write a plugin config value |
| `osaurus-host inference chat -m <message>` | Run a chat completion through Osaurus |
| `osaurus-host agent dispatch <id> <task>` | Dispatch a task to an agent |
| `osaurus-host agent memory query <text>` | Search agent memory |
| `osaurus-host agent memory store <text>` | Store a memory entry |
| `osaurus-host events emit <type> [payload]` | Emit a cross-plugin event |
| `osaurus-host plugin create` | Create a plugin from stdin JSON |
| `osaurus-host log <message>` | Append to the sandbox log buffer |
All requests include the calling Linux username for identity verification.
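
From a plugin script, the CLI can be called through `subprocess` with an argument list, so no shell quoting is involved. This sketch uses only the subcommands listed in the table. It assumes the CLI writes the requested value to stdout and exits non-zero on failure, which is not confirmed here.

```python
import shutil
import subprocess


def host_argv(*args: str) -> list:
    """Build an argv list for the osaurus-host CLI; no shell is involved."""
    return ["osaurus-host", *args]


def get_secret(name: str) -> str:
    """Read a secret through the bridge; only works inside the sandbox."""
    if shutil.which("osaurus-host") is None:
        raise RuntimeError("osaurus-host not found; run this inside the sandbox")
    done = subprocess.run(
        host_argv("secrets", "get", name),
        capture_output=True, text=True, check=True,
    )
    # Assumption: the value is printed to stdout.
    return done.stdout.strip()
```

Passing a list such as `host_argv("inference", "chat", "-m", message)` keeps a message containing spaces or quotes as a single argument.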
All file paths from tool arguments are validated by SandboxPathSanitizer before any container execution. Directory traversal attempts (..) are rejected, and paths are resolved relative to the agent's home directory.
Each agent runs as a separate Linux user (agent-{name}). Standard Unix file permissions prevent agents from accessing each other's files and processes.
Container networking can be set to outbound (NAT with internet access) or none (completely isolated). Plugins can declare their own network requirements in the permissions field.
- `SandboxExecLimiter`: Limits the number of commands an agent can run per conversation turn
- `SandboxRateLimiter`: General rate limiting for sandbox operations and Host API bridge calls
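
The two limiters address different problems: a per-turn budget resets with each conversation turn, while a rate limit depends on elapsed time. The sketch below shows one simple way to build each. The algorithms used by `SandboxExecLimiter` and `SandboxRateLimiter` are not documented here, so the counter and sliding-window designs, along with all class names and limits, are assumptions.

```python
import time
from collections import deque


class TurnExecLimiter:
    """Allow at most max_per_turn commands between calls to start_turn()."""

    def __init__(self, max_per_turn: int):
        self.max_per_turn = max_per_turn
        self.used = 0

    def start_turn(self) -> None:
        self.used = 0

    def allow(self) -> bool:
        if self.used >= self.max_per_turn:
            return False
        self.used += 1
        return True


class SlidingWindowLimiter:
    """Allow at most `limit` calls within any window of `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.calls = deque()

    def allow(self) -> bool:
        now = self.clock()
        # Forget calls that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) >= self.limit:
            return False
        self.calls.append(now)
        return True
```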
The Sandbox UI includes built-in diagnostic checks accessible from the Container tab. Click Run Diagnostics to verify the container is functioning correctly.
| Check | What It Verifies |
|---|---|
| Exec | Can execute commands in the container |
| NAT | Outbound network connectivity |
| Agent User | Agent's Linux user exists and can run commands |
| APK | Package manager is functional |
| Vsock Bridge | Host API bridge is reachable from the container |
- Start — Boots the container (provisions first if needed)
- Stop — Gracefully shuts down the container
Removes the container and re-provisions from scratch. All agent workspaces and installed plugins are preserved (they live in the VirtioFS-mounted /workspace).
Completely removes the container and all associated assets (kernel, init filesystem). Agent workspaces are preserved.
Access these operations from the Container tab → Danger Zone section.
| Path | Description |
|---|---|
| `~/.osaurus/container/` | Container root directory |
| `~/.osaurus/container/kernel/vmlinux` | Linux kernel |
| `~/.osaurus/container/initfs.ext4` | Initial filesystem |
| `~/.osaurus/container/workspace/` | Mounted as `/workspace` in the VM |
| `~/.osaurus/container/workspace/agents/{name}/` | Per-agent home directory |
| `~/.osaurus/container/output/` | Mounted as `/output` in the VM |
| `~/.osaurus/sandbox-plugins/` | Plugin library (JSON recipes) |
| `~/.osaurus/agents/{agentId}/sandbox-plugins/installed.json` | Per-agent installed plugin records |
| `~/.osaurus/config/sandbox.json` | Sandbox configuration |
| `~/.osaurus/config/sandbox-agent-map.json` | Linux username to agent UUID mapping |