Note: More than 99% of this project was built with Claude Code (using Opus 4.6 and superpowers) and human-reviewed by the author.
Or you can see examples online: example1, example2, example3, example4, example5
An AI-powered data analysis agent with a built-in SQL playground. Upload data files (CSV, JSON, Parquet, Excel) and ask questions in plain English — the agent delegates to a specialized subagent for SQL queries and renders charts inline — or switch to the SQL editor for direct queries. Powered by DuckDB on a lightweight FastAPI backend with a React frontend. The app opens in Agent Mode by default so you can start analyzing data immediately.
Each browser tab gets its own isolated DuckDB session — uploaded data and query state are fully isolated between users and tabs. Sessions are persisted to disk so your tables survive idle timeouts; the session file is only deleted when you close the tab explicitly.
- Per-user DuckDB sessions — Each browser tab gets its own isolated DuckDB instance, identified by an `X-Session-ID` header generated client-side; data and state are never shared between users or tabs; sessions are persisted to disk (`/tmp/duckdb-data-agent-{session_id}.duckdb`) so tables survive idle timeouts and reconnections; the session file is deleted only on explicit tab close (see the usage sketch after this feature list)
- DuckDB SQL engine — Fast, in-process analytical database on the backend
- Multi-format file upload — Drag-and-drop or click to import CSV, JSON, Parquet, and Excel (.xlsx) files (default limit: 500 MB, configurable via the `MAX_TOTAL_SIZE_BYTES` env var) with automatic schema detection; Excel workbooks with multiple sheets create one table per sheet; duplicate filename detection prevents accidental overwrites; the upload UI appears when no tables are loaded, and files can also be added via the sidebar upload button
- Sample dataset — One-click load of the Titanic dataset to get started quickly
- Table sidebar — Collapsible panel to browse tables, inspect columns, and view types
- Dark / light mode — Toggle between dark and light themes with the sun/moon button in the header; respects your OS preference on first visit and remembers your choice across sessions
- Internationalization (i18n) — Switch between English and Traditional Chinese with the EN/中 toggle in the header; auto-detects your OS language on first visit and remembers your choice across sessions
- Interactive clarification — When your request is ambiguous, the agent asks a clarifying question with selectable options displayed inline in the chat; pick an option or type a free-text response to continue
- Export conversation — Click the Export button to download the full conversation as a single self-contained HTML file; the export preserves the current theme, collapsible thinking blocks, interactive Plotly charts (via CDN), and styled query result tables; interactive-only elements (edit/delete buttons, retry buttons) are stripped for a clean read-only view
- Conversation history — The bottom half of the sidebar lists past conversations for the current browser tab, ordered by most recent; click to reload a previous conversation, rename it inline, or delete it; conversations are scoped to the browser tab's session and automatically cleaned up when the tab is closed; backed by a SQLite database for durability within a session; a new conversation is created automatically when you send the first message
- Natural language queries — Ask questions about your data in plain English; the orchestrator delegates to specialized subagents that write and execute SQL for you
- Subagent architecture — An orchestrator agent delegates to a sql-analyst subagent for data queries, with a configurable model (via the `SQL_SUBAGENT_MODEL` env var, defaulting to `haiku`); the orchestrator itself handles chart rendering via `render_chart` for coherent interleaved text-and-chart answers
- Streaming responses — Real-time token streaming powered by Claude via the Anthropic Agent SDK; subagent internal reasoning is filtered from the main stream
- Live cross-conversation streaming — Start a query in one conversation, switch to another, and both streams run concurrently; a pulsing dot in the sidebar indicates which conversations are actively streaming; switch back to a streaming conversation for instant re-attachment with no lost tokens
- Visible reasoning — Collapsible thinking block shows the agent's intermediate steps and SQL queries
- Inline results — Query results rendered inline within the conversation
- Chart generation — Ask for a chart or visualization and the orchestrator generates it inline; supports bar, scatter, line, pie, histogram, box, and heatmap chart types with optional multi-series grouping; animated charts with frames, sliders, and play/pause controls are also supported
- Dual chart library support — Toggle between Plotly and Vega-Lite chart renderers via a toggle button in the agent panel header; the selected library is persisted to localStorage and sent to the backend so the agent emits native chart specs for the chosen library; defaults to Plotly for backward compatibility
- Response time — Each answer block shows the agent response time (e.g., "Answered in 3.2s") after streaming completes, giving visibility into query latency
- Edit & delete messages — Hover over any user message to edit or delete it; editing re-sends the modified query with prior conversation as context, deleting rewinds the conversation to that point
- Bifrost LLM gateway — A Bifrost gateway service manages API keys centrally and routes LLM requests to multiple providers; sidecar containers never have access to real API keys (see Security)
- Privacy-conscious — Requires an Anthropic API key stored in a server-side `.env` file; your data and credentials are never sent anywhere besides the Anthropic API
- Container isolation — Run each agent session inside a gVisor-sandboxed Docker container for code execution sandboxing and multi-tenant isolation; read-only rootfs, all capabilities dropped, no host filesystem or Docker socket access (see Container Isolation)
- Langfuse observability (optional) — Built-in Langfuse tracing for monitoring agent interactions
- SQL query editor — Write and execute queries with Ctrl/Cmd+Enter
- Interactive results — Sortable columns, per-column filters, and global search across results
- EXPLAIN support — Markdown-rendered output for `EXPLAIN` and `EXPLAIN ANALYZE` queries
- Persistent agent memory — The agent remembers facts, preferences, and patterns across conversations; memories are stored as markdown files on disk (`data/memories/{user_id}/MEMORY.md`) and injected into the system prompt at the start of each conversation, with stored user preferences taking priority over default model behaviors
- MCP-based memory tools — The agent saves, recalls, and forgets memories via MCP tools (`save_memory`, `recall_memories`, `forget_memory`); duplicate detection prevents storing the same memory twice
- Memory management UI — A Memories tab in the sidebar lets you view and delete individual memories; a file icon button opens a detail modal showing the full `MEMORY.md` file with Preview (rendered markdown) and Source (raw text) tabs, matching the skill detail modal
- Browse skills — A "Skills" tab in the sidebar lists available skills with name and description; click a skill to open a detail modal with Preview and Source tabs; click "Use" to append a `/skill-name` slash command to the current chat input text
- Invoke skills via slash command — Type `/` anywhere in the chat input to open an autocomplete dropdown of available skills; the menu filters as you type and works at any cursor position, not just the beginning of the input
- Multiple skills per message — Combine multiple skills in a single message (e.g., `use this skill /analyze-data use this skill /my-other-skill What is the average revenue?`); all referenced skills are invoked before the agent processes your question
- Enable/disable skills — Toggle the eye icon on any skill to enable or disable it; disabled skills are hidden from the slash command menu and cannot be invoked
- Delete skills — Remove custom skills via the delete button in the sidebar; built-in skills (like `analyze-data`) cannot be deleted
- Create skills from the UI — Click the "+" button in the Skills tab to create a new skill with a name, description, and step-by-step instructions; skills are stored as `SKILL.md` files on disk
- Agent-created skills — The agent can create new skills during a conversation via the `create_skill` MCP tool; new skills appear in the sidebar automatically
- Dynamic skill discovery — Skills are stored at `skills/<name>/SKILL.md` and re-scanned per request; add or remove skills without restarting
- Data analysis plugin — The knowledge-work-plugins/data plugin is bundled at `plugins/data/`, providing specialized commands and skills for data work
- Plugin commands — Invoke via slash commands: `/data:analyze`, `/data:explore-data`, `/data:write-query`, `/data:create-viz`, `/data:build-dashboard`, `/data:validate`
- Plugin skills — The plugin adds skills for SQL queries, data exploration, data visualization, statistical analysis, data validation, interactive dashboard building, and data context extraction
- Plugin isolation — Plugins are bind-mounted read-only into sidecar containers at `/app/plugins/`; the plugin's external MCP server connections (Snowflake, Databricks, etc.) are neutralized since the agent uses its own DuckDB MCP server
- Dashboard limitation — The `/data:build-dashboard` command can generate HTML dashboards inside the sidecar container, but the output is not yet visible in the UI; the sidecar writes to ephemeral tmpfs storage that is inaccessible from the browser, and the frontend has no HTML preview or iframe rendering support; see `docs/plans/2026-03-03-inline-dashboard-rendering.md` for the implementation plan to add inline dashboard rendering via a `render_dashboard` MCP tool
Install dependencies:

```bash
make install
```

Copy the example environment file and add your credentials:

```bash
cp backend/.env.example backend/.env
```

Edit `backend/.env` and set your Anthropic API key:

```bash
ANTHROPIC_API_KEY=sk-ant-...
ORCHESTRATOR_MODEL=claude-sonnet-4-6   # optional, defaults to sonnet
SQL_SUBAGENT_MODEL=haiku               # optional, model for SQL analyst subagent (default: haiku)
DEFAULT_TOOL_MODEL=                    # optional, fallback model for SDK built-in tools (see below)
MAX_TOTAL_SIZE_BYTES=524288000         # optional, max upload size in bytes (default: 500 MB)
```
`ANTHROPIC_API_KEY` and `ORCHESTRATOR_MODEL` are only needed for the AI agent. The SQL playground works without them, but both modes require the backend to be running.
You can route each agent (orchestrator, SQL subagent) to a different LLM provider by using the `@suffix` syntax:

```bash
ORCHESTRATOR_MODEL=openai/gpt-4o@sonnet
SQL_SUBAGENT_MODEL=openai/gpt-4o-mini@haiku
```
The format is `real_model@sdk_alias`. The SDK alias (after `@`) is the model name the Claude Agent SDK sees; the real model (before `@`) is what the backend proxy rewrites it to before forwarding to Bifrost. This lets you use any provider Bifrost supports (OpenAI, Bedrock, etc.) while keeping the SDK configuration unchanged.
How it works:
- The SDK sends requests using the alias (e.g. `sonnet`, `haiku`).
- A lightweight reverse proxy on the backend (`/anthropic/*`) intercepts these requests.
- The proxy rewrites the `model` field to the real provider model (e.g. `openai/gpt-4o`) and forwards the request to Bifrost.
- Bifrost routes the request to the correct provider.

Without an `@` suffix, model values are used as-is (direct Anthropic routing).
The Claude Agent SDK's built-in tools (e.g. `WebFetch`) internally use their own Anthropic models (e.g. `claude-haiku-4-5-20251001`). When routing all traffic through a non-Anthropic provider, these models are not in the rewrite map and will fail because they lack a provider prefix.
Set `DEFAULT_TOOL_MODEL` to catch any model not matched by the per-agent rewrites:

```bash
DEFAULT_TOOL_MODEL=openai/gpt-4o-mini
```

Any request whose model does not match a configured rewrite is rewritten to this value before forwarding to Bifrost. When empty (the default), unmatched models pass through unchanged.
Warning: The `@` suffix must NOT match the `ORCHESTRATOR_MODEL`'s Anthropic model name. For example, `ORCHESTRATOR_MODEL=haiku` with `SQL_SUBAGENT_MODEL=openai/gpt-4o-mini@haiku` will conflict because the proxy cannot distinguish orchestrator traffic from subagent traffic when both resolve to the same model name.
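A minimal sketch of the rewrite logic may help; helper names like `parse_model_spec` are illustrative (the real implementation lives in `backend/app/proxy.py`):

```python
# Illustrative sketch of @suffix parsing and model rewriting; not the
# project's actual proxy code (see backend/app/proxy.py).
import os

def parse_model_spec(spec: str) -> tuple[str, str]:
    """Split 'real_model@sdk_alias' into (sdk_alias, real_model)."""
    if "@" in spec:
        real, alias = spec.rsplit("@", 1)
        return alias, real
    return spec, spec  # no suffix: value used as-is (direct Anthropic routing)

def build_rewrite_map() -> dict[str, str]:
    rewrites = {}
    for var in ("ORCHESTRATOR_MODEL", "SQL_SUBAGENT_MODEL"):
        spec = os.environ.get(var, "")
        if "@" in spec:
            alias, real = parse_model_spec(spec)
            rewrites[alias] = real
    return rewrites

def rewrite_model(requested: str, rewrites: dict[str, str]) -> str:
    if requested in rewrites:
        return rewrites[requested]
    # SDK built-in tools use models that are never in the map; fall back to
    # DEFAULT_TOOL_MODEL if set, otherwise pass the model through unchanged.
    return os.environ.get("DEFAULT_TOOL_MODEL") or requested
```

The alias-collision warning above falls out of this structure: the rewrite map is keyed by alias, so if the orchestrator's plain model name equals a subagent's alias, both kinds of traffic hit the same map entry.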
Copy the example config and adjust provider keys as needed:

```bash
cp bifrost/config.example.json bifrost/config.json
```

The default config routes requests through Anthropic using `ANTHROPIC_API_KEY` from your `backend/.env`. To add other providers (OpenAI, Bedrock, etc.), edit `bifrost/config.json` — see `bifrost/config.example.json` for multi-provider examples. You can also configure providers via Bifrost's Web UI at http://localhost:8081 (or `$BIFROST_PORT`). Note that you still need to set API keys and model names in `backend/.env`.
To enable agent tracing with Langfuse, add these to backend/.env:
```bash
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_BASE_URL=https://cloud.langfuse.com  # optional, defaults to cloud
```
When configured, every agent conversation is traced (LLM turns, tool calls, SQL execution) and a Langfuse Traces button appears in the agent panel header linking to your dashboard. When not configured, tracing is disabled with zero overhead.
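The on/off wiring can be sketched as a simple config gate, assuming the Langfuse Python SDK's client constructor (the project's actual wrapper is `backend/app/tracing.py`):

```python
# Illustrative config gating for optional tracing; assumes the langfuse
# Python package. When keys are absent, no client is created at all.
import os

def init_langfuse():
    public = os.environ.get("LANGFUSE_PUBLIC_KEY")
    secret = os.environ.get("LANGFUSE_SECRET_KEY")
    if not (public and secret):
        return None  # tracing disabled: nothing is instrumented, zero overhead
    from langfuse import Langfuse
    return Langfuse(
        public_key=public,
        secret_key=secret,
        host=os.environ.get("LANGFUSE_BASE_URL", "https://cloud.langfuse.com"),
    )
```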
First-time setup — install dependencies, build the sidecar image, and create the Docker network:
```bash
make install
```

Start both the frontend and backend:

```bash
make dev
```

Open http://localhost:5173 to use the app. The Vite dev server proxies `/api` requests to the backend automatically.
Docker Compose builds both images (app + sidecar) and runs the app with container isolation enabled. The sidecar image is built but not started as a service — the app spawns sidecar containers on-demand via the Docker SDK.
Prerequisites: Docker and Docker Compose (included with Docker Desktop).
Build all images:
```bash
make compose-build
```

Start the app:

```bash
make compose-up
```

Open http://localhost:10000 to use the app.

Stop the app:

```bash
make compose-down
```

Notes:
- Linux users: Set `DOCKER_GID` to your host's docker group GID so the app container can access the Docker socket:

  ```bash
  DOCKER_GID=$(getent group docker | cut -d: -f3) make compose-up
  ```

  On macOS with Docker Desktop, the default (`0`) works out of the box.
- Linux users (skills write permission): `make compose-build` automatically passes your host UID (`APP_UID`) so the container user can write to the bind-mounted `skills/` directory. If you build manually without the Makefile, pass it explicitly:

  ```bash
  APP_UID=$(id -u) docker compose --profile sidecar build
  ```

  On macOS with Docker Desktop, this is not needed as Docker handles file permissions transparently.
- Custom port: Set `APP_PORT` to expose the app on a different host port (e.g., `APP_PORT=8080 make compose-up`).
```
┌─────────────────────────────────────────────────────────────────────────────────┐
│ BROWSER (per tab) │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ React 18 + TypeScript (Vite) │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌─────────────────────────────┐ │ │
│ │ │ App.tsx │ │ Sidebar │ │ AgentPanel │ │ │
│ │ │ (Layout + │ │ - Tables │ │ - MessageBubble │ │ │
│ │ │ Contexts) │ │ - Convos │ │ - ChartWidget (Plotly) │ │ │
│ │ │ │ │ - Memories │ │ - VegaLiteChartWidget │ │ │
│ │ │ │ │ - Skills │ │ - ResultsTable │ │ │
│ │ │ │ │ - FileUpload│ │ - QueryEditor │ │ │
│ │ └──────────────┘ └──────────────┘ │ - UserQuestion │ │ │
│ │ └─────────────────────────────┘ │ │
│ │ ┌─────────────────────────────────────────────────────────────────┐ │ │
│ │ │ Context Providers │ │ │
│ │ │ AgentContext · SessionContext · ConversationContext │ │ │
│ │ │ ThemeContext · LanguageContext · ChartLibraryContext │ │ │
│ │ └─────────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ ┌──────────────────────┐ │ │
│ │ │ agentService.ts │ ── SSE streaming (/api/chat) ──────────┐ │ │
│ │ │ (fetch + EventSource) │ │ │
│ │ └──────────────────────┘ │ │ │
│ └──────────────────────────────────────────────────────────────────┼──────┘ │
│ │ │
│ X-Session-ID header (UUID per tab) │ │
└─────────────────────────────────────────────────────────────────────┼───────────┘
│
REST + SSE (port 8000 dev / 10000 prod) │
▼
┌─────────────────────────────────────────────────────────────────────────────────┐
│ BACKEND (FastAPI + Uvicorn) │
│ │
│ ┌───────────────────────────────────────────────────────────────────────────┐ │
│ │ Routes (app/routes/) │ │
│ │ │ │
│ │ POST /api/chat ──────────► SSE stream (thinking, answer, tool_result) │ │
│ │ POST /api/chat/edit ─────► SSE stream (re-run from edited message) │ │
│ │ POST /api/chat/respond ──► Answer pending user_question │ │
│ │ POST /api/query ─────────► Direct SQL execution (playground) │ │
│ │ GET /api/tables ────────► List loaded tables + schemas │ │
│ │ POST /api/upload ────────► Load CSV/JSON/Parquet/Excel into DuckDB │ │
│ │ CRUD /api/conversations ─► Conversation history (SQLite) │ │
│ │ GET /api/memories ──────► Agent long-term memories │ │
│ │ GET /api/skills ────────► Skill discovery and management │ │
│ │ GET /api/config ────────► Frontend feature flags │ │
│ └───────────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────┐ ┌──────────────────┐ ┌─────────────────────────────┐ │
│ │ session_manager │ │ database.py │ │ memory_store.py │ │
│ │ │ │ │ │ │ │
│ │ Per-tab DuckDB │ │ DuckDB wrapper │ │ SQLite (data/memory.db) │ │
│ │ sessions with │ │ - load_csv/json/│ │ - Conversations │ │
│ │ TTL cleanup │ │ parquet/excel │ │ - Messages │ │
│ │ /tmp/duckdb-*.db│ │ - execute_query │ │ - WAL mode │ │
│ └──────────────────┘ │ - thread-safe │ └─────────────────────────────┘ │
│ └──────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────────────────────┐ │
│ │ agent.py (Orchestration) │ │
│ │ │ │
│ │ build_system_prompt() ── dynamic prompt with table schemas │ │
│ │ stream_chat() ── spawns sidecar, streams SSE events │ │
│ │ build_subagent_defs() ── sql-analyst subagent configuration │ │
│ └────────────────────────────────┬─────────────────────────────────────────┘ │
│ │ │
│ ┌────────────────────────────────┼─────────────────────────────────────────┐ │
│ │ MCP SSE Server (mcp_sse.py — Starlette) │ │
│ │ │ │ │
│ │ Tools exposed to Agent SDK: │ ◄── MCP protocol ───┐ │ │
│ │ ┌────────────┐ ┌────────────┐ │ │ │ │
│ │ │execute_sql │ │render_chart│ │ │ │ │
│ │ └────────────┘ └────────────┘ │ │ │ │
│ │ ┌─────────────────┐ ┌─────────┴──────┐ │ │ │
│ │ │ask_user_question│ │ create_skill │ │ │ │
│ │ └─────────────────┘ └────────────────┘ │ │ │
│ │ ┌─────────────┐ ┌────────────────┐ ┌──────────────┐ │ │ │
│ │ │save_memory │ │recall_memories │ │forget_memory │ │ │ │
│ │ └─────────────┘ └────────────────┘ └──────────────┘ │ │ │
│ └─────────────────────────────────────────────────────────┼────────────────┘ │
│ │ │
│ ┌──────────────────────────┐ ┌───────────────────────────┼──────────────┐ │
│ │ container_manager.py │ │ proxy.py │ │ │
│ │ │ │ │ │ │
│ │ Docker container spawn │ │ /anthropic/* → Bifrost │ │ │
│ │ gVisor sandbox (runsc) │ │ Model @suffix rewriting │ │ │
│ │ Resource limits │ │ │ │ │
│ │ agent-sandbox network │ └─────────────┬─────────────┘ │ │
│ └──────────┬───────────────┘ │ │ │
│ │ │ │ │
│ ┌──────────┼─────────────────┐ ┌───────────┼──────────────────────────┐ │ │
│ │ skills.py│ agent_memory.py│ │ tracing.py│ pending_questions.py │ │ │
│ │ SKILL.md │ MEMORY.md │ │ Langfuse │ Question state mgmt │ │ │
│ │ parsing │ persistence │ │ traces │ │ │ │
│ └──────────┘─────────────────┘ └───────────┘──────────────────────────┘ │ │
└──────────────┬────────────────────────────────┬───────────────────────────┘ │
│ │ │
▼ ▼ │
┌───────────────────────────┐ ┌──────────────────────────────┐ │
│ SIDECAR CONTAINER │ │ BIFROST LLM GATEWAY │ │
│ (Express + TypeScript) │ │ (maximhq/bifrost) │ │
│ │ │ │ │
│ Claude Agent SDK 0.2.62 │ │ Port 8081 │ │
│ Port 3000 (internal) │ │ Multi-provider routing: │ │
│ │ │ ┌──────────┐ │ │
│ POST /query │ │ │Anthropic │ Claude models │ │
│ - Spawns CLI subprocess │ │ └──────────┘ │ │
│ - MCP client → backend ──┼────┼──│OpenAI │ GPT models │ │
│ - Streams SSE events │ │ └──────────┘ │ │
│ - Plugin loading (data) │ │ ┌──────────┐ │ │
│ - Skill allowlist check │ │ │Bedrock │ AWS models │ │
│ - Idle timeout (10 min) │ │ │ │ │ │
│ │ │ └──────────┘ │ │
│ /health (liveness) │ │ config.json routing rules │ │
│ │ │ │ │
│ Security: │ └──────────────────────────────┘ │
│ - Read-only rootfs │ │
│ - 512MB memory limit │ │
│ - All caps dropped │ │
│ - agent-sandbox network │ │
└───────────────────────────┘ │
│
┌─────────────────────────────────────────────────────────────────────────────────┘
│
│ DATA FLOW: User Query → Response
│ ═══════════════════════════════
│
│ 1. User types message in AgentPanel
│ 2. Frontend POST /api/chat (SSE) with X-Session-ID
│ 3. Backend creates/retrieves DuckDB session
│ 4. Backend spawns sidecar Docker container
│ 5. Backend POST sidecar:3000/query with system prompt + table schemas
│ 6. Sidecar spawns Claude Agent SDK subprocess
│ 7. SDK calls MCP tools on backend /mcp/sse:
│ ├── execute_sql → DuckDB query → tabular results
│ ├── render_chart → Plotly/Vega-Lite spec → UI rendering
│ ├── ask_user_question → interactive clarification
│ ├── save_memory → persistent learning
│ └── create_skill → reusable workflow
│ 8. SDK generates response, streams to sidecar
│ 9. Sidecar streams SSE events to backend
│ 10. Backend persists messages to SQLite, forwards SSE to frontend
│ 11. Frontend renders: thinking → answer → charts → tables
│
│ SSE Event Types:
│ thinking | answer | thinking_done | tool_call | tool_result
│ subagent_start | subagent_end | user_question | done | error
│
└──────────────────────────────────────────────────────────────────────────────────────
┌──────────────────────────────────────────────────────────────────────────────┐
│ AGENT MODEL HIERARCHY │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ ORCHESTRATOR (claude-sonnet-4-6) │ │
│ │ │ │
│ │ Has tools: execute_sql, render_chart, ask_user_question, │ │
│ │ create_skill, save_memory, recall_memories, │ │
│ │ forget_memory │ │
│ │ │ │
│ │ Handles: chart rendering, memory mgmt, skill creation, │ │
│ │ user interaction, response generation │ │
│ │ │ │
│ │ ┌────────────────────────────────────┐ │ │
│ │ │ SQL ANALYST SUBAGENT (haiku) │ │ │
│ │ │ │ │ │
│ │ │ Tools: execute_sql only │ │ │
│ │ │ Purpose: multi-step SQL analysis │ │ │
│ │ │ Lightweight, fast, cost-efficient │ │ │
│ │ └────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────────────┐
│ PERSISTENCE LAYER │
│ │
│ ┌───────────────────┐ ┌───────────────────┐ ┌─────────────────────────┐ │
│ │ DuckDB │ │ SQLite │ │ Filesystem │ │
│ │ │ │ │ │ │ │
│ │ Per-session OLAP │ │ data/memory.db │ │ data/memories/ │ │
│ │ /tmp/duckdb-*.db │ │ - conversations │ │ {user}/MEMORY.md │ │
│ │ │ │ - messages │ │ │ │
│ │ User data: │ │ - WAL mode │ │ skills/ │ │
│ │ CSV, JSON, │ │ │ │ *.SKILL.md │ │
│ │ Parquet, Excel │ │ │ │ (built-in + user) │ │
│ │ │ │ │ │ │ │
│ │ │ │ │ │ plugins/data/ │ │
│ │ │ │ │ │ commands + skills │ │
│ └───────────────────┘ └───────────────────┘ └─────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────────┘
```

When the agent runs, the backend spawns a sidecar container via the Docker SDK. A naive approach would pass `ANTHROPIC_API_KEY` directly into that container's environment — but any tool or shell command the agent executes could then read and exfiltrate the key.
Instead, a Bifrost LLM gateway service manages API keys centrally and routes requests to LLM providers:
```
Sidecar Container
  → ANTHROPIC_BASE_URL=http://bifrost:8080/anthropic
  → ANTHROPIC_API_KEY=placeholder
        ↓
Bifrost Gateway (/anthropic)
  → ignores client-sent key
  → injects the real provider API key
  → forwards request to provider (Anthropic, OpenAI, Bedrock, etc.)
```
How it works:
- Bifrost runs as a Docker Compose service with real API keys stored in `bifrost/config.json` (referencing environment variables).
- Sidecar containers receive a placeholder `ANTHROPIC_API_KEY` and point `ANTHROPIC_BASE_URL` to Bifrost's native `/anthropic` endpoint.
- Bifrost injects the real provider API key when forwarding requests upstream. The Claude Agent SDK works unchanged since Bifrost speaks the native Anthropic Messages API.
- Additional providers (OpenAI, Bedrock, Vertex, etc.) can be added to `bifrost/config.json` or via Bifrost's Web UI. Use provider prefixes in model names (e.g., `openai/gpt-4o-mini`) to route subagent requests to different providers. See `bifrost/config.example.json` for configuration examples.
The sidecar container only ever holds a placeholder string. Even if a tool call reads the environment, it cannot obtain any real API key.
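As a sketch, spawning the sidecar with the Docker SDK for Python might look like this; only the environment handling is shown, and the project's actual spawn logic is in `backend/app/container_manager.py`:

```python
# Illustrative: the sidecar's environment contains only a placeholder key.
import docker

client = docker.from_env()
container = client.containers.run(
    "duckdb-agent-sidecar:latest",
    detach=True,
    network="agent-sandbox",
    environment={
        "ANTHROPIC_API_KEY": "placeholder",  # never a real key
        "ANTHROPIC_BASE_URL": "http://bifrost:8080/anthropic",  # all LLM traffic via Bifrost
    },
)
```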
The backend runs each Claude Code session inside a gVisor-sandboxed Docker container ("sidecar"). This provides code execution sandboxing, multi-tenant isolation, and a hardened boundary between the agent and the host system.
Architecture:
```
Browser
│
▼
FastAPI Backend (host)
├── Chat route ──► ContainerManager ──► Docker SDK
│ │
│ ▼
│ ┌──────────────────────┐
│ │ gVisor Sandbox │
│ │ │
│ │ Sidecar Container │
│ │ (Node.js + Claude) │
│ │ │
│ │ POST /query → SSE │
│ └──────┬───────────────┘
│ │
│ ├──► Bifrost Gateway (:8080/anthropic) (LLM routing)
├── /mcp/sse ◄───────┘ (DuckDB MCP bridge)
│
└── DuckDB (per-user, disk-persisted)
```
The data flow for a chat message is:
- Frontend sends a chat message to the FastAPI backend.
- Backend spins up a gVisor container (or reuses an existing one for the session) via `ContainerManager`, configured to route LLM calls through the Bifrost gateway.
- Backend sends the query to the sidecar's `POST /query` endpoint. The sidecar calls the Claude Agent SDK's `query()` function with `includePartialMessages: true` for token-level streaming, configured with the host's MCP SSE endpoint.
- The agent talks to the Bifrost gateway (`/anthropic`) for LLM API access (using a placeholder key; Bifrost injects the real key).
- The agent's `execute_sql` tool calls reach the host DuckDB via the MCP SSE bridge (`/mcp/sse?session_id=...`), which routes each connection to the correct per-user DuckDB instance through the existing `SessionManager`.
- The sidecar streams SSE events back to the backend, which forwards them to the frontend (see the relay sketch after this list).
- On session end, the container is stopped and removed.
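The backend-to-sidecar relay step can be sketched with `httpx`; the request body shape is an assumption, and the real code lives in `backend/app/agent.py`:

```python
# Illustrative SSE relay: POST the query to the sidecar and forward each
# event line. Assumes httpx; payload fields are hypothetical.
import httpx

async def relay_query(sidecar_host: str, prompt: str, session_id: str):
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream(
            "POST",
            f"http://{sidecar_host}:3000/query",
            json={"prompt": prompt, "session_id": session_id},
        ) as resp:
            resp.raise_for_status()
            async for line in resp.aiter_lines():
                if line.startswith("data:"):
                    yield line  # forward the SSE event to the frontend unchanged
```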
Sidecar container: The `sidecar/` directory contains a TypeScript HTTP server (`src/server.ts`) that uses the Claude Agent SDK (`@anthropic-ai/claude-agent-sdk`) with `includePartialMessages: true` for true token-level streaming. The Docker image (`sidecar/Dockerfile`) bundles Node.js 20, Python 3.12, and the Agent SDK. Containers run with a read-only root filesystem, all Linux capabilities dropped, no Docker socket access, and a non-root user. The host `skills/` directory is bind-mounted read-only at `/app/.claude/skills/` (the project-level path) so the SDK's built-in Skill tool can discover and invoke skills. The host `plugins/` directory is bind-mounted read-only at `/app/plugins/` so the SDK can load installed plugins (e.g., the data plugin for data analysis commands and skills).
MCP SSE bridge: The backend exposes tools at `/mcp/sse` using the MCP protocol's SSE transport (`backend/app/mcp_sse.py`): `execute_sql` for DuckDB queries, `render_chart` for chart generation, `ask_user_question` for interactive clarification, `create_skill` for agent-driven skill creation, and `save_memory` / `recall_memories` / `forget_memory` for persistent agent memory. Each SSE connection requires a `session_id` query parameter to route tool calls to the correct per-user DuckDB instance. This is how the containerized agent reaches DuckDB on the host without any direct database access inside the container.
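For intuition, an MCP server exposing a single `execute_sql` tool over SSE can be sketched with the official `mcp` package's FastMCP helper; the project's real bridge is a hand-rolled Starlette app that additionally routes each connection by its `session_id` query parameter:

```python
# Illustrative single-tool MCP SSE server; not the project's mcp_sse.py.
import duckdb
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("duckdb-bridge")
conn = duckdb.connect("/tmp/example.duckdb")  # real bridge: one DB per session

@mcp.tool()
def execute_sql(query: str) -> str:
    """Run a SQL query against DuckDB and return rows as text."""
    rows = conn.execute(query).fetchall()
    return "\n".join(str(row) for row in rows)

if __name__ == "__main__":
    mcp.run(transport="sse")  # serve the MCP SSE endpoint
```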
Prerequisites:
- Docker
- gVisor (runsc) runtime (optional), installed and registered with Docker
Note: gVisor requires Linux (kernel 4.14.77+, x86_64 or ARM64). It is not available on macOS or Windows. On non-Linux hosts (e.g., macOS with Docker Desktop), set `CONTAINER_RUNTIME=runc` to use Docker's default runtime instead. You still get container isolation (filesystem, process, network, capability drop, read-only rootfs) — only gVisor's syscall interception layer is absent. For production multi-tenant deployments, use a Linux host with gVisor for full sandboxing.
Setup:
- Build the sidecar image and create the Docker network:

  ```bash
  docker compose build
  make sidecar-network
  ```

- Install gVisor by following the official guide.

- Start the app:

  ```bash
  make compose-up
  ```
Environment variables:
| Variable | Default | Description |
|---|---|---|
| `CONTAINER_IMAGE` | `duckdb-agent-sidecar:latest` | Sidecar Docker image |
| `CONTAINER_RUNTIME` | `runc` | Docker runtime (`runc` for non-gVisor, `runsc` for gVisor) |
| `CONTAINER_MEMORY_LIMIT` | `512m` | Memory limit per container |
| `CONTAINER_CPU_LIMIT` | `0.5` | CPU limit per container |
| `CONTAINER_MAX_LIFETIME_SECONDS` | `3600` | Max container lifetime |
| `CONTAINER_IDLE_TIMEOUT_SECONDS` | `900` | Idle timeout before container is stopped (15 min) |
| `CONTAINER_NETWORK` | `agent-sandbox` | Docker network name |
| `SKILLS_DIR` | `skills` | Path to skills directory (inside backend container) |
| `SKILLS_HOST_PATH` | `./skills` | Host path for skills volume mount (bind-mounted read-only at `/app/.claude/skills/` in sidecar containers) |
| `PLUGINS_HOST_PATH` | `./plugins` | Host path for plugins volume mount (bind-mounted read-only at `/app/plugins/` in sidecar containers) |
| `APP_UID` | `1000` | UID for the container user (set to `$(id -u)` on Linux so skills directory writes work; not needed on macOS) |
| `MEMORY_DB_PATH` | `data/memory.db` | SQLite database for conversation history |
| `MEMORIES_DIR` | `data/memories` | Directory for agent memory files |
Security properties:
- The container has no host filesystem access except read-only bind mounts of the `skills/` and `plugins/` directories
- gVisor intercepts all syscalls -- even arbitrary bash/python execution is sandboxed
- No real API keys inside the container (placeholder string only; real keys managed by Bifrost)
- Per-session isolation -- containers cannot see each other
- Resource limits (CPU, memory, lifetime) prevent denial-of-service against the host
- Internal networks (`10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`) are blocked, preventing cloud metadata and internal service access
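These properties map almost one-to-one onto Docker SDK flags. A minimal sketch, with values mirroring the defaults in the table above (host paths are placeholders; the real logic is in `backend/app/container_manager.py`):

```python
# Illustrative hardened spawn via the Docker SDK for Python.
import docker

client = docker.from_env()
container = client.containers.run(
    "duckdb-agent-sidecar:latest",
    detach=True,
    runtime="runsc",             # gVisor; "runc" where gVisor is unavailable
    read_only=True,              # read-only root filesystem
    cap_drop=["ALL"],            # drop every Linux capability
    mem_limit="512m",            # CONTAINER_MEMORY_LIMIT
    nano_cpus=500_000_000,       # CONTAINER_CPU_LIMIT = 0.5 CPU
    network="agent-sandbox",     # isolated network, no Docker socket mounted
    tmpfs={"/tmp": "size=64m"},  # writable scratch space despite read-only rootfs
    user="1000",                 # non-root user (APP_UID)
    volumes={
        "/abs/path/to/skills": {"bind": "/app/.claude/skills", "mode": "ro"},
        "/abs/path/to/plugins": {"bind": "/app/plugins", "mode": "ro"},
    },
)
```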
For full design details, see docs/plans/2026-02-22-containerized-runtime-design.md.
Browser-based end-to-end tests are driven by YAML scenario files using Playwright. Define test scenarios declaratively — the runner handles browser automation, structural DOM assertions, and LLM-as-judge semantic verification.
Install:
```bash
make install-e2e
```

Environment setup: Scenarios using `verify_llm` (LLM-as-judge) require an Anthropic API key. Create `e2e/.env` with:

```bash
ANTHROPIC_API_KEY=your-key-here
```

Run tests (requires `make dev` running in another terminal):

```bash
make e2e-test         # headless (default)
make e2e-test-headed  # with visible browser
make e2e-test-ui      # interactive Playwright UI
```

View HTML report:

```bash
make e2e-report
```

Configure the target environment via `BASE_URL` (defaults to `http://localhost:5173`):

```bash
BASE_URL=http://localhost:10000 make e2e-test
```

Writing scenarios: Add YAML files to `e2e/scenarios/`. Each file contains a list of scenarios with sequential steps:
```yaml
scenarios:
  - name: "Upload CSV and query"
    steps:
      - action: upload_file
        file: ./test-data/sales.csv
      - action: send_message
        input: "Show total sales by region"
      - action: wait_for_response
      - action: verify
        expected:
          contains: ["north", "south"]
          has_chart: true
      - action: verify_llm
        criteria: "Response shows sales by region with numeric values"
        pass_threshold: 0.7
```

Available actions: `upload_file`, `send_message`, `wait_for_response`, `click`, `navigate`, `verify` (structural DOM checks), `verify_llm` (LLM-as-judge with Anthropic API — requires `ANTHROPIC_API_KEY`).
Structural verifiers: `contains`, `not_contains`, `has_chart`, `has_table`, `table_row_count_min`, `element_exists`, `element_not_exists`, `css_property`.
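The `verify_llm` judge is simple in spirit; here is a sketch in Python (the actual judge is TypeScript in `e2e/lib/`, and the prompt and score parsing here are illustrative assumptions):

```python
# Illustrative LLM-as-judge scoring, assuming the anthropic Python SDK.
import anthropic

def judge(response_text: str, criteria: str, pass_threshold: float = 0.7) -> bool:
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    msg = client.messages.create(
        model="claude-haiku-4-5-20251001",  # any judge-capable model works
        max_tokens=16,
        messages=[{
            "role": "user",
            "content": (
                f"Criteria: {criteria}\n\nResponse:\n{response_text}\n\n"
                "Reply with only a score from 0.0 to 1.0 for how well the "
                "response satisfies the criteria."
            ),
        }],
    )
    return float(msg.content[0].text.strip()) >= pass_threshold
```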
```
├── frontend/ # React frontend
│ ├── src/
│ │ ├── components/ # UI components (editor, results, sidebar, chat, Plotly/Vega-Lite charts, skills, memories, conversations, user-question)
│ │ ├── contexts/ # React context providers (theme, language, agent, config, session, conversation, chart library)
│ │ ├── hooks/ # Custom hooks (useTheme, useTranslation, useAgent, useConfig, useSessionId, useChartLibrary)
│ │ ├── agent/ # Agent service (SSE event handling, session ID injection)
│ │ ├── services/ # API clients (skillsService.ts, memoriesService.ts)
│ │ ├── i18n/ # Translation files (en.json, zh-TW.json)
│ │ ├── utils/ # Utility functions (UUID generation, conversation export, message building)
│ │ └── types.ts # Shared TypeScript interfaces
│ ├── index.html # HTML entry point
│ ├── package.json # npm config
│ └── vite.config.ts # Vite bundler config
├── backend/ # FastAPI backend
│ ├── Dockerfile # Production image: Python 3.12 + React frontend bundle
│ ├── app/
│ │ ├── main.py # App setup, CORS, and background session/container cleanup loop
│ │ ├── config.py # Environment variables (API key, model, upload limits, container settings)
│ │ ├── database.py # DuckDB connection and query execution
│ │ ├── session_manager.py # Per-user DuckDB session lifecycle (create, cleanup, disk persistence)
│ │ ├── agent.py # Agent loop, subagent definitions, & SSE streaming via container sidecar
│ │ ├── memory_store.py # SQLite-backed conversation and message persistence (WAL mode, thread-safe)
│ │ ├── agent_memory.py # File-based agent memory (read/save/forget markdown memories, thread-safe)
│ │ ├── skills.py # Skill CRUD operations (read/write SKILL.md files)
│ │ ├── mcp_sse.py # MCP SSE endpoint: exposes DuckDB, chart, create_skill, and memory tools over HTTP
│ │ ├── container_manager.py # Docker container lifecycle management for sidecar containers
│ │ ├── proxy.py # Reverse proxy for per-subagent model routing (@suffix rewriting)
│ │ ├── pending_questions.py # Interactive clarification (agent asks user for disambiguation)
│ │ ├── dependencies.py # FastAPI dependency injection utilities
│ │ ├── tracing.py # Langfuse client wrapper & initialization
│ │ ├── data/ # Sample datasets (titanic.csv)
│ │ └── routes/ # API endpoints
│ │ ├── chat.py # Chat endpoint with SSE streaming
│ │ ├── query.py # SQL query execution
│ │ ├── tables.py # Table inspection (schema, columns, sample data)
│ │ ├── session.py # Session creation and deletion
│ │ ├── skills.py # Skills CRUD REST API (/api/skills)
│ │ ├── conversations.py # Conversation history CRUD REST API (/api/conversations)
│ │ ├── memories.py # Agent memory REST API (/api/memories)
│ │ ├── config.py # Runtime configuration
│ │ └── langfuse_status.py # Langfuse tracing status and link
│ └── tests/ # Unit tests (pytest)
│ ├── test_skills.py
│ ├── test_skills_routes.py
│ ├── test_container_manager.py
│ ├── test_mcp_sse.py
│ ├── test_memory_store.py
│ ├── test_proxy.py
│ ├── test_session_manager.py
│ └── ... # 15 test modules total
├── skills/ # Skill definitions (SKILL.md files, volume-mounted into sidecar containers)
│ ├── analyze-data/ # Built-in data analysis workflow skill
│ └── <name>/ # Custom skills (each with a SKILL.md file)
├── sidecar/ # Containerized agent sidecar
│ ├── src/
│ │ ├── server.ts # TypeScript HTTP server using Claude Agent SDK with token-level streaming
│ │ └── types.ts # Request/response type definitions
│ ├── Dockerfile # Sidecar image: Node.js 20 + Python 3.12
│ ├── package.json # npm config
│ └── setup-network.sh # Docker network setup script
├── e2e/ # Playwright E2E tests
│ ├── scenarios/ # YAML test scenario files
│ ├── test-data/ # Test fixture files (CSV, etc.)
│ ├── tests/ # Dynamic test generator (scenario-runner.spec.ts)
│ ├── lib/ # YAML loader, actions, verifiers, LLM judge
│ └── playwright.config.ts
├── bifrost/ # Bifrost LLM gateway configuration
│ └── config.example.json # Example provider keys and routing config (copy to config.json)
├── examples/ # Exported conversation examples (self-contained HTML)
├── docs/plans/ # Design and implementation plan documents
├── utils/ # Standalone utility pages (run_plotly.html, run_vega_lite.html for testing chart specs)
├── .github/workflows/ # GitHub Actions CI/CD (code review, CI)
├── docker-compose.yml # Compose orchestration (bifrost + app + sidecar build)
└── Makefile # Dev commands (install, dev, compose-build/up/down, e2e-test, clean)
```
Frontend
- React 18 + TypeScript
- Vite
- Plotly via react-plotly.js and Vega-Lite via vega-embed (dual chart rendering, user-selectable)
Backend
- FastAPI + Uvicorn
- DuckDB (Python)
- Anthropic Agent SDK
- MCP SSE transport (DuckDB tool bridge for containers)
- Subagent architecture via the Claude Agent SDK `AgentDefinition` API (sql-analyst)
- Docker SDK for Python + gVisor (container isolation)
- Langfuse (optional, for observability)
Sidecar
- Node.js 20 + TypeScript
- Express HTTP server
- Claude Agent SDK (`@anthropic-ai/claude-agent-sdk`) with token-level streaming
LLM Gateway
- Bifrost — centralized API key management and multi-provider LLM routing (Anthropic, OpenAI, Bedrock, etc.)