Skip to content

System Architecture

Daniel Ellison edited this page Mar 26, 2026 · 16 revisions

System Architecture

This page explains how Kai's components fit together, how a message flows through the system, and how key features are implemented.

Overview

Kai is a bridge between Telegram and Claude Code. At its core, it's a Telegram bot that maintains a pool of persistent Claude Code CLI subprocesses - one per user - and pipes messages between Telegram and the appropriate subprocess.

Users (Telegram app)                GitHub / External services
    |                                       |
    v                                       v
Telegram Bot API                    webhook.py (HTTP server)
    |                                       |
    v                                       v
bot.py (handlers, commands, routing) -------+
    |                                       |
    v                                       v
pool.py (per-user subprocess pool)  review.py / triage.py (one-shot agents)
    |                                       |
    v                                       v
claude.py (one instance per user)   Claude Code CLI (--print mode)
    |
    v
Claude Code CLI (interactive)

Alongside the Telegram message flow, a webhook HTTP server runs for receiving GitHub events, generic webhooks, and providing the internal scheduling API. GitHub events can trigger autonomous agent pipelines (PR review, issue triage) that run independently of the chat subprocess.

Kai supports two transport modes for receiving Telegram updates: long polling (default) and webhooks. In polling mode, python-telegram-bot pulls updates from the Telegram API. In webhook mode, Telegram pushes updates to the webhook server at /webhook/telegram. The mode is selected automatically based on whether TELEGRAM_WEBHOOK_URL is set in .env.

Source files

File Responsibility
main.py Async startup/shutdown, orchestrates all components
bot.py Telegram handlers, slash commands, inline keyboards, message routing
claude.py Manages the persistent Claude Code subprocess, handles streaming
config.py Loads environment variables, defines Config, WorkspaceConfig, and UserConfig dataclasses, parses workspaces.yaml and users.yaml
sessions.py SQLite database: sessions, jobs, settings, workspace history
pool.py Per-user subprocess pool: lazy creation, idle eviction, workspace restoration, prompt routing
cron.py APScheduler integration, registers and executes scheduled jobs
webhook.py aiohttp HTTP server: GitHub webhooks, generic webhooks, scheduling API, file exchange
history.py Conversation history: JSONL logging by date + recent history retrieval
locks.py Per-chat async locks (prevents concurrent responses) and stop events
totp.py TOTP verification, rate limiting, and CLI setup/reset
transcribe.py Voice message transcription (ffmpeg + whisper-cpp)
tts.py Text-to-speech synthesis (Piper TTS + ffmpeg)
review.py PR review agent: diff analysis, prior-comment awareness, one-shot Claude subprocess
triage.py Issue triage agent: labeling, duplicate detection, project assignment via one-shot subprocess
services.py External service proxy: declarative HTTP layer for third-party API calls
install.py Protected installation: interactive config, apply, status, service lifecycle

Message lifecycle

Here's what happens when you send a message in Telegram:

1. Receive and authorize

bot.py:handle_message receives the update from python-telegram-bot. The @_require_auth decorator checks if the sender's Telegram user ID is authorized - either via users.yaml (if present) or ALLOWED_USER_IDS. Unauthorized messages are silently dropped.

If TOTP authentication is configured, a second gate checks whether the user's TOTP session is still active. If it has expired, Kai prompts for a 6-digit authenticator code before Claude is started. The auth timestamp lives in memory and persists across workspace switches and session resets, but not across bot restarts.

2. Acquire lock

Each chat has an async lock (locks.py:get_lock). This prevents concurrent Claude requests from the same chat — if you send a second message while Claude is still responding, it queues behind the first.

3. Inject context (first message only)

On the first message of a new session, claude.py:send prepends context to the prompt:

  • Identity — When in a foreign workspace, Kai's CLAUDE.md from the home workspace is injected so it remembers who it is.
  • Home memory — The home workspace's MEMORY.md is always injected.
  • Workspace memory — If the current workspace has its own MEMORY.md, it's also injected.
  • Recent history — The last 20 messages from conversation logs, scanning back as far as needed.
  • API info -- The scheduling API endpoint and secret, so Claude can create jobs via curl. Injected in every workspace because the API runs on localhost, not inside the workspace - scheduling, messaging, and file operations need to work regardless of which repo Claude is pointed at.
  • File API info -- The send-file endpoint and incoming file save location (DATA_DIR/files/<chat_id>/), so Claude knows how to send files back and where to find received files.
  • Service info -- Available external services (names, descriptions, usage notes) and the service proxy endpoint, so Claude can call third-party APIs via curl.

4. Send to Claude Code

The prompt is written to the Claude Code subprocess's stdin as a JSON message. Claude Code runs in --output-format stream-json mode, so it emits JSON events as it processes.

5. Stream the response

claude.py:send is an async generator that yields events:

  • Text events — partial response text as Claude generates it
  • Done event — final response with session ID, cost, and full text

6. Live-update in Telegram

bot.py:_handle_response consumes the stream:

  1. Shows a typing indicator (refreshed every 4 seconds)
  2. On the first text event, sends an initial reply message
  3. Every 2 seconds (EDIT_INTERVAL), edits the message with the latest text
  4. On the done event, performs a final edit with the complete response
  5. If the response exceeds 4096 characters (Telegram's limit), it's split into chunks

7. Persist session

After a successful response, the session ID and cost are saved to SQLite (sessions.py:save_session). This allows cost tracking via /stats and session continuity.

Agent subprocess pattern

The message lifecycle above describes the persistent subprocess - a long-running Claude Code process that maintains session state across messages. Kai also has a second execution model for autonomous agents: the one-shot subprocess.

How it works

Agent modules (review.py, triage.py) spawn a separate Claude Code process using claude --print mode. This is fundamentally different from the chat subprocess:

Persistent (claude.py) One-shot (agents)
Lifetime Runs for the entire session Spawned per event, exits when done
Mode --output-format stream-json --print (stdin to stdout, no streaming)
Tools Full tool access (shell, files, browser) No tools (--no-tool-use)
Session Maintains conversation context Stateless, no session ID
Triggered by User messages via Telegram GitHub webhook events
Budget Per-session cost tracking Per-invocation max budget

Prompt construction

Agent prompts follow a consistent pattern:

  1. System prompt with the agent's role and output format
  2. Structured data (diff, issue body, metadata) wrapped in XML tags
  3. Untrusted content (PR titles, issue text, commit messages) double-wrapped in XML delimiters as a prompt injection defense
  4. Output instructions requesting a specific format (Markdown for reviews, JSON for triage)

The XML wrapping is important: since webhook payloads contain user-authored content (PR descriptions, issue bodies), the agent prompts use <untrusted-content> delimiters with explicit instructions not to follow any directives found inside them.

Lifecycle

  1. GitHub webhook arrives at webhook.py
  2. Cooldown check prevents duplicate processing
  3. Agent module gathers context (gh CLI for diffs, related issues, prior comments)
  4. Prompt is built and passed to claude --print via stdin
  5. Claude's response is parsed and acted on (GitHub comment, label application, Telegram summary)
  6. The subprocess exits; no state is retained

This fire-and-forget model means agents don't interfere with the chat subprocess. A PR review can run while the user is mid-conversation - they use completely separate Claude processes.

Workspace switching

Workspaces are a key feature — they let you point Kai at any directory on your machine.

How it works

  1. User taps a workspace button or types /workspace <name>
  2. bot.py resolves the name relative to WORKSPACE_BASE
  3. bot.py looks up per-workspace configuration from workspaces.yaml (if present)
  4. _do_switch_workspace is called:
    • claude.py:change_workspace resets model/budget/timeout to global defaults, then applies the workspace's overrides (if any)
    • The working directory is updated and the Claude process restarts
    • The session is cleared from SQLite
    • The workspace setting is updated (or deleted if switching to home)
    • The workspace is added to history
  5. The next message starts a fresh session with context injection (including any workspace system prompt)

Identity preservation

When Kai is in a foreign workspace (not home), the identity injection ensures it still knows who it is. The home workspace's CLAUDE.md is prepended to the first message. This means:

  • A project can have its own .claude/CLAUDE.md with project-specific instructions
  • Kai's identity (CLAUDE.md from home) is layered on top
  • Both are visible to Claude in the same session

Path resolution

_resolve_workspace_path resolves workspace names in two stages: first against WORKSPACE_BASE (with a traversal guard to prevent ../../ escapes), then against ALLOWED_WORKSPACES entries matched by directory name. Absolute paths from Telegram are always rejected. If neither WORKSPACE_BASE nor ALLOWED_WORKSPACES is configured, only /workspace home works.

Memory system

Kai has three layers of memory:

Layer 1: Auto-memory (semantic)

Location: ~/.claude/projects/<workspace-key>/memory/MEMORY.md

Managed entirely by Claude Code. This is where Claude Code stores what it learns about a project — file structure, architecture, patterns, completed work. One file per workspace. Kai's code doesn't read or write this; Claude Code handles it internally and loads it into its system prompt.

Layer 2: Home memory (personal)

Location: DATA_DIR/memory/MEMORY.md

Kai's personal memory - user preferences, facts, ongoing context. The absolute file path is injected into the session context so Claude can read and update it directly. Injected by claude.py into the first message of every session, regardless of which workspace is active. Kai proactively updates this file when it notices information worth persisting, without needing to be asked.

When working in a foreign workspace, that workspace's .claude/MEMORY.md is also injected if it exists, providing project-specific context alongside personal memory.

Layer 3: Conversation history (episodic)

Location: DATA_DIR/history/<chat_id>/

All messages logged as JSONL files, one per day per user (e.g., 2026-02-11.jsonl). Each user's history lives in a subdirectory named by their Telegram chat ID, providing natural multi-user isolation. At session start, the last 20 messages are injected for ambient recall, scanning back through as many days as needed. The full history remains searchable on demand - Kai can grep these files when asked about past conversations.

Legacy flat-file history (from before per-user directories) is still readable for backward compatibility, but new entries always go into the per-user subdirectory.

Written by history.py:log_message, read by history.py:get_recent_history.

Memory flow

Session start in a foreign workspace:

1. Claude Code loads auto-memory for workspace    (system prompt)
2. Kai injects home CLAUDE.md                     (first message)
3. Kai injects home MEMORY.md                     (first message)
4. Kai injects workspace MEMORY.md                (first message)
5. Kai injects recent conversation history        (first message)
6. Kai injects scheduling API info                (first message)
7. Kai injects file exchange API info             (first message)
8. Kai injects service proxy info                 (first message)
9. User's actual message follows                  (first message)

Webhook server

The webhook server (webhook.py) runs alongside the Telegram bot as an aiohttp application.

Endpoints

Method Path Auth Description
GET /health None Returns {"status": "ok"}
POST /webhook/telegram Secret token Telegram update delivery (webhook mode only)
POST /webhook/github HMAC-SHA256 GitHub event notifications
POST /webhook Shared secret Generic webhook forwarding
POST /api/schedule Shared secret Create scheduled jobs
GET /api/jobs Shared secret List active jobs
GET /api/jobs/{id} Shared secret Get a single job
PATCH /api/jobs/{id} Shared secret Update a job's mutable fields
DELETE /api/jobs/{id} Shared secret Delete a scheduled job
POST /api/services/{name} Shared secret Proxy request to an external service
POST /api/send-file Shared secret Send a workspace file to the Telegram chat
POST /api/send-message Shared secret Send a text message to the Telegram chat

GitHub event processing

GitHub events arrive at /webhook/github and are dispatched based on event type. Signature validation uses HMAC-SHA256 with the WEBHOOK_SECRET, matching GitHub's X-Hub-Signature-256 header. Invalid signatures are rejected with a 401.

Most events are handled by formatter functions (_fmt_push, _fmt_pull_request, etc.) that extract relevant information and send a Telegram notification. Two event types trigger autonomous agent pipelines instead:

  • pull_request (opened/synchronize) - routed to the PR review agent. A background task calls review.review_pull_request(), which runs a one-shot Claude subprocess to analyze the diff and post a review comment.
  • issues (opened) - routed to the issue triage agent. A background task calls triage.triage_issue(), which runs a one-shot Claude subprocess to label, categorize, and assess the issue.

Both agents use in-memory cooldown rate limiting (300s for PR review, 60s for triage) to avoid duplicate work when GitHub sends rapid successive events for the same PR or issue. Cooldown dicts are pruned of expired entries on write to prevent unbounded memory growth. Unsupported event types are silently ignored.

GitHub notifications (push, PR, issue, comment, review events) can optionally be routed to a separate Telegram group via GITHUB_NOTIFY_CHAT_ID, keeping the primary DM clean. Agent output (review comments, triage summaries) also routes to the group when configured. See GitHub Notification Routing.

Scheduling integration

When a job is created via POST /api/schedule, it's both stored in SQLite and immediately registered with APScheduler. This means jobs start firing right away — no restart needed.

APScheduler uses python-telegram-bot's built-in job queue, which means scheduled jobs run in the same event loop as the Telegram bot and can send messages directly.

Service proxy

The service proxy (services.py) provides a declarative HTTP layer for calling external APIs. Services are defined in services.yaml -- each entry specifies a base URL, authentication method, default headers, and optional notes. Claude calls services through the proxy endpoint (POST /api/services/{name}), which injects the appropriate API key and forwards the request. This keeps API keys out of conversation context and Claude's tool calls.

File exchange

Bidirectional file transfer between Telegram and the workspace filesystem.

Inbound (Telegram to filesystem): When a user sends a photo, document, or any file type, bot.py downloads it and saves it to DATA_DIR/files/<chat_id>/ with a timestamped filename (e.g., 20260223_234500_123456_report.pdf). Per-user subdirectories keep each user's uploads isolated. The absolute path is appended to the message so Claude can access the file via shell tools - pdftotext, unzip, file, or anything else available on the machine. Images and text files are also sent to Claude in their native format (base64 or code block) for direct processing.

Outbound (filesystem to Telegram): Claude sends files back via POST /api/send-file with a JSON body containing path (required) and caption (optional). Images (PNG, JPEG, GIF, WebP) are sent as Telegram photos (rendered inline); everything else is sent as document attachments. Path confinement via Path.relative_to() prevents traversal outside the workspace directory. Paths pointing to history or memory directories are explicitly rejected, preventing the inner Claude process from exfiltrating sensitive data via the file API.

Automatic cleanup: If FILE_RETENTION_DAYS is set (default: 0/disabled), a background task runs every 24 hours and deletes uploaded files older than the configured retention period. Age is determined from the timestamp prefix in the filename (set at upload time), not filesystem mtime, so the behavior is deterministic regardless of file copies or backup restores. Files without a recognizable timestamp prefix are never touched.

Voice pipeline

Kai supports both voice input and voice output, both running locally with no cloud services.

Speech-to-text (input)

When a user sends a Telegram voice note, bot.py downloads the audio and passes it to transcribe.py:

  1. ffmpeg converts the Ogg Opus audio to 16kHz mono WAV
  2. whisper-cli transcribes the audio using a local Whisper model
  3. The transcription is echoed back to the chat ("Heard: ...") for verification
  4. The text is forwarded to Claude as a regular message

Both subprocesses have a 30-second timeout. Enabled via VOICE_ENABLED=true.

Text-to-speech (output)

When voice mode is active, bot.py passes the final response text to tts.py:

  1. Piper TTS synthesizes the text to WAV via subprocess (python -m piper)
  2. ffmpeg converts WAV to Ogg Opus (Telegram's required voice note format)
  3. The voice note is sent to the chat

Three voice modes are available: off (default), only (voice note only, text fallback on failure), and on (text streams normally, voice note follows). Voice mode and voice selection are persisted per-chat via the settings table. Eight curated English voices are included (4 British, 4 American).

Synthesis timeout is 120 seconds (long responses take proportionally longer). Enabled via TTS_ENABLED=true.

Database schema

SQLite (kai.db) with four tables. All tables are namespaced by chat_id for multi-user isolation.

Table Purpose
sessions Active Claude sessions: chat ID (primary key), session ID, model, cost, timestamps
jobs Scheduled jobs: prompt, schedule, type, active flag, chat_id (owner)
settings Key-value config namespaced as key:chat_id (e.g., workspace:123456)
workspace_history Recently used workspace paths with composite key (chat_id, path)

The database is managed by sessions.py with simple async functions backed by aiosqlite. WAL mode is enabled for safe concurrent reads during multi-user operation.

Startup sequence

main.py:main loads config then runs _init_and_run:

  1. Load configuration - in a protected installation, config.py reads /etc/kai/env via sudo -n cat (using the NOPASSWD sudoers rules); in development mode, it reads .env from the project root via python-dotenv. Per-workspace configuration is loaded from workspaces.yaml and per-user configuration from users.yaml (both check /etc/kai/ first for protected installations)
  2. Initialize SQLite database
  3. Create the Telegram bot application with all handlers
  4. Initialize the subprocess pool (pool.py) - no subprocesses are spawned yet; they are created lazily on each user's first message. The idle eviction background task starts immediately.
  5. Initialize app and start receiving Telegram updates (webhook mode if TELEGRAM_WEBHOOK_URL is set, otherwise long polling)
  6. Register Telegram slash command menu
  7. Reload scheduled jobs from the database into APScheduler
  8. Start the webhook HTTP server (includes the health monitor background task, which tracks consecutive failures and notifies admins after 3 consecutive health check failures)
  9. Start the file cleanup background task if FILE_RETENTION_DAYS > 0 (runs every 24 hours after a 2-minute startup delay)
  10. Send crash recovery notification if a previous response was interrupted
  11. On shutdown: send a save prompt to each active subprocess so Claude can persist context to MEMORY.md, stop webhook server, stop Telegram transport (polling or webhook), shut down subprocess pool, close database

Deployment modes

Kai can run in two modes:

Development mode (make run) - everything lives under the project directory. Config comes from .env, the database is kai.db in the project root, and the process runs as your user. Simple to set up, good for hacking.

Protected installation (python -m kai install) - source, data, and secrets are separated across three locations:

Directory Owner Purpose
/opt/kai/ root Read-only source code and venv
/var/lib/kai/ service user Writable database, logs, files
/etc/kai/ root (mode 0600) Secrets (env vars, API keys, TOTP)

The KAI_DATA_DIR and KAI_INSTALL_DIR environment variables redirect path resolution at runtime. Code in config.py and sessions.py uses these to find the database and log directory regardless of which mode is active.

The installer (install.py) handles both macOS (LaunchDaemon) and Linux (systemd). See Protected Installation for the full walkthrough.

Concurrency model

Everything runs in a single asyncio event loop:

  • Telegram updates (webhook or long polling, via python-telegram-bot)
  • Claude subprocess I/O (asyncio subprocess pipes)
  • Webhook HTTP server (aiohttp)
  • Scheduled jobs (APScheduler via python-telegram-bot's job queue)
  • Typing indicators (asyncio tasks)
  • Agent subprocesses (fire-and-forget background tasks)

Per-chat locks ensure only one Claude request runs at a time per chat. The stop event (/stop command) allows interrupting a running response by setting a flag that the streaming loop checks.

Agent subprocesses (PR review, issue triage) run as background asyncio.Tasks, so they don't block the chat flow. Multiple agents can run concurrently with each other and with the persistent chat subprocess.

User separation

Process-level isolation can be configured at two levels: a global CLAUDE_USER for single-user deployments, or per-user os_user in users.yaml for multi-user setups. Per-user os_user takes precedence when set.

Trust boundaries

root
  owns /opt/kai/src/ (source), /etc/kai/ (secrets)

service user (e.g., kai)
  runs the bot process
  can read /etc/kai/env via sudo cat (NOPASSWD rule)
  owns /var/lib/kai/ (database, logs, files)

claude user(s) (e.g., alice, bob)    [optional]
  each runs their own Claude Code subprocess
  cannot read /etc/kai/ (no sudoers rules)
  cannot write to /var/lib/kai/ (wrong owner)
  can access the workspace directory and their own home
  cannot access other claude users' files (Unix permissions)

Implementation

When os_user is set in users.yaml (or CLAUDE_USER as a global fallback), pool.py passes the OS user to claude.py, which starts the subprocess with sudo -u <os_user> claude ... and start_new_session=True. The new session prevents signals sent to the bot's process group from reaching Claude, and sudo -u drops privileges to the target user before exec.

The sudoers rules (generated by install.py) grant the service user permission to run the claude binary as each configured OS user without a password. No other elevated operations are permitted.

This is optional and most useful for defense in depth. In a single-user local deployment, the default (bot and Claude sharing the same user) is fine. For the full multi-user setup, see Multi-User Setup. For protected installation details, see Protected Installation.

Security model

Summary of security patterns used across the codebase:

  • Webhook secret isolation - the inner Claude process receives the webhook secret via KAI_WEBHOOK_SECRET env var (set by _ensure_started in claude.py), not from the bot's .env or /etc/kai/env. In a protected install with CLAUDE_USER, the inner process can't read the secrets file at all.
  • Path confinement - the send-file endpoint (/api/send-file) uses Path.relative_to() to prevent directory traversal. Files must resolve within the current workspace directory.
  • Temporary file handling - transcription and TTS use tempfile.TemporaryDirectory (auto-cleaned on exit). The sudoers temp file uses mkstemp (random name, restrictive permissions) instead of predictable /tmp/ paths.
  • Subprocess isolation - when CLAUDE_USER is set, the Claude process runs in a new session (start_new_session=True) so signals from the bot's process group don't reach it.
  • Input validation - all API endpoints validate required fields and types before processing. Schedule data is validated against known types before database persistence. Job IDs are validated as integers. Service names must match configured services. The CLAUDE_MODEL env var is validated against the set of known models at startup.
  • SSRF protection - the service proxy requires explicit allow_path_suffix: true in the service definition before accepting a path_suffix parameter. Without this flag, services cannot be used as open HTTP proxies. Path suffixes are validated to reject query strings, fragments, and path traversal.
  • Response size limits - service proxy responses are capped at 10MB to prevent out-of-memory conditions from oversized external API responses.
  • Error detail redaction - agent error notifications sent to Telegram include only the exception type, not the full message, preventing leakage of filesystem paths or internal URLs.
  • Prompt injection defense - agent prompts wrap untrusted webhook content (PR descriptions, issue bodies, commit messages) in XML delimiters with explicit instructions to treat the content as data, not directives. This prevents crafted issue titles or PR bodies from hijacking agent behavior.

Clone this wiki locally