-
Notifications
You must be signed in to change notification settings - Fork 8
System Architecture
This page explains how Kai's components fit together, how a message flows through the system, and how key features are implemented.
Kai is a bridge between Telegram and Claude Code. At its core, it's a Telegram bot that maintains a pool of persistent Claude Code CLI subprocesses - one per user - and pipes messages between Telegram and the appropriate subprocess.
Users (Telegram app) GitHub / External services
| |
v v
Telegram Bot API webhook.py (HTTP server)
| |
v v
bot.py (handlers, commands, routing) -------+
| |
v v
pool.py (per-user subprocess pool) review.py / triage.py (one-shot agents)
| |
v v
claude.py (one instance per user) Claude Code CLI (--print mode)
|
v
Claude Code CLI (interactive)
Alongside the Telegram message flow, a webhook HTTP server runs for receiving GitHub events, generic webhooks, and providing the internal scheduling API. GitHub events can trigger autonomous agent pipelines (PR review, issue triage) that run independently of the chat subprocess.
Kai supports two transport modes for receiving Telegram updates: long polling (default) and webhooks. In polling mode, python-telegram-bot pulls updates from the Telegram API. In webhook mode, Telegram pushes updates to the webhook server at /webhook/telegram. The mode is selected automatically based on whether TELEGRAM_WEBHOOK_URL is set in .env.
| File | Responsibility |
|---|---|
main.py |
Async startup/shutdown, orchestrates all components |
bot.py |
Telegram handlers, slash commands, inline keyboards, message routing |
claude.py |
Manages the persistent Claude Code subprocess, handles streaming |
config.py |
Loads environment variables, defines Config, WorkspaceConfig, and UserConfig dataclasses, parses workspaces.yaml and users.yaml
|
sessions.py |
SQLite database: sessions, jobs, settings, workspace history |
pool.py |
Per-user subprocess pool: lazy creation, idle eviction, workspace restoration, prompt routing |
cron.py |
APScheduler integration, registers and executes scheduled jobs |
webhook.py |
aiohttp HTTP server: GitHub webhooks, generic webhooks, scheduling API, file exchange |
history.py |
Conversation history: JSONL logging by date + recent history retrieval |
locks.py |
Per-chat async locks (prevents concurrent responses) and stop events |
totp.py |
TOTP verification, rate limiting, and CLI setup/reset |
transcribe.py |
Voice message transcription (ffmpeg + whisper-cpp) |
tts.py |
Text-to-speech synthesis (Piper TTS + ffmpeg) |
review.py |
PR review agent: diff analysis, prior-comment awareness, one-shot Claude subprocess |
triage.py |
Issue triage agent: labeling, duplicate detection, project assignment via one-shot subprocess |
services.py |
External service proxy: declarative HTTP layer for third-party API calls |
install.py |
Protected installation: interactive config, apply, status, service lifecycle |
Here's what happens when you send a message in Telegram:
bot.py:handle_message receives the update from python-telegram-bot. The @_require_auth decorator checks if the sender's Telegram user ID is authorized - either via users.yaml (if present) or ALLOWED_USER_IDS. Unauthorized messages are silently dropped.
If TOTP authentication is configured, a second gate checks whether the user's TOTP session is still active. If it has expired, Kai prompts for a 6-digit authenticator code before Claude is started. The auth timestamp lives in memory and persists across workspace switches and session resets, but not across bot restarts.
Each chat has an async lock (locks.py:get_lock). This prevents concurrent Claude requests from the same chat — if you send a second message while Claude is still responding, it queues behind the first.
On the first message of a new session, claude.py:send prepends context to the prompt:
-
Identity — When in a foreign workspace, Kai's
CLAUDE.mdfrom the home workspace is injected so it remembers who it is. -
Home memory — The home workspace's
MEMORY.mdis always injected. -
Workspace memory — If the current workspace has its own
MEMORY.md, it's also injected. - Recent history — The last 20 messages from conversation logs, scanning back as far as needed.
-
API info -- The scheduling API endpoint and secret, so Claude can create jobs via
curl. Injected in every workspace because the API runs on localhost, not inside the workspace - scheduling, messaging, and file operations need to work regardless of which repo Claude is pointed at. -
File API info -- The send-file endpoint and incoming file save location (
DATA_DIR/files/<chat_id>/), so Claude knows how to send files back and where to find received files. -
Service info -- Available external services (names, descriptions, usage notes) and the service proxy endpoint, so Claude can call third-party APIs via
curl.
The prompt is written to the Claude Code subprocess's stdin as a JSON message. Claude Code runs in --output-format stream-json mode, so it emits JSON events as it processes.
claude.py:send is an async generator that yields events:
- Text events — partial response text as Claude generates it
- Done event — final response with session ID, cost, and full text
bot.py:_handle_response consumes the stream:
- Shows a typing indicator (refreshed every 4 seconds)
- On the first text event, sends an initial reply message
- Every 2 seconds (
EDIT_INTERVAL), edits the message with the latest text - On the done event, performs a final edit with the complete response
- If the response exceeds 4096 characters (Telegram's limit), it's split into chunks
After a successful response, the session ID and cost are saved to SQLite (sessions.py:save_session). This allows cost tracking via /stats and session continuity.
The message lifecycle above describes the persistent subprocess - a long-running Claude Code process that maintains session state across messages. Kai also has a second execution model for autonomous agents: the one-shot subprocess.
Agent modules (review.py, triage.py) spawn a separate Claude Code process using claude --print mode. This is fundamentally different from the chat subprocess:
| Persistent (claude.py) | One-shot (agents) | |
|---|---|---|
| Lifetime | Runs for the entire session | Spawned per event, exits when done |
| Mode | --output-format stream-json |
--print (stdin to stdout, no streaming) |
| Tools | Full tool access (shell, files, browser) | No tools (--no-tool-use) |
| Session | Maintains conversation context | Stateless, no session ID |
| Triggered by | User messages via Telegram | GitHub webhook events |
| Budget | Per-session cost tracking | Per-invocation max budget |
Agent prompts follow a consistent pattern:
- System prompt with the agent's role and output format
- Structured data (diff, issue body, metadata) wrapped in XML tags
- Untrusted content (PR titles, issue text, commit messages) double-wrapped in XML delimiters as a prompt injection defense
- Output instructions requesting a specific format (Markdown for reviews, JSON for triage)
The XML wrapping is important: since webhook payloads contain user-authored content (PR descriptions, issue bodies), the agent prompts use <untrusted-content> delimiters with explicit instructions not to follow any directives found inside them.
- GitHub webhook arrives at
webhook.py - Cooldown check prevents duplicate processing
- Agent module gathers context (
ghCLI for diffs, related issues, prior comments) - Prompt is built and passed to
claude --printvia stdin - Claude's response is parsed and acted on (GitHub comment, label application, Telegram summary)
- The subprocess exits; no state is retained
This fire-and-forget model means agents don't interfere with the chat subprocess. A PR review can run while the user is mid-conversation - they use completely separate Claude processes.
Workspaces are a key feature — they let you point Kai at any directory on your machine.
- User taps a workspace button or types
/workspace <name> -
bot.pyresolves the name relative toWORKSPACE_BASE -
bot.pylooks up per-workspace configuration fromworkspaces.yaml(if present) -
_do_switch_workspaceis called:-
claude.py:change_workspaceresets model/budget/timeout to global defaults, then applies the workspace's overrides (if any) - The working directory is updated and the Claude process restarts
- The session is cleared from SQLite
- The
workspacesetting is updated (or deleted if switching to home) - The workspace is added to history
-
- The next message starts a fresh session with context injection (including any workspace system prompt)
When Kai is in a foreign workspace (not home), the identity injection ensures it still knows who it is. The home workspace's CLAUDE.md is prepended to the first message. This means:
- A project can have its own
.claude/CLAUDE.mdwith project-specific instructions - Kai's identity (
CLAUDE.mdfrom home) is layered on top - Both are visible to Claude in the same session
_resolve_workspace_path resolves workspace names in two stages: first against WORKSPACE_BASE (with a traversal guard to prevent ../../ escapes), then against ALLOWED_WORKSPACES entries matched by directory name. Absolute paths from Telegram are always rejected. If neither WORKSPACE_BASE nor ALLOWED_WORKSPACES is configured, only /workspace home works.
Kai has three layers of memory:
Location: ~/.claude/projects/<workspace-key>/memory/MEMORY.md
Managed entirely by Claude Code. This is where Claude Code stores what it learns about a project — file structure, architecture, patterns, completed work. One file per workspace. Kai's code doesn't read or write this; Claude Code handles it internally and loads it into its system prompt.
Location: DATA_DIR/memory/MEMORY.md
Kai's personal memory - user preferences, facts, ongoing context. The absolute file path is injected into the session context so Claude can read and update it directly. Injected by claude.py into the first message of every session, regardless of which workspace is active. Kai proactively updates this file when it notices information worth persisting, without needing to be asked.
When working in a foreign workspace, that workspace's .claude/MEMORY.md is also injected if it exists, providing project-specific context alongside personal memory.
Location: DATA_DIR/history/<chat_id>/
All messages logged as JSONL files, one per day per user (e.g., 2026-02-11.jsonl). Each user's history lives in a subdirectory named by their Telegram chat ID, providing natural multi-user isolation. At session start, the last 20 messages are injected for ambient recall, scanning back through as many days as needed. The full history remains searchable on demand - Kai can grep these files when asked about past conversations.
Legacy flat-file history (from before per-user directories) is still readable for backward compatibility, but new entries always go into the per-user subdirectory.
Written by history.py:log_message, read by history.py:get_recent_history.
Session start in a foreign workspace:
1. Claude Code loads auto-memory for workspace (system prompt)
2. Kai injects home CLAUDE.md (first message)
3. Kai injects home MEMORY.md (first message)
4. Kai injects workspace MEMORY.md (first message)
5. Kai injects recent conversation history (first message)
6. Kai injects scheduling API info (first message)
7. Kai injects file exchange API info (first message)
8. Kai injects service proxy info (first message)
9. User's actual message follows (first message)
The webhook server (webhook.py) runs alongside the Telegram bot as an aiohttp application.
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /health |
None | Returns {"status": "ok"}
|
| POST | /webhook/telegram |
Secret token | Telegram update delivery (webhook mode only) |
| POST | /webhook/github |
HMAC-SHA256 | GitHub event notifications |
| POST | /webhook |
Shared secret | Generic webhook forwarding |
| POST | /api/schedule |
Shared secret | Create scheduled jobs |
| GET | /api/jobs |
Shared secret | List active jobs |
| GET | /api/jobs/{id} |
Shared secret | Get a single job |
| PATCH | /api/jobs/{id} |
Shared secret | Update a job's mutable fields |
| DELETE | /api/jobs/{id} |
Shared secret | Delete a scheduled job |
| POST | /api/services/{name} |
Shared secret | Proxy request to an external service |
| POST | /api/send-file |
Shared secret | Send a workspace file to the Telegram chat |
| POST | /api/send-message |
Shared secret | Send a text message to the Telegram chat |
GitHub events arrive at /webhook/github and are dispatched based on event type. Signature validation uses HMAC-SHA256 with the WEBHOOK_SECRET, matching GitHub's X-Hub-Signature-256 header. Invalid signatures are rejected with a 401.
Most events are handled by formatter functions (_fmt_push, _fmt_pull_request, etc.) that extract relevant information and send a Telegram notification. Two event types trigger autonomous agent pipelines instead:
-
pull_request(opened/synchronize) - routed to the PR review agent. A background task callsreview.review_pull_request(), which runs a one-shot Claude subprocess to analyze the diff and post a review comment. -
issues(opened) - routed to the issue triage agent. A background task callstriage.triage_issue(), which runs a one-shot Claude subprocess to label, categorize, and assess the issue.
Both agents use in-memory cooldown rate limiting (300s for PR review, 60s for triage) to avoid duplicate work when GitHub sends rapid successive events for the same PR or issue. Cooldown dicts are pruned of expired entries on write to prevent unbounded memory growth. Unsupported event types are silently ignored.
GitHub notifications (push, PR, issue, comment, review events) can optionally be routed to a separate Telegram group via GITHUB_NOTIFY_CHAT_ID, keeping the primary DM clean. Agent output (review comments, triage summaries) also routes to the group when configured. See GitHub Notification Routing.
When a job is created via POST /api/schedule, it's both stored in SQLite and immediately registered with APScheduler. This means jobs start firing right away — no restart needed.
APScheduler uses python-telegram-bot's built-in job queue, which means scheduled jobs run in the same event loop as the Telegram bot and can send messages directly.
The service proxy (services.py) provides a declarative HTTP layer for calling external APIs. Services are defined in services.yaml -- each entry specifies a base URL, authentication method, default headers, and optional notes. Claude calls services through the proxy endpoint (POST /api/services/{name}), which injects the appropriate API key and forwards the request. This keeps API keys out of conversation context and Claude's tool calls.
Bidirectional file transfer between Telegram and the workspace filesystem.
Inbound (Telegram to filesystem): When a user sends a photo, document, or any file type, bot.py downloads it and saves it to DATA_DIR/files/<chat_id>/ with a timestamped filename (e.g., 20260223_234500_123456_report.pdf). Per-user subdirectories keep each user's uploads isolated. The absolute path is appended to the message so Claude can access the file via shell tools - pdftotext, unzip, file, or anything else available on the machine. Images and text files are also sent to Claude in their native format (base64 or code block) for direct processing.
Outbound (filesystem to Telegram): Claude sends files back via POST /api/send-file with a JSON body containing path (required) and caption (optional). Images (PNG, JPEG, GIF, WebP) are sent as Telegram photos (rendered inline); everything else is sent as document attachments. Path confinement via Path.relative_to() prevents traversal outside the workspace directory. Paths pointing to history or memory directories are explicitly rejected, preventing the inner Claude process from exfiltrating sensitive data via the file API.
Automatic cleanup: If FILE_RETENTION_DAYS is set (default: 0/disabled), a background task runs every 24 hours and deletes uploaded files older than the configured retention period. Age is determined from the timestamp prefix in the filename (set at upload time), not filesystem mtime, so the behavior is deterministic regardless of file copies or backup restores. Files without a recognizable timestamp prefix are never touched.
Kai supports both voice input and voice output, both running locally with no cloud services.
When a user sends a Telegram voice note, bot.py downloads the audio and passes it to transcribe.py:
- ffmpeg converts the Ogg Opus audio to 16kHz mono WAV
- whisper-cli transcribes the audio using a local Whisper model
- The transcription is echoed back to the chat ("Heard: ...") for verification
- The text is forwarded to Claude as a regular message
Both subprocesses have a 30-second timeout. Enabled via VOICE_ENABLED=true.
When voice mode is active, bot.py passes the final response text to tts.py:
-
Piper TTS synthesizes the text to WAV via subprocess (
python -m piper) - ffmpeg converts WAV to Ogg Opus (Telegram's required voice note format)
- The voice note is sent to the chat
Three voice modes are available: off (default), only (voice note only, text fallback on failure), and on (text streams normally, voice note follows). Voice mode and voice selection are persisted per-chat via the settings table. Eight curated English voices are included (4 British, 4 American).
Synthesis timeout is 120 seconds (long responses take proportionally longer). Enabled via TTS_ENABLED=true.
SQLite (kai.db) with four tables. All tables are namespaced by chat_id for multi-user isolation.
| Table | Purpose |
|---|---|
sessions |
Active Claude sessions: chat ID (primary key), session ID, model, cost, timestamps |
jobs |
Scheduled jobs: prompt, schedule, type, active flag, chat_id (owner) |
settings |
Key-value config namespaced as key:chat_id (e.g., workspace:123456) |
workspace_history |
Recently used workspace paths with composite key (chat_id, path)
|
The database is managed by sessions.py with simple async functions backed by aiosqlite. WAL mode is enabled for safe concurrent reads during multi-user operation.
main.py:main loads config then runs _init_and_run:
- Load configuration - in a protected installation,
config.pyreads/etc/kai/envviasudo -n cat(using the NOPASSWD sudoers rules); in development mode, it reads.envfrom the project root viapython-dotenv. Per-workspace configuration is loaded fromworkspaces.yamland per-user configuration fromusers.yaml(both check/etc/kai/first for protected installations) - Initialize SQLite database
- Create the Telegram bot application with all handlers
- Initialize the subprocess pool (
pool.py) - no subprocesses are spawned yet; they are created lazily on each user's first message. The idle eviction background task starts immediately. - Initialize app and start receiving Telegram updates (webhook mode if
TELEGRAM_WEBHOOK_URLis set, otherwise long polling) - Register Telegram slash command menu
- Reload scheduled jobs from the database into APScheduler
- Start the webhook HTTP server (includes the health monitor background task, which tracks consecutive failures and notifies admins after 3 consecutive health check failures)
- Start the file cleanup background task if
FILE_RETENTION_DAYS > 0(runs every 24 hours after a 2-minute startup delay) - Send crash recovery notification if a previous response was interrupted
- On shutdown: send a save prompt to each active subprocess so Claude can persist context to MEMORY.md, stop webhook server, stop Telegram transport (polling or webhook), shut down subprocess pool, close database
Kai can run in two modes:
Development mode (make run) - everything lives under the project directory. Config comes from .env, the database is kai.db in the project root, and the process runs as your user. Simple to set up, good for hacking.
Protected installation (python -m kai install) - source, data, and secrets are separated across three locations:
| Directory | Owner | Purpose |
|---|---|---|
/opt/kai/ |
root | Read-only source code and venv |
/var/lib/kai/ |
service user | Writable database, logs, files |
/etc/kai/ |
root (mode 0600) | Secrets (env vars, API keys, TOTP) |
The KAI_DATA_DIR and KAI_INSTALL_DIR environment variables redirect path resolution at runtime. Code in config.py and sessions.py uses these to find the database and log directory regardless of which mode is active.
The installer (install.py) handles both macOS (LaunchDaemon) and Linux (systemd). See Protected Installation for the full walkthrough.
Everything runs in a single asyncio event loop:
- Telegram updates (webhook or long polling, via python-telegram-bot)
- Claude subprocess I/O (asyncio subprocess pipes)
- Webhook HTTP server (aiohttp)
- Scheduled jobs (APScheduler via python-telegram-bot's job queue)
- Typing indicators (asyncio tasks)
- Agent subprocesses (fire-and-forget background tasks)
Per-chat locks ensure only one Claude request runs at a time per chat. The stop event (/stop command) allows interrupting a running response by setting a flag that the streaming loop checks.
Agent subprocesses (PR review, issue triage) run as background asyncio.Tasks, so they don't block the chat flow. Multiple agents can run concurrently with each other and with the persistent chat subprocess.
Process-level isolation can be configured at two levels: a global CLAUDE_USER for single-user deployments, or per-user os_user in users.yaml for multi-user setups. Per-user os_user takes precedence when set.
root
owns /opt/kai/src/ (source), /etc/kai/ (secrets)
service user (e.g., kai)
runs the bot process
can read /etc/kai/env via sudo cat (NOPASSWD rule)
owns /var/lib/kai/ (database, logs, files)
claude user(s) (e.g., alice, bob) [optional]
each runs their own Claude Code subprocess
cannot read /etc/kai/ (no sudoers rules)
cannot write to /var/lib/kai/ (wrong owner)
can access the workspace directory and their own home
cannot access other claude users' files (Unix permissions)
When os_user is set in users.yaml (or CLAUDE_USER as a global fallback), pool.py passes the OS user to claude.py, which starts the subprocess with sudo -u <os_user> claude ... and start_new_session=True. The new session prevents signals sent to the bot's process group from reaching Claude, and sudo -u drops privileges to the target user before exec.
The sudoers rules (generated by install.py) grant the service user permission to run the claude binary as each configured OS user without a password. No other elevated operations are permitted.
This is optional and most useful for defense in depth. In a single-user local deployment, the default (bot and Claude sharing the same user) is fine. For the full multi-user setup, see Multi-User Setup. For protected installation details, see Protected Installation.
Summary of security patterns used across the codebase:
-
Webhook secret isolation - the inner Claude process receives the webhook secret via
KAI_WEBHOOK_SECRETenv var (set by_ensure_startedinclaude.py), not from the bot's.envor/etc/kai/env. In a protected install with CLAUDE_USER, the inner process can't read the secrets file at all. -
Path confinement - the send-file endpoint (
/api/send-file) usesPath.relative_to()to prevent directory traversal. Files must resolve within the current workspace directory. -
Temporary file handling - transcription and TTS use
tempfile.TemporaryDirectory(auto-cleaned on exit). The sudoers temp file usesmkstemp(random name, restrictive permissions) instead of predictable/tmp/paths. -
Subprocess isolation - when
CLAUDE_USERis set, the Claude process runs in a new session (start_new_session=True) so signals from the bot's process group don't reach it. -
Input validation - all API endpoints validate required fields and types before processing. Schedule data is validated against known types before database persistence. Job IDs are validated as integers. Service names must match configured services. The
CLAUDE_MODELenv var is validated against the set of known models at startup. -
SSRF protection - the service proxy requires explicit
allow_path_suffix: truein the service definition before accepting apath_suffixparameter. Without this flag, services cannot be used as open HTTP proxies. Path suffixes are validated to reject query strings, fragments, and path traversal. - Response size limits - service proxy responses are capped at 10MB to prevent out-of-memory conditions from oversized external API responses.
- Error detail redaction - agent error notifications sent to Telegram include only the exception type, not the full message, preventing leakage of filesystem paths or internal URLs.
- Prompt injection defense - agent prompts wrap untrusted webhook content (PR descriptions, issue bodies, commit messages) in XML delimiters with explicit instructions to treat the content as data, not directives. This prevents crafted issue titles or PR bodies from hijacking agent behavior.