From d2b70d35f4619c9c305fc95bf87c0fa0e649f83a Mon Sep 17 00:00:00 2001
From: xprilion <xprilion@gmail.com>
Date: Sun, 26 Apr 2026 15:01:13 +0530
Subject: [PATCH] Update docs and changelog

---
 README.md                      | 126 ++++---------
 backend/pyproject.toml         |   2 +-
 site/docs/.vitepress/config.ts |   3 +-
 site/docs/agent-harness.md     | 308 +++++++++++---------------------
 site/docs/api.md               |  29 ++-
 site/docs/architecture.md      | 315 +++++++++------------------------
 site/docs/changelog.md         |  39 ++++
 site/docs/configuration.md     | 102 +++--------
 site/docs/index.md             |  49 ++---
 site/docs/modes.md             |  99 +++++------
 site/docs/setup.md             | 116 ++++--------
 site/docs/tools.md             |  95 ++++++----
 12 files changed, 466 insertions(+), 817 deletions(-)
 create mode 100644 site/docs/changelog.md

diff --git a/README.md b/README.md
index 8d225a2..39f29bd 100644
--- a/README.md
+++ b/README.md
@@ -1,9 +1,6 @@
 # OpenMLR
 
-Built for ML researchers who are tired of context-switching.
-
-Search papers, take notes, write drafts, run experiments — all in one conversation.
-Your context stays with you from the first question to the final export.
+A self-hosted ML research agent that plans, researches, writes papers, and executes code — all in one conversation.
 
 [![Deploy to Render](https://render.com/images/deploy-to-render-button.svg)](https://render.com/deploy?repo=https://github.com/xprilion/OpenMLR)
 [![Deploy to Heroku](https://www.herokucdn.com/deploy/button.svg)](https://www.heroku.com/deploy?template=https://github.com/xprilion/OpenMLR)
@@ -14,27 +11,19 @@ Your context stays with you from the first question to the final export.
 
 ---
 
-## What it does
-
-- **Plan** — Asks clarifying questions before diving in. Breaks down tasks, tracks progress.
-- **Research** — OpenAlex, ArXiv, Papers With Code, citation graphs. Reads full papers, not just abstracts.
-- **Write** — Section-by-section drafting with auto-citations. Export to Markdown or LaTeX.
-- **Execute** — Docker-isolated code execution. SSH remotes. Modal cloud. Runs experiments, not just snippets.
-
 ## Features
 
-- **Structured planning** — asks 2-4 clarifying options before starting, builds task lists, generates completion reports
-- **Paper research** — OpenAlex, ArXiv, CrossRef, Papers With Code; reads full papers section-by-section; crawls citation graphs
-- **Context tracking** — token usage gauge, auto-compaction approaching limits, preserves key decisions
-- **Multi-provider LLMs** — OpenAI, Anthropic, OpenRouter, OpenCode Go, plus local models (Ollama, LM Studio, vLLM)
-- **Background jobs** — tasks persist and continue even if you close the browser (requires Redis)
-- **Mode enforcement** — Plan, Research, Write modes restrict which tools are available
-- **MCP support** — connect any Model Context Protocol server as additional tools
+- **Plan + Execute modes** — Plan mode gathers context and creates plans; Execute mode does the work. Toggle with `Cmd+B` / `Cmd+E`.
+- **Paper research** — OpenAlex, ArXiv, CrossRef, Papers With Code. Reads full papers, crawls citation graphs.
+- **Paper writing** — Section-by-section drafting that auto-saves to the database. Preview and export (Markdown/LaTeX) in the Paper tab.
+- **Sub-agent streaming** — The `research` tool spawns independent sub-agents, each with its own context, and streams their nested tool calls to the UI.
+- **Background jobs** — Celery + Redis processing. Close the browser, come back later.
+- **Per-conversation parallelism** — Multiple conversations process simultaneously with isolated state.
+- **Multi-provider LLMs** — OpenAI, Anthropic, OpenRouter, plus local models (Ollama, LM Studio, vLLM).
+- **Onboarding flow** — Guided setup when no LLM provider is configured.
 
 ## Quick Start
 
-### Docker Compose (recommended)
-
 ```bash
 git clone https://github.com/xprilion/OpenMLR.git
 cd OpenMLR
@@ -44,116 +33,63 @@ docker compose up -d
 
 Open `http://localhost:3000`. Create an account on first visit.
 
-### Render
-
-Click the button to deploy to Render (includes Postgres + Redis):
-
-[![Deploy to Render](https://render.com/images/deploy-to-render-button.svg)](https://render.com/deploy?repo=https://github.com/xprilion/OpenMLR)
-
-After deploy, add your LLM API key(s) in the Environment settings.
-
-### Heroku
-
-[![Deploy to Heroku](https://www.herokucdn.com/deploy/button.svg)](https://www.heroku.com/deploy?template=https://github.com/xprilion/OpenMLR)
-
-### Coolify
-
-In Coolify, create a new Docker Compose service pointing to this repo. It will use `docker-compose.yml` automatically. Add your LLM API keys as environment variables in the Coolify UI.
-
-### Local Development
+## Local Development
 
 ```bash
 make install           # Install deps (backend + frontend)
 cp .env.example .env   # Add DATABASE_URL + at least one LLM key
 make db-fresh          # Create tables
-make dev               # Start dev servers
+make dev               # Start dev servers (backend :3000, frontend :5173)
 ```
 
-Open `http://localhost:5173`.
-
 ## Configuration
 
-At minimum, you need:
+At minimum, set in `.env`:
 
 ```bash
-# Database
 DATABASE_URL="postgresql://user:pass@localhost:5432/openmlr"
 
 # At least one LLM provider
 OPENAI_API_KEY=sk-...
-# or
-ANTHROPIC_API_KEY=sk-ant-...
-# or
-OPENROUTER_API_KEY=sk-or-...
-# or
-OPENCODE_GO_API_KEY=sk-...   # $5-10/mo for open models
+# or ANTHROPIC_API_KEY=sk-ant-...
+# or OPENROUTER_API_KEY=sk-or-...
+```
+
+For background jobs, add:
+
+```bash
+REDIS_URL=redis://localhost:6379/0
+USE_BACKGROUND_JOBS=true
+USE_REDIS_PUBSUB=true
 ```
 
-See `.env.example` for all options including:
-- Local models (Ollama, LM Studio, vLLM)
-- Background jobs (Redis + Celery)
-- Web search (Brave API)
-- GitHub integration
+See `.env.example` for all options.
 
-## Using Local Models
+## Testing
 
 ```bash
-# Ollama
-OLLAMA_MODEL=llama3.1
-# Use as: ollama/llama3.1
-
-# LM Studio
-LMSTUDIO_API_BASE=http://localhost:1234/v1
-# Use as: lmstudio/default
-
-# Any OpenAI-compatible API
-LOCAL_API_BASE=http://localhost:8000/v1
-LOCAL_MODEL=my-model
-# Use as: local/my-model
+make test              # Run all tests (149 backend + 29 frontend + docs build)
+make test-backend      # Backend tests only
+make test-frontend     # Frontend tests only
+make test-docs         # Docs build check
 ```
 
 ## Architecture
 
 ```
-frontend/   React 19 + Vite + react-router-dom
+frontend/   React 19 + TypeScript + Vite
 backend/    Python 3.12 + FastAPI + SQLAlchemy + Celery
 site/       VitePress documentation
 ```
 
-Key components:
-- **Agent Harness** — 300-iteration loop with doom detection, auto-compaction, mode enforcement
-- **Tool Router** — Mode-based tool filtering, MCP integration
-- **Session Manager** — Per-conversation state isolation
-- **LLM Provider** — Multi-provider routing with retry logic
-
-See [Architecture](https://openmlr.dev/architecture) and [Agent Harness](https://openmlr.dev/agent-harness) for details.
-
-## Makefile
-
-| Target | Description |
-|--------|-------------|
-| `make install` | Install all dependencies |
-| `make dev` | Run backend + frontend dev servers |
-| `make up` | Start Docker Compose (app on :3000) |
-| `make down` | Stop Docker Compose |
-| `make restart` | Rebuild + restart web/worker |
-| `make logs` | Tail all logs |
-| `make docs-docker` | Run docs site (:4000) |
-| `make docs-dev` | Run docs locally (:4000) |
-| `make db-fresh` | Drop + recreate tables |
-| `make check` | Type-check backend + frontend |
-| `make test` | Run pytest |
-
-Run `make help` for all targets.
+See [Architecture](https://openmlr.dev/architecture) for details.
 
 ## Contributing
 
-Contributions welcome! Please:
-
 1. Fork the repo
 2. Create a feature branch
 3. Make your changes
-4. Run `make check` and `make test`
+4. Run `make test`
 5. Submit a PR
 
 ## License
diff --git a/backend/pyproject.toml b/backend/pyproject.toml
index 8dd49a3..d7c2876 100644
--- a/backend/pyproject.toml
+++ b/backend/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "openmlr"
-version = "2.0.0"
+version = "0.2.0"
 description = "OpenMLR — an ML research intern that reads papers, trains models, and ships code"
 requires-python = ">=3.12"
 license = { text = "MIT" }
diff --git a/site/docs/.vitepress/config.ts b/site/docs/.vitepress/config.ts
index 5ab888a..320d4ff 100644
--- a/site/docs/.vitepress/config.ts
+++ b/site/docs/.vitepress/config.ts
@@ -21,7 +21,7 @@ export default defineConfig({
       {
         text: "Usage",
         items: [
-          { text: "Modes (Plan / Research / Write)", link: "/modes" },
+          { text: "Modes (Plan / Execute)", link: "/modes" },
           { text: "Agent Tools", link: "/tools" },
         ],
       },
@@ -31,6 +31,7 @@ export default defineConfig({
           { text: "Architecture", link: "/architecture" },
           { text: "Agent Harness", link: "/agent-harness" },
           { text: "REST API", link: "/api" },
+          { text: "Changelog", link: "/changelog" },
         ],
       },
     ],
diff --git a/site/docs/agent-harness.md b/site/docs/agent-harness.md
index 3ffa10b..adadee6 100644
--- a/site/docs/agent-harness.md
+++ b/site/docs/agent-harness.md
@@ -1,182 +1,131 @@
 # Agent Harness
 
-The agent harness is the core execution engine that processes user messages, manages tool calls, and maintains conversation context across long research sessions.
+The agent harness is the core execution engine that processes user messages, manages tool calls, and maintains conversation context.
 
 ## Overview
 
-OpenMLR's agent harness is designed for extended, multi-turn research workflows. Unlike simple chatbot loops, it handles:
+The harness is designed for extended, multi-turn research workflows:
 
 - **Long-running sessions** — Up to 300 tool calls per user message
-- **Context management** — Automatic compaction when approaching model limits  
-- **Mode enforcement** — Restricts tools based on Plan/Research/Write mode
+- **Mode enforcement** — Restricts tools based on Plan/Execute mode
+- **Context management** — Automatic compaction when approaching model limits
 - **Doom loop detection** — Breaks out of repetitive tool call patterns
-- **Streaming output** — Real-time text and tool output via SSE
-
-## Architecture
-
-```
-┌─────────────────────────────────────────────────────────────┐
-│                     Session Manager                          │
-│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
-│  │  Session 1  │  │  Session 2  │  │  Session N  │          │
-│  │  (conv_id)  │  │  (conv_id)  │  │  (conv_id)  │          │
-│  └──────┬──────┘  └─────────────┘  └─────────────┘          │
-└─────────┼───────────────────────────────────────────────────┘
-          │
-          ▼
-┌─────────────────────────────────────────────────────────────┐
-│                      Agent Loop                              │
-│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐ │
-│  │ Context  │──▶│   LLM    │──▶│  Parse   │──▶│ Execute  │ │
-│  │ Manager  │   │  Stream  │   │  Tools   │   │  Tools   │ │
-│  └──────────┘   └──────────┘   └──────────┘   └────┬─────┘ │
-│       ▲                                            │        │
-│       │         ┌──────────────────────────────────┘        │
-│       │         ▼                                           │
-│  ┌────┴─────────────────┐   ┌────────────────────────────┐ │
-│  │   Doom Detection     │   │     Tool Router            │ │
-│  │   (break loops)      │   │   (mode filtering)         │ │
-│  └──────────────────────┘   └────────────────────────────┘ │
-└─────────────────────────────────────────────────────────────┘
-```
-
-## Key Components
+- **DB-persisted writing** — Paper drafts survive across workers and restarts
+- **Redis interrupt relay** — Terminates running tasks via Redis instead of relying on a cooperative flag check
+- **Sub-agent streaming** — Research tool spawns nested agents with visible tool calls
+
+## Agent Loop
+
+```
+┌─────────────────────────────────────────────────────────┐
+│                    Agent Loop                             │
+│                                                          │
+│  ┌──────────┐   ┌──────────┐   ┌──────────┐             │
+│  │ Context  │──▶│   LLM    │──▶│  Parse   │             │
+│  │ Manager  │   │  Stream  │   │ Response │             │
+│  └──────────┘   └──────────┘   └────┬─────┘             │
+│       ▲                             │                    │
+│       │                    ┌────────▼────────┐           │
+│       │                    │   Tool Router   │           │
+│       │                    │ (mode filtering)│           │
+│       │                    └────────┬────────┘           │
+│       │                             │                    │
+│       │                    ┌────────▼────────┐           │
+│       │                    │  Execute Tools  │           │
+│       │                    └────────┬────────┘           │
+│       │                             │                    │
+│  ┌────┴────────────┐      ┌────────▼────────┐           │
+│  │ Doom Detection  │◀─────│  Add Results    │           │
+│  │ (break loops)   │      │  to Context     │           │
+│  └─────────────────┘      └─────────────────┘           │
+└─────────────────────────────────────────────────────────┘
+```
+
+The loop runs for each user message:
+
+1. Check if context needs compaction
+2. Call LLM with streaming (system prompt + history + tools)
+3. Parse response for tool calls
+4. Filter tools through mode restrictions
+5. Execute allowed tools, return errors for blocked ones
+6. Add results to context
+7. Check for doom loops
+8. Repeat until the LLM produces no tool calls or the iteration limit is reached
+
+## Mode Enforcement
+
+Tools are restricted based on the current mode at three layers:
+
+1. **System prompt** — Instructs the agent about mode constraints
+2. **Tool filtering** — Only mode-allowed tools are sent to the LLM
+3. **Runtime blocking** — Blocked calls return an error instead of executing
+
+See [Modes](/modes) for the full breakdown.
+
+## Context Management
+
+**Token tracking** uses a character-based estimate (~4 chars per token).
+
+**Compaction** triggers at 90% of the model's context window (`compact_threshold_ratio` in `AgentConfig`):
+- Summarizes old messages while preserving recent ones
+- Keeps the last N messages untouched (default: 5)
+- Preserves completion reports, key decisions, and PLAN.md
+- Broadcasts `context_usage` events for the UI gauge
 
-### 1. Agent Loop (`agent/loop.py`)
+## Doom Loop Detection
 
-The main execution engine. Processes one user message at a time, iterating through LLM calls and tool executions.
+Detects when the agent gets stuck in repetitive patterns:
 
-```python
-# Simplified flow
-for iteration in range(max_iterations):  # Default: 300
-    if needs_compaction():
-        compact_context()
-    
-    response = await llm.generate_stream(messages, tools)
-    
-    if response.has_tool_calls:
-        for tool_call in response.tool_calls:
-            if doom_loop_detected():
-                inject_correction_prompt()
-                continue
-            
-            result = await tool_router.call(tool_call)
-            messages.append(tool_result)
-    else:
-        # No more tool calls, turn complete
-        break
+**Identical consecutive calls** — The same tool with the same arguments 3 or more times in a row:
 ```
-
-**Key behaviors:**
-- Exits when LLM produces no tool calls (natural completion)
-- Exits on user interrupt (`/stop` command)
-- Auto-compacts context at 90% of model's token limit
-- Injects mode hints for Plan/Research/Write modes
-
-### 2. Context Manager (`agent/context.py`)
-
-Tracks message history and token usage. Handles compaction to stay within model limits.
-
-**Token tracking:**
-```python
-def estimate_tokens(text: str) -> int:
-    return max(1, len(text) // 4)  # ~4 chars per token
+bash(ls) → bash(ls) → bash(ls)  → DETECTED
 ```
 
-**Compaction:**
-- Triggered at 90% of model's max tokens (configurable)
-- Summarizes old messages while preserving recent ones
-- Keeps completion reports and key decisions intact
-- Preserves the last N messages untouched (default: 5)
-
-**Usage tracking:**
-```python
-{
-    "used": 45000,      # Current token count
-    "max": 200000,      # Model's context window
-    "ratio": 0.225      # Percentage used
-}
+**Repeating sequences** — A-B-A-B patterns:
 ```
-
-### 3. Tool Router (`tools/registry.py`)
-
-Central registry for all tools. Handles mode-based filtering and dispatching.
-
-**Mode restrictions:**
-
-| Mode | Allowed Tools |
-|------|---------------|
-| **plan** | `ask_user`, `plan_tool`, `read_file`, `list_dir`, `glob_files`, `grep_search` |
-| **research** | All plan tools + `web_search`, `papers`, `research`, `github_*` |
-| **write** | All plan tools + `writing`, `web_search` (for citations), `papers` |
-| **general** | All tools (no restrictions) |
-
-**When a blocked tool is called:**
-```
-Tool 'bash' is not available in PLAN mode. 
-Plan mode is for planning and asking questions only.
-Suggest switching to research or write mode using ask_user with suggest_mode.
+read(a) → edit(a) → read(a) → edit(a)  → DETECTED
 ```
 
-### 4. Doom Loop Detection (`agent/doom_loop.py`)
+When a loop is detected, the harness injects a correction prompt telling the agent to try a different approach.
 
-Detects when the agent gets stuck in repetitive patterns.
+## DB-Persisted Writing Projects
 
-**Pattern 1: Identical consecutive calls**
-```
-bash(ls) → bash(ls) → bash(ls)  # 3+ identical = doom loop
-```
+Paper writing uses the `writing_projects` table:
+- Outline, sections, and bibliography are stored as structured data
+- Every write/update auto-saves to the database immediately
+- Writing state survives Celery worker restarts, server redeployments, and browser refreshes
+- The Paper tab in the UI reads directly from the database
+- Client-side export to Markdown or LaTeX
 
-**Pattern 2: Repeating sequences**
-```
-read_file(a) → edit_file(a) → read_file(a) → edit_file(a)  # A-B-A-B pattern
-```
+## Redis Interrupt Relay
 
-**Correction prompt injected:**
-```
-[DOOM LOOP DETECTED] You have called `bash` with identical arguments 3 times 
-in a row. This is not making progress. Try a completely different approach:
-- Use a different tool
-- Change the arguments significantly
-- Re-read the error message carefully
-- Ask the user for help if you're stuck
-```
+When a user clicks Stop:
 
-### 5. Session Manager (`services/session_manager.py`)
+1. The frontend sends `POST /api/interrupt`
+2. The web process publishes an interrupt signal to a Redis channel
+3. The Celery worker receives the signal
+4. The worker kills the running agent task immediately
+5. An `interrupted` event is broadcast via SSE
 
-Manages multiple concurrent conversations. Each conversation gets its own isolated session.
+This is a real kill, not a cooperative flag check: the agent stops within seconds, regardless of which tool is executing.
 
-**Session lifecycle:**
-1. Created on first message to a conversation
-2. Persists across browser refreshes (messages in DB)
-3. Destroyed when conversation deleted or server restart
+## Sub-Agent Streaming
 
-**Per-session state:**
-- `Session` — Message history, config, event callbacks
-- `ToolRouter` — Registered tools, MCP connections
-- `SandboxManager` — Docker containers, SSH connections
+The `research` tool spawns an independent sub-agent:
 
-## Event Flow
+- The sub-agent has its own context window and tool set
+- The parent agent sees the sub-agent's tool calls streamed in real time
+- The frontend displays nested tool calls inline within the `research` tool output
+- Useful for deep dives that would otherwise consume too much of the main context
 
-All events are broadcast via Server-Sent Events (SSE):
+## Per-Conversation Processing
 
-```
-User Message → processing → assistant_chunk (streaming) → tool_call → 
-               tool_output → assistant_chunk → ... → turn_complete
-```
+Each conversation gets isolated state:
 
-| Event | Data | When |
-|-------|------|------|
-| `processing` | `{status: "thinking..."}` | Agent starts |
-| `assistant_chunk` | `{chunk: "text"}` | Streaming tokens |
-| `assistant_stream_end` | `{}` | Stream complete |
-| `tool_call` | `{name, arguments}` | Tool invoked |
-| `tool_output` | `{name, output}` | Tool returned |
-| `questions` | `{questions: [...]}` | `ask_user` called |
-| `plan_update` | `{tasks: [...]}` | Task list changed |
-| `context_usage` | `{used, max, ratio}` | Token gauge |
-| `turn_complete` | `{}` | Processing done |
-| `error` | `{error: "..."}` | Error occurred |
+- Its own agent session, tool router, and sandbox manager
+- Processing state tracked independently (`idle` / `processing` / `interrupted`)
+- Multiple conversations can process in parallel
+- Interrupting one does not affect others
 
 ## Configuration
 
@@ -185,7 +134,7 @@ Key settings in `AgentConfig`:
 ```python
 @dataclass
 class AgentConfig:
-    model_name: str = ""                    # LLM to use
+    model_name: str = ""                    # LLM to use (empty = auto-detect)
     max_iterations: int = 300               # Tool calls per turn
     stream: bool = True                     # Stream responses
     compact_threshold_ratio: float = 0.90   # Compact at 90%
@@ -193,60 +142,3 @@ class AgentConfig:
     default_max_tokens: int = 200000        # Fallback context size
     yolo_mode: bool = False                 # Skip confirmations
 ```
-
-## Extending the Harness
-
-### Adding a new tool
-
-```python
-# In tools/my_tool.py
-from ..agent.types import ToolSpec
-
-MY_TOOL_SPEC = ToolSpec(
-    name="my_tool",
-    description="Does something useful",
-    parameters={
-        "type": "object",
-        "properties": {
-            "arg1": {"type": "string", "description": "First argument"},
-        },
-        "required": ["arg1"],
-    },
-)
-
-async def my_tool(arg1: str) -> str:
-    # Implementation
-    return f"Result: {arg1}"
-
-# In tools/registry.py, add to create_tool_router()
-router.register(ToolSpec(...))
-```
-
-### Adding mode restrictions
-
-```python
-# In tools/registry.py
-MODE_TOOL_RESTRICTIONS = {
-    "my_mode": {
-        "allowed": {"tool1", "tool2"},
-        "blocked_message": "Tool '{tool}' not allowed in my_mode.",
-    },
-}
-```
-
-### Custom compaction logic
-
-Override `ContextManager.compact()` to customize how old messages are summarized.
-
-## Debugging
-
-Enable debug logging:
-
-```bash
-LOG_LEVEL=DEBUG uvicorn openmlr.app:app
-```
-
-Key log messages:
-- `[LLM] Model: ...` — Which model is being used
-- `[DOOM LOOP DETECTED]` — Loop detected and corrected
-- `Context nearing limit, compacting...` — Auto-compaction triggered
diff --git a/site/docs/api.md b/site/docs/api.md
index b4dc2d5..7d89255 100644
--- a/site/docs/api.md
+++ b/site/docs/api.md
@@ -9,7 +9,7 @@ All endpoints are prefixed with `/api`. Authentication uses JWT Bearer tokens.
 | POST | `/api/auth/register` | `{username, password, display_name?}` | Create account, returns token |
 | POST | `/api/auth/login` | `{username, password}` | Login, returns token |
 | GET | `/api/auth/me` | — | Current user info |
-| GET | `/api/auth/check` | — | Check if any users exist |
+| GET | `/api/auth/check` | — | Check if any users exist (onboarding) |
 
 ## Conversations
 
@@ -19,26 +19,24 @@ All endpoints are prefixed with `/api`. Authentication uses JWT Bearer tokens.
 | POST | `/api/conversations` | `{title?, model?, mode?}` | Create conversation |
 | GET | `/api/conversations/:uuid` | — | Get conversation + messages |
 | DELETE | `/api/conversations/:uuid` | — | Delete conversation |
-| POST | `/api/conversations/:uuid/switch` | — | Switch active conversation |
 
 ## Messaging
 
 | Method | Path | Body | Description |
 |--------|------|------|-------------|
-| POST | `/api/message` | `{message, mode?}` | Send message (mode: plan/research/write) |
+| POST | `/api/message` | `{message, mode?}` | Send message (mode: plan/execute) |
 | POST | `/api/answers` | `{answers: {qid: label}}` | Answer structured questions |
-| POST | `/api/interrupt` | — | Cancel current agent turn |
+| POST | `/api/interrupt` | — | Cancel current agent turn (Redis relay) |
 | POST | `/api/approval` | `{approvals: {id: bool}}` | Approve/reject tool calls |
 | POST | `/api/undo` | — | Undo last turn |
 | POST | `/api/compact` | — | Compact conversation context |
-| POST | `/api/model` | `{model}` | Switch LLM model |
+| POST | `/api/model` | `{model}` | Switch LLM model (sticky, persisted) |
 
 ## SSE
 
 | Method | Path | Description |
 |--------|------|-------------|
-| GET | `/api/events?token=JWT` | Server-Sent Events stream |
-| GET | `/api/events/test` | Test endpoint (3 events) |
+| GET | `/api/events?token=JWT` | Server-Sent Events stream (supports reconnection catch-up) |
 
 ## Settings
 
@@ -51,10 +49,23 @@ All endpoints are prefixed with `/api`. Authentication uses JWT Bearer tokens.
 | GET | `/api/providers` | List provider status |
 | GET | `/api/models` | List available models |
 | GET | `/api/status` | Current model + config |
-| GET | `/api/reports/:id` | Get completion report content |
+
+## Frontend Routes
+
+The frontend is a single-page app served from `/`:
+
+| Route | Description |
+|-------|-------------|
+| `/login` | Authentication |
+| `/` | Chat UI (protected, redirects to `/login` if unauthenticated) |
+| `/:uuid` | Specific conversation |
+| `/settings/providers` | API key management |
+| `/settings/agent` | Model & behavior settings |
+| `/settings/sandbox` | Execution environment settings |
+| `/settings/writing` | Paper writing preferences |
 
 ## Health
 
 | Method | Path | Description |
 |--------|------|-------------|
-| GET | `/health` | `{"status": "ok", "version": "2.0.0"}` |
+| GET | `/health` | `{"status": "ok", "version": "0.2.0"}` |
diff --git a/site/docs/architecture.md b/site/docs/architecture.md
index 1a14207..b051274 100644
--- a/site/docs/architecture.md
+++ b/site/docs/architecture.md
@@ -2,239 +2,101 @@
 
 ## Overview
 
-OpenMLR is a full-stack application with a React frontend, Python backend, and PostgreSQL database. It's designed to run as a self-hosted service with optional background job processing.
+OpenMLR is a full-stack application with three packages:
 
-```
-┌─────────────────────────────────────────────────────────────────┐
-│                         Frontend                                 │
-│  React 19 + Vite + react-router-dom                             │
-│  ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐   │
-│  │ Landing │ │  Login  │ │  Chat   │ │Settings │ │ Reports │   │
-│  │  Page   │ │  Page   │ │   UI    │ │  Panel  │ │ Drawer  │   │
-│  └─────────┘ └─────────┘ └────┬────┘ └─────────┘ └─────────┘   │
-└────────────────────────────────┼────────────────────────────────┘
-                                 │ SSE + REST
-                                 ▼
-┌─────────────────────────────────────────────────────────────────┐
-│                         Backend                                  │
-│  Python 3.12 + FastAPI + SQLAlchemy + Celery                    │
-│  ┌─────────────────────────────────────────────────────────┐    │
-│  │                    Agent Harness                         │    │
-│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐   │    │
-│  │  │  Loop    │ │ Context  │ │   Tool   │ │   LLM    │   │    │
-│  │  │ (300 it) │ │ Manager  │ │  Router  │ │ Provider │   │    │
-│  │  └──────────┘ └──────────┘ └──────────┘ └──────────┘   │    │
-│  └─────────────────────────────────────────────────────────┘    │
-│  ┌─────────────────────────────────────────────────────────┐    │
-│  │                       Tools                              │    │
-│  │  papers, research, writing, search, github, sandbox...  │    │
-│  └─────────────────────────────────────────────────────────┘    │
-└────────────────────────────────────┬────────────────────────────┘
-                                     │
-              ┌──────────────────────┼──────────────────────┐
-              ▼                      ▼                      ▼
-        ┌──────────┐           ┌──────────┐           ┌──────────┐
-        │ Postgres │           │  Redis   │           │  Celery  │
-        │    DB    │           │  (jobs)  │           │  Worker  │
-        └──────────┘           └──────────┘           └──────────┘
-```
-
-## Directory Structure
-
-```
-OpenMLR/
-├── frontend/                    # React 19 + Vite
-│   ├── src/
-│   │   ├── components/          # UI components
-│   │   │   ├── LandingPage.tsx  # Public landing page
-│   │   │   ├── LoginPage.tsx    # Auth forms
-│   │   │   ├── AuthGuard.tsx    # Route protection
-│   │   │   ├── Sidebar.tsx      # Conversation list
-│   │   │   ├── MessageList.tsx  # Chat messages
-│   │   │   ├── InputArea.tsx    # Message input + mode selector
-│   │   │   ├── ModelModal.tsx   # Model picker
-│   │   │   ├── ApprovalModal.tsx# Sandbox confirmations
-│   │   │   ├── SettingsPanel.tsx# User settings
-│   │   │   ├── QuestionDrawer.tsx# Agent questions UI
-│   │   │   ├── RightPanel.tsx   # Tasks + resources
-│   │   │   └── ReportDrawer.tsx # Completion reports
-│   │   ├── hooks/
-│   │   │   ├── useSSE.ts        # Server-Sent Events
-│   │   │   └── useJobStatus.ts  # Background job polling
-│   │   ├── api.ts               # REST client
-│   │   └── types.ts             # TypeScript types
-│   └── index.html
-│
-├── backend/
-│   ├── openmlr/
-│   │   ├── app.py               # FastAPI entry point
-│   │   ├── config.py            # Layered config (YAML → env → auto)
-│   │   ├── dependencies.py      # DI (auth, db)
-│   │   │
-│   │   ├── agent/               # Core agent harness
-│   │   │   ├── loop.py          # Agentic loop (300 iterations)
-│   │   │   ├── context.py       # Token tracking, compaction
-│   │   │   ├── session.py       # Per-conversation state
-│   │   │   ├── llm.py           # Multi-provider LLM calls
-│   │   │   ├── prompts.py       # System prompt builder
-│   │   │   ├── doom_loop.py     # Repetition detection
-│   │   │   └── types.py         # Data classes
-│   │   │
-│   │   ├── tools/               # Agent tools
-│   │   │   ├── registry.py      # Tool router + mode restrictions
-│   │   │   ├── local.py         # bash, read, write, edit
-│   │   │   ├── papers.py        # OpenAlex, ArXiv, CrossRef
-│   │   │   ├── research.py      # Research sub-agent
-│   │   │   ├── writing.py       # Paper drafting
-│   │   │   ├── ask_user.py      # Structured questions
-│   │   │   ├── plan.py          # Task tracking
-│   │   │   ├── search.py        # Brave web search
-│   │   │   ├── github.py        # GitHub search
-│   │   │   ├── sandbox_tools.py # Sandbox wrappers
-│   │   │   └── mcp.py           # MCP integration
-│   │   │
-│   │   ├── sandbox/             # Code execution
-│   │   │   ├── interface.py     # Abstract interface
-│   │   │   ├── local.py         # Docker-based
-│   │   │   ├── ssh.py           # Remote SSH
-│   │   │   └── modal_sandbox.py # Modal cloud
-│   │   │
-│   │   ├── auth/                # JWT authentication
-│   │   │   ├── router.py        # /api/auth/* routes
-│   │   │   └── security.py      # bcrypt + JOSE
-│   │   │
-│   │   ├── db/                  # Database layer
-│   │   │   ├── engine.py        # AsyncSession setup
-│   │   │   ├── models.py        # SQLAlchemy models
-│   │   │   └── operations.py    # CRUD operations
-│   │   │
-│   │   ├── routes/              # API routes
-│   │   │   ├── agent.py         # /api/message, /api/conversations
-│   │   │   ├── settings.py      # /api/settings, /api/models
-│   │   │   └── health.py        # /health, /api/health
-│   │   │
-│   │   ├── services/            # Business logic
-│   │   │   ├── event_bus.py     # SSE broadcasting
-│   │   │   ├── session_manager.py # Session lifecycle
-│   │   │   └── job_manager.py   # Celery job tracking
-│   │   │
-│   │   └── tasks/               # Background jobs
-│   │       └── agent_tasks.py   # Celery task definitions
-│   │
-│   └── configs/
-│       └── prompts/
-│           └── system_prompt.yaml # Jinja2 system prompt
-│
-├── site/                        # VitePress documentation
-│   └── docs/
-│
-├── docker-compose.yml           # Production deployment
-├── docker-compose.coolify.yml   # Coolify-specific
-├── Dockerfile                   # Multi-stage build
-└── railway.json                 # Railway deployment
-```
+| Package | Stack | Purpose |
+|---------|-------|---------|
+| `backend/` | Python 3.12, FastAPI, SQLAlchemy async, Celery | API, agent harness, tools, background jobs |
+| `frontend/` | React 19, TypeScript, Vite | Chat UI, settings pages, paper preview |
+| `site/` | VitePress | Documentation |
 
-## Data Flow
+## Request Flow
 
-### User Message Processing
-
-```
-1. User types message → InputArea.tsx
-2. POST /api/message → agent.py:send_message()
-3. Load/create session → SessionManager
-4. Add user message to context
-5. Start agent loop:
-   a. Build messages array (system + history + user)
-   b. Call LLM with streaming
-   c. Parse response for tool calls
-   d. Execute tools via ToolRouter
-   e. Add results to context
-   f. Repeat until no tool calls or max iterations
-6. Broadcast events via SSE → useSSE.ts
-7. Frontend updates in real-time
 ```
-
-### SSE Event Stream
-
-```typescript
-// Frontend subscribes
-const { messages, isConnected } = useSSE('/api/events');
-
-// Backend broadcasts
-await event_bus.broadcast({
-  event_type: "assistant_chunk",
-  data: { chunk: "Hello" },
-  conversation_uuid: "..."
-});
+┌──────────────────────────────────────────────────────────┐
+│                         Frontend                         │
+│  React 19 + Vite + react-router-dom                      │
+│  /login  /  /:uuid  /settings/*                          │
+└────────────────────┬─────────────────────────────────────┘
+                     │ SSE + REST
+                     ▼
+┌──────────────────────────────────────────────────────────┐
+│                         Backend                          │
+│  FastAPI + SQLAlchemy async                              │
+│  ┌──────────────────────────────────────────────────┐    │
+│  │                  Agent Harness                   │    │
+│  │  Loop → LLM → Parse → Tool Router → Execute      │    │
+│  │         ↑                           │            │    │
+│  │         └───── results ─────────────┘            │    │
+│  └──────────────────────────────────────────────────┘    │
+│  ┌──────────────────────────────────────────────────┐    │
+│  │  Tools: papers, research, writing, bash, sandbox │    │
+│  └──────────────────────────────────────────────────┘    │
+└───────────┬───────────────┬───────────────┬──────────────┘
+            │               │               │
+     ┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
+     │ PostgreSQL  │ │    Redis    │ │   Celery    │
+     │     DB      │ │   pub/sub   │ │   Worker    │
+     └─────────────┘ └─────────────┘ └─────────────┘
 ```
 
-| Event | Payload | When |
-|-------|---------|------|
-| `processing` | `{status}` | Agent starts thinking |
-| `assistant_chunk` | `{chunk}` | Streaming text token |
-| `assistant_stream_end` | `{}` | Stream finished |
-| `assistant_message` | `{content}` | Non-streaming fallback |
-| `tool_call` | `{name, arguments}` | Tool invoked |
-| `tool_output` | `{name, output}` | Tool returned |
-| `questions` | `{questions}` | Agent asks user |
-| `plan_update` | `{tasks}` | Task list changed |
-| `resources_update` | `{resources}` | Resources changed |
-| `context_usage` | `{used, max, ratio}` | Token gauge |
-| `search_budget` | `{used, max}` | Paper search budget |
-| `turn_complete` | `{}` | Processing done |
-| `error` | `{error}` | Error occurred |
-| `interrupted` | `{}` | User cancelled |
-
 ## Database Schema
 
 ```sql
--- Users and authentication
+-- Auth
 users (id, username, password_hash, display_name, created_at)
-user_settings (id, user_id, category, key, value, created_at, updated_at)
+user_settings (id, user_id, category, key, value, updated_at)
 
 -- Conversations
-conversations (id, uuid, user_id, title, model, mode, user_message_count, created_at, updated_at)
+conversations (id, uuid, user_id, title, model, mode, created_at, updated_at)
 messages (id, conversation_id, role, content, metadata, created_at)
 
--- Research
-research_corpus (id, user_id, paper_id, title, authors, abstract, year, source, url, added_at)
-
--- Writing
-writing_projects (id, user_id, title, outline, sections, bibliography, created_at, updated_at)
+-- Research & Writing
+research_corpus (id, user_id, paper_id, title, authors, abstract, year, source, url)
+writing_projects (id, user_id, title, outline, sections, bibliography, updated_at)
 
 -- Execution
-sandbox_configs (id, user_id, name, type, config, created_at, updated_at)
+sandbox_configs (id, user_id, name, type, config)
 
--- Task tracking (persisted)
-conversation_tasks (id, conversation_id, content, status, priority, created_at, updated_at)
-conversation_resources (id, conversation_id, title, type, url, content, created_at)
+-- Task tracking
+conversation_tasks (id, conversation_id, content, status, priority)
+conversation_resources (id, conversation_id, title, type, content)
 
 -- Background jobs
-agent_jobs (id, job_id, conversation_id, user_id, status, message, mode, model, error, created_at, started_at, completed_at)
+agent_jobs (id, job_id, conversation_id, user_id, status, message, mode, model, error)
 ```
 
-## LLM Provider Routing
+## SSE Event Flow
 
-```
-Model name format: provider/model-name
+The frontend connects to `/api/events?token=JWT` and receives real-time updates:
 
-openai/gpt-4o           → OpenAI API
-anthropic/claude-sonnet-4 → Anthropic API  
-openrouter/...          → OpenRouter API
-opencode-go/qwen3.6-plus → OpenCode Go API
-ollama/llama3.1         → Local Ollama
-lmstudio/default        → Local LM Studio
-local/my-model          → Custom OpenAI-compatible
+```
+User sends message
+       │
+       ▼
+  processing          → "thinking..."
+       │
+       ▼
+  assistant_chunk     → streaming text tokens (repeats)
+       │
+       ▼
+  tool_call           → {name, arguments}
+       │
+       ▼
+  tool_output         → {name, output}
+       │
+       ▼
+  (repeat LLM → tools until done)
+       │
+       ▼
+  turn_complete       → processing finished
 ```
 
-The `LLMProvider` class handles:
-- API key selection based on prefix
-- Base URL routing
-- Anthropic vs OpenAI message format conversion
-- Streaming and non-streaming calls
-- Retry with exponential backoff
+Key events: `processing`, `assistant_chunk`, `assistant_stream_end`, `tool_call`, `tool_output`, `plan_update`, `resources_update`, `context_usage`, `turn_complete`, `error`, `interrupted`.
 
-## Background Jobs
+The SSE stream supports reconnection catch-up: if a client disconnects and reconnects, the events it missed are replayed.
+
+## Background Jobs with Redis Interrupt
 
 When `USE_BACKGROUND_JOBS=true`:
 
@@ -242,44 +104,39 @@ When `USE_BACKGROUND_JOBS=true`:
 User sends message
        │
        ▼
-Web creates AgentJob in DB
+Web process creates AgentJob in DB
        │
        ▼
 Celery task queued to Redis
        │
        ▼
-Worker picks up job
-       │
-       ▼
-Agent loop runs in worker
+Worker picks up job, runs agent loop
        │
        ▼
 Events published to Redis pub/sub
        │
        ▼
-Web relays to SSE clients
+Web process relays events to SSE clients
 ```
 
-Benefits:
-- Browser can close, job continues
-- Horizontal scaling of workers
-- Job status persistence
+**Interrupt flow**: When a user clicks Stop, the web process publishes an interrupt signal to Redis. The worker receives it and kills the running task, rather than setting a flag that is only checked on the next loop iteration.
+
+## Per-Conversation Processing
 
-## Security Model
+Each conversation has its own isolated processing state:
 
-- **Authentication**: JWT tokens (bcrypt hashed passwords)
-- **Authorization**: Per-user data isolation via `user_id` foreign keys
-- **API Keys**: Stored in `user_settings` or env vars
-- **Sandboxing**: Docker isolation for code execution
-- **Confirmations**: Required for sandbox creation, destructive ops (unless `yolo_mode`)
+- Multiple conversations can process simultaneously
+- Each gets its own agent session, tool router, and sandbox manager
+- Processing state (`idle`, `processing`, `interrupted`) is tracked per conversation
+- One conversation being interrupted does not affect others
 
 ## Deployment Options
 
-| Platform | Config File | Notes |
-|----------|-------------|-------|
-| Docker Compose | `docker-compose.yml` | Default, all-in-one |
-| Railway | `railway.json` | Multi-service template |
-| Coolify | `docker-compose.coolify.yml` | One-click deploy |
-| Manual | - | Run backend + frontend separately |
+| Platform | Config | Notes |
+|----------|--------|-------|
+| Docker Compose | `docker-compose.yml` | Default, includes all services |
+| Render | Deploy button | Includes Postgres + Redis |
+| Heroku | Deploy button | Dyno-based |
+| Coolify | `docker-compose.coolify.yml` | Self-hosted PaaS |
 
 See [Setup](/setup) for detailed instructions.
diff --git a/site/docs/changelog.md b/site/docs/changelog.md
new file mode 100644
index 0000000..b506cad
--- /dev/null
+++ b/site/docs/changelog.md
@@ -0,0 +1,39 @@
+# Changelog
+
+## v0.2.0
+
+Major rewrite of the mode system, paper writing, processing architecture, and UI routing.
+
+### Mode System
+- Simplified from Plan/Research/Write to **Plan + Execute** (two modes)
+- Plan mode: ask questions, gather context, create plans. No execution.
+- Execute mode: every tool except `ask_user` is available. Follow the plan.
+- Toggle with the P/E button, or switch directly with `Cmd+B` (Plan) or `Cmd+E` (Execute)
+- Amber border for Plan messages, blue border for Execute messages
+- Three-layer mode enforcement: system prompt, tool filtering, runtime blocking
+
+### Paper Writing
+- Writing tool with **auto-save to database** — survives across workers and restarts
+- Paper preview in the **Paper tab** in the UI
+- Client-side export to Markdown and LaTeX
+- Outline, sections, and bibliography managed as structured data
+
+### Processing Architecture
+- **Per-conversation processing state** — multiple conversations run in parallel
+- **Background jobs via Celery + Redis** — close the browser, come back later
+- **Redis-based interrupt relay** — actually kills running worker tasks, not just a flag check
+- **Sub-agent streaming** — research tool shows nested tool calls in real-time
+
+### Settings & UI
+- **Settings as routed pages** (`/settings/providers`, `/settings/agent`, `/settings/sandbox`, `/settings/writing`) — no longer a modal
+- **Sticky model selection** — persisted per-user in the database
+- **Onboarding flow** — guided setup when no LLM provider is configured
+- **Route restructure** — app served from `/` instead of `/app`
+- **SSE reconnection catch-up** — missed events replayed on reconnect
+- **PLAN.md auto-generated** and pinned in the resources panel
+
+### Testing & CI
+- **149 backend tests + 29 frontend tests**
+- **GitHub CI** — tests run on push and pull request
+- `make test` runs all tests (backend + frontend + docs build)
+- `make test-backend`, `make test-frontend`, `make test-docs` for targeted runs
diff --git a/site/docs/configuration.md b/site/docs/configuration.md
index a8ea486..bc000a4 100644
--- a/site/docs/configuration.md
+++ b/site/docs/configuration.md
@@ -12,121 +12,72 @@ ANTHROPIC_API_KEY=sk-ant-...
 OPENROUTER_API_KEY=sk-or-...
 
 # ── Local models (OpenAI-compatible APIs) ──
-# Ollama
 OLLAMA_API_BASE=http://localhost:11434/v1
 OLLAMA_MODEL=llama3.1
 
-# LM Studio
 LMSTUDIO_API_BASE=http://localhost:1234/v1
 LMSTUDIO_MODEL=default
 
-# Any OpenAI-compatible API (vLLM, TGI, etc.)
 LOCAL_API_BASE=http://localhost:8000/v1
 LOCAL_MODEL=local/my-model
 LOCAL_API_KEY=not-needed
 
-# ── Background jobs (optional) ──
+# ── Background jobs (optional, recommended) ──
 REDIS_URL=redis://localhost:6379/0
 USE_BACKGROUND_JOBS=true
 USE_REDIS_PUBSUB=true
 
 # ── Search & research (optional) ──
-BRAVE_API_KEY=...                    # web search
-OPENALEX_EMAIL=you@example.com       # polite pool (no key needed)
+BRAVE_API_KEY=...
+OPENALEX_EMAIL=you@example.com
 
 # ── GitHub (optional) ──
-GITHUB_TOKEN=ghp_...                 # improves code search rate limits
+GITHUB_TOKEN=ghp_...
 
 # ── Auth ──
 JWT_SECRET_KEY=change-me-in-production
 
 # ── Docker execution ──
-OPEN_MLR_DOCKER_IMAGE=python:3.12-slim  # default container image
+OPEN_MLR_DOCKER_IMAGE=python:3.12-slim
 
 # ── Modal sandbox (optional) ──
 MODAL_TOKEN_ID=...
 MODAL_TOKEN_SECRET=...
 ```
 
-## Per-User Settings
+## Settings Pages
 
-After login, click the gear icon in the sidebar to open Settings:
+Settings are routed pages (not a modal), accessible from the sidebar:
 
-| Tab | What you can configure |
-|-----|----------------------|
-| **Providers** | API keys for all services (stored encrypted in DB, override .env) |
-| **Agent** | Default model, research model, YOLO mode |
-| **Sandbox** | Default execution environment (local/SSH/Modal), Modal credentials |
-| **Writing** | Citation style (APA/IEEE/ACM/Chicago), export format |
+| Route | What you configure |
+|-------|-------------------|
+| `/settings/providers` | API keys for LLM providers and services. Stored encrypted in DB, override `.env` values. |
+| `/settings/agent` | Default model, research model, YOLO mode, max iterations. |
+| `/settings/sandbox` | Default execution environment (Docker/SSH/Modal), Modal credentials. |
+| `/settings/writing` | Citation style (APA/IEEE/ACM/Chicago), export format preferences. |
+
+## Sticky Model Selection
+
+The selected model is persisted per-user in the database. When you switch models via the header dropdown, the choice carries over across sessions, devices, and browser refreshes.
 
 ## Model Selection
 
 Models are auto-detected based on which API keys/URLs are configured:
 
-### Cloud Providers
-
-| Key set | Default model |
+| Variable set | Example model |
 |---------|--------------|
 | `ANTHROPIC_API_KEY` | `anthropic/claude-sonnet-4` |
 | `OPENAI_API_KEY` | `openai/gpt-4o` |
 | `OPENROUTER_API_KEY` | `openrouter/anthropic/claude-sonnet-4` |
-
-### Local Models
-
-| Config | Model prefix |
-|--------|-------------|
 | `OLLAMA_MODEL` | `ollama/llama3.1` |
 | `LMSTUDIO_API_BASE` | `lmstudio/default` |
 | `LOCAL_API_BASE` | `local/my-model` |
 
-Override via Settings > Agent > Default Model, or by clicking the model
-button in the header.
-
-## Local Model Setup
-
-### Ollama
-
-```bash
-# Install and start Ollama
-ollama serve
-
-# Pull a model
-ollama pull llama3.1
-
-# Configure
-OLLAMA_MODEL=llama3.1
-# OLLAMA_API_BASE defaults to http://localhost:11434/v1
-```
-
-Use as `ollama/llama3.1` in the model selector.
-
-### LM Studio
-
-1. Download and install [LM Studio](https://lmstudio.ai)
-2. Load a model and start the server
-3. Configure:
-```bash
-LMSTUDIO_API_BASE=http://localhost:1234/v1
-LMSTUDIO_MODEL=default
-```
-
-Use as `lmstudio/default` in the model selector.
-
-### vLLM / text-generation-inference / Other
-
-Any server that exposes an OpenAI-compatible `/v1/chat/completions` endpoint:
-
-```bash
-LOCAL_API_BASE=http://localhost:8000/v1
-LOCAL_MODEL=local/my-model-name
-LOCAL_API_KEY=not-needed   # or your auth token
-```
-
-Use as `local/my-model-name` in the model selector.
+Override via `/settings/agent` or the model dropdown in the header.
 
 ## Background Jobs
 
-Enable persistent task tracking that survives browser refreshes:
+Enable with Redis for persistent processing:
 
 ```bash
 REDIS_URL=redis://localhost:6379/0
@@ -135,14 +86,15 @@ USE_REDIS_PUBSUB=true
 ```
 
 When enabled:
-- Tasks and resources persist to the database
 - Agent processing continues even if you close the browser
-- Reconnecting shows full progress history
-- Multiple browser tabs receive live updates via Redis pub/sub
+- Multiple conversations process in parallel via Celery workers
+- Redis pub/sub relays events from workers to SSE clients
+- Redis-based interrupt relay kills running worker tasks when you click Stop
+- Reconnecting via SSE catches up on missed events
 
 Requires a running Redis server. Use `make infra` to start one with Docker.
 
-## Agent Config File
+## Agent Config
 
 `backend/configs/agent_config.yaml` controls defaults:
 
@@ -150,6 +102,8 @@ Requires a running Redis server. Use `make infra` to start one with Docker.
 model_name: ""              # empty = auto-detect
 max_iterations: 300
 stream: true
-paper_search_budget: 25     # API calls per session
+paper_search_budget: 25
 require_plan_approval: true
 ```
+
+These can be overridden per-user via `/settings/agent`.
diff --git a/site/docs/index.md b/site/docs/index.md
index daa06f7..ada0324 100644
--- a/site/docs/index.md
+++ b/site/docs/index.md
@@ -2,8 +2,8 @@
 layout: home
 hero:
   name: OpenMLR
-  text: ML Research Intern
-  tagline: Plans tasks, reads papers, writes drafts, and runs experiments — end to end.
+  text: ML Research Agent
+  tagline: Plans tasks, researches papers, writes drafts, and executes code — end to end, in one conversation.
   actions:
     - theme: brand
       text: Get Started
@@ -13,13 +13,9 @@ hero:
       link: https://github.com/xprilion/OpenMLR
 features:
   - title: Plan
-    details: Structured questions with options, task breakdown, scope clarification before any work begins.
-  - title: Research
-    details: Search OpenAlex, ArXiv, CrossRef. Read papers section-by-section. Crawl citation graphs. Find code on GitHub.
-  - title: Write
-    details: Section-by-section paper drafting with bibliography management and Markdown/LaTeX export.
+    details: Ask clarifying questions, gather context, break tasks into structured plans. No execution until you're ready.
   - title: Execute
-    details: Docker-isolated code execution locally, on SSH remotes, or Modal cloud sandboxes.
+    details: Research papers, write drafts, run code. Every tool except `ask_user` is available. Follows the plan you built in Plan mode.
 ---
 
 ## Quick start
@@ -39,33 +35,22 @@ Open `http://localhost:3000`. Create an account. Start researching.
 
 [![Deploy to Heroku](https://www.herokucdn.com/deploy/button.svg)](https://www.heroku.com/deploy?template=https://github.com/xprilion/OpenMLR)
 
-See [Setup & Installation](/setup) for Coolify, local development, and more options.
-
-## Why OpenMLR?
-
-Ever started researching a topic, opened 47 browser tabs, took notes in three different apps, lost track of that one paper you saw yesterday, and then had to context-switch to run some experiments?
-
-OpenMLR keeps everything in one place. Your research context stays with you from the first "what should I look into?" to the final PDF export.
-
-**No more:**
-- Lost tabs and forgotten citations
-- Copy-pasting between arxiv, notes, and code
-- "Where did I see that figure?"
-- Starting over because you closed your browser
+See [Setup & Installation](/setup) for local development and more options.
 
 ## How it works
 
-```
-Plan → Research → Write → Execute
-```
+OpenMLR uses two modes to keep the agent focused:
+
+- **Plan mode (P)** — The agent asks questions, gathers context, and creates structured plans. No code execution, no file writes. Switch to it with `Cmd+B`. Messages have an amber border.
+- **Execute mode (E)** — The agent does the work: researches papers, writes drafts, runs experiments. Every tool except `ask_user` is available. Switch to it with `Cmd+E`. Messages have a blue border.
 
-Each mode restricts which tools are available, keeping the agent focused:
+Switch modes with the P/E button in the input area or keyboard shortcuts. The agent follows the plan built during Plan mode.
 
-| Mode | What it does | Tools available |
-|------|--------------|-----------------|
-| **Plan** | Asks clarifying questions, breaks down tasks | Questions, task tracking |
-| **Research** | Searches papers, crawls citations | OpenAlex, ArXiv, web search, GitHub |
-| **Write** | Drafts sections, manages bibliography | Writing tools, citation lookup |
-| **Execute** | Runs code when needed | Docker, SSH, Modal (available in all modes) |
+## Key features
 
-The agent can suggest switching modes, but you approve the switch. No more half-baked drafts with missing citations.
+- **Paper research** — OpenAlex, ArXiv, CrossRef, Papers With Code. Full paper reading, citation graphs.
+- **Paper writing** — Section-by-section drafting with auto-save. Preview + export (Markdown/LaTeX) in the Paper tab.
+- **Sub-agent streaming** — The research tool spawns independent sub-agents and streams their nested tool calls to the UI in real time.
+- **Background jobs** — Celery + Redis. Close the browser, come back later.
+- **Per-conversation parallelism** — Multiple conversations process simultaneously.
+- **Onboarding flow** — Guided setup when no LLM provider is configured.
diff --git a/site/docs/modes.md b/site/docs/modes.md
index 64f4dc6..09d6eec 100644
--- a/site/docs/modes.md
+++ b/site/docs/modes.md
@@ -1,73 +1,70 @@
 # Modes
 
-OpenMLR uses three per-message modes. Switch modes using the selector above
-the input area. Code execution is available in all modes.
+OpenMLR uses two modes — **Plan** and **Execute** — to keep the agent focused on the right kind of work.
 
-## Plan mode
+## Plan Mode (P)
 
-**Purpose**: Clarify scope before doing work.
+**Purpose**: Gather context, ask questions, create structured plans before doing any work.
 
-The agent will:
-- Ask structured questions using a bottom drawer UI (2-4 options + free text)
-- Break tasks into a plan visible in the right panel
-- Not execute code or modify files
-- Suggest switching to Research or Write mode when ready
+**What the agent can do:**
+- Ask clarifying questions via `ask_user` (structured options UI)
+- Create and update task plans via `plan_tool`
+- Read files and search the codebase (read-only filesystem tools)
+- Search the web and papers for context
+- Generate `PLAN.md` and pin it in resources
 
-## Research mode
+**What the agent cannot do:**
+- Write or edit files
+- Execute code (bash, sandbox)
+- Write paper sections
+- Use the `research` sub-agent
 
-**Purpose**: Find and synthesize information.
+**Visual indicator**: Messages have an **amber border**.
 
-The agent will:
-- Search papers via OpenAlex, ArXiv, CrossRef
-- Read full paper sections from ar5iv HTML
-- Crawl citation graphs and find related work
-- Search GitHub for code examples
-- Track all papers/resources in the right panel
-- Respect the per-session search budget (default 25 API calls)
+**When to use**: Start here. Let the agent understand the problem, ask questions, and build a plan before switching to Execute.
 
-### Search budget
+## Execute Mode (E)
 
-Each session has a limited number of paper API calls to prevent endless
-searching. The budget is shown in the right panel. When exhausted, the agent
-must ask the user before continuing.
+**Purpose**: Do the work. Follow the plan built in Plan mode.
 
-## Write mode
+**What the agent can do:**
+- All tools except `ask_user`
+- Research papers, crawl citations, spawn sub-agents
+- Write and edit files
+- Draft paper sections with auto-save
+- Run code in bash or sandboxes (Docker/SSH/Modal)
 
-**Purpose**: Draft academic content.
+**What the agent cannot do:**
+- Use `ask_user`; structured questions belong in Plan mode, and Execute mode is for doing the work
 
-The agent will:
-- Write paper sections using the `writing` tool
-- Manage citations and bibliography
-- Reference completion reports from the research phase
-- Export to Markdown or LaTeX
+**Visual indicator**: Messages have a **blue border**.
 
-## Task management
+**When to use**: Once you have a plan and the agent knows what to do.
 
-The right panel shows:
-- **Tasks**: Plan items with status (pending → in progress → completed)
-- **Resources**: Papers, code repos, datasets, and completion reports
+## Switching Modes
 
-When a task is marked completed, a **completion report** is auto-generated
-with a summary and hints for upcoming tasks. Click report titles in the
-resources list to view them in a slide-out drawer.
+| Method | Action |
+|--------|--------|
+| **P/E button** | Click the mode toggle above the input area |
+| **Cmd+B** | Switch to Plan mode |
+| **Cmd+E** | Switch to Execute mode |
 
-### Completion reports
+The mode applies per-message. You can switch freely between messages.
 
-Reports follow a markdown spec:
+## Mode Enforcement
 
-```markdown
-# Task Completion Report: [task title]
-**Completed**: [timestamp]
-## Summary
-[what was accomplished]
-## Next Steps
-[recommendations for upcoming tasks]
-```
+Mode restrictions are enforced at three layers:
+
+1. **System prompt** — The agent is instructed about what it can and cannot do in the current mode
+2. **Tool filtering** — The tool router only presents mode-allowed tools to the LLM
+3. **Runtime blocking** — If a tool call somehow bypasses filtering, the router returns an error message instead of executing
 
-The agent re-reads these reports to maintain context across compactions.
+When a blocked tool is called, the agent receives:
+```
+Tool 'bash' is not available in PLAN mode.
+Plan mode is for planning and asking questions only.
+```
 
-## Context tracking
+## PLAN.md
 
-The right panel shows a token usage gauge. When approaching the model's
-context window limit, the system auto-compacts the conversation by summarizing
-older messages. The gauge color changes: green → yellow → red.
+When the agent creates a plan in Plan mode, it auto-generates a `PLAN.md` file that is pinned in the resources panel. This plan persists across context compactions and serves as the agent's reference during Execute mode.
diff --git a/site/docs/setup.md b/site/docs/setup.md
index 727dba5..8797c31 100644
--- a/site/docs/setup.md
+++ b/site/docs/setup.md
@@ -9,17 +9,15 @@
 | Node.js | 20+ | [nodejs.org](https://nodejs.org) |
 | pnpm | 9+ | `npm i -g pnpm` |
 | PostgreSQL | 14+ | [postgresql.org](https://www.postgresql.org) |
-| Docker | 20+ | [docker.com](https://www.docker.com) (recommended for code execution) |
+| Docker | 20+ | [docker.com](https://www.docker.com) (recommended) |
 
 ## Quick Start with Docker Compose
 
-The easiest way to run everything:
-
 ```bash
 git clone https://github.com/xprilion/OpenMLR.git
 cd OpenMLR
 cp .env.example .env   # add your API keys
-make up                # starts db, redis, web, worker
+docker compose up -d
 ```
 
 Open `http://localhost:3000`. Create an account on first visit.
@@ -36,10 +34,9 @@ make install
 
 This runs `uv sync` for the Python backend and `pnpm install` for the frontend.
 
-> **Do not** create a virtual environment at the project root (`uv venv` or `python -m venv`).
-> The backend is a standalone uv project — `uv sync` and `uv run` automatically manage
-> `backend/.venv`. Activating a root-level venv will conflict with the backend's environment
-> and cause import errors at runtime.
+> **Do not** create a virtual environment at the project root.
+> The backend is a standalone uv project — `uv sync` and `uv run` manage
+> `backend/.venv` automatically. A root-level venv will cause import errors.
 
 ### Configure
 
@@ -54,8 +51,6 @@ DATABASE_URL="postgresql://user:pass@localhost:5432/openmlr"
 OPENROUTER_API_KEY=sk-or-...    # or OPENAI_API_KEY or ANTHROPIC_API_KEY
 ```
 
-For local models, see [Local Models](#local-models) below.
-
 See [Configuration](/configuration) for all options.
 
 ### Create database
@@ -75,7 +70,7 @@ Opens backend on `:3000` and Vite dev server on `:5173`. Use `:5173` for develop
 **With background jobs** (requires Redis):
 ```bash
 make infra      # start postgres + redis in Docker
-make dev-full   # backend + frontend + celery worker
+make dev        # backend + frontend dev servers
 ```
 
 **Production**:
@@ -98,79 +93,14 @@ make down       # stop all services
 ### Development with Live Reload
 
 ```bash
-make dev-docker-build   # first time
-make dev-docker         # subsequent runs
+make dev-docker     # docker compose with live reload (includes docs)
 ```
 
 Code changes are automatically detected and services restart.
 
-### Useful Commands
-
-| Command | Description |
-|---------|-------------|
-| `make up` | Start all services |
-| `make down` | Stop all services |
-| `make restart` | Quick rebuild web + worker |
-| `make rebuild` | Full rebuild from scratch |
-| `make logs` | Tail all logs |
-| `make logs-web` | Tail web service only |
-| `make logs-worker` | Tail worker only |
-| `make shell-db` | psql into database |
-| `make shell-web` | bash into web container |
-| `make infra` | Start only db + redis |
-
-## Local Models
-
-OpenMLR supports any OpenAI-compatible API for local inference.
-
-### Ollama
-
-```bash
-# Start Ollama
-ollama serve
-
-# Pull a model
-ollama pull llama3.1
-
-# Configure in .env
-OLLAMA_MODEL=llama3.1
-```
-
-Use as `ollama/llama3.1` in the model selector.
-
-### LM Studio
-
-1. Start the LM Studio server from the UI
-2. Configure in `.env`:
-```bash
-LMSTUDIO_API_BASE=http://localhost:1234/v1
-LMSTUDIO_MODEL=default
-```
-
-Use as `lmstudio/default` in the model selector.
-
-### vLLM / text-generation-inference / Other
-
-For any OpenAI-compatible server:
-
-```bash
-LOCAL_API_BASE=http://localhost:8000/v1
-LOCAL_MODEL=local/my-model
-LOCAL_API_KEY=not-needed   # if no auth required
-```
-
-Use as `local/my-model` in the model selector.
-
-## First Launch
-
-1. Open `http://localhost:5173` (dev) or `http://localhost:3000` (prod/Docker)
-2. Create an account (first user is auto-created)
-3. The model is auto-detected from your configured API keys
-4. Start a conversation in **Plan** mode
-
 ## Background Jobs
 
-To enable persistent task tracking that survives browser refreshes:
+Enable persistent processing that survives browser refreshes:
 
 ```bash
 # In .env
@@ -180,9 +110,20 @@ USE_REDIS_PUBSUB=true
 ```
 
 When enabled:
-- Tasks and resources persist to the database
 - Agent continues processing even if you close the browser
-- You can return later and see all progress
+- Per-conversation processing state — multiple conversations run in parallel
+- Redis-based interrupt relay kills running worker tasks when you click Stop
+- Reconnecting via SSE catches up on missed events
+
+Requires a running Redis server. Use `make infra` to start one with Docker.
+
+## First Launch
+
+1. Open `http://localhost:5173` (dev) or `http://localhost:3000` (prod/Docker)
+2. Create an account (first user is auto-created)
+3. If no LLM provider is configured, the **onboarding flow** guides you through adding API keys at `/settings/providers`
+4. Start a conversation — you'll be in **Plan mode** by default
+5. Switch to **Execute mode** (P/E button or `Cmd+E`) when ready to work
 
 ## All Makefile Targets
 
@@ -193,22 +134,25 @@ Run `make help` for the full list:
 | **Setup** | |
 | `make install` | Install all dependencies |
 | **Development** | |
-| `make dev` | Run backend + frontend |
-| `make dev-full` | Run with background jobs |
+| `make dev` | Run backend + frontend dev servers |
 | `make worker` | Start Celery worker only |
 | **Docker Compose** | |
 | `make up` | Start all services |
 | `make down` | Stop all services |
-| `make restart` | Quick rebuild + restart |
-| `make rebuild` | Full rebuild |
+| `make restart` | Quick rebuild web + worker |
+| `make rebuild` | Full rebuild from scratch |
 | `make logs` | Tail all logs |
 | `make infra` | Start only db + redis |
-| `make dev-docker` | Live reload with Docker |
+| `make dev-docker` | Live reload with Docker (includes docs) |
 | **Database** | |
 | `make db-fresh` | Drop + recreate tables |
 | `make db-upgrade` | Run migrations |
+| **Testing** | |
+| `make test` | Run all tests (backend + frontend + docs build) |
+| `make test-backend` | Backend tests only (149 tests) |
+| `make test-frontend` | Frontend tests only (29 tests) |
+| `make test-docs` | Docs build check |
 | **Other** | |
 | `make check` | Type-check backend + frontend |
-| `make test` | Run backend tests |
 | `make docs-dev` | Preview docs locally |
 | `make clean` | Remove build artifacts |
diff --git a/site/docs/tools.md b/site/docs/tools.md
index de2511d..3727ea9 100644
--- a/site/docs/tools.md
+++ b/site/docs/tools.md
@@ -1,29 +1,25 @@
 # Agent Tools
 
-The agent has access to 18 built-in tools. Tools are invoked automatically
-based on the task at hand.
+The agent has access to built-in tools organized by category. Tool availability depends on the current [mode](/modes).
 
-## Filesystem
+## Planning Tools
 
-| Tool | Description |
-|------|-------------|
-| `bash` | Execute shell commands in a Docker container (falls back to host if Docker unavailable) |
-| `read` | Read files with line numbers |
-| `write` | Create/overwrite files |
-| `edit` | Find-and-replace in files |
+| Tool | Description | Plan | Execute |
+|------|-------------|:----:|:-------:|
+| `ask_user` | Ask structured questions (2-4 options + free text per question) | yes | no |
+| `plan_tool` | Create/update task plans, track resources, generate completion reports | yes | yes |
 
-## Research
+## Research Tools
 
-| Tool | Description |
-|------|-------------|
-| `papers` | Search OpenAlex, read ArXiv papers, get citations, find code/datasets |
-| `web_search` | Brave web search |
-| `research` | Spawn an independent research sub-agent with its own context |
-| `github_read_file` | Read files from GitHub repos |
-| `github_list_repos` | List repos for a user/org |
-| `github_find_examples` | Search GitHub for code examples |
+| Tool | Description | Plan | Execute |
+|------|-------------|:----:|:-------:|
+| `papers` | Search OpenAlex, read ArXiv papers, get citations, find code/datasets | yes | yes |
+| `web_search` | Brave web search | yes | yes |
+| `research` | Spawn an independent research sub-agent with its own context | no | yes |
+| `github_search` | Search GitHub for repos and code | yes | yes |
+| `github_read` | Read files from GitHub repos | yes | yes |
 
-### Papers operations
+### Papers Operations
 
 | Operation | Source | Description |
 |-----------|--------|-------------|
@@ -36,19 +32,56 @@ based on the task at hand.
 | `find_code` | Papers With Code | GitHub repos linked to papers |
 | `find_datasets` | Papers With Code | Datasets linked to papers |
 
-## Planning & interaction
+### Research Sub-Agent
 
-| Tool | Description |
-|------|-------------|
-| `plan_tool` | Create/update task plans, track resources, generate completion reports |
-| `ask_user` | Ask structured questions (2-4 options + free text per question) |
-| `writing` | Manage paper writing projects (outline, sections, bibliography, export) |
+The `research` tool spawns an independent sub-agent with its own context window. The parent agent sees nested tool calls streamed in real time. This is useful for deep dives that would otherwise consume too much of the main conversation's context.
+
+## Writing Tool
+
+| Tool | Description | Plan | Execute |
+|------|-------------|:----:|:-------:|
+| `writing` | Paper authoring — manage outline, write sections, update bibliography | no | yes |
+
+The writing tool manages a **writing project** stored in the database:
+- **Outline**: Define paper structure (sections, subsections)
+- **Sections**: Write/update individual sections with auto-save
+- **Bibliography**: Manage citations and references
+- **Auto-save**: All changes persist to the database immediately, surviving across workers and restarts
+
+Paper preview and client-side export (Markdown/LaTeX) are available in the **Paper tab** in the UI.
+
+## Filesystem Tools
 
-## Execution environments
+| Tool | Description | Plan | Execute |
+|------|-------------|:----:|:-------:|
+| `read` | Read files with line numbers | yes | yes |
+| `write` | Create/overwrite files | no | yes |
+| `edit` | Find-and-replace in files | no | yes |
+| `list_dir` | List directory contents | yes | yes |
+| `glob_files` | Find files by glob pattern | yes | yes |
+| `grep_search` | Search file contents | yes | yes |
 
-| Tool | Description |
+In Plan mode, only read-only filesystem tools are available.
+
+## Execution Tools
+
+| Tool | Description | Plan | Execute |
+|------|-------------|:----:|:-------:|
+| `bash` | Execute shell commands (Docker-isolated when available) | no | yes |
+| `sandbox` | Run code in Docker containers, SSH remotes, or Modal cloud | no | yes |
+
+### Sandbox Types
+
+| Type | Description |
 |------|-------------|
-| `sandbox_probe` | Check environment (Python version, GPU, disk, packages) |
-| `sandbox_create` | Create a new sandbox (local/SSH/Modal) |
-| `sandbox_exec` | Run commands in active sandbox |
-| `sandbox_read` / `sandbox_write` | File I/O in sandbox |
+| **Local (Docker)** | Docker container on the host machine |
+| **SSH** | Remote machine via SSH |
+| **Modal** | Cloud sandbox via Modal |
+
+## Mode Restrictions
+
+Tools are filtered based on the current mode before being sent to the LLM. See [Modes](/modes) for details on the enforcement layers.
+
+In summary:
+- **Plan mode**: `ask_user`, `plan_tool`, read-only filesystem, web search, papers, GitHub
+- **Execute mode**: Everything except `ask_user`