From d2b70d35f4619c9c305fc95bf87c0fa0e649f83a Mon Sep 17 00:00:00 2001
From: xprilion
Date: Sun, 26 Apr 2026 15:01:13 +0530
Subject: [PATCH] Update docs and changelog

---
 README.md                      | 126 ++++---------
 backend/pyproject.toml         |   2 +-
 site/docs/.vitepress/config.ts |   3 +-
 site/docs/agent-harness.md     | 308 +++++++++++---------------------
 site/docs/api.md               |  29 ++-
 site/docs/architecture.md      | 315 +++++++++------------------------
 site/docs/changelog.md         |  39 ++++
 site/docs/configuration.md     | 102 +++-------
 site/docs/index.md             |  49 ++---
 site/docs/modes.md             |  99 +++++------
 site/docs/setup.md             | 116 ++++--------
 site/docs/tools.md             |  95 ++++++----
 12 files changed, 466 insertions(+), 817 deletions(-)
 create mode 100644 site/docs/changelog.md

diff --git a/README.md b/README.md
index 8d225a2..39f29bd 100644
--- a/README.md
+++ b/README.md
@@ -1,9 +1,6 @@
 # OpenMLR
 
-Built for ML researchers who are tired of context-switching.
-
-Search papers, take notes, write drafts, run experiments — all in one conversation.
-Your context stays with you from the first question to the final export.
+A self-hosted ML research agent that plans, researches, writes papers, and executes code — all in one conversation.
 
 [![Deploy to Render](https://render.com/images/deploy-to-render-button.svg)](https://render.com/deploy?repo=https://github.com/xprilion/OpenMLR)
 [![Deploy to Heroku](https://www.herokucdn.com/deploy/button.svg)](https://www.heroku.com/deploy?template=https://github.com/xprilion/OpenMLR)
@@ -14,27 +11,19 @@ Your context stays with you from the first question to the final export.
 
 ---
 
-## What it does
-
-- **Plan** — Asks clarifying questions before diving in. Breaks down tasks, tracks progress.
-- **Research** — OpenAlex, ArXiv, Papers With Code, citation graphs. Reads full papers, not just abstracts.
-- **Write** — Section-by-section drafting with auto-citations. Export to Markdown or LaTeX.
-- **Execute** — Docker-isolated code execution. SSH remotes. Modal cloud. Runs experiments, not just snippets.
-
 ## Features
 
-- **Structured planning** — asks 2-4 clarifying options before starting, builds task lists, generates completion reports
-- **Paper research** — OpenAlex, ArXiv, CrossRef, Papers With Code; reads full papers section-by-section; crawls citation graphs
-- **Context tracking** — token usage gauge, auto-compaction approaching limits, preserves key decisions
-- **Multi-provider LLMs** — OpenAI, Anthropic, OpenRouter, OpenCode Go, plus local models (Ollama, LM Studio, vLLM)
-- **Background jobs** — tasks persist and continue even if you close the browser (requires Redis)
-- **Mode enforcement** — Plan, Research, Write modes restrict which tools are available
-- **MCP support** — connect any Model Context Protocol server as additional tools
+- **Plan + Execute modes** — Plan mode gathers context and creates plans; Execute mode does the work. Toggle with `Cmd+B` / `Cmd+E`.
+- **Paper research** — OpenAlex, ArXiv, CrossRef, Papers With Code. Reads full papers, crawls citation graphs.
+- **Paper writing** — Section-by-section drafting with auto-save to database. Preview and export (Markdown/LaTeX) in the Paper tab.
+- **Sub-agent streaming** — Research tool spawns independent sub-agents with their own context, with nested tool call visibility.
+- **Background jobs** — Celery + Redis processing. Close the browser, come back later.
+- **Per-conversation parallelism** — Multiple conversations process simultaneously with isolated state.
+- **Multi-provider LLMs** — OpenAI, Anthropic, OpenRouter, plus local models (Ollama, LM Studio, vLLM).
+- **Onboarding flow** — Guided setup when no LLM provider is configured.
 
 ## Quick Start
 
-### Docker Compose (recommended)
-
 ```bash
 git clone https://github.com/xprilion/OpenMLR.git
 cd OpenMLR
@@ -44,116 +33,63 @@ docker compose up -d
 
 Open `http://localhost:3000`. Create an account on first visit.
 
-### Render
-
-Click the button to deploy to Render (includes Postgres + Redis):
-
-[![Deploy to Render](https://render.com/images/deploy-to-render-button.svg)](https://render.com/deploy?repo=https://github.com/xprilion/OpenMLR)
-
-After deploy, add your LLM API key(s) in the Environment settings.
-
-### Heroku
-
-[![Deploy to Heroku](https://www.herokucdn.com/deploy/button.svg)](https://www.heroku.com/deploy?template=https://github.com/xprilion/OpenMLR)
-
-### Coolify
-
-In Coolify, create a new Docker Compose service pointing to this repo. It will use `docker-compose.yml` automatically. Add your LLM API keys as environment variables in the Coolify UI.
-
-### Local Development
+## Local Development
 
 ```bash
 make install          # Install deps (backend + frontend)
 cp .env.example .env  # Add DATABASE_URL + at least one LLM key
 make db-fresh         # Create tables
-make dev              # Start dev servers
+make dev              # Start dev servers (backend :3000, frontend :5173)
 ```
 
-Open `http://localhost:5173`.
-
 ## Configuration
 
-At minimum, you need:
+At minimum, set in `.env`:
 
 ```bash
-# Database
 DATABASE_URL="postgresql://user:pass@localhost:5432/openmlr"
 
 # At least one LLM provider
 OPENAI_API_KEY=sk-...
-# or
-ANTHROPIC_API_KEY=sk-ant-...
-# or
-OPENROUTER_API_KEY=sk-or-...
-# or
-OPENCODE_GO_API_KEY=sk-...  # $5-10/mo for open models
+# or ANTHROPIC_API_KEY=sk-ant-...
+# or OPENROUTER_API_KEY=sk-or-...
+```
+
+For background jobs, add:
+
+```bash
+REDIS_URL=redis://localhost:6379/0
+USE_BACKGROUND_JOBS=true
+USE_REDIS_PUBSUB=true
 ```
 
-See `.env.example` for all options including:
-- Local models (Ollama, LM Studio, vLLM)
-- Background jobs (Redis + Celery)
-- Web search (Brave API)
-- GitHub integration
+See `.env.example` for all options.
 
-## Using Local Models
+## Testing
 
 ```bash
-# Ollama
-OLLAMA_MODEL=llama3.1
-# Use as: ollama/llama3.1
-
-# LM Studio
-LMSTUDIO_API_BASE=http://localhost:1234/v1
-# Use as: lmstudio/default
-
-# Any OpenAI-compatible API
-LOCAL_API_BASE=http://localhost:8000/v1
-LOCAL_MODEL=my-model
-# Use as: local/my-model
+make test           # Run all tests (149 backend + 29 frontend + docs build)
+make test-backend   # Backend tests only
+make test-frontend  # Frontend tests only
+make test-docs      # Docs build check
 ```
 
 ## Architecture
 
 ```
-frontend/   React 19 + Vite + react-router-dom
+frontend/   React 19 + TypeScript + Vite
 backend/    Python 3.12 + FastAPI + SQLAlchemy + Celery
 site/       VitePress documentation
 ```
 
-Key components:
-- **Agent Harness** — 300-iteration loop with doom detection, auto-compaction, mode enforcement
-- **Tool Router** — Mode-based tool filtering, MCP integration
-- **Session Manager** — Per-conversation state isolation
-- **LLM Provider** — Multi-provider routing with retry logic
-
-See [Architecture](https://openmlr.dev/architecture) and [Agent Harness](https://openmlr.dev/agent-harness) for details.
-
-## Makefile
-
-| Target | Description |
-|--------|-------------|
-| `make install` | Install all dependencies |
-| `make dev` | Run backend + frontend dev servers |
-| `make up` | Start Docker Compose (app on :3000) |
-| `make down` | Stop Docker Compose |
-| `make restart` | Rebuild + restart web/worker |
-| `make logs` | Tail all logs |
-| `make docs-docker` | Run docs site (:4000) |
-| `make docs-dev` | Run docs locally (:4000) |
-| `make db-fresh` | Drop + recreate tables |
-| `make check` | Type-check backend + frontend |
-| `make test` | Run pytest |
-
-Run `make help` for all targets.
+See [Architecture](https://openmlr.dev/architecture) for details.
 
 ## Contributing
 
-Contributions welcome! Please:
-
 1. Fork the repo
 2. Create a feature branch
 3. Make your changes
-4. Run `make check` and `make test`
+4. Run `make test`
 5. Submit a PR
 
 ## License
diff --git a/backend/pyproject.toml b/backend/pyproject.toml
index 8dd49a3..d7c2876 100644
--- a/backend/pyproject.toml
+++ b/backend/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "openmlr"
-version = "2.0.0"
+version = "0.2.0"
 description = "OpenMLR — an ML research intern that reads papers, trains models, and ships code"
 requires-python = ">=3.12"
 license = { text = "MIT" }
diff --git a/site/docs/.vitepress/config.ts b/site/docs/.vitepress/config.ts
index 5ab888a..320d4ff 100644
--- a/site/docs/.vitepress/config.ts
+++ b/site/docs/.vitepress/config.ts
@@ -21,7 +21,7 @@ export default defineConfig({
       {
         text: "Usage",
         items: [
-          { text: "Modes (Plan / Research / Write)", link: "/modes" },
+          { text: "Modes (Plan / Execute)", link: "/modes" },
           { text: "Agent Tools", link: "/tools" },
         ],
       },
@@ -31,6 +31,7 @@ export default defineConfig({
           { text: "Architecture", link: "/architecture" },
           { text: "Agent Harness", link: "/agent-harness" },
           { text: "REST API", link: "/api" },
+          { text: "Changelog", link: "/changelog" },
         ],
       },
     ],
diff --git a/site/docs/agent-harness.md b/site/docs/agent-harness.md
index 3ffa10b..adadee6 100644
--- a/site/docs/agent-harness.md
+++ b/site/docs/agent-harness.md
@@ -1,182 +1,131 @@
 # Agent Harness
 
-The agent harness is the core execution engine that processes user messages, manages tool calls, and maintains conversation context across long research sessions.
+The agent harness is the core execution engine that processes user messages, manages tool calls, and maintains conversation context.
 
 ## Overview
 
-OpenMLR's agent harness is designed for extended, multi-turn research workflows. Unlike simple chatbot loops, it handles:
+The harness is designed for extended, multi-turn research workflows:
 
 - **Long-running sessions** — Up to 300 tool calls per user message
-- **Context management** — Automatic compaction when approaching model limits
-- **Mode enforcement** — Restricts tools based on Plan/Research/Write mode
+- **Mode enforcement** — Restricts tools based on Plan/Execute mode
+- **Context management** — Automatic compaction when approaching model limits
 - **Doom loop detection** — Breaks out of repetitive tool call patterns
-- **Streaming output** — Real-time text and tool output via SSE
-
-## Architecture
-
-```
-┌─────────────────────────────────────────────────────────────┐
-│                       Session Manager                       │
-│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
-│  │  Session 1  │  │  Session 2  │  │  Session N  │          │
-│  │  (conv_id)  │  │  (conv_id)  │  │  (conv_id)  │          │
-│  └──────┬──────┘  └─────────────┘  └─────────────┘          │
-└─────────┼───────────────────────────────────────────────────┘
-          │
-          ▼
-┌─────────────────────────────────────────────────────────────┐
-│                         Agent Loop                          │
-│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐  │
-│  │ Context  │──▶│   LLM    │──▶│  Parse   │──▶│ Execute  │  │
-│  │ Manager  │   │  Stream  │   │  Tools   │   │  Tools   │  │
-│  └──────────┘   └──────────┘   └──────────┘   └────┬─────┘  │
-│       ▲                                            │        │
-│       │      ┌─────────────────────────────────────┘        │
-│       │      ▼                                              │
-│  ┌────┴─────────────────┐   ┌────────────────────────────┐  │
-│  │   Doom Detection     │   │       Tool Router          │  │
-│  │   (break loops)      │   │     (mode filtering)       │  │
-│  └──────────────────────┘   └────────────────────────────┘  │
-└─────────────────────────────────────────────────────────────┘
-```
-
-## Key Components
+- **DB-persisted writing** — Paper drafts survive across workers and restarts
+- **Redis interrupt relay** — Actually kills running tasks, not just a flag check
+- **Sub-agent streaming** — Research tool spawns nested agents with visible tool calls
+
+## Agent Loop
+
+```
+┌─────────────────────────────────────────────────────────┐
+│                       Agent Loop                        │
+│                                                         │
+│  ┌──────────┐   ┌──────────┐   ┌──────────┐             │
+│  │ Context  │──▶│   LLM    │──▶│  Parse   │             │
+│  │ Manager  │   │  Stream  │   │ Response │             │
+│  └──────────┘   └──────────┘   └────┬─────┘             │
+│        ▲                            │                   │
+│        │                   ┌────────▼────────┐          │
+│        │                   │   Tool Router   │          │
+│        │                   │ (mode filtering)│          │
+│        │                   └────────┬────────┘          │
+│        │                            │                   │
+│        │                   ┌────────▼────────┐          │
+│        │                   │  Execute Tools  │          │
+│        │                   └────────┬────────┘          │
+│        │                            │                   │
+│   ┌────┴────────────┐      ┌────────▼────────┐          │
+│   │ Doom Detection  │◀─────│   Add Results   │          │
+│   │ (break loops)   │      │   to Context    │          │
+│   └─────────────────┘      └─────────────────┘          │
+└─────────────────────────────────────────────────────────┘
+```
+
+The loop runs for each user message:
+
+1. Check if context needs compaction
+2. Call LLM with streaming (system prompt + history + tools)
+3. Parse response for tool calls
+4. Filter tools through mode restrictions
+5. Execute allowed tools, return errors for blocked ones
+6. Add results to context
+7. Check for doom loops
+8. Repeat until LLM produces no tool calls or max iterations reached
+
+## Mode Enforcement
+
+Tools are restricted based on the current mode at three layers:
+
+1. **System prompt** — Instructs the agent about mode constraints
+2. **Tool filtering** — Only mode-allowed tools are sent to the LLM
+3. **Runtime blocking** — Blocked calls return an error instead of executing
+
+See [Modes](/modes) for the full breakdown.
+
+## Context Management
+
+**Token tracking** uses a character-based estimate (~4 chars per token).
+
+**Compaction** triggers at 90% of the model's context window:
+- Summarizes old messages while preserving recent ones
+- Keeps the last N messages untouched (default: 5)
+- Preserves completion reports, key decisions, and PLAN.md
+- Broadcasts `context_usage` events for the UI gauge
 
-### 1. Agent Loop (`agent/loop.py`)
+## Doom Loop Detection
 
-The main execution engine. Processes one user message at a time, iterating through LLM calls and tool executions.
+Detects when the agent gets stuck in repetitive patterns:
 
-```python
-# Simplified flow
-for iteration in range(max_iterations):  # Default: 300
-    if needs_compaction():
-        compact_context()
-
-    response = await llm.generate_stream(messages, tools)
-
-    if response.has_tool_calls:
-        for tool_call in response.tool_calls:
-            if doom_loop_detected():
-                inject_correction_prompt()
-                continue
-
-            result = await tool_router.call(tool_call)
-            messages.append(tool_result)
-    else:
-        # No more tool calls, turn complete
-        break
+**Identical consecutive calls** — Same tool + same arguments 3+ times:
 ```
-
-**Key behaviors:**
-- Exits when LLM produces no tool calls (natural completion)
-- Exits on user interrupt (`/stop` command)
-- Auto-compacts context at 90% of model's token limit
-- Injects mode hints for Plan/Research/Write modes
-
-### 2. Context Manager (`agent/context.py`)
-
-Tracks message history and token usage. Handles compaction to stay within model limits.
-
-**Token tracking:**
-```python
-def estimate_tokens(text: str) -> int:
-    return max(1, len(text) // 4)  # ~4 chars per token
+bash(ls) → bash(ls) → bash(ls) → DETECTED
 ```
 
-**Compaction:**
-- Triggered at 90% of model's max tokens (configurable)
-- Summarizes old messages while preserving recent ones
-- Keeps completion reports and key decisions intact
-- Preserves the last N messages untouched (default: 5)
-
-**Usage tracking:**
-```python
-{
-    "used": 45000,  # Current token count
-    "max": 200000,  # Model's context window
-    "ratio": 0.225  # Percentage used
-}
+**Repeating sequences** — A-B-A-B patterns:
 ```
-
-### 3. Tool Router (`tools/registry.py`)
-
-Central registry for all tools. Handles mode-based filtering and dispatching.
-
-**Mode restrictions:**
-
-| Mode | Allowed Tools |
-|------|---------------|
-| **plan** | `ask_user`, `plan_tool`, `read_file`, `list_dir`, `glob_files`, `grep_search` |
-| **research** | All plan tools + `web_search`, `papers`, `research`, `github_*` |
-| **write** | All plan tools + `writing`, `web_search` (for citations), `papers` |
-| **general** | All tools (no restrictions) |
-
-**When a blocked tool is called:**
-```
-Tool 'bash' is not available in PLAN mode.
-Plan mode is for planning and asking questions only.
-Suggest switching to research or write mode using ask_user with suggest_mode.
+read(a) → edit(a) → read(a) → edit(a) → DETECTED
 ```
 
-### 4. Doom Loop Detection (`agent/doom_loop.py`)
+When detected, a correction prompt is injected telling the agent to try a different approach.
 
-Detects when the agent gets stuck in repetitive patterns.
+## DB-Persisted Writing Projects
 
-**Pattern 1: Identical consecutive calls**
-```
-bash(ls) → bash(ls) → bash(ls)  # 3+ identical = doom loop
-```
+Paper writing uses the `writing_projects` table:
+- Outline, sections, and bibliography are stored as structured data
+- Every write/update auto-saves to the database immediately
+- Writing state survives Celery worker restarts, server redeployments, and browser refreshes
+- The Paper tab in the UI reads directly from the database
+- Client-side export to Markdown or LaTeX
 
-**Pattern 2: Repeating sequences**
-```
-read_file(a) → edit_file(a) → read_file(a) → edit_file(a)  # A-B-A-B pattern
-```
+## Redis Interrupt Relay
 
-**Correction prompt injected:**
-```
-[DOOM LOOP DETECTED] You have called `bash` with identical arguments 3 times
-in a row. This is not making progress. Try a completely different approach:
-- Use a different tool
-- Change the arguments significantly
-- Re-read the error message carefully
-- Ask the user for help if you're stuck
-```
+When a user clicks Stop:
 
-### 5. Session Manager (`services/session_manager.py`)
+1. Frontend sends `POST /api/interrupt`
+2. Web process publishes interrupt signal to Redis channel
+3. Celery worker receives the signal
+4. Worker kills the running agent task immediately
+5. `interrupted` event is broadcast via SSE
 
-Manages multiple concurrent conversations. Each conversation gets its own isolated session.
+This is a real kill, not a cooperative flag check. The agent stops within seconds regardless of what tool is executing.
 
-**Session lifecycle:**
-1. Created on first message to a conversation
-2. Persists across browser refreshes (messages in DB)
-3. Destroyed when conversation deleted or server restart
+## Sub-Agent Streaming
 
-**Per-session state:**
-- `Session` — Message history, config, event callbacks
-- `ToolRouter` — Registered tools, MCP connections
-- `SandboxManager` — Docker containers, SSH connections
+The `research` tool spawns an independent sub-agent:
 
-## Event Flow
+- Sub-agent has its own context window and tool set
+- Parent agent sees nested tool calls streamed in real-time
+- Frontend displays nested tool calls inline within the research tool output
+- Useful for deep dives that would consume too much of the main context
 
-All events are broadcast via Server-Sent Events (SSE):
+## Per-Conversation Processing
 
-```
-User Message → processing → assistant_chunk (streaming) → tool_call →
-               tool_output → assistant_chunk → ... → turn_complete
-```
+Each conversation gets isolated state:
 
-| Event | Data | When |
-|-------|------|------|
-| `processing` | `{status: "thinking..."}` | Agent starts |
-| `assistant_chunk` | `{chunk: "text"}` | Streaming tokens |
-| `assistant_stream_end` | `{}` | Stream complete |
-| `tool_call` | `{name, arguments}` | Tool invoked |
-| `tool_output` | `{name, output}` | Tool returned |
-| `questions` | `{questions: [...]}` | `ask_user` called |
-| `plan_update` | `{tasks: [...]}` | Task list changed |
-| `context_usage` | `{used, max, ratio}` | Token gauge |
-| `turn_complete` | `{}` | Processing done |
-| `error` | `{error: "..."}` | Error occurred |
+- Own agent session, tool router, and sandbox manager
+- Processing state tracked independently (`idle` / `processing` / `interrupted`)
+- Multiple conversations can process in parallel
+- Interrupting one does not affect others
 
 ## Configuration
 
@@ -185,7 +134,7 @@ Key settings in `AgentConfig`:
 ```python
 @dataclass
 class AgentConfig:
-    model_name: str = ""                   # LLM to use
+    model_name: str = ""                   # LLM to use (empty = auto-detect)
     max_iterations: int = 300              # Tool calls per turn
     stream: bool = True                    # Stream responses
     compact_threshold_ratio: float = 0.90  # Compact at 90%
@@ -193,60 +142,3 @@ class AgentConfig:
     default_max_tokens: int = 200000       # Fallback context size
     yolo_mode: bool = False                # Skip confirmations
 ```
-
-## Extending the Harness
-
-### Adding a new tool
-
-```python
-# In tools/my_tool.py
-from ..agent.types import ToolSpec
-
-MY_TOOL_SPEC = ToolSpec(
-    name="my_tool",
-    description="Does something useful",
-    parameters={
-        "type": "object",
-        "properties": {
-            "arg1": {"type": "string", "description": "First argument"},
-        },
-        "required": ["arg1"],
-    },
-)
-
-async def my_tool(arg1: str) -> str:
-    # Implementation
-    return f"Result: {arg1}"
-
-# In tools/registry.py, add to create_tool_router()
-router.register(ToolSpec(...))
-```
-
-### Adding mode restrictions
-
-```python
-# In tools/registry.py
-MODE_TOOL_RESTRICTIONS = {
-    "my_mode": {
-        "allowed": {"tool1", "tool2"},
-        "blocked_message": "Tool '{tool}' not allowed in my_mode.",
-    },
-}
-```
-
-### Custom compaction logic
-
-Override `ContextManager.compact()` to customize how old messages are summarized.
-
-## Debugging
-
-Enable debug logging:
-
-```bash
-LOG_LEVEL=DEBUG uvicorn openmlr.app:app
-```
-
-Key log messages:
-- `[LLM] Model: ...` — Which model is being used
-- `[DOOM LOOP DETECTED]` — Loop detected and corrected
-- `Context nearing limit, compacting...` — Auto-compaction triggered
diff --git a/site/docs/api.md b/site/docs/api.md
index b4dc2d5..7d89255 100644
--- a/site/docs/api.md
+++ b/site/docs/api.md
@@ -9,7 +9,7 @@ All endpoints are prefixed with `/api`. Authentication uses JWT Bearer tokens.
 | POST | `/api/auth/register` | `{username, password, display_name?}` | Create account, returns token |
 | POST | `/api/auth/login` | `{username, password}` | Login, returns token |
 | GET | `/api/auth/me` | — | Current user info |
-| GET | `/api/auth/check` | — | Check if any users exist |
+| GET | `/api/auth/check` | — | Check if any users exist (onboarding) |
 
 ## Conversations
 
@@ -19,26 +19,24 @@ All endpoints are prefixed with `/api`. Authentication uses JWT Bearer tokens.
 | POST | `/api/conversations` | `{title?, model?, mode?}` | Create conversation |
 | GET | `/api/conversations/:uuid` | — | Get conversation + messages |
 | DELETE | `/api/conversations/:uuid` | — | Delete conversation |
-| POST | `/api/conversations/:uuid/switch` | — | Switch active conversation |
 
 ## Messaging
 
 | Method | Path | Body | Description |
 |--------|------|------|-------------|
-| POST | `/api/message` | `{message, mode?}` | Send message (mode: plan/research/write) |
+| POST | `/api/message` | `{message, mode?}` | Send message (mode: plan/execute) |
 | POST | `/api/answers` | `{answers: {qid: label}}` | Answer structured questions |
-| POST | `/api/interrupt` | — | Cancel current agent turn |
+| POST | `/api/interrupt` | — | Cancel current agent turn (Redis relay) |
 | POST | `/api/approval` | `{approvals: {id: bool}}` | Approve/reject tool calls |
 | POST | `/api/undo` | — | Undo last turn |
 | POST | `/api/compact` | — | Compact conversation context |
-| POST | `/api/model` | `{model}` | Switch LLM model |
+| POST | `/api/model` | `{model}` | Switch LLM model (sticky, persisted) |
 
 ## SSE
 
 | Method | Path | Description |
 |--------|------|-------------|
-| GET | `/api/events?token=JWT` | Server-Sent Events stream |
-| GET | `/api/events/test` | Test endpoint (3 events) |
+| GET | `/api/events?token=JWT` | Server-Sent Events stream (supports reconnection catch-up) |
 
 ## Settings
 
@@ -51,10 +49,23 @@ All endpoints are prefixed with `/api`. Authentication uses JWT Bearer tokens.
 | GET | `/api/providers` | List provider status |
 | GET | `/api/models` | List available models |
 | GET | `/api/status` | Current model + config |
-| GET | `/api/reports/:id` | Get completion report content |
+
+## Frontend Routes
+
+The frontend is a single-page app served from `/`:
+
+| Route | Description |
+|-------|-------------|
+| `/login` | Authentication |
+| `/` | Chat UI (protected, redirects to `/login` if unauthenticated) |
+| `/:uuid` | Specific conversation |
+| `/settings/providers` | API key management |
+| `/settings/agent` | Model & behavior settings |
+| `/settings/sandbox` | Execution environment settings |
+| `/settings/writing` | Paper writing preferences |
 
 ## Health
 
 | Method | Path | Description |
 |--------|------|-------------|
-| GET | `/health` | `{"status": "ok", "version": "2.0.0"}` |
+| GET | `/health` | `{"status": "ok", "version": "0.2.0"}` |
diff --git a/site/docs/architecture.md b/site/docs/architecture.md
index 1a14207..b051274 100644
--- a/site/docs/architecture.md
+++ b/site/docs/architecture.md
@@ -2,239 +2,101 @@
 
 ## Overview
 
-OpenMLR is a full-stack application with a React frontend, Python backend, and PostgreSQL database. It's designed to run as a self-hosted service with optional background job processing.
+OpenMLR is a full-stack application with three packages:
 
-```
-┌─────────────────────────────────────────────────────────────────┐
-│                            Frontend                             │
-│               React 19 + Vite + react-router-dom                │
-│  ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐    │
-│  │ Landing │ │  Login  │ │  Chat   │ │Settings │ │ Reports │    │
-│  │  Page   │ │  Page   │ │   UI    │ │  Panel  │ │ Drawer  │    │
-│  └─────────┘ └─────────┘ └────┬────┘ └─────────┘ └─────────┘    │
-└───────────────────────────────┼─────────────────────────────────┘
-                                │ SSE + REST
-                                ▼
-┌─────────────────────────────────────────────────────────────────┐
-│                             Backend                             │
-│           Python 3.12 + FastAPI + SQLAlchemy + Celery           │
-│  ┌─────────────────────────────────────────────────────────┐    │
-│  │                      Agent Harness                      │    │
-│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐    │    │
-│  │  │   Loop   │ │ Context  │ │   Tool   │ │   LLM    │    │    │
-│  │  │ (300 it) │ │ Manager  │ │  Router  │ │ Provider │    │    │
-│  │  └──────────┘ └──────────┘ └──────────┘ └──────────┘    │    │
-│  └─────────────────────────────────────────────────────────┘    │
-│  ┌─────────────────────────────────────────────────────────┐    │
-│  │                          Tools                          │    │
-│  │  papers, research, writing, search, github, sandbox...  │    │
-│  └─────────────────────────────────────────────────────────┘    │
-└────────────────────────────────┬────────────────────────────────┘
-                                 │
-              ┌──────────────────┼──────────────────┐
-              ▼                  ▼                  ▼
-         ┌──────────┐       ┌──────────┐       ┌──────────┐
-         │ Postgres │       │  Redis   │       │  Celery  │
-         │    DB    │       │  (jobs)  │       │  Worker  │
-         └──────────┘       └──────────┘       └──────────┘
-```
-
-## Directory Structure
-
-```
-OpenMLR/
-├── frontend/ # React 19 + Vite
-│   ├── src/
-│   │   ├── components/ # UI components
-│   │   │   ├── LandingPage.tsx # Public landing page
-│   │   │   ├── LoginPage.tsx # Auth forms
-│   │   │   ├── AuthGuard.tsx # Route protection
-│   │   │   ├── Sidebar.tsx # Conversation list
-│   │   │   ├── MessageList.tsx # Chat messages
-│   │   │   ├── InputArea.tsx # Message input + mode selector
-│   │   │   ├── ModelModal.tsx # Model picker
-│   │   │   ├── ApprovalModal.tsx# Sandbox confirmations
-│   │   │   ├── SettingsPanel.tsx# User settings
-│   │   │   ├── QuestionDrawer.tsx# Agent questions UI
-│   │   │   ├── RightPanel.tsx # Tasks + resources
-│   │   │   └── ReportDrawer.tsx # Completion reports
-│   │   ├── hooks/
-│   │   │   ├── useSSE.ts # Server-Sent Events
-│   │   │   └── useJobStatus.ts # Background job polling
-│   │   ├── api.ts # REST client
-│   │   └── types.ts # TypeScript types
-│   └── index.html
-│
-├── backend/
-│   ├── openmlr/
-│   │   ├── app.py # FastAPI entry point
-│   │   ├── config.py # Layered config (YAML → env → auto)
-│   │   ├── dependencies.py # DI (auth, db)
-│   │   │
-│   │   ├── agent/ # Core agent harness
-│   │   │   ├── loop.py # Agentic loop (300 iterations)
-│   │   │   ├── context.py # Token tracking, compaction
-│   │   │   ├── session.py # Per-conversation state
-│   │   │   ├── llm.py # Multi-provider LLM calls
-│   │   │   ├── prompts.py # System prompt builder
-│   │   │   ├── doom_loop.py # Repetition detection
-│   │   │   └── types.py # Data classes
-│   │   │
-│   │   ├── tools/ # Agent tools
-│   │   │   ├── registry.py # Tool router + mode restrictions
-│   │   │   ├── local.py # bash, read, write, edit
-│   │   │   ├── papers.py # OpenAlex, ArXiv, CrossRef
-│   │   │   ├── research.py # Research sub-agent
-│   │   │   ├── writing.py # Paper drafting
-│   │   │   ├── ask_user.py # Structured questions
-│   │   │   ├── plan.py # Task tracking
-│   │   │   ├── search.py # Brave web search
-│   │   │   ├── github.py # GitHub search
-│   │   │   ├── sandbox_tools.py # Sandbox wrappers
-│   │   │   └── mcp.py # MCP integration
-│   │   │
-│   │   ├── sandbox/ # Code execution
-│   │   │   ├── interface.py # Abstract interface
-│   │   │   ├── local.py # Docker-based
-│   │   │   ├── ssh.py # Remote SSH
-│   │   │   └── modal_sandbox.py # Modal cloud
-│   │   │
-│   │   ├── auth/ # JWT authentication
-│   │   │   ├── router.py # /api/auth/* routes
-│   │   │   └── security.py # bcrypt + JOSE
-│   │   │
-│   │   ├── db/ # Database layer
-│   │   │   ├── engine.py # AsyncSession setup
-│   │   │   ├── models.py # SQLAlchemy models
-│   │   │   └── operations.py # CRUD operations
-│   │   │
-│   │   ├── routes/ # API routes
-│   │   │   ├── agent.py # /api/message, /api/conversations
-│   │   │   ├── settings.py # /api/settings, /api/models
-│   │   │   └── health.py # /health, /api/health
-│   │   │
-│   │   ├── services/ # Business logic
-│   │   │   ├── event_bus.py # SSE broadcasting
-│   │   │   ├── session_manager.py # Session lifecycle
-│   │   │   └── job_manager.py # Celery job tracking
-│   │   │
-│   │   └── tasks/ # Background jobs
-│   │       └── agent_tasks.py # Celery task definitions
-│   │
-│   └── configs/
-│       └── prompts/
-│           └── system_prompt.yaml # Jinja2 system prompt
-│
-├── site/ # VitePress documentation
-│   └── docs/
-│
-├── docker-compose.yml # Production deployment
-├── docker-compose.coolify.yml # Coolify-specific
-├── Dockerfile # Multi-stage build
-└── railway.json # Railway deployment
-```
+| Package | Stack | Purpose |
+|---------|-------|---------|
+| `backend/` | Python 3.12, FastAPI, SQLAlchemy async, Celery | API, agent harness, tools, background jobs |
+| `frontend/` | React 19, TypeScript, Vite | Chat UI, settings pages, paper preview |
+| `site/` | VitePress | Documentation |
 
-## Data Flow
+## Request Flow
 
-### User Message Processing
-
-```
-1. User types message → InputArea.tsx
-2. POST /api/message → agent.py:send_message()
-3. Load/create session → SessionManager
-4. Add user message to context
-5. Start agent loop:
-   a. Build messages array (system + history + user)
-   b. Call LLM with streaming
-   c. Parse response for tool calls
-   d. Execute tools via ToolRouter
-   e. Add results to context
-   f. Repeat until no tool calls or max iterations
-6. Broadcast events via SSE → useSSE.ts
-7. Frontend updates in real-time
 ```
-
-### SSE Event Stream
-
-```typescript
-// Frontend subscribes
-const { messages, isConnected } = useSSE('/api/events');
-
-// Backend broadcasts
-await event_bus.broadcast({
-  event_type: "assistant_chunk",
-  data: { chunk: "Hello" },
-  conversation_uuid: "..."
-});
+┌──────────────────────────────────────────────────────────┐
+│                         Frontend                         │
+│            React 19 + Vite + react-router-dom            │
+│            /login   /   /:uuid   /settings/*             │
+└────────────────────┬─────────────────────────────────────┘
+                     │ SSE + REST
+                     ▼
+┌──────────────────────────────────────────────────────────┐
+│                         Backend                          │
+│                FastAPI + SQLAlchemy async                │
+│  ┌──────────────────────────────────────────────────┐    │
+│  │                  Agent Harness                   │    │
+│  │  Loop → LLM → Parse → Tool Router → Execute      │    │
+│  │   ↑                                    │         │    │
+│  │   └───── results ──────────────────────┘         │    │
+│  └──────────────────────────────────────────────────┘    │
+│  ┌──────────────────────────────────────────────────┐    │
+│  │  Tools: papers, research, writing, bash, sandbox │    │
+│  └──────────────────────────────────────────────────┘    │
+└──────────┬───────────────┬───────────────┬───────────────┘
+           │               │               │
+    ┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
+    │ PostgreSQL  │ │    Redis    │ │   Celery    │
+    │     DB      │ │   pub/sub   │ │   Worker    │
+    └─────────────┘ └─────────────┘ └─────────────┘
 ```
 
-| Event | Payload | When |
-|-------|---------|------|
-| `processing` | `{status}` | Agent starts thinking |
-| `assistant_chunk` | `{chunk}` | Streaming text token |
-| `assistant_stream_end` | `{}` | Stream finished |
-| `assistant_message` | `{content}` | Non-streaming fallback |
-| `tool_call` | `{name, arguments}` | Tool invoked |
-| `tool_output` | `{name, output}` | Tool returned |
-| `questions` | `{questions}` | Agent asks user |
-| `plan_update` | `{tasks}` | Task list changed |
-| `resources_update` | `{resources}` | Resources changed |
-| `context_usage` | `{used, max, ratio}` | Token gauge |
-| `search_budget` | `{used, max}` | Paper search budget |
-| `turn_complete` | `{}` | Processing done |
-| `error` | `{error}` | Error occurred |
-| `interrupted` | `{}` | User cancelled |
-
 ## Database Schema
 
 ```sql
--- Users and authentication
+-- Auth
 users (id, username, password_hash, display_name, created_at)
-user_settings (id, user_id, category, key, value, created_at, updated_at)
+user_settings (id, user_id, category, key, value, updated_at)
 
 -- Conversations
-conversations (id, uuid, user_id, title, model, mode, user_message_count, created_at, updated_at)
+conversations (id, uuid, user_id, title, model, mode, created_at, updated_at)
 messages (id, conversation_id, role, content, metadata, created_at)
 
--- Research
-research_corpus (id, user_id, paper_id, title, authors, abstract, year, source, url, added_at)
-
--- Writing
-writing_projects (id, user_id, title, outline, sections, bibliography, created_at, updated_at)
+-- Research & Writing
+research_corpus (id, user_id, paper_id, title, authors, abstract, year, source, url)
+writing_projects (id, user_id, title, outline, sections, bibliography, updated_at)
 
 -- Execution
-sandbox_configs (id, user_id, name, type, config, created_at, updated_at)
+sandbox_configs (id, user_id, name, type, config)
 
--- Task tracking (persisted)
-conversation_tasks (id, conversation_id, content, status, priority, created_at, updated_at)
-conversation_resources (id, conversation_id, title, type, url, content, created_at)
+-- Task tracking
+conversation_tasks (id, conversation_id, content, status, priority)
+conversation_resources (id, conversation_id, title, type, content)
 
 -- Background jobs
-agent_jobs (id, job_id, conversation_id, user_id, status, message, mode, model, error, created_at, started_at, completed_at)
+agent_jobs (id, job_id, conversation_id, user_id, status, message, mode, model, error)
 ```
 
-## LLM Provider Routing
+## SSE Event Flow
 
-```
-Model name format: provider/model-name
+The frontend connects to `/api/events?token=JWT` and receives real-time updates:
 
-openai/gpt-4o → OpenAI API
-anthropic/claude-sonnet-4 → Anthropic API
-openrouter/... → OpenRouter API
-opencode-go/qwen3.6-plus → OpenCode Go API
-ollama/llama3.1 → Local Ollama
-lmstudio/default → Local LM Studio
-local/my-model → Custom OpenAI-compatible
+```
+User sends message
+    │
+    ▼
+  processing        → "thinking..."
+    │
+    ▼
+  assistant_chunk   → streaming text tokens (repeats)
+    │
+    ▼
+  tool_call         → {name, arguments}
+    │
+    ▼
+  tool_output       → {name, output}
+    │
+    ▼
+  (repeat LLM → tools until done)
+    │
+    ▼
+  turn_complete     → processing finished
 ```
 
-The `LLMProvider` class handles:
-- API key selection based on prefix
-- Base URL routing
-- Anthropic vs OpenAI message format conversion
-- Streaming and non-streaming calls
-- Retry with exponential backoff
+Key events: `processing`, `assistant_chunk`, `assistant_stream_end`, `tool_call`, `tool_output`, `plan_update`, `resources_update`, `context_usage`, `turn_complete`, `error`, `interrupted`.
 
-## Background Jobs
+SSE supports reconnection catch-up — if the client disconnects and reconnects, it receives missed events.
+ +## Background Jobs with Redis Interrupt When `USE_BACKGROUND_JOBS=true`: @@ -242,44 +104,39 @@ When `USE_BACKGROUND_JOBS=true`: User sends message │ ▼ -Web creates AgentJob in DB +Web process creates AgentJob in DB │ ▼ Celery task queued to Redis │ ▼ -Worker picks up job - │ - ▼ -Agent loop runs in worker +Worker picks up job, runs agent loop │ ▼ Events published to Redis pub/sub │ ▼ -Web relays to SSE clients +Web process relays events to SSE clients ``` -Benefits: -- Browser can close, job continues -- Horizontal scaling of workers -- Job status persistence +**Interrupt flow**: When a user clicks Stop, the web process publishes an interrupt signal to Redis. The worker receives it and actually kills the running task — not just a flag check on the next iteration. + +## Per-Conversation Processing -## Security Model +Each conversation has its own isolated processing state: -- **Authentication**: JWT tokens (bcrypt hashed passwords) -- **Authorization**: Per-user data isolation via `user_id` foreign keys -- **API Keys**: Stored in `user_settings` or env vars -- **Sandboxing**: Docker isolation for code execution -- **Confirmations**: Required for sandbox creation, destructive ops (unless `yolo_mode`) +- Multiple conversations can process simultaneously +- Each gets its own agent session, tool router, and sandbox manager +- Processing state (`idle`, `processing`, `interrupted`) is tracked per conversation +- One conversation being interrupted does not affect others ## Deployment Options -| Platform | Config File | Notes | -|----------|-------------|-------| -| Docker Compose | `docker-compose.yml` | Default, all-in-one | -| Railway | `railway.json` | Multi-service template | -| Coolify | `docker-compose.coolify.yml` | One-click deploy | -| Manual | - | Run backend + frontend separately | +| Platform | Config | Notes | +|----------|--------|-------| +| Docker Compose | `docker-compose.yml` | Default, includes all services | +| Render | Deploy button | Includes 
Postgres + Redis | +| Heroku | Deploy button | Dyno-based | +| Coolify | `docker-compose.coolify.yml` | Self-hosted PaaS | See [Setup](/setup) for detailed instructions. diff --git a/site/docs/changelog.md b/site/docs/changelog.md new file mode 100644 index 0000000..b506cad --- /dev/null +++ b/site/docs/changelog.md @@ -0,0 +1,39 @@ +# Changelog + +## v0.2.0 + +Major rewrite of the mode system, paper writing, processing architecture, and UI routing. + +### Mode System +- Simplified from Plan/Research/Write to **Plan + Execute** (two modes) +- Plan mode: ask questions, gather context, create plans. No execution. +- Execute mode: all tools available. Follow the plan. +- Toggle with P/E button, `Cmd+B` (Plan), or `Cmd+E` (Execute) +- Amber border for Plan messages, blue border for Execute messages +- Three-layer mode enforcement: system prompt, tool filtering, runtime blocking + +### Paper Writing +- Writing tool with **auto-save to database** — survives across workers and restarts +- Paper preview in the **Paper tab** in the UI +- Client-side export to Markdown and LaTeX +- Outline, sections, and bibliography managed as structured data + +### Processing Architecture +- **Per-conversation processing state** — multiple conversations run in parallel +- **Background jobs via Celery + Redis** — close the browser, come back later +- **Redis-based interrupt relay** — actually kills running worker tasks, not just a flag check +- **Sub-agent streaming** — research tool shows nested tool calls in real-time + +### Settings & UI +- **Settings as routed pages** (`/settings/providers`, `/settings/agent`, `/settings/sandbox`, `/settings/writing`) — no longer a modal +- **Sticky model selection** — persisted per-user in the database +- **Onboarding flow** — guided setup when no LLM provider is configured +- **Route restructure** — app served from `/` instead of `/app` +- **SSE reconnection catch-up** — missed events replayed on reconnect +- **PLAN.md auto-generated** and pinned in 
resources panel + +### Testing & CI +- **149 backend tests + 29 frontend tests** — comprehensive coverage +- **GitHub CI** — tests run on push and pull request +- `make test` runs all tests (backend + frontend + docs build) +- `make test-backend`, `make test-frontend`, `make test-docs` for targeted runs diff --git a/site/docs/configuration.md b/site/docs/configuration.md index a8ea486..bc000a4 100644 --- a/site/docs/configuration.md +++ b/site/docs/configuration.md @@ -12,121 +12,72 @@ ANTHROPIC_API_KEY=sk-ant-... OPENROUTER_API_KEY=sk-or-... # ── Local models (OpenAI-compatible APIs) ── -# Ollama OLLAMA_API_BASE=http://localhost:11434/v1 OLLAMA_MODEL=llama3.1 -# LM Studio LMSTUDIO_API_BASE=http://localhost:1234/v1 LMSTUDIO_MODEL=default -# Any OpenAI-compatible API (vLLM, TGI, etc.) LOCAL_API_BASE=http://localhost:8000/v1 LOCAL_MODEL=local/my-model LOCAL_API_KEY=not-needed -# ── Background jobs (optional) ── +# ── Background jobs (optional, recommended) ── REDIS_URL=redis://localhost:6379/0 USE_BACKGROUND_JOBS=true USE_REDIS_PUBSUB=true # ── Search & research (optional) ── -BRAVE_API_KEY=... # web search -OPENALEX_EMAIL=you@example.com # polite pool (no key needed) +BRAVE_API_KEY=... +OPENALEX_EMAIL=you@example.com # ── GitHub (optional) ── -GITHUB_TOKEN=ghp_... # improves code search rate limits +GITHUB_TOKEN=ghp_... # ── Auth ── JWT_SECRET_KEY=change-me-in-production # ── Docker execution ── -OPEN_MLR_DOCKER_IMAGE=python:3.12-slim # default container image +OPEN_MLR_DOCKER_IMAGE=python:3.12-slim # ── Modal sandbox (optional) ── MODAL_TOKEN_ID=... MODAL_TOKEN_SECRET=... 
``` -## Per-User Settings +## Settings Pages -After login, click the gear icon in the sidebar to open Settings: +Settings are routed pages (not a modal), accessible from the sidebar: -| Tab | What you can configure | -|-----|----------------------| -| **Providers** | API keys for all services (stored encrypted in DB, override .env) | -| **Agent** | Default model, research model, YOLO mode | -| **Sandbox** | Default execution environment (local/SSH/Modal), Modal credentials | -| **Writing** | Citation style (APA/IEEE/ACM/Chicago), export format | +| Route | What you configure | +|-------|-------------------| +| `/settings/providers` | API keys for LLM providers and services. Stored encrypted in DB, override `.env` values. | +| `/settings/agent` | Default model, research model, YOLO mode, max iterations. | +| `/settings/sandbox` | Default execution environment (Docker/SSH/Modal), Modal credentials. | +| `/settings/writing` | Citation style (APA/IEEE/ACM/Chicago), export format preferences. | + +## Sticky Model Selection + +The selected model is persisted per-user in the database. When you switch models via the header dropdown, the choice sticks across sessions, devices, and browser refreshes. No need to re-select every time. ## Model Selection Models are auto-detected based on which API keys/URLs are configured: -### Cloud Providers - -| Key set | Default model | +| Key set | Example model | |---------|--------------| | `ANTHROPIC_API_KEY` | `anthropic/claude-sonnet-4` | | `OPENAI_API_KEY` | `openai/gpt-4o` | | `OPENROUTER_API_KEY` | `openrouter/anthropic/claude-sonnet-4` | - -### Local Models - -| Config | Model prefix | -|--------|-------------| | `OLLAMA_MODEL` | `ollama/llama3.1` | | `LMSTUDIO_API_BASE` | `lmstudio/default` | | `LOCAL_API_BASE` | `local/my-model` | -Override via Settings > Agent > Default Model, or by clicking the model -button in the header. 
- -## Local Model Setup - -### Ollama - -```bash -# Install and start Ollama -ollama serve - -# Pull a model -ollama pull llama3.1 - -# Configure -OLLAMA_MODEL=llama3.1 -# OLLAMA_API_BASE defaults to http://localhost:11434/v1 -``` - -Use as `ollama/llama3.1` in the model selector. - -### LM Studio - -1. Download and install [LM Studio](https://lmstudio.ai) -2. Load a model and start the server -3. Configure: -```bash -LMSTUDIO_API_BASE=http://localhost:1234/v1 -LMSTUDIO_MODEL=default -``` - -Use as `lmstudio/default` in the model selector. - -### vLLM / text-generation-inference / Other - -Any server that exposes an OpenAI-compatible `/v1/chat/completions` endpoint: - -```bash -LOCAL_API_BASE=http://localhost:8000/v1 -LOCAL_MODEL=local/my-model-name -LOCAL_API_KEY=not-needed # or your auth token -``` - -Use as `local/my-model-name` in the model selector. +Override via `/settings/agent` or the model dropdown in the header. ## Background Jobs -Enable persistent task tracking that survives browser refreshes: +Enable with Redis for persistent processing: ```bash REDIS_URL=redis://localhost:6379/0 @@ -135,14 +86,15 @@ USE_REDIS_PUBSUB=true ``` When enabled: -- Tasks and resources persist to the database - Agent processing continues even if you close the browser -- Reconnecting shows full progress history -- Multiple browser tabs receive live updates via Redis pub/sub +- Multiple conversations process in parallel via Celery workers +- Redis pub/sub relays events from workers to SSE clients +- Redis-based interrupt relay actually kills running worker tasks +- Reconnecting via SSE catches up on missed events Requires a running Redis server. Use `make infra` to start one with Docker. -## Agent Config File +## Agent Config `backend/configs/agent_config.yaml` controls defaults: @@ -150,6 +102,8 @@ Requires a running Redis server. Use `make infra` to start one with Docker. 
model_name: "" # empty = auto-detect max_iterations: 300 stream: true -paper_search_budget: 25 # API calls per session +paper_search_budget: 25 require_plan_approval: true ``` + +These can be overridden per-user via `/settings/agent`. diff --git a/site/docs/index.md b/site/docs/index.md index daa06f7..ada0324 100644 --- a/site/docs/index.md +++ b/site/docs/index.md @@ -2,8 +2,8 @@ layout: home hero: name: OpenMLR - text: ML Research Intern - tagline: Plans tasks, reads papers, writes drafts, and runs experiments — end to end. + text: ML Research Agent + tagline: Plans tasks, researches papers, writes drafts, and executes code — end to end, in one conversation. actions: - theme: brand text: Get Started @@ -13,13 +13,9 @@ hero: link: https://github.com/xprilion/OpenMLR features: - title: Plan - details: Structured questions with options, task breakdown, scope clarification before any work begins. - - title: Research - details: Search OpenAlex, ArXiv, CrossRef. Read papers section-by-section. Crawl citation graphs. Find code on GitHub. - - title: Write - details: Section-by-section paper drafting with bibliography management and Markdown/LaTeX export. + details: Ask clarifying questions, gather context, break tasks into structured plans. No execution until you're ready. - title: Execute - details: Docker-isolated code execution locally, on SSH remotes, or Modal cloud sandboxes. + details: Research papers, write drafts, run code. All tools available. Follows the plan you built in Plan mode. --- ## Quick start @@ -39,33 +35,22 @@ Open `http://localhost:3000`. Create an account. Start researching. [![Deploy to Heroku](https://www.herokucdn.com/deploy/button.svg)](https://www.heroku.com/deploy?template=https://github.com/xprilion/OpenMLR) -See [Setup & Installation](/setup) for Coolify, local development, and more options. - -## Why OpenMLR? 
- -Ever started researching a topic, opened 47 browser tabs, took notes in three different apps, lost track of that one paper you saw yesterday, and then had to context-switch to run some experiments? - -OpenMLR keeps everything in one place. Your research context stays with you from the first "what should I look into?" to the final PDF export. - -**No more:** -- Lost tabs and forgotten citations -- Copy-pasting between arxiv, notes, and code -- "Where did I see that figure?" -- Starting over because you closed your browser +See [Setup & Installation](/setup) for local development and more options. ## How it works -``` -Plan → Research → Write → Execute -``` +OpenMLR uses two modes to keep the agent focused: + +- **Plan mode (P)** — The agent asks questions, gathers context, and creates structured plans. No code execution, no file writes. Toggle with `Cmd+B`. Messages have an amber border. +- **Execute mode (E)** — The agent does the work: researches papers, writes drafts, runs experiments. All tools available. Toggle with `Cmd+E`. Messages have a blue border. -Each mode restricts which tools are available, keeping the agent focused: +Switch modes with the P/E button in the input area or keyboard shortcuts. The agent follows the plan built during Plan mode. -| Mode | What it does | Tools available | -|------|--------------|-----------------| -| **Plan** | Asks clarifying questions, breaks down tasks | Questions, task tracking | -| **Research** | Searches papers, crawls citations | OpenAlex, ArXiv, web search, GitHub | -| **Write** | Drafts sections, manages bibliography | Writing tools, citation lookup | -| **Execute** | Runs code when needed | Docker, SSH, Modal (available in all modes) | +## Key features -The agent can suggest switching modes, but you approve the switch. No more half-baked drafts with missing citations. +- **Paper research** — OpenAlex, ArXiv, CrossRef, Papers With Code. Full paper reading, citation graphs. 
+- **Paper writing** — Section-by-section drafting with auto-save. Preview + export (Markdown/LaTeX) in the Paper tab. +- **Sub-agent streaming** — Research tool spawns independent agents with nested tool call visibility. +- **Background jobs** — Celery + Redis. Close the browser, come back later. +- **Per-conversation parallelism** — Multiple conversations process simultaneously. +- **Onboarding flow** — Guided setup when no LLM provider is configured. diff --git a/site/docs/modes.md b/site/docs/modes.md index 64f4dc6..09d6eec 100644 --- a/site/docs/modes.md +++ b/site/docs/modes.md @@ -1,73 +1,70 @@ # Modes -OpenMLR uses three per-message modes. Switch modes using the selector above -the input area. Code execution is available in all modes. +OpenMLR uses two modes — **Plan** and **Execute** — to keep the agent focused on the right kind of work. -## Plan mode +## Plan Mode (P) -**Purpose**: Clarify scope before doing work. +**Purpose**: Gather context, ask questions, create structured plans before doing any work. -The agent will: -- Ask structured questions using a bottom drawer UI (2-4 options + free text) -- Break tasks into a plan visible in the right panel -- Not execute code or modify files -- Suggest switching to Research or Write mode when ready +**What the agent can do:** +- Ask clarifying questions via `ask_user` (structured options UI) +- Create and update task plans via `plan_tool` +- Read files and search the codebase (read-only filesystem tools) +- Search the web and papers for context +- Generate `PLAN.md` and pin it in resources -## Research mode +**What the agent cannot do:** +- Write or edit files +- Execute code (bash, sandbox) +- Write paper sections +- Use the `research` sub-agent -**Purpose**: Find and synthesize information. +**Visual indicator**: Messages have an **amber border**. 
-The agent will: -- Search papers via OpenAlex, ArXiv, CrossRef -- Read full paper sections from ar5iv HTML -- Crawl citation graphs and find related work -- Search GitHub for code examples -- Track all papers/resources in the right panel -- Respect the per-session search budget (default 25 API calls) +**When to use**: Start here. Let the agent understand the problem, ask questions, and build a plan before switching to Execute. -### Search budget +## Execute Mode (E) -Each session has a limited number of paper API calls to prevent endless -searching. The budget is shown in the right panel. When exhausted, the agent -must ask the user before continuing. +**Purpose**: Do the work. Follow the plan built in Plan mode. -## Write mode +**What the agent can do:** +- All tools except `ask_user` +- Research papers, crawl citations, spawn sub-agents +- Write and edit files +- Draft paper sections with auto-save +- Run code in bash or sandboxes (Docker/SSH/Modal) -**Purpose**: Draft academic content. +**What the agent cannot do:** +- Use `ask_user` (no structured questions — it should be working, not asking) -The agent will: -- Write paper sections using the `writing` tool -- Manage citations and bibliography -- Reference completion reports from the research phase -- Export to Markdown or LaTeX +**Visual indicator**: Messages have a **blue border**. -## Task management +**When to use**: Once you have a plan and the agent knows what to do. -The right panel shows: -- **Tasks**: Plan items with status (pending → in progress → completed) -- **Resources**: Papers, code repos, datasets, and completion reports +## Switching Modes -When a task is marked completed, a **completion report** is auto-generated -with a summary and hints for upcoming tasks. Click report titles in the -resources list to view them in a slide-out drawer. 
+| Method | Action | +|--------|--------| +| **P/E button** | Click the mode toggle above the input area | +| **Cmd+B** | Switch to Plan mode | +| **Cmd+E** | Switch to Execute mode | -### Completion reports +The mode applies per-message. You can switch freely between messages. -Reports follow a markdown spec: +## Mode Enforcement -```markdown -# Task Completion Report: [task title] -**Completed**: [timestamp] -## Summary -[what was accomplished] -## Next Steps -[recommendations for upcoming tasks] -``` +Mode restrictions are enforced at three layers: + +1. **System prompt** — The agent is instructed about what it can and cannot do in the current mode +2. **Tool filtering** — The tool router only presents mode-allowed tools to the LLM +3. **Runtime blocking** — If a tool call somehow bypasses filtering, the router returns an error message instead of executing -The agent re-reads these reports to maintain context across compactions. +When a blocked tool is called, the agent receives: +``` +Tool 'bash' is not available in PLAN mode. +Plan mode is for planning and asking questions only. +``` -## Context tracking +## PLAN.md -The right panel shows a token usage gauge. When approaching the model's -context window limit, the system auto-compacts the conversation by summarizing -older messages. The gauge color changes: green → yellow → red. +When the agent creates a plan in Plan mode, it auto-generates a `PLAN.md` file that is pinned in the resources panel. This plan persists across context compactions and serves as the agent's reference during Execute mode. 
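Layers 2 and 3 of the enforcement described above can be sketched as a filter over tool names, plus a guard in the dispatcher that returns the documented error text as the tool result. Layer 1 (the system prompt) is omitted. The tool sets mirror the Plan/Execute columns of the tools reference. `Mode`, `allowed_tools`, `dispatch`, and `MODE_HINTS` are hypothetical names. Because only the Plan-mode wording is documented, the Execute-mode message here has no second line.

```python
from collections.abc import Callable
from enum import Enum


class Mode(Enum):
    PLAN = "plan"
    EXECUTE = "execute"


# Tool names as listed in the tools reference.
ALL_TOOLS = frozenset({
    "ask_user", "plan_tool", "papers", "web_search", "research",
    "github_search", "github_read", "writing", "read", "write", "edit",
    "list_dir", "glob_files", "grep_search", "bash", "sandbox",
})

# Plan mode: planning, lookups, and read-only filesystem tools only.
PLAN_TOOLS = frozenset({
    "ask_user", "plan_tool", "papers", "web_search", "github_search",
    "github_read", "read", "list_dir", "glob_files", "grep_search",
})

# Only the Plan-mode explanation is documented; Execute mode gets none here.
MODE_HINTS = {Mode.PLAN: "Plan mode is for planning and asking questions only."}


def allowed_tools(mode: Mode) -> frozenset[str]:
    """Layer 2 (tool filtering): the only tool schemas offered to the LLM."""
    return PLAN_TOOLS if mode is Mode.PLAN else ALL_TOOLS - {"ask_user"}


def dispatch(mode: Mode, name: str, run: Callable[[], str]) -> str:
    """Layer 3 (runtime blocking): refuse calls that slipped past filtering."""
    if name not in allowed_tools(mode):
        message = f"Tool '{name}' is not available in {mode.name} mode."
        hint = MODE_HINTS.get(mode)
        # Returned to the agent as the tool result instead of raising.
        return f"{message}\n{hint}" if hint else message
    return run()


print(dispatch(Mode.PLAN, "bash", lambda: "ran"))
print(dispatch(Mode.EXECUTE, "bash", lambda: "ran"))  # ran
```

Returning the refusal as a string rather than raising lets the agent read it and continue its turn, which matches the behaviour described for runtime blocking.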
diff --git a/site/docs/setup.md b/site/docs/setup.md index 727dba5..8797c31 100644 --- a/site/docs/setup.md +++ b/site/docs/setup.md @@ -9,17 +9,15 @@ | Node.js | 20+ | [nodejs.org](https://nodejs.org) | | pnpm | 9+ | `npm i -g pnpm` | | PostgreSQL | 14+ | [postgresql.org](https://www.postgresql.org) | -| Docker | 20+ | [docker.com](https://www.docker.com) (recommended for code execution) | +| Docker | 20+ | [docker.com](https://www.docker.com) (recommended) | ## Quick Start with Docker Compose -The easiest way to run everything: - ```bash git clone https://github.com/xprilion/OpenMLR.git cd OpenMLR cp .env.example .env # add your API keys -make up # starts db, redis, web, worker +docker compose up -d ``` Open `http://localhost:3000`. Create an account on first visit. @@ -36,10 +34,9 @@ make install This runs `uv sync` for the Python backend and `pnpm install` for the frontend. -> **Do not** create a virtual environment at the project root (`uv venv` or `python -m venv`). -> The backend is a standalone uv project — `uv sync` and `uv run` automatically manage -> `backend/.venv`. Activating a root-level venv will conflict with the backend's environment -> and cause import errors at runtime. +> **Do not** create a virtual environment at the project root. +> The backend is a standalone uv project — `uv sync` and `uv run` manage +> `backend/.venv` automatically. A root-level venv will cause import errors. ### Configure @@ -54,8 +51,6 @@ DATABASE_URL="postgresql://user:pass@localhost:5432/openmlr" OPENROUTER_API_KEY=sk-or-... # or OPENAI_API_KEY or ANTHROPIC_API_KEY ``` -For local models, see [Local Models](#local-models) below. - See [Configuration](/configuration) for all options. ### Create database @@ -75,7 +70,7 @@ Opens backend on `:3000` and Vite dev server on `:5173`. 
Use `:5173` for develop **With background jobs** (requires Redis): ```bash make infra # start postgres + redis in Docker -make dev-full # backend + frontend + celery worker +make dev # backend + frontend dev servers (start the Celery worker separately with `make worker`) ``` **Production**: @@ -98,79 +93,14 @@ make down # stop all services ### Development with Live Reload ```bash -make dev-docker-build # first time -make dev-docker # subsequent runs +make dev-docker # docker compose with live reload (includes docs) ``` Code changes are automatically detected and services restart. -### Useful Commands - -| Command | Description | -|---------|-------------| -| `make up` | Start all services | -| `make down` | Stop all services | -| `make restart` | Quick rebuild web + worker | -| `make rebuild` | Full rebuild from scratch | -| `make logs` | Tail all logs | -| `make logs-web` | Tail web service only | -| `make logs-worker` | Tail worker only | -| `make shell-db` | psql into database | -| `make shell-web` | bash into web container | -| `make infra` | Start only db + redis | - -## Local Models - -OpenMLR supports any OpenAI-compatible API for local inference. - -### Ollama - -```bash -# Start Ollama -ollama serve - -# Pull a model -ollama pull llama3.1 - -# Configure in .env -OLLAMA_MODEL=llama3.1 -``` - -Use as `ollama/llama3.1` in the model selector. - -### LM Studio - -1. Start the LM Studio server from the UI -2. Configure in `.env`: -```bash -LMSTUDIO_API_BASE=http://localhost:1234/v1 -LMSTUDIO_MODEL=default -``` - -Use as `lmstudio/default` in the model selector. - -### vLLM / text-generation-inference / Other - -For any OpenAI-compatible server: - -```bash -LOCAL_API_BASE=http://localhost:8000/v1 -LOCAL_MODEL=local/my-model -LOCAL_API_KEY=not-needed # if no auth required -``` - -Use as `local/my-model` in the model selector. - -## First Launch - -1. Open `http://localhost:5173` (dev) or `http://localhost:3000` (prod/Docker) -2. Create an account (first user is auto-created) -3.
The model is auto-detected from your configured API keys -4. Start a conversation in **Plan** mode - ## Background Jobs -To enable persistent task tracking that survives browser refreshes: +Enable persistent processing that survives browser refreshes: ```bash # In .env @@ -180,9 +110,20 @@ USE_REDIS_PUBSUB=true ``` When enabled: -- Tasks and resources persist to the database - Agent continues processing even if you close the browser -- You can return later and see all progress +- Per-conversation processing state — multiple conversations run in parallel +- Redis-based interrupt relay actually kills running worker tasks +- Reconnecting via SSE catches up on missed events + +Requires a running Redis server. Use `make infra` to start one with Docker. + +## First Launch + +1. Open `http://localhost:5173` (dev) or `http://localhost:3000` (prod/Docker) +2. Create an account (first user is auto-created) +3. If no LLM provider is configured, the **onboarding flow** guides you through adding API keys at `/settings/providers` +4. Start a conversation — you'll be in **Plan mode** by default +5. 
Switch to **Execute mode** (P/E button or `Cmd+E`) when ready to work ## All Makefile Targets @@ -193,22 +134,25 @@ Run `make help` for the full list: | **Setup** | | | `make install` | Install all dependencies | | **Development** | | -| `make dev` | Run backend + frontend | -| `make dev-full` | Run with background jobs | +| `make dev` | Run backend + frontend dev servers | | `make worker` | Start Celery worker only | | **Docker Compose** | | | `make up` | Start all services | | `make down` | Stop all services | -| `make restart` | Quick rebuild + restart | -| `make rebuild` | Full rebuild | +| `make restart` | Quick rebuild web + worker | +| `make rebuild` | Full rebuild from scratch | | `make logs` | Tail all logs | | `make infra` | Start only db + redis | -| `make dev-docker` | Live reload with Docker | +| `make dev-docker` | Live reload with Docker (includes docs) | | **Database** | | | `make db-fresh` | Drop + recreate tables | | `make db-upgrade` | Run migrations | +| **Testing** | | +| `make test` | Run all tests (backend + frontend + docs build) | +| `make test-backend` | Backend tests only (149 tests) | +| `make test-frontend` | Frontend tests only (29 tests) | +| `make test-docs` | Docs build check | | **Other** | | | `make check` | Type-check backend + frontend | -| `make test` | Run backend tests | | `make docs-dev` | Preview docs locally | | `make clean` | Remove build artifacts | diff --git a/site/docs/tools.md b/site/docs/tools.md index de2511d..3727ea9 100644 --- a/site/docs/tools.md +++ b/site/docs/tools.md @@ -1,29 +1,25 @@ # Agent Tools -The agent has access to 18 built-in tools. Tools are invoked automatically -based on the task at hand. +The agent has access to built-in tools organized by category. Tool availability depends on the current [mode](/modes). 
-## Filesystem +## Planning Tools -| Tool | Description | -|------|-------------| -| `bash` | Execute shell commands in a Docker container (falls back to host if Docker unavailable) | -| `read` | Read files with line numbers | -| `write` | Create/overwrite files | -| `edit` | Find-and-replace in files | +| Tool | Description | Plan | Execute | +|------|-------------|:----:|:-------:| +| `ask_user` | Ask structured questions (2-4 options + free text per question) | yes | no | +| `plan_tool` | Create/update task plans, track resources, generate completion reports | yes | yes | -## Research +## Research Tools -| Tool | Description | -|------|-------------| -| `papers` | Search OpenAlex, read ArXiv papers, get citations, find code/datasets | -| `web_search` | Brave web search | -| `research` | Spawn an independent research sub-agent with its own context | -| `github_read_file` | Read files from GitHub repos | -| `github_list_repos` | List repos for a user/org | -| `github_find_examples` | Search GitHub for code examples | +| Tool | Description | Plan | Execute | +|------|-------------|:----:|:-------:| +| `papers` | Search OpenAlex, read ArXiv papers, get citations, find code/datasets | yes | yes | +| `web_search` | Brave web search | yes | yes | +| `research` | Spawn an independent research sub-agent with its own context | no | yes | +| `github_search` | Search GitHub for repos and code | yes | yes | +| `github_read` | Read files from GitHub repos | yes | yes | -### Papers operations +### Papers Operations | Operation | Source | Description | |-----------|--------|-------------| @@ -36,19 +32,56 @@ based on the task at hand. 
| `find_code` | Papers With Code | GitHub repos linked to papers | | `find_datasets` | Papers With Code | Datasets linked to papers | -## Planning & interaction +### Research Sub-Agent -| Tool | Description | -|------|-------------| -| `plan_tool` | Create/update task plans, track resources, generate completion reports | -| `ask_user` | Ask structured questions (2-4 options + free text per question) | -| `writing` | Manage paper writing projects (outline, sections, bibliography, export) | +The `research` tool spawns an independent sub-agent with its own context window. The parent agent sees nested tool calls streamed in real-time. Useful for deep dives that would consume too much of the main conversation's context. + +## Writing Tool + +| Tool | Description | Plan | Execute | +|------|-------------|:----:|:-------:| +| `writing` | Paper authoring — manage outline, write sections, update bibliography | no | yes | + +The writing tool manages a **writing project** stored in the database: +- **Outline**: Define paper structure (sections, subsections) +- **Sections**: Write/update individual sections with auto-save +- **Bibliography**: Manage citations and references +- **Auto-save**: All changes persist to the database immediately, surviving across workers and restarts + +Paper preview and client-side export (Markdown/LaTeX) are available in the **Paper tab** in the UI. + +## Filesystem Tools -## Execution environments +| Tool | Description | Plan | Execute | +|------|-------------|:----:|:-------:| +| `read` | Read files with line numbers | yes | yes | +| `write` | Create/overwrite files | no | yes | +| `edit` | Find-and-replace in files | no | yes | +| `list_dir` | List directory contents | yes | yes | +| `glob_files` | Find files by glob pattern | yes | yes | +| `grep_search` | Search file contents | yes | yes | -| Tool | Description | +In Plan mode, only read-only filesystem tools are available. 
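The writing tool's auto-save behaviour can be sketched as "commit on every write", so that any other process reading the same row sees the latest section. This sketch makes several stand-in choices. It uses the standard-library `sqlite3` module with a shared in-memory database in place of PostgreSQL. It stores outline, sections, and bibliography as JSON text, following the `writing_projects` columns in the architecture schema. Its helper names (`create_project`, `write_section`, `load_sections`) are hypothetical. It is not the actual `writing` tool implementation.

```python
import json
import sqlite3

# Shared in-memory DB so two connections behave like two workers (PostgreSQL stand-in).
DB_URI = "file:writing_demo?mode=memory&cache=shared"

SCHEMA = """
CREATE TABLE IF NOT EXISTS writing_projects (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    title TEXT NOT NULL,
    outline TEXT NOT NULL DEFAULT '[]',      -- JSON list of section names
    sections TEXT NOT NULL DEFAULT '{}',     -- JSON map: section name -> text
    bibliography TEXT NOT NULL DEFAULT '[]', -- JSON list of citation entries
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP
)
"""


def create_project(conn: sqlite3.Connection, user_id: int, title: str, outline: list[str]) -> int:
    cur = conn.execute(
        "INSERT INTO writing_projects (user_id, title, outline) VALUES (?, ?, ?)",
        (user_id, title, json.dumps(outline)),
    )
    conn.commit()
    return cur.lastrowid


def load_sections(conn: sqlite3.Connection, project_id: int) -> dict[str, str]:
    (raw,) = conn.execute(
        "SELECT sections FROM writing_projects WHERE id = ?", (project_id,)
    ).fetchone()
    return json.loads(raw)


def write_section(conn: sqlite3.Connection, project_id: int, name: str, text: str) -> None:
    """Auto-save: every section write is committed immediately."""
    sections = load_sections(conn, project_id)
    sections[name] = text
    conn.execute(
        "UPDATE writing_projects SET sections = ?, updated_at = CURRENT_TIMESTAMP WHERE id = ?",
        (json.dumps(sections), project_id),
    )
    conn.commit()


worker_a = sqlite3.connect(DB_URI, uri=True)
worker_a.execute(SCHEMA)
project = create_project(worker_a, user_id=1, title="Draft", outline=["Introduction", "Method"])
write_section(worker_a, project, "Introduction", "We study ...")

# A separate connection (think: another worker after a restart) sees the saved text.
worker_b = sqlite3.connect(DB_URI, uri=True)
print(load_sections(worker_b, project))  # {'Introduction': 'We study ...'}
```

Committing per write trades a little write overhead for never losing more than the section currently being drafted.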
+ +## Execution Tools + +| Tool | Description | Plan | Execute | +|------|-------------|:----:|:-------:| +| `bash` | Execute shell commands (Docker-isolated when available) | no | yes | +| `sandbox` | Run code in Docker containers, SSH remotes, or Modal cloud | no | yes | + +### Sandbox Types + +| Type | Description | |------|-------------| -| `sandbox_probe` | Check environment (Python version, GPU, disk, packages) | -| `sandbox_create` | Create a new sandbox (local/SSH/Modal) | -| `sandbox_exec` | Run commands in active sandbox | -| `sandbox_read` / `sandbox_write` | File I/O in sandbox | +| **Local (Docker)** | Docker container on the host machine | +| **SSH** | Remote machine via SSH | +| **Modal** | Cloud sandbox via Modal | + +## Mode Restrictions + +Tools are filtered based on the current mode before being sent to the LLM. See [Modes](/modes) for details on the enforcement layers. + +In summary: +- **Plan mode**: `ask_user`, `plan_tool`, read-only filesystem, web search, papers, GitHub +- **Execute mode**: Everything except `ask_user`