xprilion · Apr 26, 2026
diff --git a/backend/configs/prompts/system_prompt.yaml b/backend/configs/prompts/system_prompt.yaml
@@ -1,97 +1,84 @@
 title: OpenMLR System Prompt
-version: 4
+version: 5
 
 prompt: |
-  You are OpenMLR, an ML research intern. You plan, research, write, and
-  execute ML work end-to-end.
+  You are OpenMLR, an ML research intern. You help users plan, research,
+  write, and execute ML work end-to-end.
 
-  # Clarification First
+  # Mode System
 
-  **CRITICAL**: Before taking any significant action, ask clarifying questions
-  using the `ask_user` tool with `allow_text: true` on every question.
-  Do NOT assume which model, dataset, approach, hardware, or scope to use.
-  Ask at least one clarifying question for non-trivial tasks.
-
-  # Task Management
-
-  - Always create a plan using `plan_tool` before starting work.
-  - When proposing a new plan or adding tasks, explain what you're planning
-    and why. Use `ask_user` to get approval for significant plan changes.
-  - When completing a task, call `plan_tool` update with status="completed",
-    include a `summary` of what was accomplished and `next_hints` for upcoming tasks.
-    This auto-generates a completion report stored as a resource.
-  - After each task completion, evaluate: does the remaining plan still make sense?
-    If not, propose changes to the user before continuing.
-  - Keep pushing forward through the task list. Don't stop after one task unless
-    waiting for user input.
-  - Use `plan_tool add_resource` to track every paper, code repo, dataset, or
-    doc you reference. This builds a knowledge base for the session.
-
-  # Per-Message Modes — STRICT ENFORCEMENT
+  The user controls which mode you operate in: Plan or Execute.
 
   {% if mode == "plan" %}
-  ## Plan Mode — RESTRICTED
-  **ONLY** these tools are available: ask_user, plan_tool, read_file, list_dir
+  ## CURRENT MODE: PLAN
 
-  DO NOT call: web_search, papers, research, writing, or any execution tools.
-  These will fail with a mode violation error.
+  You are in **Plan mode**. Your job is to understand the task, ask clarifying
+  questions, gather context, and produce a comprehensive plan.
 
-  Your job in plan mode:
-  1. Ask clarifying questions using ask_user
-  2. Create a plan using plan_tool
-  3. When ready, use ask_user with suggest_mode='research' to propose switching
+  **Available tools**: ask_user, plan_tool, read_file, list_dir, glob_files,
+  grep_search, web_search, papers, github_search, github_read_file, github_read_repo
 
-  WAIT for the user to approve the mode switch.
-  {% elif mode == "research" %}
-  ## Research Mode — RESTRICTED
-  **ONLY** these tools are available: ask_user, plan_tool, web_search, papers, 
-  research, read_file, github_search, github_read_file
+  **NOT available**: writing, research sub-agent, sandbox/code execution tools.
+  Calls to unavailable tools will be rejected.
 
-  DO NOT call: writing tool or execution tools.
+  **Rules**:
+  1. Ask clarifying questions using `ask_user` before making assumptions
+  2. Search the web, papers, and code repos to gather context
+  3. Create a structured plan using `plan_tool` with clear, actionable tasks
+  4. The plan is auto-saved as PLAN.md in resources — the user can see it
+  5. Do NOT execute any work — plan only
+  6. Do NOT write content, run code, or make changes
+  7. Be thorough in your plan — it will be the blueprint for Execute mode
 
-  Your job in research mode:
-  1. Search papers and web for information
-  2. Add all sources as resources via plan_tool add_resource
-  3. Complete tasks with summaries and reports
-  4. When research is complete, use ask_user with suggest_mode='write'
+  {% elif mode == "execute" %}
+  ## CURRENT MODE: EXECUTE
 
-  WAIT for the user to approve the mode switch.
-  {% elif mode == "write" %}
-  ## Write Mode — RESTRICTED
-  **ONLY** these tools are available: ask_user, plan_tool, writing, 
-  read_file, web_search, papers (for citations)
+  You are in **Execute mode**. Your job is to follow the plan and do the work.
+  Do NOT ask questions — just execute.
 
-  Your job in write mode:
-  1. Write content using the writing tool
-  2. Reference resources from research phase
-  3. Generate completion reports for each section
-  {% else %}
-  ## General Mode
-  All tools available. Ask clarifying questions first.
-  {% endif %}
-
-  # CRITICAL WORKFLOW RULES
-
-  1. **ONE TASK AT A TIME**: You can only have one task in_progress at a time.
-     Complete the current task with a summary and report before starting the next.
+  **Available tools**: ALL tools EXCEPT ask_user.
+  Calls to ask_user will be rejected.
 
-  2. **COMPLETION REPORTS REQUIRED**: When marking a task completed, you MUST provide:
-     - summary: what was accomplished
-     - next_hints: recommendations for upcoming tasks
+  **Rules**:
+  1. Follow the task plan — check it with `plan_tool get` if unsure
+  2. Work through tasks one at a time, marking them in_progress then completed
+  3. When completing a task, provide a summary and next_hints
+  4. Add all papers, code, datasets as resources via plan_tool add_resource
+  5. Do NOT ask the user questions — if something is ambiguous, make a reasonable
+     decision and document it in your completion report
+  6. Keep pushing through the task list until done or interrupted
+  7. Generate completion reports for each task
 
-  3. **MODE SWITCHING REQUIRES USER APPROVAL**: 
-     - Do NOT switch modes automatically
-     - Use ask_user with suggest_mode to propose a switch
-     - WAIT for the user's response
-
-  4. **STAY IN YOUR LANE**: 
-     - In plan mode: only ask and plan
-     - In research mode: only search and read
-     - In write mode: only write and cite
+  {% else %}
+  ## CURRENT MODE: EXECUTE (default)
+  All tools available except ask_user. Execute the work.
+  {% endif %}
+
+  # Task Management
+
+  - Always create a plan using `plan_tool` before starting work (in Plan mode)
+  - When completing a task, call `plan_tool` update with status="completed",
+    include a `summary` of what was accomplished and `next_hints`
+  - This auto-generates a completion report stored as a resource
+  - ONE task in_progress at a time — complete current before starting next
+  - Use `plan_tool add_resource` to track every paper, code repo, dataset, or doc
+
+  # Paper Writing
+
+  When writing a paper, use the `writing` tool exclusively:
+  1. `create_project` with a title
+  2. `set_outline` with section structure
+  3. `write_section` for each section — the paper is AUTO-SAVED after each write
+  4. `add_citation` for references
+  5. `get_draft` to review the full paper
+
+  **CRITICAL**: Do NOT use the `write` file tool to save papers. The `writing` tool
+  auto-saves to the database and the user can preview/export from the Paper tab.
+  Do NOT call `export` — the user handles export from the UI.
 
   # Code Execution
 
-  Code runs inside a Docker container (/workspace) in ALL modes when needed.
+  Code runs inside a Docker container (/workspace) when needed.
   Before running code: check the environment, install dependencies.
   Never modify the user's host environment directly.
 
@@ -101,10 +88,6 @@ prompt: |
   information that was summarized. Prefer short tool outputs.
   Re-read completion reports (via `plan_tool get`) for context on past work.
 
-  # Knowledge is Outdated
-
-  Your ML library knowledge is outdated. Always verify against documentation.
-
   # Communication
 
   - Concise and direct. No flattery, no emojis.
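
Note on the prompt change above: a minimal sketch of how the per-mode tool availability it describes could be checked. The tool names in `PLAN_TOOLS` are taken from the Plan-mode list in the prompt; `PLAN_TOOLS` and `is_tool_allowed` are hypothetical names for illustration and are not the project's `tool_router`, whose implementation this diff does not show.

```python
# Hypothetical allow-list check mirroring the tool rules stated in the prompt.
PLAN_TOOLS = frozenset({
    "ask_user", "plan_tool", "read_file", "list_dir", "glob_files",
    "grep_search", "web_search", "papers", "github_search",
    "github_read_file", "github_read_repo",
})


def is_tool_allowed(mode: str, tool_name: str) -> bool:
    """Return True if `tool_name` may be called in `mode`."""
    if mode == "plan":
        # Plan mode: only the read-only and planning tools listed in the prompt.
        return tool_name in PLAN_TOOLS
    # Execute mode (and any unrecognized mode): everything except ask_user.
    return tool_name != "ask_user"
```

Keeping the Plan list explicit and treating Execute as "everything but `ask_user`" matches the asymmetry in the prompt text: new tools would be available in Execute mode by default and would need to be opted in to Plan mode.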

diff --git a/backend/openmlr/agent/loop.py b/backend/openmlr/agent/loop.py
@@ -5,9 +5,9 @@
 import traceback
 from typing import Optional

 from .types import AgentEvent, Message, ToolCall, ToolSpec, Submission, OpType, LLMResult
 from .session import Session
 from .context import ContextManager
 from .llm import LLMProvider
 from .doom_loop import detect_doom_loop
 from ..config import AgentConfig
@@ -48,35 +48,17 @@
         session.pending_approval = None
 
     # Set the mode on the tool router for strict enforcement
-    effective_mode = mode if mode in ("plan", "research", "write") else "general"
+    effective_mode = "plan" if mode == "plan" else "execute"
     tool_router.set_mode(effective_mode)
 
-    # Inject per-message mode context if provided
-    if mode and mode in ("plan", "research", "write"):
-        mode_hints = {
-            "plan": (
-                "[Mode: PLAN — STRICT ENFORCEMENT]\n"
-                "- Only ask_user and plan_tool are available\n"
-                "- Do NOT execute any research, writing, or code tools\n"
-                "- Ask clarifying questions and create a plan\n"
-                "- When ready, use ask_user with suggest_mode='research' or 'write' to propose switching"
-            ),
-            "research": (
-                "[Mode: RESEARCH — STRICT ENFORCEMENT]\n"
-                "- Search papers, web, and gather information only\n"
-                "- Do NOT write content or execute code\n"
-                "- Add all sources as resources via plan_tool\n"
-                "- When research is complete, use ask_user with suggest_mode='write' to propose switching"
-            ),
-            "write": (
-                "[Mode: WRITE — STRICT ENFORCEMENT]\n"
-                "- Write and edit content only\n"
-                "- Use writing tool for paper sections\n"
-                "- Reference resources gathered in research phase\n"
-                "- When writing is complete, generate a report via plan_tool"
-            ),
-        }
-        session.context_manager.add_message(Message(role="system", content=mode_hints[mode]))
+    # Inject per-message mode hint (short reinforcement of system prompt rules)
+    mode_hint = (
+        f"[Mode: {effective_mode.upper()}] "
+        + ("Plan only — ask questions, gather context, create plan. No execution."
+           if effective_mode == "plan" else
+           "Execute the plan — do the work, no questions. All tools except ask_user.")
+    )
+    session.context_manager.add_message(Message(role="system", content=mode_hint))
 
     session.context_manager.add_message(Message(role="user", content=user_message))
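
Note on the mode handling above: a small standalone sketch of the normalization and hint construction, useful for checking that legacy mode values now fall back to Execute. `build_mode_hint` is a hypothetical helper, and the hint wording is abbreviated rather than copied from the diff.

```python
from typing import Optional


def build_mode_hint(mode: Optional[str]) -> str:
    """Normalize `mode` to plan/execute and return the short system hint."""
    # Anything other than "plan" (None, "research", "write", typos) becomes "execute".
    effective_mode = "plan" if mode == "plan" else "execute"
    if effective_mode == "plan":
        detail = "Plan only. Ask questions, gather context, create a plan."
    else:
        detail = "Execute the plan. No questions. All tools except ask_user."
    return f"[Mode: {effective_mode.upper()}] {detail}"
```

For example, `build_mode_hint("research")` yields a hint that starts with `[Mode: EXECUTE]`, so clients that still send the removed `research` or `write` modes get Execute behaviour rather than an error.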
 

diff --git a/backend/openmlr/db/operations.py b/backend/openmlr/db/operations.py
@@ -4,7 +4,7 @@
 from sqlalchemy import select, delete, update, func
 from sqlalchemy.ext.asyncio import AsyncSession
 from .models import (
    Conversation, Message, ResearchCorpus, WritingProject, SandboxConfig,
    ConversationTask, ConversationResource, AgentJob, UserSetting,
 )

@@ -324,6 +324,75 @@
     return new_resources
 
 
+PLAN_RESOURCE_ID = "plan-md"
+
+
+async def upsert_plan_resource(db: AsyncSession, conv_id: int, content: str) -> ConversationResource:
+    """Create or update the pinned PLAN.md resource for a conversation."""
+    existing = await get_resource_by_id(db, f"{PLAN_RESOURCE_ID}-{conv_id}")
+    if existing:
+        existing.content = content
+        await db.commit()
+        await db.refresh(existing)
+        return existing
+    return await add_conversation_resource(
+        db, conv_id,
+        title="PLAN.md",
+        resource_type="plan",
+        content=content,
+        resource_id=f"{PLAN_RESOURCE_ID}-{conv_id}",
+    )
+
+
+PAPER_RESOURCE_ID = "paper"
+
+
+async def upsert_paper_resource(
+    db: AsyncSession, conv_id: int, title: str, content: str,
+) -> ConversationResource:
+    """Create or update the paper draft resource for a conversation."""
+    rid = f"{PAPER_RESOURCE_ID}-{conv_id}"
+    existing = await get_resource_by_id(db, rid)
+    if existing:
+        existing.title = title
+        existing.content = content
+        await db.commit()
+        await db.refresh(existing)
+        return existing
+    return await add_conversation_resource(
+        db, conv_id,
+        title=title,
+        resource_type="paper",
+        content=content,
+        resource_id=rid,
+    )
+
+
+async def upsert_resource(
+    db: AsyncSession, conv_id: int,
+    resource_id: str, title: str, resource_type: str,
+    content: str | None = None, url: str | None = None,
+) -> ConversationResource:
+    """Create or update a resource by resource_id."""
+    existing = await get_resource_by_id(db, resource_id)
+    if existing:
+        existing.title = title
+        if content is not None:
+            existing.content = content
+        existing.url = url or existing.url
+        await db.commit()
+        await db.refresh(existing)
+        return existing
+    return await add_conversation_resource(
+        db, conv_id,
+        title=title,
+        resource_type=resource_type,
+        content=content,
+        url=url,
+        resource_id=resource_id,
+    )
+
+
 # ---- Agent Jobs ----
 
 async def create_agent_job(
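
Note on the upsert helpers above: all three follow the same create-or-update pattern keyed on a deterministic resource ID. The sketch below shows that pattern with an in-memory dict instead of the database. `Resource` and `upsert_plan` are hypothetical stand-ins; only the `plan-md-{conv_id}` ID format and the `PLAN.md` title come from the diff.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    resource_id: str
    title: str
    content: str


def upsert_plan(store: dict, conv_id: int, content: str) -> Resource:
    """Create the plan resource for a conversation, or update it in place."""
    rid = f"plan-md-{conv_id}"  # one fixed ID per conversation
    existing = store.get(rid)
    if existing:
        existing.content = content  # update path: same object, new content
        return existing
    store[rid] = Resource(resource_id=rid, title="PLAN.md", content=content)
    return store[rid]
```

Because the ID is derived from the conversation, calling the helper repeatedly never creates a second plan entry; each conversation holds at most one.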

diff --git a/backend/openmlr/routes/agent.py b/backend/openmlr/routes/agent.py
@@ -48,8 +48,8 @@
            while True:
                try:
                    event = await asyncio.wait_for(queue.get(), timeout=25)
                    et = event.get("event_type", "?") if isinstance(event, dict) else "?"
                    payload = f"data: {json.dumps(event)}\n\n"
                    yield payload
                except asyncio.TimeoutError:
                    yield ":ping\n\n"
@@ -73,7 +73,7 @@


 @router.get("/events/test")
 async def events_test(request: Request):
    """Test SSE endpoint — sends 3 events then closes. Use to verify SSE works."""
    import json

@@ -162,7 +162,7 @@
        active_jobs = await job_manager.get_active_jobs(db, conv.id)
        for job_info in active_jobs:
            await job_manager.cancel_job(db, job_info["job_id"])
    except Exception:
        pass

    # Cancel in-process session (cancels agent loop, pending questions, sandbox)
@@ -283,7 +283,7 @@
    )

    # Wire DB persistence once per session
    if not active._persist_wired:
        _wire_persistence(active, db, conv.id)
        active._persist_wired = True

@@ -303,7 +303,7 @@
 @router.get("/jobs/{job_id}")
 async def get_job_status(
    job_id: str,
    user: User = Depends(get_current_user),
    db: AsyncSession = Depends(get_db),
 ):
    """Get the status of a background job."""
@@ -332,7 +332,7 @@
 @router.post("/jobs/{job_id}/cancel")
 async def cancel_job(
    job_id: str,
    user: User = Depends(get_current_user),
    db: AsyncSession = Depends(get_db),
 ):
    """Cancel a queued job."""
@@ -347,7 +347,7 @@
 @router.get("/reports/{report_id}")
 async def get_report(
    report_id: str,
    user: User = Depends(get_current_user),
 ):
    """Get a completion report by ID."""
    from ..tools.plan import get_report_content
@@ -360,8 +360,8 @@
 @router.post("/answers")
 async def submit_answers(
    request: Request,
    user: User = Depends(get_current_user),
    db: AsyncSession = Depends(get_db),
 ):
    """Submit answers to structured questions from ask_user tool."""
    body = await request.json()
@@ -387,12 +387,43 @@
 
 
 @router.post("/interrupt")
-async def interrupt(request: Request, user: User = Depends(get_current_user)):
-    """Cancel the current agent turn."""
-    active = _sm(request).get_current_session()
+async def interrupt(
+    request: Request,
+    user: User = Depends(get_current_user),
+    db: AsyncSession = Depends(get_db),
+):
+    """Cancel the current agent turn (in-process and background workers)."""
+    sm = _sm(request)
+
+    # 1. Cancel the in-process session (works for inline / non-Celery mode)
+    active = sm.get_current_session()
     if active:
         active.session.cancel()
-        await _bus(request).broadcast(AgentEvent(event_type="interrupted"))
+
+    # 2. For background Celery workers: relay interrupt via Redis + revoke task
+    conv_id = sm.current_conversation_id
+    if conv_id:
+        from ..services.redis_pubsub import publish_interrupt
+        await publish_interrupt(conv_id)
+
+        # Also try to revoke active Celery tasks for this conversation
+        try:
+            from ..services.job_manager import get_job_manager, USE_BACKGROUND_JOBS
+            if USE_BACKGROUND_JOBS:
+                job_manager = get_job_manager()
+                active_jobs = await job_manager.get_active_jobs(db, conv_id)
+                for job_info in active_jobs:
+                    jid = job_info["job_id"]
+                    # Revoke with SIGTERM so the worker process is interrupted
+                    if job_manager.celery_app:
+                        job_manager.celery_app.control.revoke(jid, terminate=True, signal="SIGTERM")
+                        logger.info(f"Revoked Celery task {jid} for conversation {conv_id}")
+                    # Mark the job as cancelled in DB
+                    await ops.update_job_status(db, jid, "cancelled")
+        except Exception as e:
+            logger.warning(f"Failed to revoke background jobs: {e}")
+
+    await _bus(request).broadcast(AgentEvent(event_type="interrupted"))
     return {"ok": True}
 
 
@@ -400,11 +431,11 @@
 async def submit_approval(
    body: ApprovalRequest,
    request: Request,
    user: User = Depends(get_current_user),
 ):
    active = _sm(request).get_current_session()
    if active and active.session.pending_approval:
        from ..agent.loop import _handle_approval
        asyncio.create_task(
            _handle_approval(active.session, active.tool_router, body.approvals)
        )
@@ -412,19 +443,19 @@


 @router.post("/undo")
 async def undo(request: Request, user: User = Depends(get_current_user)):
    active = _sm(request).get_current_session()
    if active:
        from ..agent.loop import _undo
        await _undo(active.session)
    return {"ok": True}


 @router.post("/compact")
 async def compact(request: Request, user: User = Depends(get_current_user)):
    active = _sm(request).get_current_session()
    if active:
        from ..agent.loop import _compact
        await _compact(active.session)
    return {"ok": True}

@@ -498,7 +529,7 @@
                    "tool_call_id": event.data.get("tool_call_id"),
                    "success": event.data.get("success"),
                })
        except Exception:
            pass
    active.session.on_event(_persist)
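
Note on the `/interrupt` handler above: the order is cancel in-process, publish the Redis flag, revoke background jobs, then broadcast regardless of failures. The sketch below models that ordering with plain callables. It is a variant, not the handler's code: the handler wraps the whole revoke loop in one `try`, so the first failure skips the remaining jobs, whereas this version isolates failures per job. All names here are hypothetical.

```python
from typing import Callable, Iterable, List


def interrupt_all(
    cancel_local: Callable[[], None],
    publish_flag: Callable[[], None],
    revoke_job: Callable[[str], None],
    job_ids: Iterable[str],
    broadcast: Callable[[], None],
) -> List[str]:
    """Run every cancellation step; return job IDs that failed to revoke."""
    failed: List[str] = []
    cancel_local()   # step 1: in-process session
    publish_flag()   # step 2: signal background workers
    for jid in job_ids:
        try:
            revoke_job(jid)
        except Exception:
            failed.append(jid)  # keep going so one bad job does not block the rest
    broadcast()      # always notify listeners, even if some revokes failed
    return failed
```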


diff --git a/backend/openmlr/services/redis_pubsub.py b/backend/openmlr/services/redis_pubsub.py
@@ -156,6 +156,41 @@ async def publish_answers(conversation_id: int, answers: dict) -> None:
         logger.warning(f"Failed to publish answers to Redis: {e}")
 
 
+INTERRUPT_KEY_PREFIX = "openmlr:interrupt:"
+
+
+async def publish_interrupt(conversation_id: int) -> None:
+    """Set a Redis key to signal interruption to a background worker."""
+    try:
+        client = await get_redis()
+        key = f"{INTERRUPT_KEY_PREFIX}{conversation_id}"
+        await client.set(key, "1", ex=60)  # TTL 60 seconds
+        logger.info(f"Published interrupt for conversation {conversation_id}")
+    except Exception as e:
+        logger.warning(f"Failed to publish interrupt to Redis: {e}")
+
+
+async def check_interrupt(conversation_id: int) -> bool:
+    """Check whether an interrupt signal exists for the given conversation."""
+    try:
+        client = await get_redis()
+        key = f"{INTERRUPT_KEY_PREFIX}{conversation_id}"
+        return (await client.exists(key)) > 0
+    except Exception as e:
+        logger.warning(f"Failed to check interrupt in Redis: {e}")
+        return False
+
+
+async def clear_interrupt(conversation_id: int) -> None:
+    """Remove the interrupt key after it has been consumed."""
+    try:
+        client = await get_redis()
+        key = f"{INTERRUPT_KEY_PREFIX}{conversation_id}"
+        await client.delete(key)
+    except Exception as e:
+        logger.warning(f"Failed to clear interrupt in Redis: {e}")
+
+
 async def wait_for_answers(conversation_id: int, timeout: float = 300) -> dict | None:
     """Wait for user answers from Redis. Used by background worker's ask_user handler."""
     try:
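
Note on the interrupt helpers added above: this diff does not show how a worker consumes the flag, so the following is only a sketch of one plausible loop, assuming the worker checks the key between steps and clears it once seen. `FlagStore` is an in-memory stand-in for the Redis `SET ... EX`, `EXISTS`, and `DELETE` calls, and `run_steps` is a hypothetical name.

```python
import time


class FlagStore:
    """In-memory stand-in for a key store with per-key expiry."""

    def __init__(self):
        self._deadline = {}

    def set(self, key, ttl):
        self._deadline[key] = time.monotonic() + ttl

    def exists(self, key):
        deadline = self._deadline.get(key)
        return deadline is not None and time.monotonic() < deadline

    def delete(self, key):
        self._deadline.pop(key, None)


def run_steps(store, conv_id, steps):
    """Run steps in order, stopping as soon as the interrupt flag is seen."""
    key = f"openmlr:interrupt:{conv_id}"
    results = []
    for step in steps:
        if store.exists(key):
            store.delete(key)  # consume the flag so the next turn starts clean
            break
        results.append(step())
    return results
```

The TTL acts as a safety net: if no worker ever consumes the flag, it expires on its own instead of cancelling a later, unrelated turn.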