Merged
141 changes: 62 additions & 79 deletions backend/configs/prompts/system_prompt.yaml
@@ -1,97 +1,84 @@
title: OpenMLR System Prompt
version: 4
version: 5

prompt: |
You are OpenMLR, an ML research intern. You plan, research, write, and
execute ML work end-to-end.
You are OpenMLR, an ML research intern. You help users plan, research,
write, and execute ML work end-to-end.

# Clarification First
# Mode System

**CRITICAL**: Before taking any significant action, ask clarifying questions
using the `ask_user` tool with `allow_text: true` on every question.
Do NOT assume which model, dataset, approach, hardware, or scope to use.
Ask at least one clarifying question for non-trivial tasks.

# Task Management

- Always create a plan using `plan_tool` before starting work.
- When proposing a new plan or adding tasks, explain what you're planning
and why. Use `ask_user` to get approval for significant plan changes.
- When completing a task, call `plan_tool` update with status="completed",
include a `summary` of what was accomplished and `next_hints` for upcoming tasks.
This auto-generates a completion report stored as a resource.
- After each task completion, evaluate: does the remaining plan still make sense?
If not, propose changes to the user before continuing.
- Keep pushing forward through the task list. Don't stop after one task unless
waiting for user input.
- Use `plan_tool add_resource` to track every paper, code repo, dataset, or
doc you reference. This builds a knowledge base for the session.

# Per-Message Modes — STRICT ENFORCEMENT
The user controls which mode you operate in. There are two modes:

{% if mode == "plan" %}
## Plan Mode — RESTRICTED
**ONLY** these tools are available: ask_user, plan_tool, read_file, list_dir
## CURRENT MODE: PLAN

DO NOT call: web_search, papers, research, writing, or any execution tools.
These will fail with a mode violation error.
You are in **Plan mode**. Your job is to understand the task, ask clarifying
questions, gather context, and produce a comprehensive plan.

Your job in plan mode:
1. Ask clarifying questions using ask_user
2. Create a plan using plan_tool
3. When ready, use ask_user with suggest_mode='research' to propose switching
**Available tools**: ask_user, plan_tool, read_file, list_dir, glob_files,
grep_search, web_search, papers, github_search, github_read_file, github_read_repo

WAIT for the user to approve the mode switch.
{% elif mode == "research" %}
## Research Mode — RESTRICTED
**ONLY** these tools are available: ask_user, plan_tool, web_search, papers,
research, read_file, github_search, github_read_file
**NOT available**: writing, research sub-agent, sandbox/code execution tools.
Calls to unavailable tools will be rejected.

DO NOT call: writing tool or execution tools.
**Rules**:
1. Ask clarifying questions using `ask_user` before making assumptions
2. Search the web, papers, and code repos to gather context
3. Create a structured plan using `plan_tool` with clear, actionable tasks
4. The plan is auto-saved as PLAN.md in resources — the user can see it
5. Do NOT execute any work — plan only
6. Do NOT write content, run code, or make changes
7. Be thorough in your plan — it will be the blueprint for Execute mode

Your job in research mode:
1. Search papers and web for information
2. Add all sources as resources via plan_tool add_resource
3. Complete tasks with summaries and reports
4. When research is complete, use ask_user with suggest_mode='write'
{% elif mode == "execute" %}
## CURRENT MODE: EXECUTE

WAIT for the user to approve the mode switch.
{% elif mode == "write" %}
## Write Mode — RESTRICTED
**ONLY** these tools are available: ask_user, plan_tool, writing,
read_file, web_search, papers (for citations)
You are in **Execute mode**. Your job is to follow the plan and do the work.
Do NOT ask questions — just execute.

Your job in write mode:
1. Write content using the writing tool
2. Reference resources from research phase
3. Generate completion reports for each section
{% else %}
## General Mode
All tools available. Ask clarifying questions first.
{% endif %}

# CRITICAL WORKFLOW RULES

1. **ONE TASK AT A TIME**: You can only have one task in_progress at a time.
Complete the current task with a summary and report before starting the next.
**Available tools**: ALL tools EXCEPT ask_user.
Calls to ask_user will be rejected.

2. **COMPLETION REPORTS REQUIRED**: When marking a task completed, you MUST provide:
- summary: what was accomplished
- next_hints: recommendations for upcoming tasks
**Rules**:
1. Follow the task plan — check it with `plan_tool get` if unsure
2. Work through tasks one at a time, marking them in_progress then completed
3. When completing a task, provide a summary and next_hints
4. Add all papers, code, datasets as resources via plan_tool add_resource
5. Do NOT ask the user questions — if something is ambiguous, make a reasonable
decision and document it in your completion report
6. Keep pushing through the task list until done or interrupted
7. Generate completion reports for each task

3. **MODE SWITCHING REQUIRES USER APPROVAL**:
- Do NOT switch modes automatically
- Use ask_user with suggest_mode to propose a switch
- WAIT for the user's response

4. **STAY IN YOUR LANE**:
- In plan mode: only ask and plan
- In research mode: only search and read
- In write mode: only write and cite
{% else %}
## CURRENT MODE: EXECUTE (default)
All tools available except ask_user. Execute the work.
{% endif %}

# Task Management

- Always create a plan using `plan_tool` before starting work (in Plan mode)
- When completing a task, call `plan_tool` update with status="completed",
include a `summary` of what was accomplished and `next_hints`
- This auto-generates a completion report stored as a resource
- ONE task in_progress at a time — complete current before starting next
- Use `plan_tool add_resource` to track every paper, code repo, or doc

# Paper Writing

When writing a paper, use the `writing` tool exclusively:
1. `create_project` with a title
2. `set_outline` with section structure
3. `write_section` for each section — the paper is AUTO-SAVED after each write
4. `add_citation` for references
5. `get_draft` to review the full paper

**CRITICAL**: Do NOT use the `write` file tool to save papers. The `writing` tool
auto-saves to the database and the user can preview/export from the Paper tab.
Do NOT call `export` — the user handles export from the UI.

# Code Execution

Code runs inside a Docker container (/workspace) in ALL modes when needed.
Code runs inside a Docker container (/workspace) when needed.
Before running code: check the environment, install dependencies.
Never modify the user's host environment directly.

@@ -101,10 +88,6 @@ prompt: |
information that was summarized. Prefer short tool outputs.
Re-read completion reports (via `plan_tool get`) for context on past work.

# Knowledge is Outdated

Your ML library knowledge is outdated. Always verify against documentation.

# Communication

- Concise and direct. No flattery, no emojis.
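
The two-mode tool policy described in the Mode System section of this prompt can be sketched as a simple allowlist check. This is a minimal illustration only: the real enforcement lives behind `tool_router.set_mode`, which is not shown in this diff, so `PLAN_MODE_TOOLS`, `EXECUTE_MODE_BLOCKED`, and `is_tool_allowed` are hypothetical names. The tool names themselves are copied from the prompt text.

```python
# Hypothetical sketch of mode-based tool gating; not the project's actual router.
PLAN_MODE_TOOLS = frozenset({
    "ask_user", "plan_tool", "read_file", "list_dir", "glob_files",
    "grep_search", "web_search", "papers", "github_search",
    "github_read_file", "github_read_repo",
})

# Execute mode allows everything except asking the user questions.
EXECUTE_MODE_BLOCKED = frozenset({"ask_user"})


def is_tool_allowed(mode: str, tool_name: str) -> bool:
    """Return True if `tool_name` may be called in `mode`."""
    if mode == "plan":
        # Plan mode is an allowlist: anything not listed is rejected.
        return tool_name in PLAN_MODE_TOOLS
    # Execute mode (also the default) is a denylist.
    return tool_name not in EXECUTE_MODE_BLOCKED
```

Under these assumptions, `is_tool_allowed("plan", "writing")` is rejected while `is_tool_allowed("execute", "writing")` is accepted, which matches the "NOT available" and "ALL tools EXCEPT ask_user" wording in the prompt.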
36 changes: 9 additions & 27 deletions backend/openmlr/agent/loop.py
@@ -5,9 +5,9 @@
import traceback
from typing import Optional

from .types import AgentEvent, Message, ToolCall, ToolSpec, Submission, OpType, LLMResult

# Qodana warning (line 8): unused import statement `ToolSpec`
from .session import Session
from .context import ContextManager

# Qodana warning (line 10): unused import statement `from .context import ContextManager`
from .llm import LLMProvider
from .doom_loop import detect_doom_loop
from ..config import AgentConfig
@@ -48,35 +48,17 @@
session.pending_approval = None

# Set the mode on the tool router for strict enforcement
effective_mode = mode if mode in ("plan", "research", "write") else "general"
effective_mode = mode if mode in ("plan", "execute") else "execute"
tool_router.set_mode(effective_mode)

# Inject per-message mode context if provided
if mode and mode in ("plan", "research", "write"):
mode_hints = {
"plan": (
"[Mode: PLAN — STRICT ENFORCEMENT]\n"
"- Only ask_user and plan_tool are available\n"
"- Do NOT execute any research, writing, or code tools\n"
"- Ask clarifying questions and create a plan\n"
"- When ready, use ask_user with suggest_mode='research' or 'write' to propose switching"
),
"research": (
"[Mode: RESEARCH — STRICT ENFORCEMENT]\n"
"- Search papers, web, and gather information only\n"
"- Do NOT write content or execute code\n"
"- Add all sources as resources via plan_tool\n"
"- When research is complete, use ask_user with suggest_mode='write' to propose switching"
),
"write": (
"[Mode: WRITE — STRICT ENFORCEMENT]\n"
"- Write and edit content only\n"
"- Use writing tool for paper sections\n"
"- Reference resources gathered in research phase\n"
"- When writing is complete, generate a report via plan_tool"
),
}
session.context_manager.add_message(Message(role="system", content=mode_hints[mode]))
# Inject per-message mode hint (short reinforcement of system prompt rules)
mode_hint = (
f"[Mode: {effective_mode.upper()}] "
+ ("Plan only — ask questions, gather context, create plan. No execution."
if effective_mode == "plan" else
"Execute the plan — do the work, no questions. All tools except ask_user.")
)
session.context_manager.add_message(Message(role="system", content=mode_hint))

session.context_manager.add_message(Message(role="user", content=user_message))
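
One consequence of the new mode handling is that the removed modes (`research`, `write`) and a missing mode all fall back to `execute`. The fallback can be isolated into pure functions for testing, as in the sketch below. `resolve_mode` and `build_mode_hint` are hypothetical helper names, and the hint strings are abbreviated rather than copied from the diff.

```python
from typing import Optional

VALID_MODES = ("plan", "execute")


def resolve_mode(mode: Optional[str]) -> str:
    """Map any unknown or missing mode to 'execute', mirroring the diff's fallback."""
    return mode if mode in VALID_MODES else "execute"


def build_mode_hint(mode: Optional[str]) -> str:
    """Build the short per-message reinforcement line for the resolved mode."""
    effective = resolve_mode(mode)
    if effective == "plan":
        body = "Plan only: ask questions, gather context, create plan."
    else:
        body = "Execute the plan: do the work, no questions."
    return f"[Mode: {effective.upper()}] {body}"
```

A client that still sends `mode="research"` would therefore be treated as Execute mode, where `ask_user` is unavailable.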

69 changes: 69 additions & 0 deletions backend/openmlr/db/operations.py
@@ -4,7 +4,7 @@
from sqlalchemy import select, delete, update, func
from sqlalchemy.ext.asyncio import AsyncSession
from .models import (
Conversation, Message, ResearchCorpus, WritingProject, SandboxConfig,

# Qodana warnings (line 7): unused import statements `ResearchCorpus`, `WritingProject`, `SandboxConfig`
ConversationTask, ConversationResource, AgentJob, UserSetting,
)

@@ -324,6 +324,75 @@
return new_resources


PLAN_RESOURCE_ID = "plan-md"


async def upsert_plan_resource(db: AsyncSession, conv_id: int, content: str) -> ConversationResource:
"""Create or update the pinned PLAN.md resource for a conversation."""
existing = await get_resource_by_id(db, f"{PLAN_RESOURCE_ID}-{conv_id}")
if existing:
existing.content = content
await db.commit()
await db.refresh(existing)
return existing
return await add_conversation_resource(
db, conv_id,
title="PLAN.md",
resource_type="plan",
content=content,
resource_id=f"{PLAN_RESOURCE_ID}-{conv_id}",
)


PAPER_RESOURCE_ID = "paper"


async def upsert_paper_resource(
db: AsyncSession, conv_id: int, title: str, content: str,
) -> ConversationResource:
"""Create or update the paper draft resource for a conversation."""
rid = f"{PAPER_RESOURCE_ID}-{conv_id}"
existing = await get_resource_by_id(db, rid)
if existing:
existing.title = title
existing.content = content
await db.commit()
await db.refresh(existing)
return existing
return await add_conversation_resource(
db, conv_id,
title=title,
resource_type="paper",
content=content,
resource_id=rid,
)


async def upsert_resource(
db: AsyncSession, conv_id: int,
resource_id: str, title: str, resource_type: str,
content: str | None = None, url: str | None = None,
) -> ConversationResource:
"""Create or update a resource by resource_id."""
existing = await get_resource_by_id(db, resource_id)
if existing:
existing.title = title
existing.content = content
if url:
existing.url = url
await db.commit()
await db.refresh(existing)
return existing
return await add_conversation_resource(
db, conv_id,
title=title,
resource_type=resource_type,
content=content,
url=url,
resource_id=resource_id,
)
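
The three upsert helpers above share one pattern: derive a deterministic `resource_id` (for example `plan-md-<conv_id>`), update the row if it exists, otherwise create it. The sketch below reproduces that pattern against an in-memory dict so the behavior can be checked without a database. `Resource` and `InMemoryResourceStore` are hypothetical stand-ins for `ConversationResource` and the SQLAlchemy session. It also makes one detail of `upsert_resource` visible: on update, `content` is always overwritten (even with `None`), while `url` is only replaced when a non-empty value is passed.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Resource:
    # Hypothetical stand-in for the ConversationResource model.
    resource_id: str
    conv_id: int
    title: str
    resource_type: str
    content: Optional[str] = None
    url: Optional[str] = None


class InMemoryResourceStore:
    """Dict-backed store that mimics the create-or-update flow in the diff."""

    def __init__(self) -> None:
        self.rows: Dict[str, Resource] = {}

    def upsert(self, conv_id: int, resource_id: str, title: str,
               resource_type: str, content: Optional[str] = None,
               url: Optional[str] = None) -> Resource:
        existing = self.rows.get(resource_id)
        if existing:
            existing.title = title
            existing.content = content  # overwritten even when None
            if url:                     # url is kept unless a new one is given
                existing.url = url
            return existing
        row = Resource(resource_id, conv_id, title, resource_type, content, url)
        self.rows[resource_id] = row
        return row


def plan_resource_id(conv_id: int) -> str:
    """Same ID scheme as the diff: one pinned PLAN.md per conversation."""
    return f"plan-md-{conv_id}"
```

Calling `upsert` twice with `plan_resource_id(7)` leaves a single row whose content is the second value, which is the property that keeps PLAN.md pinned rather than duplicated.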


# ---- Agent Jobs ----

async def create_agent_job(
39 changes: 35 additions & 4 deletions backend/openmlr/routes/agent.py
@@ -48,8 +48,8 @@
while True:
try:
event = await asyncio.wait_for(queue.get(), timeout=25)
et = event.get("event_type", "?") if isinstance(event, dict) else "?"

# Qodana notice (line 51): local variable 'et' value is not used
payload = f"data: {json.dumps(event)}\n\n"

# Qodana notice (line 52): shadows name 'payload' from outer scope
yield payload
except asyncio.TimeoutError:
yield ":ping\n\n"
@@ -73,7 +73,7 @@


@router.get("/events/test")
async def events_test(request: Request):

# Qodana notice (line 76): parameter 'request' value is not used
"""Test SSE endpoint — sends 3 events then closes. Use to verify SSE works."""
import json

@@ -162,7 +162,7 @@
active_jobs = await job_manager.get_active_jobs(db, conv.id)
for job_info in active_jobs:
await job_manager.cancel_job(db, job_info["job_id"])
except Exception:

# Qodana notice (line 165): too broad exception clause
pass

# Cancel in-process session (cancels agent loop, pending questions, sandbox)
@@ -283,7 +283,7 @@
)

# Wire DB persistence once per session
if not active._persist_wired:

# Qodana notice (line 286): access to a protected member _persist_wired of a class
_wire_persistence(active, db, conv.id)
active._persist_wired = True

@@ -303,7 +303,7 @@
@router.get("/jobs/{job_id}")
async def get_job_status(
job_id: str,
user: User = Depends(get_current_user),

# Qodana notice (line 306): parameter 'user' value is not used
db: AsyncSession = Depends(get_db),
):
"""Get the status of a background job."""
@@ -332,7 +332,7 @@
@router.post("/jobs/{job_id}/cancel")
async def cancel_job(
job_id: str,
user: User = Depends(get_current_user),

# Qodana notice (line 335): parameter 'user' value is not used
db: AsyncSession = Depends(get_db),
):
"""Cancel a queued job."""
@@ -347,7 +347,7 @@
@router.get("/reports/{report_id}")
async def get_report(
report_id: str,
user: User = Depends(get_current_user),

# Qodana notice (line 350): parameter 'user' value is not used
):
"""Get a completion report by ID."""
from ..tools.plan import get_report_content
@@ -360,8 +360,8 @@
@router.post("/answers")
async def submit_answers(
request: Request,
user: User = Depends(get_current_user),

# Qodana notice (line 363): parameter 'user' value is not used
db: AsyncSession = Depends(get_db),

# Qodana notice (line 364): parameter 'db' value is not used
):
"""Submit answers to structured questions from ask_user tool."""
body = await request.json()
@@ -387,12 +387,43 @@


@router.post("/interrupt")
async def interrupt(request: Request, user: User = Depends(get_current_user)):
"""Cancel the current agent turn."""
active = _sm(request).get_current_session()
async def interrupt(
request: Request,
user: User = Depends(get_current_user),

# Qodana notice (line 392): parameter 'user' value is not used
db: AsyncSession = Depends(get_db),
):
"""Cancel the current agent turn (in-process and background workers)."""
sm = _sm(request)

# 1. Cancel the in-process session (works for inline / non-Celery mode)
active = sm.get_current_session()
if active:
active.session.cancel()
await _bus(request).broadcast(AgentEvent(event_type="interrupted"))

# 2. For background Celery workers: relay interrupt via Redis + revoke task
conv_id = sm.current_conversation_id
if conv_id:
from ..services.redis_pubsub import publish_interrupt
await publish_interrupt(conv_id)

# Also try to revoke active Celery tasks for this conversation
try:
from ..services.job_manager import get_job_manager, USE_BACKGROUND_JOBS
if USE_BACKGROUND_JOBS:
job_manager = get_job_manager()
active_jobs = await job_manager.get_active_jobs(db, conv_id)
for job_info in active_jobs:
jid = job_info["job_id"]
# Revoke with SIGTERM so the worker process is interrupted
if job_manager.celery_app:
job_manager.celery_app.control.revoke(jid, terminate=True, signal="SIGTERM")
logger.info(f"Revoked Celery task {jid} for conversation {conv_id}")
# Mark the job as cancelled in DB
await ops.update_job_status(db, jid, "cancelled")
except Exception as e:
logger.warning(f"Failed to revoke background jobs: {e}")

await _bus(request).broadcast(AgentEvent(event_type="interrupted"))
return {"ok": True}


@@ -400,11 +431,11 @@
async def submit_approval(
body: ApprovalRequest,
request: Request,
user: User = Depends(get_current_user),

# Qodana notice (line 434): parameter 'user' value is not used
):
active = _sm(request).get_current_session()
if active and active.session.pending_approval:
from ..agent.loop import _handle_approval

# Qodana notice (line 438): access to a protected member _handle_approval of a module
asyncio.create_task(
_handle_approval(active.session, active.tool_router, body.approvals)
)
@@ -412,19 +443,19 @@


@router.post("/undo")
async def undo(request: Request, user: User = Depends(get_current_user)):

# Qodana notice (line 446): parameter 'user' value is not used
active = _sm(request).get_current_session()
if active:
from ..agent.loop import _undo

# Qodana notice (line 449): access to a protected member _undo of a module
await _undo(active.session)
return {"ok": True}


@router.post("/compact")
async def compact(request: Request, user: User = Depends(get_current_user)):

# Qodana notice (line 455): parameter 'user' value is not used
active = _sm(request).get_current_session()
if active:
from ..agent.loop import _compact

# Qodana notice (line 458): access to a protected member _compact of a module
await _compact(active.session)
return {"ok": True}

@@ -498,7 +529,7 @@
"tool_call_id": event.data.get("tool_call_id"),
"success": event.data.get("success"),
})
except Exception:

# Qodana notice (line 532): too broad exception clause
pass
active.session.on_event(_persist)

35 changes: 35 additions & 0 deletions backend/openmlr/services/redis_pubsub.py
@@ -156,6 +156,41 @@ async def publish_answers(conversation_id: int, answers: dict) -> None:
logger.warning(f"Failed to publish answers to Redis: {e}")


INTERRUPT_KEY_PREFIX = "openmlr:interrupt:"


async def publish_interrupt(conversation_id: int) -> None:
"""Set a Redis key to signal interruption to a background worker."""
try:
client = await get_redis()
key = f"{INTERRUPT_KEY_PREFIX}{conversation_id}"
await client.set(key, "1", ex=60) # TTL 60 seconds
logger.info(f"Published interrupt for conversation {conversation_id}")
except Exception as e:
logger.warning(f"Failed to publish interrupt to Redis: {e}")


async def check_interrupt(conversation_id: int) -> bool:
"""Check whether an interrupt signal exists for the given conversation."""
try:
client = await get_redis()
key = f"{INTERRUPT_KEY_PREFIX}{conversation_id}"
return await client.exists(key) > 0
except Exception as e:
logger.warning(f"Failed to check interrupt in Redis: {e}")
return False


async def clear_interrupt(conversation_id: int) -> None:
"""Remove the interrupt key after it has been consumed."""
try:
client = await get_redis()
key = f"{INTERRUPT_KEY_PREFIX}{conversation_id}"
await client.delete(key)
except Exception as e:
logger.warning(f"Failed to clear interrupt in Redis: {e}")
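
The diff adds the publishing side of the interrupt flag but does not show how a background worker consumes it. One plausible consumer, sketched below under that assumption, checks the flag between units of work and clears it once seen. `FakeInterruptStore` is a hypothetical in-memory replacement for the Redis client, and `run_steps` is an illustrative worker loop, not code from this PR. The key format matches `INTERRUPT_KEY_PREFIX` above.

```python
import asyncio

INTERRUPT_KEY_PREFIX = "openmlr:interrupt:"


class FakeInterruptStore:
    """In-memory stand-in for the Redis-backed publish/check/clear helpers."""

    def __init__(self) -> None:
        self._keys = set()

    async def publish(self, conversation_id: int) -> None:
        self._keys.add(f"{INTERRUPT_KEY_PREFIX}{conversation_id}")

    async def check(self, conversation_id: int) -> bool:
        return f"{INTERRUPT_KEY_PREFIX}{conversation_id}" in self._keys

    async def clear(self, conversation_id: int) -> None:
        self._keys.discard(f"{INTERRUPT_KEY_PREFIX}{conversation_id}")


async def run_steps(store, conversation_id, steps):
    """Run (name, coroutine function) steps, stopping at the first interrupt."""
    completed = []
    for name, action in steps:
        # Poll before each step; consume the flag so it does not leak
        # into the next turn of the same conversation.
        if await store.check(conversation_id):
            await store.clear(conversation_id)
            break
        await action()
        completed.append(name)
    return completed


async def demo():
    store = FakeInterruptStore()

    async def noop():
        pass

    async def user_interrupts():
        # Simulates the /interrupt endpoint firing while this step runs.
        await store.publish(7)

    steps = [("load", noop), ("train", user_interrupts), ("evaluate", noop)]
    completed = await run_steps(store, 7, steps)
    return completed, await store.check(7)
```

Because the flag is only polled between steps, a step that is already running finishes first. That gap is presumably why the endpoint also revokes the Celery task with `SIGTERM`, and the 60-second TTL bounds how long a flag that is never consumed can linger.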


async def wait_for_answers(conversation_id: int, timeout: float = 300) -> dict | None:
"""Wait for user answers from Redis. Used by background worker's ask_user handler."""
try: