Skip to content

refactor(routing): LLM-driven routing cleanup — prompt-driven first routes#18

Merged
DONGRYEOLLEE1 merged 2 commits into
mainfrom
refactor/llm-driven-routing-cleanup
May 22, 2026
Merged

refactor(routing): LLM-driven routing cleanup — prompt-driven first routes#18
DONGRYEOLLEE1 merged 2 commits into
mainfrom
refactor/llm-driven-routing-cleanup

Conversation

@DONGRYEOLLEE1
Copy link
Copy Markdown
Owner

Summary

Two-step cleanup sweep that finishes the LLM-driven routing policy
(CLAUDE.md §"Supervisor → Sub-agent Handoff Policy" P1–P5). All routing
decisions now flow through RouterDecision + 4 P3 safeguards only; every
"first worker for domain X" intent lives in prompt-kit and is locked by
a regression test.

Commit 1 — 00568bf — remove rule-based heuristics

  • Delete _build_simple_research_plan keyword-dictionary heuristic in planner.
  • Delete dead _orchagent_identity_response (handled by SYSTEM_SUPERVISOR_PROMPT # IDENTITY).
  • Replace the heuristic-locking planner test with an LLM-driven regression test.

Commit 2 — 26138c2 — prompt-driven first routes + remove pre-LLM shortcuts

  • Drop the pre-LLM dispatch-limit shortcut in team_supervisor.py (and its dead helper). Dispatch ceiling is now a post-decision P3 safeguard only.
  • Drop the inline coding_team + repo_binding override in head_supervisor.py. The intent is expressed in prompt-kit instead so the router LLM owns the decision.
  • Add # REQUIRED FIRST ROUTES block to SYSTEM_SUPERVISOR_PROMPT (v2.7) — pins first worker for data / vision / research / coding / writing / FINISH.
  • Add # WRITING TEAM HANDOFF + # VISION TEAM HANDOFF blocks to TEAM_SUPERVISOR_PROMPT (v1.5); pin vision_analyst as the actual Vision Team first worker.
  • New test_routing_prompts.py locks the prompt contract so future drift fails CI.
  • New test_supervisor.py::test_team_dispatch_limit_runs_after_llm_decision locks the post-LLM safeguard ordering.
  • test_router_safeguards.py::test_public_safeguard_surface_is_limited_to_policy_functions locks the safeguard surface to exactly the 4 P3 functions.

Test plan

  • cd apps/backend && PYTHONPATH=. uv run pytest tests -q190 passed
  • grep -E "_should_force_|_APPROVAL_PATTERNS|_build_simple_research_plan|_orchagent_identity_response|reject_coding_team_without_repo_binding|_force_finish_due_to_dispatch_limit" packages/agent-core/src packages/prompt-kit/src apps/backend/workflow0 hits in code body
  • Playwright UI scenarios (CSV / image / latest news / greeting) — deferred to follow-up; Codex sandbox blocked browser_navigate and local dev-server access.

Known follow-ups (separate commits, not in this PR)

  • CLAUDE.md §"도메인별 첫 분기 의무" still lists vision_team workers as image_inspector/image_editor; current Vision Team only exposes vision_analyst. Doc update is a separate commit.

Plans: plans/ENFORCED_ROUTING_TO_LLM_DRIVEN_PLAN.md, plans/llm-routing-fix.md.

🤖 Generated with Claude Code

DONGRYEOLLEE1 and others added 2 commits May 22, 2026 15:39
…g safeguard

Sweep packages/agent-core for any remaining rule-based routing patterns and
move the LLM-driven policy (CLAUDE.md §"Supervisor → Sub-agent Handoff
Policy" P1-P5) to its full conclusion.

- planner.py: delete `_build_simple_research_plan` keyword-dictionary
  heuristic (and the dead `_extract_latest_user_text` helper that fed it).
  All plan generation now goes through the LLM `TaskPlan` structured
  output — `PLANNER_PROMPT` already covers the lightweight research case.
- head_supervisor.py + supervisor.py: delete the dead
  `_orchagent_identity_response` keyword fallback and its companions
  (`_extract_message_text`, `_latest_user_request_text`); identity Qs are
  handled by `SYSTEM_SUPERVISOR_PROMPT` `# IDENTITY` block, not by code.
- safeguards.py: add `reject_coding_team_without_repo_binding` — extracts
  the previously-inline coding_team/repo_binding block from
  head_supervisor into the canonical P3 chain. Now surfaces a
  `safeguard:` reason on the SSE `route` event (P4 visibility).
- head_supervisor.py: invoke the new safeguard BEFORE HITL so users are
  not asked to approve a dispatch the runtime cannot execute.
- tests: replace the heuristic-locking planner test with an LLM-driven
  regression test (`RecordingPlannerLLM.called` must be True). Add three
  safeguard unit tests (pass-through, force-FINISH, non-coding-team
  no-op) under the new P3 contract.

Plan: plans/ENFORCED_ROUTING_TO_LLM_DRIVEN_PLAN.md (all phases checked).

Validation:
- pytest tests -q → 188 passed (185 baseline + 3 new safeguard cases).
- grep -rE "_should_force_|_APPROVAL_PATTERNS|_build_simple_research_plan|_orchagent_identity_response" packages/agent-core/src → 0 hits in code body.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up sweep on top of 00568bf to push the LLM-driven routing policy
(CLAUDE.md §"Supervisor → Sub-agent Handoff Policy" P1-P5) further. Move
the remaining "code decides before the LLM does" branches over to either
a prompt-kit guidance line or a post-decision safeguard.

- safeguards.py: drop `reject_coding_team_without_repo_binding`. The
  intent is now expressed as a `# REQUIRED FIRST ROUTES` guideline to the
  router LLM so the model itself avoids `coding_team` without a bound
  repo. Public surface is back to the canonical 4 safeguards.
- supervisors/head_supervisor.py: drop the inline repo-binding override
  introduced in 00568bf. The router LLM owns the decision; the safeguard
  chain still catches invalid gotos and team-redirect loops.
- supervisors/team_supervisor.py: drop the pre-LLM dispatch-limit
  shortcut (and the dead `_force_finish_due_to_dispatch_limit` helper).
  Dispatch limit is now applied only as a post-decision P3 safeguard via
  `decide_route()`, costing one extra LLM call per saturated turn in
  exchange for full P3 consistency.
- prompt-kit/prompts.py:
  * SYSTEM_SUPERVISOR_PROMPT v2.7 — new `# REQUIRED FIRST ROUTES` block
    pins the first worker for all six domains (data / vision / research
    / coding / writing / FINISH) so the LLM has the contract in prompt.
  * TEAM_SUPERVISOR_PROMPT v1.5 — new `# WRITING TEAM HANDOFF` and
    `# VISION TEAM HANDOFF` sections; pins `vision_analyst` as the real
    Vision Team first worker (matches current member list).
  * Minor wording (`keyword` → `term`) in unrelated title/suggestion
    prompts to keep the policy-grep audit noise-free.
- router_schema.py: clean stale `_should_force_approval` references from
  the docstrings.
- tests/test_router_safeguards.py: pin the public safeguard surface to
  exactly the 4 P3 policy functions (regression guard so a 5th can't
  sneak back in).
- tests/test_supervisor.py: add coverage that `max_team_dispatches=0`
  still runs the LLM once and then routes via the safeguard, not via a
  pre-LLM branch.
- tests/test_routing_prompts.py (new): pin the `# REQUIRED FIRST ROUTES`
  block and per-team handoff guidance so prompt drift fails CI.

Plan: plans/llm-routing-fix.md (Phase 1-2 done, Phase 3 Playwright
checks deferred to a follow-up — sandbox blocked `browser_navigate` and
local dev-server access).

Validation:
- pytest tests -q → 190 passed.
- grep -E "_should_force|_APPROVAL_PATTERNS|reject_coding_team_without_repo_binding|_force_finish_due_to_dispatch_limit" packages/agent-core/src packages/prompt-kit/src apps/backend/workflow → 0 hits.

Known follow-ups (not in this commit):
- CLAUDE.md §"도메인별 첫 분기 의무" still lists vision_team workers as
  `image_inspector`/`image_editor`; current Vision Team only exposes
  `vision_analyst`. Doc update is a separate commit.
- Playwright UI scenarios (CSV / image / latest news / greeting) need
  to be run against a live dev stack to fully retire Phase 3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vercel
Copy link
Copy Markdown

vercel Bot commented May 22, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
orchagent Ready Ready Preview, Comment May 22, 2026 8:27am
project-vdajw Ready Ready Preview, Comment May 22, 2026 8:27am

@DONGRYEOLLEE1 DONGRYEOLLEE1 merged commit de7c11d into main May 22, 2026
5 checks passed
@DONGRYEOLLEE1 DONGRYEOLLEE1 deleted the refactor/llm-driven-routing-cleanup branch May 22, 2026 08:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant