refactor(routing): LLM-driven routing cleanup - prompt-driven first routes #18
Merged
…g safeguard

Sweep packages/agent-core for any remaining rule-based routing patterns and move the LLM-driven policy (CLAUDE.md §"Supervisor → Sub-agent Handoff Policy" P1-P5) to its full conclusion.

- planner.py: delete `_build_simple_research_plan` keyword-dictionary heuristic (and the dead `_extract_latest_user_text` helper that fed it). All plan generation now goes through the LLM `TaskPlan` structured output; `PLANNER_PROMPT` already covers the lightweight research case.
- head_supervisor.py + supervisor.py: delete the dead `_orchagent_identity_response` keyword fallback and its companions (`_extract_message_text`, `_latest_user_request_text`); identity Qs are handled by the `SYSTEM_SUPERVISOR_PROMPT` `# IDENTITY` block, not by code.
- safeguards.py: add `reject_coding_team_without_repo_binding`, which extracts the previously-inline coding_team/repo_binding block from head_supervisor into the canonical P3 chain. It now surfaces a `safeguard:` reason on the SSE `route` event (P4 visibility).
- head_supervisor.py: invoke the new safeguard BEFORE HITL so users are not asked to approve a dispatch the runtime cannot execute.
- tests: replace the heuristic-locking planner test with an LLM-driven regression test (`RecordingPlannerLLM.called` must be True). Add three safeguard unit tests (pass-through, force-FINISH, non-coding-team no-op) under the new P3 contract.

Plan: plans/ENFORCED_ROUTING_TO_LLM_DRIVEN_PLAN.md (all phases checked).

Validation:
- pytest tests -q → 188 passed (185 baseline + 3 new safeguard cases).
- grep -rE "_should_force_|_APPROVAL_PATTERNS|_build_simple_research_plan|_orchagent_identity_response" packages/agent-core/src → 0 hits in code body.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
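The LLM-driven regression test mentioned above can be sketched roughly as follows. Only the name `RecordingPlannerLLM`, its `called` flag, and `TaskPlan` come from the commit message; the `invoke` method, the `build_plan` entry point, and the `steps` field are assumptions made for illustration, not the repository's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class TaskPlan:
    """Hypothetical stand-in for the planner's structured output."""
    steps: list[str] = field(default_factory=list)


class RecordingPlannerLLM:
    """Test double that records whether the planner consulted the LLM."""

    def __init__(self, plan: TaskPlan) -> None:
        self.plan = plan
        self.called = False
        self.prompts: list[str] = []

    def invoke(self, prompt: str) -> TaskPlan:
        # Remember the call, then return the canned plan.
        self.called = True
        self.prompts.append(prompt)
        return self.plan


def build_plan(llm: RecordingPlannerLLM, user_text: str) -> TaskPlan:
    """Hypothetical planner entry point: no keyword shortcut, always ask the LLM."""
    return llm.invoke(f"Plan the following request:\n{user_text}")


def test_lightweight_research_goes_through_llm() -> None:
    llm = RecordingPlannerLLM(TaskPlan(steps=["search", "summarize"]))
    plan = build_plan(llm, "what is the latest news on X?")
    # A keyword heuristic that bypassed the LLM would leave `called` False.
    assert llm.called is True
    assert plan.steps == ["search", "summarize"]
```

Asserting on `called` rather than on the plan contents is what distinguishes this from the old heuristic-locking test: it fails if any code path answers without the LLM, whatever plan that path produces.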
Follow-up sweep on top of 00568bf to push the LLM-driven routing policy (CLAUDE.md §"Supervisor → Sub-agent Handoff Policy" P1-P5) further. Move the remaining "code decides before the LLM does" branches over to either a prompt-kit guidance line or a post-decision safeguard.

- safeguards.py: drop `reject_coding_team_without_repo_binding`. The intent is now expressed as a `# REQUIRED FIRST ROUTES` guideline to the router LLM so the model itself avoids `coding_team` without a bound repo. Public surface is back to the canonical 4 safeguards.
- supervisors/head_supervisor.py: drop the inline repo-binding override introduced in 00568bf. The router LLM owns the decision; the safeguard chain still catches invalid gotos and team-redirect loops.
- supervisors/team_supervisor.py: drop the pre-LLM dispatch-limit shortcut (and the dead `_force_finish_due_to_dispatch_limit` helper). Dispatch limit is now applied only as a post-decision P3 safeguard via `decide_route()`, costing one extra LLM call per saturated turn in exchange for full P3 consistency.
- prompt-kit/prompts.py:
  * SYSTEM_SUPERVISOR_PROMPT v2.7: new `# REQUIRED FIRST ROUTES` block pins the first worker for all six domains (data / vision / research / coding / writing / FINISH) so the LLM has the contract in prompt.
  * TEAM_SUPERVISOR_PROMPT v1.5: new `# WRITING TEAM HANDOFF` and `# VISION TEAM HANDOFF` sections; pins `vision_analyst` as the real Vision Team first worker (matches current member list).
  * Minor wording (`keyword` → `term`) in unrelated title/suggestion prompts to keep the policy-grep audit noise-free.
- router_schema.py: clean stale `_should_force_approval` references from the docstrings.
- tests/test_router_safeguards.py: pin the public safeguard surface to exactly the 4 P3 policy functions (regression guard so a 5th can't sneak back in).
- tests/test_supervisor.py: add coverage that `max_team_dispatches=0` still runs the LLM once and then routes via the safeguard, not via a pre-LLM branch.
- tests/test_routing_prompts.py (new): pin the `# REQUIRED FIRST ROUTES` block and per-team handoff guidance so prompt drift fails CI.

Plan: plans/llm-routing-fix.md (Phase 1-2 done, Phase 3 Playwright checks deferred to a follow-up because the sandbox blocked `browser_navigate` and local dev-server access).

Validation:
- pytest tests -q → 190 passed.
- grep -E "_should_force|_APPROVAL_PATTERNS|reject_coding_team_without_repo_binding|_force_finish_due_to_dispatch_limit" packages/agent-core/src packages/prompt-kit/src apps/backend/workflow → 0 hits.

Known follow-ups (not in this commit):
- CLAUDE.md §"도메인별 첫 분기 의무" ("Mandatory first route per domain") still lists vision_team workers as `image_inspector`/`image_editor`; current Vision Team only exposes `vision_analyst`. Doc update is a separate commit.
- Playwright UI scenarios (CSV / image / latest news / greeting) need to be run against a live dev stack to fully retire Phase 3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
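The "LLM first, safeguard second" ordering for the dispatch limit can be illustrated with a minimal sketch. `RouterDecision`, `decide_route`, `max_team_dispatches`, and `vision_analyst` are names from the commit message, but every signature and field below (`RouteState`, `enforce_dispatch_limit`, `CountingRouter`, `goto`, `reason`) is an assumption made for this example rather than the project's real code.

```python
from dataclasses import dataclass, replace
from typing import Callable, Sequence


@dataclass(frozen=True)
class RouterDecision:
    """Hypothetical shape of the router's structured output."""
    goto: str
    reason: str = ""


@dataclass
class RouteState:
    """Hypothetical per-turn state; only the fields this sketch needs."""
    team_dispatches: int = 0
    max_team_dispatches: int = 3


Safeguard = Callable[[RouterDecision, RouteState], RouterDecision]
RouterLLM = Callable[[RouteState], RouterDecision]


def enforce_dispatch_limit(decision: RouterDecision, state: RouteState) -> RouterDecision:
    """Rewrite a dispatch to FINISH once the dispatch budget is spent."""
    if decision.goto != "FINISH" and state.team_dispatches >= state.max_team_dispatches:
        return replace(decision, goto="FINISH", reason="safeguard: dispatch limit reached")
    return decision


def decide_route(llm: RouterLLM, state: RouteState, safeguards: Sequence[Safeguard]) -> RouterDecision:
    """Ask the LLM first, then let each safeguard pass or rewrite its decision."""
    decision = llm(state)
    for guard in safeguards:
        decision = guard(decision, state)
    return decision


class CountingRouter:
    """Fake router LLM that always picks the same worker and counts its calls."""

    def __init__(self, goto: str) -> None:
        self.goto = goto
        self.calls = 0

    def __call__(self, state: RouteState) -> RouterDecision:
        self.calls += 1
        return RouterDecision(goto=self.goto, reason="llm")


router = CountingRouter("vision_analyst")
final = decide_route(router, RouteState(max_team_dispatches=0), [enforce_dispatch_limit])
print(router.calls, final.goto)  # prints: 1 FINISH
```

Even with a budget of zero the fake LLM is called exactly once, which is the "one extra LLM call per saturated turn" trade-off described above; the `safeguard:` prefix on `reason` mirrors the visibility convention from the earlier commit.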
Summary
Two-step cleanup sweep that finishes the LLM-driven routing policy (CLAUDE.md §"Supervisor → Sub-agent Handoff Policy" P1–P5). All routing decisions now flow through `RouterDecision` + 4 P3 safeguards only; every "first worker for domain X" intent lives in `prompt-kit` and is locked by a regression test.
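A prompt-contract regression test of the kind described here might look like the sketch below. The prompt text is a made-up excerpt (the real `SYSTEM_SUPERVISOR_PROMPT` is not shown in this PR), the team names other than `coding_team` and `vision_team` are placeholders, and `extract_block` and `missing_domains` are hypothetical helpers.

```python
import re

# Hypothetical excerpt; the real prompt wording is not part of this page.
SYSTEM_SUPERVISOR_PROMPT = """\
# IDENTITY
You are the head supervisor.

# REQUIRED FIRST ROUTES
- data: route to data_team first
- vision: route to vision_team first
- research: route to research_team first
- coding: route to coding_team only when a repo is bound
- writing: route to writing_team first
- FINISH: answer greetings and identity questions directly
"""

# The six domains listed in the PR description.
REQUIRED_DOMAINS = ("data", "vision", "research", "coding", "writing", "FINISH")


def extract_block(prompt: str, heading: str) -> str:
    """Return the text under `heading`, up to the next '# ' heading or end of prompt."""
    pattern = rf"^{re.escape(heading)}\n(.*?)(?=^# |\Z)"
    match = re.search(pattern, prompt, flags=re.MULTILINE | re.DOTALL)
    return match.group(1) if match else ""


def missing_domains(prompt: str) -> list[str]:
    """List the required domains that have no '- <domain>:' line in the block."""
    block = extract_block(prompt, "# REQUIRED FIRST ROUTES")
    return [
        domain
        for domain in REQUIRED_DOMAINS
        if not re.search(rf"^- {re.escape(domain)}:", block, flags=re.MULTILINE)
    ]


def test_required_first_routes_cover_all_domains() -> None:
    assert missing_domains(SYSTEM_SUPERVISOR_PROMPT) == []
```

Checking for the heading and one line per domain, rather than comparing the whole prompt verbatim, keeps the test tolerant of wording tweaks while still failing if a domain or the block itself disappears.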
Commit 1 (`00568bf`): remove rule-based heuristics
- Delete the `_build_simple_research_plan` keyword-dictionary heuristic in planner.
- Delete the dead `_orchagent_identity_response` fallback (handled by `SYSTEM_SUPERVISOR_PROMPT` `# IDENTITY`).

Commit 2 (`26138c2`): prompt-driven first routes + remove pre-LLM shortcuts
- Drop the pre-LLM dispatch-limit shortcut in `team_supervisor.py` (and its dead helper). Dispatch ceiling is now a post-decision P3 safeguard only.
- Drop the inline `coding_team` + `repo_binding` override in `head_supervisor.py`. The intent is expressed in `prompt-kit` instead so the router LLM owns the decision.
- Add a `# REQUIRED FIRST ROUTES` block to `SYSTEM_SUPERVISOR_PROMPT` (v2.7) that pins the first worker for data / vision / research / coding / writing / FINISH.
- Add `# WRITING TEAM HANDOFF` + `# VISION TEAM HANDOFF` blocks to `TEAM_SUPERVISOR_PROMPT` (v1.5); pin `vision_analyst` as the actual Vision Team first worker.
- `test_routing_prompts.py` locks the prompt contract so future drift fails CI.
- `test_supervisor.py::test_team_dispatch_limit_runs_after_llm_decision` locks the post-LLM safeguard ordering.
- `test_router_safeguards.py::test_public_safeguard_surface_is_limited_to_policy_functions` locks the safeguard surface to exactly the 4 P3 functions.

Test plan
- `cd apps/backend && PYTHONPATH=. uv run pytest tests -q` → 190 passed
- `grep -E "_should_force_|_APPROVAL_PATTERNS|_build_simple_research_plan|_orchagent_identity_response|reject_coding_team_without_repo_binding|_force_finish_due_to_dispatch_limit" packages/agent-core/src packages/prompt-kit/src apps/backend/workflow` → 0 hits in code body
- Phase 3 Playwright checks deferred to a follow-up: the sandbox blocked `browser_navigate` and local dev-server access.

Known follow-ups (separate commits, not in this PR)
- `CLAUDE.md` §"도메인별 첫 분기 의무" ("Mandatory first route per domain") still lists vision_team workers as `image_inspector`/`image_editor`; current Vision Team only exposes `vision_analyst`. Doc update is a separate commit.

Plans: `plans/ENFORCED_ROUTING_TO_LLM_DRIVEN_PLAN.md`, `plans/llm-routing-fix.md`.

🤖 Generated with Claude Code
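The identifier audit from the test plan could also run as a Python check, for example inside the test suite. This is a sketch, not something in the PR: the pattern list is copied from the grep command above, while `find_forbidden` is a hypothetical helper that scans `*.py` files recursively and, unlike the "in code body" qualifier above, does not distinguish code from comments or docstrings.

```python
import re
from pathlib import Path

# Identifiers retired by the two commits (copied from the grep in the test plan).
FORBIDDEN = re.compile(
    r"_should_force_|_APPROVAL_PATTERNS|_build_simple_research_plan|"
    r"_orchagent_identity_response|reject_coding_team_without_repo_binding|"
    r"_force_finish_due_to_dispatch_limit"
)


def find_forbidden(roots: list) -> list[tuple[str, int, str]]:
    """Return (path, line number, stripped line) for every line matching FORBIDDEN."""
    hits: list[tuple[str, int, str]] = []
    for root in roots:
        # Sorted so the report order is stable across runs and platforms.
        for path in sorted(Path(root).rglob("*.py")):
            text = path.read_text(encoding="utf-8")
            for lineno, line in enumerate(text.splitlines(), start=1):
                if FORBIDDEN.search(line):
                    hits.append((str(path), lineno, line.strip()))
    return hits
```

A test could then assert that `find_forbidden(["packages/agent-core/src", "packages/prompt-kit/src", "apps/backend/workflow"])` returns an empty list, so a reintroduced heuristic fails CI instead of relying on a manual grep.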