fix: robust response parser + domain-aware personas #19
Merged
## Problem

Two critical blackboard architecture failures were identified in the post-mortem of task-caccf02b:

1. Empty `refs` fields: agents write `**Refs**: [e-3, e-4]` in body prose but leave the structured `refs` JSON field empty. The routing and convergence machinery depends entirely on this field.
2. Bundled entries: the planner writes 4 plans in one entry; the critic writes 4 critiques in one entry. This breaks selective referencing and prevents the CU from routing critique authors back to their own work.

Root cause: persona prompts described fields without showing format examples, and the parser trusted agent output completely, with no extraction fallback.

## Changes

### daemon/src/core/response_parser.py [NEW]

585-line standalone parser with 7 extraction strategies, in priority order:

- Structured `entries_v1` JSON (happy path)
- Bundled entries split via the `entries` array or prose `---` / `##` delimiters
- Refs in 3 prose patterns (`**Refs**:` / `refs=[]` / `(refs: e-N)`)
- Heuristic bare `e-N` mention in the first 600 chars (validated against `known_ids`)
- Decider JSON code-fence unwrapping (with inner refs recovery)
- `finding` → `rebuttal` auto-promotion on body/title signals
- Decider JSON-in-body with nested entries-in-entries
- Confidence string parsing + hedging-word heuristic
- Cleaner/decline action passthroughs

### daemon/src/core/variants/traditional.py

Replaced the naive `parse_agent_response` with delegation to `parse_entries`. Added a `known_ids` kwarg.

### daemon/src/core/orchestrator.py

Threads `known_ids` (the set of board entry IDs from the already-fetched snapshot) into `parse_agent_response` to enable validated prose ref extraction.
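The ref-recovery fallbacks listed above (JSON field first, then the three prose patterns, then bare `e-N` mentions in the first 600 characters checked against `known_ids`) can be sketched as a single function. This is a minimal illustration, not the code in `response_parser.py`: the function name `extract_refs`, its signature, and the exact regexes are assumptions.

```python
from __future__ import annotations

import re

# Illustrative regexes (assumed) for the three prose forms named in the PR:
#   **Refs**: [e-3, e-4]    refs=[e-3]    (refs: e-3)
_PROSE_REF_PATTERNS = [
    re.compile(r"\*\*Refs\*\*:\s*\[([^\]]*)\]", re.IGNORECASE),
    re.compile(r"\brefs\s*=\s*\[([^\]]*)\]", re.IGNORECASE),
    re.compile(r"\(refs:\s*([^)]*)\)", re.IGNORECASE),
]
_ENTRY_ID = re.compile(r"\be-\d+\b")


def extract_refs(
    refs_field: list[str],
    body: str,
    known_ids: set[str],
    prefix_chars: int = 600,
) -> list[str]:
    """Return entry refs, falling back from the JSON field to prose patterns."""
    # 1. Happy path: the structured field is already populated.
    if refs_field:
        return list(refs_field)
    # 2. Explicit prose patterns, tried in priority order.
    for pattern in _PROSE_REF_PATTERNS:
        match = pattern.search(body)
        if match:
            ids = _ENTRY_ID.findall(match.group(1))
            if ids:
                return list(dict.fromkeys(ids))  # de-duplicate, keep order
    # 3. Heuristic: bare e-N mentions near the start of the body, kept only
    #    when they name an entry that actually exists on the board.
    mentions = _ENTRY_ID.findall(body[:prefix_chars])
    return [ref for ref in dict.fromkeys(mentions) if ref in known_ids]
```

Validating only the heuristic tier against `known_ids` mirrors the description above: explicit prose refs are trusted as written, while a bare `e-N` in free text could be an unrelated mention, so it is kept only if the board snapshot contains that ID.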
### daemon/src/models/personas.py [full rewrite]

All 5 constant role personas + the expert generator rewritten with:

- Explicit JSON output contract with worked examples
- Confidence calibration table in every persona
- Anti-patterns section (prose refs, bundled entries, JSON fence, wrong type)
- Planner/critic: `entries` array required, one entry per idea
- Expert: finding vs rebuttal distinction (`refs` contains critique ID)
- Decider: body must be plain prose, never JSON; `refs` must be exhaustive
- AG prompt: domain-specific experts only, generic roles forbidden

### daemon/src/config.py + bmas.example.yaml

- `experts_per_tier.complex`: 3 → 4 (raised expert count for complex tasks)
- Code default and example YAML both updated to match the production config

### daemon/src/core/blackboard.py

Node-name normalization: `NODE_URL_TO_NAME` reverse map; `publish_log` normalizes raw HTTP URLs to friendly node-1/2/3 identifiers.

### daemon/tests/test_response_parser.py [NEW, 37 tests]

Every production failure pattern from the board audit has its own test.

### daemon/tests/test_config_validation.py

Updated assertion to reflect the new complex tier count (4).

## Test Results

- `ruff check src/ tests/` → All checks passed
- `mypy src/ --ignore-missing-imports` → no issues (34 source files)
- `pytest daemon/tests/` → 446 passed, 1 skipped
- `pytest agent/tests/` → 39 passed
- eslint + tsc → clean
- vitest → 118 passed
- `npm run build` → success
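The `blackboard.py` node-name normalization described above amounts to inverting a name-to-URL mapping once and looking up each raw URL before a log is published. A minimal sketch, assuming a forward map exists: `NODE_NAME_TO_URL`, `normalize_node`, `normalize_log_record`, the `node` record field, and the documentation-range addresses are all hypothetical placeholders, not the daemon's real configuration.

```python
from __future__ import annotations

# Hypothetical forward map (friendly name -> base URL). The addresses are
# documentation-range placeholders, not the real cluster configuration.
NODE_NAME_TO_URL = {
    "node-1": "http://192.0.2.11:8000",
    "node-2": "http://192.0.2.12:8000",
    "node-3": "http://192.0.2.13:8000",
}

# Reverse map, built once so normalizing a log record is a single dict lookup.
NODE_URL_TO_NAME = {url.rstrip("/"): name for name, url in NODE_NAME_TO_URL.items()}


def normalize_node(node: str) -> str:
    """Map a raw node URL to its friendly name; pass anything else through."""
    return NODE_URL_TO_NAME.get(node.rstrip("/"), node)


def normalize_log_record(record: dict[str, str]) -> dict[str, str]:
    """Return a copy of a log record with its hypothetical `node` field normalized."""
    if "node" not in record:
        return record
    return {**record, "node": normalize_node(record["node"])}
```

Unknown URLs and already-friendly names pass through unchanged, so normalization is safe to apply to every record regardless of its source.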
## Summary

Two critical blackboard architecture failures identified in the post-mortem of `task-caccf02b` are fixed.

## Problem
1. Empty `refs` fields: agents write `**Refs**: [e-3, e-4]` in body prose but leave the structured `refs` JSON field empty. The routing and convergence machinery depends entirely on this field (`_deterministic_fallback`, `_infer_phase`, salience scoring).
2. Bundled entries: the planner wrote 4 plans in one entry; the critic wrote 4 critiques in one entry. This breaks selective referencing and prevents the CU from routing critique authors back to their own work.
Root cause: persona prompts described fields without showing format examples. Parser trusted agent output completely — no extraction fallback.
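For the bundled-entry failure, the fallback described for the parser (use an explicit `entries` array when present, otherwise split the body on `---` rules or `## Section` headings) might look like the sketch below. The function names are hypothetical, and copying the non-body fields (including `refs`) onto every part is a simplifying assumption; a real parser would likely re-extract refs per part.

```python
from __future__ import annotations

import re

_RULE = re.compile(r"-{3,}")  # a line that is only a --- horizontal rule


def split_bundled(body: str) -> list[str]:
    """Split one prose body into parts on --- rules or '## ' section headings."""
    chunks: list[list[str]] = [[]]
    for line in body.splitlines():
        stripped = line.strip()
        if _RULE.fullmatch(stripped):
            chunks.append([])  # rule: close the current part, drop the rule itself
            continue
        if stripped.startswith("## ") and any(ln.strip() for ln in chunks[-1]):
            chunks.append([])  # heading after some content: start a new part
        chunks[-1].append(line)
    parts = ["\n".join(chunk).strip() for chunk in chunks]
    return [part for part in parts if part]


def split_entries(payload: dict) -> list[dict]:
    """Turn one agent payload into a list of entries, one per idea."""
    # Preferred: the agent already returned an explicit entries array.
    entries = payload.get("entries")
    if isinstance(entries, list) and entries:
        return [entry for entry in entries if isinstance(entry, dict)]
    # Fallback: split a bundled body, copying the remaining fields to each part.
    parts = split_bundled(payload.get("body", ""))
    if len(parts) <= 1:
        return [payload]
    shared = {key: value for key, value in payload.items() if key != "body"}
    return [{**shared, "body": part} for part in parts]
```

A single-idea body comes back unchanged as a one-element list, so the fallback only changes behavior when the agent actually bundled several ideas.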
## Changes

### `core/response_parser.py` (new)

587-line standalone parser with 7 extraction strategies applied in priority order:

- `"refs": ["e-3"]` in the JSON field
- `**Refs**: [e-3, e-4]` in the body
- `refs=[e-3]` in the body
- `(refs: e-3)` in the body
- bare `e-N` mentions in the body prefix, validated against `known_ids`
- bundled entries split via the `entries` array, `---` delimiters, or `## Section` headers
- `type: finding` + rebuttal signals in the body → promoted to `rebuttal`
- `action: decline` / `action: clean` passthroughs

### `core/variants/traditional.py`

Replaced the naive `parse_agent_response` with delegation to `parse_entries`. Added a `known_ids` parameter.

### `core/orchestrator.py`

Threads `known_ids` (the set of board entry IDs from the already-fetched snapshot) into `parse_agent_response` to enable validated prose ref extraction.

### `models/personas.py` (full rewrite)

All 5 constant role personas + the expert generator rewritten with explicit JSON output contracts, worked examples, confidence calibration tables, and anti-pattern sections:

- Planner/critic: `entries` array, one entry per idea
- Expert: `refs` contains critique ID → `type: rebuttal`
- AG prompt: generic roles (`Domain Analyst`, `Systems Thinker`) forbidden

### `core/config.py` + `core/blackboard.py`

Node-name normalization: `NODE_URL_TO_NAME` reverse map; `publish_log` normalizes raw HTTP URLs to friendly `node-1/2/3` identifiers.

### `tests/test_response_parser.py` (new, 37 tests)

Every production failure pattern from the board audit has its own test case.
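One of the anti-patterns the personas now forbid (a decider returning JSON inside a code fence instead of plain prose) is also handled on the parser side by "decider JSON code-fence unwrapping (with inner refs recovery)". A minimal sketch of that idea, under the assumption that the fenced JSON carries `body` and `refs` keys; `unwrap_fenced_body` is a hypothetical name, and the nested entries-in-entries case is deliberately left out.

```python
from __future__ import annotations

import json
import re

# A body consisting solely of one fenced block, optionally tagged "json".
_FENCE = re.compile(r"^\s*```(?:json)?[ \t]*\n(.*?)\n?```\s*$", re.DOTALL | re.IGNORECASE)


def unwrap_fenced_body(entry: dict) -> dict:
    """Lift JSON that an agent wrapped in a code fence inside `body`."""
    match = _FENCE.match(entry.get("body", ""))
    if not match:
        return entry  # plain prose: nothing to do
    try:
        inner = json.loads(match.group(1))
    except json.JSONDecodeError:
        return entry  # not valid JSON: keep the original text untouched
    if not isinstance(inner, dict) or "body" not in inner:
        return entry
    result = dict(entry)
    result["body"] = str(inner["body"])
    # Inner refs recovery: fill refs only when the outer field is empty.
    if not result.get("refs") and isinstance(inner.get("refs"), list):
        result["refs"] = [str(ref) for ref in inner["refs"]]
    return result
```

Each failure branch returns the entry unchanged, so a body that merely looks fenced but is not valid JSON is never lost; it simply stays as prose.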
## Test Results
## Architecture Impact

- `e-5.refs = []` (body says `**Refs**: [e-3, e-4]`) → `e-5.refs = ["e-3", "e-4"]`
- One bundled plan entry → separate `plan` entries
- One bundled critique entry → separate `critique` entries, each with its own refs
- `finding` → `rebuttal` on body signals

Closes the routing blind spot: `_deterministic_fallback` and `_infer_phase` now have accurate refs to work from.
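The `finding` → `rebuttal` promotion listed under Architecture Impact can be sketched as a small re-typing pass. This sketch combines the two signals mentioned in the PR: the expert persona rule (`refs` contains a critique ID) and rebuttal language in the title or body. The function name, the `critique_ids` parameter, and the signal phrases are hypothetical; the PR does not list the parser's actual word set.

```python
from __future__ import annotations

import re

# Hypothetical signal phrases; the PR does not list the real parser's word set.
_REBUTTAL_SIGNALS = re.compile(
    r"\b(?:rebuttal|rebut|in response to|disagree with|the critique)\b",
    re.IGNORECASE,
)


def promote_finding(entry: dict, critique_ids: set[str] | None = None) -> dict:
    """Re-type a `finding` as a `rebuttal` when it is answering a critique."""
    if entry.get("type") != "finding":
        return entry
    critique_ids = critique_ids or set()
    # Structural signal: the finding references a known critique entry.
    cites_critique = any(ref in critique_ids for ref in entry.get("refs", []))
    # Textual signal: rebuttal language in the title or body.
    text = f"{entry.get('title', '')}\n{entry.get('body', '')}"
    if cites_critique or _REBUTTAL_SIGNALS.search(text):
        return {**entry, "type": "rebuttal"}
    return entry
```

Because the structural check relies on `refs`, it only works once those refs have been recovered, which is why the ref extraction runs ahead of this kind of re-typing in the fallback chain.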