fix: robust response parser + domain-aware personas by arvarik · Pull Request #19 · arvarik/bmas

arvarik · 2026-06-18T05:19:24Z

Summary

Fixes two critical blackboard architecture failures identified in the post-mortem of task-caccf02b.

Problem

1. Empty refs fields — agents write **Refs**: [e-3, e-4] in body prose but leave the structured refs JSON field empty. The routing and convergence machinery depends entirely on this field (_deterministic_fallback, _infer_phase, salience scoring).

2. Bundled entries — planner wrote 4 plans in one entry; critic wrote 4 critiques in one entry. Breaks selective referencing and prevents the CU from routing critique authors back to their own work.

Root cause: persona prompts described fields without showing format examples. Parser trusted agent output completely — no extraction fallback.

Changes

`core/response_parser.py` (new)

587-line standalone parser with 7 extraction strategies applied in priority order:

| Pattern | Fix |
| --- | --- |
| `"refs": ["e-3"]` in JSON field | Pass through (happy path) |
| `Refs: [e-3, e-4]` in body | Regex → refs field |
| `refs=[e-3]` in body | Regex → refs field |
| `(refs: e-3)` in body | Regex → refs field |
| Bare `e-N` mentions in body prefix | Heuristic → validated against `known_ids` |
| Multiple entries in one body | Split by `entries` array, `---` delimiters, or `## Section` headers |
| Decider JSON code-fence wrapping | Unwrap + merge inner refs |
| `type: finding` + rebuttal signals in body | Auto-promote to `rebuttal` |
| Confidence as string / missing | Parse + hedging-word heuristic |
| `action: decline` / `action: clean` | Pass through unchanged |
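To make the three prose-ref rows concrete, here is a minimal sketch of how such regex extraction could work. The regexes and the `extract_prose_refs` name are assumptions for illustration; the actual patterns in `core/response_parser.py` are not shown in this PR description.

```python
from __future__ import annotations

import re

# Hypothetical patterns (assumption): one per prose style listed in the table.
_ENTRY_ID = r"e-\d+"
_REF_PATTERNS = [
    # Refs: [e-3, e-4]  and  **Refs**: [e-3, e-4]
    re.compile(r"\*{0,2}refs\*{0,2}\s*:\s*\[([^\]]*)\]", re.IGNORECASE),
    # refs=[e-3]
    re.compile(r"refs\s*=\s*\[([^\]]*)\]", re.IGNORECASE),
    # (refs: e-3)
    re.compile(r"\(\s*refs\s*:\s*([^)]*)\)", re.IGNORECASE),
]


def extract_prose_refs(body: str, structured_refs: list[str] | None = None) -> list[str]:
    """Return the structured refs if present, else refs recovered from prose."""
    if structured_refs:
        # Happy path: the agent filled the JSON field, so pass it through.
        return list(structured_refs)
    found: list[str] = []
    for pattern in _REF_PATTERNS:
        for match in pattern.finditer(body):
            for ref in re.findall(_ENTRY_ID, match.group(1)):
                if ref not in found:  # keep first-seen order, drop duplicates
                    found.append(ref)
    return found
```

Under these assumptions, a body containing `**Refs**: [e-3, e-4]` with an empty structured field yields `["e-3", "e-4"]`, matching the first row of the Architecture Impact table.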

`core/variants/traditional.py`

Replaced naive parse_agent_response with delegation to parse_entries. Added known_ids parameter.
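The bundled-entry handling that the delegation picks up might look roughly like the sketch below. The `split_bundled` name and the payload shape (a dict with `entries` or `body`) are assumptions; only the three split strategies (an `entries` array, `---` delimiters, `## Section` headers) come from this PR.

```python
from __future__ import annotations

import re


def split_bundled(payload: dict) -> list[dict]:
    """Split one agent payload into one entry per idea (illustrative only)."""
    # 1. An explicit `entries` array takes priority.
    entries = payload.get("entries")
    if isinstance(entries, list) and entries:
        return [dict(e) for e in entries if isinstance(e, dict)]

    body = payload.get("body", "")

    # 2. Lines consisting only of `---` separate entries.
    parts = [p.strip() for p in re.split(r"^\s*---\s*$", body, flags=re.MULTILINE)]
    parts = [p for p in parts if p]

    if len(parts) < 2:
        # 3. Otherwise split before each `## ` header line (zero-width split).
        parts = [p.strip() for p in re.split(r"^(?=## )", body, flags=re.MULTILINE)]
        parts = [p for p in parts if p]

    if len(parts) < 2:
        return [payload]  # nothing to split
    # Each piece inherits the other fields (type, refs, ...) from the original.
    return [{**payload, "body": part} for part in parts]
```

A real implementation would probably also re-run ref extraction on each piece so that every split entry carries its own refs.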

`core/orchestrator.py`

Threads known_ids (set of board entry IDs from the already-fetched snapshot) into parse_agent_response to enable validated prose ref extraction.
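As a rough illustration of why `known_ids` matters, the sketch below builds the set from a snapshot and uses it to filter bare `e-N` mentions. The snapshot shape (a list of dicts with an `id` key) and both function names are assumptions; the 600-character prefix is the figure mentioned in the PR body.

```python
from __future__ import annotations

import re

PREFIX_CHARS = 600  # prefix length mentioned in the PR description


def known_ids_from_snapshot(snapshot: list[dict]) -> set[str]:
    # Assumption: each snapshot entry is a dict carrying its board ID under "id".
    return {entry["id"] for entry in snapshot if "id" in entry}


def bare_refs(body: str, known_ids: set[str]) -> list[str]:
    """Collect bare e-N mentions in the body prefix that exist on the board."""
    seen: list[str] = []
    for ref in re.findall(r"\be-\d+\b", body[:PREFIX_CHARS]):
        # Validation step: ignore IDs the board has never seen (e.g. typos).
        if ref in known_ids and ref not in seen:
            seen.append(ref)
    return seen
```

Validating against the already-fetched snapshot keeps the heuristic from inventing references to entries that do not exist, without an extra board round-trip.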

`models/personas.py` (full rewrite)

All 5 constant role personas + expert generator rewritten with explicit JSON output contracts, worked examples, confidence calibration tables, and anti-pattern sections:

- Planner/critic: require `entries` array, one entry per idea
- Expert: clear rule: if `refs` contains a critique ID → `type: rebuttal`
- Decider: body must be plain prose, never a JSON code block; refs must be exhaustive
- AG prompt: domain-specific experts only, generic roles (Domain Analyst, Systems Thinker) forbidden
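The persona rules above can be read as a checkable contract. The sketch below is a hypothetical validator, not code from this PR: `contract_violations`, the role strings, and the payload shape are all assumptions used to illustrate the three rules.

```python
from __future__ import annotations

FENCE = "`" * 3  # a markdown code fence, built indirectly to keep this block readable


def contract_violations(role: str, payload: dict, entry_types: dict[str, str]) -> list[str]:
    """List rule violations for one agent payload (illustrative only).

    entry_types maps known board IDs to their type, e.g. {"e-4": "critique"}.
    """
    problems: list[str] = []
    entries = payload.get("entries")

    # Planner/critic: an entries array is required, one entry per idea.
    if role in {"planner", "critic"} and not isinstance(entries, list):
        problems.append("planner/critic output must use an entries array")

    for i, entry in enumerate(entries if isinstance(entries, list) else [payload]):
        refs = entry.get("refs", [])
        cites_critique = any(entry_types.get(r) == "critique" for r in refs)
        # Expert: citing a critique means the entry is a rebuttal.
        if role == "expert" and cites_critique and entry.get("type") != "rebuttal":
            problems.append(f"entry {i}: refs cite a critique, so type should be rebuttal")
        # Decider: body must be plain prose, never a fenced JSON block.
        if role == "decider" and FENCE in entry.get("body", ""):
            problems.append(f"entry {i}: decider body must be plain prose")
    return problems
```

For example, an expert entry with `type: finding` and `refs: ["e-4"]`, where `e-4` is a critique, would be flagged; the parser's auto-promotion covers the same case after the fact.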

`core/config.py` + `core/blackboard.py`

Node-name normalization: NODE_URL_TO_NAME reverse map; publish_log normalizes raw HTTP URLs to friendly node-1/2/3 identifiers.
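A reverse map of this kind could be built as sketched below. The URLs, the forward `NODE_NAME_TO_URL` dict, and the `normalize_node` helper are placeholders invented for illustration; only the `NODE_URL_TO_NAME` name and the `node-1/2/3` identifiers come from this PR.

```python
# Hypothetical forward map (assumption): the real URLs live in the project config.
NODE_NAME_TO_URL = {
    "node-1": "http://node1.example:8000",
    "node-2": "http://node2.example:8000",
    "node-3": "http://node3.example:8000",
}

# Reverse map, with trailing slashes stripped so both URL spellings resolve.
NODE_URL_TO_NAME = {url.rstrip("/"): name for name, url in NODE_NAME_TO_URL.items()}


def normalize_node(value: str) -> str:
    """Map a raw node URL to its friendly name; leave unknown values untouched."""
    return NODE_URL_TO_NAME.get(value.rstrip("/"), value)
```

Falling back to the input value means entries that already use a friendly name, or that come from an unlisted node, pass through unchanged.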

`tests/test_response_parser.py` (new, 37 tests)

Every production failure pattern from the board audit has its own test case.

Test Results

ruff check src/ tests/    → All checks passed
mypy src/                 → Success: no issues found in 34 source files
pytest daemon/tests/      → 446 passed, 1 skipped
pytest agent/tests/       → 39 passed
eslint + tsc              → clean
vitest                    → 118 passed
npm run build             → ✓

Architecture Impact

| Before | After |
| --- | --- |
| `e-5.refs = []` (body says `Refs: [e-3, e-4]`) | `e-5.refs = ["e-3", "e-4"]` |
| Planner → 1 entry with 4 sub-goals | Planner → 4 separate `plan` entries |
| Critic → 1 entry with 4 critiques | Critic → 4 separate `critique` entries, each with own refs |
| Rebuttals typed as `finding` | Auto-promoted to `rebuttal` on body signals |
| Decider body = JSON code block | Unwrapped to plain prose |
| All confidence = 0.5 (flat) | Calibrated per agent + hedging heuristic |

Closes the routing blind-spot: _deterministic_fallback and _infer_phase now have accurate refs to work from.
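The confidence row is the least obvious of these changes, so here is one way a string parser with a hedging-word fallback could be written. Every constant below (the word list, default, penalty, and floor) is an invented placeholder; the PR does not state the real values.

```python
from __future__ import annotations

import re

# Illustrative values only (assumptions): the real word list and weights are not in the PR.
HEDGE_WORDS = ("might", "maybe", "possibly", "unclear", "not sure")
DEFAULT_CONFIDENCE = 0.7
HEDGE_PENALTY = 0.1
FLOOR = 0.3


def parse_confidence(raw: object, body: str = "") -> float:
    """Coerce a confidence value to a float in [0, 1], guessing when it is missing."""
    value = None
    if isinstance(raw, (int, float)) and not isinstance(raw, bool):
        value = float(raw)
    elif isinstance(raw, str):
        match = re.search(r"\d*\.?\d+", raw)  # accepts "0.8", ".8", "80%"
        if match:
            value = float(match.group())
            if "%" in raw:
                value /= 100.0
    if value is None:
        # Fallback: start from a default and subtract a penalty per hedging word found.
        text = body.lower()
        hits = sum(word in text for word in HEDGE_WORDS)
        value = max(FLOOR, DEFAULT_CONFIDENCE - HEDGE_PENALTY * hits)
    return round(min(1.0, max(0.0, value)), 2)
```

The point of the fallback is that a missing or unparseable confidence no longer collapses to a flat 0.5; the body text shifts the estimate instead.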

## Problem

Two critical blackboard architecture failures identified from post-mortem of task-caccf02b:

1. Empty refs fields: agents write `**Refs**: [e-3, e-4]` in body prose but leave the structured `refs` JSON field empty. Routing and convergence machinery depends entirely on this field.
2. Bundled entries: planner writes 4 plans in one entry; critic writes 4 critiques in one entry. Breaks selective referencing and prevents the CU from routing critique authors back to their own work.

Root cause: persona prompts described fields without showing format examples. Parser trusted agent output completely, with no extraction fallback.

## Changes

### daemon/src/core/response_parser.py [NEW]

585-line standalone parser with 7 extraction strategies in priority order:

- Structured entries_v1 JSON (happy path)
- Bundled entries split via entries array or prose `---` / `##` delimiters
- Refs in 3 prose patterns (`**Refs**:` / `refs=[]` / `(refs: e-N)`)
- Heuristic bare e-N mention in first 600 chars (validated vs known_ids)
- Decider JSON code-fence unwrapping (with inner refs recovery)
- finding → rebuttal auto-promotion on body/title signals
- Decider JSON-in-body with nested entries-in-entries
- Confidence string parsing + hedging-word heuristic
- Cleaner/decline action passthroughs

### daemon/src/core/variants/traditional.py

Replaced naive parse_agent_response with delegation to parse_entries. Added known_ids kwarg.

### daemon/src/core/orchestrator.py

Threads known_ids (set of board entry IDs from already-fetched snapshot) into parse_agent_response to enable validated prose ref extraction.

### daemon/src/models/personas.py [full rewrite]

All 5 constant role personas + expert generator rewritten with:

- Explicit JSON output contract with worked examples
- Confidence calibration table in every persona
- Anti-patterns section (prose refs, bundled entries, JSON fence, wrong type)
- Planner/critic: entries array required, one entry per idea
- Expert: finding vs rebuttal distinction (refs contains critique ID)
- Decider: body must be plain prose, never JSON; refs must be exhaustive
- AG prompt: domain-specific experts only, generic roles forbidden

### daemon/src/config.py + bmas.example.yaml

- experts_per_tier.complex: 3 → 4 (raised expert count for complex tasks)
- Code default and example yaml both updated to match production config

### daemon/src/core/blackboard.py

Node-name normalization: NODE_URL_TO_NAME reverse map, publish_log normalizes raw HTTP URLs to friendly node-1/2/3 identifiers.

### daemon/tests/test_response_parser.py [NEW, 37 tests]

Every production failure pattern from the board audit has its own test.

### daemon/tests/test_config_validation.py

Updated assertion to reflect new complex tier count (4).

## Test Results

- ruff check src/ tests/ → All checks passed
- mypy src/ --ignore-missing-imports → no issues (34 source files)
- pytest daemon/tests/ → 446 passed, 1 skipped
- pytest agent/tests/ → 39 passed
- eslint + tsc → clean
- vitest → 118 passed
- npm run build → success

arvarik force-pushed the fix/response-parser-and-personas branch from 9b32041 to 8899d20 June 18, 2026 05:25

arvarik merged commit 90bc489 into main Jun 18, 2026
3 checks passed

arvarik deleted the fix/response-parser-and-personas branch June 18, 2026 05:31
