fix: robust response parser + domain-aware personas #19

Merged: arvarik merged 1 commit into main from fix/response-parser-and-personas on Jun 18, 2026

@arvarik (Owner) commented Jun 18, 2026

Summary

Fixes two critical blackboard architecture failures identified in the post-mortem of task-caccf02b.

Problem

1. Empty refs fields — agents write `**Refs**: [e-3, e-4]` in body prose but leave the structured `refs` JSON field empty. The routing and convergence machinery depends entirely on this field (`_deterministic_fallback`, `_infer_phase`, salience scoring).

2. Bundled entries — planner wrote 4 plans in one entry; critic wrote 4 critiques in one entry. Breaks selective referencing and prevents the CU from routing critique authors back to their own work.

Root cause: persona prompts described fields without showing format examples. Parser trusted agent output completely — no extraction fallback.

Changes

core/response_parser.py (new)

587-line standalone parser with 7 extraction strategies applied in priority order:

| Pattern | Fix |
| --- | --- |
| `"refs": ["e-3"]` in JSON field | Pass through (happy path) |
| `**Refs**: [e-3, e-4]` in body | Regex → `refs` field |
| `refs=[e-3]` in body | Regex → `refs` field |
| `(refs: e-3)` in body | Regex → `refs` field |
| Bare `e-N` mentions in body prefix | Heuristic → validated against `known_ids` |
| Multiple entries in one body | Split by `entries` array, `---` delimiters, or `## Section` headers |
| Decider JSON code-fence wrapping | Unwrap + merge inner refs |
| `type: finding` + rebuttal signals in body | Auto-promote to `rebuttal` |
| Confidence as string / missing | Parse + hedging-word heuristic |
| `action: decline` / `action: clean` | Pass through unchanged |
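
To make the three prose ref patterns in the table concrete, here is a minimal sketch of how such a regex fallback could work. The function name `extract_prose_refs` and the exact regular expressions are assumptions for illustration, not the code in `core/response_parser.py`.

```python
import re

# Hypothetical patterns for the three prose ref styles listed in the table.
_REF_ID = re.compile(r"e-\d+")
_PROSE_REF_PATTERNS = [
    re.compile(r"\*\*Refs\*\*\s*:\s*\[([^\]]*)\]", re.IGNORECASE),  # **Refs**: [e-3, e-4]
    re.compile(r"\brefs\s*=\s*\[([^\]]*)\]", re.IGNORECASE),        # refs=[e-3]
    re.compile(r"\(\s*refs\s*:\s*([^)]*)\)", re.IGNORECASE),        # (refs: e-3)
]


def extract_prose_refs(body: str) -> list[str]:
    """Return deduplicated entry IDs found in prose ref annotations.

    Patterns are tried in the order above; IDs keep the order in which
    they are first encountered.
    """
    found: list[str] = []
    for pattern in _PROSE_REF_PATTERNS:
        for match in pattern.finditer(body):
            for ref in _REF_ID.findall(match.group(1)):
                if ref not in found:
                    found.append(ref)
    return found
```

A fallback like this would only run when the structured `refs` field is empty, so the happy path stays untouched.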

core/variants/traditional.py

Replaced naive parse_agent_response with delegation to parse_entries. Added known_ids parameter.
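
As an illustration of what delegating to a multi-entry parser buys, the sketch below splits one bundled response into several entries. It is an assumption-laden sketch: `split_bundled`, the dict shape, and the delimiter regexes are hypothetical and only mirror the splitting rules named in this PR (`entries` array, `---` delimiters, `## ` headers).

```python
import re

# Hypothetical delimiters: a line of three or more dashes, or a "## " header.
_DELIMITER = re.compile(r"^[ \t]*-{3,}[ \t]*$", re.MULTILINE)
_HEADER = re.compile(r"^(?=## )", re.MULTILINE)


def split_bundled(response: dict) -> list[dict]:
    """Split a bundled agent response into one dict per idea."""
    # Preferred: the agent followed the contract and sent an entries array.
    entries = response.get("entries")
    if isinstance(entries, list) and entries:
        return [entry for entry in entries if isinstance(entry, dict)]

    # Fallback: split the prose body on --- lines, then on ## headers.
    body = str(response.get("body", ""))
    parts = [part.strip() for part in _DELIMITER.split(body) if part.strip()]
    if len(parts) < 2:
        parts = [part.strip() for part in _HEADER.split(body) if part.strip()]
    if len(parts) < 2:
        return [response]  # nothing to split
    # Each piece inherits the other fields (type, refs, ...) of the original.
    return [{**response, "body": part} for part in parts]
```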

core/orchestrator.py

Threads known_ids (set of board entry IDs from the already-fetched snapshot) into parse_agent_response to enable validated prose ref extraction.
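
The reason `known_ids` has to be threaded through can be shown with a small sketch of the bare-mention heuristic: a mention only counts as a ref if the ID actually exists on the board. The function name `infer_refs_from_mentions` is hypothetical; the 600-character prefix follows the commit message for this PR.

```python
import re

_BARE_ID = re.compile(r"\be-\d+\b")
PREFIX_CHARS = 600  # only the start of the body is scanned


def infer_refs_from_mentions(body: str, known_ids: set[str]) -> list[str]:
    """Collect bare e-N mentions from the body prefix that exist on the board."""
    refs: list[str] = []
    for candidate in _BARE_ID.findall(body[:PREFIX_CHARS]):
        # Validation step: drop IDs the board snapshot does not contain.
        if candidate in known_ids and candidate not in refs:
            refs.append(candidate)
    return refs
```

Without the `known_ids` check, a hallucinated ID such as `e-99` in the body would end up in `refs` and mislead routing.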

models/personas.py (full rewrite)

All 5 constant role personas + expert generator rewritten with explicit JSON output contracts, worked examples, confidence calibration tables, and anti-pattern sections:

  • Planner/critic: require entries array, one entry per idea
  • Expert: clear rule — if refs contains critique ID → type: rebuttal
  • Decider: body must be plain prose, never a JSON code block; refs must be exhaustive
  • AG prompt: domain-specific experts only, generic roles (Domain Analyst, Systems Thinker) forbidden
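
When a decider ignores the plain-prose rule and wraps its answer in a JSON code fence, the parser side of this PR unwraps it and merges the inner refs. A rough sketch of that recovery, with a hypothetical `unwrap_decider_body` helper and an assumed inner shape of `{"body": ..., "refs": [...]}`:

```python
import json
import re

FENCE = "`" * 3  # built programmatically so the sketch nests cleanly in markdown
_FENCED_JSON = re.compile(
    re.escape(FENCE) + r"(?:json)?\s*(\{.*\})\s*" + re.escape(FENCE), re.DOTALL
)


def unwrap_decider_body(body: str, refs: list[str]) -> tuple[str, list[str]]:
    """Unwrap a fenced JSON body into plain prose and merge its inner refs."""
    match = _FENCED_JSON.search(body)
    if match is None:
        return body, refs
    try:
        inner = json.loads(match.group(1))
    except json.JSONDecodeError:
        return body, refs  # not valid JSON: leave the entry untouched
    if not isinstance(inner, dict):
        return body, refs

    # Use the inner body as prose; fall back to the original if it is empty.
    prose = str(inner.get("body", "")).strip() or body
    merged = list(refs)
    inner_refs = inner.get("refs")
    if isinstance(inner_refs, list):
        for ref in inner_refs:
            if isinstance(ref, str) and ref not in merged:
                merged.append(ref)
    return prose, merged
```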

core/config.py + core/blackboard.py

Node-name normalization: NODE_URL_TO_NAME reverse map; publish_log normalizes raw HTTP URLs to friendly node-1/2/3 identifiers.
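
A reverse map of this kind can be sketched as follows. The URLs below are placeholders and `normalize_node` is a hypothetical helper name; only `NODE_URL_TO_NAME` and the `node-1/2/3` naming come from this PR.

```python
# Placeholder node table; the real URLs live in the project's configuration.
NODE_NAME_TO_URL = {
    "node-1": "http://node-1.example:8000",
    "node-2": "http://node-2.example:8000",
    "node-3": "http://node-3.example:8000",
}

# Reverse map, keyed by URL without a trailing slash.
NODE_URL_TO_NAME = {url.rstrip("/"): name for name, url in NODE_NAME_TO_URL.items()}


def normalize_node(identifier: str) -> str:
    """Map a raw node URL to its friendly name; pass other values through."""
    return NODE_URL_TO_NAME.get(identifier.rstrip("/"), identifier)
```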

tests/test_response_parser.py (new, 37 tests)

Every production failure pattern from the board audit has its own test case.

Test Results

ruff check src/ tests/    → All checks passed
mypy src/                 → Success: no issues found in 34 source files
pytest daemon/tests/      → 446 passed, 1 skipped
pytest agent/tests/       → 39 passed
eslint + tsc              → clean
vitest                    → 118 passed
npm run build             → ✓

Architecture Impact

| Before | After |
| --- | --- |
| `e-5.refs = []` (body says `**Refs**: [e-3, e-4]`) | `e-5.refs = ["e-3", "e-4"]` |
| Planner → 1 entry with 4 sub-goals | Planner → 4 separate plan entries |
| Critic → 1 entry with 4 critiques | Critic → 4 separate critique entries, each with own refs |
| Rebuttals typed as `finding` | Auto-promoted to `rebuttal` on body signals |
| Decider body = JSON code block | Unwrapped to plain prose |
| All confidence = 0.5 (flat) | Calibrated per agent + hedging heuristic |

Closes the routing blind-spot: _deterministic_fallback and _infer_phase now have accurate refs to work from.
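
The confidence handling mentioned above (string parsing plus a hedging-word heuristic) could look roughly like the sketch below. The word list, the 0.1 penalty, and the `parse_confidence` name are assumptions chosen for illustration; only the 0.5 flat default comes from this PR.

```python
# Hypothetical hedging vocabulary and penalty; tune against real agent output.
HEDGING_WORDS = ("might", "possibly", "unclear", "uncertain", "perhaps")
DEFAULT_CONFIDENCE = 0.5
HEDGE_PENALTY = 0.1


def parse_confidence(raw: object, body: str) -> float:
    """Coerce a confidence value to a float in [0, 1].

    Accepts numbers, numeric strings, and percentages such as "80%".
    When the value is missing or unparseable, start from the default and
    subtract a small penalty per hedging word found in the body.
    """
    value = None
    if isinstance(raw, (int, float)) and not isinstance(raw, bool):
        value = float(raw)
    elif isinstance(raw, str):
        text = raw.strip()
        try:
            value = float(text[:-1]) / 100 if text.endswith("%") else float(text)
        except ValueError:
            value = None

    if value is None:
        lowered = body.lower()
        hedges = sum(1 for word in HEDGING_WORDS if word in lowered)
        value = DEFAULT_CONFIDENCE - HEDGE_PENALTY * min(hedges, 3)

    return max(0.0, min(1.0, value))
```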

## Problem

Two critical blackboard architecture failures identified from post-mortem
of task-caccf02b:

1. Empty refs fields — agents write `**Refs**: [e-3, e-4]` in body prose
   but leave the structured `refs` JSON field empty.  Routing and
   convergence machinery depends entirely on this field.

2. Bundled entries — planner writes 4 plans in one entry; critic writes
   4 critiques in one entry.  Breaks selective referencing and prevents
   the CU from routing critique authors back to their own work.

Root cause: persona prompts described fields without showing format
examples.  Parser trusted agent output completely — no extraction fallback.

## Changes

### daemon/src/core/response_parser.py  [NEW]
585-line standalone parser with 7 extraction strategies in priority order:
- Structured entries_v1 JSON (happy path)
- Bundled entries split via entries array or prose --- / ## delimiters
- Refs in 3 prose patterns (**Refs**: / refs=[] / (refs: e-N))
- Heuristic bare e-N mention in first 600 chars (validated vs known_ids)
- Decider JSON code-fence unwrapping (with inner refs recovery)
- finding → rebuttal auto-promotion on body/title signals
- Decider JSON-in-body with nested entries-in-entries
- Confidence string parsing + hedging-word heuristic
- Cleaner/decline action passthroughs

### daemon/src/core/variants/traditional.py
Replaced naive parse_agent_response with delegation to parse_entries.
Added known_ids kwarg.

### daemon/src/core/orchestrator.py
Threads known_ids (set of board entry IDs from already-fetched snapshot)
into parse_agent_response to enable validated prose ref extraction.

### daemon/src/models/personas.py  [full rewrite]
All 5 constant role personas + expert generator rewritten with:
- Explicit JSON output contract with worked examples
- Confidence calibration table in every persona
- Anti-patterns section (prose refs, bundled entries, JSON fence, wrong type)
- Planner/critic: entries array required, one entry per idea
- Expert: finding vs rebuttal distinction (refs contains critique ID)
- Decider: body must be plain prose, never JSON; refs must be exhaustive
- AG prompt: domain-specific experts only, generic roles forbidden

### daemon/src/config.py + bmas.example.yaml
- experts_per_tier.complex: 3 → 4 (raised expert count for complex tasks)
- Code default and example yaml both updated to match production config

### daemon/src/core/blackboard.py
Node-name normalization: NODE_URL_TO_NAME reverse map, publish_log
normalizes raw HTTP URLs to friendly node-1/2/3 identifiers.

### daemon/tests/test_response_parser.py  [NEW, 37 tests]
Every production failure pattern from the board audit has its own test.

### daemon/tests/test_config_validation.py
Updated assertion to reflect new complex tier count (4).

## Test Results
- ruff check src/ tests/ → All checks passed
- mypy src/ --ignore-missing-imports → no issues (34 source files)
- pytest daemon/tests/ → 446 passed, 1 skipped
- pytest agent/tests/ → 39 passed
- eslint + tsc → clean
- vitest → 118 passed
- npm run build → success
@arvarik arvarik force-pushed the fix/response-parser-and-personas branch from 9b32041 to 8899d20 June 18, 2026 05:25
@arvarik arvarik merged commit 90bc489 into main Jun 18, 2026
3 checks passed
@arvarik arvarik deleted the fix/response-parser-and-personas branch June 18, 2026 05:31