feat(groomer): schema-constrained LLM output (json_schema) #516

Merged
joryirving merged 2 commits into main from feat/groomer-structured-output
Jul 1, 2026

@joryirving

Contributor

Summary

  • Upgrade the groomer's response_format from {type:"json_object"} (valid JSON, unconstrained shape) to {type:"json_schema"} with a JSON Schema for the groomer output. On the self-hosted llama.cpp backend (via litellm, which forwards response_format), this grammar-constrains decoding to the exact shape — the key to reliable output from a small model.
  • lane.id is a dynamic enum built from the configured lanes (getLaneIds()), so the model can only emit a real lane.
  • Graceful fallback: on a 400 (backend without json_schema support) it retries with {type:"json_object"}, so grooming never breaks. validateGroomerOutput still runs as the safety net (and handles enum alias canonicalization) — belt and suspenders.

Why

Enables running the groomer reliably on a small local model (e.g. the 4B vision model): the schema/grammar enforces the structure that a 4B model otherwise gets wrong, while the existing coercion layer handles the rest.

Verification

  • vitest run src/lib/groomer/ → 190 passing (3 new: schema shape + dynamic lane enum, json_schema sent first, 400→json_object fallback).
  • tsc --noEmit → 0; eslint → clean.

Live validation to do after merge

Confirm litellm forwards json_schema through to llama.cpp: dry-run a groom against the vision model and check the output is schema-valid. (Safe regardless — the fallback + validator mean worst case is today's behavior.)

Related: #498

The groomer used response_format {type:"json_object"} — valid JSON, but not
schema-conformant, so a model could emit wrong/missing fields. Switch to
{type:"json_schema"} with a JSON Schema for the groomer output, so a
llama.cpp backend (via litellm) grammar-constrains decoding to the exact
shape. This is what lets a small model (e.g. the 4B vision model) produce
reliable structure.

- buildGroomerResponseSchema() constrains structure + fixed enums, and builds
  lane.id as a dynamic enum from the configured lanes (getLaneIds) so the
  model can't emit a hallucinated lane.
- Falls back to json_object on a 400 (backend without json_schema support), so
  grooming never breaks. validateGroomerOutput still runs as the safety net and
  handles enum alias canonicalization — belt and suspenders.
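
The post-decode safety net mentioned above can be illustrated with a minimal sketch. It is not the real `validateGroomerOutput`: the function name `validateLaneSketch`, the alias table, and the choice to throw on unknown values are all assumptions made for illustration. It only shows the general idea of rejecting an unknown lane and folding enum aliases into their canonical form.

```typescript
// Hypothetical sketch: the alias table and function name are assumptions,
// not the project's actual validateGroomerOutput implementation.
type Confidence = "high" | "medium" | "low";

// A Map avoids accidental hits on Object.prototype keys such as "constructor".
const CONFIDENCE_ALIASES = new Map<string, Confidence>([
  ["high", "high"],
  ["hi", "high"],
  ["medium", "medium"],
  ["med", "medium"],
  ["low", "low"],
  ["lo", "low"],
]);

interface LaneChoice {
  id: string;
  confidence?: Confidence;
}

function validateLaneSketch(raw: unknown, laneIds: readonly string[]): LaneChoice {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("lane must be an object");
  }
  const { id, confidence } = raw as { id?: unknown; confidence?: unknown };

  // Reject lanes that are not configured, even if the backend ignored the schema.
  if (typeof id !== "string" || !laneIds.includes(id)) {
    throw new Error(`unknown lane id: ${String(id)}`);
  }
  if (confidence === undefined) {
    return { id };
  }

  // Canonicalize aliases ("Med" becomes "medium") before accepting the value.
  const canonical =
    typeof confidence === "string"
      ? CONFIDENCE_ALIASES.get(confidence.trim().toLowerCase())
      : undefined;
  if (canonical === undefined) {
    throw new Error(`unknown confidence: ${String(confidence)}`);
  }
  return { id, confidence: canonical };
}

console.log(validateLaneSketch({ id: "backlog", confidence: "Med" }, ["default", "backlog"]));
```

Running a check like this after decoding matters mainly on the fallback path, where the backend only guarantees valid JSON and not the shape.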

Related: #498 (dispatch audit). Enables running the groomer on a small local model.
joryirving added a commit to joryirving/home-ops that referenced this pull request Jul 1, 2026
Grooming is triage/classification, not code — a small model is plenty. Point
DISPATCH_GROOMER_MODEL at `vision` (Qwen3.5-4B / llama-vision via litellm),
freeing the bigger model for the coder/reviewer. dispatch's json_schema
response_format (misospace/dispatch#516) grammar-constrains the 4B's output
so structure stays reliable.

@joryirving joryirving enabled auto-merge (squash) July 1, 2026 21:38
@joryirving joryirving merged commit 8b50c25 into main Jul 1, 2026
6 checks passed
@joryirving joryirving deleted the feat/groomer-structured-output branch July 1, 2026 21:43

@its-saffron its-saffron Bot left a comment


AI Automated Review

Full PR review.

Analysis engine: MiniMax-M2.7@https://litellm.jory.dev/v1 (anthropic) — escalated (fast_low_confidence)

Recommendation: Approve

This PR cleanly upgrades the groomer's LLM output from an unconstrained {type:"json_object"} to a schema-constrained {type:"json_schema"} format with a dynamic lane enum, while maintaining backward compatibility through a graceful fallback.


Change-by-Change Findings

src/lib/groomer/llm.ts

buildGroomerResponseSchema() (new function, lines 67–131)

  • Builds a strict JSON Schema with additionalProperties: false to enforce exact shape
  • lane.id is a dynamic enum populated from getLaneIds(), ensuring the model can only emit configured lanes — eliminates hallucinated lane IDs
  • confidence is an enum of ["high","medium","low"] shared across lane.confidence and the top-level field
  • Required fields are labelsToAdd, labelsToRemove, lane — consistent with the existing GroomerOutput schema contract
  • Belt-and-suspenders approach with validateGroomerOutput still running post-decode
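
As a rough illustration of the points above, a schema builder along these lines might look as follows. This is a sketch under stated assumptions, not the code in `llm.ts`: the function names, the schema name `groomer_output`, the label fields being arrays of strings, the `required` list inside `lane`, and the guard against an empty lane list are all hypothetical. Lane ids are passed in as a parameter here, standing in for `getLaneIds()`.

```typescript
// Hypothetical sketch: not the PR's actual buildGroomerResponseSchema().
type JsonSchema = Record<string, unknown>;

const CONFIDENCE = ["high", "medium", "low"] as const;

function buildSchemaSketch(laneIds: readonly string[]): JsonSchema {
  // Assumption: an empty enum would leave the model with no legal lane, so fail early.
  if (laneIds.length === 0) {
    throw new Error("at least one lane must be configured");
  }
  return {
    type: "object",
    additionalProperties: false,
    required: ["labelsToAdd", "labelsToRemove", "lane"],
    properties: {
      // Assumption: labels are plain strings.
      labelsToAdd: { type: "array", items: { type: "string" } },
      labelsToRemove: { type: "array", items: { type: "string" } },
      lane: {
        type: "object",
        additionalProperties: false,
        required: ["id"],
        properties: {
          // Dynamic enum: only configured lanes are legal values.
          id: { type: "string", enum: [...laneIds] },
          confidence: { type: "string", enum: [...CONFIDENCE] },
        },
      },
      // The same confidence enum is reused at the top level.
      confidence: { type: "string", enum: [...CONFIDENCE] },
    },
  };
}

function buildResponseFormatSketch(laneIds: readonly string[]) {
  return {
    type: "json_schema" as const,
    json_schema: { name: "groomer_output", schema: buildSchemaSketch(laneIds) },
  };
}

const format = buildResponseFormatSketch(["default", "backlog"]);
console.log(JSON.stringify(format, null, 2));
```

Building the enum at call time rather than hardcoding it means a lane added to the configuration becomes a legal output without touching the schema code.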

postChatCompletion() (new helper, lines 133–149)

  • Extracted from callGroomerLLM to enable the retry pattern without duplicating the fetch call
  • Correctly passes AbortSignal for timeout cancellation in both the initial call and fallback
  • Temperature fixed at 0.1 — unchanged from previous behavior

callGroomerLLM() (modified, lines 151–169)

  • Attempts {type:"json_schema", json_schema: {...}} first, as intended
  • On response.status === 400 it retries with {type:"json_object"} — the documented graceful fallback
  • The controller.signal is correctly threaded into both postChatCompletion calls, so timeout cancellation is preserved
  • The timeoutId is still cleared in the finally block (visible in the diff's continuation)
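
The retry flow described above can be sketched as below. This is a minimal sketch, not the actual `callGroomerLLM`: the `PostFn` type, the injected `post` function (standing in for `postChatCompletion`), the function name, and the default timeout value are assumptions. It shows one `AbortController` shared by both attempts, a single retry on HTTP 400, and the timer cleared in `finally`.

```typescript
// Hypothetical sketch: names and the injected post function are assumptions.
interface ChatResponse {
  status: number;
  ok: boolean;
  text(): Promise<string>;
}

type ResponseFormat =
  | { type: "json_object" }
  | { type: "json_schema"; json_schema: { name: string; schema: object } };

type PostFn = (format: ResponseFormat, signal: AbortSignal) => Promise<ChatResponse>;

async function callWithSchemaFallback(
  post: PostFn,
  schemaFormat: ResponseFormat,
  timeoutMs = 30000,
): Promise<string> {
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // First attempt: schema-constrained output.
    let response = await post(schemaFormat, controller.signal);
    if (response.status === 400) {
      // Backend rejected json_schema: retry once with json_object,
      // reusing the same signal so the overall timeout still applies.
      response = await post({ type: "json_object" }, controller.signal);
    }
    if (!response.ok) {
      throw new Error(`LLM request failed (${response.status}): ${await response.text()}`);
    }
    return await response.text();
  } finally {
    clearTimeout(timeoutId);
  }
}

// Usage with a fake backend that does not support json_schema.
const seenFormats: string[] = [];
const fakePost: PostFn = async (format) => {
  seenFormats.push(format.type);
  return format.type === "json_schema"
    ? { status: 400, ok: false, text: async () => "json_schema unsupported" }
    : { status: 200, ok: true, text: async () => '{"labelsToAdd":[]}' };
};

callWithSchemaFallback(fakePost, {
  type: "json_schema",
  json_schema: { name: "groomer_output", schema: {} },
}).then((body) => console.log(seenFormats, body));
```

Because both attempts share one controller, the fallback cannot extend the total time budget beyond `timeoutMs`.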

src/lib/groomer/llm.test.ts

Existing test assertion (line 34): Changed from expect(capturedBody.response_format?.type).toBe("json_object") to .toBe("json_schema"), which correctly reflects the new default behavior.

buildGroomerResponseSchema test (lines 191–201):

  • Verifies required fields, additionalProperties: false, and lane.id enum matches getLaneIds()
  • Checks laneId.enum.length > 0 to guard against empty lane configs

callGroomerLLM response_format tests (lines 203–230):

  • Test 1 verifies the first call sends json_schema with the correct schema name and dynamic lane enum
  • Test 2 verifies the 400→json_object fallback: mock returns 400 first, then 200, and asserts both call bodies and a successful parsed result

Standards Compliance

| Convention | Status |
| --- | --- |
| Uses getLaneIds() from @/lib/lane-config for lane configuration | ✅ Verified via git_grep hits |
| No new secrets or environment variables | ✅ No secrets in diff |
| Error handling with error instanceof Error pattern | if (!response.ok) with text extraction |
| API returns appropriate HTTP status codes | ✅ 400 triggers fallback, not error |
| TypeScript types via Prisma/schema | ✅ Uses existing GroomerOutput type |
| No hardcoded agent names in generic docs | ✅ N/A (library code) |

Evidence Provider Findings

No evidence providers configured. No findings.


Tool Harness Findings

| Tool | Finding | Resolution |
| --- | --- | --- |
| getLaneIds usage scan | Used correctly in llm.ts (imported, called at line 76, result used in schema) and test (imported, used in assertions) | ✅ Verified |
| lane-config.ts read | Shows getLaneIds() at line 150 returning an array of configured lane IDs; default config has default and backlog | ✅ Consistent with test expectations |
| llm.ts full read | Confirms buildGroomerResponseSchema is exported and callGroomerLLM uses postChatCompletion with signal threading | ✅ Verified |

Linked Issue Fit

No linked issue references found in the PR body. The PR references PR 498 as "Related" but it is not linked as an acceptance-criteria source. This review cannot assess requirement fulfillment against PR 498.


CI Check Results

All checks passed for commit b1eacda6:

  • Docker Build ✅
  • Typecheck ✅
  • Build ✅
  • Tests ✅
  • Lint ✅

Unknowns / Needs Verification

  1. Live validation (post-merge): The PR body states a manual verification is needed after merge — confirm litellm forwards json_schema through to llama.cpp. This is expected and documented; no blocker.
  2. #498 acceptance criteria: Referenced as "Related" but not linked as a formal requirement source. No formal acceptance criteria to verify.

Findings

No findings to report.
