Skip to content

fix(groomer): grammar-enforce label allowlist and proposedTitle bounds#542

Merged
joryirving merged 1 commit into
mainfrom
fix/groomer-schema-tighten
Jul 2, 2026
Merged

fix(groomer): grammar-enforce label allowlist and proposedTitle bounds#542
joryirving merged 1 commit into
mainfrom
fix/groomer-schema-tighten

Conversation

@joryirving

Copy link
Copy Markdown
Contributor

Summary

  • buildGroomerResponseSchema now enum-constrains labelsToAdd/labelsToRemove items to the validator's allowlist (newly exported ALLOWED_GROOMER_LABELS) and bounds proposedTitle (null or 10–200 chars) / proposedBody (null or <10k).

Fixes the two prod groom failures on the vision 4B: labelsToAdd contains disallowed label: type/refactor (invented label) and proposedTitle must be between 10 and 200 characters, got 0 (empty string instead of omitting). The grammar was looser than validateGroomerOutput; now the model physically can't emit either. llama.cpp supports enum/anyOf/min-maxLength in json_schema; if a backend rejects the schema, the existing 400→json_object fallback still applies.

Verification

  • vitest run src/lib/groomer/ → 194 passed (+2); tsc + eslint clean.

The json_schema was looser than validateGroomerOutput, so a
grammar-constrained 4B could still emit output the validator rejects —
both seen in prod: labelsToAdd '[type/refactor]' (invented label) and
proposedTitle '' (emitted empty instead of omitting). Constrain label
items to the exported ALLOWED_GROOMER_LABELS enum and bound
proposedTitle (10-200 chars or null) / proposedBody (<10k or null) in
the grammar itself.

@its-saffron its-saffron Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI Automated Review

Full PR review.

Analysis engine: MiniMax-M2.7@https://litellm.jory.dev/v1 (anthropic) — routed smart (risk match: db_or_migration_changes)

PR Review: fix(groomer): grammar-enforce label allowlist and proposedTitle bounds

Summary

This PR tightens the JSON schema emitted by buildGroomerResponseSchema so that small LLMs cannot produce output that fails downstream validation. It adds enum constraints to labelsToAdd/labelsToRemove and length bounds to proposedTitle/proposedBody.

Required Checks

  • review migration for data loss risk: ✅ NOT APPLICABLE — This PR contains no database migration. The changed files are all TypeScript sources in src/lib/groomer/ (llm.ts, llm.test.ts, schema.ts). There is no prisma/schema.prisma change, no migration file, and no Prisma operations. The db_or_migration_changes classification appears to be a misclassification; the PR is a pure code/schema change.

  • test migration on a copy of production schema: ✅ NOT APPLICABLE — No migration exists to test. All CI checks passed (Build, Typecheck, Lint, Tests: 194 passed).

Change-by-Change Findings

File Change Assessment
src/lib/groomer/schema.ts Exports ALLOWED_GROOMER_LABELS (a readonly array of status + priority + type labels). Refactors isAllowedLabel to use .includes() on that array. Clean refactor; centralizes the allowlist in one place.
src/lib/groomer/llm.ts Enum-constrains labelsToAdd/labelsToRemove items to ALLOWED_GROOMER_LABELS; adds anyOf bounds to proposedTitle (null or 10–200 chars) and proposedBody (null or ≤9999 chars). Addresses both prod failures described in the PR body: the "type/refactor" invented label and the empty-string proposedTitle.
src/lib/groomer/llm.test.ts Adds two new test cases verifying enum constraint and title/body bounds. Coverage is sufficient; verifies the exact failure mode seen in prod.

Standards Compliance

  • Code Standards (AGENTS.md): ✅ No agent-specific names in docs; ✅ Prisma schema not touched; ✅ no commit of secrets. The changes follow the existing pattern of schema-constrained LLM output introduced in PR PR 516.
  • Error handling: ✅ Uses existing validateGroomerOutput + 400→json_object fallback pattern mentioned in the PR body.
  • The PR follows the repository's established convention of using json_schema (with anyOf/enum) for LLM output validation.

Linked Issue Fit

No linked issue was found in the PR body. The PR is self-contained and directly addresses two described production failures without needing external issue context.

Evidence Provider Findings

No evidence providers were configured; none to evaluate.

Tool Harness Findings

No tool harness calls were issued; corpus reviewed directly.

Unknowns / Needs Verification

  • The db_or_migration_changes classification is inconsistent with the diff: no migration files exist in this PR. The classifier may have triggered on the word "schema" in file names (schema.ts, buildGroomerResponseSchema). This does not block approval but is worth noting.
  • The maxLength: 9999 bound on proposedBody was not explicitly called out in the PR body. The validator's actual upper bound is not shown in the diff, but the test confirms the expected value. Assuming this matches validateGroomerOutput (the PR states "grammar was looser than validateGroomerOutput; now the model physically can't emit either"), this is consistent.

Recommendation

Approve. The changes are internally consistent, all CI checks pass, tests cover the new constraints, and the approach directly fixes the two production failures described in the PR body. No database or migration impact exists.

@joryirving joryirving merged commit a1f9a74 into main Jul 2, 2026
6 checks passed
@joryirving joryirving deleted the fix/groomer-schema-tighten branch July 2, 2026 03:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant