feat(quality): persist the gate's grade signal as a cross-engagement trend by stxkxs · Pull Request #33 · nanohype/fab

stxkxs · 2026-06-22T23:13:28Z

Why

The merge gate grades every PR across the 9 QUALITY_RUBRIC dimensions, and the external-reviewer re-grades cold for calibration. Until now those grades drove a single ship/block decision and were then discarded — runExternalCalibration returned null on the aligned path, throwing the grades away. The factory could not answer the question that tells it whether it is getting better: are my grades trending up or down across engagements?

This closes the loop by persisting the signal the gate already computes, and surfacing the trend in fab perf. It is a wire-up, not a new repo or datastore.

What

src/quality.ts (new) — QualityRun records (timestamp, workflow, profile, decision, attempts, internal grades, external grades + drift for calibrated runs) over an append-only JSONL at ~/.fab/quality.jsonl. Lives next to state.json so the trend is cross-engagement, not per-working-tree. FAB_QUALITY_FILE override mirrors FAB_STATE_FILE. Includes gradeToGpa (letters → 0–4.3) and formatQualityTrend (per-dimension overall-vs-recent table with a direction arrow; declining dimensions in red; footer with approval / calibration / drift rates).
src/gate.ts — extracted aggregateGrades(verdicts) (the aggregation the calibration did inline) so the gate and the quality record share one source of truth.
src/workflows.ts — runExternalCalibration now returns the grades + drift alongside its blocking result instead of discarding them; runMergeGate records exactly one QualityRun per terminal path (approve / drift-block / reject / exhausted) via a best-effort helper that can never break the gate.
src/bin/fab.ts — fab perf prints the quality trend under the agent table.
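
The letter-to-GPA mapping and the overall-vs-recent comparison described for src/quality.ts can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: only the name gradeToGpa and the 0–4.3 range come from the description, while the 0.3 step for +/-, the flat 0 for F, and the helpers meanGpa and trendArrow (including its arrow glyphs and tolerance) are assumptions made for the example.

```typescript
// Sketch only: the PR names gradeToGpa but does not show its mapping.
// The 0.3 step for +/- and "F is always 0" are assumptions.
const BASE_POINTS: Record<string, number> = { A: 4, B: 3, C: 2, D: 1, F: 0 };

function gradeToGpa(grade: string): number | null {
  const match = /^([ABCDF])([+-])?$/.exec(grade.trim().toUpperCase());
  if (match === null) return null; // "N/A" and unknown values are excluded
  const letter = match[1] ?? "";
  const base = BASE_POINTS[letter];
  if (base === undefined) return null;
  if (letter === "F") return 0;
  const step = match[2] === "+" ? 0.3 : match[2] === "-" ? -0.3 : 0;
  return Math.round((base + step) * 10) / 10; // round away float noise
}

// Hypothetical helper: mean GPA over gradable entries, null if none exist.
function meanGpa(grades: string[]): number | null {
  const points = grades
    .map((g) => gradeToGpa(g))
    .filter((p): p is number => p !== null);
  if (points.length === 0) return null;
  return points.reduce((sum, p) => sum + p, 0) / points.length;
}

// Hypothetical helper: compares the last `recentWindow` grades with the
// full history. The 0.05 tolerance is an arbitrary choice for the sketch.
function trendArrow(all: string[], recentWindow: number): "↑" | "↓" | "→" {
  const overall = meanGpa(all);
  const recent = meanGpa(all.slice(-recentWindow));
  if (overall === null || recent === null) return "→";
  if (Math.abs(recent - overall) < 0.05) return "→";
  return recent > overall ? "↑" : "↓";
}
```

Returning null for N/A, rather than 0, keeps ungraded dimensions from dragging an average down, which matches the description's note that N/A is excluded.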

Scope / deferred

A frozen golden-brief benchmark + re-grading harness become worthwhile only once the log has run density — deferred by design.
collectSessionMetrics (the per-role token table) stays unwired; it's managed-agents-API-specific and orthogonal. Tracked in #32 (fab perf: collectSessionMetrics is never called, per-role table stays empty).

Verification

npm run build, npm run typecheck, npm test (all suites), npm run lint, npm run format:check — all green. New tests: quality.test.ts + aggregateGrades cases in gate.test.ts.

…trend

The merge gate grades every PR across the 9 QUALITY_RUBRIC dimensions and the external-reviewer re-grades cold, but those grades only ever drove a single ship/block decision and were then discarded. The factory could not answer the one question that tells it whether it is improving: are my grades trending up or down across engagements? This wires the existing signal into an append-only record and surfaces the trend.

─────────────────────────── What changed ───────────────────────────

src/quality.ts (new): the quality log.
- QualityRun record: timestamp, workflow, gate profile, final decision, revision attempts, the aggregate internal grades, and (for calibrated code runs) the external-reviewer grades + drift.
- appendQualityRun / loadQualityRuns over an append-only JSONL at ~/.fab/quality.jsonl, next to state.json, so the signal spans every repo the factory ships rather than one working tree. Overridable with FAB_QUALITY_FILE, mirroring FAB_STATE_FILE.
- gradeToGpa maps letters (with +/-) to a 0–4.3 scale; N/A is excluded.
- formatQualityTrend renders a per-dimension table (overall vs recent-window GPA with a direction arrow; declining dimensions show in red) plus a footer with approval rate, calibration coverage, and drift rate.

src/gate.ts: extracted aggregateGrades(verdicts), the internal-grade aggregation the external calibration already did inline (advisory verdicts skipped, later verdict wins on collision). Now shared by the gate and the quality record so they cannot diverge.

src/workflows.ts: capture at the gate, not a new pass.
- runExternalCalibration now returns the internal + external grades and the drift alongside its blocking result, instead of returning null and throwing the grades away on the aligned path.
- runMergeGate records exactly one QualityRun on every terminal path (approve, drift-block reject, gate reject, exhausted revisions) via a best-effort recordQuality helper; a metrics write is wrapped so it can never break the gate.

src/bin/fab.ts: `fab perf` now prints the quality trend below the agent performance table. One command, the whole picture.

src/index.ts: exports the quality surface.

─────────────────────────── Scope ───────────────────────────

This is the loop-closing wire-up, deliberately not a new repo or datastore. A frozen golden-brief benchmark and a re-grading harness become worthwhile only once the JSONL has real run density; they are explicitly deferred.

collectSessionMetrics (the per-role token table behind `fab perf`) remains unwired; it is managed-agents-API-specific and orthogonal to the grade trend. Tracked separately.

Tests: quality.test.ts (roundtrip, gradeToGpa, trend formatting/footer) and aggregateGrades cases in gate.test.ts. Full suite, typecheck, lint, and prettier all green.
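
The two behaviours the commit message describes for the gate, the aggregation rule (advisory verdicts skipped, later verdict wins on collision) and the metrics write that can never break the gate, might look something like the sketch below. The Verdict and Grades shapes, the dimension names used in the usage note, and the recordQualityBestEffort wrapper are all hypothetical; the PR names aggregateGrades(verdicts) and a recordQuality helper but does not show their signatures.

```typescript
// Hypothetical shapes: the real Verdict type is not shown in the PR.
type Grades = Record<string, string>; // rubric dimension -> letter grade

interface Verdict {
  advisory: boolean;
  grades: Grades;
}

// Merge per-reviewer grades into one map per the rule stated above.
function aggregateGrades(verdicts: Verdict[]): Grades {
  const merged: Grades = {};
  for (const verdict of verdicts) {
    if (verdict.advisory) continue; // advisory verdicts are skipped
    Object.assign(merged, verdict.grades); // later verdict wins on collision
  }
  return merged;
}

// Assumed best-effort wrapper: any failure in the metrics write is logged
// and swallowed, so the caller's ship/block decision is never affected.
function recordQualityBestEffort(write: () => void): boolean {
  try {
    write();
    return true;
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    console.warn(`quality record skipped: ${reason}`);
    return false;
  }
}
```

For example, with made-up dimension names, `aggregateGrades([{ advisory: false, grades: { correctness: "B" } }, { advisory: false, grades: { correctness: "A-" } }])` yields `{ correctness: "A-" }`, and a write callback that throws makes recordQualityBestEffort return false instead of propagating the error.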

stxkxs wants to merge 1 commit into main from feat/quality-trend
