feat(quality): persist the gate's grade signal as a cross-engagement trend #33

Open: stxkxs wants to merge 1 commit into main from feat/quality-trend
stxkxs (Member) commented Jun 22, 2026

Why

The merge gate grades every PR across the 9 QUALITY_RUBRIC dimensions, and the external-reviewer re-grades cold for calibration. Until now those grades drove a single ship/block decision and were then discarded: runExternalCalibration returned null on the aligned path, throwing the grades away. The factory could not answer the question that tells it whether it is getting better: are my grades trending up or down across engagements?

This closes the loop by persisting the signal the gate already computes, and surfacing the trend in fab perf. It is a wire-up, not a new repo or datastore.

What

  • src/quality.ts (new): QualityRun records (timestamp, workflow, profile, decision, attempts, internal grades, external grades + drift for calibrated runs) over an append-only JSONL at ~/.fab/quality.jsonl. Lives next to state.json so the trend is cross-engagement, not per-working-tree. FAB_QUALITY_FILE override mirrors FAB_STATE_FILE. Includes gradeToGpa (letters → 0–4.3) and formatQualityTrend (per-dimension overall-vs-recent table with a direction arrow; declining dimensions in red; footer with approval / calibration / drift rates).
  • src/gate.ts: extracted aggregateGrades(verdicts) (the aggregation the calibration did inline) so the gate and the quality record share one source of truth.
  • src/workflows.ts: runExternalCalibration now returns the grades + drift alongside its blocking result instead of discarding them; runMergeGate records exactly one QualityRun per terminal path (approve / drift-block / reject / exhausted) via a best-effort helper that can never break the gate.
  • src/bin/fab.ts: fab perf prints the quality trend under the agent table.
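
For readers unfamiliar with the pattern, here is a minimal sketch of an append-only JSONL log with an environment override. It is not the code in src/quality.ts: the names `QualityRunSketch`, `qualityFilePath`, `appendRun`, and `loadRuns` are hypothetical, the record is trimmed to a few fields, and skipping unparseable lines is an assumption rather than documented behavior.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical, trimmed-down record; the real QualityRun carries more fields.
interface QualityRunSketch {
  timestamp: string;
  workflow: string;
  decision: "approve" | "reject";
  grades: Record<string, string>; // dimension -> letter grade
}

// Resolve the log path: FAB_QUALITY_FILE wins, otherwise ~/.fab/quality.jsonl.
function qualityFilePath(
  env: Record<string, string | undefined> = process.env,
): string {
  return env.FAB_QUALITY_FILE || path.join(os.homedir(), ".fab", "quality.jsonl");
}

// Append one record as a single JSON line; earlier lines are never rewritten.
function appendRun(file: string, run: QualityRunSketch): void {
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.appendFileSync(file, JSON.stringify(run) + "\n", "utf8");
}

// Read every record back in write order.
function loadRuns(file: string): QualityRunSketch[] {
  if (!fs.existsSync(file)) return [];
  const runs: QualityRunSketch[] = [];
  for (const line of fs.readFileSync(file, "utf8").split("\n")) {
    if (line.trim() === "") continue;
    try {
      runs.push(JSON.parse(line) as QualityRunSketch);
    } catch {
      // Assumption: a corrupt line is skipped rather than failing the load.
    }
  }
  return runs;
}
```

Because each run is one self-contained line, a write appends to the file without reading or rewriting what is already there, which is what lets the log accumulate across repos.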

Scope / deferred

Verification

npm run build, npm run typecheck, npm test (all suites), npm run lint, npm run format:check — all green. New tests: quality.test.ts + aggregateGrades cases in gate.test.ts.

…trend

The merge gate grades every PR across the 9 QUALITY_RUBRIC dimensions and the
external-reviewer re-grades cold, but those grades only ever drove a single
ship/block decision and were then discarded. The factory could not answer the
one question that tells it whether it is improving: are my grades trending up
or down across engagements? This wires the existing signal into an append-only
record and surfaces the trend.

─────────────────────────── What changed ───────────────────────────

src/quality.ts (new) — the quality log.
  - QualityRun record: timestamp, workflow, gate profile, final decision,
    revision attempts, the aggregate internal grades, and (for calibrated
    code runs) the external-reviewer grades + drift.
  - appendQualityRun / loadQualityRuns over an append-only JSONL at
    ~/.fab/quality.jsonl — next to state.json, so the signal spans every repo
    the factory ships rather than one working tree. Overridable with
    FAB_QUALITY_FILE, mirroring FAB_STATE_FILE.
  - gradeToGpa maps letters (with +/-) to a 0–4.3 scale; N/A is excluded.
  - formatQualityTrend renders a per-dimension table (overall vs recent-window
    GPA with a direction arrow — declining dimensions show in red) plus a
    footer with approval rate, calibration coverage, and drift rate.
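
A rough illustration of the grade-to-GPA mapping and the overall-vs-recent comparison described above. This is a sketch under assumptions, not the shipped code: the point values (A=4, "+" adds 0.3, "-" subtracts 0.3), the `epsilon` tolerance, and the names `gradeToGpaSketch`, `meanGpa`, and `direction` are all hypothetical.

```typescript
// Assumed conventional base points; the real table may differ.
const BASE_POINTS: Record<string, number> = { A: 4, B: 3, C: 2, D: 1, F: 0 };

// Map a letter grade (optionally with + or -) to points on a 0-4.3 scale.
// Returns null for "N/A" and anything unrecognized so it can be excluded.
function gradeToGpaSketch(grade: string): number | null {
  const m = /^([ABCDF])([+-]?)$/.exec(grade.trim().toUpperCase());
  if (m === null) return null;
  const base = BASE_POINTS[m[1] ?? ""] ?? 0;
  const adjust = m[2] === "+" ? 0.3 : m[2] === "-" ? -0.3 : 0;
  // Clamp at 0 and round to one decimal to drop floating-point noise.
  return Math.round(Math.max(0, base + adjust) * 10) / 10;
}

// Mean GPA over the gradeable entries, or null if none are gradeable.
function meanGpa(grades: string[]): number | null {
  const points = grades
    .map(gradeToGpaSketch)
    .filter((p): p is number => p !== null);
  if (points.length === 0) return null;
  return points.reduce((a, b) => a + b, 0) / points.length;
}

// Compare the last `recentWindow` grades against the overall mean.
function direction(
  grades: string[],
  recentWindow: number,
  epsilon = 0.05, // assumed tolerance below which the trend counts as flat
): "↑" | "↓" | "→" {
  const overall = meanGpa(grades);
  const recent = meanGpa(grades.slice(-recentWindow));
  if (overall === null || recent === null) return "→";
  if (recent - overall > epsilon) return "↑";
  if (overall - recent > epsilon) return "↓";
  return "→";
}
```

Excluding N/A before averaging matters: treating it as zero would make a dimension that simply did not apply look like a failing one.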

src/gate.ts — extracted aggregateGrades(verdicts), the internal-grade
  aggregation the external calibration already did inline (advisory verdicts
  skipped, later verdict wins on collision). Now shared by the gate and the
  quality record so they cannot diverge.
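
The two aggregation rules named above (advisory verdicts skipped, later verdict wins on collision) could look roughly like this. The `VerdictSketch` shape and the function name are hypothetical; the real verdict type in src/gate.ts may carry different fields.

```typescript
// Hypothetical verdict shape, for illustration only.
interface VerdictSketch {
  reviewer: string;
  advisory: boolean;
  grades: Record<string, string>; // dimension -> letter grade
}

// Merge per-reviewer grades into one map of dimension -> grade.
function aggregateGradesSketch(
  verdicts: VerdictSketch[],
): Record<string, string> {
  const merged: Record<string, string> = {};
  for (const verdict of verdicts) {
    // Advisory verdicts do not contribute to the aggregate.
    if (verdict.advisory) continue;
    // Object.assign overwrites existing keys, so a later verdict wins
    // when two reviewers grade the same dimension.
    Object.assign(merged, verdict.grades);
  }
  return merged;
}
```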

src/workflows.ts — capture at the gate, not a new pass.
  - runExternalCalibration now returns the internal + external grades and the
    drift alongside its blocking result, instead of returning null and
    throwing the grades away on the aligned path.
  - runMergeGate records exactly one QualityRun on every terminal path
    (approve, drift-block reject, gate reject, exhausted revisions) via a
    best-effort recordQuality helper — a metrics write is wrapped so it can
    never break the gate.
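
A best-effort write of this kind is usually a try/catch around the side effect, so the gate decision is returned whether or not the write succeeds. The sketch below assumes that shape; `bestEffort`, `finishGate`, and the `warn` callback are hypothetical names, not the recordQuality helper itself.

```typescript
type Decision = "approve" | "reject";

// Run a side-effecting write; report success, never throw.
function bestEffort(
  write: () => void,
  warn: (message: string) => void = () => {},
): boolean {
  try {
    write();
    return true;
  } catch (err) {
    try {
      warn(err instanceof Error ? err.message : String(err));
    } catch {
      // Even a failing logger must not escape.
    }
    return false;
  }
}

// Hypothetical terminal path: record once, then return the decision
// regardless of whether the record was written.
function finishGate(
  decision: Decision,
  record: (decision: Decision) => void,
): Decision {
  bestEffort(() => record(decision));
  return decision;
}
```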

src/bin/fab.ts — `fab perf` now prints the quality trend below the agent
  performance table. One command, the whole picture.

src/index.ts — exports the quality surface.

─────────────────────────── Scope ───────────────────────────

This is the loop-closing wire-up, deliberately not a new repo or datastore.
A frozen golden-brief benchmark and a re-grading harness become worthwhile
only once the JSONL has real run density — they are explicitly deferred.

collectSessionMetrics (the per-role token table behind `fab perf`) remains
unwired; it is managed-agents-API-specific and orthogonal to the grade trend.
Tracked separately.

Tests: quality.test.ts (roundtrip, gradeToGpa, trend formatting/footer) and
aggregateGrades cases in gate.test.ts. Full suite, typecheck, lint, and
prettier all green.