Skip to content

perf(refine): consume _labels.json + lazy numeric probes#19

Merged
ebootheee merged 3 commits into
mainfrom
feat/refine-label-index
May 28, 2026
Merged

perf(refine): consume _labels.json + lazy numeric probes#19
ebootheee merged 3 commits into
mainfrom
feat/refine-label-index

Conversation

@ebootheee
Copy link
Copy Markdown
Owner

Summary

You asked me to dig into the label→cell pre-index and verify how much refine already leverages _labels.json. The answer was: zero. manifest refine rebuilt a full label+numeric index over the entire ground truth on every run (buildIndex), even though it only ever inspects numerics on a matched label's own row. On big models the bulk of that work indexed giant unlabeled grids (e.g. a 190 MB PP&E depreciation schedule) the refiner never consults — pure waste. And _labels.json, which has existed since V4, wasn't consumed by refine at all.

This PR fixes that, behavior-preserving.

What changed (cli/commands/manifest-refine.mjs)

  • Labels from chunked/_labels.json when present — an O(labels) read instead of scanning every cell. Legacy engines without the index fall back to a one-time GT scan (buildLabelIndex), so nothing breaks.
  • Numerics resolved lazily, per matched row (numericsForRow, memoized): probe that row's columns on demand, stopping after a long empty-column run — instead of bucketing every numeric in a multi-million-cell workbook up front. The giant unlabeled grids are never touched.
  • Candidate ranking, dedup, value-range, and summary/rollup/hint logic are untouched.

Why this is the right scope (and what it isn't)

The naive idea — "just wire _labels.json in" — doesn't work alone, because _labels.json has labels only, no numbers, and refine needs same-row numerics for value-range validation + base-case values. So the win comes from also making numerics lazy. The remaining cost floor is the unavoidable JSON parse of the ground truth.

Two deliberate follow-ups (in ROADMAP, not this PR):

  • Tier B: a parser-emitted row-values artifact (numerics for label-bearing rows only) would let refine skip the GT parse entirely — big win on giant-grid models, ~GT-sized (no win) on dense-label models, so gate on a real-model size measurement.
  • Apply the same lazy-numerics path to searchByLabel (query/carry), and build the GT index once per init (today generate → refine → doctor → maps each re-parse the GT — the real source of the "~2.5 min" loop).

Impact

The eliminated buildIndex pass scales with total cell count; the new probe cost scales with matched label rows (a few dozen). Measured on a synthetic giant-grid GT: the removed pass alone was ~1.4 s (1.4 M cells) / ~7.9 s (6.4 M cells); end-to-end refine now finishes in less time than the old index build took, projecting ~11 s → ~3.6 s on a 200 MB GT and a larger relative win as the unlabeled grid grows.

Tests — tests/cli/test-refine-label-index.mjs (14), wired into npm test

  • correctness off _labels.json;
  • parity — identical mappings via the index path vs. the GT-scan fallback (the optimization changes speed, not results);
  • lazy-probe far/gapped columns + value-range filtering;
  • consumption proof — a label present only in the index (not as a GT string) is still resolved, which the fallback provably cannot do.

The existing test-artifact-slimming suite already runs runInit end-to-end through the real parser, so refine consuming a real _labels.json is covered there too.

Validation (local)

npm test (all 8 JS suites incl. the new 14) + smoke 78/78 + test:depgraph 11 + test:engine 21 + test:slimming 13 — all green. CI (added in #18) will re-run the matrix on this PR.

🤖 Generated with Claude Code

ebootheee and others added 3 commits May 28, 2026 14:46
…T index

`ete manifest refine` rebuilt a full label+numeric index over the entire
ground truth on every run (buildIndex), even though it only ever inspects
numerics on a matched label's own row. On big models the bulk of that work
indexed giant *unlabeled* grids (e.g. a PP&E schedule) the refiner never
consults.

Now buildIndex:
- sources labels from the Rust parser's chunked/_labels.json when present
  (O(labels), no GT scan), falling back to buildLabelIndex(gt) for legacy
  engines that predate the index;
- resolves same-row numerics lazily by probing the row's columns on demand
  (numericsForRow), memoized per row, stopping after a long empty-column run.

Behavior-preserving: ranking/dedup/value-range logic is untouched, so the
existing manifest/ship-ready suites stay green. The remaining full pass is the
unavoidable JSON parse of the ground truth (a follow-up could lift that with a
parser-emitted row-values artifact; see ROADMAP).

New tests/cli/test-refine-label-index.mjs (14): correctness off _labels.json,
parity between the index path and the GT-scan fallback, lazy-probe far/gapped
columns + value ranges, and a consumption proof (a label present only in the
index — not as a GT string — is still resolved; the fallback provably cannot).
Wired into `npm test`.

Measured (synthetic giant-grid GT): the eliminated buildIndex pass alone was
~1.4s on 1.4M cells / ~7.9s on 6.4M cells; new refine completes end-to-end in
less time than the old index build took, and the skipped work scales with total
cell count.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
CHANGELOG + PLAN entries for the _labels.json consumption + lazy numeric
probes. ROADMAP: mark the pre-indexed label->cell item done for refine, with
Tier B (parser-emitted row-values artifact) and the searchByLabel / init
single-index follow-ups called out. SKILL: note refine is faster on big models
(transparent).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@ebootheee ebootheee merged commit 480d27e into main May 28, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant