ebootheee · May 28, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,47 @@
 # excel-to-engine — Changelog
 
+## 2026-05-28 — refine consumes `_labels.json` + lazy numeric probes
+
+`ete manifest refine` rebuilt a full label+numeric index over the **entire**
+ground truth on every run (`buildIndex`), even though it only ever inspects
+numerics on a *matched label's own row*. On big models the bulk of that work
+indexed giant **unlabeled** grids (e.g. a 190 MB PP&E depreciation schedule)
+that the refiner never consults — pure waste. (Investigation also found refine
+did **not** consume the parser's `_labels.json` at all, despite that index
+existing since V4.)
+
+### What changed
+
+- **Labels now come from `chunked/_labels.json`** when the parser emitted it —
+  an O(labels) read instead of scanning every cell. Legacy engines without the
+  index fall back to a one-time GT scan (`buildLabelIndex`), so nothing breaks.
+- **Numerics are resolved lazily, per matched row**, by probing that row's
+  columns on demand (`numericsForRow`, memoized) — instead of bucketing every
+  numeric in a multi-million-cell workbook up front. The giant unlabeled grids
+  are never touched.
+- **Behavior-preserving:** the candidate ranking, dedup, value-range, and
+  summary/rollup/hint logic are untouched. The full manifest + ship-ready
+  suites stay green.
+
+### Impact
+
+The eliminated full-GT indexing pass scales with *total* cell count; the new
+probe cost scales with *matched label rows* (a few dozen). On a synthetic
+giant-grid ground truth the removed pass alone took ~1.4 s at 1.4 M cells and
+~7.9 s at 6.4 M cells; end-to-end refine now finishes in less time than the old
+index build took. The remaining floor is the unavoidable JSON parse of the
+ground truth. Two follow-ups remain: a parser-emitted row-values artifact could
+lift that floor (see ROADMAP), and the same lazy-numerics treatment could be
+extended to `searchByLabel` (the `query` / `carry` path).
+
+### Tests
+
+- `tests/cli/test-refine-label-index.mjs` (14), wired into `npm test`:
+  correctness off `_labels.json`; **parity** between the index path and the
+  GT-scan fallback; lazy-probe far/gapped columns + value ranges; and a
+  **consumption proof** — a label present only in the index (not as a GT
+  string) is still resolved, which the fallback provably cannot do.
+
 ## 2026-05-28 — Continuous integration (GitHub Actions)
 
 The test suite is now substantial (132 JS assertions across 7 suites, plus the

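The lazy, memoized row probe described in the changelog entry above can be sketched in isolation. This is a minimal standalone illustration, not the project's code: `makeRowProbe` and `lookupCount` are hypothetical names added for the example, the ground truth is assumed to be a flat `{ "Sheet!A1": value }` map, and the two constants mirror the values in the diff.

```javascript
// Hypothetical standalone sketch of per-row lazy probing with a gap cutoff.
const MAX_PROBE_COL = 16384; // Excel's last column (XFD)
const MAX_PROBE_GAP = 256;   // stop after this many consecutive non-numeric columns

// 1 -> "A", 26 -> "Z", 27 -> "AA"
function numToCol(num) {
  let col = '';
  while (num > 0) {
    const rem = (num - 1) % 26;
    col = String.fromCharCode(65 + rem) + col;
    num = Math.floor((num - 1) / 26);
  }
  return col;
}

// Returns a memoized per-row lookup over a flat { "Sheet!A1": value } map.
function makeRowProbe(gt) {
  const cache = new Map(); // "sheet!row" -> [{ addr, value, col }]
  let lookups = 0;         // instrumentation for this example only
  function numericsForRow(sheet, row) {
    const key = `${sheet}!${row}`;
    if (cache.has(key)) return cache.get(key);
    const nums = [];
    let gap = 0;
    for (let c = 1; c <= MAX_PROBE_COL && gap < MAX_PROBE_GAP; c++) {
      const col = numToCol(c);
      const addr = `${sheet}!${col}${row}`;
      lookups++;
      const v = gt[addr];
      if (typeof v === 'number') {
        nums.push({ addr, value: v, col });
        gap = 0; // a numeric cell resets the gap counter
      } else {
        gap++;   // strings and missing cells both count toward the gap
      }
    }
    cache.set(key, nums);
    return nums;
  }
  return { numericsForRow, lookupCount: () => lookups };
}

const gt = {
  'Summary!A5': 'Net IRR',
  'Summary!C5': 0.182,
  'Summary!H5': 0.175,
  'PPE!B2': 1000, // stands in for a large unlabeled grid; never probed below
};
const probe = makeRowProbe(gt);
const rowNums = probe.numericsForRow('Summary', 5);
console.log(rowNums.map(n => n.addr)); // [ 'Summary!C5', 'Summary!H5' ]
```

In this sketch the probe cost depends only on the rows that are asked for: the `PPE` sheet contributes nothing to the work, however large it is, and a second call for the same row returns the cached array.
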
diff --git a/PLAN.md b/PLAN.md
@@ -1,5 +1,19 @@
 # excel-to-engine — Plan
 
+## Status: refine label-index optimization — landed 2026-05-28
+
+`ete manifest refine` now sources labels from the parser's `_labels.json`
+(O(labels), no full GT scan) and resolves same-row numerics lazily by probing,
+instead of bucketing every numeric in the workbook up front (`buildIndex`). The
+giant unlabeled grids that dominate big models — the very thing that made refine
+slow — are no longer touched. Behavior-preserving (rankings unchanged; suites
+green). New `tests/cli/test-refine-label-index.mjs` (14) proves consumption +
+parity. The remaining cost floor is the ground-truth JSON parse; lifting that
+would need a parser-emitted row-values artifact (Tier B). The same lazy-numerics
+treatment is still open for `searchByLabel` (the `query`/`carry` path), and the
+per-command GT re-parse multiplier in `init` (generate → refine → doctor → maps
+each reload the GT) remains a separate follow-up.
+
 ## Status: Continuous integration — landed 2026-05-28
 
 `.github/workflows/ci.yml` runs the full test matrix (Rust build + 11 unit

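The open follow-up in the plan above, where generate → refine → doctor → maps each reload the ground truth, amounts to sharing one parsed object across stages. A minimal sketch of that idea follows. `loadOnce` and `fakeLoadGroundTruth` are hypothetical names, and how `init` actually wires its stages together is not shown in this diff, so treat this as an illustration of the pattern only.

```javascript
// Hypothetical helper: wraps any loader so each key is loaded at most once.
function loadOnce(loader) {
  const cache = new Map();
  return function (key) {
    if (!cache.has(key)) cache.set(key, loader(key));
    return cache.get(key);
  };
}

// Stand-in for an expensive ground-truth parse; counts its invocations.
let parses = 0;
function fakeLoadGroundTruth(modelDir) {
  parses++;
  return { 'Summary!C5': 0.182 };
}

const getGroundTruth = loadOnce(fakeLoadGroundTruth);

// Four stages of one run all receive the same parsed object.
const stages = ['generate', 'refine', 'doctor', 'maps'];
const seen = stages.map(() => getGroundTruth('./my-model'));
console.log(parses); // 1
```

The trade-off is memory: the parsed ground truth stays alive for the whole run instead of being collected between stages.
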
diff --git a/ROADMAP.md b/ROADMAP.md
@@ -76,9 +76,21 @@ when we next touch the monitor server or auth surface.
 ### Manifest Refinement (continuing)
 - Model-family templates — recognize a family by its sheet signature and pick
   known cells directly (summary tabs, promote tab, etc.).
-- Pre-indexed label→cell map built once during parsing (the session log noted
-  `manifest refine` took 2.5 min CPU on a 200 MB ground truth; a pre-index
-  from the Rust parser would cut this 10–100×).
+- Pre-indexed label→cell map.
+  - **Done (2026-05-28):** `ete manifest refine` now consumes the parser's
+    `chunked/_labels.json` for labels (it previously ignored it and rebuilt a
+    full label+numeric index over the whole GT) and resolves same-row numerics
+    lazily by probing — so it no longer indexes the giant unlabeled grids that
+    dominate big models. The removed pass took ~7.9 s on a synthetic 6.4 M-cell
+    GT and scales with total cell count. Tests: `test-refine-label-index.mjs`.
+  - **Still open (Tier B):** the remaining floor is the ground-truth JSON parse.
+    A parser-emitted *row-values* artifact (numerics for label-bearing rows
+    only) would let refine skip the GT entirely. That is a large win on
+    giant-grid models but none on dense-label models, where the artifact is
+    roughly GT-sized, so gate it on a real-model size measurement first.
+  - **Still open:** apply the same lazy-numerics path to `searchByLabel`
+    (`query` / `carry`), and build the GT index *once* per `init` so
+    generate → refine → doctor → maps stop each re-parsing it.
 - Manifest migration tooling for model updates (vN → vN+1 shape diff).
 
 ---

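The size measurement that the Tier B item asks for could look roughly like the sketch below. It estimates how many numerics a row-values artifact would keep relative to the total cell count. `rowValuesFootprint` is a hypothetical name, the ground truth is assumed to be a flat `{ "Sheet!A1": value }` map, and labels are assumed to carry `sheet` and `row` fields as in the refine code.

```javascript
// Hypothetical measurement: share of GT cells that are numerics on label-bearing rows.
function rowValuesFootprint(gt, labels) {
  const labelRows = new Set(labels.map(l => `${l.sheet}!${l.row}`));
  let totalCells = 0;
  let keptNumerics = 0;
  for (const [addr, val] of Object.entries(gt)) {
    totalCells++;
    if (typeof val !== 'number') continue;
    const bang = addr.lastIndexOf('!');
    if (bang < 0) continue;
    const m = addr.slice(bang + 1).match(/^[A-Z]+(\d+)$/);
    if (!m) continue;
    // Normalize the row so "05" and 5 would map to the same key.
    const rowKey = `${addr.slice(0, bang)}!${parseInt(m[1], 10)}`;
    if (labelRows.has(rowKey)) keptNumerics++;
  }
  return {
    totalCells,
    keptNumerics,
    ratio: totalCells ? keptNumerics / totalCells : 0,
  };
}

// Giant-grid shape: one labeled row plus a 100-cell unlabeled grid.
const gt = { 'Summary!A5': 'Net IRR', 'Summary!C5': 0.182, 'Summary!D5': 0.175 };
for (let r = 1; r <= 100; r++) gt[`PPE!B${r}`] = r;

const fp = rowValuesFootprint(gt, [{ sheet: 'Summary', row: 5 }]);
console.log(fp.totalCells, fp.keptNumerics); // 103 2
```

A ratio near zero suggests the giant-grid case, where the artifact would be small; a ratio near one suggests the dense-label case, where it would be about as large as the ground truth itself. The measurement scans the whole ground truth, so it is meant as a one-off check rather than something to run on every refine.
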
diff --git a/cli/commands/manifest-refine.mjs b/cli/commands/manifest-refine.mjs
@@ -11,7 +11,10 @@
 
 import { readFileSync, writeFileSync, existsSync } from 'fs';
 import { join } from 'path';
-import { loadManifest, loadGroundTruth, resolveCell, MANIFEST_VERSION } from '../../lib/manifest.mjs';
+import {
+  loadManifest, loadGroundTruth, resolveCell, MANIFEST_VERSION,
+  loadLabelIndex, buildLabelIndex,
+} from '../../lib/manifest.mjs';
 
 // ---------------------------------------------------------------------------
 // Required fields and their search strategies
@@ -100,34 +103,73 @@ const REQUIRED_FIELDS = [
   },
 ];
 
+// numericsForRow probes a row's columns left to right, up to Excel's hard
+// column ceiling (MAX_PROBE_COL: XFD = 16384), and stops early after
+// MAX_PROBE_GAP consecutive columns with no numeric value. That gap spans any
+// realistic financial layout (far-right restated copies lose to the canonical
+// leftmost cell in ranking anyway) and caps a label-only row at 256 lookups.
+const MAX_PROBE_COL = 16384;
+const MAX_PROBE_GAP = 256;
+
 /**
- * Build a pre-index of the ground truth for fast searching.
- * Groups string labels by sheet+row and numeric values by sheet+row.
+ * Build a search index over the ground truth.
+ *
+ * Labels come from the Rust parser's pre-built index (`chunked/_labels.json`)
+ * when present — an O(labels) read instead of scanning every cell — and fall
+ * back to a one-time ground-truth scan (`buildLabelIndex`) for legacy engines
+ * that predate the index.
+ *
+ * Numeric values are resolved **lazily, per matched row**, by direct probing
+ * (see `numericsForRow`). The refiner only ever inspects numerics on a label's
+ * own row, so the old approach — bucketing every numeric in a multi-million-cell
+ * workbook up front — was almost entirely wasted: on a big model the bulk of
+ * those cells live in giant *unlabeled* grids (e.g. a PP&E depreciation
+ * schedule) the refiner never consults. Skipping that build is the win; the
+ * one remaining full pass is the unavoidable JSON parse of the ground truth.
+ *
+ * @param {Object} gt - Ground truth { addr: value }
+ * @param {string} [modelDir] - Model dir, for loading `_labels.json`
+ * @returns {{ labels: Array, numericsForRow: (sheet: string, row: number) => Array }}
  */
-function buildIndex(gt) {
-  const labels = [];       // { addr, text, sheet, col, row }
-  const numsByRow = {};    // "sheet!row" → [{ addr, value, col }]
-
-  for (const [addr, val] of Object.entries(gt)) {
-    const bang = addr.lastIndexOf('!');
-    if (bang < 0) continue;
-    const sheet = addr.substring(0, bang);
-    const cellPart = addr.substring(bang + 1);
-    const match = cellPart.match(/^([A-Z]+)(\d+)$/);
-    if (!match) continue;
-    const col = match[1];
-    const row = parseInt(match[2], 10);
-    const rowKey = `${sheet}!${row}`;
-
-    if (typeof val === 'string' && val.length > 2 && val.length < 200) {
-      labels.push({ addr, text: val, sheet, col, row, rowKey });
-    } else if (typeof val === 'number') {
-      if (!numsByRow[rowKey]) numsByRow[rowKey] = [];
-      numsByRow[rowKey].push({ addr, value: val, col });
+function buildIndex(gt, modelDir) {
+  const labelIndex = (modelDir && loadLabelIndex(modelDir)) || buildLabelIndex(gt);
+  const labels = [];
+  for (const entries of Object.values(labelIndex)) {
+    for (const e of entries) {
+      labels.push({
+        addr: `${e.sheet}!${e.col}${e.row}`,
+        text: e.text,
+        sheet: e.sheet,
+        col: e.col,
+        row: e.row,
+        rowKey: `${e.sheet}!${e.row}`,
+      });
     }
   }
 
-  return { labels, numsByRow };
+  const rowCache = new Map();   // "sheet!row" → [{ addr, value, col }]
+  function numericsForRow(sheet, row) {
+    const key = `${sheet}!${row}`;
+    const cached = rowCache.get(key);
+    if (cached) return cached;
+    const nums = [];
+    let gap = 0;
+    for (let c = 1; c <= MAX_PROBE_COL && gap < MAX_PROBE_GAP; c++) {
+      const col = numToCol(c);
+      const addr = `${sheet}!${col}${row}`;
+      const v = gt[addr];
+      if (typeof v === 'number') {
+        nums.push({ addr, value: v, col });
+        gap = 0;
+      } else {
+        gap++;
+      }
+    }
+    rowCache.set(key, nums);
+    return nums;
+  }
+
+  return { labels, numericsForRow };
 }
 
 /**
@@ -141,8 +183,9 @@ export function runManifestRefine(modelDir, args) {
   const manifest = loadManifest(modelDir);
   const gt = loadGroundTruth(manifest, modelDir);
 
-  // Pre-index for fast searching (single pass over GT)
-  const index = buildIndex(gt);
+  // Pre-index for fast searching. Labels come from `_labels.json` when the
+  // parser emitted it (no GT scan); numerics are probed lazily per matched row.
+  const index = buildIndex(gt, modelDir);
 
   // Resolve refinement hints: either passed in via args.hints (used by init
   // when a template has been applied), or read from a hand-edited manifest
@@ -279,7 +322,7 @@ function searchForFieldIndexed(index, field, opts = {}) {
 
   // Pass 2: For each matching label, select the best same-row numeric cell.
   for (const lm of labelMatches) {
-    const rowNums = index.numsByRow[lm.rowKey] || [];
+    const rowNums = index.numericsForRow(lm.sheet, lm.row);
     const labelColNum = colToNum(lm.col);
 
     const inRange = rowNums.filter(n => {
@@ -443,3 +486,15 @@ function colToNum(col) {
   }
   return num;
 }
+
+// Inverse of colToNum: 1 → "A", 26 → "Z", 27 → "AA". Used by numericsForRow to
+// reconstruct cell addresses when probing a row's columns.
+function numToCol(num) {
+  let col = '';
+  while (num > 0) {
+    const rem = (num - 1) % 26;
+    col = String.fromCharCode(65 + rem) + col;
+    num = Math.floor((num - 1) / 26);
+  }
+  return col;
+}
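
The index-first, scan-as-fallback label sourcing in `buildIndex` above can be illustrated with a standalone sketch. Several details here are assumptions, not the project's API: `scanLabels` and `resolveLabels` are hypothetical stand-ins for `buildLabelIndex` and the first half of `buildIndex`, the index is assumed to be keyed by lowercased label text, and the length filter is borrowed from the removed `buildIndex` code, so the real fallback may differ.

```javascript
// Assumed fallback: collect label-like strings from a flat { "Sheet!A1": value } map.
function scanLabels(gt) {
  const index = {};
  for (const [addr, val] of Object.entries(gt)) {
    if (typeof val !== 'string' || val.length <= 2 || val.length >= 200) continue;
    const bang = addr.lastIndexOf('!');
    if (bang < 0) continue;
    const m = addr.slice(bang + 1).match(/^([A-Z]+)(\d+)$/);
    if (!m) continue;
    const key = val.trim().toLowerCase(); // assumed key scheme
    if (!index[key]) index[key] = [];
    index[key].push({
      sheet: addr.slice(0, bang),
      col: m[1],
      row: parseInt(m[2], 10),
      text: val,
    });
  }
  return index;
}

// Flatten { key: [entries] } into the label records the refiner iterates over.
function flattenLabels(labelIndex) {
  const labels = [];
  for (const entries of Object.values(labelIndex)) {
    for (const e of entries) {
      labels.push({
        addr: `${e.sheet}!${e.col}${e.row}`,
        text: e.text,
        sheet: e.sheet,
        col: e.col,
        row: e.row,
        rowKey: `${e.sheet}!${e.row}`,
      });
    }
  }
  return labels;
}

// Prefer the prebuilt index; scan the ground truth only when none exists.
function resolveLabels(prebuilt, gt) {
  return flattenLabels(prebuilt || scanLabels(gt));
}

const gt = { 'Summary!A5': 'Net IRR', 'Summary!C5': 0.182, 'Summary!C9': 2.1 };
// A label that exists only in the prebuilt index, not as a string in gt.
const prebuilt = {
  'gross moic': [{ sheet: 'Summary', col: 'A', row: 9, text: 'Gross MOIC' }],
};

const viaIndex = resolveLabels(prebuilt, gt);
const viaScan = resolveLabels(null, gt);
console.log(viaIndex.map(l => l.addr)); // [ 'Summary!A9' ]
console.log(viaScan.map(l => l.addr));  // [ 'Summary!A5' ]
```

The two calls return different labels on purpose. This is the same idea as the consumption proof in the new test suite: a label present only in the index can be resolved only if the index is actually read.
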
diff --git a/package.json b/package.json
@@ -41,7 +41,7 @@
     "test:engine": "node pipelines/rust/tests/test-engine-runtime.mjs",
     "test:depgraph": "node pipelines/rust/tests/test-dependency-graph.mjs",
     "test:slimming": "node tests/cli/test-artifact-slimming.mjs",
-    "test": "node tests/cli/test-cli.mjs && node tests/cli/test-manifest-improvements.mjs && node tests/cli/test-manifest-maps.mjs && node tests/cli/test-ai-interface.mjs && node tests/cli/test-e2e4-fixes.mjs && node tests/cli/test-ship-ready.mjs && node tests/cli/use-case-suite.mjs"
+    "test": "node tests/cli/test-cli.mjs && node tests/cli/test-manifest-improvements.mjs && node tests/cli/test-manifest-maps.mjs && node tests/cli/test-refine-label-index.mjs && node tests/cli/test-ai-interface.mjs && node tests/cli/test-e2e4-fixes.mjs && node tests/cli/test-ship-ready.mjs && node tests/cli/use-case-suite.mjs"
   },
   "devDependencies": {}
 }
diff --git a/skill/SKILL.md b/skill/SKILL.md
@@ -108,6 +108,12 @@ Silently falls through to a normal parse if `chunked/_ground-truth.json` is
 missing — safe to default on when iterating. Turns the tighten-the-manifest
 loop from minutes to seconds.
 
+The refine step inside that loop is also faster on big models: it reads labels
+from the parser's `chunked/_labels.json` and probes only the matched rows for
+values, instead of indexing every cell (it used to scan the whole ground truth,
+including giant unlabeled grids it never consults). Transparent — same command,
+same result.
+
 **Default output is slim.** `ete init` drops the large debug/intermediate
 artifacts (`dependency-graph.json`, `_graph.json`, root `model-map.json`) once
 the dependency closures are baked into `named-outputs.json` / `named-inputs.json`.