P0: Investigate LSM Indexer performanceSmoke failure — 36.5s vs 10s debug limit on Validate run (#97)

## Summary

`IndexerTests.performanceSmoke` ("indexing 5000 single-token 128-dim chunks completes within 5s") took **36.5s against the 10s debug-mode limit** (a 3.65× overshoot) on the Validate run for PR #94. The test lives at `Tests/SwitchcraftTests/IndexerTests.swift:393-429` and is **not** currently disabled on CI. PR #94's Validate stage dismissed the failure as host load and verdicted "READY TO MERGE" anyway, which is a policy violation. This issue investigates the root cause and fixes it without relaxing the test through policy-violating means.

This is P0: together with its sibling issues, it blocks all merges until resolved.

## Requirements

- Root cause of the 36.5s observation must be identified and documented. Exactly one of the following will be true:
  - **(a) Real performance regression.** A recent commit slowed the indexer hot path (Q4 dequant, residual encode, bucket flush, etc.) by 3–7×. Fix the regression; do not relax the test.
  - **(b) Host-load / debug-budget mismatch.** The debug path is legitimately 10–20× slower than release (as the test comment acknowledges), and the 10s budget leaves no headroom. Raise the debug budget only with measurement evidence (median of 11+ runs on a quiet machine) recorded in the test comment, and only if the new budget still distinguishes a 3–5× perf regression from normal variance.
  - **(c) CI runner regression.** The macOS-15 runners have become slower. Compare CI runtime against a quiet local workstation. If CI is the outlier, move the test to a release-only configuration with a tighter budget, never via `.disabled(if: CI)`.
- The test must pass ≥10 consecutive runs on CI in both debug and release after the fix.
- No skip mechanism may be added: no `.disabled(if: CI)`, `XCTSkipIf(...)`, `XCTSkipUnless(...)`, `#if !CI`, `@Test(.disabled(...))`, or equivalent.
- No assertion may be deleted, weakened to a tautology, or made a no-op.
- Any budget change must be justified by measurement evidence (median of 11+ runs on a quiet machine) recorded in a comment next to the new value.
- **No new failing tests may be introduced anywhere** — not in debug, not in release, not intermittently. "Flaky" counts as failing under project policy.
- **No previously-passing tests may have skip annotations added** as a side effect of this fix.
- The full test suite after the fix must be at least as healthy as `main` was before this PR (i.e., one fewer failing test — the target — and no new failures elsewhere).
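The budget-evidence rule can be expressed as a small check. Below is a minimal Swift sketch. It assumes per-run wall-clock timings are collected in seconds as a `[Double]`. `median` and `budgetIsDiscriminating` are hypothetical helpers, not part of the Switchcraft codebase. The default 3× factor is the low end of the 3–5× range above.

A candidate budget qualifies only if three conditions hold:
- at least 11 samples back it;
- the slowest quiet-machine run fits under it;
- a run three times slower than the median would still exceed it.

```swift
// Hypothetical helpers for vetting a proposed performance budget.
// Not part of the Switchcraft codebase; illustration only.

/// Median of a non-empty sample (mean of the two middle values for even counts).
func median(_ samples: [Double]) -> Double {
    precondition(!samples.isEmpty, "need at least one sample")
    let sorted = samples.sorted()
    let mid = sorted.count / 2
    if sorted.count % 2 == 1 {
        return sorted[mid]
    }
    return (sorted[mid - 1] + sorted[mid]) / 2
}

/// True when `budget` (seconds) tolerates the observed variance
/// and still fails a run `regressionFactor` times slower than the median.
func budgetIsDiscriminating(
    samples: [Double],
    budget: Double,
    regressionFactor: Double = 3.0,
    minimumRuns: Int = 11
) -> Bool {
    guard samples.count >= minimumRuns else { return false }  // evidence rule: 11+ runs
    let slowestObserved = samples.max()!
    let regressedRun = median(samples) * regressionFactor
    // Normal variance must pass; a regression of `regressionFactor`x must fail.
    return slowestObserved < budget && regressedRun > budget
}

// Illustrative values only; these are NOT real measurements.
let quietDebugRuns: [Double] = [6.1, 6.3, 5.9, 6.0, 6.4, 6.2, 6.1, 6.5, 6.0, 6.2, 6.3]
print(median(quietDebugRuns))                                         // 6.2
print(budgetIsDiscriminating(samples: quietDebugRuns, budget: 10.0))  // true
print(budgetIsDiscriminating(samples: quietDebugRuns, budget: 20.0))  // false
```

With these made-up numbers, a 10s budget qualifies. A 20s budget does not, because a 3× regression (about 18.6s) would still pass it.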

## Scope

**In scope:**
- Investigate `IndexerTests.performanceSmoke` at `Tests/SwitchcraftTests/IndexerTests.swift:393-429`.
- Bisect git history (from the last green CI run on `main`) if a regression is suspected.
- Run timing measurements (≥11 runs, quiet machine, both debug and release) to characterize the actual runtime distribution.
- Fix the underlying cause — regression fix or evidence-backed budget adjustment.
- Verify the full suite is green (no net regression) after the fix.
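For the timing measurements, a harness along these lines could collect the run distribution. This is a sketch with several assumptions:
- It requires Swift 5.7+ for `ContinuousClock` and `Duration`.
- `measure(runs:_:)` and `RunSummary` are hypothetical names.
- The closure stands in for the indexing loop of the test.

Running the same harness once in a debug build and once with `-c release` gives the two distributions the investigation needs.

```swift
// Hypothetical timing harness; requires Swift 5.7+ for ContinuousClock/Duration.

struct RunSummary {
    let samples: [Duration]                 // one entry per run, in run order
    var sorted: [Duration] { samples.sorted() }
    var median: Duration {
        let s = sorted
        let mid = s.count / 2
        // Even count: average the two middle runs.
        return s.count % 2 == 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2
    }
    var fastest: Duration { sorted.first! }
    var slowest: Duration { sorted.last! }
}

/// Runs `workload` `runs` times back to back and records each wall-clock duration.
func measure(runs: Int = 11, _ workload: () throws -> Void) rethrows -> RunSummary {
    precondition(runs > 0, "need at least one run")
    let clock = ContinuousClock()
    var samples: [Duration] = []
    samples.reserveCapacity(runs)
    for _ in 0..<runs {
        let start = clock.now
        try workload()
        samples.append(start.duration(to: clock.now))
    }
    return RunSummary(samples: samples)
}

// Stand-in workload; in practice this would be the indexing loop under test.
let summary = measure(runs: 11) {
    var checksum = 0
    for i in 0..<200_000 { checksum &+= i &* i }
    precondition(checksum != -1)  // uses the result so the loop stays observable
}
print("median:", summary.median, "fastest:", summary.fastest, "slowest:", summary.slowest)
```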

**Out of scope:**
- Fixing other unrelated test failures discovered during investigation (file separate issues; treat them as P0 incidents per project policy).
- Skipping or disabling the performance test on CI for any reason.
- Raising the budget without measurement evidence.
- Deleting the test (only permissible if the Research stage produces an ADR demonstrating performance assertions of this kind cannot be measured reliably in this project's testing environment).

## Prior Art / Context

- `Tests/SwitchcraftTests/IndexerTests.swift:393-429` — the failing test. Budget: `< 5.0s` release, `< 10.0s` debug (doubled per test comment to account for debug-mode loop overhead).
- Test uses `Indexer(storage: storage, config: .production)` with `InMemoryStorage`, indexing 5000 random rows of 128-dim Q4-quantized embeddings.
- Observed: 36.5s on Validate stage local run (2026-05-07).
- PR #94 Validate report: incorrectly dismissed as host-load and verdicted READY TO MERGE — a policy violation under `.claude/CLAUDE.md`.
- Issue #93: prior discussion of why `.disabled(if: CI)` and "skip flaky on CI" are not acceptable resolutions on this project.
- Sibling WAL skip issues: resolved differently (those were pre-approved skips for a known-flaky concurrency microbenchmark); this test has no such approval.
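The debug/release budget split can be expressed with conditional compilation. This is a minimal sketch under the following assumptions:
- The `DEBUG` compilation condition is set for debug builds, as SwiftPM does by default.
- `indexingBudget` and `withinBudget` are hypothetical names.
- The actual test code is not reproduced here.

`#if DEBUG` is resolved at compile time from the build configuration. The assertion therefore still runs on every machine, CI included, unlike an environment-based skip such as `.disabled(if: CI)`.

```swift
// Hypothetical sketch: selecting the time budget by build configuration.
// `#if DEBUG` is a compile-time build condition, not a CI/environment check,
// so the same assertion runs everywhere in each configuration.

#if DEBUG
let indexingBudget: Duration = .seconds(10)  // debug: doubled for loop overhead
#else
let indexingBudget: Duration = .seconds(5)   // release
#endif

/// True when a measured run fits inside the configuration's budget.
func withinBudget(_ elapsed: Duration) -> Bool {
    elapsed < indexingBudget
}

print(withinBudget(.milliseconds(36_500)))  // false in either configuration
```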

## Risks / Dependencies

- If the root cause is a real regression, git bisect may reveal that the offending commit was already merged to `main`; the fix may then need a clean revert or a targeted patch.
- If the root cause is a debug budget that is too tight, the evidence requirement (11+ runs, quiet machine) means measurement takes real wall-clock time before any code change can be made.
- If CI runner performance has degraded, the fix path (release-only budget or environment-specific budget) must be designed so a 3–5× real regression is still detectable.
- Any code change touching the indexer hot path requires regression tests per project policy (storage, index, codec files all have this requirement).
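When bisecting, `git bisect run` decides each step from the script's exit status. Exit 0 marks the commit good, 125 skips it (for example, when the commit does not build), and any other code from 1 to 127 marks it bad.

Below is a minimal sketch of that classification logic in Swift. `BisectVerdict` and `classify` are hypothetical. The threshold is assumed to lie between the last-green median and the regressed median, and the 15s value below is illustrative only.

```swift
// Hypothetical `git bisect run` predicate logic.
// git bisect run reads the script's exit status:
//   0 -> good, 125 -> skip (untestable), other 1...127 -> bad.

enum BisectVerdict: Int32 {
    case good = 0
    case bad = 1
    case skip = 125
}

/// Classifies one commit from a timing run.
/// - buildSucceeded: false when the commit cannot be built or tested (skip it).
/// - medianSeconds: median of repeated runs of the test at that commit.
/// - thresholdSeconds: cutoff separating "fast" from "regressed".
func classify(buildSucceeded: Bool, medianSeconds: Double, thresholdSeconds: Double) -> BisectVerdict {
    guard buildSucceeded else { return .skip }
    return medianSeconds > thresholdSeconds ? .bad : .good
}

let verdict = classify(buildSucceeded: true, medianSeconds: 36.5, thresholdSeconds: 15.0)
print(verdict, verdict.rawValue)  // bad 1
```

A wrapper script would then terminate with `exit(verdict.rawValue)` after importing Foundation.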

## Acceptance Criteria

- [ ] Root cause identified and documented (real regression / host-load budget / runner change).
- [ ] If real regression: fix applied and test passes within the original budget.
- [ ] If budget adjustment: new budget set with measurement evidence (median of 11+ runs) recorded in the test comment, and the budget still distinguishes a 3–5× regression from normal variance.
- [ ] Test passes ≥10 consecutive runs on CI in both debug and release.
- [ ] No `.disabled(if: CI)`, `XCTSkip`, `#if !CI`, or equivalent skip mechanism added anywhere.
- [ ] No previously-passing test newly fails or is newly skipped as a side effect.
- [ ] Full `swift test` (debug) and `swift test -c release` suite is at least as healthy as `main` was before this PR, with exactly one fewer failure (the target test).
- [ ] CI is green on both jobs (`swift test (macOS)` and `swift test -c release (macOS)`) on the PR.
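The suite-health criteria reduce to set arithmetic over test identifiers. Below is a minimal sketch. It assumes the failing and skipped test names from `main` and from the PR have already been extracted from each run's output; that extraction is not shown. `meetsAcceptance` is a hypothetical helper. It would be applied once to the debug results and once to the release results.

```swift
// Hypothetical acceptance check: compares the failing and skipped test sets
// from a `main` run against the PR run. Identifiers are plain test names.

func meetsAcceptance(
    mainFailures: Set<String>,
    prFailures: Set<String>,
    mainSkipped: Set<String>,
    prSkipped: Set<String>,
    target: String
) -> Bool {
    let newFailures = prFailures.subtracting(mainFailures)  // broke on the PR
    let fixed = mainFailures.subtracting(prFailures)         // no longer failing
    let newlySkipped = prSkipped.subtracting(mainSkipped)    // skip added on the PR
    // Exactly the target was fixed, nothing new fails, nothing new is skipped.
    // A target that stops failing only because it is now skipped is rejected.
    return newFailures.isEmpty && newlySkipped.isEmpty && fixed == [target]
}

let mainRun: Set<String> = ["IndexerTests.performanceSmoke"]
print(meetsAcceptance(mainFailures: mainRun, prFailures: [],
                      mainSkipped: [], prSkipped: [],
                      target: "IndexerTests.performanceSmoke"))  // true
```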

## Engineering Policy Reminder

Per `.claude/CLAUDE.md`: failing tests never merge. The Validate stage on PR #94 violated this by verdicting "READY TO MERGE" with this test red. Do not repeat that pattern on this issue's PR.

A fix that makes the target test pass by introducing fragility elsewhere is not a fix — it is moving the bug. Stop and escalate to a human comment instead of merging in that case.
