perf(kmeans): replace plain-Swift argmax with vDSP_maxvi to fix performanceSmoke #100
The k-means argmax inner loop in assignAll() ran 5000×1131 = 5.655M
bounds-checked iterations per call (7 calls per flush = 33.93M total).
In debug mode Swift disables vectorisation → median 6.515s elapsed,
p95 9.611s — essentially no headroom against the 10s budget.
Replace the nested for-row/for-c loop with vDSP_maxvi (Accelerate).
vDSP_maxvi lives in a pre-built dylib and is always SIMD-compiled;
debug mode pays the same ~15ms as release. Post-fix all 11 debug runs
pass well within the 10s limit.
Changes:
- KMeans.swift: replace argmax loop (lines 182-194) with vDSP_maxvi
per-row; identical first-occurrence tie-break semantics confirmed by
the deterministicBuilds test passing unchanged.
- KMeansTests.swift: add assignAllPicksMaximum regression test — 3×3
hand-verifiable score matrix, expected assignments [2, 0, 1].
- IndexerTests.swift: correct performanceSmoke comment attribution
("residual encode path" → "argmax loop in k-means"); add pre/post
measurement evidence block (11 runs each, M3 Max, debug mode):
pre-fix median 6.515s p95 9.611s → post-fix all 11 passed <10s.
Closes #97
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
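For context on the measurement figures quoted above, here is a minimal sketch of how a median and a p95 could be derived from a set of timing samples. The source does not say which percentile definition was used, so nearest-rank is an assumption here, and the function names are illustrative rather than taken from the test suite.

```swift
/// Median of a non-empty sample (mean of the two middle values when the
/// count is even).
func median(_ xs: [Double]) -> Double {
    precondition(!xs.isEmpty, "need at least one sample")
    let sorted = xs.sorted()
    let mid = sorted.count / 2
    return sorted.count % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2
}

/// Nearest-rank percentile for p in (0, 100]. This definition is an
/// assumption; the source does not state how its p95 was computed.
func percentile(_ xs: [Double], _ p: Double) -> Double {
    precondition(!xs.isEmpty && p > 0 && p <= 100, "invalid input")
    let sorted = xs.sorted()
    // Rank is the smallest 1-based position covering p percent of the samples.
    let rank = Int((p / 100 * Double(sorted.count)).rounded(.up))
    return sorted[max(rank, 1) - 1]
}
```

Under nearest-rank, the p95 of 11 samples has rank ceil(0.95 × 11) = 11, which is the slowest run, so with this definition a single slow run determines the p95.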
🏭 Fabrik — stage: Implement
All tasks complete. Draft PR created at #100.
Root cause confirmed as case (b): host-load / debug-budget mismatch. Pre-fix measurements (11 runs, debug) showed median 6.515s and p95 9.611s, essentially no headroom against the 10s limit, with any load spike causing failure (the 36.5s Validate observation was a 6.6× spike). The fix replaces the 33.93M-iteration bounds-checked argmax loop in …
Used 47/250 turns, 2k input / 23k output tokens.
Pull request overview
Optimizes the SwitchcraftCore spherical k-means assignment hot path to make IndexerTests.performanceSmoke reliably pass in debug builds by replacing a bounds-checked Swift argmax loop with Accelerate’s SIMD-compiled vDSP_maxvi.
Changes:
- Replace per-row argmax in `KMeans.assignAll()` with `vDSP_maxvi` over the computed score rows.
- Add a KMeans regression test to validate assignment picks the maximum-scoring centroid.
- Update `performanceSmoke` commentary with corrected root-cause attribution and measurement notes.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| Sources/SwitchcraftCore/KMeans/KMeans.swift | Switch argmax implementation to vDSP_maxvi to remove debug-mode bounds-check overhead in the inner loop. |
| Tests/SwitchcraftTests/KMeansTests.swift | Add a regression test for the assign/argmax path after the vDSP change. |
| Tests/SwitchcraftTests/IndexerTests.swift | Update performance test commentary to reflect the new k-means argmax implementation and measurements. |
scores.withUnsafeBufferPointer { scoresPtr in
    let base = scoresPtr.baseAddress!
    for row in 0..<m {
        var maxVal: Float = 0

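The excerpt above is cut off by the review view. As a rough, self-contained sketch of what a per-row argmax over a row-major m × k score buffer could look like with `vDSP_maxvi`: the name `rowArgmax` and its parameters are illustrative assumptions, not the PR's actual code, and the plain-Swift branch exists only so the sketch also builds where Accelerate is unavailable.

```swift
#if canImport(Accelerate)
import Accelerate
#endif

/// For each of the `m` rows in a row-major `m × k` score matrix, return the
/// index of the largest score. Hypothetical helper, not the PR's code.
func rowArgmax(scores: [Float], m: Int, k: Int) -> [Int] {
    precondition(m >= 0 && k >= 0 && scores.count == m * k, "shape mismatch")
    // With no rows or no columns there is nothing to pick from.
    guard m > 0, k > 0 else { return [] }
    var result = [Int](repeating: 0, count: m)
#if canImport(Accelerate)
    scores.withUnsafeBufferPointer { buf in
        // Unwrap defensively instead of force-unwrapping the base address.
        guard let base = buf.baseAddress else { return }
        for row in 0..<m {
            var maxVal: Float = 0
            var maxIdx: vDSP_Length = 0
            // Stride 1 over the k scores that make up this row.
            vDSP_maxvi(base + row * k, 1, &maxVal, &maxIdx, vDSP_Length(k))
            result[row] = Int(maxIdx)
        }
    }
#else
    // Portable fallback: strict `>` keeps the first maximum on ties.
    for row in 0..<m {
        var best = 0
        for c in 1..<k where scores[row * k + c] > scores[row * k + best] {
            best = c
        }
        result[row] = best
    }
#endif
    return result
}
```

The `guard let` on `baseAddress` is one way to avoid the force-unwrap shown in the excerpt; the early return for `k == 0` plays the same role as a local `guard k > 0`.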
    let assignments = KMeans.assign(data: data, dims: dims, centroids: centroids)
    #expect(assignments == [2, 0, 1])
}

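The test excerpt above only shows its final assertion. As an illustration of how a hand-verifiable 3×3 fixture can yield `[2, 0, 1]`, the sketch below uses a hypothetical stand-in, `assignSketch`, that assigns each row to the centroid with the largest dot product. The fixture values are assumptions chosen so the answer can be read off by inspection; they are not the data used in `KMeansTests.swift`.

```swift
/// Hypothetical stand-in for the assignment step: for each `dims`-wide row of
/// `data`, return the index of the centroid with the largest dot product.
func assignSketch(data: [Float], dims: Int, centroids: [Float]) -> [Int] {
    precondition(dims > 0 && data.count % dims == 0 && centroids.count % dims == 0,
                 "buffers must hold whole rows")
    let m = data.count / dims
    let k = centroids.count / dims
    guard k > 0 else { return [] }
    return (0..<m).map { row -> Int in
        var best = 0
        var bestScore = -Float.infinity
        for c in 0..<k {
            var dot: Float = 0
            for d in 0..<dims {
                dot += data[row * dims + d] * centroids[c * dims + d]
            }
            // Strict `>` keeps the earliest centroid when scores tie.
            if dot > bestScore {
                best = c
                bestScore = dot
            }
        }
        return best
    }
}

// Assumed fixture: centroids are the three unit axes, and each data row
// points along exactly one axis, so every row has a single score of 1.
let unitCentroids: [Float] = [1, 0, 0,  0, 1, 0,  0, 0, 1]
let rows: [Float] = [0, 0, 1,  1, 0, 0,  0, 1, 0]
print(assignSketch(data: rows, dims: 3, centroids: unitCentroids))
```

With unit-axis centroids the score matrix is a permutation of the identity, so the expected assignment of each row can be checked without running any arithmetic.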
// Acceptance criterion: < 5s (release). Debug limit is 10s.
//
// Root cause of prior debug slowness (issue #97): the k-means argmax
// inner loop (`KMeans.assignAll`) ran 5000×1131=5.655M bounds-checked
// iterations per call (7 calls per flush = 33.93M total). In debug
// mode Swift disables vectorisation → ~5–9s elapsed. It is now replaced
…rash
The vDSP_maxvi path force-unwraps scoresPtr.baseAddress. This is safe today because both public callers precondition k > 0, but the private helper had no local guard. A future caller with k == 0 would produce an empty scores array whose baseAddress is nil, crashing here. Adding guard k > 0 mirrors the existing guard m > 0 and makes the function self-contained.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rrence semantics
vDSP_maxvi documents returning the first maximum occurrence (lowest index). The prior plain-Swift left-to-right argmax had the same tie-break. This test locks in that behavior explicitly: data=[1,0] with centroids c0=[1,0], c1=[0,1], c2=[1,0] gives a tie between c0 and c2 (both dot=1.0), so the expected assignment is 0 (first occurrence wins).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
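To make the tie-break concrete, the following plain-Swift sketch reproduces the fixture described in the commit message and contrasts a strict `>` comparison (first occurrence wins) with `>=` (last occurrence wins). The helper names `argmaxFirst` and `argmaxLast` are illustrative assumptions; neither appears in the PR.

```swift
/// Index of the first maximum: strict `>` never replaces an earlier equal score.
func argmaxFirst(_ xs: [Float]) -> Int? {
    guard var best = xs.indices.first else { return nil }
    for i in xs.indices.dropFirst() where xs[i] > xs[best] {
        best = i
    }
    return best
}

/// Index of the last maximum: `>=` lets a later equal score take over.
func argmaxLast(_ xs: [Float]) -> Int? {
    guard var best = xs.indices.first else { return nil }
    for i in xs.indices.dropFirst() where xs[i] >= xs[best] {
        best = i
    }
    return best
}

// Fixture from the commit message: one point and three centroids,
// where c0 and c2 are identical and therefore tie.
let point: [Float] = [1, 0]
let centroidRows: [[Float]] = [[1, 0], [0, 1], [1, 0]]

// Dot product of the point with each centroid: [1, 0, 1].
let tieScores = centroidRows.map { c in
    zip(point, c).reduce(Float(0)) { $0 + $1.0 * $1.1 }
}

print(argmaxFirst(tieScores) as Any)  // first-occurrence rule selects index 0
print(argmaxLast(tieScores) as Any)   // last-occurrence rule would select index 2
```

A regression test that asserts index 0 on this fixture would fail if the comparison were ever changed to `>=`, which is why a deliberately tied input is useful here.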
🏭 Fabrik — stage: Review
Pushed. Let me write the final review output:
Review Findings
Fixed
Evaluated, No Fix
Verified
Reviewed the vDSP_maxvi argmax optimization for issue #97. Fixed two issues from Copilot review: added …
Used 33/50 turns, 0k input / 12k output tokens.
🏭 Fabrik — stage: Validate
All tests pass. Here is the complete validation report.
Validation Report
Branch state
Merged
Requirements: 8/8 verified
Test Suite (debug)
Result: 273 tests, 0 failures, 0 newly skipped
Test Suite (release)
Result: 273 tests, 0 failures, 0 newly skipped
CI-skipped tests: local results
Regressions: None
All three files changed by this PR:
No other files modified.
Previous CI (pre-merge push)
Both …
Verdict: READY TO MERGE
The target test ( …
Validated issue #97 fix: …
Used 36/50 turns, 0k input / 11k output tokens.
Summary
Fixes `IndexerTests.performanceSmoke` (P0, issue #97), which was failing because the k-means argmax inner loop in `KMeans.assignAll()` ran 33.93M bounds-checked iterations per flush in debug mode. With `vDSP_maxvi` from Accelerate, the argmax is now always SIMD-compiled (~15ms regardless of build config).
- Replace the `for row { for c { ... } }` argmax loop in `KMeans.assignAll()` with `vDSP_maxvi` (Accelerate). Pre-built dylib, always vectorised, same first-occurrence tie-break semantics.
Key Changes
- `Sources/SwitchcraftCore/KMeans/KMeans.swift`, `assignAll()`: replace the nested argmax for-loop with `vDSP_maxvi` per row inside a `scores.withUnsafeBufferPointer` block. Same `vDSP_Length` → `Int` index conversion.
- `Tests/SwitchcraftTests/KMeansTests.swift`: add `assignAllPicksMaximum` regression test (3 rows × 3 unit-centroid matrix, expected assignments `[2, 0, 1]` by inspection).
- `Tests/SwitchcraftTests/IndexerTests.swift`: update `performanceSmoke` comment: correct the "residual encode path" attribution to "argmax loop in k-means (vDSP_maxvi)", add pre/post measurement evidence block.
How to Test
No skip annotations, no weakened assertions, no budget changed without evidence.
Closes #97