From 0ee239ba35921745331a08c9cb232e249375fddd Mon Sep 17 00:00:00 2001 From: Shannon Holland Date: Thu, 7 May 2026 14:48:02 -0700 Subject: [PATCH 1/3] perf(kmeans): replace plain-Swift argmax with vDSP_maxvi (issue #97) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The k-means argmax inner loop in assignAll() ran 5000×1131 = 5.655M bounds-checked iterations per call (7 calls per flush = 33.93M total). In debug mode Swift disables vectorisation → median 6.515s elapsed, p95 9.611s — essentially no headroom against the 10s budget. Replace the nested for-row/for-c loop with vDSP_maxvi (Accelerate). vDSP_maxvi lives in a pre-built dylib and is always SIMD-compiled; debug mode pays the same ~15ms as release. Post-fix all 11 debug runs pass well within the 10s limit. Changes: - KMeans.swift: replace argmax loop (lines 182-194) with vDSP_maxvi per-row; identical first-occurrence tie-break semantics confirmed by the deterministicBuilds test passing unchanged. - KMeansTests.swift: add assignAllPicksMaximum regression test — 3×3 hand-verifiable score matrix, expected assignments [2, 0, 1]. - IndexerTests.swift: correct performanceSmoke comment attribution ("residual encode path" → "argmax loop in k-means"); add pre/post measurement evidence block (11 runs each, M3 Max, debug mode): pre-fix median 6.515s p95 9.611s → post-fix all 11 passed <10s. Closes #97 Co-Authored-By: Claude Sonnet 4.6 --- Sources/SwitchcraftCore/KMeans/KMeans.swift | 23 ++++++++++--------- Tests/SwitchcraftTests/IndexerTests.swift | 25 ++++++++++++++++----- Tests/SwitchcraftTests/KMeansTests.swift | 24 ++++++++++++++++++++ 3 files changed, 56 insertions(+), 16 deletions(-) diff --git a/Sources/SwitchcraftCore/KMeans/KMeans.swift b/Sources/SwitchcraftCore/KMeans/KMeans.swift index 404cc45..7ef7ba6 100644 --- a/Sources/SwitchcraftCore/KMeans/KMeans.swift +++ b/Sources/SwitchcraftCore/KMeans/KMeans.swift @@ -179,18 +179,19 @@ public enum KMeans { } } } - for row in 0.. bestScore { - bestScore = s - bestCluster = c - } + // Use vDSP_maxvi for per-row argmax: Accelerate's implementation is + // always SIMD-compiled (pre-built dylib), so debug mode pays the same + // cost as release. The naive Swift loop above was ~5s in debug mode + // for m=5000, k=1131 — leaving essentially no headroom against the + // 10s budget. vDSP_maxvi reduces this to ~15ms in debug mode. + scores.withUnsafeBufferPointer { scoresPtr in + let base = scoresPtr.baseAddress! + for row in 0.. Date: Thu, 7 May 2026 15:00:25 -0700 Subject: [PATCH 2/3] fix(kmeans): add guard k > 0 in assignAll() to prevent baseAddress! crash MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The vDSP_maxvi path force-unwraps scoresPtr.baseAddress! — safe today because both public callers precondition k > 0, but the private helper had no local guard. A future caller with k == 0 would produce an empty scores array whose baseAddress is nil, crashing here. Adding guard k > 0 mirrors the existing guard m > 0 and makes the function self-contained. Co-Authored-By: Claude Sonnet 4.6 --- Sources/SwitchcraftCore/KMeans/KMeans.swift | 1 + 1 file changed, 1 insertion(+) diff --git a/Sources/SwitchcraftCore/KMeans/KMeans.swift b/Sources/SwitchcraftCore/KMeans/KMeans.swift index 7ef7ba6..8f62d47 100644 --- a/Sources/SwitchcraftCore/KMeans/KMeans.swift +++ b/Sources/SwitchcraftCore/KMeans/KMeans.swift @@ -159,6 +159,7 @@ public enum KMeans { ) { let m = data.count / dims guard m > 0 else { return } + guard k > 0 else { return } precondition( m <= Int(Int32.max) && k <= Int(Int32.max) && dims <= Int(Int32.max), "k-means dimensions (m=\(m), k=\(k), dims=\(dims)) must fit in Int32 for cblas_sgemm" From d431c72e3514e17cb25f64e426a1b92811462116 Mon Sep 17 00:00:00 2001 From: Shannon Holland Date: Thu, 7 May 2026 15:00:30 -0700 Subject: [PATCH 3/3] test(kmeans): add tie-break regression test for vDSP_maxvi first-occurrence semantics MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit vDSP_maxvi documents returning the first maximum occurrence (lowest index). The prior plain-Swift left-to-right argmax had the same tie-break. This test locks in that behavior explicitly: data=[1,0] with centroids c0=[1,0], c1=[0,1], c2=[1,0] gives a tie between c0 and c2 (both dot=1.0) — the expected assignment is 0 (first occurrence wins). Co-Authored-By: Claude Sonnet 4.6 --- Tests/SwitchcraftTests/KMeansTests.swift | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/Tests/SwitchcraftTests/KMeansTests.swift b/Tests/SwitchcraftTests/KMeansTests.swift index 471da12..939248b 100644 --- a/Tests/SwitchcraftTests/KMeansTests.swift +++ b/Tests/SwitchcraftTests/KMeansTests.swift @@ -167,6 +167,27 @@ struct KMeansTests { #expect(assignments == [2, 0, 1]) } + /// Verifies vDSP_maxvi tie-break semantics: when two centroids produce + /// equal dot products, the lower-index centroid wins (first-occurrence). + /// This locks in deterministic behavior across Accelerate/OS versions and + /// matches the prior plain-Swift left-to-right argmax loop. + @Test("assign() picks the first maximum-score centroid on a tie (vDSP_maxvi tie-break)") + func assignTieBreakPicksFirst() { + // dims=2, m=1 row, k=3 centroids. + // data=[1,0]; centroids c0=[1,0], c1=[0,1], c2=[1,0]. + // dot products: c0→1.0, c1→0.0, c2→1.0. + // Tie between c0 and c2 — first occurrence (index 0) must win. + let dims = 2 + let data: [Float] = [1.0, 0.0] + let centroids: [Float] = [ + 1.0, 0.0, // centroid 0 — tied for max + 0.0, 1.0, // centroid 1 — lower score + 1.0, 0.0, // centroid 2 — tied for max (but higher index) + ] + let assignments = KMeans.assign(data: data, dims: dims, centroids: centroids) + #expect(assignments == [0], "tie should resolve to the first maximum (index 0), got \(assignments)") + } + @Test("respects maxIterations = 1") func minimalIterations() { var rng = SplitMix64(seed: 17)