Embedder must enforce input-size guard internally — oversized inputs poison ANE pool for all subsequent inferences#91
Conversation
…t ANE IOSurface pool poisoning Adds `maxInputTokens` + `EmbedderOverflowPolicy` to `T5CoreMLEmbedder` and `T5MetalEmbedder`. Inputs exceeding the limit are silently truncated (.truncate, default) or throw `EmbedderError.inputTooLarge(actual:max:)` (.reject). Default limit is 8 * windowSize (4,096 tokens), yielding ~15 windows — well below the ~577-window burst that exhausted the ANE IOSurface pool. Closes #89. - New `EmbedderOverflowPolicy` and `EmbedderError` enums in SwitchcraftCore - `maxInputTokens: Int { get }` added to the `Embedder` protocol - Overflow guard in both embedder `encode(_:)` implementations - Four new tests: property value, truncate policy, reject policy, 1,000-call stress test - All existing Embedder conformers (MockEmbedder, test-local structs) updated Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Documents the design decisions from issue #89: why maxInputTokens belongs on the Embedder protocol, the 8 * windowSize default rationale, why EmbedderOverflowPolicy/EmbedderError live in SwitchcraftCore, and why T5MetalEmbedder gets the guard without stub-based tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
🏭 Fabrik — stage: Review Confirmed: The implementation is complete and correct. Here is the full review: Review FindingsVerifiedRebase: Branch is already at External feedback: No PR comments to address. All new tests pass (5 tests in
The plan described 4 tests; the implement stage correctly split the stress test (R7c) into separate truncate and reject sub-tests, giving 5 total — better coverage, not a gap. Spec requirements verified:
Re-entrancy guard interaction (plan risk item): The Pre-existing test failure: Protocol/module placement: ADR 022 and FixedNone required — implementation is correct and complete. Reviewed the embedder overflow guard implementation for issue #89. All 5 new tests pass (truncate policy, reject policy, Used 37/50 turns, 0k input / 9k output tokens. |
Swift 6 (Xcode 16.4 / CI) rejects bare retroactive Sendable conformances on types defined in other modules. MLModel is defined in CoreML, so `extension MLModel: MLPredictor` implicitly conferred Sendable — illegal in Swift 6. Adding `extension MLModel: @unchecked Sendable {}` satisfies the compiler. Safe: all MLModel access is gated through T5CoreMLEmbedder's actor isolation. Fixes CI regression introduced when the CI runner upgraded to Xcode 16.4. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# Conflicts: # Sources/SwitchcraftCoreML/T5CoreMLEmbedder.swift # docs/Plan.md
Swift 6 (Xcode 16.4) treats closures passed to ObjC blocks (including autoreleasepool + MLExceptionCatcher.perform) as region-boundary crossings. Passing a non-Sendable `any MLFeatureProvider` across that boundary triggers "sending 'provider' risks causing data races" at T5CoreMLEmbedder.swift:506,526. Fix: add retroactive @unchecked Sendable to MLDictionaryFeatureProvider (safe: always accessed within T5CoreMLEmbedder's actor isolation), and narrow predictWindow's parameter type from `any MLFeatureProvider` to the concrete MLDictionaryFeatureProvider — the only type ever passed at the single call site. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… warnings Swift recommends @retroactive for conformances of external types to external protocols to avoid future conflicts if CoreML adds the conformance itself. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
🏭 Fabrik — stage: Review Review FindingsFixed
VerifiedSpec requirements (R1–R7):
Protocol conformers updated:
Module placement: ADR 022: Present at
Pre-existing failures: Build: Test suite: 272 tests in 67 suites, all passing except the pre-existing WAL concurrency timing flap. PR: References Reviewed the embedder overflow guard implementation for issue [89]. Fixed one issue: added Used 26/50 turns, 0k input / 4k output tokens. |
|
🏭 Fabrik — stage: Validate All checks complete. Here is the full validation report. Validation ReportBranch State
Requirements
Test Suite: PASSED (272 tests, 67 suites)
Regression Check: None detected
Code Completeness
Verdict: READY TO MERGEAll 7 requirements (R1–R7) verified. The overflow guard is correctly inserted in both Used 38/50 turns, 0k input / 8k output tokens. |
Summary
Add a configurable overflow policy to
T5CoreMLEmbedder(and expose it via amaxInputTokensproperty on theEmbedderprotocol or the concrete type) so that no input exceeding a safe maximum ever reachesMLPredictor.predict. On overflow the embedder either silently truncates the token sequence to the safe maximum (default) or throws a typedEmbedderError.inputTooLarge(actual:max:)error, depending on the configured policy. The maximum is documented and queryable by consumers.Problem
T5CoreMLEmbedder(and any otherEmbedderimplementation the library provides) has no internal upper bound on the total number of tokens it will pass toward a CoreML prediction. Callers can hand it inputs the underlying CoreML model literally cannot allocate output for, and the failure mode is catastrophic and silent for the pool, not just the call:MLE5OutputPortBinder bindAndReturnErrorIOSurface allocation failure during output binding.This was observed during a SafariUnfucker bulk-index run: the first failure had
inputLength=593,285tokens (~600k). At typical T5 dims (768) × 4 bytes per float, the output tensor for a sequence that long is roughly 1.8 GB of contiguous IOSurface — well outside what the runtime can allocate. After that single page, every subsequent input failed (sizes from <1k to 100k+ tokens, 6,157 failures total in the run, all the same error).The ~600k-token page that triggered the poisoning was a real page from the user's browsing history. Real-world content includes huge docs, paginated GitHub PRs, archived RSS dumps, etc. This is not a synthetic edge case.
Approach
Approach
The overflow guard is a single token-count check inserted in
encode(_:)after tokenization and beforeSlidingWindow.plan. The check either truncates the token array tomaxInputTokenselements (.truncate, the default) or throwsEmbedderError.inputTooLarge(actual:max:)(.reject).Key design decisions:
TQ1:
maxInputTokenson theEmbedderprotocol.Adding it to the protocol makes the safety contract queryable on any
Embedder-typed reference without downcasting. This mirrors howdimsandmodelIdentifierare already protocol-level properties.MockEmbeddertrivially conforms withInt.max.T5MetalEmbedderfollows the same pattern.TQ2: Default
maxInputTokens = 8 * windowSize→ 4,096 forwindowSize=512.4,096 tokens yields ~15 windows at stride 256 — far below the ~577-window burst that exhausted the ANE pool. The multiplier-form scales naturally if
windowSizeis customised. This comfortably covers most real-world documents (~3,000 words in English) while leaving substantial headroom.TQ3: New
public enum EmbedderErrorinSwitchcraftCore.The spec names the error
EmbedderError.inputTooLarge, implying a new type. Placing it inSwitchcraftCore(notSwitchcraftCoreML) means it is throwable by bothT5CoreMLEmbedderandT5MetalEmbedder, accessible to any caller, and not CoreML-gated. RequiredSendableandEquatableconformances are trivial to declare.TQ4:
public enum EmbedderOverflowPolicyinSwitchcraftCore.Since the policy type is shared by both embedder implementations and lives alongside
Embedder.swift, it belongs inSwitchcraftCore. It is pure Swift with no framework dependencies.TQ5: Add overflow guard to
T5MetalEmbedderas well.T5MetalEmbedderusesSlidingWindowidentically. WithmaxInputTokensnow on the protocol, implementing the guard in Metal is consistent and prevents analogous memory pressure from a 577-Metal-command-buffer burst.ADR assessment: An ADR is warranted. The decisions —
maxInputTokenson the protocol, the8 * windowSizedefault,EmbedderOverflowPolicy/EmbedderErrorinSwitchcraftCorerather thanSwitchcraftCoreML, and the two-policy contract — all constrain futureEmbedderconformers and are non-obvious from the code alone. The next ADR number should be determined at implementation time by checking the highest-numbered file inadrs/(currently021).New/Modified Files
Sources/SwitchcraftCore/Embedding/EmbedderOverflowPolicy.swiftpublic enum EmbedderOverflowPolicywith.truncateand.rejectcasesSources/SwitchcraftCore/Embedding/EmbedderError.swiftpublic enum EmbedderErrorwithinputTooLarge(actual: Int, max: Int)caseSources/SwitchcraftCore/Embedding/Embedder.swiftvar maxInputTokens: Int { get }to the protocolSources/SwitchcraftCoreML/T5CoreMLEmbedder.swiftmaxInputTokens/overflowPolicystored properties, init parameters (all three inits), and overflow guard inencode(_:)between tokenization andSlidingWindow.planSources/SwitchcraftMetal/T5MetalEmbedder.swiftmaxInputTokens/overflowPolicystored properties, init parameter, and overflow guard inencode(_:)Tests/SwitchcraftTests/Support/MockEmbedder.swiftlet maxInputTokens: Int = Int.maxto satisfy protocolTests/SwitchcraftTests/T5CoreMLEmbedderOverflowTests.swiftmaxInputTokensproperty, and 1,000-call overflow stress testadrs/NNN-embedder-overflow-guard.mdKey Decisions
EmbedderOverflowPolicyandEmbedderErrorinSwitchcraftCore, notSwitchcraftCoreML: They are pure Swift types with no CoreML dependency, and placing them in the Core module makes them accessible to all callers — includingT5MetalEmbedderand any future embedder — without the#if canImport(CoreML)gate.Default
maxInputTokens = 8 * windowSize: A flat constant would not scale with customwindowSizevalues. The multiplier form ensures the default always provides safe headroom (8× leaves ~15 windows, vs the ~577 that caused failure). Callers with large-document workloads and manual ANE management can pass a larger value at init.maxInputTokensonEmbedderprotocol: Keeps callers from needing to downcast. Every newEmbedderconformer must declare it explicitly — a useful invariant since an unknown embedder without a declared limit is unsafe to use in a bulk-index context.T5MetalEmbeddergains the guard but no stub-based tests: The Metal init requires a real Metal device and GGUF asset; there is noMLPredictor-style stub path. Overflow tests for Metal would require significant new test infrastructure. The guard implementation is identical in shape to the CoreML version; the CoreML stub tests provide adequate coverage of the algorithm. A note is added to the test file.Overflow guard position: After
tokenizer.encodeand beforeSlidingWindow.plan. This is the earliest point where the authoritative token count is known. It fires inside the re-entrancy guard'sinFlightwindow — the existingdeferblock continues to work correctly on the.rejectthrow path (no change to re-entrancy logic needed).No changes to
SlidingWindow,MLPredictor, orCountingStubPredictor: The existing stub infrastructure is sufficient for the new overflow tests.Task Checklist
Task 1: Create
Sources/SwitchcraftCore/Embedding/EmbedderOverflowPolicy.swift—public enum EmbedderOverflowPolicy: Sendable, Equatable, Hashable { case truncate; case reject }with module-level doc-comment explaining the two policies and their trade-offs.Task 2: Create
Sources/SwitchcraftCore/Embedding/EmbedderError.swift—public enum EmbedderError: Error, Sendable, Equatable { case inputTooLarge(actual: Int, max: Int) }with doc-comment describing when each case is thrown.Task 3: Add
var maxInputTokens: Int { get }to theEmbedderprotocol inSources/SwitchcraftCore/Embedding/Embedder.swift, with a doc-comment explaining the role of this limit and that conformers should expose it asnonisolated let.Task 4: Update
Tests/SwitchcraftTests/Support/MockEmbedder.swift— addlet maxInputTokens: Int = Int.maxto satisfy the updated protocol (mock has no real model limit).Task 5: Add
nonisolated let maxInputTokens: Intandnonisolated let overflowPolicy: EmbedderOverflowPolicystored properties toT5CoreMLEmbedder; addmaxInputTokens: Int = 8 * windowSizeandoverflowPolicy: EmbedderOverflowPolicy = .truncateparameters to all threeinitoverloads (public URL-based,Bundleconvenience, and both test-onlyinternalinits); addprecondition(maxInputTokens >= windowSize)guard in each init; update the class doc-comment to document the new parameters.Task 6: Insert overflow guard in
T5CoreMLEmbedder.encode(_:)betweentokenizer.encode(text, addSpecialTokens: true)and theSlidingWindow.plancall: iftokens.count > maxInputTokens, apply the policy —.truncatesilently clips toArray(tokens.prefix(maxInputTokens));.rejectthrowsEmbedderError.inputTooLarge(actual: tokens.count, max: maxInputTokens). Update theencode(_:)doc-comment to document the new throw case.Task 7: Add
nonisolated let maxInputTokens: Intandnonisolated let overflowPolicy: EmbedderOverflowPolicytoT5MetalEmbedder; addmaxInputTokens: Int = 8 * windowSizeandoverflowPolicy: EmbedderOverflowPolicy = .truncateto itsinit; insert the same overflow guard pattern inT5MetalEmbedder.encode(_:)after tokenization and beforeSlidingWindow.plan.Task 8: Create
Tests/SwitchcraftTests/T5CoreMLEmbedderOverflowTests.swiftwith four test cases:testMaxInputTokensPropertyReturnsConfiguredValue— construct with explicitmaxInputTokens, assert property equals that value (R7d)testTruncatePolicyEncodesOversizedInputWithoutError— construct withoverflowPolicy: .truncateandmaxInputTokens: N; encode a string that tokenizes to > N tokens; assert the result is non-empty and has count≤ N * dims(R7a)testRejectPolicyThrowsInputTooLargeError— construct withoverflowPolicy: .rejectandmaxInputTokens: N; encode a string > N tokens; assert throw isEmbedderError.inputTooLargewith correctactualandmaxvalues (R7b)testOverflowStress1000Calls— 1,000 sequentialencodecalls through aCountingStubPredictor, with calls at indices 250 and 750 using oversized inputs (both.truncateand.rejectpolicies in separate sub-tests); assert all non-oversized calls succeed and no predictor call receives a token sequence longer thanmaxInputTokens(R5, R7c)Task 9: Run
swift test --filter T5CoreMLEmbedderOverflowand confirm all four tests pass; then runswift testfor the full suite.Task 10: Create
adrs/NNN-embedder-overflow-guard.md(where NNN is one higher than the current highest ADR number inadrs/), documenting: the overflow guard design, whymaxInputTokensbelongs on theEmbedderprotocol, the8 * windowSizedefault rationale, whyEmbedderOverflowPolicy/EmbedderErrorlive inSwitchcraftCore, and whyT5MetalEmbeddergets the guard without stub-based tests.Risks
maxInputTokensprecondition at init. Aprecondition(maxInputTokens >= windowSize)prevents misconfiguration (setting the total limit lower than a single window would makeencodealways truncate to less than one window, producing empty or garbage embeddings). If callers pass very small values the app crashes at init-time with a clear message — better than silent bad behavior.Reject policy interacts with re-entrancy guard correctly but requires care. The throw path on
.rejectmust occur afterinFlight = trueis set so thedeferblock fires and releases the next waiter. Looking at the existing code flow:inFlight = trueis set unconditionally before the tokenization call, and thedeferblock is set immediately after. The overflow guard inserts after tokenization but thedeferis already registered — so the throw is handled correctly. Implement should verify this flow explicitly.T5MetalEmbedderencode(_:)structure. Research did not readT5MetalEmbedder.encode(_:)in detail. The implementation must find the tokenization andSlidingWindow.plancalls in that file and insert the guard between them. The pattern should be identical toT5CoreMLEmbedder's.Test isolation: stub tokenizer vs. real tokenizer. The existing
CountingStubPredictortests use the realxtr-base-en.tokenizer.jsonfixture. The overflow tests also need to produce inputs of known token count. The simplest approach: use a repeated short known-tokenizable substring so the token count is predictable. The test helpermakeTokenizer()inT5CoreMLEmbedderStressTests.swiftcan be reused or duplicated.Used 17/50 turns, 0k input / 8k output tokens.
Verification
Implemented the embedder overflow guard for issue #89. Added
EmbedderOverflowPolicy(.truncate/.reject) andEmbedderError.inputTooLargetoSwitchcraftCore, addedmaxInputTokens: Intto theEmbedderprotocol, and inserted the overflow guard in bothT5CoreMLEmbedder.encode(_:)andT5MetalEmbedder.encode(_:)between tokenization andSlidingWindow.plan. The default limit is8 * windowSize(4,096 tokens → ~15 windows), preventing the ~577-window bursts that caused ANE IOSurface pool poisoning. Five new tests cover property value, truncate policy, reject policy, and a 1,000-call stress test with interleaved oversized inputs; all 270 tests in the suite pass. ADR 022 anddocs/Plan.mdupdated.Closes #89