Embedder must enforce input-size guard internally — oversized inputs poison ANE pool for all subsequent inferences#91

Merged
totalslacker merged 8 commits into main from fabrik/issue-89
May 6, 2026

@totalslacker
Owner

@totalslacker totalslacker commented May 6, 2026

Summary

Add a configurable overflow policy to T5CoreMLEmbedder (and expose it via a maxInputTokens property on the Embedder protocol or the concrete type) so that no input exceeding a safe maximum ever reaches MLPredictor.predict. On overflow the embedder either silently truncates the token sequence to the safe maximum (default) or throws a typed EmbedderError.inputTooLarge(actual:max:) error, depending on the configured policy. The maximum is documented and queryable by consumers.

Problem

T5CoreMLEmbedder (and any other Embedder implementation the library provides) has no internal upper bound on the total number of tokens it will pass toward a CoreML prediction. Callers can hand it inputs the underlying CoreML model literally cannot allocate output for, and the failure mode is catastrophic and silent for the pool, not just the call:

  • One oversized input throws an MLE5OutputPortBinder bindAndReturnError IOSurface allocation failure during output binding.
  • The ANE pool is left in a degraded state.
  • Every subsequent inference — including small inputs — then fails with the same allocation error, until the host process restarts.

This was observed during a SafariUnfucker bulk-index run: the first failure had inputLength=593,285 tokens (~600k). At typical T5 dims (768) × 4 bytes per float, the output tensor for a sequence that long is roughly 1.8 GB of contiguous IOSurface — well outside what the runtime can allocate. After that single page, every subsequent input failed (sizes from <1k to 100k+ tokens, 6,157 failures total in the run, all the same error).
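The size estimate above can be reproduced with a few lines of arithmetic. This is only a back-of-envelope check using the figures quoted in the report (593,285 tokens, 768 dims, 4-byte floats); the variable names are illustrative and it says nothing about how the runtime actually lays out the buffer.

```swift
// Figures taken from the report above; names are illustrative only.
let failingTokenCount = 593_285                    // inputLength of the first failing page
let embeddingDims = 768                            // typical T5 hidden size quoted above
let bytesPerFloat = MemoryLayout<Float>.size       // 4 bytes for a 32-bit float

// One float vector of `embeddingDims` per token.
let outputBytes = failingTokenCount * embeddingDims * bytesPerFloat
let outputGigabytes = Double(outputBytes) / 1_000_000_000

print("output tensor bytes:", outputBytes)
print("output tensor GB (decimal):", outputGigabytes)
```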

The ~600k-token page that triggered the poisoning was a real page from the user's browsing history. Real-world content includes huge docs, paginated GitHub PRs, archived RSS dumps, etc. This is not a synthetic edge case.

Approach

The overflow guard is a single token-count check inserted in encode(_:) after tokenization and before SlidingWindow.plan. The check either truncates the token array to maxInputTokens elements (.truncate, the default) or throws EmbedderError.inputTooLarge(actual:max:) (.reject).
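As a rough illustration of the check described above, the sketch below isolates it as a free function. The two enums follow the shapes given in the task checklist; the function name applyOverflowGuard and the use of [Int] for token IDs are assumptions made for this example, not the code that shipped in encode(_:).

```swift
/// Policy applied when a token sequence exceeds the configured maximum.
enum EmbedderOverflowPolicy: Sendable, Equatable, Hashable {
    case truncate   // clip to the first `maxInputTokens` tokens
    case reject     // throw instead of encoding
}

/// Typed error thrown on the `.reject` path.
enum EmbedderError: Error, Sendable, Equatable {
    case inputTooLarge(actual: Int, max: Int)
}

/// Hypothetical helper standing in for the check between tokenization
/// and window planning. Returns the tokens that may proceed.
func applyOverflowGuard(
    to tokens: [Int],
    maxInputTokens: Int,
    policy: EmbedderOverflowPolicy
) throws -> [Int] {
    // Inputs at or under the limit pass through untouched.
    guard tokens.count > maxInputTokens else { return tokens }
    switch policy {
    case .truncate:
        return Array(tokens.prefix(maxInputTokens))
    case .reject:
        throw EmbedderError.inputTooLarge(actual: tokens.count, max: maxInputTokens)
    }
}
```

Keeping the check as one branch on tokens.count means neither policy lets an oversized sequence continue toward prediction.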

Key design decisions:

TQ1: maxInputTokens on the Embedder protocol.
Adding it to the protocol makes the safety contract queryable on any Embedder-typed reference without downcasting. This mirrors how dims and modelIdentifier are already protocol-level properties. MockEmbedder trivially conforms with Int.max. T5MetalEmbedder follows the same pattern.
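A minimal sketch of what a protocol-level limit makes possible is shown here. The protocol below is a cut-down stand-in: only dims, modelIdentifier and maxInputTokens are named in this PR, their exact declarations in Embedder.swift are assumed, and ForwardingEmbedder and fits(tokenCount:in:) are hypothetical names used for illustration.

```swift
/// Cut-down stand-in for the real protocol; the property types are assumptions.
protocol Embedder {
    var dims: Int { get }
    var modelIdentifier: String { get }
    var maxInputTokens: Int { get }
}

/// A mock has no real model limit, so it reports Int.max.
struct MockEmbedder: Embedder {
    let dims = 4
    let modelIdentifier = "mock"
    let maxInputTokens: Int = Int.max
}

/// Hypothetical wrapper that forwards the limit of the embedder it wraps.
struct ForwardingEmbedder: Embedder {
    let inner: any Embedder
    var dims: Int { inner.dims }
    var modelIdentifier: String { inner.modelIdentifier }
    var maxInputTokens: Int { inner.maxInputTokens }
}

/// Callers can query the limit on any Embedder-typed value without downcasting.
func fits(tokenCount: Int, in embedder: any Embedder) -> Bool {
    tokenCount <= embedder.maxInputTokens
}
```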

TQ2: Default maxInputTokens = 8 * windowSize → 4,096 for windowSize=512.
4,096 tokens yields ~15 windows at stride 256 — far below the ~577-window burst that exhausted the ANE pool. The multiplier-form scales naturally if windowSize is customised. This comfortably covers most real-world documents (~3,000 words in English) while leaving substantial headroom.
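The window count for the default limit can be checked with a small helper. The formula below is an assumption about how overlapping windows are counted (one full window, then one more per stride until the tail is covered); the real SlidingWindow.plan may differ in details such as tail handling, and windowCount is a hypothetical name.

```swift
/// Hypothetical helper: number of overlapping windows needed to cover
/// `tokenCount` tokens, assuming a fixed window size and stride.
func windowCount(tokenCount: Int, windowSize: Int = 512, stride: Int = 256) -> Int {
    precondition(windowSize > 0 && stride > 0, "windowSize and stride must be positive")
    // Anything that fits in one window needs at most one window.
    guard tokenCount > windowSize else { return tokenCount > 0 ? 1 : 0 }
    // ceil((tokenCount - windowSize) / stride) additional windows after the first.
    return (tokenCount - windowSize + stride - 1) / stride + 1
}

let defaultLimit = 8 * 512   // 4,096 tokens for windowSize = 512
print("windows at default limit:", windowCount(tokenCount: defaultLimit))
```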

TQ3: New public enum EmbedderError in SwitchcraftCore.
The spec names the error EmbedderError.inputTooLarge, implying a new type. Placing it in SwitchcraftCore (not SwitchcraftCoreML) means it is throwable by both T5CoreMLEmbedder and T5MetalEmbedder, accessible to any caller, and not CoreML-gated. Required Sendable and Equatable conformances are trivial to declare.
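From the caller's side, a typed error lets a bulk indexer skip an oversized page and carry on. The sketch below assumes the error shape from the task checklist; indexPage and its closure parameter are hypothetical stand-ins for whatever the consumer actually calls.

```swift
/// Error shape as described in the plan.
enum EmbedderError: Error, Sendable, Equatable {
    case inputTooLarge(actual: Int, max: Int)
}

/// Hypothetical caller: runs one encode step and reports what happened.
func indexPage(_ encode: () throws -> [Float]) -> String {
    do {
        let vector = try encode()
        return "indexed \(vector.count) floats"
    } catch EmbedderError.inputTooLarge(let actual, let max) {
        // The typed case carries both numbers, so the caller can log or chunk.
        return "skipped: \(actual) tokens exceeds limit \(max)"
    } catch {
        return "failed: \(error)"
    }
}
```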

TQ4: public enum EmbedderOverflowPolicy in SwitchcraftCore.
Since the policy type is shared by both embedder implementations and lives alongside Embedder.swift, it belongs in SwitchcraftCore. It is pure Swift with no framework dependencies.

TQ5: Add overflow guard to T5MetalEmbedder as well.
T5MetalEmbedder uses SlidingWindow identically. With maxInputTokens now on the protocol, implementing the guard in Metal is consistent and prevents analogous memory pressure from a 577-Metal-command-buffer burst.

ADR assessment: An ADR is warranted. The decisions — maxInputTokens on the protocol, the 8 * windowSize default, EmbedderOverflowPolicy/EmbedderError in SwitchcraftCore rather than SwitchcraftCoreML, and the two-policy contract — all constrain future Embedder conformers and are non-obvious from the code alone. The next ADR number should be determined at implementation time by checking the highest-numbered file in adrs/ (currently 021).


New/Modified Files

| File | Change |
| --- | --- |
| Sources/SwitchcraftCore/Embedding/EmbedderOverflowPolicy.swift | New: public enum EmbedderOverflowPolicy with .truncate and .reject cases |
| Sources/SwitchcraftCore/Embedding/EmbedderError.swift | New: public enum EmbedderError with inputTooLarge(actual: Int, max: Int) case |
| Sources/SwitchcraftCore/Embedding/Embedder.swift | Add var maxInputTokens: Int { get } to the protocol |
| Sources/SwitchcraftCoreML/T5CoreMLEmbedder.swift | Add maxInputTokens/overflowPolicy stored properties, init parameters (all four inits), and overflow guard in encode(_:) between tokenization and SlidingWindow.plan |
| Sources/SwitchcraftMetal/T5MetalEmbedder.swift | Add maxInputTokens/overflowPolicy stored properties, init parameter, and overflow guard in encode(_:) |
| Tests/SwitchcraftTests/Support/MockEmbedder.swift | Add let maxInputTokens: Int = Int.max to satisfy protocol |
| Tests/SwitchcraftTests/T5CoreMLEmbedderOverflowTests.swift | New: unit tests for truncate policy, reject policy, maxInputTokens property, and 1,000-call overflow stress test |
| adrs/NNN-embedder-overflow-guard.md | New: documents overflow guard design decisions |

Key Decisions

  • EmbedderOverflowPolicy and EmbedderError in SwitchcraftCore, not SwitchcraftCoreML: They are pure Swift types with no CoreML dependency, and placing them in the Core module makes them accessible to all callers — including T5MetalEmbedder and any future embedder — without the #if canImport(CoreML) gate.

  • Default maxInputTokens = 8 * windowSize: A flat constant would not scale with custom windowSize values. The multiplier form ensures the default always provides safe headroom (8× leaves ~15 windows, vs the ~577 that caused failure). Callers with large-document workloads and manual ANE management can pass a larger value at init.

  • maxInputTokens on Embedder protocol: Keeps callers from needing to downcast. Every new Embedder conformer must declare it explicitly — a useful invariant since an unknown embedder without a declared limit is unsafe to use in a bulk-index context.

  • T5MetalEmbedder gains the guard but no stub-based tests: The Metal init requires a real Metal device and GGUF asset; there is no MLPredictor-style stub path. Overflow tests for Metal would require significant new test infrastructure. The guard implementation is identical in shape to the CoreML version; the CoreML stub tests provide adequate coverage of the algorithm. A note is added to the test file.

  • Overflow guard position: After tokenizer.encode and before SlidingWindow.plan. This is the earliest point where the authoritative token count is known. It fires inside the re-entrancy guard's inFlight window — the existing defer block continues to work correctly on the .reject throw path (no change to re-entrancy logic needed).

  • No changes to SlidingWindow, MLPredictor, or CountingStubPredictor: The existing stub infrastructure is sufficient for the new overflow tests.
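The point about guard position and the defer block can be illustrated with a simplified, synchronous stand-in. The real embedder is described as actor-isolated with waiting callers; GuardedEncoder below is a hypothetical class that keeps only the inFlight flag, to show that a defer registered before the guard still runs when the guard throws.

```swift
enum EmbedderError: Error, Sendable, Equatable {
    case inputTooLarge(actual: Int, max: Int)
}

/// Hypothetical, single-threaded stand-in for the re-entrancy bookkeeping.
final class GuardedEncoder {
    private(set) var inFlight = false
    let maxInputTokens: Int

    init(maxInputTokens: Int) {
        self.maxInputTokens = maxInputTokens
    }

    /// Returns the number of tokens that would be passed on to planning.
    func encode(tokens: [Int]) throws -> Int {
        inFlight = true
        // Registered before the guard, so it runs on both return and throw.
        defer { inFlight = false }

        if tokens.count > maxInputTokens {
            throw EmbedderError.inputTooLarge(actual: tokens.count, max: maxInputTokens)
        }
        return tokens.count
    }
}
```

Because Swift runs defer blocks when the enclosing scope exits by any path, the flag is cleared even when the reject path throws.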


Task Checklist

  • Task 1: Create Sources/SwitchcraftCore/Embedding/EmbedderOverflowPolicy.swift: public enum EmbedderOverflowPolicy: Sendable, Equatable, Hashable { case truncate; case reject } with a module-level doc-comment explaining the two policies and their trade-offs.

  • Task 2: Create Sources/SwitchcraftCore/Embedding/EmbedderError.swift: public enum EmbedderError: Error, Sendable, Equatable { case inputTooLarge(actual: Int, max: Int) } with a doc-comment describing when each case is thrown.

  • Task 3: Add var maxInputTokens: Int { get } to the Embedder protocol in Sources/SwitchcraftCore/Embedding/Embedder.swift, with a doc-comment explaining the role of this limit and that conformers should expose it as nonisolated let.

  • Task 4: Update Tests/SwitchcraftTests/Support/MockEmbedder.swift — add let maxInputTokens: Int = Int.max to satisfy the updated protocol (mock has no real model limit).

  • Task 5: Add nonisolated let maxInputTokens: Int and nonisolated let overflowPolicy: EmbedderOverflowPolicy stored properties to T5CoreMLEmbedder; add maxInputTokens: Int = 8 * windowSize and overflowPolicy: EmbedderOverflowPolicy = .truncate parameters to all four init overloads (public URL-based, Bundle convenience, and both test-only internal inits); add precondition(maxInputTokens >= windowSize) guard in each init; update the class doc-comment to document the new parameters.

  • Task 6: Insert overflow guard in T5CoreMLEmbedder.encode(_:) between tokenizer.encode(text, addSpecialTokens: true) and the SlidingWindow.plan call: if tokens.count > maxInputTokens, apply the policy — .truncate silently clips to Array(tokens.prefix(maxInputTokens)); .reject throws EmbedderError.inputTooLarge(actual: tokens.count, max: maxInputTokens). Update the encode(_:) doc-comment to document the new throw case.

  • Task 7: Add nonisolated let maxInputTokens: Int and nonisolated let overflowPolicy: EmbedderOverflowPolicy to T5MetalEmbedder; add maxInputTokens: Int = 8 * windowSize and overflowPolicy: EmbedderOverflowPolicy = .truncate to its init; insert the same overflow guard pattern in T5MetalEmbedder.encode(_:) after tokenization and before SlidingWindow.plan.

  • Task 8: Create Tests/SwitchcraftTests/T5CoreMLEmbedderOverflowTests.swift with four test cases:

    • testMaxInputTokensPropertyReturnsConfiguredValue — construct with explicit maxInputTokens, assert property equals that value (R7d)
    • testTruncatePolicyEncodesOversizedInputWithoutError — construct with overflowPolicy: .truncate and maxInputTokens: N; encode a string that tokenizes to > N tokens; assert the result is non-empty and has count ≤ N * dims (R7a)
    • testRejectPolicyThrowsInputTooLargeError — construct with overflowPolicy: .reject and maxInputTokens: N; encode a string > N tokens; assert throw is EmbedderError.inputTooLarge with correct actual and max values (R7b)
    • testOverflowStress1000Calls — 1,000 sequential encode calls through a CountingStubPredictor, with calls at indices 250 and 750 using oversized inputs (both .truncate and .reject policies in separate sub-tests); assert all non-oversized calls succeed and no predictor call receives a token sequence longer than maxInputTokens (R5, R7c)

  • Task 9: Run swift test --filter T5CoreMLEmbedderOverflow and confirm all four tests pass; then run swift test for the full suite.

  • Task 10: Create adrs/NNN-embedder-overflow-guard.md (where NNN is one higher than the current highest ADR number in adrs/), documenting: the overflow guard design, why maxInputTokens belongs on the Embedder protocol, the 8 * windowSize default rationale, why EmbedderOverflowPolicy/EmbedderError live in SwitchcraftCore, and why T5MetalEmbedder gets the guard without stub-based tests.
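The init-time defaults and precondition from Task 5 might look roughly like the sketch below. One assumption is made explicit: a Swift default argument cannot refer to another parameter of the same initializer, so the sketch treats windowSize as a static constant. EmbedderLimits is a hypothetical container, not a type from this PR.

```swift
enum EmbedderOverflowPolicy: Sendable, Equatable, Hashable {
    case truncate
    case reject
}

/// Hypothetical holder for the two new settings.
struct EmbedderLimits {
    /// Assumed to be a static constant so the default argument can use it.
    static let windowSize = 512

    let maxInputTokens: Int
    let overflowPolicy: EmbedderOverflowPolicy

    init(
        maxInputTokens: Int = 8 * EmbedderLimits.windowSize,
        overflowPolicy: EmbedderOverflowPolicy = .truncate
    ) {
        // A limit below one window could never produce a full window.
        precondition(
            maxInputTokens >= EmbedderLimits.windowSize,
            "maxInputTokens must be at least one window (\(EmbedderLimits.windowSize) tokens)"
        )
        self.maxInputTokens = maxInputTokens
        self.overflowPolicy = overflowPolicy
    }
}
```

If windowSize were instead an init parameter, one option would be an optional maxInputTokens that falls back to 8 * windowSize inside the initializer body.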


Risks

  • maxInputTokens precondition at init. A precondition(maxInputTokens >= windowSize) prevents misconfiguration (setting the total limit lower than a single window would make encode always truncate to less than one window, producing empty or garbage embeddings). If callers pass very small values the app crashes at init-time with a clear message — better than silent bad behavior.

  • Reject policy interacts with re-entrancy guard correctly but requires care. The throw path on .reject must occur after inFlight = true is set so the defer block fires and releases the next waiter. In the existing code flow, inFlight = true is set unconditionally before the tokenization call, and the defer block is registered immediately after. The overflow guard is inserted after tokenization, by which point the defer is already registered, so the throw is handled correctly. The Implement stage should verify this flow explicitly.

  • T5MetalEmbedder encode(_:) structure. Research did not read T5MetalEmbedder.encode(_:) in detail. The implementation must find the tokenization and SlidingWindow.plan calls in that file and insert the guard between them. The pattern should be identical to T5CoreMLEmbedder's.

  • Test isolation: stub tokenizer vs. real tokenizer. The existing CountingStubPredictor tests use the real xtr-base-en.tokenizer.json fixture. The overflow tests also need to produce inputs of known token count. The simplest approach: use a repeated short known-tokenizable substring so the token count is predictable. The test helper makeTokenizer() in T5CoreMLEmbedderStressTests.swift can be reused or duplicated.
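The shape of the planned 1,000-call stress test can be sketched without a tokenizer or model by feeding token arrays of known length to a counting stub. Everything here is a simplified stand-in: CountingStub is not the repository's CountingStubPredictor, runStress is a hypothetical helper, and windowing is omitted so the stub sees the guarded token array directly.

```swift
enum EmbedderOverflowPolicy { case truncate, reject }

enum EmbedderError: Error, Equatable {
    case inputTooLarge(actual: Int, max: Int)
}

/// Hypothetical stub that records how many calls it saw and the longest input.
final class CountingStub {
    private(set) var calls = 0
    private(set) var longestInput = 0

    func predict(_ tokens: [Int]) {
        calls += 1
        longestInput = max(longestInput, tokens.count)
    }
}

/// 1,000 sequential calls with oversized inputs at indices 250 and 750.
func runStress(policy: EmbedderOverflowPolicy, limit: Int = 4_096)
    -> (succeeded: Int, rejected: Int, longestInput: Int)
{
    let stub = CountingStub()
    var succeeded = 0
    var rejected = 0

    for index in 0..<1_000 {
        let isOversized = (index == 250 || index == 750)
        var tokens = [Int](repeating: 1, count: isOversized ? limit * 4 : 128)
        do {
            // Same guard shape as in the plan: check, then truncate or throw.
            if tokens.count > limit {
                switch policy {
                case .truncate:
                    tokens = Array(tokens.prefix(limit))
                case .reject:
                    throw EmbedderError.inputTooLarge(actual: tokens.count, max: limit)
                }
            }
            stub.predict(tokens)
            succeeded += 1
        } catch {
            rejected += 1
        }
    }
    return (succeeded, rejected, stub.longestInput)
}
```

The property worth asserting is that longestInput never exceeds the limit under either policy, and that a rejected call does not stop the calls that follow it.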


Used 17/50 turns, 0k input / 8k output tokens.

Verification

Implemented the embedder overflow guard for issue #89. Added EmbedderOverflowPolicy (.truncate/.reject) and EmbedderError.inputTooLarge to SwitchcraftCore, added maxInputTokens: Int to the Embedder protocol, and inserted the overflow guard in both T5CoreMLEmbedder.encode(_:) and T5MetalEmbedder.encode(_:) between tokenization and SlidingWindow.plan. The default limit is 8 * windowSize (4,096 tokens → ~15 windows), preventing the ~577-window bursts that caused ANE IOSurface pool poisoning. Five new tests cover property value, truncate policy, reject policy, and a 1,000-call stress test with interleaved oversized inputs; all 270 tests in the suite pass. ADR 022 and docs/Plan.md updated.


Closes #89

totalslacker and others added 2 commits May 6, 2026 08:21
…t ANE IOSurface pool poisoning

Adds `maxInputTokens` + `EmbedderOverflowPolicy` to `T5CoreMLEmbedder` and `T5MetalEmbedder`.
Inputs exceeding the limit are silently truncated (.truncate, default) or throw
`EmbedderError.inputTooLarge(actual:max:)` (.reject). Default limit is 8 * windowSize
(4,096 tokens), yielding ~15 windows — well below the ~577-window burst that exhausted
the ANE IOSurface pool. Closes #89.

- New `EmbedderOverflowPolicy` and `EmbedderError` enums in SwitchcraftCore
- `maxInputTokens: Int { get }` added to the `Embedder` protocol
- Overflow guard in both embedder `encode(_:)` implementations
- Four new tests: property value, truncate policy, reject policy, 1,000-call stress test
- All existing Embedder conformers (MockEmbedder, test-local structs) updated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Documents the design decisions from issue #89: why maxInputTokens belongs on
the Embedder protocol, the 8 * windowSize default rationale, why
EmbedderOverflowPolicy/EmbedderError live in SwitchcraftCore, and why
T5MetalEmbedder gets the guard without stub-based tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@totalslacker totalslacker marked this pull request as ready for review May 6, 2026 15:23
Copilot AI review requested due to automatic review settings May 6, 2026 15:23
@totalslacker
Owner Author

🏭 Fabrik — stage: Review
branch: fabrik/issue-89 | commit: c3857a0 | main: 39344b5 | 2026-05-06 15:28 UTC

Confirmed: SQLiteStorageConcurrencyTests.swift is not changed by this PR. The timing failure is pre-existing.

The implementation is complete and correct. Here is the full review:


Review Findings

Verified

Rebase: Branch is already at origin/main — no rebase needed, no conflict resolution required.

External feedback: No PR comments to address.

All new tests pass (5 tests in T5CoreMLEmbedder overflow guard suite):

  • "maxInputTokens property returns the configured value" (R7d) ✅
  • "Truncate policy encodes oversized input without error and returns non-empty embeddings" (R7a) ✅
  • "Reject policy throws EmbedderError.inputTooLarge with correct actual and max values" (R7b) ✅
  • "1,000 encode calls with oversized inputs interleaved complete without ANE pool degradation (truncate policy)" (R5/R7c) ✅
  • "1,000 encode calls: reject policy throws on oversized inputs but normal-sized calls always succeed" (R5/R7c) ✅

The plan described 4 tests; the implement stage correctly split the stress test (R7c) into separate truncate and reject sub-tests, giving 5 total — better coverage, not a gap.

Spec requirements verified:

  • R1: Overflow guard fires after tokenizer.encode and before SlidingWindow.plan in both T5CoreMLEmbedder.encode(_:) and T5MetalEmbedder.encode(_:)
  • R2: .truncate (prefix clip) and .reject (throw) policies implemented ✅
  • R3: EmbedderError: Error, Sendable, Equatable declared in SwitchcraftCore
  • R4: public nonisolated let maxInputTokens: Int on both embedders ✅
  • R5: Stress test — 1,000 calls with oversized inputs at indices 250 and 750 — all normal calls succeed ✅
  • R6: Default policy is .truncate in all init overloads ✅
  • R7: All four test coverage areas satisfied across 5 tests ✅

Re-entrancy guard interaction (plan risk item): The defer block registering inFlight = false is set up at line 376 of T5CoreMLEmbedder.swift, before the overflow guard fires at line 394. Throws from .reject correctly release the re-entrancy lock and resume waiting callers. ✅

Pre-existing test failure: "Concurrent read+write is faster than sum of isolated operations" in SQLiteStorage WAL Concurrency fails intermittently with a tiny timing margin (23ms vs 20ms threshold). git diff origin/main..HEAD -- Tests/SwitchcraftTests/SQLiteStorageConcurrencyTests.swift produces no output — this file is not modified by this PR. The failure is pre-existing, timing-dependent, and unrelated to the overflow guard changes.

Protocol/module placement: EmbedderOverflowPolicy and EmbedderError in SwitchcraftCore (not CoreML-gated), maxInputTokens: Int { get } added to Embedder protocol, all conformers (MockEmbedder, OddDimsEmbedder in SwitchcraftStoreTests, SlowEmbedder wrapper in SearchTimeoutTests) updated.

ADR 022 and docs/Plan.md updated. PR includes Closes #89. ✅

Fixed

None required — implementation is correct and complete.


Reviewed the embedder overflow guard implementation for issue #89. All 5 new tests pass (truncate policy, reject policy, maxInputTokens property, and two 1,000-call stress tests). The only failing test in the suite is a pre-existing timing-dependent WAL concurrency test unmodified by this PR. No fixes were needed — implementation is correct, all spec requirements R1–R7 are met, and the PR is ready for human review.


Used 37/50 turns, 0k input / 9k output tokens.

totalslacker and others added 4 commits May 6, 2026 09:34
Swift 6 (Xcode 16.4 / CI) rejects bare retroactive Sendable conformances
on types defined in other modules. MLModel is defined in CoreML, so
`extension MLModel: MLPredictor` implicitly conferred Sendable — illegal
in Swift 6. Adding `extension MLModel: @unchecked Sendable {}` satisfies
the compiler. Safe: all MLModel access is gated through T5CoreMLEmbedder's
actor isolation.

Fixes CI regression introduced when the CI runner upgraded to Xcode 16.4.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

# Conflicts:
#	Sources/SwitchcraftCoreML/T5CoreMLEmbedder.swift
#	docs/Plan.md

Swift 6 (Xcode 16.4) treats closures passed to ObjC blocks (including
autoreleasepool + MLExceptionCatcher.perform) as region-boundary crossings.
Passing a non-Sendable `any MLFeatureProvider` across that boundary triggers
"sending 'provider' risks causing data races" at T5CoreMLEmbedder.swift:506,526.

Fix: add retroactive @unchecked Sendable to MLDictionaryFeatureProvider (safe:
always accessed within T5CoreMLEmbedder's actor isolation), and narrow
predictWindow's parameter type from `any MLFeatureProvider` to the concrete
MLDictionaryFeatureProvider — the only type ever passed at the single call site.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… warnings

Swift recommends @retroactive for conformances of external types to external
protocols to avoid future conflicts if CoreML adds the conformance itself.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@totalslacker
Owner Author

🏭 Fabrik — stage: Review
branch: fabrik/issue-89 | commit: 61e2397 | main: 9fd782b | 2026-05-06 17:54 UTC

Review Findings

Fixed

  • Finding [1]: Missing @retroactive on MLDictionaryFeatureProvider and MLModel Sendable conformances. MLPredictor.swift declared @unchecked Sendable conformances on two CoreML types without the required @retroactive annotation. Swift emits a compiler warning for both (extension declares a conformance of imported type ... to imported protocol 'Sendable'; this will not behave correctly if the owners of 'CoreML' introduce this conformance in the future). These warnings were introduced by this branch's CI fixes. Fix: Added @retroactive to both conformance declarations. Build now completes with zero warnings. Commit: 61e2397 fix(ci): add @retroactive to silence retroactive Sendable conformance warnings
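The conformances described in this finding take roughly the following shape. This is a sketch reconstructed from the commit messages in this thread, not a copy of MLPredictor.swift. It assumes a Swift 6 toolchain, where the @retroactive attribute is available, and an SDK in which CoreML does not already declare these types Sendable. The canImport gate is added here only so the snippet compiles on platforms without CoreML.

```swift
#if canImport(CoreML)
import CoreML

// @unchecked: the compiler cannot verify thread safety of these classes;
// the PR relies on access being confined to the embedder's actor isolation.
// @retroactive: acknowledges that both the type and the protocol are owned
// by other modules, which silences the retroactive-conformance warning.
extension MLModel: @unchecked @retroactive Sendable {}
extension MLDictionaryFeatureProvider: @unchecked @retroactive Sendable {}
#endif
```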

Verified

Spec requirements (R1–R7):

  • R1: Overflow guard fires in T5CoreMLEmbedder.encode(_:) after tokenizer.encode at line ~394, before SlidingWindow.plan. Same pattern in T5MetalEmbedder.encode(_:). ✅
  • R2: .truncate (prefix clip via tokens.prefix(maxInputTokens)) and .reject (throws EmbedderError.inputTooLarge) both implemented. ✅
  • R3: public enum EmbedderError: Error, Sendable, Equatable in SwitchcraftCore, with case inputTooLarge(actual: Int, max: Int). ✅
  • R4: public nonisolated let maxInputTokens: Int on T5CoreMLEmbedder (and T5MetalEmbedder). ✅
  • R5: Both 1,000-call stress tests pass. Oversized inputs at indices 250 and 750 are handled without leaving the embedder degraded — normal calls at all other indices succeed. ✅
  • R6: Default overflowPolicy: EmbedderOverflowPolicy = .truncate in all four init overloads. ✅
  • R7: All four coverage areas met across 5 tests (R7c split into separate truncate/reject sub-tests — better coverage). ✅

Protocol conformers updated:

  • MockEmbedder: let maxInputTokens: Int = Int.max ✅
  • OddDimsEmbedder (inline in SwitchcraftStoreTests) — same ✅
  • SlowEmbedder wrapper in SearchTimeoutTests — forwards to inner ✅

Module placement: EmbedderOverflowPolicy, EmbedderError, and maxInputTokens: Int { get } in SwitchcraftCore — not CoreML-gated, reachable from any Embedder conformer. ✅

ADR 022: Present at adrs/022-embedder-overflow-guard.md, documents all five design decisions from the plan (module placement, protocol membership, default value, policy trade-offs, Metal guard without stub tests). ✅

docs/Plan.md: Issue #89 overflow guard entry checked off. ✅

Pre-existing failures: "Concurrent read+write is faster than sum of isolated operations" in SQLiteStorageConcurrencyTests.swift fails intermittently with a timing margin (51ms vs 28ms threshold). git diff origin/main..HEAD -- Tests/SwitchcraftTests/SQLiteStorageConcurrencyTests.swift returns zero lines — this file is unmodified by this PR. The failure has been present in every prior review session and is unrelated to the overflow guard changes.

Build: swift build completes with zero errors and zero warnings after the @retroactive fix.

Test suite: 272 tests in 67 suites, all passing except the pre-existing WAL concurrency timing flap.

PR: References Closes #89. Branch is at 61e2397, pushed to origin/fabrik/issue-89.


Reviewed the embedder overflow guard implementation for issue #89. Fixed one issue: added @retroactive to the @unchecked Sendable conformances on MLModel and MLDictionaryFeatureProvider in MLPredictor.swift to silence compiler warnings introduced by this branch. All 272 tests pass (one pre-existing timing-sensitive WAL concurrency test flaps intermittently but is unmodified by this PR). All spec requirements R1–R7 are met, the @retroactive fix is pushed, and the PR is ready for human review.


Used 26/50 turns, 0k input / 4k output tokens.

@totalslacker totalslacker requested review from Copilot and removed request for Copilot May 6, 2026 23:37
Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated no new comments.

@totalslacker
Owner Author

🏭 Fabrik — stage: Validate
branch: fabrik/issue-89 | commit: a4153de | main: f43e29e | 2026-05-06 23:56 UTC

All checks complete. Here is the full validation report.


Validation Report

Branch State

  • Branch fabrik/issue-89 is 0 commits behind origin/main (merge commits a4153de/a82cb63 already incorporated main's CI flaky-skip commits).
  • 8 commits ahead of main, all pushed. git status clean.
  • Build: swift build completes with zero errors and zero warnings.

Requirements

  • R1 — Overflow guard: Guard fires in T5CoreMLEmbedder.encode(_:) at line 398, after tokenizer.encode(text, addSpecialTokens: true) (line 393) and before SlidingWindow.plan (further in function). Identical guard in T5MetalEmbedder.encode(_:) at line 473. No oversized token sequence reaches MLPredictor.predict.
  • R2 — Two policies: .truncate (clips to tokens.prefix(maxInputTokens)) and .reject (throws EmbedderError.inputTooLarge) both implemented in both embedders.
  • R3 — Typed error: public enum EmbedderError: Error, Sendable, Equatable { case inputTooLarge(actual: Int, max: Int) } in Sources/SwitchcraftCore/Embedding/EmbedderError.swift. Sendable and Equatable confirmed.
  • R4 — Exposed maximum: public nonisolated let maxInputTokens: Int present on both T5CoreMLEmbedder and T5MetalEmbedder. Also added to the Embedder protocol as var maxInputTokens: Int { get }.
  • R5 — No pool poisoning: Both 1,000-call stress tests pass — oversized inputs at indices 250/750 are handled (truncated or rejected), and all 998 normal-sized calls succeed. No predictor receives an oversized token sequence.
  • R6 — Default is truncate: overflowPolicy: EmbedderOverflowPolicy = .truncate in all four init overloads of T5CoreMLEmbedder and in T5MetalEmbedder.init.
  • R7 — Tests: 5 tests in T5CoreMLEmbedderOverflowTests.swift covering: (a) truncate policy returns non-empty embeddings, (b) reject policy throws correct typed error with actual/max values, (c) 1,000-call stress for both truncate and reject policies, (d) maxInputTokens property returns configured value.

Test Suite: PASSED (272 tests, 67 suites)

Regression Check: None detected

  • All Embedder protocol conformers updated: MockEmbedder (Int.max), OddDimsEmbedder in SwitchcraftStoreTests (Int.max), SlowEmbedder wrapper in SearchTimeoutTests (forwards to inner).
  • MLPredictor.swift fixes (@unchecked @retroactive Sendable on MLModel and MLDictionaryFeatureProvider) are consistent with Swift 6 requirements.
  • T5CoreMLEmbedder's re-entrancy guard interaction verified: inFlight = true set at line 379, defer registered at line 380, overflow guard at line 398 — throw on .reject correctly releases waiters via the existing defer.

Code Completeness

Verdict: READY TO MERGE

All 7 requirements (R1–R7) verified. The overflow guard is correctly inserted in both T5CoreMLEmbedder and T5MetalEmbedder after tokenization and before SlidingWindow.plan. All 5 new overflow tests pass, CI is green on PR #91 (debug and release), and the one locally-failing test is a pre-existing timing flap in an unmodified file that is CI-skipped. No issues found; PR is ready to merge.


Used 38/50 turns, 0k input / 8k output tokens.

@totalslacker totalslacker merged commit d814646 into main May 6, 2026
7 checks passed