fix(mm-cache): key encoder-output cache on model identity to stop cross-model leak (#132) #135
Merged
Summary
In multi-model `--registry` serve mode, image inference could fail with a vision-feature shape mismatch (HTTP 503 `vision feature shape [1, 272, 1536] != [1, 272, 2560]`) when a different model had previously processed the same image.

Root cause: the multimodal encoder-output cache (`MultimodalCache`, one shared instance per `AppState`) keyed entries on an xxh3_64 digest over `(header || pixel/PCM bytes)` with no model identity. Two registry models sharing one image collide on that digest; the cached encoder output is projected to the first model's `hidden_size`, so a second model with a different hidden size gets a soft token of the wrong dimension (e2b hidden=1536, e4b hidden=2560). Single-`--model` serve is unaffected.

Fix
Fold a stable per-model signature into the cache key:

- `multimodal_cache::model_sig(&str) -> u64` = fixed-seed xxh3_64 over the model's stable id (registry id / snapshot basename). Deterministic across requests + processes; `hidden_size` deliberately NOT used (two models can share a hidden size yet produce different features).
- `build_header` widened 12 → 20 bytes (`model_sig` appended LE at `[12..20]`); header `version` bumped 1 → 2 (internal-only layout).
- `image_key` / `audio_key` now take `model_sig`. Audio hardened symmetrically (no live callers yet).
- `model_sig` computed once per request in `engine/arch_generator.rs` and `embeddings.rs` from `model_id`, threaded alongside the already-threaded `mm_cache` to the 5 image call sites + the Qwen3-VL `get_many` / `put_many` path.

Invariant: same model + same input ⇒ same key ⇒ cache HIT (no perf regression); different model + same input ⇒ different key ⇒ no leak.
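For reviewers who want the key derivation in one place, here is a minimal sketch of the scheme described above. It is not the code from this PR: FNV-1a stands in for the fixed-seed xxh3_64 so the snippet builds with no external crate, and the first 12 header bytes (version, kind, payload length) are a hypothetical layout, since only the `model_sig` position at `[12..20]` is stated here. `KIND_IMAGE` and `hash64` are illustrative names.

```rust
// Sketch only. FNV-1a is a stand-in for the fixed-seed xxh3_64 used by the
// real cache; every header field except model_sig at [12..20] is assumed.

const HEADER_VERSION: u32 = 2; // bumped 1 -> 2 when model_sig was appended
const KIND_IMAGE: u32 = 1; // hypothetical modality tag

/// Stand-in 64-bit hash (FNV-1a): no random seed, so it is deterministic
/// across requests and processes.
fn hash64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Signature over the model's stable id (registry id / snapshot basename).
/// hidden_size is intentionally not an input.
fn model_sig(model_id: &str) -> u64 {
    hash64(model_id.as_bytes())
}

/// 20-byte header: 12 bytes of (assumed) version/kind/length fields,
/// followed by model_sig in little-endian at [12..20].
fn build_header(kind: u32, payload_len: u32, model_sig: u64) -> [u8; 20] {
    let mut header = [0u8; 20];
    header[0..4].copy_from_slice(&HEADER_VERSION.to_le_bytes());
    header[4..8].copy_from_slice(&kind.to_le_bytes());
    header[8..12].copy_from_slice(&payload_len.to_le_bytes());
    header[12..20].copy_from_slice(&model_sig.to_le_bytes());
    header
}

/// Cache key = hash(header || pixel bytes).
fn image_key(pixels: &[u8], model_sig: u64) -> u64 {
    let header = build_header(KIND_IMAGE, pixels.len() as u32, model_sig);
    let mut buf = Vec::with_capacity(header.len() + pixels.len());
    buf.extend_from_slice(&header);
    buf.extend_from_slice(pixels);
    hash64(&buf)
}

fn main() {
    let pixels = [7u8; 64];
    let (e2b, e4b) = (model_sig("e2b"), model_sig("e4b"));
    // Same model + same input: same key, so the cache still hits.
    assert_eq!(image_key(&pixels, e2b), image_key(&pixels, e2b));
    // Different model + same input: different key, so no cross-model reuse.
    assert_ne!(image_key(&pixels, e2b), image_key(&pixels, e4b));
    println!("invariant holds");
}
```

Hashing the model id rather than `hidden_size` keeps two models with equal hidden sizes from sharing entries, which matches the rationale given in the first bullet.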
Tests / proof
- `image_key` / `audio_key` disambiguate on `model_sig`, identical on same sig; `model_sig` stable + distinct. All mm-cache unit tests green.
- With `--log debug`:
  - `POST image → e2b` → 200, `mm_cache: insert key=d013fc0a` (1536-dim).
  - `POST same image → e4b` → 200 (was 503), `mm_cache: insert key=6db90d0b` (2560-dim): distinct key for the same image, entries=2.
  - `POST same image → e2b again` → 200, `mm_cache: hit key=d013fc0a`: same-model repeat still HITs.
- `--model e4b` + same image → 200, unchanged.
- `make ci` green (fmt-check + clippy `-D warnings` + test + deny + audit).

Also in this PR
- `README.md` refreshed: status `0.1.0 → 0.2.2` + updated "What works" (streaming, gemma4_unified 12B + jina-v4 image, `rmlx transcribe` Whisper, multi-model registry with load-on-demand + per-model-scoped mm-cache). Corrected an overclaim: `rmlx convert` is a roadmap target, not a shipped subcommand.
- `docs/CLI.md` `--mm-cache-bytes` note: key now includes model identity (no cross-model sharing).

Closes #132.
🤖 Generated with Claude Code