
fix(mm-cache): key encoder-output cache on model identity to stop cross-model leak (#132) #135

Merged
Pushkinist merged 2 commits into main from fix/132-mm-cache-model-key on Jun 18, 2026


@Pushkinist

Owner

Summary

In multi-model --registry serve mode, image inference could fail with a vision-feature shape mismatch (HTTP 503 vision feature shape [1, 272, 1536] != [1, 272, 2560]) when a different model previously processed the same image.

Root cause: the multimodal encoder-output cache (MultimodalCache, one shared instance per AppState) keyed entries on an xxh3_64 digest over (header || pixel/PCM bytes) with no model identity. Two registry models that receive the same image therefore collide on that digest. The cached encoder output is already projected to the first model's hidden_size, so a second model with a different hidden_size receives soft tokens of the wrong dimension (e2b: hidden_size=1536; e4b: hidden_size=2560). Single-model serve is unaffected.
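The collision can be reproduced in miniature. The following is a minimal sketch, not the project's code: `digest64` is an FNV-1a stand-in for xxh3_64 so the example has no dependencies, and `content_only_key`, `Features`, and `encode_cached` are hypothetical names introduced only for illustration.

```rust
use std::collections::HashMap;

/// FNV-1a 64-bit. ASSUMPTION: stand-in for xxh3_64, chosen only to avoid
/// an external crate in this sketch.
fn digest64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Hypothetical pre-fix key: digest over (header || pixel bytes),
/// with no model identity mixed in.
fn content_only_key(header: &[u8], pixels: &[u8]) -> u64 {
    let mut buf = Vec::with_capacity(header.len() + pixels.len());
    buf.extend_from_slice(header);
    buf.extend_from_slice(pixels);
    digest64(&buf)
}

/// Simplified cached encoder output: only the shape is tracked.
#[derive(Clone, Debug, PartialEq)]
struct Features {
    shape: [usize; 3],
}

/// Look up (or populate) the shared cache by content alone, then check the
/// cached shape against the requesting model's hidden_size.
fn encode_cached(
    cache: &mut HashMap<u64, Features>,
    header: &[u8],
    pixels: &[u8],
    hidden_size: usize,
) -> Result<Features, String> {
    let key = content_only_key(header, pixels);
    let feats = cache
        .entry(key)
        .or_insert_with(|| Features { shape: [1, 272, hidden_size] })
        .clone();
    if feats.shape[2] != hidden_size {
        // The second model sees features projected for the first model.
        return Err(format!(
            "vision feature shape {:?} != {:?}",
            feats.shape,
            [1, 272, hidden_size]
        ));
    }
    Ok(feats)
}
```

Calling `encode_cached` with hidden_size 1536 and then 2560 on the same header and pixels populates the entry on the first call and returns the shape-mismatch error on the second, which mirrors the 503 described above.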

Fix

Fold a stable per-model signature into the cache key:

  • New multimodal_cache::model_sig(&str) -> u64 = fixed-seed xxh3_64 over the model's stable id (registry id / snapshot basename). Deterministic across requests + processes; hidden_size deliberately NOT used (two models can share a hidden yet produce different features).
  • build_header widened 12 → 20 bytes (model_sig appended LE at [12..20]); header version bumped 1 → 2 (internal-only layout). image_key/audio_key now take model_sig. Audio hardened symmetrically (no live callers yet).
  • model_sig computed once per request in engine/arch_generator.rs and embeddings.rs from model_id, threaded alongside the already-threaded mm_cache to the 5 image call sites + the Qwen3-VL get_many/put_many path.
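The widened header can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged code: the first 12 bytes are treated as an opaque blob because their layout is not described here, the function signatures are guesses, and FNV-1a stands in for the fixed-seed xxh3_64.

```rust
/// ASSUMPTION: FNV-1a 64-bit stands in for a fixed-seed xxh3_64 so the
/// sketch needs no external crate. Like the real function, it is
/// deterministic across requests and processes.
fn model_sig(model_id: &str) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in model_id.as_bytes() {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Total length of the widened header described in the PR.
const HEADER_LEN: usize = 20;

/// Hypothetical signature: takes the existing 12 header bytes as opaque
/// input and appends model_sig in little-endian order at [12..20].
fn build_header(base: [u8; 12], model_sig: u64) -> [u8; HEADER_LEN] {
    let mut header = [0u8; HEADER_LEN];
    header[..12].copy_from_slice(&base);
    header[12..20].copy_from_slice(&model_sig.to_le_bytes());
    header
}
```

Deriving the signature from the model's stable id rather than from hidden_size means two models that happen to share a hidden size still get distinct headers.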

Invariant: same model + same input ⇒ same key ⇒ cache HIT (no perf regression); different model + same input ⇒ different key ⇒ no leak.
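The invariant can be exercised against a toy cache. The sketch below is hypothetical throughout: `fnv1a64` replaces xxh3_64, `image_key` hashes only (model_sig || pixels) and omits the other header fields, and `hit_or_insert` stores the hidden size as a placeholder for the real encoder output.

```rust
use std::collections::HashMap;

/// FNV-1a 64-bit. ASSUMPTION: dependency-free stand-in for xxh3_64.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Hypothetical per-model signature over the stable model id.
fn model_sig(model_id: &str) -> u64 {
    fnv1a64(model_id.as_bytes())
}

/// Hypothetical image key: digest over (model_sig LE bytes || pixel bytes).
/// The real header carries additional fields that are omitted here.
fn image_key(pixels: &[u8], model_sig: u64) -> u64 {
    let mut buf = Vec::with_capacity(8 + pixels.len());
    buf.extend_from_slice(&model_sig.to_le_bytes());
    buf.extend_from_slice(pixels);
    fnv1a64(&buf)
}

/// Returns true on a cache hit. On a miss, stores `hidden_size` as a
/// placeholder for the encoder output and returns false.
fn hit_or_insert(
    cache: &mut HashMap<u64, usize>,
    model_id: &str,
    hidden_size: usize,
    pixels: &[u8],
) -> bool {
    let key = image_key(pixels, model_sig(model_id));
    if cache.contains_key(&key) {
        true
    } else {
        cache.insert(key, hidden_size);
        false
    }
}
```

Sending the same pixels through "e2b", then "e4b", then "e2b" again should give miss, miss, hit, leaving two entries in the cache.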

Tests / proof

  • Unit tests: image_key/audio_key disambiguate on model_sig, identical on same sig; model_sig stable + distinct. All mm-cache unit tests green.
  • Real-model proof (registry e2b/e4b, mm-cache on, --log debug):
    • POST image → e2b → 200, mm_cache: insert key=d013fc0a (1536-dim).
    • POST same image → e4b → 200 (was 503), mm_cache: insert key=6db90d0b (2560-dim); distinct key for the same image, entries=2.
    • POST same image → e2b again → 200, mm_cache: hit key=d013fc0a — same-model repeat still HITs.
    • Control: single --model e4b + same image → 200, unchanged.
  • make ci green (fmt-check + clippy -D warnings + test + deny + audit).

Also in this PR

  • README.md refreshed: status 0.1.0 → 0.2.2 + updated "What works" (streaming, gemma4_unified 12B + jina-v4 image, rmlx transcribe Whisper, multi-model registry with load-on-demand + per-model-scoped mm-cache). Corrected an overclaim — rmlx convert is a roadmap target, not a shipped subcommand.
  • docs/CLI.md --mm-cache-bytes note: key now includes model identity (no cross-model sharing).

Closes #132.

🤖 Generated with Claude Code

@Pushkinist Pushkinist merged commit 390d178 into main Jun 18, 2026
2 checks passed
@Pushkinist Pushkinist deleted the fix/132-mm-cache-model-key branch June 18, 2026 07:44

Development

Successfully merging this pull request may close these issues.

Registry/multi-model vision: mm encoder-output cache keyed by content hash, not model → cross-model feature leak (503 shape mismatch)
