fix(mm-cache): key encoder-output cache on model identity to stop cross-model leak (#132) by Pushkinist · Pull Request #135 · Pushkinist/rMLX

Pushkinist · 2026-06-18T07:39:24Z

Summary

In multi-model --registry serve mode, image inference could fail with a vision-feature shape mismatch (HTTP 503 vision feature shape [1, 272, 1536] != [1, 272, 2560]) when a different model previously processed the same image.

Root cause: the multimodal encoder-output cache (MultimodalCache, one shared instance per AppState) keyed entries on an xxh3_64 digest over (header || pixel/PCM bytes) with no model identity. Two registry models processing the same image therefore collide on that digest. The cached encoder output is already projected to the first model's hidden_size, so a second model with a different hidden_size receives soft tokens of the wrong dimension (e2b hidden=1536, e4b hidden=2560). Single-model serve is unaffected.
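The collision can be reproduced in a few lines. This is a minimal sketch, not the rMLX code: `digest64` is an FNV-1a stand-in for xxh3_64 (chosen only to avoid an external crate), and `legacy_image_key`, `Features`, and `second_model_reads` are hypothetical names for illustration.

```rust
use std::collections::HashMap;

/// Stand-in 64-bit digest (FNV-1a). The real cache uses xxh3_64; FNV-1a is
/// an assumption made here so the sketch has no dependencies.
fn digest64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Pre-fix key shape: digest over (header || payload), no model identity.
fn legacy_image_key(header: &[u8], pixels: &[u8]) -> u64 {
    let mut buf = Vec::with_capacity(header.len() + pixels.len());
    buf.extend_from_slice(header);
    buf.extend_from_slice(pixels);
    digest64(&buf)
}

/// Hypothetical cached value: encoder output already projected to one
/// model's hidden size.
struct Features {
    hidden: usize,
}

/// Returns the hidden size a second model reads back from the shared cache
/// after a first model (hidden = 1536) has processed the same image.
fn second_model_reads(header: &[u8], pixels: &[u8]) -> usize {
    let mut cache: HashMap<u64, Features> = HashMap::new();
    // Model A populates the cache.
    cache.insert(legacy_image_key(header, pixels), Features { hidden: 1536 });
    // Model B (which expects 2560) computes the identical key and hits A's entry.
    cache[&legacy_image_key(header, pixels)].hidden
}
```

Because nothing in the key distinguishes the two models, the second lookup returns the 1536-dim entry, which is the mismatch reported in the 503.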

Fix

Fold a stable per-model signature into the cache key:

New multimodal_cache::model_sig(&str) -> u64 = fixed-seed xxh3_64 over the model's stable id (registry id / snapshot basename). Deterministic across requests + processes; hidden_size deliberately NOT used (two models can share a hidden yet produce different features).
build_header widened 12 → 20 bytes (model_sig appended LE at [12..20]); header version bumped 1 → 2 (internal-only layout). image_key/audio_key now take model_sig. Audio hardened symmetrically (no live callers yet).
model_sig computed once per request in engine/arch_generator.rs and embeddings.rs from model_id, threaded alongside the already-threaded mm_cache to the 5 image call sites + the Qwen3-VL get_many/put_many path.

Invariant: same model + same input ⇒ same key ⇒ cache HIT (no perf regression); different model + same input ⇒ different key ⇒ no leak.
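The key derivation and the invariant above can be sketched as follows. This is an illustration under stated assumptions, not the merged code: `digest64` is an FNV-1a stand-in for fixed-seed xxh3_64, and the contents of header bytes [0..12] (version, kind, payload length) are hypothetical, since the description only specifies the 20-byte width, the version bump to 2, and `model_sig` appended little-endian at [12..20].

```rust
/// Stand-in 64-bit digest (FNV-1a); the PR uses fixed-seed xxh3_64.
fn digest64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

const HEADER_LEN: usize = 20;
const HEADER_VERSION: u32 = 2;
/// Hypothetical kind tag for image payloads.
const KIND_IMAGE: u32 = 1;

/// Per-model signature derived from the model's stable id
/// (registry id / snapshot basename), not from hidden_size.
fn model_sig(model_id: &str) -> u64 {
    digest64(model_id.as_bytes())
}

/// 20-byte header. Bytes [0..12] are an assumed layout; bytes [12..20]
/// carry model_sig in little-endian order, as the PR describes.
fn build_header(kind: u32, payload_len: u32, model_sig: u64) -> [u8; HEADER_LEN] {
    let mut h = [0u8; HEADER_LEN];
    h[0..4].copy_from_slice(&HEADER_VERSION.to_le_bytes());
    h[4..8].copy_from_slice(&kind.to_le_bytes());
    h[8..12].copy_from_slice(&payload_len.to_le_bytes());
    h[12..20].copy_from_slice(&model_sig.to_le_bytes());
    h
}

/// Cache key over (header || pixel bytes), now model-scoped via the header.
fn image_key(pixels: &[u8], model_sig: u64) -> u64 {
    // Truncating the length to u32 is a simplification of this sketch.
    let header = build_header(KIND_IMAGE, pixels.len() as u32, model_sig);
    let mut buf = Vec::with_capacity(HEADER_LEN + pixels.len());
    buf.extend_from_slice(&header);
    buf.extend_from_slice(pixels);
    digest64(&buf)
}
```

With this shape, `image_key(img, model_sig("e2b"))` is stable across calls, while the same image under `model_sig("e4b")` yields a different key, so each model gets its own entry.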

Tests / proof

Unit tests: image_key/audio_key disambiguate on model_sig, identical on same sig; model_sig stable + distinct. All mm-cache unit tests green.
Real-model proof (registry e2b/e4b, mm-cache on, --log debug):
- POST image → e2b → 200, mm_cache: insert key=d013fc0a (1536-dim).
- POST same image → e4b → 200 (was 503), mm_cache: insert key=6db90d0b (2560-dim) — distinct key for the same image, entries=2.
- POST same image → e2b again → 200, mm_cache: hit key=d013fc0a — same-model repeat still HITs.
- Control: single --model e4b + same image → 200, unchanged.
make ci green (fmt-check + clippy -D warnings + test + deny + audit).

Also in this PR

README.md refreshed: status 0.1.0 → 0.2.2 + updated "What works" (streaming, gemma4_unified 12B + jina-v4 image, rmlx transcribe Whisper, multi-model registry with load-on-demand + per-model-scoped mm-cache). Corrected an overclaim — rmlx convert is a roadmap target, not a shipped subcommand.
docs/CLI.md --mm-cache-bytes note: key now includes model identity (no cross-model sharing).

Closes #132.

🤖 Generated with Claude Code


Pushkinist added 2 commits June 18, 2026 14:34

fix(mm-cache): key encoder-output cache on model identity to stop cro…

90d56f7

…ss-model leak (#132)

fix(mm-cache): update stale 12-byte header docstrings to 20 bytes

d846206

Pushkinist merged commit 390d178 into main Jun 18, 2026
2 checks passed

Pushkinist deleted the fix/132-mm-cache-model-key branch June 18, 2026 07:44
