fix: gracefully handle pixel_values TypeError for text-only Gemma 4 models#82

Open
yelban wants to merge 1 commit into jjang-ai:main from yelban:fix/batched-pixel-values-text-only-gemma4

Conversation


@yelban yelban commented Apr 15, 2026

Summary

  • Text-only Gemma 4 models (e.g., Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2) have model_type="gemma4" in config.json but do not accept pixel_values
  • The fallback heuristic in MLLMModelWrapper.__init__ detects "gemma4" in model_type and enables pixel_values injection, causing a TypeError crash in continuous-batching mode
  • This PR adds a self-healing try/except in __call__: on the first pixel_values TypeError, it permanently disables injection and retries — all subsequent calls work without overhead
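The self-healing retry described above can be sketched as follows. This is a minimal illustration, not the repository's code: the class name `MLLMModelWrapper` comes from the PR description, but the `_inject_pixel_values` attribute and the exact heuristic are assumptions.

```python
class MLLMModelWrapper:
    """Hedged sketch of the wrapper with the self-healing fallback."""

    def __init__(self, model, model_type: str):
        self.model = model
        # Fallback heuristic from the PR description: "gemma4" in
        # model_type enables injection, even for text-only checkpoints.
        self._inject_pixel_values = "gemma4" in model_type

    def __call__(self, input_ids, pixel_values=None, **kwargs):
        if self._inject_pixel_values and pixel_values is not None:
            try:
                return self.model(input_ids, pixel_values=pixel_values, **kwargs)
            except TypeError as e:
                # Only catch the specific pixel_values mismatch;
                # re-raise anything else so real bugs still surface.
                if "pixel_values" not in str(e):
                    raise
                # Permanently disable injection for this session, then
                # fall through and retry without pixel_values.
                self._inject_pixel_values = False
        return self.model(input_ids, **kwargs)
```

Only the first call pays the retry cost; once `_inject_pixel_values` is cleared, subsequent calls skip the `try` block entirely.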

Reproducer

vmlx serve Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2 \
  --continuous-batching

Before: Every request fails with [Engine error: Model.__call__() got an unexpected keyword argument 'pixel_values'], retries 3 times, then crashes the engine loop.

After: First request auto-detects text-only model, disables pixel_values injection for the session, and all subsequent requests work normally.

Root cause

The model_config_registry.lookup() receives a HuggingFace model name (e.g., Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2) and tries to read config.json from that relative path, which doesn't exist because the model actually lives in the HF cache. Lookup therefore falls back to the "unknown" config, and the fallback heuristic at lines 47-53 checks model.model_type == "gemma4" and enables pixel_values injection, even though the model is text-only (text_config.model_type="gemma4_text", is_mllm=False).
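The distinction the heuristic misses can be illustrated with a small config probe. This is a sketch of the idea only, not the PR's fix (the PR catches the TypeError instead); the function name is invented, the `text_config.model_type` field is taken from the PR description, and treating `vision_config` as the multimodal marker is an assumption.

```python
import json
from pathlib import Path


def is_multimodal(config_path: str) -> bool:
    """Distinguish text-only "gemma4" checkpoints from true VLMs by
    inspecting config.json, rather than trusting the top-level
    model_type alone (hypothetical helper)."""
    cfg = json.loads(Path(config_path).read_text())
    # Text-only Gemma 4: top-level model_type is "gemma4", but the
    # nested text_config says "gemma4_text" and no vision tower exists.
    text_cfg = cfg.get("text_config", {})
    if text_cfg.get("model_type") == "gemma4_text" and "vision_config" not in cfg:
        return False
    # Assumed convention: a vision_config section marks a real VLM.
    return "vision_config" in cfg
```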

Test plan

  • Verified vmlx serve with --continuous-batching on Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2 — requests succeed after patch
  • VLM models with inject_pixel_values: True in registry are unaffected (injection still happens, no TypeError)
  • Change is minimal and backward-compatible — only catches the specific pixel_values TypeError

fix: gracefully handle pixel_values TypeError for text-only Gemma 4 models

Text-only Gemma 4 models (e.g., supergemma4-26b-uncensored-mlx-4bit-v2)
have model_type="gemma4" in config.json but do not accept pixel_values.

The fallback heuristic in MLLMModelWrapper.__init__ detects "gemma4" in
model_type and enables pixel_values injection, which then causes a
TypeError crash in continuous-batching mode.

Add a try/except in __call__ that catches the TypeError, permanently
disables injection for the session, and retries without pixel_values.
This is self-healing: the first call pays a tiny retry cost, all
subsequent calls run without overhead.

Reproducer:
  vmlx serve Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2 \
    --continuous-batching

Before: [Engine error: Model.__call__() got an unexpected keyword
argument 'pixel_values'] on every request, 3 retries then crash.

After: first request auto-detects text-only model, disables injection,
and all subsequent requests work normally.
jjang-ai added a commit that referenced this pull request Apr 23, 2026
User-added model folders silently hid image/diffusion pipelines:
- mlxstudio #85 (Chinese): 'swift 版本不能添加模型文件夹'
  ("Swift version cannot add model folder")
- mlxstudio #82: 'Moved image models still not opening REDUX'
- mlxstudio #96: 'Relocated image models do not work. Nor do other
  mflux models by others.'

Root cause: walkUserDir accepted only dirs containing config.json.
Diffusion pipelines (Flux, Z-Image, SD) ship a root-level
model_index.json and their config.json files live inside the
transformer / vae / text_encoder subdirs — which the walker was
ALSO skipping via the diffusionSubmodules allowlist. Net effect:
zero entries produced for a valid relocated Flux dir.

Two-part fix:

1. walkUserDir accepts either 'config.json' OR 'model_index.json'
   as a model-root marker. Sub-module rejection (parent owns
   model_index.json) still applies so the tree doesn't double-emit
   transformer/text_encoder as standalone models.

2. buildEntry forces modality=.image when model_index.json is
   present at the root. Without this, a non-obviously-named
   Flux folder (e.g. 'my-custom-flux-weights/') would land as
   modality=.unknown and the Image tab filter hides it.

Adjacent benefit: the minimum-weight-bytes guard already passes
image modality through, so the root-level model_index.json path
(which doesn't carry weights at the root — transformer/vae subdirs
do) no longer rejects via the stub-snapshot guard.