fix: preserve LoRA adapter across requests in OpenAI responses endpoint #920

Closed
JasonOA888 wants to merge 1 commit into Blaizzy:main from JasonOA888:fix/907-server-drops-lora-adapters

@JasonOA888

Problem

When starting mlx_vlm server with --adapter-path, the adapter loads correctly on startup but gets dropped after the first request. The OpenAI-compatible /v1/responses endpoint calls get_cached_model(model) without passing adapter_path, so the cache key becomes (model, None) instead of (model, adapter_path). This triggers a cache mismatch, unloads the adapted model, and reloads the base model.

Fix

  • Add server_adapter_path module-level variable to track the adapter path set at startup
  • Pass server_adapter_path to get_cached_model() in both the lifespan preload and the responses endpoint
  • Now the cache key matches and the adapter persists across requests
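The key mismatch described above can be reproduced with a small stand-alone sketch. This is not the mlx_vlm implementation: `ModelCache`, `fake_loader`, and the model name are hypothetical stand-ins, and the only things taken from the description are the `(model, adapter_path)` cache key and the module-level `server_adapter_path` variable.

```python
from typing import Any, Callable, Optional, Tuple

CacheKey = Tuple[str, Optional[str]]


class ModelCache:
    """Hypothetical single-slot cache keyed by (model, adapter_path)."""

    def __init__(self, loader: Callable[[str, Optional[str]], Any]) -> None:
        self._loader = loader
        self._key: Optional[CacheKey] = None
        self._model: Any = None
        self.load_count = 0  # how many times the loader actually ran

    def get(self, model: str, adapter_path: Optional[str] = None) -> Any:
        key = (model, adapter_path)
        if key != self._key:
            # Key mismatch: the cached entry is replaced by a fresh load.
            self._model = self._loader(model, adapter_path)
            self._key = key
            self.load_count += 1
        return self._model


def fake_loader(model: str, adapter_path: Optional[str]) -> dict:
    # Stand-in for real model loading; only records what was requested.
    return {"model": model, "adapter": adapter_path}


# Assumed to be set once at startup from --adapter-path.
server_adapter_path = "./my-adapter"

# Before the fix: the endpoint omits adapter_path, so the key changes.
buggy = ModelCache(fake_loader)
buggy.get("some-model", server_adapter_path)   # startup preload
buggy_result = buggy.get("some-model")         # request: key is (model, None)

# After the fix: every lookup passes the same adapter path.
fixed = ModelCache(fake_loader)
fixed.get("some-model", server_adapter_path)   # startup preload
fixed_result = fixed.get("some-model", server_adapter_path)
```

In the buggy sequence the second lookup triggers a reload without the adapter; in the fixed sequence the loader runs once and the adapter stays attached.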

Testing

  1. Start server with --adapter-path ./my-adapter
  2. Send multiple requests to /v1/responses
  3. Verify adapter is still loaded via /health endpoint
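The manual steps above could be scripted roughly as follows. This is a sketch under assumptions: the base URL, model name, and prompt are placeholders, the request body uses the `model` and `input` fields of the OpenAI Responses format, and `/health` is assumed to return JSON whose contents are simply collected for inspection, since its exact fields are not stated here.

```python
import json
import urllib.request
from typing import Any, List


def build_responses_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for the /v1/responses endpoint."""
    body = json.dumps({"model": model, "input": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/responses",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_health(base_url: str) -> Any:
    """Fetch /health and decode it, assuming the body is JSON."""
    with urllib.request.urlopen(f"{base_url}/health") as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_check(base_url: str, model: str, rounds: int = 3) -> List[Any]:
    """Send several requests and record /health after each one."""
    snapshots = []
    for _ in range(rounds):
        request = build_responses_request(base_url, model, "ping")
        with urllib.request.urlopen(request) as resp:
            resp.read()  # drain the response; its content is not checked here
        snapshots.append(fetch_health(base_url))
    return snapshots
```

With the server running, calling `run_check` with the server's base URL and model name returns one health snapshot per request; if the adapter is preserved, the adapter information reported by `/health` should be the same in every snapshot.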

Closes #907

The responses endpoint called get_cached_model() without passing
adapter_path, causing the cache key to mismatch and the adapter to be
dropped after the first request. Now tracks the server's adapter_path
and passes it to all cache lookups.

Closes Blaizzy#907
@lucasnewman
Collaborator

Thanks! This is fixed for your original case, and we're covering the newer endpoints here: #1251

Linked issue: [Bug] Server drops LoRA adapters after every request — adapter_path not preserved in cache