fix(serve): bound registry eager-preload to --max-loaded-models (#133) by Pushkinist · Pull Request #134 · Pushkinist/rMLX

Pushkinist · 2026-06-18T07:01:46Z

Summary

rmlx serve --registry <file> eagerly preloaded every model in the registry at startup, ignoring --max-loaded-models K. With a 13-model registry, that meant a ~5-minute boot that loaded each of the 13 models once, contradicting the documented load-on-demand / idle-unload model.

Root cause: the eager-preload loop in crates/rmlx-cli/src/commands/serve.rs iterated over all registry.list() entries calling ensure_loaded. AppState::ensure_loaded LRU-evicts once slots.len() >= max_loaded_models, so preloading N at cap K paid the full load cost for all N but kept only the last K resident — pure waste + transient memory pressure.
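The waste described above can be illustrated with a toy model. This is a minimal sketch, not the code in serve.rs: simulate_preload and PreloadOutcome are hypothetical names, the ids are assumed to be distinct, and the resident set is modeled as a plain queue rather than the real AppState slots.

```rust
use std::collections::VecDeque;

/// Outcome of preloading a list of model ids into a capped resident set.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
pub struct PreloadOutcome {
    /// Number of (expensive) model loads performed.
    pub loads: usize,
    /// Ids still resident afterwards, least recently used first.
    pub resident: Vec<String>,
}

/// Toy model of an LRU-capped loader: every id is loaded in order, and the
/// least recently used resident is evicted whenever the set is already at `cap`.
/// Assumes the ids are distinct.
pub fn simulate_preload(ids: &[&str], cap: usize) -> PreloadOutcome {
    let cap = cap.max(1);
    let mut resident: VecDeque<String> = VecDeque::new();
    let mut loads = 0;
    for id in ids {
        if resident.len() >= cap {
            // At the cap: drop the least recently used entry before loading.
            resident.pop_front();
        }
        resident.push_back((*id).to_string());
        loads += 1;
    }
    PreloadOutcome {
        loads,
        resident: resident.into_iter().collect(),
    }
}
```

With three ids and a cap of 1, the sketch performs three loads and leaves only the last id resident. Passing only the first min(cap, N) ids performs a single load and keeps that model resident.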

Fix

Bound the eager preload to at most max_loaded_models entries (min(cap, N)):

let cap = max_loaded_models.max(1);
let ids: Vec<String> = state.registry.list().iter().take(cap).map(...).collect();

This warms exactly the resident-set size and removes the N-load thrash. The rest stay lazy via the documented on-demand path. Best-effort semantics, spawn_blocking off-runtime execution, and per-model idle timers are unchanged. .max(1) keeps preload sane at cap 0 (a pre-existing pathological value, out of scope here).
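The selection logic can be sketched in isolation as follows. This is an illustrative stand-in under stated assumptions: preload_ids is a hypothetical helper, and the registry is reduced to a slice of ids instead of the real registry.list() entries, whose mapping step is elided in the snippet above.

```rust
/// Pick the ids to warm at startup: at most `max_loaded_models` of them,
/// and at least one when the registry is non-empty (the `.max(1)` guard).
/// Hypothetical helper; the real loop works on registry entries.
pub fn preload_ids(registry_ids: &[String], max_loaded_models: usize) -> Vec<String> {
    // A cap of 0 is treated as 1 so the preload still warms one model.
    let cap = max_loaded_models.max(1);
    // `take` stops early if the registry has fewer than `cap` entries,
    // which yields min(cap, N) ids overall.
    registry_ids.iter().take(cap).cloned().collect()
}
```

For a 13-entry slice, a cap of 2 selects two ids, a cap of 0 selects one, and a cap of 20 selects all 13.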

Note: registry.list() iterates a BTreeMap sorted by id, so the preloaded set is the alphabetically-first cap ids — deterministic, not JSON array order. Comments + docs/CLI.md updated to say so.
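The ordering point can be checked with a small standalone sketch. The function name preload_order and the ids used with it are made up for illustration, and the map value is a unit placeholder rather than the real ModelEntry.

```rust
use std::collections::BTreeMap;

/// Build a BTreeMap keyed by id from ids given in file (JSON array) order,
/// then return the first `cap` keys in the map's own iteration order.
/// Hypothetical helper; `()` stands in for the real entry type.
pub fn preload_order(json_order: &[&str], cap: usize) -> Vec<String> {
    let registry: BTreeMap<String, ()> = json_order
        .iter()
        .map(|id| ((*id).to_string(), ()))
        .collect();
    // BTreeMap iterates keys in ascending `Ord` order, so the order in
    // which the ids were inserted has no effect on the result.
    registry.keys().take(cap.max(1)).cloned().collect()
}
```

Given the ids "gemma-e4b", "gemma-26b", "gemma-e2b" in that file order and a cap of 2, the sketch selects "gemma-26b" and "gemma-e2b". String ordering in Rust is byte-wise, so "alphabetically-first" here means digits sort before uppercase letters, which sort before lowercase letters.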

Tests / proof

e2e comments in runner.rs / manifest.toml updated to the new bounded behavior (comment-only; assert_model_lifecycle already exercises the cap=1 registry path).
Real-model proof: 3-model registry (gemma4 26b/e2b/e4b) served with --max-loaded-models 1 → exactly one "eager preload starting/complete" pair at boot (was three). On-demand still works: chat to a non-preloaded model → 200 "Paris", serve log shows it loaded on first request and LRU-evicted the preloaded one.
make ci green (fmt-check + clippy -D warnings + test + deny + audit).

Closes #133.

🤖 Generated with Claude Code

Registry serve eagerly preloaded EVERY model at startup, serially, even at --max-loaded-models 1. A 13-model registry = ~5-min startup loading all 13 once, contradicting the documented load-on-demand + idle-unload model and ignoring the resident cap.

ensure_loaded() LRU-evicts when slots.len() >= cap, so preloading N models at cap=K loaded N and kept only the last K; the first N-K loads were pure waste plus transient memory pressure.

Bound the eager-preload loop to the first min(cap, N) registry entries (BTreeMap order, deterministic). This warms exactly the resident-set size; the rest stay lazy and load on first request via the existing on-demand path. Best-effort semantics, spawn_blocking off-runtime execution, and per-model idle timers via ensure_loaded are preserved.

Update the multi-model lifecycle e2e comments (runner.rs/manifest.toml) to describe the new bounded behavior; leg (c)'s defensive explicit load-B already forces the LRU swap, so the test stays green and meaningful. Document the cap-bounded preload on --registry and --max-loaded-models in docs/CLI.md.

…ON order

Follow-up to the bounded eager-preload fix. Comments and docs said "the first cap in registry order" / "first cap registry entries", implying JSON array order. The registry iterates a BTreeMap<String, ModelEntry> sorted by id, so the actual selection is the alphabetically-first cap model ids, independent of JSON array order.

Update serve.rs comment, docs/CLI.md (--registry and --max-loaded-models rows), and the e2e runner/manifest comments to say "alphabetically-first cap model ids". The runner.rs lifecycle test comments also drop the over-specific "Order [A,B] → A resident" claim (which of A/B sorts first depends on the real id basenames); the test's defensive load-B path is robust to either ordering regardless.

Pushkinist added 2 commits June 18, 2026 12:56

Pushkinist merged commit ff4d46d into main Jun 18, 2026
2 checks passed

Pushkinist deleted the fix/133-preload-respects-cap branch June 18, 2026 07:07
