Skip to content

fix(serve): bound registry eager-preload to --max-loaded-models (#133)#134

Merged
Pushkinist merged 2 commits into
mainfrom
fix/133-preload-respects-cap
Jun 18, 2026
Merged

fix(serve): bound registry eager-preload to --max-loaded-models (#133)#134
Pushkinist merged 2 commits into
mainfrom
fix/133-preload-respects-cap

Conversation

@Pushkinist

Copy link
Copy Markdown
Owner

Summary

rmlx serve --registry <file> eagerly preloaded every model in the registry at startup, ignoring --max-loaded-models K. With a 13-model registry that is a ~5-minute boot that loads all 13 once, contradicting the documented load-on-demand / idle-unload model.

Root cause: the eager-preload loop in crates/rmlx-cli/src/commands/serve.rs iterated over all registry.list() entries calling ensure_loaded. AppState::ensure_loaded LRU-evicts once slots.len() >= max_loaded_models, so preloading N at cap K paid the full load cost for all N but kept only the last K resident — pure waste + transient memory pressure.

Fix

Bound the eager preload to at most max_loaded_models entries (min(cap, N)):

let cap = max_loaded_models.max(1);
let ids: Vec<String> = state.registry.list().iter().take(cap).map(...).collect();

This warms exactly the resident-set size and removes the N-load thrash. The rest stay lazy via the documented on-demand path. Best-effort semantics, spawn_blocking off-runtime execution, and per-model idle timers are unchanged. .max(1) keeps preload sane at cap 0 (a pre-existing pathological value, out of scope here).

Note: registry.list() iterates a BTreeMap sorted by id, so the preloaded set is the alphabetically-first cap ids — deterministic, not JSON array order. Comments + docs/CLI.md updated to say so.

Tests / proof

  • e2e comments in runner.rs / manifest.toml updated to the new bounded behavior (comment-only; assert_model_lifecycle already exercises the cap=1 registry path).
  • Real-model proof: 3-model registry (gemma4 26b/e2b/e4b) served with --max-loaded-models 1 → exactly one "eager preload starting/complete" pair at boot (was three). On-demand still works: chat to a non-preloaded model → 200 "Paris", serve log shows it loaded on first request and LRU-evicted the preloaded one.
  • make ci green (fmt-check + clippy -D warnings + test + deny + audit).

Closes #133.

🤖 Generated with Claude Code

Registry serve eagerly preloaded EVERY model at startup, serially, even
at --max-loaded-models 1. A 13-model registry = ~5-min startup loading
all 13 once, contradicting the documented load-on-demand + idle-unload
model and ignoring the resident cap. ensure_loaded() LRU-evicts when
slots.len() >= cap, so preloading N models at cap=K loaded N and kept
only the last K — the first N-K loads were pure waste plus transient
memory pressure.

Bound the eager-preload loop to the first min(cap, N) registry entries
(BTreeMap order, deterministic). This warms exactly the resident-set
size; the rest stay lazy and load on first request via the existing
on-demand path. Best-effort semantics, spawn_blocking off-runtime
execution, and per-model idle timers via ensure_loaded are preserved.

Update the multi-model lifecycle e2e comments (runner.rs/manifest.toml)
to describe the new bounded behavior; leg (c)'s defensive explicit
load-B already forces the LRU swap, so the test stays green and
meaningful. Document the cap-bounded preload on --registry and
--max-loaded-models in docs/CLI.md.
…ON order

Follow-up to the bounded eager-preload fix. Comments and docs said "the
first cap in registry order" / "first cap registry entries", implying
JSON array order. The registry iterates a BTreeMap<String, ModelEntry>
sorted by id, so the actual selection is the alphabetically-first cap
model ids — independent of JSON array order.

Update serve.rs comment, docs/CLI.md (--registry and --max-loaded-models
rows), and the e2e runner/manifest comments to say "alphabetically-first
cap model ids". The runner.rs lifecycle test comments also drop the
over-specific "Order [A,B] → A resident" claim (which of A/B sorts
first depends on the real id basenames); the test's defensive load-B
path is robust to either ordering regardless.
@Pushkinist Pushkinist merged commit ff4d46d into main Jun 18, 2026
2 checks passed
@Pushkinist Pushkinist deleted the fix/133-preload-respects-cap branch June 18, 2026 07:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

--registry eager-preloads every model at startup, ignoring --max-loaded-models (slow startup, contradicts load-on-demand)

1 participant