fix(serve): bound registry eager-preload to --max-loaded-models (#133)#134
Merged
Conversation
Registry serve eagerly preloaded EVERY model at startup, serially, even at --max-loaded-models 1. A 13-model registry = ~5-min startup loading all 13 once, contradicting the documented load-on-demand + idle-unload model and ignoring the resident cap. ensure_loaded() LRU-evicts when slots.len() >= cap, so preloading N models at cap=K loaded N and kept only the last K — the first N-K loads were pure waste plus transient memory pressure. Bound the eager-preload loop to the first min(cap, N) registry entries (BTreeMap order, deterministic). This warms exactly the resident-set size; the rest stay lazy and load on first request via the existing on-demand path. Best-effort semantics, spawn_blocking off-runtime execution, and per-model idle timers via ensure_loaded are preserved. Update the multi-model lifecycle e2e comments (runner.rs/manifest.toml) to describe the new bounded behavior; leg (c)'s defensive explicit load-B already forces the LRU swap, so the test stays green and meaningful. Document the cap-bounded preload on --registry and --max-loaded-models in docs/CLI.md.
…ON order Follow-up to the bounded eager-preload fix. Comments and docs said "the first cap in registry order" / "first cap registry entries", implying JSON array order. The registry iterates a BTreeMap<String, ModelEntry> sorted by id, so the actual selection is the alphabetically-first cap model ids — independent of JSON array order. Update serve.rs comment, docs/CLI.md (--registry and --max-loaded-models rows), and the e2e runner/manifest comments to say "alphabetically-first cap model ids". The runner.rs lifecycle test comments also drop the over-specific "Order [A,B] → A resident" claim (which of A/B sorts first depends on the real id basenames); the test's defensive load-B path is robust to either ordering regardless.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
rmlx serve --registry <file>eagerly preloaded every model in the registry at startup, ignoring--max-loaded-models K. With a 13-model registry that is a ~5-minute boot that loads all 13 once, contradicting the documented load-on-demand / idle-unload model.Root cause: the eager-preload loop in
crates/rmlx-cli/src/commands/serve.rsiterated over allregistry.list()entries callingensure_loaded.AppState::ensure_loadedLRU-evicts onceslots.len() >= max_loaded_models, so preloading N at cap K paid the full load cost for all N but kept only the last K resident — pure waste + transient memory pressure.Fix
Bound the eager preload to at most
max_loaded_modelsentries (min(cap, N)):This warms exactly the resident-set size and removes the N-load thrash. The rest stay lazy via the documented on-demand path. Best-effort semantics,
spawn_blockingoff-runtime execution, and per-model idle timers are unchanged..max(1)keeps preload sane at cap 0 (a pre-existing pathological value, out of scope here).Note:
registry.list()iterates aBTreeMapsorted by id, so the preloaded set is the alphabetically-firstcapids — deterministic, not JSON array order. Comments +docs/CLI.mdupdated to say so.Tests / proof
runner.rs/manifest.tomlupdated to the new bounded behavior (comment-only;assert_model_lifecyclealready exercises the cap=1 registry path).--max-loaded-models 1→ exactly one "eager preload starting/complete" pair at boot (was three). On-demand still works: chat to a non-preloaded model →200 "Paris", serve log shows it loaded on first request and LRU-evicted the preloaded one.make cigreen (fmt-check + clippy-D warnings+ test + deny + audit).Closes #133.
🤖 Generated with Claude Code