Status: canonical
PMLX-701/#14 reference record, updated 2026-03-17 after landing `crates/psionic-mlx-lm`.
`psionic-mlx-lm` is the first bounded `mlx-lm`-style text package in this repo.
It does not create a second runtime. It packages the already-landed Psionic GGUF loader, prompt renderer, generation runtime, streaming surface, continuous batch scheduler, and shared-prefix cache provenance into one local library and CLI boundary.
The package boundary covers the following (see the sketch after this list):

- load one local GGUF text model through `psionic-catalog` and `psionic-serve`
- expose a package-owned load report with blob identity, decoder descriptor, runtime support, and chat-template metadata
- render chat prompts through the GGUF template metadata before generation
- execute generate, stream, and continuous-batch workflows
- persist prompt-cache artifacts from generation responses as JSON
- keep context-overflow and shared-prefix cache policy explicit at the request boundary
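
A minimal Rust sketch of the load-plus-report flow from the first two items. Only the type names come from this record; the constructor, method, and field names below are illustrative assumptions, not the crate's confirmed API.

```rust
// Hypothetical sketch: `load`, `load_report`, and the report fields are
// assumed names, not the crate's confirmed surface.
use psionic_mlx_lm::{MlxLmLoadReport, MlxLmTextRuntime};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load one local GGUF text model; no network, no remote catalog.
    let runtime = MlxLmTextRuntime::load("/path/to/model.gguf")?;

    // The package-owned load report carries the identity and capability
    // metadata listed above.
    let report: &MlxLmLoadReport = runtime.load_report();
    println!("blob identity:   {:?}", report.blob_identity);
    println!("decoder:         {:?}", report.decoder_descriptor);
    println!("runtime support: {:?}", report.runtime_support);
    println!("chat template:   {:?}", report.chat_template);
    Ok(())
}
```
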
Boundaries and non-goals:

- local GGUF paths only
- package-owned CLI only
- catalog, Hugging Face cache, and architecture-registry workflows now live in `psionic-mlx-catalog`
- OpenAI-compatible HTTP serving now lives in `psionic-mlx-serve`
- no notebook/UI shell
Those later surfaces belong to PMLX-706 through PMLX-709.
The crate now exposes:

- `MlxLmTextRuntime`
- `MlxLmTextRequest`
- `MlxLmLoadReport`
- `MlxLmPromptCacheArtifact`
- `MlxLmBatchReport`
The package keeps public request semantics honest by reusing the native Psionic types underneath: `GenerationOptions`, `ContextOverflowPolicy`, `PrefixCacheControl`, `PromptMessage`, and `RenderedPrompt`.
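
A minimal sketch of assembling one request from those types. The struct fields, enum variants, and `generate` signature are assumptions for illustration (including the assumption that the native types are re-exported by the package); the record only fixes the type names.

```rust
// Hypothetical sketch: field names, enum variants, and method
// signatures are assumed, not confirmed against the crate.
use psionic_mlx_lm::{
    ContextOverflowPolicy, GenerationOptions, MlxLmTextRequest, MlxLmTextRuntime,
    PrefixCacheControl,
};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let runtime = MlxLmTextRuntime::load("/path/to/model.gguf")?;

    // Context-overflow and shared-prefix cache policy stay explicit at
    // the request boundary rather than hiding behind runtime defaults.
    let request = MlxLmTextRequest {
        prompt: "hello".to_string(),
        options: GenerationOptions::default(),
        overflow: ContextOverflowPolicy::Reject,
        prefix_cache: PrefixCacheControl::Disabled,
    };

    let response = runtime.generate(&request)?;
    println!("{}", response.text); // `text` is likewise an assumed field
    Ok(())
}
```
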
Run from the repo root:

```sh
cargo run -p psionic-mlx-lm --bin psionic-mlx-lm -- load --model /path/to/model.gguf
cargo run -p psionic-mlx-lm --bin psionic-mlx-lm -- generate --model /path/to/model.gguf --prompt "hello"
cargo run -p psionic-mlx-lm --bin psionic-mlx-lm -- stream --model /path/to/model.gguf --prompt "hello"
cargo run -p psionic-mlx-lm --bin psionic-mlx-lm -- batch --model /path/to/model.gguf --prompt "hello" --prompt "hello again"
```
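
The library-level equivalent of the `batch` subcommand, as a hedged sketch; `from_prompt`, `batch`, and the report's `completions` field are assumed names layered on the types this record lists.

```rust
// Hypothetical sketch: constructor, method, and field names assumed.
use psionic_mlx_lm::{MlxLmBatchReport, MlxLmTextRequest, MlxLmTextRuntime};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let runtime = MlxLmTextRuntime::load("/path/to/model.gguf")?;

    // Two requests admitted to the continuous-batch scheduler together,
    // mirroring the repeated --prompt flags above.
    let requests = vec![
        MlxLmTextRequest::from_prompt("hello"),
        MlxLmTextRequest::from_prompt("hello again"),
    ];

    let report: MlxLmBatchReport = runtime.batch(&requests)?;
    for completion in &report.completions {
        println!("{}", completion.text);
    }
    Ok(())
}
```
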
Render one chat prompt from GGUF template metadata:

```sh
cargo run -p psionic-mlx-lm --bin psionic-mlx-lm -- render-chat \
  --model /path/to/model.gguf \
  --messages-json /tmp/messages.json
```
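
The `--messages-json` file holds the chat transcript to render. A minimal sketch of one plausible shape, assuming `PromptMessage` follows the conventional role/content layout; the exact schema is not confirmed here.

```rust
// Hypothetical sketch: writes a role/content transcript in the shape
// `render-chat` is assumed to accept; the exact schema is unverified.
use std::fs;

fn main() -> std::io::Result<()> {
    let messages = r#"[
  {"role": "system", "content": "You are a concise assistant."},
  {"role": "user", "content": "hello"}
]"#;
    fs::write("/tmp/messages.json", messages)
}
```
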
Persist one prompt-cache artifact from a generate or stream run:

```sh
cargo run -p psionic-mlx-lm --bin psionic-mlx-lm -- generate \
  --model /path/to/model.gguf \
  --prompt "hello" \
  --prompt-cache-artifact /tmp/prompt-cache.json
```

PMLX-701 is not about claiming the whole MLX ecosystem is done. It is the
first package layer above the native framework so local text-model workflows no
longer require callers to stitch together raw GGUF loading, prompt rendering,
generation, streaming, batch scheduling, and prefix-cache receipts by hand.