This document defines the bounded psionic-mlx-bench package that closes
PMLX-708.
psionic-mlx-bench is the MLX-facing benchmark package layer above
psionic-eval, psionic-mlx-lm, and psionic-mlx-vlm.
It owns:
- machine-readable benchmark suite manifests above
BenchmarkPackage - repeated benchmark receipts that reuse
psionic-eval - local text and served provider adapters for MLX-class lanes
- multimodal prompt projection through
psionic-mlx-vlm - bounded text and structured-output scoring contracts for local or private eval suites
It does not own:
- a second eval runtime
- benchmark registry authority
- benchmark transport or hosted execution authority
- a native multimodal encoder runtime
The package is a benchmark and adapter layer, not a second scoring stack.
It reuses:
psionic-evalfor benchmark packages, eval runs, verification facts, and repeat-run aggregationpsionic-mlx-lmfor local text request shapingpsionic-mlx-vlmfor multimodal prompt projection and served-request planning
That keeps MLX-facing benchmark work tied to the same receipts and aggregate truth as the rest of Psionic.
psionic-mlx-bench now exposes two bounded adapter families:
- a local text adapter that renders prompt messages into one deterministic text request before calling the caller-owned local generation lane
- a served adapter that emits machine-readable JSON requests for
/v1/responsesor/v1/chat/completionsso caller-owned harnesses can score local or private served lanes
Both adapters return the same provider-response contract, which carries output text or structured data, token accounting, timer facts, final-state digests, execution-strategy facts, and optional artifacts.
psionic-mlx-bench now exposes a fixture-driven CLI for the package layer:
psionic-mlx-bench build-suitepsionic-mlx-bench run-text-fixturepsionic-mlx-bench run-served-fixture
Those commands are intended for package examples, local or private harnesses, and migration guidance. They are not a promise of hosted benchmark authority.
Multimodal benchmark cases stay bounded to prompt projection.
psionic-mlx-bench does not claim a native image, audio, or video encoder.
Instead it reuses psionic-mlx-vlm to turn multimodal inputs into digest-bound
projection receipts plus either:
- one local text prompt for the local adapter
- one served JSON request for the served adapter
The package currently ships bounded expectation families for:
- exact text matches
- required-fragment text matches
- exact structured JSON matches
Those expectations produce normal psionic-eval sample metrics, artifacts,
verification facts, finalized eval runs, and repeated aggregate summaries.
That means benchmark receipts remain machine-legible and comparable with the rest of the repo instead of becoming notebook-only side effects.