This document defines the bounded psionic-mlx-audio package that closes
PMLX-705.
psionic-mlx-audio is the MLX-facing reusable audio package above the shared
Psionic runtime and artifact layers.
It owns:
- audio-family registry and quantized-checkpoint metadata
- WAV IO and bounded codec helpers
- text-to-speech and speech-to-speech request contracts
- streaming audio chunk contracts
- server-facing speech request/response shapes
- package-facing CLI entrypoints for synthesize, speech-to-speech, speech-request, and WAV inspection
It does not own:
- product voice UX
- a claim of human-quality TTS or speech translation
- a browser/mobile audio player shell
The current package closes the MLX audio ecosystem gap with one honest CPU-reference runtime.
That means:
- text-to-speech requests render deterministic waveform clips
- speech-to-speech requests apply a bounded reference transform to the input waveform
- codec mode normalizes clips and owns the WAV/container contract
- streaming output is surfaced as explicit chunk lists
- server-facing speech requests can be handled through the same reference lane
This is a contract and packaging closure, not a claim that Psionic already ships a production neural speech stack.
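As an illustration of what "deterministic waveform clips" and "explicit chunk lists" can mean in practice, here is a minimal Python sketch. It is not the package's actual reference transform: the function names, the 16 kHz sample rate, and the digest-to-pitch mapping are all assumptions made for this example.

```python
import hashlib
import math

SAMPLE_RATE = 16_000  # assumed sample rate, not taken from the package


def render_reference_clip(text: str, seconds: float = 0.25) -> list[int]:
    """Render a deterministic 16-bit PCM sine tone whose pitch depends on the text."""
    # Derive a stable frequency in the range 220..475 Hz from the text digest,
    # so the same text always yields the same samples (hypothetical mapping).
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    freq = 220.0 + digest[0]
    frame_count = int(SAMPLE_RATE * seconds)
    return [
        int(round(0.5 * 32767 * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)))
        for i in range(frame_count)
    ]


def chunk_clip(samples: list[int], chunk_size: int) -> list[list[int]]:
    """Split a clip into an explicit list of chunks; the last chunk may be shorter."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [samples[i:i + chunk_size] for i in range(0, len(samples), chunk_size)]
```

Because the chunks are a plain list, a caller can verify that concatenating them reproduces the full clip, which keeps the streaming contract easy to test.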
The builtin registry currently covers:
- kokoro for bounded text-to-speech
- xtts / xtts_v2 for bounded text-to-speech plus speech-to-speech
- encodec / codec for bounded codec helpers
Each family exposes explicit supported tasks, conditioning modes, and
quantized-checkpoint descriptors such as q4_k, q6_k, and q8_0.
The package keeps conditioning posture explicit:
- none
- voice_label
- reference_audio
If a family does not support a requested conditioning mode, the request must fail explicitly.
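The explicit-failure rule can be sketched as a registry lookup followed by a validation step. The Python below is a hypothetical illustration only: the type and function names, the task identifiers, the treatment of xtts_v2 and codec as aliases, and the conditioning modes assigned to each family are assumptions. The source states only the family names, their tasks, the three conditioning modes, and the quantization labels.

```python
from dataclasses import dataclass

QUANTIZATIONS = ("q4_k", "q6_k", "q8_0")  # descriptor labels named in this document


@dataclass(frozen=True)
class AudioFamily:
    name: str
    tasks: frozenset
    conditioning: frozenset
    quantizations: tuple = QUANTIZATIONS


class UnsupportedRequest(ValueError):
    """Raised instead of silently falling back to another mode."""


# Which conditioning modes each family accepts is assumed for illustration.
FAMILIES = {
    "kokoro": AudioFamily(
        "kokoro",
        frozenset({"text_to_speech"}),
        frozenset({"none", "voice_label"}),
    ),
    "xtts": AudioFamily(
        "xtts",
        frozenset({"text_to_speech", "speech_to_speech"}),
        frozenset({"none", "voice_label", "reference_audio"}),
    ),
    "encodec": AudioFamily("encodec", frozenset({"codec"}), frozenset({"none"})),
}
ALIASES = {"xtts_v2": "xtts", "codec": "encodec"}  # alias reading is an assumption


def resolve_family(name: str) -> AudioFamily:
    """Look up a family by name or alias, failing explicitly when unknown."""
    key = ALIASES.get(name, name)
    if key not in FAMILIES:
        raise UnsupportedRequest(f"unknown audio family: {name}")
    return FAMILIES[key]


def validate_request(family_name: str, task: str, conditioning: str = "none") -> AudioFamily:
    """Return the family if it supports the task and conditioning mode, else raise."""
    family = resolve_family(family_name)
    if task not in family.tasks:
        raise UnsupportedRequest(f"{family.name} does not support task {task}")
    if conditioning not in family.conditioning:
        raise UnsupportedRequest(
            f"{family.name} does not support conditioning mode {conditioning}"
        )
    return family
```

Raising a dedicated error type, rather than downgrading to a supported mode, keeps the conditioning posture visible to the caller.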
MlxAudioSpeechRequest and MlxAudioSpeechResponse define the server-facing
speech contract for this package.
The current reference lane can answer those requests directly and surface:
- output content type
- clip digest
- output clip metadata
- optional stream chunks
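To make the surfaced fields concrete, the following Python sketch assembles a response carrying a content type, a clip digest, clip metadata, and optional stream chunks. It does not reproduce the real MlxAudioSpeechResponse layout: the class names, field names, the use of SHA-256 for the digest, and the audio/wav content type are assumptions for illustration.

```python
import hashlib
import io
import struct
import wave
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClipMetadata:
    sample_rate_hz: int
    channels: int
    frame_count: int


@dataclass(frozen=True)
class SpeechResponseSketch:  # hypothetical stand-in, not the package type
    content_type: str
    clip_digest: str
    clip: ClipMetadata
    stream_chunks: Optional[list]  # None when streaming was not requested


def encode_wav(samples: list, sample_rate_hz: int = 16_000) -> bytes:
    """Encode mono 16-bit PCM samples into an in-memory WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate_hz)
        w.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()


def build_response(samples: list, stream: bool, chunk_frames: int = 4) -> SpeechResponseSketch:
    """Build a response; the digest covers the full WAV bytes (an assumed choice)."""
    wav_bytes = encode_wav(samples)
    chunks = None
    if stream:
        pcm = struct.pack(f"<{len(samples)}h", *samples)
        step = chunk_frames * 2  # two bytes per 16-bit mono frame
        chunks = [pcm[i:i + step] for i in range(0, len(pcm), step)]
    return SpeechResponseSketch(
        content_type="audio/wav",
        clip_digest=hashlib.sha256(wav_bytes).hexdigest(),
        clip=ClipMetadata(sample_rate_hz=16_000, channels=1, frame_count=len(samples)),
        stream_chunks=chunks,
    )
```

Keeping stream_chunks as None rather than an empty list distinguishes "streaming not requested" from "streaming requested but empty".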
The package CLI is:
- psionic-mlx-audio synthesize
- psionic-mlx-audio speech-to-speech
- psionic-mlx-audio inspect-wav
- psionic-mlx-audio speech-request
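The flags and output format of these subcommands are not described here. As a rough idea of the kind of information a WAV inspection step can report, here is a small Python sketch using the standard-library wave module; the function name and the returned keys are assumptions, not the CLI's actual output.

```python
import io
import wave


def inspect_wav(data: bytes) -> dict:
    """Read basic container facts from WAV bytes without decoding the audio."""
    with wave.open(io.BytesIO(data), "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return {
            "channels": w.getnchannels(),
            "sample_width_bytes": w.getsampwidth(),
            "sample_rate_hz": rate,
            "frame_count": frames,
            "duration_seconds": frames / rate,
        }
```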