NeuralCompose ships with no weights. Out of the box, the app uses:
- MockIntentClassifier: a deterministic intent generator driven by EEG energy.
- StubNextWordPredictor: a tiny built-in n-gram / unigram fallback.
Drop in real models when you want them.
Models/
  IntentClassifier.mlmodelc/   <- Apple's compiled Core ML bundle
If IntentClassifier.mlmodelc exists, ClassifierFactory.live() will load it
under the following configuration:
let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine // ANE preferred, no GPU
config.allowLowPrecisionAccumulationOnGPU = false

.cpuAndNeuralEngine is the right setting here: it deliberately keeps work
off the GPU so MLX has unobstructed GPU access. The runtime-configurable
ClassifierComputeMode enum can be flipped from the UI to .all (let Core ML
schedule freely) or .cpuOnly (debug fallback). It never offers
.neuralEngineOnly — that case does not exist in MLComputeUnits.
The default wrapper expects:
- Input: MLMultiArray<Float32> of shape [1, 4, 512] (i.e. [1, channels, samples]). The 512 matches a 2 s window at 256 Hz, which is what EEGWindowingConfig and AppContainer configure by default, and equals CoreMLIntentClassifier.expectedSamples out of the box. If you change EEGWindowingConfig.windowSeconds or sampleRate, also update expectedSamples: the two must agree, and the wrapper validates channel count at runtime.
- Output: MLMultiArray<Float32> of shape [1, classes], logits or probabilities for the intent labels in IntentClass.modelOutputOrder.
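
The relationship between window length, sample rate, and the sample dimension of the input can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical and do not exist in the repo; only the default values (4 channels, 2 s, 256 Hz) come from the text above.

```python
def expected_samples(window_seconds: float, sample_rate_hz: float) -> int:
    """Samples per channel in one window: duration times sampling rate."""
    samples = window_seconds * sample_rate_hz
    nearest = round(samples)
    # Reject configurations that do not yield a positive whole number of samples.
    if nearest <= 0 or abs(samples - nearest) > 1e-9:
        raise ValueError(
            f"{window_seconds} s at {sample_rate_hz} Hz is not a whole number of samples"
        )
    return nearest


def input_shape(channels: int, window_seconds: float, sample_rate_hz: float) -> tuple:
    """Shape in the [1, channels, samples] layout described above."""
    return (1, channels, expected_samples(window_seconds, sample_rate_hz))


# Defaults from the text: 4 channels, 2 s window, 256 Hz.
DEFAULT_SHAPE = input_shape(4, 2.0, 256.0)
```

If windowSeconds were changed to 1 s at the same rate, the same calculation gives 256 samples, which is the value expectedSamples would need to match.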
If your model uses different names, adjust the input/output keys in
CoreMLIntentClassifier.swift — that's the only place they appear.
ClassifierFactory looks for either:
- Models/IntentClassifier.mlmodelc/: Xcode-compiled bundle. Fastest first launch (no compile step at runtime). Requires full Xcode.app (not just Command Line Tools) since xcrun coremlcompiler ships only with Xcode:
  xcrun coremlcompiler compile path/to/IntentClassifier.mlpackage Models/
- Models/IntentClassifier.mlpackage: raw export from Scripts/train-intent-classifier.py (or any coremltools.convert() call). CoreMLIntentClassifier runs MLModel.compileModel(at:) on first load (~500 ms one-time), then loads the resulting .mlmodelc from the per-launch temp dir. No Xcode required.
If both exist, .mlmodelc wins.
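
The lookup order can be summarized in a small sketch. This is not the code in ClassifierFactory; resolve_classifier_bundle is a hypothetical helper that only mirrors the precedence described above (compiled bundle first, raw package second, mock otherwise).

```python
from pathlib import Path
from typing import Optional


def resolve_classifier_bundle(models_dir: Path) -> Optional[Path]:
    """Return the model path that would be preferred, or None if neither exists."""
    compiled = models_dir / "IntentClassifier.mlmodelc"
    package = models_dir / "IntentClassifier.mlpackage"
    if compiled.exists():
        # Precompiled bundle wins: no compile step is needed at load time.
        return compiled
    if package.exists():
        # Raw export: would be compiled on first load.
        return package
    # Nothing dropped in: the app stays on the mock classifier.
    return None
```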
The repo ships a training script that turns one or more calibration sessions
into a .mlpackage:
./venv/bin/python Scripts/train-intent-classifier.py
# or scope to specific sessions:
./venv/bin/python Scripts/train-intent-classifier.py \
  ~/Documents/NeuralCompose/Recordings/calibration_<ts>_muses/

The architecture is a 1-D CNN (~25K params, ANE-friendly). The output is
written to Models/IntentClassifier.mlpackage. See CALIBRATION.md
for how to collect the input data.
For the next-word predictor, pick an MLX model small enough that next-word latency stays under 200 ms on a recent Apple Silicon Mac (M1 → M4). Good starting points:
- Qwen2.5-0.5B-Instruct-4bit
- Qwen2.5-1.5B-Instruct-4bit
- SmolLM2-360M-Instruct-4bit
- SmolLM2-1.7B-Instruct-4bit
- gemma-2-2b-it-4bit (heavier; may add latency on M1)
- Phi-3.5-mini-instruct-4bit
The easiest path is to download an already-MLX-converted variant from the
Hugging Face mlx-community org and unpack it into Models/. With the
modern hf CLI from huggingface_hub:
hf download mlx-community/Qwen2.5-0.5B-Instruct-4bit \
  --local-dir Models/Qwen2.5-0.5B-Instruct-4bit

(The older huggingface-cli download command works on installations
predating huggingface_hub 0.27 but is deprecated.)
The expected layout under Models/<name>/:
config.json
tokenizer.json
tokenizer_config.json
*.safetensors
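
Before launching, it can help to confirm that a downloaded folder matches this layout. The sketch below is a standalone check, not part of the repo; missing_model_files is a hypothetical helper, and the file list is taken directly from the layout above.

```python
from pathlib import Path
from typing import List

# Files named explicitly in the expected layout.
REQUIRED_FILES = ("config.json", "tokenizer.json", "tokenizer_config.json")


def missing_model_files(model_dir: Path) -> List[str]:
    """List the layout entries that are absent from model_dir (empty if complete)."""
    missing = [name for name in REQUIRED_FILES if not (model_dir / name).is_file()]
    # At least one weight shard must be present; the exact shard names vary by model.
    if not any(model_dir.glob("*.safetensors")):
        missing.append("*.safetensors")
    return missing
```

An empty result means the folder has every entry listed above; it says nothing about whether the contents are valid.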
Either edit defaultMLXModelName in BCILLM/PredictorFactory.swift, or set
NEURALCOMPOSE_MLX_MODEL=<folder name> in the environment before launch.
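
The two selection mechanisms can be pictured as a simple override rule. The sketch below assumes the environment variable takes precedence over the built-in default and that an empty value counts as unset; neither detail is confirmed by the text above. The default name used here is a placeholder, not necessarily the value in PredictorFactory.swift.

```python
import os
from typing import Mapping, Optional

# Placeholder default for illustration; the real defaultMLXModelName may differ.
DEFAULT_MLX_MODEL_NAME = "Qwen2.5-0.5B-Instruct-4bit"


def resolve_mlx_model_name(
    env: Optional[Mapping[str, str]] = None,
    default: str = DEFAULT_MLX_MODEL_NAME,
) -> str:
    """Pick the model folder name: environment override first, then the default."""
    env = os.environ if env is None else env
    # Assumption: blank or whitespace-only values are treated as "not set".
    override = env.get("NEURALCOMPOSE_MLX_MODEL", "").strip()
    return override or default
```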
MLXNextWordPredictor.init is tolerant: if the folder is missing, the config
is malformed, or MLXLMCommon raises during weight load, the factory logs the
specific reason once and returns a StubNextWordPredictor in its place. The
app continues working in degraded mode and the privacy banner reflects it.
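
The fallback behavior follows a common "try the real thing, log once, degrade" pattern. The sketch below illustrates that pattern in a language-neutral way; it is not the Swift implementation, and StubPredictor, make_predictor, and the load callback are all hypothetical names.

```python
import logging
from pathlib import Path
from typing import Callable

log = logging.getLogger("predictor_factory_sketch")


class StubPredictor:
    """Stand-in for the built-in fallback predictor."""
    degraded = True


def make_predictor(model_dir: Path, load: Callable[[Path], object]) -> object:
    """Return load(model_dir) on success, otherwise a StubPredictor."""
    if not model_dir.is_dir():
        # Missing folder: report the specific reason and degrade.
        log.warning("MLX model folder missing: %s", model_dir)
        return StubPredictor()
    try:
        return load(model_dir)
    except Exception as exc:
        # Malformed config or a failed weight load ends up here.
        log.warning("MLX model load failed (%s); using stub", exc)
        return StubPredictor()
```

This pattern only covers errors that the host language can catch.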
MLX-swift's Metal kernels are compiled from .metal sources by SPM at
build time, which needs xcrun metal — that binary ships only with
the full Xcode.app (not with Command Line Tools). If you build under CLT
alone, the binary links but launching the predictor raises:
MLX error: Failed to load the default metallib. library not found ...
The C++ exception isn't catchable from Swift, so the whole process exits.
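
Because the failure is fatal at runtime, it can be worth checking for the Metal compiler before building. The sketch below wraps the xcrun -find metal lookup in a small helper; metal_compiler_path is a hypothetical name, and the check only shows whether the toolchain is visible, not whether an existing build already contains a metallib.

```python
import subprocess
from typing import Optional


def metal_compiler_path() -> Optional[str]:
    """Return the path reported by `xcrun -find metal`, or None if unavailable."""
    try:
        result = subprocess.run(
            ["xcrun", "-find", "metal"],
            capture_output=True,
            text=True,
            check=False,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        # xcrun itself is missing (non-macOS host) or did not respond in time.
        return None
    path = result.stdout.strip()
    # A usable toolchain exits with status 0 and prints a path.
    return path if result.returncode == 0 and path else None
```

A None result suggests the build would produce a binary without the default metallib.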
Workaround until Xcode is installed: do not place MLX weights in Models/
— PredictorFactory.live() will skip MLX init entirely and fall back to
the stub. Once Xcode is installed:
sudo xcode-select -s /Applications/Xcode.app/Contents/Developer
sudo xcodebuild -license accept
xcrun -find metal # should print a path
rm -rf .build && ./Scripts/build.sh --with-brainflow

Core ML (ANE) for classification + MLX (GPU) for generation is the sweet spot
on Apple Silicon: the two engines don't fight for the same compute. Routing
the classifier to .cpuAndNeuralEngine is the load-bearing choice that keeps
the GPU free for MLX, and keeps the per-window classifier under ~3 ms even on
M1. Don't move it to .all unless you've profiled and confirmed it does not
contend with the LLM forward pass.