BonsaiChat (iOS sample)

A minimal SwiftUI iPhone app that runs Bonsai-1.7B (1-bit, Q1_0) on-device via upstream llama.cpp with the Metal GPU backend. No forks, no custom kernels.

Real-device only. The bundled llama.xcframework contains just the ios-arm64 (device) slice, so this project builds and runs on a physical iPhone/iPad — not the iOS Simulator or macOS. (The simulator's Metal is incomplete anyway; a real device is what matters.) To re-add simulator/macOS slices, rebuild the framework — see the bottom of this file.

Why no fork is needed

Bonsai's Q1_0 1-bit format (originally Q1_0_g128) was merged into upstream llama.cpp in April 2026 (PR #21273, CPU+format; Metal in follow-up #21528). GGML_TYPE_Q1_0 = 41 ships upstream. The Hugging Face model card still says "use the PrismML fork" — that's out of date.

Layout

project.yml              XcodeGen config (generates BonsaiChat.xcodeproj)
Frameworks/
  llama.xcframework      Prebuilt llama.cpp engine, Metal, ios-arm64 device slice (~5 MB)
BonsaiChat/
  LlamaWrapper.swift     Thin Swift wrapper over the llama.cpp C API (the integration unit)
  ChatViewModel.swift    Model loading, streaming, bundled-model auto-load, benchmark
  ContentView.swift      SwiftUI UI
  BonsaiChatApp.swift    @main entry
scripts/
  build-llama-xcframework.sh   Rebuild llama.xcframework from upstream llama.cpp
models/                  (gitignored) put your .gguf here — too large for GitHub

The model .gguf is not in the repo (exceeds GitHub's 100 MB limit). Download it into models/ before building (see below).

Prerequisites

Xcode 16+ (this sample was built with Xcode 26), a physical iPhone/iPad, and an Apple Developer Team for signing.
XcodeGen: brew install xcodegen (the .xcodeproj is generated, not committed).
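A quick preflight check can catch a missing tool before the build fails. This is a minimal sketch: require_tool is a hypothetical helper, not part of this repo, and the tool names are the ones this README already uses.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): succeed only if the named
# command is available on PATH.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Example, run manually on the Mac you build from:
#   require_tool xcodegen && require_tool xcodebuild
```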

Get the model

mkdir -p models
# Official Bonsai 1.7B Q1_0 (~237 MB):
pip install -U "huggingface_hub[cli]"
hf download prism-ml/Bonsai-1.7B-gguf Bonsai-1.7B-Q1_0.gguf --local-dir models

project.yml bundles models/bonsai-1.7b-multitask-7tasks-Q1_0.gguf into the app, which is not the filename the download command above produces. Either place a .gguf at that path (copy or rename the download), or edit the buildPhase: resources line in project.yml to point at the file you downloaded. ChatViewModel.loadBundledModel() auto-loads whatever .gguf is bundled.
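One way to put a model at the bundled path is to copy the download there after a basic sanity check. The sketch below relies on GGUF files starting with the four ASCII bytes "GGUF". Both is_gguf and stage_model are hypothetical helpers, not scripts shipped in this repo.

```shell
#!/bin/sh
# Hypothetical helper: true if the file starts with the GGUF magic bytes.
# tr strips NUL bytes so the shell does not warn about binary content.
is_gguf() {
    [ -f "$1" ] && [ "$(head -c 4 "$1" | tr -d '\0')" = "GGUF" ]
}

# Hypothetical helper: copy a downloaded model to the path that project.yml
# bundles, refusing files that do not look like GGUF.
stage_model() {
    src=$1
    dst=$2
    if ! is_gguf "$src"; then
        echo "not a GGUF file: $src" >&2
        return 1
    fi
    mkdir -p "$(dirname "$dst")"
    cp "$src" "$dst"
}
```

With the download from above, the call would be stage_model models/Bonsai-1.7B-Q1_0.gguf models/bonsai-1.7b-multitask-7tasks-Q1_0.gguf. Editing project.yml instead avoids keeping two copies of the file.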

Build & run on your iPhone

xcodegen generate
open BonsaiChat.xcodeproj

In Xcode: scheme BonsaiChat-iOS → select your iPhone → target Signing & Capabilities → set your Team → ⌘R. The app auto-loads the bundled model and runs on Metal (status bar shows ⚡ Metal). Tap Benchmark for prefill/generate tok/s.

Integrating into your own app

Drag Frameworks/llama.xcframework into your target → Embed & Sign (ensure LD_RUNPATH_SEARCH_PATHS has @executable_path/Frameworks).
Add LlamaWrapper.swift. It uses import llama directly (no bridging header needed; the xcframework ships a module map).

Use it:

LlamaWrapper.bootstrap()
let llama = try LlamaWrapper(modelPath: path)
for try await e in llama.generate(prompt: "Hello") {
    if case .token(let t) = e { print(t, terminator: "") }
}

LlamaWrapper already handles Metal offload (n_gpu_layers = 99), the ChatML chat template, <think></think> suppression, EOS stop (<|im_end|>), and performance-core thread counts.

Reference performance (M4)

Native macOS CPU ~63 tok/s · macOS Metal ~217 tok/s. On-device numbers: use the in-app Benchmark. (The iOS Simulator runs CPU-only at ~9 tok/s and is not a valid benchmark.)
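If you time a run yourself rather than using the in-app Benchmark, throughput is the generated token count divided by elapsed seconds. A minimal sketch of that arithmetic follows; tok_per_sec is a hypothetical helper and the numbers in the comment are illustrative, not measurements.

```shell
#!/bin/sh
# Hypothetical helper: tokens per second from a token count and elapsed
# seconds. Fails (exit status 1) if the elapsed time is not positive.
tok_per_sec() {
    awk -v n="$1" -v s="$2" 'BEGIN {
        if (s <= 0) { exit 1 }
        printf "%.1f\n", n / s
    }'
}

# Illustrative only: 434 tokens in 2 seconds
#   tok_per_sec 434 2    # prints 217.0
```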

Rebuilding the engine (all slices)

The committed Frameworks/llama.xcframework is device-only. To rebuild from upstream:

git clone --depth 1 https://github.com/ggml-org/llama.cpp.git vendor/llama.cpp
bash scripts/build-llama-xcframework.sh     # → vendor/llama.cpp/build-apple/llama.xcframework (iOS device+sim, macOS)

That produces a multi-slice framework. To keep only the device slice (as shipped here):

xcrun xcodebuild -create-xcframework \
  -framework vendor/llama.cpp/build-apple/llama.xcframework/ios-arm64/llama.framework \
  -output Frameworks/llama.xcframework
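To confirm which slices a framework contains, you can list its top-level directories: an .xcframework is a directory holding an Info.plist plus one subdirectory per slice. The sketch below assumes that layout; list_slices is a hypothetical helper, not part of scripts/.

```shell
#!/bin/sh
# Hypothetical helper: print the slice identifiers inside an .xcframework,
# one per line (each slice is a subdirectory such as ios-arm64).
list_slices() {
    for d in "$1"/*/; do
        [ -d "$d" ] || continue   # skip the unexpanded glob if nothing matches
        basename "$d"
    done
}

# Example, run from the repo root:
#   list_slices Frameworks/llama.xcframework
```

For the device-only framework described here, the output should be the single line ios-arm64; a multi-slice build should list the simulator and macOS slices as well.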
