vieduy/bonsai-sample-app

BonsaiChat (iOS sample)

A minimal SwiftUI iPhone app that runs Bonsai-1.7B (1-bit, Q1_0) on-device via upstream llama.cpp with the Metal GPU backend. No forks, no custom kernels.

Real-device only. The bundled llama.xcframework contains just the ios-arm64 (device) slice, so this project builds and runs on a physical iPhone/iPad — not the iOS Simulator or macOS. (The simulator's Metal is incomplete anyway; a real device is what matters.) To re-add simulator/macOS slices, rebuild the framework — see the bottom of this file.

Why no fork is needed

Bonsai's Q1_0 1-bit format (originally Q1_0_g128) was merged into upstream llama.cpp in April 2026 (PR #21273, CPU+format; Metal in follow-up #21528). GGML_TYPE_Q1_0 = 41 ships upstream. The Hugging Face model card still says "use the PrismML fork" — that's out of date.

Layout

project.yml              XcodeGen config (generates BonsaiChat.xcodeproj)
Frameworks/
  llama.xcframework      Prebuilt llama.cpp engine, Metal, ios-arm64 device slice (~5 MB)
BonsaiChat/
  LlamaWrapper.swift     Thin Swift wrapper over the llama.cpp C API (the integration unit)
  ChatViewModel.swift    Model loading, streaming, bundled-model auto-load, benchmark
  ContentView.swift      SwiftUI UI
  BonsaiChatApp.swift    @main entry
scripts/
  build-llama-xcframework.sh   Rebuild llama.xcframework from upstream llama.cpp
models/                  (gitignored) put your .gguf here — too large for GitHub

The model .gguf is not in the repo (it exceeds GitHub's 100 MB per-file limit). Download it into models/ before building (see "Get the model" below).

Prerequisites

  • Xcode 16+ (built on Xcode 26), a physical iPhone/iPad, and an Apple Developer Team for signing.
  • XcodeGen: brew install xcodegen (the .xcodeproj is generated, not committed).
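
As a quick pre-flight check, the sketch below reports whether a command-line tool is on your PATH before you try to build. The require_tool helper name is hypothetical (it is not part of this repo); it only wraps the standard `command -v` builtin.

```shell
# Hypothetical helper (not part of this repo): succeed if a command is on PATH.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}
```

For example, `require_tool xcodegen && require_tool xcodebuild` stops at the first tool that is not installed.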

Get the model

mkdir -p models
# Official Bonsai 1.7B Q1_0 (~237 MB):
pip install -U "huggingface_hub[cli]"
hf download prism-ml/Bonsai-1.7B-gguf Bonsai-1.7B-Q1_0.gguf --local-dir models
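
A failed or partial download can leave an error page or a stub where the model should be. GGUF files begin with the four ASCII bytes `GGUF`, so a cheap sanity check is possible before bundling. This is only a sketch: the is_gguf helper name is hypothetical, and it checks the magic bytes only, not the file's integrity.

```shell
# Hypothetical helper (not part of this repo): check the 4-byte GGUF magic.
is_gguf() {
  # Fail early if the path is not a regular file.
  [ -f "$1" ] || return 1
  # Read the first four bytes; drop NULs so the shell does not warn on binary data.
  magic="$(head -c 4 "$1" | tr -d '\0')"
  [ "$magic" = "GGUF" ]
}
```

For example: `is_gguf models/Bonsai-1.7B-Q1_0.gguf && echo "looks like a GGUF file"`.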

project.yml bundles models/bonsai-1.7b-multitask-7tasks-Q1_0.gguf into the app. The download command above saves models/Bonsai-1.7B-Q1_0.gguf, so either copy or rename your .gguf to the bundled path, or edit the buildPhase: resources line in project.yml to point at the file you downloaded. ChatViewModel.loadBundledModel() auto-loads whatever .gguf is bundled.
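
If you would rather not edit project.yml, one option is to copy the downloaded file to the path it expects. The sketch below assumes it is run from the repo root; the stage_model helper name is hypothetical and not part of this repo.

```shell
# Hypothetical helper (not part of this repo): copy a downloaded .gguf
# to the path that project.yml bundles into the app.
stage_model() {
  src="$1"
  dest="models/bonsai-1.7b-multitask-7tasks-Q1_0.gguf"
  if [ ! -f "$src" ]; then
    echo "missing model: $src" >&2
    return 1
  fi
  mkdir -p "$(dirname "$dest")"
  cp "$src" "$dest"
  # Print the staged path so callers can confirm where the file landed.
  echo "$dest"
}
```

For example: `stage_model models/Bonsai-1.7B-Q1_0.gguf`. Copying keeps the original download in place at the cost of extra disk space; use `mv` instead if that matters.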

Build & run on your iPhone

xcodegen generate
open BonsaiChat.xcodeproj

In Xcode: scheme BonsaiChat-iOS → select your iPhone → target Signing & Capabilities → set your Team → ⌘R. The app auto-loads the bundled model and runs on Metal (status bar shows ⚡ Metal). Tap Benchmark for prefill/generate tok/s.

Integrating into your own app

  1. Drag Frameworks/llama.xcframework into your target → Embed & Sign (ensure LD_RUNPATH_SEARCH_PATHS has @executable_path/Frameworks).
  2. Add LlamaWrapper.swift. It uses a plain import llama (no bridging header needed; the xcframework ships a module map).
  3. Use it:
    LlamaWrapper.bootstrap()
    let llama = try LlamaWrapper(modelPath: path)
    for try await e in llama.generate(prompt: "Hello") {
        if case .token(let t) = e { print(t, terminator: "") }
    }

LlamaWrapper already handles Metal offload (n_gpu_layers = 99), the ChatML chat template plus <think></think> suppression, EOS stop (<|im_end|>), and performance-core thread counts.

Reference performance (M4)

Native macOS CPU ~63 tok/s · macOS Metal ~217 tok/s. On-device numbers: use the in-app Benchmark. (The iOS Simulator runs CPU-only at ~9 tok/s and is not a valid benchmark.)

Rebuilding the engine (all slices)

The committed Frameworks/llama.xcframework is device-only. To rebuild from upstream:

git clone --depth 1 https://github.com/ggml-org/llama.cpp.git vendor/llama.cpp
bash scripts/build-llama-xcframework.sh     # → vendor/llama.cpp/build-apple/llama.xcframework (iOS device+sim, macOS)

That produces a multi-slice framework. To keep only the device slice (as shipped here):

xcrun xcodebuild -create-xcframework \
  -framework vendor/llama.cpp/build-apple/llama.xcframework/ios-arm64/llama.framework \
  -output Frameworks/llama.xcframework
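
To confirm which slices a framework actually contains, you can list its top-level directories: each slice of an .xcframework lives in its own directory (such as ios-arm64) next to Info.plist. A minimal sketch follows; the list_slices helper name is hypothetical and not part of this repo.

```shell
# Hypothetical helper (not part of this repo): print one slice directory
# name per line for the given .xcframework.
list_slices() {
  fw="$1"
  for d in "$fw"/*/; do
    # Skip the unexpanded pattern when the framework has no subdirectories.
    [ -d "$d" ] || continue
    basename "$d"
  done
}
```

For the device-only framework shipped here, `list_slices Frameworks/llama.xcframework` should print just ios-arm64.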
