Glide

A personalized iOS keyboard that autocompletes in your voice — lowercase, slang, the sentence shape you actually use — instead of generic AI output. Built on a custom logit-bias layer over Gemma 4 E2B, served from a FastAPI cloud backend.

What makes this hard

iOS keyboard extensions are capped at roughly 48 MB of resident memory, and a 4-bit Gemma is about 3 GB. Running the model inside the extension is not feasible, so the keyboard is necessarily a thin client and inference lives in the cloud.
LoRA fine-tuning didn't produce usable quality. Tried r=64 full-corpus, r=64 DoRA, a tiny r=8, and r=8 on a scrubbed corpus. None of the runs beat the bias-layer approach in side-by-side evaluation, and the smaller adapters showed the usual fine-tune brittleness on out-of-distribution inputs. Replaced with the sample-time logit-bias layer below (full postmortem at experimental/docs/LORA_TRAINING_LOG.md).
Sub-500 ms suggestion latency on a 4B-param model required KV-cache reuse for the persona preamble, retrieval-augmented exemplars from the user's own message corpus, per-request mode dispatch, and a short-TTL dedup cache to absorb KeyboardKit's rapid re-fires.
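
As an illustration of the short-TTL dedup cache mentioned above, here is a minimal sketch. It is not the code in server/main.py: the class name, the (device_id, context) cache key, and the 0.75 s TTL are all assumptions made for the example. The clock is injectable so the expiry behaviour can be checked without sleeping.

```python
import time
from typing import Callable, Dict, Hashable, Optional, Tuple


class TTLDedupCache:
    """Keep recent results briefly so identical back-to-back requests are cheap."""

    def __init__(self, ttl_seconds: float = 0.75,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds  # hypothetical value, not taken from the repo
        self.clock = clock      # injectable so expiry can be tested
        self._entries: Dict[Hashable, Tuple[float, object]] = {}

    def get(self, key: Hashable) -> Optional[object]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at > self.ttl:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return value

    def put(self, key: Hashable, value: object) -> None:
        self._entries[key] = (self.clock(), value)


def suggest_with_dedup(cache: TTLDedupCache, device_id: str, context: str,
                       run_model: Callable[[str], object]) -> object:
    """Serve an identical recent request from the cache, else run the model once."""
    key = (device_id, context)  # assumed key shape
    cached = cache.get(key)
    if cached is not None:
        return cached
    result = run_model(context)
    cache.put(key, result)
    return result
```

A rapid re-fire with the same context inside the TTL window returns the stored result and never reaches the model. The sketch only evicts on read, so a long-running server would also need a size bound or a periodic sweep.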

The vector-bias layer (the technical core)

For every next token, Gemma produces a length-262,144 logits vector. The bias layer adds per-token bias terms of the same length before sampling, then masks every token whose raw logit falls more than margin below the top one:

max_logit = logits.max()
adjusted = (logits
            + strength * bias_vector              # unigram voice
            + ngram_alpha * bigram_bonus[prev])   # 2-gram context
adjusted[logits < (max_logit - margin)] = -1e30   # admissibility filter

Where bias_vector[t] = log(1 + count[t]) − log(1 + total/unique), clipped to ±MAX_BIAS, with a ~9k-token dampen set (numeric-led + pure-punctuation) zeroed out so digits and punctuation in the user's corpus don't contaminate ordinary suggestion contexts.
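
The bias_vector definition above can be sketched as a small function. This is a reading of the formula, not the repo's UserProfile code: it assumes total is the number of tokens in the user's corpus and unique is the number of distinct tokens seen, and it uses a plain list where the server would more likely use an array. The function name and the dampen-set argument are hypothetical, and max_bias is left as a required argument because the README does not state the value of MAX_BIAS.

```python
import math
from typing import Dict, List, Set


def build_bias_vector(counts: Dict[int, int], vocab_size: int,
                      dampen: Set[int], max_bias: float) -> List[float]:
    """bias[t] = log(1 + count[t]) - log(1 + total/unique), clipped, dampen set zeroed."""
    total = sum(counts.values())  # assumed: total tokens in the user's corpus
    unique = len(counts)          # assumed: distinct tokens seen
    baseline = math.log(1 + total / unique) if unique else 0.0

    bias = [0.0] * vocab_size
    for token_id in range(vocab_size):
        if token_id in dampen:
            continue  # numeric-led and pure-punctuation tokens stay at zero
        raw = math.log(1 + counts.get(token_id, 0)) - baseline
        bias[token_id] = max(-max_bias, min(max_bias, raw))
    return bias
```

Under this reading, tokens the user types more often than the average seen token get a positive bias, rarer or unseen tokens get a negative one, and the clip keeps any single token from dominating.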

The admissibility margin is the safety story: bias can reorder tokens the raw model already considered plausible, but it can't resurrect implausible ones. That's why the model never says "<NAME_A>" the way the LoRA adapters did: the base model scored <NAME_A> more than margin below its top logit, so the token is masked and no amount of bias can push it through.
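
To make the reorder-but-never-resurrect property concrete, here is a self-contained sketch of the adjustment step. It is written with plain lists so it runs without dependencies; the function name and the sparse dict-of-dicts layout for bigram_bonus are assumptions for the example, not the layout used in cotypist.py.

```python
from typing import Dict, List

NEG_INF = -1e30  # same sentinel as in the formula above


def apply_voice_bias(logits: List[float],
                     bias_vector: List[float],
                     bigram_bonus: Dict[int, Dict[int, float]],
                     prev_token: int,
                     strength: float = 0.4,
                     ngram_alpha: float = 0.4,
                     margin: float = 5.0) -> List[float]:
    """Add unigram and bigram bias, masking tokens the raw model found implausible."""
    max_logit = max(logits)
    bonus_row = bigram_bonus.get(prev_token, {})  # assumed sparse: prev -> {token: bonus}

    adjusted = []
    for token_id, raw in enumerate(logits):
        # The filter looks at the RAW logit, so bias cannot rescue a masked token.
        if raw < max_logit - margin:
            adjusted.append(NEG_INF)
            continue
        adjusted.append(raw
                        + strength * bias_vector[token_id]
                        + ngram_alpha * bonus_row.get(token_id, 0.0))
    return adjusted


# Token 2 carries a huge bias but sits 6.0 below the top raw logit, so it stays masked.
example = apply_voice_bias(
    logits=[10.0, 9.0, 4.0, 8.5],
    bias_vector=[0.0, 5.0, 50.0, 0.0],
    bigram_bonus={7: {3: 2.0}},
    prev_token=7,
)
```

In the example, token 1 overtakes token 0 because both were already within the margin, while token 2 is pinned at the sentinel regardless of its bias.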

Hyperparameters (T=0.5, top_p=0.92, top_k=64, min_p=0.05, repeat_penalty=1.15, max_tokens=10, margin=5.0, strength=0.4, ngram_alpha=0.4) came out of a 744-run sweep across realistic typing contexts. The harness and fixtures are at experimental/server/tools/voice_eval_v2/.
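
For reference, the swept values can be kept together in one place. The sketch below is an assumption about how one might organise them, not how the server does. The class names are hypothetical. The sampling field names are chosen to line up with keyword arguments accepted by llama-cpp-python's Llama.create_completion, which is worth verifying against the installed version. The bias-layer settings are kept separate because they belong to the custom logit step rather than the sampler.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class SamplingConfig:
    """Sampler settings from the sweep (T is spelled out as temperature)."""
    temperature: float = 0.5
    top_p: float = 0.92
    top_k: int = 64
    min_p: float = 0.05
    repeat_penalty: float = 1.15
    max_tokens: int = 10


@dataclass(frozen=True)
class BiasConfig:
    """Bias-layer settings from the same sweep."""
    margin: float = 5.0
    strength: float = 0.4
    ngram_alpha: float = 0.4


def completion_kwargs(cfg: SamplingConfig) -> Dict[str, Any]:
    """Flatten the sampler settings into keyword arguments for a completion call."""
    return asdict(cfg)
```

Freezing both dataclasses keeps a sweep harness from mutating a shared default between runs.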

System diagram

                                  App Group (UserDefaults suite)
                                  ────────────────────────────────
                                  device_id, server_url, snippets,
                                  blocklist, mode, retention_pref
                                            │
       ┌────────────────────────────────────┼────────────────────────────────────┐
       ▼                                    ▼                                    ▼
┌─────────────┐                  ┌──────────────────┐                  ┌──────────────────┐
│ Glide app   │                  │ GlideKeyboard    │                  │ ActionExtension  │
│ (SwiftUI    │                  │ (KeyboardKit     │                  │ (selectedText    │
│ settings,   │                  │ extension —      │                  │  capture for     │
│ onboarding, │                  │ ~48MB ceiling)   │                  │  Smart Reply)    │
│ blocklist,  │                  └────────┬─────────┘                  └──────────────────┘
│ retention)  │                           │
└─────────────┘                           │ HTTPS
                                          ▼
                          ┌────────────────────────────────────┐
                          │  FastAPI server (server/main.py)   │
                          │  ┌──────────────────────────────┐  │
                          │  │ /api/suggest  →  cotypist.py │  │
                          │  │   • Gemma 4 E2B Q4_K_M       │  │
                          │  │   • UserProfile (bias layer) │  │
                          │  │   • Retrieval exemplars      │  │
                          │  │   • Admissibility filter     │  │
                          │  ├──────────────────────────────┤  │
                          │  │ /api/autocorrect             │  │
                          │  │   • SymSpell + QWERTY-DL     │  │
                          │  │   • Bigram rescore           │  │
                          │  ├──────────────────────────────┤  │
                          │  │ /api/keystrokes  /api/flush  │  │
                          │  │ /api/accept  /api/smart_reply│  │
                          │  └──────────────────────────────┘  │
                          │   SQLite: keystrokes, messages,    │
                          │           device_preferences       │
                          └────────────────────────────────────┘

Tech stack

Layer	Stack
iOS keyboard	Swift 5.9, SwiftUI, KeyboardKit, App Groups, AVFoundation (dictation)
iOS main app	SwiftUI, App Intents (Back-Tap context capture), Action Extension (selected-text capture)
Server	FastAPI, llama-cpp-python, SQLite
Model	Gemma 4 E2B Q4_K_M (~3 GB, ~2.3B effective params via Per-Layer Embeddings)
Autocorrect	SymSpell (Damerau-Levenshtein) + QWERTY-weighted edit distances + bigram rescore
Build	xcodegen + Xcode 15 / iOS 16+ deployment target
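
To illustrate the QWERTY-weighted edit distance named in the Autocorrect row, here is a minimal sketch. It is not the code in server/autocorrect.py and it is not SymSpell itself: it is a restricted Damerau-Levenshtein distance in which substituting a neighbouring key costs less than substituting a distant one. The neighbour map, the 0.5 near-key cost, and the function names are assumptions, and only lowercase letters are handled.

```python
from typing import Dict, List, Set

QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def _build_neighbors() -> Dict[str, Set[str]]:
    """Map each key to the keys beside it and diagonally above/below (staggered rows)."""
    neighbors: Dict[str, Set[str]] = {ch: set() for row in QWERTY_ROWS for ch in row}
    for r, row in enumerate(QWERTY_ROWS):
        for c, ch in enumerate(row):
            for dr, dc in ((0, -1), (0, 1), (-1, 0), (-1, 1), (1, -1), (1, 0)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < len(QWERTY_ROWS) and 0 <= cc < len(QWERTY_ROWS[rr]):
                    neighbors[ch].add(QWERTY_ROWS[rr][cc])
    return neighbors


NEIGHBORS = _build_neighbors()


def qwerty_distance(a: str, b: str, near_cost: float = 0.5) -> float:
    """Restricted Damerau-Levenshtein distance with cheaper adjacent-key substitutions."""
    rows, cols = len(a) + 1, len(b) + 1
    d: List[List[float]] = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = float(i)
    for j in range(cols):
        d[0][j] = float(j)

    for i in range(1, rows):
        for j in range(1, cols):
            ca, cb = a[i - 1], b[j - 1]
            if ca == cb:
                sub = 0.0
            elif cb in NEIGHBORS.get(ca, ()):
                sub = near_cost  # likely a fat-finger slip
            else:
                sub = 1.0
            d[i][j] = min(d[i - 1][j] + 1.0,       # deletion
                          d[i][j - 1] + 1.0,       # insertion
                          d[i - 1][j - 1] + sub)   # substitution or match
            if i > 1 and j > 1 and ca == b[j - 2] and a[i - 2] == cb:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1.0)  # adjacent transposition
    return d[-1][-1]
```

With these weights, "hwllo" is closer to "hello" than "hpllo" is, because w sits next to e on the keyboard and p does not.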

Run it locally

# Server
python3.11 -m venv .venv && source .venv/bin/activate
pip install -r server/requirements.txt

export GLIDE_GEMMA_GGUF=/path/to/gemma-4-e2b-q4_k_m.gguf
uvicorn --app-dir server main:app --host 0.0.0.0 --port 8000

# Smoke test
curl -s -X POST localhost:8000/api/suggest \
  -H 'content-type: application/json' \
  -d '{"context":"hey wha","device_id":"local-dev"}' | jq .

# iOS
xcodegen                       # generates Glide.xcodeproj from project.yml
open Glide.xcodeproj           # build & run on a real device (App Group entitlements)

In the Glide app's Settings → Server, point at your Mac's local address (http://<your-mac>.local:8000) or your cloud URL. The keyboard reads the URL from the App Group, so no rebuild is needed when you switch.
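
For scripted checks, the curl smoke test above can be mirrored in Python using only the standard library. This is a sketch: the helper names are hypothetical, the request body copies the two fields used in the curl example, and the response is returned as parsed JSON without assuming any particular schema.

```python
import json
import urllib.request
from typing import Any


def build_suggest_request(base_url: str, context: str,
                          device_id: str) -> urllib.request.Request:
    """Build the same POST that the curl smoke test sends to /api/suggest."""
    payload = json.dumps({"context": context, "device_id": device_id}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/suggest",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_suggestions(base_url: str, context: str, device_id: str,
                      timeout: float = 2.0) -> Any:
    """Send the request and return whatever JSON the server replies with."""
    request = build_suggest_request(base_url, context, device_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running locally, fetch_suggestions("http://localhost:8000", "hey wha", "local-dev") would be the equivalent of the curl command.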

Repo layout

glide/
├── Glide/                  iOS main app (SwiftUI — settings, blocklist, retention toggle)
├── GlideKeyboard/          iOS keyboard extension (KeyboardKit, ~48MB ceiling)
├── GlidePolishExtension/   Action extension for selected-text → Smart Reply
├── server/
│   ├── main.py             FastAPI app, endpoints, retention cleanup
│   ├── cotypist.py         Logit-bias completer + retrieval exemplars + KV cache
│   ├── autocorrect.py      SymSpell + QWERTY-weighted DL + bigram rescore
│   ├── retrieval.py        FAISS-style embedding retrieval
│   ├── blocklist.py        Suggestion content moderation
│   └── importers/          iMessage chat.db → corpus importer
├── experimental/           LoRA training pipeline + voice_eval_v2 sweep + Qwen 14B
│                           rewrite tools. Parked, kept for the postmortem.
├── docs/
│   ├── AUTOCORRECT.md      Autocorrect design (SymSpell + QWERTY-DL + bigram)
│   ├── COTYPIST_LOG.md     Algorithm reference for the suggestion path
│   ├── DEPLOY.md           Fly.io deployment walkthrough
│   ├── PERFORMANCE.md      Suggestion-path latency reference
│   ├── SNIPPETS.md         Text-snippet feature design
│   └── EMOJI_SUPPORT.md    Emoji key + emoji-suggestion design notes
└── project.yml             xcodegen input — generates Glide.xcodeproj

What I tried that didn't work

LoRA fine-tuning, then DoRA, then a tiny LoRA, then a tiny LoRA on a scrubbed dataset. None of them produced output quality that beat the bias-layer approach in side-by-side comparisons, and the smaller adapters showed the usual fine-tune brittleness on out-of-distribution inputs. Full negative-result writeup at experimental/docs/LORA_TRAINING_LOG.md. The vector-bias layer described above replaced all of it.
