ishaan-sharma-7/glide

Glide

A personalized iOS keyboard that autocompletes in your voice — lowercase, slang, the sentence shape you actually use — instead of generic AI output. Built on a custom logit-bias layer over Gemma 4 E2B, served from a FastAPI cloud backend.

What makes this hard

  • iOS keyboard extensions are capped at ~48 MB resident, and a 4-bit Gemma is ~3 GB. The model cannot fit inside the extension, so the keyboard is necessarily a thin client and inference lives in the cloud.
  • LoRA fine-tuning didn't produce usable quality. Tried r=64 full-corpus, r=64 DoRA, a tiny r=8, and r=8 on a scrubbed corpus. None of the runs beat the bias-layer approach in side-by-side evaluation, and the smaller adapters showed the usual fine-tune brittleness on out-of-distribution inputs. Replaced with the sample-time logit-bias layer below (full postmortem at experimental/docs/LORA_TRAINING_LOG.md).
  • Sub-500 ms suggestion latency on a 4B-param model required KV-cache reuse for the persona preamble, retrieval-augmented exemplars from the user's own message corpus, per-request mode dispatch, and a short-TTL dedup cache to absorb KeyboardKit's rapid re-fires.
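The short-TTL dedup cache from the last bullet can be sketched in a few lines. This is a minimal illustration, not the code in server/: the class name `TTLDedupCache`, the helper `suggest_with_dedup`, and the 0.75 s window are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLDedupCache:
    """Remember recent (device_id, context) results for a short window."""

    def __init__(self, ttl_seconds: float = 0.75,  # assumed window, not from the repo
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: Dict[Tuple[str, str], Tuple[float, str]] = {}

    def get(self, device_id: str, context: str) -> Optional[str]:
        key = (device_id, context)
        hit = self._entries.get(key)
        if hit is None:
            return None
        stored_at, suggestion = hit
        if self.clock() - stored_at > self.ttl:
            # Entry is stale: drop it and report a miss.
            del self._entries[key]
            return None
        return suggestion

    def put(self, device_id: str, context: str, suggestion: str) -> None:
        self._entries[(device_id, context)] = (self.clock(), suggestion)


def suggest_with_dedup(cache: TTLDedupCache, device_id: str, context: str,
                       generate: Callable[[str], str]) -> str:
    """Run `generate` only if the same request was not answered moments ago."""
    cached = cache.get(device_id, context)
    if cached is not None:
        return cached
    suggestion = generate(context)
    cache.put(device_id, context, suggestion)
    return suggestion
```

Keying on (device_id, context) means an identical re-fire within the window never reaches the model, while any new keystroke changes the context and misses the cache as intended.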

The vector-bias layer (the technical core)

For every next token, Gemma produces a length-262,144 logits vector. The bias layer adds a second length-262,144 vector before sampling:

adjusted = logits
         + strength × bias_vector              # unigram voice
         + ngram_alpha × bigram_bonus[prev]    # 2-gram context
adjusted[logits < (max_logit - margin)] = -1e30   # admissibility filter

Where bias_vector[t] = log(1 + count[t]) − log(1 + total/unique), clipped to ±MAX_BIAS, with a ~9k-token dampen set (numeric-led + pure-punctuation) zeroed out so digits and punctuation in the user's corpus don't contaminate ordinary suggestion contexts.
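The bias formula can be sketched in plain Python. This is an illustration under stated assumptions, not the repo's `UserProfile` code: `build_bias_vector` is a hypothetical name, `MAX_BIAS = 4.0` is a placeholder (the actual clip value isn't given here), and `unique` is read as the number of distinct token ids seen in the corpus.

```python
import math
from typing import Dict, Iterable, List

MAX_BIAS = 4.0  # placeholder clip value; the real constant is not stated above


def build_bias_vector(counts: Dict[int, int], vocab_size: int,
                      dampen_ids: Iterable[int] = ()) -> List[float]:
    """bias[t] = log(1 + count[t]) - log(1 + total/unique), clipped, dampened."""
    total = sum(counts.values())
    unique = len(counts)  # assumption: distinct token ids observed
    baseline = math.log(1 + total / unique) if unique else 0.0
    dampen = set(dampen_ids)

    bias: List[float] = []
    for token_id in range(vocab_size):
        if token_id in dampen:
            # Numeric-led and pure-punctuation tokens contribute no bias.
            bias.append(0.0)
            continue
        raw = math.log(1 + counts.get(token_id, 0)) - baseline
        bias.append(max(-MAX_BIAS, min(MAX_BIAS, raw)))
    return bias
```

Under this reading, a token the user types about as often as their average token lands near zero, frequent tokens go positive, and unseen tokens are pushed down until the clip stops them.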

The admissibility margin is the safety story: bias can reorder tokens the raw model already considered plausible, but it can't resurrect implausible ones. That's why the model never says "<NAME_A>" the way the LoRA adapters did: the base model didn't score <NAME_A> within the margin of its top logit, so no amount of bias can push it through.
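A small worked sketch of that guarantee, using plain lists in place of the 262,144-wide vectors. The function name `adjust_logits` and its argument names are hypothetical; the default values mirror the README's strength, ngram_alpha, and margin.

```python
from typing import List, Sequence

NEG_INF = -1e30  # same sentinel the formula uses for inadmissible tokens


def adjust_logits(logits: Sequence[float],
                  bias_vector: Sequence[float],
                  bigram_bonus_row: Sequence[float],
                  strength: float = 0.4,
                  ngram_alpha: float = 0.4,
                  margin: float = 5.0) -> List[float]:
    """Add voice bias, then mask tokens the raw model scored too low."""
    # The cutoff is computed from the RAW logits, so bias cannot move it.
    cutoff = max(logits) - margin
    adjusted: List[float] = []
    for raw, unigram, bigram in zip(logits, bias_vector, bigram_bonus_row):
        if raw < cutoff:
            adjusted.append(NEG_INF)
        else:
            adjusted.append(raw + strength * unigram + ngram_alpha * bigram)
    return adjusted


# Token 1 is plausible and favoured by the user's voice, so it overtakes token 0.
# Token 2 has a huge bias but sits 8 logits below the max, so it stays masked.
example = adjust_logits(
    logits=[10.0, 9.0, 2.0],
    bias_vector=[0.0, 4.0, 100.0],
    bigram_bonus_row=[0.0, 0.0, 0.0],
)
```

Because the mask depends only on the raw logits, raising strength or ngram_alpha changes the ordering among admissible tokens but never the admissible set itself.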

Hyperparameters (T=0.5, top_p=0.92, top_k=64, min_p=0.05, repeat_penalty=1.15, max_tokens=10, margin=5.0, strength=0.4, ngram_alpha=0.4) came out of a 744-run sweep across realistic typing contexts. The harness and fixtures are at experimental/server/tools/voice_eval_v2/.
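One way to keep the swept values in a single place is a small frozen dataclass. This is a sketch rather than the repo's actual config: `SuggestConfig` and both helper methods are hypothetical, and the sampler field names are chosen to line up with the keyword arguments llama-cpp-python accepts for completions.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class SuggestConfig:
    # Standard sampler settings (values from the sweep above).
    temperature: float = 0.5
    top_p: float = 0.92
    top_k: int = 64
    min_p: float = 0.05
    repeat_penalty: float = 1.15
    max_tokens: int = 10
    # Bias-layer settings, consumed by the logit adjustment step.
    margin: float = 5.0
    strength: float = 0.4
    ngram_alpha: float = 0.4

    def sampler_kwargs(self) -> Dict[str, Any]:
        """Settings meant to be passed straight to the completion call."""
        keys = ("temperature", "top_p", "top_k", "min_p",
                "repeat_penalty", "max_tokens")
        values = asdict(self)
        return {key: values[key] for key in keys}

    def bias_kwargs(self) -> Dict[str, float]:
        """Settings meant for the bias layer only."""
        values = asdict(self)
        return {key: values[key] for key in ("margin", "strength", "ngram_alpha")}
```

Splitting the two groups keeps the bias-layer knobs out of the sampler call, which would otherwise reject unknown keyword arguments.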

System diagram

                                  App Group (UserDefaults suite)
                                  ────────────────────────────────
                                  device_id, server_url, snippets,
                                  blocklist, mode, retention_pref
                                            │
       ┌────────────────────────────────────┼────────────────────────────────────┐
       ▼                                    ▼                                    ▼
┌─────────────┐                  ┌──────────────────┐                  ┌──────────────────┐
│ Glide app   │                  │ GlideKeyboard    │                  │ ActionExtension  │
│ (SwiftUI    │                  │ (KeyboardKit     │                  │ (selectedText    │
│ settings,   │                  │ extension —      │                  │  capture for     │
│ onboarding, │                  │ ~48MB ceiling)   │                  │  Smart Reply)    │
│ blocklist,  │                  └────────┬─────────┘                  └──────────────────┘
│ retention)  │                           │
└─────────────┘                           │ HTTPS
                                          ▼
                          ┌────────────────────────────────────┐
                          │  FastAPI server (server/main.py)   │
                          │  ┌──────────────────────────────┐  │
                          │  │ /api/suggest  →  cotypist.py │  │
                          │  │   • Gemma 4 E2B Q4_K_M       │  │
                          │  │   • UserProfile (bias layer) │  │
                          │  │   • Retrieval exemplars      │  │
                          │  │   • Admissibility filter     │  │
                          │  ├──────────────────────────────┤  │
                          │  │ /api/autocorrect             │  │
                          │  │   • SymSpell + QWERTY-DL     │  │
                          │  │   • Bigram rescore           │  │
                          │  ├──────────────────────────────┤  │
                          │  │ /api/keystrokes  /api/flush  │  │
                          │  │ /api/accept  /api/smart_reply│  │
                          │  └──────────────────────────────┘  │
                          │   SQLite: keystrokes, messages,    │
                          │           device_preferences       │
                          └────────────────────────────────────┘

Tech stack

| Layer | Stack |
| --- | --- |
| iOS keyboard | Swift 5.9, SwiftUI, KeyboardKit, App Groups, AVFoundation (dictation) |
| iOS main app | SwiftUI, App Intents (Back-Tap context capture), Action Extension (selected-text capture) |
| Server | FastAPI, llama-cpp-python, SQLite |
| Model | Gemma 4 E2B Q4_K_M (~3 GB, ~2.3B effective params via Per-Layer Embeddings) |
| Autocorrect | SymSpell (Damerau-Levenshtein) + QWERTY-weighted edit distances + bigram rescore |
| Build | xcodegen + Xcode 15 / iOS 16+ deployment target |
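To make "QWERTY-weighted edit distances" concrete, here is a minimal sketch of a Damerau-Levenshtein variant (the optimal-string-alignment form) where substituting a neighbouring key costs less than substituting a distant one. It is not the code in server/autocorrect.py: the key geometry, the 0.5 adjacent-key cost, and the function names are assumptions for illustration, and SymSpell candidate generation is out of scope.

```python
from typing import Dict, Tuple

ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")
ROW_OFFSETS = (0.0, 0.25, 0.75)  # rough horizontal stagger of each row (assumed)

KEY_POS: Dict[str, Tuple[float, float]] = {
    ch: (col + ROW_OFFSETS[row], float(row))
    for row, letters in enumerate(ROWS)
    for col, ch in enumerate(letters)
}


def substitution_cost(a: str, b: str) -> float:
    """0 for a match, 0.5 for neighbouring keys, 1 otherwise (assumed weights)."""
    if a == b:
        return 0.0
    pa, pb = KEY_POS.get(a), KEY_POS.get(b)
    if pa is None or pb is None:
        return 1.0
    dist = ((pa[0] - pb[0]) ** 2 + (pa[1] - pb[1]) ** 2) ** 0.5
    return 0.5 if dist < 1.5 else 1.0


def qwerty_distance(s: str, t: str) -> float:
    """Optimal-string-alignment distance with keyboard-aware substitutions."""
    rows, cols = len(s) + 1, len(t) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = float(i)
    for j in range(cols):
        d[0][j] = float(j)
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(
                d[i - 1][j] + 1.0,                                      # deletion
                d[i][j - 1] + 1.0,                                      # insertion
                d[i - 1][j - 1] + substitution_cost(s[i - 1], t[j - 1]),
            )
            # Adjacent transposition, e.g. "teh" -> "the".
            if i > 1 and j > 1 and s[i - 1] == t[j - 2] and s[i - 2] == t[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1.0)
    return d[-1][-1]
```

With these weights "hwllo" is closer to "hello" than "hpllo" is, because w sits next to e while p does not. A bigram rescore would then choose among the surviving candidates.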

Run it locally

# Server
python3.11 -m venv .venv && source .venv/bin/activate
pip install -r server/requirements.txt

export GLIDE_GEMMA_GGUF=/path/to/gemma-4-e2b-q4_k_m.gguf
uvicorn --app-dir server main:app --host 0.0.0.0 --port 8000

# Smoke test
curl -s -X POST localhost:8000/api/suggest \
  -H 'content-type: application/json' \
  -d '{"context":"hey wha","device_id":"local-dev"}' | jq .

# iOS
xcodegen                       # generates Glide.xcodeproj from project.yml
open Glide.xcodeproj           # build & run on a real device (App Group entitlements)

In the Glide app's Settings → Server, point at your Mac's local hostname or IP (e.g. http://<your-mac>.local:8000) or your cloud URL. The keyboard reads the URL from the App Group, so no rebuild is needed when you switch.

Repo layout

glide/
├── Glide/                  iOS main app (SwiftUI — settings, blocklist, retention toggle)
├── GlideKeyboard/          iOS keyboard extension (KeyboardKit, ~48MB ceiling)
├── GlidePolishExtension/   Action extension for selected-text → Smart Reply
├── server/
│   ├── main.py             FastAPI app, endpoints, retention cleanup
│   ├── cotypist.py         Logit-bias completer + retrieval exemplars + KV cache
│   ├── autocorrect.py      SymSpell + QWERTY-weighted DL + bigram rescore
│   ├── retrieval.py        FAISS-style embedding retrieval
│   ├── blocklist.py        Suggestion content moderation
│   └── importers/          iMessage chat.db → corpus importer
├── experimental/           LoRA training pipeline + voice_eval_v2 sweep + Qwen 14B
│                           rewrite tools. Parked, kept for the postmortem.
├── docs/
│   ├── AUTOCORRECT.md      Autocorrect design (SymSpell + QWERTY-DL + bigram)
│   ├── COTYPIST_LOG.md     Algorithm reference for the suggestion path
│   ├── DEPLOY.md           Fly.io deployment walkthrough
│   ├── PERFORMANCE.md      Suggestion-path latency reference
│   ├── SNIPPETS.md         Text-snippet feature design
│   └── EMOJI_SUPPORT.md    Emoji key + emoji-suggestion design notes
└── project.yml             xcodegen input — generates Glide.xcodeproj

What I tried that didn't work

LoRA fine-tuning, then DoRA, then a tiny LoRA, then a tiny LoRA on a scrubbed dataset. None of them produced output quality that beat the bias-layer approach in side-by-side comparisons, and the smaller adapters showed the usual fine-tune brittleness on out-of-distribution inputs. Full negative-result writeup at experimental/docs/LORA_TRAINING_LOG.md. The vector-bias layer described above replaced all of it.

About

Mobile Co-typist, coming soon...
