Vibe code without the internet.
A local coding agent powered by Google’s Gemma 4.
macOS uses Apple MLX. Windows and Linux use Ollama. No API keys. No cloud.
What if you could vibe code from an airplane? Or a cabin with no signal? Or just without sending your code to someone else’s server?
Gemma Chat Windows is an open-source Electron app that runs Gemma 4 locally. You describe what you want to build, and it writes the code: HTML, CSS, JavaScript, and multi-file projects with a live preview that updates as the model types.
- Describe what you want to build: "A retro calculator app" or "A landing page for a coffee shop"
- Watch it code: Gemma writes files character-by-character with a live preview
- Iterate: Ask for changes, and it edits the files in place
All model inference happens locally.
- macOS on Apple Silicon runs Gemma through MLX-LM in a managed Python virtual environment.
- Windows 11 and Linux run Gemma through Ollama, using the native local HTTP API on 127.0.0.1:11434.
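The Ollama path uses ordinary HTTP on that loopback port. The following is an illustrative sketch, not the app's runtime code. It streams a reply from Ollama's `/api/chat` endpoint, which returns one JSON object per line. `streamChat` and `splitNdjson` are hypothetical names, and Node 18+ with a global `fetch` is assumed.

```typescript
// Sketch: stream a chat reply from a local Ollama server (Node 18+, global fetch).
const OLLAMA_URL = "http://127.0.0.1:11434";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Ollama streams newline-delimited JSON. Parse every complete line in the
// buffer and hand back the trailing partial line so the caller can keep it.
function splitNdjson(buffer: string): { events: any[]; rest: string } {
  const lines = buffer.split("\n");
  const rest = lines.pop() ?? "";
  const events = lines.filter((l) => l.trim() !== "").map((l) => JSON.parse(l));
  return { events, rest };
}

async function streamChat(
  model: string,
  messages: ChatMessage[],
  onToken: (text: string) => void,
): Promise<void> {
  const res = await fetch(`${OLLAMA_URL}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, messages, stream: true }),
  });
  if (!res.ok || !res.body) throw new Error(`Ollama returned HTTP ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const { events, rest } = splitNdjson(buffer);
    buffer = rest;
    // Each event carries a slice of the reply in message.content.
    for (const ev of events) if (ev.message?.content) onToken(ev.message.content);
  }
  // A final line may arrive without a trailing newline.
  if (buffer.trim() !== "") {
    const ev = JSON.parse(buffer);
    if (ev.message?.content) onToken(ev.message.content);
  }
}
```

For example, `streamChat("gemma4:e4b", [{ role: "user", content: "A retro calculator app" }], (t) => console.log(t))` would log tokens as they arrive, assuming the model has already been pulled.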
- Build Mode: Coding agent with a live preview canvas. Writes multi-file projects into a sandboxed workspace.
- Chat Mode: Conversational AI with tool use (web search, URL fetch, calculator, shell).
- Model Switching: Switch between Gemma 4 variants on the fly.
- Voice Input: Local speech-to-text via in-browser Whisper.
- Works Offline: After the one-time model download, chat and build flows stay local.
- Cross-Platform Runtime: MLX on macOS Apple Silicon, Ollama on Windows and Linux.
| Model | MLX Size | Ollama Tag | Ollama Size | Notes |
|---|---|---|---|---|
| Gemma 4 E2B | ~1.5 GB | gemma4:e2b | 7.2 GB | Fastest |
| Gemma 4 E4B | ~3 GB | gemma4:e4b (gemma4:latest) | 9.6 GB | Default model |
| Gemma 4 26B MoE (4B active) | ~8 GB | gemma4:26b | 18 GB | Larger MoE model |
| Gemma 4 31B | ~18 GB | gemma4:31b | 20 GB | Maximum local quality |
Requirements: macOS on Apple Silicon, Python 3.10-3.13, Node 20+.
git clone https://github.com/PrimeEcto/gemma-chat-windows.git
cd gemma-chat-windows
npm install
npm run dev
First launch will auto-detect Python, create a venv, install MLX-LM, download the selected model, and then open the app.
Tip: If you do not already have Python, Homebrew works well:
brew install python@3.13
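The first-launch Python detection could look roughly like the sketch below. It probes a few interpreter names (an assumed list), accepts only the 3.10-3.13 range stated above, and returns the venv and `pip install mlx-lm` steps as argument arrays. `pickPython`, `isSupportedPython`, and `setupSteps` are hypothetical helpers. The probe function is injected so the logic can be exercised without spawning processes.

```typescript
// Hypothetical sketch of first-launch Python detection on macOS.
// Requirement from the README: Python 3.10-3.13.

// `python3 --version` prints something like "Python 3.12.4".
function parsePythonVersion(output: string): [number, number] | null {
  const m = /Python\s+(\d+)\.(\d+)/.exec(output);
  return m ? [Number(m[1]), Number(m[2])] : null;
}

function isSupportedPython(output: string): boolean {
  const v = parsePythonVersion(output);
  return v !== null && v[0] === 3 && v[1] >= 10 && v[1] <= 13;
}

// Interpreter names to try, newest first (an assumption, not the app's list).
const CANDIDATES = ["python3.13", "python3.12", "python3.11", "python3.10", "python3"];

// `probe` should run `<exe> --version` and return its output, or null if the
// executable is missing. Injected so this stays testable.
function pickPython(probe: (exe: string) => string | null): string | null {
  for (const exe of CANDIDATES) {
    const out = probe(exe);
    if (out !== null && isSupportedPython(out)) return exe;
  }
  return null;
}

// Commands to build the managed venv and install MLX-LM into it (as argv arrays).
function setupSteps(python: string, venvDir: string): string[][] {
  return [
    [python, "-m", "venv", venvDir],
    [`${venvDir}/bin/python`, "-m", "pip", "install", "mlx-lm"],
  ];
}
```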
Requirements: Windows 11, Node 20+, Ollama for Windows.
git clone https://github.com/PrimeEcto/gemma-chat-windows.git
cd gemma-chat-windows
npm install
npm run dev
On first launch, Gemma Chat Windows checks for Ollama. If Ollama is missing, the app opens a setup screen with:
- Download Ollama: opens the official Windows installer page
- Re-check: verifies the install and retries the local runtime startup
If Ollama is installed but not already running, the app attempts to start ollama.exe serve automatically and then pulls the selected model. The default Windows model is gemma4:e4b.
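A rough sketch of that startup sequence follows. It assumes Ollama's documented HTTP endpoints: `/api/version` to check liveness, and `/api/pull` to download a model, which streams JSON status lines. `isOllamaUp`, `startOllama`, `pullPercent`, and `pullModel` are hypothetical names. The 15-second timeout and 500 ms poll interval are arbitrary choices, not values taken from the app.

```typescript
import { spawn } from "node:child_process";

const OLLAMA = "http://127.0.0.1:11434";

// True if an Ollama server answers on its default loopback port.
async function isOllamaUp(): Promise<boolean> {
  try {
    const res = await fetch(`${OLLAMA}/api/version`);
    return res.ok;
  } catch {
    return false; // connection refused: not running
  }
}

// Launch `ollama serve` in the background, then poll until it answers.
async function startOllama(exe: string, timeoutMs = 15_000): Promise<boolean> {
  const child = spawn(exe, ["serve"], { detached: true, stdio: "ignore", windowsHide: true });
  child.on("error", () => {}); // e.g. ENOENT; the poll below will simply time out
  child.unref(); // don't tie the app's lifetime to the server process
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await isOllamaUp()) return true;
    await new Promise((resolve) => setTimeout(resolve, 500));
  }
  return false;
}

// One status object from the /api/pull stream. Download steps include byte
// counts; steps such as "pulling manifest" or "success" do not.
interface PullStatus {
  status: string;
  digest?: string;
  total?: number;
  completed?: number;
}

// Percentage for download steps, null for steps without byte counts.
function pullPercent(s: PullStatus): number | null {
  if (!s.total || s.completed === undefined) return null;
  return Math.round((s.completed / s.total) * 100);
}

async function pullModel(model: string, onStatus: (s: PullStatus) => void): Promise<void> {
  const res = await fetch(`${OLLAMA}/api/pull`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, stream: true }),
  });
  if (!res.ok || !res.body) throw new Error(`pull failed: HTTP ${res.status}`);
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buf = "";
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    buf += decoder.decode(value, { stream: true });
    let nl: number;
    // Emit every complete JSON line; keep any partial line in `buf`.
    while ((nl = buf.indexOf("\n")) >= 0) {
      const line = buf.slice(0, nl).trim();
      buf = buf.slice(nl + 1);
      if (line) onStatus(JSON.parse(line));
    }
  }
}
```

A caller might check `isOllamaUp()` first, fall back to `startOllama(exe)` if it returns false, and then run `pullModel("gemma4:e4b", (s) => console.log(s.status, pullPercent(s)))`.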
Note: Windows Defender Firewall may prompt on first launch because the workspace preview server binds to 127.0.0.1. Allowing Private networks only is sufficient.
Click the gear button in the chat header to open the Performance drawer. On Windows it shows:
- your detected GPU and effective free VRAM
- the active model tag and quant
- last-message tokens/sec and first-token latency
- a GPU residency proxy from Ollama's running-model telemetry
- Ollama tuning controls for flash attention and KV cache type
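The speed and residency figures can be derived from data Ollama already returns. The sketch below assumes the drawer works roughly this way, which the README does not state. Generation speed comes from `eval_count` and `eval_duration` (nanoseconds) in the final streamed chunk. First-token latency is timed on the client. Residency is `size_vram / size` for an entry from `/api/ps`. All function names are hypothetical.

```typescript
// Fields from the final (done: true) chunk of an Ollama /api/chat or
// /api/generate stream. Ollama reports durations in nanoseconds.
interface FinalChunkStats {
  eval_count: number;    // tokens generated
  eval_duration: number; // ns spent generating them
}

function tokensPerSecond(s: FinalChunkStats): number {
  return s.eval_duration > 0 ? s.eval_count / (s.eval_duration / 1e9) : 0;
}

// Client-side timing (ms): request sent -> first non-empty content chunk.
function firstTokenLatencyMs(sentAt: number, firstTokenAt: number): number {
  return Math.max(0, firstTokenAt - sentAt);
}

// One entry of the `models` array returned by GET /api/ps.
interface RunningModel {
  name: string;
  size: number;      // bytes used by the loaded model
  size_vram: number; // portion of those bytes resident in GPU memory
}

// 1.0 means fully on the GPU; lower values mean layers spilled to system RAM.
function gpuResidency(m: RunningModel): number {
  return m.size > 0 ? m.size_vram / m.size : 0;
}

async function runningModels(base = "http://127.0.0.1:11434"): Promise<RunningModel[]> {
  const res = await fetch(`${base}/api/ps`);
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const body = (await res.json()) as { models?: RunningModel[] };
  return body.models ?? [];
}
```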
Fresh Windows setups also use GPU-aware model recommendations:
- under 6 GB free VRAM: gemma4:e2b
- 6-11 GB free VRAM: gemma4:e4b
- 11-22 GB free VRAM: gemma4:26b
- 22+ GB free VRAM: gemma4:31b
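These VRAM tiers amount to a simple threshold lookup. In this sketch `recommendModel` is a hypothetical name. The ranges share their edge values, so treating each tier's lower bound as inclusive (exactly 11 GB selects gemma4:26b) is an assumption.

```typescript
// Hypothetical mapping from free VRAM (GB) to the README's recommended tags.
type GemmaTag = "gemma4:e2b" | "gemma4:e4b" | "gemma4:26b" | "gemma4:31b";

// Checked from the largest tier down; lower bounds are inclusive (assumption).
const TIERS: { minFreeGb: number; tag: GemmaTag }[] = [
  { minFreeGb: 22, tag: "gemma4:31b" },
  { minFreeGb: 11, tag: "gemma4:26b" },
  { minFreeGb: 6, tag: "gemma4:e4b" },
  { minFreeGb: 0, tag: "gemma4:e2b" },
];

function recommendModel(freeVramGb: number): GemmaTag {
  for (const tier of TIERS) {
    if (freeVramGb >= tier.minFreeGb) return tier.tag;
  }
  return "gemma4:e2b"; // negative or NaN input: fall back to the smallest model
}
```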
If Ollama is already running outside the app, the Performance drawer shows an advisory with copyable setx commands so you can apply the same flash-attention and KV-cache defaults to the external service.
It also surfaces optional Windows OS advisories for Defender exclusions, High Performance power mode, and hardware-accelerated GPU scheduling (HAGS) when those checks indicate they may help.
Reasoning mode also lives in the chat header and is off by default for faster local responses.
Requirements: Node 20+, Ollama, and enough RAM for the model you choose.
git clone https://github.com/PrimeEcto/gemma-chat-windows.git
cd gemma-chat-windows
npm install
npm run dev
If Windows Defender Firewall prompts the first time the app starts, choose Private networks only. The preview server binds to 127.0.0.1 and does not need public exposure.
Gemma Chat Windows checks both PATH and the default installer location under %LOCALAPPDATA%\Programs\Ollama. If you installed Ollama somewhere custom, add it to PATH or reinstall with the default installer.
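That lookup order (PATH first, then the default per-user install directory) might be sketched as follows. `findOllamaExe` is hypothetical. The environment and a file-existence check are passed in so the logic can be tested on any OS. In an Electron main process one would presumably call it with `process.env` and `fs.existsSync`.

```typescript
// Hypothetical sketch: locate ollama.exe by searching PATH, then the default
// per-user install directory %LOCALAPPDATA%\Programs\Ollama.
function findOllamaExe(
  env: Record<string, string | undefined>,
  exists: (file: string) => boolean,
): string | null {
  const candidates: string[] = [];
  // A plain object is case-sensitive, unlike Windows env names, so check both spellings.
  const pathVar = env.PATH ?? env.Path ?? "";
  for (const raw of pathVar.split(";")) {
    const dir = raw.trim().replace(/\\+$/, ""); // drop trailing backslashes
    if (dir) candidates.push(`${dir}\\ollama.exe`);
  }
  if (env.LOCALAPPDATA) {
    candidates.push(`${env.LOCALAPPDATA}\\Programs\\Ollama\\ollama.exe`);
  }
  return candidates.find(exists) ?? null;
}
```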
Without a usable GPU, Ollama falls back to CPU, so larger models will be slower. If you expected GPU acceleration, first verify outside the app that your native Ollama install can see your CUDA or ROCm stack.
Use the gear button in the top-right header to open the Performance drawer. If the app started Ollama itself, the default Windows tuning is flash attention plus q8_0 KV cache. If Ollama was already running as a separate service, the drawer explains how to set the matching environment variables yourself and lets you copy the commands.
The same drawer also shows optional one-click advisories for Defender exclusions and High Performance mode, plus a link to the HAGS settings page when Windows reports GPU scheduling is off.
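`OLLAMA_FLASH_ATTENTION` and `OLLAMA_KV_CACHE_TYPE` are the environment variables Ollama documents for these two settings. A minimal sketch of applying the defaults named above could look like this. It covers two cases: an app-spawned `ollama serve`, and copyable `setx` lines for an external service. `serveEnv` and `setxCommands` are hypothetical helpers, not the app's code.

```typescript
// Defaults the README says the app uses when it starts Ollama itself.
const TUNING: Record<string, string> = {
  OLLAMA_FLASH_ATTENTION: "1",
  OLLAMA_KV_CACHE_TYPE: "q8_0",
};

// Environment for an app-managed `ollama serve` child: inherit, then override.
function serveEnv(base: Record<string, string | undefined>): Record<string, string | undefined> {
  return { ...base, ...TUNING };
}

// Copyable commands for an Ollama instance running outside the app.
// setx persists user variables for new processes only, so restart Ollama afterwards.
function setxCommands(): string[] {
  return Object.entries(TUNING).map(([name, value]) => `setx ${name} ${value}`);
}
```

Ollama's FAQ notes that a quantized KV cache only takes effect with flash attention enabled, which is why the two settings travel together. A spawned server would receive them via something like `spawn(exe, ["serve"], { env: serveEnv(process.env) })`.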
npm run dist
This builds for the current host OS only.
Platform-specific builds:
npm run dist:mac
npm run dist:win
npm run dist:linux
Outputs land in dist/.
| Layer | Tech |
|---|---|
| App Shell | Electron + Vite + React 19 + TypeScript + Tailwind |
| Model Runtime (macOS) | MLX-LM in a managed venv |
| Model Runtime (Windows/Linux) | Ollama |
| Speech-to-Text | transformers.js (Whisper, runs in-browser via WASM) |
| Workspace | Per-conversation sandboxed filesystem + local HTTP server |
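As an illustration of the Workspace row, the sketch below serves one conversation's folder over HTTP, bound to 127.0.0.1 only, and rejects request paths that would resolve outside the workspace root. `resolveInWorkspace`, `startPreviewServer`, and the small MIME table are hypothetical. The app's actual `workspace.ts` may differ.

```typescript
import * as fs from "node:fs";
import * as http from "node:http";
import * as path from "node:path";

const MIME: Record<string, string> = {
  ".html": "text/html; charset=utf-8",
  ".css": "text/css",
  ".js": "text/javascript",
  ".json": "application/json",
  ".svg": "image/svg+xml",
  ".png": "image/png",
};

// Map a request URL to a file under `root`, or null if it would escape the
// workspace (e.g. "/../secret" or its percent-encoded form).
function resolveInWorkspace(root: string, url: string): string | null {
  let rel: string;
  try {
    rel = decodeURIComponent(url.split("?")[0]);
  } catch {
    return null; // malformed percent-encoding
  }
  if (rel.endsWith("/")) rel += "index.html";
  const base = path.resolve(root);
  const target = path.resolve(base, "." + rel);
  return target.startsWith(base + path.sep) ? target : null;
}

// Serve the workspace on loopback only; port 0 lets the OS pick a free port.
function startPreviewServer(root: string): Promise<number> {
  const server = http.createServer((req, res) => {
    const file = resolveInWorkspace(root, req.url ?? "/");
    if (!file) {
      res.writeHead(403).end();
      return;
    }
    fs.readFile(file, (err, data) => {
      if (err) {
        res.writeHead(404).end();
        return;
      }
      const type = MIME[path.extname(file)] ?? "application/octet-stream";
      res.writeHead(200, { "Content-Type": type }).end(data);
    });
  });
  return new Promise((resolve) => {
    server.listen(0, "127.0.0.1", () => {
      const addr = server.address();
      resolve(typeof addr === "object" && addr !== null ? addr.port : 0);
    });
  });
}
```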
src/
├── main/ Electron main process
│ ├── index.ts Window + IPC + agent loop
│ ├── runtime/ Local runtime abstraction + MLX/Ollama implementations
│ ├── workspace.ts Per-conversation workspace + static file server
│ ├── shell.ts Cross-platform shell executor for the tool layer
│ ├── tools.ts Tool definitions + system prompts + XML action parser
│ └── settings.ts Persistent runtime and shell preferences
├── preload/ contextBridge API surface
├── renderer/src/
│ ├── components/
│ │ ├── Setup.tsx First-run onboarding + runtime download / pull flow
│ │ ├── Chat.tsx Main layout + runtime-aware model switcher
│ │ ├── Canvas.tsx Preview / Code / Files tabs (Build mode)
│ │ ├── Message.tsx Chat bubbles + tool cards + activity bar
│ │ ├── Composer.tsx Input + mic button
│ │ └── Sidebar.tsx Conversation list
│ └── lib/whisper.ts Browser Whisper pipeline
└── shared/types.ts IPC types + model registry + settings types
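The `runtime/` directory is described as one abstraction with MLX and Ollama implementations. A hypothetical shape for that split is sketched below. The interface and method names are illustrative only, and the platform strings follow Node's `process.platform` and `process.arch` values.

```typescript
// Hypothetical shape for src/main/runtime/: one interface, two backends.
// Method names are illustrative, not the app's actual API.
type RuntimeKind = "mlx" | "ollama";

interface LocalRuntime {
  readonly kind: RuntimeKind;
  // Install/start whatever is needed and make sure `model` is available.
  ensureReady(model: string, onProgress: (message: string) => void): Promise<void>;
  // Stream a reply token by token.
  chat(model: string, prompt: string, onToken: (token: string) => void): Promise<void>;
  stop(): Promise<void>;
}

// MLX on Apple Silicon macOS, Ollama on Windows and Linux (per the README).
// Arguments use Node's process.platform / process.arch values.
function runtimeKindFor(platform: string, arch: string): RuntimeKind {
  if (platform === "darwin" && arch === "arm64") return "mlx";
  if (platform === "win32" || platform === "linux") return "ollama";
  throw new Error(`Unsupported platform: ${platform}/${arch}`);
}
```

Keeping the platform decision in one pure function means the rest of the main process can depend only on `LocalRuntime`.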
Agent Loop: In Build mode, each assistant turn streams tokens from the selected local runtime. XML <action> blocks are parsed from the stream, executed, and then fed back into the next turn. Up to 40 rounds per user message.
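What follows is a stripped-down version of that loop, written as a sketch under stated assumptions. The model is a hypothetical async function returning a complete reply, whereas the real app parses actions from the live stream. Actions use the `<action name="...">` shape shown under Tool Protocol. `runAgent`, `parseActions`, and `executeAction` are illustrative names. The regex parser would break on file content that itself contains a literal `</content>`.

```typescript
// Sketch of a Build-mode agent loop (hypothetical names throughout).
interface Action {
  name: string;
  params: Record<string, string>;
}

type Model = (history: string[]) => Promise<string>;

const MAX_ROUNDS = 40; // README: up to 40 rounds per user message

// Pull <action name="..."> blocks and their child tags out of a reply.
function parseActions(text: string): Action[] {
  const actions: Action[] = [];
  for (const m of text.matchAll(/<action name="([^"]+)">([\s\S]*?)<\/action>/g)) {
    const params: Record<string, string> = {};
    for (const p of m[2].matchAll(/<(\w+)>([\s\S]*?)<\/\1>/g)) {
      // Drop the single newline that usually follows/precedes the tags.
      params[p[1]] = p[2].replace(/^\n/, "").replace(/\n$/, "");
    }
    actions.push({ name: m[1], params });
  }
  return actions;
}

// Returns how many rounds were used.
async function runAgent(
  userMessage: string,
  model: Model,
  executeAction: (action: Action) => Promise<string>,
): Promise<number> {
  const history = [`user: ${userMessage}`];
  for (let round = 1; round <= MAX_ROUNDS; round++) {
    const reply = await model(history);
    history.push(`assistant: ${reply}`);
    const actions = parseActions(reply);
    if (actions.length === 0) return round; // no tool calls: turn is done
    for (const action of actions) {
      // Feed each result back so the next round can see it.
      history.push(`tool ${action.name}: ${await executeAction(action)}`);
    }
  }
  return MAX_ROUNDS; // cap reached: stop even if the model wants more rounds
}
```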
Live Streaming: As the model generates file content, partial writes are flushed to disk every ~450ms. The preview iframe reloads in real time.
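A ~450 ms flush cadence is a throttle: buffer incoming text, write only when enough time has passed since the last write, and flush once more when the action closes. `ThrottledWriter` is a hypothetical sketch of that idea. The clock is passed in explicitly so it can be tested deterministically.

```typescript
// Hypothetical throttled writer for streamed file content.
class ThrottledWriter {
  private content = "";
  private lastFlush = -Infinity;
  private dirty = false;
  private readonly write: (content: string) => void; // e.g. writes the file to disk
  private readonly intervalMs: number;

  constructor(write: (content: string) => void, intervalMs = 450) {
    this.write = write;
    this.intervalMs = intervalMs;
  }

  // Append streamed text; write only if the interval has elapsed.
  append(chunk: string, now: number): void {
    this.content += chunk;
    this.dirty = true;
    if (now - this.lastFlush >= this.intervalMs) this.flush(now);
  }

  // Call once when the action closes so the final bytes are written.
  flush(now: number): void {
    if (!this.dirty) return;
    this.write(this.content);
    this.lastFlush = now;
    this.dirty = false;
  }
}
```

Each write replaces the whole file with the content so far, which is what lets the preview reload show a progressively longer page.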
Tool Protocol: Small local models tend to follow XML actions more reliably than JSON function calling, so tools are invoked with an XML-based format:
<action name="write_file">
<path>index.html</path>
<content>
<!doctype html>
...
</content>
</action>
- Gemma by Google DeepMind
- MLX by Apple Machine Learning Research
- Ollama
- transformers.js by Hugging Face
Created by @PrimeEcto and AI :)
MIT
