Gemma Chat Windows

Vibe code without the internet.
A local coding agent powered by Google’s Gemma 4.
macOS uses Apple MLX. Windows and Linux use Ollama. No API keys. No cloud.

The Idea

What if you could vibe code from an airplane? Or a cabin with no signal? Or just without sending your code to someone else’s server?

Gemma Chat Windows is an open-source Electron app that runs Gemma 4 locally. You describe what you want to build, and it writes the code: HTML, CSS, JavaScript, and multi-file projects with a live preview that updates as the model types.

How It Works

Describe what you want to build: "A retro calculator app" or "A landing page for a coffee shop"
Watch it code: Gemma writes files character-by-character with a live preview
Iterate: Ask for changes, and it edits the files in place

Everything happens locally.

macOS on Apple Silicon runs Gemma through MLX-LM in a managed Python virtual environment.
Windows 11 and Linux run Gemma through Ollama, using the native local HTTP API on 127.0.0.1:11434.
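
As a rough illustration of what talking to that local endpoint can look like, here is a minimal TypeScript sketch against Ollama's native /api/chat route. It is not the app's actual runtime code; the helper names (buildChatRequest, parseChatChunk, streamChat) are hypothetical, and it assumes Ollama is listening on its default address and streaming newline-delimited JSON.

```typescript
// Assumption: Ollama is listening on its default local address.
const OLLAMA_BASE_URL = "http://127.0.0.1:11434";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Build the JSON body for Ollama's POST /api/chat endpoint.
function buildChatRequest(model: string, messages: ChatMessage[], stream = true) {
  return { model, messages, stream };
}

// Parse one line of the newline-delimited JSON stream into a text delta.
function parseChatChunk(line: string): { text: string; done: boolean } {
  const chunk = JSON.parse(line) as { message?: { content?: string }; done?: boolean };
  return { text: chunk.message?.content ?? "", done: chunk.done === true };
}

// Hypothetical helper: stream a reply and hand each text delta to onText.
async function streamChat(
  model: string,
  messages: ChatMessage[],
  onText: (text: string) => void
): Promise<void> {
  const res = await fetch(`${OLLAMA_BASE_URL}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildChatRequest(model, messages)),
  });
  if (!res.ok || !res.body) throw new Error(`Ollama returned HTTP ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffered = "";
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    buffered += decoder.decode(value, { stream: true });
    const lines = buffered.split("\n");
    buffered = lines.pop() ?? ""; // keep any incomplete trailing line
    for (const line of lines) {
      if (line.trim() === "") continue;
      const chunk = parseChatChunk(line);
      if (chunk.text) onText(chunk.text);
      if (chunk.done) return;
    }
  }
  // Handle a final line that arrived without a trailing newline.
  if (buffered.trim() !== "") {
    const last = parseChatChunk(buffered);
    if (last.text) onText(last.text);
  }
}
```

With Ollama running and a model pulled, a call such as streamChat("gemma4:e4b", [{ role: "user", content: "Hello" }], (t) => console.log(t)) would log the reply as it streams in.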

Features

Build Mode: Coding agent with a live preview canvas. Writes multi-file projects into a sandboxed workspace.
Chat Mode: Conversational AI with tool use (web search, URL fetch, calculator, shell).
Model Switching: Switch between Gemma 4 variants on the fly.
Voice Input: Local speech-to-text via in-browser Whisper.
Works Offline: After the one-time model download, chat and build flows stay local.
Cross-Platform Runtime: MLX on macOS Apple Silicon, Ollama on Windows and Linux.

Available Models

Model	MLX Size	Ollama Tag	Ollama Size	Notes
Gemma 4 E2B	~1.5 GB	`gemma4:e2b`	7.2 GB	Fastest
Gemma 4 E4B	~3 GB	`gemma4:e4b` (`gemma4:latest`)	9.6 GB	Default model
Gemma 4 26B MoE (4B active)	~8 GB	`gemma4:26b`	18 GB	Larger MoE model
Gemma 4 31B	~18 GB	`gemma4:31b`	20 GB	Maximum local quality

Getting Started

macOS

Requirements: macOS on Apple Silicon, Python 3.10-3.13, Node 20+.

git clone https://github.com/PrimeEcto/gemma-chat-windows.git
cd gemma-chat-windows
npm install
npm run dev

First launch will auto-detect Python, create a venv, install MLX-LM, download the selected model, and then open the app.

Tip: If you do not already have Python, Homebrew works well: brew install python@3.13

Windows

Requirements: Windows 11, Node 20+, Ollama for Windows.

git clone https://github.com/PrimeEcto/gemma-chat-windows.git
cd gemma-chat-windows
npm install
npm run dev

On first launch, Gemma Chat Windows checks for Ollama. If Ollama is missing, the app opens a setup screen with:

Download Ollama: opens the official Windows installer page
Re-check: verifies the install and retries the local runtime startup

If Ollama is installed but not already running, the app attempts to start ollama.exe serve automatically and then pulls the selected model. The default Windows model is gemma4:e4b.

Note: Windows Defender Firewall may prompt on first launch because the workspace preview server binds to 127.0.0.1. Allowing Private networks only is sufficient.

Performance

Click the gear button in the chat header to open the Performance drawer. On Windows it shows:

your detected GPU and effective free VRAM
the active model tag and quant
last-message tokens/sec and first-token latency
a GPU residency proxy from Ollama's running-model telemetry
Ollama tuning controls for flash attention and KV cache type

Fresh Windows setups also use GPU-aware model recommendations:

under 6 GB free VRAM: gemma4:e2b
6-11 GB free VRAM: gemma4:e4b
11-22 GB free VRAM: gemma4:26b
22+ GB free VRAM: gemma4:31b
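
The thresholds above can be expressed as a small lookup. This is only a sketch: the function name recommendModel is hypothetical, and because the listed ranges share their endpoints (11 and 22 each appear twice), it assumes each lower bound is inclusive and each upper bound exclusive.

```typescript
// Hypothetical helper mirroring the VRAM tiers listed above.
// Assumption: lower bounds are inclusive, upper bounds exclusive.
function recommendModel(freeVramGb: number): string {
  if (freeVramGb >= 22) return "gemma4:31b";
  if (freeVramGb >= 11) return "gemma4:26b";
  if (freeVramGb >= 6) return "gemma4:e4b";
  return "gemma4:e2b";
}
```

Under that assumption, a card with 8 GB free maps to gemma4:e4b, and one with exactly 11 GB free maps to gemma4:26b.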

If Ollama is already running outside the app, the Performance drawer shows an advisory with copyable setx commands so you can apply the same flash-attention and KV-cache defaults to the external service. It also surfaces optional Windows OS advisories for Defender exclusions, High Performance power mode, and HAGS (hardware-accelerated GPU scheduling) when those checks indicate they may help.

Reasoning mode also lives in the chat header and is off by default for faster local responses.

Linux

Requirements: Node 20+, Ollama, and enough RAM for the model you choose.

git clone https://github.com/PrimeEcto/gemma-chat-windows.git
cd gemma-chat-windows
npm install
npm run dev

Windows Troubleshooting

Firewall Prompt

If Windows Defender Firewall prompts the first time the app starts, choose Private networks only. The preview server binds to 127.0.0.1 and does not need public exposure.

Ollama Not on PATH

Gemma Chat Windows checks both PATH and the default installer location under %LOCALAPPDATA%\Programs\Ollama. If you installed Ollama somewhere custom, add it to PATH or reinstall with the default installer.
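
To make the lookup order concrete, the sketch below builds the list of locations such a check might probe. It is an illustration under assumptions, not the app's code: ollamaCandidates and findOllama are hypothetical names, and the environment and the file-existence check are passed in so the logic stays easy to test.

```typescript
type Env = Record<string, string | undefined>;

// List candidate ollama.exe locations: every PATH entry first,
// then the default installer directory under %LOCALAPPDATA%.
function ollamaCandidates(env: Env): string[] {
  const candidates: string[] = [];
  const pathVar = env.PATH ?? env.Path ?? "";
  for (const dir of pathVar.split(";")) {
    const trimmed = dir.trim().replace(/[\\/]+$/, ""); // drop trailing slashes
    if (trimmed !== "") candidates.push(`${trimmed}\\ollama.exe`);
  }
  if (env.LOCALAPPDATA) {
    candidates.push(`${env.LOCALAPPDATA}\\Programs\\Ollama\\ollama.exe`);
  }
  // De-duplicate while preserving order.
  return candidates.filter((p, i) => candidates.indexOf(p) === i);
}

// Return the first candidate that exists, or undefined if none does.
function findOllama(env: Env, exists: (path: string) => boolean): string | undefined {
  for (const candidate of ollamaCandidates(env)) {
    if (exists(candidate)) return candidate;
  }
  return undefined;
}
```

In a Node or Electron main process this could be called as findOllama(process.env, fs.existsSync). An undefined result corresponds to the case where Ollama needs to be added to PATH or reinstalled.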

GPU Not Detected

Ollama still works on CPU, but larger models will be slower. If you expected GPU acceleration, verify the native Ollama install can see your CUDA or ROCm stack outside the app first.

Performance Settings

Use the gear button in the top-right header to open the Performance drawer. If the app started Ollama itself, the default Windows tuning is flash attention plus q8_0 KV cache. If Ollama was already running as a separate service, the drawer explains how to set the matching environment variables yourself and lets you copy the commands. The same drawer also shows optional one-click advisories for Defender exclusions and High Performance mode, plus a link to the HAGS settings page when Windows reports GPU scheduling is off.
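
For reference, the matching environment variables in Ollama are OLLAMA_FLASH_ATTENTION and OLLAMA_KV_CACHE_TYPE. The sketch below shows one way the copyable commands could be generated. The exact text the drawer produces is not documented here, so treat buildSetxCommands and the OllamaTuning type as hypothetical.

```typescript
// Hypothetical shape for the two tuning controls described above.
interface OllamaTuning {
  flashAttention: boolean;
  kvCacheType: "f16" | "q8_0" | "q4_0";
}

// Build setx commands that persist the settings for the current user.
function buildSetxCommands(tuning: OllamaTuning): string[] {
  return [
    `setx OLLAMA_FLASH_ATTENTION ${tuning.flashAttention ? "1" : "0"}`,
    `setx OLLAMA_KV_CACHE_TYPE ${tuning.kvCacheType}`,
  ];
}

// The default Windows tuning named above: flash attention plus q8_0 KV cache.
const defaultCommands = buildSetxCommands({ flashAttention: true, kvCacheType: "q8_0" });
```

Note that setx only affects processes started afterwards, so an externally managed Ollama service has to be restarted before the new values take effect.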

Building Distributables

npm run dist

That builds for the current host OS only.

Platform-specific builds:

npm run dist:mac
npm run dist:win
npm run dist:linux

Outputs land in dist/.

Tech Stack

Layer	Tech
App Shell	Electron + Vite + React 19 + TypeScript + Tailwind
Model Runtime (macOS)	MLX-LM in a managed venv
Model Runtime (Windows/Linux)	Ollama
Speech-to-Text	transformers.js (Whisper, runs in-browser via WASM)
Workspace	Per-conversation sandboxed filesystem + local HTTP server

Architecture

src/
├── main/                 Electron main process
│   ├── index.ts          Window + IPC + agent loop
│   ├── runtime/          Local runtime abstraction + MLX/Ollama implementations
│   ├── workspace.ts      Per-conversation workspace + static file server
│   ├── shell.ts          Cross-platform shell executor for the tool layer
│   ├── tools.ts          Tool definitions + system prompts + XML action parser
│   └── settings.ts       Persistent runtime and shell preferences
├── preload/              contextBridge API surface
├── renderer/src/
│   ├── components/
│   │   ├── Setup.tsx     First-run onboarding + runtime download / pull flow
│   │   ├── Chat.tsx      Main layout + runtime-aware model switcher
│   │   ├── Canvas.tsx    Preview / Code / Files tabs (Build mode)
│   │   ├── Message.tsx   Chat bubbles + tool cards + activity bar
│   │   ├── Composer.tsx  Input + mic button
│   │   └── Sidebar.tsx   Conversation list
│   └── lib/whisper.ts    Browser Whisper pipeline
└── shared/types.ts       IPC types + model registry + settings types

Under the Hood

Agent Loop: In Build mode, each assistant turn streams tokens from the selected local runtime. XML <action> blocks are parsed from the stream and executed, and their results are fed back into the next turn. The loop runs for at most 40 rounds per user message.
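
The control flow of such a loop can be sketched as follows. This is a simplified illustration, not the code in src/main/index.ts: runAgentLoop, Generate, and Execute are hypothetical names, the history is reduced to plain strings, and it assumes a turn that produces no action results ends the loop.

```typescript
const MAX_ROUNDS = 40; // the per-message cap described above

// Hypothetical hooks: generate produces one assistant turn from the history,
// execute runs any actions in that turn and returns their results.
type Generate = (history: string[]) => Promise<string>;
type Execute = (assistantText: string) => Promise<string[]>;

async function runAgentLoop(
  userMessage: string,
  generate: Generate,
  execute: Execute,
  maxRounds: number = MAX_ROUNDS
): Promise<{ history: string[]; rounds: number }> {
  const history: string[] = [userMessage];
  let rounds = 0;
  while (rounds < maxRounds) {
    rounds += 1;
    const reply = await generate(history);
    history.push(reply);
    const results = await execute(reply);
    // Assumption: a turn without action results means the model is finished.
    if (results.length === 0) break;
    history.push(...results); // feed tool output into the next turn
  }
  return { history, rounds };
}
```

The cap keeps a model that emits actions on every turn from running indefinitely on a single request.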

Live Streaming: As the model generates file content, partial writes are flushed to disk roughly every 450 ms. The preview iframe reloads in real time.
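
One way to get that behaviour is a time-based throttle around the write call. The sketch below is an assumption about the general technique rather than the app's implementation: ThrottledWriter is a hypothetical class, and the write function and clock are injected so no real disk or timer is involved.

```typescript
const FLUSH_INTERVAL_MS = 450; // interval quoted above

// Hypothetical throttle: keeps only the newest content and writes it
// at most once per interval, plus once more on an explicit flush.
class ThrottledWriter {
  private lastFlush = Number.NEGATIVE_INFINITY;
  private pending: string | null = null;
  private write: (content: string) => void;
  private now: () => number;
  private intervalMs: number;

  constructor(
    write: (content: string) => void,
    now: () => number = () => Date.now(),
    intervalMs: number = FLUSH_INTERVAL_MS
  ) {
    this.write = write;
    this.now = now;
    this.intervalMs = intervalMs;
  }

  // Called for every streamed chunk with the full content so far.
  update(content: string): void {
    this.pending = content;
    if (this.now() - this.lastFlush >= this.intervalMs) this.flush();
  }

  // Called when the stream ends so the final content always lands on disk.
  flush(): void {
    if (this.pending === null) return;
    this.write(this.pending);
    this.pending = null;
    this.lastFlush = this.now();
  }
}
```

In practice write would wrap something like fs.writeFileSync for the target file, and flush would be called once the model finishes the file so the last partial update is not lost.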

Tool Protocol: Small local models tend to follow XML actions more reliably than JSON function calling, so tools are invoked with an XML-based format:

<action name="write_file">
<path>index.html</path>
<content>
<!doctype html>
...
</content>
</action>
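
A parser for this format can be quite small. The following sketch extracts completed action blocks from a chunk of model output. It is not the parser in src/main/tools.ts: parseActions and ParsedAction are hypothetical names, it assumes parameters are flat child tags such as <path> and <content>, and it does not handle partially streamed blocks or file content that itself contains a closing </content> or </action> tag.

```typescript
interface ParsedAction {
  name: string;
  params: Record<string, string>;
}

// Extract every complete <action name="..."> block from the text.
function parseActions(text: string): ParsedAction[] {
  const actions: ParsedAction[] = [];
  const actionRe = /<action\s+name="([^"]+)"\s*>([\s\S]*?)<\/action>/g;
  let match: RegExpExecArray | null;
  while ((match = actionRe.exec(text)) !== null) {
    const name = match[1];
    const body = match[2];
    const params: Record<string, string> = {};
    // Assumption: each parameter is a flat child tag like <path>...</path>.
    const paramRe = /<([a-z_]+)>([\s\S]*?)<\/\1>/g;
    let param: RegExpExecArray | null;
    while ((param = paramRe.exec(body)) !== null) {
      // Strip the single newline that pads multi-line values.
      params[param[1]] = param[2].replace(/^\n/, "").replace(/\n$/, "");
    }
    actions.push({ name, params });
  }
  return actions;
}
```

Applied to the write_file example above, this yields one action whose params hold the path index.html and the file content.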

Credits

Gemma by Google DeepMind
MLX by Apple Machine Learning Research
Ollama
transformers.js by Hugging Face

Created by @PrimeEcto and AI :)

License

MIT
