Skip to content

PrimeEcto/gemma-chat-windows

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

27 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Gemma Chat Windows

Gemma Chat Windows

Vibe code without the internet.
A local coding agent powered by Google’s Gemma 4.
macOS uses Apple MLX. Windows and Linux use Ollama. No API keys. No cloud.


Gemma Chat Windows screenshot

The Idea

What if you could vibe code from an airplane? Or a cabin with no signal? Or just without sending your code to someone else’s server?

Gemma Chat Windows is an open-source Electron app that runs Gemma 4 locally. You describe what you want to build, and it writes the code: HTML, CSS, JavaScript, and multi-file projects with a live preview that updates as the model types.

How It Works

  1. Describe what you want to build: "A retro calculator app" or "A landing page for a coffee shop"
  2. Watch it code: Gemma writes files character-by-character with a live preview
  3. Iterate: Ask for changes, and it edits the files in place

Everything happens locally.

  • macOS on Apple Silicon runs Gemma through MLX-LM in a managed Python virtual environment.
  • Windows 11 and Linux run Gemma through Ollama, using the native local HTTP API on 127.0.0.1:11434.

Features

  • Build Mode: Coding agent with a live preview canvas. Writes multi-file projects into a sandboxed workspace.
  • Chat Mode: Conversational AI with tool use (web search, URL fetch, calculator, shell).
  • Model Switching: Switch between Gemma 4 variants on the fly.
  • Voice Input: Local speech-to-text via in-browser Whisper.
  • Works Offline: After the one-time model download, chat and build flows stay local.
  • Cross-Platform Runtime: MLX on macOS Apple Silicon, Ollama on Windows and Linux.

Available Models

Model MLX Size Ollama Tag Ollama Size Notes
Gemma 4 E2B ~1.5 GB gemma4:e2b 7.2 GB Fastest
Gemma 4 E4B ~3 GB gemma4:e4b (gemma4:latest) 9.6 GB Default model
Gemma 4 26B MoE (4B active) ~8 GB gemma4:26b 18 GB Larger MoE model
Gemma 4 31B ~18 GB gemma4:31b 20 GB Maximum local quality

Getting Started

macOS

Requirements: macOS on Apple Silicon, Python 3.10-3.13, Node 20+.

git clone https://github.com/PrimeEcto/gemma-chat-windows.git
cd gemma-chat-windows
npm install
npm run dev

First launch will auto-detect Python, create a venv, install MLX-LM, download the selected model, and then open the app.

Tip: If you do not already have Python, Homebrew works well: brew install python@3.13

Windows

Requirements: Windows 11, Node 20+, Ollama for Windows.

git clone https://github.com/PrimeEcto/gemma-chat-windows.git
cd gemma-chat-windows
npm install
npm run dev

On first launch, Gemma Chat Windows checks for Ollama. If Ollama is missing, the app opens a setup screen with:

  • Download Ollama: opens the official Windows installer page
  • Re-check: verifies the install and retries the local runtime startup

If Ollama is installed but not already running, the app attempts to start ollama.exe serve automatically and then pulls the selected model. The default Windows model is gemma4:e4b.

Note: Windows Defender Firewall may prompt on first launch because the workspace preview server binds to 127.0.0.1. Allowing Private networks only is sufficient.

Performance

Open the gear button in the chat header to open the Performance drawer. On Windows it now shows:

  • your detected GPU and effective free VRAM
  • the active model tag and quant
  • last-message tokens/sec and first-token latency
  • a GPU residency proxy from Ollama's running-model telemetry
  • Ollama tuning controls for flash attention and KV cache type

Fresh Windows setups also use GPU-aware model recommendations:

  • under 6 GB free VRAM: gemma4:e2b
  • 6-11 GB free VRAM: gemma4:e4b
  • 11-22 GB free VRAM: gemma4:26b
  • 22+ GB free VRAM: gemma4:31b

If Ollama is already running outside the app, the Performance drawer shows an advisory with copyable setx commands so you can apply the same flash-attention and KV-cache defaults to the external service. It also surfaces optional Windows OS advisories for Defender exclusions, High Performance power mode, and HAGS when those checks indicate they may help. Reasoning mode also lives in the chat header and is off by default for faster local responses.

Linux

Requirements: Node 20+, Ollama, and enough RAM for the model you choose.

git clone https://github.com/PrimeEcto/gemma-chat-windows.git
cd gemma-chat-windows
npm install
npm run dev

Windows Troubleshooting

Firewall Prompt

If Windows Defender Firewall prompts the first time the app starts, choose Private networks only. The preview server binds to 127.0.0.1 and does not need public exposure.

Ollama Not on PATH

Gemma Chat Windows checks both PATH and the default installer location under %LOCALAPPDATA%\Programs\Ollama. If you installed Ollama somewhere custom, add it to PATH or reinstall with the default installer.

GPU Not Detected

Ollama still works on CPU, but larger models will be slower. If you expected GPU acceleration, verify the native Ollama install can see your CUDA or ROCm stack outside the app first.

Performance Settings

Use the gear button in the top-right header to open the Performance drawer. If the app started Ollama itself, the default Windows tuning is flash attention plus q8_0 KV cache. If Ollama was already running as a separate service, the drawer explains how to set the matching environment variables yourself and lets you copy the commands. The same drawer also shows optional one-click advisories for Defender exclusions and High Performance mode, plus a link to the HAGS settings page when Windows reports GPU scheduling is off.

Building Distributables

npm run dist

That builds for the current host OS only.

Platform-specific builds:

npm run dist:mac
npm run dist:win
npm run dist:linux

Outputs land in dist/.

Tech Stack

Layer Tech
App Shell Electron + Vite + React 19 + TypeScript + Tailwind
Model Runtime (macOS) MLX-LM in a managed venv
Model Runtime (Windows/Linux) Ollama
Speech-to-Text transformers.js (Whisper, runs in-browser via WASM)
Workspace Per-conversation sandboxed filesystem + local HTTP server

Architecture

src/
├── main/                 Electron main process
│   ├── index.ts          Window + IPC + agent loop
│   ├── runtime/          Local runtime abstraction + MLX/Ollama implementations
│   ├── workspace.ts      Per-conversation workspace + static file server
│   ├── shell.ts          Cross-platform shell executor for the tool layer
│   ├── tools.ts          Tool definitions + system prompts + XML action parser
│   └── settings.ts       Persistent runtime and shell preferences
├── preload/              contextBridge API surface
├── renderer/src/
│   ├── components/
│   │   ├── Setup.tsx     First-run onboarding + runtime download / pull flow
│   │   ├── Chat.tsx      Main layout + runtime-aware model switcher
│   │   ├── Canvas.tsx    Preview / Code / Files tabs (Build mode)
│   │   ├── Message.tsx   Chat bubbles + tool cards + activity bar
│   │   ├── Composer.tsx  Input + mic button
│   │   └── Sidebar.tsx   Conversation list
│   └── lib/whisper.ts    Browser Whisper pipeline
└── shared/types.ts       IPC types + model registry + settings types

Under the Hood

Agent Loop: In Build mode, each assistant turn streams tokens from the selected local runtime. XML <action> blocks are parsed from the stream, executed, and then fed back into the next turn. Up to 40 rounds per user message.

Live Streaming: As the model generates file content, partial writes are flushed to disk every ~450ms. The preview iframe reloads in real time.

Tool Protocol: Small local models tend to follow XML actions more reliably than JSON function calling, so tools are invoked with an XML-based format:

<action name="write_file">
<path>index.html</path>
<content>
<!doctype html>
...
</content>
</action>

Credits

Created by @PrimeEcto and AI :)

License

MIT

About

Local-first Electron coding agent powered by Gemma 4, with MLX on Apple Silicon and Ollama on Windows/Linux for offline chat, code generation, and live preview.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages