A browser-native Qwen showcase with WebGPU inference, persistent local history, adjustable generation controls, and an editorial, production-ready interface.
This project started with a Reddit comment. Someone mentioned how cool it would be to run a tiny LLM directly in the browser. At the time, only very small models were feasible. But I saw that Qwen 3.5 had just been released with ONNX exports — and realized we could push the limits. What if we could run a 4B parameter model entirely client-side with WebGPU?
This showcase is the result: an experiment in how far browser-native AI can go.
⚠️ Honest disclaimer: I have no idea if this will work on your machine. Honestly, I don't really care. But on my M1 Max everything flies. Your mileage may vary.
- 🖥️ Editorial showcase layout with browser-native chat workspace
- 💬 Multi-chat sidebar with create, select, rename, and delete capabilities
- ⚡ Streaming responses with interrupt support
- 💾 Persistent local history via IndexedDB
- 🎛️ Adjustable inference settings (temperature, top-p, top-k, repetition penalty, max tokens)
- 📊 Context window gauge with approximate prompt-budget tracking
- 🧠 Thinking mode support for the Qwen 3.5 2B and 4B models
- 🔒 Privacy-first — all data stays local, no server calls
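The "streaming with interrupt support" feature can be sketched with an `AbortController`. This is a hypothetical illustration, not the project's actual code; in the real app the token source would be the model's generation callback rather than an array:

```typescript
// Hypothetical sketch: tokens arrive from an async source, and an
// AbortController lets the UI stop generation mid-stream.
async function* tokenStream(tokens: string[], signal: AbortSignal): AsyncGenerator<string> {
  for (const token of tokens) {
    if (signal.aborted) return; // user pressed "Stop"
    yield token;
  }
}

async function streamResponse(
  tokens: string[],
  onToken: (t: string) => void,
  signal: AbortSignal,
): Promise<string> {
  let text = "";
  for await (const token of tokenStream(tokens, signal)) {
    text += token;
    onToken(token); // append to the chat UI as each token arrives
  }
  return text; // partial text if the stream was interrupted
}
```

Interrupting simply flips `signal.aborted`, so the generator stops yielding and the partial response is kept, which matches how the chat can retain whatever was generated before the stop.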
```bash
# Clone the repository
git clone https://github.com/oglenyaboss/llmshowcase.git
cd llmshowcase

# Install dependencies
npm install

# Start development server
npm run dev
```

Open http://localhost:3000 in Chrome 113+ or Edge 113+.
| Model | Size | Tier | Thinking | Recommended For |
|---|---|---|---|---|
| Qwen 3.5 0.8B | ~500MB | Stable | ❌ | Quick demos, weaker hardware |
| Qwen 3.5 2B | ~1.5GB | Stable | ✅ | Better quality, mid-range GPUs |
| Qwen 3.5 4B | ~2.5GB | Experimental | ✅ | High quality, dedicated GPUs |
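Picking a sensible default from this table could look like the hypothetical helper below (not part of the repo). It uses the Chrome-only `navigator.deviceMemory` hint (in GB, capped at 8) as a rough proxy, since actual VRAM cannot be queried from the browser:

```typescript
type ModelTier = "0.8B" | "2B" | "4B";

// Hypothetical helper: map a rough device-memory estimate (GB) to the
// recommendations in the table above. navigator.deviceMemory is Chrome-only
// and capped at 8, so treat the value strictly as a hint.
function recommendModel(deviceMemoryGB: number | undefined): ModelTier {
  if (deviceMemoryGB === undefined || deviceMemoryGB <= 4) return "0.8B"; // unknown or weak hardware
  if (deviceMemoryGB < 8) return "2B"; // mid-range machines
  return "4B"; // high-memory machines, likely dedicated GPUs
}
```

The exact thresholds are guesses; the safe default on unknown hardware is the smallest model, mirroring the "Experimental" label on the 4B tier.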
| Browser | Version | Status |
|---|---|---|
| Chrome | 113+ | ✅ Recommended |
| Edge | 113+ | ✅ Supported |
| Safari | Technology Preview | ⚠️ Experimental |
| Firefox | — | ❌ Not supported (no WebGPU) |
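A minimal WebGPU capability gate, written against the standard `navigator.gpu.requestAdapter()` API, might look like this sketch. It takes the global object as a parameter so it can be exercised outside a browser; the project's actual check may differ:

```typescript
interface GPULike {
  requestAdapter(): Promise<object | null>;
}

// Sketch of a WebGPU gate: false when the API is missing (e.g. Firefox
// stable) or when requestAdapter() resolves to null (no compatible GPU).
async function hasWebGPU(g: { navigator?: { gpu?: GPULike } }): Promise<boolean> {
  const gpu = g.navigator?.gpu;
  if (!gpu) return false;
  const adapter = await gpu.requestAdapter();
  return adapter !== null;
}
```

Checking the adapter, not just the presence of `navigator.gpu`, matters: a browser can expose the API while still failing to find a usable GPU.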
All data stays local. Your chats never leave your browser:
- ✅ No server calls for inference
- ✅ Chat history stored in IndexedDB locally
- ✅ Model weights downloaded once and cached
- ✅ Full privacy—no data leaves your device
- Architecture Overview — System design and data flow
- Contributing Guide — How to contribute
- Changelog — Version history
| Command | Description |
|---|---|
| `npm run dev` | Start development server |
| `npm run build` | Production build |
| `npm run start` | Start production server |
| `npm run lint` | Run ESLint |
| `npm run test` | Run unit tests (Vitest) |
| `npm run test:e2e` | Run E2E tests (Playwright) |
- One model loaded at a time
- WebGPU required—no CPU/WASM fallback
- Actual VRAM cannot be queried reliably from browsers
- Model weights downloaded on first load (may take time)
- 4B model experimental—may fail on integrated GPUs
- Framework: Next.js 16, React 19
- Inference: Transformers.js, ONNX Runtime Web
- Styling: Tailwind CSS
- Testing: Vitest, Playwright
- Persistence: IndexedDB