Description
Add support for local AI providers that run entirely in the browser using WebLLM and new web APIs for local LLMs. This enables offline functionality without requiring external server setup.
Primary Solutions
1. WebLLM (Recommended)
- Run LLMs directly in the browser using WebGPU
- No server required - fully client-side
- Supports popular models (Llama, Mistral, Phi, etc.)
- Leverages WebGPU for hardware acceleration
- Models cached locally for offline use
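A minimal usage sketch with @mlc-ai/web-llm (the model ID below is illustrative; real identifiers come from WebLLM's prebuilt model list):

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

export async function runLocalChat(prompt: string): Promise<string> {
  // First call downloads the model weights and caches them in the browser;
  // later calls load from the local cache, so this works offline after the first run.
  const engine = await CreateMLCEngine("Llama-3.2-1B-Instruct-q4f16_1-MLC", {
    initProgressCallback: (report) => console.log(report.text),
  });

  // OpenAI-compatible chat completion API, executed entirely client-side via WebGPU.
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: prompt }],
  });
  return reply.choices[0].message.content ?? "";
}
```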
2. Chrome Built-in AI APIs (Experimental)
- Prompt API: Chrome's native AI prompt interface
- Summarization API: Built-in summarization
- Translation API: Local translation
- Currently in origin trial; availability depends on the Chrome version and feature flags
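The Prompt API surface is experimental and has changed between Chrome releases; below is a hedged sketch of one origin-trial shape (`window.ai.languageModel`), which should be re-verified against the current Chrome documentation:

```ts
// Experimental: the exact namespace (window.ai.languageModel here) has varied across
// origin-trial revisions, so treat this shape as an assumption to verify.
const ai = (window as any).ai;

export async function promptBuiltInAI(prompt: string): Promise<string | null> {
  if (!ai?.languageModel?.create) return null; // API not exposed: caller should fall back
  const session = await ai.languageModel.create();
  return session.prompt(prompt);               // single-turn text response
}
```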
3. Transformers.js
- Run Hugging Face transformers in the browser
- ONNX Runtime for inference
- Wide model support
- WebGPU and WASM backends
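A sketch using the Transformers.js pipeline API (the package is @huggingface/transformers in v3, @xenova/transformers in v2; the model ID is illustrative and must have ONNX weights published):

```ts
import { pipeline } from "@huggingface/transformers";

export async function generateLocally(prompt: string): Promise<string> {
  // Downloads ONNX weights on first use and caches them in the browser.
  // device: "webgpu" selects the WebGPU backend; omit it to fall back to WASM.
  const generator = await pipeline("text-generation", "Xenova/distilgpt2", {
    device: "webgpu",
  });
  const out = await generator(prompt, { max_new_tokens: 64 });
  return (out as any)[0].generated_text;
}
```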
4. MediaPipe LLM Inference
- Google's on-device LLM solution
- Optimized for edge devices
- Cross-platform support
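A sketch of the MediaPipe LLM Inference web API from @mediapipe/tasks-genai (the WASM CDN path and model file path are illustrative assumptions):

```ts
import { FilesetResolver, LlmInference } from "@mediapipe/tasks-genai";

export async function askMediaPipe(prompt: string): Promise<string> {
  // Location of the task WASM files; assumption -- adjust to however the app serves them.
  const genai = await FilesetResolver.forGenAiTasks(
    "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm"
  );
  // The model file must be downloaded/hosted separately; this path is illustrative.
  const llm = await LlmInference.createFromOptions(genai, {
    baseOptions: { modelAssetPath: "/models/gemma-2b-it-gpu-int4.bin" },
  });
  return llm.generateResponse(prompt);
}
```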
Implementation Plan
Phase 1: WebLLM Integration
- Add WebLLM as dependency
- Create WebLLM provider in packages/app/src/features/ai/lib/
- Implement model download/caching UI
- Add WebGPU capability detection
- Update AI config store to support local models
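One possible shape for the new provider in packages/app/src/features/ai/lib/, combining WebGPU capability detection with a lazily initialized WebLLM engine. The interface and function names below are hypothetical, not the repo's existing API:

```ts
import { CreateMLCEngine, type MLCEngine } from "@mlc-ai/web-llm";

// Hypothetical provider interface -- the real one should match whatever the
// existing cloud providers in packages/app/src/features/ai/lib/ implement.
export interface LocalAIProvider {
  id: string;
  isSupported(): Promise<boolean>;
  complete(prompt: string): Promise<string>;
}

export function createWebLLMProvider(modelId: string): LocalAIProvider {
  let engine: MLCEngine | undefined;
  return {
    id: "webllm",
    // WebGPU capability detection: navigator.gpu only exists in supporting browsers,
    // and requestAdapter() can still return null (e.g. blocklisted GPUs).
    async isSupported() {
      const gpu = (navigator as any).gpu;
      return !!gpu && (await gpu.requestAdapter()) !== null;
    },
    async complete(prompt) {
      engine ??= await CreateMLCEngine(modelId); // lazy: downloads/caches on first use
      const res = await engine.chat.completions.create({
        messages: [{ role: "user", content: prompt }],
      });
      return res.choices[0].message.content ?? "";
    },
  };
}
```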
Phase 2: Chrome Built-in AI
- Detect Chrome AI API availability
- Create provider wrapper for Prompt API
- Graceful fallback if not available
- Document feature flags needed
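A sketch of the graceful-fallback logic, reusing the hypothetical LocalAIProvider interface from the Phase 1 sketch above:

```ts
import type { LocalAIProvider } from "./local-provider"; // hypothetical module path

// Try local providers in order (e.g. Chrome built-in AI, then WebLLM) and fall
// back to an existing cloud provider when none are supported in this browser.
export async function pickProvider(
  localCandidates: LocalAIProvider[],
  cloudFallback: LocalAIProvider
): Promise<LocalAIProvider> {
  for (const provider of localCandidates) {
    if (await provider.isSupported()) return provider;
  }
  return cloudFallback;
}
```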
Phase 3: UI/UX
- Model management interface (download, delete, update)
- Show storage usage for cached models
- Progress indicators for model loading
- Performance metrics (tokens/sec, memory usage)
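For the storage-usage display, the standard Storage API already exposes origin-level numbers; this covers cached model weights, though it is not broken down per model:

```ts
// navigator.storage.estimate() reports usage/quota for the whole origin,
// which includes model weights cached via the Cache API or IndexedDB.
export async function getModelStorageInfo(): Promise<{ usedMB: number; quotaMB: number }> {
  const { usage = 0, quota = 0 } = await navigator.storage.estimate();
  return { usedMB: Math.round(usage / 1e6), quotaMB: Math.round(quota / 1e6) };
}
```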
Benefits
✅ Fully offline - No internet required after initial model download
✅ Privacy - All inference happens locally, no data leaves device
✅ No API costs - Free to use
✅ No rate limits - Unlimited usage
✅ Fast responses - No network latency with WebGPU acceleration
Technical Requirements
- WebGPU support (Chrome 113+, Edge 113+)
- Sufficient RAM for model loading (4GB+ recommended)
- Storage space for model caching (1-8GB per model)
Example Models
- Llama-3.2-1B (~1GB) - Fast, good for code
- Phi-3-mini (~2GB) - Balanced performance
- Gemma-2B (~2GB) - Google's efficient model
- TinyLlama (~600MB) - Smallest option
Related
- Related to #12 (Enhanced AI Configuration)
- Complements existing cloud provider support