
Add Local AI Providers using WebLLM and Browser APIs #20

@acbcdev

Description

Add support for local AI providers that run entirely in the browser using WebLLM and new web APIs for local LLMs. This enables offline functionality without requiring external server setup.

Primary Solutions

1. WebLLM (Recommended)

  • Run LLMs directly in the browser using WebGPU
  • No server required - fully client-side
  • Supports popular models (Llama, Mistral, Phi, etc.)
  • Leverages WebGPU for hardware acceleration
  • Models cached locally for offline use
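
As a sketch of what a WebLLM-backed provider could look like, assuming the `CreateMLCEngine` entry point from `@mlc-ai/web-llm`; the model id and helper names are illustrative, not existing code in this repo:

```ts
import { CreateMLCEngine, type MLCEngine } from "@mlc-ai/web-llm";

// Illustrative prebuilt model id; WebLLM ships several quantized variants.
const MODEL_ID = "Llama-3.2-1B-Instruct-q4f32_1-MLC";

export async function createWebLLMEngine(
  onProgress: (text: string) => void
): Promise<MLCEngine> {
  // Downloads and caches the model on first use; later loads come from cache.
  return CreateMLCEngine(MODEL_ID, {
    initProgressCallback: (report) => onProgress(report.text),
  });
}

export async function completeWithWebLLM(
  engine: MLCEngine,
  prompt: string
): Promise<string> {
  // WebLLM exposes an OpenAI-style chat completions interface.
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: prompt }],
  });
  return reply.choices[0]?.message?.content ?? "";
}
```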

2. Chrome Built-in AI APIs (Experimental)

  • Prompt API: Chrome's native AI prompt interface
  • Summarization API: Built-in summarization
  • Translation API: Local translation
  • Currently in origin trial; availability in stable Chrome is rolling out gradually
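
Because the Prompt API surface is experimental and has changed between Chrome releases, any wrapper needs feature detection. A sketch, with the `LanguageModel` shape written as a local type that should be treated as an assumption to verify against current Chrome documentation:

```ts
// Minimal local typing for the experimental Prompt API; verify the exact
// names against the current Chrome origin-trial docs before relying on them.
type PromptSession = { prompt(input: string): Promise<string> };
type LanguageModelAPI = {
  availability(): Promise<"unavailable" | "downloadable" | "downloading" | "available">;
  create(): Promise<PromptSession>;
};

export async function promptViaChromeAI(input: string): Promise<string | null> {
  const lm: LanguageModelAPI | undefined = (globalThis as any).LanguageModel;
  if (!lm) return null; // API not exposed in this browser
  if ((await lm.availability()) === "unavailable") return null;
  const session = await lm.create();
  return session.prompt(input);
}
```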

3. Transformers.js

  • Run Hugging Face transformers in the browser
  • ONNX Runtime for inference
  • Wide model support
  • WebGPU and WASM backends
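
A sketch of browser-side generation with Transformers.js, assuming the `pipeline` helper from `@huggingface/transformers`; the model id is illustrative:

```ts
import { pipeline } from "@huggingface/transformers";

// Model id is illustrative; any ONNX-converted causal LM from the Hub works.
const generator = await pipeline(
  "text-generation",
  "onnx-community/Llama-3.2-1B-Instruct",
  { device: "webgpu" } // use "wasm" where WebGPU is unavailable
);

const output = await generator("Explain WebGPU in one sentence.", {
  max_new_tokens: 64,
});
console.log(output);
```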

4. MediaPipe LLM Inference

  • Google's on-device LLM solution
  • Optimized for edge devices
  • Cross-platform support
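
A sketch using MediaPipe's web package, assuming `@mediapipe/tasks-genai`; the WASM CDN path and model file are illustrative (MediaPipe expects models converted to its own format, e.g. a Gemma bundle):

```ts
import { FilesetResolver, LlmInference } from "@mediapipe/tasks-genai";

// Paths are illustrative; the model must be converted to MediaPipe's format.
const genai = await FilesetResolver.forGenAiTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm"
);
const llm = await LlmInference.createFromOptions(genai, {
  baseOptions: { modelAssetPath: "/models/gemma-2b-it-gpu-int4.bin" },
});

const answer = await llm.generateResponse("Summarize WebGPU in one sentence.");
console.log(answer);
```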

Implementation Plan

Phase 1: WebLLM Integration

  1. Add WebLLM as dependency
  2. Create WebLLM provider in packages/app/src/features/ai/lib/
  3. Implement model download/caching UI
  4. Add WebGPU capability detection (see the sketch after this list)
  5. Update AI config store to support local models
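
For step 4, the capability check can be a small helper; the file location and name below are assumptions for illustration:

```ts
// Hypothetical helper, e.g. packages/app/src/features/ai/lib/webgpu.ts
export async function detectWebGPU(): Promise<boolean> {
  // navigator.gpu is only present in WebGPU-capable browsers (Chrome/Edge 113+),
  // and requestAdapter() can still return null on unsupported hardware.
  const gpu = (navigator as any).gpu;
  if (!gpu) return false;
  try {
    return (await gpu.requestAdapter()) !== null;
  } catch {
    return false;
  }
}
```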

Phase 2: Chrome Built-in AI

  1. Detect Chrome AI API availability
  2. Create provider wrapper for Prompt API
  3. Fall back gracefully when the API is not available (see the sketch after this list)
  4. Document feature flags needed
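
Step 3's fallback could be a small selector that prefers the built-in API and drops back to WebLLM, reusing the hypothetical helpers sketched above:

```ts
// Hypothetical provider shape; ties together the earlier sketches.
import { createWebLLMEngine, completeWithWebLLM } from "./webllm";
import { detectWebGPU } from "./webgpu";

export type LocalProvider = {
  name: "chrome-ai" | "webllm";
  complete(prompt: string): Promise<string>;
};

export async function pickLocalProvider(): Promise<LocalProvider | null> {
  // Prefer Chrome's built-in model when the Prompt API reports it is available.
  const lm: any = (globalThis as any).LanguageModel;
  if (lm && (await lm.availability()) === "available") {
    const session = await lm.create();
    return { name: "chrome-ai", complete: (p) => session.prompt(p) };
  }
  // Otherwise fall back to WebLLM when WebGPU is present.
  if (await detectWebGPU()) {
    const engine = await createWebLLMEngine(() => {});
    return { name: "webllm", complete: (p) => completeWithWebLLM(engine, p) };
  }
  return null; // no local option; caller can fall back to a remote provider
}
```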

Phase 3: UI/UX

  1. Model management interface (download, delete, update)
  2. Show storage usage for cached models (see the sketch after this list)
  3. Progress indicators for model loading
  4. Performance metrics (tokens/sec, memory usage)
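
For item 2, the Storage API already reports how much the cached models occupy; a sketch (the formatting is illustrative):

```ts
export async function getModelStorageUsage(): Promise<string> {
  // navigator.storage.estimate() covers Cache Storage / IndexedDB, which is
  // where browser LLM runtimes keep downloaded model weights.
  if (!navigator.storage?.estimate) return "unknown";
  const { usage = 0, quota = 0 } = await navigator.storage.estimate();
  const gib = (n: number) => (n / 1024 ** 3).toFixed(2);
  return `${gib(usage)} GiB used of ${gib(quota)} GiB available`;
}
```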

Benefits

  • Fully offline - no internet required after the initial model download
  • Privacy - all inference happens locally; no data leaves the device
  • No API costs - free to use
  • No rate limits - unlimited usage
  • Fast responses - no network latency with WebGPU acceleration

Technical Requirements

  • WebGPU support (Chrome 113+, Edge 113+)
  • Sufficient RAM for model loading (4GB+ recommended)
  • Storage space for model caching (1-8GB per model)

Example Models

  • Llama-3.2-1B (~1GB) - Fast, good for code
  • Phi-3-mini (~2GB) - Balanced performance
  • Gemma-2B (~2GB) - Google's efficient model
  • TinyLlama (~600MB) - Smallest option
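
In code, these could live in one small registry that the model picker and storage UI share; a sketch where the WebLLM ids are illustrative quantized variants and the sizes are approximate:

```ts
export type LocalModel = {
  id: string; // WebLLM prebuilt model id (illustrative)
  label: string;
  approxDownloadGB: number;
};

export const LOCAL_MODELS: LocalModel[] = [
  { id: "Llama-3.2-1B-Instruct-q4f32_1-MLC", label: "Llama 3.2 1B", approxDownloadGB: 1.0 },
  { id: "Phi-3-mini-4k-instruct-q4f16_1-MLC", label: "Phi-3 mini", approxDownloadGB: 2.0 },
  { id: "gemma-2-2b-it-q4f16_1-MLC", label: "Gemma 2B", approxDownloadGB: 2.0 },
  { id: "TinyLlama-1.1B-Chat-v1.0-q4f16_1-MLC", label: "TinyLlama", approxDownloadGB: 0.6 },
];
```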
