
Add Local AI Providers using WebLLM and Browser APIs #20

@acbcdev

Description

Add support for local AI providers that run entirely in the browser using WebLLM and new web APIs for local LLMs. This enables offline functionality without requiring external server setup.

Primary Solutions

1. WebLLM (Recommended)

  • Run LLMs directly in the browser using WebGPU
  • No server required - fully client-side
  • Supports popular models (Llama, Mistral, Phi, etc.)
  • Leverages WebGPU for hardware acceleration
  • Models cached locally for offline use
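
As a sketch of what a WebLLM-backed provider could look like, assuming the `CreateMLCEngine` entry point from `@mlc-ai/web-llm`; the model id and helper names are illustrative, not existing code in this repo:

```ts
import { CreateMLCEngine, type MLCEngine } from "@mlc-ai/web-llm";

// Illustrative prebuilt model id; WebLLM ships several quantized variants.
const MODEL_ID = "Llama-3.2-1B-Instruct-q4f32_1-MLC";

export async function createWebLLMEngine(
  onProgress: (text: string) => void
): Promise<MLCEngine> {
  // Downloads and caches the model on first use; later loads come from cache.
  return CreateMLCEngine(MODEL_ID, {
    initProgressCallback: (report) => onProgress(report.text),
  });
}

export async function completeWithWebLLM(
  engine: MLCEngine,
  prompt: string
): Promise<string> {
  // WebLLM exposes an OpenAI-style chat completions interface.
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: prompt }],
  });
  return reply.choices[0]?.message?.content ?? "";
}
```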

2. Chrome Built-in AI APIs (Experimental)

  • Prompt API: Chrome's native AI prompt interface
  • Summarization API: Built-in summarization
  • Translation API: Local translation
  • Currently in origin trial; availability in stable Chrome is rolling out gradually
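
Because the Prompt API surface is experimental and has changed between Chrome releases, any wrapper needs feature detection. A sketch, with the `LanguageModel` shape written as a local type that should be treated as an assumption to verify against current Chrome documentation:

```ts
// Minimal local typing for the experimental Prompt API; verify the exact
// names against the current Chrome origin-trial docs before relying on them.
type PromptSession = { prompt(input: string): Promise<string> };
type LanguageModelAPI = {
  availability(): Promise<"unavailable" | "downloadable" | "downloading" | "available">;
  create(): Promise<PromptSession>;
};

export async function promptViaChromeAI(input: string): Promise<string | null> {
  const lm: LanguageModelAPI | undefined = (globalThis as any).LanguageModel;
  if (!lm) return null; // API not exposed in this browser
  if ((await lm.availability()) === "unavailable") return null;
  const session = await lm.create();
  return session.prompt(input);
}
```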

3. Transformers.js

  • Run Hugging Face transformers in the browser
  • ONNX Runtime for inference
  • Wide model support
  • WebGPU and WASM backends
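
A sketch of browser-side generation with Transformers.js, assuming the `pipeline` helper from `@huggingface/transformers`; the model id is illustrative:

```ts
import { pipeline } from "@huggingface/transformers";

// Model id is illustrative; any ONNX-converted causal LM from the Hub works.
const generator = await pipeline(
  "text-generation",
  "onnx-community/Llama-3.2-1B-Instruct",
  { device: "webgpu" } // use "wasm" where WebGPU is unavailable
);

const output = await generator("Explain WebGPU in one sentence.", {
  max_new_tokens: 64,
});
console.log(output);
```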

4. MediaPipe LLM Inference

  • Google's on-device LLM solution
  • Optimized for edge devices
  • Cross-platform support
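
A sketch using MediaPipe's web package, assuming `@mediapipe/tasks-genai`; the WASM CDN path and model file are illustrative (MediaPipe expects models converted to its own format, e.g. a Gemma bundle):

```ts
import { FilesetResolver, LlmInference } from "@mediapipe/tasks-genai";

// Paths are illustrative; the model must be converted to MediaPipe's format.
const genai = await FilesetResolver.forGenAiTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm"
);
const llm = await LlmInference.createFromOptions(genai, {
  baseOptions: { modelAssetPath: "/models/gemma-2b-it-gpu-int4.bin" },
});

const answer = await llm.generateResponse("Summarize WebGPU in one sentence.");
console.log(answer);
```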

Implementation Plan

Phase 1: WebLLM Integration

  1. Add WebLLM as dependency
  2. Create WebLLM provider in packages/app/src/features/ai/lib/
  3. Implement model download/caching UI
  4. Add WebGPU capability detection (see the sketch after this list)
  5. Update AI config store to support local models
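
For step 4, the capability check can be a small helper; the file location and name below are assumptions for illustration:

```ts
// Hypothetical helper, e.g. packages/app/src/features/ai/lib/webgpu.ts
export async function detectWebGPU(): Promise<boolean> {
  // navigator.gpu is only present in WebGPU-capable browsers (Chrome/Edge 113+),
  // and requestAdapter() can still return null on unsupported hardware.
  const gpu = (navigator as any).gpu;
  if (!gpu) return false;
  try {
    return (await gpu.requestAdapter()) !== null;
  } catch {
    return false;
  }
}
```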

Phase 2: Chrome Built-in AI

  1. Detect Chrome AI API availability
  2. Create provider wrapper for Prompt API
  3. Fall back gracefully when the API is not available (see the sketch after this list)
  4. Document feature flags needed
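
Step 3's fallback could be a small selector that prefers the built-in API and drops back to WebLLM, reusing the hypothetical helpers sketched above:

```ts
// Hypothetical provider shape; ties together the earlier sketches.
import { createWebLLMEngine, completeWithWebLLM } from "./webllm";
import { detectWebGPU } from "./webgpu";

export type LocalProvider = {
  name: "chrome-ai" | "webllm";
  complete(prompt: string): Promise<string>;
};

export async function pickLocalProvider(): Promise<LocalProvider | null> {
  // Prefer Chrome's built-in model when the Prompt API reports it is available.
  const lm: any = (globalThis as any).LanguageModel;
  if (lm && (await lm.availability()) === "available") {
    const session = await lm.create();
    return { name: "chrome-ai", complete: (p) => session.prompt(p) };
  }
  // Otherwise fall back to WebLLM when WebGPU is present.
  if (await detectWebGPU()) {
    const engine = await createWebLLMEngine(() => {});
    return { name: "webllm", complete: (p) => completeWithWebLLM(engine, p) };
  }
  return null; // no local option; caller can fall back to a remote provider
}
```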

Phase 3: UI/UX

  1. Model management interface (download, delete, update)
  2. Show storage usage for cached models (see the sketch after this list)
  3. Progress indicators for model loading
  4. Performance metrics (tokens/sec, memory usage)
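
For item 2, the Storage API already reports how much the cached models occupy; a sketch (the formatting is illustrative):

```ts
export async function getModelStorageUsage(): Promise<string> {
  // navigator.storage.estimate() covers Cache Storage / IndexedDB, which is
  // where browser LLM runtimes keep downloaded model weights.
  if (!navigator.storage?.estimate) return "unknown";
  const { usage = 0, quota = 0 } = await navigator.storage.estimate();
  const gib = (n: number) => (n / 1024 ** 3).toFixed(2);
  return `${gib(usage)} GiB used of ${gib(quota)} GiB available`;
}
```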

Benefits

  • Fully offline - no internet required after the initial model download
  • Privacy - all inference happens locally; no data leaves the device
  • No API costs - free to use
  • No rate limits - unlimited usage
  • Fast responses - no network latency with WebGPU acceleration

Technical Requirements

  • WebGPU support (Chrome 113+, Edge 113+)
  • Sufficient RAM for model loading (4GB+ recommended)
  • Storage space for model caching (1-8GB per model)

Example Models

  • Llama-3.2-1B (~1GB) - Fast, good for code
  • Phi-3-mini (~2GB) - Balanced performance
  • Gemma-2B (~2GB) - Google's efficient model
  • TinyLlama (~600MB) - Smallest option
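
In code, these could live in one small registry that the model picker and storage UI share; a sketch where the WebLLM ids are illustrative quantized variants and the sizes are approximate:

```ts
export type LocalModel = {
  id: string; // WebLLM prebuilt model id (illustrative)
  label: string;
  approxDownloadGB: number;
};

export const LOCAL_MODELS: LocalModel[] = [
  { id: "Llama-3.2-1B-Instruct-q4f32_1-MLC", label: "Llama 3.2 1B", approxDownloadGB: 1.0 },
  { id: "Phi-3-mini-4k-instruct-q4f16_1-MLC", label: "Phi-3 mini", approxDownloadGB: 2.0 },
  { id: "gemma-2-2b-it-q4f16_1-MLC", label: "Gemma 2B", approxDownloadGB: 2.0 },
  { id: "TinyLlama-1.1B-Chat-v1.0-q4f16_1-MLC", label: "TinyLlama", approxDownloadGB: 0.6 },
];
```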
