Progressive model loading with early inference via Flare #300

@sauravpanda

Description

Summary

Expose Flare's unique progressive inference capability through BrowserAI — start generating text while the model is still downloading.

How it works

Flare can run inference with a partial model:

  • FlareEngine.forward_partial(token, pos, num_layers) — runs a forward pass through the first num_layers loaded layers
  • FlareEngine.available_layers() — returns how many layers are loaded so far
  • FlareEngine.inference_quality() — returns a quality score from 0.0 to 1.0 for the current partial model
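
A minimal sketch of how a token loop could sit on top of this API. Only forward_partial, available_layers, and inference_quality come from the Flare surface above; the tokenizer and the sampleToken helper are illustrative assumptions.

function generateWithPartialModel(engine, tokenizer, prompt, maxNewTokens) {
    const promptTokens = tokenizer.encode(prompt);
    let pos = 0;
    let logits;

    // Prefill: feed the prompt through whatever layers have arrived so far.
    for (const token of promptTokens) {
        logits = engine.forward_partial(token, pos++, engine.available_layers());
    }

    // Decode: re-check the layer count each step, so output quality can
    // improve mid-generation as more layers finish downloading.
    const generated = [];
    for (let i = 0; i < maxNewTokens; i++) {
        const next = sampleToken(logits); // assumed sampling helper (greedy or temperature)
        generated.push(next);
        logits = engine.forward_partial(next, pos++, engine.available_layers());
    }

    return { text: tokenizer.decode(generated), quality: engine.inference_quality() };
}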

User experience

const ai = new BrowserAI({ engine: 'flare', progressive: true });

// Start loading — returns immediately, downloads in background
ai.loadModel('llama-3.2-1b-flare', {
    onProgress: (loaded, total) => updateProgressBar(loaded, total),
    onLayersReady: (available, total) => {
        qualityMeter.value = available / total;
    }
});

// The user can start chatting before the download completes.
// Flare runs on whatever layers are available; quality improves as more arrive.
const response = await ai.generateText('Hello!');
// Response generated with partial model — rough but usable

// Later, model fully loaded — full quality
const response2 = await ai.generateText('Explain quantum computing');
// Full quality response
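
If an app wants to upgrade an early answer once the full model lands, one plausible pattern is sketched below. It assumes loadModel returns a promise that resolves when the last layer arrives; only the callbacks above are confirmed, and render is a stand-in for the app's own UI update.

// Assumption: loadModel's return value resolves once all layers are downloaded.
const fullyLoaded = ai.loadModel('llama-3.2-1b-flare', { /* callbacks as above */ });

const prompt = 'Explain quantum computing';
let answer = await ai.generateText(prompt); // rough partial-model draft
render(answer);                             // stand-in UI helper

fullyLoaded.then(async () => {
    answer = await ai.generateText(prompt); // regenerate at full quality
    render(answer);                         // replace the draft in place
});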

UX elements

  • Quality meter showing inference_quality, scaled to 0-100%
  • "Generating with X/Y layers" indicator
  • Smooth quality upgrade — no jarring transition
  • Show estimated improvement: "4 more layers loading..."
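
A rough wiring of these elements against the progressive callbacks. The DOM ids and label copy are illustrative; only onLayersReady and the layer counts come from the API above.

const qualityMeter = document.querySelector('#quality-meter'); // e.g. <progress max="1">
const layersLabel = document.querySelector('#layers-label');

ai.loadModel('llama-3.2-1b-flare', {
    onLayersReady: (available, total) => {
        // Drive the quality meter and the "Generating with X/Y layers" indicator.
        qualityMeter.value = available / total;
        layersLabel.textContent = available < total
            ? `Generating with ${available}/${total} layers (${total - available} more loading...)`
            : 'Full model loaded';
    },
});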

Why this is unique

No other in-browser LLM engine offers this today: WebLLM and Transformers.js both require the full model to finish downloading before any inference can run.
