
feat: factory-level streaming with fallback chain support #26


Summary

`LLMProviderFactory.generateResponse()` is non-streaming only. Per-provider `streamResponse()` exists, but it bypasses the factory's fallback rules, circuit breakers, and exhaustion registry. Consumers that need streaming with failover (e.g., AEGIS daemon's `/chat` endpoint) have to either:

  1. Lose failover semantics when streaming (unacceptable — streaming is the primary user-facing surface), or
  2. Reimplement the orchestration on the consumer side — which is exactly the pattern we're trying to eliminate by standardizing on llm-providers.

This is a direct blocker for the AEGIS daemon Phase D migration (full adoption of llm-providers, removing all bolted-in LLM logic).

Proposal

Add factory-level streaming that integrates with the existing fallback/circuit/exhaustion infrastructure:

```ts
class LLMProviderFactory {
  // Chunk element type is illustrative; the real type should match
  // whatever the per-provider streamResponse() already emits.
  async generateResponseStream(
    request: LLMRequest
  ): Promise<ReadableStream<LLMStreamChunk>>;
  // or
  generateResponseStream(request: LLMRequest): AsyncIterable<LLMStreamChunk>;
}
```
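For illustration, consuming the AsyncIterable shape might look like the sketch below. `factory`, `res`, `handleStreamFailure`, and the `delta` field are placeholders, not the final API; the catch branch anticipates the mid-stream policy in point 3 below.

```ts
// Consumption sketch with placeholder names (factory, res,
// handleStreamFailure); chunk.delta is an assumed field.
let partial = "";
try {
  for await (const chunk of factory.generateResponseStream(request)) {
    partial += chunk.delta;
    res.write(chunk.delta); // forward each delta as it arrives
  }
} catch (err) {
  // Mid-stream failure: surface the error plus the partial content and
  // let the consumer decide, rather than retrying and duplicating tokens.
  handleStreamFailure(err, partial);
}
```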

Behavior:

  1. Provider selection — identical to `generateResponse()`: explicit model, default provider, cost optimization, exhaustion filter.
  2. Pre-stream fallback — on stream-start errors (401, 429, 503, circuit open, exhausted), fail over to the next provider in the fallback chain before emitting the first chunk. This is the primary failover path and must work transparently (see the orchestration sketch after this list).
  3. Mid-stream policy — explicitly documented policy (recommend: do not retry mid-stream; surface error + partial content and let consumer decide). Mid-stream retries risk emitting duplicate tokens and are rarely the right call.
  4. Circuit breaker integration — stream-start errors count as failures for circuit state transitions; successful stream completion counts as success.
  5. Observability — same `onRequestStart` / `onRequestEnd` / `onFallback` hooks fire for streaming calls. `onRequestEnd` fires at stream close.
  6. Tool calls in streams — out of scope for this issue (tools + streaming is a separate concern). Flag as not-yet-supported in the API docs for this first cut.
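The sketch below shows how these rules might compose inside the factory. It is a sketch only: every type and name in it (the `LLMRequest` stand-in, `Provider`, `CircuitBreaker`, `FactoryHooks`, the chunk shape) is an assumption made for illustration, mirroring the numbered behaviors above rather than the library's real internals.

```ts
// Sketch only; all names below are illustrative, not the real API.
interface LLMRequest { prompt: string }        // stand-in shape
interface LLMStreamChunk { delta: string }     // stand-in shape
interface Provider {
  name: string;
  streamResponse(req: LLMRequest): AsyncIterable<LLMStreamChunk>;
}
interface CircuitBreaker {
  isOpen(p: Provider): boolean;
  recordSuccess(p: Provider): void;
  recordFailure(p: Provider): void;
}
interface FactoryHooks {
  onRequestStart?(p: Provider): void;
  onRequestEnd?(p: Provider): void;
  onFallback?(p: Provider, err: unknown): void;
}

async function* streamWithFallback(
  candidates: Provider[],  // (1) same ordering generateResponse() would use
  request: LLMRequest,
  circuit: CircuitBreaker,
  hooks: FactoryHooks
): AsyncIterable<LLMStreamChunk> {
  for (const provider of candidates) {
    if (circuit.isOpen(provider)) continue;    // skip open circuits
    hooks.onRequestStart?.(provider);
    let emitted = false;
    try {
      for await (const chunk of provider.streamResponse(request)) {
        emitted = true;
        yield chunk;
      }
      circuit.recordSuccess(provider);         // (4) completion = success
      hooks.onRequestEnd?.(provider);          // (5) fires at stream close
      return;
    } catch (err) {
      circuit.recordFailure(provider);         // (4) error = failure
      if (emitted) throw err;                  // (3) mid-stream: surface, don't retry
      hooks.onFallback?.(provider, err);       // (2) pre-stream: try next provider
    }
  }
  throw new Error("all providers in the fallback chain failed");
}
```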

Why this unblocks Phase D

AEGIS daemon's current streaming path (dispatch.ts:514-545) manually orchestrates:

  • Stream Claude via the raw Anthropic SDK
  • On failure, import and call `executeCerebrasReasoning` or `executeCerebrasMid`
  • Emit fallback result as a single delta

That is custom failover logic: exactly the thing llm-providers is supposed to own (see the sketch after this list). Without factory-level streaming, Phase D has two bad options:

  1. Keep the custom orchestration (violates the no-bolted-in-LLM-logic rule we just committed to)
  2. Lose failover on streaming calls (unacceptable UX)
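With factory-level streaming, that whole path could collapse to roughly the sketch below (`factory` and `sendDelta` are placeholders): fallback, circuit state, and exhaustion all live behind the one call.

```ts
// Hypothetical replacement for the manual orchestration in dispatch.ts:
// the daemon forwards deltas and owns no failover logic.
for await (const chunk of factory.generateResponseStream(request)) {
  sendDelta(chunk.delta);
}
```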

Priority

HIGH — direct blocker for AEGIS daemon Phase D migration. No workaround that preserves the architectural goal.

Related

  • AEGIS daemon memory: `project_phase_d_llm_providers.md` — full Phase D scoping
  • Policy: `feedback_no_bolted_llm_logic.md` — the rule that makes this a blocker

🤖 Filed by AEGIS during Phase D scoping session
