
feat: factory-level streaming with fallback chain support #26


Summary

`LLMProviderFactory.generateResponse()` is non-streaming only. Per-provider `streamResponse()` exists, but it bypasses the factory's fallback rules, circuit breakers, and exhaustion registry. Consumers that need streaming with failover (e.g., AEGIS daemon's `/chat` endpoint) have to either:

  1. Lose failover semantics when streaming (unacceptable — streaming is the primary user-facing surface), or
  2. Reimplement the orchestration on the consumer side — which is exactly the pattern we're trying to eliminate by standardizing on llm-providers.

This is a direct blocker for the AEGIS daemon Phase D migration (full adoption of llm-providers, removing all bolted-in LLM logic).

Proposal

Add factory-level streaming that integrates with the existing fallback/circuit/exhaustion infrastructure:

```ts
class LLMProviderFactory {
  // Chunk element type is illustrative; the real type should match
  // whatever the per-provider streamResponse() already emits.
  async generateResponseStream(
    request: LLMRequest
  ): Promise<ReadableStream<LLMStreamChunk>>;
  // or
  generateResponseStream(request: LLMRequest): AsyncIterable<LLMStreamChunk>;
}
```
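For illustration, consuming the AsyncIterable shape might look like the sketch below. `factory`, `res`, `handleStreamFailure`, and the `delta` field are placeholders, not the final API; the catch branch anticipates the mid-stream policy in point 3 below.

```ts
// Consumption sketch with placeholder names (factory, res,
// handleStreamFailure); chunk.delta is an assumed field.
let partial = "";
try {
  for await (const chunk of factory.generateResponseStream(request)) {
    partial += chunk.delta;
    res.write(chunk.delta); // forward each delta as it arrives
  }
} catch (err) {
  // Mid-stream failure: surface the error plus the partial content and
  // let the consumer decide, rather than retrying and duplicating tokens.
  handleStreamFailure(err, partial);
}
```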

Behavior:

  1. Provider selection — identical to `generateResponse()`: explicit model, default provider, cost optimization, exhaustion filter.
  2. Pre-stream fallback — on stream-start errors (401, 429, 503, circuit open, exhausted), fail over to the next provider in the fallback chain before emitting the first chunk. This is the primary failover path and must work transparently (see the orchestration sketch after this list).
  3. Mid-stream policy — explicitly documented policy (recommend: do not retry mid-stream; surface error + partial content and let consumer decide). Mid-stream retries risk emitting duplicate tokens and are rarely the right call.
  4. Circuit breaker integration — stream-start errors count as failures for circuit state transitions; successful stream completion counts as success.
  5. Observability — same `onRequestStart` / `onRequestEnd` / `onFallback` hooks fire for streaming calls. `onRequestEnd` fires at stream close.
  6. Tool calls in streams — out of scope for this issue (tools + streaming is a separate concern). Flag as not-yet-supported in the API docs for this first cut.
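The sketch below shows how these rules might compose inside the factory. It is a sketch only: every type and name in it (the `LLMRequest` stand-in, `Provider`, `CircuitBreaker`, `FactoryHooks`, the chunk shape) is an assumption made for illustration, mirroring the numbered behaviors above rather than the library's real internals.

```ts
// Sketch only; all names below are illustrative, not the real API.
interface LLMRequest { prompt: string }        // stand-in shape
interface LLMStreamChunk { delta: string }     // stand-in shape
interface Provider {
  name: string;
  streamResponse(req: LLMRequest): AsyncIterable<LLMStreamChunk>;
}
interface CircuitBreaker {
  isOpen(p: Provider): boolean;
  recordSuccess(p: Provider): void;
  recordFailure(p: Provider): void;
}
interface FactoryHooks {
  onRequestStart?(p: Provider): void;
  onRequestEnd?(p: Provider): void;
  onFallback?(p: Provider, err: unknown): void;
}

async function* streamWithFallback(
  candidates: Provider[],  // (1) same ordering generateResponse() would use
  request: LLMRequest,
  circuit: CircuitBreaker,
  hooks: FactoryHooks
): AsyncIterable<LLMStreamChunk> {
  for (const provider of candidates) {
    if (circuit.isOpen(provider)) continue;    // skip open circuits
    hooks.onRequestStart?.(provider);
    let emitted = false;
    try {
      for await (const chunk of provider.streamResponse(request)) {
        emitted = true;
        yield chunk;
      }
      circuit.recordSuccess(provider);         // (4) completion = success
      hooks.onRequestEnd?.(provider);          // (5) fires at stream close
      return;
    } catch (err) {
      circuit.recordFailure(provider);         // (4) error = failure
      if (emitted) throw err;                  // (3) mid-stream: surface, don't retry
      hooks.onFallback?.(provider, err);       // (2) pre-stream: try next provider
    }
  }
  throw new Error("all providers in the fallback chain failed");
}
```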

Why this unblocks Phase D

AEGIS daemon's current streaming path (dispatch.ts:514-545) manually orchestrates:

  • Stream Claude via the raw Anthropic SDK
  • On failure, import and call `executeCerebrasReasoning` or `executeCerebrasMid`
  • Emit fallback result as a single delta

That is custom failover logic: exactly the thing llm-providers is supposed to own (see the sketch after this list). Without factory-level streaming, Phase D has two bad options:

  1. Keep the custom orchestration (violates the no-bolted-in-LLM-logic rule we just committed to)
  2. Lose failover on streaming calls (unacceptable UX)
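With factory-level streaming, that whole path could collapse to roughly the sketch below (`factory` and `sendDelta` are placeholders): fallback, circuit state, and exhaustion all live behind the one call.

```ts
// Hypothetical replacement for the manual orchestration in dispatch.ts:
// the daemon forwards deltas and owns no failover logic.
for await (const chunk of factory.generateResponseStream(request)) {
  sendDelta(chunk.delta);
}
```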

Priority

HIGH — direct blocker for AEGIS daemon Phase D migration. No workaround that preserves the architectural goal.

Related

  • AEGIS daemon memory: `project_phase_d_llm_providers.md` — full Phase D scoping
  • Policy: `feedback_no_bolted_llm_logic.md` — the rule that makes this a blocker

🤖 Filed by AEGIS during Phase D scoping session
