Summary
`LLMProviderFactory.generateResponse()` is non-streaming only. Per-provider `streamResponse()` exists, but it bypasses the factory's fallback rules, circuit breakers, and exhaustion registry. Consumers that need streaming with failover (e.g., AEGIS daemon's `/chat` endpoint) have to either:
- Lose failover semantics when streaming (unacceptable — streaming is the primary user-facing surface), or
- Reimplement the orchestration on the consumer side — which is exactly the pattern we're trying to eliminate by standardizing on llm-providers.
This is a direct blocker for the AEGIS daemon Phase D migration (full adoption of llm-providers, removing all bolted-in LLM logic).
Proposal
Add factory-level streaming that integrates with the existing fallback/circuit/exhaustion infrastructure:
```ts
class LLMProviderFactory {
  generateResponseStream(
    request: LLMRequest
  ): Promise<ReadableStream<LLMStreamChunk>>; // LLMStreamChunk: chunk type TBD
  // or
  generateResponseStream(request: LLMRequest): AsyncIterable<LLMStreamChunk>;
}
```
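To make the API-shape tradeoff concrete, here is a sketch of how a consumer would drive each proposed return type. The `Chunk` shape and the `deltas()` generator are illustrative assumptions, not part of the existing library:

```typescript
// Illustrative chunk shape — the real LLMStreamChunk type is still TBD.
type Chunk = { delta: string };

// Hypothetical stand-in for the factory's underlying stream.
async function* deltas(): AsyncGenerator<Chunk> {
  yield { delta: "Hel" };
  yield { delta: "lo" };
}

// AsyncIterable form: idiomatic for-await consumption.
async function consumeIterable(stream: AsyncIterable<Chunk>): Promise<string> {
  let text = "";
  for await (const { delta } of stream) text += delta;
  return text;
}

// ReadableStream form: composes with web-stream piping, but needs
// explicit reader handling.
async function consumeReadable(stream: ReadableStream<Chunk>): Promise<string> {
  const reader = stream.getReader();
  let text = "";
  for (;;) {
    const { value, done } = await reader.read();
    if (done) return text;
    text += value.delta;
  }
}
```

The AsyncIterable form reads more naturally in application code; the ReadableStream form interoperates with `pipeThrough` and `Response` bodies. Returning a ReadableStream that also implements `Symbol.asyncIterator` (as Node's web-streams implementation does) would cover both.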
Behavior:
- Provider selection — identical to `generateResponse()`: explicit model, default provider, cost optimization, exhaustion filter.
- Pre-stream fallback — on stream-start errors (401, 429, 503, circuit open, exhausted), fail over to the next provider in the fallback chain before emitting the first chunk. This is the primary failover path and must work transparently.
- Mid-stream policy — explicitly documented policy (recommend: do not retry mid-stream; surface error + partial content and let consumer decide). Mid-stream retries risk emitting duplicate tokens and are rarely the right call.
- Circuit breaker integration — stream-start errors count as failures for circuit state transitions; successful stream completion counts as success.
- Observability — same `onRequestStart` / `onRequestEnd` / `onFallback` hooks fire for streaming calls. `onRequestEnd` fires at stream close.
- Tool calls in streams — out of scope for this issue (tools + streaming is a separate concern). Flag as not-yet-supported in the API docs for this first cut.
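The pre-stream fallback behavior above can be sketched as a loop over the fallback chain. `Provider`, the `chain` argument, and the `onFallback` callback are simplified stand-ins for the factory's real internals:

```typescript
// Simplified provider interface — stream-start errors (401/429/503,
// circuit open, exhausted) surface before the first chunk is yielded.
interface Provider {
  name: string;
  stream(prompt: string): AsyncIterable<string>;
}

async function* streamWithFallback(
  chain: Provider[],
  prompt: string,
  onFallback: (from: string, err: unknown) => void = () => {},
): AsyncIterable<string> {
  let lastErr: unknown;
  for (const provider of chain) {
    let source: AsyncIterator<string>;
    try {
      source = provider.stream(prompt)[Symbol.asyncIterator]();
    } catch (err) {
      lastErr = err;
      onFallback(provider.name, err); // stream-start failure, nothing emitted
      continue;
    }
    let first: IteratorResult<string>;
    try {
      first = await source.next(); // first chunk still counts as stream start
    } catch (err) {
      lastErr = err;
      onFallback(provider.name, err);
      continue;
    }
    // First chunk arrived: committed to this provider. Per the mid-stream
    // policy above, later errors propagate to the consumer.
    while (!first.done) {
      yield first.value;
      first = await source.next();
    }
    return;
  }
  throw lastErr ?? new Error("no providers in fallback chain");
}
```

Awaiting the first chunk inside the try block is what makes failover transparent: nothing has been emitted yet, so switching providers cannot duplicate tokens — which is exactly why mid-stream retries are excluded.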
Why this unblocks Phase D
AEGIS daemon's current streaming path (dispatch.ts:514-545) manually orchestrates:
- Stream Claude via the raw Anthropic SDK
- On failure, import and call `executeCerebrasReasoning` or `executeCerebrasMid`
- Emit fallback result as a single delta
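Assuming factory-level streaming lands, the manual orchestration above would collapse to a single factory call plus a forwarding loop. The names below (`Chunk`, `handleChat`, `emit`) are hypothetical illustrations, not the daemon's actual code:

```typescript
// Illustrative chunk shape — the real LLMStreamChunk type is still TBD.
type Chunk = { delta: string };
type EmitDelta = (delta: string) => void;

// Hypothetical "after" shape for the dispatch.ts path: no manual
// Anthropic/Cerebras orchestration, because pre-stream failover already
// happened inside the factory before the first chunk arrived.
async function handleChat(
  stream: AsyncIterable<Chunk>, // e.g. factory.generateResponseStream(request)
  emit: EmitDelta,              // e.g. writes an SSE `data:` frame per delta
): Promise<void> {
  for await (const { delta } of stream) emit(delta);
}
```

Note this also fixes the UX regression in the current fallback path: deltas keep streaming per-token instead of arriving as a single lump when a fallback provider answers.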
That is custom failover logic — the exact thing llm-providers is supposed to own. Without factory-level streaming, Phase D has two bad options:
- Keep the custom orchestration (violates the no-bolted-in-LLM-logic rule we just committed to)
- Lose failover on streaming calls (unacceptable UX)
Priority
HIGH — direct blocker for AEGIS daemon Phase D migration. No workaround that preserves the architectural goal.
Related
- AEGIS daemon memory: `project_phase_d_llm_providers.md` — full Phase D scoping
- Policy: `feedback_no_bolted_llm_logic.md` — the rule that makes this a blocker
🤖 Filed by AEGIS during Phase D scoping session