Deferred from the quality-audit fix pass (#6). Token-cost metering was added in that PR; the fallback half remains.
Location: src/rag/generator.ts:85-139
Problem: the AI path has no model fallback. On any Bedrock failure (throttle, timeout, or 5xx) it returns a static "having trouble" message and counts an LLMError. There is no attempt at a secondary, cheaper, or cached model, so a transient primary-model issue degrades straight to a non-answer.
Proposed fix: turn the model call into a small router that tries the primary llmModelId and, on throttle or timeout, falls back to a secondary (cheaper/faster) model id from a new optional config. Keep the existing per-call timeout and the token metering on every attempt.
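A minimal sketch of the proposed router, assuming the Bedrock call can be wrapped in an injected invoke(modelId, prompt) function that reports token usage. InvokeFn, MeterFn, RouterConfig, fallbackModelId, LlmTimeoutError and generateWithFallback are hypothetical names, not taken from generator.ts. The throttle check also assumes the SDK error's name is "ThrottlingException"; verify that against the client actually in use.

```typescript
// Sketch of a primary -> fallback model router. All names here are
// hypothetical; none are taken from src/rag/generator.ts.

interface ModelResult {
  text: string;
  inputTokens: number;
  outputTokens: number;
}

// Injected model call, e.g. a thin wrapper around the Bedrock client.
type InvokeFn = (modelId: string, prompt: string) => Promise<ModelResult>;

// Injected token meter, called for every attempt that returns usage.
type MeterFn = (modelId: string, inputTokens: number, outputTokens: number) => void;

interface RouterConfig {
  llmModelId: string;
  fallbackModelId?: string; // the new optional config (hypothetical name)
  timeoutMs: number;        // existing per-call timeout, applied to each attempt
}

class LlmTimeoutError extends Error {
  constructor(modelId: string, ms: number) {
    super(`${modelId} timed out after ${ms} ms`);
    this.name = "LlmTimeoutError";
  }
}

// Race the call against a timer. The timer is always cleared; note this only
// stops waiting, it does not cancel the underlying request.
async function withTimeout<T>(p: Promise<T>, ms: number, modelId: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new LlmTimeoutError(modelId, ms)), ms);
  });
  try {
    return await Promise.race([p, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

// Fallback policy: only throttles and timeouts move on to the next model.
// ASSUMPTION: throttling surfaces as an Error named "ThrottlingException".
function shouldFallBack(err: unknown): boolean {
  return err instanceof Error &&
    (err.name === "LlmTimeoutError" || err.name === "ThrottlingException");
}

async function generateWithFallback(
  prompt: string,
  cfg: RouterConfig,
  invoke: InvokeFn,
  meter: MeterFn,
): Promise<ModelResult & { modelId: string }> {
  const chain = [cfg.llmModelId];
  if (cfg.fallbackModelId) chain.push(cfg.fallbackModelId);

  let lastErr: unknown;
  for (const modelId of chain) {
    try {
      const res = await withTimeout(invoke(modelId, prompt), cfg.timeoutMs, modelId);
      meter(modelId, res.inputTokens, res.outputTokens); // metered per attempt, tagged by model
      return { ...res, modelId };
    } catch (err) {
      lastErr = err;
      if (!shouldFallBack(err)) break; // other errors are not retried on another model
    }
  }
  // Chain exhausted: the caller keeps its existing "having trouble" reply and LLMError count.
  throw lastErr;
}
```

Per the proposed policy, only throttles and timeouts trigger the fallback; 5xx errors still go straight to the existing handler unless their names are added to shouldFallBack. Keeping invoke and meter injected means the fallback policy can be unit-tested without a live Bedrock client.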
Why deferred: it's a feature addition (needs a second model-id config + a deliberate fallback policy), not a bug fix; out of scope for the audit-fix pass.