feat(llm): multi-model routing with availability fallback #217
qiankunli wants to merge 6 commits into
lizhengfeng101 left a comment
Hey @qiankunli, thanks for this PR — the multi-model routing design is really well thought out. The cooldown mechanism, the single-member optimization, and the test coverage are all solid. I have a few pieces of feedback below, roughly ordered by severity.
🐛 Bug: req.Model leaks across router members
This one is a real issue that will break cross-provider fallover in practice.
In shared.go, the runtime pins Model: eps[0].Model, and this value flows into every ChatRequest.Model. Each client's CompletionsWithCtx prioritizes req.Model over its own cfg.Model:
model := req.Model
if model == "" {
    model = c.cfg.Model // only used when req.Model is empty
}
So when a request falls over from e.g. claude-opus-4-6 (Anthropic) to DeepSeek, the ChatRequest.Model is still "claude-opus-4-6". DeepSeek doesn't know that model and returns a 400/404. Then shouldFallover sees a client-side error and short-circuits, so the whole request fails immediately instead of trying the next member.
Suggested fix: have the router clear req.Model before forwarding to each member, so each client uses its own cfg.Model:
memberReq := req
memberReq.Model = "" // let each client use its own configured model
resp, err := r.members[i].client.CompletionsWithCtx(ctx, memberReq)
⚠️ 401/403 silently falling over may mask config errors
In falloverStatus, the default branch treats 401/403 as fallover-worthy:
case 400, 413, 422:
    return false
default:
    return true // 401/403/404/408/409/429/5xx
The comment says "a different provider/key/capacity may differ", which is true in theory. But in practice, 401/403 almost always means the API key is misconfigured. Silently skipping to the next provider makes that really hard to diagnose: the user sees a successful review from their fallback model and never realizes their primary provider's key is broken.
Would it make sense to at least log a warning for 401/403 hinting at a possible key misconfiguration? Or optionally treat them as non-fallover errors?
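To make the warning option concrete, here is a minimal Go sketch of a classifier that keeps the quoted switch as-is and adds a hint for 401/403. The name `isAuthError` comes from the follow-up commit message, but its signature here is a guess; `classify` and the log text are hypothetical and not taken from the PR's code.

```go
package main

import (
	"fmt"
	"log"
)

// isAuthError reports whether a status usually points at a bad or missing
// credential rather than at provider availability. Signature is assumed.
func isAuthError(status int) bool {
	return status == 401 || status == 403
}

// falloverStatus mirrors the switch quoted in the review: request-shaped
// errors stop the chain, everything else moves on to the next member.
func falloverStatus(status int) bool {
	switch status {
	case 400, 413, 422:
		return false
	default:
		return true // 401/403/404/408/409/429/5xx
	}
}

// classify is a hypothetical helper: it decides whether to fall over and
// returns a warning to log when the failure looks like a key problem.
func classify(member string, status int) (fallover bool, warning string) {
	fallover = falloverStatus(status)
	if fallover && isAuthError(status) {
		warning = fmt.Sprintf("[llm-router] %s returned %d: likely misconfigured api_key", member, status)
	}
	return fallover, warning
}

func main() {
	for _, status := range []int{400, 401, 429, 503} {
		fallover, warning := classify("primary", status)
		if warning != "" {
			log.Print(warning)
		}
		fmt.Printf("status=%d fallover=%t\n", status, fallover)
	}
}
```

With this shape the fallback still happens on 401/403, but the broken key leaves a trace in the logs even when the fallback member succeeds.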
📝 README localization incomplete
Per the project's CLAUDE.md: "When modifying README.md, always sync the changes to all localized versions."
The PR updates README.md and README.zh-CN.md, but README.ja-JP.md, README.ko-KR.md, and README.ru-RU.md are missing the new "Multi-model fallback" section and the two config table rows.
Minor notes
- `log.Printf` usage: the `log.Printf("[llm-router] ...")` call in `CompletionsWithCtx` introduces a dependency on the standard `log` package. Worth checking if the project has a preferred logging approach to keep output consistent.
- `stripModelSuffix` double call: `resolveModelRef` calls `stripModelSuffix(ep.Model)` at the end, but the caller chain (`ResolveEndpointWithModelOverride`) already strips the suffix. It's harmless (idempotent), just a bit redundant. Feel free to leave it if you prefer the safety.
- `MaxRetries` comment spans two lines: the struct field comment wraps to a second `//` line, which some Go tooling won't pick up cleanly. A single shorter line would be more conventional.
Overall this is a really nice feature addition. The main thing to address before merge is the req.Model bug — the rest are smaller suggestions. Looking forward to the next iteration! 🙌
… all locales
- router: a 401/403 from a member still falls over (the next member has its own key) but now logs a louder 'likely misconfigured api_key' hint so a healthy fallback doesn't silently mask a broken primary key. Adds an isAuthError classifier with unit coverage.
- docs: port the Multi-model fallback section and the two routing.* config rows to README.ja-JP / ko-KR / ru-RU, matching the en/zh versions.
Addresses reviewer follow-ups on PR alibaba#217.
Thanks for the thorough review, @lizhengfeng101 — really appreciate the detail. All points addressed in the latest commits; summary below. 🐛
📝 README localization ( Minor notes:
Thanks again! 🙌
Add an ordered model pool so a review falls over to another provider/model
when the primary is rate-limited, down, or timing out — instead of failing
the file.
- config: new `routing` namespace — `routing.models` ([{provider, model}],
priority order, reusing the existing `providers` map for credentials) and
`routing.policy` (only "priority" today; reserved for future policies, an
unknown value is rejected rather than silently ignored). Namespacing under
`routing` keeps it distinct from providers.<name>.models (a provider's model
catalog) and gives future routing knobs a home.
- LLMRouter implements LLMClient: tries members in order, advances on
availability errors (429/5xx/network), short-circuits on client-side errors
(400/413/422) and context cancellation. A per-run shared cooldown parks a
throttled model so concurrent per-file subtasks skip it.
- router members use a low SDK retry budget so a rate-limited model fails fast
to the next instead of burning the full backoff (MaxRetries now configurable;
default 5 preserved).
- docs: README.md / README.zh-CN.md config reference + Multi-model fallback.
No `routing.models` keeps the current single-model behavior; `--model` pins a
single endpoint. Tests cover fallover / short-circuit / exhaustion / cooldown,
error classification, config chain resolution, and policy validation.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
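As an illustration of the "unknown policy is rejected" rule described in this commit, the following Go sketch validates a routing config in memory. The `RoutingConfig` and `ModelRef` types, the `Validate` method, the error text, and the choice to treat an empty policy as "priority" are all assumptions made for the example; the PR's actual types and defaults may differ. The model names are placeholders.

```go
package main

import "fmt"

// ModelRef is one routing entry: a provider name plus the model to use there.
// Model may be empty, in which case the provider default is assumed to apply.
type ModelRef struct {
	Provider string
	Model    string
}

// RoutingConfig is a hypothetical in-memory form of the `routing` namespace.
type RoutingConfig struct {
	Models []ModelRef
	Policy string
}

// Validate accepts "priority" (and, as an assumption, an empty policy) and
// rejects any other value instead of silently ignoring it.
func (c RoutingConfig) Validate() error {
	switch c.Policy {
	case "", "priority":
	default:
		return fmt.Errorf("routing.policy: unknown policy %q", c.Policy)
	}
	for i, m := range c.Models {
		if m.Provider == "" {
			return fmt.Errorf("routing.models[%d]: provider is required", i)
		}
	}
	return nil
}

func main() {
	ok := RoutingConfig{
		Policy: "priority",
		Models: []ModelRef{{Provider: "anthropic", Model: "model-a"}, {Provider: "deepseek"}},
	}
	bad := RoutingConfig{Policy: "round-robin", Models: ok.Models}

	fmt.Println(ok.Validate())
	fmt.Println(bad.Validate())
}
```

Failing at load time keeps a typo in `routing.policy` from quietly degrading to the default ordering.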
- resolveModelRef: clear sub.Model so a top-level `model` cannot leak into a routing entry that omits its own model (model now comes only from ref.Model or the provider default).
- LLMRouter: when a call fails, stop and return ctx.Err() if the shared context is canceled or past its deadline; every member uses that ctx, so none can succeed. This avoids wasted fallover attempts and misleading logs. A per-request timeout (ctx still live) still falls over.
- order(): delete expired cooldown entries so the map stays bounded.
- ResolvedEndpoint.MaxRetries: clarify it is internal/router-set, not read from config.
Adds a router test for the context-done short-circuit.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
A web edit (5bbe6e9) accidentally pasted the for/if/if header twice in LLMRouter.order(), leaving unbalanced braces that broke the build. Remove the duplicate; the intended if/else cooldown handling is preserved. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The caller pins ChatRequest.Model to the primary's model name and member clients prefer req.Model over their own cfg.Model, so the router forwarded the primary's name to every member. After a cross-provider fallover that name is unknown to the new provider, yielding a client-side 400/404 that shouldFallover short-circuits — failing the request instead of falling over. Clear req.Model once at the router entry so each member uses its configured model. Adds a regression test asserting members receive an empty model.
d01d631 to ba5468a