feat(llm): multi-model routing with availability fallback by qiankunli · Pull Request #217 · alibaba/open-code-review

qiankunli · 2026-06-25T04:21:05Z

Add an ordered model pool so a review falls over to another provider/model when the primary is rate-limited, down, or timing out — instead of failing the file.

config: new routing namespace — routing.models ([{provider, model}], priority order, reusing the existing providers map for credentials) and routing.policy (only "priority" today; reserved for future policies, an unknown value is rejected rather than silently ignored). Namespacing under routing keeps it distinct from providers..models (a provider's model catalog) and gives future routing knobs a home.
LLMRouter implements LLMClient: tries members in order, advances on availability errors (429/5xx/network), short-circuits on client-side errors (400/413/422) and context cancellation. A per-run shared cooldown parks a throttled model so concurrent per-file subtasks skip it.
router members use a low SDK retry budget so a rate-limited model fails fast to the next instead of burning the full backoff (MaxRetries now configurable; default 5 preserved).
docs: README.md / README.zh-CN.md config reference + Multi-model fallback.

No routing.models keeps the current single-model behavior; --model pins a single endpoint. Tests cover fallover / short-circuit / exhaustion / cooldown, error classification, config chain resolution, and policy validation.

CLAassistant · 2026-06-25T04:21:11Z

All committers have signed the CLA.

github-actions

🔍 OpenCodeReview found 4 issue(s) in this PR.

✅ 4 posted as inline comment(s)
📝 0 posted as summary

lizhengfeng101

Hey @qiankunli, thanks for this PR — the multi-model routing design is really well thought out. The cooldown mechanism, the single-member optimization, and the test coverage are all solid. I have a few pieces of feedback below, roughly ordered by severity.

🐛 Bug: `req.Model` leaks across router members

This one is a real issue that will break cross-provider fallover in practice.

In shared.go, the runtime pins Model: eps[0].Model, and this value flows into every ChatRequest.Model. Each client's CompletionsWithCtx prioritizes req.Model over its own cfg.Model:

model := req.Model
if model == "" {
    model = c.cfg.Model  // only used when req.Model is empty
}

So when a request falls over from e.g. claude-opus-4-6 (Anthropic) to DeepSeek, the ChatRequest.Model is still "claude-opus-4-6" — DeepSeek doesn't know that model and returns a 400/404. Then shouldFallover sees a client-side error and short-circuits, so the whole request fails immediately instead of trying the next member.

Suggested fix: have the router clear req.Model before forwarding to each member, so each client uses its own cfg.Model:

memberReq := req
memberReq.Model = ""  // let each client use its own configured model
resp, err := r.members[i].client.CompletionsWithCtx(ctx, memberReq)

⚠️ 401/403 silently falling over may mask config errors

In falloverStatus, the default branch treats 401/403 as fallover-worthy:

case 400, 413, 422:
    return false
default:
    return true // 401/403/404/408/409/429/5xx

The comment says "a different provider/key/capacity may differ", which is true in theory. But in practice, 401/403 almost always means the API key is misconfigured. Silently skipping to the next provider makes that really hard to diagnose — the user sees a successful review from their fallback model and never realizes their primary provider's key is broken.

Would it make sense to at least log a warning for 401/403 hinting at a possible key misconfiguration? Or optionally treat them as non-fallover errors?

📝 README localization incomplete

Per the project's CLAUDE.md: "When modifying README.md, always sync the changes to all localized versions."

The PR updates README.md and README.zh-CN.md, but README.ja-JP.md, README.ko-KR.md, and README.ru-RU.md are missing the new "Multi-model fallback" section and the two config table rows.

Minor notes

log.Printf usage — the log.Printf("[llm-router] ...") call in CompletionsWithCtx introduces a dependency on the standard log package. Worth checking if the project has a preferred logging approach to keep output consistent.
stripModelSuffix double call — resolveModelRef calls stripModelSuffix(ep.Model) at the end, but the caller chain (ResolveEndpointWithModelOverride) already strips the suffix. It's harmless (idempotent), just a bit redundant — feel free to leave it if you prefer the safety.
MaxRetries comment spans two lines — the struct field comment wraps to a second // line, which some Go tooling won't pick up cleanly. A single shorter line would be more conventional.

Overall this is a really nice feature addition. The main thing to address before merge is the req.Model bug — the rest are smaller suggestions. Looking forward to the next iteration! 🙌

… all locales - router: a 401/403 from a member still falls over (the next member has its own key) but now logs a louder 'likely misconfigured api_key' hint so a healthy fallback doesn't silently mask a broken primary key. Adds an isAuthError classifier with unit coverage. - docs: port the Multi-model fallback section and the two routing.* config rows to README.ja-JP / ko-KR / ru-RU, matching the en/zh versions. Addresses reviewer follow-ups on PR alibaba#217.

qiankunli · 2026-06-26T04:31:43Z

Thanks for the thorough review, @lizhengfeng101 — really appreciate the detail. All points addressed in the latest commits; summary below.

🐛 req.Model leak across members (7d5b6a0) — Fixed. The router now clears req.Model once at its entry so each member falls back to its own cfg.Model. Forwarding the primary's name was exactly turning a cross-provider fallover into an unknown-model 400/404 that shouldFallover short-circuited. Added a regression test (TestLLMRouter_ClearsCallerModel) asserting members receive an empty model.

⚠️ 401/403 silently masking config errors (d01d631) — Went with the "log a warning" option rather than treating 401/403 as terminal. Each member has its own key, so a 401 on one member should still fall over to the next (keeps the pool resilient when only the primary key is broken). It now logs auth error (...) — likely a misconfigured api_key for this provider so a healthy fallback doesn't silently hide a broken primary key. Added an isAuthError classifier with unit coverage. Happy to make 401/403 terminal instead if you'd prefer fail-fast over resilience here.

📝 README localization (d01d631) — Ported the Multi-model fallback section and the two routing.* config rows to README.ja-JP.md / README.ko-KR.md / README.ru-RU.md, matching the en/zh versions.

Minor notes:

log.Printf — kept as-is: the codebase uses the standard log package throughout (no shared logger), so this stays consistent with the existing style.
stripModelSuffix double call — left it, and it turns out it isn't actually redundant on this path: resolveModelRef calls tryProviderConfig directly (which doesn't strip), not ResolveEndpointWithModelOverride. So resolver.go:222 is the only strip for routing-pool entries — removing it would drop suffix stripping for routing.models.
MaxRetries comment — it's now a single line and clarified to note the field is set programmatically by NewLLMRouter and not read from config, which covers both the line-wrap nit and the "is this intentionally programmatic-only?" question.

Thanks again! 🙌

Add an ordered model pool so a review falls over to another provider/model when the primary is rate-limited, down, or timing out — instead of failing the file. - config: new `routing` namespace — `routing.models` ([{provider, model}], priority order, reusing the existing `providers` map for credentials) and `routing.policy` (only "priority" today; reserved for future policies, an unknown value is rejected rather than silently ignored). Namespacing under `routing` keeps it distinct from providers.<name>.models (a provider's model catalog) and gives future routing knobs a home. - LLMRouter implements LLMClient: tries members in order, advances on availability errors (429/5xx/network), short-circuits on client-side errors (400/413/422) and context cancellation. A per-run shared cooldown parks a throttled model so concurrent per-file subtasks skip it. - router members use a low SDK retry budget so a rate-limited model fails fast to the next instead of burning the full backoff (MaxRetries now configurable; default 5 preserved). - docs: README.md / README.zh-CN.md config reference + Multi-model fallback. No `routing.models` keeps the current single-model behavior; `--model` pins a single endpoint. Tests cover fallover / short-circuit / exhaustion / cooldown, error classification, config chain resolution, and policy validation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- resolveModelRef: clear sub.Model so a top-level `model` cannot leak into a routing entry that omits its own model (model now comes only from ref.Model or the provider default). - LLMRouter: when a call fails, stop and return ctx.Err() if the shared context is canceled or past its deadline — every member uses that ctx, so none can succeed; avoids wasted fallover attempts and misleading logs. A per-request timeout (ctx still live) still falls over. - order(): delete expired cooldown entries so the map stays bounded. - ResolvedEndpoint.MaxRetries: clarify it is internal/router-set, not read from config. Adds a router test for the context-done short-circuit. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

A web edit (5bbe6e9) accidentally pasted the for/if/if header twice in LLMRouter.order(), leaving unbalanced braces that broke the build. Remove the duplicate; the intended if/else cooldown handling is preserved. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The caller pins ChatRequest.Model to the primary's model name and member clients prefer req.Model over their own cfg.Model, so the router forwarded the primary's name to every member. After a cross-provider fallover that name is unknown to the new provider, yielding a client-side 400/404 that shouldFallover short-circuits — failing the request instead of falling over. Clear req.Model once at the router entry so each member uses its configured model. Adds a regression test asserting members receive an empty model.

… all locales - router: a 401/403 from a member still falls over (the next member has its own key) but now logs a louder 'likely misconfigured api_key' hint so a healthy fallback doesn't silently mask a broken primary key. Adds an isAuthError classifier with unit coverage. - docs: port the Multi-model fallback section and the two routing.* config rows to README.ja-JP / ko-KR / ru-RU, matching the en/zh versions. Addresses reviewer follow-ups on PR alibaba#217.

github-actions Bot reviewed Jun 25, 2026

View reviewed changes

Comment thread internal/llm/client.go

Comment thread internal/llm/client.go

Comment thread internal/llm/resolver.go Outdated

Comment thread internal/llm/resolver.go

lizhengfeng101 reviewed Jun 26, 2026

View reviewed changes

liqiankun1111 and others added 6 commits June 27, 2026 14:38

Update internal/llm/client.go

ff62867

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

qiankunli force-pushed the feat/llm-multi-model branch from d01d631 to ba5468a Compare June 27, 2026 06:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(llm): multi-model routing with availability fallback#217

feat(llm): multi-model routing with availability fallback#217
qiankunli wants to merge 6 commits into
alibaba:mainfrom
qiankunli:feat/llm-multi-model

qiankunli commented Jun 25, 2026 •

edited

Loading

Uh oh!

CLAassistant commented Jun 25, 2026 •

edited

Loading

Uh oh!

github-actions Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

lizhengfeng101 left a comment

Uh oh!

qiankunli commented Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Conversation

qiankunli commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

CLAassistant commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

lizhengfeng101 left a comment

Choose a reason for hiding this comment

🐛 Bug: req.Model leaks across router members

⚠️ 401/403 silently falling over may mask config errors

📝 README localization incomplete

Minor notes

Uh oh!

qiankunli commented Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

qiankunli commented Jun 25, 2026 •

edited

Loading

CLAassistant commented Jun 25, 2026 •

edited

Loading

🐛 Bug: `req.Model` leaks across router members