Skip to content

feat(llm): multi-model routing with availability fallback#217

Open
qiankunli wants to merge 6 commits into
alibaba:mainfrom
qiankunli:feat/llm-multi-model
Open

feat(llm): multi-model routing with availability fallback#217
qiankunli wants to merge 6 commits into
alibaba:mainfrom
qiankunli:feat/llm-multi-model

Conversation

@qiankunli

@qiankunli qiankunli commented Jun 25, 2026

Copy link
Copy Markdown

Add an ordered model pool so a review falls over to another provider/model when the primary is rate-limited, down, or timing out — instead of failing the file.

  • config: new routing namespace — routing.models ([{provider, model}], priority order, reusing the existing providers map for credentials) and routing.policy (only "priority" today; reserved for future policies, an unknown value is rejected rather than silently ignored). Namespacing under routing keeps it distinct from providers..models (a provider's model catalog) and gives future routing knobs a home.
  • LLMRouter implements LLMClient: tries members in order, advances on availability errors (429/5xx/network), short-circuits on client-side errors (400/413/422) and context cancellation. A per-run shared cooldown parks a throttled model so concurrent per-file subtasks skip it.
  • router members use a low SDK retry budget so a rate-limited model fails fast to the next instead of burning the full backoff (MaxRetries now configurable; default 5 preserved).
  • docs: README.md / README.zh-CN.md config reference + Multi-model fallback.

No routing.models keeps the current single-model behavior; --model pins a single endpoint. Tests cover fallover / short-circuit / exhaustion / cooldown, error classification, config chain resolution, and policy validation.

@CLAassistant

CLAassistant commented Jun 25, 2026

Copy link
Copy Markdown

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍 OpenCodeReview found 4 issue(s) in this PR.

  • ✅ 4 posted as inline comment(s)
  • 📝 0 posted as summary

Comment thread internal/llm/client.go
Comment thread internal/llm/client.go
Comment thread internal/llm/resolver.go Outdated
Comment thread internal/llm/resolver.go

@lizhengfeng101 lizhengfeng101 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @qiankunli, thanks for this PR — the multi-model routing design is really well thought out. The cooldown mechanism, the single-member optimization, and the test coverage are all solid. I have a few pieces of feedback below, roughly ordered by severity.


🐛 Bug: req.Model leaks across router members

This one is a real issue that will break cross-provider fallover in practice.

In shared.go, the runtime pins Model: eps[0].Model, and this value flows into every ChatRequest.Model. Each client's CompletionsWithCtx prioritizes req.Model over its own cfg.Model:

model := req.Model
if model == "" {
    model = c.cfg.Model  // only used when req.Model is empty
}

So when a request falls over from e.g. claude-opus-4-6 (Anthropic) to DeepSeek, the ChatRequest.Model is still "claude-opus-4-6" — DeepSeek doesn't know that model and returns a 400/404. Then shouldFallover sees a client-side error and short-circuits, so the whole request fails immediately instead of trying the next member.

Suggested fix: have the router clear req.Model before forwarding to each member, so each client uses its own cfg.Model:

memberReq := req
memberReq.Model = ""  // let each client use its own configured model
resp, err := r.members[i].client.CompletionsWithCtx(ctx, memberReq)

⚠️ 401/403 silently falling over may mask config errors

In falloverStatus, the default branch treats 401/403 as fallover-worthy:

case 400, 413, 422:
    return false
default:
    return true // 401/403/404/408/409/429/5xx

The comment says "a different provider/key/capacity may differ", which is true in theory. But in practice, 401/403 almost always means the API key is misconfigured. Silently skipping to the next provider makes that really hard to diagnose — the user sees a successful review from their fallback model and never realizes their primary provider's key is broken.

Would it make sense to at least log a warning for 401/403 hinting at a possible key misconfiguration? Or optionally treat them as non-fallover errors?


📝 README localization incomplete

Per the project's CLAUDE.md: "When modifying README.md, always sync the changes to all localized versions."

The PR updates README.md and README.zh-CN.md, but README.ja-JP.md, README.ko-KR.md, and README.ru-RU.md are missing the new "Multi-model fallback" section and the two config table rows.


Minor notes

  • log.Printf usage — the log.Printf("[llm-router] ...") call in CompletionsWithCtx introduces a dependency on the standard log package. Worth checking if the project has a preferred logging approach to keep output consistent.

  • stripModelSuffix double callresolveModelRef calls stripModelSuffix(ep.Model) at the end, but the caller chain (ResolveEndpointWithModelOverride) already strips the suffix. It's harmless (idempotent), just a bit redundant — feel free to leave it if you prefer the safety.

  • MaxRetries comment spans two lines — the struct field comment wraps to a second // line, which some Go tooling won't pick up cleanly. A single shorter line would be more conventional.


Overall this is a really nice feature addition. The main thing to address before merge is the req.Model bug — the rest are smaller suggestions. Looking forward to the next iteration! 🙌

qiankunli added a commit to qiankunli/open-codereview that referenced this pull request Jun 26, 2026
… all locales

- router: a 401/403 from a member still falls over (the next member has its
  own key) but now logs a louder 'likely misconfigured api_key' hint so a
  healthy fallback doesn't silently mask a broken primary key. Adds an
  isAuthError classifier with unit coverage.
- docs: port the Multi-model fallback section and the two routing.* config
  rows to README.ja-JP / ko-KR / ru-RU, matching the en/zh versions.

Addresses reviewer follow-ups on PR alibaba#217.
@qiankunli

Copy link
Copy Markdown
Author

Thanks for the thorough review, @lizhengfeng101 — really appreciate the detail. All points addressed in the latest commits; summary below.

🐛 req.Model leak across members (7d5b6a0) — Fixed. The router now clears req.Model once at its entry so each member falls back to its own cfg.Model. Forwarding the primary's name was exactly turning a cross-provider fallover into an unknown-model 400/404 that shouldFallover short-circuited. Added a regression test (TestLLMRouter_ClearsCallerModel) asserting members receive an empty model.

⚠️ 401/403 silently masking config errors (d01d631) — Went with the "log a warning" option rather than treating 401/403 as terminal. Each member has its own key, so a 401 on one member should still fall over to the next (keeps the pool resilient when only the primary key is broken). It now logs auth error (...) — likely a misconfigured api_key for this provider so a healthy fallback doesn't silently hide a broken primary key. Added an isAuthError classifier with unit coverage. Happy to make 401/403 terminal instead if you'd prefer fail-fast over resilience here.

📝 README localization (d01d631) — Ported the Multi-model fallback section and the two routing.* config rows to README.ja-JP.md / README.ko-KR.md / README.ru-RU.md, matching the en/zh versions.

Minor notes:

  • log.Printf — kept as-is: the codebase uses the standard log package throughout (no shared logger), so this stays consistent with the existing style.
  • stripModelSuffix double call — left it, and it turns out it isn't actually redundant on this path: resolveModelRef calls tryProviderConfig directly (which doesn't strip), not ResolveEndpointWithModelOverride. So resolver.go:222 is the only strip for routing-pool entries — removing it would drop suffix stripping for routing.models.
  • MaxRetries comment — it's now a single line and clarified to note the field is set programmatically by NewLLMRouter and not read from config, which covers both the line-wrap nit and the "is this intentionally programmatic-only?" question.

Thanks again! 🙌

liqiankun1111 and others added 6 commits June 27, 2026 14:38
Add an ordered model pool so a review falls over to another provider/model
when the primary is rate-limited, down, or timing out — instead of failing
the file.

- config: new `routing` namespace — `routing.models` ([{provider, model}],
  priority order, reusing the existing `providers` map for credentials) and
  `routing.policy` (only "priority" today; reserved for future policies, an
  unknown value is rejected rather than silently ignored). Namespacing under
  `routing` keeps it distinct from providers.<name>.models (a provider's model
  catalog) and gives future routing knobs a home.
- LLMRouter implements LLMClient: tries members in order, advances on
  availability errors (429/5xx/network), short-circuits on client-side errors
  (400/413/422) and context cancellation. A per-run shared cooldown parks a
  throttled model so concurrent per-file subtasks skip it.
- router members use a low SDK retry budget so a rate-limited model fails fast
  to the next instead of burning the full backoff (MaxRetries now configurable;
  default 5 preserved).
- docs: README.md / README.zh-CN.md config reference + Multi-model fallback.

No `routing.models` keeps the current single-model behavior; `--model` pins a
single endpoint. Tests cover fallover / short-circuit / exhaustion / cooldown,
error classification, config chain resolution, and policy validation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- resolveModelRef: clear sub.Model so a top-level `model` cannot leak into a
  routing entry that omits its own model (model now comes only from ref.Model
  or the provider default).
- LLMRouter: when a call fails, stop and return ctx.Err() if the shared context
  is canceled or past its deadline — every member uses that ctx, so none can
  succeed; avoids wasted fallover attempts and misleading logs. A per-request
  timeout (ctx still live) still falls over.
- order(): delete expired cooldown entries so the map stays bounded.
- ResolvedEndpoint.MaxRetries: clarify it is internal/router-set, not read from
  config.

Adds a router test for the context-done short-circuit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
A web edit (5bbe6e9) accidentally pasted the for/if/if header twice in
LLMRouter.order(), leaving unbalanced braces that broke the build. Remove
the duplicate; the intended if/else cooldown handling is preserved.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The caller pins ChatRequest.Model to the primary's model name and member
clients prefer req.Model over their own cfg.Model, so the router forwarded
the primary's name to every member. After a cross-provider fallover that
name is unknown to the new provider, yielding a client-side 400/404 that
shouldFallover short-circuits — failing the request instead of falling over.
Clear req.Model once at the router entry so each member uses its configured
model. Adds a regression test asserting members receive an empty model.
… all locales

- router: a 401/403 from a member still falls over (the next member has its
  own key) but now logs a louder 'likely misconfigured api_key' hint so a
  healthy fallback doesn't silently mask a broken primary key. Adds an
  isAuthError classifier with unit coverage.
- docs: port the Multi-model fallback section and the two routing.* config
  rows to README.ja-JP / ko-KR / ru-RU, matching the en/zh versions.

Addresses reviewer follow-ups on PR alibaba#217.
@qiankunli qiankunli force-pushed the feat/llm-multi-model branch from d01d631 to ba5468a Compare June 27, 2026 06:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants