diff --git a/README.ja-JP.md b/README.ja-JP.md index 45f0a8d..ad4a4bb 100644 --- a/README.ja-JP.md +++ b/README.ja-JP.md @@ -662,6 +662,8 @@ OCRは4層の優先度チェーンを使ってレビュールールを解決し | `providers..extra_body` | object | すべてのリクエストボディにマージされるJSONオブジェクト | | `providers..extra_headers` | string | カンマ区切りの `key=value` HTTPヘッダー | | `custom_providers..*` | — | 任意の`models`を含む`providers..*`と同じフィールド | +| `routing.models` | array | フェイルオーバー用の順序付きモデルプール:`[{provider, model}]`([マルチモデルフェイルオーバー](#マルチモデルフェイルオーバー)を参照) | +| `routing.policy` | string | 選択ポリシー;`priority`(デフォルト、現時点で唯一の値) | | `llm.url` | string | `https://api.openai.com/v1/chat/completions` | | `llm.auth_token` | string | `sk-xxxxxxx` | | `llm.auth_header` | string | Anthropicのみ:`x-api-key` \| `authorization` | @@ -688,6 +690,32 @@ OCRは4層の優先度チェーンを使ってレビュールールを解決し | `OCR_LLM_MODEL` | モデル名 | | `OCR_USE_ANTHROPIC` | `true` = Anthropic、`false` = OpenAI | +### マルチモデルフェイルオーバー + +デフォルトでは、レビューは単一のモデル(`provider` + `model`)を使用します。レート制限やプロバイダー障害に耐えるには、順序付きの `routing.models` プールを設定します。レビュアーは各モデルを順番に試し、あるモデルがレート制限・サーバーエラー・タイムアウトになると次のモデルへフェイルオーバーします: + +```json +{ + "providers": { + "anthropic": { "api_key": "sk-ant-...", "model": "claude-opus-4-6" }, + "deepseek": { "api_key": "sk-...", "model": "deepseek-v3" } + }, + "routing": { + "models": [ + { "provider": "anthropic", "model": "claude-opus-4-6" }, + { "provider": "deepseek", "model": "deepseek-v3" } + ], + "policy": "priority" + } +} +``` + +- 各エントリは、設定済みのプロバイダー(認証情報 / エンドポイント用)とモデルを参照します。`model` を省略した場合はプロバイダーのデフォルトモデルを使用します。 +- `routing.policy` はプールの順序付け方法を選択します。現在サポートされているのは `priority` のみ(最初のエントリがプライマリ)で、このフィールドは将来のポリシー(例:weighted)のために予約されており、未知の値は暗黙に無視されるのではなく拒否されます。 +- レート制限中または利用不可のモデルは一時的に保留され、並行するファイル単位のレビューが再ヒットせずスキップできます。 +- フェイルオーバーは可用性エラー(レート制限、5xx、ネットワーク / タイムアウト)で発生します。クライアント側エラー(不正なリクエスト、ペイロード過大)では発生しません。別のモデルでも同様に失敗するためです。 +- `routing.models` がなければ動作は変わりません。`--model` は単一モデルを固定し、プールをバイパスします。 + ## テレメトリー diff --git a/README.ko-KR.md b/README.ko-KR.md index 3efd104..ef05be0 100644 --- a/README.ko-KR.md +++ b/README.ko-KR.md @@ -620,6 +620,8 @@ Config file: `~/.opencodereview/config.json` | `providers..extra_body` | object | 모든 요청 본문에 병합되는 JSON 객체 | | `providers..extra_headers` | string | 쉼표로 구분된 `key=value` HTTP 헤더 | | `custom_providers..*` | — | optional `models`를 포함한 `providers..*`과 동일한 필드 | +| `routing.models` | array | 페일오버용 정렬된 모델 풀: `[{provider, model}]` ([다중 모델 페일오버](#다중-모델-페일오버) 참조) | +| `routing.policy` | string | 선택 정책; `priority` (기본값, 현재 유일한 값) | | `llm.url` | string | `https://api.openai.com/v1/chat/completions` | | `llm.auth_token` | string | `sk-xxxxxxx` | | `llm.auth_header` | string | Anthropic only: `x-api-key` \| `authorization` | @@ -646,6 +648,32 @@ Config file: `~/.opencodereview/config.json` | `OCR_LLM_MODEL` | Model name | | `OCR_USE_ANTHROPIC` | `true` = Anthropic, `false` = OpenAI | +### 다중 모델 페일오버 + +기본적으로 리뷰는 단일 모델(`provider` + `model`)을 사용합니다. 속도 제한과 공급자 장애에 대응하려면 정렬된 `routing.models` 풀을 구성하세요. 리뷰어는 각 모델을 순서대로 시도하고, 어떤 모델이 속도 제한에 걸리거나 서버 오류를 반환하거나 타임아웃되면 다음 모델로 페일오버합니다: + +```json +{ + "providers": { + "anthropic": { "api_key": "sk-ant-...", "model": "claude-opus-4-6" }, + "deepseek": { "api_key": "sk-...", "model": "deepseek-v3" } + }, + "routing": { + "models": [ + { "provider": "anthropic", "model": "claude-opus-4-6" }, + { "provider": "deepseek", "model": "deepseek-v3" } + ], + "policy": "priority" + } +} +``` + +- 각 항목은 구성된 공급자(자격 증명 / 엔드포인트용)와 모델을 참조합니다. `model`을 생략하면 공급자의 기본 모델을 사용합니다. +- `routing.policy`는 풀의 정렬 방식을 선택합니다. 현재는 `priority`만 지원되며(첫 번째 항목이 기본), 이 필드는 향후 정책(예: weighted)을 위해 예약되어 있고, 알 수 없는 값은 조용히 무시되지 않고 거부됩니다. +- 속도 제한에 걸렸거나 사용할 수 없는 모델은 잠시 보류되어, 동시에 실행되는 파일별 리뷰가 다시 시도하지 않고 건너뜁니다. +- 페일오버는 가용성 오류(속도 제한, 5xx, 네트워크 / 타임아웃)에서 발생합니다. 클라이언트 측 오류(잘못된 요청, 페이로드 초과)에서는 발생하지 않습니다. 다른 모델도 동일하게 실패하기 때문입니다. +- `routing.models`가 없으면 동작은 변경되지 않습니다. `--model`은 단일 모델을 고정하고 풀을 우회합니다. + ## Telemetry 관측성을 위한 OpenTelemetry 통합(spans, metrics)입니다. 기본값은 disabled입니다. diff --git a/README.md b/README.md index 047b29c..7417673 100644 --- a/README.md +++ b/README.md @@ -667,6 +667,8 @@ Config file: `~/.opencodereview/config.json` | `providers..extra_body` | object | JSON object merged into every request body | | `providers..extra_headers` | string | Comma-separated `key=value` HTTP headers | | `custom_providers..*` | — | Same fields as `providers..*`, including optional `models` | +| `routing.models` | array | Ordered model pool for failover: `[{provider, model}]` (see [Multi-model fallback](#multi-model-fallback)) | +| `routing.policy` | string | Selection policy; `priority` (default, only value today) | | `llm.url` | string | `https://api.openai.com/v1/chat/completions` | | `llm.auth_token` | string | `sk-xxxxxxx` | | `llm.auth_header` | string | Anthropic only: `x-api-key` \| `authorization` | @@ -693,6 +695,32 @@ Environment variables take precedence over the config file. | `OCR_LLM_MODEL` | Model name | | `OCR_USE_ANTHROPIC` | `true` = Anthropic, `false` = OpenAI | +### Multi-model fallback + +By default a review uses a single model (`provider` + `model`). To survive rate limits and provider outages, configure an ordered `routing.models` pool — the reviewer tries each in order and falls over to the next when one is rate-limited, returns a server error, or times out: + +```json +{ + "providers": { + "anthropic": { "api_key": "sk-ant-...", "model": "claude-opus-4-6" }, + "deepseek": { "api_key": "sk-...", "model": "deepseek-v3" } + }, + "routing": { + "models": [ + { "provider": "anthropic", "model": "claude-opus-4-6" }, + { "provider": "deepseek", "model": "deepseek-v3" } + ], + "policy": "priority" + } +} +``` + +- Each entry references a configured provider (for credentials / endpoint) and a model; an omitted `model` uses the provider's default. +- `routing.policy` selects how the pool is ordered. Only `priority` is supported today (first entry is primary); the field is reserved for future policies (e.g. weighted), and an unknown value is rejected rather than silently ignored. +- A rate-limited or unavailable model is briefly parked so concurrent per-file reviews skip it instead of each re-hitting it. +- Failover triggers on availability errors (rate limit, 5xx, network/timeout). Client-side errors (bad request, payload too large) do **not** trigger failover, since another model would fail identically. +- Without `routing.models`, behavior is unchanged. `--model` pins a single model and bypasses the pool. + ## Telemetry diff --git a/README.ru-RU.md b/README.ru-RU.md index 72e1274..830dc84 100644 --- a/README.ru-RU.md +++ b/README.ru-RU.md @@ -664,6 +664,8 @@ OCR разрешает правила ревью по цепочке приор | `providers..extra_body` | object | JSON-объект, добавляемый в каждое тело запроса | | `providers..extra_headers` | string | HTTP-заголовки `key=value` через запятую | | `custom_providers..*` | — | Те же поля, что и `providers..*`, включая необязательное `models` | +| `routing.models` | array | Упорядоченный пул моделей для отказоустойчивости: `[{provider, model}]` (см. [Отказоустойчивость с несколькими моделями](#отказоустойчивость-с-несколькими-моделями)) | +| `routing.policy` | string | Политика выбора; `priority` (по умолчанию, единственное значение на сегодня) | | `llm.url` | string | `https://api.openai.com/v1/chat/completions` | | `llm.auth_token` | string | `sk-xxxxxxx` | | `llm.auth_header` | string | Только для Anthropic: `x-api-key` \| `authorization` | @@ -690,6 +692,32 @@ OCR разрешает правила ревью по цепочке приор | `OCR_LLM_MODEL` | Имя модели | | `OCR_USE_ANTHROPIC` | `true` = Anthropic, `false` = OpenAI | +### Отказоустойчивость с несколькими моделями + +По умолчанию ревью использует одну модель (`provider` + `model`). Чтобы пережить ограничения частоты и сбои провайдеров, настройте упорядоченный пул `routing.models` — ревьюер пробует каждую по порядку и переключается на следующую, когда одна из них ограничена по частоте, возвращает серверную ошибку или истекает по тайм-ауту: + +```json +{ + "providers": { + "anthropic": { "api_key": "sk-ant-...", "model": "claude-opus-4-6" }, + "deepseek": { "api_key": "sk-...", "model": "deepseek-v3" } + }, + "routing": { + "models": [ + { "provider": "anthropic", "model": "claude-opus-4-6" }, + { "provider": "deepseek", "model": "deepseek-v3" } + ], + "policy": "priority" + } +} +``` + +- Каждая запись ссылается на настроенного провайдера (для учётных данных / эндпоинта) и модель; пропущенное `model` использует модель провайдера по умолчанию. +- `routing.policy` определяет порядок пула. Сегодня поддерживается только `priority` (первая запись — основная); поле зарезервировано для будущих политик (например, weighted), и неизвестное значение отклоняется, а не игнорируется молча. +- Ограниченная по частоте или недоступная модель ненадолго откладывается, чтобы параллельные пофайловые ревью пропускали её, а не обращались к ней повторно. +- Переключение срабатывает при ошибках доступности (ограничение частоты, 5xx, сеть / тайм-аут). Ошибки на стороне клиента (неверный запрос, слишком большой payload) не вызывают переключение, поскольку другая модель завершится с той же ошибкой. +- Без `routing.models` поведение не меняется. `--model` фиксирует одну модель и обходит пул. + ## Телеметрия diff --git a/README.zh-CN.md b/README.zh-CN.md index 01446a6..67ade2b 100644 --- a/README.zh-CN.md +++ b/README.zh-CN.md @@ -652,6 +652,8 @@ OCR 通过四层优先级链解析评审规则。每层采用首次匹配原则 | `providers..extra_body` | object | 合并到每个请求体的 JSON 对象 | | `providers..extra_headers` | string | 逗号分隔的 `key=value` HTTP 头 | | `custom_providers..*` | — | 与 `providers..*` 相同的字段,包括可选的 `models` | +| `routing.models` | array | 用于故障转移的有序模型池:`[{provider, model}]`(见[多模型故障转移](#多模型故障转移)) | +| `routing.policy` | string | 选择策略;`priority`(默认,目前唯一取值) | | `llm.url` | string | `https://api.openai.com/v1/chat/completions` | | `llm.auth_token` | string | `sk-xxxxxxx` | | `llm.auth_header` | string | 仅 Anthropic:`x-api-key` \| `authorization` | @@ -678,6 +680,32 @@ OCR 通过四层优先级链解析评审规则。每层采用首次匹配原则 | `OCR_LLM_MODEL` | 模型名称 | | `OCR_USE_ANTHROPIC` | `true` = Anthropic,`false` = OpenAI | +### 多模型故障转移 + +默认评审使用单一模型(`provider` + `model`)。为应对限流与供应商故障,可配置有序的 `routing.models` 池——评审按顺序尝试,当某个模型被限流、返回服务端错误或超时时,自动转移到下一个: + +```json +{ + "providers": { + "anthropic": { "api_key": "sk-ant-...", "model": "claude-opus-4-6" }, + "deepseek": { "api_key": "sk-...", "model": "deepseek-v3" } + }, + "routing": { + "models": [ + { "provider": "anthropic", "model": "claude-opus-4-6" }, + { "provider": "deepseek", "model": "deepseek-v3" } + ], + "policy": "priority" + } +} +``` + +- 每个条目引用一个已配置的供应商(提供凭据 / 端点)及一个模型;省略 `model` 时使用该供应商的默认模型。 +- `routing.policy` 决定池的排序方式。目前仅支持 `priority`(第一个为主模型);该字段为未来策略(如 weighted)预留,填入未知值会报错而非被静默忽略。 +- 被限流或不可用的模型会被短暂搁置,使并发的逐文件评审跳过它,而非各自重复命中。 +- 仅在可用性错误(限流、5xx、网络 / 超时)时转移。客户端错误(请求错误、负载过大)**不**触发转移,因为换个模型同样会失败。 +- 不配置 `routing.models` 时行为不变。`--model` 固定单一模型并绕过该池。 + ## 遥测 diff --git a/cmd/opencodereview/shared.go b/cmd/opencodereview/shared.go index 8d898a1..941c344 100644 --- a/cmd/opencodereview/shared.go +++ b/cmd/opencodereview/shared.go @@ -145,14 +145,14 @@ func loadLLMRuntime(tpl *template.Template, toolConfigPath, modelOverride string } tpl.ApplyLanguage(lang) - ep, err := llm.ResolveEndpointWithModelOverride(cfgPath, modelOverride) + eps, err := llm.ResolveModelsWithModelOverride(cfgPath, modelOverride) if err != nil { return nil, fmt.Errorf("resolve LLM endpoint: %w", err) } return &llmRuntime{ - Client: llm.NewLLMClient(ep), - Model: ep.Model, + Client: llm.NewLLMRouter(eps), + Model: eps[0].Model, PlanToolDefs: planToolDefs, MainToolDefs: mainToolDefs, Collector: tool.NewCommentCollector(), diff --git a/internal/llm/client.go b/internal/llm/client.go index afbec68..5874d13 100644 --- a/internal/llm/client.go +++ b/internal/llm/client.go @@ -5,7 +5,9 @@ package llm import ( "context" "encoding/json" + "errors" "fmt" + "log" "strings" "sync" "time" @@ -186,6 +188,8 @@ type ClientConfig struct { Timeout time.Duration // Request timeout ExtraBody map[string]any // Vendor-specific fields merged into every request body ExtraHeaders map[string]string // Extra HTTP headers sent with every request + MaxRetries int // SDK in-provider retry budget; 0 → default. Lowered for router members so a + // rate-limited model fails fast to the next instead of burning the full backoff. } // --- Factory --- @@ -200,6 +204,7 @@ func NewLLMClient(ep ResolvedEndpoint) LLMClient { AuthHeader: ep.AuthHeader, ExtraBody: ep.ExtraBody, ExtraHeaders: ep.ExtraHeaders, + MaxRetries: ep.MaxRetries, } if ep.Protocol == "anthropic" { return NewAnthropicClient(cfg) @@ -207,6 +212,169 @@ func NewLLMClient(ep ResolvedEndpoint) LLMClient { return NewOpenAIClient(cfg) } +func maxRetriesOrDefault(n int) int { + if n > 0 { + return n + } + return 5 // SDK default budget when caller doesn't constrain it +} + +// --- Multi-model router --- + +// Tunables for LLMRouter. A router member that returns a fallover-worthy error is +// parked for routerCooldown so concurrent subtasks skip it instead of each re-hitting +// a model that's down/throttled. Members get a low retry budget so a rate-limited +// model fails fast to the next rather than burning the full SDK backoff. +const ( + routerMemberRetries = 2 + routerCooldown = 30 * time.Second +) + +type routerMember struct { + client LLMClient + label string // "protocol/model" for logs +} + +// LLMRouter is an LLMClient over an ordered pool of models. On a fallover-worthy +// failure (rate limit / 5xx / network) it advances to the next member; client-side +// errors (bad request / payload too large) short-circuit since another model would +// fail identically. Cooldown state is shared across concurrent CompletionsWithCtx +// calls (one ocr run's per-file subtasks), so a throttled model is skipped fleet-wide. +// Selection is strict priority order today; the order() seam is where a weighted / +// capability policy would plug in. +type LLMRouter struct { + members []routerMember + mu sync.Mutex + cooldown map[int]time.Time // member index → parked-until +} + +// NewLLMRouter builds an LLMClient from an ordered pool. A pool of one returns a +// plain client (no router overhead, unchanged single-model behavior). +func NewLLMRouter(eps []ResolvedEndpoint) LLMClient { + if len(eps) == 1 { + return NewLLMClient(eps[0]) + } + members := make([]routerMember, len(eps)) + for i, ep := range eps { + if ep.MaxRetries == 0 { + ep.MaxRetries = routerMemberRetries + } + members[i] = routerMember{client: NewLLMClient(ep), label: ep.Protocol + "/" + ep.Model} + } + return &LLMRouter{members: members, cooldown: make(map[int]time.Time)} +} + +func (r *LLMRouter) CompletionsWithCtx(ctx context.Context, req ChatRequest) (*ChatResponse, error) { + // Each member is its own provider/model endpoint. The caller pins req.Model to the + // primary's model name, and member clients prefer req.Model over their own cfg.Model + // — so forwarding it verbatim sends the primary's name to every member. After a + // cross-provider fallover that name is unknown to the new provider → a client-side + // 400/404 that shouldFallover short-circuits, failing the whole request. Clear it so + // each member uses its configured model. (req is by value; this is our local copy.) + req.Model = "" + var lastErr error + for _, i := range r.order() { + resp, err := r.members[i].client.CompletionsWithCtx(ctx, req) + if err == nil { + return resp, nil + } + lastErr = err + if ctx.Err() != nil { + // The shared ctx is canceled or past its deadline: the overall budget is + // exhausted and no other member can succeed (they all use this ctx). Stop + // here rather than burning fallover attempts. A per-request timeout (ctx + // still live) is NOT caught here and still falls over below. + return nil, ctx.Err() + } + if !shouldFallover(err) { + return nil, err + } + r.park(i) + if isAuthError(err) { + // 401/403 still falls over (the next member has its own key), but a healthy + // fallback would otherwise silently mask a broken primary key. Log it loudly. + log.Printf("[llm-router] %s auth error (%v) — likely a misconfigured api_key for this provider; trying next model", r.members[i].label, err) + } else { + log.Printf("[llm-router] %s failed (%v) — trying next model", r.members[i].label, err) + } + } + return nil, fmt.Errorf("all %d models exhausted; last error: %w", len(r.members), lastErr) +} + +// order returns member indices in priority order with non-parked first; parked ones +// are appended (not dropped) so an all-parked pool is still attempted as last resort. +func (r *LLMRouter) order() []int { + r.mu.Lock() + defer r.mu.Unlock() + now := time.Now() + live := make([]int, 0, len(r.members)) + parked := make([]int, 0) + for i := range r.members { + if t, ok := r.cooldown[i]; ok { + if now.Before(t) { + parked = append(parked, i) + } else { + delete(r.cooldown, i) + live = append(live, i) + } + } else { + live = append(live, i) + } + } + return append(live, parked...) +} + +func (r *LLMRouter) park(i int) { + r.mu.Lock() + r.cooldown[i] = time.Now().Add(routerCooldown) + r.mu.Unlock() +} + +// shouldFallover reports whether err warrants trying the next model. Availability +// failures (rate limit, server, network) → yes; a caller-cancelled context or a +// client-side request error (same payload fails on every model) → no. +func shouldFallover(err error) bool { + if err == nil { + return false + } + if errors.Is(err, context.Canceled) { + return false + } + var aerr *anthropic.Error + if errors.As(err, &aerr) { + return falloverStatus(aerr.StatusCode) + } + var oerr *openai.Error + if errors.As(err, &oerr) { + return falloverStatus(oerr.StatusCode) + } + return true // unknown (network blip / timeout / parse) → next model may succeed +} + +func falloverStatus(code int) bool { + switch code { + case 400, 413, 422: + return false // bad request / payload too large / unprocessable: deterministic across models + default: + return true // 401/403/404/408/409/429/5xx: a different provider/key/capacity may differ + } +} + +// isAuthError reports whether err is a 401/403 from a provider — almost always a +// misconfigured api_key rather than a transient failure. Used only to escalate the +// fallover log; it does not change fallover behavior (the next member has its own key). +func isAuthError(err error) bool { + var aerr *anthropic.Error + if errors.As(err, &aerr) { + return aerr.StatusCode == 401 || aerr.StatusCode == 403 + } + var oerr *openai.Error + if errors.As(err, &oerr) { + return oerr.StatusCode == 401 || oerr.StatusCode == 403 + } + return false +} + // --- Token counting with tiktoken --- // modelTokenizerCache caches initialized tiktoken encoders keyed by encoding name. @@ -296,7 +464,7 @@ func NewOpenAIClient(cfg ClientConfig) *OpenAIClient { opts := []openaiopt.RequestOption{ openaiopt.WithAPIKey(cfg.APIKey), openaiopt.WithBaseURL(sdkBaseURL), - openaiopt.WithMaxRetries(5), + openaiopt.WithMaxRetries(maxRetriesOrDefault(cfg.MaxRetries)), openaiopt.WithHeader("User-Agent", userAgent("")), openaiopt.WithRequestTimeout(cfg.Timeout), } @@ -499,7 +667,7 @@ func NewAnthropicClient(cfg ClientConfig) *AnthropicClient { opts := []option.RequestOption{ option.WithBaseURL(sdkBaseURL), - option.WithMaxRetries(5), + option.WithMaxRetries(maxRetriesOrDefault(cfg.MaxRetries)), option.WithHeader("User-Agent", userAgent("claude")), option.WithRequestTimeout(cfg.Timeout), } diff --git a/internal/llm/resolver.go b/internal/llm/resolver.go index 8ed011d..d7044e5 100644 --- a/internal/llm/resolver.go +++ b/internal/llm/resolver.go @@ -19,6 +19,7 @@ type ResolvedEndpoint struct { Source string // human-readable config source label ExtraBody map[string]any // vendor-specific request body fields ExtraHeaders map[string]string // extra HTTP headers for the LLM request + MaxRetries int // internal SDK retry budget (0 = SDK default); not read from config — set by NewLLMRouter, low for pool members so a throttled one fails fast to the next } // Environment variable names for OCR-specific configuration. @@ -148,36 +149,130 @@ type providerEntryConfig struct { ExtraHeaders map[string]string `json:"extra_headers,omitempty"` } +// modelRef is one entry of the ordered routing pool: which configured provider to +// use and (optionally) which of its models. An empty Model falls back to the +// provider's default model. +type modelRef struct { + Provider string `json:"provider"` + Model string `json:"model,omitempty"` +} + +// routingConfig is the multi-model namespace: an ordered pool plus a selection +// policy. Grouping these under `routing` (vs a bare top-level list) gives the policy +// and future per-pool knobs a stable home, and avoids colliding with +// providers..models (which is a provider's model catalog, not a routing pool). +type routingConfig struct { + Models []modelRef `json:"models,omitempty"` // ordered pool; index 0 is primary + Policy string `json:"policy,omitempty"` // selection policy; only "priority" supported today +} + type configFile struct { Provider string `json:"provider,omitempty"` Model string `json:"model,omitempty"` + Routing routingConfig `json:"routing,omitempty"` Providers map[string]providerEntryConfig `json:"providers,omitempty"` CustomProviders map[string]providerEntryConfig `json:"custom_providers,omitempty"` Llm llmFileConfig `json:"llm,omitempty"` } -// tryOCRConfig reads the OCR config file. -func tryOCRConfig(path, modelOverride string) (ResolvedEndpoint, bool, error) { +// loadConfigFile reads + parses the OCR config file. ok=false (nil err) when absent. +func loadConfigFile(path string) (configFile, bool, error) { data, err := os.ReadFile(path) if err != nil { if os.IsNotExist(err) { - return ResolvedEndpoint{}, false, nil + return configFile{}, false, nil } - return ResolvedEndpoint{}, false, err + return configFile{}, false, err } - var cfg configFile if err := json.Unmarshal(data, &cfg); err != nil { - return ResolvedEndpoint{}, false, fmt.Errorf("parse config: %w", err) + return configFile{}, false, fmt.Errorf("parse config: %w", err) + } + return cfg, true, nil +} + +// tryOCRConfig resolves a single endpoint from the config file (the primary, for +// callers that want one client). `provider` wins when set; otherwise `models[0]` +// (the pool's primary); otherwise the legacy `llm` block. +func tryOCRConfig(path, modelOverride string) (ResolvedEndpoint, bool, error) { + cfg, ok, err := loadConfigFile(path) + if err != nil || !ok { + return ResolvedEndpoint{}, false, err } if cfg.Provider != "" { return tryProviderConfig(cfg, modelOverride) } + if len(cfg.Routing.Models) > 0 { + ep, err := resolveModelRef(cfg, cfg.Routing.Models[0]) + if err != nil { + return ResolvedEndpoint{}, false, err + } + return ep, true, nil + } return tryLegacyLlmConfig(cfg, modelOverride) } +// resolveModelRef resolves one pool entry, reusing tryProviderConfig by pinning it to +// this entry's provider. ref.Model is passed as the model override so it wins over the +// provider's default and is validated against the provider's available models. +func resolveModelRef(cfg configFile, ref modelRef) (ResolvedEndpoint, error) { + if ref.Provider == "" { + return ResolvedEndpoint{}, fmt.Errorf("models[] entry missing required 'provider' field") + } + sub := cfg + sub.Provider = ref.Provider + sub.Model = "" // don't let a top-level `model` leak into routing entries; model comes from ref.Model or the provider default + sub.Routing = routingConfig{} + ep, ok, err := tryProviderConfig(sub, ref.Model) + if err != nil { + return ResolvedEndpoint{}, err + } + if !ok || ep.URL == "" || ep.Token == "" || ep.Model == "" { + return ResolvedEndpoint{}, fmt.Errorf("models[] entry {provider:%q model:%q} did not resolve to a complete endpoint", ref.Provider, ref.Model) + } + ep.Model = stripModelSuffix(ep.Model) + return ep, nil +} + +// ResolveModels resolves the full ordered model pool for the router. +func ResolveModels(configPath string) ([]ResolvedEndpoint, error) { + return ResolveModelsWithModelOverride(configPath, "") +} + +// ResolveModelsWithModelOverride returns the ordered pool of endpoints. An explicit +// modelOverride (--model) bypasses the pool and pins a single endpoint. Without it, a +// config `models` list resolves to the whole chain; otherwise it falls back to the +// single-endpoint resolution (env / single provider / legacy / shell), wrapped as a +// one-element pool — so existing configs behave exactly as before. +func ResolveModelsWithModelOverride(configPath, modelOverride string) ([]ResolvedEndpoint, error) { + if strings.TrimSpace(modelOverride) == "" { + if cfg, ok, err := loadConfigFile(configPath); err != nil { + return nil, err + } else if ok && len(cfg.Routing.Models) > 0 { + if pol := strings.TrimSpace(cfg.Routing.Policy); pol != "" && pol != "priority" { + return nil, fmt.Errorf("unsupported routing.policy %q (only \"priority\" is supported)", pol) + } + eps := make([]ResolvedEndpoint, 0, len(cfg.Routing.Models)) + for _, ref := range cfg.Routing.Models { + ep, err := resolveModelRef(cfg, ref) + if err != nil { + return nil, err + } + eps = append(eps, ep) + } + return eps, nil + } + } + + ep, err := ResolveEndpointWithModelOverride(configPath, modelOverride) + if err != nil { + return nil, err + } + return []ResolvedEndpoint{ep}, nil +} + // tryProviderConfig resolves an endpoint from the provider-based configuration. func tryProviderConfig(cfg configFile, modelOverride string) (ResolvedEndpoint, bool, error) { preset, isPreset := LookupProvider(cfg.Provider) diff --git a/internal/llm/router_test.go b/internal/llm/router_test.go new file mode 100644 index 0000000..bdc6de1 --- /dev/null +++ b/internal/llm/router_test.go @@ -0,0 +1,230 @@ +package llm + +import ( + "context" + "encoding/json" + "errors" + "os" + "path/filepath" + "testing" + "time" + + anthropic "github.com/anthropics/anthropic-sdk-go" + openai "github.com/openai/openai-go/v3" +) + +// fakeClient is a programmable LLMClient for router tests: it records call count and +// returns a fixed resp/err. +type fakeClient struct { + calls int + gotModel string // req.Model seen on the most recent call + resp *ChatResponse + err error +} + +func (f *fakeClient) CompletionsWithCtx(_ context.Context, req ChatRequest) (*ChatResponse, error) { + f.calls++ + f.gotModel = req.Model + return f.resp, f.err +} + +func newRouter(members ...routerMember) *LLMRouter { + return &LLMRouter{members: members, cooldown: make(map[int]time.Time)} +} + +func TestLLMRouter_FalloverThenSuccess(t *testing.T) { + c0 := &fakeClient{err: errors.New("network blip")} // unknown error → fallover + c1 := &fakeClient{resp: &ChatResponse{ID: "ok"}} + r := newRouter(routerMember{c0, "a"}, routerMember{c1, "b"}) + + resp, err := r.CompletionsWithCtx(context.Background(), ChatRequest{}) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if resp.ID != "ok" { + t.Fatalf("expected fallback response, got %+v", resp) + } + if c0.calls != 1 || c1.calls != 1 { + t.Fatalf("calls c0=%d c1=%d, want 1/1", c0.calls, c1.calls) + } + + // member 0 is now parked: a second call should skip it and hit c1 directly. + if _, err := r.CompletionsWithCtx(context.Background(), ChatRequest{}); err != nil { + t.Fatalf("second call error: %v", err) + } + if c0.calls != 1 { + t.Fatalf("parked member retried: c0.calls=%d, want 1", c0.calls) + } + if c1.calls != 2 { + t.Fatalf("c1.calls=%d, want 2", c1.calls) + } +} + +func TestLLMRouter_ClientErrorShortCircuits(t *testing.T) { + c0 := &fakeClient{err: &openai.Error{StatusCode: 400}} // bad request → no fallover + c1 := &fakeClient{resp: &ChatResponse{ID: "ok"}} + r := newRouter(routerMember{c0, "a"}, routerMember{c1, "b"}) + + if _, err := r.CompletionsWithCtx(context.Background(), ChatRequest{}); err == nil { + t.Fatal("expected error to propagate, got nil") + } + if c1.calls != 0 { + t.Fatalf("next model tried on client-side error: c1.calls=%d, want 0", c1.calls) + } +} + +// The caller pins req.Model to the primary's model name; the router must not forward it +// to members, or a cross-provider fallover would send an unknown model name and 400. +func TestLLMRouter_ClearsCallerModel(t *testing.T) { + c0 := &fakeClient{err: errors.New("network blip")} // fall over to c1 + c1 := &fakeClient{resp: &ChatResponse{ID: "ok"}} + r := newRouter(routerMember{c0, "a"}, routerMember{c1, "b"}) + + if _, err := r.CompletionsWithCtx(context.Background(), ChatRequest{Model: "primary-model"}); err != nil { + t.Fatalf("unexpected error: %v", err) + } + if c0.gotModel != "" || c1.gotModel != "" { + t.Fatalf("router forwarded caller model: c0=%q c1=%q, want both empty", c0.gotModel, c1.gotModel) + } +} + +func TestLLMRouter_AllExhausted(t *testing.T) { + c0 := &fakeClient{err: errors.New("down")} + c1 := &fakeClient{err: errors.New("down")} + r := newRouter(routerMember{c0, "a"}, routerMember{c1, "b"}) + + _, err := r.CompletionsWithCtx(context.Background(), ChatRequest{}) + if err == nil { + t.Fatal("expected exhausted error, got nil") + } + if c0.calls != 1 || c1.calls != 1 { + t.Fatalf("calls c0=%d c1=%d, want both 1", c0.calls, c1.calls) + } +} + +func TestLLMRouter_StopsWhenContextDone(t *testing.T) { + c0 := &fakeClient{err: errors.New("boom")} + c1 := &fakeClient{resp: &ChatResponse{ID: "ok"}} + r := newRouter(routerMember{c0, "a"}, routerMember{c1, "b"}) + + ctx, cancel := context.WithCancel(context.Background()) + cancel() // shared budget exhausted — no member can succeed + + if _, err := r.CompletionsWithCtx(ctx, ChatRequest{}); err == nil { + t.Fatal("expected error when ctx is done, got nil") + } + if c1.calls != 0 { + t.Fatalf("fell over despite done ctx: c1.calls=%d, want 0", c1.calls) + } +} + +func TestShouldFallover(t *testing.T) { + cases := []struct { + name string + err error + want bool + }{ + {"nil", nil, false}, + {"unknown", errors.New("boom"), true}, + {"canceled", context.Canceled, false}, + {"openai 400", &openai.Error{StatusCode: 400}, false}, + {"openai 429", &openai.Error{StatusCode: 429}, true}, + {"anthropic 503", &anthropic.Error{StatusCode: 503}, true}, + {"anthropic 401", &anthropic.Error{StatusCode: 401}, true}, + } + for _, c := range cases { + if got := shouldFallover(c.err); got != c.want { + t.Errorf("%s: shouldFallover=%v, want %v", c.name, got, c.want) + } + } +} + +func TestIsAuthError(t *testing.T) { + cases := []struct { + name string + err error + want bool + }{ + {"nil", nil, false}, + {"openai 401", &openai.Error{StatusCode: 401}, true}, + {"openai 403", &openai.Error{StatusCode: 403}, true}, + {"openai 429", &openai.Error{StatusCode: 429}, false}, + {"anthropic 401", &anthropic.Error{StatusCode: 401}, true}, + {"anthropic 500", &anthropic.Error{StatusCode: 500}, false}, + {"unknown", errors.New("boom"), false}, + } + for _, c := range cases { + if got := isAuthError(c.err); got != c.want { + t.Errorf("%s: isAuthError=%v, want %v", c.name, got, c.want) + } + } +} + +func TestNewLLMRouter_SinglePoolNoRouter(t *testing.T) { + ep := ResolvedEndpoint{URL: "https://x.example.com", Protocol: "openai", Model: "m", Token: "t"} + if _, isRouter := NewLLMRouter([]ResolvedEndpoint{ep}).(*LLMRouter); isRouter { + t.Fatal("single-model pool should not be wrapped in a router") + } + two := NewLLMRouter([]ResolvedEndpoint{ep, ep}) + if _, isRouter := two.(*LLMRouter); !isRouter { + t.Fatal("multi-model pool should be a router") + } +} + +func TestResolveModels_Chain(t *testing.T) { + for _, k := range []string{"OCR_LLM_URL", "OCR_LLM_TOKEN", "OCR_LLM_MODEL", "ANTHROPIC_BASE_URL", "ANTHROPIC_AUTH_TOKEN", "ANTHROPIC_MODEL"} { + t.Setenv(k, "") + } + cfg := configFile{ + Routing: routingConfig{ + Models: []modelRef{{Provider: "p1"}, {Provider: "p2", Model: "m2b"}}, + Policy: "priority", + }, + CustomProviders: map[string]providerEntryConfig{ + "p1": {URL: "https://a.example.com", Protocol: "openai", APIKey: "k1", Model: "m1"}, + "p2": {URL: "https://b.example.com", Protocol: "openai", APIKey: "k2", Model: "m2"}, + }, + } + data, _ := json.Marshal(cfg) + cfgPath := filepath.Join(t.TempDir(), "config.json") + if err := os.WriteFile(cfgPath, data, 0o644); err != nil { + t.Fatal(err) + } + + eps, err := ResolveModels(cfgPath) + if err != nil { + t.Fatalf("ResolveModels: %v", err) + } + if len(eps) != 2 { + t.Fatalf("pool size=%d, want 2", len(eps)) + } + if eps[0].Model != "m1" { + t.Errorf("eps[0].Model=%q, want m1", eps[0].Model) + } + if eps[1].Model != "m2b" { // explicit ref.Model wins over the provider's default m2 + t.Errorf("eps[1].Model=%q, want m2b (ref.Model overrides provider default)", eps[1].Model) + } +} + +func TestResolveModels_RejectsUnknownPolicy(t *testing.T) { + for _, k := range []string{"OCR_LLM_URL", "OCR_LLM_TOKEN", "OCR_LLM_MODEL", "ANTHROPIC_BASE_URL", "ANTHROPIC_AUTH_TOKEN", "ANTHROPIC_MODEL"} { + t.Setenv(k, "") + } + cfg := configFile{ + Routing: routingConfig{ + Models: []modelRef{{Provider: "p1"}}, + Policy: "weighted", // not supported yet → reserved, must error rather than silently ignore + }, + CustomProviders: map[string]providerEntryConfig{ + "p1": {URL: "https://a.example.com", Protocol: "openai", APIKey: "k1", Model: "m1"}, + }, + } + data, _ := json.Marshal(cfg) + cfgPath := filepath.Join(t.TempDir(), "config.json") + if err := os.WriteFile(cfgPath, data, 0o644); err != nil { + t.Fatal(err) + } + if _, err := ResolveModels(cfgPath); err == nil { + t.Fatal("expected error for unsupported routing.policy, got nil") + } +}