LiteLLM drop-in: config translator, model groups, usage-sink + budget foundations #537
Merged
…uide
First tranche of the LiteLLM drop-in epic (the config-translation cluster).
- crates/sbproxy-config/src/litellm.rs: translate a LiteLLM config.yaml into an
sbproxy ai_proxy sb.yml. Maps model_list -> providers (provider/model split,
os.environ/VAR -> ${VAR}, model_map, weight, base_url/api_version/org),
router_settings.routing_strategy -> routing (name map), rpm/tpm ->
model_rate_limits, litellm_settings.cache -> semantic_cache. Two model_list
entries sharing a model_name become a load-balanced model group (co-routed
providers). Unmapped keys, Python callback hooks, external guardrails, and
general_settings.master_key/database_url are surfaced as warnings, never
dropped. 10 unit tests incl. translate-and-compile fixtures.
- sbproxy config import-litellm <path> --out sb.yml CLI subcommand.
- docs/migration-litellm.md (indexed) with a field-by-field mapping table and a
worked example guarded by a compile test.
Covers WOR-1522, the strategy-name map portion of WOR-1524, the translator side
of WOR-1523 model groups, WOR-1531, and the config-fixture portion of WOR-1533.
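
Two of the per-entry rewrites listed above (the provider/model split and `os.environ/VAR` -> `${VAR}`) can be sketched as follows. This is a minimal illustration, not the code in `litellm.rs`: the function names are hypothetical, and defaulting to `openai` when no provider prefix is present is an assumption.

```rust
/// Split a LiteLLM model string such as "azure/gpt-4o" into (provider, model).
/// Only the first '/' separates the provider, so "bedrock/a/b" keeps "a/b"
/// as the model. The "openai" default for a bare model name is an assumption.
fn split_provider_model(value: &str) -> (String, String) {
    match value.split_once('/') {
        Some((provider, model)) => (provider.to_string(), model.to_string()),
        None => ("openai".to_string(), value.to_string()),
    }
}

/// Rewrite a LiteLLM environment reference "os.environ/VAR" as "${VAR}".
/// Any other value (for example a literal key) passes through unchanged.
fn translate_env_ref(value: &str) -> String {
    match value.strip_prefix("os.environ/") {
        Some(var) if !var.is_empty() => format!("${{{}}}", var),
        _ => value.to_string(),
    }
}
```

For example, `translate_env_ref("os.environ/OPENAI_API_KEY")` yields `${OPENAI_API_KEY}`, while a literal value is left as-is.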
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B8vnyA6iBx5FnoLbsnwWww
Add parse_budget_period(): accepts the canonical daily/weekly/monthly/total names plus LiteLLM duration strings (30s/30m/30h/30d), erroring on unknown units rather than silently falling through. Foundation for multi-window budget enforcement (WOR-1527) and lets translated LiteLLM budget_duration values be interpreted.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B8vnyA6iBx5FnoLbsnwWww
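
The parsing rule described in this commit can be sketched roughly as below. The `BudgetPeriod` enum and the `String` error type are assumptions made for illustration; the real return type is not shown in this PR text.

```rust
use std::time::Duration;

/// Hypothetical result type: a named period or an explicit rolling window.
#[derive(Debug, PartialEq)]
enum BudgetPeriod {
    Daily,
    Weekly,
    Monthly,
    Total,
    Window(Duration),
}

/// Parse a canonical period name or a LiteLLM-style duration ("30s", "30m",
/// "30h", "30d"). Unknown units are an error rather than a silent default.
fn parse_budget_period(input: &str) -> Result<BudgetPeriod, String> {
    let s = input.trim();
    match s {
        "daily" => return Ok(BudgetPeriod::Daily),
        "weekly" => return Ok(BudgetPeriod::Weekly),
        "monthly" => return Ok(BudgetPeriod::Monthly),
        "total" => return Ok(BudgetPeriod::Total),
        _ => {}
    }
    // The last character is the unit; everything before it must be an integer.
    let unit = s.chars().last().ok_or_else(|| "empty budget period".to_string())?;
    let digits = &s[..s.len() - unit.len_utf8()];
    let count: u64 = digits
        .parse()
        .map_err(|_| format!("invalid budget period: {:?}", s))?;
    let secs_per_unit: u64 = match unit {
        's' => 1,
        'm' => 60,
        'h' => 3_600,
        'd' => 86_400,
        other => return Err(format!("unknown budget period unit {:?} in {:?}", other, s)),
    };
    let secs = count
        .checked_mul(secs_per_unit)
        .ok_or_else(|| format!("budget period too large: {:?}", s))?;
    Ok(BudgetPeriod::Window(Duration::from_secs(secs)))
}
```

Under these assumptions `"30m"` parses to a 1800-second window, and `"30x"` is rejected with an error naming the unknown unit.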
WOR-1528 (callback sinks) OSS seam: crates/sbproxy-ai/src/usage_sink.rs defines the LlmUsageEvent record, the UsageSink trait (non-blocking, failure-isolated), a JSONL file sink and a fire-and-forget webhook sink, plus declarative UsageSinkConfig (jsonl_file / webhook). Closed-source sinks extend the same trait. Unit-tested.
WOR-1523: examples/ai-model-group demonstrates one public model name backed by several deployments, load-balanced via model-based routing + the routing strategy.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B8vnyA6iBx5FnoLbsnwWww
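
A stripped-down sketch of the sink seam is shown below. The field set of `LlmUsageEvent`, the `emit` method name, and the generic `JsonlSink<W>` writer are all assumptions for illustration; the real sink writes to a file and is described as non-blocking, which this sketch does not attempt (it takes a lock and writes inline). What it does show is failure isolation: `emit` returns nothing and swallows I/O errors so a broken sink cannot fail a request.

```rust
use std::fmt::Debug;
use std::io::Write;
use std::sync::Mutex;

/// Hypothetical, reduced usage record.
#[derive(Debug, Clone)]
struct LlmUsageEvent {
    model: String,
    prompt_tokens: u64,
    completion_tokens: u64,
}

/// Sinks never return an error to the caller (failure-isolated).
trait UsageSink: Debug + Send + Sync {
    fn emit(&self, event: &LlmUsageEvent);
}

/// Writes one JSON object per line to any writer (a file in practice).
#[derive(Debug)]
struct JsonlSink<W: Write + Debug + Send> {
    out: Mutex<W>,
}

/// Minimal JSON string escaping: quotes, backslashes, control characters.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl<W: Write + Debug + Send> UsageSink for JsonlSink<W> {
    fn emit(&self, event: &LlmUsageEvent) {
        let line = format!(
            "{{\"model\":\"{}\",\"prompt_tokens\":{},\"completion_tokens\":{}}}\n",
            json_escape(&event.model),
            event.prompt_tokens,
            event.completion_tokens
        );
        // A poisoned lock or a failed write is dropped on purpose.
        if let Ok(mut out) = self.out.lock() {
            let _ = out.write_all(line.as_bytes());
        }
    }
}
```

Using a `Vec<u8>` as the writer makes the sink easy to unit-test; a production sink would more likely hand events to a queue so the request path never waits on I/O.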
…o, /health)
WOR-1530: serve model metadata + health from the ai_proxy config without an upstream call. /model/info lists every deployment; /model_group/info groups deployments by public model name (num_deployments); /health (+ /health/readiness and the /health/liveliness | /health/liveness aliases) report status. Handled in the GET branch of the AI dispatcher before the upstream forward; pure ai_management_response() builder with unit tests. Demonstrated in the ai-model-group example.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B8vnyA6iBx5FnoLbsnwWww
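
As a rough sketch of the behaviour described here: route matching (including the liveliness/liveness alias) and the per-group deployment count can both be pure functions over the config. The `ManagementRoute` and `Deployment` types and both function names are hypothetical; the actual `ai_management_response()` signature is not shown in this PR text.

```rust
use std::collections::BTreeMap;

/// Hypothetical classification of the management paths served locally.
#[derive(Debug, PartialEq)]
enum ManagementRoute {
    ModelInfo,
    ModelGroupInfo,
    Health,
    Readiness,
    Liveness,
}

/// Returns None for any path that should be forwarded upstream instead.
fn management_route(path: &str) -> Option<ManagementRoute> {
    match path {
        "/model/info" => Some(ManagementRoute::ModelInfo),
        "/model_group/info" => Some(ManagementRoute::ModelGroupInfo),
        "/health" => Some(ManagementRoute::Health),
        "/health/readiness" => Some(ManagementRoute::Readiness),
        // Both spellings are accepted as aliases.
        "/health/liveliness" | "/health/liveness" => Some(ManagementRoute::Liveness),
        _ => None,
    }
}

/// Hypothetical view of one configured deployment.
struct Deployment {
    model_name: String,
    provider: String,
}

/// Count deployments per public model name (the num_deployments figure).
fn num_deployments_by_group(deployments: &[Deployment]) -> BTreeMap<String, usize> {
    let mut groups = BTreeMap::new();
    for d in deployments {
        *groups.entry(d.model_name.clone()).or_insert(0) += 1;
    }
    groups
}
```

Keeping both functions free of I/O matches the "pure builder" approach named in the commit and makes them straightforward to unit-test.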
WOR-1529 seam: crates/sbproxy-ai/src/external_guardrail.rs adds a generic HTTP guardrail adapter. ExternalGuardrailConfig (name/url/mode/default_on/fail_open/timeout); GuardrailMode maps LiteLLM pre_call/post_call/during_call/logging_only to input/output phases and a no-block logging mode; check_external_guardrail() POSTs content and parses a verdict, failing open or closed per config; flexible parse_verdict() accepts allowed/flagged/blocked response shapes. Unit-tested incl. a fail-open vs fail-closed test against an unreachable endpoint. Wiring into the guardrail pipeline is the next step.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B8vnyA6iBx5FnoLbsnwWww
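
The mode mapping and the fail-open/fail-closed decision can be sketched without any HTTP. Everything below is illustrative: the variant names, `parse_guardrail_mode`, `Verdict`, and `should_allow` are hypothetical, and treating `during_call` as an input-phase check is an assumption, since the commit does not say which phase it maps to.

```rust
/// Hypothetical phases: check the request, check the response, or only log.
#[derive(Debug, Clone, Copy, PartialEq)]
enum GuardrailMode {
    Input,
    Output,
    LoggingOnly,
}

/// Map a LiteLLM guardrail mode string onto a phase.
fn parse_guardrail_mode(mode: &str) -> Option<GuardrailMode> {
    match mode {
        "pre_call" => Some(GuardrailMode::Input),
        // Assumption: during_call inspects the request, so treat it as input.
        "during_call" => Some(GuardrailMode::Input),
        "post_call" => Some(GuardrailMode::Output),
        "logging_only" => Some(GuardrailMode::LoggingOnly),
        _ => None,
    }
}

/// Outcome reported by the external guardrail service.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Verdict {
    Allowed,
    Blocked,
}

/// Decide whether content may proceed. `outcome` is Err when the guardrail
/// could not be reached or its response could not be parsed.
fn should_allow(mode: GuardrailMode, outcome: Result<Verdict, String>, fail_open: bool) -> bool {
    if mode == GuardrailMode::LoggingOnly {
        return true; // logging mode never blocks
    }
    match outcome {
        Ok(Verdict::Allowed) => true,
        Ok(Verdict::Blocked) => false,
        Err(_) => fail_open, // unreachable endpoint: follow the configured policy
    }
}
```

With `fail_open = false`, an unreachable guardrail blocks the request; with `fail_open = true`, the same failure lets it through.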
…red)
WOR-1528: ai_proxy configs can now declare `usage_sinks:` (jsonl_file / webhook); AiHandlerConfig::usage_sinks() lazily builds them once and shares the Arc'd instances across requests, mirroring the router/embedding_cache OnceLock pattern. UsageSink gains a Debug bound and the sinks derive Debug so the config still derives Debug. Sink config + builders are unit-tested. Emitting events from the billing choke point spans three functions (handle_ai_proxy + the cache- and stream-relay helpers); that wiring is left as a focused follow-up rather than threaded through the hot path here.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B8vnyA6iBx5FnoLbsnwWww
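
The lazy-build pattern described here can be sketched with `std::sync::OnceLock`. The struct fields, the `kind()` method, and `build_sink` are hypothetical stand-ins (the trait is reduced to a single method for brevity). The sketch also shows why the `Debug` supertrait matters: without it, `Arc<dyn UsageSink>` would not be `Debug` and the config could not derive it.

```rust
use std::fmt::Debug;
use std::sync::{Arc, OnceLock};

/// Reduced stand-in for the sink trait; Debug is a supertrait so that
/// containers of `Arc<dyn UsageSink>` can themselves derive Debug.
trait UsageSink: Debug + Send + Sync {
    fn kind(&self) -> &str;
}

/// Declarative sink configuration (field names are assumptions).
#[derive(Debug, Clone)]
enum UsageSinkConfig {
    JsonlFile { path: String },
    Webhook { url: String },
}

#[derive(Debug)]
struct JsonlFileSink { path: String }

#[derive(Debug)]
struct WebhookSink { url: String }

impl UsageSink for JsonlFileSink {
    fn kind(&self) -> &str { "jsonl_file" }
}

impl UsageSink for WebhookSink {
    fn kind(&self) -> &str { "webhook" }
}

fn build_sink(cfg: &UsageSinkConfig) -> Arc<dyn UsageSink> {
    match cfg {
        UsageSinkConfig::JsonlFile { path } => Arc::new(JsonlFileSink { path: path.clone() }),
        UsageSinkConfig::Webhook { url } => Arc::new(WebhookSink { url: url.clone() }),
    }
}

#[derive(Debug)]
struct AiHandlerConfig {
    usage_sink_configs: Vec<UsageSinkConfig>,
    // Filled on first use, then shared by every later request.
    built_sinks: OnceLock<Vec<Arc<dyn UsageSink>>>,
}

impl AiHandlerConfig {
    /// Build the sinks at most once and return the shared instances.
    fn usage_sinks(&self) -> &[Arc<dyn UsageSink>] {
        self.built_sinks
            .get_or_init(|| self.usage_sink_configs.iter().map(build_sink).collect())
    }
}
```

Repeated calls to `usage_sinks()` return the same `Arc` instances, so a sink that holds a file handle or HTTP client is constructed once per config rather than once per request.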
First PR of the LiteLLM drop-in effort: make sbproxy a drop-in replacement for the LiteLLM proxy. This lands the config-translation layer (the cheapest path to "drop-in" and the discovery tool for parity gaps) plus the first runtime seams. It is an in-progress epic PR; further runtime subsystems will be pushed to this branch.
Complete (tested, gate green)
- `sbproxy config import-litellm <litellm.yaml> --out sb.yml`. Maps `model_list` -> `ai_proxy` providers (provider/model split, `os.environ/VAR` -> `${VAR}`, `model_map`, `weight`, `base_url`/`api_version`/`organization`), `router_settings.routing_strategy` -> `routing` (name map), `rpm`/`tpm` -> `model_rate_limits`, `litellm_settings.cache` -> `semantic_cache`. Unmapped keys, Python callback hooks, external guardrails, and `master_key`/`database_url` are surfaced as a warnings report, never dropped. 10 unit tests incl. translate-and-compile fixtures.
- `model_list` entries sharing a `model_name` become co-routed providers (load-balanced via model-based routing + the routing strategy). `examples/ai-model-group/` added.
- `docs/migration-litellm.md` (indexed) with the field-by-field mapping and a worked example guarded by a compile test.

Foundations / seams (tested; deep wiring to follow on this branch)
- `parse_budget_period()` accepts the canonical names plus LiteLLM duration strings (30s/30m/30h/30d). Foundation for multi-window enforcement.
- `UsageSink` trait + JSONL and webhook sinks + declarative config (the OSS seam LiteLLM `success_callback`/`callbacks` map onto). Wiring into the billing choke point is the next step.

Remaining (to be pushed to this branch)
External-guardrail HTTP adapter; `/responses`, `/model/info`, `/model_group/info`, `/health/*`; budget time-windowing; context-window/content-policy fallback classification + per-error retry; usage-sink billing-path wiring; cassettes for the new features; end-to-end client conformance.

Enterprise (runtime key store, persistent spend ledger) and the marketing landing page live in other repos and are out of scope for this OSS PR.