LiteLLM drop-in: config translator, model groups, usage-sink + budget foundations by rickcrawford · Pull Request #537 · soapbucket/sbproxy

rickcrawford · 2026-06-25T01:33:45Z

First PR of the LiteLLM drop-in effort, whose goal is to make sbproxy a drop-in replacement for the LiteLLM proxy. This PR lands two things. The first is the config-translation layer, which is the cheapest path to "drop-in" and also the tool for discovering parity gaps. The second is the first set of runtime seams. This is an in-progress epic PR; further runtime subsystems will be pushed to this branch.

Complete (tested, gate green)

- Config translator + CLI: sbproxy config import-litellm <litellm.yaml> --out sb.yml. Maps model_list -> ai_proxy providers (provider/model split, os.environ/VAR -> ${VAR}, model_map, weight, base_url/api_version/organization), router_settings.routing_strategy -> routing (name map), rpm/tpm -> model_rate_limits, litellm_settings.cache -> semantic_cache. Unmapped keys, Python callback hooks, external guardrails, and master_key/database_url are surfaced in a warnings report, never dropped. 10 unit tests incl. translate-and-compile fixtures.
- Model groups: two model_list entries sharing a model_name become co-routed providers (load-balanced via model-based routing + the routing strategy). examples/ai-model-group/ added.
- Migration guide: docs/migration-litellm.md (indexed) with the field-by-field mapping and a worked example guarded by a compile test.
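The translator rewrites two strings on each model_list entry. It splits the provider prefix off `litellm_params.model`, and it turns `os.environ/VAR` secret references into `${VAR}`. A minimal std-only sketch of both rewrites follows. The helper names `split_provider_model` and `rewrite_env_ref` are hypothetical and do not come from litellm.rs. The sketch also assumes that an unprefixed model name is reported as unsplittable.

```rust
/// Split a LiteLLM `litellm_params.model` value such as "openai/gpt-4o"
/// into (provider, model). Only the first '/' separates the provider, so
/// model ids that themselves contain '/' are kept intact.
/// Hypothetical helper: returns None when there is no provider prefix.
pub fn split_provider_model(s: &str) -> Option<(&str, &str)> {
    let (provider, model) = s.split_once('/')?;
    if provider.is_empty() || model.is_empty() {
        return None;
    }
    Some((provider, model))
}

/// Rewrite LiteLLM's `os.environ/VAR` secret reference into the
/// `${VAR}` interpolation form; any other value is returned unchanged.
/// Hypothetical helper.
pub fn rewrite_env_ref(value: &str) -> String {
    match value.strip_prefix("os.environ/") {
        // "{{" and "}}" are literal braces, so this yields "${VAR}".
        Some(var) if !var.is_empty() => format!("${{{}}}", var),
        _ => value.to_string(),
    }
}
```

Splitting only on the first `/` matters for model ids that contain slashes, for example `together_ai/meta-llama/...`.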

Foundations / seams (tested; deep wiring to follow on this branch)

- Budget periods: parse_budget_period() accepts the canonical names plus LiteLLM duration strings (30s/30m/30h/30d). Foundation for multi-window enforcement.
- Usage sinks: UsageSink trait + JSONL and webhook sinks + declarative config (the OSS seam that LiteLLM success_callback/callbacks map onto). Wiring into the billing choke point is the next step.
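The sink seam can be sketched as a small trait plus a JSONL implementation that writes one JSON object per line to any `std::io::Write`. This is a minimal illustration only. The `UsageEvent` fields and the `record` method are hypothetical stand-ins for the real `LlmUsageEvent` and `UsageSink` in usage_sink.rs. JSON is formatted by hand so the sketch needs nothing beyond std.

```rust
use std::io::Write;
use std::sync::Mutex;

/// Hypothetical subset of a per-request usage record.
#[derive(Debug, Clone)]
pub struct UsageEvent {
    pub model: String,
    pub prompt_tokens: u64,
    pub completion_tokens: u64,
}

/// Sink contract (assumed): report failures as Err so the caller can
/// log and ignore them without failing the proxied request.
pub trait UsageSink: Send + Sync {
    fn record(&self, event: &UsageEvent) -> Result<(), String>;
}

/// Minimal JSON string escaping: quotes, backslashes, control chars.
pub fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Appends one JSON object per line to the wrapped writer.
pub struct JsonlSink<W: Write + Send> {
    writer: Mutex<W>,
}

impl<W: Write + Send> JsonlSink<W> {
    pub fn new(writer: W) -> Self {
        Self { writer: Mutex::new(writer) }
    }

    /// Recover the underlying writer (useful for inspection in tests).
    pub fn into_inner(self) -> W {
        self.writer.into_inner().unwrap()
    }
}

impl<W: Write + Send> UsageSink for JsonlSink<W> {
    fn record(&self, e: &UsageEvent) -> Result<(), String> {
        let line = format!(
            "{{\"model\":\"{}\",\"prompt_tokens\":{},\"completion_tokens\":{}}}\n",
            json_escape(&e.model),
            e.prompt_tokens,
            e.completion_tokens
        );
        // The mutex serializes concurrent writers so lines never interleave.
        let mut w = self.writer.lock().map_err(|_| "sink lock poisoned".to_string())?;
        w.write_all(line.as_bytes()).map_err(|err| err.to_string())
    }
}
```

In a deployment the writer would be an append-mode file. Here a `Vec<u8>` works just as well because the sink is generic over `Write`.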

Remaining (to be pushed to this branch)

- External-guardrail HTTP adapter
- /responses, /model/info, /model_group/info, /health/*
- budget time-windowing
- context-window/content-policy fallback classification + per-error retry
- usage-sink billing-path wiring
- cassettes for the new features
- end-to-end client conformance

Enterprise (runtime key store, persistent spend ledger) and the marketing landing page live in other repos and are out of scope for this OSS PR.

…uide

First tranche of the LiteLLM drop-in epic (the config-translation cluster).

- crates/sbproxy-config/src/litellm.rs: translate a LiteLLM config.yaml into an sbproxy ai_proxy sb.yml. Maps model_list -> providers (provider/model split, os.environ/VAR -> ${VAR}, model_map, weight, base_url/api_version/org), router_settings.routing_strategy -> routing (name map), rpm/tpm -> model_rate_limits, litellm_settings.cache -> semantic_cache. Two model_list entries sharing a model_name become a load-balanced model group (co-routed providers). Unmapped keys, Python callback hooks, external guardrails, and general_settings.master_key/database_url are surfaced as warnings, never dropped. 10 unit tests incl. translate-and-compile fixtures.
- sbproxy config import-litellm <path> --out sb.yml CLI subcommand.
- docs/migration-litellm.md (indexed) with a field-by-field mapping table and a worked example guarded by a compile test.

Covers WOR-1522, the strategy-name map portion of WOR-1524, the translator side of WOR-1523 model groups, WOR-1531, and the config-fixture portion of WOR-1533.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B8vnyA6iBx5FnoLbsnwWww
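The model-group rule, under which entries sharing a `model_name` become co-routed providers, amounts to an order-preserving group-by over `model_list`. Below is a sketch of that grouping. It uses a simplified, hypothetical `Deployment` struct rather than the translator's real types.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified view of one LiteLLM model_list entry.
#[derive(Debug, Clone, PartialEq)]
pub struct Deployment {
    pub model_name: String, // public name clients request
    pub model: String,      // litellm_params.model, e.g. "azure/gpt-4o-eu"
}

/// Group deployments by public model_name. Groups appear in first-seen
/// order, and deployments keep their original order within each group.
pub fn group_by_model_name(list: &[Deployment]) -> Vec<(String, Vec<Deployment>)> {
    let mut index: HashMap<&str, usize> = HashMap::new();
    let mut groups: Vec<(String, Vec<Deployment>)> = Vec::new();
    for d in list {
        match index.get(d.model_name.as_str()).copied() {
            // Existing group: append this deployment to it.
            Some(i) => groups[i].1.push(d.clone()),
            // First time this public name is seen: open a new group.
            None => {
                index.insert(d.model_name.as_str(), groups.len());
                groups.push((d.model_name.clone(), vec![d.clone()]));
            }
        }
    }
    groups
}
```

Each group's length is the per-name deployment count. That count is what the WOR-1530 commit describes `/model_group/info` reporting as `num_deployments`.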

Add parse_budget_period(): accepts the canonical daily/weekly/monthly/total names plus LiteLLM duration strings (30s/30m/30h/30d), erroring on unknown units rather than silently falling through. Foundation for multi-window budget enforcement (WOR-1527) and lets translated LiteLLM budget_duration values be interpreted. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01B8vnyA6iBx5FnoLbsnwWww
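The parsing rules in this commit can be sketched as follows. Canonical names map to fixed variants. Strings of the form `<n><unit>`, with unit `s`, `m`, `h` or `d`, become a rolling window in seconds. Anything else is an error rather than a silent fallback. The `BudgetPeriod` enum and the seconds representation are assumptions, since the real return type is not shown in this PR.

```rust
/// Hypothetical result type; the real parse_budget_period() may differ.
#[derive(Debug, PartialEq)]
pub enum BudgetPeriod {
    Daily,
    Weekly,
    Monthly,
    Total,
    /// Rolling window of this many seconds (from a LiteLLM duration string).
    Window(u64),
}

pub fn parse_budget_period(input: &str) -> Result<BudgetPeriod, String> {
    let s = input.trim();
    match s {
        "daily" => return Ok(BudgetPeriod::Daily),
        "weekly" => return Ok(BudgetPeriod::Weekly),
        "monthly" => return Ok(BudgetPeriod::Monthly),
        "total" => return Ok(BudgetPeriod::Total),
        _ => {}
    }
    // Split e.g. "30d" into the digits "30" and the unit suffix "d".
    let split = s
        .find(|c: char| !c.is_ascii_digit())
        .ok_or_else(|| format!("missing unit in budget period {s:?}"))?;
    let (num, unit) = s.split_at(split);
    let n: u64 = num
        .parse()
        .map_err(|_| format!("invalid number in budget period {s:?}"))?;
    let secs_per_unit = match unit {
        "s" => 1,
        "m" => 60,
        "h" => 3_600,
        "d" => 86_400,
        // Unknown units are rejected instead of falling through.
        other => return Err(format!("unknown budget period unit {other:?}")),
    };
    n.checked_mul(secs_per_unit)
        .map(BudgetPeriod::Window)
        .ok_or_else(|| "budget period overflows u64 seconds".to_string())
}
```

Rejecting a value like `30w` up front means a mistranslated `budget_duration` fails at config load. Otherwise it could quietly turn into an unlimited or mis-sized budget.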

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01B8vnyA6iBx5FnoLbsnwWww

WOR-1528 (callback sinks) OSS seam: crates/sbproxy-ai/src/usage_sink.rs defines the LlmUsageEvent record, the UsageSink trait (non-blocking, failure-isolated), a JSONL file sink and a fire-and-forget webhook sink, plus declarative UsageSinkConfig (jsonl_file / webhook). Closed-source sinks extend the same trait. Unit-tested.

WOR-1523: examples/ai-model-group demonstrates one public model name backed by several deployments, load-balanced via model-based routing + the routing strategy.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B8vnyA6iBx5FnoLbsnwWww
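One plausible way to spread traffic across a model group's deployments is a weighted pick that honours each entry's `weight`. The sketch below shows that technique under stated assumptions. It is not sbproxy's routing code, and the helper name `weighted_pick` is hypothetical. Weights are simplified to integers. The random draw is passed in so the selection is deterministic and testable.

```rust
/// Pick a deployment index from `weights` using a draw `r` in [0, 1).
/// Weight-0 deployments are never chosen. Returns None when the slice
/// is empty or every weight is zero. Hypothetical helper; weights are
/// simplified to integers here.
pub fn weighted_pick(weights: &[u32], r: f64) -> Option<usize> {
    let total: u64 = weights.iter().map(|&w| w as u64).sum();
    if total == 0 {
        return None;
    }
    // Scale the draw onto [0, total) and walk the cumulative weights.
    let target = ((r.clamp(0.0, 1.0) * total as f64) as u64).min(total - 1);
    let mut cumulative = 0u64;
    for (i, &w) in weights.iter().enumerate() {
        cumulative += w as u64;
        if target < cumulative {
            return Some(i);
        }
    }
    None // not reached: target < total, which equals the final cumulative sum
}
```

In a live router, `r` would come from a random number generator. Keeping it as a parameter keeps the selection logic a pure function.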

…o, /health)

WOR-1530: serve model metadata and health status from the ai_proxy config without an upstream call. /model/info lists every deployment; /model_group/info groups deployments by public model name (num_deployments); /health, /health/readiness, and the /health/liveliness and /health/liveness aliases report status. These are handled in the GET branch of the AI dispatcher before the upstream forward, via a pure ai_management_response() builder with unit tests. Demonstrated in the ai-model-group example.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B8vnyA6iBx5FnoLbsnwWww
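Much of the GET-branch dispatch comes down to matching the request path before any upstream forward. A sketch of that classification step follows, using a hypothetical `ManagementEndpoint` enum. The real `ai_management_response()` builds a response rather than returning an enum. Its exact path handling is not described in this PR, including whether it strips query strings as this sketch does.

```rust
/// Hypothetical classification of the config-served management paths.
#[derive(Debug, PartialEq)]
pub enum ManagementEndpoint {
    ModelInfo,
    ModelGroupInfo,
    Health,
    Readiness,
    Liveness,
}

/// Returns Some(endpoint) for paths answered from config alone, or None
/// when the request should fall through to the upstream forward.
pub fn classify_management_path(path: &str) -> Option<ManagementEndpoint> {
    // Assumption: ignore any query string when matching.
    let path = path.split('?').next().unwrap_or(path);
    match path {
        "/model/info" => Some(ManagementEndpoint::ModelInfo),
        "/model_group/info" => Some(ManagementEndpoint::ModelGroupInfo),
        "/health" => Some(ManagementEndpoint::Health),
        "/health/readiness" => Some(ManagementEndpoint::Readiness),
        // LiteLLM spells this "liveliness"; both spellings are accepted.
        "/health/liveliness" | "/health/liveness" => Some(ManagementEndpoint::Liveness),
        _ => None,
    }
}
```

Returning `None` for everything else preserves the normal proxy path. Only the listed management routes skip the upstream call.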

WOR-1529 seam: crates/sbproxy-ai/src/external_guardrail.rs adds a generic HTTP guardrail adapter. ExternalGuardrailConfig (name/url/mode/default_on/fail_open/timeout); GuardrailMode maps LiteLLM pre_call/post_call/during_call/logging_only to input/output phases and a no-block logging mode; check_external_guardrail() POSTs content and parses a verdict, failing open or closed per config; a flexible parse_verdict() accepts allowed/flagged/blocked response shapes. Unit-tested incl. a fail-open vs fail-closed test against an unreachable endpoint. Wiring into the guardrail pipeline is the next step.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B8vnyA6iBx5FnoLbsnwWww
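The mode mapping and the fail-open/fail-closed rule can be sketched in plain Rust, with the HTTP call abstracted as a `Result` so the policy logic is testable without a live endpoint. `Phase`, `Verdict`, `Decision`, `phase_for_mode()` and `decide()` are hypothetical names, not the adapter's actual types. The sketch also makes two assumptions: `during_call` maps to the input phase, and a `flagged` verdict does not block.

```rust
/// Phases a LiteLLM guardrail `mode` maps onto (hypothetical names).
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Phase {
    Input,       // pre_call, and (assumed) during_call: inspect the request
    Output,      // post_call: inspect the response
    LoggingOnly, // logging_only: observe, never block
}

pub fn phase_for_mode(mode: &str) -> Option<Phase> {
    match mode {
        "pre_call" | "during_call" => Some(Phase::Input),
        "post_call" => Some(Phase::Output),
        "logging_only" => Some(Phase::LoggingOnly),
        _ => None,
    }
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    Allowed,
    Flagged,
    Blocked,
}

#[derive(Debug, PartialEq)]
pub enum Decision {
    Pass,
    Block,
}

/// Turn the guardrail call's outcome into a request decision. `call` is
/// Err when the endpoint was unreachable, timed out, or returned a body
/// that could not be parsed into a verdict.
pub fn decide(phase: Phase, call: Result<Verdict, String>, fail_open: bool) -> Decision {
    if phase == Phase::LoggingOnly {
        return Decision::Pass; // logging mode never blocks
    }
    match call {
        Ok(Verdict::Blocked) => Decision::Block,
        // Assumption: "flagged" is recorded but does not block.
        Ok(Verdict::Allowed) | Ok(Verdict::Flagged) => Decision::Pass,
        Err(_) if fail_open => Decision::Pass,
        Err(_) => Decision::Block,
    }
}
```

This separation mirrors the unreachable-endpoint test described above. The same transport error yields `Pass` or `Block` depending only on `fail_open`.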

…red) WOR-1528: ai_proxy configs can now declare `usage_sinks:` (jsonl_file / webhook); AiHandlerConfig::usage_sinks() lazily builds them once and shares the Arc'd instances across requests, mirroring the router/embedding_cache OnceLock pattern. UsageSink gains a Debug bound and the sinks derive Debug so the config still derives Debug. Sink config + builders are unit-tested. Emitting events from the billing choke point spans three functions (handle_ai_proxy + the cache- and stream-relay helpers); that wiring is left as a focused follow-up rather than threaded through the hot path here. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01B8vnyA6iBx5FnoLbsnwWww
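The build-once, share-everywhere pattern can be illustrated with `std::sync::OnceLock`. `HandlerConfig`, `NamedSink` and the `name()` method are hypothetical stand-ins for `AiHandlerConfig` and the real sinks. The first call builds the sinks, and every later call returns clones of the same `Arc`s. The `Debug` supertrait is what lets the config struct keep `#[derive(Debug)]` while holding trait objects.

```rust
use std::fmt::Debug;
use std::sync::{Arc, OnceLock};

/// The Debug bound lets a struct holding `Arc<dyn UsageSink>` derive Debug.
pub trait UsageSink: Debug + Send + Sync {
    fn name(&self) -> &str; // hypothetical method, for illustration only
}

#[derive(Debug)]
pub struct NamedSink(pub String);

impl UsageSink for NamedSink {
    fn name(&self) -> &str {
        &self.0
    }
}

#[derive(Debug)]
pub struct HandlerConfig {
    pub sink_names: Vec<String>,              // declarative config
    sinks: OnceLock<Vec<Arc<dyn UsageSink>>>, // built lazily on first use
}

impl HandlerConfig {
    pub fn new(sink_names: Vec<String>) -> Self {
        Self { sink_names, sinks: OnceLock::new() }
    }

    /// Build the sinks on the first call; later calls clone the same Arcs,
    /// so every request shares one instance per configured sink.
    pub fn usage_sinks(&self) -> Vec<Arc<dyn UsageSink>> {
        self.sinks
            .get_or_init(|| {
                self.sink_names
                    .iter()
                    .map(|n| Arc::new(NamedSink(n.clone())) as Arc<dyn UsageSink>)
                    .collect()
            })
            .clone()
    }
}
```

Cloning the `Vec` only bumps reference counts. Per-request cost therefore stays small while the sinks themselves, with any open files or HTTP clients, are constructed once.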

rickcrawford and others added 7 commits June 24, 2026 16:37

cargo fmt: format LiteLLM translator + budget parser tests

bfdff0a

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01B8vnyA6iBx5FnoLbsnwWww

rickcrawford merged commit 890f263 into main Jun 25, 2026
9 checks passed

rickcrawford deleted the litellm-drop-in branch June 25, 2026 03:20
