
LiteLLM drop-in: config translator, model groups, usage-sink + budget foundations#537

Merged
rickcrawford merged 7 commits into main from litellm-drop-in on Jun 25, 2026

@rickcrawford

Contributor

First PR of the LiteLLM drop-in effort: make sbproxy a drop-in replacement for the LiteLLM proxy. This lands the config-translation layer (the cheapest path to "drop-in" and the discovery tool for parity gaps) plus the first runtime seams. It is an in-progress epic PR; further runtime subsystems will be pushed to this branch.

Complete (tested, gate green)

  • Config translator + CLI: sbproxy config import-litellm <litellm.yaml> --out sb.yml. Maps model_list -> ai_proxy providers (provider/model split, os.environ/VAR -> ${VAR}, model_map, weight, base_url/api_version/organization), router_settings.routing_strategy -> routing (name map), rpm/tpm -> model_rate_limits, litellm_settings.cache -> semantic_cache. Unmapped keys, Python callback hooks, external guardrails, and master_key/database_url are surfaced as a warnings report, never dropped. 10 unit tests incl. translate-and-compile fixtures.
  • Model groups: two model_list entries sharing a model_name become co-routed providers (load-balanced via model-based routing + the routing strategy). examples/ai-model-group/ added.
  • Migration guide: docs/migration-litellm.md (indexed) with the field-by-field mapping and a worked example guarded by a compile test.
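Two of the translator rules listed above (the provider/model split and the `os.environ/VAR` -> `${VAR}` rewrite) can be sketched in a few lines. This is an illustration only, not the code in crates/sbproxy-config/src/litellm.rs; the function names `split_provider_model` and `translate_env_ref` are hypothetical, and splitting at the first `/` is an assumption about how nested model ids are handled.

```rust
/// Hypothetical helper: split a LiteLLM `provider/model` string at the first '/'.
/// Returns `(None, whole_string)` when there is no usable provider prefix.
fn split_provider_model(s: &str) -> (Option<&str>, &str) {
    match s.split_once('/') {
        Some((provider, model)) if !provider.is_empty() && !model.is_empty() => {
            (Some(provider), model)
        }
        _ => (None, s),
    }
}

/// Hypothetical helper: rewrite a LiteLLM `os.environ/VAR` reference to `${VAR}`.
/// Any other value (for example a literal key) passes through unchanged.
fn translate_env_ref(value: &str) -> String {
    match value.strip_prefix("os.environ/") {
        // `{{` and `}}` are escaped braces, so this renders as `${VAR}`.
        Some(var) if !var.is_empty() => format!("${{{}}}", var),
        _ => value.to_string(),
    }
}
```

Under these assumptions, `openai/gpt-4o` splits into provider `openai` and model `gpt-4o`, and `os.environ/OPENAI_API_KEY` becomes `${OPENAI_API_KEY}`.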

Foundations / seams (tested; deep wiring to follow on this branch)

  • Budget periods: parse_budget_period() accepts the canonical names plus LiteLLM duration strings (30s/30m/30h/30d). Foundation for multi-window enforcement.
  • Usage sinks: UsageSink trait + JSONL and webhook sinks + declarative config (the OSS seam LiteLLM success_callback/callbacks map onto). Wiring into the billing choke point is the next step.
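The budget-period parsing described above might look roughly like the following. This is a minimal sketch, not the merged parse_budget_period(): the `BudgetPeriod` enum, its variant names, and the `_sketch` suffix are assumptions made for illustration. It follows the stated behavior of accepting the canonical names plus `s`/`m`/`h`/`d` duration strings and rejecting unknown units.

```rust
use std::time::Duration;

/// Hypothetical representation of a budget window.
#[derive(Debug, PartialEq)]
enum BudgetPeriod {
    Daily,
    Weekly,
    Monthly,
    Total,
    /// A LiteLLM-style duration such as `30m` or `30d`.
    Every(Duration),
}

/// Sketch: parse a canonical name or a `<number><unit>` duration string.
fn parse_budget_period_sketch(input: &str) -> Result<BudgetPeriod, String> {
    let s = input.trim();
    match s {
        "daily" => return Ok(BudgetPeriod::Daily),
        "weekly" => return Ok(BudgetPeriod::Weekly),
        "monthly" => return Ok(BudgetPeriod::Monthly),
        "total" => return Ok(BudgetPeriod::Total),
        _ => {}
    }
    // Split the leading digits from the unit suffix.
    let split = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    let (num, unit) = s.split_at(split);
    let n: u64 = num
        .parse()
        .map_err(|_| format!("invalid budget period: {:?}", s))?;
    let secs_per_unit: u64 = match unit {
        "s" => 1,
        "m" => 60,
        "h" => 3_600,
        "d" => 86_400,
        // Unknown units are an error rather than a silent default.
        other => return Err(format!("unknown unit {:?} in {:?}", other, s)),
    };
    n.checked_mul(secs_per_unit)
        .map(|secs| BudgetPeriod::Every(Duration::from_secs(secs)))
        .ok_or_else(|| format!("budget period overflows: {:?}", s))
}
```

Here `30m` is read as 30 minutes, matching the duration list in the description.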

Remaining (to be pushed to this branch)

External-guardrail HTTP adapter; /responses, /model/info, /model_group/info, /health/*; budget time-windowing; context-window/content-policy fallback classification + per-error retry; usage-sink billing-path wiring; cassettes for the new features; end-to-end client conformance.

Enterprise (runtime key store, persistent spend ledger) and the marketing landing page live in other repos and are out of scope for this OSS PR.

rickcrawford and others added 7 commits June 24, 2026 16:37
…uide

First tranche of the LiteLLM drop-in epic (the config-translation cluster).

- crates/sbproxy-config/src/litellm.rs: translate a LiteLLM config.yaml into an
  sbproxy ai_proxy sb.yml. Maps model_list -> providers (provider/model split,
  os.environ/VAR -> ${VAR}, model_map, weight, base_url/api_version/org),
  router_settings.routing_strategy -> routing (name map), rpm/tpm ->
  model_rate_limits, litellm_settings.cache -> semantic_cache. Two model_list
  entries sharing a model_name become a load-balanced model group (co-routed
  providers). Unmapped keys, Python callback hooks, external guardrails, and
  general_settings.master_key/database_url are surfaced as warnings, never
  dropped. 10 unit tests incl. translate-and-compile fixtures.
- sbproxy config import-litellm <path> --out sb.yml CLI subcommand.
- docs/migration-litellm.md (indexed) with a field-by-field mapping table and a
  worked example guarded by a compile test.

Covers WOR-1522, the strategy-name map portion of WOR-1524, the translator side
of WOR-1523 model groups, WOR-1531, and the config-fixture portion of WOR-1533.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B8vnyA6iBx5FnoLbsnwWww
Add parse_budget_period(): accepts the canonical daily/weekly/monthly/total
names plus LiteLLM duration strings (30s/30m/30h/30d), erroring on unknown
units rather than silently falling through. Foundation for multi-window budget
enforcement (WOR-1527) and lets translated LiteLLM budget_duration values be
interpreted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B8vnyA6iBx5FnoLbsnwWww
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B8vnyA6iBx5FnoLbsnwWww
WOR-1528 (callback sinks) OSS seam: crates/sbproxy-ai/src/usage_sink.rs defines
the LlmUsageEvent record, the UsageSink trait (non-blocking, failure-isolated),
a JSONL file sink and a fire-and-forget webhook sink, plus declarative
UsageSinkConfig (jsonl_file / webhook). Closed-source sinks extend the same
trait. Unit-tested.
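As a rough picture of the seam described above, the sketch below shows a sink trait whose `emit` cannot fail the caller, plus a JSONL sink over any writer. It is not the code in usage_sink.rs: the event fields, the `emit` method name, and the `JsonlSink` type are assumptions, and the sketch shows only failure isolation, not the non-blocking behavior the commit mentions.

```rust
use std::fmt::Debug;
use std::io::Write;
use std::sync::Mutex;

/// Hypothetical usage record; the real LlmUsageEvent fields may differ.
#[derive(Debug, Clone)]
struct LlmUsageEvent {
    model: String,
    prompt_tokens: u64,
    completion_tokens: u64,
}

/// Sketch of the sink trait: `emit` returns nothing, so a failing sink
/// cannot fail the request that produced the event.
trait UsageSink: Debug + Send + Sync {
    fn emit(&self, event: &LlmUsageEvent);
}

/// JSONL sink over any writer (a file in practice, a Vec<u8> in tests).
#[derive(Debug)]
struct JsonlSink<W: Write + Send + Debug> {
    out: Mutex<W>,
}

/// Minimal JSON string escaping for the one string field used here.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl<W: Write + Send + Debug> UsageSink for JsonlSink<W> {
    fn emit(&self, event: &LlmUsageEvent) {
        let line = format!(
            "{{\"model\":\"{}\",\"prompt_tokens\":{},\"completion_tokens\":{}}}\n",
            json_escape(&event.model),
            event.prompt_tokens,
            event.completion_tokens
        );
        // Lock and write errors are swallowed on purpose (failure isolation).
        if let Ok(mut out) = self.out.lock() {
            let _ = out.write_all(line.as_bytes());
        }
    }
}

/// Fan one event out to every configured sink.
fn emit_all(sinks: &[Box<dyn UsageSink>], event: &LlmUsageEvent) {
    for sink in sinks {
        sink.emit(event);
    }
}
```

A webhook sink would implement the same trait; it is omitted here because it needs an HTTP client.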

WOR-1523: examples/ai-model-group demonstrates one public model name backed by
several deployments, load-balanced via model-based routing + the routing
strategy.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B8vnyA6iBx5FnoLbsnwWww
…o, /health)

WOR-1530: serve model metadata + health from the ai_proxy config without an
upstream call. /model/info lists every deployment; /model_group/info groups
deployments by public model name (num_deployments); /health (+ /health/readiness
and the /health/liveliness | /health/liveness aliases) report status. Handled in
the GET branch of the AI dispatcher before the upstream forward; pure
ai_management_response() builder with unit tests. Demonstrated in the
ai-model-group example.
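The grouping behind /model_group/info, as described above, amounts to bucketing deployments by public model name and counting them. The sketch below illustrates that idea only; the `Deployment` and `ModelGroupInfo` types and their fields (other than `num_deployments`, which the commit names) are hypothetical and not taken from ai_management_response().

```rust
use std::collections::BTreeMap;

/// Hypothetical view of one configured deployment.
struct Deployment {
    model_name: String, // public model name clients request
    provider: String,   // backing provider for this deployment
}

/// Hypothetical per-group summary.
#[derive(Debug, PartialEq)]
struct ModelGroupInfo {
    model_group: String,
    num_deployments: usize,
    providers: Vec<String>,
}

/// Group deployments by public model name. A BTreeMap keeps the output
/// order deterministic (sorted by name), which makes responses stable.
fn model_group_info(deployments: &[Deployment]) -> Vec<ModelGroupInfo> {
    let mut groups: BTreeMap<&str, Vec<String>> = BTreeMap::new();
    for d in deployments {
        groups
            .entry(d.model_name.as_str())
            .or_default()
            .push(d.provider.clone());
    }
    groups
        .into_iter()
        .map(|(name, providers)| ModelGroupInfo {
            model_group: name.to_string(),
            num_deployments: providers.len(),
            providers,
        })
        .collect()
}
```

Because the summary is built from configuration alone, no upstream call is needed to answer the request.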

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B8vnyA6iBx5FnoLbsnwWww
WOR-1529 seam: crates/sbproxy-ai/src/external_guardrail.rs adds a generic HTTP
guardrail adapter. ExternalGuardrailConfig (name/url/mode/default_on/fail_open/
timeout); GuardrailMode maps LiteLLM pre_call/post_call/during_call/logging_only
to input/output phases and a no-block logging mode; check_external_guardrail()
POSTs content and parses a verdict, failing open or closed per config; flexible
parse_verdict() accepts allowed/flagged/blocked response shapes. Unit-tested
incl. a fail-open vs fail-closed test against an unreachable endpoint. Wiring
into the guardrail pipeline is the next step.
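The decision logic described above (mode mapping, flexible verdict shapes, fail-open versus fail-closed) can be sketched without any HTTP code. Everything here is illustrative: the `Phase`, `Verdict`, and `VerdictBody` types and all function names are hypothetical, mapping `during_call` to the input phase is an assumption, and the precedence among `allowed`/`flagged`/`blocked` is a guess rather than the behavior of parse_verdict().

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Phase {
    Input,
    Output,
    LogOnly, // never blocks
}

#[derive(Debug, PartialEq, Clone, Copy)]
enum Verdict {
    Allow,
    Block,
}

/// Map a LiteLLM guardrail mode string to a phase.
fn phase_for_mode(mode: &str) -> Option<Phase> {
    match mode {
        // Assumption: during_call is treated as an input-phase check.
        "pre_call" | "during_call" => Some(Phase::Input),
        "post_call" => Some(Phase::Output),
        "logging_only" => Some(Phase::LogOnly),
        _ => None,
    }
}

/// Already-decoded response fields; each guardrail service may set any of them.
struct VerdictBody {
    allowed: Option<bool>,
    flagged: Option<bool>,
    blocked: Option<bool>,
}

/// Accept any of the three shapes; None means no recognized field was present.
fn verdict_from_body(b: &VerdictBody) -> Option<Verdict> {
    if b.allowed.is_none() && b.flagged.is_none() && b.blocked.is_none() {
        return None;
    }
    let block = b.blocked == Some(true) || b.flagged == Some(true) || b.allowed == Some(false);
    Some(if block { Verdict::Block } else { Verdict::Allow })
}

/// Combine the call outcome with the fail_open setting and the phase.
/// `Err` stands for a transport failure such as an unreachable endpoint.
fn resolve_verdict(outcome: Result<Verdict, String>, fail_open: bool, phase: Phase) -> Verdict {
    let verdict = match outcome {
        Ok(v) => v,
        Err(_) if fail_open => Verdict::Allow,
        Err(_) => Verdict::Block,
    };
    // Logging-only guardrails observe but never block.
    if phase == Phase::LogOnly {
        Verdict::Allow
    } else {
        verdict
    }
}
```

With `fail_open` set, an unreachable guardrail lets the request through; without it, the same failure blocks the request.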

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B8vnyA6iBx5FnoLbsnwWww
…red)

WOR-1528: ai_proxy configs can now declare `usage_sinks:` (jsonl_file / webhook);
AiHandlerConfig::usage_sinks() lazily builds them once and shares the Arc'd
instances across requests, mirroring the router/embedding_cache OnceLock
pattern. UsageSink gains a Debug bound and the sinks derive Debug so the config
still derives Debug. Sink config + builders are unit-tested.
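The lazy build-once pattern mentioned above can be illustrated with `std::sync::OnceLock`. This is a simplified sketch, not AiHandlerConfig itself: the `HandlerConfig` and `Sink` types and the `sink_names` field are hypothetical stand-ins for the real config and sink types.

```rust
use std::sync::{Arc, OnceLock};

/// Hypothetical stand-in for a built sink.
#[derive(Debug)]
struct Sink {
    name: String,
}

/// Hypothetical config: declarative sink names plus a lazily built cache.
#[derive(Debug, Default)]
struct HandlerConfig {
    sink_names: Vec<String>,
    built: OnceLock<Vec<Arc<Sink>>>,
}

impl HandlerConfig {
    /// Build the sinks on first use; later calls return the same Arc'd
    /// instances, so every request shares one set of sinks.
    fn usage_sinks(&self) -> &[Arc<Sink>] {
        self.built.get_or_init(|| {
            self.sink_names
                .iter()
                .map(|n| Arc::new(Sink { name: n.clone() }))
                .collect()
        })
    }
}
```

Because `OnceLock<T>` implements `Debug` when `T` does, the config can keep deriving `Debug` as long as the sink type is `Debug` too, which is the constraint the commit describes.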

Emitting events from the billing choke point spans three functions
(handle_ai_proxy + the cache- and stream-relay helpers); that wiring is left as
a focused follow-up rather than threaded through the hot path here.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B8vnyA6iBx5FnoLbsnwWww
@rickcrawford rickcrawford merged commit 890f263 into main Jun 25, 2026
9 checks passed
@rickcrawford rickcrawford deleted the litellm-drop-in branch June 25, 2026 03:20