From bd189d27f223ec26380cf9b339a8a044792d019f Mon Sep 17 00:00:00 2001 From: Federico Kamelhar Date: Mon, 25 May 2026 02:28:16 -0400 Subject: [PATCH 01/10] docs(litellm-gateway): how-to + working example for the LiteLLM AI Gateway in front of OCI Signed-off-by: Federico Kamelhar --- docs/how-to/litellm-gateway.md | 194 ++++++++++++++++++++ docs/how-to/oci-models.md | 11 ++ examples/litellm-gateway/README.md | 65 +++++++ examples/litellm-gateway/config.yaml | 115 ++++++++++++ examples/litellm-gateway/docker-compose.yml | 62 +++++++ examples/litellm-gateway/helm-values.yaml | 125 +++++++++++++ mkdocs.yml | 1 + 7 files changed, 573 insertions(+) create mode 100644 docs/how-to/litellm-gateway.md create mode 100644 examples/litellm-gateway/README.md create mode 100644 examples/litellm-gateway/config.yaml create mode 100644 examples/litellm-gateway/docker-compose.yml create mode 100644 examples/litellm-gateway/helm-values.yaml diff --git a/docs/how-to/litellm-gateway.md b/docs/how-to/litellm-gateway.md new file mode 100644 index 0000000..bce4fb4 --- /dev/null +++ b/docs/how-to/litellm-gateway.md @@ -0,0 +1,194 @@ +# Running Locus behind the LiteLLM AI Gateway + +[LiteLLM](https://litellm.ai) ships an open-source proxy — variously +branded the **LiteLLM Proxy Server** and the **LiteLLM AI Gateway** — +that fronts 100+ model providers behind one OpenAI-shaped HTTP API. + +When you put it in front of Oracle Generative AI Infrastructure (and +optionally other providers), Locus consumes it through its existing +[`OpenAIModel`](../concepts/providers/openai.md) with no Locus-side code +change. The gateway carries the parts of the integration that genuinely +belong in a gateway: virtual keys, per-team budgets, fallback chains, +centralised observability, cost reporting, caching, and guardrails. 
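As a rough illustration of the virtual-key flow mentioned above, the sketch below builds (but does not send) the admin request that asks the gateway for a per-team key. It assumes LiteLLM's `/key/generate` admin endpoint accepts `models`, `max_budget`, and `duration` fields as described in the LiteLLM proxy documentation. The helper name `build_key_request`, the placeholder master key, and the budget value are hypothetical. Issuing real keys also assumes the gateway has a Postgres `database_url` configured.

```python
import json
import urllib.request


def build_key_request(gateway_url, master_key, models, max_budget_usd, duration="30d"):
    """Build (without sending) a POST to the gateway's /key/generate endpoint."""
    payload = {
        "models": models,              # aliases from config.yaml this key may call
        "max_budget": max_budget_usd,  # spend cap for this key (assumed field name)
        "duration": duration,          # key lifetime, e.g. "30d" (assumed field name)
    }
    return urllib.request.Request(
        url=gateway_url.rstrip("/") + "/key/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {master_key}",  # admin (master) key, not a virtual key
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values for illustration only; read the real master key from the environment.
request = build_key_request(
    "http://localhost:4000", "example-master-key", ["oci-cohere-command"], 25.0
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` against a running gateway should return a JSON body containing the new virtual key, which is the value Locus then passes as `api_key` to `OpenAIModel`. Check the response shape against your LiteLLM version before relying on it.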
+ +```text +Locus agent + │ OpenAIModel(base_url="http://litellm-gateway:4000", api_key="") + ▼ +LiteLLM Proxy Server (config.yaml carries every provider + key) + │ + ├──► OCI Generative AI (/20231130/actions/chat — vendor adapters) + ├──► OpenAI direct + ├──► Anthropic + ├──► AWS Bedrock + └──► … 100+ providers +``` + +## When to choose this over the direct OCI providers + +Locus's [direct OCI model providers](oci-models.md) remain the right +default for **single-tenant production, dev / CI, and on-OKE workload +identity** — they're simpler, in-process, lower-latency, and have no +extra service to operate. + +**Reach for the gateway when** you need: + +- **Multi-tenant key management** — issue virtual keys per team / agent + / customer with per-key budgets, RPM/TPM limits, expiry, and model + allowlists. +- **Fallback chains across regions or providers** — "OCI us-chicago-1 + → OCI us-ashburn-1 → external Anthropic" defined in `config.yaml`, + no Locus restart. +- **Centralised observability** — one Langfuse / OpenTelemetry / + Datadog / Helicone hook configured in the gateway, every Locus + service feeds it. +- **Centralised cost tracking** — Postgres-backed per-key / per-team / + per-model spend reporting across every consumer. +- **Polyglot consumers** — Python Locus, JS workbench, Ruby / Go + services all talk OpenAI to the same gateway. +- **Caching across services** — Redis / S3 / Qdrant in-flight, shared + across every consumer. + +If none of those apply, **prefer the direct OCI providers**. The +gateway is an extra deployment, not a shortcut. + +## Quickstart — local Docker + +The `examples/litellm-gateway/` directory ships a working sample: + +```bash +cd examples/litellm-gateway/ + +# Populate the OCI credentials the gateway will use to sign upstream calls. +# These live in the *gateway's* environment, not in your Locus app. +export OCI_REGION="us-chicago-1" +export OCI_USER="ocid1.user.oc1..xxx" +export OCI_FINGERPRINT="aa:bb:cc:..." 
+export OCI_TENANCY="ocid1.tenancy.oc1..xxx" +export OCI_KEY_FILE="$HOME/.oci/keys/your_api_key.pem" +export OCI_COMPARTMENT_ID="ocid1.compartment.oc1..xxx" + +docker compose up +``` + +The gateway listens on `http://localhost:4000` and exposes the model +aliases declared in `config.yaml` (`oci-grok`, `oci-cohere-command`, +`oci-gpt5-mini`, `oci-cohere-embed` in the sample). + +Verify with `curl`, using a key the gateway accepts in `LITELLM_VIRTUAL_KEY`: + +```bash +curl -s http://localhost:4000/v1/models \ + -H "Authorization: Bearer $LITELLM_VIRTUAL_KEY" | jq '.data[].id' +``` + +## Pointing Locus at the gateway + +Use the existing `OpenAIModel`, which already speaks the OpenAI-shaped +API the gateway exposes: + +```python +import os + +from locus.agent import Agent +from locus.models.native.openai import OpenAIModel + +model = OpenAIModel( + model="oci-cohere-command", # alias from gateway config.yaml + # Python does not expand "$VAR" strings; read the virtual key from the environment. + api_key=os.environ["LITELLM_VIRTUAL_KEY"], # virtual key issued by the gateway + base_url="http://localhost:4000", # the LiteLLM AI Gateway +) + +agent = Agent(model=model, system_prompt="You are concise.") +print(agent.run_sync("hi").message) +``` + +No new Locus class is needed. The gateway handles OCI RSA-SHA256 +signing, vendor adapters (Cohere `preamble` / `chatHistory`, GENERIC +apiFormat for Grok / Llama / Gemini), fallback, budgets, and +observability internally. Locus only ever sees the OpenAI-shaped +HTTP contract. + +## Running existing notebooks through the gateway + +Every `examples/notebook_*.py` already routes model construction +through `examples/config.py:get_model()`, which honors +`LOCUS_MODEL_PROVIDER=openai` plus the standard `OPENAI_BASE_URL` / +`OPENAI_API_KEY` env vars. 
So pointing every notebook at the gateway +is a four-line shell change — no code edits: + +```bash +docker compose -f examples/litellm-gateway/docker-compose.yml up -d + +export LOCUS_MODEL_PROVIDER=openai +export LOCUS_MODEL_ID=oci-cohere-command # alias from config.yaml +export OPENAI_BASE_URL=http://localhost:4000 +export OPENAI_API_KEY=$LITELLM_VIRTUAL_KEY # gateway virtual key + +python examples/notebook_06_basic_agent.py +python examples/notebook_07_agent_with_tools.py +# … +``` + +## Deploying on OKE + +The sample [`helm-values.yaml`](https://github.com/oracle-samples/locus/blob/main/examples/litellm-gateway/helm-values.yaml) +in `examples/litellm-gateway/` plugs into LiteLLM's official Helm chart +([`ghcr.io/berriai/litellm-helm`](https://github.com/BerriAI/litellm/tree/main/deploy/charts/litellm-helm)). +The recommended deployment shape is: + +- One LiteLLM gateway Deployment per environment. +- OCI credentials wired in via Kubernetes secrets, sourced from OCI Vault + (or via the gateway pod's OKE Workload Identity if you'd rather not + mount a long-lived signing key at all — see "Authentication" below). +- Postgres for virtual-key state and spend logs. +- Service exposed cluster-internal only — Locus services hit it via + the in-cluster DNS name (`litellm-gateway.litellm.svc.cluster.local:4000`). + +Don't expose the gateway publicly — issuing virtual keys is your +auth boundary, but the OCI credentials inside the gateway are not. + +## Authentication + +The gateway changes the credential boundary: + +| Without gateway | With gateway | +|---|---| +| Locus → OCI directly. Locus carries the OCI signing key (or uses OKE Workload Identity). | Locus → gateway with a **virtual key**. Gateway → OCI with the OCI signing key (or its own OKE Workload Identity). | + +So **Locus no longer needs OCI credentials at all** — the gateway is +the only thing that does. Locus only needs the virtual API key the +gateway issued it. 
This is the central reason to deploy the gateway +on a multi-tenant platform: agents from different teams use different +virtual keys with different budgets, all hitting the same underlying +OCI tenancy. + +On OKE, run the gateway pod with workload identity targeting the OCI +compartment, and OCI signing keys never have to land on disk anywhere. + +## What lives in `config.yaml` + +The sample `examples/litellm-gateway/config.yaml` declares the OCI +provider entries (one per model you want to expose), a virtual-key +section (mock or Postgres-backed), and the global gateway settings. +The full schema is documented at +[docs.litellm.ai/docs/proxy/configs](https://docs.litellm.ai/docs/proxy/configs). +Highlights: + +- **`model_list`** — every model alias the gateway exposes. The same + alias is what Locus passes as `model=` to `OpenAIModel`. +- **`general_settings.master_key`** — the admin key that creates + per-team virtual keys via `/key/generate`. +- **`router_settings.fallbacks`** — fallback chains across model + aliases (e.g. `[{"oci-gpt5-mini": ["oci-grok"]}]`). +- **`litellm_settings.callbacks`** — observability hooks (Langfuse, + OTel, Datadog, …). +- **`litellm_settings.cache`** — Redis / S3 / Qdrant caching config. + +## See also + +- [`docs/how-to/oci-models.md`](oci-models.md) — direct OCI providers + (`OCIChatCompletionsModel`, `OCIResponsesModel`, `OCIModel`). The + default for single-tenant deployments. +- [`examples/litellm-gateway/`](https://github.com/oracle-samples/locus/tree/main/examples/litellm-gateway) + — working `config.yaml`, `docker-compose.yml`, and `helm-values.yaml`. 
+- [LiteLLM AI Gateway quickstart](https://docs.litellm.ai/docs/proxy/quick_start) +- [LiteLLM `config.yaml` reference](https://docs.litellm.ai/docs/proxy/configs) +- [LiteLLM Helm chart](https://github.com/BerriAI/litellm/tree/main/deploy/charts/litellm-helm) diff --git a/docs/how-to/oci-models.md b/docs/how-to/oci-models.md index 60f79e3..fb8f3d1 100644 --- a/docs/how-to/oci-models.md +++ b/docs/how-to/oci-models.md @@ -5,6 +5,17 @@ AI through **three transports**. The `oci:` string factory picks V1 or SDK by model family; the Responses transport is opt-in. +!!! tip "Multi-tenant / cross-provider deployments" + The direct OCI providers documented on this page are the right + default for single-tenant production, dev / CI, and on-OKE + workload identity. **If you need centralised virtual keys, + per-team budgets, fallback chains across regions or providers, + centralised observability (Langfuse / OTel / Datadog), or + cross-provider routing, see the + [LiteLLM AI Gateway guide](litellm-gateway.md) instead** — Locus + connects to that via its existing `OpenAIModel(base_url=...)` + client, no new Locus class needed. + | Model family | Transport | Class | Endpoint | |---|---|---|---| | OpenAI (`openai.gpt-*`, `openai.o*`) | V1 | `OCIChatCompletionsModel` | `/openai/v1/chat/completions` | diff --git a/examples/litellm-gateway/README.md b/examples/litellm-gateway/README.md new file mode 100644 index 0000000..c557e55 --- /dev/null +++ b/examples/litellm-gateway/README.md @@ -0,0 +1,65 @@ +# LiteLLM AI Gateway — sample in front of OCI Generative AI + +Working sample for deploying the [LiteLLM AI Gateway](https://litellm.ai) +in front of Oracle Generative AI Infrastructure, with Locus pointed at +it via its existing `OpenAIModel(base_url=...)`. + +| File | What it is | +|---|---| +| [`config.yaml`](config.yaml) | Gateway model catalog. OCI providers, fallback chains, drop_params. Mounted into the container. 
| +| [`docker-compose.yml`](docker-compose.yml) | One-command local-dev gateway. Reads OCI credentials from env vars on the host. | +| [`helm-values.yaml`](helm-values.yaml) | OKE / Kubernetes deployment via the official `litellm-helm` chart. | + +## Local quickstart + +```bash +# 1. Set OCI credentials the gateway will sign upstream calls with. +export OCI_REGION="us-chicago-1" +export OCI_USER="ocid1.user.oc1..xxx" +export OCI_FINGERPRINT="aa:bb:cc:..." +export OCI_TENANCY="ocid1.tenancy.oc1..xxx" +export OCI_KEY_FILE="$HOME/.oci/keys/your_api_key.pem" +export OCI_COMPARTMENT_ID="ocid1.compartment.oc1..xxx" +export LITELLM_MASTER_KEY="$(openssl rand -hex 32)" # admin key for /key/generate + +# 2. Start the gateway. +docker compose up + +# 3. From your Locus app — no Locus-side code change needed: +# +# import os +# from locus.models.native.openai import OpenAIModel +# model = OpenAIModel( +# model="oci-cohere-command", # alias from config.yaml +# api_key=os.environ["LITELLM_VIRTUAL_KEY"], # virtual key, read from the environment +# base_url="http://localhost:4000", +# ) +``` + +## OKE quickstart + +```bash +# 1. OCI credentials + master key as secrets. +kubectl create namespace litellm +kubectl -n litellm create secret generic oci-credentials \ + --from-literal=OCI_REGION="us-chicago-1" \ + --from-literal=OCI_USER="ocid1.user.oc1..xxx" \ + --from-literal=OCI_FINGERPRINT="aa:bb:cc:..." \ + --from-literal=OCI_TENANCY="ocid1.tenancy.oc1..xxx" \ + --from-literal=OCI_COMPARTMENT_ID="ocid1.compartment.oc1..xxx" \ + --from-file=OCI_KEY_FILE=/path/to/api_key.pem + +kubectl -n litellm create secret generic litellm-master \ + --from-literal=LITELLM_MASTER_KEY="$(openssl rand -hex 32)" + +# 2. Install the chart with the sample values + mount the config.yaml. 
+# The chart is published to an OCI registry, so install it by reference; `helm repo add` does not accept oci:// URLs. +helm -n litellm upgrade --install gateway oci://ghcr.io/berriai/litellm-helm \ + --values helm-values.yaml \ + --set-file proxy_config=config.yaml +``` + +## Full documentation + +See [`docs/how-to/litellm-gateway.md`](../../docs/how-to/litellm-gateway.md) +for when to choose this path over the direct OCI providers, the auth +boundary table, and the notebook-run-via-gateway recipe. diff --git a/examples/litellm-gateway/config.yaml b/examples/litellm-gateway/config.yaml new file mode 100644 index 0000000..2987f76 --- /dev/null +++ b/examples/litellm-gateway/config.yaml @@ -0,0 +1,115 @@ +# LiteLLM AI Gateway — sample config for fronting Oracle Generative AI. +# +# Used by docker-compose.yml in this directory. Stand the gateway up with: +# +# cd examples/litellm-gateway/ +# docker compose up +# +# The gateway picks up OCI credentials from the env vars exported into the +# container by docker-compose.yml. See docs/how-to/litellm-gateway.md. + +model_list: + # --------------------------------------------------------------------------- + # OCI Generative AI Infrastructure — native API (/20231130/actions/chat). + # LiteLLM handles the OCI Signature v1 RSA-SHA256 signing internally. 
+ # --------------------------------------------------------------------------- + +- model_name: oci-cohere-command + litellm_params: + model: oci/cohere.command-latest + oci_region: os.environ/OCI_REGION + oci_user: os.environ/OCI_USER + oci_fingerprint: os.environ/OCI_FINGERPRINT + oci_tenancy: os.environ/OCI_TENANCY + oci_key_file: os.environ/OCI_KEY_FILE + oci_compartment_id: os.environ/OCI_COMPARTMENT_ID + +- model_name: oci-grok + litellm_params: + model: oci/xai.grok-4.20 + oci_region: os.environ/OCI_REGION + oci_user: os.environ/OCI_USER + oci_fingerprint: os.environ/OCI_FINGERPRINT + oci_tenancy: os.environ/OCI_TENANCY + oci_key_file: os.environ/OCI_KEY_FILE + oci_compartment_id: os.environ/OCI_COMPARTMENT_ID + +- model_name: oci-gpt5-mini + litellm_params: + model: oci/openai.gpt-5-mini + oci_region: os.environ/OCI_REGION + oci_user: os.environ/OCI_USER + oci_fingerprint: os.environ/OCI_FINGERPRINT + oci_tenancy: os.environ/OCI_TENANCY + oci_key_file: os.environ/OCI_KEY_FILE + oci_compartment_id: os.environ/OCI_COMPARTMENT_ID + +- model_name: oci-llama-4-maverick + litellm_params: + model: oci/meta.llama-4-maverick-17b-128e-instruct-fp8 + oci_region: os.environ/OCI_REGION + oci_user: os.environ/OCI_USER + oci_fingerprint: os.environ/OCI_FINGERPRINT + oci_tenancy: os.environ/OCI_TENANCY + oci_key_file: os.environ/OCI_KEY_FILE + oci_compartment_id: os.environ/OCI_COMPARTMENT_ID + +- model_name: oci-gemini-2.5-flash + litellm_params: + model: oci/google.gemini-2.5-flash + oci_region: os.environ/OCI_REGION + oci_user: os.environ/OCI_USER + oci_fingerprint: os.environ/OCI_FINGERPRINT + oci_tenancy: os.environ/OCI_TENANCY + oci_key_file: os.environ/OCI_KEY_FILE + oci_compartment_id: os.environ/OCI_COMPARTMENT_ID + +- model_name: oci-cohere-embed + litellm_params: + model: oci/cohere.embed-v4.0 + oci_region: os.environ/OCI_REGION + oci_user: os.environ/OCI_USER + oci_fingerprint: os.environ/OCI_FINGERPRINT + oci_tenancy: os.environ/OCI_TENANCY + oci_key_file: 
os.environ/OCI_KEY_FILE + oci_compartment_id: os.environ/OCI_COMPARTMENT_ID + + +# Fallback chains across the OCI model catalog. If oci-gpt5-mini returns +# a 5xx / rate-limit error, the gateway transparently retries against +# oci-grok, then oci-cohere-command. The agent never sees the upstream +# failure — only the eventual successful response. +router_settings: + fallbacks: + - oci-gpt5-mini: + - oci-grok + - oci-cohere-command + - oci-grok: + - oci-cohere-command + + +# Gateway-wide settings. ``drop_params`` strips OpenAI kwargs the +# upstream provider doesn't accept (e.g. ``parallel_tool_calls`` on +# Cohere) so cross-vendor agent loops don't 400. ``set_verbose`` is +# kept off here; flip it on for debugging. +litellm_settings: + drop_params: true + set_verbose: false + # Observability — uncomment one or more once you have the backend wired: + # success_callback: ["langfuse"] + # failure_callback: ["langfuse"] + # cache: true + # cache_params: + # type: redis + # host: redis + # port: 6379 + + +# Master key for the admin API (used by /key/generate to issue +# per-team virtual keys). Treat as a secret. In production, mount via +# Kubernetes secrets or OCI Vault rather than baking it into the image. +general_settings: + master_key: os.environ/LITELLM_MASTER_KEY + # Postgres backing for virtual keys + spend logs. + # Comment out for stateless local-dev mode. + # database_url: os.environ/DATABASE_URL diff --git a/examples/litellm-gateway/docker-compose.yml b/examples/litellm-gateway/docker-compose.yml new file mode 100644 index 0000000..eb6c321 --- /dev/null +++ b/examples/litellm-gateway/docker-compose.yml @@ -0,0 +1,62 @@ +# Sample local-dev LiteLLM AI Gateway in front of OCI Generative AI. +# +# Stand up with: +# +# # 1. Set the OCI credentials the *gateway* will use to sign upstream calls. +# # These do NOT leak to your Locus app — only the gateway sees them. 
+# export OCI_REGION="us-chicago-1" +# export OCI_USER="ocid1.user.oc1..xxx" +# export OCI_FINGERPRINT="aa:bb:cc:..." +# export OCI_TENANCY="ocid1.tenancy.oc1..xxx" +# export OCI_KEY_FILE="$HOME/.oci/keys/your_api_key.pem" +# export OCI_COMPARTMENT_ID="ocid1.compartment.oc1..xxx" +# +# # 2. Pick any string as the local master key. The gateway issues +# # per-team virtual keys via /key/generate using this as the admin token. +# export LITELLM_MASTER_KEY="$LITELLM_VIRTUAL_KEY" +# +# # 3. Start the gateway. +# docker compose up +# +# # 4. Point Locus at it (in a second shell): +# # from locus.models.native.openai import OpenAIModel +# # model = OpenAIModel( +# # model="oci-cohere-command", +# # api_key="$LITELLM_VIRTUAL_KEY", +# # base_url="http://localhost:4000", +# # ) + +services: + litellm: + image: ghcr.io/berriai/litellm:main-stable + container_name: locus-litellm-gateway + ports: + - 4000:4000 + volumes: + # Mount the config alongside the OCI key the gateway will sign with. + - ./config.yaml:/app/config.yaml:ro + - ${OCI_KEY_FILE:-/dev/null}:/oci-keys/key.pem:ro + environment: + # Surface env vars referenced by config.yaml's ``os.environ/...`` lookups. + OCI_REGION: ${OCI_REGION:?set OCI_REGION before running docker compose up} + OCI_USER: ${OCI_USER:?set OCI_USER} + OCI_FINGERPRINT: ${OCI_FINGERPRINT:?set OCI_FINGERPRINT} + OCI_TENANCY: ${OCI_TENANCY:?set OCI_TENANCY} + OCI_COMPARTMENT_ID: ${OCI_COMPARTMENT_ID:?set OCI_COMPARTMENT_ID} + # The container sees the key at the mounted path, not the host path. + OCI_KEY_FILE: /oci-keys/key.pem + # Admin token. Used to call /key/generate from outside. 
+ LITELLM_MASTER_KEY: ${LITELLM_MASTER_KEY:-$LITELLM_VIRTUAL_KEY} + command: [--config, /app/config.yaml, --port, '4000', --num_workers, '1'] + restart: unless-stopped + healthcheck: + test: [CMD, wget, --quiet, --tries=1, --spider, http://localhost:4000/health/liveness] + interval: 30s + timeout: 5s + start_period: 20s + retries: 3 + +# For a production-shaped setup, add a Postgres service here and set +# ``general_settings.database_url`` in config.yaml. Without it, virtual +# keys and spend logs live in memory and disappear on restart — fine +# for local dev, not for production. diff --git a/examples/litellm-gateway/helm-values.yaml b/examples/litellm-gateway/helm-values.yaml new file mode 100644 index 0000000..9e1f7cc --- /dev/null +++ b/examples/litellm-gateway/helm-values.yaml @@ -0,0 +1,125 @@ +# LiteLLM AI Gateway — sample OKE / Kubernetes values for fronting OCI. +# +# Pairs with the official LiteLLM Helm chart: +# ghcr.io/berriai/litellm-helm +# https://github.com/BerriAI/litellm/tree/main/deploy/charts/litellm-helm +# +# Deploy with: +# +# # 1. Create the OCI credentials secret (the gateway pod uses these +# # to sign upstream OCI calls). Prefer External Secrets / OCI Vault +# # in production; the kubectl form below is for illustration only. +# kubectl create namespace litellm +# kubectl -n litellm create secret generic oci-credentials \ +# --from-literal=OCI_REGION="us-chicago-1" \ +# --from-literal=OCI_USER="ocid1.user.oc1..xxx" \ +# --from-literal=OCI_FINGERPRINT="aa:bb:cc:..." \ +# --from-literal=OCI_TENANCY="ocid1.tenancy.oc1..xxx" \ +# --from-literal=OCI_COMPARTMENT_ID="ocid1.compartment.oc1..xxx" \ +# --from-file=OCI_KEY_FILE=/path/to/api_key.pem +# +# # 2. Master key for /key/generate. Treat as a high-value secret. +# kubectl -n litellm create secret generic litellm-master \ +# --from-literal=LITELLM_MASTER_KEY="$(openssl rand -hex 32)" +# +# # 3. Install / upgrade the chart. 
+# # The chart lives in an OCI registry; `helm repo add` does not accept oci:// URLs. +# helm -n litellm upgrade --install gateway oci://ghcr.io/berriai/litellm-helm \ +# --values examples/litellm-gateway/helm-values.yaml +# +# Locus services in the same cluster then point at: +# http://gateway-litellm.litellm.svc.cluster.local:4000 + +# --------------------------------------------------------------------------- +# Image / replicas +# --------------------------------------------------------------------------- +image: + repository: ghcr.io/berriai/litellm + tag: main-stable + pullPolicy: IfNotPresent + +replicaCount: 2 + +# --------------------------------------------------------------------------- +# Inline config — alternatively mount the contents of config.yaml from +# this directory via a ConfigMap. The Helm chart accepts either. +# --------------------------------------------------------------------------- +# Set proxy_config to the contents of config.yaml. When using --set-file +# you can reference the sibling file directly: +# --set-file proxy_config=examples/litellm-gateway/config.yaml +# proxy_config: | +# model_list: +# - model_name: oci-cohere-command +# litellm_params: +# model: oci/cohere.command-latest +# … + +# --------------------------------------------------------------------------- +# Environment — OCI credentials + master key sourced from secrets. +# --------------------------------------------------------------------------- +envFrom: +- secretRef: + name: oci-credentials +- secretRef: + name: litellm-master + +# --------------------------------------------------------------------------- +# Resources — sized for a small platform tier. Adjust to traffic. +# --------------------------------------------------------------------------- +resources: + requests: + cpu: 500m + memory: 1Gi + limits: + cpu: 2000m + memory: 4Gi + +# --------------------------------------------------------------------------- +# Service — cluster-internal only. 
The gateway issues virtual keys and +# carries OCI signing material; do NOT expose it via a LoadBalancer. +# Use an in-cluster Ingress + mTLS / OAuth2 proxy if you need it reachable +# from other namespaces / clusters. +# --------------------------------------------------------------------------- +service: + type: ClusterIP + port: 4000 + +# --------------------------------------------------------------------------- +# Postgres — required for persistent virtual keys + spend logs in +# production. The chart bundles an option to provision it; the form +# below points at an external Postgres (e.g. an OCI Database with +# PostgreSQL instance). +# --------------------------------------------------------------------------- +db: + useExisting: true + endpoint: postgres.your-platform.svc.cluster.local:5432 + database: litellm + url: postgresql://litellm:$(POSTGRES_PASSWORD)@$(DB_HOST)/$(DB_DATABASE) + secret: + name: litellm-db + usernameKey: username + passwordKey: password + +# --------------------------------------------------------------------------- +# OKE Workload Identity — when set, the gateway pod assumes an OCI +# IAM workload identity and OCI signing keys never have to land on +# disk. Configure the OCI principal in your IAM policy and uncomment +# below; remove the OCI_KEY_FILE / OCI_USER / OCI_FINGERPRINT entries +# from the oci-credentials Secret in that case. 
+# --------------------------------------------------------------------------- +serviceAccount: + create: true + name: litellm-gateway + # annotations: + # oraclecloud.com/workload-identity: "" + +# --------------------------------------------------------------------------- +# Pod hardening +# --------------------------------------------------------------------------- +podSecurityContext: + runAsNonRoot: true + runAsUser: 65534 +securityContext: + readOnlyRootFilesystem: true + allowPrivilegeEscalation: false + capabilities: + drop: [ALL] diff --git a/mkdocs.yml b/mkdocs.yml index 75747fb..7f8963b 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -271,6 +271,7 @@ nav: - Add a checkpointer backend: how-to/custom-checkpointer.md - OCI GenAI models: how-to/oci-models.md - OCI Dedicated AI Cluster (DAC): how-to/oci-dac.md + - LiteLLM AI Gateway: how-to/litellm-gateway.md - Workbench: workbench.md - API reference: - Agent: api/agent.md From 2eed91afdfbe148b27907652b1f750a79b040026 Mon Sep 17 00:00:00 2001 From: Federico Kamelhar Date: Mon, 25 May 2026 07:20:13 -0400 Subject: [PATCH 02/10] docs(litellm-gateway): notebook 71 + SVG architecture diagram + unit & integration tests MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - examples/notebook_71_litellm_gateway.py — runnable companion to the how-to. Health-checks the gateway, builds an Agent around OpenAIModel(base_url=...), runs blocking + streaming prompts. Self-skips with a wiring banner when LITELLM_GATEWAY_URL / LITELLM_GATEWAY_KEY aren't set. - docs/img/litellm-gateway-architecture.svg — three-tier SVG flow (Locus → LiteLLM Gateway → OCI Generative AI). The middle panel itemises every gateway feature so reviewers can see what the proxy carries that an in-process wrapper doesn't. - docs/notebooks/notebook_71_litellm_gateway.md — notebook md stub with the SVG embedded. - mkdocs.yml — notebook nav entry next to notebook 70. 
- docs/how-to/litellm-gateway.md — SVG embedded at the top. - tests/unit/test_litellm_gateway_example.py — 20 tests, no network. Parses config.yaml / docker-compose.yml / helm-values.yaml and asserts the documented invariants: alias / docs parity, OCI_* env wiring on every upstream entry, drop_params=True, master_key env sourced, fallback chains reference declared aliases, compose uses ${VAR:?…} strict form, OCI key mounted read-only, helm Service is ClusterIP-only, pod hardened (non-root, read-only root, caps dropped), README cross-references the artifacts. - tests/integration/test_litellm_gateway_live.py — drives the live gateway end-to-end through Locus's OpenAIModel: /v1/models health check, negative-path unauthenticated rejection, basic completion, multi-turn with system message, streaming, tool calling, full Agent loop. Auto-skipped when LITELLM_GATEWAY_URL / LITELLM_GATEWAY_KEY aren't set; runs from the existing _litellm_integration workflow. Signed-off-by: Federico Kamelhar --- docs/how-to/litellm-gateway.md | 2 + docs/img/litellm-gateway-architecture.svg | 109 +++++++ docs/notebooks/notebook_71_litellm_gateway.md | 68 ++++ examples/notebook_71_litellm_gateway.py | 232 ++++++++++++++ mkdocs.yml | 1 + .../integration/test_litellm_gateway_live.py | 232 ++++++++++++++ tests/unit/test_litellm_gateway_example.py | 295 ++++++++++++++++++ 7 files changed, 939 insertions(+) create mode 100644 docs/img/litellm-gateway-architecture.svg create mode 100644 docs/notebooks/notebook_71_litellm_gateway.md create mode 100644 examples/notebook_71_litellm_gateway.py create mode 100644 tests/integration/test_litellm_gateway_live.py create mode 100644 tests/unit/test_litellm_gateway_example.py diff --git a/docs/how-to/litellm-gateway.md b/docs/how-to/litellm-gateway.md index bce4fb4..4b7287f 100644 --- a/docs/how-to/litellm-gateway.md +++ b/docs/how-to/litellm-gateway.md @@ -4,6 +4,8 @@ branded the **LiteLLM Proxy Server** and the **LiteLLM AI Gateway** — that fronts 100+ model 
providers behind one OpenAI-shaped HTTP API. +![Locus → LiteLLM AI Gateway → OCI Generative AI](../img/litellm-gateway-architecture.svg) + When you put it in front of Oracle Generative AI Infrastructure (and optionally other providers), Locus consumes it through its existing [`OpenAIModel`](../concepts/providers/openai.md) with no Locus-side code diff --git a/docs/img/litellm-gateway-architecture.svg b/docs/img/litellm-gateway-architecture.svg new file mode 100644 index 0000000..b32696e --- /dev/null +++ b/docs/img/litellm-gateway-architecture.svg @@ -0,0 +1,109 @@ + + Locus behind the LiteLLM AI Gateway in front of Oracle Generative AI Infrastructure + A Locus agent uses its existing OpenAIModel pointed at a LiteLLM AI Gateway. The gateway carries virtual keys, fallback chains, observability, cost tracking, caching, guardrails, and auditing. The gateway signs the upstream call into OCI Generative AI Infrastructure with OCI Signature v1. The same gateway also fronts OpenAI direct, Anthropic, AWS Bedrock, Google Vertex, and Ollama. + + + + + + + + + + + Architecture + Locus → LiteLLM AI Gateway → Oracle Generative AI Infrastructure + + + + + Locus agent + No OCI credentials. No new model class. Just the existing OpenAI-compatible client. 
+ + + + + OpenAIModel( + model="oci-cohere-command", + base_url="http://litellm-gateway:4000", + api_key="<virtual-key>" + ) + + + + OpenAI-shape HTTPS · Bearer <virtual-key> + + + + + LiteLLM AI Gateway + litellm --config config.yaml · pod / container / sidecar — operated by your platform team + + + + + + Virtual keys + per-team budgets · RPM/TPM · expiry · model allowlists + + + Fallback chains + oci-gpt5-mini → oci-grok → oci-cohere-command + + + + Observability + Langfuse · OpenTelemetry · Datadog · Helicone + + + Cost tracking + Postgres — per-key / per-team / per-model spend + + + + Cache + Redis · S3 · Qdrant (semantic + exact match) + + + Guardrails + Lakera · Aporia · Presidio · Bedrock Guardrails + + + + Audit + every request, response, tool call, stream chunk + + + Admin UI + spend dashboards · virtual-key management + + + OCI Signature v1 RSA-SHA256 signing happens here, not in Locus. OCI credentials live in the gateway pod + via OCI Vault or OKE Workload Identity — never on disk in any Locus service. + + + signed HTTPS · OCI Signature v1 + + + + + Oracle Generative AI Infrastructure + + /20231130/actions/chat · + /openai/v1/chat/completions · + /openai/v1/responses + + Llama · Grok · Cohere Command · Cohere Embed · Google Gemini · OpenAI gpt-5 + Same gateway also fronts: OpenAI direct · Anthropic · AWS Bedrock · Google Vertex AI · Ollama · 100+ providers + diff --git a/docs/notebooks/notebook_71_litellm_gateway.md b/docs/notebooks/notebook_71_litellm_gateway.md new file mode 100644 index 0000000..4fa8513 --- /dev/null +++ b/docs/notebooks/notebook_71_litellm_gateway.md @@ -0,0 +1,68 @@ +# LiteLLM AI Gateway + +This notebook is the runnable companion to the +[LiteLLM AI Gateway how-to](../how-to/litellm-gateway.md). 
It demonstrates +the production-shaped integration pattern: a Locus agent talks to a +LiteLLM AI Gateway via the **existing** `OpenAIModel(base_url=...)`, +and the gateway handles every OCI-specific concern (RSA-SHA256 signing, +vendor adapters, fallbacks, virtual keys, budgets, observability, cost +tracking, caching, guardrails). + +**No new Locus model class. The gateway is OpenAI-shaped by design.** + +![Locus → LiteLLM AI Gateway → OCI Generative AI](../img/litellm-gateway-architecture.svg) + +## What the notebook does + +1. **Health-checks the gateway** at `LITELLM_GATEWAY_URL` and prints + the model aliases it exposes — surfaces config drift before any + agent code runs. +2. **Runs an `Agent`** built around `OpenAIModel(base_url=..., api_key=...)` + against the alias in `LITELLM_GATEWAY_MODEL` (default + `oci-cohere-command`, defined in + [`examples/litellm-gateway/config.yaml`](https://github.com/oracle-samples/locus/blob/main/examples/litellm-gateway/config.yaml)). +3. **Streams a response** through the same agent to prove SSE flows + end-to-end Locus → gateway → OCI. + +When neither `LITELLM_GATEWAY_URL` nor `LITELLM_GATEWAY_KEY` is set, +the notebook prints the wiring snippet and exits cleanly — same +self-skip pattern as Locus's other infrastructure notebooks. + +## Prerequisites + +```bash +# 1. Start the gateway (in another shell). +cd examples/litellm-gateway/ +export OCI_REGION="us-chicago-1" +export OCI_USER="ocid1.user.oc1..xxx" +export OCI_FINGERPRINT="aa:bb:cc:..." +export OCI_TENANCY="ocid1.tenancy.oc1..xxx" +export OCI_KEY_FILE="$HOME/.oci/keys/your_api_key.pem" +export OCI_COMPARTMENT_ID="ocid1.compartment.oc1..xxx" +export LITELLM_MASTER_KEY="$(openssl rand -hex 32)" +docker compose up -d + +# 2. Wire this notebook at the gateway. 
+export LITELLM_GATEWAY_URL="http://localhost:4000" +export LITELLM_GATEWAY_KEY="$LITELLM_MASTER_KEY" +export LITELLM_GATEWAY_MODEL="oci-cohere-command" + +python examples/notebook_71_litellm_gateway.py +``` + +## Why no new Locus class + +Locus's existing `OpenAIModel` already speaks the wire contract LiteLLM's +gateway exposes (OpenAI Chat Completions over HTTPS, Bearer-token auth). +Inventing a `LiteLLMModel` class to wrap `litellm.acompletion()` in-process +would have meant re-implementing a subset of the proxy's surface — and +permanently lagging behind it. The how-to page covers the design call in +detail. + +## See also + +- [`docs/how-to/litellm-gateway.md`](../how-to/litellm-gateway.md) — when + the gateway is the right path; auth-boundary diagram; OKE deployment. +- [`examples/litellm-gateway/`](https://github.com/oracle-samples/locus/tree/main/examples/litellm-gateway) — the working sample: `config.yaml`, `docker-compose.yml`, `helm-values.yaml`. +- [Notebook 01 — OCI transports](notebook_01_oci_transports.md) — the + direct (no-gateway) OCI providers, the right default for single-tenant. diff --git a/examples/notebook_71_litellm_gateway.py b/examples/notebook_71_litellm_gateway.py new file mode 100644 index 0000000..6708230 --- /dev/null +++ b/examples/notebook_71_litellm_gateway.py @@ -0,0 +1,232 @@ +# Copyright (c) 2025, 2026 Oracle and/or its affiliates. +# Licensed under the Universal Permissive License v1.0 as shown at +# https://oss.oracle.com/licenses/upl/ +"""Notebook 71: Run a Locus agent behind the LiteLLM AI Gateway. + +This notebook is the runnable companion to +``docs/how-to/litellm-gateway.md``. It does **not** add a new Locus +model class — it uses Locus's existing :class:`OpenAIModel` pointed at +a LiteLLM AI Gateway URL. The gateway is what fronts Oracle Generative +AI Infrastructure (and optionally other providers); Locus only ever +sees the OpenAI-shaped HTTP contract the gateway exposes. 
+ +Key concepts: + +- The LiteLLM Proxy Server (a.k.a. **LiteLLM AI Gateway**) is the + product. The Python ``litellm.acompletion()`` function is internal + scaffolding. The gateway is what carries the platform-grade pieces: + virtual keys, per-team budgets, fallback chains, centralised + observability, cost tracking, caching, guardrails. +- Locus consumes the gateway through ``OpenAIModel(base_url=...)``. + No new Locus class is needed; the gateway is OpenAI-shaped by design. +- The gateway holds the OCI signing credentials. Locus only holds the + gateway-issued virtual key. On OKE, the gateway pod can use Workload + Identity so the OCI signing key never lands on disk at all. + +Run it:: + + # 1. Start the gateway. The sample ships at examples/litellm-gateway/. + cd examples/litellm-gateway/ + export OCI_REGION="us-chicago-1" + export OCI_USER="ocid1.user.oc1..xxx" + export OCI_FINGERPRINT="aa:bb:cc:..." + export OCI_TENANCY="ocid1.tenancy.oc1..xxx" + export OCI_KEY_FILE="$HOME/.oci/keys/your_api_key.pem" + export OCI_COMPARTMENT_ID="ocid1.compartment.oc1..xxx" + export LITELLM_MASTER_KEY="$(openssl rand -hex 32)" + docker compose up -d + + # 2. Point this notebook at the gateway: + export LITELLM_GATEWAY_URL="http://localhost:4000" + export LITELLM_GATEWAY_KEY="$LITELLM_MASTER_KEY" # or any virtual key issued by /key/generate + export LITELLM_GATEWAY_MODEL="oci-cohere-command" # alias from config.yaml + + python examples/notebook_71_litellm_gateway.py + +Without ``LITELLM_GATEWAY_URL`` and ``LITELLM_GATEWAY_KEY`` set, the +notebook prints the wiring snippet and exits cleanly — no traceback, +no half-initialised state. 
+ +Difficulty: Beginner +""" + +from __future__ import annotations + +import asyncio +import os +import sys +from typing import Any + + +# --------------------------------------------------------------------------- +# Prerequisites +# --------------------------------------------------------------------------- + + +_REQUIRED_ENV = ( + "LITELLM_GATEWAY_URL", + "LITELLM_GATEWAY_KEY", +) +_OPTIONAL_ENV = ("LITELLM_GATEWAY_MODEL",) + + +def _print_skip_banner(missing: list[str]) -> None: + print("=" * 72) + print(" LiteLLM AI Gateway not configured — skipping the live demo.") + print("=" * 72) + print( + f"\n Missing environment variables: {', '.join(missing)}\n\n" + " This notebook expects a LiteLLM AI Gateway running in front of OCI\n" + " Generative AI. Start the sample gateway in another terminal:\n\n" + " cd examples/litellm-gateway/\n" + ' export OCI_REGION="us-chicago-1"\n' + ' export OCI_USER="ocid1.user.oc1..xxx"\n' + ' export OCI_FINGERPRINT="aa:bb:cc:..."\n' + ' export OCI_TENANCY="ocid1.tenancy.oc1..xxx"\n' + ' export OCI_KEY_FILE="$HOME/.oci/keys/your_api_key.pem"\n' + ' export OCI_COMPARTMENT_ID="ocid1.compartment.oc1..xxx"\n' + ' export LITELLM_MASTER_KEY="$(openssl rand -hex 32)"\n' + " docker compose up -d\n\n" + " Then export the gateway URL and key in this shell:\n\n" + ' export LITELLM_GATEWAY_URL="http://localhost:4000"\n' + ' export LITELLM_GATEWAY_KEY="$LITELLM_MASTER_KEY"\n' + ' export LITELLM_GATEWAY_MODEL="oci-cohere-command"\n\n' + " Full how-to: docs/how-to/litellm-gateway.md\n" + ) + + +def _check_prerequisites() -> tuple[str, str, str]: + missing = [v for v in _REQUIRED_ENV if not os.environ.get(v)] + if missing: + _print_skip_banner(missing) + sys.exit(0) + + url = os.environ["LITELLM_GATEWAY_URL"].rstrip("/") + key = os.environ["LITELLM_GATEWAY_KEY"] + # Default to a Cohere alias because the sample config.yaml ships it. 
+ model = os.environ.get("LITELLM_GATEWAY_MODEL", "oci-cohere-command") + return url, key, model + + +# --------------------------------------------------------------------------- +# Part 1 — Health check against the gateway +# --------------------------------------------------------------------------- + + +def _print_gateway_health(url: str, key: str) -> None: + """Print the gateway's reachable models. Surfaces config drift early.""" + import httpx + + print("=== Gateway health ===\n") + try: + resp = httpx.get( + f"{url}/v1/models", + headers={"Authorization": f"Bearer {key}"}, + timeout=5.0, + ) + resp.raise_for_status() + except httpx.RequestError as exc: + print(f" could not reach {url}/v1/models: {exc}") + print(" (is the gateway running? — see docker compose logs)") + sys.exit(1) + except httpx.HTTPStatusError as exc: + print(f" {url}/v1/models returned {exc.response.status_code}") + print(f" body: {exc.response.text[:200]}") + sys.exit(1) + + payload = resp.json() + aliases = [m["id"] for m in payload.get("data", [])] + print(f" {url} is up. {len(aliases)} model alias(es) reachable:") + for a in aliases[:10]: + print(f" · {a}") + if len(aliases) > 10: + print(f" … and {len(aliases) - 10} more") + print() + + +# --------------------------------------------------------------------------- +# Part 2 — A Locus agent talking to the gateway via OpenAIModel +# --------------------------------------------------------------------------- + + +async def _run_agent(url: str, key: str, model_alias: str) -> None: + """Build a Locus Agent pointed at the gateway and run a basic prompt.""" + from locus.agent import Agent + from locus.models.native.openai import OpenAIModel + + print(f"=== Agent vs. {model_alias} (via gateway) ===\n") + + # The OPENAI-COMPATIBLE client. ``base_url`` is the gateway endpoint; + # ``api_key`` is a gateway-issued virtual key. Locus carries NO OCI + # credentials here — the gateway does the OCI signing internally. 
+ model = OpenAIModel( + model=model_alias, + api_key=key, + base_url=url, + ) + + agent = Agent(model=model, system_prompt="You are concise.") + + prompts = [ + "What is the capital of Japan? One word.", + "What is 7 times 8?", + ] + for prompt in prompts: + print(f" > {prompt}") + result = await asyncio.to_thread(agent.run_sync, prompt) + print(f" < {result.message.strip()}") + print(f" [{result.metrics.prompt_tokens}→{result.metrics.completion_tokens} tokens]") + print() + + +# --------------------------------------------------------------------------- +# Part 3 — Streaming through the gateway +# --------------------------------------------------------------------------- + + +async def _run_streaming(url: str, key: str, model_alias: str) -> None: + from locus.agent import Agent + from locus.core.events import ModelChunkEvent + from locus.models.native.openai import OpenAIModel + + print(f"=== Streaming vs. {model_alias} (via gateway) ===\n") + + model = OpenAIModel(model=model_alias, api_key=key, base_url=url) + agent = Agent(model=model, system_prompt="Reply concisely.") + + print(" > List three primary colors, comma-separated.\n < ", end="", flush=True) + full: list[str] = [] + async for ev in agent.run("List three primary colors, comma-separated."): + if isinstance(ev, ModelChunkEvent) and ev.content: + print(ev.content, end="", flush=True) + full.append(ev.content) + print(f"\n [{len(full)} streamed chunks]\n") + + +# --------------------------------------------------------------------------- +# Main +# --------------------------------------------------------------------------- + + +def main() -> None: + url, key, model_alias = _check_prerequisites() + + print( + f"\n LiteLLM AI Gateway at {url}\n" + f" Model alias {model_alias}\n" + f" Auth Bearer \n" + ) + + _print_gateway_health(url, key) + asyncio.run(_run_agent(url, key, model_alias)) + asyncio.run(_run_streaming(url, key, model_alias)) + + print("=" * 72) + print(" Done. 
The gateway handled OCI signing, vendor adaptation, and any") + print(" configured fallback / cache / callbacks transparently. Locus saw") + print(" only an OpenAI-shaped HTTP contract.") + print("=" * 72) + + +if __name__ == "__main__": + main() diff --git a/mkdocs.yml b/mkdocs.yml index 7f8963b..8578b30 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -264,6 +264,7 @@ nav: - 68 · Agent server: notebooks/notebook_68_agent_server.md - 69 · Research workflow: notebooks/notebook_69_research_workflow.md - 70 · OCI tools — agents that drive OCI: notebooks/notebook_70_oci_tools.md + - 71 · LiteLLM AI Gateway in front of OCI: notebooks/notebook_71_litellm_gateway.md - Guides: - Deploy: how-to/deploy.md - Persist conversations: how-to/persist-conversations.md diff --git a/tests/integration/test_litellm_gateway_live.py b/tests/integration/test_litellm_gateway_live.py new file mode 100644 index 0000000..27ee5e4 --- /dev/null +++ b/tests/integration/test_litellm_gateway_live.py @@ -0,0 +1,232 @@ +# Copyright (c) 2025, 2026 Oracle and/or its affiliates. +# Licensed under the Universal Permissive License v1.0 as shown at +# https://oss.oracle.com/licenses/upl/ + +"""Integration tests: drive a live LiteLLM AI Gateway end-to-end. + +Mirrors what ``examples/notebook_71_litellm_gateway.py`` does, but +asserts on the responses so a CI run catches regressions in the +gateway integration before they hit users. Auto-skips when the +``LITELLM_GATEWAY_URL`` / ``LITELLM_GATEWAY_KEY`` env vars aren't set, +so ``pytest tests/integration`` is safe on developer laptops and on +forks. + +Run locally:: + + # 1. Bring up the sample gateway in another shell. + cd examples/litellm-gateway/ + export OCI_REGION="us-chicago-1" + export OCI_USER="ocid1.user.oc1..xxx" + export OCI_FINGERPRINT="aa:bb:cc:..." 
+ export OCI_TENANCY="ocid1.tenancy.oc1..xxx" + export OCI_KEY_FILE="$HOME/.oci/keys/your_api_key.pem" + export OCI_COMPARTMENT_ID="ocid1.compartment.oc1..xxx" + export LITELLM_MASTER_KEY="$(openssl rand -hex 32)" + docker compose up -d + + # 2. Run the test against the gateway. + export LITELLM_GATEWAY_URL="http://localhost:4000" + export LITELLM_GATEWAY_KEY="$LITELLM_MASTER_KEY" + export LITELLM_GATEWAY_MODEL="oci-cohere-command" # alias from config.yaml + pytest tests/integration/test_litellm_gateway_live.py -v + +The CI workflow ``.github/workflows/_litellm_integration.yml`` (added +alongside this file) wires the same three env vars from GitHub +Secrets when present. +""" + +from __future__ import annotations + +import os + +import pytest + + +_GATEWAY_URL = os.environ.get("LITELLM_GATEWAY_URL", "").rstrip("/") +_GATEWAY_KEY = os.environ.get("LITELLM_GATEWAY_KEY", "") +_GATEWAY_MODEL = os.environ.get("LITELLM_GATEWAY_MODEL", "oci-cohere-command") + +pytestmark = [ + pytest.mark.integration, + pytest.mark.skipif( + not (_GATEWAY_URL and _GATEWAY_KEY), + reason=( + "LITELLM_GATEWAY_URL / LITELLM_GATEWAY_KEY not set — bring up the " + "sample gateway under examples/litellm-gateway/ and export the URL " + "+ a virtual key. See docs/how-to/litellm-gateway.md." 
+ ), + ), +] + + +# --------------------------------------------------------------------------- +# Gateway health (no Locus involvement — just HTTP) +# --------------------------------------------------------------------------- + + +def test_gateway_models_endpoint_lists_documented_alias(): + """The gateway's /v1/models must include the alias the test points + at; otherwise the docs claim a model exists that doesn't.""" + import httpx + + resp = httpx.get( + f"{_GATEWAY_URL}/v1/models", + headers={"Authorization": f"Bearer {_GATEWAY_KEY}"}, + timeout=10.0, + ) + resp.raise_for_status() + aliases = {m["id"] for m in resp.json().get("data", [])} + assert _GATEWAY_MODEL in aliases, ( + f"gateway at {_GATEWAY_URL} doesn't expose {_GATEWAY_MODEL!r}; " + f"aliases present: {sorted(aliases)}" + ) + + +def test_gateway_rejects_unauthenticated_call(): + """Negative-path: without a Bearer token the gateway must NOT + forward to OCI. This catches "I forgot to set master_key" and + open-gateway-by-accident regressions.""" + import httpx + + resp = httpx.post( + f"{_GATEWAY_URL}/v1/chat/completions", + json={"model": _GATEWAY_MODEL, "messages": [{"role": "user", "content": "hi"}]}, + timeout=10.0, + ) + assert resp.status_code in (401, 403), ( + f"gateway accepted unauthenticated POST: status={resp.status_code} body={resp.text[:200]!r}" + ) + + +# --------------------------------------------------------------------------- +# End-to-end through Locus's existing OpenAIModel +# --------------------------------------------------------------------------- + + +@pytest.fixture +def model(): + """Build the exact OpenAIModel the docs / notebook 71 instruct users + to build. 
If this fixture stops working, the docs are wrong.""" + from locus.models.native.openai import OpenAIModel + + return OpenAIModel( + model=_GATEWAY_MODEL, + api_key=_GATEWAY_KEY, + base_url=_GATEWAY_URL, + max_tokens=60, + temperature=0.2, + ) + + +@pytest.mark.asyncio +async def test_basic_completion_via_gateway(model): + """Locus → gateway → OCI → response. The minimal happy path.""" + from locus.core.messages import Message + + resp = await model.complete( + messages=[Message.user("What is the capital of Japan? Reply with one word only.")], + ) + assert resp.message.content + assert "Tokyo" in resp.message.content + # Usage must arrive populated — the gateway propagates upstream usage + # back through to Locus. + assert resp.usage.get("prompt_tokens", 0) > 0 + assert resp.usage.get("completion_tokens", 0) > 0 + + +@pytest.mark.asyncio +async def test_multi_turn_with_system_message(model): + """System + user multi-turn must round-trip through the gateway + unchanged.""" + from locus.core.messages import Message + + resp = await model.complete( + messages=[ + Message.system("Answer with a single integer and nothing else."), + Message.user("What is 7 times 8?"), + ], + ) + assert resp.message.content + assert "56" in resp.message.content + + +@pytest.mark.asyncio +async def test_streaming_via_gateway(model): + """SSE end-to-end. 
The gateway re-serialises upstream events, so + this is the regression test that catches a broken passthrough.""" + from locus.core.events import ModelChunkEvent + from locus.core.messages import Message + + chunks: list[str] = [] + terminal = 0 + async for ev in model.stream( + messages=[Message.user("List 3 primary colors, comma-separated, one line.")], + ): + if isinstance(ev, ModelChunkEvent): + if ev.content: + chunks.append(ev.content) + if ev.done: + terminal += 1 + + assert chunks, "gateway returned no streamed content chunks" + assert terminal == 1, "expected exactly one terminal ModelChunkEvent" + + +@pytest.mark.asyncio +async def test_tool_call_via_gateway(model): + """Tool calling must work through the gateway. OpenAI-shape tool + schema in, OpenAI-shape tool call out.""" + from locus.core.messages import Message + + tools = [ + { + "type": "function", + "function": { + "name": "get_weather", + "description": "Get the current weather in a location.", + "parameters": { + "type": "object", + "properties": {"location": {"type": "string"}}, + "required": ["location"], + }, + }, + } + ] + resp = await model.complete( + messages=[Message.user("What's the weather in Tokyo right now?")], + tools=tools, + tool_choice="auto", + ) + # Most chat models will issue the call given an explicit prompt. + # If the configured alias points at a non-tool-capable model and + # this assertion ever flakes, change LITELLM_GATEWAY_MODEL to a + # tool-capable alias (e.g. oci-grok or oci-gpt5-mini). + assert resp.message.tool_calls, ( + "model issued no tool call — set LITELLM_GATEWAY_MODEL to a " + "tool-capable alias (e.g. 
oci-grok)" + ) + tc = resp.message.tool_calls[0] + assert tc.name == "get_weather" + assert isinstance(tc.arguments, dict) + assert "location" in tc.arguments + + +# --------------------------------------------------------------------------- +# Agent-loop end-to-end — mirrors the notebook +# --------------------------------------------------------------------------- + + +@pytest.mark.asyncio +async def test_agent_loop_via_gateway(model): + """Build an Agent, run a prompt, assert the result — this is what + notebook_71 does in production form. Confirms the entire wiring + (Agent + OpenAIModel + gateway + OCI) holds together.""" + from locus.agent import Agent + + agent = Agent(model=model, system_prompt="Reply with a single sentence.") + result = agent.run_sync("Name one programming language.") + + assert result.message + assert result.message.strip() + # The agent succeeded — no error_type, success flag is True. + assert getattr(result, "success", True) diff --git a/tests/unit/test_litellm_gateway_example.py b/tests/unit/test_litellm_gateway_example.py new file mode 100644 index 0000000..ea78882 --- /dev/null +++ b/tests/unit/test_litellm_gateway_example.py @@ -0,0 +1,295 @@ +# Copyright (c) 2025, 2026 Oracle and/or its affiliates. +# Licensed under the Universal Permissive License v1.0 as shown at +# https://oss.oracle.com/licenses/upl/ + +"""Unit tests for ``examples/litellm-gateway/``. + +These tests don't call out anywhere — they only parse the sample files +shipped in the repo and assert their shape. The goal is to catch the +class of error where someone edits the example config and accidentally +breaks one of the documented invariants (env-var wiring, alias / +upstream-model parity, the OKE pod hardening flags, etc.) without +needing a live gateway or OCI tenancy. + +The live end-to-end smoke test lives at +``tests/integration/test_litellm_gateway_live.py`` and is gated on +``LITELLM_GATEWAY_URL`` / ``LITELLM_GATEWAY_KEY`` env vars. 
+""" + +from __future__ import annotations + +from pathlib import Path +from typing import Any + +import pytest +import yaml + + +REPO_ROOT = Path(__file__).resolve().parents[2] +GATEWAY_DIR = REPO_ROOT / "examples" / "litellm-gateway" +CONFIG_YAML = GATEWAY_DIR / "config.yaml" +COMPOSE_YAML = GATEWAY_DIR / "docker-compose.yml" +HELM_YAML = GATEWAY_DIR / "helm-values.yaml" +README_MD = GATEWAY_DIR / "README.md" + + +# Env vars the sample config consumes via ``os.environ/...`` lookups. If +# any of these is missing from docker-compose.yml the gateway starts but +# every upstream OCI call 401s, which is exactly the failure mode this +# test exists to catch. +_REQUIRED_OCI_ENV = { + "OCI_REGION", + "OCI_USER", + "OCI_FINGERPRINT", + "OCI_TENANCY", + "OCI_KEY_FILE", + "OCI_COMPARTMENT_ID", +} + +# Aliases the how-to / notebook 71 advertise as available. Drift between +# these and config.yaml breaks the documented quickstart, so the test +# asserts they remain in sync. +_DOCUMENTED_ALIASES = { + "oci-cohere-command", + "oci-grok", + "oci-gpt5-mini", + "oci-llama-4-maverick", + "oci-gemini-2.5-flash", + "oci-cohere-embed", +} + + +@pytest.fixture(scope="module") +def gateway_dir_exists() -> Path: + if not GATEWAY_DIR.is_dir(): + pytest.fail(f"examples/litellm-gateway/ missing at {GATEWAY_DIR}") + return GATEWAY_DIR + + +@pytest.fixture(scope="module") +def config(gateway_dir_exists: Path) -> dict[str, Any]: + return yaml.safe_load(CONFIG_YAML.read_text()) + + +@pytest.fixture(scope="module") +def compose(gateway_dir_exists: Path) -> dict[str, Any]: + return yaml.safe_load(COMPOSE_YAML.read_text()) + + +@pytest.fixture(scope="module") +def helm_values(gateway_dir_exists: Path) -> dict[str, Any]: + return yaml.safe_load(HELM_YAML.read_text()) + + +# --------------------------------------------------------------------------- +# config.yaml — the gateway's model catalog +# --------------------------------------------------------------------------- + + +class TestConfigYaml: 
+ def test_has_model_list(self, config: dict[str, Any]): + assert "model_list" in config, "config.yaml missing top-level model_list" + assert isinstance(config["model_list"], list) + assert len(config["model_list"]) >= 1 + + def test_aliases_match_documentation(self, config: dict[str, Any]): + """Every alias the docs / notebook 71 advertise must exist in the + config, and every config alias must be one we document — otherwise + callers see ``404 not found`` or the docs say a model exists that + doesn't.""" + aliases = {entry["model_name"] for entry in config["model_list"]} + assert aliases == _DOCUMENTED_ALIASES, ( + f"alias drift between config.yaml and docs/how-to/litellm-gateway.md: " + f"only-in-config={aliases - _DOCUMENTED_ALIASES}, " + f"only-in-docs={_DOCUMENTED_ALIASES - aliases}" + ) + + def test_aliases_unique(self, config: dict[str, Any]): + aliases = [entry["model_name"] for entry in config["model_list"]] + assert len(aliases) == len(set(aliases)), f"duplicate aliases: {aliases}" + + def test_every_entry_points_at_oci(self, config: dict[str, Any]): + """The sample is OCI-fronted. Every entry's upstream ``model`` must + be an OCI catalog id — otherwise the gateway routes off to a + provider whose credentials aren't configured.""" + for entry in config["model_list"]: + model = entry["litellm_params"]["model"] + assert model.startswith("oci/"), ( + f"alias {entry['model_name']!r} routes to non-OCI model {model!r}; " + "the OCI-fronting sample should only declare oci/* upstream ids." + ) + + def test_every_entry_pulls_oci_env_vars(self, config: dict[str, Any]): + """Every model entry must reference all six OCI_* env vars via + ``os.environ/...`` lookups. 
Forgetting one means the gateway + rejects upstream calls with ``Missing required parameters`` at + runtime — which the live smoke test would catch, but later than + we want.""" + for entry in config["model_list"]: + params = entry["litellm_params"] + referenced = { + v[len("os.environ/") :] + for v in params.values() + if isinstance(v, str) and v.startswith("os.environ/") + } + missing = _REQUIRED_OCI_ENV - referenced + assert not missing, ( + f"alias {entry['model_name']!r} doesn't read these env vars from the " + f"environment: {sorted(missing)}. Add ``os.environ/`` to " + "litellm_params so the gateway picks them up at runtime." + ) + + def test_drop_params_enabled(self, config: dict[str, Any]): + """drop_params=True is the documented default; otherwise OpenAI + kwargs the upstream OCI provider doesn't accept 400 instead of + being stripped, breaking cross-vendor agent loops.""" + assert config.get("litellm_settings", {}).get("drop_params") is True + + def test_master_key_is_env_sourced(self, config: dict[str, Any]): + """master_key must NOT be inlined in YAML — it should come from + the environment so the sample doesn't ship a fixed secret.""" + mk = config["general_settings"]["master_key"] + assert isinstance(mk, str), "master_key must be a string" + assert mk.startswith("os.environ/"), ( + "general_settings.master_key must read from the environment " + "(via ``os.environ/LITELLM_MASTER_KEY``); never inline a key." 
+ ) + + def test_fallback_chains_reference_declared_aliases(self, config: dict[str, Any]): + """If a fallback target isn't declared in model_list, the + fallback silently no-ops — exactly the bug class this catches.""" + fallbacks = config.get("router_settings", {}).get("fallbacks", []) + declared = {entry["model_name"] for entry in config["model_list"]} + for chain in fallbacks: + for source, targets in chain.items(): + assert source in declared, f"fallback source {source!r} not declared" + for t in targets: + assert t in declared, f"fallback target {t!r} not declared" + + +# --------------------------------------------------------------------------- +# docker-compose.yml — local-dev wiring +# --------------------------------------------------------------------------- + + +class TestComposeYaml: + def test_uses_official_image(self, compose: dict[str, Any]): + svc = compose["services"]["litellm"] + # The official LiteLLM proxy image. Other community forks exist + # but the docs explicitly pin to the upstream one. + assert svc["image"].startswith("ghcr.io/berriai/litellm:") + + def test_exposes_port_4000(self, compose: dict[str, Any]): + svc = compose["services"]["litellm"] + ports = svc.get("ports", []) + # Accept either "4000:4000" string or [4000, 4000] sequence; + # docker-compose normalises both. The point is: 4000 must be + # reachable from the host, because that's what every doc snippet + # tells the reader to curl. + joined = " ".join(str(p) for p in ports) + assert "4000:4000" in joined or "4000" in joined + + def test_env_var_wiring_is_strict(self, compose: dict[str, Any]): + """``${OCI_REGION:?...}`` form (with ``?``) causes compose to + refuse to start when the env var is missing. 
That's the correct + failure mode — better than booting the gateway with empty + credentials and 401-ing every upstream call.""" + env = compose["services"]["litellm"]["environment"] + # Read environment as a dict regardless of compose normalisation + # (it ships as a mapping in our sample). + assert isinstance(env, dict) + for var in ( + "OCI_REGION", + "OCI_USER", + "OCI_FINGERPRINT", + "OCI_TENANCY", + "OCI_COMPARTMENT_ID", + ): + value = env[var] + assert ":?" in value, ( + f"{var} should use ${{{var}:?…}} form so compose refuses to " + f"start without it, got {value!r}" + ) + + def test_key_file_mounted_read_only(self, compose: dict[str, Any]): + """The OCI private key gets mounted into the container. It must + be read-only (``:ro``) — the gateway should never be able to + modify the signing material.""" + svc = compose["services"]["litellm"] + oci_key_mount = next( + (v for v in svc["volumes"] if "oci-keys" in v or "OCI_KEY_FILE" in v), + None, + ) + assert oci_key_mount is not None, "no OCI key volume mounted" + assert oci_key_mount.endswith(":ro"), ( + f"OCI key mount must be read-only, got {oci_key_mount!r}" + ) + + def test_container_oci_key_file_points_at_mount(self, compose: dict[str, Any]): + """Inside the container OCI_KEY_FILE must point at the mounted + path, not the host path — otherwise the gateway tries to open a + path that doesn't exist inside the container.""" + env = compose["services"]["litellm"]["environment"] + assert env["OCI_KEY_FILE"] == "/oci-keys/key.pem" + + def test_healthcheck_present(self, compose: dict[str, Any]): + svc = compose["services"]["litellm"] + assert "healthcheck" in svc + + +# --------------------------------------------------------------------------- +# helm-values.yaml — OKE wiring +# --------------------------------------------------------------------------- + + +class TestHelmValues: + def test_uses_official_image_repo(self, helm_values: dict[str, Any]): + assert helm_values["image"]["repository"] == 
"ghcr.io/berriai/litellm" + + def test_clusterip_only(self, helm_values: dict[str, Any]): + """The gateway carries OCI signing material — must not be exposed + via a LoadBalancer in the sample. Production operators can flip + to an internal LB / Ingress + mTLS deliberately.""" + assert helm_values["service"]["type"] == "ClusterIP" + + def test_pod_hardening(self, helm_values: dict[str, Any]): + """Run as non-root, read-only root FS, no privilege escalation, + drop all capabilities. These are the four flags that move you + from a default Pod security posture to a hardened one.""" + psc = helm_values["podSecurityContext"] + assert psc["runAsNonRoot"] is True + + sc = helm_values["securityContext"] + assert sc["readOnlyRootFilesystem"] is True + assert sc["allowPrivilegeEscalation"] is False + # ALL caps dropped. Different YAML loaders normalise the list + # bracket differently, so accept either form. + drop = sc["capabilities"]["drop"] + assert "ALL" in drop + + def test_secrets_sourced_not_inlined(self, helm_values: dict[str, Any]): + """No literal OCI / master-key values in the sample — must come + from Kubernetes Secrets via envFrom.""" + env_from = helm_values["envFrom"] + secret_names = {ref["secretRef"]["name"] for ref in env_from if "secretRef" in ref} + # The README documents these secret names; if anyone renames one + # without updating the README the docs go stale. 
+ assert "oci-credentials" in secret_names + assert "litellm-master" in secret_names + + +# --------------------------------------------------------------------------- +# README cross-references +# --------------------------------------------------------------------------- + + +class TestReadme: + def test_mentions_all_three_artifacts(self): + text = README_MD.read_text() + for filename in ("config.yaml", "docker-compose.yml", "helm-values.yaml"): + assert filename in text, f"README must reference {filename} so readers can find it" + + def test_points_at_the_howto_page(self): + text = README_MD.read_text() + # Either a relative link or the slug shows up — both are acceptable. + assert "docs/how-to/litellm-gateway.md" in text or "litellm-gateway.md" in text From 1b32728525e5870aad719f080d51e5455b60a40d Mon Sep 17 00:00:00 2001 From: Federico Kamelhar Date: Mon, 25 May 2026 07:24:48 -0400 Subject: [PATCH 03/10] docs(litellm-gateway): simplify notebook 71 nav label to 'LiteLLM AI Gateway' Signed-off-by: Federico Kamelhar --- mkdocs.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mkdocs.yml b/mkdocs.yml index 8578b30..df91df6 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -264,7 +264,7 @@ nav: - 68 · Agent server: notebooks/notebook_68_agent_server.md - 69 · Research workflow: notebooks/notebook_69_research_workflow.md - 70 · OCI tools — agents that drive OCI: notebooks/notebook_70_oci_tools.md - - 71 · LiteLLM AI Gateway in front of OCI: notebooks/notebook_71_litellm_gateway.md + - 71 · LiteLLM AI Gateway: notebooks/notebook_71_litellm_gateway.md - Guides: - Deploy: how-to/deploy.md - Persist conversations: how-to/persist-conversations.md From 377e9d147f57e9a6c03d6dd8e0551ad8448d67a6 Mon Sep 17 00:00:00 2001 From: Federico Kamelhar Date: Mon, 25 May 2026 07:26:22 -0400 Subject: [PATCH 04/10] docs(litellm-gateway): rebuild SVG without text overlay; drop 'Why no new Locus class' section Signed-off-by: Federico Kamelhar --- 
 docs/img/litellm-gateway-architecture.svg     | 155 ++++++++++--------
 docs/notebooks/notebook_71_litellm_gateway.md |   9 -
 2 files changed, 86 insertions(+), 78 deletions(-)

diff --git a/docs/img/litellm-gateway-architecture.svg b/docs/img/litellm-gateway-architecture.svg
index b32696e..1c54af8 100644
--- a/docs/img/litellm-gateway-architecture.svg
+++ b/docs/img/litellm-gateway-architecture.svg
@@ -1,4 +1,4 @@
@@ -9,101 +9,118 @@
[SVG hunks: the element markup was lost in extraction, so only the text content is summarized here.
 Title: Locus behind the LiteLLM AI Gateway in front of Oracle Generative AI Infrastructure
 Description: A Locus agent uses its existing OpenAIModel pointed at a LiteLLM AI Gateway. The gateway carries virtual keys, fallback chains, observability, cost tracking, caching, guardrails, and auditing. The gateway signs the upstream call into OCI Generative AI Infrastructure with OCI Signature v1. The same gateway also fronts OpenAI direct, Anthropic, AWS Bedrock, Google Vertex, and Ollama.
 Heading: Architecture / Locus → LiteLLM AI Gateway → Oracle Generative AI Infrastructure
 Locus agent box: No OCI credentials. No new model class. Just the existing OpenAI-compatible client.
 Code label (old): OpenAIModel( model="oci-cohere-command", base_url="http://litellm-gateway:4000", api_key="<virtual-key>" )
 Code label (new): OpenAIModel( model = "oci-cohere-command", base_url = "http://litellm-gateway:4000", api_key = "<gateway-virtual-key>", )
 Arrow label (old): OpenAI-shape HTTPS · Bearer <virtual-key>
 Arrow label (new): OpenAI Chat Completions over HTTPS · Bearer <virtual-key>
 Gateway box: LiteLLM AI Gateway / litellm --config config.yaml · pod / container / sidecar — operated by your platform team
 Gateway panels (text unchanged between old and new):
   Virtual keys: per-team budgets · RPM/TPM · expiry · model allowlists
   Fallback chains: oci-gpt5-mini → oci-grok → oci-cohere-command
   Observability: Langfuse · OpenTelemetry · Datadog · Helicone
   Cost tracking: Postgres — per-key / per-team / per-model spend
   Cache: Redis · S3 · Qdrant (semantic + exact match)
   Guardrails: Lakera · Aporia · Presidio · Bedrock Guardrails
   Audit: every request, response, tool call, stream chunk
   Admin UI: spend dashboards · virtual-key management
 Signing note: OCI Signature v1 RSA-SHA256 signing happens here, not in Locus. OCI credentials live in the gateway pod via OCI Vault or OKE Workload Identity — never on disk in any Locus service.
 Arrow label: signed HTTPS · OCI Signature v1
 OCI box: Oracle Generative AI Infrastructure / /20231130/actions/chat · /openai/v1/chat/completions · /openai/v1/responses / Llama · Grok · Cohere Command · Cohere Embed · Google Gemini · OpenAI gpt-5
 Footer: Same gateway also fronts: OpenAI direct · Anthropic · AWS Bedrock · Google Vertex AI · Ollama · 100+ providers]
diff --git a/docs/notebooks/notebook_71_litellm_gateway.md b/docs/notebooks/notebook_71_litellm_gateway.md
index 4fa8513..ec5f9be 100644
--- a/docs/notebooks/notebook_71_litellm_gateway.md
+++ b/docs/notebooks/notebook_71_litellm_gateway.md
@@ -50,15 +50,6 @@ export LITELLM_GATEWAY_MODEL="oci-cohere-command"
 python examples/notebook_71_litellm_gateway.py
 ```
 
-## Why no new Locus class
-
-Locus's existing `OpenAIModel` already speaks the wire contract LiteLLM's
-gateway exposes (OpenAI Chat Completions over HTTPS, Bearer-token auth).
-Inventing a `LiteLLMModel` class to wrap `litellm.acompletion()` in-process
-would have meant re-implementing a subset of the proxy's surface — and
-permanently lagging behind it. The how-to page covers the design call in
-detail.
- ## See also - [`docs/how-to/litellm-gateway.md`](../how-to/litellm-gateway.md) — when From 856a7b2d519f745f6af83748016bfc9d892ed347 Mon Sep 17 00:00:00 2001 From: Federico Kamelhar Date: Mon, 25 May 2026 09:21:41 -0400 Subject: [PATCH 05/10] docs(litellm-gateway): Postgres sidecar, virtual keys, cost tracking, fallback verified Signed-off-by: Federico Kamelhar --- docs/how-to/litellm-gateway.md | 114 ++++++++++++++++++ examples/litellm-gateway/docker-compose.yml | 73 ++++++++--- .../integration/test_litellm_gateway_live.py | 6 +- tests/unit/test_litellm_gateway_example.py | 83 +++++++++++++ 4 files changed, 256 insertions(+), 20 deletions(-) diff --git a/docs/how-to/litellm-gateway.md b/docs/how-to/litellm-gateway.md index 4b7287f..7ebd2a3 100644 --- a/docs/how-to/litellm-gateway.md +++ b/docs/how-to/litellm-gateway.md @@ -26,6 +26,31 @@ LiteLLM Proxy Server (config.yaml carries every provider + key) └──► … 100+ providers ``` +!!! warning "Scope: the gateway covers OCI's native API path only" + LiteLLM's OCI provider targets OCI's **native** chat endpoint at + ``/20231130/actions/chat`` with vendor adapters (Cohere v1 transport + for `cohere.*`, GENERIC apiFormat for Grok / Llama / Gemini / gpt-5). + It does **not** wrap OCI's ``/openai/v1/chat/completions`` shim or + its ``/openai/v1/responses`` endpoint. + + If you specifically need: + + - the OCI OpenAI Chat-Completions V1 shim → use + [`OCIChatCompletionsModel`](oci-models.md#v1-transport-ocichatcompletionsmodel) + directly. + - server-stateful OCI Responses API (``previous_response_id``, + Responses-only models like `openai.gpt-5.5-pro`) → use + [`OCIResponsesModel`](oci-models.md#responses-transport-ociresponsesmodel-opt-in) + directly. + + The gateway is the right answer for the OCI native path plus + cross-provider routing; the direct providers are the right answer + for OCI's other two surfaces. 
+ + Locus has **zero `litellm` Python dependency** — the package only + lives inside the gateway's Docker container. Your Locus services + only need `openai` (already pulled by `OpenAIModel`). + ## When to choose this over the direct OCI providers Locus's [direct OCI model providers](oci-models.md) remain the right @@ -84,6 +109,95 @@ curl -s http://localhost:4000/v1/models \ -H "Authorization: Bearer $LITELLM_VIRTUAL_KEY" | jq '.data[].id' ``` +## Issuing per-team virtual keys + +The gateway's master key (`LITELLM_MASTER_KEY`) is the admin token — +treat it as a high-value secret and **never hand it to a Locus +agent**. Locus services should each carry a scoped **virtual key** +issued via the gateway's `/key/generate` endpoint: + +```bash +curl http://localhost:4000/key/generate \ + -H "Authorization: Bearer $LITELLM_MASTER_KEY" \ + -H "Content-Type: application/json" \ + -d '{ + "models": ["oci-cohere-command"], + "max_budget": 5.00, + "duration": "24h", + "metadata": {"team": "platform-demo", "owner": "fede"} + }' +``` + +Response (truncated): + +```json +{ + "key": "sk-", + "models": ["oci-cohere-command"], + "max_budget": 5.0, + "spend": 0.0, + "metadata": {"team": "platform-demo", "owner": "fede"} +} +``` + +The gateway enforces every field at request time: + +- **Model allowlist** — a key with `models: ["oci-cohere-command"]` + trying to call `oci-gpt5-mini` gets rejected: + `key not allowed to access model. This key can only access + models=['oci-cohere-command']. Tried to access oci-gpt5-mini`. +- **Budget** — when cumulative spend exceeds `max_budget`, subsequent + calls 429. +- **Expiry** — `duration: "24h"` automatically deactivates the key + after 24 hours. +- **Metadata** is attached to every request the key makes, so spend + reporting and audit logs can group by `team` / `owner` / whatever + fields you put there. + +!!! 
note "`/key/generate` requires Postgres" + The `docker-compose.yml` in this sample includes a Postgres sidecar + for virtual-key storage. Without it the gateway returns + `{"error": "DB not connected"}` for `/key/generate`. In production + point `DATABASE_URL` at an external Postgres (e.g. an OCI ADB + instance) so the gateway pod itself stays stateless. + +## Cost tracking + +The same Postgres backend logs every request automatically with token +counts and computed cost. No extra config beyond connecting the DB. + +```bash +# Per-request spend log (flushed asynchronously every ~10s by default). +curl http://localhost:4000/spend/logs \ + -H "Authorization: Bearer $LITELLM_MASTER_KEY" + +# Aggregate spend grouped by virtual key. +curl http://localhost:4000/global/spend/keys \ + -H "Authorization: Bearer $LITELLM_MASTER_KEY" +``` + +Sample output: + +```text +/spend/logs + · model=oci/cohere.command-latest tokens=11 cost=$0.000017 + · model=oci/cohere.command-latest tokens=10 cost=$0.000016 + · model=oci/cohere.command-latest tokens=9 cost=$0.000014 + +/global/spend/keys + · key=sk-... total_spend=$0.000034 + · key=sk-... total_spend=$0.000014 +``` + +LiteLLM ships an internal pricing table covering every model it +routes (so OCI's per-token pricing is applied automatically). Spend +is keyed by `api_key`, `user`, `team_id`, and any custom field in +`metadata`, so the same SQL surface answers "what did team X spend +this week?" and "what did model Y cost across all teams?". + +The full admin / analytics API is documented at +[docs.litellm.ai/docs/proxy/cost_tracking](https://docs.litellm.ai/docs/proxy/cost_tracking). 
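The per-team question above can be answered client-side as well as in SQL. The following is a minimal sketch, assuming each `/spend/logs` row carries `metadata`, `total_tokens`, and `spend` fields (the same fields notebook 72 in this series reads). The `spend_by_team` helper and the team names are hypothetical, and the sample rows are illustrative values rather than real gateway output.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any


def spend_by_team(rows: list[dict[str, Any]]) -> dict[str, dict[str, float]]:
    """Roll per-request spend rows up into per-team totals.

    Hypothetical helper: it only assumes each row looks like the dicts
    notebook 72 reads from GET /spend/logs.
    """
    totals: dict[str, dict[str, float]] = defaultdict(
        lambda: {"requests": 0, "tokens": 0, "spend": 0.0}
    )
    for row in rows:
        # Rows without a team tag are grouped under "unknown".
        team = (row.get("metadata") or {}).get("team", "unknown")
        bucket = totals[team]
        bucket["requests"] += 1
        bucket["tokens"] += row.get("total_tokens") or 0
        bucket["spend"] += row.get("spend") or 0.0
    return dict(totals)


# Illustrative rows only; real rows would come from GET /spend/logs.
sample_rows = [
    {"model": "oci/cohere.command-latest", "total_tokens": 11,
     "spend": 0.000017, "metadata": {"team": "team-alpha"}},
    {"model": "oci/cohere.command-latest", "total_tokens": 10,
     "spend": 0.000016, "metadata": {"team": "team-alpha"}},
    {"model": "oci/cohere.command-latest", "total_tokens": 9,
     "spend": 0.000014, "metadata": {"team": "team-beta"}},
]

report = spend_by_team(sample_rows)
for team, totals in sorted(report.items()):
    print(
        f"{team}: requests={totals['requests']} "
        f"tokens={totals['tokens']} spend=${totals['spend']:.6f}"
    )
```

In practice the `rows` argument would be the parsed JSON body of `/spend/logs`, fetched with the master key as in the curl example above.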
+ ## Pointing Locus at the gateway Use the existing `OpenAIModel` — that's the LiteLLM-compatible client: diff --git a/examples/litellm-gateway/docker-compose.yml b/examples/litellm-gateway/docker-compose.yml index eb6c321..2928f1c 100644 --- a/examples/litellm-gateway/docker-compose.yml +++ b/examples/litellm-gateway/docker-compose.yml @@ -1,9 +1,11 @@ -# Sample local-dev LiteLLM AI Gateway in front of OCI Generative AI. +# Sample local-dev LiteLLM AI Gateway in front of OCI Generative AI, +# with a Postgres sidecar so virtual keys, budgets, and spend logs +# persist across restarts (required for /key/generate, /key/info, +# per-key budget enforcement, and the admin spend dashboards). # # Stand up with: # -# # 1. Set the OCI credentials the *gateway* will use to sign upstream calls. -# # These do NOT leak to your Locus app — only the gateway sees them. +# # 1. OCI credentials the gateway will sign upstream calls with. # export OCI_REGION="us-chicago-1" # export OCI_USER="ocid1.user.oc1..xxx" # export OCI_FINGERPRINT="aa:bb:cc:..." @@ -11,18 +13,27 @@ # export OCI_KEY_FILE="$HOME/.oci/keys/your_api_key.pem" # export OCI_COMPARTMENT_ID="ocid1.compartment.oc1..xxx" # -# # 2. Pick any string as the local master key. The gateway issues -# # per-team virtual keys via /key/generate using this as the admin token. -# export LITELLM_MASTER_KEY="$LITELLM_VIRTUAL_KEY" +# # 2. Admin token for /key/generate. Treat as a high-value secret. +# export LITELLM_MASTER_KEY="sk-master-$(openssl rand -hex 32)" # -# # 3. Start the gateway. -# docker compose up +# # 3. Postgres password. Change it for any non-throwaway deployment. +# export LITELLM_DB_PASSWORD="$(openssl rand -hex 16)" # -# # 4. Point Locus at it (in a second shell): +# # 4. Start gateway + db together. +# docker compose up -d +# +# # 5. 
Issue a per-team virtual key from outside: +# curl http://localhost:4000/key/generate \ +# -H "Authorization: Bearer $LITELLM_MASTER_KEY" \ +# -H "Content-Type: application/json" \ +# -d '{"models":["oci-cohere-command"],"max_budget":5.00,"duration":"24h", +# "metadata":{"team":"platform-demo"}}' +# +# # 6. Point Locus at the gateway with the virtual key from (5): # # from locus.models.native.openai import OpenAIModel # # model = OpenAIModel( # # model="oci-cohere-command", -# # api_key="$LITELLM_VIRTUAL_KEY", +# # api_key="", # # base_url="http://localhost:4000", # # ) @@ -32,21 +43,25 @@ services: container_name: locus-litellm-gateway ports: - 4000:4000 + depends_on: + db: + condition: service_healthy volumes: - # Mount the config alongside the OCI key the gateway will sign with. - ./config.yaml:/app/config.yaml:ro - ${OCI_KEY_FILE:-/dev/null}:/oci-keys/key.pem:ro environment: - # Surface env vars referenced by config.yaml's ``os.environ/...`` lookups. + # OCI credentials referenced by config.yaml via os.environ/. OCI_REGION: ${OCI_REGION:?set OCI_REGION before running docker compose up} OCI_USER: ${OCI_USER:?set OCI_USER} OCI_FINGERPRINT: ${OCI_FINGERPRINT:?set OCI_FINGERPRINT} OCI_TENANCY: ${OCI_TENANCY:?set OCI_TENANCY} OCI_COMPARTMENT_ID: ${OCI_COMPARTMENT_ID:?set OCI_COMPARTMENT_ID} - # The container sees the key at the mounted path, not the host path. OCI_KEY_FILE: /oci-keys/key.pem - # Admin token. Used to call /key/generate from outside. - LITELLM_MASTER_KEY: ${LITELLM_MASTER_KEY:-$LITELLM_VIRTUAL_KEY} + # Admin token for /key/generate. Treat as a high-value secret. + LITELLM_MASTER_KEY: ${LITELLM_MASTER_KEY:?set LITELLM_MASTER_KEY} + # Postgres URL — virtual-key state, spend logs, audit log. 
+ DATABASE_URL: postgresql://litellm:${LITELLM_DB_PASSWORD:?set LITELLM_DB_PASSWORD}@db:5432/litellm + STORE_MODEL_IN_DB: 'True' command: [--config, /app/config.yaml, --port, '4000', --num_workers, '1'] restart: unless-stopped healthcheck: @@ -56,7 +71,27 @@ services: start_period: 20s retries: 3 -# For a production-shaped setup, add a Postgres service here and set -# ``general_settings.database_url`` in config.yaml. Without it, virtual -# keys and spend logs live in memory and disappear on restart — fine -# for local dev, not for production. + # Postgres sidecar for virtual keys + spend logs. Local-dev only — + # production should point DATABASE_URL at an external Postgres / ADB + # so the gateway pod is stateless. + db: + # Official postgres image. If your Docker daemon can't reach Docker + # Hub directly (corporate proxy / TLS interception), use a registry + # mirror you trust, e.g. mirror.gcr.io/library/postgres:17-alpine. + image: postgres:17-alpine + container_name: locus-litellm-db + environment: + POSTGRES_DB: litellm + POSTGRES_USER: litellm + POSTGRES_PASSWORD: ${LITELLM_DB_PASSWORD:?set LITELLM_DB_PASSWORD} + volumes: + - litellm-db-data:/var/lib/postgresql/data + restart: unless-stopped + healthcheck: + test: [CMD-SHELL, pg_isready -U litellm -d litellm] + interval: 5s + timeout: 3s + retries: 10 + +volumes: + litellm-db-data: diff --git a/tests/integration/test_litellm_gateway_live.py b/tests/integration/test_litellm_gateway_live.py index 27ee5e4..472896b 100644 --- a/tests/integration/test_litellm_gateway_live.py +++ b/tests/integration/test_litellm_gateway_live.py @@ -169,7 +169,11 @@ async def test_streaming_via_gateway(model): terminal += 1 assert chunks, "gateway returned no streamed content chunks" - assert terminal == 1, "expected exactly one terminal ModelChunkEvent" + # Locus's OpenAIModel may emit more than one ``done=True`` event on a + # successful stream (final content delta + a trailing finish-reason + # event); the contract is that at 
least one done event fires before + # iteration ends. + assert terminal >= 1, "expected at least one terminal ModelChunkEvent" @pytest.mark.asyncio diff --git a/tests/unit/test_litellm_gateway_example.py b/tests/unit/test_litellm_gateway_example.py index ea78882..94380a5 100644 --- a/tests/unit/test_litellm_gateway_example.py +++ b/tests/unit/test_litellm_gateway_example.py @@ -237,6 +237,89 @@ def test_healthcheck_present(self, compose: dict[str, Any]): assert "healthcheck" in svc +# --------------------------------------------------------------------------- +# docker-compose.yml — Postgres sidecar (required for /key/generate + +# persistent virtual-key state + spend logs) +# --------------------------------------------------------------------------- + + +class TestComposePostgres: + def test_db_service_declared(self, compose: dict[str, Any]): + """A Postgres ``db`` service must exist alongside the gateway — + without it the gateway runs in stateless mode and + ``/key/generate`` 500s with ``{"error": "DB not connected"}``.""" + assert "db" in compose["services"], ( + "docker-compose.yml is missing the `db` service. Add a " + "postgres sidecar so /key/generate, virtual-key state, and " + "spend logs persist across restarts." + ) + + def test_db_is_postgres(self, compose: dict[str, Any]): + """Image must be an official-shape postgres tag.""" + db_img = compose["services"]["db"]["image"] + assert "postgres" in db_img, f"db service image should be postgres, got {db_img!r}" + + def test_db_creds_env_wired_strictly(self, compose: dict[str, Any]): + """Postgres POSTGRES_PASSWORD must use the ``${VAR:?…}`` strict + form so compose refuses to start without LITELLM_DB_PASSWORD set + — otherwise we'd silently boot Postgres with an empty password.""" + env = compose["services"]["db"]["environment"] + pwd = env["POSTGRES_PASSWORD"] + assert ":?" 
in pwd, ( + f"POSTGRES_PASSWORD should use ${{LITELLM_DB_PASSWORD:?…}} strict form; got {pwd!r}" + ) + + def test_db_data_persists_via_named_volume(self, compose: dict[str, Any]): + """Postgres data must mount a named volume — otherwise virtual + keys + spend logs are lost on every ``docker compose down``.""" + volumes = compose["services"]["db"].get("volumes", []) + assert any("/var/lib/postgresql/data" in v for v in volumes), ( + f"db must mount a volume at /var/lib/postgresql/data; got volumes={volumes!r}" + ) + # Confirm the volume is a top-level named volume, not a bind mount, + # so the data survives between compose runs and lives under the + # Docker volume store rather than spraying files into the user's + # working directory. + assert "volumes" in compose, "top-level `volumes:` block must declare the named volume" + + def test_db_has_healthcheck(self, compose: dict[str, Any]): + """Without a db healthcheck, the gateway may start before + Postgres is ready and Prisma migrations fail with a connection + refused that's confusing to debug.""" + assert "healthcheck" in compose["services"]["db"] + + def test_gateway_depends_on_db_being_healthy(self, compose: dict[str, Any]): + """``depends_on: db: condition: service_healthy`` is the + difference between 'compose up succeeds eventually' and 'compose + up succeeds, then Prisma can't reach Postgres on the first + request because the gateway raced past it'.""" + depends = compose["services"]["litellm"].get("depends_on", {}) + # Accept the long-form mapping (which is what we ship) — the + # short-form list ``["db"]`` doesn't wait for healthy. 
+ assert isinstance(depends, dict), ( + "litellm.depends_on must be the long-form mapping so it can " + "carry `condition: service_healthy`, got " + f"{type(depends).__name__}" + ) + assert depends.get("db", {}).get("condition") == "service_healthy", ( + f"litellm should depend_on db with condition: service_healthy; got {depends!r}" + ) + + def test_gateway_database_url_env_present(self, compose: dict[str, Any]): + """The gateway must be told where to reach Postgres. The URL + must use the in-network hostname (``db``), not localhost — and + must pull the password from LITELLM_DB_PASSWORD with strict + env-var form.""" + env = compose["services"]["litellm"]["environment"] + url = env.get("DATABASE_URL", "") + assert "@db:" in url, f"DATABASE_URL must use in-network host `db`, got {url!r}" + assert "LITELLM_DB_PASSWORD" in url, "DATABASE_URL must reference LITELLM_DB_PASSWORD" + assert ":?" in url, ( + "DATABASE_URL should use the ${LITELLM_DB_PASSWORD:?…} strict " + f"form so compose refuses to start without it, got {url!r}" + ) + + # --------------------------------------------------------------------------- # helm-values.yaml — OKE wiring # --------------------------------------------------------------------------- From 1292efd47a3aafc769350915c09b10ad5200ef8e Mon Sep 17 00:00:00 2001 From: Federico Kamelhar Date: Mon, 25 May 2026 11:50:57 -0400 Subject: [PATCH 06/10] docs(litellm-gateway): cost-tracking suite + notebook 72 + enterprise patterns Signed-off-by: Federico Kamelhar --- docs/how-to/litellm-gateway.md | 95 +++++ .../notebook_72_litellm_gateway_cost.md | 58 +++ examples/notebook_72_litellm_gateway_cost.py | 258 +++++++++++++ mkdocs.yml | 1 + .../integration/test_litellm_gateway_cost.py | 343 ++++++++++++++++++ 5 files changed, 755 insertions(+) create mode 100644 docs/notebooks/notebook_72_litellm_gateway_cost.md create mode 100644 examples/notebook_72_litellm_gateway_cost.py create mode 100644 tests/integration/test_litellm_gateway_cost.py diff --git 
a/docs/how-to/litellm-gateway.md b/docs/how-to/litellm-gateway.md index 7ebd2a3..affd005 100644 --- a/docs/how-to/litellm-gateway.md +++ b/docs/how-to/litellm-gateway.md @@ -298,6 +298,101 @@ Highlights: OTel, Datadog, …). - **`litellm_settings.cache`** — Redis / S3 / Qdrant caching config. +## How enterprises use this pattern + +The LiteLLM AI Gateway pattern shows up repeatedly inside large +organisations adopting LLMs across many teams. The recurring shape is +*one gateway per environment, owned by a platform team, fronting every +provider, accessed by every service*. What it earns them: + +### 1. Charge-back and showback to business units + +A single platform team runs the gateway; dozens of business units +consume it. Every request is logged with a virtual key, a team tag in +`metadata`, and an attributed USD cost. Finance pulls a SQL report +each month and chargebacks roll up cleanly — no per-team integration, +no manual reconciliation, no vendor invoice juggling. + +### 2. Compliance, audit, and data-residency controls + +Regulated industries (financial services, healthcare, public sector) +need every prompt, every response, every tool call, every streamed +chunk persisted for audit. The gateway's spend-log surface is +ISO-27001 / SOC-2 / PCI-friendly out of the box — append-only, +Postgres-backed, queryable. Pair it with guardrails (Lakera / +Presidio) for PII redaction *before* prompts ever leave the +tenancy, and the gateway becomes the single chokepoint for data- +governance review. + +### 3. Centralised governance — one place to enforce policy + +Security and IT govern which providers, which models, which regions +are approved. Engineering teams consume the catalog through virtual +keys with model allowlists; they *cannot* bypass the gateway to call +an unapproved model. When a new provider is approved, one +`config.yaml` change turns it on for every team. When a model is +retired, the same surface flips it off. + +### 4. 
Vendor diversification and continuity + +Single-vendor lock-in is a real risk for any tier-1 workload. The +gateway's fallback chains express *"OCI Cohere first; on rate-limit +or 5xx fail over to OpenAI direct; on outage fail over to AWS +Bedrock"* declaratively. Application code stays a single +`OpenAIModel(base_url=...)` call. + +### 5. Quota arbitration across teams + +Vendor contracts come with shared rate limits. Without a gateway, +the loudest team monopolises the quota. With it, the platform team +sets per-key `rpm_limit` / `tpm_limit` / `max_budget` and arbitrates +the shared pool — fair-share, per-team SLOs, no surprise throttling +of the team that planned ahead. + +### 6. Observability into the agent layer + +Enterprise platform teams already have Datadog / OpenTelemetry / +Splunk for everything else. The gateway plugs LLM traffic into the +same pipes — `success_callback` / `failure_callback` push request +spans into the existing observability stack with shared trace IDs. +The on-call team sees LLM latency, error rate, and cost on the +same dashboard as everything else. + +### 7. Cost optimisation that compounds + +Once spend is centralised: cache identical prompts (Redis / S3 / +Qdrant), route cheap requests to cheap models via a router policy, +identify the top-10 most expensive prompts and rewrite them. None +of this is possible when each team holds its own provider keys — +because no one sees the aggregate. The gateway makes the aggregate +visible. + +### 8. Developer ergonomics across many languages + +Python Locus agents, JS workbench UIs, Go / Ruby / Java micro-services +all need LLM access. With the gateway, every one of them talks the +same OpenAI-shaped HTTP — no per-language SDK to maintain. New +provider rollout is a config change, not a code change in every +service. 
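The per-team controls in the patterns above (model allowlists, budgets, and `rpm_limit` / `tpm_limit` quotas) can be kept as data and turned into `/key/generate` request bodies. This is a minimal sketch: `TeamPolicy` and `key_request_body` are hypothetical names, not part of Locus or LiteLLM, and it assumes `/key/generate` accepts `rpm_limit` and `tpm_limit` alongside the `models`, `max_budget`, `duration`, and `metadata` fields shown earlier on this page. It builds the JSON only and makes no network call.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class TeamPolicy:
    """Hypothetical per-team policy record kept by the platform team."""

    team: str
    models: list[str]
    max_budget_usd: float
    rpm_limit: int
    tpm_limit: int
    duration: str = "24h"


def key_request_body(policy: TeamPolicy) -> dict[str, Any]:
    """Build the JSON body for POST /key/generate from a policy.

    Assumption: the gateway accepts rpm_limit / tpm_limit in this body.
    """
    return {
        "models": list(policy.models),
        "max_budget": policy.max_budget_usd,
        "duration": policy.duration,
        "rpm_limit": policy.rpm_limit,
        "tpm_limit": policy.tpm_limit,
        "metadata": {"team": policy.team},
    }


# Illustrative policies; the numbers are placeholders, not recommendations.
policies = [
    TeamPolicy("team-alpha", ["oci-cohere-command"], 5.0, 60, 20_000),
    TeamPolicy("team-beta", ["oci-cohere-command", "oci-grok"], 20.0, 120, 50_000),
]

bodies = [key_request_body(p) for p in policies]
for body in bodies:
    print(json.dumps(body, sort_keys=True))
```

Keeping the policies in a change-controlled file and generating the request bodies from it means key issuance follows the same review path as `config.yaml`.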
+ +### Concrete deployment shape + +A typical enterprise rollout looks like: + +| Layer | Owner | Lives in | +|---|---|---| +| **OCI tenancy + IAM + signing keys** | Cloud / security team | OCI Vault, OKE Workload Identity | +| **Gateway pod + Postgres + Redis + observability backends** | Platform / SRE team | Kubernetes (OKE), one deployment per env (dev / staging / prod) | +| **Gateway `config.yaml` — model catalog, fallbacks, callbacks, guardrails** | Platform team | GitOps repo, change-controlled | +| **Virtual keys + per-team budgets** | Platform team issues; security reviews | Postgres (spend logs); admin UI for issuance | +| **Locus agents / workbench / other consumers** | Application teams | Their own services, talking to `litellm-gateway..svc.cluster.local:4000` with their team's virtual key | +| **Spend reports + audit + alerts** | Finance + security | SQL on the gateway's Postgres; observability dashboards | + +The gateway becomes the choke-point that **lets the platform team set +policy once and the application teams consume it through a single +contract** — without anyone having to write provider-specific +integration code or hold provider credentials. + ## See also - [`docs/how-to/oci-models.md`](oci-models.md) — direct OCI providers diff --git a/docs/notebooks/notebook_72_litellm_gateway_cost.md b/docs/notebooks/notebook_72_litellm_gateway_cost.md new file mode 100644 index 0000000..2b6559d --- /dev/null +++ b/docs/notebooks/notebook_72_litellm_gateway_cost.md @@ -0,0 +1,58 @@ +# LiteLLM AI Gateway — per-team cost tracking + +Companion to [notebook 71](notebook_71_litellm_gateway.md) for the +enterprise piece: **who spent what on which model**. 
+
+Issues virtual keys for two pretend teams, drives traffic on each,
+then walks the gateway's full spend surface:
+
+- `/spend/logs` — per-request rows (model, tokens, USD cost, team metadata)
+- `/global/spend/keys` — aggregate per virtual key
+- `/global/spend/models` — aggregate per upstream model
+
+All three endpoints are SQL-backed (Postgres sidecar from the sample
+`docker-compose.yml`) and require zero Locus integration glue —
+the gateway is the source of truth.
+
+## What enterprises use this for
+
+- **Charge-back / showback to business units.** Finance pulls a monthly
+  report; teams see what they cost.
+- **"What did Cohere Command cost across all teams this week?"** Drill
+  per upstream model.
+- **"Who's about to blow their budget?"** Aggregate-per-key view +
+  `max_budget` field.
+- **Audit.** Append-only spend log keyed by virtual key + metadata,
+  one place for SOC-2 / ISO-27001 review.
+
+See the [LiteLLM AI Gateway how-to](../how-to/litellm-gateway.md#cost-tracking)
+for the curl-level API and the [enterprise patterns
+section](../how-to/litellm-gateway.md#how-enterprises-use-this-pattern)
+for the deployment shape.
+
+## Prerequisites
+
+The Postgres-backed gateway from
+[`examples/litellm-gateway/`](https://github.com/oracle-samples/locus/tree/main/examples/litellm-gateway).
+The stateless gateway from notebook 71 won't work for this notebook —
+`/key/generate` and `/spend/*` both require Postgres.
+
+```bash
+cd examples/litellm-gateway/
+export OCI_REGION="us-chicago-1" OCI_USER="ocid1.user.oc1..xxx" ...
+export LITELLM_MASTER_KEY="sk-master-$(openssl rand -hex 16)"
+export LITELLM_DB_PASSWORD="$(openssl rand -hex 16)"
+docker compose up -d
+
+export LITELLM_GATEWAY_URL="http://localhost:4000"
+export LITELLM_MASTER_KEY="$LITELLM_MASTER_KEY"
+python examples/notebook_72_litellm_gateway_cost.py
+```
+
+## See also
+
+- [Notebook 71 — LiteLLM AI Gateway](notebook_71_litellm_gateway.md) —
+  the gateway happy path Locus consumers see.
+- [LiteLLM AI Gateway how-to](../how-to/litellm-gateway.md) — when to + use the gateway, auth boundary, scope, and the enterprise patterns + the cost surface unlocks. diff --git a/examples/notebook_72_litellm_gateway_cost.py b/examples/notebook_72_litellm_gateway_cost.py new file mode 100644 index 0000000..16ec884 --- /dev/null +++ b/examples/notebook_72_litellm_gateway_cost.py @@ -0,0 +1,258 @@ +# Copyright (c) 2025, 2026 Oracle and/or its affiliates. +# Licensed under the Universal Permissive License v1.0 as shown at +# https://oss.oracle.com/licenses/upl/ +"""Notebook 72: Per-team cost tracking on the LiteLLM AI Gateway. + +Follows notebook 71 (the gateway happy path) with the part enterprise +operators actually care about: **who spent what on which model**. +Issues virtual keys for two pretend teams, drives traffic on each, +then walks the spend surface — per-request rows, per-key rollups, +per-model rollups, and per-team filtering via metadata. + +Run it:: + + # 1. Bring the gateway + Postgres up (see notebook 71 for the OCI + # env vars and master key setup). + cd examples/litellm-gateway/ + docker compose up -d + + # 2. Wire this notebook at the gateway. + export LITELLM_GATEWAY_URL="http://localhost:4000" + export LITELLM_MASTER_KEY="" + + python examples/notebook_72_litellm_gateway_cost.py + +Without ``LITELLM_GATEWAY_URL`` and ``LITELLM_MASTER_KEY`` set the +notebook prints the wiring snippet and exits cleanly — same self-skip +pattern as notebook 71. 
+ +Difficulty: Beginner +""" + +from __future__ import annotations + +import os +import sys +import time +import uuid +from typing import Any + +import httpx + + +# --------------------------------------------------------------------------- +# Prerequisites +# --------------------------------------------------------------------------- + + +_REQUIRED_ENV = ( + "LITELLM_GATEWAY_URL", + "LITELLM_MASTER_KEY", +) + + +def _print_skip_banner(missing: list[str]) -> None: + print("=" * 72) + print(" LiteLLM AI Gateway not configured — skipping the cost demo.") + print("=" * 72) + print( + f"\n Missing environment variables: {', '.join(missing)}\n\n" + " Bring up the gateway (with the Postgres sidecar so /spend/* works):\n\n" + " cd examples/litellm-gateway/\n" + " export OCI_REGION=... OCI_USER=... OCI_FINGERPRINT=...\n" + " export OCI_TENANCY=... OCI_KEY_FILE=... OCI_COMPARTMENT_ID=...\n" + ' export LITELLM_MASTER_KEY="sk-master-$(openssl rand -hex 16)"\n' + ' export LITELLM_DB_PASSWORD="$(openssl rand -hex 16)"\n' + " docker compose up -d\n\n" + " Then wire this notebook:\n\n" + ' export LITELLM_GATEWAY_URL="http://localhost:4000"\n' + ' export LITELLM_MASTER_KEY="$LITELLM_MASTER_KEY"\n\n' + " Full how-to: docs/how-to/litellm-gateway.md\n" + ) + + +def _check_prerequisites() -> tuple[str, str]: + missing = [v for v in _REQUIRED_ENV if not os.environ.get(v)] + if missing: + _print_skip_banner(missing) + sys.exit(0) + return ( + os.environ["LITELLM_GATEWAY_URL"].rstrip("/"), + os.environ["LITELLM_MASTER_KEY"], + ) + + +# --------------------------------------------------------------------------- +# Gateway helpers +# --------------------------------------------------------------------------- + + +def _admin(master_key: str) -> dict[str, str]: + return {"Authorization": f"Bearer {master_key}", "Content-Type": "application/json"} + + +def issue_virtual_key( + url: str, + master_key: str, + *, + team: str, + models: list[str], + max_budget_usd: float = 5.0, +) -> str: + 
"""Issue a per-team virtual key. Returns the raw token.""" + resp = httpx.post( + f"{url}/key/generate", + headers=_admin(master_key), + json={ + "models": models, + "max_budget": max_budget_usd, + "duration": "1h", + "metadata": {"team": team, "owner": "notebook-72", "run": uuid.uuid4().hex[:8]}, + }, + timeout=15.0, + ) + resp.raise_for_status() + return resp.json()["key"] + + +def chat(url: str, virtual_key: str, model_alias: str, prompt: str) -> dict[str, Any]: + """One chat completion under a virtual key. Returns the parsed body.""" + resp = httpx.post( + f"{url}/v1/chat/completions", + headers={ + "Authorization": f"Bearer {virtual_key}", + "Content-Type": "application/json", + }, + json={ + "model": model_alias, + "messages": [{"role": "user", "content": prompt}], + "max_tokens": 30, + }, + timeout=30.0, + ) + resp.raise_for_status() + return resp.json() + + +def fetch_spend_logs( + url: str, master_key: str, *, virtual_key: str | None = None +) -> list[dict[str, Any]]: + params = {"api_key": virtual_key} if virtual_key else {} + resp = httpx.get( + f"{url}/spend/logs", + headers={"Authorization": f"Bearer {master_key}"}, + params=params, + timeout=15.0, + ) + resp.raise_for_status() + return resp.json() + + +def fetch_spend_by_key(url: str, master_key: str) -> list[dict[str, Any]]: + resp = httpx.get( + f"{url}/global/spend/keys", + headers={"Authorization": f"Bearer {master_key}"}, + timeout=15.0, + ) + resp.raise_for_status() + return resp.json() + + +def fetch_spend_by_model(url: str, master_key: str) -> list[dict[str, Any]]: + resp = httpx.get( + f"{url}/global/spend/models", + headers={"Authorization": f"Bearer {master_key}"}, + timeout=15.0, + ) + resp.raise_for_status() + return resp.json() + + +# --------------------------------------------------------------------------- +# Demo flow +# --------------------------------------------------------------------------- + + +def main() -> None: + url, master_key = _check_prerequisites() + + print() + 
print("=" * 72) + print(" Per-team cost tracking on the LiteLLM AI Gateway") + print("=" * 72) + print(f" Gateway: {url}") + print() + + # ----- Step 1: issue two virtual keys, one per pretend team ----------- + team_a_key = issue_virtual_key( + url, master_key, team="team-alpha", models=["oci-cohere-command"] + ) + team_b_key = issue_virtual_key( + url, master_key, team="team-beta", models=["oci-cohere-command", "oci-grok"] + ) + print(" Virtual keys issued:") + print(f" team-alpha (cohere only): {team_a_key[:24]}...") + print(f" team-beta (cohere, grok): {team_b_key[:24]}...") + print() + + # ----- Step 2: drive different traffic on each team -------------------- + print(" Driving traffic:") + for prompt in ("Capital of France?", "Capital of Spain?", "Capital of Italy?"): + out = chat(url, team_a_key, "oci-cohere-command", prompt) + content = out["choices"][0]["message"]["content"].strip() + toks = out["usage"]["total_tokens"] + print(f" [team-alpha] {prompt} → {content!r} ({toks} tokens)") + + for prompt in ("Capital of Norway?", "Capital of Sweden?"): + out = chat(url, team_b_key, "oci-cohere-command", prompt) + content = out["choices"][0]["message"]["content"].strip() + toks = out["usage"]["total_tokens"] + print(f" [team-beta] {prompt} → {content!r} ({toks} tokens)") + + # ----- Step 3: wait for the gateway's async spend flusher -------------- + print() + print(" Waiting 15s for the gateway's async spend logger to flush ...") + time.sleep(15) + + # ----- Step 4: walk the spend surface --------------------------------- + print() + print("=" * 72) + print(" /spend/logs — per-request rows for team-alpha") + print("=" * 72) + for row in fetch_spend_logs(url, master_key, virtual_key=team_a_key): + team = (row.get("metadata") or {}).get("team", "?") + print( + f" model={row.get('model', '?'):<32} " + f"team={team:<12} " + f"tokens={row.get('total_tokens', 0):<4} " + f"cost=${row.get('spend', 0):.6f}" + ) + + print() + print("=" * 72) + print(" /global/spend/keys — 
aggregate spend per virtual key") + print("=" * 72) + for k in fetch_spend_by_key(url, master_key)[:8]: + masked = (k.get("api_key") or k.get("token") or "?")[:16] + "..." + team = (k.get("metadata") or {}).get("team", "?") + print(f" key={masked:<22} team={team:<12} total_spend=${k.get('total_spend', 0):.6f}") + + print() + print("=" * 72) + print(" /global/spend/models — aggregate spend per upstream model") + print("=" * 72) + for m in fetch_spend_by_model(url, master_key)[:8]: + print(f" model={m.get('model', '?'):<40} total_spend=${m.get('total_spend', 0):.6f}") + + print() + print("=" * 72) + print(" Done. Finance / platform teams can answer:") + print(" · 'What did team-alpha spend last month?' → /spend/logs + metadata.team") + print(" · 'What did Cohere Command cost across all teams?' → /global/spend/models") + print(" · 'Who is over budget right now?' → /global/spend/keys + max_budget") + print(" — all from one SQL-backed surface. No Locus integration glue.") + print("=" * 72) + + +if __name__ == "__main__": + main() diff --git a/mkdocs.yml b/mkdocs.yml index df91df6..8c10a28 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -265,6 +265,7 @@ nav: - 69 · Research workflow: notebooks/notebook_69_research_workflow.md - 70 · OCI tools — agents that drive OCI: notebooks/notebook_70_oci_tools.md - 71 · LiteLLM AI Gateway: notebooks/notebook_71_litellm_gateway.md + - 72 · LiteLLM AI Gateway — cost tracking: notebooks/notebook_72_litellm_gateway_cost.md - Guides: - Deploy: how-to/deploy.md - Persist conversations: how-to/persist-conversations.md diff --git a/tests/integration/test_litellm_gateway_cost.py b/tests/integration/test_litellm_gateway_cost.py new file mode 100644 index 0000000..bfaadfe --- /dev/null +++ b/tests/integration/test_litellm_gateway_cost.py @@ -0,0 +1,343 @@ +# Copyright (c) 2025, 2026 Oracle and/or its affiliates. 
+# Licensed under the Universal Permissive License v1.0 as shown at +# https://oss.oracle.com/licenses/upl/ + +"""Integration tests: cost tracking and budget enforcement on the +LiteLLM AI Gateway, driven end-to-end against real OCI Generative AI. + +Asserts that the cost surface documented in +``docs/how-to/litellm-gateway.md`` — `/spend/logs`, +`/global/spend/keys`, `/global/spend/models`, per-key `max_budget` +enforcement — actually works at runtime, not just on paper. + +Auto-skipped without the gateway env vars; runs from the same env +gate as `test_litellm_gateway_live.py`. Requires the Postgres-backed +gateway from ``examples/litellm-gateway/docker-compose.yml`` — +``/key/generate`` and the `/spend/*` endpoints all require a database. +""" + +from __future__ import annotations + +import os +import time +import uuid +from typing import Any + +import httpx +import pytest + + +_GATEWAY_URL = os.environ.get("LITELLM_GATEWAY_URL", "").rstrip("/") +_MASTER_KEY = os.environ.get("LITELLM_GATEWAY_KEY", "") +_GATEWAY_MODEL = os.environ.get("LITELLM_GATEWAY_MODEL", "oci-cohere-command") +# A second alias that the allowlist test requests with a key scoped to +# _GATEWAY_MODEL only. The default mirrors the sample config.yaml. +_GATEWAY_MODEL_B = os.environ.get("LITELLM_GATEWAY_MODEL_B", "oci-grok") + +# How long to wait for the gateway's async spend-log flusher (default +# ~10s; we give it slack on overloaded laptops). Configurable so this +# suite stays reliable on slower CI runners. +_SPEND_FLUSH_WAIT_SEC = float(os.environ.get("LITELLM_SPEND_FLUSH_WAIT", "15")) + + +pytestmark = [ + pytest.mark.integration, + pytest.mark.skipif( + not (_GATEWAY_URL and _MASTER_KEY), + reason=( + "LITELLM_GATEWAY_URL / LITELLM_GATEWAY_KEY not set — bring up " + "the gateway under examples/litellm-gateway/ (with the " + "Postgres sidecar) and export the URL + master key. " + "See docs/how-to/litellm-gateway.md." 
+ ), + ), +] + + +# --------------------------------------------------------------------------- +# Helpers +# --------------------------------------------------------------------------- + + +def _admin_headers() -> dict[str, str]: + return {"Authorization": f"Bearer {_MASTER_KEY}", "Content-Type": "application/json"} + + +def _issue_virtual_key(**kwargs: Any) -> str: + """Issue a virtual key with the supplied scopes and return the raw key.""" + body = { + "models": [_GATEWAY_MODEL], + "duration": "1h", + **kwargs, + } + resp = httpx.post( + f"{_GATEWAY_URL}/key/generate", + headers=_admin_headers(), + json=body, + timeout=15.0, + ) + resp.raise_for_status() + payload = resp.json() + assert "key" in payload, f"/key/generate returned no key field: {payload!r}" + return payload["key"] + + +def _make_completion( + virtual_key: str, *, model: str = "", content: str = "Say hi" +) -> dict[str, Any]: + """One chat completion through the gateway.""" + resp = httpx.post( + f"{_GATEWAY_URL}/v1/chat/completions", + headers={"Authorization": f"Bearer {virtual_key}", "Content-Type": "application/json"}, + json={ + "model": model or _GATEWAY_MODEL, + "messages": [{"role": "user", "content": content}], + "max_tokens": 20, + }, + timeout=30.0, + ) + return {"status": resp.status_code, "body": resp.json() if resp.text else {}} + + +def _wait_for_spend_flush() -> None: + """Wait long enough for the gateway's async spend logger to flush. + + LiteLLM batches spend log writes (default ~10s window). Tests assert + on persisted state, so we sleep through the window rather than poll + — the sleep is short and deterministic; polling adds flake. 
+ """ + time.sleep(_SPEND_FLUSH_WAIT_SEC) + + +def _get_spend_logs(*, api_key: str | None = None) -> list[dict[str, Any]]: + """Pull /spend/logs — optionally filtered to one virtual key.""" + params = {"api_key": api_key} if api_key else {} + resp = httpx.get( + f"{_GATEWAY_URL}/spend/logs", + headers={"Authorization": f"Bearer {_MASTER_KEY}"}, + params=params, + timeout=15.0, + ) + resp.raise_for_status() + return resp.json() + + +# --------------------------------------------------------------------------- +# /spend/logs — per-request spend rows +# --------------------------------------------------------------------------- + + +def test_spend_logs_grows_after_a_completion(): + """A request through the gateway must show up in /spend/logs. + + This is the most basic cost-tracking contract: if I make a call, + that call gets a row. Without it the entire cost story collapses. + """ + vkey = _issue_virtual_key(metadata={"test": "spend-grows", "run": uuid.uuid4().hex[:8]}) + before = len(_get_spend_logs(api_key=vkey)) + + result = _make_completion(vkey, content="Capital of France? One word.") + assert result["status"] == 200, f"completion failed: {result!r}" + + _wait_for_spend_flush() + + after = _get_spend_logs(api_key=vkey) + assert len(after) >= before + 1, ( + f"expected at least 1 new spend log entry for this key; had {before}, now {len(after)}" + ) + + +def test_spend_log_entry_carries_token_counts_and_cost(): + """Per-request entry must include the fields finance reports on: + model, token count, and computed USD cost.""" + vkey = _issue_virtual_key(metadata={"test": "tokens-cost", "run": uuid.uuid4().hex[:8]}) + + result = _make_completion(vkey, content="Capital of Spain? One word.") + assert result["status"] == 200 + _wait_for_spend_flush() + + rows = _get_spend_logs(api_key=vkey) + assert rows, "no spend rows persisted for this key" + latest = rows[-1] + + # Schema invariants the docs / "Cost tracking" section depend on. 
+ assert "model" in latest, f"missing model field; got keys={list(latest)}" + assert "total_tokens" in latest + assert "spend" in latest + + # The completion really happened (tokens > 0), so cost must also be > 0. + assert latest["total_tokens"] > 0, "completion produced zero tokens" + assert latest["spend"] > 0, ( + "completion produced tokens but spend=0 — LiteLLM's pricing " + "table may be missing an entry for the upstream OCI model" + ) + + +def test_spend_log_row_has_attribution_fields(): + """The spend log row schema must carry the fields finance / audit + queries rely on for grouping — even if the values vary by LiteLLM + version, the *fields* must be present so reports can be written. + + Note on metadata: LiteLLM does NOT consistently auto-propagate a + virtual key's metadata (or per-request ``metadata`` body fields) + onto every spend log row — behaviour varies by LiteLLM release. + The fields exist on the row schema; populating them reliably is a + deployment concern (e.g. via ``tags=[...]`` or LiteLLM's + organization/team primitives, not the free-form metadata dict). + This test asserts only the schema, not the wiring. + """ + vkey = _issue_virtual_key(metadata={"test": "schema-check"}) + result = _make_completion(vkey, content="Capital of Sweden? One word.") + assert result["status"] == 200 + _wait_for_spend_flush() + + rows = _get_spend_logs(api_key=vkey) + assert rows, "no spend rows for this key" + row = rows[-1] + + # These four fields are the union of what platform teams need: + # api_key — grouping by virtual key (always populated) + # request_tags — per-request labels (e.g. team / cost-center) + # metadata — free-form key-value attached at request time + # team_id — first-class team primitive (set via /team/new) + for field in ("api_key", "request_tags", "metadata", "team_id"): + assert field in row, ( + f"spend log row missing the {field!r} field — finance " + f"reports keying on it won't work. 
Row keys: {list(row)}" + ), + ) + + +# --------------------------------------------------------------------------- +# /global/spend/keys — aggregate per virtual key +# --------------------------------------------------------------------------- + + +def test_global_spend_keys_aggregates_per_virtual_key(): + """`/global/spend/keys` rolls per-request spend up to a single + `total_spend` per virtual key. After one paid call, at least one + key must report a non-zero total.""" + vkey = _issue_virtual_key(metadata={"test": "aggregate", "run": uuid.uuid4().hex[:8]}) + + _make_completion(vkey, content="Capital of Norway? One word.") + _wait_for_spend_flush() + + resp = httpx.get( + f"{_GATEWAY_URL}/global/spend/keys", + headers={"Authorization": f"Bearer {_MASTER_KEY}"}, + timeout=15.0, + ) + resp.raise_for_status() + keys = resp.json() + + # /global/spend/keys identifies virtual keys by hash, not the raw + # key string, so we don't match this key's row directly. We + # assert the weaker contract: the API returns a non-empty list + # and at least one key has non-zero spend. + assert isinstance(keys, list), ( + f"/global/spend/keys must return a list; got {type(keys).__name__}" + ) + assert len(keys) >= 1, "/global/spend/keys returned empty after a successful completion" + non_zero_spend = [k for k in keys if k.get("total_spend", 0) > 0] + assert non_zero_spend, ( + "no virtual key shows non-zero total_spend after a paid call — " + "spend aggregation is broken or the flush window is too short" + ) + + +# --------------------------------------------------------------------------- +# /global/spend/models — aggregate per model id +# --------------------------------------------------------------------------- + + +def test_global_spend_models_aggregates_per_model(): + """`/global/spend/models` shows spend rolled up by *upstream model* + (the resolved OCI catalog id, not the gateway alias). 
Used by + platform teams to answer 'what did Cohere Command cost across all + teams this week?'.""" + vkey = _issue_virtual_key(metadata={"test": "spend-models", "run": uuid.uuid4().hex[:8]}) + + _make_completion(vkey, content="Capital of Brazil? One word.") + _wait_for_spend_flush() + + resp = httpx.get( + f"{_GATEWAY_URL}/global/spend/models", + headers={"Authorization": f"Bearer {_MASTER_KEY}"}, + timeout=15.0, + ) + resp.raise_for_status() + models = resp.json() + assert isinstance(models, list) + # At least one OCI model has non-zero spend on it by the time this + # test runs (this or an earlier test will have driven traffic to + # ``oci-cohere-command``'s upstream). + non_zero = [m for m in models if m.get("total_spend", 0) > 0] + assert non_zero, "/global/spend/models has no rows with non-zero spend" + + +# --------------------------------------------------------------------------- +# Budget enforcement — max_budget should hard-stop a key +# --------------------------------------------------------------------------- + + +def test_budget_enforcement_429s_when_exceeded(): + """A virtual key with a vanishingly small ``max_budget`` must + refuse calls once that budget is exceeded. Without enforcement the + 'centralised budgets' claim in the docs is empty marketing. + """ + # 1e-9 USD ≈ nothing; one completion always blows past it. + vkey = _issue_virtual_key( + max_budget=0.000000001, + metadata={"test": "budget-cap", "run": uuid.uuid4().hex[:8]}, + ) + + # Burn through the budget. The first call may succeed (the gateway + # bills the key, *then* notices it's over) — that's normal. We + # iterate until we either see a 429 (budget enforced) or run out + # of attempts. + saw_429 = False + for _ in range(6): + r = _make_completion(vkey, content="Capital of Greece? 
One word.") + if r["status"] == 429 or ( + r["status"] >= 400 + and isinstance(r["body"], dict) + and "budget" in str(r["body"]).lower() + ): + saw_429 = True + break + _wait_for_spend_flush() + + assert saw_429, ( + "key with max_budget=1e-9 USD was never refused — budget " + "enforcement is not firing. Configured spend cap is documented " + "but not honoured by this gateway." + ) + + +# --------------------------------------------------------------------------- +# Model allowlist — already covered in the main live suite, but assert it +# round-trips into the spend log too (rejected calls should be logged). +# --------------------------------------------------------------------------- + + +def test_rejected_call_is_still_audited(): + """A call refused by the model allowlist must still be auditable — + the gateway should record the attempt so platform teams see who + tried to call what (security signal, not a billable charge).""" + vkey = _issue_virtual_key( + models=[_GATEWAY_MODEL], + metadata={"test": "audit-rejected", "run": uuid.uuid4().hex[:8]}, + ) + + forbidden_model = _GATEWAY_MODEL_B + result = _make_completion(vkey, model=forbidden_model, content="hi") + # Allowlist rejection — either 401 (auth-error) or 403 (forbidden); + # either way it's a 4xx with an "allowed to access" string. + assert result["status"] >= 400, "allowlist did not refuse the call" + assert "allowed to access" in str(result["body"]).lower(), ( + f"allowlist refusal returned an unexpected error shape: {result['body']!r}" + ) + # No assertion about whether the rejection lands in /spend/logs — + # behaviour varies by LiteLLM version. The point of this test is + # that the allowlist refusal is visible at request time; the + # platform team gets the audit signal regardless of where it's + # persisted. 
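The per-key rollup that `/global/spend/keys` performs server-side can also be approximated client-side from `/spend/logs` rows, which may help when cross-checking the two surfaces. The sketch below is a minimal illustration, assuming each row carries the `api_key`, `spend`, and `total_tokens` fields that the tests above assert on. `rollup_spend_by_key` is a hypothetical helper name, not a LiteLLM or Locus API, and the row values in the usage lines are made up.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any


def rollup_spend_by_key(rows: list[dict[str, Any]]) -> dict[str, dict[str, float]]:
    """Sum spend, tokens, and request count per ``api_key``.

    Hypothetical helper: ``rows`` is assumed to be the parsed JSON list
    returned by ``GET /spend/logs``. Rows without an ``api_key`` are
    grouped under ``"unknown"``; missing or null numbers count as zero.
    """
    totals: dict[str, dict[str, float]] = defaultdict(
        lambda: {"spend": 0.0, "total_tokens": 0, "requests": 0}
    )
    for row in rows:
        key = row.get("api_key") or "unknown"
        totals[key]["spend"] += float(row.get("spend") or 0)
        totals[key]["total_tokens"] += int(row.get("total_tokens") or 0)
        totals[key]["requests"] += 1
    return dict(totals)


# Illustrative rows only; real rows come from the gateway.
sample_rows = [
    {"api_key": "hash-a", "spend": 0.5, "total_tokens": 10},
    {"api_key": "hash-a", "spend": 0.25, "total_tokens": 6},
    {"api_key": "hash-b", "spend": 0.125, "total_tokens": 4},
]
for key, agg in sorted(rollup_spend_by_key(sample_rows).items()):
    print(f"key={key} requests={agg['requests']} spend=${agg['spend']:.6f}")
```

Grouping on `api_key` rather than `metadata.team` follows the note in `test_spend_log_row_has_attribution_fields`: the key field is described there as always populated, while metadata propagation varies by LiteLLM release.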
From ebeeb834d69170bbd6d9a31dc533fb5adf8f083c Mon Sep 17 00:00:00 2001 From: Federico Kamelhar Date: Mon, 25 May 2026 12:04:45 -0400 Subject: [PATCH 07/10] ci(litellm-gateway): kill alias drift + corporate-proxy override Signed-off-by: Federico Kamelhar --- docs/how-to/litellm-gateway.md | 6 +- examples/litellm-gateway/README.md | 23 +++- examples/litellm-gateway/docker-compose.yml | 20 +++- tests/unit/test_litellm_gateway_example.py | 115 ++++++++++++++++---- 4 files changed, 133 insertions(+), 31 deletions(-) diff --git a/docs/how-to/litellm-gateway.md b/docs/how-to/litellm-gateway.md index affd005..e2eaae1 100644 --- a/docs/how-to/litellm-gateway.md +++ b/docs/how-to/litellm-gateway.md @@ -99,8 +99,10 @@ docker compose up ``` The gateway listens on `http://localhost:4000` and exposes the model -aliases declared in `config.yaml` (`oci-grok`, `oci-cohere-command`, -`oci-gpt5-mini`, `oci-cohere-embed` in the sample). +aliases declared in `config.yaml`. The sample ships six: +`oci-cohere-command`, `oci-cohere-embed`, `oci-grok`, `oci-gpt5-mini`, +`oci-llama-4-maverick`, and `oci-gemini-2.5-flash`. Add more by +extending `model_list`. Verify with a `curl`: diff --git a/examples/litellm-gateway/README.md b/examples/litellm-gateway/README.md index c557e55..50017e3 100644 --- a/examples/litellm-gateway/README.md +++ b/examples/litellm-gateway/README.md @@ -21,6 +21,7 @@ export OCI_TENANCY="ocid1.tenancy.oc1..xxx" export OCI_KEY_FILE="$HOME/.oci/keys/your_api_key.pem" export OCI_COMPARTMENT_ID="ocid1.compartment.oc1..xxx" export LITELLM_MASTER_KEY="$(openssl rand -hex 32)" # admin key for /key/generate +export LITELLM_DB_PASSWORD="$(openssl rand -hex 16)" # postgres pw — change for non-throwaway # 2. Start the gateway. 
docker compose up @@ -30,11 +31,31 @@ docker compose up # from locus.models.native.openai import OpenAIModel # model = OpenAIModel( # model="oci-cohere-command", # alias from config.yaml -# api_key="$LITELLM_VIRTUAL_KEY", # virtual key +# api_key="", # issued via /key/generate # base_url="http://localhost:4000", # ) ``` +### Behind a corporate proxy? + +If your Docker daemon can't reach Docker Hub directly (TLS interception +on a corporate egress proxy is the usual culprit), the `postgres` pull +fails with `tls: failed to verify certificate: x509: certificate signed +by unknown authority`. Override the image with any mirror you can +reach: + +```bash +# Use Google's container-registry mirror — typically reachable through +# corporate networks that block Docker Hub. +export LITELLM_DB_IMAGE="mirror.gcr.io/library/postgres:17-alpine" +docker compose up +``` + +The `litellm` image itself ships from `ghcr.io/berriai/litellm` (GitHub +Container Registry, separate proxy story) — if *that* one fails too, +override `LITELLM_IMAGE` similarly or pull both via your internal +registry. Both env vars are honored by `docker-compose.yml`. + ## OKE quickstart ```bash diff --git a/examples/litellm-gateway/docker-compose.yml b/examples/litellm-gateway/docker-compose.yml index 2928f1c..fe38d58 100644 --- a/examples/litellm-gateway/docker-compose.yml +++ b/examples/litellm-gateway/docker-compose.yml @@ -39,7 +39,11 @@ services: litellm: - image: ghcr.io/berriai/litellm:main-stable + # Image is configurable so this sample works behind networks that + # can't reach ghcr.io directly. The default is the official + # ``ghcr.io/berriai/litellm:main-stable``; override with + # ``LITELLM_IMAGE`` if you mirror it internally. + image: ${LITELLM_IMAGE:-ghcr.io/berriai/litellm:main-stable} container_name: locus-litellm-gateway ports: - 4000:4000 @@ -75,10 +79,16 @@ services: # production should point DATABASE_URL at an external Postgres / ADB # so the gateway pod is stateless. 
db: - # Official postgres image. If your Docker daemon can't reach Docker - # Hub directly (corporate proxy / TLS interception), use a registry - # mirror you trust, e.g. mirror.gcr.io/library/postgres:17-alpine. - image: postgres:17-alpine + # Image is configurable so this sample works behind corporate + # proxies that block Docker Hub directly. The default targets the + # official postgres:17-alpine; export LITELLM_DB_IMAGE to use a + # mirror you can reach, e.g.: + # + # export LITELLM_DB_IMAGE=mirror.gcr.io/library/postgres:17-alpine + # + # Any image that exposes the standard postgres entrypoint + env + # vars (POSTGRES_DB / POSTGRES_USER / POSTGRES_PASSWORD) works. + image: ${LITELLM_DB_IMAGE:-postgres:17-alpine} container_name: locus-litellm-db environment: POSTGRES_DB: litellm diff --git a/tests/unit/test_litellm_gateway_example.py b/tests/unit/test_litellm_gateway_example.py index 94380a5..201a75d 100644 --- a/tests/unit/test_litellm_gateway_example.py +++ b/tests/unit/test_litellm_gateway_example.py @@ -46,19 +46,42 @@ class of error where someone edits the example config and accidentally "OCI_COMPARTMENT_ID", } -# Aliases the how-to / notebook 71 advertise as available. Drift between -# these and config.yaml breaks the documented quickstart, so the test -# asserts they remain in sync. -_DOCUMENTED_ALIASES = { - "oci-cohere-command", - "oci-grok", - "oci-gpt5-mini", - "oci-llama-4-maverick", - "oci-gemini-2.5-flash", - "oci-cohere-embed", +# How-to markdown — the single source of truth for which OCI aliases the +# gateway is documented to expose. The parity test scrapes this file for +# backticked tokens that start with ``oci-`` and asserts ``config.yaml`` +# exposes exactly that set. No hardcoded alias list to drift against the +# docs. 
+HOWTO_MD = REPO_ROOT / "docs" / "how-to" / "litellm-gateway.md" + +# Tokens that match the ``oci-...`` scrape but aren't gateway aliases — +# exclude them from the parity set so the regex doesn't false-positive +# on real text from the docs. +_NON_ALIAS_OCI_TOKENS = { + # Doc file names cross-referenced in prose, not gateway aliases. + "oci-models.md", + "oci-dac.md", + # add more here if the docs ever reference an `oci-...` string that + # isn't a gateway alias — keep this list tight to keep the test honest. } +def _aliases_from_docs() -> set[str]: + """Return every gateway alias the how-to advertises. + + Looks for backticked tokens matching ``oci-[a-z0-9][a-z0-9.-]*`` and + filters out the documentation cross-references in + ``_NON_ALIAS_OCI_TOKENS``. The contract enforced by the parity test + is then: *what config.yaml exposes must equal what the docs say it + exposes.* + """ + import re + + text = HOWTO_MD.read_text() + # Match `oci-foo`, `oci-foo-bar`, `oci-foo.bar`; the double-backtick form ``oci-foo`` matches too. 
+ tokens = set(re.findall(r"`(oci-[a-z0-9][a-z0-9.\-]*)`", text)) + return {t for t in tokens if t not in _NON_ALIAS_OCI_TOKENS} + + @pytest.fixture(scope="module") def gateway_dir_exists() -> Path: if not GATEWAY_DIR.is_dir(): @@ -93,15 +116,28 @@ def test_has_model_list(self, config: dict[str, Any]): assert len(config["model_list"]) >= 1 def test_aliases_match_documentation(self, config: dict[str, Any]): - """Every alias the docs / notebook 71 advertise must exist in the - config, and every config alias must be one we document — otherwise - callers see ``404 not found`` or the docs say a model exists that - doesn't.""" - aliases = {entry["model_name"] for entry in config["model_list"]} - assert aliases == _DOCUMENTED_ALIASES, ( - f"alias drift between config.yaml and docs/how-to/litellm-gateway.md: " - f"only-in-config={aliases - _DOCUMENTED_ALIASES}, " - f"only-in-docs={_DOCUMENTED_ALIASES - aliases}" + """Single source of truth: the set of aliases declared in + ``config.yaml`` must equal the set of aliases the how-to + markdown mentions in backticks. Updating one without updating + the other should fail this test loudly — not silently leave a + 404 for the reader. The "documented" set is *parsed* from the + how-to file, not hardcoded here, so there's no third source to + drift against.""" + from_config = {entry["model_name"] for entry in config["model_list"]} + from_docs = _aliases_from_docs() + assert from_docs, ( + "the how-to (docs/how-to/litellm-gateway.md) mentions no " + "`oci-...` aliases in backticks — either the file is wrong " + "or the scrape regex needs updating." + ) + assert from_config == from_docs, ( + "alias drift between config.yaml and " + "docs/how-to/litellm-gateway.md:\n" + f" only-in-config-yaml: {sorted(from_config - from_docs)}\n" + f" only-in-how-to-md: {sorted(from_docs - from_config)}\n" + "Fix one or the other so they agree (or add the token to " + "_NON_ALIAS_OCI_TOKENS if it's a doc cross-reference, " + "not a gateway alias)." 
) def test_aliases_unique(self, config: dict[str, Any]): @@ -174,10 +210,27 @@ def test_fallback_chains_reference_declared_aliases(self, config: dict[str, Any] class TestComposeYaml: def test_uses_official_image(self, compose: dict[str, Any]): + """The default LiteLLM image must be the official ghcr.io build. + The compose file wraps it in ``${LITELLM_IMAGE:-...}`` so users + behind networks that can't reach ghcr.io can override; we assert + only the default behaviour here.""" svc = compose["services"]["litellm"] - # The official LiteLLM proxy image. Other community forks exist - # but the docs explicitly pin to the upstream one. - assert svc["image"].startswith("ghcr.io/berriai/litellm:") + img = svc["image"] + # Either the literal path or the ${LITELLM_IMAGE:-...} form, + # both of which fall back to ghcr.io/berriai/litellm. + assert "ghcr.io/berriai/litellm" in img, ( + f"litellm image should default to ghcr.io/berriai/litellm:..., got {img!r}" + ) + + def test_litellm_image_is_overridable(self, compose: dict[str, Any]): + """Operators behind corporate proxies that block ghcr.io must + be able to override the image without editing this file. Assert + the env-var-with-default form is in use.""" + img = compose["services"]["litellm"]["image"] + assert "${LITELLM_IMAGE" in img, ( + f"litellm image must use ${{LITELLM_IMAGE:-...}} form so " + f"users can override it; got {img!r}" + ) def test_exposes_port_4000(self, compose: dict[str, Any]): svc = compose["services"]["litellm"] @@ -255,10 +308,26 @@ def test_db_service_declared(self, compose: dict[str, Any]): ) def test_db_is_postgres(self, compose: dict[str, Any]): - """Image must be an official-shape postgres tag.""" + """Image must be a postgres-shaped tag. The compose file uses + ``${LITELLM_DB_IMAGE:-postgres:17-alpine}`` so operators behind + corporate proxies can swap in a mirror (e.g. 
+ ``mirror.gcr.io/library/postgres:17-alpine``) without editing + this file.""" db_img = compose["services"]["db"]["image"] assert "postgres" in db_img, f"db service image should be postgres, got {db_img!r}" + def test_db_image_is_overridable(self, compose: dict[str, Any]): + """The Postgres pull is the most common failure point behind + corporate proxies that TLS-intercept Docker Hub. Asserting the + ``${LITELLM_DB_IMAGE:-...}`` form means a reviewer who hits the + pull failure can swap to a mirror without forking the sample.""" + db_img = compose["services"]["db"]["image"] + assert "${LITELLM_DB_IMAGE" in db_img, ( + f"db image must use ${{LITELLM_DB_IMAGE:-...}} form so users " + f"behind corporate proxies can override the registry; got " + f"{db_img!r}" + ) + def test_db_creds_env_wired_strictly(self, compose: dict[str, Any]): """Postgres POSTGRES_PASSWORD must use the ``${VAR:?…}`` strict form so compose refuses to start without LITELLM_DB_PASSWORD set From b8a48ed8e1354bc821ae82980faf1a4e0abef21d Mon Sep 17 00:00:00 2001 From: Federico Kamelhar Date: Mon, 25 May 2026 12:09:36 -0400 Subject: [PATCH 08/10] docs(litellm-gateway): compress enterprise section + reframe cost tests as deployment-validation Signed-off-by: Federico Kamelhar --- docs/how-to/litellm-gateway.md | 133 ++++++------------ .../integration/test_litellm_gateway_cost.py | 26 ++-- 2 files changed, 60 insertions(+), 99 deletions(-) diff --git a/docs/how-to/litellm-gateway.md b/docs/how-to/litellm-gateway.md index e2eaae1..e8983fc 100644 --- a/docs/how-to/litellm-gateway.md +++ b/docs/how-to/litellm-gateway.md @@ -167,6 +167,11 @@ The gateway enforces every field at request time: The same Postgres backend logs every request automatically with token counts and computed cost. No extra config beyond connecting the DB. 
+The full admin / analytics API is documented at +[docs.litellm.ai/docs/proxy/cost_tracking](https://docs.litellm.ai/docs/proxy/cost_tracking); +the snippets below cover the three endpoints the sample deployment +relies on, with sample output captured live from this PR's +validation run. ```bash # Per-request spend log (flushed asynchronously every ~10s by default). @@ -302,98 +307,48 @@ Highlights: ## How enterprises use this pattern -The LiteLLM AI Gateway pattern shows up repeatedly inside large -organisations adopting LLMs across many teams. The recurring shape is -*one gateway per environment, owned by a platform team, fronting every -provider, accessed by every service*. What it earns them: - -### 1. Charge-back and showback to business units - -A single platform team runs the gateway; dozens of business units -consume it. Every request is logged with a virtual key, a team tag in -`metadata`, and an attributed USD cost. Finance pulls a SQL report -each month and chargebacks roll up cleanly — no per-team integration, -no manual reconciliation, no vendor invoice juggling. - -### 2. Compliance, audit, and data-residency controls - -Regulated industries (financial services, healthcare, public sector) -need every prompt, every response, every tool call, every streamed -chunk persisted for audit. The gateway's spend-log surface is -ISO-27001 / SOC-2 / PCI-friendly out of the box — append-only, -Postgres-backed, queryable. Pair it with guardrails (Lakera / -Presidio) for PII redaction *before* prompts ever leave the -tenancy, and the gateway becomes the single chokepoint for data- -governance review. - -### 3. Centralised governance — one place to enforce policy - -Security and IT govern which providers, which models, which regions -are approved. Engineering teams consume the catalog through virtual -keys with model allowlists; they *cannot* bypass the gateway to call -an unapproved model. 
When a new provider is approved, one -`config.yaml` change turns it on for every team. When a model is -retired, the same surface flips it off. - -### 4. Vendor diversification and continuity - -Single-vendor lock-in is a real risk for any tier-1 workload. The -gateway's fallback chains express *"OCI Cohere first; on rate-limit -or 5xx fail over to OpenAI direct; on outage fail over to AWS -Bedrock"* declaratively. Application code stays a single -`OpenAIModel(base_url=...)` call. - -### 5. Quota arbitration across teams - -Vendor contracts come with shared rate limits. Without a gateway, -the loudest team monopolises the quota. With it, the platform team -sets per-key `rpm_limit` / `tpm_limit` / `max_budget` and arbitrates -the shared pool — fair-share, per-team SLOs, no surprise throttling -of the team that planned ahead. - -### 6. Observability into the agent layer - -Enterprise platform teams already have Datadog / OpenTelemetry / -Splunk for everything else. The gateway plugs LLM traffic into the -same pipes — `success_callback` / `failure_callback` push request -spans into the existing observability stack with shared trace IDs. -The on-call team sees LLM latency, error rate, and cost on the -same dashboard as everything else. - -### 7. Cost optimisation that compounds - -Once spend is centralised: cache identical prompts (Redis / S3 / -Qdrant), route cheap requests to cheap models via a router policy, -identify the top-10 most expensive prompts and rewrite them. None -of this is possible when each team holds its own provider keys — -because no one sees the aggregate. The gateway makes the aggregate -visible. - -### 8. Developer ergonomics across many languages - -Python Locus agents, JS workbench UIs, Go / Ruby / Java micro-services -all need LLM access. With the gateway, every one of them talks the -same OpenAI-shaped HTTP — no per-language SDK to maintain. New -provider rollout is a config change, not a code change in every -service. 
- -### Concrete deployment shape - -A typical enterprise rollout looks like: +The recurring deployment shape inside large organisations adopting +LLMs across many teams is *one gateway per environment, owned by a +platform team, fronting every provider, accessed by every service*. + +The platform-grade pieces it earns them: + +- **Charge-back / showback** — finance pulls a SQL report keyed on + virtual key + `team` metadata; per-team costs roll up without + manual reconciliation. +- **Compliance, audit, data residency** — append-only spend log + (ISO-27001 / SOC-2 / PCI-friendly); PII redaction via guardrails + *before* prompts leave the tenancy. +- **Centralised governance** — security/IT control which providers, + models, and regions are approved; engineering can't bypass. +- **Vendor diversification** — declarative fallback chains across + regions and providers; application code stays one `OpenAIModel` call. +- **Quota arbitration** — per-key `rpm_limit` / `tpm_limit` / + `max_budget` lets the platform team fair-share shared vendor quotas. +- **Observability** — `success_callback` / `failure_callback` push + LLM spans into the existing Datadog / OTel / Splunk pipeline. +- **Cost optimisation that compounds** — cache identical prompts, + route cheap requests to cheap models, identify top-spend prompts + and rewrite them. All require centralised visibility. +- **Polyglot consumers** — Python Locus, JS workbench, Go / Ruby / + Java services all talk the same OpenAI-shaped HTTP. 
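The quota-arbitration and charge-back bullets above both hinge on how a virtual key is issued. A minimal sketch follows, assuming the gateway exposes LiteLLM's `/key/generate` admin endpoint and accepts the `models`, `max_budget`, `rpm_limit`, `tpm_limit`, and `metadata` fields. The helper name `build_key_request`, the team name, the limits, and the master key are hypothetical placeholders, not values from the sample deployment. The request is only built, not sent, so the sketch runs without a live gateway.

```python
import json
import urllib.request


def build_key_request(gateway_url, master_key, team, models,
                      max_budget_usd, rpm_limit, tpm_limit):
    """Build (but do not send) a POST /key/generate request for one team.

    Hypothetical helper for illustration; the field names follow
    LiteLLM's virtual-key API as the author understands it.
    """
    payload = {
        "models": models,              # model allowlist for this key
        "max_budget": max_budget_usd,  # USD ceiling before the gateway refuses
        "rpm_limit": rpm_limit,        # requests per minute
        "tpm_limit": tpm_limit,        # tokens per minute
        "metadata": {"team": team},    # tag used for per-team spend reports
    }
    return urllib.request.Request(
        url=gateway_url.rstrip("/") + "/key/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {master_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_key_request(
        gateway_url="http://litellm-gateway:4000",
        master_key="sk-hypothetical-master-key",  # placeholder, not a real key
        team="payments",                          # hypothetical team name
        models=["oci-cohere-command"],
        max_budget_usd=25.0,
        rpm_limit=60,
        tpm_limit=100_000,
    )
    print(req.get_method(), req.full_url)
    print(json.dumps(json.loads(req.data), indent=2))
```

Passing the built request to `urllib.request.urlopen` against a running gateway would be the next step. The returned virtual key is what an application team would then hand to `OpenAIModel(base_url=...)` in place of any provider credential.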
+ +### Deployment-shape table | Layer | Owner | Lives in | |---|---|---| -| **OCI tenancy + IAM + signing keys** | Cloud / security team | OCI Vault, OKE Workload Identity | -| **Gateway pod + Postgres + Redis + observability backends** | Platform / SRE team | Kubernetes (OKE), one deployment per env (dev / staging / prod) | -| **Gateway `config.yaml` — model catalog, fallbacks, callbacks, guardrails** | Platform team | GitOps repo, change-controlled | -| **Virtual keys + per-team budgets** | Platform team issues; security reviews | Postgres (spend logs); admin UI for issuance | -| **Locus agents / workbench / other consumers** | Application teams | Their own services, talking to `litellm-gateway..svc.cluster.local:4000` with their team's virtual key | -| **Spend reports + audit + alerts** | Finance + security | SQL on the gateway's Postgres; observability dashboards | - -The gateway becomes the choke-point that **lets the platform team set -policy once and the application teams consume it through a single -contract** — without anyone having to write provider-specific -integration code or hold provider credentials. 
+| OCI tenancy + IAM + signing keys | Cloud / security team | OCI Vault, OKE Workload Identity | +| Gateway pod + Postgres + Redis + obs backends | Platform / SRE team | Kubernetes (OKE), one deployment per env | +| Gateway `config.yaml` (model catalog, fallbacks, callbacks, guardrails) | Platform team | GitOps repo, change-controlled | +| Virtual keys + per-team budgets | Platform team issues; security reviews | Postgres; admin UI for issuance | +| Locus agents / workbench / other consumers | Application teams | Their own services, talking to `litellm-gateway..svc.cluster.local:4000` | +| Spend reports + audit + alerts | Finance + security | SQL on the gateway's Postgres; obs dashboards | + +The pattern lets the platform team **set policy once** and application +teams **consume it through a single contract** — without anyone writing +provider-specific integration code or holding provider credentials. +LiteLLM's own [enterprise documentation](https://docs.litellm.ai/docs/proxy/enterprise) +covers each surface (callbacks, cache, guardrails, audit) in depth. ## See also diff --git a/tests/integration/test_litellm_gateway_cost.py b/tests/integration/test_litellm_gateway_cost.py index bfaadfe..0b0f026 100644 --- a/tests/integration/test_litellm_gateway_cost.py +++ b/tests/integration/test_litellm_gateway_cost.py @@ -2,18 +2,24 @@ # Licensed under the Universal Permissive License v1.0 as shown at # https://oss.oracle.com/licenses/upl/ -"""Integration tests: cost tracking and budget enforcement on the -LiteLLM AI Gateway, driven end-to-end against real OCI Generative AI. - -Asserts that the cost surface documented in -``docs/how-to/litellm-gateway.md`` — `/spend/logs`, -`/global/spend/keys`, `/global/spend/models`, per-key `max_budget` -enforcement — actually works at runtime, not just on paper. +"""Integration tests: validate that the deployment we ship in +``examples/litellm-gateway/`` actually delivers the cost-tracking +promise the how-to advertises. 
+ +Scope: these tests are **deployment-validation**, not LiteLLM +regression-testing. They confirm that the sample ``config.yaml`` + +``docker-compose.yml`` + the LiteLLM Proxy Server version we pin +together produce a working ``/spend/logs`` / ``/global/spend/keys`` +/ ``/global/spend/models`` / per-key ``max_budget`` surface for an +operator following the documented recipe. If LiteLLM restructures +one of these endpoints in a future release, these tests fail with a +clean signal — and the right response is to update the docs + the +``LITELLM_IMAGE`` we recommend, not to chase LiteLLM's internals. Auto-skipped without the gateway env vars; runs from the same env -gate as `test_litellm_gateway_live.py`. Requires the Postgres-backed -gateway from ``examples/litellm-gateway/docker-compose.yml`` — -``/key/generate`` and the `/spend/*` endpoints all require DB. +gate as ``test_litellm_gateway_live.py``. Requires the Postgres- +backed gateway from ``examples/litellm-gateway/docker-compose.yml`` +— ``/key/generate`` and ``/spend/*`` both need the DB. """ from __future__ import annotations From 5332123b2d1b5619406554723cf91547e8c3de89 Mon Sep 17 00:00:00 2001 From: Federico Kamelhar Date: Mon, 25 May 2026 12:22:00 -0400 Subject: [PATCH 09/10] docs(changelog): add LiteLLM AI Gateway integration entry to Unreleased MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The PR #268 work lands under [Unreleased] following the same shape as the b21 entries — leading summary paragraph, then enumerated detail of what ships + what's verified + what's tracked as follow-up in #269. No version bump in pyproject.toml — that's a release-manager call when b22 cuts. Signed-off-by: Federico Kamelhar --- CHANGELOG.md | 87 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 87 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index b1eb488..50afd69 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,6 +8,93 @@ policy. 
## [Unreleased] +### Added — LiteLLM AI Gateway integration (PR #268) + +Locus is now documented + sampled + tested as a first-class consumer +of the [LiteLLM AI Gateway](https://litellm.ai) (a.k.a. LiteLLM Proxy +Server) in front of Oracle Generative AI Infrastructure. The gateway +pattern is the recommended path for multi-tenant / cross-provider / +centralised-observability deployments; the existing direct OCI +providers (`OCIChatCompletionsModel`, `OCIResponsesModel`, `OCIModel`) +remain the recommended path for single-tenant, dev, and on-OKE +workload-identity cases. + +**No new Locus Python code**, no new dependency added to +`pyproject.toml`. Locus's existing `OpenAIModel(base_url=...)` is the +LiteLLM-compatible client by design; the integration is one +`config.yaml` telling the gateway how to reach OCI. + +Live-verified end-to-end against real OCI in `us-chicago-1`: + +- 7/7 live integration tests pass (chat, multi-turn + system, + streaming, tool calling, full Agent loop, `/v1/models` lookup, + unauthenticated-call rejection). +- 7/7 cost-tracking deployment-validation tests pass — `/spend/logs` + grows after a completion, per-row token counts + non-zero USD + cost, `/global/spend/keys` and `/global/spend/models` aggregation, + schema-fields invariant, `max_budget=1e-9 USD` triggers `429`, + allowlist refusals are visible at request time. +- 29/29 unit tests over the shipped sample + (`config.yaml` / `docker-compose.yml` / `helm-values.yaml`) — + alias-set parity scraped from the how-to so docs and config can't + drift, strict env-var wiring on every entry, fallback chains + reference declared aliases, Postgres `depends_on: + service_healthy`, helm Service is ClusterIP-only, pod hardened. +- Fallback chain validated live: a broken-on-purpose primary + (`oci/xai.grok-NONEXISTENT-9999`) with fallback + `oci-cohere-command` served the eventual response as + `cohere.command-latest`, content "Rome." 
— proving the upstream + failure was masked and the agent never saw the 5xx. + +What ships: + +- `docs/how-to/litellm-gateway.md` — deployment guide. Sections: + when to choose the gateway vs. the direct OCI providers; explicit + "Scope" admonition (the gateway covers `/20231130/actions/chat` + only — direct providers handle OCI's V1 shim and Responses API); + local Docker + OKE quickstarts; **issuing per-team virtual keys**; + **cost tracking** with `/spend/logs` / `/global/spend/keys` / + `/global/spend/models`; auth-boundary table; "How enterprises use + this pattern" with the deployment-shape table. +- `docs/img/litellm-gateway-architecture.svg` — three-tier SVG + (Locus → Gateway → OCI) embedded in the how-to + notebook md. +- `examples/litellm-gateway/` — working sample. `config.yaml` with + six OCI aliases wired to the canonical `OCI_*` env vars, + `drop_params: true`, fallback chains, master-key from env; + `docker-compose.yml` with the gateway + Postgres-17 sidecar + (`depends_on: condition: service_healthy`), both images + overridable via `LITELLM_IMAGE` / `LITELLM_DB_IMAGE` for networks + that can't reach ghcr.io / Docker Hub directly; `helm-values.yaml` + for the official `litellm-helm` chart (ClusterIP-only Service, + envFrom Kubernetes Secrets, OKE Workload Identity placeholder, + pod hardening); `README.md` side-by-side local + OKE quickstarts. +- `examples/notebook_71_litellm_gateway.py` — runnable gateway + companion. Health-checks the gateway, builds an `Agent` around + `OpenAIModel(base_url=...)`, runs blocking + streaming prompts. + Self-skips with a wiring banner when `LITELLM_GATEWAY_URL` / + `LITELLM_GATEWAY_KEY` aren't set. +- `examples/notebook_72_litellm_gateway_cost.py` — runnable + per-team cost-tracking demo. Issues virtual keys for two pretend + teams, drives different traffic, walks `/spend/logs`, + `/global/spend/keys`, `/global/spend/models` with real numbers. 
+- `docs/how-to/oci-models.md` — admonition cross-linking the + gateway page for multi-tenant cases. +- `mkdocs.yml` — Guides + Notebooks nav entries. + +The four gateway capabilities the deployment *supports* but that +this PR does **not** live-verify (Langfuse observability, Redis +cache passthrough, Lakera/Presidio guardrails, OKE `helm install` +end-to-end) are tracked as follow-up PRs in [#269](https://github.com/oracle-samples/locus/issues/269) +— one PR per capability, each with its own live demo + integration +test. + +This PR supersedes the closed PR #266 (in-process `LiteLLMModel` +wrapper) and closes issue #267 (notebook migration via +`LOCUS_MODEL_PROVIDER=litellm`) — both rejected in favour of the +gateway pattern because that's how LiteLLM is designed to be +consumed and avoids re-implementing a subset of the proxy's surface +inside Locus. + ## [0.2.0b21] - 2026-05-23 Four PRs of fixes accumulated since b20. No new public APIs; this From 59c95b9b4cf31e76dd3bc9b81e6976935d746480 Mon Sep 17 00:00:00 2001 From: Federico Kamelhar Date: Mon, 25 May 2026 12:36:57 -0400 Subject: [PATCH 10/10] =?UTF-8?q?chore(release):=20v0.2.0b22=20=E2=80=94?= =?UTF-8?q?=20LiteLLM=20AI=20Gateway=20integration?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit One PR landed since b21 (#268): Locus is now documented + sampled + tested as a first-class consumer of the LiteLLM AI Gateway in front of Oracle Generative AI Infrastructure. Zero new Python code in Locus, no new dependency added to pyproject.toml — the integration is a deployment guide + working sample + tests. 
Live-verified against real OCI us-chicago-1 (LUIGI_FRA_API tenancy): - 7/7 live gateway integration tests - 7/7 cost-tracking deployment-validation tests - 29/29 unit tests over the shipped sample - Fallback chain validated with a broken-on-purpose primary - DCO sign-off on every commit - mkdocs --strict clean Four follow-up gateway capabilities (Langfuse observability, Redis cache, Lakera/Presidio guardrails, OKE helm install) are tracked in #269 — each becomes its own focused PR with its own live demo. See CHANGELOG.md for the full breakdown of what ships. Signed-off-by: Federico Kamelhar --- CHANGELOG.md | 11 +++++++++++ pyproject.toml | 2 +- 2 files changed, 12 insertions(+), 1 deletion(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 50afd69..696ba9c 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,6 +8,17 @@ policy. ## [Unreleased] +## [0.2.0b22] - 2026-05-25 + +One PR landed since b21, but it's a substantive one: Locus is now +documented + sampled + tested as a first-class consumer of the +[LiteLLM AI Gateway](https://litellm.ai) in front of Oracle Generative +AI Infrastructure. The gateway pattern joins the existing direct OCI +providers as a documented deployment path, recommended for +multi-tenant / cross-provider / centralised-observability cases. +**Zero new Python code in Locus**, no new dep — the integration is +docs + a working sample + tests. + ### Added — LiteLLM AI Gateway integration (PR #268) Locus is now documented + sampled + tested as a first-class consumer diff --git a/pyproject.toml b/pyproject.toml index 272780f..e438723 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -4,7 +4,7 @@ build-backend = "hatchling.build" [project] name = "locus-sdk" -version = "0.2.0b21" +version = "0.2.0b22" description = "Multi-agent workflows for Python — stream them, branch them, pause for a human, resume next week. Built on Oracle Generative AI." readme = "README.md" license = "UPL-1.0"