Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
98 changes: 98 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,104 @@ policy.

## [Unreleased]

## [0.2.0b22] - 2026-05-25

One PR landed since b21, but it's a substantive one: Locus is now
documented + sampled + tested as a first-class consumer of the
[LiteLLM AI Gateway](https://litellm.ai) in front of Oracle Generative
AI Infrastructure. The gateway pattern joins the existing direct OCI
providers as a documented deployment path, recommended for
multi-tenant / cross-provider / centralised-observability cases.
**Zero new Python code in Locus**, no new dep — the integration is
docs + a working sample + tests.

### Added — LiteLLM AI Gateway integration (PR #268)

Locus is now documented + sampled + tested as a first-class consumer
of the [LiteLLM AI Gateway](https://litellm.ai) (a.k.a. LiteLLM Proxy
Server) in front of Oracle Generative AI Infrastructure. The gateway
pattern is the recommended path for multi-tenant / cross-provider /
centralised-observability deployments; the existing direct OCI
providers (`OCIChatCompletionsModel`, `OCIResponsesModel`, `OCIModel`)
remain the recommended path for single-tenant, dev, and on-OKE
workload-identity cases.

**No new Locus Python code**, no new dependency added to
`pyproject.toml`. Locus's existing `OpenAIModel(base_url=...)` is the
LiteLLM-compatible client by design; the integration is one
`config.yaml` telling the gateway how to reach OCI.

Live-verified end-to-end against real OCI in `us-chicago-1`:

- 7/7 live integration tests pass (chat, multi-turn + system,
streaming, tool calling, full Agent loop, `/v1/models` lookup,
unauthenticated-call rejection).
- 7/7 cost-tracking deployment-validation tests pass — `/spend/logs`
grows after a completion, per-row token counts + non-zero USD
cost, `/global/spend/keys` and `/global/spend/models` aggregation,
schema-fields invariant, `max_budget=1e-9 USD` triggers `429`,
allowlist refusals are visible at request time.
- 29/29 unit tests over the shipped sample
(`config.yaml` / `docker-compose.yml` / `helm-values.yaml`) —
alias-set parity scraped from the how-to so docs and config can't
drift, strict env-var wiring on every entry, fallback chains
reference declared aliases, Postgres `depends_on:
service_healthy`, helm Service is ClusterIP-only, pod hardened.
- Fallback chain validated live: a broken-on-purpose primary
(`oci/xai.grok-NONEXISTENT-9999`) with fallback
`oci-cohere-command` served the eventual response as
`cohere.command-latest`, content "Rome." — proving the upstream
failure was masked and the agent never saw the 5xx.

What ships:

- `docs/how-to/litellm-gateway.md` — deployment guide. Sections:
when to choose the gateway vs. the direct OCI providers; explicit
"Scope" admonition (the gateway covers `/20231130/actions/chat`
only — direct providers handle OCI's V1 shim and Responses API);
local Docker + OKE quickstarts; **issuing per-team virtual keys**;
**cost tracking** with `/spend/logs` / `/global/spend/keys` /
`/global/spend/models`; auth-boundary table; "How enterprises use
this pattern" with the deployment-shape table.
- `docs/img/litellm-gateway-architecture.svg` — three-tier SVG
(Locus → Gateway → OCI) embedded in the how-to + notebook md.
- `examples/litellm-gateway/` — working sample. `config.yaml` with
six OCI aliases wired to the canonical `OCI_*` env vars,
`drop_params: true`, fallback chains, master-key from env;
`docker-compose.yml` with the gateway + Postgres-17 sidecar
(`depends_on: condition: service_healthy`), both images
overridable via `LITELLM_IMAGE` / `LITELLM_DB_IMAGE` for networks
that can't reach ghcr.io / Docker Hub directly; `helm-values.yaml`
for the official `litellm-helm` chart (ClusterIP-only Service,
envFrom Kubernetes Secrets, OKE Workload Identity placeholder,
pod hardening); `README.md` side-by-side local + OKE quickstarts.
- `examples/notebook_71_litellm_gateway.py` — runnable gateway
companion. Health-checks the gateway, builds an `Agent` around
`OpenAIModel(base_url=...)`, runs blocking + streaming prompts.
Self-skips with a wiring banner when `LITELLM_GATEWAY_URL` /
`LITELLM_GATEWAY_KEY` aren't set.
- `examples/notebook_72_litellm_gateway_cost.py` — runnable
per-team cost-tracking demo. Issues virtual keys for two pretend
teams, drives different traffic, walks `/spend/logs`,
`/global/spend/keys`, `/global/spend/models` with real numbers.
- `docs/how-to/oci-models.md` — admonition cross-linking the
gateway page for multi-tenant cases.
- `mkdocs.yml` — Guides + Notebooks nav entries.

The four gateway capabilities the deployment *supports* but that
this PR does **not** live-verify (Langfuse observability, Redis
cache passthrough, Lakera/Presidio guardrails, OKE `helm install`
end-to-end) are tracked as follow-up PRs in [#269](https://github.com/oracle-samples/locus/issues/269)
— one PR per capability, each with its own live demo + integration
test.

This PR supersedes the closed PR #266 (in-process `LiteLLMModel`
wrapper) and closes issue #267 (notebook migration via
`LOCUS_MODEL_PROVIDER=litellm`) — both rejected in favour of the
gateway pattern because that's how LiteLLM is designed to be
consumed and avoids re-implementing a subset of the proxy's surface
inside Locus.

## [0.2.0b21] - 2026-05-23

Four PRs of fixes accumulated since b20. No new public APIs; this
Expand Down
Loading