Context
PR #268 lands the
LiteLLM AI Gateway integration with the core platform-grade pieces
live-verified against real OCI Generative AI (us-chicago-1,
LUIGI_FRA_API tenancy):
- ✅ Gateway + Postgres
docker compose up
- ✅
OpenAIModel(base_url=...) → OCI completion (7/7 integration tests)
- ✅
/key/generate virtual keys with allowlist + budget + expiry
- ✅ Cost tracking —
/spend/logs, /global/spend/keys, /global/spend/models (7/7 cost tests)
- ✅ Budget enforcement —
max_budget=1e-9 triggers 429
- ✅ Fallback chain — broken primary → cohere-command serves response
- ✅ 29 unit tests over the shipped sample
The how-to lists four additional gateway capabilities that the
deployment supports but the PR does not live-verify because each
requires a backing service we haven't stood up. They're explicitly
flagged as "Honest caveats" in the PR body. This issue tracks
verifying each one, ideally as its own follow-up PR so each lands
with its own focused live demo + integration test.
Acceptance criteria — each follow-up PR
1. Observability callback wired end-to-end (Langfuse first)
LiteLLM ships observability callbacks
that push request spans into the host platform's existing trace /
metrics backend. The how-to mentions Langfuse / OpenTelemetry /
Datadog / Helicone as supported targets but the sample
`config.yaml` has `success_callback: ["langfuse"]` commented out.
Done when:
2. Cache passthrough (Redis exact-match first)
LiteLLM supports in-memory / Redis / S3 / Qdrant caches
keyed on the canonical request. Cache hits skip the upstream cost
entirely. The how-to claims this; the sample `config.yaml` has the
`cache: true` block commented out.
Done when:
3. Guardrails (Lakera or Presidio first)
LiteLLM integrates with Lakera / Aporia / Presidio / Bedrock
Guardrails for pre-
and post-call PII redaction / content filtering. The SVG mentions
guardrails; no example wired today.
Done when:
4. OKE `helm install` against a real cluster
`examples/litellm-gateway/helm-values.yaml` ships and `helm
template` lints fine, but we have not `helm install`'d it
against a real OKE cluster. The "Deploying on OKE" section in the
how-to documents the recipe; we trust it works because it's the
standard `ghcr.io/berriai/litellm-helm` chart, but trust is not
verification.
Done when:
Suggested PR shape
Each item above is its own PR. They're independent:
- (1) and (2) extend `config.yaml` + `docker-compose.yml` + tests.
- (3) extends `config.yaml` + `docker-compose.yml` (Presidio
sidecar) + tests.
- (4) is an OKE-only change with no new code on the Locus side.
Doing them as four separate PRs keeps each merge low-risk and the
"is this feature actually verified?" question easy to answer per
capability.
Out of scope here
- LiteLLM Router primitives (weighted load-balancing across model
groups, multi-region routing). The simple fallback chain we ship
covers the common case; full Router is a separate, larger surface.
- Bedrock Guardrails specifically (requires an AWS account; lower
priority than Presidio for an OCI-fronted gateway).
- A LiteLLM-hosted control plane / cloud SaaS variant of the
gateway. We deliberately ship the self-hostable proxy.
References
Context
PR #268 lands the
LiteLLM AI Gateway integration with the core platform-grade pieces
live-verified against real OCI Generative AI (
us-chicago-1,LUIGI_FRA_API tenancy):
docker compose upOpenAIModel(base_url=...)→ OCI completion (7/7 integration tests)/key/generatevirtual keys with allowlist + budget + expiry/spend/logs,/global/spend/keys,/global/spend/models(7/7 cost tests)max_budget=1e-9triggers 429The how-to lists four additional gateway capabilities that the
deployment supports but the PR does not live-verify because each
requires a backing service we haven't stood up. They're explicitly
flagged as "Honest caveats" in the PR body. This issue tracks
verifying each one, ideally as its own follow-up PR so each lands
with its own focused live demo + integration test.
Acceptance criteria — each follow-up PR
1. Observability callback wired end-to-end (Langfuse first)
LiteLLM ships observability callbacks
that push request spans into the host platform's existing trace /
metrics backend. The how-to mentions Langfuse / OpenTelemetry /
Datadog / Helicone as supported targets but the sample
`config.yaml` has `success_callback: ["langfuse"]` commented out.
Done when:
`failure_callback` config (most likely Langfuse cloud, since OTel
needs a collector and Datadog needs an account).
LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY env vars in
`${VAR:?…}` form (or marks them as optional and skips the callback
when missing).
gateway, a request span lands in Langfuse with the right trace
metadata (model, latency, tokens, virtual key tag).
a screenshot or trace-detail snippet.
2. Cache passthrough (Redis exact-match first)
LiteLLM supports in-memory / Redis / S3 / Qdrant caches
keyed on the canonical request. Cache hits skip the upstream cost
entirely. The how-to claims this; the sample `config.yaml` has the
`cache: true` block commented out.
Done when:
so the basic sample still works without it).
`cache_params: {type: redis, host: redis, port: 6379}`.
call's `/spend/logs` row reports `cache_hit: true` and
`spend: 0`.
expected reduction in token cost on identical prompts.
3. Guardrails (Lakera or Presidio first)
LiteLLM integrates with Lakera / Aporia / Presidio / Bedrock
Guardrails for pre-
and post-call PII redaction / content filtering. The SVG mentions
guardrails; no example wired today.
Done when:
easiest — fully open-source, runs as another Docker service).
is redacted before the upstream OCI call, and the spend log row
shows the redacted form (not the SSN).
shape (which container runs the guardrail, how to verify it's
firing).
4. OKE `helm install` against a real cluster
`examples/litellm-gateway/helm-values.yaml` ships and `helm
template` lints fine, but we have not `helm install`'d it
against a real OKE cluster. The "Deploying on OKE" section in the
how-to documents the recipe; we trust it works because it's the
standard `ghcr.io/berriai/litellm-helm` chart, but trust is not
verification.
Done when:
`nemotron-phx` would do; or a fresh dev cluster).
key on disk), `oci-credentials` Secret materialises the
compartment id correctly, Service is ClusterIP-only, Postgres
connection works against an external ADB or the chart's bundled
Postgres.
gateway from a Locus pod in the same cluster.
Suggested PR shape
Each item above is its own PR. They're independent:
sidecar) + tests.
Doing them as four separate PRs keeps each merge low-risk and the
"is this feature actually verified?" question easy to answer per
capability.
Out of scope here
groups, multi-region routing). The simple fallback chain we ship
covers the common case; full Router is a separate, larger surface.
priority than Presidio for an OCI-fronted gateway).
gateway. We deliberately ship the self-hostable proxy.
References