Follow-up: live-verify LiteLLM AI Gateway features deferred from PR #268

## Context

[PR #268](https://github.com/oracle-samples/locus/pull/268) lands the
LiteLLM AI Gateway integration with the core platform-grade pieces
live-verified against real OCI Generative AI (`us-chicago-1`,
LUIGI_FRA_API tenancy):

- ✅ Gateway + Postgres `docker compose up`
- ✅ `OpenAIModel(base_url=...)` → OCI completion (7/7 integration tests)
- ✅ `/key/generate` virtual keys with allowlist + budget + expiry
- ✅ Cost tracking — `/spend/logs`, `/global/spend/keys`, `/global/spend/models` (7/7 cost tests)
- ✅ Budget enforcement — `max_budget=1e-9` triggers 429
- ✅ Fallback chain — broken primary → cohere-command serves response
- ✅ 29 unit tests over the shipped sample

The how-to lists four additional gateway capabilities that the
deployment *supports* but the PR does **not** live-verify because each
requires a backing service we haven't stood up. They're explicitly
flagged as "Honest caveats" in the PR body. This issue tracks
verifying each one, ideally as its own follow-up PR so each lands
with its own focused live demo + integration test.

---

## Acceptance criteria — each follow-up PR

### 1. Observability callback wired end-to-end (Langfuse first)

LiteLLM ships [observability callbacks](https://docs.litellm.ai/docs/proxy/logging)
that push request spans into the host platform's existing trace /
metrics backend. The how-to mentions Langfuse / OpenTelemetry /
Datadog / Helicone as supported targets but the sample
\`config.yaml\` has \`success_callback: [\"langfuse\"]\` commented out.

**Done when:**

- [ ] Sample \`config.yaml\` ships a working \`success_callback\` +
  \`failure_callback\` config (most likely Langfuse cloud, since OTel
  needs a collector and Datadog needs an account).
- [ ] \`docker-compose.yml\` knows about the LANGFUSE_HOST /
  LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY env vars in
  \`${VAR:?…}\` form (or marks them as optional and skips the callback
  when missing).
- [ ] Integration test asserts: after a completion through the
  gateway, a request span lands in Langfuse with the right trace
  metadata (model, latency, tokens, virtual key tag).
- [ ] How-to gets a "Observability" subsection with the live config +
  a screenshot or trace-detail snippet.

### 2. Cache passthrough (Redis exact-match first)

LiteLLM supports [in-memory / Redis / S3 / Qdrant caches](https://docs.litellm.ai/docs/proxy/caching)
keyed on the canonical request. Cache hits skip the upstream cost
entirely. The how-to claims this; the sample \`config.yaml\` has the
\`cache: true\` block commented out.

**Done when:**

- [ ] \`docker-compose.yml\` adds a Redis sidecar (gated on an env var
  so the basic sample still works without it).
- [ ] Sample \`config.yaml\` enables \`cache: true\` +
  \`cache_params: {type: redis, host: redis, port: 6379}\`.
- [ ] Integration test asserts: identical request twice → second
  call's \`/spend/logs\` row reports \`cache_hit: true\` and
  \`spend: 0\`.
- [ ] How-to gets a "Caching" subsection with the cache config + the
  expected reduction in token cost on identical prompts.

### 3. Guardrails (Lakera or Presidio first)

LiteLLM integrates with [Lakera / Aporia / Presidio / Bedrock
Guardrails](https://docs.litellm.ai/docs/proxy/guardrails) for pre-
and post-call PII redaction / content filtering. The SVG mentions
guardrails; no example wired today.

**Done when:**

- [ ] Sample config wires one guardrail end-to-end (Presidio is the
  easiest — fully open-source, runs as another Docker service).
- [ ] Integration test asserts: a prompt containing a synthetic SSN
  is redacted before the upstream OCI call, and the spend log row
  shows the redacted form (not the SSN).
- [ ] How-to gets a "Guardrails" subsection with the deployment
  shape (which container runs the guardrail, how to verify it's
  firing).

### 4. OKE \`helm install\` against a real cluster

\`examples/litellm-gateway/helm-values.yaml\` ships and \`helm
template\` lints fine, but we have **not** \`helm install\`'d it
against a real OKE cluster. The "Deploying on OKE" section in the
how-to documents the recipe; we trust it works because it's the
standard \`ghcr.io/berriai/litellm-helm\` chart, but trust is not
verification.

**Done when:**

- [ ] One end-to-end install against an OKE cluster (Locus's
  \`nemotron-phx\` would do; or a fresh dev cluster).
- [ ] Confirms: gateway pod uses OKE Workload Identity (no signing
  key on disk), \`oci-credentials\` Secret materialises the
  compartment id correctly, Service is ClusterIP-only, Postgres
  connection works against an external ADB or the chart's bundled
  Postgres.
- [ ] How-to gets a "Verified on OKE" callout with the OKE version
  + Kubernetes version it was tested against.
- [ ] One integration test or smoke script that drives the deployed
  gateway from a Locus pod in the same cluster.

---

## Suggested PR shape

Each item above is its own PR. They're independent:

- (1) and (2) extend \`config.yaml\` + \`docker-compose.yml\` + tests.
- (3) extends \`config.yaml\` + \`docker-compose.yml\` (Presidio
  sidecar) + tests.
- (4) is an OKE-only change with no new code on the Locus side.

Doing them as four separate PRs keeps each merge low-risk and the
"is this feature actually verified?" question easy to answer per
capability.

---

## Out of scope here

- LiteLLM Router primitives (weighted load-balancing across model
  groups, multi-region routing). The simple fallback chain we ship
  covers the common case; full Router is a separate, larger surface.
- Bedrock Guardrails specifically (requires an AWS account; lower
  priority than Presidio for an OCI-fronted gateway).
- A LiteLLM-hosted control plane / cloud SaaS variant of the
  gateway. We deliberately ship the self-hostable proxy.

---

## References

- PR #268 — establishes the gateway pattern + sample + tests + docs
- Live-verification table on PR #268 — what's already proven
- "Honest caveats" section on PR #268 — what this issue tracks
- LiteLLM Proxy docs — [docs.litellm.ai/docs/proxy](https://docs.litellm.ai/docs/proxy)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Follow-up: live-verify LiteLLM AI Gateway features deferred from PR #268 #269

Context

Acceptance criteria — each follow-up PR

1. Observability callback wired end-to-end (Langfuse first)

2. Cache passthrough (Redis exact-match first)

3. Guardrails (Lakera or Presidio first)

4. OKE `helm install` against a real cluster

Suggested PR shape

Out of scope here

References

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Follow-up: live-verify LiteLLM AI Gateway features deferred from PR #268 #269

Description

Context

Acceptance criteria — each follow-up PR

1. Observability callback wired end-to-end (Langfuse first)

2. Cache passthrough (Redis exact-match first)

3. Guardrails (Lakera or Presidio first)

4. OKE `helm install` against a real cluster

Suggested PR shape

Out of scope here

References

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions