Skip to content

Follow-up: live-verify LiteLLM AI Gateway features deferred from PR #268 #269

@fede-kamel

Description

@fede-kamel

Context

PR #268 lands the
LiteLLM AI Gateway integration with the core platform-grade pieces
live-verified against real OCI Generative AI (us-chicago-1,
LUIGI_FRA_API tenancy):

  • ✅ Gateway + Postgres docker compose up
  • OpenAIModel(base_url=...) → OCI completion (7/7 integration tests)
  • /key/generate virtual keys with allowlist + budget + expiry
  • ✅ Cost tracking — /spend/logs, /global/spend/keys, /global/spend/models (7/7 cost tests)
  • ✅ Budget enforcement — max_budget=1e-9 triggers 429
  • ✅ Fallback chain — broken primary → cohere-command serves response
  • ✅ 29 unit tests over the shipped sample

The how-to lists four additional gateway capabilities that the
deployment supports but the PR does not live-verify because each
requires a backing service we haven't stood up. They're explicitly
flagged as "Honest caveats" in the PR body. This issue tracks
verifying each one, ideally as its own follow-up PR so each lands
with its own focused live demo + integration test.


Acceptance criteria — each follow-up PR

1. Observability callback wired end-to-end (Langfuse first)

LiteLLM ships observability callbacks
that push request spans into the host platform's existing trace /
metrics backend. The how-to mentions Langfuse / OpenTelemetry /
Datadog / Helicone as supported targets but the sample
`config.yaml` has `success_callback: ["langfuse"]` commented out.

Done when:

  • Sample `config.yaml` ships a working `success_callback` +
    `failure_callback` config (most likely Langfuse cloud, since OTel
    needs a collector and Datadog needs an account).
  • `docker-compose.yml` knows about the LANGFUSE_HOST /
    LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY env vars in
    `${VAR:?…}` form (or marks them as optional and skips the callback
    when missing).
  • Integration test asserts: after a completion through the
    gateway, a request span lands in Langfuse with the right trace
    metadata (model, latency, tokens, virtual key tag).
  • How-to gets a "Observability" subsection with the live config +
    a screenshot or trace-detail snippet.

2. Cache passthrough (Redis exact-match first)

LiteLLM supports in-memory / Redis / S3 / Qdrant caches
keyed on the canonical request. Cache hits skip the upstream cost
entirely. The how-to claims this; the sample `config.yaml` has the
`cache: true` block commented out.

Done when:

  • `docker-compose.yml` adds a Redis sidecar (gated on an env var
    so the basic sample still works without it).
  • Sample `config.yaml` enables `cache: true` +
    `cache_params: {type: redis, host: redis, port: 6379}`.
  • Integration test asserts: identical request twice → second
    call's `/spend/logs` row reports `cache_hit: true` and
    `spend: 0`.
  • How-to gets a "Caching" subsection with the cache config + the
    expected reduction in token cost on identical prompts.

3. Guardrails (Lakera or Presidio first)

LiteLLM integrates with Lakera / Aporia / Presidio / Bedrock
Guardrails
for pre-
and post-call PII redaction / content filtering. The SVG mentions
guardrails; no example wired today.

Done when:

  • Sample config wires one guardrail end-to-end (Presidio is the
    easiest — fully open-source, runs as another Docker service).
  • Integration test asserts: a prompt containing a synthetic SSN
    is redacted before the upstream OCI call, and the spend log row
    shows the redacted form (not the SSN).
  • How-to gets a "Guardrails" subsection with the deployment
    shape (which container runs the guardrail, how to verify it's
    firing).

4. OKE `helm install` against a real cluster

`examples/litellm-gateway/helm-values.yaml` ships and `helm
template` lints fine, but we have not `helm install`'d it
against a real OKE cluster. The "Deploying on OKE" section in the
how-to documents the recipe; we trust it works because it's the
standard `ghcr.io/berriai/litellm-helm` chart, but trust is not
verification.

Done when:

  • One end-to-end install against an OKE cluster (Locus's
    `nemotron-phx` would do; or a fresh dev cluster).
  • Confirms: gateway pod uses OKE Workload Identity (no signing
    key on disk), `oci-credentials` Secret materialises the
    compartment id correctly, Service is ClusterIP-only, Postgres
    connection works against an external ADB or the chart's bundled
    Postgres.
  • How-to gets a "Verified on OKE" callout with the OKE version
    • Kubernetes version it was tested against.
  • One integration test or smoke script that drives the deployed
    gateway from a Locus pod in the same cluster.

Suggested PR shape

Each item above is its own PR. They're independent:

  • (1) and (2) extend `config.yaml` + `docker-compose.yml` + tests.
  • (3) extends `config.yaml` + `docker-compose.yml` (Presidio
    sidecar) + tests.
  • (4) is an OKE-only change with no new code on the Locus side.

Doing them as four separate PRs keeps each merge low-risk and the
"is this feature actually verified?" question easy to answer per
capability.


Out of scope here

  • LiteLLM Router primitives (weighted load-balancing across model
    groups, multi-region routing). The simple fallback chain we ship
    covers the common case; full Router is a separate, larger surface.
  • Bedrock Guardrails specifically (requires an AWS account; lower
    priority than Presidio for an OCI-fronted gateway).
  • A LiteLLM-hosted control plane / cloud SaaS variant of the
    gateway. We deliberately ship the self-hostable proxy.

References

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions