feat(ai-proxy): /v1/models aggregator + ai-gateway spec fragment (ADR-0030 §4) #82

Merged

ndreno merged 2 commits into main from feat/ai-proxy-models-endpoint on May 5, 2026

Conversation

ndreno (Contributor) commented on May 5, 2026

Summary

Adds the third and final protocol surface from ADR-0030 §1 — GET /v1/models — plus the operator-facing spec fragment that ties all three AI gateway operations (chat completions, responses, models) together.

This is PR-5 in the implementation plan. Branched off the merged main, no stacking.

What landed

/v1/models aggregator

Walks every unique provider in routes / targets / flat config, hits each upstream's /v1/models (or /api/tags for Ollama, then translates), and returns an OpenAI-compatible { object: "list", data: [...] } payload.

Per-provider failure is non-fatal: returns 200 with partial: true + warnings: [{provider, status, detail}]. A single flaky upstream doesn't take the discovery endpoint down. Each failure increments barbacane_plugin_ai_proxy_models_provider_failures_total{provider}.
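
For reference, a minimal sketch of the aggregated body in the partial-failure case (model IDs, the warning detail text, and the helper name are illustrative, not the plugin's actual identifiers):

```rust
use serde_json::{json, Value};

// Sketch only: the /v1/models aggregation result when one provider failed.
// `partial` and `warnings` appear only if at least one upstream failed.
fn example_partial_models_response() -> Value {
    json!({
        "object": "list",
        "data": [
            { "id": "gpt-4o",    "object": "model", "owned_by": "openai" },
            { "id": "llama3:8b", "object": "model", "owned_by": "ollama" }
        ],
        "partial": true,
        "warnings": [
            { "provider": "anthropic", "status": 503, "detail": "upstream returned 503" }
        ]
    })
}
```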

schemas/ai-gateway.yaml spec fragment

Drops three operations into one importable spec file — operators put it in their specs/ folder and the multi-file compiler discovers it automatically. Provider creds via env:// references; YAML anchors dedupe the dispatch config across the three ops.

Out of scope (deferred per ADR-0030 §4)

  • Caching the aggregated response. v1 hits upstreams on every call (discovery is rare, off the data plane critical path).
  • Route-based filtering of advertised models. Ergonomic, not security — denied models still 403 on actual dispatch. Future PR.
  • Root-level x-barbacane-dispatch-defaults to deduplicate the fragment's three dispatch blocks. v1 uses YAML anchors (&ai_proxy_config) which is idiomatic OpenAPI.

Test plan

  • Plugin unit tests: 109 passed (+12: collect_unique_providers, translate_models_response, handle 405/500)
  • Integration tests: 19 passed (+2: 3-provider aggregation roundtrip, partial-failure response)
  • Compilation smoke: test_shipped_ai_gateway_spec_fragment_compiles proves the fragment loads through the standard pipeline
  • End-to-end fragment test: test_shipped_fragment_chat_completions_and_models_via_ollama exercises the actual shipped fragment with OLLAMA_BASE_URL → wiremock; verifies both chat completions through the catch-all route and /v1/models partial response with the realistic first-deploy state (only Ollama configured)
  • cargo build --target wasm32-unknown-unknown --release — clean
  • cargo clippy --lib --bins — zero warnings
  • cargo fmt --all -- --check — clean
  • bash docs/rulesets/tests/run-tests.sh — 15/15 pass
  • CI green

ndreno added 2 commits May 5, 2026 09:14
…-0030 §4)

GET /v1/models walks every unique provider declared in routes / targets /
flat, queries each upstream's /v1/models (or /api/tags for Ollama, then
translates the shape), and returns an OpenAI-compatible
{ object: "list", data: [...] } payload.
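
A minimal sketch of the Ollama reshape, assuming /api/tags returns
{ "models": [ { "name": "llama3:8b", ... } ] } (helper name and exact field
handling are illustrative):

```rust
use serde_json::{json, Value};

// Sketch only: map Ollama's /api/tags listing onto the OpenAI model-list
// shape. The real translate_models_response also covers OpenAI/Anthropic.
fn translate_ollama_tags(tags: &Value) -> Value {
    let data: Vec<Value> = tags["models"]
        .as_array()
        .map(|models| {
            models
                .iter()
                .filter_map(|m| m["name"].as_str())
                .map(|name| json!({ "id": name, "object": "model", "owned_by": "ollama" }))
                .collect()
        })
        .unwrap_or_default();
    json!({ "object": "list", "data": data })
}
```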

On per-provider failure (5xx / timeout / connection error) the dispatcher
returns 200 OK with `partial: true` + warnings: [{provider, status,
detail}] — a single flaky upstream doesn't take the discovery endpoint
down. Each failure increments
barbacane_plugin_ai_proxy_models_provider_failures_total{provider} so
operators see degradations without polling clients.

A 405 with content-type application/problem+json + Allow: GET fires for
non-GET methods so a future spec that accidentally binds /v1/models to
multiple methods produces a clear error.
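
Roughly what the guard looks like (a sketch with bare tuples; the real
handler goes through the plugin's own request/response types):

```rust
// Sketch only: non-GET requests are rejected before the aggregator runs.
fn reject_non_get(method: &str) -> Option<(u16, Vec<(&'static str, &'static str)>, &'static str)> {
    if method.eq_ignore_ascii_case("GET") {
        return None; // fall through to the aggregator
    }
    Some((
        405,
        vec![
            ("content-type", "application/problem+json"),
            ("allow", "GET"),
        ],
        r#"{"title":"Method Not Allowed","status":405,"detail":"/v1/models only supports GET"}"#,
    ))
}
```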

Spec fragment
=============

schemas/ai-gateway.yaml — shipped operator-facing OpenAPI fragment that
declares all three AI gateway operations (/v1/chat/completions,
/v1/responses, /v1/models) bound to ai-proxy. Provider keys come from
OPENAI_API_KEY / ANTHROPIC_API_KEY / OLLAMA_BASE_URL via env://
references; default routes are claude-* → Anthropic, gpt-* / o[1-4]* →
OpenAI, * → Ollama.
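
The default route table, written out as a plain function for illustration
only (the shipped fragment drives this through dispatch config and glob
patterns, not hard-coded Rust):

```rust
// Sketch only: which provider the fragment's default routes select.
fn default_provider(model: &str) -> &'static str {
    // o[1-4]* — an "o" followed by a digit 1-4, e.g. o1-preview, o3-mini.
    let is_o_series = model
        .strip_prefix('o')
        .and_then(|rest| rest.chars().next())
        .map_or(false, |c| ('1'..='4').contains(&c));
    if model.starts_with("claude-") {
        "anthropic"
    } else if model.starts_with("gpt-") || is_o_series {
        "openai"
    } else {
        "ollama" // the * catch-all
    }
}
```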

Multi-file spec discovery (manifest.rs:268-322 + artifact.rs:417-426)
picks the fragment up automatically when an operator drops it into
their specs/ folder. No new compiler mechanism required for v1.

Out of scope (explicit deferrals per ADR-0030 §4)
==================================================

- Caching the aggregated response via host_cache_*. ADR specifies a
  per-instance cache with 5-min TTL + single-flight against a
  thundering herd. v1 hits upstreams on every call — the endpoint
  isn't on the data plane critical path. Cost is one extra HTTP RTT
  per /v1/models call, and discovery clients call it rarely. The
  caching slot is empty, not contested.
- Filtering advertised models by route allow/deny. Filtering is
  ergonomic (don't show clients models they can't call), not security
  (denied models still 403 on actual dispatch). Future PR.
- Root-level x-barbacane-dispatch-defaults to deduplicate the dispatch
  config across the fragment's three operations. v1 uses YAML anchors
  (&ai_proxy_config / *ai_proxy_config) which is idiomatic OpenAPI and
  doesn't require a new compiler feature.

Tests
=====

Plugin unit tests: 97 → 109 (+12).
- collect_unique_providers (5 tests covering dedup, ordering, defaults)
- translate_models_response (5 tests covering OpenAI/Anthropic/Ollama
  shapes + missing-data graceful handling)
- handle dispatch (2 tests: 405 on non-GET, 500 on no providers)

Integration tests: 17 → 19 (+2).
- test_ai_proxy_models_aggregates_three_providers — wiremock for OpenAI
  (/v1/models), Anthropic (/v1/models), Ollama (/api/tags); GET
  /v1/models returns the union, with Anthropic's owned_by stamped and
  Ollama's /api/tags reshape verified.
- test_ai_proxy_models_partial_response_on_provider_failure —
  Anthropic returns 503, OpenAI + Ollama succeed; response is 200 with
  partial: true + a single warning naming Anthropic.

Compilation smoke test:
- test_shipped_ai_gateway_spec_fragment_compiles — copies
  schemas/ai-gateway.yaml + a synthesized barbacane.yaml into a temp
  dir and verifies TestGateway::from_spec startup succeeds.

End-to-end fragment test:
- test_shipped_fragment_chat_completions_and_models_via_ollama —
  exercises the actual shipped fragment with OLLAMA_BASE_URL pointed
  at a wiremock; verifies (1) chat completions reach the wiremock via
  the * catch-all route, (2) /v1/models returns 200 with partial: true
  + warnings for the unreachable OpenAI/Anthropic providers (the
  realistic first-time-deploy operator experience). Combined into a
  single test because env vars are global state and parallel tests
  would race on the OLLAMA_BASE_URL setting.

R1 (load-bearing) — `/v1/models` was using `plugin.timeout` (default 120s),
sized for LLM completions. Sequential aggregation across N providers
meant a single hung upstream blocked the response for ~120s; three
hung upstreams = ~360s. Add a dedicated `models_timeout_ms` config
(default 5000ms) used only by the discovery aggregator. The worst
case for a single hung upstream is now bounded at 5s, and the
aggregator stays responsive even when one provider is degraded.
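
A sketch of the bound (the `fetch` parameter stands in for whatever HTTP
call the host exposes; its signature is hypothetical):

```rust
use std::time::Duration;

// Sketch only: each upstream call in the sequential aggregation is bounded
// by models_timeout_ms (default 5000 ms), so N hung providers cost at most
// N * 5 s rather than N * 120 s.
const DEFAULT_MODELS_TIMEOUT_MS: u64 = 5_000;

fn aggregate_models<F>(
    providers: &[String],
    configured_ms: Option<u64>,
    mut fetch: F,
) -> Vec<Result<String, String>>
where
    F: FnMut(&str, Duration) -> Result<String, String>,
{
    let per_call = Duration::from_millis(configured_ms.unwrap_or(DEFAULT_MODELS_TIMEOUT_MS));
    providers.iter().map(|p| fetch(p.as_str(), per_call)).collect()
}
```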

R2 — `UpstreamProvider`'s dedup key drops `api_key`. Documented the
trade-off explicitly: in multi-tenant configs where two routes
deliberately use different keys against the same upstream account
(e.g. for billing splits), the aggregator picks whichever key sorted
first into the dedup. If that key is revoked or rate-limited and the
other isn't, `/v1/models` warnings name the provider but not the
offending key — operators correlate via the upstream's logs.
Acceptable for v1; partial-response shape makes the failure visible.
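
A sketch of the dedup (struct and field names are approximations of
UpstreamProvider, not the actual definition):

```rust
use std::collections::BTreeMap;

// Sketch only: the dedup key is provider kind + base URL, deliberately
// excluding api_key, so two routes hitting the same upstream with different
// keys collapse to one entry and only one key survives.
#[derive(Debug, Clone)]
struct ProviderEntry {
    kind: String, // "openai" | "anthropic" | "ollama"
    base_url: String,
    api_key: Option<String>,
}

fn dedup_providers(entries: Vec<ProviderEntry>) -> Vec<ProviderEntry> {
    let mut by_key: BTreeMap<(String, String), ProviderEntry> = BTreeMap::new();
    for e in entries {
        by_key
            .entry((e.kind.clone(), e.base_url.clone()))
            .or_insert(e); // later entries (and their api_keys) are dropped
    }
    by_key.into_values().collect()
}
```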

R3 — Added two tests that close the upstream-edge-case gap:
- translate_handles_empty_arrays — Ollama / OpenAI returning empty
  data[] (operator just deployed, no models yet).
- test_ai_proxy_models_handles_empty_body_from_upstream — upstream
  returns 200 with Content-Length: 0 (some misbehaving proxies do
  this). Aggregator records an "invalid JSON from upstream" warning,
  doesn't crash, and other providers still contribute (see the sketch
  after this list).
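
A sketch of the lenient parse (function name is illustrative):

```rust
use serde_json::Value;

// Sketch only: an empty or otherwise non-JSON 200 body becomes a warning
// string instead of a hard failure, so the other providers' models still
// make it into the aggregated response.
fn parse_models_body(provider: &str, body: &[u8]) -> Result<Value, String> {
    serde_json::from_slice(body)
        .map_err(|_| format!("invalid JSON from upstream (provider: {provider})"))
}
```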

Style cleanup:
- Collapsed the `if .. { /* happy */ } else { return ... }` early-return
  in `handle()` to a single negation + early return.
- Derived Debug on UpstreamProvider (and on Provider, which it nests)
  so test panics print legible values.
- Made UpstreamProvider, UpstreamFailure, collect_unique_providers,
  and fetch_provider_models module-private. None are used outside
  models.rs and the `pub(crate)` was leaking surface area.

Spec fragment:
- Added a doc-comment paragraph on the Ollama catch-all (`pattern: "*"`).
  Convenient for local-dev (Ollama-as-default) but in production
  silently routes typos like `gtp-4o` to Ollama instead of returning
  a clean `400 no_route`. Operators wanting strict validation drop
  the catch-all from their copy.

Schema:
- New `models_timeout_ms` field documented in config-schema.json with
  the rationale (separate from LLM `timeout` because discovery doesn't
  need 120s patience).

Test count: plugin 109 → 110 (+1 empty-arrays test). Integration
19 → 19 (the empty-body test added; partial-response test renumbered
implicitly).
ndreno merged commit 0f766cb into main on May 5, 2026
9 checks passed
ndreno deleted the feat/ai-proxy-models-endpoint branch on May 5, 2026 07:54