feat(bifrost): enforce the rule-16 spend cap via a governance budget #186

Draft
elronbandel wants to merge 1 commit into main from elron/bifrost-budget-cap

@elronbandel

Resolves #185. Draft — do not merge until the enforcement test passes on a live endpoint (see Verification).

Summary

The per-run spend cap (EVAL_MODEL_MAX_BUDGET, default $1; models/RULES.md rule 16) was enforced only by core/litellm; the default gpt-5.4--bifrost path ignored it. This PR enforces the cap on the bifrost gateway through bifrost governance: a budget is attached to the sk-proxy virtual key (the bearer the agent already sends), requests are priced by a pricing_override at the same $2.50/$10-per-1M rates as the litellm config, and the governance plugin enforces the result. gateways/bifrost/start renders EVAL_MODEL_MAX_BUDGET into the config, mirroring core/litellm's wrapper. This is a prerequisite for retiring the 2 GB core/litellm.
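
To make the cap arithmetic concrete, here is a minimal sketch of how spend could accumulate against a $1 budget at the stated rates. It is not bifrost's implementation. It assumes that $2.50 per 1M is the input-token rate and $10 per 1M is the output-token rate (the PR only gives the pair), and the `RunBudget` type and its methods are hypothetical names used for illustration. Whether the real gateway rejects at "greater than" or "greater than or equal" is also an assumption here.

```rust
// Sketch only: spend accounting in integer nano-dollars (1 USD = 1_000_000_000).
// ASSUMPTION: $2.50 per 1M is the input-token rate, $10 per 1M the output-token rate.
const INPUT_NANOUSD_PER_TOKEN: u64 = 2_500; // $2.50 / 1_000_000 tokens
const OUTPUT_NANOUSD_PER_TOKEN: u64 = 10_000; // $10.00 / 1_000_000 tokens
const NANOUSD_PER_USD: u64 = 1_000_000_000;

/// Cost of a single request in nano-dollars.
fn request_cost(input_tokens: u64, output_tokens: u64) -> u64 {
    input_tokens * INPUT_NANOUSD_PER_TOKEN + output_tokens * OUTPUT_NANOUSD_PER_TOKEN
}

/// Hypothetical per-run budget tracker (not a bifrost type).
struct RunBudget {
    cap: u64,
    spent: u64,
}

impl RunBudget {
    /// Build a budget from a whole-dollar cap, e.g. the $1 default.
    fn with_cap_usd(cap_usd: u64) -> Self {
        RunBudget { cap: cap_usd * NANOUSD_PER_USD, spent: 0 }
    }

    /// Add the cost of a completed request to the running total.
    fn record(&mut self, input_tokens: u64, output_tokens: u64) {
        self.spent += request_cost(input_tokens, output_tokens);
    }

    /// ASSUMPTION: requests are rejected once spend strictly exceeds the cap.
    fn exceeded(&self) -> bool {
        self.spent > self.cap
    }
}

fn main() {
    let mut budget = RunBudget::with_cap_usd(1);
    // 200k input tokens ($0.50) plus 50k output tokens ($0.50) lands exactly on the cap.
    budget.record(200_000, 50_000);
    println!("spent {} nano-USD, exceeded: {}", budget.spent, budget.exceeded());
    // One more input token pushes spend past the cap.
    budget.record(1, 0);
    println!("spent {} nano-USD, exceeded: {}", budget.spent, budget.exceeded());
}
```

Integer nano-dollars are used so that both rates are exact per token and no floating-point drift accumulates across requests.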

Type of change

  • New feature

Rules checked against

  • .agents/models/RULES.md rule 16 (hard budget cap) — implements it for the bifrost path; rule text unchanged.
  • .agents/gateways/RULES.md — gateway flavor contract preserved.
  • .agents/contributing/RULES.md rules 2 (code-only) and 3 (declared rules); tests/run/gateways/RULES.md (#[ignore] for credentialed behavior).

Verification

  • cargo fmt --check + scoped clippy -D warnings (gateways target) clean
  • Config renders to valid JSON (budget numeric, VK→budget wired); v1.5.4 accepts the schema (boots)
  • New static guard static_bifrost_wires_budget_cap passes
  • Enforcement NOT yet validated — the behavioral gate is #[ignore] (needs a real upstream):
    cargo test -p eval-containers-tests-run --test gateways upstream_bifrost_budget_cap_rejects -- --ignored
    
    Reading the result: a pass means bifrost enforces rule 16. No rejection means governance from config.json is being ignored (cf. maximhq/bifrost#2408, "[Bug]: Enforce governance header setting is ignored in OSS multinode setup (from config.json)", which is multinode-only). A 401 means the virtual key needs provider_configs.
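
The three outcomes of the ignored test map to three different follow-ups, which can be written down as a small triage table. The sketch below only restates the mapping from this section; the `Observation` enum and `diagnose` function are hypothetical and are not part of the test suite.

```rust
/// What the behavioral test observed after driving spend past the cap.
/// Hypothetical type for illustration; the real test asserts directly.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Observation {
    /// The gateway refused the over-budget request (the test passes).
    Rejected,
    /// The over-budget request still went through.
    NotRejected,
    /// The gateway answered 401 before any budget check could matter.
    Unauthorized,
}

/// Map an observation to the follow-up described in the Verification notes.
fn diagnose(obs: Observation) -> &'static str {
    match obs {
        Observation::Rejected => "bifrost enforces rule 16",
        Observation::NotRejected => "config.json governance is ignored (cf. maximhq/bifrost#2408)",
        Observation::Unauthorized => "virtual key needs provider_configs",
    }
}

fn main() {
    for obs in [
        Observation::Rejected,
        Observation::NotRejected,
        Observation::Unauthorized,
    ] {
        println!("{:?}: {}", obs, diagnose(obs));
    }
}
```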

RULES.md impact

  • No RULES.md update needed (implements rule 16, does not change it).

Breaking changes

  • bifrost inference now requires the sk-proxy virtual key (is_vk_mandatory: true). The agent already sends sk-proxy (run-agent / runner.yaml), so real runs are unaffected; the test client was updated to match. Other callers to a bifrost gateway without sk-proxy will now get 401.
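
As a rough model of the breaking change, the gate introduced by is_vk_mandatory can be pictured as follows. This is a simplified sketch, not bifrost's code. It assumes the virtual key arrives as an `Authorization: Bearer sk-proxy` header value, and the function name and signature are hypothetical.

```rust
/// Simplified model of a mandatory-virtual-key gate (hypothetical, not bifrost's code).
/// Returns `Some(401)` when the request is refused, `None` when it may proceed.
/// ASSUMPTION: the key is sent as an `Authorization: Bearer <key>` header value.
fn gate_status(vk_mandatory: bool, authorization: Option<&str>, expected_vk: &str) -> Option<u16> {
    if !vk_mandatory {
        // With the flag off, this model lets every request through.
        return None;
    }
    // Extract the token after the "Bearer " scheme, if the header has that shape.
    let token = authorization.and_then(|header| header.strip_prefix("Bearer "));
    if token == Some(expected_vk) {
        None
    } else {
        Some(401)
    }
}

fn main() {
    // The agent already sends sk-proxy, so it is unaffected.
    println!("{:?}", gate_status(true, Some("Bearer sk-proxy"), "sk-proxy"));
    // A caller with no bearer is now refused.
    println!("{:?}", gate_status(true, None, "sk-proxy"));
}
```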

Test added (R-15)

upstream_bifrost_budget_cap_rejects (#[ignore]) asserts rejection once spend exceeds the cap; static_bifrost_wires_budget_cap locks the config wiring.
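
The kind of wiring a static guard can lock is sketched below: render the budget into a config template and check that it lands as a JSON number. This is an illustration under stated assumptions, not the contents of static_bifrost_wires_budget_cap or of gateways/bifrost/start. The placeholder token, the JSON key names in the template, and the `render_budget` function are all hypothetical; only the EVAL_MODEL_MAX_BUDGET variable and its $1 default come from this PR.

```rust
/// Hypothetical placeholder token; the real start script may substitute differently.
const PLACEHOLDER: &str = "__EVAL_MODEL_MAX_BUDGET__";

/// Render the budget into a config template, refusing values that are not
/// positive finite numbers so the output stays valid JSON.
fn render_budget(template: &str, raw: Option<&str>) -> Result<String, String> {
    // Default of 1 (USD) matches the documented EVAL_MODEL_MAX_BUDGET default.
    let raw = raw.unwrap_or("1");
    let value: f64 = raw
        .trim()
        .parse()
        .map_err(|_| format!("budget {raw:?} is not numeric"))?;
    if !value.is_finite() || value <= 0.0 {
        return Err(format!("budget {raw:?} must be a positive finite number"));
    }
    if !template.contains(PLACEHOLDER) {
        return Err("template has no budget placeholder".to_string());
    }
    Ok(template.replace(PLACEHOLDER, &value.to_string()))
}

fn main() {
    // Key names below are hypothetical, not bifrost's config schema.
    let template = r#"{"budget": {"limit": __EVAL_MODEL_MAX_BUDGET__}}"#;
    let raw = std::env::var("EVAL_MODEL_MAX_BUDGET").ok();
    match render_budget(template, raw.as_deref()) {
        Ok(config) => println!("{config}"),
        Err(err) => eprintln!("refusing to start: {err}"),
    }
}
```

Failing at render time keeps a malformed budget from silently producing a config with no effective cap.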

The per-run EVAL_MODEL_MAX_BUDGET cap (models/RULES.md rule 16) was enforced only by core/litellm; the default gpt-5.4--bifrost path ignored it. Add bifrost governance: a budget on the sk-proxy virtual key (the bearer the agent already sends), priced by a pricing_override at the same $2.50/$10-per-1M rates as the litellm config, behind the governance plugin. The start script renders EVAL_MODEL_MAX_BUDGET into the config (mirrors core/litellm's wrapper).

Tests: a static guard locks the wiring; an #[ignore] behavioral test (upstream_bifrost_budget_cap_rejects) checks that bifrost rejects requests once spend crosses the cap. It doubles as the empirical check that config.json governance is enforced in our single-node setup (the silent no-op in maximhq/bifrost#2408 is multinode-only).

Lean config: the budget attaches directly to the virtual key (no team object, no client block). Enforcement is to be validated by the #[ignore] test against a live endpoint.

Resolves #185.

Signed-off-by: Elron Bandel <elron.bandel@ibm.com>

Development

Successfully merging this pull request may close these issues.

Default gateway path (bifrost) does not enforce the rule-16 spend cap
