From 6b9770ebaaa89a5c67366d98b552b76c1482bfaf Mon Sep 17 00:00:00 2001
From: Jiahui Wu
Date: Sat, 6 Jun 2026 18:34:00 +0800
Subject: [PATCH] docs: add agent budget enforcement gates

---
 skills/ai-security/agent-security/SKILL.md         |  46 ++++++++
 .../tests/budget-enforcement-edge-cases.md         | 100 ++++++++++++++++++
 2 files changed, 146 insertions(+)
 create mode 100644 skills/ai-security/agent-security/tests/budget-enforcement-edge-cases.md

diff --git a/skills/ai-security/agent-security/SKILL.md b/skills/ai-security/agent-security/SKILL.md
index 0e5e9a3a..d6b368a4 100644
--- a/skills/ai-security/agent-security/SKILL.md
+++ b/skills/ai-security/agent-security/SKILL.md
@@ -208,6 +208,51 @@ Evaluate whether the agent architecture is designed from the ground up around le
 
 ---
 
+### Step 2A -- Resource Budget Enforcement Evidence
+
+Evaluate whether resource controls are enforced across the complete agent workflow, not just declared in configuration. A `max_tokens`, timeout, or rate-limit field is insufficient if retries, tool calls, sub-agents, concurrent sessions, or provider fallbacks use separate untracked budgets.
+
+**What to look for in code and configuration:**
+
+- **Budget scope:** Are quotas enforced per tenant, user, session, agent identity, tool, workflow, and batch job? Or only per HTTP request?
+- **Shared ledger:** Do LLM calls, tool calls, browser/code execution, storage writes, external APIs, retries, and fallback providers decrement the same budget ledger?
+- **Concurrency limits:** Can the same user or agent start many sessions in parallel, each with a fresh budget?
+- **Retry accounting:** Are retry attempts, timeout retries, and exponential backoff attempts counted against the original budget?
+- **Sub-agent fan-out:** If an agent can spawn workers or delegate tasks, do child agents inherit and spend from the parent budget?
+- **Alerting and kill switch:** Are thresholds, alerts, and automatic halt behavior configured before the budget is exhausted?
+- **Fail mode:** When the budget service, quota store, or metering pipeline is unavailable, does the agent fail closed instead of proceeding unmetered?
+
+**Detection methods:** Search for budget ledgers (`budget`, `quota`, `usage`, `cost`, `meter`, `ledger`), retry paths (`retry`, `backoff`, `fallback`, `timeout`), concurrency controls (`parallel`, `concurrent`, `worker`, `spawn`, `subagent`), and fail-open behavior (`fail_open`, `best_effort`, `ignore_quota`, `skip_metering`).
+
+**Budget enforcement evidence checklist:**
+
+| Evidence Item | Desired State | Common Violation |
+|---|---|---|
+| Quota scope | Per tenant/user/session/agent/tool/workflow | Single global or per-request limit |
+| Budget ledger | All LLM and tool costs charged to one ledger | Model spend tracked, tool/API spend untracked |
+| Retry/fallback accounting | Retries and fallback providers consume remaining budget | Retry storm gets fresh quota each attempt |
+| Concurrency control | Parallel sessions share tenant/user limits | Each session receives independent full quota |
+| Delegation control | Sub-agents inherit parent budget and depth limits | Workers spawn with fresh budgets |
+| Enforcement point | Checked before action execution at tool/runtime boundary | Checked only after action completes |
+| Fail mode | Quota service unavailable means halt or degraded read-only mode | Agent proceeds unmetered |
+| Operations response | Alerts, kill switch, and owner escalation before exhaustion | Billing alert after spend already occurs |
+
+**What constitutes a finding:**
+
+| Condition | Severity |
+|---|---|
+| No cumulative session or tenant budget for autonomous agent workflows | High |
+| Retries, fallback providers, or sub-agents bypass budget accounting | High |
+| Tool and downstream API costs are unmetered while model calls are metered | High |
+| Quota service failure allows unmetered execution | High |
+| No concurrency limit across sessions for the same user or tenant | Medium |
+| Budget enforcement occurs only after tool execution completes | Medium |
+| Budget alerts exist but no automated halt or kill switch is available | Medium |
+
+**False positive to avoid:** Do not mark resource containment as pass because an agent has `max_tokens`, a request timeout, or an API gateway rate limit. Confirm the enforcement point, quota scope, cumulative ledger, fail mode, and retry/delegation accounting.
+
+---
+
 ### Step 3 -- Human-in-the-Loop Gate Placement
 
 Evaluate the design, placement, and robustness of human approval gates in the agent workflow.
@@ -515,6 +560,7 @@ Glob: **/security_architecture*
 |---|---|---|---|
 | Permission Model | [rating] | [one-line summary] | [priority] |
 | Least-Privilege Design | [rating] | [one-line summary] | [priority] |
+| Budget Enforcement | [rating] | [one-line summary] | [priority] |
 | HITL Gate Placement | [rating] | [one-line summary] | [priority] |
 | Blast Radius Containment | [rating] | [one-line summary] | [priority] |
 | Audit Trail Completeness | [rating] | [one-line summary] | [priority] |
diff --git a/skills/ai-security/agent-security/tests/budget-enforcement-edge-cases.md b/skills/ai-security/agent-security/tests/budget-enforcement-edge-cases.md
new file mode 100644
index 00000000..1f821f9b
--- /dev/null
+++ b/skills/ai-security/agent-security/tests/budget-enforcement-edge-cases.md
@@ -0,0 +1,100 @@
+# Budget Enforcement Edge Cases
+
+These fixtures validate agent architecture review behavior for resource budget, quota, and denial-of-wallet controls.
+
+## Case 1: Per-Request Limit Without Cumulative Session Budget
+
+```yaml
+agent:
+  max_tokens_per_request: 8000
+  request_timeout_seconds: 120
+  max_requests_per_minute: 100
+session:
+  max_steps: null
+  max_total_tokens: null
+  max_total_tool_calls: null
+```
+
+**Expected result:** High severity finding.
+
+**Reason:** Each individual request is bounded, but an autonomous workflow can continue indefinitely and consume unbounded cumulative resources.
+
+## Case 2: Retries and Fallback Provider Bypass Metering
+
+```yaml
+llm:
+  primary: provider-a
+  fallback: provider-b
+budget:
+  ledger: provider-a-only
+retry:
+  max_retries: 5
+  backoff: exponential
+  count_retry_costs: false
+  count_fallback_costs: false
+```
+
+**Expected result:** High severity finding.
+
+**Reason:** Timeout retries and fallback calls can consume spend outside the enforced budget ledger.
+
+## Case 3: Sub-Agent Fan-Out Gets Fresh Budgets
+
+```yaml
+orchestrator:
+  max_child_agents: 20
+  parent_budget_usd: 10
+child_agent:
+  budget_policy: fresh_default_budget
+  max_budget_usd: 10
+  inherits_parent_budget: false
+```
+
+**Expected result:** High severity finding.
+
+**Reason:** Delegation multiplies the effective budget and bypasses the parent's intended containment.
+
+## Case 4: Shared Budget Ledger With Fail-Closed Enforcement
+
+```yaml
+budget_enforcement:
+  scopes:
+    - tenant
+    - user
+    - session
+    - agent_identity
+    - tool
+  ledger:
+    includes:
+      - llm_tokens
+      - browser_minutes
+      - code_execution_seconds
+      - external_api_calls
+      - storage_writes
+      - retries
+      - fallback_provider_calls
+      - child_agents
+  enforcement_point: before_tool_execution
+  quota_service_unavailable: fail_closed
+  alerts:
+    - threshold: 70
+      action: notify_owner
+    - threshold: 90
+      action: require_approval
+    - threshold: 100
+      action: halt_workflow
+  kill_switch:
+    owner: security-operations
+    scope: tenant_or_global
+```
+
+**Expected result:** Pass for budget enforcement if implementation evidence confirms each enforcement point.
+
+**Reason:** The controls cover cumulative workflow cost, tool costs, retries, fallback providers, sub-agent fan-out, fail mode, and operational response.
+
+## Review Assertions
+
+- Do not credit request-level `max_tokens` as session-level containment.
+- Confirm retries and provider fallbacks consume the same budget ledger.
+- Confirm child agents inherit or reserve from the parent budget.
+- Confirm quota service failures halt execution or degrade to a safe read-only mode.
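
The review assertions above describe behavior that a reviewer may want to recognize in code, not only in configuration. The following is a minimal Python sketch of what a passing implementation could look like, under stated assumptions: `BudgetLedger`, `charge`, `child`, `run_with_budget`, and the `healthy` probe are hypothetical names invented for illustration and are not part of this skill or of any library. It illustrates three of the gates: child agents spend from the parent ledger, retries are charged before each attempt, and an unavailable quota store halts execution.

```python
from typing import Callable, Optional


class BudgetExceeded(Exception):
    """Raised when a charge would push any ledger in the chain past its limit."""


class QuotaStoreUnavailable(Exception):
    """Raised when the (hypothetical) quota store cannot be reached."""


class BudgetLedger:
    """Cumulative ledger shared by LLM calls, tool calls, retries, and child agents.

    Amounts are integer cents to avoid floating-point drift.
    """

    def __init__(
        self,
        limit_cents: int,
        parent: Optional["BudgetLedger"] = None,
        healthy: Callable[[], bool] = lambda: True,
    ) -> None:
        self.limit_cents = limit_cents
        self.spent_cents = 0
        self.parent = parent
        self.healthy = healthy  # assumption: a probe for quota-store availability

    def child(self, cap_cents: int) -> "BudgetLedger":
        """Create a sub-agent ledger that is capped locally AND spends from the parent."""
        return BudgetLedger(cap_cents, parent=self, healthy=self.healthy)

    def charge(self, cost_cents: int) -> None:
        """Reserve cost before the action runs; fail closed if metering is down."""
        if cost_cents < 0:
            raise ValueError("cost must be non-negative")
        if not self.healthy():
            raise QuotaStoreUnavailable("quota store unreachable; halting")
        # First pass: verify every ledger up the chain can absorb the cost.
        node: Optional[BudgetLedger] = self
        while node is not None:
            if node.spent_cents + cost_cents > node.limit_cents:
                raise BudgetExceeded("budget exhausted")
            node = node.parent
        # Second pass: commit the charge to every ledger in the chain.
        node = self
        while node is not None:
            node.spent_cents += cost_cents
            node = node.parent


def run_with_budget(ledger: BudgetLedger, est_cost_cents: int,
                    action: Callable[[], object], max_retries: int = 2) -> object:
    """Run an action, charging the same ledger before every attempt, retries included."""
    last_error: Optional[Exception] = None
    for _ in range(max_retries + 1):
        ledger.charge(est_cost_cents)  # enforcement point: before execution
        try:
            return action()
        except TimeoutError as exc:
            last_error = exc
    assert last_error is not None
    raise last_error
```

In this sketch the Case 3 violation cannot occur: two children created from a 1000-cent parent, each capped at 1000 cents, can spend at most 1000 cents combined, because every charge is checked against the whole parent chain before it is committed. A real implementation would also need atomic updates across concurrent sessions, which this single-threaded sketch does not attempt.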