From 6b9770ebaaa89a5c67366d98b552b76c1482bfaf Mon Sep 17 00:00:00 2001
From: Jiahui Wu <jiahuiwu@MacBook-Pro-2.local>
Date: Sat, 6 Jun 2026 18:34:00 +0800
Subject: [PATCH] docs: add agent budget enforcement gates

---
 skills/ai-security/agent-security/SKILL.md    |  46 ++++++++
 .../tests/budget-enforcement-edge-cases.md    | 100 ++++++++++++++++++
 2 files changed, 146 insertions(+)
 create mode 100644 skills/ai-security/agent-security/tests/budget-enforcement-edge-cases.md

diff --git a/skills/ai-security/agent-security/SKILL.md b/skills/ai-security/agent-security/SKILL.md
index 0e5e9a3a..d6b368a4 100644
--- a/skills/ai-security/agent-security/SKILL.md
+++ b/skills/ai-security/agent-security/SKILL.md
@@ -208,6 +208,51 @@ Evaluate whether the agent architecture is designed from the ground up around le
 
 ---
 
+### Step 2A -- Resource Budget Enforcement Evidence
+
+Evaluate whether resource controls are enforced across the complete agent workflow, not just declared in configuration. A `max_tokens`, timeout, or rate-limit field is insufficient if retries, tool calls, sub-agents, concurrent sessions, or provider fallbacks use separate untracked budgets.
+
+**What to look for in code and configuration:**
+
+- **Budget scope:** Are quotas enforced per tenant, user, session, agent identity, tool, workflow, and batch job? Or only per HTTP request?
+- **Shared ledger:** Do LLM calls, tool calls, browser/code execution, storage writes, external APIs, retries, and fallback providers decrement the same budget ledger?
+- **Concurrency limits:** Can the same user or agent start many sessions in parallel, each with a fresh budget?
+- **Retry accounting:** Are all retry attempts, including timeout retries and each exponential-backoff attempt, counted against the original budget?
+- **Sub-agent fan-out:** If an agent can spawn workers or delegate tasks, do child agents inherit and spend from the parent budget?
+- **Alerting and kill switch:** Are alert thresholds and automatic halt behavior configured to trigger before the budget is exhausted, not after?
+- **Fail mode:** When the budget service, quota store, or metering pipeline is unavailable, does the agent fail closed instead of proceeding unmetered?
+
+**Detection methods:** Search for budget ledgers (`budget`, `quota`, `usage`, `cost`, `meter`, `ledger`), retry paths (`retry`, `backoff`, `fallback`, `timeout`), concurrency controls (`parallel`, `concurrent`, `worker`, `spawn`, `subagent`), and fail-open behavior (`fail_open`, `best_effort`, `ignore_quota`, `skip_metering`).
+
+**Budget enforcement evidence checklist:**
+
+| Evidence Item | Desired State | Common Violation |
+|---|---|---|
+| Quota scope | Per tenant/user/session/agent/tool/workflow | Single global or per-request limit |
+| Budget ledger | All LLM and tool costs charged to one ledger | Model spend tracked, tool/API spend untracked |
+| Retry/fallback accounting | Retries and fallback providers consume remaining budget | Each retry attempt in a retry storm gets a fresh quota |
+| Concurrency control | Parallel sessions share tenant/user limits | Each session receives independent full quota |
+| Delegation control | Sub-agents inherit parent budget and depth limits | Workers spawn with fresh budgets |
+| Enforcement point | Checked before action execution at tool/runtime boundary | Checked only after action completes |
+| Fail mode | Quota service unavailable means halt or degraded read-only mode | Agent proceeds unmetered |
+| Operations response | Alerts, kill switch, and owner escalation before exhaustion | Billing alert after spend already occurs |
+
+**What constitutes a finding:**
+
+| Condition | Severity |
+|---|---|
+| No cumulative session or tenant budget for autonomous agent workflows | High |
+| Retries, fallback providers, or sub-agents bypass budget accounting | High |
+| Tool and downstream API costs are unmetered while model calls are metered | High |
+| Quota service failure allows unmetered execution | High |
+| No concurrency limit across sessions for the same user or tenant | Medium |
+| Budget enforcement occurs only after tool execution completes | Medium |
+| Budget alerts exist but no automated halt or kill switch is available | Medium |
+
+**False positive to avoid:** Do not mark resource containment as passing merely because an agent has `max_tokens`, a request timeout, or an API gateway rate limit. Confirm the enforcement point, quota scope, cumulative ledger, fail mode, and retry/delegation accounting.
+
+---
+
 ### Step 3 -- Human-in-the-Loop Gate Placement
 
 Evaluate the design, placement, and robustness of human approval gates in the agent workflow.
@@ -515,6 +560,7 @@ Glob: **/security_architecture*
 |---|---|---|---|
 | Permission Model | [rating] | [one-line summary] | [priority] |
 | Least-Privilege Design | [rating] | [one-line summary] | [priority] |
+| Budget Enforcement | [rating] | [one-line summary] | [priority] |
 | HITL Gate Placement | [rating] | [one-line summary] | [priority] |
 | Blast Radius Containment | [rating] | [one-line summary] | [priority] |
 | Audit Trail Completeness | [rating] | [one-line summary] | [priority] |
diff --git a/skills/ai-security/agent-security/tests/budget-enforcement-edge-cases.md b/skills/ai-security/agent-security/tests/budget-enforcement-edge-cases.md
new file mode 100644
index 00000000..1f821f9b
--- /dev/null
+++ b/skills/ai-security/agent-security/tests/budget-enforcement-edge-cases.md
@@ -0,0 +1,100 @@
+# Budget Enforcement Edge Cases
+
+These fixtures validate agent architecture review behavior for resource budget, quota, and denial-of-wallet controls.
+
+## Case 1: Per-Request Limit Without Cumulative Session Budget
+
+```yaml
+agent:
+  max_tokens_per_request: 8000
+  request_timeout_seconds: 120
+  max_requests_per_minute: 100
+session:
+  max_steps: null
+  max_total_tokens: null
+  max_total_tool_calls: null
+```
+
+**Expected result:** High-severity finding.
+
+**Reason:** Each individual request is bounded, but an autonomous workflow can continue indefinitely and consume unbounded cumulative resources.
+
+## Case 2: Retries and Fallback Provider Bypass Metering
+
+```yaml
+llm:
+  primary: provider-a
+  fallback: provider-b
+budget:
+  ledger: provider-a-only
+retry:
+  max_retries: 5
+  backoff: exponential
+  count_retry_costs: false
+  count_fallback_costs: false
+```
+
+**Expected result:** High-severity finding.
+
+**Reason:** Retry attempts and fallback-provider calls incur spend that is never charged to the enforced budget ledger.
+
+## Case 3: Sub-Agent Fan-Out Gets Fresh Budgets
+
+```yaml
+orchestrator:
+  max_child_agents: 20
+  parent_budget_usd: 10
+child_agent:
+  budget_policy: fresh_default_budget
+  max_budget_usd: 10
+  inherits_parent_budget: false
+```
+
+**Expected result:** High-severity finding.
+
+**Reason:** Delegation multiplies the effective budget: 20 child agents with a fresh $10 budget each can spend up to $200 on top of the parent's $10, bypassing the parent's intended containment.
+
+## Case 4: Shared Budget Ledger With Fail-Closed Enforcement
+
+```yaml
+budget_enforcement:
+  scopes:
+    - tenant
+    - user
+    - session
+    - agent_identity
+    - tool
+  ledger:
+    includes:
+      - llm_tokens
+      - browser_minutes
+      - code_execution_seconds
+      - external_api_calls
+      - storage_writes
+      - retries
+      - fallback_provider_calls
+      - child_agents
+  enforcement_point: before_tool_execution
+  quota_service_unavailable: fail_closed
+  alerts:
+    - threshold: 70  # assumed to be percent of budget consumed
+      action: notify_owner
+    - threshold: 90
+      action: require_approval
+    - threshold: 100
+      action: halt_workflow
+  kill_switch:
+    owner: security-operations
+    scope: tenant_or_global
+```
+
+**Expected result:** Pass for budget enforcement if implementation evidence confirms each enforcement point.
+
+**Reason:** The controls cover cumulative workflow cost, tool costs, retries, fallback providers, sub-agent fan-out, fail mode, and operational response.
+
+## Review Assertions
+
+- Do not credit request-level `max_tokens` as session-level containment.
+- Confirm retries and provider fallbacks consume the same budget ledger.
+- Confirm child agents spend from the parent budget, either by inheriting it or by reserving a portion of it.
+- Confirm quota service failures halt execution or degrade to a safe read-only mode.