46 changes: 46 additions & 0 deletions skills/ai-security/agent-security/SKILL.md
@@ -208,6 +208,51 @@ Evaluate whether the agent architecture is designed from the ground up around le

---

### Step 2A -- Resource Budget Enforcement Evidence

Evaluate whether resource controls are enforced across the complete agent workflow, not just declared in configuration. A `max_tokens`, timeout, or rate-limit field is insufficient if retries, tool calls, sub-agents, concurrent sessions, or provider fallbacks use separate untracked budgets.

**What to look for in code and configuration:**

- **Budget scope:** Are quotas enforced per tenant, user, session, agent identity, tool, workflow, and batch job? Or only per HTTP request?
- **Shared ledger:** Do LLM calls, tool calls, browser/code execution, storage writes, external APIs, retries, and fallback providers decrement the same budget ledger?
- **Concurrency limits:** Can the same user or agent start many sessions in parallel, each with a fresh budget?
- **Retry accounting:** Are retry attempts, timeout retries, and exponential backoff attempts counted against the original budget?
- **Sub-agent fan-out:** If an agent can spawn workers or delegate tasks, do child agents inherit and spend from the parent budget?
- **Alerting and kill switch:** Are thresholds, alerts, and automatic halt behavior configured before the budget is exhausted?
- **Fail mode:** When the budget service, quota store, or metering pipeline is unavailable, does the agent fail closed instead of proceeding unmetered?
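
The shared-ledger and sub-agent questions above can be illustrated with a minimal sketch. The `BudgetLedger` class, its method names, and the USD unit are hypothetical and introduced only for illustration; a real deployment would likely back this with a durable, concurrency-safe quota store.

```python
from dataclasses import dataclass
from typing import Optional


class BudgetExceeded(Exception):
    """Raised when a charge would exceed the remaining budget."""


@dataclass
class BudgetLedger:
    """Hypothetical ledger: every cost type decrements the same balance."""

    limit_usd: float
    spent_usd: float = 0.0
    parent: Optional["BudgetLedger"] = None

    def remaining(self) -> float:
        # A child can never have more left than any of its ancestors.
        own = self.limit_usd - self.spent_usd
        if self.parent is None:
            return own
        return min(own, self.parent.remaining())

    def charge(self, amount_usd: float, kind: str) -> None:
        # `kind` labels the cost source (LLM call, tool call, retry, ...).
        if amount_usd < 0:
            raise ValueError("charge must be non-negative")
        # Check before spending so the caller can refuse the action up front.
        if amount_usd > self.remaining():
            raise BudgetExceeded(f"{kind}: charge of {amount_usd} exceeds remaining budget")
        # Spend is recorded on this ledger and on every ancestor.
        ledger: Optional[BudgetLedger] = self
        while ledger is not None:
            ledger.spent_usd += amount_usd
            ledger = ledger.parent

    def child(self, limit_usd: float) -> "BudgetLedger":
        # Sub-agents get a cap of their own but still draw from the parent.
        return BudgetLedger(limit_usd=limit_usd, parent=self)
```

In this sketch a sub-agent created with `parent.child(...)` cannot multiply the parent's budget, because each of its charges is also recorded against the parent.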

**Detection methods:** Search for budget ledgers (`budget`, `quota`, `usage`, `cost`, `meter`, `ledger`), retry paths (`retry`, `backoff`, `fallback`, `timeout`), concurrency controls (`parallel`, `concurrent`, `worker`, `spawn`, `subagent`), and fail-open behavior (`fail_open`, `best_effort`, `ignore_quota`, `skip_metering`).
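
As a rough illustration of the keyword search described above, the following sketch scans text for the listed terms and groups hits by category. The function name `scan_text` and the category labels are hypothetical; the keywords are the ones listed in this step. Matching is a plain case-insensitive substring test, so hits are leads for manual review rather than findings.

```python
from typing import Dict, List, Tuple

# Keywords taken from the detection methods above; category names are illustrative.
PATTERNS: Dict[str, List[str]] = {
    "budget_ledger": ["budget", "quota", "usage", "cost", "meter", "ledger"],
    "retry_path": ["retry", "backoff", "fallback", "timeout"],
    "concurrency": ["parallel", "concurrent", "worker", "spawn", "subagent"],
    "fail_open": ["fail_open", "best_effort", "ignore_quota", "skip_metering"],
}


def scan_text(text: str) -> Dict[str, List[Tuple[int, str]]]:
    """Return (line number, keyword) hits per category for one file's text."""
    hits: Dict[str, List[Tuple[int, str]]] = {category: [] for category in PATTERNS}
    for line_no, line in enumerate(text.splitlines(), start=1):
        lowered = line.lower()
        for category, keywords in PATTERNS.items():
            for keyword in keywords:
                if keyword in lowered:
                    hits[category].append((line_no, keyword))
    return hits
```

A hit in the `fail_open` category deserves the closest look, since those identifiers often mark code paths that proceed without metering.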

**Budget enforcement evidence checklist:**

| Evidence Item | Desired State | Common Violation |
|---|---|---|
| Quota scope | Per tenant/user/session/agent/tool/workflow | Single global or per-request limit |
| Budget ledger | All LLM and tool costs charged to one ledger | Model spend tracked, tool/API spend untracked |
| Retry/fallback accounting | Retries and fallback providers consume remaining budget | Retry storm gets fresh quota each attempt |
| Concurrency control | Parallel sessions share tenant/user limits | Each session receives independent full quota |
| Delegation control | Sub-agents inherit parent budget and depth limits | Workers spawn with fresh budgets |
| Enforcement point | Checked before action execution at tool/runtime boundary | Checked only after action completes |
| Fail mode | Quota service unavailable means halt or degraded read-only mode | Agent proceeds unmetered |
| Operations response | Alerts, kill switch, and owner escalation before exhaustion | Billing alert after spend already occurs |
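
The "Enforcement point" and "Fail mode" rows can be sketched as a gate that runs before the action. Everything named here (`run_with_budget_gate`, `QuotaServiceUnavailable`, `ActionDenied`, the `get_remaining` callback) is hypothetical; the point is only the ordering of the check and the fail-closed branch.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class QuotaServiceUnavailable(Exception):
    """Hypothetical error raised when the quota store cannot be reached."""


class ActionDenied(Exception):
    """Raised when an action is refused before it runs."""


def run_with_budget_gate(
    action: Callable[[], T],
    estimated_cost: float,
    get_remaining: Callable[[], float],
) -> T:
    """Check the budget at the tool boundary, before the action executes."""
    try:
        remaining = get_remaining()
    except QuotaServiceUnavailable as exc:
        # Fail closed: no answer from the quota service means no execution.
        raise ActionDenied("quota service unavailable") from exc
    if estimated_cost > remaining:
        raise ActionDenied("estimated cost exceeds remaining budget")
    # Only now is the action allowed to run.
    return action()
```

When reviewing real code, the equivalent question is whether any path reaches the tool call without first passing through a check of this kind.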

**What constitutes a finding:**

| Condition | Severity |
|---|---|
| No cumulative session or tenant budget for autonomous agent workflows | High |
| Retries, fallback providers, or sub-agents bypass budget accounting | High |
| Tool and downstream API costs are unmetered while model calls are metered | High |
| Quota service failure allows unmetered execution | High |
| No concurrency limit across sessions for the same user or tenant | Medium |
| Budget enforcement occurs only after tool execution completes | Medium |
| Budget alerts exist but no automated halt or kill switch is available | Medium |

**False positive to avoid:** Do not mark resource containment as passing solely because an agent has `max_tokens`, a request timeout, or an API gateway rate limit. Confirm the enforcement point, quota scope, cumulative ledger, fail mode, and retry/delegation accounting.

---

### Step 3 -- Human-in-the-Loop Gate Placement

Evaluate the design, placement, and robustness of human approval gates in the agent workflow.
@@ -515,6 +560,7 @@ Glob: **/security_architecture*
|---|---|---|---|
| Permission Model | [rating] | [one-line summary] | [priority] |
| Least-Privilege Design | [rating] | [one-line summary] | [priority] |
| Budget Enforcement | [rating] | [one-line summary] | [priority] |

> **P2: Add Budget Enforcement to finding Review Area options**
>
> When Step 2A produces a budget-related finding, reports that follow this template have no matching Review Area value in the Findings section above, even though this new summary row now treats Budget Enforcement as a first-class area. This makes the generated assessment internally inconsistent for the new high-severity cases; please add Budget Enforcement to the Finding Review Area options as well.

| HITL Gate Placement | [rating] | [one-line summary] | [priority] |
| Blast Radius Containment | [rating] | [one-line summary] | [priority] |
| Audit Trail Completeness | [rating] | [one-line summary] | [priority] |
@@ -0,0 +1,100 @@
# Budget Enforcement Edge Cases

These fixtures validate agent architecture review behavior for resource budget, quota, and denial-of-wallet controls.

## Case 1: Per-Request Limit Without Cumulative Session Budget

```yaml
agent:
  max_tokens_per_request: 8000
  request_timeout_seconds: 120
  max_requests_per_minute: 100
session:
  max_steps: null
  max_total_tokens: null
  max_total_tool_calls: null
```

**Expected result:** High-severity finding.

**Reason:** Each individual request is bounded, but an autonomous workflow can continue indefinitely and consume unbounded cumulative resources.
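
A reviewer-side check for this case might look like the sketch below. It assumes the YAML has already been parsed into a dictionary with a top-level `session` mapping; the function name `missing_session_limits` is hypothetical.

```python
from typing import Any, Dict, List

# Cumulative limits taken from the Case 1 fixture.
CUMULATIVE_KEYS = ("max_steps", "max_total_tokens", "max_total_tool_calls")


def missing_session_limits(config: Dict[str, Any]) -> List[str]:
    """List the cumulative session limits that are absent or null."""
    session = config.get("session") or {}
    return [key for key in CUMULATIVE_KEYS if session.get(key) is None]
```

For the Case 1 configuration all three keys are returned, which is what makes the per-request limits insufficient on their own.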

## Case 2: Retries and Fallback Provider Bypass Metering

```yaml
llm:
  primary: provider-a
  fallback: provider-b
budget:
  ledger: provider-a-only
retry:
  max_retries: 5
  backoff: exponential
  count_retry_costs: false
  count_fallback_costs: false
```

**Expected result:** High-severity finding.

**Reason:** Timeout retries and fallback calls can consume spend outside the enforced budget ledger.
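
For contrast, a compliant design charges every attempt, including retries and fallback calls, to the same budget. The sketch below is one hypothetical way to express that; the function name, the flat `cost_per_attempt`, and the dictionary-based budget are assumptions, and backoff delays are omitted for brevity.

```python
from typing import Callable, Dict, List, Optional, TypeVar

T = TypeVar("T")


class BudgetExhausted(Exception):
    """Raised when the shared budget cannot cover another attempt."""


def call_with_metered_retries(
    providers: List[Callable[[], T]],
    cost_per_attempt: float,
    budget: Dict[str, float],
    max_retries: int,
) -> T:
    """Try each provider in order; every attempt is charged to one budget."""
    last_error: Optional[Exception] = None
    for provider in providers:  # primary first, then fallbacks
        for _ in range(max_retries + 1):
            if budget["remaining"] < cost_per_attempt:
                raise BudgetExhausted("no budget left for another attempt")
            # Charge before the attempt so a timeout still counts as spend.
            budget["remaining"] -= cost_per_attempt
            try:
                return provider()
            except TimeoutError as exc:
                last_error = exc
    raise RuntimeError("all providers failed") from last_error
```

The Case 2 fixture fails this pattern twice: retry costs are not counted, and the fallback provider sits outside the ledger entirely.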

## Case 3: Sub-Agent Fan-Out Gets Fresh Budgets

```yaml
orchestrator:
  max_child_agents: 20
  parent_budget_usd: 10
child_agent:
  budget_policy: fresh_default_budget
  max_budget_usd: 10
  inherits_parent_budget: false
```

**Expected result:** High-severity finding.

**Reason:** Delegation multiplies the effective budget and bypasses the parent's intended containment.
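
The multiplication can be made explicit with a small calculation. This sketch assumes a single level of delegation and that each child spends at most its own cap; the function name is hypothetical.

```python
def worst_case_spend_usd(
    parent_budget_usd: float,
    child_budget_usd: float,
    max_child_agents: int,
    children_inherit: bool,
) -> float:
    """Upper bound on spend for one orchestrator and one layer of children."""
    if children_inherit:
        # Children draw from the parent, so the parent budget is the ceiling.
        return parent_budget_usd
    # Fresh budgets: the parent's own spend plus every child's full cap.
    return parent_budget_usd + max_child_agents * child_budget_usd
```

With the Case 3 values (parent 10, child 10, 20 children, no inheritance) the bound is 210 USD against an intended 10 USD. If children could themselves delegate, the bound would grow further.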

## Case 4: Shared Budget Ledger With Fail-Closed Enforcement

```yaml
budget_enforcement:
  scopes:
    - tenant
    - user
    - session
    - agent_identity
    - tool
  ledger:
    includes:
      - llm_tokens
      - browser_minutes
      - code_execution_seconds
      - external_api_calls
      - storage_writes
      - retries
      - fallback_provider_calls
      - child_agents
  enforcement_point: before_tool_execution
  quota_service_unavailable: fail_closed
  alerts:
    - threshold: 70
      action: notify_owner
    - threshold: 90
      action: require_approval
    - threshold: 100
      action: halt_workflow
  kill_switch:
    owner: security-operations
    scope: tenant_or_global
```

**Expected result:** Pass for budget enforcement if implementation evidence confirms each enforcement point.

**Reason:** The controls cover cumulative workflow cost, tool costs, retries, fallback providers, sub-agent fan-out, fail mode, and operational response.
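
The alert ladder in this fixture could be evaluated roughly as follows. The sketch assumes the `threshold` values are percentages of the budget consumed, which the fixture does not state explicitly, and the function name `action_for_usage` is hypothetical.

```python
from typing import List, Optional, Tuple

# (threshold, action) pairs from the Case 4 fixture, in ascending order.
# Assumption: thresholds are percentages of the budget consumed.
ALERTS: List[Tuple[int, str]] = [
    (70, "notify_owner"),
    (90, "require_approval"),
    (100, "halt_workflow"),
]


def action_for_usage(spent: float, limit: float) -> Optional[str]:
    """Return the action for the highest threshold reached, if any."""
    if limit <= 0:
        # Assumption: a non-positive limit is treated as already exhausted.
        return "halt_workflow"
    percent_used = 100 * spent / limit
    triggered: Optional[str] = None
    for threshold, action in ALERTS:
        if percent_used >= threshold:
            triggered = action
    return triggered
```

The notify and approval steps fire before exhaustion, which is the property the "Operations response" checklist row asks for.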

## Review Assertions

- Do not credit request-level `max_tokens` as session-level containment.
- Confirm retries and provider fallbacks consume the same budget ledger.
- Confirm child agents inherit or reserve from the parent budget.
- Confirm quota service failures halt execution or degrade to a safe read-only mode.