From b276076be7389d4ed9830d99fa833bae0d77ce13 Mon Sep 17 00:00:00 2001
From: Ian Jhumel Bautista
Date: Sat, 6 Jun 2026 10:13:33 +0800
Subject: [PATCH] docs(policy): align rationale docs with recalibrated severities

Update front-matter severity/confidence and the prose restatements for the
12 policy docs whose rules were recalibrated in trustabl-rules
(fix/rule-recalibration): loop-bound, error-contract, and the CSDK-102 /
OAI-110 / OAI-004 / CSDK-012 / AG2-012 changes. Corrected the unbounded-loop
rationale to reflect each framework's finite default and reframed the
downgraded severity defenses.

POLICY_INDEX regeneration is intentionally deferred: gen_index reads the
live rules pack, which is mid schema_version 9 transition (VAI-011 not yet
documented on rulebook main), so regenerating here would pull in an
unrelated rule. Indexes and the check_rulebook/gen_index gates go green once
the v9 network sync lands.
---
 docs/Policy/autogen/agent_safety.md        | 51 ++++++++++++----------
 docs/Policy/autogen/network.md             | 18 ++++----
 docs/Policy/claude_sdk/agent_safety.md     | 14 +++---
 docs/Policy/claude_sdk/error_handling.md   | 18 ++++----
 docs/Policy/claude_sdk/path_safety.md      | 20 ++++-----
 docs/Policy/google_adk/error_handling.md   | 12 ++---
 docs/Policy/langchain/agent_safety.md      | 40 +++++++++--------
 docs/Policy/mcp/error_handling.md          | 12 ++---
 docs/Policy/openai_sdk/agent_safety.md     | 13 +++---
 docs/Policy/openai_sdk/decorator_config.md | 11 ++---
 docs/Policy/openai_sdk/error_handling.md   | 12 ++---
 docs/Policy/vercel_ai/agent_safety.md      | 34 ++++++++-------
 12 files changed, 140 insertions(+), 115 deletions(-)

diff --git a/docs/Policy/autogen/agent_safety.md b/docs/Policy/autogen/agent_safety.md
index 4453fa8..3b7429e 100644
--- a/docs/Policy/autogen/agent_safety.md
+++ b/docs/Policy/autogen/agent_safety.md
@@ -14,8 +14,8 @@ rules: scope: agent fix_type: config - id: AG2-004 - severity: medium - confidence: 0.8 + severity: low + confidence: 0.6 scope: agent fix_type: config - id: 
AG2-005 @@ -36,7 +36,7 @@ references: [LLM05, LLM06, LLM10] **Policy ID:** `autogen_agent_safety` **File:** `autogen/agent_safety.yaml` **Rules:** AG2-001, AG2-002, AG2-004, AG2-005, AG2-006 -**Severities:** high, high, medium, medium, medium +**Severities:** high, high, low, medium, medium **Fix types:** config, config, config, config, config **References:** LLM05 (Improper Output Handling), LLM06 (Excessive Agency), LLM10 (Unbounded Consumption) @@ -48,7 +48,7 @@ Agent-scope rules for AutoGen / AG2 agents, read off the constructor kwargs of `ConversableAgent`, `UserProxyAgent`, `AssistantAgent`, and `GroupChatManager`. They flag the configurations AutoGen's own docs warn against: code execution on the host with no Docker (AG2-001), code execution with no human review -(AG2-002), an unbounded group-chat loop (AG2-004), code execution enabled on the +(AG2-002), a group-chat loop with no explicit round cap (AG2-004), code execution enabled on the LLM-facing assistant (AG2-005), and a code-executing agent with no auto-reply cap (AG2-006). Each uses the `agent_kwarg_value` / `agent_kwarg_present` / `agent_kwarg_missing` predicates against the constructor call. @@ -73,13 +73,14 @@ the model's code runs with zero review. Two more rules guard the generate/execute boundary and the loop bounds. Collapsing generation and execution into one `AssistantAgent` (AG2-005) means the agent the model fully controls also runs whatever it produces, removing the review -boundary AutoGen's two-agent pattern exists to provide. And unbounded loops are an -Unbounded Consumption (LLM10) hazard with a safety edge: a `GroupChatManager` -with no `max_round` (AG2-004) lets a degenerate conversation run until something -else stops it, and a code-executing executor with no `max_consecutive_auto_reply` -(AG2-006) can auto-execute model code an unbounded number of times — so a single -injected instruction is amplified across many runs, multiplying both cost and -blast radius. 
+boundary AutoGen's two-agent pattern exists to provide. And loops that rely on the +framework default instead of an explicit cap are an Unbounded Consumption (LLM10) +hazard with a safety edge: a `GroupChatManager` with no explicit `max_round` +(AG2-004) falls back to AutoGen's built-in default, letting a degenerate +conversation run to that generic ceiling, and a code-executing executor with no +`max_consecutive_auto_reply` (AG2-006) falls back to the class default of 100 — so +a single injected instruction can be amplified across up to that many runs, +multiplying both cost and blast radius. --- @@ -129,22 +130,24 @@ or disable execution. **Confidence 0.85:** the rule confirms execution is configured and review is off, but cannot see an out-of-band approval gate the team may have wired around the agent — a small over-flag. -### AG2-004 — GroupChatManager has no max_round bound (Severity: medium, Confidence: 0.8, Fix type: config) +### AG2-004 — GroupChatManager has no explicit max_round bound (Severity: low, Confidence: 0.6, Fix type: config) **What we detect:** a `GroupChatManager` (or `GroupChat`) with no `max_round` kwarg (predicate `agent_kwarg_missing`). -**Why it is flaggable:** with no round cap the speaker-selection loop has no -upper bound; a degenerate conversation runs until the budget or wall-clock is -exhausted (LLM10), and if participants hold side-effecting tools the same -mutation can be applied repeatedly. +**Why it is flaggable:** with no explicit `max_round` the speaker-selection loop +falls back to AutoGen's built-in default rather than a task-sized cap; a +degenerate conversation runs to that generic ceiling (LLM10), and if participants +hold side-effecting tools the same mutation can be applied repeatedly up to that +bound. **Real-world consequence:** two agents keep handing a task back and forth because neither emits the termination signal; the chat burns API budget for hundreds of rounds before a timeout kills it. 
-**Why severity is medium and not high:** the usual outcome is a cost/availability -incident rather than a direct compromise — serious but recoverable, and only a +**Why severity is low:** AutoGen already bounds the loop with a built-in default, +so this flags a missing *explicit, task-sized* cap rather than a true runaway — a +hygiene nudge whose usual worst case is a cost/availability incident, and only a safety problem when looped tools have side effects. **Fix type — config:** pass `max_round=`. **Confidence 0.8:** a chat wrapped by an external timeout or a custom loop guard is over-flagged, since the rule sees only the constructor @@ -180,10 +183,11 @@ over-flags safe two-role setups that happen to set the kwarg on the assistant. `code_execution_config` present AND no `max_consecutive_auto_reply` kwarg (predicates `agent_kwarg_present` + `agent_kwarg_missing`). -**Why it is flaggable:** with no auto-reply cap a code-executing agent can -auto-respond — and therefore auto-execute model code — an unbounded number of -times in one exchange, amplifying the cost and blast radius of a single injected -instruction. +**Why it is flaggable:** with no explicit `max_consecutive_auto_reply` a +code-executing agent falls back to AutoGen's class default of 100 +(MAX_CONSECUTIVE_AUTO_REPLY) — so it can auto-respond, and therefore auto-execute +model code, up to 100 times in one exchange, amplifying the cost and blast radius +of a single injected instruction. **Real-world consequence:** an executor with no `max_consecutive_auto_reply` loops on a failing code block, re-executing slightly varied attacker code dozens @@ -191,7 +195,8 @@ of times before anything stops it. **Why severity is medium and not high:** it is an amplifier of the underlying code-execution risk (covered by AG2-001/002), not a fresh RCE path on its own; -its impact is the unbounded *repetition* rather than the execution itself. 
**Fix +its impact is the *repetition* (up to the default cap of 100) rather than the +execution itself. **Fix type — config:** set `max_consecutive_auto_reply=` to a small integer. **Confidence 0.7:** a deployment that bounds the loop another way (an external turn limit, a custom reply handler) is over-flagged, since the rule sees only the diff --git a/docs/Policy/autogen/network.md b/docs/Policy/autogen/network.md index 3bfe290..1bc58e0 100644 --- a/docs/Policy/autogen/network.md +++ b/docs/Policy/autogen/network.md @@ -4,8 +4,8 @@ category: autogen topic: network rules: - id: AG2-012 - severity: medium - confidence: 0.8 + severity: high + confidence: 0.85 scope: tool fix_type: code references: [LLM10] @@ -16,7 +16,7 @@ references: [LLM10] **Policy ID:** `autogen_network` **File:** `autogen/network.yaml` **Rules:** AG2-012 -**Severities:** medium +**Severities:** high **Fix types:** code **References:** LLM10 (Unbounded Consumption) @@ -50,7 +50,7 @@ one turn. ## Rule-by-rule defense -### AG2-012 — Tool network call has no timeout (Severity: medium, Confidence: 0.8, Fix type: code) +### AG2-012 — Tool network call has no timeout (Severity: high, Confidence: 0.85, Fix type: code) **What we detect:** an AutoGen tool body that calls a `requests.*` / `httpx.*` request function with no `timeout=` keyword (predicate `call_without_kwarg`). @@ -63,10 +63,12 @@ socket dies. with no timeout; a slow upstream makes the agent hang for minutes per call, and under concurrent load the host runs out of connections while every agent waits. -**Why severity is medium and not high:** the impact is an availability/cost -incident, not a compromise — recoverable, and only triggered by a slow or -unresponsive remote rather than on every call. **Fix type — code:** adding -`timeout=` is a tool-source edit. 
**Confidence 0.8:** the rule looks for the +**Why severity is high:** this matches the timeout rules in every other SDK pack +(CSDK-003, MCP-004, OAI-005, ADK-003, PYD-006), which all rate a missing timeout +high. A hung call with no tool-level timeout stalls the whole agent loop — and, in +a group chat, the whole conversation — and the failure never surfaces to the +model, so the blast radius is the agent's availability, not a single turn. **Fix type — code:** adding +`timeout=` is a tool-source edit. **Confidence 0.85:** the rule looks for the `timeout` kwarg on the recognized callees, so it over-fires when a timeout is set another way (a session-level default, an `httpx.Client(timeout=...)` the call inherits) and under-fires on request libraries outside the recognized diff --git a/docs/Policy/claude_sdk/agent_safety.md b/docs/Policy/claude_sdk/agent_safety.md index 0e91aa6..d926dfc 100644 --- a/docs/Policy/claude_sdk/agent_safety.md +++ b/docs/Policy/claude_sdk/agent_safety.md @@ -9,7 +9,7 @@ rules: scope: agent fix_type: config - id: CSDK-102 - severity: high + severity: medium confidence: 0.8 scope: agent fix_type: config @@ -51,7 +51,7 @@ references: [LLM01, LLM06] **Policy ID:** `claude_sdk_agent_safety` **File:** `claude_sdk/agent_safety.yaml` **Rules:** CSDK-101, CSDK-102, CSDK-103, CSDK-104, CSDK-105, CSDK-120, CSDK-130, CSDK-131 -**Severities:** high, high, high, high, high, high, high, high +**Severities:** high, medium, high, high, high, high, high, high **Fix types:** config, config, config, config, config, config, config, config **References:** LLM01, LLM06 @@ -128,7 +128,7 @@ it with a PreToolUse hook — a wiring change, not tool code. **Confidence 0.8:** A subagent may legitimately need shell for its job (a build runner); the rule cannot tell a justified grant from an over-broad one, hence 0.8. 
-### CSDK-102 — Subagent is granted the WebSearch tool (Severity: high, Confidence: 0.8, Fix type: config) +### CSDK-102 — Subagent is granted the WebSearch tool (Severity: medium, Confidence: 0.8, Fix type: config) **What we detect:** An `AgentDefinition` whose `tools` list contains `WebSearch`. @@ -139,9 +139,11 @@ injected instructions in results can redirect the subagent. "ignore previous instructions…" text becomes part of the context and steers the next action. -**Why severity is high and not medium:** Untrusted-content intake is a primary -prompt-injection vector and there is no SDK-level filtering. Not critical: the -payload still needs a follow-on capability to do damage. +**Why severity is medium:** Granting WebSearch is routine and useful, and +untrusted-content intake only becomes harmful when paired with a follow-on +capability that can act on the injected instruction — so the grant alone is a +review signal, not a high-severity defect. It is not low because search results +are a primary prompt-injection vector with no SDK-level filtering. **Fix type — config:** Remove `WebSearch`, or gate queries with a PreToolUse hook. diff --git a/docs/Policy/claude_sdk/error_handling.md b/docs/Policy/claude_sdk/error_handling.md index 880f1c2..09567d4 100644 --- a/docs/Policy/claude_sdk/error_handling.md +++ b/docs/Policy/claude_sdk/error_handling.md @@ -4,7 +4,7 @@ category: claude_sdk topic: error_handling rules: - id: CSDK-005 - severity: medium + severity: low confidence: 0.6 scope: tool fix_type: code @@ -16,7 +16,7 @@ references: [LLM05] **Policy ID:** `claude_sdk_error_handling` **File:** `claude_sdk/error_handling.yaml` **Rules:** CSDK-005 -**Severities:** medium +**Severities:** low **Fix types:** code **References:** LLM05 @@ -57,7 +57,7 @@ unpredictable agent behavior. 
## Rule-by-rule defense -### CSDK-005 — Tool raises exceptions without a structured error contract (Severity: medium, Confidence: 0.6, Fix type: code) +### CSDK-005 — Tool raises exceptions without a structured error contract (Severity: low, Confidence: 0.6, Fix type: code) **What we detect:** A tool body that contains a `raise` and has no `try`/`except` block @@ -73,11 +73,13 @@ message may leak internals. fault gives the model no "retryable" hint; the model may retry a charge that actually went through, or give up on one that would have succeeded on retry. -**Why severity is medium and not high:** -It degrades reliability and leaks minor internals rather than directly breaching -the system; a well-behaved caller environment can absorb some of it. It is not -low because mis-handled errors in side-effecting tools cause real wrong actions -(double charges, abandoned writes). +**Why severity is low:** +A bare `raise` is frequently fine: the Claude Agent SDK, an outer wrapper, or a +`failure_error_function`-style handler often converts the exception into something +the model can act on, so this is a reliability-and-hygiene nudge rather than a +defect. It is not medium because the in-body check cannot see those out-of-body +handlers and fires on a great deal of correct code — treat it as a prompt to add +an explicit structured-error contract where one is genuinely missing. **Fix type — code:** Wrap the body and return a structured error — a source edit. 
diff --git a/docs/Policy/claude_sdk/path_safety.md b/docs/Policy/claude_sdk/path_safety.md index 6734cfe..284211e 100644 --- a/docs/Policy/claude_sdk/path_safety.md +++ b/docs/Policy/claude_sdk/path_safety.md @@ -9,7 +9,7 @@ rules: scope: tool fix_type: code - id: CSDK-012 - severity: medium + severity: low confidence: 0.5 scope: tool fix_type: code @@ -21,7 +21,7 @@ references: [LLM02, LLM06] **Policy ID:** `claude_sdk_path_safety` **File:** `claude_sdk/path_safety.yaml` **Rules:** CSDK-004, CSDK-012 -**Severities:** high, medium +**Severities:** high, low **Fix types:** code, code **References:** LLM02, LLM06 @@ -99,7 +99,7 @@ positive. Conversely, a tool that calls `.resolve()` but never checks containmen passes the rule yet is still unsafe — a false negative the rule cannot close, which is why containment lives in the recommendations. -### CSDK-012 — TypeScript Claude SDK tool writes to the filesystem (Severity: medium, Confidence: 0.5, Fix type: code) +### CSDK-012 — TypeScript Claude SDK tool writes to the filesystem (Severity: low, Confidence: 0.5, Fix type: code) **What we detect:** A TypeScript Claude SDK `tool(...)` whose handler body invokes a filesystem-write @@ -131,13 +131,13 @@ A `saveNote(name, body)` tool doing `writeFileSync(name, body)` is steered into `writeFileSync("../../.bashrc", payload)` or into overwriting a config file to widen the agent's own permissions. -**Why severity is medium and not high:** -This is deliberately one notch below the Python sibling's high precisely because -the signal is coarse. The rule fires on *any* write — including writes to a -hard-coded safe path with no model influence — so a large fraction of hits are not -exploitable. Pairing a low-precision detector with a high severity would -overstate the finding; medium reflects that this is a lead to confirm, not a -near-certain defect. +**Why severity is low:** +This is the weakest detector in the file. 
It fires on *any* write — including +writes to a hard-coded safe path with no model influence — and has no path-flow +analysis behind it, so a large fraction of hits are not exploitable. Pairing a +low-precision detector with anything above low would overstate a lead that is +about as likely benign as not; low marks it as a prompt to confirm the path or +contents are model-influenced, not a defect. **Fix type — code:** Confining writes to a working directory and resolving/validating the final path is diff --git a/docs/Policy/google_adk/error_handling.md b/docs/Policy/google_adk/error_handling.md index 9f264ea..98e8a0e 100644 --- a/docs/Policy/google_adk/error_handling.md +++ b/docs/Policy/google_adk/error_handling.md @@ -4,7 +4,7 @@ category: google_adk topic: error_handling rules: - id: ADK-005 - severity: medium + severity: low confidence: 0.6 scope: tool fix_type: code @@ -16,7 +16,7 @@ references: [LLM05] **Policy ID:** `google_adk_error_handling` **File:** `google_adk/error_handling.yaml` **Rules:** ADK-005 -**Severities:** medium +**Severities:** low **Fix types:** code **References:** LLM05 @@ -49,7 +49,7 @@ contract and surfaces a raw exception instead. ## Rule-by-rule defense -### ADK-005 — Tool raises exceptions without a structured error contract (Severity: medium, Confidence: 0.6, Fix type: code) +### ADK-005 — Tool raises exceptions without a structured error contract (Severity: low, Confidence: 0.6, Fix type: code) **What we detect:** a wrapped-function body with a `raise` and no `try`/`except`. @@ -59,8 +59,10 @@ recovery contract, breaking ADK's return-a-dict convention. **Real-world consequence:** a transient fault raised as `ValueError` gives the model no "retryable" hint; it retries a completed action or abandons a recoverable one. -**Why severity is medium and not high:** reliability/minor-leak rather than a direct -breach; mishandled errors in side-effecting tools still cause real wrong actions. 
+**Why severity is low:** ADK's return-a-dict convention or an outer wrapper +commonly shapes the error already, so this is a reliability nudge that fires on a +lot of correct code; it stays above noise because mishandled errors in +side-effecting tools can still cause real wrong actions. **Fix type — code:** wrap the body and return a structured error dict. diff --git a/docs/Policy/langchain/agent_safety.md b/docs/Policy/langchain/agent_safety.md index 3b6626b..6a597ae 100644 --- a/docs/Policy/langchain/agent_safety.md +++ b/docs/Policy/langchain/agent_safety.md @@ -9,13 +9,13 @@ rules: scope: agent fix_type: code - id: LC-102 - severity: medium - confidence: 0.8 + severity: low + confidence: 0.6 scope: agent fix_type: config - id: LC-111 - severity: medium - confidence: 0.8 + severity: low + confidence: 0.6 scope: agent fix_type: config references: [LLM06, LLM10] @@ -26,8 +26,8 @@ references: [LLM06, LLM10] **Policy ID:** `langchain_agent_safety` **File:** `langchain/agent_safety.yaml` **Rules:** LC-101, LC-102, LC-111 -**Severities:** high, medium -**Fix types:** code, config +**Severities:** high, low, low +**Fix types:** code, config, config **References:** LLM06 (Excessive Agency), LLM10 (Unbounded Consumption) --- @@ -37,8 +37,8 @@ references: [LLM06, LLM10] Agent-scope rules for the constructor-shaped LangChain / LangGraph agents Trustabl discovers: `create_react_agent` and `create_agent` (normalized class `ReactAgent` / `CreateAgent`) and the legacy `AgentExecutor`. The rules cover the two highest-signal -agent-level risks: wiring a code-execution/shell built-in tool (LC-101) and an -unbounded tool-calling loop (LC-102 / LC-111). +agent-level risks: wiring a code-execution/shell built-in tool (LC-101) and a +tool-calling loop with no explicit iteration cap (LC-102 / LC-111). The raw `StateGraph` graph agent is a documented discovery gap — its tools and model are assembled across many call sites, so it is not yet modeled as a single agent. 
@@ -71,28 +71,30 @@ and read the deployment's secrets. sandbox-and-gate it. **Confidence 0.85:** a few agents legitimately need a REPL and have sandboxed it out of band, which the class-name match cannot see. -### LC-102 — AgentExecutor has no max_iterations limit (Severity: medium, Confidence: 0.8, Fix type: config) +### LC-102 — AgentExecutor has no explicit max_iterations limit (Severity: low, Confidence: 0.6, Fix type: config) **What we detect:** an `AgentExecutor` with no effective `max_iterations` kwarg (predicate `agent_kwarg_missing`). -**Why it is flaggable:** with no iteration ceiling, a model that never emits a final -answer — it loops calling tools, or oscillates between two — runs until it exhausts -the API budget or wall-clock (LLM10, Unbounded Consumption). When the looped tools -have side effects, the runaway loop is also a correctness and safety problem, not -just a cost one. +**Why it is flaggable:** with no explicit `max_iterations`, the executor falls back +to LangChain's default of 15 — a generic ceiling, not one sized to this task. A +model that loops or oscillates still runs up to 15 tool round-trips (LLM10, +Unbounded Consumption), a cost the workflow may not tolerate, and the implicit cap +can shift between versions; when the looped tools have side effects it is a +correctness concern too. -**Severity medium:** a cost/availability incident rather than a direct compromise. -**Confidence 0.8:** an executor wrapped by an external timeout or a custom loop -guard is over-flagged. +**Severity low:** the framework default (15) already prevents a true runaway, so +this flags a missing *explicit, task-sized* cap — a hygiene nudge, not a defect. +**Confidence 0.6:** an executor relying on the default, wrapped by an external +timeout, or guarded by a custom loop is over-flagged. 
-### LC-111 — TypeScript AgentExecutor has no maxIterations limit (Severity: medium, Confidence: 0.8, Fix type: config) +### LC-111 — TypeScript AgentExecutor has no explicit maxIterations limit (Severity: low, Confidence: 0.6, Fix type: config) **What we detect:** a TS `AgentExecutor` with no effective `maxIterations` kwarg. **Why it is flaggable / consequence:** identical to LC-102 in LangChain.js. -**Severity medium / Confidence 0.8:** same profile. +**Severity low / Confidence 0.6:** same profile as LC-102. --- diff --git a/docs/Policy/mcp/error_handling.md b/docs/Policy/mcp/error_handling.md index 1b3b934..903d976 100644 --- a/docs/Policy/mcp/error_handling.md +++ b/docs/Policy/mcp/error_handling.md @@ -4,7 +4,7 @@ category: mcp topic: error_handling rules: - id: MCP-006 - severity: medium + severity: low confidence: 0.6 scope: tool fix_type: code @@ -31,7 +31,7 @@ An MCP tool handler that can raise without catching, detected by ## Rule-by-rule defense -### MCP-006 — Tool raises exceptions without a structured error contract (Severity: medium, Confidence: 0.6, Fix type: code) +### MCP-006 — Tool raises exceptions without a structured error contract (Severity: low, Confidence: 0.6, Fix type: code) **What we detect:** a handler body that contains a `raise` and no `try`/`except`. @@ -40,10 +40,10 @@ exception to the connecting client as an opaque protocol error. The model on the other end often cannot recover or retry intelligently, and the raw message may leak internal detail — stack frames, absolute paths, secrets in arguments — across the server's trust boundary to whatever client connected (improper output -handling, LLM05). Medium severity because the impact is degraded recovery plus a -modest disclosure channel; confidence 0.6 because a handler may raise -intentionally for a caller that handles it, and the body-only check does not see -a `try` in a calling frame. +handling, LLM05). 
Low severity because the impact is degraded recovery plus a +modest disclosure channel, and a handler often raises intentionally for a caller +or runtime that structures it; confidence 0.6 because the body-only check does +not see a `try` in a calling frame. **Fix type — code:** returning a structured `{"error": ..., "retryable": ...}` result instead of raising is a source edit. diff --git a/docs/Policy/openai_sdk/agent_safety.md b/docs/Policy/openai_sdk/agent_safety.md index 4a022ab..f305534 100644 --- a/docs/Policy/openai_sdk/agent_safety.md +++ b/docs/Policy/openai_sdk/agent_safety.md @@ -34,7 +34,7 @@ rules: scope: agent fix_type: config - id: OAI-110 - severity: high + severity: medium confidence: 0.6 scope: agent fix_type: config @@ -46,7 +46,7 @@ references: [LLM01, LLM06] **Policy ID:** `openai_sdk_agent_safety` **File:** `openai_sdk/agent_safety.yaml` **Rules:** OAI-101, OAI-102, OAI-103, OAI-104, OAI-105, OAI-109, OAI-110 -**Severities:** high, high, high, medium, high, high, high +**Severities:** high, high, high, medium, high, high, medium **Fix types:** config, config, config, config, config, config, config **References:** LLM01, LLM06 @@ -238,7 +238,7 @@ reach `WebSearchTool`. **Confidence 0.85:** the agent might screen by another route the rule cannot see. -### OAI-110 — Content-fetching tool without output_guardrails (Severity: high, Confidence: 0.6, Fix type: config) +### OAI-110 — Content-fetching tool without output_guardrails (Severity: medium, Confidence: 0.6, Fix type: config) **What we detect:** empty `output_guardrails` while the agent wires `WebSearchTool`, `FileSearchTool`, or `CodeInterpreterTool`. @@ -250,8 +250,11 @@ can drive an exfiltrating or unsafe answer with nothing inspecting what leaves. **Real-world consequence:** injected content in a fetched document steers the final response to leak data, unscreened. 
-**Why high not medium:** egress screening is the last line before the user/caller, and -it is absent on an agent that ingests untrusted content. +**Why severity is medium:** output guardrails are far from universally adopted, and +many content-fetching agents handle only low-risk public data, so a missing egress +screen is often acceptable (see the 0.6 confidence). It stays above low because, on +an agent that ingests untrusted content and can act on it, the absent screen is the +last line before the user/caller. **Fix type — config:** add an `@output_guardrail` and wire `output_guardrails=[...]`. diff --git a/docs/Policy/openai_sdk/decorator_config.md b/docs/Policy/openai_sdk/decorator_config.md index 9c52dc4..296319a 100644 --- a/docs/Policy/openai_sdk/decorator_config.md +++ b/docs/Policy/openai_sdk/decorator_config.md @@ -9,7 +9,7 @@ rules: scope: tool fix_type: config - id: OAI-004 - severity: medium + severity: low confidence: 0.7 scope: tool fix_type: config @@ -26,7 +26,7 @@ references: [LLM05] **Policy ID:** `openai_sdk_decorator_config` **File:** `openai_sdk/decorator_config.yaml` **Rules:** OAI-003, OAI-004, OAI-015 -**Severities:** medium, medium, high +**Severities:** medium, low, high **Fix types:** config, config, config **References:** LLM05 @@ -89,7 +89,7 @@ input shape needed the relaxation, widen the type hints instead. **Confidence 0.95:** the literal `False` value is read directly — almost no false positives. -### OAI-004 — Tool has no failure_error_function (Severity: medium, Confidence: 0.7, Fix type: config) +### OAI-004 — Tool has no failure_error_function (Severity: low, Confidence: 0.7, Fix type: config) **What we detect:** a `@function_tool` with no `failure_error_function` kwarg. @@ -99,8 +99,9 @@ model, which then has no recovery contract and may hallucinate retries. **Real-world consequence:** a transient failure is shown to the model as an opaque traceback; it retries a non-retryable action or abandons a recoverable one. 
-**Why severity is medium and not high:** the SDK's default behavior is degraded but -not catastrophic; many tools never raise. +**Why severity is low:** this flags the *absence* of an optional safeguard whose +default behavior is already tolerable, and many tools never raise or handle errors +in-body, so it fires on a lot of correct code; it is a hygiene nudge, not a defect. **Fix type — config:** pass a `failure_error_function` that returns a structured error string. diff --git a/docs/Policy/openai_sdk/error_handling.md b/docs/Policy/openai_sdk/error_handling.md index 95549a8..118d9ee 100644 --- a/docs/Policy/openai_sdk/error_handling.md +++ b/docs/Policy/openai_sdk/error_handling.md @@ -4,7 +4,7 @@ category: openai_sdk topic: error_handling rules: - id: OAI-008 - severity: medium + severity: low confidence: 0.6 scope: tool fix_type: code @@ -16,7 +16,7 @@ references: [LLM05] **Policy ID:** `openai_sdk_error_handling` **File:** `openai_sdk/error_handling.yaml` **Rules:** OAI-008 -**Severities:** medium +**Severities:** low **Fix types:** code **References:** LLM05 @@ -53,7 +53,7 @@ the most common false positive for OAI-008. ## Rule-by-rule defense -### OAI-008 — Tool raises exceptions without a structured error contract (Severity: medium, Confidence: 0.6, Fix type: code) +### OAI-008 — Tool raises exceptions without a structured error contract (Severity: low, Confidence: 0.6, Fix type: code) **What we detect:** a tool body with a `raise` and no `try`/`except`. @@ -64,8 +64,10 @@ string with no recovery contract, and may carry internal detail. on a transient fault gives the model no "retryable" hint; it may retry a completed charge or abandon a recoverable one. -**Why severity is medium and not high:** reliability/minor-leak rather than a direct -breach; mishandled errors in side-effecting tools still cause real wrong actions. 
+**Why severity is low:** the SDK's `failure_error_function` (or an outer handler) +commonly shapes the error already, so this is a reliability nudge that fires on a +lot of correct code; it stays above noise because mishandled errors in +side-effecting tools can still cause real wrong actions. **Fix type — code:** wrap the body and return a structured error. diff --git a/docs/Policy/vercel_ai/agent_safety.md b/docs/Policy/vercel_ai/agent_safety.md index 235b778..f094993 100644 --- a/docs/Policy/vercel_ai/agent_safety.md +++ b/docs/Policy/vercel_ai/agent_safety.md @@ -9,7 +9,7 @@ rules: scope: agent fix_type: config - id: VAI-007 - severity: medium + severity: low confidence: 0.6 scope: agent fix_type: config @@ -26,7 +26,7 @@ references: [LLM06, LLM10] **Policy ID:** `vercel_ai_agent_safety` **File:** `vercel_ai/agent_safety.yaml` **Rules:** VAI-006, VAI-007, VAI-008 -**Severities:** high, medium, medium +**Severities:** high, low, medium **Fix types:** config, config, config **References:** LLM06 (Excessive Agency), LLM10 (Unbounded Consumption) @@ -58,12 +58,14 @@ direct path to running attacker-chosen commands or code with the agent's privileges (VAI-006). This is excessive agency (LLM06) in its most literal form — the agent is one wired tool away from arbitrary execution. -The loop bounds matter because the SDK imposes no default ceiling. A -`generateText` call with a `tools` record runs a multi-step loop whose only -stopping condition, absent `stopWhen` / `maxSteps`, is the model deciding to stop -calling tools (VAI-007). A prompt injection — or a model that loops on a tool -whose output keeps re-triggering it — runs the loop unbounded, burning tokens, -hammering every wired tool (including billed or side-effecting ones), and +The loop bounds matter because the default ceiling is generic, not task-sized. 
A +bare `generateText` / `streamText` call does not continue the tool loop without +`stopWhen`, and the `Agent` / `ToolLoopAgent` class defaults to +`stopWhen: stepCountIs(20)` — so an agent that omits an explicit bound (VAI-007) +either silently does not iterate or inherits that generic 20-step ceiling. A +prompt injection — or a model that loops on a tool whose output keeps +re-triggering it — can then run up to that default before stopping, burning +tokens, hammering every wired tool (including billed or side-effecting ones), and stalling the request (LLM10). VAI-008 is the interaction of the two: setting `toolChoice: "required"` forces a tool call on every step instead of letting the model answer directly, so a wired execution tool is far more likely to be invoked @@ -100,23 +102,25 @@ agent-wiring change, not a tool-source edit. **Confidence 0.85:** a few agents legitimately need an execution tool and sandbox it out of band, which the class-name match cannot see. -### VAI-007 — Agent tool loop has no step bound (Severity: medium, Confidence: 0.6, Fix type: config) +### VAI-007 — Agent tool loop has no explicit step bound (Severity: low, Confidence: 0.6, Fix type: config) **What we detect:** an agent that runs a tool loop but sets neither `stopWhen` nor `maxSteps` (predicate `agent_kwarg_missing` for both). -**Why it is flaggable:** with no bound the loop's only stopping condition is the -model choosing to stop calling tools; an injection or a self-re-triggering tool -runs it unbounded (LLM10). +**Why it is flaggable:** absent an explicit bound the loop runs to the SDK's +generic default (a single step for a bare call, or `stepCountIs(20)` for the +`Agent` class) rather than a task-sized cap; an injection or a self-re-triggering +tool can run it up to that ceiling (LLM10). 
**Real-world consequence:** a research agent loops on a search tool whose results keep prompting another search; with no `maxSteps` it runs hundreds of round-trips, burning the token budget and hammering the search API before the request times out. -**Why severity is medium and not high:** the usual outcome is a cost/availability -incident rather than a compromise — recoverable, and only a safety problem when -the looped tools have side effects. **Fix type — config:** pass `maxSteps` or a +**Why severity is low:** the SDK already bounds the loop by default, so this flags +a missing *explicit, task-sized* cap rather than a true runaway — a hygiene nudge +whose usual worst case is a cost/availability incident, and only a safety problem +when the looped tools have side effects. **Fix type — config:** pass `maxSteps` or a `stopWhen` condition. **Confidence 0.6:** the SDK has multiple evolving stop mechanisms (`maxSteps`, `stopWhen`, `stepCountIs`, version differences between v4 and v5), and an agent bounded by an external timeout or a custom loop guard is