51 changes: 28 additions & 23 deletions docs/Policy/autogen/agent_safety.md
@@ -14,8 +14,8 @@ rules:
scope: agent
fix_type: config
- id: AG2-004
severity: medium
confidence: 0.8
severity: low
confidence: 0.6
scope: agent
fix_type: config
- id: AG2-005
@@ -36,7 +36,7 @@ references: [LLM05, LLM06, LLM10]
**Policy ID:** `autogen_agent_safety`
**File:** `autogen/agent_safety.yaml`
**Rules:** AG2-001, AG2-002, AG2-004, AG2-005, AG2-006
**Severities:** high, high, medium, medium, medium
**Severities:** high, high, low, medium, medium
**Fix types:** config, config, config, config, config
**References:** LLM05 (Improper Output Handling), LLM06 (Excessive Agency), LLM10 (Unbounded Consumption)

@@ -48,7 +48,7 @@ Agent-scope rules for AutoGen / AG2 agents, read off the constructor kwargs of
`ConversableAgent`, `UserProxyAgent`, `AssistantAgent`, and `GroupChatManager`.
They flag the configurations AutoGen's own docs warn against: code execution on
the host with no Docker (AG2-001), code execution with no human review
(AG2-002), an unbounded group-chat loop (AG2-004), code execution enabled on the
(AG2-002), a group-chat loop with no explicit round cap (AG2-004), code execution enabled on the
LLM-facing assistant (AG2-005), and a code-executing agent with no auto-reply cap
(AG2-006). Each uses the `agent_kwarg_value` / `agent_kwarg_present` /
`agent_kwarg_missing` predicates against the constructor call.
@@ -73,13 +73,14 @@ the model's code runs with zero review.
Two more rules guard the generate/execute boundary and the loop bounds.
Collapsing generation and execution into one `AssistantAgent` (AG2-005) means the
agent the model fully controls also runs whatever it produces, removing the review
boundary AutoGen's two-agent pattern exists to provide. And unbounded loops are an
Unbounded Consumption (LLM10) hazard with a safety edge: a `GroupChatManager`
with no `max_round` (AG2-004) lets a degenerate conversation run until something
else stops it, and a code-executing executor with no `max_consecutive_auto_reply`
(AG2-006) can auto-execute model code an unbounded number of times — so a single
injected instruction is amplified across many runs, multiplying both cost and
blast radius.
boundary AutoGen's two-agent pattern exists to provide. And loops that rely on the
framework default instead of an explicit cap are an Unbounded Consumption (LLM10)
hazard with a safety edge: a `GroupChatManager` with no explicit `max_round`
(AG2-004) falls back to AutoGen's built-in default, letting a degenerate
conversation run to that generic ceiling, and a code-executing executor with no
`max_consecutive_auto_reply` (AG2-006) falls back to the class default of 100 — so
a single injected instruction can be amplified across up to that many runs,
multiplying both cost and blast radius.

---

@@ -129,22 +130,24 @@ or disable execution. **Confidence 0.85:** the rule confirms execution is
configured and review is off, but cannot see an out-of-band approval gate the team
may have wired around the agent — a small over-flag.

### AG2-004 — GroupChatManager has no max_round bound (Severity: medium, Confidence: 0.8, Fix type: config)
### AG2-004 — GroupChatManager has no explicit max_round bound (Severity: low, Confidence: 0.6, Fix type: config)

**What we detect:** a `GroupChatManager` (or `GroupChat`) with no `max_round`
kwarg (predicate `agent_kwarg_missing`).

**Why it is flaggable:** with no round cap the speaker-selection loop has no
upper bound; a degenerate conversation runs until the budget or wall-clock is
exhausted (LLM10), and if participants hold side-effecting tools the same
mutation can be applied repeatedly.
**Why it is flaggable:** with no explicit `max_round` the speaker-selection loop
falls back to AutoGen's built-in default rather than a task-sized cap; a
degenerate conversation runs to that generic ceiling (LLM10), and if participants
hold side-effecting tools the same mutation can be applied repeatedly up to that
bound.

**Real-world consequence:** two agents keep handing a task back and forth because
neither emits the termination signal; the chat burns API budget for hundreds of
rounds before a timeout kills it.

**Why severity is medium and not high:** the usual outcome is a cost/availability
incident rather than a direct compromise — serious but recoverable, and only a
**Why severity is low:** AutoGen already bounds the loop with a built-in default,
so this flags a missing *explicit, task-sized* cap rather than a true runaway — a
hygiene nudge whose usual worst case is a cost/availability incident, and only a
safety problem when looped tools have side effects. **Fix type — config:** pass
`max_round=`. **Confidence 0.8:** a chat wrapped by an external timeout or a
custom loop guard is over-flagged, since the rule sees only the constructor
@@ -180,18 +183,20 @@ over-flags safe two-role setups that happen to set the kwarg on the assistant.
`code_execution_config` present AND no `max_consecutive_auto_reply` kwarg
(predicates `agent_kwarg_present` + `agent_kwarg_missing`).

**Why it is flaggable:** with no auto-reply cap a code-executing agent can
auto-respond — and therefore auto-execute model code — an unbounded number of
times in one exchange, amplifying the cost and blast radius of a single injected
instruction.
**Why it is flaggable:** with no explicit `max_consecutive_auto_reply` a
code-executing agent falls back to AutoGen's class default of 100
(MAX_CONSECUTIVE_AUTO_REPLY) — so it can auto-respond, and therefore auto-execute
model code, up to 100 times in one exchange, amplifying the cost and blast radius
of a single injected instruction.

**Real-world consequence:** an executor with no `max_consecutive_auto_reply`
loops on a failing code block, re-executing slightly varied attacker code dozens
of times before anything stops it.

**Why severity is medium and not high:** it is an amplifier of the underlying
code-execution risk (covered by AG2-001/002), not a fresh RCE path on its own;
its impact is the unbounded *repetition* rather than the execution itself. **Fix
its impact is the *repetition* (up to the default cap of 100) rather than the
execution itself. **Fix
type — config:** set `max_consecutive_auto_reply=` to a small integer.
**Confidence 0.7:** a deployment that bounds the loop another way (an external
turn limit, a custom reply handler) is over-flagged, since the rule sees only the
18 changes: 10 additions & 8 deletions docs/Policy/autogen/network.md
@@ -4,8 +4,8 @@ category: autogen
topic: network
rules:
- id: AG2-012
severity: medium
confidence: 0.8
severity: high
confidence: 0.85
scope: tool
fix_type: code
references: [LLM10]
@@ -16,7 +16,7 @@ references: [LLM10]
**Policy ID:** `autogen_network`
**File:** `autogen/network.yaml`
**Rules:** AG2-012
**Severities:** medium
**Severities:** high
**Fix types:** code
**References:** LLM10 (Unbounded Consumption)

@@ -50,7 +50,7 @@ one turn.

## Rule-by-rule defense

### AG2-012 — Tool network call has no timeout (Severity: medium, Confidence: 0.8, Fix type: code)
### AG2-012 — Tool network call has no timeout (Severity: high, Confidence: 0.85, Fix type: code)

**What we detect:** an AutoGen tool body that calls a `requests.*` / `httpx.*`
request function with no `timeout=` keyword (predicate `call_without_kwarg`).
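
A rough sketch of what a `call_without_kwarg`-style check could look like, assuming it matches `module.verb(...)` calls syntactically. The module and verb sets below are illustrative guesses; the rule's actual list of recognized callees is not shown in this diff.

```python
import ast

# Assumed callee sets for illustration only.
HTTP_MODULES = {"requests", "httpx"}
HTTP_VERBS = {"get", "post", "put", "patch", "delete", "head", "request"}


def calls_without_timeout(source: str) -> list[int]:
    """Line numbers of requests.* / httpx.* calls that pass no timeout= keyword."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_http_call = (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id in HTTP_MODULES
            and func.attr in HTTP_VERBS
        )
        if is_http_call and all(kw.arg != "timeout" for kw in node.keywords):
            hits.append(node.lineno)
    return sorted(hits)
```

A purely syntactic match like this ignores calls made through a client or session object, so it cannot tell whether such a call inherits a timeout configured elsewhere.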
@@ -63,10 +63,12 @@ socket dies.
with no timeout; a slow upstream makes the agent hang for minutes per call, and
under concurrent load the host runs out of connections while every agent waits.

**Why severity is medium and not high:** the impact is an availability/cost
incident, not a compromise — recoverable, and only triggered by a slow or
unresponsive remote rather than on every call. **Fix type — code:** adding
`timeout=` is a tool-source edit. **Confidence 0.8:** the rule looks for the
**Why severity is high:** this matches the timeout rules in every other SDK pack
(CSDK-003, MCP-004, OAI-005, ADK-003, PYD-006), which all rate a missing timeout
high. A hung call with no tool-level timeout stalls the whole agent loop — and, in
a group chat, the whole conversation — and the failure never surfaces to the
model, so the blast radius is the agent's availability, not a single turn. **Fix type — code:** adding
`timeout=` is a tool-source edit. **Confidence 0.85:** the rule looks for the
`timeout` kwarg on the recognized callees, so it over-fires when a timeout is set
another way (a session-level default, an `httpx.Client(timeout=...)` the call
inherits) and under-fires on request libraries outside the recognized
14 changes: 8 additions & 6 deletions docs/Policy/claude_sdk/agent_safety.md
@@ -9,7 +9,7 @@ rules:
scope: agent
fix_type: config
- id: CSDK-102
severity: high
severity: medium
confidence: 0.8
scope: agent
fix_type: config
@@ -51,7 +51,7 @@ references: [LLM01, LLM06]
**Policy ID:** `claude_sdk_agent_safety`
**File:** `claude_sdk/agent_safety.yaml`
**Rules:** CSDK-101, CSDK-102, CSDK-103, CSDK-104, CSDK-105, CSDK-120, CSDK-130, CSDK-131
**Severities:** high, high, high, high, high, high, high, high
**Severities:** high, medium, high, high, high, high, high, high
**Fix types:** config, config, config, config, config, config, config, config
**References:** LLM01, LLM06

@@ -128,7 +128,7 @@ it with a PreToolUse hook — a wiring change, not tool code.
**Confidence 0.8:** A subagent may legitimately need shell for its job (a build
runner); the rule cannot tell a justified grant from an over-broad one, hence 0.8.

### CSDK-102 — Subagent is granted the WebSearch tool (Severity: high, Confidence: 0.8, Fix type: config)
### CSDK-102 — Subagent is granted the WebSearch tool (Severity: medium, Confidence: 0.8, Fix type: config)

**What we detect:** An `AgentDefinition` whose `tools` list contains `WebSearch`.

@@ -139,9 +139,11 @@ injected instructions in results can redirect the subagent.
"ignore previous instructions…" text becomes part of the context and steers the
next action.

**Why severity is high and not medium:** Untrusted-content intake is a primary
prompt-injection vector and there is no SDK-level filtering. Not critical: the
payload still needs a follow-on capability to do damage.
**Why severity is medium:** Granting WebSearch is routine and useful, and
untrusted-content intake only becomes harmful when paired with a follow-on
capability that can act on the injected instruction — so the grant alone is a
review signal, not a high-severity defect. It is not low because search results
are a primary prompt-injection vector with no SDK-level filtering.

**Fix type — config:** Remove `WebSearch`, or gate queries with a PreToolUse hook.

18 changes: 10 additions & 8 deletions docs/Policy/claude_sdk/error_handling.md
@@ -4,7 +4,7 @@ category: claude_sdk
topic: error_handling
rules:
- id: CSDK-005
severity: medium
severity: low
confidence: 0.6
scope: tool
fix_type: code
@@ -16,7 +16,7 @@ references: [LLM05]
**Policy ID:** `claude_sdk_error_handling`
**File:** `claude_sdk/error_handling.yaml`
**Rules:** CSDK-005
**Severities:** medium
**Severities:** low
**Fix types:** code
**References:** LLM05

@@ -57,7 +57,7 @@ unpredictable agent behavior.

## Rule-by-rule defense

### CSDK-005 — Tool raises exceptions without a structured error contract (Severity: medium, Confidence: 0.6, Fix type: code)
### CSDK-005 — Tool raises exceptions without a structured error contract (Severity: low, Confidence: 0.6, Fix type: code)

**What we detect:**
A tool body that contains a `raise` and has no `try`/`except` block
@@ -73,11 +73,13 @@ message may leak internals.
fault gives the model no "retryable" hint; the model may retry a charge that
actually went through, or give up on one that would have succeeded on retry.

**Why severity is medium and not high:**
It degrades reliability and leaks minor internals rather than directly breaching
the system; a well-behaved caller environment can absorb some of it. It is not
low because mis-handled errors in side-effecting tools cause real wrong actions
(double charges, abandoned writes).
**Why severity is low:**
A bare `raise` is frequently fine: the Claude Agent SDK, an outer wrapper, or a
`failure_error_function`-style handler often converts the exception into something
the model can act on, so this is a reliability-and-hygiene nudge rather than a
defect. It is not medium because the in-body check cannot see those out-of-body
handlers and fires on a great deal of correct code — treat it as a prompt to add
an explicit structured-error contract where one is genuinely missing.

**Fix type — code:**
Wrap the body and return a structured error — a source edit.
20 changes: 10 additions & 10 deletions docs/Policy/claude_sdk/path_safety.md
@@ -9,7 +9,7 @@ rules:
scope: tool
fix_type: code
- id: CSDK-012
severity: medium
severity: low
confidence: 0.5
scope: tool
fix_type: code
@@ -21,7 +21,7 @@ references: [LLM02, LLM06]
**Policy ID:** `claude_sdk_path_safety`
**File:** `claude_sdk/path_safety.yaml`
**Rules:** CSDK-004, CSDK-012
**Severities:** high, medium
**Severities:** high, low
**Fix types:** code, code
**References:** LLM02, LLM06

@@ -99,7 +99,7 @@ positive. Conversely, a tool that calls `.resolve()` but never checks containmen
passes the rule yet is still unsafe — a false negative the rule cannot close,
which is why containment lives in the recommendations.
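
To make the resolve-versus-containment distinction concrete, here is a minimal sketch assuming the tool confines paths to a single working directory. The function name `resolve_inside` is hypothetical and this is not the recommendation text from the policy file.

```python
from pathlib import Path


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Resolve user_path under base_dir and refuse anything that escapes it."""
    base = Path(base_dir).resolve()
    candidate = (base / user_path).resolve()
    # resolve() alone only normalizes the path; this comparison is the actual guard.
    if candidate != base and base not in candidate.parents:
        raise ValueError(f"path escapes working directory: {user_path!r}")
    return candidate
```

Dropping the `if` leaves a function that still calls `.resolve()`, which is the false-negative shape described above: it would pass the rule while accepting `../../.bashrc`.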

### CSDK-012 — TypeScript Claude SDK tool writes to the filesystem (Severity: medium, Confidence: 0.5, Fix type: code)
### CSDK-012 — TypeScript Claude SDK tool writes to the filesystem (Severity: low, Confidence: 0.5, Fix type: code)

**What we detect:**
A TypeScript Claude SDK `tool(...)` whose handler body invokes a filesystem-write
@@ -131,13 +131,13 @@ A `saveNote(name, body)` tool doing `writeFileSync(name, body)` is steered into
`writeFileSync("../../.bashrc", payload)` or into overwriting a config file to
widen the agent's own permissions.

**Why severity is medium and not high:**
This is deliberately one notch below the Python sibling's high precisely because
the signal is coarse. The rule fires on *any* write (including writes to a
hard-coded safe path with no model influence), so a large fraction of hits are not
exploitable. Pairing a low-precision detector with a high severity would
overstate the finding; medium reflects that this is a lead to confirm, not a
near-certain defect.
**Why severity is low:**
This is the weakest detector in the file. It fires on *any* write (including
writes to a hard-coded safe path with no model influence) and has no path-flow
analysis behind it, so a large fraction of hits are not exploitable. Pairing a
low-precision detector with anything above low would overstate a lead that is
about as likely benign as not; low marks it as a prompt to confirm the path or
contents are model-influenced, not a defect.

**Fix type — code:**
Confining writes to a working directory and resolving/validating the final path is
12 changes: 7 additions & 5 deletions docs/Policy/google_adk/error_handling.md
@@ -4,7 +4,7 @@ category: google_adk
topic: error_handling
rules:
- id: ADK-005
severity: medium
severity: low
confidence: 0.6
scope: tool
fix_type: code
@@ -16,7 +16,7 @@ references: [LLM05]
**Policy ID:** `google_adk_error_handling`
**File:** `google_adk/error_handling.yaml`
**Rules:** ADK-005
**Severities:** medium
**Severities:** low
**Fix types:** code
**References:** LLM05

@@ -49,7 +49,7 @@ contract and surfaces a raw exception instead.

## Rule-by-rule defense

### ADK-005 — Tool raises exceptions without a structured error contract (Severity: medium, Confidence: 0.6, Fix type: code)
### ADK-005 — Tool raises exceptions without a structured error contract (Severity: low, Confidence: 0.6, Fix type: code)

**What we detect:** a wrapped-function body with a `raise` and no `try`/`except`.

@@ -59,8 +59,10 @@ recovery contract, breaking ADK's return-a-dict convention.
**Real-world consequence:** a transient fault raised as `ValueError` gives the model
no "retryable" hint; it retries a completed action or abandons a recoverable one.

**Why severity is medium and not high:** reliability/minor-leak rather than a direct
breach; mishandled errors in side-effecting tools still cause real wrong actions.
**Why severity is low:** ADK's return-a-dict convention or an outer wrapper
commonly shapes the error already, so this is a reliability nudge that fires on a
lot of correct code; it stays above noise because mishandled errors in
side-effecting tools can still cause real wrong actions.

**Fix type — code:** wrap the body and return a structured error dict.
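
One possible shape for that fix, sketched in plain Python with no ADK imports. The `retryable` key, the `TransientError` class, the `structured_errors` decorator, and the `lookup_order` tool are all hypothetical names chosen for illustration; only the idea of returning a dict with a status field follows the convention mentioned above.

```python
import functools


class TransientError(Exception):
    """Hypothetical marker for faults that are safe to retry."""


def structured_errors(tool):
    """Wrap a tool so every outcome is a dict the model can act on."""
    @functools.wraps(tool)
    def wrapper(*args, **kwargs):
        try:
            return {"status": "success", "result": tool(*args, **kwargs)}
        except TransientError as exc:
            return {"status": "error", "error_message": str(exc), "retryable": True}
        except Exception as exc:
            # Report only the exception type so internals do not leak to the model.
            return {"status": "error", "error_message": type(exc).__name__, "retryable": False}
    return wrapper


@structured_errors
def lookup_order(order_id: str) -> str:
    if not order_id:
        raise ValueError("order_id must be non-empty")
    if order_id == "busy":
        # Stand-in for an upstream timeout on a read-only call.
        raise TransientError("order service timed out")
    return f"order {order_id}: shipped"
```

Since the `try`/`except` lives in the decorator and not in the tool body, a body-level check like the one described here would presumably still flag `lookup_order`, which is the over-firing case the severity note accounts for.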
