Summary
Add configurable, conservative limits to the flow-file loaders (max file size, max step count, max nesting depth, max string length) so that a malformed or hostile .flow.yaml/.flow.json cannot exhaust memory or CPU before validation. Exceeding a limit raises a typed FlowSerializationError.
Why this matters
Flow files are the primary untrusted input surface: they arrive from repositories, contributor PRs validated by the GitHub Action, and generated drafts. yaml.safe_load prevents code execution, but YAML alias expansion, deeply nested mappings, and giant strings can still cause pathological resource use before Pydantic validation runs. Bounding the input defends the CI Action and any host that loads third-party flows. Pairs with the adversarial corpus (issue 15).
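As a rough illustration of why a small file can still be expensive: a YAML loader typically resolves an alias to a reference to the same parsed object, so the in-memory graph stays small while any code that walks it (validation, serialization, copying) visits the shared part once per reference. The sketch below counts nodes the way such a walk would see them and stops at a budget. `count_expanded_nodes`, `NodeBudgetExceeded`, and `nested_aliases` are hypothetical names used only for this example; they are not part of chainweaver.

```python
from typing import Any


class NodeBudgetExceeded(Exception):
    """Hypothetical error: the expanded document has more nodes than allowed."""


def count_expanded_nodes(doc: Any, budget: int) -> int:
    """Count nodes as a naive tree walk would see them, stopping at `budget`.

    Shared sub-objects are counted once per reference, which matches the
    work done by code that traverses the parsed document.
    """
    seen = 0
    stack = [doc]
    while stack:
        node = stack.pop()
        seen += 1
        if seen > budget:
            raise NodeBudgetExceeded(f"more than {budget} nodes after expansion")
        if isinstance(node, dict):
            stack.extend(node.values())
        elif isinstance(node, (list, tuple)):
            stack.extend(node)
    return seen


def nested_aliases(levels: int, fanout: int) -> list:
    """Test fixture: the object graph produced by layered references."""
    layer: list = ["x"] * fanout
    for _ in range(levels):
        layer = [layer] * fanout  # `fanout` references to one shared list
    return layer
```

With two levels and a fan-out of three, the walk sees 40 nodes; each extra level multiplies that by roughly the fan-out, which is why the budget check has to run during the walk rather than after it.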
Current evidence
chainweaver/serialization.py uses yaml.safe_load and raises FlowSerializationError, but reads the whole file and parses it before any size/shape bound (verified by reading the loader).
.github/actions/chainweaver validates arbitrary contributor flow files in CI — a direct untrusted-input path.
External context
Bounding input size and depth is standard defensive practice for parsers that consume untrusted documents; YAML billion-laughs-style alias expansion is a known class of resource-exhaustion input.
Proposed implementation
Add limits as loader parameters with conservative defaults (e.g. max bytes, max steps, max depth, max string length), overridable via chainweaver validate/check flags and the library API.
Enforce size before reading fully (or stream-bounded read); enforce step/depth/string limits during/after parse, raising FlowSerializationError with the offending limit named.
For YAML, disable or bound alias expansion if safe_load permits unbounded expansion.
Wire the GitHub Action to use the same defaults and surface a clear annotation.
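The steps above could be sketched roughly as follows for the JSON path, using only the standard library. This is an assumption-laden outline, not the existing loader: `LoadLimits`, `read_bounded`, `check_shape`, and `load_flow_json` are hypothetical names, the default values are placeholders to be replaced by measured ones, the top-level `steps` key is an assumed schema detail, and `FlowSerializationError` is redefined locally only so the snippet runs on its own.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Any


class FlowSerializationError(Exception):
    """Local stand-in for the project's exception so the sketch runs alone."""


@dataclass(frozen=True)
class LoadLimits:
    # Placeholder defaults; real values should be measured against examples.
    max_bytes: int = 1_000_000
    max_steps: int = 1_000
    max_depth: int = 32
    max_string_length: int = 65_536


def read_bounded(path: Path, limits: LoadLimits) -> bytes:
    """Read at most max_bytes + 1 bytes so an oversized file is never fully read."""
    with path.open("rb") as fh:
        data = fh.read(limits.max_bytes + 1)
    if len(data) > limits.max_bytes:
        raise FlowSerializationError(f"max_bytes exceeded (limit {limits.max_bytes})")
    return data


def check_shape(doc: Any, limits: LoadLimits) -> None:
    """Iteratively enforce container depth and string length on parsed data."""
    stack = [(doc, 1)]
    while stack:
        node, depth = stack.pop()
        if isinstance(node, str):
            if len(node) > limits.max_string_length:
                raise FlowSerializationError(
                    f"max_string_length exceeded (limit {limits.max_string_length})"
                )
            continue
        if isinstance(node, dict):
            children = list(node.keys()) + list(node.values())
        elif isinstance(node, list):
            children = node
        else:
            continue  # numbers, booleans, null
        if depth > limits.max_depth:
            raise FlowSerializationError(f"max_depth exceeded (limit {limits.max_depth})")
        stack.extend((child, depth + 1) for child in children)


def load_flow_json(path: Path, limits: LoadLimits = LoadLimits()) -> Any:
    data = read_bounded(path, limits)
    try:
        doc = json.loads(data)
    except (ValueError, RecursionError) as exc:
        raise FlowSerializationError(f"invalid JSON: {exc}") from exc
    check_shape(doc, limits)
    # Assumed schema: steps live in a top-level "steps" list.
    steps = doc.get("steps", []) if isinstance(doc, dict) else []
    if isinstance(steps, list) and len(steps) > limits.max_steps:
        raise FlowSerializationError(f"max_steps exceeded (limit {limits.max_steps})")
    return doc
```

The shape check uses an explicit stack rather than recursion so that a deeply nested document cannot hit the interpreter's recursion limit inside the check itself. A YAML path would additionally need a node budget during the walk, because shared references from aliases make the traversal larger than the file.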
AI-agent execution notes
Inspect chainweaver/serialization.py, chainweaver/cli.py (validate/check), .github/actions/chainweaver/annotate.py, chainweaver/exceptions.py. Coordinate with the adversarial-corpus issue (15) — the resource-shaped cases there are the regression net. Keep defaults conservative but not so tight that legitimate large flows break (measure against the biggest in-repo example). Pure, no network. Frame defensively; no exploit detail in docs.
Acceptance criteria
Files exceeding any limit fail fast with a typed FlowSerializationError that names the limit, in under roughly 2 seconds for the corpus resource cases.
Limits are configurable via API and CLI with documented conservative defaults.
Legitimate existing example flows load unchanged.
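For the CLI side of these criteria, one possible shape is sketched below with argparse. The flag names (`--max-bytes`, `--max-steps`, `--max-depth`, `--max-string-length`) and the default values are assumptions made for illustration, and the real chainweaver CLI may be built on a different framework; only the `validate` and `check` subcommand names come from this issue.

```python
import argparse

# Placeholder defaults; the documented values should come from measurement.
DEFAULTS = {
    "max_bytes": 1_000_000,
    "max_steps": 1_000,
    "max_depth": 32,
    "max_string_length": 65_536,
}


def positive_int(text: str) -> int:
    """argparse type: accept only integers greater than zero."""
    value = int(text)
    if value <= 0:
        raise argparse.ArgumentTypeError("limit must be a positive integer")
    return value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="chainweaver")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("validate", "check"):
        cmd = sub.add_parser(name)
        cmd.add_argument("path")
        for key, default in DEFAULTS.items():
            cmd.add_argument(
                "--" + key.replace("_", "-"),
                type=positive_int,
                default=default,
                help=f"override the {key} limit (default: {default})",
            )
    return parser
```

Sharing one `DEFAULTS` mapping between the library API, the CLI, and the GitHub Action would keep the documented defaults in a single place.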
Test plan
Negative tests for each limit (oversized file, 10k steps, deep nesting, huge string), timing assertion, regression that real examples still load, GitHub Action annotation test.
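A sketch of how the negative cases and the timing assertion might be expressed, independent of the final loader signature. The payload builders and `assert_fails_fast` are hypothetical helpers; the `steps`/`id` keys are an assumed schema; and `FlowSerializationError` is redefined locally only so the snippet is self-contained. The helper takes any callable loader, so it could wrap the real one once its API is settled.

```python
import json
import time
from typing import Callable


class FlowSerializationError(Exception):
    """Local stand-in for the project's exception so the sketch runs alone."""


def ten_thousand_steps() -> str:
    # Assumed schema: a top-level "steps" list of mappings.
    return json.dumps({"steps": [{"id": f"s{i}"} for i in range(10_000)]})


def deep_nesting(levels: int = 5_000) -> str:
    # Built as text, since serializing such a structure would itself recurse.
    return "[" * levels + "]" * levels


def huge_string(length: int = 5_000_000) -> str:
    return json.dumps({"name": "a" * length})


def assert_fails_fast(
    loader: Callable[[str], object],
    payload: str,
    limit_name: str,
    budget_s: float = 2.0,
) -> None:
    """Assert the loader rejects `payload`, names the limit, and does so quickly."""
    start = time.perf_counter()
    try:
        loader(payload)
    except FlowSerializationError as exc:
        elapsed = time.perf_counter() - start
        assert limit_name in str(exc), f"error does not name {limit_name}: {exc}"
        assert elapsed < budget_s, f"rejection took {elapsed:.2f}s"
        return
    raise AssertionError(f"loader accepted a payload that should hit {limit_name}")
```

Wall-clock assertions can be flaky on shared CI runners, so the 2-second budget may need some headroom or a retry policy in practice.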
Documentation plan
docs/security.md, docs/cli.md (new flags), CHANGELOG (security note framed defensively).
Migration and compatibility notes
Not expected to require migration if defaults are above realistic flow sizes; document the defaults and how to raise them.
Risks and tradeoffs
Too-tight defaults reject legitimate large flows (mitigate by measuring and documenting overrides); limits add a few parameters to the loader surface.
Suggested labels
security, reliability, testing