Skip to content

fix: eliminate cubic candidate-set merge blowup (scan DoS, wardline-c797baf28b)#58

Closed
tachyon-beep wants to merge 35 commits into
mainfrom
fix/candidate-set-merge-dos
Closed

fix: eliminate cubic candidate-set merge blowup (scan DoS, wardline-c797baf28b)#58
tachyon-beep wants to merge 35 commits into
mainfrom
fix/candidate-set-merge-dos

Conversation

@tachyon-beep

Copy link
Copy Markdown
Collaborator

Summary

Fixes wardline-c797baf28b (Codex security finding, DoS): an attacker-authored .py file with a chain of one-armed branches rebinding the same name to a fresh lambda drove the Level-2 branch-join merge to O(N³) — ~1100 branches = ~15s, exhausting CPU on every local and CI scan (this is the only finding that hit the default gate, no opt-in).

Root cause

Both branch-join merges in src/wardline/scanner/taint/variable_level.py deduplicated candidates with a nested linear scan of the growing candidate list:

  • _merge_branch_bindings: any(lam is seen for seen in bucket) (lambda bodies)
  • _merge_branch_types: if fqn not in bucket (receiver-class FQNs)

That is O(bucket) per insert → O(bucket²) per merge. Across N one-armed branches the candidate set grows to N over N merges → O(N³).

Fix

Both merges now dedup via a per-name identity/equality set → O(1) per insert, O(bucket) per merge, O(N²) cumulative. The change is behavior-identical — same candidate set, same first-seen insertion order, same dedup semantics — so no false negative is introduced. A cap was deliberately avoided: dropping candidates would be an FN, against this module's no-FN discipline (mirrors the d7ecb908 precedent of eliminating complexity soundly rather than capping).

Every branch join (if/while/for/match/try) routes through these two functions — all paths fixed.

Measured

shape before after
1100-branch lambda PoC 4.388s 0.080s (55×)
1100-branch var-type sibling 1.015s 0.037s (27×)

Tests

  • test_lambda_candidate_merge_is_not_cubic_on_chained_rebinds — DoS regression guard (hardware-independent scaling-ratio: ~8× quadratic vs ~23× cubic from N=700→2000, plus a loose absolute backstop).
  • test_var_type_candidate_merge_is_not_cubic_on_chained_rebinds — sibling guard.
  • test_chained_one_armed_rebinds_keep_every_lambda_candidate — soundness lock: a sink-lambda in the first of 40 one-armed branches still fires post-branch (no cap, no coalescing of distinct lambdas).

Full scanner suite: 1363 passed. ruff + mypy clean. wardline self-gate exit 0.

Review

4-reviewer panel (static-analysis soundness, Python correctness, security/threat, test-suite). All APPROVE / APPROVE-WITH-NITS. The static-analysis reviewer differentially verified behavior-equivalence over 40k randomized trials (no FN, no FP, no output drift). Review nits folded in: scaling-ratio assertion, corrected no-FN-lock comment, module-level import time.

Follow-up (not in this PR)

The threat review found adjacent, distinct super-linear DoS vectors in the same L2 walk that this fix does not address (separate root causes): residual O(N²) at larger N, a lambda-free O(S×M) per-statement-snapshot vector, a loop-fixpoint×snapshot ~O(M³) path, and the absence of any coarse per-function/scan budget. Tracked in wardline-3980f94dac (lead remedy: a loud-degrade budget, designed against the no-FN discipline).

🤖 Generated with Claude Code

John Morrissey and others added 30 commits June 21, 2026 16:58
The waiver_add entity_symbol path resolves a qualname through Loomweave
(SeiResolver.detect + resolve_locator) before writing the waiver — an
outbound/loopback network side effect. But the tool declared only
READ|WRITE and _effective_tool_capabilities had no waiver_add branch, so
ToolPolicy never denied it under allow_network=false: any client allowed
the write tools could trigger signed Loomweave requests despite the
network being fenced off.

Add a waiver_add branch that declares NETWORK under the exact predicate
that fires the resolve — entity_symbol present, entity_id absent (entity_id
wins and is carried opaque), and a Loomweave URL configured — mirroring the
scan/explain_taint/dossier gates. Also align the handler so it only builds
the Loomweave client when entity_symbol and not entity_id, matching the
declared side effect. resolve_entity_binding_input is the sole network path
in _waiver_add (add_waiver is a local FS write); the gate is a precise,
fail-closed match.

Regression tests: entity_symbol under no-network policy is denied before the
handler runs; entity_id-only and entity_id-wins-over-symbol stay ungated.

Closes wardline-14359d070b.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Level-2 branch-join merges deduplicated candidates with a nested
linear scan of the growing candidate list: `any(lam is seen for seen in
bucket)` in _merge_branch_bindings and `if fqn not in bucket` in
_merge_branch_types. That is O(bucket) per insert, O(bucket**2) per
merge. Across a chain of N one-armed branches rebinding the same name
(`if flagK: cb = lambda c: sinkK(c)`), the candidate set grows to N over
N merges -> O(N**3). An attacker-authored file with ~1100 such branches
drove a DEFAULT-gate scan to ~15s, exhausting CPU on every local and CI
run (wardline-c797baf28b).

Both merges now dedup via a per-name identity/equality set: O(1) per
insert, O(bucket) per merge, O(N**2) cumulative. The change is
behavior-identical -- same candidate set, same first-seen insertion
order, same dedup semantics -- so no false negative is introduced. A cap
was deliberately avoided: dropping candidates would be an FN, against
this module's no-FN discipline; this mirrors the d7ecb90 precedent of
eliminating the complexity soundly rather than capping.

The 1100-branch PoC drops 4.388s -> 0.080s (55x); the var_types sibling
1.015s -> 0.037s (27x). Full scanner suite (1363 tests) stays green.

Tests:
- test_lambda_candidate_merge_is_not_cubic_on_chained_rebinds (DoS guard)
- test_var_type_candidate_merge_is_not_cubic_on_chained_rebinds (sibling)
- test_chained_one_armed_rebinds_keep_every_lambda_candidate (no-FN lock)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@tachyon-beep tachyon-beep deleted the fix/candidate-set-merge-dos branch June 24, 2026 05:54

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5924d7f6b2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

# to — and the one whose auth the filigree-auth check should probe. Without fix,
# repair is a no-op and this is just the recorded emit target.
probe_url = _resolve_probe_url(root, filigree_url)
probe_target = _resolve_probe_target(root, filigree_url)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve published-port provenance before auth probing

When MCP doctor passes an already-resolved Filigree URL, this call treats it as a caller flag and _resolve_probe_target() sets token_probe_allowed=True, even if filigree_url_source says it came from the repo-controlled .weft/filigree/ephemeral.port. In that MCP launch-with-auto-discovery scenario the guard in _check_filigree_auth() is bypassed and the Filigree bearer can be probed against an arbitrary loopback port named by the checkout; carry the source through here or construct a _ProbeTarget that keeps published-port targets non-probed.

Useful? React with 👍 / 👎.

# ``scan`` advertises the conservative possible-effects superset in
# tools/list, but hardened runtime policy should still allow a purely
# local scan when no integration URL resolves.
capabilities = {ToolCapability.READ} if tool.name == "scan" else set(tool.capabilities)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Require WRITE for cache-backed MCP scans

When a read-only MCP server handles scan with no Filigree/Loomweave URL but with cache_dir, this downgrade leaves the effective capability as READ, so the policy allows the call even though _scan() passes that cache dir into run_scan() and SummaryCache.load()/save() creates the directory and, when WARDLINE_SUMMARY_CACHE_KEY is set, writes cache JSON files. Include cache_dir in the scan write-capability check so read-only mode cannot mutate the checkout.

Useful? React with 👍 / 👎.

tree.root_node.children, file, attr_dir=file_dir, nest_base=stem_base, prefix=(), out=file_mounts
)
mounts.extend(file_mounts)
except Exception as exc: # noqa: BLE001 - hostile source must not crash the scan

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Let missing Rust parser fail before per-file downgrade

When the wardline[rust] tree-sitter extra is not installed, parse_rust() raises RustToolingError, but this broad overlay catch now consumes it whenever _build_overlays() supplies error_callback for normal in-src crate files. The scan then returns WLN-ENGINE-FILE-FAILED plus coverage instead of the install-hint tool error, so wardline scan --lang rust can look like a completed scan (and may exit 0 without a gate threshold); re-raise RustToolingError/setup failures before downgrading hostile source errors.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant