feat(ai_guard): detect project-scope MCP auto-execution (TrustFall) (#145) #153
Merged
This was referenced Jun 16, 2026
Closes #145.
What
Detect the TrustFall class: a repository that commits a project MCP payload (`<repo>/.mcp.json`) together with settings keys that auto-launch it on folder-trust (a 1-click RCE on opening the repo). Detection/posture only; no enforce/deny (that is #100).
How
- `AiGuardReason::ProjectMcpAutoEnabled { mechanism }` (sigil-core), rubric weight 2.5 (`project_mcp_auto_enabled`). Solo → Medium; with a suspicious launcher → High; with destructive → Critical.
- Reads `<repo>/.mcp.json` (`mcpServers`) and routes each server through the existing `emit_one_server` attack-shape scorer (#127, "ai-guard: score MCP stdio launchers by attack shape (shell / transient-path) above the benign baseline"), deduplicated by name with settings `mcpServers` (settings take precedence: a name present in both scores once).
- Triggers on two auto-enable keys: `enableAllProjectMcpServers == true` and a non-empty `enabledMcpjsonServers`. The project parser emits only when a key is present and `.mcp.json` actually ships servers. `permissions.allow: ["mcp__*"]` is intentionally not a trigger (tool-call permission ≠ server pre-approval).
- User-global: `enableAllProjectMcpServers: true` emits on the key alone (blanket pre-approval across every repo).
- Cursor/Gemini (Option B): emits `ProjectMcpAutoEnabled{mechanism:"folder-trust autorun (default)"}` only when the project config produced a local/risky MCP reason; benign remote-only configs do not amplify (no alert fatigue). `ws://` / `wss://` are now classified as remote in `emit_one_server`.
- A repo with only `.mcp.json` (no `.claude/settings.json`) is now discovered (new `discover_claude_repos` union helper, used at both boot and reload) and watched (`.mcp.json` added to the synthetic `WatchTarget` and to `ClaudeCodeProjectParser::watched_paths()`).
Hardening (post-review)
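The fail-soft `.mcp.json` read that this section hardens can be pictured roughly as follows. This is a minimal sketch, not the code in this PR: `PayloadError`, `assess`, and the string-based reasons are hypothetical stand-ins, and the already-parsed payload is passed in as a `Result` so that the example needs no JSON library.

```rust
/// Hypothetical error type standing in for a `.mcp.json` parse failure.
#[derive(Debug)]
pub struct PayloadError(pub String);

/// Hypothetical, simplified assess step.
///
/// `settings_reasons` are reasons already produced from `.claude/settings.json`.
/// `payload` is the outcome of reading `.mcp.json`: server names on success,
/// an error if the file is malformed.
pub fn assess(
    settings_reasons: Vec<String>,
    payload: Result<Vec<String>, PayloadError>,
) -> Vec<String> {
    let mut reasons = settings_reasons;

    // Fail soft: a malformed sidecar degrades to "no payload" instead of
    // being propagated with `?`, which would discard the settings-side reasons.
    let servers = payload.unwrap_or_default();

    for name in servers {
        reasons.push(format!("project-mcp:{name}"));
    }
    reasons
}
```

Under these assumptions, calling `assess` with an `Err` payload returns the settings-side reasons unchanged, which is the property the regression test is described as guarding.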
- `.mcp.json` read: a malformed `.mcp.json` no longer aborts the whole assess and blinds detection of a malicious `.claude/settings.json` in the same repo (corrupt-sidecar evasion seam). Degrades to "no payload"; settings-side reasons still score. Regression test added.
Tests
~16 new tests: payload scoring, the two auto-enable keys, the `mcp__*` negative guard, name dedup, user-global blanket, Option B local-vs-remote for Cursor/Gemini, ws remote, `.mcp.json`-only discovery, watched-paths, corrupt-sidecar resilience, rubric weight/bucket math. Full suite green; `clippy -D warnings` clean.
Out of scope (follow-ups)
- `permissions.allow: ["mcp__*"]` auto-approval signal + cross-scope permissions merge.
- `~/.claude.json` approval-state scan.
🤖 Generated with Claude Code
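For reviewers, the auto-enable trigger rules listed under How can be sketched as a small decision function. This is an illustration under stated assumptions rather than the parser code itself: `McpSettings`, `Scope`, and `auto_enable_mechanism` are hypothetical names, and the sketch assumes the user-global scope considers only the blanket key, since that is the only user-global rule the description states.

```rust
/// Hypothetical, simplified view of the settings keys named in this PR.
#[derive(Default)]
pub struct McpSettings {
    /// Mirrors `enableAllProjectMcpServers`.
    pub enable_all_project_mcp_servers: bool,
    /// Mirrors `enabledMcpjsonServers`.
    pub enabled_mcpjson_servers: Vec<String>,
    /// Mirrors `permissions.allow`; deliberately ignored by the trigger logic.
    pub permissions_allow: Vec<String>,
}

/// Where the settings file lives (hypothetical enum for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Scope {
    Project,
    UserGlobal,
}

/// Returns the name of the key that triggers the reason, or `None`.
///
/// `payload_servers` is the number of servers shipped in `<repo>/.mcp.json`.
pub fn auto_enable_mechanism(
    scope: Scope,
    settings: &McpSettings,
    payload_servers: usize,
) -> Option<&'static str> {
    let blanket = settings.enable_all_project_mcp_servers;
    let allow_list = !settings.enabled_mcpjson_servers.is_empty();

    match scope {
        // User-global: the blanket key emits on its own, with no payload check.
        Scope::UserGlobal => blanket.then_some("enableAllProjectMcpServers"),
        // Project: a key must be present AND `.mcp.json` must ship servers.
        Scope::Project => {
            if payload_servers == 0 {
                None
            } else if blanket {
                Some("enableAllProjectMcpServers")
            } else if allow_list {
                Some("enabledMcpjsonServers")
            } else {
                // `permissions.allow: ["mcp__*"]` alone is not a trigger.
                None
            }
        }
    }
}
```

In this sketch, a project config with only `permissions.allow: ["mcp__*"]` yields `None` even when `.mcp.json` ships servers, matching the negative guard listed under Tests.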