feat(ai_guard): scan agent instruction files for injection directives (#146)#155

Merged
Ju571nK merged 7 commits into main from feat/instruction-file-scan-146
Jun 13, 2026

Ju571nK (Owner) commented Jun 13, 2026

Closes #146.

What

Static-scan repo-local agent instruction files for high-signal exec / prompt-injection directives. Agentic coding assistants treat these files as trusted configuration, so an attacker who lands a malicious CLAUDE.md / AGENTS.md / .cursorrules in a repo can embed directives the agent will follow (taxonomy D2.1). Detection/posture only — advisory findings, never a hook block (avoids false-positive friction on legitimate instruction files).

Scanned files → parser

| File | Parser |
| --- | --- |
| CLAUDE.md (project + ~/.claude/CLAUDE.md) | Claude |
| AGENTS.md (project) | Claude and Codex (AGENTS.md is Codex's first-class instruction file, so there is no Codex-only blind spot) |
| .cursorrules, .cursor/rules/* | Cursor |

.github/copilot-instructions.md is deferred (no Copilot parser yet) — tracked separately.
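
The file-to-parser mapping above can be read as a small routing function. The following is a minimal sketch, not the PR's code: `Parser` and `parsers_for` are hypothetical names introduced here for illustration, and the real parsers may route files differently.

```rust
use std::path::Path;

/// Hypothetical enum naming the per-tool parsers from the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Parser {
    Claude,
    Codex,
    Cursor,
}

/// Hypothetical helper: which parsers should scan the instruction file at `path`?
fn parsers_for(path: &Path) -> Vec<Parser> {
    let name = path.file_name().and_then(|n| n.to_str()).unwrap_or("");
    // `.cursor/rules/*`: any file whose parent directory ends in `.cursor/rules`.
    let in_cursor_rules = path
        .parent()
        .map_or(false, |p| p.ends_with(".cursor/rules"));
    match name {
        _ if in_cursor_rules => vec![Parser::Cursor],
        "CLAUDE.md" => vec![Parser::Claude],
        // AGENTS.md is scanned on behalf of both tools.
        "AGENTS.md" => vec![Parser::Claude, Parser::Codex],
        ".cursorrules" => vec![Parser::Cursor],
        // Anything else, including the deferred Copilot file, is not scanned.
        _ => Vec::new(),
    }
}
```

Under these assumptions, `repo/AGENTS.md` is routed to both the Claude and Codex parsers, while `.github/copilot-instructions.md` is routed to none.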

How

  • New AiGuardReason::InstructionFileDirective { directive_kind, path, snippet } (sigil-core), kinds FetchPipe / Destructive / Obfuscation / OverrideMarker. Rubric weights 3.0 / 3.0 / 2.5 / 2.0 (16 → 20 kinds). Posture-only.
  • Shared instruction_scan module (single source, called by per-tool parsers, mirrors mcp_scan). Four line-oriented detectors:
    • FetchPipe: curl/wget … | (sudo) sh, bash <(curl …), sh -c "$(curl …)", … | python. Checked before generic destructive so rm -rf /; curl x | sh classifies as the higher-signal FetchPipe.
    • Destructive: reuses rubric::first_destructive_pattern (rm -rf, dd, mkfs, …) for non-fetch lines.
    • Obfuscation: base64 -d / atob( / eval in execution context (bare eval/base64 prose excluded).
    • OverrideMarker: tight injection-phrase set (ignore previous instructions, disregard the above, do not tell the user, override your safety, …).
    • One reason per category per file (anti-flood), trailing-backslash continuation-line join, snippet ≤ 120 chars.
  • read_text_optional (NotFound/empty/whitespace → None, 4 MiB scan cap, defensive: a read error on one instruction file logs + skips, never aborts the whole assess).
  • Discovery: discover_claude_repos unions CLAUDE.md/AGENTS.md markers; new discover_codex_repos (.codex/config.toml OR AGENTS.md); new discover_cursor_repos (.cursor/mcp.json OR .cursorrules OR .cursor/rules dir). Used at both boot and reload — so an instruction-file-only repo is discovered.
  • Watch: synthetic WatchTargets for the instruction files, including a dedicated .cursor/rules/* glob target (a glob, not the bare directory, so the normalizer keeps child-file events; watched_paths() returns the dir for the dispatcher's starts_with routing).
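
The detector behaviour described in the list above (FetchPipe checked before Destructive, one reason per category per file, continuation-line join, 120-character snippet cap) can be sketched roughly as follows. This is an illustrative approximation only: `DirectiveKind`, `join_continuations`, `classify`, and `scan` are hypothetical names, the substring checks stand in for the real detectors (the actual patterns and `rubric::first_destructive_pattern` are not shown in this PR description), and the `eval` execution-context check is omitted.

```rust
use std::collections::BTreeMap;

/// Hypothetical mirror of the four directive kinds named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum DirectiveKind {
    FetchPipe,
    Destructive,
    Obfuscation,
    OverrideMarker,
}

const SNIPPET_MAX: usize = 120;

/// Join each line that ends in a trailing backslash with the line after it.
fn join_continuations(text: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut buf = String::new();
    for line in text.lines() {
        let trimmed = line.trim_end();
        if let Some(head) = trimmed.strip_suffix('\\') {
            buf.push_str(head);
            buf.push(' ');
        } else {
            buf.push_str(trimmed);
            out.push(std::mem::take(&mut buf));
        }
    }
    if !buf.is_empty() {
        out.push(buf);
    }
    out
}

/// Simplified stand-in detectors (assumption: plain substring checks).
fn classify(line: &str) -> Option<DirectiveKind> {
    let l = line.to_lowercase();
    let fetches = l.contains("curl") || l.contains("wget");
    let to_shell = ["| sh", "| bash", "| sudo sh", "| python"]
        .iter()
        .any(|p| l.contains(*p));
    // FetchPipe is checked first so `rm -rf /; curl x | sh` is not Destructive.
    if fetches && (to_shell || l.contains("<(curl") || l.contains("$(curl")) {
        return Some(DirectiveKind::FetchPipe);
    }
    if ["rm -rf", "mkfs", "dd if="].iter().any(|p| l.contains(*p)) {
        return Some(DirectiveKind::Destructive);
    }
    if l.contains("base64 -d") || l.contains("atob(") {
        return Some(DirectiveKind::Obfuscation);
    }
    let phrases = [
        "ignore previous instructions",
        "disregard the above",
        "do not tell the user",
        "override your safety",
    ];
    if phrases.iter().any(|p| l.contains(*p)) {
        return Some(DirectiveKind::OverrideMarker);
    }
    None
}

/// Scan one file's text, keeping only the first hit per category.
fn scan(text: &str) -> BTreeMap<DirectiveKind, String> {
    let mut found = BTreeMap::new();
    for line in join_continuations(text) {
        if let Some(kind) = classify(&line) {
            // Cap the snippet at SNIPPET_MAX characters.
            let snippet: String = line.trim().chars().take(SNIPPET_MAX).collect();
            found.entry(kind).or_insert(snippet);
        }
    }
    found
}
```

Keying the results by kind is one simple way to get the anti-flood property: a file with many fetch-pipe lines still yields a single FetchPipe entry, carrying the first matching snippet.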

Tests

~20 new tests: detector positives (fetch-pipe variants, override phrases, exec-context obfuscation), benign-file negative (no FP on a normal guide with a URL), rm; curl|sh → FetchPipe classification, one-per-category cap, continuation join, snippet cap, per-parser scanning, instruction-only-repo discovery, the .cursor/rules/* glob-matches-child / bare-dir-does-not matcher proof, rubric weight/bucket, serde round-trip. Full suite green; clippy -D warnings clean.
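
The glob-matches-child / bare-dir-does-not property that the matcher test checks can be illustrated with a toy model. This is a sketch under stated assumptions: `WatchPattern`, `matches`, and `watched_path` are hypothetical names and do not reproduce the project's `WatchTarget` or `watched_paths()` API.

```rust
use std::path::Path;

/// Hypothetical watch pattern: an exact path, or the direct children of a directory.
enum WatchPattern<'a> {
    Exact(&'a Path),
    /// Models a `dir/*` glob.
    ChildrenOf(&'a Path),
}

impl WatchPattern<'_> {
    /// Does a filesystem event at `event` belong to this pattern?
    fn matches(&self, event: &Path) -> bool {
        match self {
            WatchPattern::Exact(p) => event == *p,
            WatchPattern::ChildrenOf(dir) => event.parent() == Some(*dir),
        }
    }

    /// Path a prefix-routing dispatcher would compare against with `starts_with`.
    fn watched_path(&self) -> &Path {
        match self {
            WatchPattern::Exact(p) | WatchPattern::ChildrenOf(p) => *p,
        }
    }
}
```

In this model, an event for `repo/.cursor/rules/style.mdc` matches `ChildrenOf("repo/.cursor/rules")` but not `Exact("repo/.cursor/rules")`, while `watched_path()` still returns the directory so that `starts_with` routing finds the child.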

Out of scope (follow-ups)

🤖 Generated with Claude Code

Ju571nK self-assigned this Jun 13, 2026
Ju571nK merged commit 6440fe4 into main Jun 13, 2026
5 checks passed
Development

Successfully merging this pull request may close these issues.

Scan agent instruction files (.cursorrules / copilot-instructions / AGENTS.md / CLAUDE.md) for exec+egress directives