feat(ai_guard): scan agent instruction files for injection directives (#146) by Ju571nK · Pull Request #155 · Ju571nK/sigil

Ju571nK · 2026-06-13T07:24:11Z

Closes #146.

What

Statically scan agent instruction files (repo-local, plus the user-level ~/.claude/CLAUDE.md) for high-signal exec / prompt-injection directives. Agentic coding assistants treat these files as trusted configuration, so an attacker who lands a malicious CLAUDE.md / AGENTS.md / .cursorrules in a repo can embed directives the agent will follow (taxonomy D2.1). Detection/posture only: findings are advisory and never block at the hook, which avoids false-positive friction on legitimate instruction files.

Scanned files → parser

| File | Parser |
| --- | --- |
| `CLAUDE.md` (project + `~/.claude/CLAUDE.md`) | Claude |
| `AGENTS.md` (project) | Claude and Codex (AGENTS.md is Codex's first-class instruction file, so there is no Codex-only blind spot) |
| `.cursorrules`, `.cursor/rules/*` | Cursor |

.github/copilot-instructions.md is deferred (no Copilot parser yet) — tracked separately.

How

New AiGuardReason::InstructionFileDirective { directive_kind, path, snippet } (sigil-core), kinds FetchPipe / Destructive / Obfuscation / OverrideMarker. Rubric weights 3.0 / 3.0 / 2.5 / 2.0 (16 → 20 kinds). Posture-only.
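For orientation, the new reason could look roughly like the sketch below. The field types (`PathBuf`, `String`), the derive lists, and the `directive_weight` helper are assumptions for illustration, not the definitions in sigil-core, and the weights are mapped to kinds on the assumption that both lists above are in the same order.

```rust
use std::path::PathBuf;

/// Directive categories named in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DirectiveKind {
    FetchPipe,
    Destructive,
    Obfuscation,
    OverrideMarker,
}

/// Illustrative stand-in for the new variant; the real enum has many more variants,
/// and the field types here are assumed.
#[derive(Debug, Clone, PartialEq)]
pub enum AiGuardReason {
    InstructionFileDirective {
        directive_kind: DirectiveKind,
        path: PathBuf,
        snippet: String,
    },
}

/// Hypothetical helper: rubric weight per kind, assuming the weights
/// 3.0 / 3.0 / 2.5 / 2.0 follow the order in which the kinds are listed.
pub fn directive_weight(kind: DirectiveKind) -> f32 {
    match kind {
        DirectiveKind::FetchPipe => 3.0,
        DirectiveKind::Destructive => 3.0,
        DirectiveKind::Obfuscation => 2.5,
        DirectiveKind::OverrideMarker => 2.0,
    }
}
```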
Shared instruction_scan module (single source, called by per-tool parsers, mirrors mcp_scan). Four line-oriented detectors:
- FetchPipe — curl/wget … | (sudo) sh, bash <(curl …), sh -c "$(curl …)", … | python. Checked before generic destructive so rm -rf /; curl x | sh classifies as the higher-signal FetchPipe.
- Destructive — reuses rubric::first_destructive_pattern (rm -rf, dd, mkfs, …) for non-fetch lines.
- Obfuscation — base64 -d / atob( / eval in execution context (bare eval/base64 prose excluded).
- OverrideMarker — tight injection-phrase set (ignore previous instructions, disregard the above, do not tell the user, override your safety, …).
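- Shared behavior across detectors: at most one reason per category per file (anti-flood), lines ending in a trailing backslash are joined with the next line before matching, and snippets are capped at 120 chars.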
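The detector flow above can be approximated with the standard library alone. This is a minimal sketch, not the `instruction_scan` module itself: plain substring checks stand in for whatever patterns the real detectors use, the pattern lists are deliberately tiny, and every name here (`Kind`, `scan`, `classify`, `join_continuations`, `pipes_into_interpreter`) is hypothetical.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Kind {
    FetchPipe,
    Destructive,
    Obfuscation,
    OverrideMarker,
}

const SNIPPET_MAX: usize = 120;

/// Join each line that ends in a backslash with the line that follows it.
fn join_continuations(text: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut cur = String::new();
    for line in text.lines() {
        let trimmed = line.trim_end();
        if let Some(head) = trimmed.strip_suffix('\\') {
            cur.push_str(head);
            cur.push(' ');
        } else {
            cur.push_str(trimmed);
            out.push(std::mem::take(&mut cur));
        }
    }
    if !cur.is_empty() {
        out.push(cur);
    }
    out
}

/// True when some pipe segment starts with (optionally sudo-prefixed) sh/bash/python.
fn pipes_into_interpreter(l: &str) -> bool {
    l.split('|').skip(1).any(|seg| {
        let mut words = seg.split_whitespace();
        let first = match words.next() {
            Some("sudo") => words.next(),
            w => w,
        };
        matches!(first, Some("sh" | "bash" | "python" | "python3"))
    })
}

/// Classify one logical line; FetchPipe is checked before Destructive.
fn classify(line: &str) -> Option<Kind> {
    let l = line.to_lowercase();
    let fetches = l.contains("curl ") || l.contains("wget ");
    let substituted = l.contains("<(curl") || l.contains("$(curl");
    if fetches && (pipes_into_interpreter(&l) || substituted) {
        return Some(Kind::FetchPipe);
    }
    if ["rm -rf", "mkfs", "dd if="].iter().any(|p| l.contains(*p)) {
        return Some(Kind::Destructive);
    }
    if ["base64 -d", "atob(", "eval("].iter().any(|p| l.contains(*p)) {
        return Some(Kind::Obfuscation);
    }
    let markers = [
        "ignore previous instructions",
        "disregard the above",
        "do not tell the user",
        "override your safety",
    ];
    if markers.iter().any(|p| l.contains(*p)) {
        return Some(Kind::OverrideMarker);
    }
    None
}

/// Scan a file's text, keeping the first hit per category and capping each snippet.
pub fn scan(text: &str) -> Vec<(Kind, String)> {
    let mut seen = BTreeSet::new();
    let mut findings = Vec::new();
    for line in join_continuations(text) {
        if let Some(kind) = classify(&line) {
            if seen.insert(kind) {
                let snippet: String = line.trim().chars().take(SNIPPET_MAX).collect();
                findings.push((kind, snippet));
            }
        }
    }
    findings
}
```

Checking FetchPipe first is what makes a mixed line such as `rm -rf /tmp/x; curl … | sh` report as the higher-signal category, and the `seen` set is the anti-flood cap.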
read_text_optional (NotFound/empty/whitespace → None, 4 MiB scan cap, defensive: a read error on one instruction file logs + skips, never aborts the whole assess).
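A sketch of how such a reader might behave, under stated assumptions: the signature below is a guess, the PR does not say whether an oversized file is truncated or skipped (this version truncates at the cap), and `read_or_skip` is a hypothetical wrapper showing the log-and-skip handling.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// 4 MiB scan cap, as stated in the PR description.
const SCAN_CAP_BYTES: u64 = 4 * 1024 * 1024;

/// Returns Ok(None) for a missing, empty, or whitespace-only file.
/// Assumption: content past the cap is ignored rather than rejected.
pub fn read_text_optional(path: &Path) -> io::Result<Option<String>> {
    let file = match File::open(path) {
        Ok(f) => f,
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(None),
        Err(e) => return Err(e),
    };
    let mut buf = Vec::new();
    file.take(SCAN_CAP_BYTES).read_to_end(&mut buf)?;
    // Lossy decoding keeps the scan going on files with stray non-UTF-8 bytes.
    let text = String::from_utf8_lossy(&buf).into_owned();
    if text.trim().is_empty() {
        Ok(None)
    } else {
        Ok(Some(text))
    }
}

/// Hypothetical caller-side wrapper: a read error is logged and the file is
/// skipped, so one bad instruction file cannot abort the whole assessment.
pub fn read_or_skip(path: &Path) -> Option<String> {
    match read_text_optional(path) {
        Ok(text) => text,
        Err(err) => {
            eprintln!("skipping {}: {err}", path.display());
            None
        }
    }
}
```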
Discovery: discover_claude_repos unions CLAUDE.md/AGENTS.md markers; new discover_codex_repos (.codex/config.toml OR AGENTS.md); new discover_cursor_repos (.cursor/mcp.json OR .cursorrules OR .cursor/rules dir). Used at both boot and reload — so an instruction-file-only repo is discovered.
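The marker rules for the two new discovery functions reduce to simple existence checks. The helpers below are hypothetical and only test a single candidate root; how the real `discover_*` functions enumerate candidate directories is not described here.

```rust
use std::path::Path;

/// Codex marker: `.codex/config.toml` OR `AGENTS.md` at the repo root.
pub fn has_codex_marker(root: &Path) -> bool {
    root.join(".codex/config.toml").is_file() || root.join("AGENTS.md").is_file()
}

/// Cursor marker: `.cursor/mcp.json` OR `.cursorrules` OR a `.cursor/rules` directory.
pub fn has_cursor_marker(root: &Path) -> bool {
    root.join(".cursor/mcp.json").is_file()
        || root.join(".cursorrules").is_file()
        || root.join(".cursor/rules").is_dir()
}
```

With checks of this shape, a repo that contains only an instruction file and no tool config still counts as a match.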
Watch: synthetic WatchTargets for the instruction files, including a dedicated .cursor/rules/* glob target (a glob, not the bare directory, so the normalizer keeps child-file events; watched_paths() returns the dir for the dispatcher's starts_with routing).
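The difference between the glob target and the directory prefix can be illustrated with two small predicates. Both functions are hypothetical stand-ins, not the project's matcher or dispatcher: the first models a flat `dir/*` glob as "direct child of `dir`", the second models prefix routing with `Path::starts_with`.

```rust
use std::path::Path;

/// True when `path` is a direct child of `dir`, i.e. it matches the flat glob `dir/*`.
/// The bare directory and anything in a nested subdirectory do not match.
pub fn matches_flat_glob(dir: &Path, path: &Path) -> bool {
    path.parent() == Some(dir)
}

/// Prefix routing: any event at or under the watched directory is routed to it.
pub fn routes_to(dir: &Path, path: &Path) -> bool {
    path.starts_with(dir)
}
```

For `/repo/.cursor/rules`, a child such as `/repo/.cursor/rules/style.mdc` satisfies both predicates, while the bare directory satisfies only the prefix check. That is why the target is a glob and the routing path is the directory.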

Tests

~20 new tests: detector positives (fetch-pipe variants, override phrases, exec-context obfuscation), benign-file negative (no FP on a normal guide with a URL), rm; curl|sh → FetchPipe classification, one-per-category cap, continuation join, snippet cap, per-parser scanning, instruction-only-repo discovery, the .cursor/rules/* glob-matches-child / bare-dir-does-not matcher proof, rubric weight/bucket, serde round-trip. Full suite green; clippy -D warnings clean.

Out of scope (follow-ups)

- Copilot .github/copilot-instructions.md (no parser).
- Nested .cursor/rules/<subdir>/* (flat-only in v1: scanner and watch are both flat; tracked separately).
- Enforce/deny (#100, "[Epic] sigil-hook Stage 2 — in-domain enforcement (block) at the agent tool boundary").
- No rule pack (#102, "Migrate hardcoded AI-tool parsers to declarative rule packs (3b.7.3)", showed the DSL is type-blind for content-regex).
- Bare egress/URL, bare base64 blobs, prose-y markers ("secretly"/"you are now"): excluded as FP-prone.

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


Ju571nK and others added 7 commits June 13, 2026 15:05

feat(ai_guard): add InstructionFileDirective reason (#146)

89cd062

feat(ai_guard): rubric weights for instruction directives (#146)

406ce50

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(ai_guard): read_text_optional for instruction-file scanning (#146)

98a6354

feat(ai_guard): instruction_scan content scanner (#146)

ab8642c

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(ai_guard): scan CLAUDE.md/AGENTS.md in Claude+Codex parsers (#146)

f8d96c0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(ai_guard): scan .cursorrules + .cursor/rules in Cursor parser (#146)

03c1f73

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(ai_guard): discover+watch instruction files (CLAUDE/AGENTS/.cursorrules/.cursor/rules) (#146)

ea2d650

Ju571nK mentioned this pull request Jun 13, 2026

Scan nested .cursor/rules/<subdir> instruction files (follow-up to #146) #156

Open

Ju571nK self-assigned this Jun 13, 2026

Ju571nK merged commit 6440fe4 into main Jun 13, 2026
5 checks passed

This was referenced Jun 16, 2026

chore(release): bump workspace version to 0.6.0 #165

Merged

docs(readme): "What it detects" capabilities banner (TrustFall + agents/features) #166

Merged

Ju571nK merged 7 commits into main from feat/instruction-file-scan-146