Feature: Tag-Based Rule Mapping for Improved AI Model Adherence by thschaffr · Pull Request #3 · thschaffr/project-codeguard

thschaffr · 2026-05-26T09:12:36Z

Summary

This PR introduces tag-based rule mapping functionality to improve AI model adherence when applying security rules. The core enhancement provides explicit contextual guidance through tags, addressing challenges observed when models attempt to determine which rules to apply.

Staging note: this PR currently targets thschaffr/project-codeguard:main (a 1:1 mirror of cosai-oasis/project-codeguard:main) so it can be reviewed in the fork before opening the corresponding PR upstream against cosai.

Problem Statement

Previous approaches faced several limitations:

System prompts alone were insufficient: Models could not reliably determine which rules to apply based solely on initial instructions
Language-only filtering was too restrictive: Relying only on programming language to filter rules missed cross-cutting security concerns
Extension bias in AI models: Models showed bias toward file extensions they considered more "dangerous" (e.g., .php, .py), causing them to overlook rules targeting extensions like .html or .css
Tags existed but were invisible to agents: cosai already validates and stores tags from rule frontmatter (utils.validate_tags, ProcessedRule.tags, ConversionResult.tags), but the data was never propagated to any IDE-format output or surfaced in SKILL.md — so agents had no way to use them

Background: what tags were originally built for

Tag support was introduced upstream in cosai PR cosai-oasis#70 (feature/add-tags-filtering). The only consumer that PR shipped was a build-time CLI filter (--tag) used to slice the output bundle by tag — documented in docs/custom-rules.md:

# Build only rules tagged with data-security
uv run python src/convert_to_ide_formats.py --tag data-security

# AND logic across multiple tags
uv run python src/convert_to_ide_formats.py --tag api,web-security

That filter is useful for maintainers producing custom bundles, but it runs at build time and never reaches the agent. The tag values are read by matches_tag_filter() in convert_to_ide_formats.py to decide whether to include a rule in the output, and then discarded — they don't appear in any generated .mdc, .instructions.md, .md, or SKILL.md file.
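
As a rough illustration of the AND logic described above, the build-time check can be thought of as a subset test. This is a minimal sketch, not the repository's matches_tag_filter(): the function name rule_matches_all_tags, its signature, and the comma-splitting behaviour are assumptions made for illustration.

```python
def rule_matches_all_tags(rule_tags, tag_filter):
    """Return True when the rule carries every requested tag (AND logic).

    Hypothetical helper: `tag_filter` is the raw comma-separated string
    passed on the command line, e.g. "api,web-security".
    """
    if not tag_filter:
        # No filter given: every rule is included in the bundle.
        return True
    requested = {t.strip() for t in tag_filter.split(",") if t.strip()}
    return requested.issubset(set(rule_tags))


print(rule_matches_all_tags(["api", "web-security", "secrets"], "api,web-security"))  # True
print(rule_matches_all_tags(["api"], "api,web-security"))  # False
```

Note that nothing in this check writes the tags anywhere; it only decides inclusion, which is why the values never reached the generated files before this PR.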

This PR adds the missing second consumer: rather than just filtering the build, tags are now surfaced into every IDE output and aggregated into a lookup table in SKILL.md, so an AI coding agent can use the same tag taxonomy at runtime. The existing --tag CLI filter is left untouched and remains complementary.

Solution

Tag-based rule mapping unlocks the tag data that already exists:

Tags propagated to agent-facing files: rule frontmatter tags (which already exist in sources/rules/core/*.md today) now also appear in the generated IDE outputs and the published skills/software-security/ bundle, so agents reading those files at runtime can see them — not just the build script at conversion time
Cross-language rule grouping: Tags enable grouping rules across various languages and extensions, overcoming the file-extension bias
Dynamic tag-to-rules mapping: The SKILL.md template now includes a generated table mapping tags to their associated rules, giving agents an explicit lookup mechanism
Tags surfaced in every IDE output: Cursor, Copilot, Windsurf, Antigravity, and all Agent Skills–based formats now include tags: in the generated frontmatter
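
The cross-language grouping above amounts to inverting "rule → tags" into "tag → rules". The following is a minimal sketch of that aggregation, assuming each processed rule can be reduced to a (filename, tags) pair; collect_tag_to_rules is a hypothetical name, and the real script may build its tag_to_rules dictionary differently.

```python
from collections import defaultdict


def collect_tag_to_rules(rules):
    """Invert (filename, tags) pairs into a sorted tag -> filenames mapping."""
    tag_to_rules = defaultdict(set)
    for filename, tags in rules:
        for tag in tags:
            tag_to_rules[tag].add(filename)
    # Sort both tags and filenames so the generated output is deterministic.
    return {tag: sorted(files) for tag, files in sorted(tag_to_rules.items())}


rules = [
    ("codeguard-0-authentication-mfa.md", ["authentication", "web"]),
    ("codeguard-1-hardcoded-credentials.md", ["secrets"]),
    ("codeguard-0-client-side-web-security.md", ["web"]),
]
for tag, files in collect_tag_to_rules(rules).items():
    print(tag, "->", ", ".join(files))
```

Because the grouping key is the tag rather than the file extension, a rule aimed at .html and a rule aimed at .py land in the same bucket whenever they share a tag.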

Changes

Core Changes

src/convert_to_ide_formats.py:
- Added update_tag_mappings() to generate the tag-to-rules mapping table in SKILL.md
- Tags collected per run via a new tag_to_rules dictionary
- Tags are now propagated to all IDE format outputs
sources/rules/core/codeguard-SKILLS.md.template:
- Inserted a new Tag-Based Rules section (delimited by a start/end marker pair for dynamic injection) immediately before the existing Language-Specific Rules section, and renumbered the trailing Proactive Security section accordingly
- Replaced the generic "What security domains are involved? → Load relevant rule files" item in the Initial Security Check with a tag-specific check: "What security tags apply? → Load all rules with matching tags (e.g., 'authentication', 'web', 'secrets')"
skills/software-security/**: Regenerated artifacts (committed because they ship as the plugin payload) — 15 rule files updated to include tags: in their frontmatter, SKILL.md updated with the new tag mappings table
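
To make the injection step concrete, here is a minimal sketch of rendering the table and splicing it between two template markers. It is not the PR's update_tag_mappings(): the marker strings below are placeholders invented for this example (the actual marker names are not shown in this description), and render_tag_table and inject_tag_table are hypothetical helpers.

```python
# Placeholder markers for illustration only; the real template's markers differ.
START = "<!-- EXAMPLE_TAG_TABLE_START -->"
END = "<!-- EXAMPLE_TAG_TABLE_END -->"


def render_tag_table(tag_to_rules):
    """Render a tag -> rule files mapping as a Markdown table."""
    lines = [
        "| Security Context (Tag) | Rule Files to Apply |",
        "|------------------------|---------------------|",
    ]
    for tag in sorted(tag_to_rules):
        lines.append(f"| {tag} | {', '.join(sorted(tag_to_rules[tag]))} |")
    return "\n".join(lines)


def inject_tag_table(template, tag_to_rules):
    """Replace whatever sits between the markers with a freshly rendered table."""
    start = template.find(START)
    end = template.find(END)
    if start == -1 or end == -1 or end < start:
        # Markers missing (e.g. an older template): return the text unchanged.
        return template
    head = template[: start + len(START)]
    tail = template[end:]
    return f"{head}\n{render_tag_table(tag_to_rules)}\n{tail}"


template = f"## Tag-Based Rules\n{START}\nstale table\n{END}\n## Language-Specific Rules"
print(inject_tag_table(template, {"secrets": ["codeguard-1-hardcoded-credentials.md"]}))
```

Keeping the markers in the output means the script can be re-run against its own result, and returning the template untouched when they are absent matches the silent-fallback behaviour described for older templates.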

IDE Format Updates

Cursor (src/formats/cursor.py): Tags added to YAML frontmatter as tags: [a, b]
Copilot (src/formats/copilot.py): Tags added to YAML frontmatter as tags: [a, b]
Windsurf (src/formats/windsurf.py): Tags added to YAML frontmatter as tags: [a, b]
Antigravity (src/formats/antigravity.py): Tags added to YAML frontmatter as tags: [a, b]
Agent Skills (src/formats/agentskills.py): Tags added as expanded YAML list — inherited automatically by OpenCode, Codex, OpenClaw, Hermes, and Claude formats
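
The two frontmatter shapes listed above (inline for Cursor, Copilot, Windsurf, and Antigravity; expanded for Agent Skills) can be sketched as follows. The helper names are hypothetical, and the sketch assumes tag values are plain identifiers such as authentication or web-security that need no YAML quoting.

```python
def format_tags_inline(tags):
    """Inline (flow-style) form, e.g. 'tags: [authentication, web]'."""
    if not tags:
        return ""  # emit nothing when a rule has no tags
    return f"tags: [{', '.join(tags)}]"


def format_tags_expanded(tags):
    """Expanded (block-style) list form, one tag per line."""
    if not tags:
        return ""
    return "\n".join(["tags:"] + [f"  - {tag}" for tag in tags])


print(format_tags_inline(["authentication", "web"]))
print(format_tags_expanded(["authentication", "web"]))
```

Both forms parse to the same YAML sequence, so the choice only affects how the generated file reads.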

Intentionally NOT in this PR

No changes to src/tag_mappings.py → avoids overlap/conflict with the open upstream PR cosai-oasis/project-codeguard#80 ("feat: add Cline and Continue.dev formats, MCP search tool, CI tests, expanded tags"), which is expanding KNOWN_TAGS independently
No interactive "auto-add unknown tag" helper → validate_unified_rules.py already errors on unknown tags, and auto-adding tags via regex would bypass code review
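
For context on why an auto-add helper is unnecessary, rejecting unknown tags reduces to a set difference against the allow-list. This is only a sketch: EXAMPLE_KNOWN_TAGS is an illustrative subset, not the contents of KNOWN_TAGS, and find_unknown_tags is a hypothetical name rather than the function used by validate_unified_rules.py.

```python
# Illustrative subset only; the real allow-list lives in src/tag_mappings.py.
EXAMPLE_KNOWN_TAGS = frozenset({"authentication", "secrets", "web"})


def find_unknown_tags(tags, known=EXAMPLE_KNOWN_TAGS):
    """Return the tags that are not in the allow-list, sorted for stable output."""
    return sorted(set(tags) - known)


unknown = find_unknown_tags(["web", "typo-tag"])
if unknown:
    print(f"unknown tags: {', '.join(unknown)}")
```

Failing loudly here keeps new tags flowing through a reviewed change to the allow-list instead of being appended automatically.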

Usage

Tags are already declared in rule frontmatter today (e.g. codeguard-1-hardcoded-credentials.md has tags: [secrets]). With this PR they flow through to every output:

# Input rule (unchanged)
---
description: Authentication and MFA best practices
languages: [c, go, java, javascript, python]
tags:
  - authentication
  - web
alwaysApply: false
---

# Generated Cursor output (.cursor/rules/...mdc)
---
description: Authentication and MFA best practices
globs: **/*.c,**/*.go,**/*.java,...
version: 1.3.1
tags: [authentication, web]
---

# Generated section in skills/software-security/SKILL.md
| Security Context (Tag) | Rule Files to Apply |
|------------------------|---------------------|
| authentication | codeguard-0-authentication-mfa.md, codeguard-0-session-management-and-cookies.md |
| secrets        | codeguard-0-additional-cryptography.md, codeguard-1-digital-certificates.md, codeguard-1-hardcoded-credentials.md |
| web            | codeguard-0-api-web-services.md, codeguard-0-authentication-mfa.md, codeguard-0-client-side-web-security.md, ... |
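
On the consuming side, a tool reading SKILL.md could turn the generated table back into a lookup. The sketch below is an assumption about how such a consumer might work, not part of this PR; parse_tag_table and rules_for_tags are hypothetical helpers, and the sample rows are copied from the table above.

```python
def parse_tag_table(markdown):
    """Parse a two-column Markdown table into a tag -> rule files mapping."""
    mapping = {}
    for line in markdown.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 2:
            continue
        tag, files = cells
        # Skip the header row, the dashed separator row, and empty cells.
        if tag.startswith("Security Context") or set(tag) <= {"-"}:
            continue
        mapping[tag] = [f.strip() for f in files.split(",") if f.strip()]
    return mapping


def rules_for_tags(mapping, tags):
    """Return the de-duplicated rule files for the given tags, in first-seen order."""
    selected = []
    for tag in tags:
        for rule_file in mapping.get(tag, []):
            if rule_file not in selected:
                selected.append(rule_file)
    return selected


table = """\
| Security Context (Tag) | Rule Files to Apply |
|------------------------|---------------------|
| authentication | codeguard-0-authentication-mfa.md, codeguard-0-session-management-and-cookies.md |
| secrets        | codeguard-0-additional-cryptography.md, codeguard-1-digital-certificates.md, codeguard-1-hardcoded-credentials.md |
"""
mapping = parse_tag_table(table)
print(rules_for_tags(mapping, ["authentication", "secrets"]))
```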

Test Plan

Run conversion script with rules containing various tags
Verify tags appear in generated IDE-specific rule files (Cursor, Copilot, Windsurf, Antigravity, Agent Skills, OpenCode, Codex, OpenClaw, Hermes, Claude)
Verify SKILL.md contains updated tag-to-rules mapping table
Verify validate_unified_rules.py still passes on all 23 rules
Verify update_tag_mappings() falls back silently when older templates lack the new markers
Verify the existing --tag CLI filter still works unchanged (complementary, not replaced)

…tag-to-rule mapping table to the generated SKILL.md.

Tags are already validated in rule frontmatter (utils.validate_tags) and exposed on ProcessedRule/ConversionResult, but they were not propagated to any IDE-format output or to the generated SKILL.md. This change makes the existing tag data actually usable downstream.

Format changes (tags appended to YAML frontmatter when present):
- cursor.py: tags: [authentication, web]
- copilot.py: tags: [authentication, web]
- windsurf.py: tags: [authentication, web]
- antigravity.py: tags: [authentication, web]
- agentskills.py: expanded YAML list (inherited by opencode, codex, openclaw, hermes, claude formats)

SKILL.md template / generator:
- Add a start/end marker pair mirroring the existing language-mapping block
- New update_tag_mappings() renders a "Security Context (Tag) -> Rules" table from the per-run tag_to_rules dict; falls back silently when the markers are absent so older templates still build
- Add a new section to the skill workflow text that calls out tag-based selection alongside the existing language-based selection

Regenerated skills/software-security/ artifacts to match the new pipeline (committed because they ship as the plugin payload).

No change to tag_mappings.py (avoids overlap with PR cosai-oasis#80). No change to validation behavior; validate_unified_rules.py already rejects unknown tags.

thschaffr · 2026-05-26T10:16:17Z

Superseded by upstream PR cosai-oasis#81. This fork PR existed only to stage the change for review before opening against cosai; the actual review/merge will happen upstream.

thschaffr changed the title ~~feat: surface rule tags in IDE outputs and SKILL.md mapping table~~ Feature: Tag-Based Rule Mapping for Improved AI Model Adherence May 26, 2026

thschaffr force-pushed the feature/tag-based-rule-mapping-cosai branch from d113921 to f7dbea4 May 26, 2026 10:15

thschaffr closed this May 26, 2026

thschaffr wants to merge 1 commit into main from feature/tag-based-rule-mapping-cosai
