Feature: Tag-Based Rule Mapping for Improved AI Model Adherence#3
Closed
thschaffr wants to merge 1 commit into
…tag-to-rule mapping table to the generated SKILL.md.
Tags are already validated in rule frontmatter (utils.validate_tags) and
exposed on ProcessedRule/ConversionResult, but they were not propagated
to any IDE-format output or to the generated SKILL.md. This change makes
the existing tag data actually usable downstream.
Format changes (tags appended to YAML frontmatter when present):
- cursor.py: tags: [authentication, web]
- copilot.py: tags: [authentication, web]
- windsurf.py: tags: [authentication, web]
- antigravity.py: tags: [authentication, web]
- agentskills.py: expanded YAML list (inherited by opencode, codex,
openclaw, hermes, claude formats)
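The two frontmatter shapes listed above (inline list versus expanded YAML list) can be sketched as follows. This is a minimal illustration, not the code in the format modules: the function names, the `description` key, and the two-space indent of the expanded list are assumptions made for the example.

```python
from typing import Iterable, List


def render_tags_inline(tags: Iterable[str]) -> List[str]:
    """Flow-style form, e.g. 'tags: [authentication, web]' (hypothetical helper)."""
    tags = list(tags)
    if not tags:
        # Tags are only appended when present, so emit nothing for an empty list.
        return []
    return [f"tags: [{', '.join(tags)}]"]


def render_tags_expanded(tags: Iterable[str]) -> List[str]:
    """Block-style form, one '- tag' entry per line (hypothetical helper)."""
    tags = list(tags)
    if not tags:
        return []
    return ["tags:"] + [f"  - {tag}" for tag in tags]


def build_frontmatter(description: str, tag_lines: List[str]) -> str:
    """Assemble a YAML frontmatter block; 'description' is an assumed key."""
    lines = ["---", f"description: {description}", *tag_lines, "---"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_frontmatter("Example rule", render_tags_inline(["authentication", "web"])))
    print(build_frontmatter("Example rule", render_tags_expanded(["authentication", "web"])))
```

Both forms parse to the same YAML sequence, so a consumer reading the frontmatter should see identical tag lists regardless of which format produced the file.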
SKILL.md template / generator:
- Add <!-- TAG_MAPPINGS_START --> / <!-- TAG_MAPPINGS_END --> markers
mirroring the existing language-mapping block
- New update_tag_mappings() renders a "Security Context (Tag) -> Rules"
table from the per-run tag_to_rules dict; falls back silently when the
markers are absent so older templates still build
- Add a new section to the skill workflow text that calls out tag-based
selection alongside the existing language-based selection
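A rough sketch of the marker-based injection described above, assuming the table is a plain Markdown table and that the text between the markers is replaced wholesale. The function below is a stand-in written for illustration (hence the `_sketch` suffix); the actual `update_tag_mappings()` signature and table layout may differ.

```python
from typing import Dict, List

TAG_START = "<!-- TAG_MAPPINGS_START -->"
TAG_END = "<!-- TAG_MAPPINGS_END -->"


def render_tag_table(tag_to_rules: Dict[str, List[str]]) -> str:
    """Render a 'Security Context (Tag) -> Rules' table (assumed column layout)."""
    lines = [
        "| Security Context (Tag) | Rules |",
        "|---|---|",
    ]
    for tag in sorted(tag_to_rules):
        rules = ", ".join(sorted(tag_to_rules[tag]))
        lines.append(f"| {tag} | {rules} |")
    return "\n".join(lines)


def update_tag_mappings_sketch(skill_text: str, tag_to_rules: Dict[str, List[str]]) -> str:
    """Replace the text between the markers with a freshly rendered table."""
    start = skill_text.find(TAG_START)
    end = skill_text.find(TAG_END)
    if start == -1 or end == -1 or end < start:
        # Silent fallback: templates without the markers are returned unchanged.
        return skill_text
    table = render_tag_table(tag_to_rules)
    return (
        skill_text[: start + len(TAG_START)]
        + "\n" + table + "\n"
        + skill_text[end:]
    )


if __name__ == "__main__":
    template = f"Intro\n{TAG_START}\nplaceholder\n{TAG_END}\nOutro"
    mapping = {"secrets": ["codeguard-1-hardcoded-credentials"]}
    print(update_tag_mappings_sketch(template, mapping))
```

Keeping the markers in the output means the generator can be re-run on its own result without accumulating duplicate tables.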
Regenerated skills/software-security/ artifacts to match the new
pipeline (committed because they ship as the plugin payload).
No change to tag_mappings.py (avoids overlap with PR cosai-oasis#80).
No change to validation behavior; validate_unified_rules.py already
rejects unknown tags.
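For readers unfamiliar with the existing check, rejecting unknown tags amounts to a set-membership test along these lines. This is only an illustration: the helper name is hypothetical, and the tag set below is a small example drawn from tags mentioned in this PR, not the project's actual KNOWN_TAGS.

```python
from typing import Iterable, List, Set

# Illustrative subset only; not the project's real KNOWN_TAGS.
EXAMPLE_KNOWN_TAGS: Set[str] = {"authentication", "secrets", "web"}


def find_unknown_tags(tags: Iterable[str], known: Set[str] = EXAMPLE_KNOWN_TAGS) -> List[str]:
    """Return the tags that are not in the known set, sorted and de-duplicated."""
    return sorted({tag for tag in tags if tag not in known})


if __name__ == "__main__":
    unknown = find_unknown_tags(["secrets", "made-up-tag"])
    if unknown:
        print(f"Unknown tags: {', '.join(unknown)}")
```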
d113921 to
f7dbea4
Owner
Author
Superseded by upstream PR cosai-oasis#81. This fork PR existed only to stage the change for review before opening against cosai; the actual review/merge will happen upstream.
Summary
This PR introduces tag-based rule mapping functionality to improve AI model adherence when applying security rules. The core enhancement provides explicit contextual guidance through tags, addressing challenges observed when models attempt to determine which rules to apply.
Problem Statement
Previous approaches faced several limitations:
- […] .php, .py), causing them to overlook rules targeting extensions like .html or .css
- […] utils.validate_tags, ProcessedRule.tags, ConversionResult.tags), but the data was never propagated to any IDE-format output or surfaced in SKILL.md, so agents had no way to use them
Background: what tags were originally built for
Tag support was introduced upstream in cosai PR cosai-oasis#70 (feature/add-tags-filtering). The only consumer that PR shipped was a build-time CLI filter (--tag) used to slice the output bundle by tag, documented in docs/custom-rules.md.
That filter is useful for maintainers producing custom bundles, but it runs at build time and never reaches the agent. The tag values are read by matches_tag_filter() in convert_to_ide_formats.py to decide whether to include a rule in the output, and then discarded; they don't appear in any generated .mdc, .instructions.md, .md, or SKILL.md file.
This PR adds the missing second consumer: rather than just filtering the build, tags are now surfaced into every IDE output and aggregated into a lookup table in SKILL.md, so an AI coding agent can use the same tag taxonomy at runtime. The existing --tag CLI filter is left untouched and remains complementary.
Solution
Tag-based rule mapping unlocks the tag data that already exists:
- […] sources/rules/core/*.md today) now also appear in the generated IDE outputs and the published skills/software-security/ bundle, so agents reading those files at runtime can see them, not just the build script at conversion time
- […] SKILL.md template now includes a generated table mapping tags to their associated rules, giving agents an explicit lookup mechanism
- […] tags: in the generated frontmatter
Changes
Core Changes
- src/convert_to_ide_formats.py:
  - […] update_tag_mappings() to generate the tag-to-rules mapping table in SKILL.md
  - […] tag_to_rules dictionary
- sources/rules/core/codeguard-SKILLS.md.template:
  - […] <!-- TAG_MAPPINGS_START --> / <!-- TAG_MAPPINGS_END --> markers for dynamic injection) immediately before the existing Language-Specific Rules section, and renumbered the trailing Proactive Security section accordingly
- skills/software-security/**: Regenerated artifacts (committed because they ship as the plugin payload): 15 rule files updated to include tags: in their frontmatter, SKILL.md updated with the new tag mappings table
IDE Format Updates
- […] src/formats/cursor.py): Tags added to YAML frontmatter as tags: [a, b]
- […] src/formats/copilot.py): Tags added to YAML frontmatter as tags: [a, b]
- […] src/formats/windsurf.py): Tags added to YAML frontmatter as tags: [a, b]
- […] src/formats/antigravity.py): Tags added to YAML frontmatter as tags: [a, b]
- […] src/formats/agentskills.py): Tags added as expanded YAML list, inherited automatically by OpenCode, Codex, OpenClaw, Hermes, and Claude formats
Intentionally NOT in this PR
- […] src/tag_mappings.py → avoids overlap/conflict with the open upstream PR cosai-oasis/project-codeguard#80 (feat: add Cline and Continue.dev formats, MCP search tool, CI tests, expanded tags), which is expanding KNOWN_TAGS independently
- […] validate_unified_rules.py already errors on unknown tags, and auto-adding tags via regex would bypass code review
Usage
Tags are already declared in rule frontmatter today (e.g. codeguard-1-hardcoded-credentials.md has tags: [secrets]). With this PR they flow through to every output.
Test Plan
- SKILL.md contains updated tag-to-rules mapping table
- validate_unified_rules.py still passes on all 23 rules
- --tag CLI filter still works unchanged (complementary, not replaced)
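The SKILL.md check in the test plan could be automated roughly as below. This is a sketch under stated assumptions: the helper names are hypothetical, rules are modelled as simple (name, tags) pairs rather than the project's ProcessedRule objects, and the row pattern `| tag |` assumes a Markdown table with the tag in the first column.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

TAG_START = "<!-- TAG_MAPPINGS_START -->"
TAG_END = "<!-- TAG_MAPPINGS_END -->"


def build_tag_to_rules(rules: Iterable[Tuple[str, List[str]]]) -> Dict[str, List[str]]:
    """Aggregate (rule_name, tags) pairs into a tag -> sorted rule names mapping."""
    mapping = defaultdict(list)
    for rule_name, tags in rules:
        for tag in tags:
            mapping[tag].append(rule_name)
    return {tag: sorted(names) for tag, names in mapping.items()}


def tags_missing_from_skill(skill_text: str, tag_to_rules: Dict[str, List[str]]) -> List[str]:
    """List tags that have no row between the markers in the generated SKILL.md."""
    start = skill_text.find(TAG_START)
    end = skill_text.find(TAG_END)
    if start == -1 or end == -1 or end < start:
        # No marker block at all: every tag counts as missing.
        return sorted(tag_to_rules)
    block = skill_text[start + len(TAG_START):end]
    return sorted(tag for tag in tag_to_rules if f"| {tag} |" not in block)


if __name__ == "__main__":
    rules = [
        ("codeguard-1-hardcoded-credentials", ["secrets"]),
        ("example-auth-rule", ["authentication", "web"]),  # made-up rule name
    ]
    mapping = build_tag_to_rules(rules)
    skill = f"{TAG_START}\n| secrets | codeguard-1-hardcoded-credentials |\n{TAG_END}"
    print(tags_missing_from_skill(skill, mapping))
```

An empty result would indicate that every tag declared in the rule sources has a corresponding row in the generated table.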