Verify whether AI coding agents actually follow the instruction files they're given.
Every AI coding agent reads an instruction file. None of them prove they followed it.
You write CLAUDE.md or AGENTS.md with specific rules: camelCase variables, no any types, named exports only, test files for every source file. The agent says "Done." But did it actually follow them? Your code review catches some violations, misses others, and doesn't scale.
RuleProbe reads the same instruction file, extracts the machine-verifiable rules, and checks agent output against each one. Binary pass/fail, with file paths and line numbers as evidence. No LLM evaluation, no judgment calls. Deterministic and reproducible.
```shell
npm install -g ruleprobe
```

Or run it directly:
```shell
npx ruleprobe --help
```

Note: The examples below reflect the current development HEAD (53 matchers, 9 categories). The published npm v0.1.0 shipped with 15 matchers. A new release will follow.
Parse an instruction file to see what rules RuleProbe can extract. This is real output from parsing the repo's included example instruction file:
```shell
ruleprobe parse docs/example-instructions.md
```

Extracted 32 rules:

```text
forbidden-no-any-type-2
  Category: forbidden-pattern
  Verifier: ast
  Pattern: no-any (*.ts)
  Source: "- No any types anywhere in the codebase"

error-no-empty-catch-6
  Category: error-handling
  Verifier: ast
  Pattern: no-empty-catch (*.ts)
  Source: "- No empty catch blocks; always handle or rethrow errors"

naming-kebab-case-files-17
  Category: naming
  Verifier: filesystem
  Pattern: kebab-case (filenames)
  Source: "- File names: kebab-case (e.g., user-service.ts, api-handler.ts)"

dependency-pinned-versions-34
  Category: dependency
  Verifier: filesystem
  Pattern: pinned-dependencies (package.json)
  Source: "- All dependencies pinned to exact versions, no ^ or ~ ranges"

...
```
Verify agent output against those rules. This is ruleprobe verifying its own source code:
```shell
ruleprobe verify docs/example-instructions.md ./src --format text
```

```text
RuleProbe Adherence Report
Agent: unknown | Model: unknown | Task: manual
Rules: 32 total | 23 passed | 9 failed | Score: 72%

FAIL error-handling/error-no-empty-catch-6
  commands/run.ts:148 - found: empty catch block
  utils/safe-path.ts:116 - found: empty catch block
  verifier/ast-verifier.ts:248 - found: empty catch block
PASS forbidden-pattern/forbidden-no-any-type-2
PASS structure/structure-strict-mode-1
PASS structure/structure-named-exports-only-3
PASS naming/naming-kebab-case-files-17
FAIL naming/naming-camelcase-variables-18
  verifier/treesitter-loader.ts:75 - found: ParserCtor
  verifier/treesitter-loader.ts:76 - found: LanguageRef
PASS naming/naming-pascalcase-types-20
PASS test-requirement/test-files-exist-25
FAIL structure/structure-no-barrel-files-24
  ast-checks/index.ts:5 - found: barrel file with 24 re-exports
  llm/index.ts:7 - found: barrel file with 9 re-exports
PASS import-pattern/import-no-path-aliases-28
PASS forbidden-pattern/forbidden-no-console-log-4
PASS structure/structure-max-file-length-22
PASS structure/structure-jsdoc-required-21
PASS dependency/dependency-pinned-versions-34
...

By Category:
  naming: 2/4 (50%)
  forbidden-pattern: 4/4 (100%)
  structure: 4/5 (80%)
  import-pattern: 4/4 (100%)
  test-requirement: 2/2 (100%)
  error-handling: 1/2 (50%)
  type-safety: 2/4 (50%)
  code-style: 2/5 (40%)
  dependency: 2/2 (100%)
```
Every failure includes the file, line number, and what was found. No ambiguity.
Parse. Reads 6 instruction file formats (CLAUDE.md, AGENTS.md, .cursorrules, copilot-instructions.md, GEMINI.md, .windsurfrules) and extracts rules that can be checked mechanically. Subjective instructions like "write clean code" are reported as unparseable so you know what was skipped.
Verify. Runs each extracted rule against a directory of agent-generated code. Checks use AST parsing via ts-morph, file system inspection, and regex pattern matching. No LLM evaluation at any stage by default; results are deterministic and identical across runs.
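The determinism claim comes down to plain string, filesystem, and AST checks: same input, same result, every run. As an illustration only (this is a sketch, not RuleProbe's actual implementation), a regex-backed forbidden-pattern check that produces file-and-line evidence fits in a few lines of TypeScript:

```typescript
// Sketch of a deterministic regex check: scan source lines for a
// forbidden pattern and report each hit with file, line, and match.
// Names and shapes here are illustrative, not RuleProbe's real API.
interface Violation {
  file: string;
  line: number; // 1-based line number
  found: string; // the matched text, used as evidence
}

function checkForbiddenPattern(
  file: string,
  source: string,
  pattern: RegExp,
): Violation[] {
  const violations: Violation[] = [];
  source.split("\n").forEach((text, i) => {
    const match = text.match(pattern);
    if (match) {
      violations.push({ file, line: i + 1, found: match[0] });
    }
  });
  return violations;
}

const result = checkForbiddenPattern(
  "src/app.ts",
  "const x = 1;\nconsole.log(x);\n",
  /console\.log/,
);
console.log(result); // [{ file: "src/app.ts", line: 2, found: "console.log" }]
```

Because the check is a pure function of the source text, two runs over the same directory can never disagree, which is what makes the pass/fail verdicts reproducible.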
LLM Extract (opt-in). Pass --llm-extract to send unparseable lines through an OpenAI-compatible API for a second extraction pass. LLM-extracted rules are labeled with extractionMethod: 'llm' and confidence: 'medium', and default to warning severity. Requires OPENAI_API_KEY env var. No LLM dependency is installed by default.
Compare. Point RuleProbe at outputs from two or more agents and get a side-by-side comparison table showing which rules each one followed. Useful for evaluating agents on the same task, or tracking adherence over time.
GitHub Action. Ships as a composite action you can drop into any repo. Runs ruleprobe verify on every PR, posts results as a comment, and optionally outputs reviewdog rdjson format for inline annotations. No API keys needed beyond GITHUB_TOKEN.
RuleProbe auto-discovers a config file in the working directory (or any parent). You can also pass --config <path> explicitly. Supported file names, in priority order:
1. `ruleprobe.config.ts`
2. `ruleprobe.config.js`
3. `ruleprobe.config.json`
4. `.ruleproberc.json`
A config file lets you add custom rules, override extracted rules, or exclude rules entirely:
```typescript
// ruleprobe.config.ts
import { defineConfig } from 'ruleprobe';

export default defineConfig({
  // Add rules that the parser can't extract from your instruction file
  rules: [
    {
      id: 'custom-no-lodash',
      category: 'import-pattern',
      description: 'Ban lodash imports',
      verifier: 'regex',
      pattern: { type: 'banned-import', target: '*.ts', expected: 'lodash', scope: 'file' },
    },
  ],
  // Change severity or expected values on extracted rules
  overrides: [
    { ruleId: 'naming-camelcase', severity: 'warning' },
    { ruleId: 'structure-max-file-length', expected: '500' },
  ],
  // Remove rules you don't want checked
  exclude: ['forbidden-no-console-log'],
});
```

`defineConfig()` is a no-op passthrough that provides type checking in TypeScript configs. JSON configs work without it.
Custom rules use the same verifier types (ast, regex, filesystem) and pattern types as extracted rules. Any pattern type listed in the Supported Rule Types table works as a custom rule pattern.
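For projects that prefer JSON, the same configuration can be expressed in `.ruleproberc.json`. This is a sketch mirroring the TypeScript example above, assuming the JSON shape matches the `defineConfig` argument one-to-one:

```json
{
  "rules": [
    {
      "id": "custom-no-lodash",
      "category": "import-pattern",
      "description": "Ban lodash imports",
      "verifier": "regex",
      "pattern": { "type": "banned-import", "target": "*.ts", "expected": "lodash", "scope": "file" }
    }
  ],
  "overrides": [{ "ruleId": "naming-camelcase", "severity": "warning" }],
  "exclude": ["forbidden-no-console-log"]
}
```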
Extract rules from an instruction file.
```shell
ruleprobe parse CLAUDE.md --format json
ruleprobe parse AGENTS.md --show-unparseable
ruleprobe parse AGENTS.md --llm-extract --show-unparseable
```

`--format json|text` controls output format. `--show-unparseable` includes lines that couldn't be converted to rules. `--llm-extract` sends unparseable lines to an OpenAI-compatible API for additional extraction (requires `OPENAI_API_KEY`).
Check agent output against extracted rules.
```shell
ruleprobe verify CLAUDE.md ./output --format text
ruleprobe verify AGENTS.md ./output --agent claude --model opus-4 --format json --output report.json
ruleprobe verify AGENTS.md ./output --format markdown --severity error
ruleprobe verify AGENTS.md ./output --format rdjson
ruleprobe verify AGENTS.md ./output --config ruleprobe.config.ts
ruleprobe verify AGENTS.md ./output --llm-extract
ruleprobe verify AGENTS.md ./output --rubric-decompose
ruleprobe verify AGENTS.md ./output --project tsconfig.json
```

`--agent` and `--model` tag the report metadata. `--severity error|warning|all` filters results. `--output` writes to a file instead of stdout. `--format rdjson` produces reviewdog-compatible diagnostics. `--config` loads a specific config file (otherwise auto-discovered). `--llm-extract` runs unparseable lines through an LLM for additional rule extraction. `--rubric-decompose` uses an LLM to break subjective instructions into weighted concrete checks (tagged with `extractionMethod: 'rubric'` and `confidence: 'low'`). Both `--llm-extract` and `--rubric-decompose` require `OPENAI_API_KEY`. `--project` enables type-aware AST checks (implicit any, unused exports, unresolved imports) using the specified `tsconfig.json`.
Exit codes: `0` all rules passed, `1` violations found, `2` execution error.
Compare multiple agent outputs against the same rules.
```shell
ruleprobe compare AGENTS.md ./claude-output ./copilot-output --agents claude,copilot --format markdown
```

List available task templates or output a specific task prompt. Three templates ship with v0.1.0: rest-endpoint, utility-module, react-component.
```shell
ruleprobe tasks
ruleprobe task rest-endpoint
```

Invoke an AI agent on a task template, verify the output, and print the report in one step. Requires `@anthropic-ai/claude-agent-sdk` and `ANTHROPIC_API_KEY` for SDK mode. Alternatively, use `--watch` to point at a directory where you (or another agent) will write output manually.
```shell
# SDK mode: invoke Claude, verify, report
ruleprobe run CLAUDE.md --task rest-endpoint --agent claude-code --model sonnet --format text

# Watch mode: wait for output in a directory, then verify
ruleprobe run CLAUDE.md --watch ./agent-output --timeout 300 --format json
```

Options: `--task`, `--agent`, `--model`, `--format`, `--output-dir`, `--watch`, `--timeout`, `--allow-symlinks`, `--config`.
Drop this into `.github/workflows/ruleprobe.yml`:

```yaml
name: RuleProbe
on: [pull_request]
jobs:
  check-rules:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      - uses: moonrunnerkc/ruleprobe@v1
        with:
          instruction-file: AGENTS.md
          output-dir: src
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

That's it. No API keys, no LLM calls, deterministic results, runs in seconds.

Note: `@v1` tracks the latest v1.x release. Pin to a specific tag (e.g., `@v1.0.0`) for reproducible builds.
Full options
```yaml
- uses: moonrunnerkc/ruleprobe@v1
  with:
    instruction-file: AGENTS.md
    output-dir: src
    agent: ci
    model: unknown
    format: text
    severity: all
    fail-on-violation: "true"
    post-comment: "true"
    reviewdog-format: "false"
```

| Input | Default | Description |
|---|---|---|
| `instruction-file` | (required) | Path to instruction file |
| `output-dir` | `src` | Directory containing code to verify |
| `agent` | `ci` | Agent identifier for report metadata |
| `model` | `unknown` | Model identifier for report metadata |
| `format` | `text` | Report format: text, json, or markdown |
| `severity` | `all` | Filter: error, warning, or all |
| `fail-on-violation` | `true` | Fail the check on any violation |
| `post-comment` | `true` | Post results as a PR comment |
| `reviewdog-format` | `false` | Also output rdjson for reviewdog |
Outputs: `score`, `passed`, `failed`, `total` (available to downstream steps).
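Those outputs can drive later workflow steps. A sketch of wiring them into a follow-up step (the step `id` and step name here are illustrative):

```yaml
steps:
  - uses: moonrunnerkc/ruleprobe@v1
    id: ruleprobe
    with:
      instruction-file: AGENTS.md
      output-dir: src
  - name: Print adherence score
    run: >
      echo "Score ${{ steps.ruleprobe.outputs.score }}%
      (${{ steps.ruleprobe.outputs.passed }}/${{ steps.ruleprobe.outputs.total }} passed,
      ${{ steps.ruleprobe.outputs.failed }} failed)"
```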
The programmatic API covers the full pipeline:

| Function | Purpose |
|---|---|
| `parseInstructionFile(path)` | Parse an instruction file into a RuleSet |
| `verifyOutput(ruleSet, dir)` | Run rules against a code directory |
| `generateReport(run, ruleSet, results)` | Build an AdherenceReport with summary stats |
| `formatReport(report, format)` | Render as text, JSON, markdown, or rdjson |
| `extractRules(markdown, fileType)` | Extract rules from raw markdown content |
| `defineConfig(config)` | Type-safe config helper for ruleprobe.config.ts |
| `loadConfig(path?, searchDir?)` | Load and validate a config file |
| `applyConfig(ruleSet, config)` | Merge custom rules, overrides, and exclusions into a RuleSet |
| `extractWithLlm(ruleSet, options)` | Run LLM extraction on unparseable lines |
| `createOpenAiProvider(config?)` | Create an OpenAI-compatible LLM provider |
```typescript
import { parseInstructionFile, verifyOutput, generateReport, formatReport } from 'ruleprobe';

const ruleSet = parseInstructionFile('CLAUDE.md');
const results = verifyOutput(ruleSet, './agent-output');
const report = generateReport(
  { agent: 'claude-code', model: 'opus-4', taskTemplateId: 'rest-endpoint',
    outputDir: './agent-output', timestamp: new Date().toISOString(), durationSeconds: null },
  ruleSet,
  results,
);
console.log(formatReport(report, 'text'));
```

LLM-assisted extraction (opt-in):
```typescript
import { parseInstructionFile, extractWithLlm, createOpenAiProvider } from 'ruleprobe';

const ruleSet = parseInstructionFile('CLAUDE.md');
const provider = createOpenAiProvider({ model: 'gpt-4o-mini' });
const enhanced = await extractWithLlm(ruleSet, { provider });
// enhanced.rules now includes LLM-extracted rules with extractionMethod: 'llm'
```

```mermaid
flowchart LR
  A[Instruction File] --> B[Rule Parser]
  B --> C[RuleSet]
  D[Agent Output] --> E[Verifier]
  C --> E
  E --> F[Adherence Report]
```
The parser reads your instruction file and identifies lines that map to deterministic checks (naming conventions, forbidden patterns, structural requirements). Each rule gets a category, a verifier type, and a pattern. The verifier walks the agent's output directory, runs AST checks via ts-morph for code structure rules, file system checks for naming and test file requirements, and regex checks for line length and content patterns. The report collects pass/fail results with evidence for every rule.
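The data flowing between those stages can be sketched as TypeScript shapes. These are illustrative only, not the package's actual type definitions; the summary-score arithmetic matches the report shown earlier (passed over total, rounded):

```typescript
// Illustrative shapes for the parser -> verifier -> report pipeline.
// Field names and types are assumptions, not RuleProbe's real exports.
type Verifier = 'ast' | 'filesystem' | 'regex';

interface Rule {
  id: string;        // e.g. 'forbidden-no-any-type-2'
  category: string;  // e.g. 'forbidden-pattern', 'naming'
  verifier: Verifier;
  pattern: string;   // e.g. 'no-any', 'kebab-case'
  source: string;    // the instruction line the rule came from
}

interface RuleResult {
  ruleId: string;
  passed: boolean;
  evidence: { file: string; line: number; found: string }[]; // empty when passed
}

// The report's score is derived from the results: passed / total, rounded.
function score(results: RuleResult[]): number {
  const passed = results.filter((r) => r.passed).length;
  return Math.round((passed / results.length) * 100);
}

console.log(score([
  { ruleId: 'a', passed: true, evidence: [] },
  { ruleId: 'b', passed: true, evidence: [] },
  { ruleId: 'c', passed: false, evidence: [{ file: 'x.ts', line: 1, found: 'any' }] },
])); // 67
```

With this arithmetic, the 23-of-32 report above rounds to 72%, matching the sample output.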
53 built-in matchers across 9 categories:
| Category | Count | Verifier(s) |
|---|---|---|
| naming | 7 | AST, Filesystem, Tree-sitter |
| forbidden-pattern | 5 | AST, Regex |
| structure | 9 | AST, Filesystem |
| test-requirement | 5 | AST, Filesystem, Regex |
| import-pattern | 6 | AST, Regex |
| error-handling | 2 | AST |
| type-safety | 5 | AST, Regex |
| code-style | 10 | AST, Regex, Tree-sitter |
| dependency | 1 | Filesystem |
Full table with example instructions and check details: docs/matchers.md
Most of RuleProbe works offline with no API keys. Two opt-in features use external APIs:
| Feature | Flag(s) | Required env var | When you need it |
|---|---|---|---|
| LLM rule extraction | `--llm-extract` | `OPENAI_API_KEY` | Extracting rules from unparseable instruction lines |
| Rubric decomposition | `--rubric-decompose` | `OPENAI_API_KEY` | Breaking subjective rules into concrete checks |
| Agent invocation (SDK mode) | `ruleprobe run --agent claude-code` | `ANTHROPIC_API_KEY` | Invoking Claude to generate code, then verifying |
| GitHub Action | `uses: moonrunnerkc/ruleprobe@v1` | `GITHUB_TOKEN` | CI, PR comments |
`parse`, `verify`, `compare`, `tasks`, and `task` work entirely offline. No key needed.
Python and Go get naming and function-length checks via tree-sitter WASM grammars. The grammar packages (tree-sitter-python, tree-sitter-go, web-tree-sitter) ship as regular dependencies; no extra install step is required. WASM binaries are loaded at runtime from the installed packages. If loading fails (unsupported platform, missing native build), tree-sitter checks are skipped and other verifiers still run.
RuleProbe never executes scanned code, never makes network calls (unless you opt in with --llm-extract, --rubric-decompose, or ruleprobe run), and never modifies files in the scanned directory. User-supplied paths are resolved and bounded to the working directory; symlinks outside the project are skipped unless you pass --allow-symlinks. All dependencies are pinned to exact versions. See SECURITY.md for the full model.
What v0.1.0 doesn't do, stated plainly.
- TypeScript gets the deepest coverage. ts-morph gives full AST analysis for TypeScript and JavaScript: naming, forbidden patterns, structure, imports, type-safety, and code-style checks. Python and Go get naming and function-length checks via tree-sitter WASM grammars (grammar packages ship as regular dependencies; see the Tree-sitter Support section). Everything else falls back to regex (line length, comments, semicolons). No Rust, Java, or C# AST support yet.
- Subjective rules stay subjective. "Write clean code" has no deterministic check. The `--rubric-decompose` flag on the `verify` command uses an LLM to break subjective instructions into weighted concrete checks (max function length, no magic numbers, etc.), tagged with `extractionMethod: 'rubric'` and `confidence: 'low'`. This is a proxy, not a direct evaluation. Lines with no measurable proxy stay in the unparseable array. Requires `OPENAI_API_KEY`.
- Agent invocation covers Claude SDK and watch mode only. The `run` command invokes agents via the Claude Agent SDK (requires `ANTHROPIC_API_KEY`) or watches a directory for output. Copilot, Cursor, and other agent SDKs are not integrated; use `--watch` mode for those.
- Type-aware checks require `--project`. Three checks (implicit any, unused exports, unresolved imports) need the TypeChecker, which requires a `tsconfig.json`. Without `--project`, ts-morph parses files in isolation and these checks are skipped.
- 53 matchers, not infinite. The parser skips lines it can't confidently map to a check. Use `--show-unparseable` to see what was missed, and `--llm-extract` or `--rubric-decompose` to handle the remainder.
See docs/case-study-v0.1.0.md for a comparison of two agents on the rest-endpoint task template against 10 rules.
```shell
git clone https://github.com/moonrunnerkc/ruleprobe.git
cd ruleprobe && npm install
npm test
```

Issues and pull requests welcome at github.com/moonrunnerkc/ruleprobe.