Merged
3 changes: 3 additions & 0 deletions .github/workflows/ci.yml
@@ -30,6 +30,9 @@ jobs:
bash memorylint/tests/test-memorylint-regressions.sh
bash memorylint/tests/test-fixture-validation.sh
bash memorylint/tests/test-fixture-scanner.sh
bash memorylint/tests/test-workspace-audit.sh
bash memorylint/tests/test-apply-workflow.sh
bash memorylint/tests/test-load-agents-proof.sh

powershell-bridge-tests:
name: powershell-bridge-tests
23 changes: 23 additions & 0 deletions memorylint/CHANGELOG.md
@@ -7,6 +7,29 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added

- Real workspace-level audit/apply/load-agents helper scripts:
`scripts/audit_workspace.py`, `scripts/apply_report.py`,
`scripts/load_agents_state.py`.
- Canonical ownership / precedence matrix for architecture, domain,
infrastructure, workflow, tooling, and personal preference rules.
- Constitution manual handoff artifact contract for boundary findings that
target `.specify/memory/constitution.md`.
- Executable `edits` support in the machine-readable audit report so safe/apply
runs can use deterministic file changes.

### Changed

- Refactored fixture scanning onto a shared audit core so the regression corpus
executes the same detection logic as workspace audit.
- Strengthened the `before_plan` gate to require structured `AGENTS.md` load
proof instead of a verbal acknowledgement only.
- Aligned README / DESIGN / command docs around the executable report schema and
operational audit metrics.
- Updated regression fixtures to match the canonical ownership matrix and real
audit behaviour.

## [1.5.0] - 2026-05-27
<!-- planned-bump: major -->
<!-- next-release-version: 2.0.0 -->
55 changes: 52 additions & 3 deletions memorylint/DESIGN.md
@@ -37,7 +37,7 @@ silent authority over long-lived project memory.

## Audit Pipeline

-The audit command follows a deterministic conceptual pipeline:
+The audit command now follows a deterministic executable pipeline:

1. **Instruction Inventory**: scan instruction sources such as `AGENTS.md`,
`.specify/memory/constitution.md`, `CLAUDE.md`, `.cursor/rules/*`, README
@@ -49,8 +49,27 @@ The audit command follows a deterministic conceptual pipeline:
evidence to each finding.
4. **Drift Detection**: detect `boundary`, `reality`, `conflict`, and
`redundancy` drift.
-5. **Report Generation**: emit both a human-readable Markdown report and a
-machine-readable `memorylint-report.json` artifact.
+5. **Executable Report Generation**: emit both a human-readable Markdown report
+and a machine-readable `memorylint-report.json` artifact.

## Source Ownership Matrix

MemoryLint applies a canonical ownership / precedence model:

| Category | Canonical Owner | Secondary Sources |
|----------|-----------------|------------------|
| `architecture` | `.specify/memory/constitution.md` | `.cursor/rules/*`, `CLAUDE.md` |
| `domain` | `.specify/memory/constitution.md` | manifests, docs |
| `infrastructure` | root `AGENTS.md` | nested `AGENTS.md`, `CLAUDE.md`, workflows |
| `workflow` | root `AGENTS.md` | nested `AGENTS.md`, `CLAUDE.md` |
| `tooling` | root `AGENTS.md` | tool-specific editor rules |
| `personal_preference` | root `AGENTS.md` | editor-specific restatements |

Precedence rules:

1. Constitution wins for shared architecture and domain rules.
2. Root `AGENTS.md` wins for shared workflow, infrastructure, tooling, and preference rules.
3. README, workflows, tests, and manifests are evidence-bearing sources, not canonical owners.
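
A minimal sketch of how the matrix and precedence rules above could be encoded as a lookup. This is illustrative only and not taken from the changed scripts: the constant names, the `recommended_destination` helper signature, and the assumption that the root file is addressed as `AGENTS.md` are all hypothetical.

```python
from typing import Optional

# Hypothetical path constants; the real scripts may resolve these differently.
CONSTITUTION = ".specify/memory/constitution.md"
ROOT_AGENTS = "AGENTS.md"

# Category -> canonical owner, mirroring the table above.
CANONICAL_OWNER = {
    "architecture": CONSTITUTION,
    "domain": CONSTITUTION,
    "infrastructure": ROOT_AGENTS,
    "workflow": ROOT_AGENTS,
    "tooling": ROOT_AGENTS,
    "personal_preference": ROOT_AGENTS,
}


def recommended_destination(category: str, current_source: str) -> Optional[str]:
    """Return the canonical owner when a rule lives elsewhere, else None."""
    owner = CANONICAL_OWNER[category]
    return None if current_source == owner else owner
```

Under these assumptions, an architecture rule found in `CLAUDE.md` would be pointed at the constitution, while a workflow rule already in the root `AGENTS.md` would yield no destination.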

## Machine-Readable Report

@@ -75,6 +94,11 @@ gate compares these hashes against current file content before modifying any
file. This prevents applying stale findings after instruction files have
changed.

Findings may additionally carry:

- `edits`: deterministic line-scoped mutations for executable safe/apply runs
- `manual_handoff`: constitution-targeted handoff material that cannot be auto-applied
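
The staleness check described above could look roughly like the following. It is a sketch, not the logic in `scripts/apply_report.py`: it assumes `source_metadata` is a plain mapping from relative path to hex SHA-256 digest, which the diff does not confirm, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's current bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def stale_sources(source_metadata: dict, root: Path) -> list:
    """Return the paths whose current content no longer matches the report.

    Assumed shape: {relative_path: recorded_sha256_hex}. A file that has
    disappeared since the audit is treated as stale as well.
    """
    stale = []
    for rel_path, recorded in source_metadata.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != recorded:
            stale.append(rel_path)
    return sorted(stale)
```

An apply step built this way would refuse to touch any file while `stale_sources(...)` is non-empty.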

## Apply Gate

Apply has three modes:
@@ -87,6 +111,24 @@ Apply has three modes:
Safe mode must not move architecture or domain rules, rewrite semantics, delete
constitution-owned rules, or apply medium/low-confidence findings.

Boundary fixes targeting the constitution always become **manual handoff
artifacts**. Apply may remove the misplaced secondary copy only when the
handoff has been explicitly approved; it still must not auto-merge into the
constitution.
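
As a rough illustration of a subset of the safe-mode constraints above, the per-finding decision could be sketched as below. The field names `drift_type`, `category`, and `confidence` are assumptions about the report schema (only `recommended_destination` is named in the docs in this change), and the sketch does not cover semantic rewrites or deletion of constitution-owned rules.

```python
# Hypothetical constant; assumed to match the path used in the report.
CONSTITUTION = ".specify/memory/constitution.md"


def safe_mode_decision(finding: dict) -> str:
    """Return 'manual_handoff', 'skip', or 'apply' for one finding."""
    # Boundary fixes aimed at the constitution are never auto-applied.
    if (
        finding.get("drift_type") == "boundary"
        and finding.get("recommended_destination") == CONSTITUTION
    ):
        return "manual_handoff"
    # Safe mode leaves architecture and domain rules where they are.
    if finding.get("category") in {"architecture", "domain"}:
        return "skip"
    # Only high-confidence findings are eligible in safe mode.
    if finding.get("confidence") != "high":
        return "skip"
    return "apply"
```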

## Planning Gate

`load-agents` is now a verifiable gate rather than a verbal acknowledgement.
Its success output records:

- the root `AGENTS.md` path
- the SHA-256 hash of the loaded file
- the extracted section list
- the rule summaries inherited into planning

This creates a machine-checkable `before_plan` proof instead of a best-effort
statement.
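
One way such a proof record might be assembled is sketched here. The key names (`agents_path`, `sha256`, `sections`, `rule_summaries`) and the extraction heuristics (Markdown headings as sections, top-level bullets as rule summaries) are assumptions for illustration, not the output format of `scripts/load_agents_state.py`.

```python
import hashlib
import re


def build_load_proof(path: str, text: str) -> dict:
    """Build a machine-checkable record of what was loaded from AGENTS.md."""
    # ATX headings (# through ######) are treated as section names.
    sections = re.findall(r"^#{1,6}[ \t]+(.+?)[ \t]*$", text, flags=re.MULTILINE)
    # Bullet lines are treated as one-line rule summaries.
    rules = [
        line.strip()[2:].strip()
        for line in text.splitlines()
        if line.strip().startswith(("- ", "* "))
    ]
    return {
        "agents_path": path,
        "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "sections": sections,
        "rule_summaries": rules,
    }
```

A `before_plan` check could then recompute the hash of the file on disk and compare it with the recorded `sha256` before trusting the rest of the record.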

Post-apply validation checks:

- `AGENTS.md` integrity and critical section preservation;
@@ -112,6 +154,12 @@ contract. It covers:
each fixture's `expected-findings.json`. This turns the design from a prompt-only
contract into a deterministic regression gate.

The same core powers:

- `scripts/audit_workspace.py` for real workspace audit
- `scripts/apply_report.py` for staleness-checked apply and rollback
- `scripts/load_agents_state.py` for structured planning-gate proof

## Release Criteria

MemoryLint changes are ready to ship only when:
@@ -120,4 +168,5 @@ MemoryLint changes are ready to ship only when:
- audit/apply/load-agents prompts preserve their safety contracts;
- fixture schemas are valid;
- deterministic fixture scanning matches expected findings;
- real workspace audit/apply/load-agents scripts stay aligned with the prompt contracts;
- repository workflow tests and whitespace checks pass.
50 changes: 41 additions & 9 deletions memorylint/README.md
@@ -4,6 +4,12 @@ Evidence-driven instruction drift checker for Spec Kit.

MemoryLint audits long-lived agent instruction files — `AGENTS.md`, `.specify/memory/constitution.md`, `CLAUDE.md`, `.cursor/rules/`, and other sources — to detect boundary violations, stale references, conflicts, and redundancies. Every finding is backed by concrete evidence so reviewers can trust the report before applying any changes.

The current implementation includes executable helpers for all three surfaces:

- `scripts/audit_workspace.py`
- `scripts/apply_report.py`
- `scripts/load_agents_state.py`

## Problem Statement

In Spec-Driven Development (SDD), AI agents rely on long-lived instruction files:
@@ -74,6 +80,22 @@ This extension registers the following Spec Kit lifecycle hooks:

Key design constraint: hooks only run **read-only** operations. The `apply` command is never wired to a hook — it is always an explicit user action.

## Canonical Ownership Matrix

MemoryLint now applies one canonical ownership matrix during audit:

| Category | Canonical Owner | Notes |
|----------|-----------------|-------|
| `architecture` | `.specify/memory/constitution.md` | editor rules may restate, but do not own |
| `domain` | `.specify/memory/constitution.md` | manifests and docs may reflect, but do not own |
| `infrastructure` | root `AGENTS.md` | nested/editor sources may scope or mirror |
| `workflow` | root `AGENTS.md` | nested/editor sources may scope or mirror |
| `tooling` | root `AGENTS.md` | tool-specific files may add local detail |
| `personal_preference` | root `AGENTS.md` | editor-specific restatements are secondary |

This matrix is what drives `recommended_destination`, redundancy cleanup, and
constitution handoff generation.

## Apply Modes

| Mode | Behaviour |
@@ -103,6 +125,11 @@ It includes `schema_version`, `source_metadata`, `instruction_map`, `findings`,
and `metrics`. `source_metadata` records SHA-256 hashes for scanned files so the
apply gate can reject stale reports before changing anything.

Executable findings may also include:

- `edits`: line-scoped file operations used by the apply gate
- `manual_handoff`: constitution-targeted handoff material that must be reviewed by a human

## Rule Classification

Every rule is classified into one of eight categories:
@@ -118,17 +145,22 @@ Every rule is classified into one of eight categories:
| `obsolete` | References something that no longer exists |
| `conflict` | Contradicts another rule |

-## Trust Metrics
+## Audit Metrics

-Every audit report includes a metrics section tracking:
+Every audit report now emits run-time metrics that match the executable output:

| Metric | Purpose |
|--------|---------|
-| High-confidence finding acceptance rate | Measures report accuracy |
-| False positive rate | Must stay low to maintain trust |
-| Suggested diff apply rate | Tracks actionability |
-| Real stale/conflicting rules found | Measures value delivered |
-| Destructive surprise edits | Must be **zero** |
+| Total instruction sources scanned | Shows workspace coverage |
+| Total rules catalogued | Shows extracted rule inventory size |
+| Total findings | Shows total actionable/non-actionable drift |
+| High-confidence findings | Indicates directly evidenced findings |
+| Medium-confidence findings | Indicates heuristic findings that need review |
+| Low-confidence findings | Indicates weak-evidence findings |
+| Files that would be modified by suggested actions | Powers safe preview and apply gating |

Longitudinal trust KPIs such as false-positive rate or destructive surprise
edits remain release-evaluation signals, not per-run report fields.
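
The finding-level metrics in the new table could be derived from a findings list along these lines. This is a sketch under assumed field names (`confidence` on each finding, `path` on each entry of `edits`) and hypothetical output keys; the actual report schema may differ.

```python
from collections import Counter


def summarize_findings(findings: list) -> dict:
    """Compute finding-level metrics from a list of finding dicts."""
    by_confidence = Counter(f.get("confidence") for f in findings)
    # Every file touched by any executable edit counts once.
    files = {edit["path"] for f in findings for edit in f.get("edits", [])}
    return {
        "total_findings": len(findings),
        "high_confidence_findings": by_confidence["high"],
        "medium_confidence_findings": by_confidence["medium"],
        "low_confidence_findings": by_confidence["low"],
        "files_to_modify": sorted(files),
    }
```

Source and rule counts would come from the instruction inventory rather than from findings, so they are left out of this sketch.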

## Regression Corpus

@@ -147,8 +179,8 @@ MemoryLint includes a regression corpus of nine fixture repos under `tests/fixtu
| `post-apply-breakage` | Apply safety validation |

The fixture corpus is executable. `memorylint/scripts/scan_fixtures.py --check`
-generates deterministic findings for every fixture and compares them with each
-fixture's `expected-findings.json`.
+re-runs the real audit core against every fixture and compares the normalized
+findings with each fixture's `expected-findings.json`.

## Design
