Integrity infrastructure for AI-mutated markdown: spec, code citations, and (eventually) narrative.
When an AI agent edits markdown directly, three failure modes appear that no compiler catches:
- A regex meant to fix `§3` matches inside a code fence and corrupts an unrelated example.
- A heading rename silently breaks 200 cross-refs scattered across other docs.
- An "improvement" rewrites a frozen ledger entry — the decision history that explained why the system is shaped this way is gone.
These hazards extend outward the moment your codebase starts citing
the spec. A comment that reads // see Round 254 for the rationale is
load-bearing documentation; once Round 254 is renamed, deleted, or
superseded, that comment lies — and git blame will chase the wrong
rationale forever. The same applies to narrative documents: a character
bible whose eye-color note in chapter 2 contradicts chapter 15 is the
same class of integrity break, just in a different medium.
Mnemosyne replaces these fragile surfaces with a typed, bi-directional integrity stack.
- The atomic store (`docs/.atomic/workspace.atomic.json`) is the single source of truth: typed records (Section / ChangelogEntry / FrozenList / CrossRef) with append-only audit semantics. `docs/GENERATED.md` is the sole human-readable artifact, deterministically rendered from the store. Humans read; AI writes through typed primitives.
- Every mutation routes through a typed primitive that runs T1 (cross-ref orphan reject) and T2 (frozen-ledger jaccard) before persisting.
- Code citations of spec ids (`§3`, `Round 254`) are scanned at commit time; hallucinated or superseded references are rejected before they reach git history.
- Section ↔ Implementation bindings record which source files own each decision. When a spec section is renamed or superseded, the citing code locations surface automatically.
Status: Phase 0 hardening (7 crates). 500+ tests green. Mnemosyne
dogfoods itself — its own design history lives in the atomic store at
docs/.atomic/workspace.atomic.json, with docs/GENERATED.md as the
human-readable view.
Mnemosyne enforces three integrity boundaries. Each one corresponds to a class of bug that AI-mediated authoring creates and that hand-written review usually misses.
Cross-references between sections never dangle. If §3 in
docs/SPEC.md references §42, but §42 doesn't exist — neither
intra-doc, nor in the default cross-doc target, nor in the atomic store
— the mutation that introduced that reference is rejected at write
time. Renaming §3 automatically updates every cross_ref pointing to
it, atomically.
What this catches: "I told the AI to rename §3 → §4, it did a regex replace, and now eight unrelated docs have broken refs."
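The T1 rule above can be sketched as a set-membership check. This is a minimal illustration, not Mnemosyne's validator: the `CrossRef` struct and `check_t1` function are hypothetical names, and the three resolution scopes (intra-doc, default cross-doc target, atomic store) are collapsed into one set of known ids for brevity.

```rust
use std::collections::BTreeSet;

/// A cross-reference from one section id to another (hypothetical shape).
pub struct CrossRef {
    pub from: String,
    pub to: String,
}

/// Returns Ok(()) when every cross-ref target resolves to a known section id,
/// otherwise the sorted, de-duplicated list of dangling targets.
pub fn check_t1(known_ids: &BTreeSet<String>, refs: &[CrossRef]) -> Result<(), Vec<String>> {
    let orphans: BTreeSet<String> = refs
        .iter()
        .filter(|r| !known_ids.contains(&r.to))
        .map(|r| r.to.clone())
        .collect();
    if orphans.is_empty() {
        Ok(())
    } else {
        Err(orphans.into_iter().collect())
    }
}
```

A write path built this way would run the check against the proposed state and persist only on `Ok(())`, so a dangling `§42` never reaches the store.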
Once a ChangelogEntry is committed, its sub_bullets are append-only.
A subsequent mutation that removes a bullet from a frozen entry fails
the jaccard-inclusion check (current ⊇ previous). The audit trail
becomes provably immutable without relying on git history (which file
renames, squash-merges, and cherry-picks routinely break for
decision-tracking purposes).
What this catches: "The AI 'improved' the changelog wording and now I don't know what we actually decided in Round 17."
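The inclusion condition (current ⊇ previous) can be illustrated with a small set check. This sketch assumes bullets are compared by exact string equality; the real T2 check is described as jaccard-based and may compare differently. `removed_bullets` and `check_t2` are hypothetical names.

```rust
use std::collections::HashSet;

/// Bullets that were present in `previous` but are missing from `current`.
pub fn removed_bullets<'a>(previous: &[&'a str], current: &[&str]) -> Vec<&'a str> {
    let now: HashSet<&str> = current.iter().copied().collect();
    previous
        .iter()
        .copied()
        .filter(|b| !now.contains(*b))
        .collect()
}

/// Accepts a mutation only when every previously committed bullet survives.
/// Appending new bullets is allowed; removing or rewording one is not.
pub fn check_t2(previous: &[&str], current: &[&str]) -> Result<(), String> {
    let removed = removed_bullets(previous, current);
    if removed.is_empty() {
        Ok(())
    } else {
        Err(format!(
            "frozen entry lost {} bullet(s): {:?}",
            removed.len(),
            removed
        ))
    }
}
```

Under exact matching, a reworded bullet counts as a removal plus an addition, which is the "improved wording" case the rule is meant to reject.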
Every spec Section can record implementations = [(file, symbol), ...]
— the source code that owns that decision. The
validate-code-refs pass then walks the configured production source
paths and extracts §<id> / Round NNN citations from comments. Three
classes of defect are rejected:
- `Missing`: citation references a section/entry id that doesn't exist in the atomic store (hallucination).
- `CitationUnbound`: citation appears in a file that the referenced section's `implementations` list does not claim as a binding. Either the section's binding list is stale, or the citing comment is misplaced; both are real defects, surfaced symmetrically.
- `ImplementationMissing`: an Active section has zero `implementations` recorded. "Active" means "this decision is backed by code"; a section with no recorded backing breaks that contract.
Pre-commit hooks wire all three into a reject gate. Renaming or superseding a spec section runs a cascade scan that prints every citing code location to stderr — stale citations surface immediately.
What this catches: "The agent left a // see Round 254 comment in
auth.rs after we renamed Round 254 to Round 256 last month, and
nothing flagged it for six weeks."
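To make the citation pass concrete, here is a rough sketch of extracting ids from `//` comments and classifying two of the three defect classes. It is an assumption-laden toy: `extract_citations`, `classify`, and the `bindings` map are hypothetical, ids are assumed to be plain digits, and any `//` on a line (including one inside a string literal or URL) is treated as a comment start. `ImplementationMissing` is not covered because it depends on section status rather than on a citation.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, PartialEq)]
pub enum Defect {
    Missing { id: String },
    CitationUnbound { id: String, file: String },
}

/// Pull `§<digits>` and `Round <digits>` ids out of the `//` comment part of one line.
pub fn extract_citations(line: &str) -> Vec<String> {
    let comment = match line.find("//") {
        Some(pos) => &line[pos..],
        None => return Vec::new(),
    };
    let mut found = Vec::new();
    for marker in ["§", "Round "] {
        let mut rest = comment;
        while let Some(pos) = rest.find(marker) {
            let after = &rest[pos + marker.len()..];
            let digits: String = after.chars().take_while(|c| c.is_ascii_digit()).collect();
            if !digits.is_empty() {
                found.push(format!("{}{}", marker, digits));
            }
            rest = after;
        }
    }
    found
}

/// `bindings` maps each known id to the files its section claims as implementations.
pub fn classify(
    file: &str,
    id: &str,
    bindings: &HashMap<String, HashSet<String>>,
) -> Option<Defect> {
    match bindings.get(id) {
        // The id is not in the store at all: a hallucinated citation.
        None => Some(Defect::Missing { id: id.to_string() }),
        // The id exists, but this file is not among its claimed bindings.
        Some(files) if !files.contains(file) => Some(Defect::CitationUnbound {
            id: id.to_string(),
            file: file.to_string(),
        }),
        Some(_) => None,
    }
}
```

In the scenario above, a `// see Round 254` comment left behind after the rename would come back as `Missing`, since `Round 254` no longer resolves.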
| Crate | Role |
|---|---|
| `mnemosyne-validator` | Parser / emitter / T1+T2 / round-trip |
| `mnemosyne-store` | RocksDB CF layout |
| `mnemosyne-core` | Typed-fact bridge |
| `mnemosyne-cascade` | Salsa cascade queries |
| `mnemosyne-server` | gRPC + audit append surface |
| `mnemosyne-cli` | Production CLI (validate / mutate / generate-docs) |
| `mnemosyne-mcp` | Model Context Protocol server for AI clients |
git clone https://github.com/newmassrael/mnemosyne
cd mnemosyne
cargo install --path crates/mnemosyne-cli --force
cargo install --path crates/mnemosyne-mcp --force
In your project root, author mnemosyne.toml:
[workspace]
docs = ["ARCHITECTURE.md", "docs/spec.md"]
default_doc = "ARCHITECTURE.md"
[schema]
changelog_titles = ["Changelog"]
entry_id_prefix = "Round "
[style]
locale = "en"
# Optional — opt into the code-citation defense (rejects hallucinated
# §id / Round-N references in your source comments).
[code_refs]
paths = ["src/"]
severity_missing = "warn" # promote to "reject" once your baseline is clean
severity_binding = "warn"
comment_only = true
Then:
mnemosyne-cli validate-workspace # T1 + round-trip + atomic ledger
mnemosyne-cli validate-code-refs # citation defense (if [code_refs] configured)
This surfaces your baseline: T1 orphan total, round-trip mandatory status, T3/T4 style violations, atomic ledger sync, plus any spec-id citations in source that no longer resolve. From that baseline, mutations are evaluated incrementally.
See docs/GETTING_STARTED.md and docs/SCHEMA_GUIDE.md for the full walkthrough.
mnemosyne-mcp is a Model Context Protocol server. AI clients
(Claude Code, Cursor, Cline, Continue, Copilot Chat, …) connect over
stdio and gain:
- 16 typed tools — validate / query / 12 atomic mutate primitives (Section + ChangelogEntry typed-field setters). Each tool's args are JSONSchema-validated before reaching the validator.
- 7 concept resources under `mnemosyne://concepts/*`: overview, atomic-store, frozen-ledger, tier-rules, anti-patterns, schema-guide, workflow. AI clients auto-load these so the agent internalizes Mnemosyne's semantics before mutating.
Drop a .mcp.json at the project root:
{
"mcpServers": {
"mnemosyne": {
"command": "mnemosyne-mcp",
"args": ["--workspace", "."]
}
}
}
Restart your AI client. On first invocation it will prompt to approve the server; once approved, the agent can call tools and read concept resources without further setup.
When a teammate clones a project that already has .mcp.json +
mnemosyne.toml, they only need:
cargo install --path /path/to/mnemosyne/crates/mnemosyne-cli --force
cargo install --path /path/to/mnemosyne/crates/mnemosyne-mcp --force
The next time their AI client opens the project, it picks up
.mcp.json automatically. Pre-built binaries via cargo-dist are
planned for a future release.
The lifecycle has four nodes:
typed mutate primitive ──► atomic store JSON ──► tera render ──► GENERATED.md
│ │
└────────── round-trip: parse(emit) == typed_facts ───────────┘
A typical mutation flow:
- The author or AI calls a typed primitive (e.g. `set_section_intent`).
- The primitive runs T1 (cross-ref orphan reject) and T2 (frozen ledger jaccard) before any write.
- On accept, the atomic store JSON is written via temp file + atomic rename.
- Cascade auto-update: a tera template renders the store back to `docs/GENERATED.md`.
- The round-trip invariant, `parse(emit(typed_facts)) == typed_facts`, is rechecked on every subsequent `validate-workspace` call.
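The "temp file + atomic rename" step in the flow above is a standard pattern and can be sketched with the Rust standard library alone. The function name `write_atomically` and the `.tmp` sibling naming are assumptions for illustration; the actual store writer may differ.

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

/// Write `contents` to `target` by writing a sibling temp file first and then
/// renaming it into place, so readers never observe a half-written file.
pub fn write_atomically(target: &Path, contents: &[u8]) -> std::io::Result<()> {
    // Hypothetical temp naming: same directory, extension swapped to "tmp".
    let tmp = target.with_extension("tmp");
    {
        let mut file = fs::File::create(&tmp)?;
        file.write_all(contents)?;
        // Flush file data to disk before the rename makes it visible.
        file.sync_all()?;
    }
    // `fs::rename` replaces an existing destination.
    fs::rename(&tmp, target)
}
```

Keeping the temp file in the same directory matters: a rename across filesystems is not atomic and may fail outright.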
Read paths skip parsing entirely — query-section returns SectionView
JSON straight from the atomic store.
Whether a tool is invoked by the CLI, the MCP server, or a pre-commit
hook, the same code path runs (parse + emit + T1 + T2 in
mnemosyne-validator). One implementation, three entry surfaces.
In CI you don't need MCP — just the CLI:
# .github/workflows/mnemosyne.yml
on: [push, pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      - run: cargo install --git https://github.com/newmassrael/mnemosyne mnemosyne-cli
      - run: mnemosyne-cli validate-workspace
      - run: mnemosyne-cli verify-generated
      - run: mnemosyne-cli validate-code-refs # optional, requires [code_refs] in mnemosyne.toml
The same three commands are wired into scripts/install-hooks.sh as a
pre-commit gate. Once the citation-defense baseline is clean, promote
severity_* from warn to reject in mnemosyne.toml and the hook
will block any commit that introduces a hallucinated spec citation.
The major shape decisions and the alternatives examined. Useful when adopting Mnemosyne in a project that has its own opinions about doc management.
A pure markdown surface exposes three structural failure modes to AI agents:
- A regex meant to fix `§3` accidentally matches inside a code fence.
- A heading rename silently invalidates two hundred cross-refs.
- An "improvement" commits a rewrite of a frozen ledger entry and history is gone.
The typed atomic store collapses each into a mechanical reject:
- T1: a non-existent `§N` target is rejected at write time.
- A heading rename routes through `set_section_*`, which atomically updates every cross_ref pointing to it.
- T2: a sub_bullet removal is rejected by jaccard inclusion.
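The rename case can be sketched as a pure function over a toy store. Everything here is hypothetical (the `Store` alias, `rename_section`, and the id-to-references layout); the point is only that the new id and every reference to it are produced in one step, and that a rejected rename leaves the input untouched.

```rust
use std::collections::BTreeMap;

/// Toy store: section id -> ids that the section references (hypothetical shape).
pub type Store = BTreeMap<String, Vec<String>>;

/// Rename `from` to `to`, rewriting every cross-ref in the same step.
/// Returns a new store; the input is never modified, so a rejected
/// rename cannot leave a half-updated state behind.
pub fn rename_section(store: &Store, from: &str, to: &str) -> Result<Store, String> {
    if !store.contains_key(from) {
        return Err(format!("unknown section {}", from));
    }
    if store.contains_key(to) {
        return Err(format!("section {} already exists", to));
    }
    let mut next = Store::new();
    for (id, refs) in store {
        let new_id = if id.as_str() == from { to.to_string() } else { id.clone() };
        let new_refs = refs
            .iter()
            .map(|r| if r.as_str() == from { to.to_string() } else { r.clone() })
            .collect();
        next.insert(new_id, new_refs);
    }
    Ok(next)
}
```

Because the rewrite operates on typed ids rather than on text, a `§3` that happens to appear inside a code fence is never a candidate for replacement.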
Considered: RocksDB, sled, LMDB, XTDB, Datomic. The Phase -1A
measurement spike (under bench/) confirmed that RocksDB CF + 24 B
fixed-width composite keys hits the §3 SLA budget for the per-fact
layer.
For the workspace-scope atomic store (Section + ChangelogEntry typed facts), a full database buys nothing — the workspace is small, and the access pattern is "load whole file → mutate once → re-render." A single JSON file written via temp + atomic rename covers the use case.
RocksDB is still wired in Phase 0 for the audit-trail layer:
mnemosyne-cli commit records design-doc commit transactions to
RocksDB column families under .mnemosyne/store/. The full per-branch
fact layer that exercises the §4 ten-CF schema at the 50K-asset
workload is Phase 1+ scope. The validate / mutate / render paths used
day-to-day touch only the JSON file; RocksDB activates on commit.
Git tracks file changes. Frozen ledger tracks decision changes. The two are not the same:
- File renames lose the git history of decisions inside the file.
- Squash-merging collapses individual decision commits.
- Cherry-picking re-orders decisions arbitrarily.
The ChangelogEntry sequence is ordered by entry_id monotonicity and
re-validated at every mutation, which makes it a stronger guarantee
than git history for the audit-trail use case.
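A monotonicity check over entry ids might look like the sketch below. It assumes the `"Round "` prefix from the sample `mnemosyne.toml` and assumes "monotonic" means strictly increasing; both are assumptions, and `entry_number` and `check_monotonic` are hypothetical names.

```rust
/// Parse the numeric part of an entry id such as "Round 17".
/// The "Round " prefix is taken from the sample config's `entry_id_prefix`.
pub fn entry_number(entry_id: &str) -> Option<u32> {
    entry_id.strip_prefix("Round ")?.trim().parse().ok()
}

/// Ok(()) when ids are strictly increasing; otherwise a message naming
/// the first id that is unparseable or out of order.
pub fn check_monotonic(entry_ids: &[&str]) -> Result<(), String> {
    let mut last: Option<u32> = None;
    for id in entry_ids {
        let n = entry_number(id).ok_or_else(|| format!("unparseable entry id: {}", id))?;
        if let Some(prev) = last {
            if n <= prev {
                return Err(format!("{} does not follow Round {}", id, prev));
            }
        }
        last = Some(n);
    }
    Ok(())
}
```

Unlike commit order, this ordering survives squash-merges and cherry-picks because it is carried by the entries themselves.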
LSP edits operate on text ranges. Mnemosyne's primitives operate on typed fields. The difference matters when one logical change touches many regions:
- LSP rename `§39 → §40`: author writes a regex and hopes it's correct.
- Mnemosyne `set_section_impact_scope(target=§40)`: validator checks that §40 exists, atomically updates every relevant cross_ref, re-renders GENERATED.md.
Cost: mutations must go through the typed API. Benefit: the "regex matched the wrong thing" class of bugs is eliminated by construction.
Considered: custom JSON-RPC, gRPC, vendor-specific extensions, plain CLI calls. MCP won on three points:
- It is a cross-vendor standard (Claude Code, Cursor, Cline, Continue, Copilot Chat all speak it).
- Tool arguments are JSONSchema-validated at the protocol layer.
- Resources auto-load concept docs into the agent's context, so the agent learns the rules before mutating.
The mnemosyne-mcp server wraps the production CLI, keeping the
validation logic single-source.
Considered: Differential Dataflow, Adapton, manual invalidation. Salsa won on:
- Field-level dependency tracking (the Round 92 fine-grained layer).
- Byte-equal memoization stability across processes.
- Compile-time `#[salsa::input/tracked/db]` integration that keeps cascade definitions close to the query bodies.
Phase 1.5 cascade-gate full-scale measurement (50K asset workload) will validate that the per-record pattern scales to the §11 SLA budget.
The contract: parse(emit(typed_facts)) == typed_facts.
Without it, the atomic store and GENERATED.md drift, and any
pre-commit hook eventually misclassifies. The Round 67 sub-section
prefix bug surfaced exactly this way: the parser produced section_id
60/1 for a nested numbered heading, but the emitter wrote bare
1., so re-parsing yielded a different id and the diff broke. The
fix preserved the parent prefix on the last segment. Mechanical
hygiene that hand-written tests rarely catch.
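The bug class described here can be reproduced on a toy model of nested section ids. This is only an illustration of the `parse(emit(x)) == x` contract, not the project's parser or emitter: `SectionId`, `emit`, `emit_last_segment_only`, `parse`, and `round_trips` are all hypothetical.

```rust
/// A nested section id such as 60/1, stored as its path segments.
#[derive(Debug, Clone, PartialEq)]
pub struct SectionId(pub Vec<u32>);

/// Emit the full path, e.g. [60, 1] -> "60/1".
pub fn emit(id: &SectionId) -> String {
    id.0.iter().map(|n| n.to_string()).collect::<Vec<_>>().join("/")
}

/// Emitter that drops the parent prefix, e.g. [60, 1] -> "1".
/// This mimics the kind of defect described above.
pub fn emit_last_segment_only(id: &SectionId) -> String {
    id.0.last().map(|n| n.to_string()).unwrap_or_default()
}

/// Parse "60/1" back into its segments; any non-numeric segment fails.
pub fn parse(text: &str) -> Option<SectionId> {
    let parts: Option<Vec<u32>> = text.split('/').map(|p| p.parse().ok()).collect();
    parts.filter(|p| !p.is_empty()).map(SectionId)
}

/// The round-trip contract for a given emitter: parse(emit(id)) == id.
pub fn round_trips(id: &SectionId, emitter: fn(&SectionId) -> String) -> bool {
    parse(&emitter(id)).as_ref() == Some(id)
}
```

The prefix-dropping emitter still round-trips every top-level id, which is why a test suite that only exercises flat headings would not notice the problem.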
The four entity kinds (Section / ChangelogEntry / FrozenList / CrossRef) are closed-form. User-defined kinds, additional entities, and schema extensions are explicitly not Phase 0 features — that work belongs to Phase 1.5+ schema decomposition (a separate spec round).
Closing the schema in Phase 0:
- Simplifies the validator (no plugin loader path).
- Keeps round-trip provability tractable.
- Makes 5-language emit (Rust + Kotlin + Python + C++ + Protobuf) feasible. Salsa cascade semantics remain Rust-only because porting the incremental-computation guarantees to other languages was judged out of paradigm.
- docs/GETTING_STARTED.md — 5-minute setup walkthrough.
- docs/SCHEMA_GUIDE.md — every `mnemosyne.toml` field, with presets.
- docs/GENERATED.md — generated from the atomic store; the project's own design-doc dogfood.
- CLAUDE.md — Claude Code guidance for working on Mnemosyne itself.
- COMMIT_FORMAT.md — commit message convention.
For AI agents already inside an MCP session, the canonical onboarding order is:
- `mnemosyne://concepts/overview`
- `mnemosyne://concepts/anti-patterns`
- `mnemosyne://concepts/atomic-store`
- `mnemosyne://concepts/frozen-ledger`
- `mnemosyne://concepts/tier-rules`
- `mnemosyne://concepts/workflow`
Mnemosyne's core abstraction — AI-mutated markdown documents need typed invariants to stay safe — generalizes well beyond design docs. The roadmap follows that generalization outward: same primitives (Section / CrossRef / ChangelogEntry / FrozenList), same integrity guarantees (T1 / T2 / Path B), different schemas on top.
Production dogfood. Mnemosyne's own design history runs through the atomic store; the hardening arc spanning Round 252-272 closed the core integrity gaps:
- T1 cross-doc orphan reject with `[[orphan_ledger]]` opt-in carries for legitimate legacy references.
- Atomic-axis `decision_status` field with author-time + validate-time guards (T1 rule 4 across both axes).
- Code-citation defense reject mode (`severity_missing` / `severity_binding` = `reject`) gating pre-commit on hallucinated spec references.
- Bidirectional Spec ↔ Code binding via `Section.implementations` and three-edged set-equality detection (`CitationUnbound` + `ImplementationUnbacked` + `ImplementationMissing`).
- Atomic ChangelogEntry mutate API with auto-cascade regeneration of `GENERATED.md` on every successful write.
The next adoption surface: long-form fiction, game scripts, TRPG campaign notes, worldbuilding wikis, character bibles. These media share the same AI-mutation hazard pattern that motivated Phase 0 — LLM-driven editing breaks invariants that no compiler enforces — but the schema and the primitives change.
Concrete target genres and what Mnemosyne would guard:
- Long-form fiction draft management. A character's established eye color in chapter 2 must match chapter 15. A renamed faction shouldn't leave 40 orphan references in unrelated scenes. The atomic-store + T1 invariants lift directly; what changes is the entity schema (Character / Location / Faction / Scene) and the mutate primitives (`set_character_eye_color`, `rename_faction_with_cascade`).
- Game scripts (interactive fiction, dialog trees, branching narrative). Branch targets must resolve. Character dialog schemas must stay consistent across scenes. Conditional flag references (`if metPirateKing`) cannot dangle. Same T1 cross-ref orphan reject, applied to scene graphs instead of section graphs.
- TRPG campaign notes. NPC stat blocks, location backstory, plot beat audit trail. The GM's "what did I rule three sessions ago" problem is exactly the frozen-ledger problem: git history doesn't carry decision provenance, but a ChangelogEntry stream sorted by session number does.
- Worldbuilding wikis. Faction relations, timeline consistency, magic-system constraints. References between articles need orphan reject; "law of magic" changes need frozen-ledger semantics so retroactive edits don't quietly contradict ten earlier chapters.
- Character bibles. Name spelling normalization, age/timeline arithmetic, relationship graph consistency. Identical hazards to a design doc, different fields on the underlying schema.
The Phase 1 priority audit (Round 172) ranked fictional adapter as the first Phase 1 entry by a 6.00 / 3.00× margin over alternatives — chosen because (a) the AI-mediated authoring workflow already exists in this space, (b) the per-asset count fits the workspace-scope JSON store without database migration, and (c) the integrity-break failure modes are visible to end users (a reader notices when a character's eye color contradicts the bible) which keeps the validator's reject mode well-calibrated.
Phase 1 is currently deferred behind Phase 0 stack stabilization — not abandoned. The roadmap is honest about the boundary.
Validation that the per-record Salsa cascade pattern (currently used
at workspace scope) scales to the 50K-asset workload at the published
p95 budget. Substrate carried from the Phase -1A measurement spike
(under bench/, retained as historical baseline). This is the
infrastructure prerequisite for any narrative-medium adapter that
manages a novel-scale (~50K facts) workspace efficiently.
These items are registered carries in the audit ledger, not commitments. Phase 0 stack stability is the gating criterion. The codebase deliberately separates "what works today and is dogfooded" from "what is named in the priority audit" — there is no implication that a registered carry will ship on any particular timeline.
Dual-licensed under MIT or Apache-2.0 at your option.