Skip to content

newmassrael/mnemosyne

Repository files navigation

Mnemosyne

Integrity infrastructure for AI-mutated markdown — spec, code citations, and (eventually) narrative. 한국어 README

When an AI agent edits markdown directly, three failure modes appear that no compiler catches:

  • A regex meant to fix §3 matches inside a code fence and corrupts an unrelated example.
  • A heading rename silently breaks 200 cross-refs scattered across other docs.
  • An "improvement" rewrites a frozen ledger entry — the decision history that explained why the system is shaped this way is gone.

These hazards extend outward the moment your codebase starts citing the spec. A comment that reads // see Round 254 for the rationale is load-bearing documentation; once Round 254 is renamed, deleted, or superseded, that comment lies — and git blame will chase the wrong rationale forever. The same applies to narrative documents: a character bible whose eye-color note in chapter 2 contradicts chapter 15 is the same class of integrity break, just in a different medium.

Mnemosyne replaces these fragile surfaces with a typed, bi-directional integrity stack.

  • The atomic store (docs/.atomic/workspace.atomic.json) is the single source of truth — typed records (Section / ChangelogEntry / FrozenList / CrossRef) with append-only audit semantics.
  • docs/GENERATED.md is the sole human-readable artifact, deterministically rendered from the store. Humans read; AI writes through typed primitives.
  • Every mutation routes through a typed primitive that runs T1 (cross-ref orphan reject) and T2 (frozen-ledger jaccard) before persisting.
  • Code citations of spec ids (§3, Round 254) are scanned at commit time; hallucinated or superseded references are rejected before they reach git history.
  • Section ↔ Implementation bindings record which source files own each decision. When a spec section is renamed or superseded, the citing code locations surface automatically.

Status: Phase 0 hardening (7 crates). 500+ tests green. Mnemosyne dogfoods itself — its own design history lives in the atomic store at docs/.atomic/workspace.atomic.json, with docs/GENERATED.md as the human-readable view.

What Mnemosyne actually protects

Mnemosyne enforces three integrity boundaries. Each one corresponds to a class of bug that AI-mediated authoring creates and that hand-written review usually misses.

1. Document ↔ document (T1 cross-ref orphan reject)

Cross-references between sections never dangle. If §3 in docs/SPEC.md references §42, but §42 doesn't exist — neither intra-doc, nor in the default cross-doc target, nor in the atomic store — the mutation that introduced that reference is rejected at write time. Renaming §3 automatically updates every cross_ref pointing to it, atomically.

What this catches: "I told the AI to rename §3 → §4, it did a regex replace, and now eight unrelated docs have broken refs."

2. Document ↔ history (T2 frozen-ledger jaccard)

Once a ChangelogEntry is committed, its sub_bullets are append-only. A subsequent mutation that removes a bullet from a frozen entry fails the jaccard-inclusion check (current ⊇ previous). The audit trail becomes provably immutable without relying on git history (which file renames, squash-merges, and cherry-picks routinely break for decision-tracking purposes).

What this catches: "The AI 'improved' the changelog wording and now I don't know what we actually decided in Round 17."

3. Document ↔ code (Path B bidirectional binding + code-citation defense)

Every spec Section can record implementations = [(file, symbol), ...] — the source code that owns that decision. The validate-code-refs pass then walks the configured production source paths and extracts §<id> / Round NNN citations from comments. Three classes of defect are rejected:

  • Missing — citation references a section/entry id that doesn't exist in the atomic store (hallucination).
  • CitationUnbound — citation appears in a file that the referenced section's implementations list does not claim as a binding. Either the section's binding list is stale, or the citing comment is misplaced — both are real defects, surfaced symmetrically.
  • ImplementationMissing — an Active section has zero implementations recorded. "Active" means "this decision is backed by code"; a section with no recorded backing breaks that contract.

Pre-commit hooks wire all three into a reject gate. Renaming or superseding a spec section runs a cascade scan that prints every citing code location to stderr — stale citations surface immediately.

What this catches: "The agent left a // see Round 254 comment in auth.rs after we renamed Round 254 to Round 256 last month, and nothing flagged it for six weeks."

Components

Crate Role
mnemosyne-validator Parser / emitter / T1+T2 / round-trip
mnemosyne-store RocksDB CF layout
mnemosyne-core Typed-fact bridge
mnemosyne-cascade Salsa cascade queries
mnemosyne-server gRPC + audit append surface
mnemosyne-cli Production CLI (validate / mutate / generate-docs)
mnemosyne-mcp Model Context Protocol server for AI clients

Quick start (CLI)

git clone https://github.com/newmassrael/mnemosyne
cd mnemosyne
cargo install --path crates/mnemosyne-cli --force
cargo install --path crates/mnemosyne-mcp --force

In your project root, author mnemosyne.toml:

[workspace]
docs = ["ARCHITECTURE.md", "docs/spec.md"]
default_doc = "ARCHITECTURE.md"

[schema]
changelog_titles = ["Changelog"]
entry_id_prefix = "Round "

[style]
locale = "en"

# Optional — opt into the code-citation defense (rejects hallucinated
# §id / Round-N references in your source comments).
[code_refs]
paths = ["src/"]
severity_missing = "warn"   # promote to "reject" once your baseline is clean
severity_binding = "warn"
comment_only = true

Then:

mnemosyne-cli validate-workspace   # T1 + round-trip + atomic ledger
mnemosyne-cli validate-code-refs   # citation defense (if [code_refs] configured)

This surfaces your baseline: T1 orphan total, round-trip mandatory status, T3/T4 style violations, atomic ledger sync, plus any spec-id citations in source that no longer resolve. From that baseline, mutations are evaluated incrementally.

See docs/GETTING_STARTED.md and docs/SCHEMA_GUIDE.md for the full walkthrough.

Using Mnemosyne with AI agents (MCP)

mnemosyne-mcp is a Model Context Protocol server. AI clients (Claude Code, Cursor, Cline, Continue, Copilot Chat, …) connect over stdio and gain:

  • 16 typed tools — validate / query / 12 atomic mutate primitives (Section + ChangelogEntry typed-field setters). Each tool's args are JSONSchema-validated before reaching the validator.
  • 7 concept resources under mnemosyne://concepts/* — overview, atomic-store, frozen-ledger, tier-rules, anti-patterns, schema-guide, workflow. AI clients auto-load these so the agent internalizes Mnemosyne's semantics before mutating.

Register the MCP server in a project

Drop a .mcp.json at the project root:

{
  "mcpServers": {
    "mnemosyne": {
      "command": "mnemosyne-mcp",
      "args": ["--workspace", "."]
    }
  }
}

Restart your AI client. On first invocation it will prompt to approve the server; once approved, the agent can call tools and read concept resources without further setup.

Onboarding flow for collaborators

When a teammate clones a project that already has .mcp.json + mnemosyne.toml, they only need:

cargo install --path /path/to/mnemosyne/crates/mnemosyne-cli --force
cargo install --path /path/to/mnemosyne/crates/mnemosyne-mcp --force

The next time their AI client opens the project, it picks up .mcp.json automatically. Pre-built binaries via cargo-dist are planned for a future release.

How It Works

The lifecycle has four nodes:

typed mutate primitive ──► atomic store JSON ──► tera render ──► GENERATED.md
        │                                                             │
        └────────── round-trip: parse(emit) == typed_facts ───────────┘

A typical mutation flow:

  1. The author or AI calls a typed primitive (e.g. set_section_intent).
  2. The primitive runs T1 (cross-ref orphan reject) and T2 (frozen ledger jaccard) before any write.
  3. On accept, the atomic store JSON is written via temp file + atomic rename.
  4. Cascade auto-update: a tera template renders the store back to docs/GENERATED.md.
  5. The round-trip invariant — parse(emit(typed_facts)) == typed_facts — is rechecked on every subsequent validate-workspace call.

Read paths skip parsing entirely — query-section returns SectionView JSON straight from the atomic store.

Whether a tool is invoked by the CLI, the MCP server, or a pre-commit hook, the same code path runs (parse + emit + T1 + T2 in mnemosyne-validator). One implementation, three entry surfaces.

CI integration

In CI you don't need MCP — just the CLI:

# .github/workflows/mnemosyne.yml
on: [push, pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      - run: cargo install --git https://github.com/newmassrael/mnemosyne mnemosyne-cli
      - run: mnemosyne-cli validate-workspace
      - run: mnemosyne-cli verify-generated
      - run: mnemosyne-cli validate-code-refs   # optional, requires [code_refs] in mnemosyne.toml

The same three commands are wired into scripts/install-hooks.sh as a pre-commit gate. Once the citation-defense baseline is clean, promote severity_* from warn to reject in mnemosyne.toml and the hook will block any commit that introduces a hallucinated spec citation.

Design Considerations

The major shape decisions and the alternatives examined. Useful when adopting Mnemosyne in a project that has its own opinions about doc management.

Why atomic store + GENERATED.md, not raw markdown

A pure markdown surface exposes three structural failure modes to AI agents:

  • A regex meant to fix §3 accidentally matches inside a code fence.
  • A heading rename silently invalidates two hundred cross-refs.
  • An "improvement" commits a rewrite of a frozen ledger entry and history is gone.

The typed atomic store collapses each into a mechanical reject:

  • T1 — a non-existent §N target is rejected at write time.
  • A heading rename routes through set_section_* which atomically updates every cross_ref pointing to it.
  • T2 — a sub_bullet removal is rejected by jaccard inclusion.

Why a single JSON file instead of a database

Considered: RocksDB, sled, LMDB, XTDB, Datomic. The Phase -1A measurement spike (under bench/) confirmed that RocksDB CF + 24 B fixed-width composite keys hits the §3 SLA budget for the per-fact layer.

For the workspace-scope atomic store (Section + ChangelogEntry typed facts), a full database buys nothing — the workspace is small, and the access pattern is "load whole file → mutate once → re-render." A single JSON file written via temp + atomic rename covers the use case.

RocksDB is still wired in Phase 0 for the audit-trail layer: mnemosyne-cli commit records design-doc commit transactions to RocksDB column families under .mnemosyne/store/. The full per-branch fact layer that exercises the §4 ten-CF schema at the 50K-asset workload is Phase 1+ scope. The validate / mutate / render paths used day-to-day touch only the JSON file; RocksDB activates on commit.

Why frozen ledger instead of git history

Git tracks file changes. Frozen ledger tracks decision changes. The two are not the same:

  • File renames lose the git history of decisions inside the file.
  • Squash-merging collapses individual decision commits.
  • Cherry-picking re-orders decisions arbitrarily.

The ChangelogEntry sequence is ordered by entry_id monotonicity and re-validated at every mutation. Stronger than git for the audit-trail use case.

Why typed primitives instead of LSP-style text edits

LSP edits operate on text ranges. Mnemosyne's primitives operate on typed fields. The difference matters when one logical change touches many regions:

  • LSP rename §39 → §40: author writes a regex and hopes it's correct.
  • Mnemosyne set_section_impact_scope(target=§40): validator checks that §40 exists, atomically updates every relevant cross_ref, re-renders GENERATED.md.

Cost: mutations must go through the typed API. Benefit: the "regex matched the wrong thing" class of bugs is eliminated by construction.

Why MCP for the AI integration surface

Considered: custom JSON-RPC, gRPC, vendor-specific extensions, plain CLI calls. MCP won on three points:

  • It is a cross-vendor standard (Claude Code, Cursor, Cline, Continue, Copilot Chat all speak it).
  • Tool arguments are JSONSchema-validated at the protocol layer.
  • Resources auto-load concept docs into the agent's context, so the agent learns the rules before mutating.

The mnemosyne-mcp server wraps the production CLI, keeping the validation logic single-source.

Why Salsa for cascade queries

Considered: Differential Dataflow, Adapton, manual invalidation. Salsa won on:

  • Field-level dependency tracking (the Round 92 fine-grained layer).
  • Byte-equal memoization stability across processes.
  • Compile-time #[salsa::input/tracked/db] integration that keeps cascade definitions close to the query bodies.

Phase 1.5 cascade-gate full-scale measurement (50K asset workload) will validate that the per-record pattern scales to the §11 SLA budget.

Why round-trip equality is the spine

The contract: parse(emit(typed_facts)) == typed_facts.

Without it, the atomic store and GENERATED.md drift, and any pre-commit hook eventually misclassifies. The Round 67 sub-section prefix bug surfaced exactly this way: the parser produced section_id 60/1 for a nested numbered heading, but the emitter wrote bare 1., so re-parsing yielded a different id and the diff broke. The fix preserved the parent prefix on the last segment. Mechanical hygiene that hand-written tests rarely catch.

Closed-form schema in Phase 0

The four entity kinds (Section / ChangelogEntry / FrozenList / CrossRef) are closed-form. User-defined kinds, additional entities, and schema extensions are explicitly not Phase 0 features — that work belongs to Phase 1.5+ schema decomposition (a separate spec round).

Closing the schema in Phase 0:

  • Simplifies the validator (no plugin loader path).
  • Keeps round-trip provability tractable.
  • Makes 5-language emit (Rust + Kotlin + Python + C++ + Protobuf) feasible. Salsa cascade semantics remain Rust-only because porting the incremental-computation guarantees to other languages was judged out of paradigm.

Documentation

For AI agents already inside an MCP session, the canonical onboarding order is:

  1. mnemosyne://concepts/overview
  2. mnemosyne://concepts/anti-patterns
  3. mnemosyne://concepts/atomic-store
  4. mnemosyne://concepts/frozen-ledger
  5. mnemosyne://concepts/tier-rules
  6. mnemosyne://concepts/workflow

Roadmap

Mnemosyne's core abstraction — AI-mutated markdown documents need typed invariants to stay safe — generalizes well beyond design docs. The roadmap follows that generalization outward: same primitives (Section / CrossRef / ChangelogEntry / FrozenList), same integrity guarantees (T1 / T2 / Path B), different schemas on top.

Phase 0 — Design-doc lifecycle (current)

Production dogfood. Mnemosyne's own design history runs through the atomic store; the hardening arc spanning Round 252-272 closed the core integrity gaps:

  • T1 cross-doc orphan reject with [[orphan_ledger]] opt-in carries for legitimate legacy references.
  • Atomic-axis decision_status field with author-time + validate-time guards (T1 rule 4 across both axes).
  • Code-citation defense reject mode (severity_missing / severity_binding = reject) gating pre-commit on hallucinated spec references.
  • Bidirectional Spec ↔ Code binding via Section.implementations and three-edged set-equality detection (CitationUnbound + ImplementationUnbacked + ImplementationMissing).
  • Atomic ChangelogEntry mutate API with auto-cascade regeneration of GENERATED.md on every successful write.

Phase 1 — Narrative medium adapter

The next adoption surface: long-form fiction, game scripts, TRPG campaign notes, worldbuilding wikis, character bibles. These media share the same AI-mutation hazard pattern that motivated Phase 0 — LLM-driven editing breaks invariants that no compiler enforces — but the schema and the primitives change.

Concrete target genres and what Mnemosyne would guard:

  • Long-form fiction draft management. A character's established eye color in chapter 2 must match chapter 15. A renamed faction shouldn't leave 40 orphan references in unrelated scenes. The atomic-store + T1 invariants lift directly — what changes is the entity schema (Character / Location / Faction / Scene) and the mutate primitives (set_character_eye_color, rename_faction_with_cascade).
  • Game scripts (interactive fiction, dialog trees, branching narrative). Branch targets must resolve. Character dialog schemas must stay consistent across scenes. Conditional flag references (if metPirateKing) cannot dangle. Same T1 cross-ref orphan reject, applied to scene graphs instead of section graphs.
  • TRPG campaign notes. NPC stat blocks, location backstory, plot beat audit trail. The GM's "what did I rule three sessions ago" problem is exactly the frozen-ledger problem: git history doesn't carry decision provenance, but a ChangelogEntry stream sorted by session number does.
  • Worldbuilding wikis. Faction relations, timeline consistency, magic-system constraints. References between articles need orphan reject; "law of magic" changes need frozen-ledger semantics so retroactive edits don't quietly contradict ten earlier chapters.
  • Character bibles. Name spelling normalization, age/timeline arithmetic, relationship graph consistency. Identical hazards to a design doc, different fields on the underlying schema.

The Phase 1 priority audit (Round 172) ranked fictional adapter as the first Phase 1 entry by a 6.00 / 3.00× margin over alternatives — chosen because (a) the AI-mediated authoring workflow already exists in this space, (b) the per-asset count fits the workspace-scope JSON store without database migration, and (c) the integrity-break failure modes are visible to end users (a reader notices when a character's eye color contradicts the bible) which keeps the validator's reject mode well-calibrated.

Phase 1 is currently deferred behind Phase 0 stack stabilization — not abandoned. The roadmap is honest about the boundary.

Phase 1.5 — Cascade-gate full-scale measurement

Validation that the per-record Salsa cascade pattern (currently used at workspace scope) scales to the 50K-asset workload at the published p95 budget. Substrate carried from the Phase -1A measurement spike (under bench/, retained as historical baseline). This is the infrastructure prerequisite for any narrative-medium adapter that manages a novel-scale (~50K facts) workspace efficiently.

What's not on the roadmap

These items are registered carries in the audit ledger, not commitments. Phase 0 stack stability is the gating criterion. The codebase deliberately separates "what works today and is dogfooded" from "what is named in the priority audit" — there is no implication that a registered carry will ship on any particular timeline.

License

Dual-licensed under MIT or Apache-2.0 at your option.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors