perf: SQLite manifest store — replace per-file YAML sidecars (#ALP-912)#101

Merged
srobinson merged 13 commits into main from nancy/ALP-912 on Mar 7, 2026

@srobinson

Summary

  • Replace per-file .fmm YAML sidecars with a single .fmm.db SQLite database for manifest storage
  • New src/db/ module: schema design (mod.rs), write path (writer.rs), read path (reader.rs)
  • fmm generate writes to SQLite with incremental mtime-based updates; Manifest::load reads SQLite-first with sidecar fallback
  • CLI commands (init, validate, clean) migrated to SQLite with --db flag support
  • Removed serde_yaml dependency and all sidecar infrastructure (sidecar_parser.rs, 121 committed .fmm files)
  • Updated docs, help text, tools.toml, and generated artifacts

Sub-issues

  • ALP-913: SQLite schema design + rusqlite integration
  • ALP-914: Write path (fmm generate populates SQLite)
  • ALP-915: Read path (Manifest loads from SQLite)
  • ALP-916: Migrate CLI commands (init, validate, clean)
  • ALP-917: Remove sidecar infrastructure + serde_yaml dependency
  • ALP-918: Update docs, tools.toml, and generated artifacts

Test plan

  • just check && just build && just test all pass (57 unit + integration tests)
  • fmm generate produces .fmm.db and no .fmm sidecars
  • fmm validate reads from SQLite
  • fmm clean removes .fmm.db and legacy .fmm/ directory
  • MCP tools work correctly with SQLite-backed manifest

🤖 Generated with Claude Code

srobinson and others added 13 commits March 7, 2026 13:40
- Added `rusqlite = { version = "0.32", features = ["bundled"] }` to Cargo.toml
- Created `src/db/mod.rs` with full schema + connection management:
  - `open_or_create(root)` — creates `.fmm.db` at repo root if absent, applies
    pragmas (WAL, synchronous=NORMAL, mmap_size=256MB, temp_store=memory,
    foreign_keys=ON), runs schema creation or migration on version mismatch
  - `open_db(root)` — opens existing DB with pragmas, errors if file missing
  - `ensure_schema()` reads stored schema_version from meta table; drops and
    recreates all tables when version mismatches (regeneratable index)
  - Schema: files, exports (idx_exports_name, idx_exports_file), methods
    (idx_methods_name), reverse_deps (idx_reverse_deps_target),
    workspace_packages, meta
  - DB_FILENAME and SCHEMA_VERSION exported as pub constants
- Exposed `pub mod db` in src/lib.rs
- 6 unit tests: table creation, schema_version in meta, idempotency, WAL mode
  active, schema migration on version mismatch, open_db error when no file
- just check && just test clean
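
The version check in `ensure_schema()` can be pictured as a three-way decision. The sketch below is an illustration only: `SchemaAction` and `schema_action` are hypothetical names, and the real function works against the meta table through rusqlite rather than on plain strings.

```rust
/// What to do with the on-disk schema when the database is opened.
/// (Hypothetical type, not taken from src/db/mod.rs.)
#[derive(Debug, PartialEq)]
enum SchemaAction {
    /// No schema_version row yet: create all tables.
    Create,
    /// Stored version matches: leave the tables alone.
    Keep,
    /// Stored version differs: drop and recreate everything.
    Recreate,
}

/// `stored` is the schema_version read from the meta table, if any;
/// `current` is the version compiled into the binary.
fn schema_action(stored: Option<&str>, current: &str) -> SchemaAction {
    match stored {
        None => SchemaAction::Create,
        Some(v) if v == current => SchemaAction::Keep,
        Some(_) => SchemaAction::Recreate,
    }
}
```

Dropping and recreating on mismatch is only safe because the index is regeneratable: the next `fmm generate` repopulates it from source.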

Next: ALP-914 — write path: fmm generate populates SQLite

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
## What changed

### src/db/writer.rs (new)
- `file_mtime_rfc3339(path)` — reads source file mtime as RFC3339 string
- `is_file_up_to_date(conn, path, mtime)` — compares source mtime to DB indexed_at
- `upsert_file_data(tx, rel_path, result, mtime)` — inserts/replaces file row +
  exports + methods inside an existing transaction; CASCADE on files PK cleans
  old exports/methods automatically; deduplicates method dotted names (overloads)
- `load_files_map(conn)` — loads all files into HashMap<String, FileEntry> for
  reverse-dep computation (imports/dependencies/named_imports populated)
- `rebuild_and_write_reverse_deps(conn, root)` — loads files from DB, builds
  minimal Manifest with workspace info, calls manifest.rebuild_reverse_deps(),
  persists results to reverse_deps table in a transaction
- `write_reverse_deps(conn, rev_deps)` — clears + bulk-inserts reverse_deps
- `upsert_workspace_packages(conn, packages)` — stores workspace package map
- `write_meta(conn, key, value)` — writes meta key-value pairs
- `extract_function_names(custom_fields)` — extracts TypeScript function_names
  from parser custom_fields HashMap
- 9 unit tests covering: upsert+query, export CASCADE on replace, method dedup,
  incremental up-to-date check, meta roundtrip, workspace packages, load_files_map

### src/extractor/mod.rs
- Added `pub fn parse(&self, path: &Path) -> Result<ParseResult>` — public
  wrapper around parse_content; returns full ParseResult (metadata + custom_fields)
  for the SQLite write path

### src/db/mod.rs
- Added `pub mod writer;`

### src/cli/sidecar.rs — generate()
- SQLite write path runs BEFORE sidecar path (so DB is fresh on next load):
  1. open_or_create DB
  2. Discover and store workspace packages
  3. Sequential mtime check: build list of dirty files (skips if indexed_at >= mtime)
  4. Parallel parse of dirty files only (rayon)
  5. Single transaction: upsert_file_data for each parsed result
  6. rebuild_and_write_reverse_deps (full rebuild from DB + workspace discovery)
  7. write_meta: fmm_version, generated_at
- Existing sidecar write path kept unchanged (backward compat for ALP-917)
- --force bypasses mtime check, --dry-run skips all DB writes
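
Step 3 above (the sequential mtime check) can be sketched as follows. This is a simplified model, not the code in generate(): `dirty_files` is a hypothetical helper, and it assumes both timestamps are fixed-width RFC 3339 UTC strings so that string comparison matches chronological order.

```rust
use std::collections::HashMap;

/// Returns the paths that need re-parsing.
/// `indexed` maps relative path -> indexed_at as stored in the DB;
/// `on_disk` lists (relative path, current source mtime).
/// Assumption: both timestamps share one fixed-width RFC 3339 UTC format.
fn dirty_files(
    indexed: &HashMap<String, String>,
    on_disk: &[(String, String)],
    force: bool,
) -> Vec<String> {
    on_disk
        .iter()
        .filter(|(path, mtime)| {
            if force {
                return true; // --force bypasses the mtime check
            }
            match indexed.get(path) {
                // Skip when indexed_at >= mtime, as in step 3.
                Some(indexed_at) => indexed_at.as_str() < mtime.as_str(),
                // Never indexed: always dirty.
                None => true,
            }
        })
        .map(|(path, _)| path.clone())
        .collect()
}
```

Only the returned paths would then go through the parallel parse in step 4.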

## Notes for next worker (ALP-915 — read path)
- DB is now populated by generate; ALP-915 loads Manifest from DB
- load_files_map() already loads the data needed; ALP-915 extends it to also
  populate exports, methods, function_index, export_all, etc.
- The double-parse (SQLite + sidecar) is accepted for this transition phase;
  ALP-917 removes the sidecar path which eliminates the redundant work
- Incremental indexing is fully implemented: second generate only re-parses
  changed files (mtime comparison)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
## What changed

### src/db/reader.rs (new)
- `load_manifest_from_db(conn, root)` — builds a complete Manifest from 5 queries
  (files, exports, methods, reverse_deps, workspace_packages); applies the same
  TS > JS export collision logic as load_from_sidecars
- `load_files()` — populates manifest.files with all metadata columns
- `load_exports()` — groups by file_path; populates FileEntry.exports + export_lines
  + export_index / export_locations / export_all (collision resolved) + function_index
  (first-wins from function_names cross-reference)
- `load_methods()` — populates method_index from methods table
- `load_reverse_deps()` — reads pre-computed reverse_deps table directly into HashMap
  (no O(N²) rebuild at load time)
- `load_workspace_packages()` — reads stored workspace data; falls back to runtime
  discovery if empty (e.g. pre-generate state)
- 4 unit tests: round-trip files/exports, TS>JS collision, methods, reverse_deps
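
The commit does not spell out the TS > JS collision rule, so the following is a guess at its shape rather than the code in load_exports(). It assumes that when the same export name appears in both a JavaScript and a TypeScript file, the TypeScript file takes the export_index slot, and that otherwise the first file seen keeps it. `build_export_index` and the extension checks are hypothetical.

```rust
use std::collections::HashMap;

fn is_typescript(path: &str) -> bool {
    path.ends_with(".ts") || path.ends_with(".tsx")
}

fn is_javascript(path: &str) -> bool {
    path.ends_with(".js") || path.ends_with(".jsx")
}

/// Builds export name -> defining file from (name, file_path) rows.
/// Assumed rule: a TypeScript file replaces an existing JavaScript entry;
/// in every other collision the first entry is kept.
fn build_export_index(rows: &[(&str, &str)]) -> HashMap<String, String> {
    let mut index: HashMap<String, String> = HashMap::new();
    for &(name, file) in rows {
        let replace = match index.get(name) {
            None => true,
            Some(existing) => is_javascript(existing) && is_typescript(file),
        };
        if replace {
            index.insert(name.to_string(), file.to_string());
        }
    }
    index
}
```

Under this assumption the result does not depend on whether the `.js` or the `.ts` row comes back from the query first.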

### src/db/mod.rs
- Added `pub mod reader;`

### src/manifest/mod.rs
- Added `Manifest::load_from_sqlite(root)` — opens .fmm.db and calls load_manifest_from_db
- Added `Manifest::load(root)` — prefers SQLite when .fmm.db exists; falls back to
  sidecars with a one-time stderr warning; all callers updated to use this

### Call site updates (all use Manifest::load now)
- src/cli/commands/mod.rs: load_manifest() helper
- src/cli/glossary.rs: glossary() fn
- src/cli/search.rs: search() fn
- src/cli/init.rs: init() fn
- src/mcp/mod.rs: with_root() and reload()

## Performance
- Cold load from SQLite: 5 SELECT queries regardless of file count
- Reverse deps: read from pre-computed table (no O(N²) computation)
- Fallback: if no .fmm.db, load_from_sidecars runs as before (no regression)

## Notes for next worker (ALP-916 — CLI commands: init, validate, clean)
- validate() and clean() in src/cli/sidecar.rs still reference sidecars
- ALP-916 should update validate() to check DB indexed_at vs file mtime
- ALP-916 should update clean() to delete .fmm.db instead of .fmm files
- init.rs still calls Manifest::load (correct) but its output text still says
  "sidecars" — update strings in ALP-916/ALP-918

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
## What changed

### src/cli/sidecar.rs — validate() and clean() rewritten for SQLite

**validate():**
- Replaced re-parse + YAML content comparison with mtime-based DB check
- Opens .fmm.db; errors with clear message if DB not found
- Uses is_file_up_to_date() per file: compares indexed_at vs source mtime
- Distinguishes "stale" (in DB but outdated) vs "not indexed" (missing from DB)
- Same exit code contract: Ok(()) = all up to date, bail!() = stale files found
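
A minimal sketch of the stale / not-indexed distinction and the exit contract described above. The names `FileStatus`, `classify`, and `validate_all` are hypothetical, and `Result<(), String>` stands in for the `bail!()`-based error the real code returns.

```rust
#[derive(Debug, PartialEq)]
enum FileStatus {
    UpToDate,
    /// Present in the DB, but the source changed after indexed_at.
    Stale,
    /// No row for this file in the DB.
    NotIndexed,
}

/// `indexed_at` is the DB timestamp for the file, if it has a row.
/// Assumption: timestamps are fixed-width RFC 3339 UTC strings.
fn classify(indexed_at: Option<&str>, mtime: &str) -> FileStatus {
    match indexed_at {
        None => FileStatus::NotIndexed,
        Some(ts) if ts >= mtime => FileStatus::UpToDate,
        Some(_) => FileStatus::Stale,
    }
}

/// Ok(()) when every file is up to date, Err listing the offenders otherwise.
/// Each input tuple is (path, indexed_at, mtime).
fn validate_all(files: &[(&str, Option<&str>, &str)]) -> Result<(), String> {
    let problems: Vec<String> = files
        .iter()
        .filter_map(|&(path, indexed_at, mtime)| match classify(indexed_at, mtime) {
            FileStatus::UpToDate => None,
            FileStatus::Stale => Some(format!("stale: {path}")),
            FileStatus::NotIndexed => Some(format!("not indexed: {path}")),
        })
        .collect();
    if problems.is_empty() {
        Ok(())
    } else {
        Err(problems.join("\n"))
    }
}
```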

**clean():**
- Replaced filesystem walk deleting .fmm files with SQLite DELETE FROM files
- Added delete_db: bool parameter — clears contents by default, deletes .fmm.db file with --db
- DELETE FROM files cascades to exports and methods (schema ON DELETE CASCADE); reverse_deps has no FK to files, so the cascade does not cover it
- Transition block: also removes any legacy .fmm sidecar files still present (ALP-917 removes this)
- Legacy .fmm/ directory cleanup preserved

### src/db/writer.rs — file_mtime_rfc3339()
- Now includes nanoseconds (subsec_nanos) instead of truncating to whole seconds
- Fixes validate_fails_after_source_change test: same-second writes are now
  detectable on APFS/ext4 which provide nanosecond mtime precision
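
To illustrate why sub-second precision matters here, the sketch below formats a timestamp as RFC 3339 with nine fractional digits using only the standard library. It is not the implementation of file_mtime_rfc3339() (which may well use a date/time crate). `rfc3339_nanos`, `mtime_string`, and `civil_from_days` are hypothetical names, and only timestamps at or after the Unix epoch are handled.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::{Duration, UNIX_EPOCH};

/// Days since 1970-01-01 -> (year, month, day), proleptic Gregorian calendar.
/// Based on Howard Hinnant's civil_from_days, restricted to non-negative days.
fn civil_from_days(days: u64) -> (u64, u64, u64) {
    let z = days + 719_468; // shift epoch to 0000-03-01
    let era = z / 146_097; // 400-year cycles
    let doe = z % 146_097; // day of era
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of March-based year
    let mp = (5 * doy + 2) / 153; // March-based month index
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    let year = yoe + era * 400 + u64::from(month <= 2);
    (year, month, day)
}

/// Formats a duration since the Unix epoch as RFC 3339 UTC with nanoseconds.
fn rfc3339_nanos(since_epoch: Duration) -> String {
    let secs = since_epoch.as_secs();
    let (year, month, day) = civil_from_days(secs / 86_400);
    let rem = secs % 86_400;
    format!(
        "{:04}-{:02}-{:02}T{:02}:{:02}:{:02}.{:09}Z",
        year,
        month,
        day,
        rem / 3_600,
        (rem % 3_600) / 60,
        rem % 60,
        since_epoch.subsec_nanos()
    )
}

/// Reads a file's mtime and formats it with nanosecond precision.
fn mtime_string(path: &Path) -> io::Result<String> {
    let mtime = fs::metadata(path)?.modified()?;
    let since_epoch = mtime
        .duration_since(UNIX_EPOCH)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
    Ok(rfc3339_nanos(since_epoch))
}
```

With whole-second truncation, two writes inside the same second produce identical strings and the second write looks up to date. With the fractional part, the strings differ and still sort chronologically because the width is fixed.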

### src/cli/mod.rs — Commands::Clean
- Added --db flag (delete_db: bool): deletes .fmm.db file vs just clearing contents
- Updated command description from "sidecar files" to "index database"

### src/main.rs
- Destructures delete_db from Commands::Clean; passes to cli::clean()
- Updated "Cleaning sidecars..." banner to "Cleaning index..."

### src/cli/init.rs
- Added .gitignore hint: "Add '.fmm.db' to your .gitignore — the index is regeneratable"
- Updated "next steps" copy from "create sidecars" to "index your codebase"

### src/cli/commands/mod.rs
- Updated warn_no_sidecars() message from ".fmm sidecars" to "fmm index"

### tests/cli_integration.rs
- Updated 3 fmm::cli::clean() calls to pass delete_db=false (new param)

## Notes for next worker (ALP-917 — remove sidecar infrastructure)
- The transition block in clean() (sidecar per-file cleanup) is marked with a
  comment: "ALP-917 removes this block when the sidecar write path is deleted"
- The sidecar write path in generate() is also labeled for ALP-917
- FileProcessor, sidecar_path_for imports in sidecar.rs still needed until ALP-917

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Detailed plan for removing sidecar infrastructure:
- Which files to delete (formatter/mod.rs, sidecar_parser.rs)
- Which functions to remove from each file
- How to migrate each test file
- Key insights (watch.rs connection per-event pattern)

ALP-916 is Worker Done. ALP-917 is In Progress (no code changes yet).

DELETE FROM files cascades to exports and methods via ON DELETE CASCADE,
but reverse_deps has no FK relationship to files — it stores paths as plain
text. fmm clean (without --db) left stale reverse dep entries in the DB.

Added DELETE FROM reverse_deps to the clean batch so the DB reaches a
consistent empty state that matches what fmm generate would produce on a
fresh run.
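
The gap can be modelled without a database. In the toy model below, `exports` stands for a table with a cascading foreign key to `files`, while `reverse_deps` stores paths as plain text. The `Index` type and its methods are hypothetical and exist only to show why the extra DELETE is needed.

```rust
use std::collections::HashMap;

/// Toy in-memory stand-in for the index tables.
#[derive(Default)]
struct Index {
    files: Vec<String>,
    /// (file_path, export name); models an FK to files with ON DELETE CASCADE.
    exports: Vec<(String, String)>,
    /// target path -> dependents; plain text, no FK to files.
    reverse_deps: HashMap<String, Vec<String>>,
}

impl Index {
    /// Models `DELETE FROM files`: the cascade removes dependent rows,
    /// but nothing touches reverse_deps.
    fn delete_all_files(&mut self) {
        self.files.clear();
        self.exports.clear();
    }

    /// Models the fixed clean batch: also clears reverse_deps explicitly.
    fn clean(&mut self) {
        self.delete_all_files();
        self.reverse_deps.clear();
    }

    fn is_empty(&self) -> bool {
        self.files.is_empty() && self.exports.is_empty() && self.reverse_deps.is_empty()
    }
}
```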

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
## What changed

### Deleted
- src/formatter/mod.rs — YAML sidecar renderer (180 LOC)
- src/manifest/sidecar_parser.rs — YAML deserializer (384 LOC)

### src/lib.rs
- Removed `pub mod formatter;`

### src/extractor/mod.rs
- Removed: sidecar_path_for(), format_sidecar(), process(), validate(), clean(), content_without_modified()
- Removed: use crate::formatter::Frontmatter import
- Kept: FileProcessor::new(), extract_metadata(), parse(), parse_content()

### src/manifest/mod.rs
- Removed load_from_sidecars() method (300 LOC of YAML walker + parser)
- Removed: mod sidecar_parser, use sidecar_parser::parse_sidecar, use ignore::WalkBuilder
- Simplified load() to call load_from_sqlite() directly (no sidecar fallback)
- Updated doc comments

### src/mcp/mod.rs
- reload() now calls Manifest::load() instead of load_from_sidecars()

### src/cli/sidecar.rs
- generate(): removed sidecar write path (lines 106-158), kept SQLite path
  - Added dry-run output (shows what would be indexed without touching DB)
  - Updated output messages: "indexed" not "sidecar(s) written"
- clean(): removed transition block (per-file .fmm cleanup)

### src/cli/watch.rs
- Rewrote handle_event() to use SQLite directly (no FileProcessor.process())
- On Create/Modify: index_file() → open_or_create, parse, upsert, rebuild reverse deps
- On Remove: remove_file_from_db() → DELETE FROM files WHERE path = ?
- is_watchable(): filters .fmm.db and WAL files instead of .fmm sidecars
- Rewrote all tests to check DB state instead of sidecar file existence
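
The database-file filter in is_watchable() might look roughly like the sketch below. It covers only the exclusion side and is not the actual function. `is_db_artifact` is a hypothetical helper, and the `-wal`, `-shm`, and `-journal` suffixes are the companion file names SQLite itself uses. Presumably the point is to keep the watcher from reacting to its own index writes.

```rust
use std::path::Path;

/// Database file name used by the PR (exported there as DB_FILENAME).
const DB_FILENAME: &str = ".fmm.db";

/// True for the index database and the companion files SQLite creates
/// next to it (write-ahead log, shared memory, rollback journal).
fn is_db_artifact(path: &Path) -> bool {
    let Some(name) = path.file_name().and_then(|n| n.to_str()) else {
        return false;
    };
    name == DB_FILENAME
        || ["-wal", "-shm", "-journal"]
            .iter()
            .any(|suffix| name.strip_suffix(*suffix) == Some(DB_FILENAME))
}
```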

### src/cli/status.rs
- Replaced sidecar count with SQLite file count (SELECT COUNT(*) FROM files)
- Removed: use crate::extractor::sidecar_path_for

### src/cli/init.rs
- Replaced "show sample sidecar" block with DB stats (file count, export count)
- Removed: use crate::extractor::sidecar_path_for
- Updated banner copy: "SQLite code intelligence" not "metadata sidecars"

### Cargo.toml
- Removed serde_yaml = "0.9" dependency

### src/format/mod.rs
- Added yaml_escape() function (moved from deleted formatter/mod.rs)

### src/format/{helpers,list_formatters,search_formatters,yaml_formatters}.rs
- Updated imports: crate::formatter::yaml_escape → crate::format::yaml_escape

### src/mcp/tools/glossary.rs
- Updated: crate::formatter::yaml_escape → crate::format::yaml_escape

### src/resolver/workspace.rs
- Replaced serde_yaml-based pnpm-workspace.yaml parser with line-based parser
- Removed extract_string_list() which depended on serde_yaml::Value
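
A line-based reader for the `packages:` list could be as small as the sketch below. This is an assumption about the approach, not the code in src/resolver/workspace.rs: `parse_workspace_packages` is a hypothetical name, and the sketch handles only block-style lists with optional quotes, ignoring inline comments and flow-style (`[a, b]`) lists.

```rust
/// Extracts the glob entries under the top-level `packages:` key of a
/// pnpm-workspace.yaml file, using plain line scanning instead of a YAML parser.
fn parse_workspace_packages(yaml: &str) -> Vec<String> {
    let mut in_packages = false;
    let mut globs = Vec::new();
    for line in yaml.lines() {
        let trimmed = line.trim();
        if trimmed.is_empty() || trimmed.starts_with('#') {
            continue; // blank line or full-line comment
        }
        let indented = line.starts_with(' ') || line.starts_with('\t');
        if !indented && !trimmed.starts_with('-') {
            // A non-indented, non-list line starts a new top-level key.
            in_packages = trimmed == "packages:";
            continue;
        }
        if in_packages {
            if let Some(item) = trimmed.strip_prefix('-') {
                let item = item.trim().trim_matches(|c| c == '\'' || c == '"');
                if !item.is_empty() {
                    globs.push(item.to_string());
                }
            }
        }
    }
    globs
}
```

This trades YAML generality for one fewer dependency, which fits a file whose relevant part is a flat list of globs.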

### tests/cross_package_resolution.rs
- load_manifest() now calls generate() + Manifest::load() (temporary stub)
- TODO: replace write_sidecar() with real TypeScript source files (see handover)
- Updated react_shared_downstream_count to use Manifest::load()

### tests/mcp_tools.rs
- manifest_loads_from_sidecars → manifest_loads_from_db
- Updated load_from_sidecars() → load() calls with graceful fallback

## Status: just check passes, tests need full migration

### Tests still needing full migration (next worker):

**tests/cli_integration.rs** — All sidecar_exists/sidecar_content assertions need
  → replace with db_exists(), db_indexed(), db_export_count() helpers
  → generate_creates_sidecars → generate_creates_db
  → generate_sidecar_content_is_valid_yaml → delete or rewrite checking DB content
  → generate_skips_unchanged_sidecars → check export count unchanged
  → generate_updates_stale_sidecars → check new export in DB
  → clean_removes_all_sidecars → check DB cleared (count=0)
  → respects_gitignore/fmmignore → check DB indexed/skipped paths
  → single_file_generate → check only one file in DB

**tests/mcp_tools.rs** — Full migration:
  → write_source_and_sidecar() → write_source() (no sidecar write)
  → At end of setup_mcp_server(): call fmm::cli::generate() before McpServer::with_root()
  → manifest_loads_from_db: assert files.len() == 5 (after generate works)

**tests/cross_package_resolution.rs** — Full migration:
  → write_sidecar() → write_ts_source() with real TypeScript
    - imports: [pkg-name] → import { x } from 'pkg-name';
    - dependencies: [./path] → import { y } from './path';
    - exports: [name: [line, line]] → export const name = 1;
  → All path assertions: absolute → relative
    BEFORE: root.join("packages/shared/utils.ts").to_string_lossy()
    AFTER: "packages/shared/utils.ts"

**tests/named_import_precision.rs** — Simplest migration:
  → Remove all .fmm sidecar writes (keep source file writes)
  → Add at end of setup_precision_server():
    fmm::cli::generate(&[root.to_str().unwrap().to_string()], false, false).unwrap();
  → The TS parser correctly extracts named_imports, function_names, namespace_imports

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…g review

Test migration (all 4 remaining test files):
- tests/cli_integration.rs: replaced sidecar helpers with SQLite DB assertions
  via rusqlite; fixed single_file_generate root resolution
- tests/cross_package_resolution.rs: rewrote write_sidecar() as write_file()
  writing real source; all path assertions changed to relative keys
- tests/glossary.rs: all 6 setup functions now write real source + call generate()
- tests/mcp_tools.rs: all 4 setup functions migrated; removed write_source_and_sidecar
- tests/named_import_precision.rs: removed all .fmm sidecar writes; added generate()

Production bug fixes discovered during test migration:

1. src/db/writer.rs — rebuild_and_write_reverse_deps used relative DB keys as
   manifest.files keys, breaking the cross-package resolver (oxc_resolver needs
   absolute paths, canonicalize() fails on relative, Layer 3 fs::exists() returns
   false). Fix: convert relative→absolute before build_reverse_deps(), then strip
   root prefix back to relative for DB storage.

2. src/manifest/glossary_builder.rs — find_dependents() scanned all files via
   dep_matches/dotted_dep_matches, ignoring the precomputed reverse_deps index.
   Bare specifier cross-package imports (e.g. 'shared/ReactFeatureFlags') were
   never found, so cross-package callers never appeared in used_by. Fix: use
   reverse_deps when populated (SQLite path); fall back to scanning when empty
   (programmatic manifest path without rebuild_reverse_deps).

3. src/db/reader.rs — load_methods() populated manifest.method_index but not
   FileEntry.methods. The read_symbol tool's large-class redirect checks
   file_entry.methods for BigService.doWork* entries; finding none, it returned
   truncated source instead of the redirect hint. Fix: also populate FileEntry.methods
   in load_methods so the redirect fires correctly.
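
The key conversion in fix 1 amounts to a round trip between DB keys and resolver paths. A minimal sketch, with hypothetical helper names `to_absolute` and `to_relative_key`:

```rust
use std::path::{Path, PathBuf};

/// DB keys are relative to the repo root; the resolver needs absolute paths.
fn to_absolute(root: &Path, rel_key: &str) -> PathBuf {
    root.join(rel_key)
}

/// Strips the root prefix again before writing to the DB.
/// Returns None for paths outside the root.
fn to_relative_key(root: &Path, abs: &Path) -> Option<String> {
    abs.strip_prefix(root)
        .ok()
        .map(|p| p.to_string_lossy().into_owned())
}
```

The same convention explains the test changes: assertions compare against keys such as "packages/shared/utils.ts" rather than paths joined onto the temp root.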

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…QLite

## What changed

### README.md
- Rewrote "What it does" section: SQLite database replaces per-file sidecar YAML
- Updated comparison table: "Read sidecar metadata" → "Query SQLite index"
- Updated architecture diagram: .fmm sidecars → .fmm.db (SQLite)
- Updated pipeline step 3: "writes .fmm sidecar" → "upserts into .fmm.db"
- Updated pipeline step 4: "reads sidecars on demand" → "loads index from SQLite"
- Updated commands table: sidecar language → index language throughout
- Updated CI/CD example: "Validate fmm sidecars" → "Validate fmm index"
- Quick start: "Live sidecar regeneration" → "Live index updates on file change"

### src/cli/help_text.rs
- generate: "Create and update .fmm sidecars" → "Index source files into SQLite database"
- watch: "regenerate sidecars on change" → "update the index on change"
- validate: "Check sidecars are current" → "Check the index is current"
- clean: "Remove all .fmm sidecars" → "Clear the fmm index database"

### src/cli/mod.rs
- All command long_about strings updated from sidecar to index terminology
- generate: long_about, notes, --force flag description
- validate: examples, error hint (Stale sidecars → Stale index)
- watch: long_about, notes section (no more .fmm feedback loop note)
- init: example comments, --no-generate description
- status: "sidecar counts" → "index counts"
- mcp: "Requires sidecars" → "Requires the index"
- resolve_root comment: "sidecar output" → "the index"

### src/main.rs
- "Generating sidecars..." → "Indexing source files..."
- "Validating sidecars..." → "Validating index..."

### src/mcp/mod.rs
- "No sidecars found" → "No index found"
- Comment: "Rebuild index from sidecars" → "Reload index"

### src/mcp/tools/read.rs + src/cli/commands/read.rs
- "regenerate sidecars with 'fmm generate' for v0.3 format" → "run 'fmm generate' to re-index"

### src/cli/search.rs + src/cli/glossary.rs
- "No .fmm sidecars found" → "No fmm index found"
- "fmm search queries sidecar metadata" → "Run 'fmm generate' first to build the index"
- "No sidecars found" → "No index found"

### tools.toml
- Removed "Requires line-range data from v0.3 sidecars." from fmm_read_symbol description

### build.rs (generates templates/SKILL.md)
- "## Sidecar Fallback" section replaced with "## CLI Fallback"
- Removed YAML sidecar example; replaced with fmm CLI commands

### templates/SKILL.md (auto-generated by build.rs)
- Regenerated with CLI Fallback section replacing Sidecar Fallback

### src/mcp/generated_schema.rs (auto-generated by build.rs)
- Regenerated with updated fmm_read_symbol description

### CONTRIBUTING_LANGUAGE.md
- Step 7 rewritten: "cargo run -- sidecar" → "cargo run -- generate && outline"
- Checklist item 7 updated to match

### examples/demo-project/WALKTHROUGH.md
- "pre-generated .fmm sidecars" → "Run fmm generate to build the index"
- "reads 8 sidecars totaling ~56 lines of YAML" → "queries the fmm index"
- "~200 tokens to read all sidecars" → "~200 tokens to query the index"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The sidecar write path was removed in ALP-917 but the 121 committed
.fmm sidecar files remained, permanently stale since generate() no
longer writes them.

Generate the SQLite index for the fmm codebase and commit it instead.
This preserves the same invariant as the old sidecars (fmm MCP tools
work on a fresh clone without running fmm generate first) while
completing the migration cleanly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@srobinson srobinson merged commit dabc5f3 into main Mar 7, 2026
1 check passed
@srobinson srobinson deleted the nancy/ALP-912 branch March 7, 2026 09:05