perf: SQLite manifest store — replace per-file YAML sidecars (#ALP-912)#101
Merged
perf: SQLite manifest store — replace per-file YAML sidecars (#ALP-912)#101
Conversation
- Added `rusqlite = { version = "0.32", features = ["bundled"] }` to Cargo.toml
- Created `src/db/mod.rs` with full schema + connection management:
- `open_or_create(root)` — creates `.fmm.db` at repo root if absent, applies
pragmas (WAL, synchronous=NORMAL, mmap_size=256MB, temp_store=memory,
foreign_keys=ON), runs schema creation or migration on version mismatch
- `open_db(root)` — opens existing DB with pragmas, errors if file missing
- `ensure_schema()` reads stored schema_version from meta table; drops and
recreates all tables when version mismatches (regeneratable index)
- Schema: files, exports (idx_exports_name, idx_exports_file), methods
(idx_methods_name), reverse_deps (idx_reverse_deps_target),
workspace_packages, meta
- DB_FILENAME and SCHEMA_VERSION exported as pub constants
- Exposed `pub mod db` in src/lib.rs
- 6 unit tests: table creation, schema_version in meta, idempotency, WAL mode
active, schema migration on version mismatch, open_db error when no file
- just check && just test clean
Next: ALP-914 — write path: fmm generate populates SQLite
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
## What changed ### src/db/writer.rs (new) - `file_mtime_rfc3339(path)` — reads source file mtime as RFC3339 string - `is_file_up_to_date(conn, path, mtime)` — compares source mtime to DB indexed_at - `upsert_file_data(tx, rel_path, result, mtime)` — inserts/replaces file row + exports + methods inside an existing transaction; CASCADE on files PK cleans old exports/methods automatically; deduplicates method dotted names (overloads) - `load_files_map(conn)` — loads all files into HashMap<String, FileEntry> for reverse-dep computation (imports/dependencies/named_imports populated) - `rebuild_and_write_reverse_deps(conn, root)` — loads files from DB, builds minimal Manifest with workspace info, calls manifest.rebuild_reverse_deps(), persists results to reverse_deps table in a transaction - `write_reverse_deps(conn, rev_deps)` — clears + bulk-inserts reverse_deps - `upsert_workspace_packages(conn, packages)` — stores workspace package map - `write_meta(conn, key, value)` — writes meta key-value pairs - `extract_function_names(custom_fields)` — extracts TypeScript function_names from parser custom_fields HashMap - 9 unit tests covering: upsert+query, export CASCADE on replace, method dedup, incremental up-to-date check, meta roundtrip, workspace packages, load_files_map ### src/extractor/mod.rs - Added `pub fn parse(&self, path: &Path) -> Result<ParseResult>` — public wrapper around parse_content; returns full ParseResult (metadata + custom_fields) for the SQLite write path ### src/db/mod.rs - Added `pub mod writer;` ### src/cli/sidecar.rs — generate() - SQLite write path runs BEFORE sidecar path (so DB is fresh on next load): 1. open_or_create DB 2. Discover and store workspace packages 3. Sequential mtime check: build list of dirty files (skips if indexed_at >= mtime) 4. Parallel parse of dirty files only (rayon) 5. Single transaction: upsert_file_data for each parsed result 6. rebuild_and_write_reverse_deps (full rebuild from DB + workspace discovery) 7. write_meta: fmm_version, generated_at - Existing sidecar write path kept unchanged (backward compat for ALP-917) - --force bypasses mtime check, --dry-run skips all DB writes ## Notes for next worker (ALP-915 — read path) - DB is now populated by generate; ALP-915 loads Manifest from DB - load_files_map() already loads the data needed; ALP-915 extends it to also populate exports, methods, function_index, export_all, etc. - The double-parse (SQLite + sidecar) is accepted for this transition phase; ALP-917 removes the sidecar path which eliminates the redundant work - Incremental indexing is fully implemented: second generate only re-parses changed files (mtime comparison) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
## What changed ### src/db/reader.rs (new) - `load_manifest_from_db(conn, root)` — builds a complete Manifest from 5 queries (files, exports, methods, reverse_deps, workspace_packages); applies the same TS > JS export collision logic as load_from_sidecars - `load_files()` — populates manifest.files with all metadata columns - `load_exports()` — groups by file_path; populates FileEntry.exports + export_lines + export_index / export_locations / export_all (collision resolved) + function_index (first-wins from function_names cross-reference) - `load_methods()` — populates method_index from methods table - `load_reverse_deps()` — reads pre-computed reverse_deps table directly into HashMap (no O(N²) rebuild at load time) - `load_workspace_packages()` — reads stored workspace data; falls back to runtime discovery if empty (e.g. pre-generate state) - 4 unit tests: round-trip files/exports, TS>JS collision, methods, reverse_deps ### src/db/mod.rs - Added `pub mod reader;` ### src/manifest/mod.rs - Added `Manifest::load_from_sqlite(root)` — opens .fmm.db and calls load_manifest_from_db - Added `Manifest::load(root)` — prefers SQLite when .fmm.db exists; falls back to sidecars with a one-time stderr warning; all callers updated to use this ### Call site updates (all use Manifest::load now) - src/cli/commands/mod.rs: load_manifest() helper - src/cli/glossary.rs: glossary() fn - src/cli/search.rs: search() fn - src/cli/init.rs: init() fn - src/mcp/mod.rs: with_root() and reload() ## Performance - Cold load from SQLite: 5 SELECT queries regardless of file count - Reverse deps: read from pre-computed table (no O(N²) computation) - Fallback: if no .fmm.db, load_from_sidecars runs as before (no regression) ## Notes for next worker (ALP-916 — CLI commands: init, validate, clean) - validate() and clean() in src/cli/sidecar.rs still reference sidecars - ALP-916 should update validate() to check DB indexed_at vs file mtime - ALP-916 should update clean() to delete .fmm.db instead of .fmm files - init.rs still calls Manifest::load (correct) but its output text still says "sidecars" — update strings in ALP-916/ALP-918 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
## What changed ### src/cli/sidecar.rs — validate() and clean() rewritten for SQLite **validate():** - Replaced re-parse + YAML content comparison with mtime-based DB check - Opens .fmm.db; errors with clear message if DB not found - Uses is_file_up_to_date() per file: compares indexed_at vs source mtime - Distinguishes "stale" (in DB but outdated) vs "not indexed" (missing from DB) - Same exit code contract: Ok(()) = all up to date, bail!() = stale files found **clean():** - Replaced filesystem walk deleting .fmm files with SQLite DELETE FROM files - Added delete_db: bool parameter — clears contents by default, deletes .fmm.db file with --db - DELETE FROM files cascades to exports, methods, reverse_deps (schema ON DELETE CASCADE) - Transition block: also removes any legacy .fmm sidecar files still present (ALP-917 removes this) - Legacy .fmm/ directory cleanup preserved ### src/db/writer.rs — file_mtime_rfc3339() - Now includes nanoseconds (subsec_nanos) instead of truncating to whole seconds - Fixes validate_fails_after_source_change test: same-second writes are now detectable on APFS/ext4 which provide nanosecond mtime precision ### src/cli/mod.rs — Commands::Clean - Added --db flag (delete_db: bool): deletes .fmm.db file vs just clearing contents - Updated command description from "sidecar files" to "index database" ### src/main.rs - Destructures delete_db from Commands::Clean; passes to cli::clean() - Updated "Cleaning sidecars..." banner to "Cleaning index..." ### src/cli/init.rs - Added .gitignore hint: "Add '.fmm.db' to your .gitignore — the index is regeneratable" - Updated "next steps" copy from "create sidecars" to "index your codebase" ### src/cli/commands/mod.rs - Updated warn_no_sidecars() message from ".fmm sidecars" to "fmm index" ### tests/cli_integration.rs - Updated 3 fmm::cli::clean() calls to pass delete_db=false (new param) ## Notes for next worker (ALP-917 — remove sidecar infrastructure) - The transition block in clean() (sidecar per-file cleanup) is marked with a comment: "ALP-917 removes this block when the sidecar write path is deleted" - The sidecar write path in generate() is also labeled for ALP-917 - FileProcessor, sidecar_path_for imports in sidecar.rs still needed until ALP-917 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Detailed plan for removing sidecar infrastructure: - Which files to delete (formatter/mod.rs, sidecar_parser.rs) - Which functions to remove from each file - How to migrate each test file - Key insights (watch.rs connection per-event pattern) ALP-916 is Worker Done. ALP-917 is In Progress (no code changes yet).
DELETE FROM files cascades to exports and methods via ON DELETE CASCADE, but reverse_deps has no FK relationship to files — it stores paths as plain text. fmm clean (without --db) left stale reverse dep entries in the DB. Added DELETE FROM reverse_deps to the clean batch so the DB reaches a consistent empty state that matches what fmm generate would produce on a fresh run. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
## What changed
### Deleted
- src/formatter/mod.rs — YAML sidecar renderer (180 LOC)
- src/manifest/sidecar_parser.rs — YAML deserializer (384 LOC)
### src/lib.rs
- Removed `pub mod formatter;`
### src/extractor/mod.rs
- Removed: sidecar_path_for(), format_sidecar(), process(), validate(), clean(), content_without_modified()
- Removed: use crate::formatter::Frontmatter import
- Kept: FileProcessor::new(), extract_metadata(), parse(), parse_content()
### src/manifest/mod.rs
- Removed load_from_sidecars() method (300 LOC of YAML walker + parser)
- Removed: mod sidecar_parser, use sidecar_parser::parse_sidecar, use ignore::WalkBuilder
- Simplified load() to call load_from_sqlite() directly (no sidecar fallback)
- Updated doc comments
### src/mcp/mod.rs
- reload() now calls Manifest::load() instead of load_from_sidecars()
### src/cli/sidecar.rs
- generate(): removed sidecar write path (lines 106-158), kept SQLite path
- Added dry-run output (shows what would be indexed without touching DB)
- Updated output messages: "indexed" not "sidecar(s) written"
- clean(): removed transition block (per-file .fmm cleanup)
### src/cli/watch.rs
- Rewrote handle_event() to use SQLite directly (no FileProcessor.process())
- On Create/Modify: index_file() → open_or_create, parse, upsert, rebuild reverse deps
- On Remove: remove_file_from_db() → DELETE FROM files WHERE path = ?
- is_watchable(): filters .fmm.db and WAL files instead of .fmm sidecars
- Rewrote all tests to check DB state instead of sidecar file existence
### src/cli/status.rs
- Replaced sidecar count with SQLite file count (SELECT COUNT(*) FROM files)
- Removed: use crate::extractor::sidecar_path_for
### src/cli/init.rs
- Replaced "show sample sidecar" block with DB stats (file count, export count)
- Removed: use crate::extractor::sidecar_path_for
- Updated banner copy: "SQLite code intelligence" not "metadata sidecars"
### Cargo.toml
- Removed serde_yaml = "0.9" dependency
### src/format/mod.rs
- Added yaml_escape() function (moved from deleted formatter/mod.rs)
### src/format/{helpers,list_formatters,search_formatters,yaml_formatters}.rs
- Updated imports: crate::formatter::yaml_escape → crate::format::yaml_escape
### src/mcp/tools/glossary.rs
- Updated: crate::formatter::yaml_escape → crate::format::yaml_escape
### src/resolver/workspace.rs
- Replaced serde_yaml-based pnpm-workspace.yaml parser with line-based parser
- Removed extract_string_list() which depended on serde_yaml::Value
### tests/cross_package_resolution.rs
- load_manifest() now calls generate() + Manifest::load() (temporary stub)
- TODO: replace write_sidecar() with real TypeScript source files (see handover)
- Updated react_shared_downstream_count to use Manifest::load()
### tests/mcp_tools.rs
- manifest_loads_from_sidecars → manifest_loads_from_db
- Updated load_from_sidecars() → load() calls with graceful fallback
## Status: just check passes, tests need full migration
### Tests still needing full migration (next worker):
**tests/cli_integration.rs** — All sidecar_exists/sidecar_content assertions need
→ replace with db_exists(), db_indexed(), db_export_count() helpers
→ generate_creates_sidecars → generate_creates_db
→ generate_sidecar_content_is_valid_yaml → delete or rewrite checking DB content
→ generate_skips_unchanged_sidecars → check export count unchanged
→ generate_updates_stale_sidecars → check new export in DB
→ clean_removes_all_sidecars → check DB cleared (count=0)
→ respects_gitignore/fmmignore → check DB indexed/skipped paths
→ single_file_generate → check only one file in DB
**tests/mcp_tools.rs** — Full migration:
→ write_source_and_sidecar() → write_source() (no sidecar write)
→ At end of setup_mcp_server(): call fmm::cli::generate() before McpServer::with_root()
→ manifest_loads_from_db: assert files.len() == 5 (after generate works)
**tests/cross_package_resolution.rs** — Full migration:
→ write_sidecar() → write_ts_source() with real TypeScript
- imports: [pkg-name] → import { x } from 'pkg-name';
- dependencies: [./path] → import { y } from './path';
- exports: [name: [line, line]] → export const name = 1;
→ All path assertions: absolute → relative
BEFORE: root.join("packages/shared/utils.ts").to_string_lossy()
AFTER: "packages/shared/utils.ts"
**tests/named_import_precision.rs** — Simplest migration:
→ Remove all .fmm sidecar writes (keep source file writes)
→ Add at end of setup_precision_server():
fmm::cli::generate(&[root.to_str().unwrap().to_string()], false, false).unwrap();
→ The TS parser correctly extracts named_imports, function_names, namespace_imports
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…g review Test migration (all 4 remaining test files): - tests/cli_integration.rs: replaced sidecar helpers with SQLite DB assertions via rusqlite; fixed single_file_generate root resolution - tests/cross_package_resolution.rs: rewrote write_sidecar() as write_file() writing real source; all path assertions changed to relative keys - tests/glossary.rs: all 6 setup functions now write real source + call generate() - tests/mcp_tools.rs: all 4 setup functions migrated; removed write_source_and_sidecar - tests/named_import_precision.rs: removed all .fmm sidecar writes; added generate() Production bug fixes discovered during test migration: 1. src/db/writer.rs — rebuild_and_write_reverse_deps used relative DB keys as manifest.files keys, breaking the cross-package resolver (oxc_resolver needs absolute paths, canonicalize() fails on relative, Layer 3 fs::exists() returns false). Fix: convert relative→absolute before build_reverse_deps(), then strip root prefix back to relative for DB storage. 2. src/manifest/glossary_builder.rs — find_dependents() scanned all files via dep_matches/dotted_dep_matches, ignoring the precomputed reverse_deps index. Bare specifier cross-package imports (e.g. 'shared/ReactFeatureFlags') were never found, so cross-package callers never appeared in used_by. Fix: use reverse_deps when populated (SQLite path); fall back to scanning when empty (programmatic manifest path without rebuild_reverse_deps). 3. src/db/reader.rs — load_methods() populated manifest.method_index but not FileEntry.methods. The read_symbol tool's large-class redirect checks file_entry.methods for BigService.doWork* entries; finding none, it returned truncated source instead of the redirect hint. Fix: also populate FileEntry.methods in load_methods so the redirect fires correctly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…QLite ## What changed ### README.md - Rewrote "What it does" section: SQLite database replaces per-file sidecar YAML - Updated comparison table: "Read sidecar metadata" → "Query SQLite index" - Updated architecture diagram: .fmm sidecars → .fmm.db (SQLite) - Updated pipeline step 3: "writes .fmm sidecar" → "upserts into .fmm.db" - Updated pipeline step 4: "reads sidecars on demand" → "loads index from SQLite" - Updated commands table: sidecar language → index language throughout - Updated CI/CD example: "Validate fmm sidecars" → "Validate fmm index" - Quick start: "Live sidecar regeneration" → "Live index updates on file change" ### src/cli/help_text.rs - generate: "Create and update .fmm sidecars" → "Index source files into SQLite database" - watch: "regenerate sidecars on change" → "update the index on change" - validate: "Check sidecars are current" → "Check the index is current" - clean: "Remove all .fmm sidecars" → "Clear the fmm index database" ### src/cli/mod.rs - All command long_about strings updated from sidecar to index terminology - generate: long_about, notes, --force flag description - validate: examples, error hint (Stale sidecars → Stale index) - watch: long_about, notes section (no more .fmm feedback loop note) - init: example comments, --no-generate description - status: "sidecar counts" → "index counts" - mcp: "Requires sidecars" → "Requires the index" - resolve_root comment: "sidecar output" → "the index" ### src/main.rs - "Generating sidecars..." → "Indexing source files..." - "Validating sidecars..." → "Validating index..." ### src/mcp/mod.rs - "No sidecars found" → "No index found" - Comment: "Rebuild index from sidecars" → "Reload index" ### src/mcp/tools/read.rs + src/cli/commands/read.rs - "regenerate sidecars with 'fmm generate' for v0.3 format" → "run 'fmm generate' to re-index" ### src/cli/search.rs + src/cli/glossary.rs - "No .fmm sidecars found" → "No fmm index found" - "fmm search queries sidecar metadata" → "Run 'fmm generate' first to build the index" - "No sidecars found" → "No index found" ### tools.toml - Removed "Requires line-range data from v0.3 sidecars." from fmm_read_symbol description ### build.rs (generates templates/SKILL.md) - "## Sidecar Fallback" section replaced with "## CLI Fallback" - Removed YAML sidecar example; replaced with fmm CLI commands ### templates/SKILL.md (auto-generated by build.rs) - Regenerated with CLI Fallback section replacing Sidecar Fallback ### src/mcp/generated_schema.rs (auto-generated by build.rs) - Regenerated with updated fmm_read_symbol description ### CONTRIBUTING_LANGUAGE.md - Step 7 rewritten: "cargo run -- sidecar" → "cargo run -- generate && outline" - Checklist item 7 updated to match ### examples/demo-project/WALKTHROUGH.md - "pre-generated .fmm sidecars" → "Run fmm generate to build the index" - "reads 8 sidecars totaling ~56 lines of YAML" → "queries the fmm index" - "~200 tokens to read all sidecars" → "~200 tokens to query the index" Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The sidecar write path was removed in ALP-917 but the 121 committed .fmm sidecar files remained, permanently stale since generate() no longer writes them. Generate the SQLite index for the fmm codebase and commit it instead. This preserves the same invariant as the old sidecars (fmm MCP tools work on a fresh clone without running fmm generate first) while completing the migration cleanly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
.fmmYAML sidecars with a single.fmm.dbSQLite database for manifest storagesrc/db/module: schema design (mod.rs), write path (writer.rs), read path (reader.rs)fmm generatewrites to SQLite with incremental mtime-based updates;Manifest::loadreads SQLite-first with sidecar fallbackinit,validate,clean) migrated to SQLite with--dbflag supportserde_yamldependency and all sidecar infrastructure (sidecar_parser.rs, 121 committed.fmmfiles)Sub-issues
Test plan
just check && just build && just testall pass (57 unit + integration tests)fmm generateproduces.fmm.dband no.fmmsidecarsfmm validatereads from SQLitefmm cleanremoves.fmm.dband legacy.fmm/directory🤖 Generated with Claude Code