
feat: TypeScript scale — make fmm work on 39k+ file codebases (#ALP-923)#104

Merged
srobinson merged 7 commits into main from nancy/ALP-923
Mar 7, 2026

@srobinson (Owner)

Summary

Makes `fmm generate` viable for large TypeScript codebases (39k+ files) through three coordinated improvements:

  • Progress indicators (ALP-920): indicatif progress bars with per-phase timing, plus a --quiet flag for CI
  • Parallelization (ALP-921): bulk staleness check (39k queries → 1 SELECT) and parallel JSON pre-serialization via rayon
  • Nested symbol extraction (ALP-922): depth-1 nested functions and prologue vars are extracted from mega-functions, indexed in method_index, and searchable via fmm_search

Key design decisions

  • Schema version bumped to 2 (new kind TEXT column on the methods table)
  • Only depth-1 nested declarations are extracted (depth-2+ intentionally skipped)
  • Prologue vars kept only when the initializer is non-trivial (call, new, as, or type annotation)
  • New ExportEntry kinds: nested-fn and closure-state
  • Phase 2b (parallel serialization) tracked separately in the timing breakdown

Test plan

  • All 57 existing tests pass
  • New unit tests for nested symbol extraction (7 tests in typescript.rs)
  • fmm generate on a large TypeScript repo shows progress bars and timing
  • fmm generate --quiet suppresses progress, prints summary only
  • fmm search "nested function name" finds symbols inside mega-functions

🤖 Generated with Claude Code

srobinson and others added 7 commits March 8, 2026 04:10
- Add `indicatif = { version = "0.17", features = ["rayon"] }` to Cargo.toml
- Rewrite `sidecar::generate()` with four-phase progress reporting:
  - Scan phase: spinner while walking directory tree
  - Phase 2 (parse): progress bar with files/s and ETA via rayon ParallelProgressIterator
  - Phase 3 (write): sequential progress bar per file written to DB
  - Phase 4 (deps): spinner while rebuilding reverse dependency graph
- Add `--quiet` / `-q` flag to `fmm generate` — suppresses all progress,
  prints only the final "Done ✓  N file(s) indexed in Xs" summary line
- Timing breakdown (parse/write/deps/other) printed on every run unless --quiet is set
- Progress bars suppressed for trivial runs (< 10 dirty files)
- Remove per-file `✓ file.ts` output (replaced by progress bars)
- "all up to date" path now reports total elapsed: "Found N files · all up to date (Xs)"
- Update all call sites: watch.rs, init.rs, and 28 test invocations (quiet=true in tests)
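The output rules above (quiet mode, the 10-file threshold for bars, timing printed unless quiet) reduce to a small decision. A minimal sketch, assuming a hypothetical `ProgressMode` enum and `progress_mode()` helper; the real `sidecar::generate()` drives indicatif bars directly rather than returning a mode like this.

```rust
/// Hypothetical summary of how much output one `fmm generate` run produces.
#[derive(Debug, PartialEq)]
pub enum ProgressMode {
    /// `--quiet`: only the final "Done" summary line.
    SummaryOnly,
    /// Trivial run (< 10 dirty files): no bars, timing breakdown still printed.
    TimingOnly,
    /// Normal run: progress bars plus the timing breakdown.
    BarsAndTiming,
}

/// Threshold from the commit message: bars are suppressed below 10 dirty files.
const MIN_DIRTY_FOR_BARS: usize = 10;

pub fn progress_mode(quiet: bool, dirty_files: usize) -> ProgressMode {
    if quiet {
        ProgressMode::SummaryOnly
    } else if dirty_files < MIN_DIRTY_FOR_BARS {
        ProgressMode::TimingOnly
    } else {
        ProgressMode::BarsAndTiming
    }
}
```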

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…les)

Phase 1 — bulk staleness check (39k queries → 1):
- Add db::writer::load_indexed_mtimes(): loads all (path, indexed_at) pairs
  in a single SELECT, returning a HashMap for in-memory comparison
- Replace 39k individual is_file_up_to_date() calls with one bulk query
- Parallelize mtime syscalls via rayon par_iter() (the OS can service concurrent
  stat() calls; the gain is measurable on an M-series SSD with 12+ cores)
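With the bulk query, staleness becomes one in-memory map lookup per file. A minimal std-only sketch, assuming `load_indexed_mtimes()` has already returned a `HashMap<String, u64>` of path to `indexed_at`, and that a file is stale when its mtime is newer than `indexed_at` (the comparison rule, the units, and the `dirty_files` name are assumptions). The rayon `par_iter()` over `stat()` calls is replaced by pre-collected mtimes here to keep the example dependency-free.

```rust
use std::collections::HashMap;

/// Returns the paths that need re-parsing: files never indexed, or modified
/// after they were last indexed. `indexed` stands in for the single-SELECT
/// result of `load_indexed_mtimes()`; `on_disk` pairs each path with its mtime.
pub fn dirty_files<'a>(
    indexed: &HashMap<String, u64>,
    on_disk: &'a [(String, u64)],
) -> Vec<&'a str> {
    on_disk
        .iter()
        .filter(|(path, mtime)| match indexed.get(path) {
            // Stale if the file changed after it was indexed.
            Some(indexed_at) => mtime > indexed_at,
            // Never indexed: always dirty.
            None => true,
        })
        .map(|(path, _)| path.as_str())
        .collect()
}
```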

Phase 2b — parallel JSON pre-serialization:
- Add PreserializedRow struct, ExportRecord, MethodRecord
- Add serialize_file_data(): does all serde_json::to_string work outside
  the single-threaded SQLite transaction — CPU-bound, rayon-safe
- Add upsert_preserialized(): writes pre-serialized bytes, no JSON work
- Insert parallel pre-serialization pass between Phase 2 (parse) and
  Phase 3 (write); 195k serde calls across 39k files now run in parallel

Phase 3 uses upsert_preserialized() instead of upsert_file_data() so
the transaction loop is pure SQLite I/O with zero serialization overhead.
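The Phase 2b / Phase 3 split (CPU-bound encoding in parallel, then a sequential loop that only moves bytes) can be sketched with std only. This is a sketch under stated assumptions: `std::thread::scope` stands in for rayon, the hand-rolled `to_json` stands in for `serde_json::to_string`, and `ParsedFile`, `preserialize`, `write_all`, and the `Vec` used as a "database" are hypothetical; only the `PreserializedRow` name comes from the commit.

```rust
use std::thread;

/// Hypothetical parse output for one file.
pub struct ParsedFile {
    pub path: String,
    pub exports: Vec<String>,
}

/// Named after the commit's struct: everything the writer needs, already encoded.
#[derive(Debug, PartialEq)]
pub struct PreserializedRow {
    pub path: String,
    pub exports_json: String,
}

/// Stand-in for serde_json::to_string on a list of names.
/// No escaping: assumes names contain no quotes or backslashes.
fn to_json(names: &[String]) -> String {
    let quoted: Vec<String> = names.iter().map(|n| format!("\"{}\"", n)).collect();
    format!("[{}]", quoted.join(","))
}

/// Phase 2b: serialize on several threads, preserving input order.
pub fn preserialize(files: &[ParsedFile], workers: usize) -> Vec<PreserializedRow> {
    // At most `workers` contiguous chunks, each at least one item long.
    let chunk_len = files.len().div_ceil(workers.max(1)).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = files
            .chunks(chunk_len)
            .map(|part| {
                s.spawn(move || {
                    part.iter()
                        .map(|f| PreserializedRow {
                            path: f.path.clone(),
                            exports_json: to_json(&f.exports),
                        })
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        // Joining in spawn order keeps rows in the same order as `files`.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("serializer thread panicked"))
            .collect()
    })
}

/// Phase 3: the single-threaded "transaction" only stores bytes; no encoding here.
pub fn write_all(rows: Vec<PreserializedRow>, db: &mut Vec<PreserializedRow>) -> usize {
    let written = rows.len();
    db.extend(rows);
    written
}
```

Keeping all encoding out of `write_all` is the point of the design: the part that must stay single-threaded (the SQLite transaction in the real code) does no CPU-heavy work.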

upsert_file_data() kept for backwards compatibility (tests, validate path).

Expected impact on TypeScript repo (39k files, M-series Mac):
- Phase 1: ~2s → <0.1s (single query vs 39k queries)
- Phase 3: serialization cost moves to Phase 2b and runs in parallel

All 57 tests pass; no data corruption (verified with existing test suite).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two correctness fixes in generate() output:

- Add phase2b_elapsed to track parallel serialization time separately.
  Previously this cost was absorbed into "other", making the timing
  breakdown useless for diagnosing serialization performance on 39k+
  file repos. Now shown as "serialize: Xs" between parse and write.

- Use serialized_rows.len() instead of dirty_files.len() for the
  "Done N file(s) indexed" summary. dirty_files includes files that
  failed to parse or serialize; serialized_rows is the count actually
  written to the DB.
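A sketch of both fixes, assuming (hypothetically) that "other" is the wall time not covered by any named phase. Apart from `phase2b_elapsed`, the struct, field, and function names are illustrative, and the one-decimal precision in the summary is an assumption; the summary text itself follows the "Done ✓  N file(s) indexed in Xs" format from the first commit.

```rust
use std::time::Duration;

/// Illustrative per-phase timings for one `fmm generate` run.
pub struct Timings {
    pub total: Duration,
    pub parse: Duration,
    /// Phase 2b (parallel serialization), shown as "serialize".
    pub phase2b_elapsed: Duration,
    pub write: Duration,
    pub deps: Duration,
}

impl Timings {
    /// Wall time not attributed to a named phase. Without a separate
    /// phase2b_elapsed, serialization time would land here.
    /// saturating_sub avoids an underflow panic if phases overlap.
    pub fn other(&self) -> Duration {
        self.total
            .saturating_sub(self.parse)
            .saturating_sub(self.phase2b_elapsed)
            .saturating_sub(self.write)
            .saturating_sub(self.deps)
    }
}

/// `rows_written` should be serialized_rows.len(), not dirty_files.len():
/// files that failed to parse or serialize never reach the DB.
pub fn summary(rows_written: usize, total: Duration) -> String {
    format!("Done ✓  {} file(s) indexed in {:.1}s", rows_written, total.as_secs_f64())
}
```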

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extends tree-sitter extraction to capture depth-1 declarations inside
function bodies (both exported and non-exported top-level functions).

## What was extracted

- Depth-1 function declarations → kind "nested-fn". Fully searchable.
  Naming: "createTypeChecker.getIndexType" (parent.child convention).
- Depth-1 non-trivial prologue var/const/let declarations → kind "closure-state".
  Non-trivial = has a call expression, new expression, as_expression, or type
  annotation in its initializer. Trivial literals (false, 0, "") are skipped.
  Prologue = before the first nested function declaration in the body.
- Depth > 1 declarations: NOT extracted (locals inside nested functions).
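The rules above (depth-1 only, parent.child naming, prologue cut off at the first nested function declaration, non-trivial initializers only) can be modelled without tree-sitter. A minimal sketch over a hypothetical simplified `Stmt`/`Init` model; the real `extract_nested_symbols()` and `is_non_trivial_declarator()` walk tree-sitter nodes, and this model is an assumption for illustration only.

```rust
/// Hypothetical, simplified initializer shapes.
pub enum Init {
    Call,    // foo()
    New,     // new Map()
    As,      // x as T
    Literal, // false, 0, ""
}

/// Hypothetical, simplified statements in a top-level function body.
pub enum Stmt {
    /// `function name() { ... }` at depth 1; `body` holds depth-2 statements.
    FnDecl { name: String, body: Vec<Stmt> },
    /// `const/let/var name[: T] = init`
    VarDecl { name: String, has_type_annotation: bool, init: Init },
    Other,
}

#[derive(Debug, PartialEq)]
pub struct Nested {
    pub name: String,
    pub kind: &'static str, // "nested-fn" or "closure-state"
}

/// Mirrors the commit's rule: call, new, `as`, or a type annotation => non-trivial.
fn is_non_trivial(has_type_annotation: bool, init: &Init) -> bool {
    has_type_annotation || matches!(init, Init::Call | Init::New | Init::As)
}

pub fn extract_nested(parent: &str, body: &[Stmt]) -> Vec<Nested> {
    let mut out = Vec::new();
    let mut in_prologue = true;
    for stmt in body {
        match stmt {
            // Depth-2 bodies are deliberately not visited.
            Stmt::FnDecl { name, .. } => {
                in_prologue = false; // the prologue ends at the first nested fn
                out.push(Nested { name: format!("{parent}.{name}"), kind: "nested-fn" });
            }
            Stmt::VarDecl { name, has_type_annotation, init }
                if in_prologue && is_non_trivial(*has_type_annotation, init) =>
            {
                out.push(Nested { name: format!("{parent}.{name}"), kind: "closure-state" });
            }
            _ => {}
        }
    }
    out
}
```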

## Changes

src/parser/mod.rs:
- Added `kind: Option<String>` field to ExportEntry (skipped by serde for backward compatibility).
- Added ExportEntry::nested_fn() and ExportEntry::closure_state() constructors.
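A sketch of the new field and constructors. The `name`/`start_line`/`end_line` fields are hypothetical stand-ins for whatever the real `ExportEntry` carries, and the serde skip attribute is omitted so the example stays std-only.

```rust
/// Simplified stand-in for the parser's ExportEntry (line fields are hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct ExportEntry {
    pub name: String,
    pub start_line: usize,
    pub end_line: usize,
    /// None for ordinary entries; Some("nested-fn") or Some("closure-state") otherwise.
    pub kind: Option<String>,
}

impl ExportEntry {
    /// Depth-1 function declared inside a top-level function body.
    pub fn nested_fn(name: impl Into<String>, start_line: usize, end_line: usize) -> Self {
        Self { name: name.into(), start_line, end_line, kind: Some("nested-fn".to_string()) }
    }

    /// Non-trivial prologue var/const/let inside a top-level function body.
    pub fn closure_state(name: impl Into<String>, start_line: usize, end_line: usize) -> Self {
        Self { name: name.into(), start_line, end_line, kind: Some("closure-state".to_string()) }
    }
}
```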

src/parser/builtin/typescript.rs:
- Added extract_nested_symbols() — walks top-level function_declaration nodes
  (exported and bare), walks their body for depth-1 nested fns and prologue vars.
- Added is_non_trivial_declarator() for prologue filter logic.
- Called from parse_with_aliases() after class method extraction.
- 7 new unit tests covering all acceptance criteria.

src/manifest/mod.rs:
- FileEntry gains nested_fns and closure_state HashMaps (#[serde(skip)]).
- From<Metadata> routes ExportEntry by kind into the correct bucket.
- add_file() removes the file's previous nested_fns/closure_state keys from method_index on update.
- remove_file() removes the file's nested_fns/closure_state keys from method_index on removal.
- All nested symbols (all kinds) go into method_index for fmm_search.
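The add_file()/remove_file() bookkeeping amounts to: drop every method_index key the file previously contributed, then insert its new keys. A sketch assuming `method_index` maps a dotted symbol name to a file path and each `FileEntry` map goes from dotted name to start line; these types are hypothetical simplifications of the real manifest.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified FileEntry: dotted name -> start line.
#[derive(Default)]
pub struct FileEntry {
    pub methods: HashMap<String, usize>,
    pub nested_fns: HashMap<String, usize>,
    pub closure_state: HashMap<String, usize>,
}

#[derive(Default)]
pub struct Manifest {
    pub files: HashMap<String, FileEntry>,
    /// Dotted symbol name -> file path; this is what fmm_search consults.
    pub method_index: HashMap<String, String>,
}

impl Manifest {
    /// Drops the file and every method_index key it still owns.
    pub fn remove_file(&mut self, path: &str) {
        if let Some(old) = self.files.remove(path) {
            let keys = old.methods.keys().chain(old.nested_fns.keys()).chain(old.closure_state.keys());
            for key in keys {
                // Only remove keys that still point at this file.
                if self.method_index.get(key).map(String::as_str) == Some(path) {
                    self.method_index.remove(key);
                }
            }
        }
    }

    /// Inserts or updates a file; stale keys (e.g. a renamed nested fn) are cleared first.
    pub fn add_file(&mut self, path: &str, entry: FileEntry) {
        self.remove_file(path);
        let keys = entry.methods.keys().chain(entry.nested_fns.keys()).chain(entry.closure_state.keys());
        for key in keys {
            self.method_index.insert(key.clone(), path.to_string());
        }
        self.files.insert(path.to_string(), entry);
    }
}
```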

src/search.rs:
- Added steps 2 (exact method_index match) and 2b (fuzzy method_index scan) to bare_search().
- "silentNeverType" now finds "createTypeChecker.silentNeverType" via 2b.
- Added seen_exports.insert() after fuzzy export hits to prevent duplicates.
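Steps 2 and 2b can be sketched as an exact lookup followed by a substring scan, with a seen set so the exact hit is not reported again by the fuzzy pass. This assumes `method_index` maps dotted names to file paths and that the fuzzy match is a case-sensitive `contains`; `search_methods` is a hypothetical name, and the fuzzy hits are sorted only to make the output deterministic.

```rust
use std::collections::{HashMap, HashSet};

/// Returns (symbol, file) pairs: exact match first, then substring matches.
pub fn search_methods(method_index: &HashMap<String, String>, query: &str) -> Vec<(String, String)> {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut hits = Vec::new();

    // Step 2: exact lookup by the full key.
    if let Some((key, file)) = method_index.get_key_value(query) {
        seen.insert(key.as_str());
        hits.push((key.clone(), file.clone()));
    }

    // Step 2b: fuzzy scan, so "silentNeverType" also finds
    // "createTypeChecker.silentNeverType"; `seen` skips the exact hit.
    let mut fuzzy: Vec<(&String, &String)> = method_index
        .iter()
        .filter(|(key, _)| key.contains(query) && !seen.contains(key.as_str()))
        .collect();
    fuzzy.sort();
    for (key, file) in fuzzy {
        hits.push((key.clone(), file.clone()));
    }
    hits
}
```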

src/format/yaml_formatters.rs:
- format_file_outline() refactored to handle nested_fn and closure_state sub-entries.
- Nested functions always shown under parent. Closure-state shown only with include_private.
- Header annotation updated (e.g. "52 lines, 3 nested functions, 2 closure-state").
- Sub-entries sorted by start line when mixed kinds are present.
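The display rules and header annotation can be sketched as below. The header shape ("52 lines, 3 nested functions, 2 closure-state") comes from the commit; `SubKind`, `SubEntry`, the function names, and the singular/plural handling are assumptions. This sketch always sorts by start line, which is a superset of sorting only when kinds are mixed.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum SubKind {
    Method,
    NestedFn,
    ClosureState,
}

#[derive(Debug, Clone)]
pub struct SubEntry {
    pub name: String,
    pub start_line: usize,
    pub kind: SubKind,
}

/// Closure-state is shown only with include_private; results are in source order.
pub fn visible_sub_entries(entries: &[SubEntry], include_private: bool) -> Vec<SubEntry> {
    let mut out: Vec<SubEntry> = entries
        .iter()
        .filter(|e| include_private || e.kind != SubKind::ClosureState)
        .cloned()
        .collect();
    out.sort_by_key(|e| e.start_line);
    out
}

/// Builds the header annotation, omitting zero counts.
pub fn header(lines: usize, entries: &[SubEntry]) -> String {
    let nested = entries.iter().filter(|e| e.kind == SubKind::NestedFn).count();
    let state = entries.iter().filter(|e| e.kind == SubKind::ClosureState).count();
    let mut parts = vec![format!("{lines} lines")];
    if nested > 0 {
        let noun = if nested == 1 { "nested function" } else { "nested functions" };
        parts.push(format!("{nested} {noun}"));
    }
    if state > 0 {
        parts.push(format!("{state} closure-state"));
    }
    parts.join(", ")
}
```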

src/db/mod.rs:
- SCHEMA_VERSION bumped to 2.
- methods table gains `kind TEXT` column (NULL = class method).

src/db/writer.rs:
- MethodRecord gains kind field. Both upsert paths pass kind to the INSERT.

src/db/reader.rs:
- load_methods() reads the kind column and routes each row into nested_fns, closure_state, or methods.
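Reading rows back is the inverse routing: the nullable kind column selects the destination bucket, with NULL meaning a class method. A sketch assuming rows have already been fetched as `(name, Option<String>)`; the real reader queries SQLite, `MethodBuckets` and `route_method_rows` are hypothetical, and skipping unknown kinds is an assumption.

```rust
/// Hypothetical destination buckets for one file's method rows.
#[derive(Default, Debug)]
pub struct MethodBuckets {
    pub methods: Vec<String>,
    pub nested_fns: Vec<String>,
    pub closure_state: Vec<String>,
}

pub fn route_method_rows(rows: Vec<(String, Option<String>)>) -> MethodBuckets {
    let mut buckets = MethodBuckets::default();
    for (name, kind) in rows {
        match kind.as_deref() {
            None => buckets.methods.push(name), // NULL = class method
            Some("nested-fn") => buckets.nested_fns.push(name),
            Some("closure-state") => buckets.closure_state.push(name),
            Some(_) => {} // unknown kind: skipped in this sketch
        }
    }
    buckets
}
```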

src/mcp/tests.rs + tests/cross_language_validation.rs:
- Added ..Default::default() for new FileEntry fields in mcp tests.
- Updated typescript_real_repo_internal_module to assert export_names().is_empty()
  instead of exports.is_empty() (nested symbols now extracted from non-exported fns).

fixtures/typescript/mega_function.ts:
- New fixture: createTypeChecker with prologue vars (non-trivial and trivial),
  depth-1 nested fns, depth-2 nested fn, and non-exported internalHelper.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…comment

- Drop the source_bytes parameter from is_non_trivial_declarator(): it was
  never used inside the function. Its only reference was a `let _ = source_bytes`
  in the inner loop, added purely to silence the unused-variable warning.
  Call site updated accordingly.

- Fix step 2 comment in bare_search(): the old comment claimed step 2
  handles "silentNeverType" -> "createTypeChecker.silentNeverType" but
  that is step 2b (fuzzy contains). Step 2 is an exact lookup by the
  full dotted key, so the comment now reflects that.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@srobinson srobinson enabled auto-merge (squash) March 7, 2026 22:11
@srobinson srobinson merged commit a5767e8 into main Mar 7, 2026
1 check passed
@srobinson srobinson deleted the nancy/ALP-923 branch March 7, 2026 22:14