Skip to content

parse_generic_imports: O(N²) ts_node_child() loop hangs on large AST roots #106

@halindrome

Description

@halindrome

Summary

parse_generic_imports in internal/cbm/extract_imports.c iterates top-level AST nodes using indexed access, which is O(N²) due to how tree-sitter implements child lookup internally.

Root Cause

uint32_t count = ts_node_child_count(ctx->root);
for (uint32_t i = 0; i < count; i++) {
    TSNode node = ts_node_child(ctx->root, i);  // O(i) per call

ts_node_child(node, i) is not O(1) random access — it walks the child linked list from index 0 to i on every call via ts_node_child_with_descendant()ts_node_child_iterator_next. For a root with N children this is O(N²) total.

ts_node_next_sibling() has the same problem: it also calls ts_node_child_with_descendant() internally.

Impact

On repos with generated TypeScript .d.ts files or large Perl files, the AST root node can have 5,000–50,000 top-level children. At 10,000 children: ~50M iterator steps, causing the indexer to spin at 100% CPU indefinitely until killed.

Observed on a monorepo with 6 submodules containing TypeScript and Perl. Both full mode and fast mode hang. The process never completes.

CPU profile stack (from sample(1)):

cbm_pipeline_pass_semantic
  → cbm_extract_file
    → parse_generic_imports
      → ts_node_child / ts_node_next_sibling
        → ts_node_child_with_descendant
          → ts_node_child_iterator_next  ← spinning here

Fix

Use TSTreeCursor for sibling traversal. ts_tree_cursor_goto_next_sibling() maintains cursor state across calls and is O(1) per step:

TSTreeCursor cursor = ts_tree_cursor_new(ctx->root);
if (ts_tree_cursor_goto_first_child(&cursor)) {
    do {
        TSNode node = ts_tree_cursor_current_node(&cursor);
        // ... existing match logic unchanged ...
    } while (ts_tree_cursor_goto_next_sibling(&cursor));
}
ts_tree_cursor_delete(&cursor);

Total complexity: O(N). At 10,000 children: 10,000 steps vs ~50M.

Verification

After fix: a previously-hanging monorepo with 6 submodules (TypeScript, Perl, Java, HTML, CSS) indexed successfully in 71 seconds — 142,908 nodes, 218,685 edges.

Metadata

Metadata

Assignees

No one assigned

    Labels

    stability/performanceServer crashes, OOM, hangs, high CPU/memory

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions