feat(refs): Tree-sitter identifier-aware symbol references #49

Merged
PatrickSys merged 2 commits into master from feat/phase2-symbol-refs-treesitter
Feb 28, 2026
Conversation

@PatrickSys
Owner

Improve refs --symbol precision by using Tree-sitter identifier occurrences when the file exists and a curated grammar is available; fall back to legacy chunk-regex otherwise. Adds coverage for excluding comment/string-only matches. No docs changes.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2aa0831510

@greptile-apps
greptile-apps bot commented Feb 28, 2026

Greptile Summary

This PR enhances get_symbol_references precision by using Tree-sitter AST analysis to identify genuine code identifiers, automatically filtering out matches in comments and strings. When Tree-sitter is available and the file exists, it performs an identifier-aware scan; otherwise, it gracefully falls back to the legacy chunk-regex approach.

Major changes:

  • Refactored findSymbolReferences to group chunks by file and prefer reading actual file content over chunk content for accuracy
  • Added findIdentifierOccurrences in tree-sitter utils that traverses the AST and excludes non-code contexts (comments, strings, template strings, regex, jsx_text)
  • Implemented proper deduplication, resource cleanup (tree.delete in finally), and error handling with fallback
  • Added comprehensive test coverage validating comment/string exclusion
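
The identifier-aware traversal described in the list above could look roughly like the following. This is a minimal sketch, not the PR's actual findIdentifierOccurrences: the `SyntaxNodeLike` shape, the `findIdentifierOffsets` name, and the set of identifier node types are assumptions made for illustration. Only the excluded node types come from the summary.

```typescript
// Hypothetical minimal node shape; real Tree-sitter nodes expose more fields.
interface SyntaxNodeLike {
  type: string;
  text: string;
  startIndex: number;
  children: SyntaxNodeLike[];
}

// Non-code contexts, taken from the list in the summary above.
const NON_CODE_TYPES = new Set<string>([
  "comment",
  "string",
  "template_string",
  "regex",
  "jsx_text",
]);

// Assumed identifier node types; individual grammars may use other names.
const IDENTIFIER_TYPES = new Set<string>([
  "identifier",
  "property_identifier",
  "type_identifier",
]);

// Collects the start offsets of identifier nodes whose text equals `symbol`,
// skipping every subtree rooted at a non-code node.
function findIdentifierOffsets(root: SyntaxNodeLike, symbol: string): number[] {
  const offsets = new Set<number>(); // a Set deduplicates repeated offsets
  const stack: SyntaxNodeLike[] = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    if (NON_CODE_TYPES.has(node.type)) continue;
    if (IDENTIFIER_TYPES.has(node.type) && node.text === symbol) {
      offsets.add(node.startIndex);
    }
    for (const child of node.children) stack.push(child);
  }
  return Array.from(offsets).sort((a, b) => a - b);
}
```

Note that this sketch skips the whole subtree of a template string, so identifiers inside `${...}` substitutions would be missed. Whether the merged code behaves the same way is not stated in this conversation.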

Code quality:

  • Maintains backward compatibility and project constraints (zero-infra, language-agnostic, privacy-first)
  • Proper type safety with minimal use of type assertions
  • Good separation of concerns between core logic and tree-sitter utilities
  • One minor style issue: test file uses `as any` to access private members

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • Score reflects excellent code quality with proper error handling, graceful fallback mechanisms, good test coverage, and no breaking changes. The implementation is well-structured, maintains project constraints, and includes proper resource cleanup. Only one minor style issue found regarding `any` usage in tests.
  • No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/core/symbol-references.ts | Refactored to use Tree-sitter for identifier matching when available, with proper fallback to regex. Clean implementation with good error handling. |
| src/utils/tree-sitter.ts | Added findIdentifierOccurrences function with proper AST traversal and comment/string filtering. Includes proper resource cleanup and error handling. |
| tests/get-symbol-references.test.ts | Added comprehensive test validating that comments and strings are excluded from symbol references when Tree-sitter is available. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    Start([findSymbolReferences called]) --> LoadIndex[Load keyword index]
    LoadIndex --> Prefilter[Prefilter chunks with regex]
    Prefilter --> GroupFiles[Group chunks by file path]
    GroupFiles --> IterateFiles{For each file}
    
    IterateFiles --> CheckFile{File exists<br/>on disk?}
    CheckFile -->|No| FallbackRegex[Use chunk-regex fallback]
    CheckFile -->|Yes| ReadFile[Read file content]
    
    ReadFile --> DetectLang[Detect language]
    DetectLang --> CheckTreeSitter{Tree-sitter<br/>supported?}
    
    CheckTreeSitter -->|No| FallbackRegex
    CheckTreeSitter -->|Yes| ParseAST[Parse with Tree-sitter]
    
    ParseAST --> CheckError{Parse<br/>error?}
    CheckError -->|Yes| FallbackRegex
    CheckError -->|No| TraverseAST[Traverse AST for identifiers]
    
    TraverseAST --> FilterContext[Filter out comments/strings]
    FilterContext --> Dedupe[Deduplicate occurrences]
    Dedupe --> CountMatches[Add to usageCount]
    CountMatches --> NextFile[Continue to next file]
    
    FallbackRegex --> RegexScan[Regex scan in chunks]
    RegexScan --> CountRegex[Add matches to usageCount]
    CountRegex --> NextFile
    
    NextFile --> IterateFiles
    IterateFiles -->|Done| Return[Return results with usages]
    Return --> End([End])
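
The per-file decision in the flowchart above can be sketched as follows. All names here (`FileChunks`, `ScanDeps`, `countSymbolUsages`, `countByRegex`) are hypothetical, and the word-boundary pattern is an assumption, since the legacy chunk-regex is not shown in this conversation. File reading and Tree-sitter counting are injected as functions so the control flow stands on its own.

```typescript
// Hypothetical types; the real index and chunk structures are not shown in the PR.
interface FileChunks {
  filePath: string;
  chunks: string[]; // indexed chunk contents belonging to this file
}

interface ScanDeps {
  // Returns the file content, or null when the file is not on disk.
  readFileIfExists(filePath: string): string | null;
  // Returns the identifier count, or null when no grammar is available
  // or parsing fails, which triggers the regex fallback.
  countIdentifiers(content: string, filePath: string, symbol: string): number | null;
}

// Escapes regex metacharacters so the symbol is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Fallback path: word-boundary regex scan over the indexed chunks.
function countByRegex(chunks: string[], symbol: string): number {
  const pattern = new RegExp(`\\b${escapeRegExp(symbol)}\\b`, "g");
  let total = 0;
  for (const chunk of chunks) {
    total += (chunk.match(pattern) ?? []).length;
  }
  return total;
}

// Prefers the identifier-aware count and falls back to regex per file.
function countSymbolUsages(files: FileChunks[], symbol: string, deps: ScanDeps): number {
  let usageCount = 0;
  for (const file of files) {
    const content = deps.readFileIfExists(file.filePath);
    const precise =
      content === null ? null : deps.countIdentifiers(content, file.filePath, symbol);
    usageCount += precise !== null ? precise : countByRegex(file.chunks, symbol);
  }
  return usageCount;
}
```

The fallback in this sketch counts comment and string matches too, and it does not deduplicate overlapping chunks, which the summary says the real implementation handles.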

Last reviewed commit: 2aa0831

@greptile-apps greptile-apps bot left a comment

3 files reviewed, 1 comment

);

const { server } = await import('../src/index.js');
const handler = (server as any)._requestHandlers.get('tools/call');
Violates AGENTS.md guideline: "Avoid using any Type AT ALL COSTS." Consider using a typed interface or unknown with proper type guards to access the handler.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
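
One way to follow the suggestion above is to treat the server as unknown and narrow it with type guards. This is a sketch only: `ToolCallHandler`, `isRecord`, and `getToolCallHandler` are hypothetical names, and the assumption that `_requestHandlers` is a Map keyed by method name is inferred from the quoted test line, not confirmed by the PR.

```typescript
// Hypothetical handler signature; the real request and result types are not shown.
type ToolCallHandler = (request: unknown) => Promise<unknown>;

// Type guard: narrows unknown to a plain indexable object.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Looks up the "tools/call" handler without resorting to `as any`.
function getToolCallHandler(server: unknown): ToolCallHandler {
  if (!isRecord(server)) {
    throw new Error("server is not an object");
  }
  const handlers = server["_requestHandlers"];
  if (!(handlers instanceof Map)) {
    throw new Error("server._requestHandlers is not a Map");
  }
  const handler: unknown = handlers.get("tools/call");
  if (typeof handler !== "function") {
    throw new Error("no tools/call handler registered");
  }
  // A single narrow assertion remains, made after the runtime check above.
  return handler as ToolCallHandler;
}
```

A test would then call `getToolCallHandler(server)` once and reuse the returned function, which keeps the private-member access in a single checked place.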

@PatrickSys
Owner Author

Addressed review feedback:

  • Security: avoid reading index-provided paths outside CODEBASE_ROOT. We now only read files when the resolved path stays under root; otherwise we fall back to chunk-regex. (commit 1735e3c)
  • Tests: removed the newly introduced `as any` access by adding a typed tools/call handler helper. (commit 1735e3c)
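
A root-containment check of the kind described in the security fix could be sketched like this. The `resolveInsideRoot` name and its signature are hypothetical, and the code in commit 1735e3c may differ. Only Node's standard `path` module is used.

```typescript
import * as path from "node:path";

// Returns the absolute path when `candidate` resolves under `rootDir`,
// otherwise null so the caller can fall back to chunk-regex.
function resolveInsideRoot(rootDir: string, candidate: string): string | null {
  const root = path.resolve(rootDir);
  const resolved = path.resolve(root, candidate);
  const relative = path.relative(root, resolved);
  const escapes =
    relative === ".." ||
    relative.startsWith(".." + path.sep) ||
    path.isAbsolute(relative); // e.g. a different drive on Windows
  return escapes ? null : resolved;
}
```

This sketch compares lexical paths only. It does not resolve symlinks, so a link inside the root that points outside it would still pass. Checking the result of `fs.realpathSync` would be one way to tighten that, if it matters for the threat model.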

@PatrickSys PatrickSys merged commit c23ffec into master Feb 28, 2026
3 checks passed
@PatrickSys PatrickSys deleted the feat/phase2-symbol-refs-treesitter branch March 2, 2026 13:09