Summary
search_code currently searches the workspace recursively instead of limiting itself to the files that are actually indexed in the project graph.
This can produce noisy or misleading results, especially when generated artifacts, cached files, or binary outputs exist under the repository.
Problem
The current implementation of search_code shells out to recursive grep over the project root.
That means the tool can return matches from files that are not part of the indexed codebase, such as:
- generated files under build/
- cached outputs
- other non-source artifacts
- binary files that happen to contain matching bytes
This breaks the expectation that MCP code search should reflect the indexed project state.
User-visible impact
In practice, this causes:
- unexpected matches from generated artifacts
- polluted search output
- incorrect file paths in results when binary content is involved
- mismatch between graph-based discovery and text search behavior
So the tool may report results that are not actually part of the indexed repository model.
Expected behavior
search_code should search only within files that are part of the indexed project.
That means:
- use the indexed File nodes as the search scope
- apply file_pattern against indexed file paths
- ignore binary files
- keep support for both plain substring search and regex search
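How file_pattern is applied to indexed paths is worth pinning down. The sketch below assumes glob semantics via POSIX fnmatch(3); the issue does not specify the matching rules, and path_in_scope is a hypothetical helper name, not an existing function in the project.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 when an indexed path is in scope for
 * file_pattern, 0 otherwise. A NULL or empty pattern means "no filter".
 * Assumption: file_pattern is a shell-style glob. */
int path_in_scope(const char *file_pattern, const char *indexed_path) {
    if (file_pattern == NULL || file_pattern[0] == '\0')
        return 1;
    /* flags = 0: '*' may also match '/', so "*.c" matches "src/main.c". */
    return fnmatch(file_pattern, indexed_path, 0) == 0;
}
```

With FNM_PATHNAME set instead, '*' would stop at '/', so "*.c" would match only top-level files. Which behavior is wanted depends on how file_pattern is documented for the tool.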
Proposed fix
Refactor search_code so that it:
- resolves the project from the store
- enumerates indexed File nodes
- opens only those indexed files
- searches line-by-line in C
- skips binary files
- preserves pattern, regex, file_pattern, and limit
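The steps above could look roughly like the following. This is a minimal sketch, not the project's implementation: search_indexed_files, file_looks_binary, and match_fn are hypothetical names, the paths array stands in for the File nodes enumerated from the store, the NUL-byte check is one common heuristic for binary detection, and regex mode is assumed to mean POSIX extended regular expressions.

```c
#include <fnmatch.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback type: receives each matching line. */
typedef void (*match_fn)(const char *path, long line_no,
                         const char *line, void *ctx);

/* Heuristic (assumption): a NUL byte in the first 512 bytes means binary. */
int file_looks_binary(FILE *fp) {
    unsigned char head[512];
    size_t n = fread(head, 1, sizeof head, fp);
    rewind(fp);
    return memchr(head, 0, n) != NULL;
}

/* Searches only the given indexed paths. Returns the number of matching
 * lines reported (at most limit), or -1 if the regex does not compile. */
int search_indexed_files(const char *const *paths, size_t n_paths,
                         const char *pattern, int use_regex,
                         const char *file_pattern, int limit,
                         match_fn on_match, void *ctx) {
    regex_t re;
    int found = 0;

    if (use_regex && regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    for (size_t i = 0; i < n_paths && found < limit; i++) {
        /* Apply file_pattern to the indexed path, not to the filesystem. */
        if (file_pattern && fnmatch(file_pattern, paths[i], 0) != 0)
            continue;

        FILE *fp = fopen(paths[i], "rb");
        if (!fp)
            continue; /* indexed but no longer readable: skip */

        if (!file_looks_binary(fp)) {
            /* Lines longer than the buffer are scanned in chunks. */
            char line[4096];
            long line_no = 1;
            while (found < limit && fgets(line, sizeof line, fp)) {
                size_t len = strlen(line);
                int ended = len > 0 && line[len - 1] == '\n';
                if (ended)
                    line[len - 1] = '\0';

                int hit = use_regex
                    ? regexec(&re, line, 0, NULL, 0) == 0
                    : strstr(line, pattern) != NULL;
                if (hit) {
                    found++;
                    if (on_match)
                        on_match(paths[i], line_no, line, ctx);
                }
                if (ended)
                    line_no++;
            }
        }
        fclose(fp);
    }

    if (use_regex)
        regfree(&re);
    return found;
}
```

Because the function only ever opens the paths it is handed, a generated file under build/ that is absent from the index is never read, regardless of its contents.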
Validation
A regression test should cover a project that contains:
- a real source file with a match
- a generated artifact under build/ with the same match
and assert that only the indexed source file is returned.