
search_code returns matches from non-indexed/generated files instead of only indexed source files #102

@heraque


Summary

search_code currently searches the workspace recursively instead of limiting itself to the files that are actually indexed in the project graph.

This can produce noisy or misleading results, especially when generated artifacts, cached files, or binary outputs exist under the repository.

Problem

The current implementation of search_code shells out to a recursive grep over the project root.

That means the tool can return matches from files that are not part of the indexed codebase, such as:

  • generated files under build/
  • cached outputs
  • other non-source artifacts
  • binary files that happen to contain matching bytes

This breaks the expectation that MCP code search should reflect the indexed project state.

User-visible impact

In practice, this causes:

  • unexpected matches from generated artifacts
  • polluted search output
  • incorrect file paths in results when binary content is involved
  • mismatch between graph-based discovery and text search behavior

So the tool may report results that are not actually part of the indexed repository model.

Expected behavior

search_code should search only within files that are part of the indexed project.

That means:

  • use the indexed File nodes as the search scope
  • apply file_pattern against indexed file paths
  • ignore binary files
  • keep support for both plain substring search and regex search
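A minimal sketch of the file_pattern step. It assumes file_pattern is a shell-style glob (the issue does not specify the syntax) and that a NULL or empty pattern means "every indexed file". The helper names path_in_scope and filter_indexed are hypothetical, and the indexed array stands in for the paths of the project's File nodes.

```c
#define _POSIX_C_SOURCE 200809L
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: is this indexed path in scope for file_pattern?
 * Assumes a shell-style glob; NULL or "" means "every indexed file". */
static bool path_in_scope(const char *indexed_path, const char *file_pattern) {
    if (file_pattern == NULL || file_pattern[0] == '\0')
        return true;
    /* Without FNM_PATHNAME, '*' also matches '/', so "*.c" matches "src/main.c". */
    if (fnmatch(file_pattern, indexed_path, 0) == 0)
        return true;
    /* Also try the basename so a bare name such as "main.c" still matches. */
    const char *slash = strrchr(indexed_path, '/');
    return slash != NULL && fnmatch(file_pattern, slash + 1, 0) == 0;
}

/* Copies the in-scope entries of indexed (the paths of the indexed File
 * nodes) into out, preserving order. Returns how many were kept. */
static size_t filter_indexed(const char *const *indexed, size_t n,
                             const char *file_pattern, const char **out) {
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (path_in_scope(indexed[i], file_pattern))
            out[kept++] = indexed[i];
    return kept;
}
```

Falling back to the basename is a design choice, not something the issue requires; dropping it would make every pattern strictly relative to the project root.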

Proposed fix

Refactor search_code so that it:

  1. resolves the project from the store
  2. enumerates indexed File nodes
  3. opens only those indexed files
  4. searches line-by-line in C
  5. skips binary files
  6. preserves pattern, regex, file_pattern, and limit
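Steps 3 to 6 could look roughly like the following sketch. It is not the project's code. search_indexed and match_t are hypothetical names, and the indexed_paths array stands in for whatever the store returns for the project's File nodes. The regex path assumes POSIX extended regular expressions via regcomp/regexec, and binary detection uses the common heuristic of looking for a NUL byte in the first 8 KiB of the file.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical result record; the tool's real output format may differ. */
typedef struct {
    const char *path; /* borrowed from the caller's indexed_paths array */
    int line;         /* 1-based line number */
} match_t;

/* Searches only the files in indexed_paths, line by line. Plain mode uses
 * strstr; regex mode uses POSIX EREs (an assumed dialect). Returns the
 * number of matches stored in out (at most limit), or -1 if the regex
 * does not compile. */
static int search_indexed(const char *const *indexed_paths, size_t n_paths,
                          const char *pattern, bool use_regex,
                          match_t *out, int limit) {
    regex_t re;
    if (use_regex && regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    int found = 0;
    char *line = NULL;
    size_t cap = 0;
    for (size_t i = 0; i < n_paths && found < limit; i++) {
        FILE *f = fopen(indexed_paths[i], "rb");
        if (f == NULL)
            continue; /* indexed but missing or unreadable on disk */

        /* Skip binary files: a NUL byte in the first 8 KiB marks them. */
        unsigned char head[8192];
        size_t n = fread(head, 1, sizeof head, f);
        if (memchr(head, '\0', n) != NULL) {
            fclose(f);
            continue;
        }
        rewind(f);

        int lineno = 0;
        ssize_t len;
        while (found < limit && (len = getline(&line, &cap, f)) != -1) {
            lineno++;
            if (len > 0 && line[len - 1] == '\n')
                line[len - 1] = '\0'; /* lets a trailing '$' in a regex match */
            bool hit = use_regex ? regexec(&re, line, 0, NULL, 0) == 0
                                 : strstr(line, pattern) != NULL;
            if (hit) {
                out[found].path = indexed_paths[i];
                out[found].line = lineno;
                found++;
            }
        }
        fclose(f);
    }
    free(line);
    if (use_regex)
        regfree(&re);
    return found;
}
```

Because the loop stops as soon as limit matches are collected, files later in the index are never opened once the limit is reached.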

Validation

A regression test should cover a project that contains:

  • a real source file with a match
  • a generated artifact under build/ with the same match

and assert that only the indexed source file is returned.
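One possible shape for that regression test, sketched in C under some assumptions. The fixture helpers are hypothetical, a fixed list of paths stands in for the index, and a recursive nftw walk approximates the scope the old recursive grep saw. The test asserts that the walk finds both copies of the token while the indexed scope finds only src/main.c.

```c
#define _XOPEN_SOURCE 700
#include <ftw.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define NEEDLE "regression_needle"

/* True if any line of the file contains NEEDLE (enough for this tiny fixture). */
static bool file_has_needle(const char *path) {
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return false;
    char buf[256];
    bool hit = false;
    while (!hit && fgets(buf, sizeof buf, f) != NULL)
        hit = strstr(buf, NEEDLE) != NULL;
    fclose(f);
    return hit;
}

/* Writes body to root/rel; the parent directory must already exist. */
static int write_file(const char *root, const char *rel, const char *body) {
    char path[512];
    snprintf(path, sizeof path, "%s/%s", root, rel);
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    fputs(body, f);
    return fclose(f);
}

/* Builds src/main.c (indexed) and build/main.gen.c (generated, not indexed),
 * both containing NEEDLE. root must be a mkdtemp template ending in "XXXXXX". */
static int make_fixture(char *root) {
    char dir[512];
    if (mkdtemp(root) == NULL)
        return -1;
    snprintf(dir, sizeof dir, "%s/src", root);
    if (mkdir(dir, 0755) != 0)
        return -1;
    snprintf(dir, sizeof dir, "%s/build", root);
    if (mkdir(dir, 0755) != 0)
        return -1;
    if (write_file(root, "src/main.c", "int " NEEDLE " = 1;\n") != 0)
        return -1;
    return write_file(root, "build/main.gen.c", "int " NEEDLE " = 1; /* generated */\n");
}

/* Old scope, roughly what a recursive grep sees: every regular file under root. */
static int walk_hits;
static int count_walk_hit(const char *path, const struct stat *sb, int type,
                          struct FTW *ftwbuf) {
    (void)sb;
    (void)ftwbuf;
    if (type == FTW_F && file_has_needle(path))
        walk_hits++;
    return 0;
}

/* New scope: only the paths reported by the index (here a fixed list). */
static int count_indexed_hits(const char *root, const char *const *indexed, size_t n) {
    char path[512];
    int hits = 0;
    for (size_t i = 0; i < n; i++) {
        snprintf(path, sizeof path, "%s/%s", root, indexed[i]);
        if (file_has_needle(path))
            hits++;
    }
    return hits;
}

/* Deletes everything make_fixture created (remove() also handles empty dirs). */
static void remove_fixture(const char *root) {
    const char *rels[] = {"src/main.c", "build/main.gen.c", "src", "build"};
    char path[512];
    for (size_t i = 0; i < sizeof rels / sizeof rels[0]; i++) {
        snprintf(path, sizeof path, "%s/%s", root, rels[i]);
        remove(path);
    }
    rmdir(root);
}
```

In the real test, count_indexed_hits would presumably be replaced by a call to search_code itself against a project indexed from this fixture.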

Metadata


Assignees: No one assigned
Labels: bug (Something isn't working)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
