
Symbol usage index — find-usages, callers-of, impact analysis #78

@tonydspaniard

Goal

Build a symbol usage index — every public class, interface, method, and configuration key in the project gets a queryable record of where it's used. Exposed via CLI and as the framework__find_usages MCP tool. Refactoring without this is scary; with it, refactoring becomes routine.

Why

Agents struggle with refactors because they can't confidently answer "what depends on X?" Grep produces noise (matches in comments, docblocks, unrelated identifiers); manual reading scales badly. A real index — built from PHP-Parser's AST + the framework's spec/config awareness — gives precise, structured answers in milliseconds.

Specifically, the index knows:

  • Class usage: new X(...), extends X, implements X, X::class, return/param/property type hints, attribute classes
  • Interface implementations and extenders
  • Method usage: $obj->method(), Class::method(), callable references [$obj, 'method'] and $obj->method(...)
  • Property reads/writes
  • Constant references
  • Configuration keys referenced from env / spec / code

Crucially, it also knows about the framework's higher-level constructs:

  • Which spec files declare endpoints that resolve to which Action classes
  • Which routes match which middleware
  • Which entities are referenced from which specs
  • Which events have which listeners

So the agent can ask: "if I rename User, what do I need to change?" and get a complete answer.

Index format

A single SQLite database at .altair/index.db. Schema:

CREATE TABLE symbols (
    id INTEGER PRIMARY KEY,
    fqn TEXT NOT NULL UNIQUE,           -- e.g. "App\User\User", "App\User\User::register"
    kind TEXT NOT NULL,                  -- 'class' | 'interface' | 'trait' | 'enum' | 'method' | 'property' | 'constant'
    file TEXT NOT NULL,
    line INTEGER NOT NULL,
    visibility TEXT,                     -- 'public' | 'protected' | 'private' | null for classes
    is_readonly INTEGER DEFAULT 0,
    is_static INTEGER DEFAULT 0
);

CREATE TABLE usages (
    id INTEGER PRIMARY KEY,
    symbol_id INTEGER REFERENCES symbols(id),
    used_in_file TEXT NOT NULL,
    used_in_line INTEGER NOT NULL,
    usage_kind TEXT NOT NULL,            -- 'new' | 'extends' | 'implements' | 'type_hint' | 'call' | 'property_read' | 'property_write' | 'attribute' | 'spec_endpoint' | 'spec_entity' | 'route_middleware'
    context TEXT                         -- optional context (e.g. enclosing method)
);

CREATE INDEX usages_by_symbol ON usages(symbol_id);
CREATE INDEX symbols_by_kind ON symbols(kind);

CREATE TABLE meta (
    key TEXT PRIMARY KEY,
    value TEXT
);
-- meta entries: last_built_at, framework_version, file_hashes (for incremental rebuild)

SQLite chosen because:

  • Zero setup (no daemon)
  • Fast queries even with 100k+ symbols
  • Easy to ship in .altair/ (gitignored — it's a derived artifact)
  • Inspectable by humans with sqlite3 CLI
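To make the schema concrete, here is a minimal sketch that loads a trimmed copy of it into an in-memory SQLite database and counts usages per usage_kind for one symbol. Python is used purely for illustration (the real implementation would be PHP); the inserted rows and the usage_counts helper are hypothetical and not part of the proposal.

```python
import sqlite3

# Trimmed copy of the proposed schema: only the columns this sketch touches.
SCHEMA = """
CREATE TABLE symbols (
    id INTEGER PRIMARY KEY,
    fqn TEXT NOT NULL UNIQUE,
    kind TEXT NOT NULL,
    file TEXT NOT NULL,
    line INTEGER NOT NULL
);
CREATE TABLE usages (
    id INTEGER PRIMARY KEY,
    symbol_id INTEGER REFERENCES symbols(id),
    used_in_file TEXT NOT NULL,
    used_in_line INTEGER NOT NULL,
    usage_kind TEXT NOT NULL,
    context TEXT
);
CREATE INDEX usages_by_symbol ON usages(symbol_id);
"""

db = sqlite3.connect(":memory:")  # the real index would live at .altair/index.db
db.executescript(SCHEMA)

# Made-up rows, for illustration only.
db.execute(
    "INSERT INTO symbols (id, fqn, kind, file, line) VALUES (1, ?, 'class', ?, 12)",
    (r"App\User\User", "app/User/User.php"),
)
db.executemany(
    "INSERT INTO usages (symbol_id, used_in_file, used_in_line, usage_kind) "
    "VALUES (1, ?, ?, ?)",
    [
        ("app/User/CreateUser.php", 31, "new"),
        ("app/User/UserRepository.php", 18, "type_hint"),
        ("app/User/UserRepository.php", 40, "type_hint"),
    ],
)


def usage_counts(conn, fqn):
    """Return {usage_kind: count} for one symbol (hypothetical helper)."""
    rows = conn.execute(
        """SELECT u.usage_kind, COUNT(*)
           FROM usages u JOIN symbols s ON s.id = u.symbol_id
           WHERE s.fqn = ?
           GROUP BY u.usage_kind""",
        (fqn,),
    )
    return dict(rows.fetchall())


counts = usage_counts(db, r"App\User\User")
print(counts)
```

The same query can be pasted into the sqlite3 CLI against .altair/index.db, which is the kind of ad-hoc inspection the last bullet refers to.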

CLI surface

bin/altair index build                                    # full rebuild
bin/altair index build --incremental                      # only changed files
bin/altair index find-usages "App\\User\\User"           # all usages
bin/altair index find-usages "App\\User\\User::register"  # method usages only
bin/altair index implements "Altair\\Http\\Contracts\\MiddlewareInterface"
bin/altair index extends "App\\Base\\Entity"
bin/altair index callers-of "App\\User\\CreateUser::__invoke"
bin/altair index unused                                   # symbols with zero usages (dead code candidates)
bin/altair index orphans                                  # specs without routes, routes without specs, etc.

--format=json on any of these returns structured output for MCP.
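As an illustration of what index unused could run under the hood, the sketch below finds symbols with no usage rows via a LEFT JOIN and prints them as JSON. It is written in Python for brevity, not as the PHP implementation; the JSON shape ({fqn, kind}), the sample rows, and the unused_symbols helper are assumptions, since the issue does not specify the output format.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE symbols (id INTEGER PRIMARY KEY, fqn TEXT NOT NULL UNIQUE, kind TEXT NOT NULL);
CREATE TABLE usages (id INTEGER PRIMARY KEY,
                     symbol_id INTEGER REFERENCES symbols(id),
                     usage_kind TEXT NOT NULL);
""")

# Made-up data: User is used once, OldMailer never.
db.executemany(
    "INSERT INTO symbols (id, fqn, kind) VALUES (?, ?, ?)",
    [(1, r"App\User\User", "class"), (2, r"App\Legacy\OldMailer", "class")],
)
db.execute("INSERT INTO usages (symbol_id, usage_kind) VALUES (1, 'new')")


def unused_symbols(conn):
    """Symbols with zero recorded usages, i.e. dead-code candidates."""
    rows = conn.execute(
        """SELECT s.fqn, s.kind
           FROM symbols s
           LEFT JOIN usages u ON u.symbol_id = s.id
           WHERE u.id IS NULL
           ORDER BY s.fqn"""
    )
    return [{"fqn": fqn, "kind": kind} for fqn, kind in rows]


unused = unused_symbols(db)
print(json.dumps(unused))  # assumed shape of the --format=json output
```

Zero usages only makes a symbol a candidate: entry points invoked by the framework or via reflection would also show up here unless the index records those as usages.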

MCP tools

| Tool | Inputs | Returns |
| --- | --- | --- |
| framework__find_usages | symbol: string, kind?: string | List of {file, line, usage_kind, context} |
| framework__implementers | interface: string | List of classes implementing the interface |
| framework__callers | method: string | List of code locations calling the method |
| framework__dead_code | (none) | List of symbols with zero recorded usages |
| framework__impact | symbols: string[] | Aggregate impact across tests, specs, and other files |
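A rough sketch of the query behind framework__find_usages follows, returning rows in the {file, line, usage_kind, context} shape listed for that tool. It is Python for illustration only; the sample data is invented, and treating the optional kind input as a filter on usage_kind is an assumption (the issue does not say whether it filters on symbol kind or usage kind).

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row  # lets each row be converted to a dict
db.executescript("""
CREATE TABLE symbols (id INTEGER PRIMARY KEY, fqn TEXT NOT NULL UNIQUE);
CREATE TABLE usages (id INTEGER PRIMARY KEY,
                     symbol_id INTEGER REFERENCES symbols(id),
                     used_in_file TEXT NOT NULL,
                     used_in_line INTEGER NOT NULL,
                     usage_kind TEXT NOT NULL,
                     context TEXT);
""")
db.execute("INSERT INTO symbols (id, fqn) VALUES (1, ?)", (r"App\User\User",))
db.executemany(
    "INSERT INTO usages (symbol_id, used_in_file, used_in_line, usage_kind, context) "
    "VALUES (1, ?, ?, ?, ?)",
    [
        ("app/User/CreateUser.php", 31, "new", r"App\User\CreateUser::__invoke"),
        ("api/users/create.yaml", 4, "spec_entity", None),
    ],
)


def find_usages(conn, symbol, kind=None):
    """Return usages of `symbol`; `kind` (assumed) narrows by usage_kind."""
    sql = """SELECT u.used_in_file AS file, u.used_in_line AS line,
                    u.usage_kind AS usage_kind, u.context AS context
             FROM usages u JOIN symbols s ON s.id = u.symbol_id
             WHERE s.fqn = ?"""
    params = [symbol]
    if kind is not None:
        sql += " AND u.usage_kind = ?"
        params.append(kind)
    sql += " ORDER BY u.used_in_file, u.used_in_line"
    return [dict(row) for row in conn.execute(sql, params)]


all_usages = find_usages(db, r"App\User\User")
only_new = find_usages(db, r"App\User\User", kind="new")
print(all_usages)
print(only_new)
```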

framework__impact is the key one for refactoring confidence. Given a set of symbols the agent plans to change, it returns:

{
  "symbols": ["App\\User\\User"],
  "impact": {
    "files": 23,
    "tests": 8,
    "specs": 3,
    "estimated_test_runtime_ms": 1240
  },
  "by_file": [...],
  "tests_to_run": ["tests/Http/Actions/CreateUserActionTest.php", "..."],
  "specs_affected": ["api/users/create.yaml", "api/posts/list.yaml", "..."]
}

The agent can use tests_to_run as an optimisation: it runs only those tests, rather than the full suite, before declaring success.
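One way the aggregate could be derived from the usages table is sketched below in Python (illustrative, not the PHP implementation). It assumes test files are those under tests/ and specs are those under api/, and that "files" counts every distinct file including tests and specs. None of that is stated in the issue, and by_file and estimated_test_runtime_ms are left out because their sources are not specified.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE symbols (id INTEGER PRIMARY KEY, fqn TEXT NOT NULL UNIQUE);
CREATE TABLE usages (id INTEGER PRIMARY KEY,
                     symbol_id INTEGER REFERENCES symbols(id),
                     used_in_file TEXT NOT NULL,
                     used_in_line INTEGER NOT NULL,
                     usage_kind TEXT NOT NULL);
""")
db.execute("INSERT INTO symbols (id, fqn) VALUES (1, ?)", (r"App\User\User",))
db.executemany(
    "INSERT INTO usages (symbol_id, used_in_file, used_in_line, usage_kind) VALUES (1, ?, ?, ?)",
    [  # made-up rows
        ("app/User/CreateUser.php", 31, "new"),
        ("app/User/CreateUser.php", 12, "type_hint"),
        ("tests/Http/Actions/CreateUserActionTest.php", 20, "new"),
        ("api/users/create.yaml", 4, "spec_entity"),
    ],
)


def impact(conn, fqns):
    """Aggregate the distinct files that reference any of the given symbols."""
    fqns = list(fqns)
    placeholders = ",".join("?" for _ in fqns)
    rows = conn.execute(
        f"""SELECT DISTINCT u.used_in_file
            FROM usages u JOIN symbols s ON s.id = u.symbol_id
            WHERE s.fqn IN ({placeholders})
            ORDER BY u.used_in_file""",
        fqns,
    ).fetchall()
    files = [f for (f,) in rows]
    tests = [f for f in files if f.startswith("tests/")]  # assumption: tests live here
    specs = [f for f in files if f.startswith("api/")]    # assumption: specs live here
    return {
        "symbols": fqns,
        "impact": {"files": len(files), "tests": len(tests), "specs": len(specs)},
        "tests_to_run": tests,
        "specs_affected": specs,
    }


report = impact(db, [r"App\User\User"])
print(report)
```

Because this looks only at direct usages, tests that exercise a symbol indirectly would be missed; that matches the shallow analysis the issue scopes in.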

Index building

PHP-Parser walks every PHP file under src/, app/, and tests/. The framework-aware augmentations:

  • Parse YAML specs under api/ to add spec_endpoint and spec_entity usages
  • Parse config/routes.php (or whatever routes manifest the project uses) for middleware references
  • Parse config/configurations.php for container bindings

The index goes stale quickly during a vibe-coding session, so incremental rebuild is essential:

  • Track file content hashes in meta.file_hashes
  • On index build --incremental, only reparse files whose hashes changed
  • Re-link usages for affected symbols

For a 1000-file project, an incremental rebuild should take under 500ms and a full rebuild under 5 seconds.
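The hash-comparison step can be sketched as follows, in Python for illustration. It assumes meta.file_hashes holds a JSON object mapping relative paths to SHA-256 digests; the issue names the key but not its encoding, and the diff_against_index and save_hashes helpers are hypothetical.

```python
import hashlib
import json
import sqlite3
import tempfile
from pathlib import Path

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE meta (key TEXT PRIMARY KEY, value TEXT)")


def diff_against_index(conn, root):
    """Return (changed, removed, current_hashes) relative to meta.file_hashes."""
    row = conn.execute("SELECT value FROM meta WHERE key = 'file_hashes'").fetchone()
    old = json.loads(row[0]) if row else {}
    new = {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*.php")
    }
    changed = sorted(f for f, digest in new.items() if old.get(f) != digest)
    removed = sorted(f for f in old if f not in new)
    return changed, removed, new


def save_hashes(conn, hashes):
    """Persist the hashes after the changed files have been reparsed."""
    conn.execute(
        "INSERT OR REPLACE INTO meta (key, value) VALUES ('file_hashes', ?)",
        (json.dumps(hashes),),
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "User.php").write_text("<?php class User {}")
    (root / "Post.php").write_text("<?php class Post {}")

    first, _, hashes = diff_against_index(db, root)  # empty index: everything is new
    save_hashes(db, hashes)

    (root / "User.php").write_text("<?php final class User {}")  # edit one file
    (root / "Post.php").unlink()                                   # delete another
    second, removed, _ = diff_against_index(db, root)

print(first, second, removed)
```

Only the files in the changed list need reparsing; rows belonging to removed files would be deleted before the re-link step.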

Auto-rebuild

The MCP server can opt to auto-trigger an incremental rebuild before every find_usages / impact call — at the cost of latency. Default: trigger if last_built_at is older than 60s OR file changes are detected via mtime scan.

Alternative: a long-lived watcher process (bin/altair index watch) that rebuilds on file change. This is optional and left as a follow-up; v1 ships with on-demand rebuilds only.
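The default trigger described above (older than 60s, or a file modified since the last build) could look roughly like this. The sketch is Python for illustration and assumes last_built_at is stored as a Unix timestamp, which the issue does not specify; needs_rebuild is a hypothetical helper name.

```python
import os
import tempfile
import time


def needs_rebuild(last_built_at, paths, now=None, max_age_s=60.0):
    """True if the index is older than max_age_s or any file changed since the build."""
    now = time.time() if now is None else now
    if now - last_built_at > max_age_s:
        return True
    # mtime scan: cheap compared with hashing every file
    return any(os.path.getmtime(p) > last_built_at for p in paths)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "User.php")
    with open(path, "w") as fh:
        fh.write("<?php class User {}")

    os.utime(path, (1000, 1000))                   # file last modified at t=1000
    fresh = needs_rebuild(1010, [path], now=1020)  # built at t=1010, checked 10s later

    os.utime(path, (1015, 1015))                   # file edited after the build
    edited = needs_rebuild(1010, [path], now=1020)

    expired = needs_rebuild(1010, [], now=1100)    # 90s old, nothing changed

print(fresh, edited, expired)
```

The mtime scan only decides whether to start an incremental rebuild; the content hashes still decide which files actually get reparsed.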

Shape

src/Altair/Index/
├── Cli/
│   ├── BuildCommand.php
│   ├── FindUsagesCommand.php
│   ├── ImplementsCommand.php
│   ├── ExtendsCommand.php
│   ├── CallersOfCommand.php
│   ├── UnusedCommand.php
│   └── OrphansCommand.php
├── Mcp/
│   ├── FindUsagesTool.php
│   ├── ImplementersTool.php
│   ├── CallersTool.php
│   ├── DeadCodeTool.php
│   └── ImpactTool.php
├── Parser/
│   ├── PhpFileWalker.php           # uses nikic/php-parser
│   ├── YamlSpecWalker.php
│   └── ConfigWalker.php
├── Storage/
│   └── SqliteIndexStorage.php
├── Query/
│   ├── UsageQuery.php
│   └── ImpactQuery.php
└── composer.json

Acceptance criteria

  • Full index build of the framework's own monorepo completes in <5s
  • Incremental rebuild after one-file change completes in <500ms
  • find-usages returns accurate results for all 11 usage_kind values listed in the schema
  • framework__impact correctly enumerates the test files that depend on a symbol — verified against the framework's own test suite
  • framework__dead_code finds at least one true positive in a project with intentional dead code (test fixture)
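  • Spec-aware usages: find-usages App\\User\\User returns spec_entity rows for YAML specs that reference the entity, and find-usages on an Action class returns spec_endpoint rows for the specs that resolve to it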
  • Index survives a bin/altair spec scaffold operation (auto-rebuilds or invalidates affected files)
  • .altair/index.db is gitignored by the skeleton
  • Tests:
    • PhpFileWalker: golden tests for each usage_kind
    • ImpactQuery: known-state fixture project, assert aggregate counts
    • End-to-end: build index against framework monorepo, sample queries return expected results
    • Incremental: build, modify one file, rebuild, assert only that file's symbols re-indexed

Out of scope

  • Cross-language indexing (the emitted SDKs are owned by their language toolchain)
  • Type inference beyond what PHPStan provides (defer to PHPStan for that)
  • Documentation lookups (manifests own that)
  • Full call-graph analysis (callers-of is shallow; deep transitive analysis is too expensive for the value)

Dependencies

New composer deps:

Why this matters

Refactoring is the activity that separates a real codebase from a demo. Without confidence in "what depends on this," agents either over-refactor (changing too much) or under-refactor (avoiding necessary changes). With a real usage index, agents can make precise, surgical changes — the way a senior engineer with IDE support does today.
