
aharwelik/repo-map


repo-map

A fast, regex-based symbol index that lets AI coding agents look up where any symbol is defined in O(1) time instead of grep-scanning the whole codebase.

Keywords: code search, symbol index, ctags alternative, AI coding agent, Claude Code, Codex, context window, repository map, developer tools, LLM code navigation


The Problem

AI coding agents (Claude Code, Codex, Cursor, etc.) routinely waste 30-75 seconds doing blind grep passes across an entire codebase to answer a question as simple as "where is parseQuery defined?" Each grep pass:

  1. Walks every branch of the file tree.
  2. Loads irrelevant files into the context window.
  3. Discards results that do not match, burning tokens and wall-clock time.

This is the documented blind-grep / wasted-context problem: agents cannot efficiently distinguish "this symbol does not exist anywhere" from "I have not looked hard enough yet", so they keep grepping. The result is slow responses, ballooning context windows, and hallucinated references to symbols that were never defined.

repo-map eliminates these passes. Run build once (or on each commit), then answer "where is X?" with a single dict lookup in the cached JSON index.


Install

No third-party dependencies. Requires Python 3.10+.

git clone https://github.com/aharwelik/proof-carrying-ops
cd proof-carrying-ops/repo-map
python -m repomap --help

The package is also importable directly:

from repomap.index import build_index, lookup_symbol

Usage

Build the index

python -m repomap build <dir> [-o map.json]

Walk <dir> recursively, extract symbol definitions from all supported source files, and write a JSON index. Skips .git, node_modules, __pycache__, .venv, dist, and build automatically.

Example:

$ python -m repomap build examples/sample_project -o sample.json
repo-map: scanned 3 file(s), indexed 19 symbol(s) -> sample.json

Default output file is repomap.json in the current directory.
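
The directory walk described above can be sketched as follows. This is a minimal illustration, not repo-map's actual code: the names SKIP_DIRS, SOURCE_EXTS, and iter_source_files are hypothetical, and the extension set is assumed from the Supported Languages section.

```python
import os

# Assumed values: the skip list mirrors the directories named above, and the
# extension set mirrors the Supported Languages section of this README.
SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv", "dist", "build"}
SOURCE_EXTS = {".py", ".js", ".jsx", ".ts", ".tsx", ".go"}


def iter_source_files(root):
    """Yield paths (relative to root, '/'-separated) of candidate source files."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            if os.path.splitext(name)[1] in SOURCE_EXTS:
                full = os.path.join(dirpath, name)
                yield os.path.relpath(full, root).replace(os.sep, "/")
```

Pruning dirnames in place is what keeps the walk cheap: a skipped directory such as node_modules is never opened at all, rather than being walked and filtered afterwards.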

Look up a symbol

python -m repomap where <symbol> [-i map.json]

Print every file and line number where <symbol> is defined. The exit code is 0 if the symbol is found and 1 if it is not. A not-found result is not an error; it means the symbol is absent from the index, which is useful for catching references to symbols that were never defined (a common agent hallucination).

Example -- symbol found:

$ python -m repomap where User -i sample.json
'User' is defined in 1 location(s):
  models.py:4  [class]

Example -- symbol absent:

$ python -m repomap where __missing__ -i sample.json
NOT FOUND: '__missing__' is not defined in the index (index: sample.json).
$ echo $?
1
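
An agent harness could turn the documented exit codes into a boolean along these lines. This is a sketch under assumptions: symbol_is_defined and its run parameter are hypothetical names introduced here for illustration (run exists only so the call can be stubbed out), and only the `where` command, the `-i` flag, and exit codes 0 and 1 come from this README.

```python
import subprocess
import sys


def symbol_is_defined(symbol, index_path="repomap.json", run=subprocess.run):
    """Map the documented exit codes of `repomap where` to True / False."""
    cmd = [sys.executable, "-m", "repomap", "where", symbol, "-i", index_path]
    result = run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return True   # documented: symbol found
    if result.returncode == 1:
        return False  # documented: symbol absent from the index
    # Other exit codes are not documented above, so surface them as failures.
    raise RuntimeError(
        f"unexpected exit code {result.returncode}: {result.stderr}"
    )
```

One caveat with this approach: an uncaught Python exception also exits with status 1, so a caller that needs to tell "absent" from "crashed" may want to inspect the captured output as well.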

Print statistics

python -m repomap stats [-i map.json]

Example:

$ python -m repomap stats -i sample.json
repo-map stats  (index: sample.json)
  Files with definitions : 3
  Total symbol entries   : 19

  By kind:
    class        3
    function     11
    type         2
    var          3

  By extension (language):
    .go          6
    .js          6
    .py          7
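
Counts like these can be derived from the "files" half of the index alone. The sketch below assumes an index shaped like the one under Index Schema; summarize is a hypothetical helper and is not necessarily how compute_stats is implemented.

```python
import os
from collections import Counter


def summarize(index):
    """Count symbol entries by kind and by file extension."""
    by_kind = Counter()
    by_ext = Counter()
    for path, entries in index["files"].items():
        ext = os.path.splitext(path)[1]
        for entry in entries:
            by_kind[entry["kind"]] += 1
            by_ext[ext] += 1
    return {
        "files": len(index["files"]),
        "symbols": sum(by_kind.values()),
        "by_kind": dict(by_kind),
        "by_ext": dict(by_ext),
    }
```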

Use the library

from repomap.index import build_index, lookup_symbol, compute_stats

index = build_index("/path/to/my/project")
entries = lookup_symbol(index, "MyClass")
# entries = [{"kind": "class", "file": "src/models.py", "line": 4}]

missing = lookup_symbol(index, "__not_defined__")
# missing = []  -- exact set membership, no fuzzy search

stats = compute_stats(index)

Supported Languages

Extension   Captured definitions
.py         def, async def, class, top-level assignments
.js .jsx    function, class, const/let/var =, export *
.ts .tsx    Same as JS plus TypeScript-style exported declarations
.go         func, func (recv) Method, type
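
To make the line-level approach concrete, here is a rough sketch of regex extraction for the .py row only. The patterns and the name extract_python_symbols are hypothetical; the regexes repo-map actually uses are not shown in this README and may differ. Labeling top-level assignments as kind "var" is also an assumption.

```python
import re

# Hypothetical patterns, tried in order; the first match on a line wins.
PY_PATTERNS = [
    ("function", re.compile(r"^\s*(?:async\s+)?def\s+([A-Za-z_]\w*)")),
    ("class", re.compile(r"^\s*class\s+([A-Za-z_]\w*)")),
    # Top-level only (no leading whitespace); (?!=) rejects '==' comparisons.
    ("var", re.compile(r"^([A-Za-z_]\w*)\s*(?::[^=]+)?=(?!=)")),
]


def extract_python_symbols(source):
    """Return one {'name', 'kind', 'line'} dict per matched definition line."""
    found = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for kind, pattern in PY_PATTERNS:
            match = pattern.match(line)
            if match:
                found.append(
                    {"name": match.group(1), "kind": kind, "line": lineno}
                )
                break
    return found
```

Because each line is matched in isolation, this style of extractor is fast but blind to anything that does not look like a definition on a single line.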

Index Schema

{
  "root": "/absolute/path/to/scanned/dir",
  "generated_by": "repo-map",
  "symbols": {
    "MyClass": [
      {"kind": "class", "file": "src/models.py", "line": 4}
    ]
  },
  "files": {
    "src/models.py": [
      {"name": "MyClass", "kind": "class", "line": 4}
    ]
  }
}

The dual structure lets agents query in either direction: "where is symbol X defined?" or "what symbols does file F export?"
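
Both query directions reduce to a dict lookup on the parsed JSON. The sketch below uses only the standard library and the schema above; where_is and symbols_in are hypothetical helper names, not part of the repomap package.

```python
import json

# Sample index in the shape shown above (values are illustrative).
SAMPLE = json.loads("""
{
  "root": "/absolute/path/to/scanned/dir",
  "generated_by": "repo-map",
  "symbols": {
    "MyClass": [{"kind": "class", "file": "src/models.py", "line": 4}]
  },
  "files": {
    "src/models.py": [{"name": "MyClass", "kind": "class", "line": 4}]
  }
}
""")


def where_is(index, symbol):
    """Symbol -> list of definition sites; an empty list means not indexed."""
    return index["symbols"].get(symbol, [])


def symbols_in(index, path):
    """File path -> list of symbols defined in that file."""
    return index["files"].get(path, [])


print(where_is(SAMPLE, "MyClass"))
print(symbols_in(SAMPLE, "src/models.py"))
```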


Theory -- Identity Normalization and the proof-carrying-ops Model

This tool is a concrete implementation of the Identity Normalization invariant from the proof-carrying-ops model: https://github.com/aharwelik/proof-carrying-ops

The core idea: every symbol in a codebase has a canonical identity of the form <file>::<symbol> (e.g., src/models.py::MyClass). Once all symbols are normalized to this canonical form:

  • "Where is X defined?" becomes an exact set lookup: O(1) against a pre-built dict, not O(n) grepping across n files.
  • "Does X exist?" becomes an exact set membership test: symbol in index, which returns a definitive yes/no rather than "I did not find it in the files I happened to read."

This is a meaningful invariant because an AI agent that holds the index holds a proof that a definition of a symbol was (or was not) recorded at the time the index was built, within the extractor limits listed under Honest Limitations. Without the index, the agent can only claim "I searched and did not find it", which is a weaker and noisier guarantee.
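
The canonical form can be illustrated by flattening the "symbols" half of the index into a set of <file>::<symbol> strings. This is a sketch of the idea only; canonical_ids is a hypothetical helper, and the README does not say whether repo-map materializes such a set internally.

```python
def canonical_ids(index):
    """Flatten the index into a set of '<file>::<symbol>' strings."""
    return {
        f"{entry['file']}::{name}"
        for name, entries in index["symbols"].items()
        for entry in entries
    }


index = {
    "symbols": {
        "MyClass": [{"kind": "class", "file": "src/models.py", "line": 4}]
    }
}
ids = canonical_ids(index)

# Exact membership tests: no file is opened and nothing is scanned.
print("src/models.py::MyClass" in ids)  # True
print("src/models.py::Missing" in ids)  # False
```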

repo-map is intentionally a fast heuristic -- regex extraction, not a full language server -- because the goal is to narrow search space cheaply, not to replace semantic analysis. A symbol missed by the extractor degrades gracefully: the agent falls back to a targeted grep on one known file. A false hit costs one small context read. The invariant holds in the common case.


Honest Limitations

repo-map uses regular expressions to extract definitions. It will miss symbols that are:

  • Defined dynamically (via setattr, exec, metaclasses, decorators that rename the function, type() calls, etc.)
  • Generated by code generation frameworks at build time
  • Defined inside conditionals in ways that defeat line-level pattern matching

This is intentional. The tool is meant to give a fast, cheap first answer, not to replace an LSP or type checker. Use it to narrow search, then confirm with the actual source file.
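
The first limitation is easy to reproduce. In the sketch below, a hypothetical def-matching regex (not repo-map's actual pattern) is run over a source string that also creates a name at run time through globals(), which behaves like the setattr case listed above.

```python
import re

# Hypothetical pattern for illustration; repo-map's real regexes may differ.
DEF_RE = re.compile(r"^\s*(?:async\s+)?def\s+([A-Za-z_]\w*)", re.MULTILINE)

SOURCE = '''
def visible():
    return 1

def _make(name):
    def inner():
        return name
    return inner

globals()["hidden"] = _make("hidden")
'''

# What a line-level regex sees.
names = set(DEF_RE.findall(SOURCE))

# What actually exists after the module body runs.
namespace = {}
exec(SOURCE, namespace)

print("hidden" in names)      # False: no 'def hidden' line to match
print("hidden" in namespace)  # True: the name exists at run time
```

The regex also reports the nested helper inner as a definition, which shows the opposite kind of imprecision: a hit that is not importable from the module.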


Running Tests

python -m unittest discover -s tests -p 'test_*.py' -v

All 66 tests use the standard library only (no pytest, no third-party deps).


License

MIT -- see LICENSE.


Anthony Harwelik -- aharwelik@gmail.com -- https://github.com/aharwelik
