A fast, regex-based symbol index that lets AI coding agents look up where any symbol is defined in O(1) instead of grep-scanning the whole codebase.
Keywords: code search, symbol index, ctags alternative, AI coding agent, Claude Code, Codex, context window, repository map, developer tools, LLM code navigation
AI coding agents (Claude Code, Codex, Cursor, etc.) routinely waste 30-75 seconds
doing blind grep passes across an entire codebase to answer a question as simple as
"where is parseQuery defined?" Each grep pass:
- Walks every file tree branch.
- Loads irrelevant files into the context window.
- Discards results that do not match, burning tokens and wall-clock time.
This is the documented blind-grep / wasted-context problem: agents cannot efficiently distinguish "this symbol does not exist anywhere" from "I have not looked hard enough yet", so they keep grepping. The result is slow responses, ballooning context windows, and hallucinated references to symbols that were never defined.
repo-map eliminates these passes. Run build once (or on each commit), then answer
"where is X?" with a single dict lookup in the cached JSON index.
No third-party dependencies. Requires Python 3.10+.
git clone https://github.com/aharwelik/proof-carrying-ops
cd proof-carrying-ops/repo-map
python -m repomap --help
The package is also importable directly:
from repomap.index import build_index, lookup_symbol

python -m repomap build <dir> [-o map.json]
Walk <dir> recursively, extract symbol definitions from all supported source
files, and write a JSON index. Skips .git, node_modules, __pycache__,
.venv, dist, and build automatically.
Example:
$ python -m repomap build examples/sample_project -o sample.json
repo-map: scanned 3 file(s), indexed 19 symbol(s) -> sample.json
Default output file is repomap.json in the current directory.
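The directory skipping described above can be sketched with `os.walk` and in-place pruning. This is a minimal illustration, not necessarily how repo-map implements it: the names `SKIP_DIRS`, `SOURCE_EXTS`, and `iter_source_files` are hypothetical, and the extension set is assumed from the supported-language table in this README.

```python
import os

# Assumption: the skip list mirrors the directories named in this README.
SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv", "dist", "build"}
# Assumption: extensions taken from the supported-language table.
SOURCE_EXTS = {".py", ".js", ".jsx", ".ts", ".tsx", ".go"}


def iter_source_files(root):
    """Yield paths (relative to root) of source files, skipping SKIP_DIRS."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            if os.path.splitext(name)[1] in SOURCE_EXTS:
                full = os.path.join(dirpath, name)
                yield os.path.relpath(full, root).replace(os.sep, "/")
```

Pruning `dirnames` in place matters: filtering paths after the walk would still pay the cost of traversing a large `node_modules` tree.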
python -m repomap where <symbol> [-i map.json]
Print every file and line number where <symbol> is defined. Exit code 0 if
found, exit code 1 if not found. A not-found result is not an error; it means
the symbol is absent from the index, which is genuinely useful for catching
references to symbols that were never defined (a common agent hallucination).
Example -- symbol found:
$ python -m repomap where User -i sample.json
'User' is defined in 1 location(s):
models.py:4 [class]
Example -- symbol absent:
$ python -m repomap where __missing__ -i sample.json
NOT FOUND: '__missing__' is not defined in the index (index: sample.json).
$ echo $?
1
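Because a miss is a definitive answer, the same check can vet symbol names an agent proposes before they reach generated code. The sketch below assumes an index dict shaped like the JSON format documented in this README (a top-level "symbols" mapping); `undefined_symbols` is a hypothetical helper, not part of the repomap package.

```python
def undefined_symbols(index, candidates):
    """Return the candidate names that have no definition in the index."""
    symbols = index.get("symbols", {})
    # Exact-match membership only: no fuzzy or case-insensitive matching.
    return [name for name in candidates if name not in symbols]


# Tiny hand-written index for illustration.
index = {
    "symbols": {
        "User": [{"kind": "class", "file": "models.py", "line": 4}],
    }
}

missing = undefined_symbols(index, ["User", "__missing__"])
print(missing)  # ['__missing__']
```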
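python -m repomap stats [-i map.json]
Print summary counts for an index: files with definitions, total symbol entries, and breakdowns by kind and by file extension.
Example: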
$ python -m repomap stats -i sample.json
repo-map stats (index: sample.json)
Files with definitions : 3
Total symbol entries : 19
By kind:
class 3
function 11
type 2
var 3
By extension (language):
.go 6
.js 6
.py 7
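Counts like these can be derived from the index alone. A minimal sketch, assuming the index layout documented in this README; `summarize` is a hypothetical stand-in, and its return shape may not match what `compute_stats` actually returns.

```python
import os
from collections import Counter


def summarize(index):
    """Count symbol entries by kind and by file extension."""
    by_kind = Counter()
    by_ext = Counter()
    for entries in index["symbols"].values():
        for entry in entries:
            by_kind[entry["kind"]] += 1
            by_ext[os.path.splitext(entry["file"])[1]] += 1
    return {
        "files": len(index["files"]),
        "total": sum(by_kind.values()),
        "by_kind": dict(by_kind),
        "by_ext": dict(by_ext),
    }
```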
from repomap.index import build_index, lookup_symbol, compute_stats
index = build_index("/path/to/my/project")
entries = lookup_symbol(index, "MyClass")
# entries = [{"kind": "class", "file": "src/models.py", "line": 4}]
missing = lookup_symbol(index, "__not_defined__")
# missing = [] -- exact set membership, no fuzzy search
stats = compute_stats(index)

| Extension | Captured definitions |
|---|---|
| .py | def, async def, class, top-level assignments |
| .js .jsx | function, class, const/let/var =, export * |
| .ts .tsx | Same as JS plus TypeScript-style exported declarations |
| .go | func, func (recv) Method, type |
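To make the line-level approach concrete, here is a rough sketch of regex extraction for the Python row of the table. The patterns and the name `extract_python_symbols` are illustrative assumptions; repo-map's actual regular expressions may differ.

```python
import re

# Assumption: illustrative patterns only; repo-map's real regexes may differ.
PY_PATTERNS = [
    ("function", re.compile(r"^\s*(?:async\s+)?def\s+([A-Za-z_]\w*)")),
    ("class", re.compile(r"^\s*class\s+([A-Za-z_]\w*)")),
    # Top-level assignment: no leading indentation, a single "=" (not "==").
    ("var", re.compile(r"^([A-Za-z_]\w*)\s*(?::[^=]+)?=(?!=)")),
]


def extract_python_symbols(source):
    """Return (name, kind, line) tuples found by line-level matching."""
    found = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for kind, pattern in PY_PATTERNS:
            match = pattern.match(line)
            if match:
                found.append((match.group(1), kind, lineno))
                break
    return found


sample = '''\
MAX_USERS = 10

class User:
    def save(self):
        pass

async def fetch(url):
    return url
'''

for name, kind, line in extract_python_symbols(sample):
    print(f"{name}:{line} [{kind}]")
```

Each line is matched independently, which keeps extraction cheap but is also why dynamically created symbols are invisible to this style of scan.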
{
"root": "/absolute/path/to/scanned/dir",
"generated_by": "repo-map",
"symbols": {
"MyClass": [
{"kind": "class", "file": "src/models.py", "line": 4}
]
},
"files": {
"src/models.py": [
{"name": "MyClass", "kind": "class", "line": 4}
]
}
}

The dual structure lets agents query in either direction: "where is symbol X defined?" or "what symbols does file F export?"
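Both query directions can be exercised directly on the JSON with the standard library. The helpers `where_is` and `symbols_in` below are hypothetical names for illustration, not functions provided by the package.

```python
import json

# Hand-written index in the format shown above.
raw = """
{
  "root": "/absolute/path/to/scanned/dir",
  "generated_by": "repo-map",
  "symbols": {
    "MyClass": [{"kind": "class", "file": "src/models.py", "line": 4}]
  },
  "files": {
    "src/models.py": [{"name": "MyClass", "kind": "class", "line": 4}]
  }
}
"""
index = json.loads(raw)


def where_is(index, symbol):
    """Symbol -> list of 'file:line' strings (empty if undefined)."""
    return [f"{e['file']}:{e['line']}" for e in index["symbols"].get(symbol, [])]


def symbols_in(index, path):
    """File -> list of symbol names recorded for that file."""
    return [e["name"] for e in index["files"].get(path, [])]


print(where_is(index, "MyClass"))          # ['src/models.py:4']
print(symbols_in(index, "src/models.py"))  # ['MyClass']
```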
This tool is a concrete implementation of the Identity Normalization invariant from the proof-carrying-ops model: https://github.com/aharwelik/proof-carrying-ops
The core idea: every symbol in a codebase has a canonical identity of the form
<file>::<symbol> (e.g., src/models.py::MyClass). Once all symbols are
normalized to this canonical form:
- "Where is X defined?" becomes an exact set lookup: O(1) against a pre-built dict, not O(n) grepping across n files.
- "Does X exist?" becomes an exact set membership test:
symbol in index, which returns a definitive yes/no rather than "I did not find it in the files I happened to read."
This is a meaningful invariant because an AI agent that holds the index holds a proof that a symbol exists (or does not exist) at the time the index was built. Without the index, the agent can only claim "I searched and did not find it", which is a weaker and noisier guarantee.
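As an illustration of the canonical form, the sketch below flattens an index into a set of `<file>::<symbol>` identities and tests membership. It assumes the index layout documented above; `canonical_ids` is a hypothetical helper name.

```python
def canonical_ids(index):
    """Build the set of '<file>::<symbol>' identities from an index."""
    return {
        f"{entry['file']}::{name}"
        for name, entries in index["symbols"].items()
        for entry in entries
    }


index = {
    "symbols": {
        "MyClass": [{"kind": "class", "file": "src/models.py", "line": 4}],
    }
}
ids = canonical_ids(index)

print("src/models.py::MyClass" in ids)  # True
print("src/models.py::Missing" in ids)  # False
```

Set membership gives the definitive yes/no described above, valid as of the moment the index was built.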
repo-map is intentionally a fast heuristic -- regex extraction, not a full language server -- because the goal is to narrow search space cheaply, not to replace semantic analysis. A symbol missed by the extractor degrades gracefully: the agent falls back to a targeted grep on one known file. A false hit costs one small context read. The invariant holds in the common case.
repo-map uses regular expressions to extract definitions. It will miss symbols that are:
- Defined dynamically (via setattr, exec, metaclasses, decorators that rename the function, type() calls, etc.)
- Generated by code generation frameworks at build time
- Defined inside conditionals in ways that defeat line-level pattern matching
This is intentional. The tool is meant to give a fast, cheap first answer, not to replace an LSP or type checker. Use it to narrow search, then confirm with the actual source file.
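One cheap way to do that confirmation is to re-read only the indexed line and check that the symbol name still appears there. This is a sketch under the assumption that entries carry "file" and "line" keys as in the index format; `confirm_entry` is a hypothetical helper, not part of repo-map.

```python
import os
import re


def confirm_entry(root, name, entry):
    """Return True if `name` still appears as a whole word on the indexed line."""
    path = os.path.join(root, entry["file"])
    try:
        with open(path, encoding="utf-8", errors="replace") as fh:
            lines = fh.read().splitlines()
    except OSError:
        return False  # File moved or deleted since the index was built.
    lineno = entry["line"]
    if not 1 <= lineno <= len(lines):
        return False  # Index is stale: the file is shorter than recorded.
    return re.search(rf"\b{re.escape(name)}\b", lines[lineno - 1]) is not None
```

A `False` result suggests the index is stale for that entry and should be rebuilt before it is trusted.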
python -m unittest discover -s tests -p 'test_*.py' -v
All 66 tests use the standard library only (no pytest, no third-party deps).
MIT -- see LICENSE.
Anthony Harwelik -- aharwelik@gmail.com -- https://github.com/aharwelik