codebase-index is a local-first codebase indexing tool that gives Claude Code,
Codex CLI, and OpenCode Cursor-like code search without sending source to the cloud.
This page answers the most common questions about installing, running, and trusting it.
codebase-index is distributed from GitHub, not PyPI. Install it in one command
with pipx (isolated) or pip, pinned to a release tag for reproducibility:
pipx install "git+https://github.com/denfry/codebase-index.git@v1.3.0"
Then run codebase-index init inside your project and codebase-index index to build
the first index. In Claude Code you can instead install the plugin
(/plugin install codebase-index@codebase-index), which provisions an isolated venv on
first run. See QUICKSTART.md and INSTALLATION.md for
every install path.
No. codebase-index is not a replacement for Cursor or any IDE. It is a local retrieval layer for Claude Code, Codex CLI, OpenCode, and other terminal agents. You still use your AI coding agent as the primary interface; this tool makes it better at finding the right files.
No. By default, codebase-index is completely local-first and offline. All indexing, storage, and search happen on your machine. The only exception is if you explicitly enable external embeddings in your configuration, which requires:
- Setting embeddings.allow_external = true
- Providing an API key via environment variable
- Acknowledging warnings from doctor and index
Without all three, no code leaves your machine.
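The three-part gate can be pictured with a minimal Python sketch. It is an illustration only, not the tool's actual code: the function name, the `EMBEDDINGS_API_KEY` variable name, and the `warnings_acknowledged` flag are all hypothetical stand-ins for whatever the real implementation uses.

```python
import os
from typing import Optional

# Hypothetical name: the real environment variable is not given in this FAQ.
API_KEY_ENV = "EMBEDDINGS_API_KEY"


def external_embeddings_enabled(
    config: dict,
    warnings_acknowledged: bool,
    env: Optional[dict] = None,
) -> bool:
    """Return True only when all three opt-in conditions hold."""
    env = dict(os.environ) if env is None else env
    embeddings = config.get("embeddings", {})
    # Condition 1: embeddings.allow_external is explicitly true.
    allow_external = isinstance(embeddings, dict) and embeddings.get("allow_external") is True
    # Condition 2: an API key is present in the environment.
    has_api_key = bool(env.get(API_KEY_ENV))
    # Condition 3: the user acknowledged the warnings.
    return allow_external and has_api_key and warnings_acknowledged
```

Missing any one condition makes the function return False, which mirrors the "without all three" rule.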
Yes. The default configuration disables embeddings entirely (backend = "noop"). Search uses:
- SQLite FTS5 for full-text lexical search
- Tree-sitter for symbol extraction and matching
- Path-based search for file location queries
- Dependency graph expansion for related files
Embeddings are an optional enhancement that can improve recall for semantic queries.
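To give a feel for embedding-free lexical search, here is a small sketch using SQLite FTS5 from Python's standard library. The `chunks` table, its columns, and the sample rows are assumptions made for illustration; the tool's real schema may differ, and the sketch needs a SQLite build with FTS5 enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per chunk of source text.
conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(path, body)")
conn.executemany(
    "INSERT INTO chunks (path, body) VALUES (?, ?)",
    [
        ("auth/login.py", "def login(user, password): return check_password(user, password)"),
        ("auth/tokens.py", "def issue_token(user): return sign(user.id)"),
        ("docs/readme.md", "How to run the test suite"),
    ],
)


def search(query: str, limit: int = 5) -> list[str]:
    """Return matching paths, best match first (FTS5 `rank` is BM25-based)."""
    rows = conn.execute(
        "SELECT path FROM chunks WHERE chunks MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [path for (path,) in rows]
```

Calling `search("password")` matches only the login file, since the default tokenizer splits `check_password` into `check` and `password`.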
Yes. The index is incremental — only changed files are re-indexed. The SQLite database handles large datasets efficiently with FTS5 virtual tables. However:
- Initial indexing of very large repositories (100K+ files) may take several minutes
- The index size scales with the number of source files (not dependencies or generated files, which are excluded)
- You can configure max_file_bytes and use .codeindexignore to limit scope
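One common way to re-index only changed files is to compare content hashes between runs. The sketch below assumes that approach; how codebase-index actually detects changes is not stated here, and the function name and `seen_hashes` store are hypothetical. The `max_file_bytes` parameter plays the role of the setting named above.

```python
import hashlib
from pathlib import Path


def files_to_reindex(
    root: Path, seen_hashes: dict[str, str], max_file_bytes: int
) -> list[str]:
    """Return relative paths whose content changed since the last run.

    `seen_hashes` maps relative path -> SHA-256 hex digest and is updated in place.
    """
    changed = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.stat().st_size > max_file_bytes:
            continue  # skip directories and oversized files
        rel = path.relative_to(root).as_posix()
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if seen_hashes.get(rel) != digest:
            seen_hashes[rel] = digest
            changed.append(rel)
    return changed
```

A second call with the same `seen_hashes` and no edits returns an empty list, which is why incremental runs stay cheap.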
Grep is great for exact string matching but has limitations:
- No symbol awareness — Grep can't distinguish a function definition from a call
- No ranking — Grep returns all matches with no relevance ordering
- No context — Grep doesn't know which files are related or what to read next
- Token-inefficient — Claude would need to read many irrelevant matches
codebase-index combines lexical search with symbol extraction, path matching, and graph expansion to return ranked, contextual results with specific line ranges to read.
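The symbol-awareness point can be shown in a few lines. This sketch uses Python's built-in `ast` module as a stand-in for Tree-sitter (a deliberate substitution, since `ast` ships with Python and only handles Python source); the function name and sample source are made up for illustration.

```python
import ast

SOURCE = '''
def parse_config(path):
    return open(path).read()

settings = parse_config("app.toml")
'''


def classify_occurrences(source: str, name: str) -> list[tuple[int, str]]:
    """Return (line, kind) pairs, where kind is 'definition' or 'call'."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name:
            hits.append((node.lineno, "definition"))
        elif (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == name
        ):
            hits.append((node.lineno, "call"))
    return sorted(hits)


print(classify_occurrences(SOURCE, "parse_config"))
# [(2, 'definition'), (5, 'call')]
```

A plain text search for `parse_config` would report both lines with nothing to tell them apart.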
Yes. Run:
codebase-index mcp --root /path/to/repo
The stdio MCP server exposes:
- healthcheck
- search_code
- find_symbol
- find_refs
- impact_of
- explain_code
- index_stats
See MCP.md for schema and client config templates.
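As a rough illustration of what a client entry might look like, the sketch below builds one in Python. The `mcpServers` layout is a convention used by several MCP clients and is an assumption here, as is the `codebase-index` server key; MCP.md remains the authoritative source for each client's template.

```python
import json


def mcp_client_config(repo_root: str) -> str:
    """Build a JSON config entry that launches the stdio MCP server."""
    config = {
        "mcpServers": {
            # The key is a display name chosen for this example.
            "codebase-index": {
                "command": "codebase-index",
                "args": ["mcp", "--root", repo_root],
            }
        }
    }
    return json.dumps(config, indent=2)


print(mcp_client_config("/path/to/repo"))
```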
Yes. The CLI is agent-agnostic:
- Any agent that can run shell commands can use codebase-index
- JSON output (--json) is parseable by any tool
- init can write setup files for Claude Code, Codex CLI, and OpenCode
- MCP clients can use codebase-index mcp --root <repo>
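For tools that shell out to the CLI, consuming the JSON output might look like the helper below. It is a generic sketch: `run_json` is a hypothetical name, and this FAQ does not say which subcommands accept `--json` or what the output schema is, so the helper takes the full argument list from the caller.

```python
import json
import subprocess


def run_json(argv: list[str]) -> object:
    """Run a command and parse its stdout as JSON; raise if it exits non-zero."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

A caller would pass something like `["codebase-index", <subcommand>, ..., "--json"]`, filling in a subcommand that the CLI documents as supporting JSON output.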
# Delete the cache
codebase-index clean
# Or manually
rm -rf .claude/cache/codebase-index/
# Rebuild from scratch
codebase-index index
Tier-A symbol extraction currently covers:
- Python
- JavaScript / JSX
- TypeScript / TSX
- Java
- Go
- Rust
- C
- C++
- C#
- Ruby
- PHP
- Kotlin
Lua exercises the Tier-B generic Tree-sitter path. Markdown, JSON, YAML, TOML, SQL, and other text/config files still get FTS5 lexical chunks, but not schema-aware code-intelligence extraction yet.
Important gaps for AI codebase search include Swift, Dart, Scala, Elixir, Clojure, Objective-C, Vue/Svelte component parsing, SQL schema-aware parsing, Terraform, Dockerfile, Gradle/Maven/npm config files, migrations, routes, CI, and infrastructure files.
The index is stored in:
.claude/cache/codebase-index/index.sqlite
This directory is in the default .gitignore and should never be committed.
Yes. Use any of these methods:
- .codeindexignore: Tool-specific ignore file (highest priority)
- .gitignore: Standard git ignore file
- .claudeignore: Claude-specific ignore file
- Configuration: extra_ignore patterns in .codeindex.json
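The layering of ignore sources can be sketched as a first-match lookup. This is a simplified model, not the tool's matcher: it uses `fnmatch` rather than full gitignore semantics, the patterns are invented, and only the top priority of .codeindexignore comes from the list above; the order of the remaining sources is an assumption.

```python
from fnmatch import fnmatch
from typing import Optional

# Sources in the order listed above; patterns here are illustrative only.
IGNORE_SOURCES = [
    (".codeindexignore", ["*.min.js"]),
    (".gitignore", ["node_modules/*", "*.log"]),
    (".claudeignore", ["secrets/*"]),
    ("extra_ignore", ["*.snap"]),
]


def excluded_by(path: str, sources=IGNORE_SOURCES) -> Optional[str]:
    """Return the first source with a matching pattern, or None if the path is kept."""
    for source, patterns in sources:
        if any(fnmatch(path, pattern) for pattern in patterns):
            return source
    return None
```

For example, `excluded_by("dist/app.min.js")` reports `.codeindexignore`, while `excluded_by("src/app.py")` returns None and the file stays in the index.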
Yes. codebase-index is released as v1.3.0. The core indexing and search
functionality is implemented and tested. The current 1.3.0 package includes:
- Hybrid FTS/path/symbol/vector retrieval
- Import/call/reference graph expansion and impact
- Optional local embeddings, with external embeddings gated behind explicit opt-in
- Hooks and watch mode for freshness
- Multi-CLI setup for Claude Code, Codex CLI, and OpenCode
Known gaps: the public benchmark suite is still small, the MCP server needs verified client-specific docs and progressive/paged results, and the graph is closer to an import/call/reference graph than a full framework-aware code intelligence graph.
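To make the import-graph idea concrete, here is a minimal sketch of an impact query as a breadth-first walk over reversed import edges. The graph data and the `impacted_by` name are hypothetical, and the real tool may weigh call and reference edges differently.

```python
from collections import deque

# Hypothetical import graph: file -> files it imports.
IMPORTS = {
    "api/routes.py": ["services/billing.py"],
    "services/billing.py": ["db/models.py"],
    "jobs/invoice.py": ["services/billing.py"],
    "db/models.py": [],
}


def impacted_by(changed: str, imports: dict[str, list[str]]) -> list[str]:
    """Return files that depend on `changed`, directly or transitively."""
    # Invert the edges so we can walk from a file to its importers.
    importers: dict[str, list[str]] = {}
    for src, targets in imports.items():
        for target in targets:
            importers.setdefault(target, []).append(src)

    seen, queue = {changed}, deque([changed])
    while queue:
        for parent in importers.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(seen - {changed})
```

Here a change to `db/models.py` reaches all three other files, while a change to `api/routes.py` affects nothing else. A framework-aware graph would add edges (routes, dependency injection, ORM relations) that plain imports miss.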
See ROADMAP.md for the full milestone plan.
See CONTRIBUTING.md for development setup, testing, and PR guidelines.