Verified local code context for agents.
AI coding agents usually start by opening a file, following an import, checking a type definition, and backtracking through the repo, so the context window is partly spent before the real task starts. archex does that retrieval and structural expansion up front and returns a ranked, token-budgeted context bundle plus a receipt that records what was included, what was skipped, and whether the bundle is complete enough to act on.
It runs locally, uses deterministic retrieval and analysis, and does not require hosted inference or an API key. The v0.13 line adds stronger benchmark trust surfaces, bundle-only evaluator support, and default 4-bit TurboQuant vector storage for local vector indexes.
Start: 30-second quickstart · MCP and Claude Code · Python API · Local metrics · Compatibility matrix · Installation trust contract · Security policy
Quick links: Proof bar · Fast paths · What archex returns · Use it your way · Trust and operations · Measured results · Advanced workflows · Installation details · Language support · Development · Documentation map
| Safe-to-act signals | Surfaces | Language coverage | Public evidence |
|---|---|---|---|
| Query/scout receipts expose freshness, index revision, skipped candidates, omitted edges, completeness, and next action | CLI, MCP, Python API, Docker, Claude Code skill | 25 declared language IDs with explicit full vs chunk-only tiers | C1 public comparison, raw-ripgrep/read baseline, bundle-only evaluator lane, and TurboQuant A/B measurement with 7.07× mean vector .npz compression |
archex does not ask the downstream agent to trust ranking alone. Every query/scout receipt explains what was returned, what was skipped, whether freshness was current, and whether the bundle is complete enough to act on.
| If you are evaluating... | Start here | Why |
|---|---|---|
| Agent workflows | archex doctor, then archex scout "question" --budget 1000 --format json | Checks local trust first, then returns a compact map, a receipt summary, and exact fetch handles. |
| Claude Code or MCP | MCP and Claude Code | Stdio MCP server, optional warm --watch, additive top-level receipts, and an in-repo skill that teaches doctor → scout → fetch. |
| Python applications | Python API | Deterministic query(), analyze(), compare(), and receipt-bearing bundles. |
| Benchmark proof | Measured results and archex vs. cocoindex-code | Same-task C1 report, raw-ripgrep/read baseline, bundle-only evaluator reports, required-file trust gates, and TurboQuant storage/recall evidence. |
| Installation and clients | Compatibility matrix | Client bootstrap paths for Claude Code, Codex, Pi, OpenCode, Cursor, and oh-my-pi (omp); global/user scope by default, --dry-run previews. |
uv tool install archex
archex doctor
archex query "How does authentication work?" --format xml
archex doctor reports whether the local index, grammar support, model cache, MCP registration, and .archex/ state are healthy. Repo-local commands default to the current working directory. If the repo has not been initialized yet:
archex init
archex index
archex query "How does authentication work?" --format xml
archex returns a context bundle plus receipt, not an answer. The downstream agent or model still does the reasoning; archex decides which code, symbols, dependencies, and type context belong in the prompt, then records why that bundle is safe or incomplete.
<context query="How does authentication work?">
<structural-context>
<file-tree><![CDATA[
src/auth/
  middleware.py
  tokens.py
  models.py
]]></file-tree>
</structural-context>
<chunks>
<chunk file="src/auth/middleware.py" lines="42-78" symbol="authenticate" score="0.9312" tokens="284">
<imports><![CDATA[from auth.tokens import verify_jwt]]></imports>
<code><![CDATA[
def authenticate(request: Request) -> User:
    token = extract_bearer(request)
    claims = verify_jwt(token)
    return load_user(claims.sub)
]]></code>
</chunk>
</chunks>
<type-definitions>
<type-def file="src/auth/models.py" symbol="User" lines="10-24"><![CDATA[
@dataclass
class User: ...
]]></type-def>
</type-definitions>
<dependencies>
<internal>auth.tokens.verify_jwt</internal>
<external>pyjwt</external>
</dependencies>
</context>
The bundle carries ranked chunks, import context, referenced type definitions, dependency edges, token counts, and provenance. Use --format json or --format markdown when XML is not the right downstream envelope.
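As a rough illustration of consuming the XML envelope shown above, the following sketch parses a trimmed copy of that sample with Python's standard library and pulls out the per-chunk metadata. The element and attribute names come from the sample; the helper name `summarize_bundle` and the returned dictionary shape are assumptions made for this example, not part of archex.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample bundle above (code body elided for brevity).
SAMPLE = """<context query="How does authentication work?">
  <chunks>
    <chunk file="src/auth/middleware.py" lines="42-78" symbol="authenticate" score="0.9312" tokens="284">
      <imports><![CDATA[from auth.tokens import verify_jwt]]></imports>
      <code><![CDATA[def authenticate(request): ...]]></code>
    </chunk>
  </chunks>
  <dependencies>
    <internal>auth.tokens.verify_jwt</internal>
    <external>pyjwt</external>
  </dependencies>
</context>"""


def summarize_bundle(xml_text):
    """Hypothetical helper: list chunk metadata and external dependencies."""
    root = ET.fromstring(xml_text)
    chunks = []
    for chunk in root.iter("chunk"):
        # The lines attribute is a "start-end" range in the sample.
        start, end = (int(n) for n in chunk.get("lines").split("-"))
        chunks.append({
            "file": chunk.get("file"),
            "symbol": chunk.get("symbol"),
            "start_line": start,
            "end_line": end,
            "score": float(chunk.get("score")),
            "tokens": int(chunk.get("tokens")),
        })
    return {
        "query": root.get("query"),
        "chunks": chunks,
        "external_deps": [e.text for e in root.iter("external")],
        "total_tokens": sum(c["tokens"] for c in chunks),
    }


summary = summarize_bundle(SAMPLE)
print(summary["chunks"][0]["file"], summary["total_tokens"])
```

A real bundle may carry more elements than this trimmed sample, so a consumer written this way should tolerate unknown tags rather than fail on them.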
Small receipt example:
{
  "receipt": {
    "freshness": "clean",
    "index_revision": "3d8b0c…",
    "token_budget": { "requested": 12000, "consumed": 6132 },
    "returned_total": 12,
    "skipped_total": 23,
    "included_edges_total": 9,
    "omitted_edges_total": 17,
    "context_complete": "incomplete",
    "context_complete_reason": "dependency_frontier_cut",
    "recommended_next_action": "fetch_skipped_candidate",
    "returned_context": [
      {
        "handle": "chunk:src/auth/middleware.py::authenticate#function",
        "file_path": "src/auth/middleware.py",
        "start_line": 42,
        "end_line": 78,
        "score": 0.9312
      }
    ],
    "skipped_candidates": [
      { "file_path": "src/auth/session.py", "reason": "below_threshold" }
    ]
  }
}
Use CONTEXT_RECEIPTS for the full field contract.
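To make the safe-to-act idea concrete, here is a minimal sketch of how a downstream agent might triage the receipt from the example above. The field names are taken from that example; the function `triage`, its return shape, and the assumption that `"complete"` is the counterpart of `"incomplete"` are illustrative guesses, so check CONTEXT_RECEIPTS for the real contract.

```python
# Inner "receipt" object from the example above, reduced to the fields used here.
receipt = {
    "freshness": "clean",
    "token_budget": {"requested": 12000, "consumed": 6132},
    "context_complete": "incomplete",
    "context_complete_reason": "dependency_frontier_cut",
    "recommended_next_action": "fetch_skipped_candidate",
    "skipped_candidates": [
        {"file_path": "src/auth/session.py", "reason": "below_threshold"}
    ],
}


def triage(receipt):
    """Hypothetical helper: decide whether to act or fetch more context."""
    fresh = receipt.get("freshness") == "clean"
    # Assumption: "complete" is the value that signals a sufficient bundle.
    complete = receipt.get("context_complete") == "complete"
    budget = receipt.get("token_budget", {})
    return {
        "safe_to_act": fresh and complete,
        "next_action": receipt.get("recommended_next_action"),
        "follow_up_files": [
            c["file_path"] for c in receipt.get("skipped_candidates", [])
        ],
        "budget_left": budget.get("requested", 0) - budget.get("consumed", 0),
    }


decision = triage(receipt)
print(decision)
```

For the sample receipt this reports that the bundle is not yet safe to act on, names `src/auth/session.py` as a follow-up candidate, and shows 5868 tokens of unused budget.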
Agents usually explore repositories by opening one file, following imports, checking type definitions, and backtracking. That burns context before the real task starts. archex performs local retrieval and structural expansion first: BM25F, optional local vector/SPLADE signals, graph expansion with edge confidence, type-definition packing, and intent-routed token budgets.
Repository → repo-local index → intent routing → retrieval → graph/type expansion → token-budgeted bundle → agent / MCP client
archex is a selection and assembly layer. Compression tools can shrink the final bundle later, but compressed irrelevant context is still irrelevant. For the vector index itself, v0.13 enables 4-bit TurboQuant storage by default when vector retrieval is turned on: same measured recall/MRR on the current corpus, about seven times smaller vector artifacts, and self-describing compatibility with older unquantized .npz files.
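The selection-and-assembly idea can be illustrated with a deliberately simplified greedy packer: take ranked candidates, include each one that still fits the token budget, and record the rest as skipped. This is not archex's implementation, which also applies intent routing, graph expansion, and type-definition packing. The function name, the tuple layout, the `over_budget` reason string, and the second and third handles are all hypothetical.

```python
def pack_by_budget(candidates, budget):
    """Greedy sketch: candidates are (handle, score, tokens) tuples."""
    included, skipped, used = [], [], 0
    # Visit the highest-scoring candidates first.
    for handle, score, tokens in sorted(candidates, key=lambda c: c[1], reverse=True):
        if used + tokens <= budget:
            included.append(handle)
            used += tokens
        else:
            # Hypothetical reason label, chosen for this sketch only.
            skipped.append({"handle": handle, "reason": "over_budget"})
    return included, skipped, used


candidates = [
    ("chunk:src/auth/middleware.py::authenticate#function", 0.93, 284),
    ("chunk:example/b.py::helper#function", 0.81, 500),  # hypothetical handle
    ("chunk:example/c.py::util#function", 0.40, 300),    # hypothetical handle
]
included, skipped, used = pack_by_budget(candidates, budget=600)
print(included, skipped, used)
```

With a 600-token budget the first and third candidates fit (584 tokens) and the second is recorded as skipped, which is the kind of information a receipt exists to surface.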
archex query "Where is cache invalidation handled?" --format xml
archex scout "How does authentication flow through this repo?" --budget 1000 --format json
archex index --quantize-vectors --quantize-bits 4 --allow-remote-code
archex graph export --output .archex/archgraph.json
archex graph neighbors src/auth/middleware.py --graph .archex/archgraph.json --format markdown
archex symbol 'symbol:src/auth/middleware.py::authenticate#function'
Install the MCP extra and register the stdio server:
uv tool install "archex[mcp]"
{
  "mcpServers": {
    "archex": { "command": "archex", "args": ["mcp"] }
  }
}
Install the client config (global/user scope by default; pass a SOURCE path or --scope project for a repo-local install). Add --dry-run to preview the exact target and config without writing:
archex install-client claude-code # global, writes immediately
archex install-client claude-code --dry-run # preview only, no changes
archex install-client claude-code . --scope project
For warm local sessions, keep the MCP process alive and optionally watch the repo:
archex mcp --watch --watch-path .
archex is a first-class install-client target for Claude Code, Codex, Cursor, OpenCode, Pi, and oh-my-pi (omp → ~/.omp/agent/mcp.json). Registration alone is not enough: harnesses with on-demand tool discovery surface a registered server's tools only after the agent activates them, and agent guidance that names only the CLI never produces MCP calls. Append the ready-to-paste guidance prompt to a global or repo-specific agent file so agents reach for the MCP tools first:
archex install-client omp --agent-file ~/.omp/agent/AGENTS.md
archex metrics then reports a CLI-vs-MCP surface split so you can see whether agents actually route context through archex. The compatibility matrix explains the registration → surfacing → invocation distinction.
The in-repo Claude Code skill lives at skills/archex/. Its /archex command runs archex doctor, initializes/indexes when needed, scouts first for broad questions, then fetches exact symbol: or chunk: handles before a larger bundle query.
Exact install, MCP, Docker, cache, uninstall, and trust semantics are documented in the installation trust contract. Client-specific config targets and bootstrap paths live in the compatibility matrix.
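For readers who maintain an MCP client config by hand, the sketch below shows one way to merge the stdio server entry from the JSON snippet earlier in this section into an existing config without disturbing other servers. archex install-client remains the documented path; the helper name `add_archex_server` and the `other-tool` entry are hypothetical and exist only for illustration.

```python
import json


def add_archex_server(config):
    """Return a copy of config with the archex stdio server registered."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    # Entry shape copied from the mcpServers snippet in this README.
    servers["archex"] = {"command": "archex", "args": ["mcp"]}
    merged["mcpServers"] = servers
    return merged


# Hypothetical pre-existing config with an unrelated server.
existing = {"mcpServers": {"other-tool": {"command": "other-tool", "args": []}}}
updated = add_archex_server(existing)
print(json.dumps(updated, indent=2))
```

The helper copies the top-level mapping and the server map before writing, so the caller's original config object is left untouched.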
Local usage metrics are off by default. If a user explicitly enables them with archex metrics enable, ARCHEX_USAGE_METRICS=on, or the persisted metrics setting, archex writes a machine-local ledger at ~/.archex/usage.sqlite. That ledger records anonymous counters only: tool name, category, token counts, file count, repo-local random ID, freshness, and index revision. It does not store query text, file paths, symbols, handles, rendered outputs, prompt bodies, remote URLs, org names, or repo names in event rows.
archex metrics summary reports two labeled savings numbers:
- Savings versus a full-file paste: tokens_saved = max(full_file_tokens - returned, 0), where full_file_tokens is the true per-file token cost of the returned files, not an inflated chunk sum.
- Savings versus a realistic targeted read: the matched line ranges plus a small context window, which is the conservative counterfactual.
Both baselines are derived from the index, so the metrics path re-reads no file and calls no model. Whole-repo avoided tokens are demoted below the savings lines and labeled an upper-bound/context figure, not savings.
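The full-file savings rule stated above, tokens_saved = max(full_file_tokens - returned, 0), can be written out as a small function. The formula is from this README; the function name and the token counts in the example calls are hypothetical, and the exact calculation rules live in LOCAL_METRICS.

```python
def tokens_saved(full_file_tokens, returned_tokens):
    """Savings versus pasting the full files, clamped so it never goes negative."""
    return max(full_file_tokens - returned_tokens, 0)


# Hypothetical counts: the returned files cost 2000 tokens in full,
# while the bundle returned 284 tokens from them.
typical = tokens_saved(2000, 284)

# If the bundle is larger than the files themselves, savings are reported as 0.
clamped = tokens_saved(300, 500)

print(typical, clamped)
```

The clamp matters for small files: without it, a bundle that adds imports and type context around a tiny file would show up as negative savings.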
Important boundary: archex ships with no telemetry by default. Optional local metrics are separate from telemetry, stay on the machine, and require explicit enablement. Detailed traces remain a second explicit opt-in on top of metrics enablement. The exact calculation rules, privacy boundary, and controls live in LOCAL_METRICS.
archex metrics is the control surface:
archex metrics enable
archex metrics
archex metrics export --output usage.json
archex metrics delete --all
archex metrics trace enable
ARCHEX_USAGE_METRICS=on archex query "Where is auth handled?"
Detailed traces stay opt-in via archex metrics trace enable or ARCHEX_USAGE_TRACE=on. Traces remain local-only and still do not store source code or rendered outputs. Metrics code paths make no LLM calls, no hosted upload calls, and no background network calls in v1.
from archex import query
from archex.models import RepoSource

bundle = query(
    RepoSource(local_path="."),
    "Where is database connection pooling implemented?",
)
print(bundle.to_prompt(format="xml"))
analyze() returns an ArchProfile; compare() returns deterministic cross-repo dimension comparisons. LangChain and LlamaIndex retrievers ship as optional extras.
Two local-first images are built in CI:
Docker and warm-container MCP examples
# BM25-only, no torch
docker run --rm -v "$PWD:/workspace" -w /workspace ghcr.io/mathews-tom/archex:slim archex doctor
# Full local-embedding image with FastEmbed runtime
docker run --rm -v "$PWD:/workspace" -w /workspace ghcr.io/mathews-tom/archex:full archex query "Where is cache invalidation handled?" --strategy hybrid
Warm-container MCP pattern:
docker run -d --name archex-mcp -v "$PWD:/workspace" -w /workspace ghcr.io/mathews-tom/archex:slim sleep infinity
docker exec -i archex-mcp archex mcp
MCP client config for that container:
{
  "mcpServers": {
    "archex": {
      "command": "docker",
      "args": ["exec", "-i", "archex-mcp", "archex", "mcp"]
    }
  }
}
The mounted repository owns .archex/, so indexes survive container restarts and stay out of source control.
| Surface | Contract |
|---|---|
| Security policy | Supported versions, disclosure workflow, no-telemetry posture, secret-handling guidance, and model remote-code policy live in SECURITY. |
| Context receipts | Field contract, freshness/completeness semantics, output surfaces, and benchmark linkage live in CONTEXT_RECEIPTS. |
| Compatibility matrix | Tested vs unverified clients, exact config shapes, bootstrap commands, and verification steps live in CLIENT_COMPATIBILITY_MATRIX. |
| Installation trust contract | Exact CLI, MCP, Docker, skill, cache, network, freshness, benchmark, and uninstall semantics live in INSTALLATION_TRUST_CONTRACT. |
| archex install-client | Client config writer for Claude Code, Codex, Pi, OpenCode, Cursor, and oh-my-pi (omp). Global/user scope by default; --dry-run previews without writing. |
| archex doctor | Text/JSON diagnostics for index health, staleness, local model cache presence, grammar availability by tier, MCP registration, model security, and .archex/ disk usage. |
| Repo-local .archex/ | Generated state: settings, metadata, SQLite index, optional vectors, graph artifacts, dogfood history. Keep it uncommitted. |
| Local usage metrics | Calculation rules, privacy boundaries, default-off versus opt-in behavior, export/delete controls, and retention live in LOCAL_METRICS. |
The public C1 harness publishes the same external-repo comparison for archex, cocoindex-code (ccc), and a raw-ripgrep/read baseline. It records cold-start, warm latency, recall, precision, F1, token efficiency, required-file recall, missed-required-file rate, missed-required-task rate, all-required-present rate, receipt accuracy, and bundle-completion penalty tokens. The checked-in artifacts include those trust fields; receipt accuracy is n/a for the historical C1 run because those artifacts predate query receipt capture. Core retrieval benchmarks make no LLM calls.
See archex vs. cocoindex-code for the current published comparison and Retrieval Default Decisions for the decision trail.
A broader competitive comparison is available with archex benchmark headtohead competitive --input benchmarks/headtohead/results --format markdown. It groups the same lanes by repo/task family as well as in aggregate, so no winner is declared from the aggregate alone, and it adds warm p50/p95 latency, region/line recall where labeled, compression ratio, and an operational table. The checked-in public artifact set now includes the benchmark-only archex candidate lanes (archex_query_compressed, archex_query_efficiency_packed) alongside archex, ccc, raw-ripgrep/read, and two Graphify follow-up lanes: graphify_build_plus_query (aggregate recall 0.70, required-file recall 0.70, cold-start 937 ms, warm p50/p95 165/184 ms) and graphify_query_warm (aggregate recall 0.70, required-file recall 0.70, cold-start 0 ms, warm p50/p95 168/207 ms). Graphify is reported as a graph / memory layer, not as a direct retrieval-equivalent winner, so build cost and warm-query cost stay separate. Headroom-style compression lanes appear when operator artifacts are present. No new public claim is made unless the corresponding checked-in artifacts exist under benchmarks/headtohead/results/.
Bundle-only evaluation is a separate opt-in lane: archex benchmark bundle-eval --evaluator-command ... gives a user-supplied local command only the rendered bundle and receipt JSON, then reports bundle-only success and files the evaluator still needed outside returned context. archex does not provide hosted evaluator calls, telemetry, credentials, or default network behavior for that lane.
Cross-tool token efficiency is measured offline with archex benchmark cross-tool: it compares the tokens archex spends to localize a task's required files against a naive grep/read agent (whole grep-hit files, or +/-K context windows around hits) at a fixed required-file recall, so no figure compares unequal recall. On the checked-in reference artifact (benchmarks/cross-tool-efficiency/cross-tool-comparison.json), restricted to tasks where archex reaches 100% required-file recall, the token reduction versus the naive agent runs from 95.4% to 99.8% per corpus (for example external-localization: 13,247 vs 469,836 tokens, 97.2%). It measures how much cheaper archex localizes when it succeeds, not that it always succeeds. This is a benchmark-only number: it never enters the in-process ledger or archex metrics summary. The per-corpus table and method live in LOCAL_METRICS.
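The percentage quoted for external-localization follows directly from the two token counts given above. A minimal check, assuming the reduction is computed as one minus the ratio of archex tokens to naive-agent tokens (the function name is made up for this example):

```python
def token_reduction_pct(archex_tokens, naive_tokens):
    """Percent fewer tokens than the naive grep/read agent, to one decimal."""
    return round((1 - archex_tokens / naive_tokens) * 100, 1)


# Token counts quoted above for the external-localization corpus.
reduction = token_reduction_pct(13_247, 469_836)
print(reduction)  # 97.2
```

Because the comparison is restricted to tasks where both sides reach the same required-file recall, this ratio compares cost at equal coverage rather than cost alone.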
TurboQuant evidence is measured separately with archex_query_hybrid_quantized_4bit against archex_query_hybrid: 35 tasks, 7.07× mean vector .npz compression, 6.98× minimum compression, recall Δ +0.000, MRR Δ +0.000, F1 Δ +0.000, required-file recall Δ +0.000, and mean query latency Δ +110 ms. That passed the default gate, so 4-bit TurboQuant is now the default storage mode for vector indexes.
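This README does not describe how TurboQuant works internally, so the sketch below is explicitly not TurboQuant. It is a generic uniform 4-bit scalar quantizer in pure Python, included only to show why storing float32 vectors at 4 bits per value approaches, but does not quite reach, an 8× size reduction once per-vector metadata is stored. The function names, the 8-byte header layout, and the 384-dimension example vector are all assumptions.

```python
import math
import struct


def quantize_4bit(vector):
    """Map each float to one of 16 levels and pack two codes per byte."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 15 or 1.0  # fall back to 1.0 for constant vectors
    codes = [min(15, max(0, round((x - lo) / scale))) for x in vector]
    if len(codes) % 2:
        codes.append(0)  # pad to an even count so codes pair up
    packed = bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))
    # 8-byte header: offset and step size as float32 values.
    return struct.pack("<ff", lo, scale) + packed


def dequantize_4bit(blob, length):
    """Recover approximate floats from the packed representation."""
    lo, scale = struct.unpack("<ff", blob[:8])
    codes = []
    for byte in blob[8:]:
        codes.extend((byte >> 4, byte & 0x0F))
    return [lo + c * scale for c in codes[:length]]


vector = [math.sin(i) for i in range(384)]  # hypothetical embedding
blob = quantize_4bit(vector)
restored = dequantize_4bit(blob, len(vector))

ratio = (len(vector) * 4) / len(blob)  # float32 bytes vs packed bytes
max_error = max(abs(a - b) for a, b in zip(vector, restored))
print(round(ratio, 2), max_error)
```

In this toy setup the 384 float32 values (1536 bytes) shrink to 200 bytes, a ratio of 7.68, and each restored value is off by at most about half a quantization step. Whether a real index keeps its recall at that precision is an empirical question, which is what the gate described above measures.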
| Lane | Recall | Required-file recall | Missed task rate | F1 | Token efficiency | Token efficiency after completion | Warm latency ms |
|---|---|---|---|---|---|---|---|
| archex | 0.95 | 0.95 | 0.16 | 0.66 | 0.76 | 0.74 | 408 |
| ccc | 0.32 | 0.32 | 0.79 | 0.31 | 0.48 | 0.41 | 521 |
| raw-ripgrep/read | 1.00 | 1.00 | 0.00 | 0.05 | 0.00 | 0.00 | 773 |
- Coverage stays close to raw search without paying raw-search token cost. raw-ripgrep/read reaches 1.00 required-file recall, but it does so at 0.00 token efficiency. archex lands at 0.95 required-file recall with 0.76 token efficiency, so the returned bundle stays close to exhaustive file coverage without filling the prompt with every textual match.
- Missed-task failures drop sharply versus ccc. archex's missed task rate is 0.16; ccc lands at 0.79. In the published C1 run, that is the difference between usually returning the files an agent needs and often requiring a second pass before the task can finish.
- Vector storage got much smaller without a measured retrieval-quality change. The published 4-bit TurboQuant run reports 7.07× mean vector .npz compression (6.98× minimum) with recall Δ +0.000, MRR Δ +0.000, and F1 Δ +0.000, so local vector indexes take far less disk without a measured quality regression in that benchmark.
# Repo-local lifecycle
archex init
archex index
archex status --strict
archex doctor --format json
# Architecture and graph surfaces
archex analyze --format markdown
archex onboard
archex graph export --output .archex/archgraph.json
archex graph path src/archex/cli/query_cmd.py src/archex/serve/context.py --graph .archex/archgraph.json --format markdown
archex impact --changed-file src/archex/serve/context.py
# Benchmarks and gates
archex benchmark headtohead report --input .archex/headtohead --format markdown
archex benchmark run --strategy archex_query_hybrid_quantized_4bit --output .archex/e2e-quantized --allow-remote-code
archex benchmark report --input .archex/e2e-quantized --baseline .archex/e2e-baseline --format markdown
archex benchmark gate --input .archex/e2e --baseline .archex/e2e-baseline --warn-latency-ms 3000
archex benchmark bundle-eval --tasks-dir benchmarks/tasks --evaluator-command ./local-evaluator
archex dogfood --all --baseline benchmarks/dogfood_baseline.json --format dogfood-delta
uv tool install archex # CLI, system-wide
uv add archex # project dependency
Optional extras and integrations
# Agent integrations
uv tool install "archex[mcp]" # MCP server
uv add "archex[langchain]" # LangChain retriever
uv add "archex[llamaindex]" # LlamaIndex retriever
uv add "archex[lsap]" # LSP type enrichment
# Local retrieval extras
uv add "archex[vector-fast]" # FastEmbed (ONNX-backed, ~50MB)
uv add "archex[vector-torch]" # sentence-transformers / torch
uv add "archex[splade]" # SPLADE sparse retrieval
uv add "archex[graph]" # Leiden graph clustering
# Core extras bundle: graph, MCP, LangChain, LlamaIndex
uv add "archex[all]"
For the full trust contract, including exact MCP JSON, Docker commands, cache locations, network behavior, and uninstall steps, see Installation and Trust Contract.
| Tier | Languages | Extraction |
|---|---|---|
| full | Python, JavaScript, TypeScript/TSX, Go, Rust, Java, Kotlin, C#, Swift | Symbols, imports, graph edges |
| chunk-only | C, C++, PHP, Ruby, Scala, Lua, Bash/Shell, SQL, HTML, CSS, YAML, TOML, JSON, Markdown, Solidity | AST chunking + retrieval; no symbol/import graph claim |
| unknown | any other text file | line-window chunks for BM25 visibility |
Need another language? Register an adapter via Python entry points. See System Design for the extension contract.
- Not a chatbot — it emits context bundles; another agent or LLM does the explaining.
- Not a hosted RAG service — indexing and retrieval run locally unless you explicitly query a remote Git URL.
- Not a vector database — vector search is optional; BM25 and structural signals are first-class.
- Not an LSP replacement — use LSAP/LSP where compiler-backed type resolution matters; archex packages repository-scale context for agents.
- Not a prompt template library — output is structured retrieval evidence, not prompt prose.
git clone https://github.com/Mathews-Tom/archex.git
cd archex
uv sync --all-extras
uv run ruff check && uv run ruff format --check .
uv run pyright
uv run pytest
Authority chain: README → System Design / archex vs. cocoindex-code → Roadmap completion record → Retrieval Default Decisions.
- Why archex — the agent token problem this solves
- System Overview — current product overview and boundaries
- System Design — shipped architecture, graph query, scout, language tiers, and distribution surfaces
- archex vs. cocoindex-code — evidence-backed C1 comparison
- Retrieval Default Decisions — default-strategy and TurboQuant evidence gates
- Context Receipts — receipt field contract and safe-to-act semantics
- Local Metrics — token-savings math, privacy boundary, and default-off versus opt-in behavior
Apache 2.0 — see LICENSE.

