Roadmap: codedb capability gaps & improvements (from the #528 audit) #531

Tracking issue for forward-looking improvements surfaced by the multi-agent capability audit run alongside #528 (CLI hardening) and #530 (audit bug fixes).

This is a **roadmap / epic**, not a defect — so it intentionally departs from the "every issue ships a failing test" rule (that's for bugs). Per-item, test-backed issues will be split out when each is picked up.

## 🥇 Top picks — highest leverage, mostly reusing what already exists

1. **Call-path queries + PageRank ranking** — *impact: high, effort: M*
   `codegraph.buildEdges` already computes the full call graph but **discards the edges, keeping only in-degree**. Retain them to unlock `codedb_callpath A→B` (shortest call chain, the #1 follow-up after finding a bug) and PageRank-over-imports, a much stronger ranking signal than raw in-degree. Near-free since the substrate already exists; pure-Zig, low-RSS. **Suggested first pickup.**

2. **`format=json` output + per-result provenance** — *impact: high, effort: L*
   All tool output is prose; errors are bare string prefixes with no machine-readable code; results carry no confidence, no "which index tier produced this," no "this file was past the 15k-trigram cap (recall incomplete)." Add a JSON output mode + response metadata so agents can threshold on results and tell "not found" from "not indexed."

3. **Prefix / fuzzy / kind-filtered symbol search** — *impact: high, effort: S*
   `codedb_symbol` is exact-name only. The symbol index is already a StringHashMap; add a sorted name list + reuse the existing Smith-Waterman scorer → `symbol parse_*`, `kind=interface`, "all `*Manager`". (Bundle in the fix for the Go symbol-kind parse bug, also listed under Parsing.)

4. **Git-churn / recency fused into ranking** — *impact: high, effort: M*
   `git.zig` is ~31 LOC and ranking is 100% static-structural — zero temporal signal. Shell out to `git log` for churn + map diffs→symbols → `codedb_git`, "what changed in this PR", churn-boosted ranking.

5. **Token-budget-aware context packing** — *impact: high, effort: M*
   "Token-efficient" is a stated core value, yet every truncation is by **lines/bytes, never model tokens** (e.g. `codedb_context` silently cuts symbol bodies at 40 lines). Add `max_tokens` + a value-density knapsack packer: *"give me the best 8k tokens for this task."*

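
For top pick 1, here is a minimal sketch of what retained edges enable. It is written in Python for brevity (codedb itself is Zig) and assumes the edges are available as a hypothetical caller→callees map. `call_path` and `pagerank` are illustrative names, not existing codedb functions. BFS gives the fewest-hop chain behind `codedb_callpath A→B`, and power-iteration PageRank replaces raw in-degree as the ranking signal. Both cost O(V + E) per pass and keep only a couple of per-node maps besides the edge list. The sketch ranks over call edges, but the same routine applies unchanged to an import graph.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical shape of the retained edges: caller -> callees. In codedb this would
# be whatever codegraph.buildEdges computes before collapsing it to in-degree.
CallGraph = Dict[str, List[str]]


def call_path(graph: CallGraph, src: str, dst: str) -> Optional[List[str]]:
    """Shortest call chain src -> ... -> dst (fewest hops), found by BFS."""
    if src == dst:
        return [src]
    parent = {src: src}  # doubles as the visited set
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for callee in graph.get(node, []):
            if callee in parent:
                continue
            parent[callee] = node
            if callee == dst:
                # Walk parent pointers back to src, then reverse into call order.
                path = [dst]
                while path[-1] != src:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(callee)
    return None  # dst is unreachable from src


def pagerank(graph: CallGraph, damping: float = 0.85, iters: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank; rank held by nodes with no out-edges is spread evenly."""
    nodes = set(graph)
    for callees in graph.values():
        nodes.update(callees)
    if not nodes:
        return {}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        nxt = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for caller, callees in graph.items():
            if callees:
                share = damping * rank[caller] / len(callees)
                for callee in callees:
                    nxt[callee] += share
        rank = nxt
    return rank


graph = {"main": ["parse", "run"], "parse": ["lex"], "run": ["eval"], "eval": ["lex"]}
print(call_path(graph, "main", "lex"))  # ['main', 'parse', 'lex']
```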
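
For top pick 2, here is one possible shape for a JSON response envelope, sketched in Python. Every field name, status value, tier name, and error code in it is hypothetical, not an existing codedb format. The parts that matter are the `not_found` versus `not_indexed` split and the per-hit provenance (`tier`, `recall_complete`). Together they let an agent threshold on results instead of parsing prose, as the `confident_hits` helper illustrates.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import List, Optional


class Status(str, Enum):
    OK = "ok"
    NOT_FOUND = "not_found"      # everything relevant was searched; no match
    NOT_INDEXED = "not_indexed"  # the target was never indexed, so absence proves nothing


@dataclass
class Hit:
    path: str
    line: int
    score: float           # ranker confidence, assumed normalized to 0..1
    tier: str              # which index tier produced the hit (tier names are hypothetical)
    recall_complete: bool  # False when the file exceeded the 15k-trigram cap


@dataclass
class Response:
    tool: str
    status: Status
    results: List[Hit] = field(default_factory=list)
    error_code: Optional[str] = None  # machine-readable code instead of a prose prefix

    def to_json(self) -> str:
        payload = asdict(self)  # recursively converts the nested Hit dataclasses
        payload["status"] = self.status.value
        return json.dumps(payload, sort_keys=True)


def confident_hits(raw: str, min_score: float = 0.5) -> List[dict]:
    """Agent-side use: keep only high-score hits whose recall is known to be complete."""
    doc = json.loads(raw)
    return [h for h in doc["results"] if h["score"] >= min_score and h["recall_complete"]]
```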
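
For top pick 3, here is a sketch of prefix and kind-filtered lookup layered on the exact-name map, in Python for brevity. The `Symbol` fields, the kind labels, and the `SymbolIndex` class are assumptions made for illustration. Fuzzy ranking would go through codedb's existing Smith-Waterman scorer, which is not reproduced here. A trailing-wildcard query such as `parse_*` binary-searches the sorted names. A leading wildcard such as `*Manager` falls back to a linear scan. If suffix queries turn out to be common, a second sorted list of reversed names would let them start with a binary search too.

```python
import bisect
import fnmatch
from dataclasses import dataclass
from typing import Dict, List, Optional

WILDCARDS = "*?["


@dataclass(frozen=True)
class Symbol:
    name: str
    kind: str  # e.g. "function", "struct", "interface" (labels are assumptions)
    path: str


class SymbolIndex:
    """Exact-name map (standing in for the StringHashMap) plus a sorted name list."""

    def __init__(self, symbols: List[Symbol]):
        self.by_name: Dict[str, List[Symbol]] = {}
        for sym in symbols:
            self.by_name.setdefault(sym.name, []).append(sym)
        self.sorted_names = sorted(self.by_name)

    def _names_with_prefix(self, prefix: str) -> List[str]:
        # Binary-search the first candidate, then scan while the prefix still matches.
        lo = bisect.bisect_left(self.sorted_names, prefix)
        hi = lo
        while hi < len(self.sorted_names) and self.sorted_names[hi].startswith(prefix):
            hi += 1
        return self.sorted_names[lo:hi]

    def search(self, pattern: str, kind: Optional[str] = None) -> List[Symbol]:
        head = pattern[:-1]
        if not any(c in pattern for c in WILDCARDS):
            names = [pattern] if pattern in self.by_name else []  # plain exact lookup
        elif pattern.endswith("*") and not any(c in head for c in WILDCARDS):
            names = self._names_with_prefix(head)  # "parse_*": O(log n + k)
        else:
            # "*Manager" and similar: a leading wildcard defeats sorted order, so scan.
            names = [n for n in self.sorted_names if fnmatch.fnmatchcase(n, pattern)]
        return [s for n in names for s in self.by_name[n] if kind is None or s.kind == kind]
```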
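
For top pick 4, here is a sketch of the `git log` half in Python. It parses per-file line churn and folds it into existing static scores. The `-C`, `--numstat`, `--format=`, and `--since` options are standard `git` / `git log` flags. The 90-day window, the `alpha` weight, and the log-damped boost formula are assumptions. Mapping diff hunks onto symbols is not shown. The log damping keeps one heavily rewritten file from swamping the structural signal, which a linear boost would allow after a single large refactor.

```python
import math
import subprocess
from collections import Counter
from typing import Dict


def read_numstat(repo: str, since: str = "90 days ago") -> str:
    """Raw `git log --numstat` output; `--format=` suppresses the commit headers."""
    return subprocess.run(
        ["git", "-C", repo, "log", "--numstat", "--format=", f"--since={since}"],
        capture_output=True, text=True, check=True,
    ).stdout


def churn_from_numstat(log_text: str) -> Counter:
    """Sum added + deleted lines per path. Renames ("old => new") are not resolved."""
    churn: Counter = Counter()
    for line in log_text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # blank separator lines between commits
        added, deleted, path = parts
        if added == "-" or deleted == "-":
            continue  # binary files report "-" instead of line counts
        churn[path] += int(added) + int(deleted)
    return churn


def churn_boosted(static_scores: Dict[str, float], churn: Counter,
                  alpha: float = 0.1) -> Dict[str, float]:
    """Scale each static score by 1 + alpha * ln(1 + churn); untouched files keep their score."""
    return {p: s * (1.0 + alpha * math.log1p(churn[p])) for p, s in static_scores.items()}
```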
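
For top pick 5, here is a sketch of a token-budget packer in Python. Token counts use a rough 4-characters-per-token estimate. That is an assumption, and a real packer would count with the target model's tokenizer. `Snippet` and `pack` are hypothetical names. The method is the standard greedy value-density heuristic for 0/1 knapsack, plus a best-single-item guard that keeps at least half the optimal value. It is not an exact solver. Unlike the current 40-line cut, a snippet that does not fit is dropped whole, so the caller can report what was omitted instead of returning a silently truncated body.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Snippet:
    ident: str
    text: str
    value: float  # relevance from the ranker (assumed non-negative)


def approx_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. Swap in the target model's
    # tokenizer for real budgets.
    return max(1, (len(text) + 3) // 4)


def pack(snippets: List[Snippet], max_tokens: int) -> Tuple[List[Snippet], int]:
    """Greedy 0/1 knapsack by value density, guarded by the best single snippet.

    Returning the better of the two guarantees at least half the optimal total value.
    Snippets are kept or dropped whole, never cut mid-body.
    """
    costed = [(s, approx_tokens(s.text)) for s in snippets]
    fits = [(s, c) for s, c in costed if c <= max_tokens]  # oversize items can never be used
    fits.sort(key=lambda sc: sc[0].value / sc[1], reverse=True)

    chosen: List[Snippet] = []
    used = 0
    for snip, cost in fits:
        if used + cost <= max_tokens:
            chosen.append(snip)
            used += cost

    if fits:
        best_snip, best_cost = max(fits, key=lambda sc: sc[0].value)
        if best_snip.value > sum(s.value for s in chosen):
            return [best_snip], best_cost
    return chosen, used
```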
## 🥈 Bigger strategic bets (audit blind-spots & competitive gaps)

- **Whole-repo orientation / cold-start map** — no "README for a stranger"; `codedb_context` needs keywords you may not have yet. A PageRank-ranked `codedb_overview` (Aider/Cursor repo-map style).
- **Type-resolved find-references** — `codedb_callers` is whole-word text search → false positives for `get`/`init`. A lightweight `receiver_type::method` qualifier index.
- **Build/compiler-diagnostic ingestion** — nothing consumes `zig build`/`tsc`/`cargo check` output to map errors onto symbols. "Show the symbols implicated by current build errors" is a top agent task and is absent.
- **Also unmodeled:** structural/AST-pattern search (ast-grep/Comby-class), semantic/embedding retrieval, package-manifest graph (`package.json`/`Cargo.toml`/`go.mod`), config↔code linkage ("where is this key read"), and session working-memory (cross-call de-dup).

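
For build/compiler-diagnostic ingestion, here is a minimal Python sketch that maps `zig build`-style errors onto symbols by line span. It assumes each symbol carries a hypothetical `(path, start, end)` span. It handles only the `path:line:col: error: message` layout, so tsc, cargo, and Windows drive-letter paths would need their own parsing. Errors that fall outside every known symbol are kept with `None` rather than dropped, so top-level failures still surface.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Matches the "path:line:col: severity: message" shape used by `zig build` and
# GCC/Clang. Other toolchains use other layouts and need their own patterns.
DIAG_RE = re.compile(
    r"^(?P<path>[^:\n]+):(?P<line>\d+):(?P<col>\d+): (?P<sev>error|warning|note): (?P<msg>.*)$"
)


@dataclass(frozen=True)
class SymbolSpan:
    name: str
    path: str
    start: int  # first line of the symbol, 1-based, inclusive
    end: int    # last line of the symbol, inclusive


def implicated_symbols(build_output: str,
                       symbols: List[SymbolSpan]) -> List[Tuple[Optional[str], str]]:
    """Map each error line to the innermost symbol whose span contains it."""
    by_path: Dict[str, List[SymbolSpan]] = {}
    for sym in symbols:
        by_path.setdefault(sym.path, []).append(sym)

    hits: List[Tuple[Optional[str], str]] = []
    for raw in build_output.splitlines():
        m = DIAG_RE.match(raw)
        if m is None or m["sev"] != "error":
            continue  # skip notes, warnings, and source-excerpt lines
        line = int(m["line"])
        enclosing = [s for s in by_path.get(m["path"], []) if s.start <= line <= s.end]
        innermost = min(enclosing, key=lambda s: s.end - s.start, default=None)
        hits.append((innermost.name if innermost else None, m["msg"]))
    return hits
```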
## 🥉 Smaller cleanups (grouped)

- **Parsing:** doc-comment extraction into `Symbol.detail`; Python `async def`/decorated symbols dropped; wrong Go symbol kinds; no Makefile/Dockerfile; no shebang/content detection.
- **Edit/refactor:** symbol-rename, move/delete, atomic multi-file edit groups, synchronous post-edit validation, `dry_run` diff for `str_replace`.
- **Indexing:** OS-native fs events (kqueue/inotify/FSEvents) vs polling; nested per-dir `.gitignore`; snapshot `FORMAT_VERSION` validated/migrated on load; content-based generated/vendored/minified exclusion.
- **Security/observability:** HTTP `serve` has zero auth; per-tool error-rate/p99 metrics; `update` signature verification beyond SHA-256.

---

Generated from a 9-agent capability audit (7 dimension maps → blind-spot/competitive critic → prioritized synthesis).
