Roadmap: codedb capability gaps & improvements (from the #528 audit) #531

@justrach

Description

Tracking issue for forward-looking improvements surfaced by the multi-agent capability audit run alongside #528 (CLI hardening) and #530 (audit bug fixes).

This is a roadmap / epic, not a defect — so it intentionally departs from the "every issue ships a failing test" rule (that's for bugs). Per-item, test-backed issues will be split out when each is picked up.

🥇 Top picks — highest leverage, mostly reusing what already exists

  1. Call-path queries + PageRank ranking (impact: high, effort: M)
    codegraph.buildEdges already computes the full call graph but discards the edges, keeping only in-degree. Retaining them would unlock codedb_callpath A→B (the shortest call chain, which is the number-one follow-up question after finding a bug) and PageRank-over-imports as a much stronger ranking signal. Near-free given the substrate exists; pure-Zig, low-RSS. Suggested first pickup.

  2. format=json output + per-result provenance (impact: high, effort: L)
    All tool output is prose; errors are bare string prefixes with no machine-readable code; results carry no confidence, no "which index tier produced this," no "this file was past the 15k-trigram cap (recall incomplete)." Add a JSON output mode + response metadata so agents can threshold on results and tell "not found" from "not indexed."

  3. Prefix / fuzzy / kind-filtered symbol search (impact: high, effort: S)
    codedb_symbol is exact-name only. The symbol index is already a StringHashMap; adding a sorted name list and reusing the existing Smith-Waterman scorer would enable queries such as symbol parse_*, kind=interface, and "all *Manager". (Also fixes a Go symbol-kind parse bug.)

  4. Git-churn / recency fused into ranking (impact: high, effort: M)
    git.zig is ~31 LOC and ranking is 100% static-structural — zero temporal signal. Shell out to git log for churn + map diffs→symbols → codedb_git, "what changed in this PR", churn-boosted ranking.

  5. Token-budget-aware context packing (impact: high, effort: M)
    "Token-efficient" is a stated core value, yet every truncation is by lines/bytes, never model tokens (e.g. codedb_context silently cuts symbol bodies at 40 lines). Add max_tokens + a value-density knapsack packer: "give me the best 8k tokens for this task."
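A rough illustration of the value-density packer from item 5, written in Python for brevity rather than Zig. Everything here is an assumption for the sake of the sketch: the `Snippet` type, the `value` score, and the 4-characters-per-token estimate are hypothetical stand-ins, not existing codedb APIs. It uses the greedy approximation to the knapsack problem (sort by value per token, take what fits) rather than an exact solver.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    name: str     # hypothetical symbol or file label
    text: str     # body that would be returned to the agent
    value: float  # hypothetical relevance score from the ranker


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: assume roughly 4 characters per token.
    return max(1, (len(text) + 3) // 4)


def pack(snippets, max_tokens):
    """Greedy knapsack approximation: take snippets in order of value per
    token, skipping any that no longer fit in the remaining budget."""
    ranked = sorted(
        snippets,
        key=lambda s: s.value / estimate_tokens(s.text),
        reverse=True,
    )
    chosen, used = [], 0
    for s in ranked:
        cost = estimate_tokens(s.text)
        if used + cost <= max_tokens:
            chosen.append(s)
            used += cost
    return chosen, used
```

Greedy selection is not optimal in general, but it is cheap and keeps the highest-density results first; a real packer could also truncate a snippet to fit the remaining budget instead of skipping it.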

🥈 Bigger strategic bets (audit blind-spots & competitive gaps)

  • Whole-repo orientation / cold-start map — no "README for a stranger"; codedb_context needs keywords you may not have yet. A PageRank-ranked codedb_overview (Aider/Cursor repo-map style).
  • Type-resolved find-references: codedb_callers is a whole-word text search, which yields false positives for common names such as get/init. A lightweight receiver_type::method qualifier index would narrow the matches.
  • Build/compiler-diagnostic ingestion — nothing consumes zig build/tsc/cargo check output to map errors onto symbols. "Show the symbols implicated by current build errors" is a top agent task and is absent.
  • Also unmodeled: structural/AST-pattern search (ast-grep/Comby-class), semantic/embedding retrieval, package-manifest graph (package.json/Cargo.toml/go.mod), config↔code linkage ("where is this key read"), and session working-memory (cross-call de-dup).
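To make the PageRank-ranked overview idea concrete, here is a minimal power-iteration sketch in Python. It assumes the import graph is available as (importer, imported) pairs; the function names, the damping factor of 0.85, and the fixed iteration count are illustrative choices, not anything codedb defines today.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank. `edges` is a list of (importer, imported)
    pairs; returns a dict mapping each node to its score (scores sum to 1)."""
    nodes = sorted({n for edge in edges for n in edge})
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly.
        dangling = sum(rank[v] for v in nodes if not out[v])
        nxt = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            if out[v]:
                share = damping * rank[v] / len(out[v])
                for w in out[v]:
                    nxt[w] += share
        rank = nxt
    return rank


def overview(edges, top=3):
    """Return the `top` highest-ranked nodes, ties broken by name."""
    rank = pagerank(edges)
    return sorted(rank, key=lambda v: (-rank[v], v))[:top]
```

With edges such as `("main.zig", "index.zig")` and `("index.zig", "util.zig")` (hypothetical file names), files that many others import rise to the top, which is the property a cold-start map would rely on.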

🥉 Smaller cleanups (grouped)

  • Parsing: doc-comment extraction into Symbol.detail; Python async def/decorated symbols dropped; wrong Go symbol kinds; no Makefile/Dockerfile; no shebang/content detection.
  • Edit/refactor: symbol-rename, move/delete, atomic multi-file edit groups, synchronous post-edit validation, dry_run diff for str_replace.
  • Indexing: OS-native fs events (kqueue/inotify/FSEvents) vs polling; nested per-dir .gitignore; snapshot FORMAT_VERSION validated/migrated on load; content-based generated/vendored/minified exclusion.
  • Security/observability: HTTP serve has zero auth; per-tool error-rate/p99 metrics; update signature verification beyond SHA-256.
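As one example from the edit/refactor group, a dry_run mode for str_replace could compute the result and a unified diff without touching the file. The sketch below is in Python with the standard `difflib` module; the function name, its parameters, and the exactly-one-match rule are assumptions made for illustration, not codedb's current behavior.

```python
import difflib


def str_replace_dry_run(path, source, old, new):
    """Return (updated_text, unified_diff) without writing anything to disk.

    Assumption: the edit is rejected unless `old` occurs exactly once,
    so an ambiguous replacement never produces a misleading preview.
    """
    if not old:
        raise ValueError("old string must not be empty")
    count = source.count(old)
    if count != 1:
        raise ValueError(f"expected exactly one match, found {count}")
    updated = source.replace(old, new, 1)
    diff = "".join(
        difflib.unified_diff(
            source.splitlines(keepends=True),
            updated.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )
    return updated, diff
```

Returning the diff as text lets an agent inspect the change before committing to it, and the same function could back the real edit by writing `updated` only when dry_run is off.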

Generated from a 9-agent capability audit (7 dimension maps → blind-spot/competitive critic → prioritized synthesis).

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request), status:backlog (Work item has not been started)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
