perf: search result caches, background warmup, symbol-scan gating #613
…earch

Agents re-issue identical searches constantly. Two sibling caches now serve repeats: SearchResultCache (searchContent; a hit dupes the cached results into the caller's allocator, the same ownership contract as a fresh search) and PlainRenderCache (renderPlainSearch; the MCP fast path renders straight to text and never reaches searchContent). Each holds 64 entries / 4 MB, LRU.

An entry is served only when BOTH its generation and env fingerprint still match. Explorer.search_gen bumps (atomically, because searches hold the SHARED lock) on every mutation that can change results: commitParsedFileOwnedOutline, removeFile, rebuildWordIndex, and the one-shot lazy ranking builds (ensureSymbolIndex, call-graph, co-change). The fingerprint hashes the nine ranking kill-switch env vars, so tests that toggle CODEDB_LEX_FREQ_PENALTY et al. mid-process can never be served results computed under the other setting. The generation is read BEFORE a search runs, so a concurrent mutation makes the stored entry stale immediately.

CODEDB_NO_SEARCH_CACHE=1 disables both caches. The repo benchmark sets it for its per-query rows (numbers stay comparable across versions) and adds one explicit "cached" row: error query 20.7us uncached -> 2.0us hit (10x).

8 new tests: hit identity + caller ownership, indexFile/removeFile invalidation, env-fingerprint staleness, kill-switch bypass, LRU bound, and the renderPlainSearch pair. 822/822 total, e2e MCP 20/20.

Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
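The validation rule in this commit (serve an entry only if both generation and env fingerprint match, evict least-recently-used beyond a fixed entry count) can be modelled in a few lines. This is a minimal Python sketch of the idea, not the Zig implementation: `ResultCache`, `env_fingerprint`, and `KILL_SWITCH_VARS` are hypothetical names, only one of the nine env vars is listed because the commit names only that one, and the 4 MB byte cap is not modelled.

```python
import hashlib
import os
from collections import OrderedDict

# Hypothetical list: the commit says there are nine kill-switch vars but
# names only this one.
KILL_SWITCH_VARS = ("CODEDB_LEX_FREQ_PENALTY",)


def env_fingerprint(environ=os.environ):
    """Hash the current values of the ranking kill-switch env vars."""
    h = hashlib.sha256()
    for name in KILL_SWITCH_VARS:
        h.update(name.encode())
        h.update(b"=")
        h.update(environ.get(name, "").encode())
        h.update(b"\0")
    return h.hexdigest()


class ResultCache:
    """LRU cache whose entries are valid only for one (generation, fingerprint)."""

    def __init__(self, max_entries=64):
        self.max_entries = max_entries
        self._entries = OrderedDict()  # key -> (generation, fingerprint, results)

    def get(self, key, generation, fingerprint):
        entry = self._entries.get(key)
        if entry is None:
            return None
        gen, fp, results = entry
        if gen != generation or fp != fingerprint:
            del self._entries[key]  # stale: drop it and report a miss
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return list(results)  # copy, so the caller owns what it gets back

    def put(self, key, generation, fingerprint, results):
        self._entries[key] = (generation, fingerprint, tuple(results))
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict least recently used
```

A caller would read the generation before running the search and pass that same value to `put`, so a mutation that lands mid-search leaves the stored entry stale from the start.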
…breakdown on hits

Bumping search_gen BEFORE taking the exclusive lock let a concurrent search load the new generation, win the shared lock, and cache pre-mutation results under the post-mutation generation, which is a permanent stale hit. Moved the bumps in commitParsedFileOwnedOutline, removeFile, and rebuildWordIndex inside the exclusive lock and documented the ordering invariant on bumpSearchGen.

Cache hits also now restore the producing search's breakdown (tier/candidate/result counts, timings zeroed, cache_hit flag) instead of leaving last_search_breakdown pointing at whatever search ran last; mcp.zig's telemetry and the JSON provenance meta both read it after every search call.
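The race described in this commit is easier to see when the interleaving is replayed step by step. The sketch below is a single-threaded Python model with hypothetical names (`Explorer`, `searcher_steps`, `cached`); locks are represented only by the order in which steps run, so it illustrates the ordering invariant rather than reproducing the Zig locking code.

```python
class Explorer:
    def __init__(self):
        self.search_gen = 0
        self.files = {"a.zig": "old"}
        self.cache = {}  # query -> (generation, results)


def searcher_steps(ex, query):
    """A search split into steps; each yield is a point where a writer may run."""
    gen = ex.search_gen                  # 1. read the generation before searching
    yield
    results = list(ex.files.values())    # 2. search while holding the shared lock
    yield
    ex.cache[query] = (gen, results)     # 3. store under the generation from step 1


def cached(ex, query):
    entry = ex.cache.get(query)
    if entry is not None and entry[0] == ex.search_gen:
        return entry[1]
    return None


def buggy_interleaving():
    ex = Explorer()
    ex.search_gen += 1           # writer: bump BEFORE taking the exclusive lock
    s = searcher_steps(ex, "q")
    next(s)                      # searcher: loads the already-bumped generation
    next(s)                      # searcher: wins the shared lock, sees old data
    ex.files["a.zig"] = "new"    # writer: gets the exclusive lock and mutates
    for _ in s:                  # searcher: caches old results under the new generation
        pass
    return cached(ex, "q")


def fixed_interleaving():
    ex = Explorer()
    s = searcher_steps(ex, "q")
    next(s)                      # searcher: loads generation 0
    next(s)                      # searcher: searches old data
    ex.search_gen += 1           # writer: bump and mutation are one step
    ex.files["a.zig"] = "new"    # under the exclusive lock
    for _ in s:                  # searcher: stores under generation 0, already stale
        pass
    return cached(ex, "q")
```

In this model `buggy_interleaving()` returns the pre-mutation results as a valid hit, while `fixed_interleaving()` returns `None` because the stored generation no longer matches.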
…ries.log

Production query logs (2,467 calls) show two things. First, the latency tail is lazy work charged to an innocent first query: the word-index rebuild after a snapshot fast-load runs 50ms-2s and lands on the first codedb_word/search call. Second, 62% of calls are exact repeats of an earlier (tool, query) pair that the result caches could serve at microseconds, but only within one process lifetime.

The serve/mcp/cli-daemon modes now spawn a background warmup thread that waits for the scan to be ready, then (1) loads-or-rebuilds and persists the word index off the query path, and (2) replays the most repeated recent queries from the project's queries.log WAL through the same entry points real codedb_search calls use (renderPlainSearch with the handler's default max_results, searchContentAuto fallback), so the caches are warm before the first real call and the replayed searches trigger the lazy ranking builds too. CODEDB_NO_WARMUP=1 disables it.

Live MCP measurement on this repo: first-call search latency 21.8ms -> 6.3ms (remaining cost is JSON-RPC round-trip), variance 21-40ms -> 6.2-6.5ms.
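The replay step amounts to counting (tool, query) pairs and re-running the most frequent ones through the normal search entry point. A rough Python sketch follows; the tab-separated `tool<TAB>query` line format, the `limit` default, and the function names are assumptions made for illustration, since the commit does not describe the queries.log layout.

```python
from collections import Counter


def top_repeated_queries(log_lines, limit=16):
    """Return the most repeated (tool, query) pairs, most frequent first."""
    counts = Counter()
    for line in log_lines:
        line = line.rstrip("\n")
        tool, sep, query = line.partition("\t")
        if not sep or not query:
            continue  # skip lines that do not match the assumed format
        counts[(tool, query)] += 1
    # Only pairs seen more than once are worth pre-warming.
    repeated = [pair for pair, n in counts.most_common() if n > 1]
    return repeated[:limit]


def warm(log_lines, search, limit=16):
    """Replay the top repeated queries through the caller's search function."""
    for tool, query in top_repeated_queries(log_lines, limit):
        search(tool, query)
```

Passing the real search function as `search` is what makes the replay populate the same caches a live call would hit.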
Multi-word queries route through searchContentAuto to searchContentRanked, which had no result cache, so repeated conceptual/NL searches (and the warmup replay of logged multi-word queries) always paid the full BM25 + centrality pass.

Uses a SEPARATE SearchResultCache instance with the same generation + env-fingerprint validation: the BM25 ranking returns different results than searchContent for an identical (query, max_results) key, so the two must never share entries (covered by a dedicated non-collision test).
findSymbol/findAllSymbols/renderSymbols all ran full O(files × symbols) outline scans on EVERY call (renderSymbols twice: count + render) to catch symbols the index missed. This was a #310-era safety net that predates the symbol_index_complete flag (#564). All three entry points call ensureSymbolIndex first, and a complete index is maintained by every commit (rebuildSymbolIndexFor) and removal (removeSymbolIndexFor), so when the index is complete the scans were pure overhead: ~6ms per call on a 20k-file corpus, matching the production codedb_find tail (med 4.5ms, p90 17.7ms). Index misses now take 50-100ns.

ensureSymbolIndex also rebuilds from scratch now: entries indexed before markSymbolIndexIncomplete would otherwise be duplicated by the rebuild loop (had_prior=false skips eviction). The bug was previously latent and becomes load-bearing once the index is authoritative.

benchmark: print tier3/4/5 in CODEDB_BENCH_BREAKDOWN rows (they were silently omitted, hiding 6ms of tier-3 time on zero-hit queries).
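The gating logic can be summarised as: trust an index miss when the index is flagged complete, and fall back to the full outline scan only otherwise. The Python sketch below models that, together with the rebuild-from-scratch fix. `SymbolIndex` and its members are hypothetical stand-ins, not the project's Zig types.

```python
class SymbolIndex:
    def __init__(self, outlines):
        self.outlines = outlines  # path -> list of symbol names in that file
        self.index = {}           # symbol name -> list of paths
        self.complete = False
        self.scans = 0            # counts full scans, for illustration only

    def ensure(self):
        """Make the index authoritative, rebuilding from scratch."""
        if self.complete:
            return
        self.index = {}  # clearing first avoids duplicating earlier entries
        for path, symbols in self.outlines.items():
            for name in symbols:
                self.index.setdefault(name, []).append(path)
        self.complete = True

    def _scan(self, name):
        """The O(files x symbols) safety net."""
        self.scans += 1
        return [p for p, syms in self.outlines.items() if name in syms]

    def find(self, name):
        hits = self.index.get(name)
        if hits:
            return list(hits)
        if self.complete:
            return []  # authoritative miss: no scan needed
        return self._scan(name)
```

Without the `self.index = {}` reset in `ensure`, a file indexed before the rebuild would be appended a second time, which is the duplicate-entry hazard the commit describes.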
Low-memory mode trades latency for RSS everywhere else (see compactMcpReadyMemory); do not pre-pay index builds + result caches there.

Measured warmup cost on a 620-file repo: ~70ms one-time background CPU, +4.4MB steady-state RSS vs the post-first-query baseline (caches are hard-capped at 4MB each).
👋 Thanks for the contribution! Quick heads-up: this repo lands changes on the current … Please retarget this PR via Edit → base branch to the active release branch (currently …). (Automated hint: reply here if you need a hand.)
Benchmark Regression Report
Thresholds: 10.00% and 50,000 ns absolute delta
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 956c8d9d70
            if (list.items.len >= spec.max_results) break;
        }
        if (list.items.len >= spec.max_results) break;
Rank symbol matches before applying max_results
When a broad symbol query matches more than max_results entries (for example fuzzy=true on a typo with many candidate names, or kind=function in a large repo), this loop stops while iterating the symbol_index hash map and only sorts that arbitrary prefix afterward. That means better-scoring fuzzy matches, or alphabetically earlier matches that happen to be later in the map/outlines, are never considered and the new symbol search can return non-top results. Collect all matches (or maintain a top-k heap) before truncating.
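The difference between truncating first and ranking first can be shown on a tiny input. This is an illustrative Python sketch of the reviewer's suggestion, not the project's Zig code; the function names, the candidate list, and the scores are made up.

```python
import heapq


def truncated_prefix(candidates, score, max_results):
    """The reviewed pattern: stop after max_results matches, then sort that prefix."""
    prefix = []
    for name in candidates:
        prefix.append(name)
        if len(prefix) >= max_results:
            break
    return sorted(prefix, key=lambda n: (-score(n), n))


def top_matches(candidates, score, max_results):
    """Consider every match, keeping only the best max_results (top-k selection)."""
    # Highest score first; ties broken alphabetically.
    return heapq.nsmallest(max_results, candidates, key=lambda n: (-score(n), n))


# Iteration order stands in for an arbitrary hash-map order.
candidates = ["zeta", "parse", "parser", "alpha"]
scores = {"zeta": 0.1, "parse": 0.5, "parser": 0.9, "alpha": 0.9}

print(truncated_prefix(candidates, scores.get, 2))  # ['parse', 'zeta']
print(top_matches(candidates, scores.get, 2))       # ['alpha', 'parser']
```

`heapq.nsmallest` keeps memory bounded by `max_results`, which matches the "top-k heap" option; collecting all matches and sorting gives the same result at higher memory cost.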
Summary
Five commits attacking the production latency tail (search p90 30ms, codedb_find p90 17.7ms) from the 2,467-call production query log:
- Search result caches: searchContent/renderPlainSearch (7c60f7d) and the BM25 searchContentRanked path (2e1c148). 64 entries / 4MB each; entries validated against both Explorer.search_gen and a fingerprint of the nine ranking kill-switch env vars. CODEDB_NO_SEARCH_CACHE=1 disables. Measured: error query 20.7us -> 2.0us on hit.
- Background warmup: replays queries.log through the real search entry points. 62% of production calls are exact repeats of an earlier (tool, query) pair. First-call MCP search: 21.8ms -> 6.3ms. CODEDB_NO_WARMUP=1 disables; skipped under CODEDB_LOW_MEMORY (956c8d9).
- Symbol-scan gating: findSymbol/findAllSymbols/renderSymbols ran full O(files x symbols) outline safety scans on every call, a #310-era net that predates symbol_index_complete (#564). Gated on the flag: ~6ms/call -> 50-100ns for index misses on a 20k-file corpus. ensureSymbolIndex now rebuilds from scratch to avoid duplicate entries.

Stacked follow-up: the skip_trigram_files reconciliation fix (separate PR, based on this branch).
Test plan
- zig build test: 835/835
- python3 scripts/e2e_mcp_test.py: 20/20
- CODEDB_NO_SEARCH_CACHE=1 for comparable per-query rows + explicit cached row

Generated with Devin