Releases: justrach/codedb
codedb 0.2.5825
A retrieval-quality, capability, and speed cut. 0.2.5825 closes out a long audit cycle (133 commits since 0.2.5824) and ships a sustained latency pass driven by 2,467 real production query-log calls — the search hot path is ~4–8× faster, repeat searches return in microseconds, and the single biggest production-tail bug (whole-repo tier-3 scans after a snapshot restore) is gone: negative searches drop 9.2 ms → 0.5–0.9 ms.
⚡ How much faster?
Every number below is a real measurement from this cycle (commit messages carry the full methodology).
Search latency
| Path | Before | After | Change |
|---|---|---|---|
| `searchContent` hot path (#611) | 65–107 µs/query | 7.4–28.7 µs/query | ~4–8× |
| Repeat search (result LRU hit, #613) | 20.7 µs | 2.0 µs | ~10× |
| First MCP search call after startup (warmup, #613) | 21.8 ms (21–40 ms variance) | 6.3 ms (6.2–6.5 ms) | ~3.5×, stable |
| Fall-through / negative search after snapshot restore (#615) | 9.2 ms (whole-repo scan, `recall_complete=false`) | 0.5–0.9 ms (`recall_complete=true`) | ~10–18× |
| Symbol lookup with a complete index (#613) | ~6 ms/call | 50–100 ns | ~60,000× |
| Zero-hit queries, 20k-file corpus, `CODEDB_TRIGRAM_CAP` uncapped (#615) | 7.1 ms | 1.4 ms | ~5× (opt-in: +110 MB peak RSS, +300 ms index time) |
Per-query micro-benchmarks (codedb repo, c_allocator, min-of-N, uncached — the benchmark pins CODEDB_NO_SEARCH_CACHE=1 so rows stay comparable across versions):
| Query | 0.2.5824 | 0.2.5825 | Speedup |
|---|---|---|---|
| `middleware` | 88 µs | 10.2 µs | 8.6× |
| `database` | 65 µs | 7.4 µs | 8.8× |
| `error` | 107 µs | 19.6 µs | 5.5× |
| `authentication` | 50 µs (mid-cycle) | 28.7 µs | 1.7× |
| `error` (cache hit) | 20.7 µs | 2.0 µs | 10× |
How: line-offset cache instead of per-query line rescans, doc_id-grouped postings with a contiguous-run fast path (per-hit work drops to a doc_id compare), packed-u64-key sorts (no string compares or 40-byte struct moves inside the sort), rare-byte SIMD scan anchors (stop verifying authentication at every a), direct-address doc slots, symbol-length bitmasks that skip whole files, init-time path classification (was ~10 path tokenizations per path per rerank), memoized per-path rerank facts, and one outline fetch per candidate.
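To make the first item concrete, here is a minimal sketch of a line-offset cache: line starts are computed once per file, and each hit's line number is then a binary search instead of a rescan. The function names are hypothetical and this is an illustration in Python, not codedb's Zig implementation.

```python
from bisect import bisect_right

def build_line_offsets(text: str) -> list[int]:
    """Return the character offset at which each line starts (line 1 starts at 0)."""
    offsets = [0]
    for i, ch in enumerate(text):
        if ch == "\n":
            offsets.append(i + 1)
    return offsets

def line_of(offsets: list[int], pos: int) -> int:
    """Return the 1-based line number that contains character offset `pos`."""
    return bisect_right(offsets, pos)

text = "alpha\nbeta\ngamma\n"
offsets = build_line_offsets(text)          # built once, reused for every query
hit_line = line_of(offsets, text.index("gamma"))
```

The offsets list is paid for once per file version; every later match costs O(log lines) to place.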
Memory & load path
| Path | Before | After |
|---|---|---|
| Snapshot fast-load, openclaw 13,654 files (#564) | 60 ms | 40 ms (−33%) |
| Pass C heap during load (#564) | +62.5 MB | +20.5 MB (−67%) |
| One-shot search physical footprint (#564) | 132.7 MB | 89.2 MB (−33%) |
| Max RSS, one-shot search (#564) | 244 MB | 200 MB |
| `codedb <dir> status` (#553) | full index materialized (a multi-GB resident process that never exited) | metadata-only (reported by @lekt9 🙏) |
| Background warmup steady-state cost (#613) | — | ~70 ms one-time background CPU, +4.4 MB RSS (caches hard-capped at 4 MB each) |
The production numbers that drove it
A 2,467-call production query log showed: search p90 30 ms with occasional 2-second outliers, codedb_find median 4.5 ms / p90 17.7 ms, and 62% of calls being exact repeats of an earlier (tool, query) pair. All three tails are addressed: the p90/outliers traced to the #615 scan-set bug plus the 50 ms–2 s word-index rebuild that used to land on an innocent first query (now pre-paid by the warmup thread), the codedb_find tail was the O(files × symbols) safety scan (now gated), and the repeats now hit microsecond caches.
🔥 The big one: tier-3 scan-set reconciliation (#615)
Snapshot restore parks every file in skip_trigram_files (it can't know what the disk trigram index covers), and two compounding failures meant the set never emptied on the standard serve/mcp/cli-daemon startup path:
- Nothing pruned the set when the disk trigram index was later mmap-loaded.
- The snapshot freshness pass reindexes changed files into the heap trigram before the disk-load gate runs — and that gate early-returned on any heap entry. One dirty file blocked the disk trigram load for the whole repo.
Net effect: tier 3 content-scanned the entire project on every fall-through query, with recall_complete=false. Measured live: 613/616 files in the scan set. After the fix: 0.
All trigram replacement now funnels through adoptTrigramIndex / adoptTrigramBase (swap, bump the search generation, prune the skip set), and the mmap load keeps freshness-reindexed files as a masking overlay so their newer content wins over stale base entries.
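The single-funnel idea can be sketched as follows. This is a simplified model under stated assumptions: `SearchState` and its fields are hypothetical, and only the swap / bump / prune sequence is taken from the notes above; the masking overlay is not modeled.

```python
from dataclasses import dataclass, field

@dataclass
class SearchState:
    # Files the current trigram index covers (hypothetical representation).
    trigram_files: set = field(default_factory=set)
    # Files tier 3 must content-scan because no trigram data is known for them.
    skip_trigram_files: set = field(default_factory=set)
    # Bumped on every index change so cached results can be invalidated.
    generation: int = 0

def adopt_trigram_index(state: SearchState, covered_files) -> None:
    """Single entry point for replacing the trigram index."""
    state.trigram_files = set(covered_files)          # swap
    state.generation += 1                             # bump the search generation
    state.skip_trigram_files -= state.trigram_files   # prune the skip set

# After a snapshot restore every file is parked in the skip set.
state = SearchState(skip_trigram_files={"a.zig", "b.zig", "c.zig"})
adopt_trigram_index(state, ["a.zig", "b.zig"])
```

Because every replacement goes through one function, no code path can swap the index and forget to prune the skip set.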
⚡ Result caches + background warmup (#613)
- Whole-query result LRUs for `searchContent`, `renderPlainSearch` (MCP fast path), and the BM25 ranked path: 64 entries / 4 MB each, validated against both the search generation and a fingerprint of the nine ranking kill-switch env vars. `CODEDB_NO_SEARCH_CACHE=1` disables.
- Background warmup: serve/mcp/cli-daemon build + persist the word index off the query path and replay the most-repeated queries from `queries.log`. 62% of production calls are exact repeats of an earlier `(tool, query)` pair, so the caches are warm before your first real call. `CODEDB_NO_WARMUP=1` disables; skipped under `CODEDB_LOW_MEMORY`.
- Race fix: generation bumps moved inside the exclusive lock, so a concurrent search can no longer cache pre-mutation results under the post-mutation generation.
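A minimal sketch of a generation-validated result LRU, assuming the cache key is the query string and the env fingerprint is an opaque value computed elsewhere. The class name is hypothetical, and the 4 MB byte budget is deliberately left out to keep the example short.

```python
from collections import OrderedDict

class ResultCache:
    """LRU of whole-query results, valid only for one (generation, env fingerprint)."""

    def __init__(self, max_entries: int = 64):
        self.max_entries = max_entries
        self._entries = OrderedDict()  # query -> (generation, fingerprint, result)

    def get(self, query, generation, env_fingerprint):
        entry = self._entries.get(query)
        if entry is None:
            return None
        cached_gen, cached_fp, result = entry
        if cached_gen != generation or cached_fp != env_fingerprint:
            # Index or ranking switches changed since this was cached: drop it.
            del self._entries[query]
            return None
        self._entries.move_to_end(query)  # mark as most recently used
        return result

    def put(self, query, generation, env_fingerprint, result):
        self._entries[query] = (generation, env_fingerprint, result)
        self._entries.move_to_end(query)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict least recently used

cache = ResultCache(max_entries=2)
cache.put("error", 1, "fp", ["src/mcp.zig:12"])
fresh = cache.get("error", 1, "fp")
stale = cache.get("error", 2, "fp")  # generation moved on
```

Validating on read means a generation bump invalidates every entry at once without walking the cache.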
🧠 Ranking: query-specific graph signals (#550, #546, #554)
- Call-graph distance (#608): files near the matched symbols in the resolved call graph get a query-specific boost (`CODEDB_NO_GRAPH_DISTANCE` opts out).
- Git co-change (#609): a bounded history pass (500 commits, ≤32-file commits, top-8 partners) boosts files that historically change together (`CODEDB_NO_COCHANGE` opts out).
- Negative lexical file-frequency penalty (#554): mention-everywhere terms stop dragging hub files up.
- Multi-word CLI search is ranked end-to-end (#546), including the first cold run; tooling paths (bench/scripts/website/install) rank below `src` implementation (#557), basename test files get the test penalty (#580), and mention-dense tooling files can't saturate past the path prior (#598).
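The co-change bounds above can be illustrated with a small sketch. It assumes the history is already available as a newest-first list of per-commit file lists; the function name and that input shape are hypothetical, and how the partner list feeds into the ranking score is not modeled.

```python
from collections import Counter, defaultdict
from itertools import combinations

def cochange_partners(commits, max_commits=500, max_files=32, top_k=8):
    """Map each file to the files it most often changed together with."""
    counts = defaultdict(Counter)
    for files in commits[:max_commits]:        # bounded history pass
        unique = sorted(set(files))
        if len(unique) > max_files:            # skip sweeping commits (renames, reformatting)
            continue
        for a, b in combinations(unique, 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return {path: [p for p, _ in c.most_common(top_k)] for path, c in counts.items()}

history = [
    ["src/mcp.zig", "src/explore.zig"],
    ["src/mcp.zig", "src/explore.zig"],
    ["src/mcp.zig", "src/index.zig"],
]
partners = cochange_partners(history)
```

Capping commit size keeps one mass edit from making every file look related to every other file.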
🆕 Features
- `codedb_callpath`: shortest resolved call chain between two symbols, each hop as `path:name@line` (#531).
- PageRank graph centrality in ranked search (replaces in-degree; `CODEDB_IN_DEGREE_CENTRALITY` reverts) (#531).
- `codedb_context max_tokens`: value-ordered section packing under a token budget, byte-identical output without the arg (#610).
- Richer `codedb_symbol`: kind / prefix / glob / fuzzy filters, optional source body per hit.
- `format=json` + `paths_only` + `path_glob` on search: structured output with provenance meta, ~50% fewer tokens for broad surveys.
- `codedb_changes` in the CLI (#578), `CODEDB_TRIGRAM_CAP` for big-corpus operators (#615), `CODEDB_ALLOW_TEMP` for CI harnesses on temp checkouts (#538).
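As an illustration of what a shortest call chain means, here is a breadth-first search over a caller-to-callees map. This is a sketch only: the graph shape, the function name, and the example hop labels are hypothetical, and the notes do not say which algorithm `codedb_callpath` actually uses.

```python
from collections import deque

def call_path(graph, src, dst):
    """Return the shortest caller -> callee chain from src to dst, or None."""
    if src == dst:
        return [src]
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for callee in graph.get(node, ()):
            if callee in parent:
                continue                      # already reached by a path at least as short
            parent[callee] = node
            if callee == dst:
                path = [dst]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(callee)
    return None

# Hop labels follow the path:name@line shape; the values are made up.
graph = {
    "src/main.zig:main@10": ["src/cli.zig:parse@4", "src/cli.zig:run@30"],
    "src/cli.zig:parse@4": ["src/lex.zig:next@7"],
    "src/cli.zig:run@30": ["src/search.zig:query@55"],
    "src/search.zig:query@55": ["src/lex.zig:next@7"],
}
chain = call_path(graph, "src/main.zig:main@10", "src/lex.zig:next@7")
```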
🛡️ Correctness & hardening
- Search recall after a snapshot load (#537, #539): restored files are searchable again; call-graph edges into restored files are back (#537b).
- Store hardening (#597, #603): no unlocked diff writes, data-log compaction, clean failure paths.
- mmap overlay (#593, #600): overlay edits mask stale base entries; `writeToDisk` persists merged state.
- Word index (#583, #585, #606): stale postings dropped on disk load; doc_id slots reused, which bounds memory in long-lived daemons.
- ContentCache (#584, #596): probe-window reachability + byte budget.
- OOM-safe indexing (#594), per-project flock for cli-daemon spawn (#592), comment/string-aware call-site extraction (#562, #572).
- Secret filtering (#589, #572): `id_ecdsa` / `id_dsa` / `*_sk` FIDO keys, `*.env` variants, `.git-credentials` blocked from indexing and search.
- TS/JS dependency graph (#540–#543, #548): multi-line + re-export imports, relative-path resolution, no bogus deps from strings.
- A dozen CLI/tool UX fixes (#558, #560, #566, #568–#570, #573, #576, #588) — every one landed with a failing test first.
🙏 Contributors
- @nsxdavid — TS/JS dependency-graph fixes (#542, #543)
- @lekt9 — reported the resident-status-process leak (#553), now metadata-only
- @idea404 — PR #535 (local fallback when api.wiki.codes is unreachable), under review for the next cut
Full details in the CHANGELOG.
Install
curl -fsSL https://codedb.codegraff.com/install.sh | sh
or npx -y codedeebee mcp
| Platform | Asset | Signed |
|---|---|---|
| macOS ARM64 (Apple Silicon) | `codedb-darwin-arm64` | ✅ codesigned + notarized |
| macOS x86_64 (Intel) | `codedb-darwin-x86_64` | temporarily unsigned (#504) |
| Linux ARM64 | `codedb-linux-arm64` | n/a |
| Linux x86_64 | `codedb-linux-x86_64` | n/a |
codedb 0.2.5824
A deterministic code-graph layer, a ~3× faster cold path, and a warm CLI — plus a batch of correctness fixes from a great community audit.
⚡ Performance
- Snapshot load ~3× faster: 380 → 125 ms on ~39k files; peak RSS 795 → 457 MB (−338 MB). mmap'd content section, borrowed strings, zero-copy `ContentCache`, parallel freshness check, no re-hashing on load. (#524)
- Cold index: RSS 4.3 GB → 1 GB, wall-time ~6.5× faster, via a worker-local parallel scan. (#519)
- Parallel WordIndex build: cold index ~1.49× faster, plus leaner ranked search. (#520)
- Warm CLI daemon, 13–114× faster per call: `codedb <repo> <query>` auto-spawns/reuses a per-project warm daemon over a Unix socket instead of cold-reindexing. (#525) This answers @ahndohun's ask in #518 to keep the snapshot warm across CLI calls.
- Faster fuzzy `find`: SIMD Smith-Waterman (~1.8×, retrieval-identical) + a ~22× compound-identifier fast path. (#526)
🆕 Features
- Code-graph layer + graph-aware ranking (+15% MRR, 0.819 → 0.944) — a no-LLM resolved call graph, persisted in the snapshot; centrality folded into ranking, zero recall loss. (#523, #524)
- Edge-aware `codedb_context`: now lists callers and callees. (#524)
- ReScript `.res` / `.resi` support: `let` / `type` / `module` / `external` / `open`, decorators stripped. (#533) Requested by @yousafsabir (#532).
- Windsurf + Devin auto-registration: direct JSON writes from the installer. (#521, #522)
- CLI hardening: robust arg parsing/validation, correct exit codes, new `codedb status`, globally-honored `--no-telemetry`. (#529)
🐛 Correctness & fixes
- Non-ASCII identifiers (e.g. Korean) now indexed by `codedb_outline` / `codedb_symbol`. (#524) Thanks @ahndohun (#518)
- `codedb_find` score floor: non-matching queries return "no match" instead of confident bogus hits. (#524) Thanks @ahndohun (#518)
- Python `class` is labeled `class`, not `struct_def`. (#524) Thanks @ahndohun (#518)
- Fixed a snapshot writer u16 name-length overflow that could panic on very long identifiers. (#525)
- Secret-filter drift guard + per-session edit locks from the #528 capability audit, with a runtime lock test. (#530)
🙏 Thanks
- @ahndohun, for a thorough correctness/UX audit (#518): non-ASCII identifiers, the `find` score floor, Python `class` kind, and the warm-CLI-daemon ask.
- @yousafsabir, for the ReScript language request (#532).
- @eramax, for the opencode-subagent report (#516), which prompted verifying subagent MCP access and the CLI fallback path.
Install / update
codedb update
# or
curl -fsSL https://codedb.codegraff.com/install.sh | bash
macOS (codesigned + notarized) and Linux x86_64 / arm64. SHA256 checksums included.
Full details in CHANGELOG.md.
codedb 0.2.5823
0.2.5823 is an MCP compatibility hotfix for direct tools/call requests.
It ships the issue #512 fix and adds a wire-level stdio backtest so future
releases catch this exact client-wrapper failure mode.
MCP direct tool-call compatibility
- #512: direct calls no longer drop inline args when `arguments` is empty. Some clients send canonical MCP `params.name` and `params.arguments`, but a wrapper layer may also emit `arguments: {}` while placing the real fields inline on `params`, for example `{"name":"codedb_outline","arguments":{}, "path":"src/mcp.zig"}`. Direct `tools/call` previously treated the empty `arguments` object as authoritative, dispatched `codedb_outline` with no path, and returned `missing 'path'` / `received keys: []` even though the request contained a path.
- Canonical MCP behavior is preserved. Non-empty `params.arguments` remains authoritative. When `arguments` is empty or absent, direct calls now copy non-administrative inline fields into a clean argument map before dispatch. A legacy `params.args` object is accepted only as a compatibility fallback when canonical args are absent or empty. Malformed non-object `arguments` still returns the protocol error `arguments must be object`.
- Diagnostics now match direct calls. Missing-arg guidance no longer says "sub-op" for direct `tools/call`; it explains the canonical direct shape and separately mentions the bundled inline fallback.
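The precedence rules above can be sketched as a small resolver. This is an illustration, not the server's Zig code: the `ADMIN_KEYS` set and the choice to prefer inline fields over the legacy `args` object are assumptions, since the notes do not spell out either.

```python
# Keys treated as administrative rather than tool arguments (assumed set).
ADMIN_KEYS = {"name", "arguments", "args", "_meta"}

def resolve_arguments(params: dict) -> dict:
    """Pick the argument map for a direct tools/call request."""
    args = params.get("arguments")
    if args is not None and not isinstance(args, dict):
        raise ValueError("arguments must be object")
    if args:
        return dict(args)                      # non-empty canonical args are authoritative
    inline = {k: v for k, v in params.items() if k not in ADMIN_KEYS}
    if inline:
        return inline                          # wrapper placed real fields on params
    legacy = params.get("args")
    if isinstance(legacy, dict):
        return dict(legacy)                    # compatibility fallback only
    return {}

wrapped = {"name": "codedb_outline", "arguments": {}, "path": "src/mcp.zig"}
resolved = resolve_arguments(wrapped)
```

Copying into a fresh map, rather than passing `params` through, keeps `name` and similar protocol fields out of what the tool sees.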
Backtesting
- Added `test "issue-512: direct tools call accepts inline args when arguments is empty"` to exercise the direct call handler.
- Extended `scripts/e2e_mcp_test.py` with Scenario 4, which sends the malformed direct stdio MCP request through the real server process. The fixed binary passes 20/20 E2E checks; the pre-fix binary fails Scenario 4 with the old `missing 'path'` / `received keys: []` response.
- A subagent also validated the change with codedb MCP available. Its MCP snapshot was stale, so it used codedb MCP to inspect what was available and then confirmed the current disk state plus the focused and stdio E2E tests.
Release metadata
- `src/release_info.zig`, `build.zig.zon`, and `npm/package.json` are aligned on `0.2.5823`.
- The release branch `release/0.2.5823` has been merged back into `main`.
Deployment
- GitHub release assets were rebuilt locally from `release/0.2.5823` with Zig `0.16.0` and uploaded over the earlier CI-built assets.
- macOS ARM64 was locally signed and its release archive was accepted by Apple notarization.
- macOS x86_64 remains unsigned by design because the build file documents a Zig 0.16/macOS 26 crash after signing that slice.
- `codedeebee@0.2.5823` is published to npm with the `latest` tag.
Validation
- `zig build test -Dtest-filter=issue-512`
- `zig build test`
- `zig build`
- `python3 scripts/e2e_mcp_test.py --binary zig-out/bin/codedb --project /Users/blackfloofie/codedb-release-0.2.5823`: 20/20 passed
- GitHub PR bench-regression for #513 and #514: success
- Local release asset download verification: all checksums passed, macOS ARM64 and x86_64 both report `codedb 0.2.5823`.
- npm registry install verification: `codedeebee@0.2.5823` installs and runs `codedb 0.2.5823`.
See benchmarks/v0.2.5823-validation.md
for the release validation notes.
codedb 0.2.5822
0.2.5822 is a hot-path performance and release-reliability follow-up to
0.2.5821. It keeps the protocol fixes from 0.2.5821, cuts the cost of
the common MCP tools, removes parser boilerplate, and fixes the remaining
Intel macOS/Rosetta release crash by leaving the x86_64 macOS artifact
unsigned until the Zig/Mach-O signing issue is resolved.
MCP hot-path performance
- Pre-rendered responses for hot tools. `codedb_tree`, `codedb_outline`, `codedb_hot`, `codedb_deps`, `codedb_status`, and related MCP response paths now avoid unnecessary deep clones and intermediate buffers. The corrected benchmark harness now runs cases from the temp corpus root, so edit/read timings measure the intended project instead of the caller's checkout.
- Lower edit latency. `codedb_edit` avoids extra project-root work in the hot path and dropped from `236300 ns` to `44700 ns` p50 in the corrected microbench, an 81.08% reduction.
- No benchmark-critical regressions. Comparing the corrected baseline to this release, every comparable MCP benchmark improved by more than 50%: `codedb_tree` `14530 -> 6270 ns`, `codedb_outline` `62930 -> 12820 ns`, `codedb_search` `33700 -> 8450 ns`, `codedb_deps` `1620 -> 70 ns`, `codedb_bundle` `93040 -> 28380 ns`, and `codedb_snapshot` `60100 -> 27750 ns`.
Parser maintenance
- `src/explore.zig` parser append cleanup. Older language parsers had many repeated "dupe name/detail/import then append" blocks. These now route through shared helpers that preserve the prior symbol/detail behavior while cutting 393 net lines from `src/explore.zig` (83 insertions, 476 deletions). This is intentionally behavior-preserving cleanup after the parser expansion in earlier releases.
Glob matching
- #511: brace alternatives in glob patterns. `codedb_glob` and all MCP `path_glob` filters now support simple shell-style alternatives such as `**/*.{yaml,yml}` and `src/{mcp,explore}.zig`. Malformed braces without a comma continue to match literally, so existing literal-brace paths keep working. This fixes the confusing zero-result behavior agents hit when surveying YAML files with one glob.
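One way to picture the behavior is as pattern expansion before matching. The sketch below expands simple, non-nested alternatives and leaves comma-less braces literal; whether codedb expands patterns up front or handles braces inside its matcher is not stated, so treat `expand_braces` as a hypothetical helper.

```python
def expand_braces(pattern: str) -> list[str]:
    """Expand simple {a,b} alternatives; braces without a comma stay literal."""
    start = pattern.find("{")
    if start == -1:
        return [pattern]
    end = pattern.find("}", start)
    if end == -1:
        return [pattern]                       # unclosed brace: literal
    body = pattern[start + 1:end]
    tails = expand_braces(pattern[end + 1:])   # later groups expand independently
    if "," not in body:
        # Malformed group such as {foo}: keep it as literal text.
        return [pattern[:end + 1] + tail for tail in tails]
    return [pattern[:start] + alt + tail for alt in body.split(",") for tail in tails]

yaml_globs = expand_braces("**/*.{yaml,yml}")
zig_globs = expand_braces("src/{mcp,explore}.zig")
literal = expand_braces("docs/{draft}/notes.md")
```

Each expanded pattern can then go through the existing glob matcher unchanged, which is why a single `**/*.{yaml,yml}` query can cover both extensions.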
macOS Intel / Rosetta
- #504: signed x86_64 macOS binaries still crashed. Local Rosetta testing reproduced the published `v0.2.5821` `codedb-darwin-x86_64` crash: `--help` exited `139` with no output. A fresh `0.2.5822` x86_64 build works when unsigned, but manually applying an ad-hoc signature to that exact binary brings back exit `139`. This matches the issue thread's native-Intel finding: the crash is triggered by codesigning Zig 0.16 x86_64-macos binaries on macOS 26, not by codedb startup logic.
- Release workaround. `build.zig` now makes `-Dcodesign-identity` opt-in and skips codesign for `x86_64-macos` even if the option is provided. The release workflow no longer passes `-Dcodesign-identity` for the Intel macOS matrix entry. Apple Silicon macOS artifacts still sign with hardened runtime when the signing identity is configured.
- Docs updated to match distribution reality. README and MCP docs now state that `codedb-darwin-x86_64` is temporarily unsigned and should be verified by SHA256 checksum. Zig version badges / requirements now say Zig 0.16.
Release metadata
- `src/release_info.zig`, `build.zig.zon`, and `npm/package.json` are aligned on `0.2.5822`, so the native binary and `codedeebee` package metadata agree.
Validation
- `zig build test`
- `zig build test-query -Dtest-filter="issue-511"`
- `zig build test-mcp -Doptimize=ReleaseFast`
- `zig build`
- `python3 scripts/e2e_mcp_test.py --binary zig-out/bin/codedb --project /Users/blackfloofie/codedb`: 17/17 passed
- Rosetta x86_64 release test:
  - published signed `v0.2.5821` asset: `--help` exit `139`
  - patched unsigned `0.2.5822` x86_64 build: `--help` exit `0`, `--version` exit `0`, MCP e2e 17/17 passed
  - manually re-signed patched x86_64 build: `--help` exit `139`
  - patched arm64 macOS build: signed and `--help` exit `0`
- Four-subagent SWE-bench Lite smoke using `codedb 0.2.5822` on non-temp workspaces:
  - `pallets__flask-4992`: target TOML config test passed.
  - `pytest-dev__pytest-5221`: two target fixture-listing tests passed with plugin autoload disabled for the old pytest checkout.
  - `sympy__sympy-12454`: rectangular matrix upper-triangular and Hessenberg target tests passed.
  - `psf__requests-2317`: codedb navigation succeeded, but the old checkout's target pytest collection is blocked on Python 3.14 because stdlib `cgi` was removed; a direct smoke confirmed byte and string methods normalize to `GET`.
See benchmarks/v0.2.5822-validation.md
for the benchmark table and SWE-bench Lite smoke details.
v0.2.5821 — 7-issue triage bundle
Bundle of seven fixes from the 2026-05-28 open-issue triage. PR: #509.
Closes
- #501 npm/npx distribution (codedeebee published)
- #502 mcp loading_snapshot stuck + sub-issues
- #503 `codedb mcp <path>` arg order
- #504 macOS Intel x64 startup segfault (Zig `!void` main runtime wrapper)
- #506 Zed MCP timeout
- #507 search misses content after snapshot rebuild
- #508 codedb_remote HTTP 530 / Cloudflare 1033 actionable errors
See CHANGELOG.md for details.
Install
curl -fsSL https://codedb.codegraff.com/install.sh | bash
Or via npm/npx:
npx -y codedeebee mcp
Verification
635/635 tests pass across all 8 test binaries.
Binaries: codesigned (Developer ID Application: Rachit Pradhan) + notarized by Apple.
v0.2.5820
Version bump — identical to v0.2.5819 except the version string is 0.2.5820 so codedb update correctly sees it as newer than 0.2.58181.
See v0.2.5819 release notes for the full changelog (telemetry fix, installer hooks, Linux cross-compile fix).
v0.2.58181 — hotfix: CWD snapshot pollution
Fixes
- #496: `codedb.snapshot` (and full index shards) were written into the process's current working directory instead of the indexed project root. This caused ~55MB of untracked binary files to appear in `git status` when the MCP server's CWD differed from the indexed root. All `writeSnapshotDual` calls now use absolute paths.
- #451: `scope=true` search now correctly surfaces large skip-trigram files (already fixed via #447 refactor; added verification test).
- #494: Test suite OOM resolved by prior test binary split.
Housekeeping
- Closed 8 stale/won't-do issues (#181–184 Windows support, #302, #196, #453, #454)
- Pruned all stale branches (66+ local, 7 remote)
- 0 open issues remaining
Assets
macOS binaries are codesigned + Apple notarized.
v0.2.5819
Telemetry fix
- Version stamped on every event: previously `version` was only emitted on `session_start` NDJSON lines, leaving `tool_call`, `search_breakdown`, and `codebase_stats` events with `ver=NULL` in the analytics DB. Now every event carries the version field, enabling per-version byte-usage and tier-breakdown queries.
Claude Code hooks (installer)
- `register_hooks`: `curl codedb.codegraff.com/install.sh | bash` now auto-registers two Claude Code hooks:
  - `codedb-block-legacy.sh` (PreToolUse/Bash): redirects `grep`, `cat`, `find`, `sed`, `head`/`tail` to `mcp__codedb__codedb_search/read/edit/find/glob`. Graceful fallback: no-op when codedb is not installed; block message says "use Bash directly" when MCP is not connected.
  - `codedb-warmup.sh` (SessionStart): background `codedb . status` to pre-warm the index on session start.
- Hooks merge cleanly with existing hooks from other tools (e.g. muonry), with no clobbering.
Build fix
- Portable `sigaction` mask init: fixes Linux cross-compile failure where `sigset_t` is `[16]c_ulong` (not `u32` like macOS). Uses `std.mem.zeroes` for platform-independent zero initialization.
Checksums
76bff118 codedb-darwin-arm64
3eeb34c0 codedb-darwin-x86_64
63fc9d4c codedb-linux-x86_64
d56661a0 codedb-linux-arm64
v0.2.5818 — MCP stability + issue-44 + security + correctness fixes
TL;DR
v0.2.5818 merges the perf/search-and-snapshot-optimizations branch: MCP stability fixes (SIGPIPE, broken stdout, stale snapshots), security hardening (.env bypass, BM25 NaN safety), per-tier OTEL-style telemetry, and 8 independent test binaries replacing the monolithic test suite. Also includes the cross-platform sigemptyset() fix for Linux musl builds.
This release sits on top of v0.2.5815–5817, which shipped codedb_context (1 call replaces 3–5), reader.md (auto-prepended codebase maps), and the codedb read CLI.
What's in this release (since v0.2.5817)
| Change | Impact |
|---|---|
| SIGPIPE + broken stdout handling | MCP server no longer crashes when client disconnects mid-response |
| Issue-44: stale snapshot content | Search now sees working tree changes after snapshot invalidation |
| .env-local / .env_production bypass | Sensitive-path filter now blocks all .env variants |
| BM25 NaN safety | Zero-length documents no longer produce NaN scores |
| Per-tier search telemetry | OTEL-style spans for Tier 0–5 search breakdown |
| Search + snapshot I/O optimizations | Benchmarked hot-path improvements |
| 8 independent test binaries | Replaces monolithic tests.zig for faster CI |
| Cross-platform sigset_t fix | sigemptyset() instead of scalar 0 for Linux musl |
Cumulative since v0.2.5815
- codedb_context — task-shaped context composer, 1 MCP call replaces 3–5
- reader.md — hash-stable codebase maps, −57% tool calls on narrow-symbol tasks
- codedb read — CLI subcommand with path-safety guards
- Tier 5 short-circuit — skip full-scan when trigram returned candidates (Suspense regex: 15.6× faster)
- Trigram cap 64KB → 1MB — wider recall for large files
- codedb_status: 9.4× faster with cached `approxIndexSizeBytes`
Eval results (QD Matrix — 8 tasks, 4 backends, 2 corpora)
codedb is Pareto-optimal: highest quality (4.65/5), lowest wall time (25.2s), best tokens-per-quality-point (3,892). Wins 5/5 quality niches in the MAP-Elites grid.
| backend | quality | tokens | wall (s) | status |
|---|---|---|---|---|
| codedb | 4.65 | 18,083 | 25.2 | PARETO-OPTIMAL |
| fts5_trigram | 4.38 | 17,172 | 36.9 | PARETO-OPTIMAL |
| codedb_LEAN | 4.33 | 24,474 | 108.0 | dominated |
| lean-ctx | 4.25 | 21,452 | 67.8 | dominated |
Notarization & verification
| binary | notary submission | status |
|---|---|---|
| `codedb-darwin-arm64` | `d97d5a25-a15f-44e9-a62a-24fa2bb1ed9c` | Accepted |
| `codedb-darwin-x86_64` | `4e7c4943-8939-4ef1-9163-7f49a15f1780` | Accepted |
| `codedb-linux-x86_64` | n/a (statically linked musl) | n/a |
Verify:
shasum -a 256 codedb-darwin-arm64
# expected: 88baed2b7e241dea2b6dd3cd0c2fd37d230346e34f6cd86be069ae8b90a79e12
Full SHA-256 list in checksums.sha256.
Full changelog: v0.2.5817…v0.2.5818
v0.2.5817 — reader.md + perf + security
TL;DR
v0.2.5817 ships reader.md — a hash-stable, agent-authored codebase map that codedb auto-prepends to codedb_context responses. Plus the perf + security bundle from v0.2.5816 (which never got tagged), plus three new codedb_context enhancements (inline symbol bodies, callers section, task-length gate).
Highlights vs the released v0.2.5815:
| | v0.2.5815 | v0.2.5817 |
|---|---|---|
| `Suspense` regex p50 | 2.82 ms | 0.18 ms (15.6× faster) |
| `useState` regex p99 | 16.57 ms | 2.04 ms (8.1× faster) |
| `codedb read` CLI | absent | present (with security guards) |
| Sensitive-file blocking | n/a | blocked (.env, id_rsa, .ssh/*) |
| `.codedb/reader.md` support | n/a | present (auto-prepend, hash-verified) |
| codedb_context inline bodies | no | yes (≤6 lines for ≤3 symbols) |
| codedb_context callers section | no | yes (top 6 non-test execution sites) |
End-to-end agent eval (Sonnet 4.6, n=3 per task) shows the v0.2.5817 binary cuts median tool calls on every task: T1 flask 5→4, T2 regex 13→7, T3 react 13→10.
New — reader.md auto-prepend
A .codedb/reader.md file (≤200 LOC of markdown, with a blake2b source_hash over up-to-20 listed source files) gets auto-prepended to every codedb_context response. When source files drift, codedb emits a "regenerate" hint; when missing, it's silent.
Lifecycle:
agent calls codedb_context
↓
codedb loads .codedb/reader.md
↓
blake2b(sorted source_files) == declared_hash?
├─ yes → prepend body with `<!-- reader.md (hash-verified): -->` markers
├─ no → prepend "stale, regenerate" hint
└─ missing/malformed → silent
↓
(existing composer output follows)
The agent regenerates reader.md (≤200 LOC budget, picks ≤10 source_files, computes blake2b) when it sees the stale signal. Codedb itself never writes the file.
Security guards (all close P1 review findings):
- `source_files` rejects absolute paths and `..` traversal: no reading `/etc/passwd` via a hostile reader.md
- `source_files` capped at 20 entries: no 600-entry × 8 MB DoS on every context call
- `loc_actual` capped at 240: no 60 KB body bloat
- Golden blake2b roundtrip test locks the algorithm against std-library drift
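The hash check and the first two guards can be sketched together. The byte layout fed into blake2b here (sorted relative path, a NUL separator, then file contents) is an assumption made for illustration; the real `source_hash` recipe is not documented in these notes, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path, PurePosixPath

MAX_SOURCE_FILES = 20  # entry cap stated in the release notes

def is_safe_relative(path: str) -> bool:
    """Reject absolute paths and any '..' traversal component."""
    p = PurePosixPath(path)
    return not p.is_absolute() and ".." not in p.parts

def source_hash(root, source_files) -> str:
    """blake2b over the listed files in sorted order (assumed layout)."""
    if len(source_files) > MAX_SOURCE_FILES:
        raise ValueError("too many source_files")
    h = hashlib.blake2b()
    for rel in sorted(source_files):
        if not is_safe_relative(rel):
            raise ValueError(f"unsafe path: {rel}")
        h.update(rel.encode("utf-8"))
        h.update(b"\0")
        h.update((Path(root) / rel).read_bytes())
    return h.hexdigest()

def is_fresh(root, source_files, declared_hash: str) -> bool:
    """True when reader.md's declared hash still matches the sources on disk."""
    return source_hash(root, source_files) == declared_hash
```

Sorting before hashing makes the result independent of the order in which reader.md lists its files, so reordering the list alone never triggers a regenerate hint.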
The runtime overhead when reader.md is missing is ~0.1 ms (one stat + early return). When present and valid, recomputing the hash on every call adds another ~0.1 ms on small source_files.
Task-length gate: reader.md prepend is skipped for tasks ≤80 chars (narrow lookups where the composer's keyword extractor already pinpoints the answer). This avoids the ~5 KB body overhead on tasks that don't need orientation.
New — codedb_context symbol-body inline
When ## Symbol definitions has ≤3 entries, inline the first ~6 lines of each so the agent doesn't need a follow-up codedb_read:
## Symbol definitions
- before_request (function) — src/flask/sansio/scaffold.py:460
460 | def before_request(self, f: T_before_request) -> T_before_request:
461 | """Register a function to run before each request.
462 |
463 | For example, this can be used to open a database connection, or
464 | to load the logged in user from the session.
465 |
New — codedb_context callers section
For each ≤3 symbol_definition, surface up to 2 non-definition, non-test, non-import call sites with their enclosing scope:
## Callers (top non-test, non-import usages of these symbols)
- src/flask/app.py:1369: ... :attr:`before_request_funcs`
[in preprocess_request (function, L1366-L1392)]
That's literally the execution site the agent would have followed up for — pre-resolved in the first response.
Bundled from v0.2.5816 (never tagged)
- PR #484: `codedb read <path>` CLI subcommand (full file, `-L FROM-TO`, `--compact`)
  - P1 security: `isPathSafe` + `watcher.isSensitivePath` guards
  - P2 correctness: opens project root, not cwd
- PR #485: `fix(search): skip Tier 5 full-scan when trigram returned candidates`
  - Suspense regex query: 2.82 ms → 0.18 ms (15.6× faster)
  - useState regex p99: 16.57 ms → 2.04 ms (8.1× faster)
  - No recall regression: the trigram filter is a sound superset
- PR #487: shootout.py codegraph backend (multi-session bench against codegraph 0.7.10)
- PR #486: ACE × codedb integration spec (design only)
- PR #483: v0.2.5815 cross-corpus bench data
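The "sound superset" argument behind PR #485 can be shown with a toy trigram index. The sketch uses literal substrings instead of regexes and hypothetical function names: any file containing the needle must contain every trigram of the needle, so verifying only the candidate files cannot miss a match, and the full scan is only needed when the index yields no candidates.

```python
def trigrams(s: str) -> set:
    """All 3-character substrings of s."""
    return {s[i:i + 3] for i in range(len(s) - 2)}

def build_index(files: dict) -> dict:
    """Map each trigram to the set of files that contain it."""
    index = {}
    for path, text in files.items():
        for tri in trigrams(text):
            index.setdefault(tri, set()).add(path)
    return index

def search(files: dict, index: dict, needle: str):
    """Return (matching files, number of files whose content was verified)."""
    tris = trigrams(needle)
    if tris:
        candidates = set.intersection(*(index.get(t, set()) for t in tris))
        if candidates:
            # Short-circuit: verify candidates only, skip the full scan.
            hits = sorted(p for p in candidates if needle in files[p])
            return hits, len(candidates)
    # No usable trigram candidates: fall back to scanning everything.
    hits = sorted(p for p, text in files.items() if needle in text)
    return hits, len(files)

files = {
    "a.js": "import { Suspense } from 'react'",
    "b.js": "const [n, setN] = useState(0)",
    "c.js": "export default 1",
}
index = build_index(files)
hits, verified = search(files, index, "Suspense")
```

The second return value makes the saving visible: a hit on `Suspense` verifies one file instead of three.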
Measured impact (Sonnet 4.6 sub-agents, n=3 each, vs v0.2.5815)
| Task | main median calls | exp median calls | Δ |
|---|---|---|---|
| T1 flask "find before_request decorator" (28 chars) | 5 | 4 | −1 ✓ |
| T2 regex "where is pattern compiled" (235 chars) | 13 | 7 | −6 ✓ |
| T3 react "passive effects flush" (230 chars) | 13 | 10 | −3 ✓ |
9/9 runs across the matrix returned correct answers.
Notarization & verification
All three binaries built locally on Apple Silicon. macOS binaries signed with Developer ID Application: Rachit Pradhan (WWP9DLJ27P) + hardened runtime + secure timestamp, notarized via Apple notary service.
| binary | notary submission | gatekeeper |
|---|---|---|
| `codedb-darwin-arm64` | `576628b8-4f16-4a09-9e7b-917f51664033` (Accepted) | accepted, source=Notarized Developer ID |
| `codedb-darwin-x86_64` | `5f763d62-01c2-4245-9e6c-cc37cceec996` (Accepted) | accepted, source=Notarized Developer ID |
| `codedb-linux-x86_64` | n/a (statically linked, ~13 MB) | smoke-tested via emulated docker linux/amd64: `codedb --version` + tree command both green |
Verify the macOS download:
shasum -a 256 codedb-darwin-arm64
# expected: dea15a25a088f3b05d620e7a119377d09703c4e73512e35479819542c6c763c6
spctl -a -vv -t install codedb-darwin-arm64
# expected: accepted, source=Notarized Developer ID
Full SHA-256 list in checksums.sha256.
What's deferred (not blockers)
Critical-review pass from a Sonnet 4.6 sub-agent identified 11 issues. The 2 P1 (security) and 2 P2 (correctness) issues are closed in this release. P2/P3 follow-ups for the next cycle:
- I04 `schema_version` parsed but not validated (cosmetic; only matters at format v2)
- I05 reader.md not cached across calls (~0.1 ms per call; matters at scale)
- I06 `codedb_status` doesn't surface reader.md state (small ergonomic gap)
- I09 stale hint doesn't include the previous source_files list
- I10 concurrent-write last-write-wins not documented
- I11 cost-benefit gate for shallow workloads (partial; the task-length gate handles the codedb_context side)
Full changelog: v0.2.5815…v0.2.5817