Releases: justrach/codedb

codedb 0.2.5825

12 Jun 15:48

A retrieval-quality, capability, and speed cut. 0.2.5825 closes out a long audit cycle (133 commits since 0.2.5824) and ships a sustained latency pass driven by 2,467 real production query-log calls — the search hot path is ~4–8× faster, repeat searches return in microseconds, and the single biggest production-tail bug (whole-repo tier-3 scans after a snapshot restore) is gone: negative searches drop 9.2 ms → 0.5–0.9 ms.

⚡ How much faster?

Every number below is a real measurement from this cycle (commit messages carry the full methodology).

Search latency

| Path | Before | After | Change |
|---|---|---|---|
| searchContent hot path (#611) | 65–107 µs/query | 7.4–28.7 µs/query | ~4–8× |
| Repeat search (result LRU hit, #613) | 20.7 µs | 2.0 µs | ~10× |
| First MCP search call after startup (warmup, #613) | 21.8 ms (21–40 ms variance) | 6.3 ms (6.2–6.5 ms) | ~3.5×, stable |
| Fall-through / negative search after snapshot restore (#615) | 9.2 ms (whole-repo scan, recall_complete=false) | 0.5–0.9 ms (recall_complete=true) | ~10–18× |
| Symbol lookup with a complete index (#613) | ~6 ms/call | 50–100 ns | ~60,000× |
| Zero-hit queries, 20k-file corpus, CODEDB_TRIGRAM_CAP uncapped (#615) | 7.1 ms | 1.4 ms | ~5× (opt-in: +110 MB peak RSS, +300 ms index time) |

Per-query micro-benchmarks (codedb repo, c_allocator, min-of-N, uncached — the benchmark pins CODEDB_NO_SEARCH_CACHE=1 so rows stay comparable across versions):

| Query | 0.2.5824 | 0.2.5825 | Speedup |
|---|---|---|---|
| middleware | 88 µs | 10.2 µs | 8.6× |
| database | 65 µs | 7.4 µs | 8.8× |
| error | 107 µs | 19.6 µs | 5.5× |
| authentication | 50 µs (mid-cycle) | 28.7 µs | 1.7× |
| error (cache hit) | 20.7 µs | 2.0 µs | 10× |

How: line-offset cache instead of per-query line rescans, doc_id-grouped postings with a contiguous-run fast path (per-hit work drops to a doc_id compare), packed-u64-key sorts (no string compares or 40-byte struct moves inside the sort), rare-byte SIMD scan anchors (stop verifying authentication at every a), direct-address doc slots, symbol-length bitmasks that skip whole files, init-time path classification (was ~10 path tokenizations per path per rerank), memoized per-path rerank facts, and one outline fetch per candidate.
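As an illustration of the first item only, here is a minimal Python sketch of a line-offset cache: newline positions are computed once per file, and each hit's byte offset is mapped to a line number by binary search instead of rescanning. The class and method names are hypothetical; codedb's own implementation is in Zig and may be structured differently.

```python
from bisect import bisect_right


class LineOffsetCache:
    """Maps byte offsets to 1-based line numbers using a precomputed table."""

    def __init__(self, content: bytes) -> None:
        # starts[i] is the byte offset where line i + 1 begins.
        self.starts = [0]
        pos = content.find(b"\n")
        while pos != -1:
            self.starts.append(pos + 1)
            pos = content.find(b"\n", pos + 1)

    def line_of(self, offset: int) -> int:
        # Binary search over line starts instead of a per-query rescan.
        return bisect_right(self.starts, offset)


content = b"alpha\nbeta\ngamma\n"
cache = LineOffsetCache(content)
hit = content.find(b"gamma")
print(cache.line_of(hit))  # 3
```

The table is built once per file, so every later hit in the same file costs a logarithmic lookup rather than a scan from the start of the content.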

Memory & load path

| Path | Before | After |
|---|---|---|
| Snapshot fast-load, openclaw 13,654 files (#564) | 60 ms | 40 ms (−33%) |
| Pass C heap during load (#564) | +62.5 MB | +20.5 MB (−67%) |
| One-shot search physical footprint (#564) | 132.7 MB | 89.2 MB (−33%) |
| Max RSS, one-shot search (#564) | 244 MB | 200 MB |
| codedb <dir> status (#553) | full index materialized (a multi-GB resident process that never exited) | metadata-only (reported by @lekt9 🙏) |
| Background warmup steady-state cost (#613) | | ~70 ms one-time background CPU, +4.4 MB RSS (caches hard-capped at 4 MB each) |

The production numbers that drove it

A 2,467-call production query log showed: search p90 30 ms with occasional 2-second outliers, codedb_find median 4.5 ms / p90 17.7 ms, and 62% of calls being exact repeats of an earlier (tool, query) pair. All three tails are addressed: the p90/outliers traced to the #615 scan-set bug plus the 50 ms–2 s word-index rebuild that used to land on an innocent first query (now pre-paid by the warmup thread), the codedb_find tail was the O(files × symbols) safety scan (now gated), and the repeats now hit microsecond caches.
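The repeat-rate figure is straightforward to reproduce from any call log. The sketch below assumes the log has already been parsed into (tool, query) tuples; the on-disk format of queries.log is not described here, so the sample data is hypothetical.

```python
def repeat_fraction(calls):
    """Fraction of calls whose (tool, query) pair already appeared earlier."""
    seen = set()
    repeats = 0
    for tool, query in calls:
        key = (tool, query)
        if key in seen:
            repeats += 1
        else:
            seen.add(key)
    return repeats / len(calls) if calls else 0.0


# Hypothetical sample; real entries would come from the production log.
log = [
    ("codedb_search", "middleware"),
    ("codedb_find", "Store"),
    ("codedb_search", "middleware"),
    ("codedb_search", "middleware"),
    ("codedb_find", "store"),
]
print(repeat_fraction(log))  # 0.4
```

Matching is exact, so "Store" and "store" count as different queries, which mirrors the "exact repeats" wording above.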

🔥 The big one: tier-3 scan-set reconciliation (#615)

Snapshot restore parks every file in skip_trigram_files (it can't know what the disk trigram index covers), and two compounding failures meant the set never emptied on the standard serve/mcp/cli-daemon startup path:

  1. Nothing pruned the set when the disk trigram index was later mmap-loaded.
  2. The snapshot freshness pass reindexes changed files into the heap trigram before the disk-load gate runs — and that gate early-returned on any heap entry. One dirty file blocked the disk trigram load for the whole repo.

Net effect: tier 3 content-scanned the entire project on every fall-through query, with recall_complete=false. Measured live: 613/616 files in the scan set. After the fix: 0.

All trigram replacement now funnels through adoptTrigramIndex / adoptTrigramBase (swap, bump the search generation, prune the skip set), and the mmap load keeps freshness-reindexed files as a masking overlay so their newer content wins over stale base entries.
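A rough Python model of the "single funnel" idea, under the assumption that the index can be represented as a mapping from path to trigram set. SearchState and adopt_trigram_index are illustrative stand-ins for the Zig functions named above, and the masking overlay is not modelled.

```python
class SearchState:
    def __init__(self, all_files):
        self.trigram_index = {}                   # path -> set of trigrams (stand-in)
        self.skip_trigram_files = set(all_files)  # snapshot restore parks every file here
        self.generation = 0                       # bumped on every index replacement

    def adopt_trigram_index(self, new_index):
        """Single entry point for replacing the trigram index."""
        self.trigram_index = new_index                        # 1. swap
        self.generation += 1                                  # 2. bump the search generation
        self.skip_trigram_files.difference_update(new_index)  # 3. prune covered files


state = SearchState(["a.zig", "b.zig", "c.zig"])
state.adopt_trigram_index({"a.zig": {"fn "}, "b.zig": {"pub"}})
print(sorted(state.skip_trigram_files))  # ['c.zig']
print(state.generation)                  # 1
```

Because the swap, the generation bump, and the prune happen in one place, a caller cannot replace the index and forget to shrink the scan set, which is the failure mode described in the two numbered points.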

⚡ Result caches + background warmup (#613)

  • Whole-query result LRUs for searchContent, renderPlainSearch (MCP fast path), and the BM25 ranked path — 64 entries / 4 MB each, validated against both the search generation and a fingerprint of the nine ranking kill-switch env vars. CODEDB_NO_SEARCH_CACHE=1 disables.
  • Background warmup: serve/mcp/cli-daemon build + persist the word index off the query path and replay the most-repeated queries from queries.log — 62% of production calls are exact repeats of an earlier (tool, query) pair, so the caches are warm before your first real call. CODEDB_NO_WARMUP=1 disables; skipped under CODEDB_LOW_MEMORY.
  • Race fix: generation bumps moved inside the exclusive lock — a concurrent search can no longer cache pre-mutation results under the post-mutation generation.
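The validation scheme for the result LRUs can be sketched as follows. This is a simplified Python model, not the shipped code: it keeps the 64-entry cap but omits the 4 MB byte budget, and treating CODEDB_NO_GRAPH_DISTANCE and CODEDB_NO_COCHANGE as part of the fingerprint is an assumption (the nine variables are not listed here).

```python
import os
from collections import OrderedDict


class ResultCache:
    """LRU of query results, valid only for one generation and one env fingerprint."""

    def __init__(self, max_entries=64, env_vars=()):
        self.max_entries = max_entries
        self.env_vars = tuple(env_vars)
        self.entries = OrderedDict()  # query -> (generation, fingerprint, result)

    def _fingerprint(self):
        # Snapshot of the env vars that can change ranking output.
        return tuple(os.environ.get(name) for name in self.env_vars)

    def get(self, query, generation):
        entry = self.entries.get(query)
        if entry is None:
            return None
        cached_gen, cached_fp, result = entry
        if cached_gen != generation or cached_fp != self._fingerprint():
            del self.entries[query]  # stale: index or ranking config changed
            return None
        self.entries.move_to_end(query)  # mark as most recently used
        return result

    def put(self, query, generation, result):
        self.entries[query] = (generation, self._fingerprint(), result)
        self.entries.move_to_end(query)
        while len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict least recently used


cache = ResultCache(env_vars=("CODEDB_NO_GRAPH_DISTANCE", "CODEDB_NO_COCHANGE"))
cache.put("error", 1, ["src/mcp.zig:10"])
print(cache.get("error", 1))  # hit
print(cache.get("error", 2))  # None: the generation moved on
```

Keying validity on the generation is also why the race fix matters: if the bump happened outside the lock, a search could store pre-mutation results tagged with the new generation and they would look valid.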

🧠 Ranking: query-specific graph signals (#550, #546, #554)

  • Call-graph distance (#608) — files near the matched symbols in the resolved call graph get a query-specific boost (CODEDB_NO_GRAPH_DISTANCE opts out).
  • Git co-change (#609) — a bounded history pass (500 commits, ≤32-file commits, top-8 partners) boosts files that historically change together (CODEDB_NO_COCHANGE opts out).
  • Negative lexical file-frequency penalty (#554) — mention-everywhere terms stop dragging hub files up.
  • Multi-word CLI search is ranked end-to-end (#546) — incl. the first cold run; tooling paths (bench/scripts/website/install) rank below src implementation (#557), basename test files get the test penalty (#580), and mention-dense tooling files can't saturate past the path prior (#598).
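To make the co-change bounds concrete, here is a small Python sketch that applies the three limits quoted above (500 commits, commits of at most 32 files, top 8 partners). The input shape (a newest-first list of file lists) and the function name are assumptions; how codedb reads git history and turns partner counts into a ranking boost is not shown.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cochange_partners(commits, max_commits=500, max_files=32, top_k=8):
    """commits: list of file-path lists, newest first."""
    counts = defaultdict(Counter)
    for files in commits[:max_commits]:
        unique = sorted(set(files))
        if len(unique) < 2 or len(unique) > max_files:
            continue  # single-file commits carry no pairs; sweeping commits are noise
        for a, b in combinations(unique, 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return {path: [p for p, _ in c.most_common(top_k)] for path, c in counts.items()}


history = [
    ["src/mcp.zig", "src/explore.zig"],
    ["src/mcp.zig", "src/explore.zig", "README.md"],
    ["src/mcp.zig", "build.zig"],
]
partners = cochange_partners(history)
print(partners["src/mcp.zig"][0])  # src/explore.zig
```

The file-count cap keeps large refactor or formatting commits from pairing every file with every other file.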

🆕 Features

  • codedb_callpath — shortest resolved call chain between two symbols, each hop as path:name@line (#531).
  • PageRank graph centrality in ranked search (replaces in-degree; CODEDB_IN_DEGREE_CENTRALITY reverts) (#531).
  • codedb_context max_tokens — value-ordered section packing under a token budget, byte-identical output without the arg (#610).
  • Richer codedb_symbol — kind / prefix / glob / fuzzy filters, optional source body per hit.
  • format=json + paths_only + path_glob on search — structured output with provenance meta, ~50% fewer tokens for broad surveys.
  • codedb_changes in the CLI (#578), CODEDB_TRIGRAM_CAP for big-corpus operators (#615), CODEDB_ALLOW_TEMP for CI harnesses on temp checkouts (#538).
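For readers unfamiliar with the idea behind codedb_callpath, a shortest call chain over a resolved call graph can be found with a breadth-first search. The sketch below is a generic illustration in Python, not the tool's implementation; the graph, the src/main.zig path, and the line numbers are made up, and only the path:name@line hop format comes from the notes above.

```python
from collections import deque


def callpath(graph, start, goal):
    """graph: symbol -> list of callees. Returns the shortest chain or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            chain = []
            while node is not None:  # walk parent links back to the start
                chain.append(node)
                node = parents[node]
            return chain[::-1]
        for callee in graph.get(node, ()):
            if callee not in parents:  # first visit is the shortest route in BFS
                parents[callee] = node
                queue.append(callee)
    return None


def fmt(symbol):
    path, name, line = symbol
    return f"{path}:{name}@{line}"


# Hypothetical symbols: (path, name, line).
main = ("src/main.zig", "main", 10)
handle = ("src/mcp.zig", "handleCall", 42)
helper = ("src/mcp.zig", "logCall", 77)
search = ("src/explore.zig", "searchContent", 120)
graph = {main: [helper, handle], handle: [search], helper: []}

chain = callpath(graph, main, search)
print(" -> ".join(fmt(s) for s in chain))
```

With unweighted edges, breadth-first order guarantees that the first time the goal is dequeued the chain has the fewest hops.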

🛡️ Correctness & hardening

  • Search recall after a snapshot load (#537, #539): restored files are searchable again; call-graph edges into restored files are back (#537b).
  • Store hardening (#597, #603): no unlocked diff writes, data-log compaction, clean failure paths.
  • mmap overlay (#593, #600): overlay edits mask stale base entries; writeToDisk persists merged state.
  • Word index (#583, #585, #606): stale postings dropped on disk load; doc_id slots reused — bounded memory in long-lived daemons.
  • ContentCache (#584, #596): probe-window reachability + byte budget.
  • OOM-safe indexing (#594), per-project flock for cli-daemon spawn (#592), comment/string-aware call-site extraction (#562, #572).
  • Secret filtering (#589, #572): id_ecdsa / id_dsa / *_sk FIDO keys, *.env variants, .git-credentials blocked from indexing and search.
  • TS/JS dependency graph (#540–#543, #548): multi-line + re-export imports, relative-path resolution, no bogus deps from strings.
  • A dozen CLI/tool UX fixes (#558, #560, #566, #568–#570, #573, #576, #588), every one landed with a failing test first.

🙏 Contributors

  • @nsxdavid — TS/JS dependency-graph fixes (#542, #543)
  • @lekt9 — reported the resident-status-process leak (#553), now metadata-only
  • @idea404 — PR #535 (local fallback when api.wiki.codes is unreachable), under review for the next cut

Full details in the CHANGELOG.

Install

curl -fsSL https://codedb.codegraff.com/install.sh | sh

or npx -y codedeebee mcp

| Platform | Asset | Signed |
|---|---|---|
| macOS ARM64 (Apple Silicon) | codedb-darwin-arm64 | ✅ codesigned + notarized |
| macOS x86_64 (Intel) | codedb-darwin-x86_64 | temporarily unsigned (#504) |
| Linux ARM64 | codedb-linux-arm64 | |
| Linux x86_64 | codedb-linux-x86_64 | |

codedb 0.2.5824

04 Jun 16:46
e199484

A deterministic code-graph layer, a ~3× faster cold path, and a warm CLI — plus a batch of correctness fixes from a great community audit.

⚡ Performance

  • Snapshot load ~3× — 380 → 125 ms on ~39k files; peak RSS 795 → 457 MB (−338 MB). mmap'd content section, borrowed strings, zero-copy ContentCache, parallel freshness check, no re-hashing on load. (#524)
  • Cold index: RSS 4.3 GB → 1 GB, wall-time ~6.5× — worker-local parallel scan. (#519)
  • Parallel WordIndex build — cold index ~1.49× + leaner ranked search. (#520)
  • Warm CLI daemon: 13–114× per call. codedb <repo> <query> auto-spawns/reuses a per-project warm daemon over a Unix socket instead of cold-reindexing. (#525) Answers @ahndohun's ask in #518 to keep the snapshot warm across CLI calls.
  • Faster fuzzy find — SIMD Smith-Waterman (~1.8×, retrieval-identical) + a ~22× compound-identifier fast path. (#526)

🆕 Features

  • Code-graph layer + graph-aware ranking (+15% MRR, 0.819 → 0.944) — a no-LLM resolved call graph, persisted in the snapshot; centrality folded into ranking, zero recall loss. (#523, #524)
  • Edge-aware codedb_context — now lists callers and callees. (#524)
  • ReScript .res / .resi support: let/type/module/external/open, decorators stripped. (#533) Requested by @yousafsabir (#532).
  • Windsurf + Devin auto-registration — direct JSON writes from the installer. (#521, #522)
  • CLI hardening — robust arg parsing/validation, correct exit codes, new codedb status, globally-honored --no-telemetry. (#529)

🐛 Correctness & fixes

  • Non-ASCII identifiers (e.g. Korean) now indexed by codedb_outline / codedb_symbol. (#524) — thanks @ahndohun (#518)
  • codedb_find score floor — non-matching queries return "no match" instead of confident bogus hits. (#524) — thanks @ahndohun (#518)
  • Python class is labeled class, not struct_def. (#524) — thanks @ahndohun (#518)
  • Snapshot writer u16 name-length overflow that could panic on very long identifiers — fixed. (#525)
  • Secret-filter drift guard + per-session edit locks from the #528 capability audit, with a runtime lock test. (#530)

🙏 Thanks

  • @ahndohun — a thorough correctness/UX audit (#518): non-ASCII identifiers, the find score floor, Python class kind, and the warm-CLI-daemon ask.
  • @yousafsabir — the ReScript language request (#532).
  • @eramax — the opencode-subagent report (#516), which prompted verifying subagent MCP access and the CLI fallback path.

Install / update

codedb update
# or
curl -fsSL https://codedb.codegraff.com/install.sh | bash

macOS (codesigned + notarized) and Linux x86_64 / arm64. SHA256 checksums included.

Full details in CHANGELOG.md.

codedb 0.2.5823

29 May 07:26

0.2.5823 is an MCP compatibility hotfix for direct tools/call requests.
It ships the issue #512 fix and adds a wire-level stdio backtest so future
releases catch this exact client-wrapper failure mode.

MCP direct tool-call compatibility

  • #512 — direct calls no longer drop inline args when arguments is empty.
    Some clients send canonical MCP params.name and params.arguments, but a
    wrapper layer may also emit arguments: {} while placing the real fields
    inline on params, for example {"name":"codedb_outline","arguments":{}, "path":"src/mcp.zig"}. Direct tools/call previously treated the empty
    arguments object as authoritative, dispatched codedb_outline with no
    path, and returned missing 'path' / received keys: [] even though the
    request contained a path.
  • Canonical MCP behavior is preserved. Non-empty params.arguments remains
    authoritative. When arguments is empty or absent, direct calls now copy
    non-administrative inline fields into a clean argument map before dispatch.
    A legacy params.args object is accepted only as a compatibility fallback
    when canonical args are absent or empty. Malformed non-object arguments
    still returns the protocol error arguments must be object.
  • Diagnostics now match direct calls. Missing-arg guidance no longer says
    "sub-op" for direct tools/call; it explains the canonical direct shape and
    separately mentions the bundled inline fallback.
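
The resolution rules above can be summarized in a short Python sketch. It is a model of the described behavior, not the Zig handler: the set of "administrative" keys is an assumption, and so is the choice to consult the legacy args object before the inline fields when both are present.

```python
ADMIN_KEYS = {"name", "arguments", "args", "_meta"}  # assumed administrative fields


def resolve_arguments(params):
    """Return the argument map a direct tools/call should dispatch with."""
    arguments = params.get("arguments")
    if arguments is not None and not isinstance(arguments, dict):
        raise ValueError("arguments must be object")
    if arguments:
        return dict(arguments)  # canonical and non-empty: authoritative
    legacy = params.get("args")
    if isinstance(legacy, dict) and legacy:
        return dict(legacy)  # compatibility fallback (precedence is an assumption)
    # Empty or absent arguments: copy non-administrative inline fields.
    return {k: v for k, v in params.items() if k not in ADMIN_KEYS}


wrapped = {"name": "codedb_outline", "arguments": {}, "path": "src/mcp.zig"}
print(resolve_arguments(wrapped))  # {'path': 'src/mcp.zig'}
```

Building a fresh map, rather than passing params through, keeps fields such as name out of the tool's argument list.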

Backtesting

  • Added test "issue-512: direct tools call accepts inline args when arguments is empty" to exercise the direct call handler.
  • Extended scripts/e2e_mcp_test.py with Scenario 4, which sends the malformed
    direct stdio MCP request through the real server process. The fixed binary
    passes 20/20 E2E checks; the pre-fix binary fails Scenario 4 with the old
    missing 'path' / received keys: [] response.
  • A subagent also validated the change with codedb MCP available. Its MCP
    snapshot was stale, so it used codedb MCP to inspect what was available and
    then confirmed the current disk state plus the focused and stdio E2E tests.

Release metadata

  • src/release_info.zig, build.zig.zon, and npm/package.json are aligned
    on 0.2.5823.
  • The release branch release/0.2.5823 has been merged back into main.

Deployment

  • GitHub release assets were rebuilt locally from release/0.2.5823 with Zig
    0.16.0 and uploaded over the earlier CI-built assets.
  • macOS ARM64 was locally signed and its release archive was accepted by Apple
    notarization.
  • macOS x86_64 remains unsigned by design because the build file documents a
    Zig 0.16/macOS 26 crash after signing that slice.
  • codedeebee@0.2.5823 is published to npm with the latest tag.

Validation

  • zig build test -Dtest-filter=issue-512
  • zig build test
  • zig build
  • python3 scripts/e2e_mcp_test.py --binary zig-out/bin/codedb --project /Users/blackfloofie/codedb-release-0.2.5823
    20/20 passed
  • GitHub PR bench-regression for #513 and #514: success
  • Local release asset download verification: all checksums passed, macOS ARM64
    and x86_64 both report codedb 0.2.5823.
  • npm registry install verification: codedeebee@0.2.5823 installs and runs
    codedb 0.2.5823.

See benchmarks/v0.2.5823-validation.md
for the release validation notes.

codedb 0.2.5822

29 May 03:35

0.2.5822 is a hot-path performance and release-reliability follow-up to
0.2.5821. It keeps the protocol fixes from 0.2.5821, cuts the cost of
the common MCP tools, removes parser boilerplate, and fixes the remaining
Intel macOS/Rosetta release crash by leaving the x86_64 macOS artifact
unsigned until the Zig/Mach-O signing issue is resolved.

MCP hot-path performance

  • Pre-rendered responses for hot tools. codedb_tree, codedb_outline,
    codedb_hot, codedb_deps, codedb_status, and related MCP response paths
    now avoid unnecessary deep clones and intermediate buffers. The corrected
    benchmark harness now runs cases from the temp corpus root, so edit/read
    timings measure the intended project instead of the caller's checkout.
  • Lower edit latency. codedb_edit avoids extra project-root work in the
    hot path and dropped from 236300 ns to 44700 ns p50 in the corrected
    microbench, an 81.08% reduction.
  • No benchmark-critical regressions. Comparing the corrected baseline to
    this release, every comparable MCP benchmark improved by more than 50%:
    codedb_tree 14530 -> 6270 ns, codedb_outline 62930 -> 12820 ns,
    codedb_search 33700 -> 8450 ns, codedb_deps 1620 -> 70 ns,
    codedb_bundle 93040 -> 28380 ns, and codedb_snapshot
    60100 -> 27750 ns.

Parser maintenance

  • src/explore.zig parser append cleanup. Older language parsers had many
    repeated "dupe name/detail/import then append" blocks. These now route
    through shared helpers that preserve the prior symbol/detail behavior while
    cutting 393 net lines from src/explore.zig (83 insertions,
    476 deletions). This is intentionally behavior-preserving cleanup after
    the parser expansion in earlier releases.

Glob matching

  • #511 — brace alternatives in glob patterns. codedb_glob and all MCP
    path_glob filters now support simple shell-style alternatives such as
    **/*.{yaml,yml} and src/{mcp,explore}.zig. Malformed braces without a
    comma continue to match literally, so existing literal-brace paths keep
    working. This fixes the confusing zero-result behavior agents hit when
    surveying YAML files with one glob.
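
A minimal Python sketch of the brace behavior described for #511: groups containing a comma expand into alternatives, and comma-less braces stay literal. It does not handle nested groups, and it uses Python's fnmatch only as a stand-in matcher (its * also crosses directory separators), so it is not a model of codedb's glob engine.

```python
from fnmatch import fnmatchcase


def expand_braces(pattern):
    """Expand {a,b,...} groups; braces without a comma are kept literally."""
    start = pattern.find("{")
    while start != -1:
        end = pattern.find("}", start)
        if end == -1:
            break  # unclosed brace: treat the rest as literal
        body = pattern[start + 1:end]
        if "," in body:
            head, tail = pattern[:start], pattern[end + 1:]
            out = []
            for alt in body.split(","):
                out.extend(expand_braces(head + alt + tail))
            return out
        start = pattern.find("{", end)  # comma-less group: skip it
    return [pattern]


def glob_match(path, pattern):
    return any(fnmatchcase(path, p) for p in expand_braces(pattern))


print(expand_braces("src/{mcp,explore}.zig"))  # ['src/mcp.zig', 'src/explore.zig']
print(expand_braces("notes/{draft}.md"))       # ['notes/{draft}.md']
print(glob_match("deploy/app.yml", "**/*.{yaml,yml}"))
```

Expanding first and matching each alternative keeps the underlying matcher unchanged, which is one simple way to add alternatives without touching literal-brace paths.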

macOS Intel / Rosetta

  • #504 — signed x86_64 macOS binaries still crashed. Local Rosetta testing
    reproduced the published v0.2.5821 codedb-darwin-x86_64 crash:
    --help exited 139 with no output. A fresh 0.2.5822 x86_64 build works
    when unsigned, but manually applying an ad-hoc signature to that exact binary
    brings back exit 139. This matches the issue thread's native-Intel finding:
    the crash is triggered by codesigning Zig 0.16 x86_64-macos binaries on
    macOS 26, not by codedb startup logic.
  • Release workaround. build.zig now makes -Dcodesign-identity opt-in and
    skips codesign for x86_64-macos even if the option is provided. The release
    workflow no longer passes -Dcodesign-identity for the Intel macOS matrix
    entry. Apple Silicon macOS artifacts still sign with hardened runtime when
    the signing identity is configured.
  • Docs updated to match distribution reality. README and MCP docs now state
    that codedb-darwin-x86_64 is temporarily unsigned and should be verified
    by SHA256 checksum. Zig version badges / requirements now say Zig 0.16.

Release metadata

  • src/release_info.zig, build.zig.zon, and npm/package.json are aligned
    on 0.2.5822, so the native binary and codedeebee package metadata agree.

Validation

  • zig build test
  • zig build test-query -Dtest-filter="issue-511"
  • zig build test-mcp -Doptimize=ReleaseFast
  • zig build
  • python3 scripts/e2e_mcp_test.py --binary zig-out/bin/codedb --project /Users/blackfloofie/codedb
    17/17 passed
  • Rosetta x86_64 release test:
    • published signed v0.2.5821 asset: --help exit 139
    • patched unsigned 0.2.5822 x86_64 build: --help exit 0,
      --version exit 0, MCP e2e 17/17 passed
    • manually re-signed patched x86_64 build: --help exit 139
    • patched arm64 macOS build: signed and --help exit 0
  • Four-subagent SWE-bench Lite smoke using codedb 0.2.5822 on non-temp
    workspaces:
    • pallets__flask-4992: target TOML config test passed.
    • pytest-dev__pytest-5221: two target fixture-listing tests passed with
      plugin autoload disabled for the old pytest checkout.
    • sympy__sympy-12454: rectangular matrix upper-triangular and Hessenberg
      target tests passed.
    • psf__requests-2317: codedb navigation succeeded, but the old checkout's
      target pytest collection is blocked on Python 3.14 because stdlib cgi was
      removed; a direct smoke confirmed byte and string methods normalize to
      GET.

See benchmarks/v0.2.5822-validation.md
for the benchmark table and SWE-bench Lite smoke details.

v0.2.5821 — 7-issue triage bundle

28 May 15:15
2e8b668

Bundle of seven fixes from the 2026-05-28 open-issue triage. PR: #509.

Closes

  • #501 npm/npx distribution (codedeebee published)
  • #502 mcp loading_snapshot stuck + sub-issues
  • #503 codedb mcp <path> arg order
  • #504 macOS Intel x64 startup segfault (Zig !void main runtime wrapper)
  • #505 opencode "No MCP tools"
  • #506 Zed MCP timeout
  • #507 search misses content after snapshot rebuild
  • #508 codedb_remote HTTP 530 / Cloudflare 1033 actionable errors

See CHANGELOG.md for details.

Install

curl -fsSL https://codedb.codegraff.com/install.sh | bash

Or via npm/npx:

npx -y codedeebee mcp

Verification

635/635 tests pass across all 8 test binaries.
Binaries: codesigned (Developer ID Application: Rachit Pradhan) + notarized by Apple.

v0.2.5820

26 May 16:20

Version bump — identical to v0.2.5819 except the version string is 0.2.5820 so codedb update correctly sees it as newer than 0.2.58181.

See v0.2.5819 release notes for the full changelog (telemetry fix, installer hooks, Linux cross-compile fix).

v0.2.58181 — hotfix: CWD snapshot pollution

25 May 16:53

Fixes

  • #496: codedb.snapshot (and full index shards) were written into the process's current working directory instead of the indexed project root. This caused ~55MB of untracked binary files to appear in git status when the MCP server's CWD differed from the indexed root. All writeSnapshotDual calls now use absolute paths.
  • #451: scope=true search now correctly surfaces large skip-trigram files (already fixed via #447 refactor; added verification test).
  • #494: Test suite OOM resolved by prior test binary split.

Housekeeping

  • Closed 8 stale/won't-do issues (#181–184 Windows support, #302, #196, #453, #454)
  • Pruned all stale branches (66+ local, 7 remote)
  • 0 open issues remaining

Assets

macOS binaries are codesigned + Apple notarized.

v0.2.5819

26 May 15:51

Telemetry fix

  • Version stamped on every event — previously version was only emitted on session_start NDJSON lines, leaving tool_call, search_breakdown, and codebase_stats events with ver=NULL in the analytics DB. Now every event carries the version field, enabling per-version byte-usage and tier-breakdown queries.

Claude Code hooks (installer)

  • register_hooks: curl codedb.codegraff.com/install.sh | bash now auto-registers two Claude Code hooks:
    • codedb-block-legacy.sh (PreToolUse/Bash) — redirects grep, cat, find, sed, head/tail to mcp__codedb__codedb_search/read/edit/find/glob. Graceful fallback: no-op when codedb is not installed; block message says "use Bash directly" when MCP is not connected.
    • codedb-warmup.sh (SessionStart) — background codedb . status to pre-warm the index on session start.
    • Hooks merge cleanly with existing hooks from other tools (e.g. muonry) — no clobbering.

Build fix

  • Portable sigaction mask init — fixes Linux cross-compile failure where sigset_t is [16]c_ulong (not u32 like macOS). Uses std.mem.zeroes for platform-independent zero initialization.

Checksums

76bff118  codedb-darwin-arm64
3eeb34c0  codedb-darwin-x86_64
63fc9d4c  codedb-linux-x86_64
d56661a0  codedb-linux-arm64

v0.2.5818 — MCP stability + issue-44 + security + correctness fixes

25 May 09:19

TL;DR

v0.2.5818 merges the perf/search-and-snapshot-optimizations branch: MCP stability fixes (SIGPIPE, broken stdout, stale snapshots), security hardening (.env bypass, BM25 NaN safety), per-tier OTEL-style telemetry, and 8 independent test binaries replacing the monolithic test suite. Also includes the cross-platform sigemptyset() fix for Linux musl builds.

This release sits on top of v0.2.5815–5817, which shipped codedb_context (1 call replaces 3–5), reader.md (auto-prepended codebase maps), and the codedb read CLI.

What's in this release (since v0.2.5817)

| Change | Impact |
|---|---|
| SIGPIPE + broken stdout handling | MCP server no longer crashes when client disconnects mid-response |
| Issue-44: stale snapshot content | Search now sees working tree changes after snapshot invalidation |
| .env-local / .env_production bypass | Sensitive-path filter now blocks all .env variants |
| BM25 NaN safety | Zero-length documents no longer produce NaN scores |
| Per-tier search telemetry | OTEL-style spans for Tier 0–5 search breakdown |
| Search + snapshot I/O optimizations | Benchmarked hot-path improvements |
| 8 independent test binaries | Replaces monolithic tests.zig for faster CI |
| Cross-platform sigset_t fix | sigemptyset() instead of scalar 0 for Linux musl |

Cumulative since v0.2.5815

  • codedb_context — task-shaped context composer, 1 MCP call replaces 3–5
  • reader.md — hash-stable codebase maps, −57% tool calls on narrow-symbol tasks
  • codedb read — CLI subcommand with path-safety guards
  • Tier 5 short-circuit — skip full-scan when trigram returned candidates (Suspense regex: 15.6× faster)
  • Trigram cap 64KB → 1MB — wider recall for large files
  • codedb_status — 9.4× faster with cached approxIndexSizeBytes

Eval results (QD Matrix — 8 tasks, 4 backends, 2 corpora)

codedb is Pareto-optimal: highest quality (4.65/5), lowest wall time (25.2s), best tokens-per-quality-point (3,892). Wins 5/5 quality niches in the MAP-Elites grid.

| backend | quality | tokens | wall (s) | status |
|---|---|---|---|---|
| codedb | 4.65 | 18,083 | 25.2 | PARETO-OPTIMAL |
| fts5_trigram | 4.38 | 17,172 | 36.9 | PARETO-OPTIMAL |
| codedb_LEAN | 4.33 | 24,474 | 108.0 | dominated |
| lean-ctx | 4.25 | 21,452 | 67.8 | dominated |
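
The status column can be checked directly from the three measured columns. The Python sketch below assumes the usual dominance definition (higher quality is better, fewer tokens and less wall time are better); whether the eval harness used exactly these axes is an assumption, but the result agrees with the table.

```python
def dominates(a, b):
    """a dominates b: no worse on every axis and strictly better on at least one."""
    no_worse = (a["quality"] >= b["quality"]
                and a["tokens"] <= b["tokens"]
                and a["wall"] <= b["wall"])
    better = (a["quality"] > b["quality"]
              or a["tokens"] < b["tokens"]
              or a["wall"] < b["wall"])
    return no_worse and better


# Values copied from the table above.
backends = {
    "codedb": {"quality": 4.65, "tokens": 18083, "wall": 25.2},
    "fts5_trigram": {"quality": 4.38, "tokens": 17172, "wall": 36.9},
    "codedb_LEAN": {"quality": 4.33, "tokens": 24474, "wall": 108.0},
    "lean-ctx": {"quality": 4.25, "tokens": 21452, "wall": 67.8},
}

front = sorted(
    name for name, m in backends.items()
    if not any(dominates(other, m) for other in backends.values())
)
print(front)  # ['codedb', 'fts5_trigram']
```

fts5_trigram stays on the front only because it uses fewer tokens than codedb; on quality and wall time codedb is ahead.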

Notarization & verification

| binary | notary submission | status |
|---|---|---|
| codedb-darwin-arm64 | d97d5a25-a15f-44e9-a62a-24fa2bb1ed9c | Accepted |
| codedb-darwin-x86_64 | 4e7c4943-8939-4ef1-9163-7f49a15f1780 | Accepted |
| codedb-linux-x86_64 | n/a (statically linked musl) | |

Verify:

shasum -a 256 codedb-darwin-arm64
# expected: 88baed2b7e241dea2b6dd3cd0c2fd37d230346e34f6cd86be069ae8b90a79e12

Full SHA-256 list in checksums.sha256.


Full changelog: v0.2.5817…v0.2.5818

v0.2.5817 — reader.md + perf + security

21 May 06:20

TL;DR

v0.2.5817 ships reader.md, a hash-stable, agent-authored codebase map that codedb auto-prepends to codedb_context responses. It also includes the perf + security bundle from v0.2.5816 (which was never tagged) and three new codedb_context enhancements: inline symbol bodies, a callers section, and a task-length gate.

Highlights vs the released v0.2.5815:

| | v0.2.5815 | v0.2.5817 |
|---|---|---|
| Suspense regex p50 | 2.82 ms | 0.18 ms (15.6× faster) |
| useState regex p99 | 16.57 ms | 2.04 ms (8.1× faster) |
| codedb read CLI | absent | present (with security guards) |
| Sensitive-file blocking | n/a | blocked (.env, id_rsa, .ssh/*) |
| .codedb/reader.md support | n/a | present (auto-prepend, hash-verified) |
| codedb_context inline bodies | no | yes (≤6 lines for ≤3 symbols) |
| codedb_context callers section | no | yes (top 6 non-test execution sites) |

End-to-end agent eval (Sonnet 4.6, n=3 per task) shows the v0.2.5817 binary cuts median tool calls on every task: T1 flask 5→4, T2 regex 13→7, T3 react 13→10.


New — reader.md auto-prepend

A .codedb/reader.md file (≤200 LOC of markdown, with a blake2b source_hash over up-to-20 listed source files) gets auto-prepended to every codedb_context response. When source files drift, codedb emits a "regenerate" hint; when missing, it's silent.

Lifecycle:

agent calls codedb_context
       ↓
       codedb loads .codedb/reader.md
       ↓
       blake2b(sorted source_files) == declared_hash?
       ├─ yes → prepend body with `<!-- reader.md (hash-verified): -->` markers
       ├─ no  → prepend "stale, regenerate" hint
       └─ missing/malformed → silent
       ↓
       (existing composer output follows)

The agent regenerates reader.md (≤200 LOC budget, picks ≤10 source_files, computes blake2b) when it sees the stale signal. Codedb itself never writes the file.

Security guards (all close P1 review findings):

  • source_files rejects absolute paths and .. traversal — no reading /etc/passwd via a hostile reader.md
  • source_files capped at 20 entries — no 600-entry × 8 MB DoS on every context call
  • loc_actual capped at 240 — no 60 KB body bloat
  • Golden blake2b roundtrip test locks the algorithm against std-library drift
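
The path guards and the hash check can be sketched in Python as below. The guards (no absolute paths, no .. components, at most 20 entries) follow the list above; the exact byte layout fed to blake2b is not specified here, so hashing "path, NUL, content" in sorted path order is purely an assumption, and the function names are hypothetical.

```python
import hashlib
from pathlib import PurePosixPath

MAX_SOURCE_FILES = 20


def validate_source_files(paths):
    """Reject oversized lists, absolute paths, and '..' traversal."""
    if len(paths) > MAX_SOURCE_FILES:
        raise ValueError("too many source_files")
    for p in paths:
        pure = PurePosixPath(p)
        if pure.is_absolute() or ".." in pure.parts:
            raise ValueError(f"unsafe path: {p}")
    return sorted(paths)


def source_hash(files):
    """files: mapping of relative path -> file bytes (assumed hash layout)."""
    h = hashlib.blake2b()
    for path in validate_source_files(list(files)):
        h.update(path.encode())
        h.update(b"\0")
        h.update(files[path])
    return h.hexdigest()


def reader_state(files, declared_hash):
    return "verified" if source_hash(files) == declared_hash else "stale"


files = {"src/mcp.zig": b"pub fn main() void {}\n", "src/explore.zig": b"// parser\n"}
declared = source_hash(files)
print(reader_state(files, declared))  # verified
files["src/mcp.zig"] += b"// edited\n"
print(reader_state(files, declared))  # stale
```

Validating before any file is read is what keeps a hostile reader.md from steering the hash pass at paths outside the project.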

The runtime overhead when reader.md is missing is ~0.1 ms (one stat + early return). When present and valid, recomputing the hash on every call adds another ~0.1 ms on small source_files.

Task-length gate: reader.md prepend is skipped for tasks ≤80 chars (narrow lookups where the composer's keyword extractor already pinpoints the answer). This avoids the ~5 KB body overhead on tasks that don't need orientation.

New — codedb_context symbol-body inline

When ## Symbol definitions has ≤3 entries, inline the first ~6 lines of each so the agent doesn't need a follow-up codedb_read:

## Symbol definitions
- before_request (function) — src/flask/sansio/scaffold.py:460
         460 |     def before_request(self, f: T_before_request) -> T_before_request:
         461 |         """Register a function to run before each request.
         462 |
         463 |         For example, this can be used to open a database connection, or
         464 |         to load the logged in user from the session.
         465 |

New — codedb_context callers section

For each ≤3 symbol_definition, surface up to 2 non-definition, non-test, non-import call sites with their enclosing scope:

## Callers (top non-test, non-import usages of these symbols)
- src/flask/app.py:1369: ... :attr:`before_request_funcs`
  [in preprocess_request (function, L1366-L1392)]

That's literally the execution site the agent would have followed up for — pre-resolved in the first response.

Bundled from v0.2.5816 (never tagged)

  • PR #484: codedb read <path> CLI subcommand (full file, -L FROM-TO, --compact)
    • P1 security: isPathSafe + watcher.isSensitivePath guards
    • P2 correctness: opens project root, not cwd
  • PR #485: fix(search): skip Tier 5 full-scan when trigram returned candidates
    • Suspense regex query: 2.82 ms → 0.18 ms (15.6× faster)
    • useState regex p99: 16.57 ms → 2.04 ms (8.1× faster)
    • No recall regression — trigram filter is a sound superset
  • PR #487 — shootout.py codegraph backend (multi-session bench against codegraph 0.7.10)
  • PR #486 — ACE × codedb integration spec (design only)
  • PR #483 — v0.2.5815 cross-corpus bench data
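
The "sound superset" claim behind PR #485 can be illustrated with a toy trigram index in Python. This sketch uses a literal substring query rather than a regex and plain dicts rather than codedb's index structures, so it shows only the principle: any file containing the needle must contain every trigram of the needle, so verifying the candidate set alone loses no matches.

```python
def trigrams(text):
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(files):
    """Map each trigram to the set of paths whose content contains it."""
    index = {}
    for path, content in files.items():
        for tri in trigrams(content):
            index.setdefault(tri, set()).add(path)
    return index


def search(files, index, needle):
    tris = trigrams(needle)
    if not tris:
        candidates = set(files)  # needle shorter than 3 chars: nothing to filter on
    else:
        candidates = set.intersection(*(index.get(t, set()) for t in tris))
    # Verify only the candidates; no full scan of the remaining files.
    return sorted(p for p in candidates if needle in files[p])


files = {
    "a.js": "const [x, setX] = useState(0)",
    "b.js": "function render() {}",
    "c.js": "use State wisely",
}
index = build_index(files)
print(search(files, index, "useState"))  # ['a.js']
```

The filter can return false positives (files that hold all the trigrams but not the needle), which is why the verification step remains; it cannot return false negatives.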

Measured impact (Sonnet 4.6 sub-agents, n=3 each, vs v0.2.5815)

| Task | main median calls | exp median calls | Δ |
|---|---|---|---|
| T1 flask "find before_request decorator" (28 chars) | 5 | 4 | −1 |
| T2 regex "where is pattern compiled" (235 chars) | 13 | 7 | −6 |
| T3 react "passive effects flush" (230 chars) | 13 | 10 | −3 |

9/9 runs across the matrix returned correct answers.

Notarization & verification

All three binaries built locally on Apple Silicon. macOS binaries signed with Developer ID Application: Rachit Pradhan (WWP9DLJ27P) + hardened runtime + secure timestamp, notarized via Apple notary service.

| binary | notary submission | gatekeeper |
|---|---|---|
| codedb-darwin-arm64 | 576628b8-4f16-4a09-9e7b-917f51664033 Accepted | accepted, source=Notarized Developer ID |
| codedb-darwin-x86_64 | 5f763d62-01c2-4245-9e6c-cc37cceec996 Accepted | accepted, source=Notarized Developer ID |
| codedb-linux-x86_64 | n/a (statically linked, ~13 MB) | smoke-tested via emulated docker linux/amd64; codedb --version and the tree command both green |

Verify the macOS download:

shasum -a 256 codedb-darwin-arm64
# expected: dea15a25a088f3b05d620e7a119377d09703c4e73512e35479819542c6c763c6

spctl -a -vv -t install codedb-darwin-arm64
# expected: accepted, source=Notarized Developer ID

Full SHA-256 list in checksums.sha256.

What's deferred (not blockers)

Critical-review pass from a Sonnet 4.6 sub-agent identified 11 issues. The 2 P1 (security) and 2 P2 (correctness) issues are closed in this release. P2/P3 follow-ups for the next cycle:

  • I04 schema_version parsed but not validated (cosmetic — only matters at format v2)
  • I05 reader.md not cached across calls (~0.1 ms per call; matters at scale)
  • I06 codedb_status doesn't surface reader.md state (small ergonomic gap)
  • I09 stale hint doesn't include the previous source_files list
  • I10 concurrent-write last-write-wins not documented
  • I11 cost-benefit gate for shallow workloads (partial — task-length gate handles the codedb_context side)

Full changelog: v0.2.5815…v0.2.5817