Releases · justrach/codedb

12 Jun 15:48

justrach

v0.2.5825

9b3f9dc

codedb 0.2.5825 Latest

A retrieval-quality, capability, and speed cut. 0.2.5825 closes out a long audit cycle (133 commits since 0.2.5824) and ships a sustained latency pass driven by 2,467 real production query-log calls — the search hot path is ~4–8× faster, repeat searches return in microseconds, and the single biggest production-tail bug (whole-repo tier-3 scans after a snapshot restore) is gone: negative searches drop 9.2 ms → 0.5–0.9 ms.

⚡ How much faster?

Every number below is a real measurement from this cycle (commit messages carry the full methodology).

Search latency

Path	Before	After	Change
`searchContent` hot path (#611)	65–107 µs/query	7.4–28.7 µs/query	~4–8×
Repeat search (result LRU hit, #613)	20.7 µs	2.0 µs	~10×
First MCP search call after startup (warmup, #613)	21.8 ms (21–40 ms variance)	6.3 ms (6.2–6.5 ms)	~3.5×, stable
Fall-through / negative search after snapshot restore (#615)	9.2 ms (whole-repo scan, `recall_complete=false`)	0.5–0.9 ms (`recall_complete=true`)	~10–18×
Symbol lookup with a complete index (#613)	~6 ms/call	50–100 ns	~60,000×
Zero-hit queries, 20k-file corpus, `CODEDB_TRIGRAM_CAP` uncapped (#615)	7.1 ms	1.4 ms	~5× (opt-in: +110 MB peak RSS, +300 ms index time)

Per-query micro-benchmarks (codedb repo, c_allocator, min-of-N, uncached — the benchmark pins CODEDB_NO_SEARCH_CACHE=1 so rows stay comparable across versions):

Query	0.2.5824	0.2.5825	Speedup
`middleware`	88 µs	10.2 µs	8.6×
`database`	65 µs	7.4 µs	8.8×
`error`	107 µs	19.6 µs	5.5×
`authentication`	50 µs (mid-cycle)	28.7 µs	1.7×
`error` (cache hit)	20.7 µs	2.0 µs	10×

How: line-offset cache instead of per-query line rescans, doc_id-grouped postings with a contiguous-run fast path (per-hit work drops to a doc_id compare), packed-u64-key sorts (no string compares or 40-byte struct moves inside the sort), rare-byte SIMD scan anchors (the scan anchors on a rare byte of the needle, so it stops verifying `authentication` at every `a`), direct-address doc slots, symbol-length bitmasks that skip whole files, init-time path classification (was ~10 path tokenizations per path per rerank), memoized per-path rerank facts, and one outline fetch per candidate.
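To make the first item concrete, here is a minimal Python sketch of a line-offset cache: line starts are computed once per file, and each hit's byte position is mapped to a line number by binary search instead of rescanning the content. The function names are illustrative and are not codedb's actual (Zig) API.

```python
from bisect import bisect_right


def build_line_offsets(content: bytes) -> list[int]:
    """Return the byte offset at which each line starts (line 1 starts at 0)."""
    offsets = [0]
    for i, byte in enumerate(content):
        if byte == 0x0A:  # newline: the next line starts one byte later
            offsets.append(i + 1)
    return offsets


def line_of(offsets: list[int], byte_pos: int) -> int:
    """Map a byte position to its 1-based line number with a binary search."""
    return bisect_right(offsets, byte_pos)


content = b"fn a() {}\nfn b() {}\nfn c() {}\n"
offsets = build_line_offsets(content)      # built once, reused for every hit
hit = content.index(b"fn b")
print(line_of(offsets, hit))
```

The offsets list is built once per file version, so repeated queries against the same file pay only the logarithmic lookup.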

Memory & load path

Path	Before	After
Snapshot fast-load, openclaw 13,654 files (#564)	60 ms	40 ms (−33%)
Pass C heap during load (#564)	+62.5 MB	+20.5 MB (−67%)
One-shot search physical footprint (#564)	132.7 MB	89.2 MB (−33%)
Max RSS, one-shot search (#564)	244 MB	200 MB
`codedb <dir> status` (#553)	full index materialized — a multi-GB resident process that never exited	metadata-only (reported by @lekt9 🙏)
Background warmup steady-state cost (#613)	—	~70 ms one-time background CPU, +4.4 MB RSS (caches hard-capped at 4 MB each)

The production numbers that drove it

A 2,467-call production query log showed: search p90 30 ms with occasional 2-second outliers, codedb_find median 4.5 ms / p90 17.7 ms, and 62% of calls being exact repeats of an earlier (tool, query) pair. All three tails are addressed: the p90/outliers traced to the #615 scan-set bug plus the 50 ms–2 s word-index rebuild that used to land on an innocent first query (now pre-paid by the warmup thread), the codedb_find tail was the O(files × symbols) safety scan (now gated), and the repeats now hit microsecond caches.
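For readers who want to run the same kind of analysis on their own query log, the sketch below computes a repeat fraction over (tool, query) pairs and a nearest-rank percentile. The sample data is made up for illustration, and the nearest-rank definition is an assumption; the release notes do not say which percentile method was used.

```python
import math


def repeat_fraction(calls):
    """Fraction of calls whose (tool, query) pair already appeared earlier."""
    seen = set()
    repeats = 0
    for call in calls:
        if call in seen:
            repeats += 1
        else:
            seen.add(call)
    return repeats / len(calls) if calls else 0.0


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical sample, not the production log.
calls = [("search", "error"), ("find", "mcp"), ("search", "error"),
         ("search", "error"), ("find", "snapshot")]
latencies_ms = [3, 5, 4, 30, 6, 2, 7, 8, 9, 2000]

print(repeat_fraction(calls), percentile(latencies_ms, 50), percentile(latencies_ms, 90))
```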

🔥 The big one: tier-3 scan-set reconciliation (#615)

Snapshot restore parks every file in skip_trigram_files (it can't know what the disk trigram index covers), and two compounding failures meant the set never emptied on the standard serve/mcp/cli-daemon startup path:

Nothing pruned the set when the disk trigram index was later mmap-loaded.
The snapshot freshness pass reindexes changed files into the heap trigram before the disk-load gate runs — and that gate early-returned on any heap entry. One dirty file blocked the disk trigram load for the whole repo.

Net effect: tier 3 content-scanned the entire project on every fall-through query, with recall_complete=false. Measured live: 613/616 files in the scan set. After the fix: 0.

All trigram replacement now funnels through adoptTrigramIndex / adoptTrigramBase (swap, bump the search generation, prune the skip set), and the mmap load keeps freshness-reindexed files as a masking overlay so their newer content wins over stale base entries.
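The "swap, bump, prune" funnel can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the real implementation: the class, field types, and method name below are hypothetical stand-ins for the Zig code, and the masking overlay for freshness-reindexed files is left out.

```python
from dataclasses import dataclass, field


@dataclass
class SearchState:
    # Files the active trigram index covers (hypothetical representation).
    trigram_covered: set[str] = field(default_factory=set)
    # Files tier 3 must content-scan because no trigram data is known for them.
    skip_trigram_files: set[str] = field(default_factory=set)
    # Bumped on every index replacement so cached results can be invalidated.
    generation: int = 0

    def adopt_trigram_index(self, covered: set[str]) -> None:
        """Single entry point for replacing the trigram index."""
        self.trigram_covered = set(covered)               # swap
        self.generation += 1                              # bump the search generation
        self.skip_trigram_files -= self.trigram_covered   # prune the skip set


# After a snapshot restore every file is parked in the skip set.
state = SearchState(skip_trigram_files={"a.zig", "b.zig", "c.zig"})
state.adopt_trigram_index({"a.zig", "b.zig", "c.zig"})
print(len(state.skip_trigram_files), state.generation)
```

Routing every replacement through one function is what keeps the skip set from going stale: no caller can swap the index without also pruning.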

⚡ Result caches + background warmup (#613)

Whole-query result LRUs for searchContent, renderPlainSearch (MCP fast path), and the BM25 ranked path — 64 entries / 4 MB each, validated against both the search generation and a fingerprint of the nine ranking kill-switch env vars. CODEDB_NO_SEARCH_CACHE=1 disables.
Background warmup: serve/mcp/cli-daemon build + persist the word index off the query path and replay the most-repeated queries from queries.log — 62% of production calls are exact repeats of an earlier (tool, query) pair, so the caches are warm before your first real call. CODEDB_NO_WARMUP=1 disables; skipped under CODEDB_LOW_MEMORY.
Race fix: generation bumps moved inside the exclusive lock — a concurrent search can no longer cache pre-mutation results under the post-mutation generation.
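A generation-validated result LRU along these lines might look like the Python sketch below. The 64-entry and 4 MB limits come from the notes above; the class name, method names, and the choice to drop stale entries on lookup are assumptions made for illustration.

```python
from collections import OrderedDict


class ResultCache:
    """LRU of rendered results, valid only for one (generation, fingerprint)."""

    def __init__(self, max_entries=64, max_bytes=4 * 1024 * 1024):
        self.max_entries = max_entries
        self.max_bytes = max_bytes
        self._entries = OrderedDict()  # key -> (generation, fingerprint, payload)
        self._bytes = 0

    def _remove(self, key):
        _, _, payload = self._entries.pop(key)
        self._bytes -= len(payload)

    def get(self, key, generation, fingerprint):
        entry = self._entries.get(key)
        if entry is None:
            return None
        cached_gen, cached_fp, payload = entry
        if cached_gen != generation or cached_fp != fingerprint:
            self._remove(key)            # index or ranking config changed: stale
            return None
        self._entries.move_to_end(key)   # mark as most recently used
        return payload

    def put(self, key, generation, fingerprint, payload: bytes):
        if len(payload) > self.max_bytes:
            return                       # never cache a result larger than the budget
        if key in self._entries:
            self._remove(key)
        self._entries[key] = (generation, fingerprint, payload)
        self._bytes += len(payload)
        while len(self._entries) > self.max_entries or self._bytes > self.max_bytes:
            self._remove(next(iter(self._entries)))  # evict least recently used


cache = ResultCache()
cache.put(("search", "error"), generation=7, fingerprint="env-a", payload=b"3 hits")
print(cache.get(("search", "error"), generation=7, fingerprint="env-a"))
print(cache.get(("search", "error"), generation=8, fingerprint="env-a"))
```

The second lookup misses because the generation moved on, which is the same check that makes the race fix matter: the generation stored with an entry must be the one the results were actually computed under.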

🧠 Ranking: query-specific graph signals (#550, #546, #554)

Call-graph distance (#608) — files near the matched symbols in the resolved call graph get a query-specific boost (CODEDB_NO_GRAPH_DISTANCE opts out).
Git co-change (#609) — a bounded history pass (500 commits, ≤32-file commits, top-8 partners) boosts files that historically change together (CODEDB_NO_COCHANGE opts out).
Negative lexical file-frequency penalty (#554) — mention-everywhere terms stop dragging hub files up.
Multi-word CLI search is ranked end-to-end (#546) — incl. the first cold run; tooling paths (bench/scripts/website/install) rank below src implementation (#557), basename test files get the test penalty (#580), and mention-dense tooling files can't saturate past the path prior (#598).
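As a rough illustration of the co-change signal, the sketch below counts how often pairs of files appear in the same commit, using the bounds quoted above (500 commits, commits of at most 32 files, top 8 partners). The input shape (a newest-first list of file lists) and the function name are assumptions; how codedb reads git history and turns counts into a boost is not shown here.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cochange_partners(commits, max_commits=500, max_files=32, top_k=8):
    """Map each path to the paths it most often changed together with."""
    counts = defaultdict(Counter)
    for files in commits[:max_commits]:      # bounded history pass
        unique = sorted(set(files))
        if len(unique) > max_files:          # skip bulk commits (renames, formatting)
            continue
        for a, b in combinations(unique, 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return {path: [p for p, _ in c.most_common(top_k)] for path, c in counts.items()}


# Hypothetical history, newest first.
commits = [
    ["src/mcp.zig", "src/explore.zig"],
    ["src/mcp.zig", "src/explore.zig", "README.md"],
    ["src/mcp.zig", "build.zig"],
]
print(cochange_partners(commits)["src/mcp.zig"])
```

Skipping very large commits keeps one sweeping change from making every file look related to every other file.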

🆕 Features

codedb_callpath — shortest resolved call chain between two symbols, each hop as path:name@line (#531).
PageRank graph centrality in ranked search (replaces in-degree; CODEDB_IN_DEGREE_CENTRALITY reverts) (#531).
codedb_context max_tokens — value-ordered section packing under a token budget, byte-identical output without the arg (#610).
Richer codedb_symbol — kind / prefix / glob / fuzzy filters, optional source body per hit.
format=json + paths_only + path_glob on search — structured output with provenance meta, ~50% fewer tokens for broad surveys.
codedb_changes in the CLI (#578), CODEDB_TRIGRAM_CAP for big-corpus operators (#615), CODEDB_ALLOW_TEMP for CI harnesses on temp checkouts (#538).
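The idea behind a shortest call chain can be sketched with a breadth-first search over a caller-to-callee map. This is an illustrative Python model, not the codedb_callpath implementation: the graph contents and node labels (written in the path:name@line shape described above) are made up, and the real tool works on the resolved call graph stored in the snapshot.

```python
from collections import deque


def call_path(graph, start, goal):
    """Shortest chain of calls from start to goal, or None if unreachable."""
    if start == goal:
        return [start]
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for callee in graph.get(node, ()):
            if callee in parent:
                continue                  # already reached by a path at least as short
            parent[callee] = node
            if callee == goal:
                path = [callee]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(callee)
    return None


# Hypothetical call graph keyed by path:name@line labels.
graph = {
    "src/main.zig:main@10": ["src/mcp.zig:serve@40", "src/cli.zig:run@5"],
    "src/mcp.zig:serve@40": ["src/mcp.zig:dispatch@90"],
    "src/cli.zig:run@5": ["src/index.zig:search@200"],
    "src/mcp.zig:dispatch@90": ["src/index.zig:search@200"],
}
print(" -> ".join(call_path(graph, "src/main.zig:main@10", "src/index.zig:search@200")))
```

Because BFS explores hop by hop, the first time the goal is reached is guaranteed to be along a chain with the fewest calls.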

🛡️ Correctness & hardening

Search recall after a snapshot load (#537, #539): restored files are searchable again; call-graph edges into restored files are back (#537b).
Store hardening (#597, #603): no unlocked diff writes, data-log compaction, clean failure paths.
mmap overlay (#593, #600): overlay edits mask stale base entries; writeToDisk persists merged state.
Word index (#583, #585, #606): stale postings dropped on disk load; doc_id slots reused — bounded memory in long-lived daemons.
ContentCache (#584, #596): probe-window reachability + byte budget.
OOM-safe indexing (#594), per-project flock for cli-daemon spawn (#592), comment/string-aware call-site extraction (#562, #572).
Secret filtering (#589, #572): id_ecdsa / id_dsa / *_sk FIDO keys, *.env variants, .git-credentials blocked from indexing and search.
TS/JS dependency graph (#540–#543, #548): multi-line + re-export imports, relative-path resolution, no bogus deps from strings.
A dozen CLI/tool UX fixes (#558, #560, #566, #568–#570, #573, #576, #588) — every one landed with a failing test first.

🙏 Contributors

@nsxdavid — TS/JS dependency-graph fixes (#542, #543)
@lekt9 — reported the resident-status-process leak (#553), now metadata-only
@idea404 — PR #535 (local fallback when api.wiki.codes is unreachable), under review for the next cut

Full details in the CHANGELOG.

Install

curl -fsSL https://codedb.codegraff.com/install.sh | sh

or npx -y codedeebee mcp

Platform	Asset	Signed
macOS ARM64 (Apple Silicon)	`codedb-darwin-arm64`	✅ codesigned + notarized
macOS x86_64 (Intel)	`codedb-darwin-x86_64`	temporarily unsigned (#504)
Linux ARM64	`codedb-linux-arm64`	—
Linux x86_64	`codedb-linux-x86_64`	—

04 Jun 16:46

justrach

v0.2.5824

e199484

codedb 0.2.5824

A deterministic code-graph layer, a ~3× faster cold path, and a warm CLI — plus a batch of correctness fixes from a great community audit.

⚡ Performance

Snapshot load ~3× — 380 → 125 ms on ~39k files; peak RSS 795 → 457 MB (−338 MB). mmap'd content section, borrowed strings, zero-copy ContentCache, parallel freshness check, no re-hashing on load. (#524)
Cold index: RSS 4.3 GB → 1 GB, wall-time ~6.5× — worker-local parallel scan. (#519)
Parallel WordIndex build — cold index ~1.49× + leaner ranked search. (#520)
Warm CLI daemon: 13–114× per call — codedb <repo> <query> auto-spawns/reuses a per-project warm daemon over a Unix socket instead of cold-reindexing. (#525) — answers @ahndohun's ask in #518 to keep the snapshot warm across CLI calls.
Faster fuzzy find — SIMD Smith-Waterman (~1.8×, retrieval-identical) + a ~22× compound-identifier fast path. (#526)

🆕 Features

Code-graph layer + graph-aware ranking (+15% MRR, 0.819 → 0.944) — a no-LLM resolved call graph, persisted in the snapshot; centrality folded into ranking, zero recall loss. (#523, #524)
Edge-aware codedb_context — now lists callers and callees. (#524)
ReScript .res / .resi support — let/type/module/external/open, decorators stripped. (#533) — requested by @yousafsabir (#532).
Windsurf + Devin auto-registration — direct JSON writes from the installer. (#521, #522)
CLI hardening — robust arg parsing/validation, correct exit codes, new codedb status, globally-honored --no-telemetry. (#529)
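For reference, MRR (mean reciprocal rank) averages 1/rank of the first relevant result over all queries, so 0.819 → 0.944 is a relative gain of about 15%. The helper below is a generic sketch of the metric with made-up ranks; it is not the project's evaluation harness.

```python
def mean_reciprocal_rank(ranks):
    """ranks: 1-based rank of the first relevant hit per query, None if missed."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


# Hypothetical per-query ranks: two top hits, one second place, one miss.
print(mean_reciprocal_rank([1, 2, None, 1]))

# Relative gain implied by the figures quoted in the release notes.
relative_gain = 0.944 / 0.819 - 1
print(round(relative_gain * 100, 1))
```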

🐛 Correctness & fixes

Non-ASCII identifiers (e.g. Korean) now indexed by codedb_outline / codedb_symbol. (#524) — thanks @ahndohun (#518)
codedb_find score floor — non-matching queries return "no match" instead of confident bogus hits. (#524) — thanks @ahndohun (#518)
Python class is labeled class, not struct_def. (#524) — thanks @ahndohun (#518)
Snapshot writer u16 name-length overflow that could panic on very long identifiers — fixed. (#525)
Secret-filter drift guard + per-session edit locks from the #528 capability audit, with a runtime lock test. (#530)

🙏 Thanks

@ahndohun — a thorough correctness/UX audit (#518): non-ASCII identifiers, the find score floor, Python class kind, and the warm-CLI-daemon ask.
@yousafsabir — the ReScript language request (#532).
@eramax — the opencode-subagent report (#516), which prompted verifying subagent MCP access and the CLI fallback path.

Install / update

codedb update
# or
curl -fsSL https://codedb.codegraff.com/install.sh | bash

macOS (codesigned + notarized) and Linux x86_64 / arm64. SHA256 checksums included.

Full details in CHANGELOG.md.

29 May 07:26

justrach

v0.2.5823

6dcf72d

codedb 0.2.5823

0.2.5823 is an MCP compatibility hotfix for direct tools/call requests.
It ships the issue #512 fix and adds a wire-level stdio backtest so future
releases catch this exact client-wrapper failure mode.

MCP direct tool-call compatibility

#512 — direct calls no longer drop inline args when arguments is empty.
Some clients send canonical MCP params.name and params.arguments, but a
wrapper layer may also emit arguments: {} while placing the real fields
inline on params, for example {"name":"codedb_outline","arguments":{}, "path":"src/mcp.zig"}. Direct tools/call previously treated the empty
arguments object as authoritative, dispatched codedb_outline with no
path, and returned missing 'path' / received keys: [] even though the
request contained a path.
Canonical MCP behavior is preserved. Non-empty params.arguments remains
authoritative. When arguments is empty or absent, direct calls now copy
non-administrative inline fields into a clean argument map before dispatch.
A legacy params.args object is accepted only as a compatibility fallback
when canonical args are absent or empty. Malformed non-object arguments
still returns the protocol error arguments must be object.
Diagnostics now match direct calls. Missing-arg guidance no longer says
"sub-op" for direct tools/call; it explains the canonical direct shape and
separately mentions the bundled inline fallback.
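The precedence rules above can be summarized in a short Python sketch. It is a model of the described behavior, not the handler in src/mcp.zig: the function name and the set of "administrative" keys are assumptions, as is trying inline fields before the legacy args object, since the notes do not state which of the two fallbacks wins.

```python
# Keys treated as administrative (assumed for this sketch).
ADMIN_KEYS = {"name", "arguments", "args"}


def resolve_tool_args(params: dict) -> dict:
    """Pick the argument map for a direct tools/call request."""
    if "arguments" in params and not isinstance(params["arguments"], dict):
        raise ValueError("arguments must be object")
    canonical = params.get("arguments")
    if canonical:                           # non-empty canonical args stay authoritative
        return dict(canonical)
    inline = {k: v for k, v in params.items() if k not in ADMIN_KEYS}
    if inline:                              # wrapper put the real fields on params
        return inline
    legacy = params.get("args")
    if isinstance(legacy, dict):            # compatibility fallback only
        return dict(legacy)
    return {}


# The request shape from issue #512.
params = {"name": "codedb_outline", "arguments": {}, "path": "src/mcp.zig"}
print(resolve_tool_args(params))
```

Copying into a fresh map, rather than passing params through, keeps protocol fields such as name from leaking into the tool's arguments.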

Backtesting

Added test "issue-512: direct tools call accepts inline args when arguments is empty" to exercise the direct call handler.
Extended scripts/e2e_mcp_test.py with Scenario 4, which sends the malformed
direct stdio MCP request through the real server process. The fixed binary
passes 20/20 E2E checks; the pre-fix binary fails Scenario 4 with the old
missing 'path' / received keys: [] response.
A subagent also validated the change with codedb MCP available. Its MCP
snapshot was stale, so it used codedb MCP to inspect what was available and
then confirmed the current disk state plus the focused and stdio E2E tests.

Release metadata

src/release_info.zig, build.zig.zon, and npm/package.json are aligned
on 0.2.5823.
The release branch release/0.2.5823 has been merged back into main.

Deployment

GitHub release assets were rebuilt locally from release/0.2.5823 with Zig
0.16.0 and uploaded over the earlier CI-built assets.
macOS ARM64 was locally signed and its release archive was accepted by Apple
notarization.
macOS x86_64 remains unsigned by design because the build file documents a
Zig 0.16/macOS 26 crash after signing that slice.
codedeebee@0.2.5823 is published to npm with the latest tag.

Validation

zig build test -Dtest-filter=issue-512
zig build test
zig build
python3 scripts/e2e_mcp_test.py --binary zig-out/bin/codedb --project /Users/blackfloofie/codedb-release-0.2.5823
— 20/20 passed
GitHub PR bench-regression for #513 and #514: success
Local release asset download verification: all checksums passed, macOS ARM64
and x86_64 both report codedb 0.2.5823.
npm registry install verification: codedeebee@0.2.5823 installs and runs
codedb 0.2.5823.

See benchmarks/v0.2.5823-validation.md
for the release validation notes.

29 May 03:35

justrach

v0.2.5822

aaab3cc

codedb 0.2.5822

0.2.5822 is a hot-path performance and release-reliability follow-up to
0.2.5821. It keeps the protocol fixes from 0.2.5821, cuts the cost of
the common MCP tools, removes parser boilerplate, and fixes the remaining
Intel macOS/Rosetta release crash by leaving the x86_64 macOS artifact
unsigned until the Zig/Mach-O signing issue is resolved.

MCP hot-path performance

Pre-rendered responses for hot tools. codedb_tree, codedb_outline,
codedb_hot, codedb_deps, codedb_status, and related MCP response paths
now avoid unnecessary deep clones and intermediate buffers. The corrected
benchmark harness now runs cases from the temp corpus root, so edit/read
timings measure the intended project instead of the caller's checkout.
Lower edit latency. codedb_edit avoids extra project-root work in the
hot path and dropped from 236300 ns to 44700 ns p50 in the corrected
microbench, an 81.08% reduction.
No benchmark-critical regressions. Comparing the corrected baseline to
this release, every comparable MCP benchmark improved by more than 50%:
codedb_tree 14530 -> 6270 ns, codedb_outline 62930 -> 12820 ns,
codedb_search 33700 -> 8450 ns, codedb_deps 1620 -> 70 ns,
codedb_bundle 93040 -> 28380 ns, and codedb_snapshot
60100 -> 27750 ns.

Parser maintenance

src/explore.zig parser append cleanup. Older language parsers had many
repeated "dupe name/detail/import then append" blocks. These now route
through shared helpers that preserve the prior symbol/detail behavior while
cutting 393 net lines from src/explore.zig (83 insertions,
476 deletions). This is intentionally behavior-preserving cleanup after
the parser expansion in earlier releases.

Glob matching

#511 — brace alternatives in glob patterns. codedb_glob and all MCP
path_glob filters now support simple shell-style alternatives such as
**/*.{yaml,yml} and src/{mcp,explore}.zig. Malformed braces without a
comma continue to match literally, so existing literal-brace paths keep
working. This fixes the confusing zero-result behavior agents hit when
surveying YAML files with one glob.
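One simple way to support such alternatives is to expand the braces into plain patterns before matching. The Python sketch below shows that approach, including the "no comma means literal" rule; it is an assumption that expansion is how codedb implements this, and the sketch deliberately ignores nested braces.

```python
def expand_braces(pattern: str) -> list[str]:
    """Expand simple {a,b} alternatives; braces without a comma stay literal."""
    start = pattern.find("{")
    if start == -1:
        return [pattern]
    end = pattern.find("}", start)
    if end == -1:
        return [pattern]                    # unclosed brace: match literally
    body = pattern[start + 1:end]
    tails = expand_braces(pattern[end + 1:])
    if "," not in body:
        # Malformed group: keep it verbatim, but still expand later groups.
        return [pattern[:end + 1] + tail for tail in tails]
    return [pattern[:start] + alt + tail for alt in body.split(",") for tail in tails]


print(expand_braces("**/*.{yaml,yml}"))
print(expand_braces("src/{mcp,explore}.zig"))
print(expand_braces("docs/{draft}.md"))
```

Each expanded pattern can then go through the existing glob matcher unchanged, and a path matches if any expansion matches.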

macOS Intel / Rosetta

#504 — signed x86_64 macOS binaries still crashed. Local Rosetta testing
reproduced the published v0.2.5821 codedb-darwin-x86_64 crash:
--help exited 139 with no output. A fresh 0.2.5822 x86_64 build works
when unsigned, but manually applying an ad-hoc signature to that exact binary
brings back exit 139. This matches the issue thread's native-Intel finding:
the crash is triggered by codesigning Zig 0.16 x86_64-macos binaries on
macOS 26, not by codedb startup logic.
Release workaround. build.zig now makes -Dcodesign-identity opt-in and
skips codesign for x86_64-macos even if the option is provided. The release
workflow no longer passes -Dcodesign-identity for the Intel macOS matrix
entry. Apple Silicon macOS artifacts still sign with hardened runtime when
the signing identity is configured.
Docs updated to match distribution reality. README and MCP docs now state
that codedb-darwin-x86_64 is temporarily unsigned and should be verified
by SHA256 checksum. Zig version badges / requirements now say Zig 0.16.

Release metadata

src/release_info.zig, build.zig.zon, and npm/package.json are aligned
on 0.2.5822, so the native binary and codedeebee package metadata agree.

Validation

zig build test
zig build test-query -Dtest-filter="issue-511"
zig build test-mcp -Doptimize=ReleaseFast
zig build
python3 scripts/e2e_mcp_test.py --binary zig-out/bin/codedb --project /Users/blackfloofie/codedb
— 17/17 passed
Rosetta x86_64 release test:
- published signed v0.2.5821 asset: --help exit 139
- patched unsigned 0.2.5822 x86_64 build: --help exit 0,
  --version exit 0, MCP e2e 17/17 passed
- manually re-signed patched x86_64 build: --help exit 139
- patched arm64 macOS build: signed and --help exit 0
Four-subagent SWE-bench Lite smoke using codedb 0.2.5822 on non-temp
workspaces:
- pallets__flask-4992: target TOML config test passed.
- pytest-dev__pytest-5221: two target fixture-listing tests passed with
  plugin autoload disabled for the old pytest checkout.
- sympy__sympy-12454: rectangular matrix upper-triangular and Hessenberg
  target tests passed.
- psf__requests-2317: codedb navigation succeeded, but the old checkout's
  target pytest collection is blocked on Python 3.14 because stdlib cgi was
  removed; a direct smoke confirmed byte and string methods normalize to
  GET.

See benchmarks/v0.2.5822-validation.md
for the benchmark table and SWE-bench Lite smoke details.

28 May 15:15

justrach

v0.2.5821

2e8b668

v0.2.5821 — 7-issue triage bundle

Bundle of seven fixes from the 2026-05-28 open-issue triage. PR: #509.

Closes

#501 npm/npx distribution (codedeebee published)
#502 mcp loading_snapshot stuck + sub-issues
#503 codedb mcp <path> arg order
#504 macOS Intel x64 startup segfault (Zig !void main runtime wrapper)
#505 opencode "No MCP tools"
#506 Zed MCP timeout
#507 search misses content after snapshot rebuild
#508 codedb_remote HTTP 530 / Cloudflare 1033 actionable errors

See CHANGELOG.md for details.

Install

curl -fsSL https://codedb.codegraff.com/install.sh | bash

Or via npm/npx:

npx -y codedeebee mcp

Verification

635/635 tests pass across all 8 test binaries.
Binaries: codesigned (Developer ID Application: Rachit Pradhan) + notarized by Apple.

26 May 16:20

justrach

v0.2.5820

6317fe8

v0.2.5820

Version bump — identical to v0.2.5819 except the version string is 0.2.5820 so codedb update correctly sees it as newer than 0.2.58181.

See v0.2.5819 release notes for the full changelog (telemetry fix, installer hooks, Linux cross-compile fix).

25 May 16:53

justrach

v0.2.58181

1e8d05a

v0.2.58181 — hotfix: CWD snapshot pollution

Fixes

#496: codedb.snapshot (and full index shards) were written into the process's current working directory instead of the indexed project root. This caused ~55MB of untracked binary files to appear in git status when the MCP server's CWD differed from the indexed root. All writeSnapshotDual calls now use absolute paths.
#451: scope=true search now correctly surfaces large skip-trigram files (already fixed via #447 refactor; added verification test).
#494: Test suite OOM resolved by prior test binary split.

Housekeeping

Closed 8 stale/won't-do issues (#181–184 Windows support, #302, #196, #453, #454)
Pruned all stale branches (66+ local, 7 remote)
0 open issues remaining

Assets

macOS binaries are codesigned + Apple notarized.

26 May 15:51

justrach

v0.2.5819

1e8d05a

v0.2.5819

Telemetry fix

Version stamped on every event — previously version was only emitted on session_start NDJSON lines, leaving tool_call, search_breakdown, and codebase_stats events with ver=NULL in the analytics DB. Now every event carries the version field, enabling per-version byte-usage and tier-breakdown queries.

Claude Code hooks (installer)

register_hooks — curl codedb.codegraff.com/install.sh | bash now auto-registers two Claude Code hooks:
- codedb-block-legacy.sh (PreToolUse/Bash) — redirects grep, cat, find, sed, head/tail to mcp__codedb__codedb_search/read/edit/find/glob. Graceful fallback: no-op when codedb is not installed; block message says "use Bash directly" when MCP is not connected.
- codedb-warmup.sh (SessionStart) — background codedb . status to pre-warm the index on session start.
- Hooks merge cleanly with existing hooks from other tools (e.g. muonry) — no clobbering.

Build fix

Portable sigaction mask init — fixes Linux cross-compile failure where sigset_t is [16]c_ulong (not u32 like macOS). Uses std.mem.zeroes for platform-independent zero initialization.

Checksums

76bff118  codedb-darwin-arm64
3eeb34c0  codedb-darwin-x86_64
63fc9d4c  codedb-linux-x86_64
d56661a0  codedb-linux-arm64

25 May 09:19

justrach

v0.2.5818

75f8d89

v0.2.5818 — MCP stability + issue-44 + security + correctness fixes

TL;DR

v0.2.5818 merges the perf/search-and-snapshot-optimizations branch: MCP stability fixes (SIGPIPE, broken stdout, stale snapshots), security hardening (.env bypass, BM25 NaN safety), per-tier OTEL-style telemetry, and 8 independent test binaries replacing the monolithic test suite. Also includes the cross-platform sigemptyset() fix for Linux musl builds.

This release sits on top of v0.2.5815–5817, which shipped codedb_context (1 call replaces 3–5), reader.md (auto-prepended codebase maps), and the codedb read CLI.

What's in this release (since v0.2.5817)

Change	Impact
SIGPIPE + broken stdout handling	MCP server no longer crashes when client disconnects mid-response
Issue-44: stale snapshot content	Search now sees working tree changes after snapshot invalidation
.env-local / .env_production bypass	Sensitive-path filter now blocks all .env variants
BM25 NaN safety	Zero-length documents no longer produce NaN scores
Per-tier search telemetry	OTEL-style spans for Tier 0–5 search breakdown
Search + snapshot I/O optimizations	Benchmarked hot-path improvements
8 independent test binaries	Replaces monolithic tests.zig for faster CI
Cross-platform sigset_t fix	`sigemptyset()` instead of scalar 0 for Linux musl

Cumulative since v0.2.5815

codedb_context — task-shaped context composer, 1 MCP call replaces 3–5
reader.md — hash-stable codebase maps, −57% tool calls on narrow-symbol tasks
codedb read — CLI subcommand with path-safety guards
Tier 5 short-circuit — skip full-scan when trigram returned candidates (Suspense regex: 15.6× faster)
Trigram cap 64KB → 1MB — wider recall for large files
codedb_status — 9.4× faster with cached approxIndexSizeBytes

Eval results (QD Matrix — 8 tasks, 4 backends, 2 corpora)

codedb is Pareto-optimal: highest quality (4.65/5), lowest wall time (25.2s), best tokens-per-quality-point (3,892). Wins 5/5 quality niches in the MAP-Elites grid.

backend	quality	tokens	wall (s)	status
codedb	4.65	18,083	25.2	PARETO-OPTIMAL
fts5_trigram	4.38	17,172	36.9	PARETO-OPTIMAL
codedb_LEAN	4.33	24,474	108.0	dominated
lean-ctx	4.25	21,452	67.8	dominated
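The status column can be reproduced from the table itself. The sketch below applies the usual dominance rule (no worse on every axis and strictly better on at least one) to the three reported metrics; treating exactly these three columns as the objectives is an assumption about how the eval defined its Pareto front.

```python
# (quality, tokens, wall seconds) copied from the table above.
rows = {
    "codedb": (4.65, 18083, 25.2),
    "fts5_trigram": (4.38, 17172, 36.9),
    "codedb_LEAN": (4.33, 24474, 108.0),
    "lean-ctx": (4.25, 21452, 67.8),
}


def dominates(a, b):
    """True if a is at least as good as b everywhere and better somewhere."""
    qa, ta, wa = a
    qb, tb, wb = b
    no_worse = qa >= qb and ta <= tb and wa <= wb      # higher quality, lower cost
    strictly_better = qa > qb or ta < tb or wa < wb
    return no_worse and strictly_better


pareto = sorted(
    name for name, row in rows.items()
    if not any(dominates(other, row) for o, other in rows.items() if o != name)
)
print(pareto)
```

fts5_trigram stays on the front because it uses fewer tokens than codedb, even though it loses on quality and wall time.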

Notarization & verification

binary	notary submission	status
`codedb-darwin-arm64`	`d97d5a25-a15f-44e9-a62a-24fa2bb1ed9c`	Accepted
`codedb-darwin-x86_64`	`4e7c4943-8939-4ef1-9163-7f49a15f1780`	Accepted
`codedb-linux-x86_64`	n/a (statically linked musl)	—

Verify:

shasum -a 256 codedb-darwin-arm64
# expected: 88baed2b7e241dea2b6dd3cd0c2fd37d230346e34f6cd86be069ae8b90a79e12

Full SHA-256 list in checksums.sha256.

Full changelog: v0.2.5817…v0.2.5818

21 May 06:20

justrach

v0.2.5817

7fd3dd6

v0.2.5817 — reader.md + perf + security

TL;DR

v0.2.5817 ships reader.md — a hash-stable, agent-authored codebase map that codedb auto-prepends to codedb_context responses. Plus the perf + security bundle from v0.2.5816 (which never got tagged), plus three new codedb_context enhancements (inline symbol bodies, callers section, task-length gate).

Highlights vs the released v0.2.5815:

	v0.2.5815	v0.2.5817
`Suspense` regex p50	2.82 ms	0.18 ms (15.6× faster)
`useState` regex p99	16.57 ms	2.04 ms (8.1× faster)
`codedb read` CLI	absent	present (with security guards)
Sensitive-file blocking	n/a	blocked (`.env`, `id_rsa`, `.ssh/*`)
`.codedb/reader.md` support	n/a	present (auto-prepend, hash-verified)
codedb_context inline bodies	no	yes (≤6 lines for ≤3 symbols)
codedb_context callers section	no	yes (top 6 non-test execution sites)

End-to-end agent eval (Sonnet 4.6, n=3 per task) shows the v0.2.5817 binary cuts median tool calls on every task: T1 flask 5→4, T2 regex 13→7, T3 react 13→10.

New — reader.md auto-prepend

A .codedb/reader.md file (≤200 LOC of markdown, with a blake2b source_hash over up to 20 listed source files) is auto-prepended to codedb_context responses. When the listed source files drift, codedb emits a "regenerate" hint; when reader.md is missing or malformed, codedb stays silent.

Lifecycle:

agent calls codedb_context
       ↓
       codedb loads .codedb/reader.md
       ↓
       blake2b(sorted source_files) == declared_hash?
       ├─ yes → prepend body with `<!-- reader.md (hash-verified): -->` markers
       ├─ no  → prepend "stale, regenerate" hint
       └─ missing/malformed → silent
       ↓
       (existing composer output follows)

The agent regenerates reader.md (≤200 LOC budget, picks ≤10 source_files, computes blake2b) when it sees the stale signal. Codedb itself never writes the file.

Security guards (all close P1 review findings):

source_files rejects absolute paths and .. traversal — no reading /etc/passwd via a hostile reader.md
source_files capped at 20 entries — no 600-entry × 8 MB DoS on every context call
loc_actual capped at 240 — no 60 KB body bloat
Golden blake2b roundtrip test locks the algorithm against std-library drift

The runtime overhead when reader.md is missing is ~0.1 ms (one stat + early return). When present and valid, recomputing the hash on every call adds another ~0.1 ms on small source_files.

Task-length gate: reader.md prepend is skipped for tasks ≤80 chars (narrow lookups where the composer's keyword extractor already pinpoints the answer). This avoids the ~5 KB body overhead on tasks that don't need orientation.
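A minimal Python sketch of the verification flow is shown below, operating on in-memory file contents so it runs anywhere. Several details are assumptions made for illustration: the exact bytes fed to blake2b (here path, a NUL separator, then content, in sorted path order), the "rejected" label for guard failures, and the function names. Only the guards themselves (no absolute paths, no `..`, at most 20 entries) and the 80-character gate come from the notes above.

```python
import hashlib
from pathlib import PurePosixPath

MAX_SOURCE_FILES = 20
MIN_TASK_CHARS = 80  # tasks at or below this length skip the prepend


def is_safe_relative(path: str) -> bool:
    """Reject absolute paths and any '..' traversal component."""
    p = PurePosixPath(path)
    return not p.is_absolute() and ".." not in p.parts


def source_hash(files: dict[str, bytes]) -> str:
    """blake2b over the listed files in sorted path order (layout assumed)."""
    h = hashlib.blake2b()
    for path in sorted(files):
        h.update(path.encode())
        h.update(b"\0")
        h.update(files[path])
    return h.hexdigest()


def reader_state(task: str, declared_hash: str, files: dict[str, bytes]) -> str:
    if len(task) <= MIN_TASK_CHARS:
        return "skipped"
    if len(files) > MAX_SOURCE_FILES or not all(is_safe_relative(p) for p in files):
        return "rejected"
    return "verified" if source_hash(files) == declared_hash else "stale"


files = {"src/app.py": b"def run(): ...\n", "src/db.py": b"def open(): ...\n"}
declared = source_hash(files)
task = "explain how request handling flows from the router through middleware into the database layer"
print(reader_state(task, declared, files))
files["src/db.py"] = b"def open(path): ...\n"   # a source file drifts
print(reader_state(task, declared, files))
```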

New — `codedb_context` symbol-body inline

When ## Symbol definitions has ≤3 entries, inline the first ~6 lines of each so the agent doesn't need a follow-up codedb_read:

## Symbol definitions
- before_request (function) — src/flask/sansio/scaffold.py:460
         460 |     def before_request(self, f: T_before_request) -> T_before_request:
         461 |         """Register a function to run before each request.
         462 |
         463 |         For example, this can be used to open a database connection, or
         464 |         to load the logged in user from the session.
         465 |

New — `codedb_context` callers section

For each of up to 3 symbol definitions, surface up to 2 non-definition, non-test, non-import call sites with their enclosing scope:

## Callers (top non-test, non-import usages of these symbols)
- src/flask/app.py:1369: ... :attr:`before_request_funcs`
  [in preprocess_request (function, L1366-L1392)]

That's literally the execution site the agent would have followed up for — pre-resolved in the first response.

Bundled from v0.2.5816 (never tagged)

PR #484 — codedb read <path> CLI subcommand (full file, -L FROM-TO, --compact)
- P1 security: isPathSafe + watcher.isSensitivePath guards
- P2 correctness: opens project root, not cwd
PR #485 — fix(search): skip Tier 5 full-scan when trigram returned candidates
- Suspense regex query: 2.82 ms → 0.18 ms (15.6× faster)
- useState regex p99: 16.57 ms → 2.04 ms (8.1× faster)
- No recall regression — trigram filter is a sound superset
PR #487 — shootout.py codegraph backend (multi-session bench against codegraph 0.7.10)
PR #486 — ACE × codedb integration spec (design only)
PR #483 — v0.2.5815 cross-corpus bench data
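The "sound superset" claim in PR #485 can be illustrated with a literal-substring version of the idea: any file containing the needle must contain every trigram of the needle, so intersecting the trigram postings never drops a true match, and a verification pass over the candidates removes false positives. The sketch below is a simplified Python model with hypothetical file contents; the real change concerns regex queries and codedb's tiered search, which are not modeled here.

```python
def trigrams(text: str) -> set[str]:
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(files: dict[str, str]) -> dict[str, set[str]]:
    """Map each trigram to the set of files that contain it."""
    index: dict[str, set[str]] = {}
    for path, content in files.items():
        for gram in trigrams(content):
            index.setdefault(gram, set()).add(path)
    return index


def search(files, index, needle: str) -> list[str]:
    grams = trigrams(needle)
    if not grams:
        candidates = set(files)   # needle shorter than 3 chars: must scan everything
    else:
        # Every matching file holds all of the needle's trigrams (superset property).
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Verify only the candidates instead of falling through to a full scan.
    return sorted(p for p in candidates if needle in files[p])


files = {
    "a.js": "import { Suspense } from 'react'",
    "b.js": "const [n, setN] = useState(0)",
    "c.js": "suspend(); sense();",
}
index = build_index(files)
print(search(files, index, "Suspense"))
```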

Measured impact (Sonnet 4.6 sub-agents, n=3 each, vs v0.2.5815)

Task	main median calls	exp median calls	Δ
T1 flask "find before_request decorator" (28 chars)	5	4	−1 ✓
T2 regex "where is pattern compiled" (235 chars)	13	7	−6 ✓
T3 react "passive effects flush" (230 chars)	13	10	−3 ✓

9/9 runs across the matrix returned correct answers.

Notarization & verification

All three binaries built locally on Apple Silicon. macOS binaries signed with Developer ID Application: Rachit Pradhan (WWP9DLJ27P) + hardened runtime + secure timestamp, notarized via Apple notary service.

binary	notary submission	gatekeeper
`codedb-darwin-arm64`	`576628b8-4f16-4a09-9e7b-917f51664033` — Accepted	`accepted, source=Notarized Developer ID`
`codedb-darwin-x86_64`	`5f763d62-01c2-4245-9e6c-cc37cceec996` — Accepted	`accepted, source=Notarized Developer ID`
`codedb-linux-x86_64`	n/a (statically linked, ~13 MB)	smoke-tested via emulated docker linux/amd64 — `codedb --version` + tree command both green

Verify the macOS download:

shasum -a 256 codedb-darwin-arm64
# expected: dea15a25a088f3b05d620e7a119377d09703c4e73512e35479819542c6c763c6

spctl -a -vv -t install codedb-darwin-arm64
# expected: accepted, source=Notarized Developer ID

Full SHA-256 list in checksums.sha256.

What's deferred (not blockers)

Critical-review pass from a Sonnet 4.6 sub-agent identified 11 issues. The 2 P1 (security) and 2 P2 (correctness) issues are closed in this release. P2/P3 follow-ups for the next cycle:

I04 schema_version parsed but not validated (cosmetic — only matters at format v2)
I05 reader.md not cached across calls (~0.1 ms per call; matters at scale)
I06 codedb_status doesn't surface reader.md state (small ergonomic gap)
I09 stale hint doesn't include the previous source_files list
I10 concurrent-write last-write-wins not documented
I11 cost-benefit gate for shallow workloads (partial — task-length gate handles the codedb_context side)

Full changelog: v0.2.5815…v0.2.5817
