Skill audit: freshness, pagination, graph-coverage honesty + vec_cache by denfry · Pull Request #6 · denfry/codebase-index

denfry · 2026-06-08T04:28:39Z

Summary

Acts on a skill/CLI audit of codebase-index, plus folds in two in-progress workstreams (embedding cache + release-sync). Each change ships with tests; full suite passes at 80.9% coverage and mypy src/codebase_index is clean.

Audit fixes

fix(skill) (eccfcac): explain now honors the index-freshness contract (passes root/config into the pipeline; blends vectors when embeddings are on). Added doctor to the cbx whitelist. Narrowed the skill allowed-tools from Bash(python *) to Bash(python -m codebase_index *). Documented intent/mode/pagination, --mode vector, and graph --open (human-only).
feat(search) (ace0f9b): expose --offset so the CLI/skill can actually page — the pipeline & MCP already supported it, but the CLI never surfaced the flag (every call silently returned page one). Markdown notes when more results exist.
feat(graph) (3d86326): refs/impact now report a coverage block. Import/inheritance edges are only extracted for Tier-A languages, so an empty/short result for a Tier-B language (e.g. Lua) is inconclusive, not authoritative — coverage.partial flags it so agents fall back to Grep.
feat(diagnostics) (9262ced): stats tags each language graph: full|partial; doctor adds a graph_coverage finding. Same honesty signal, surfaced repo-wide and upfront.

Folded-in workstreams

feat(embeddings) (19df9d4): content-addressed vec_cache keyed by (model, content_sha) — chunk ids churn on every rebuild, so this stops re-embedding unchanged content (only cache misses hit the backend).
chore(release) (e92afe4): stamp installed skill .skill_version copies to 1.2.2.
fix(types) (e1c0850): assert the _PARSE_CONFIG worker-global invariant to restore a clean mypy (preexisting latent error, unmasked by cache invalidation).

Notes

The skill contract (SKILL.md) is kept byte-identical across the authored copy, the packaged template, the three installed copies, and the plugin copy (parity tests enforce this).
Goldens for refs/impact/stats were regenerated deliberately (new coverage/graph fields; Tier-A = full/non-partial).
Preexisting ruff format drift in a few files (cli.py, pipeline.py, repo.py, doctor.py, markdown.py) was left untouched to keep the diff focused; new code in this PR is format-clean.

Known failing test (preexisting, unrelated)

tests/test_bootstrap.py::test_cold_run_installs_from_requirements_lock fails on Windows due to a path-separator assertion (root/requirements.lock vs root\). Present on main; not touched by this PR.
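The separator mismatch described above can be reproduced with the pure path classes in `pathlib`, which behave the same on any host OS. This is only a sketch of the general pitfall: the variable names are hypothetical, and the actual assertion in `tests/test_bootstrap.py` is not shown in this PR.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The same logical path rendered with each platform's separator.
posix_lock = PurePosixPath("root") / "requirements.lock"
windows_lock = PureWindowsPath("root") / "requirements.lock"

# Comparing rendered strings is what makes an assertion platform-dependent.
strings_match = str(posix_lock) == str(windows_lock)

# Two separator-agnostic alternatives: normalise to forward slashes,
# or compare path objects instead of strings.
normalised_match = windows_lock.as_posix() == str(posix_lock)
object_match = PureWindowsPath("root/requirements.lock") == windows_lock
```

Here `strings_match` is false while the other two comparisons are true, which is why an assertion written against `root/requirements.lock` as a string passes on POSIX and fails on Windows.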

🤖 Generated with Claude Code

The 1.2.2 version bump (e8714a2) left .claude-plugin/plugin.json at 1.2.1 and requirements.lock pinned to the v1.2.0 release tarball, breaking the version-consistency contract enforced by test_plugin_manifest.py.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ools

explain() now passes root/config into the retrieval pipeline so the index freshness block (stale / files_changed_since_build) is real instead of a hardcoded "fresh" fallback. Until now the skill freshness check silently never fired for "how does X work" questions. explain also blends vectors when embeddings are enabled, matching search --mode hybrid.

Skill (all targets: claude/codex/opencode + plugin skills/ + bin/ wrappers):

- add `doctor` to the cbx whitelist (the skill fallback already invokes it)
- narrow allowed-tools python to `-m codebase_index` so the skill cannot run arbitrary Python
- document the --mode vector path and the intent/mode/pagination response fields, and clarify that graph --open is a human-facing HTML view (use impact/refs for agent-readable dependency answers)

Regenerated the three installed skill copies via skill-update so they match the authored skill/ and wheel-bundled skill_template/ sources.

Tests: regression test that explain reports staleness after an edit; update the packaging whitelist assertion for the new `doctor` entry.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The retrieval pipeline and MCP server already supported result paging (offset -> pagination.next_offset), but the CLI `search` command never surfaced an --offset flag. Every invocation silently returned page one and the advertised `pagination.next_offset` was a dead end for the skill.

- Add `--offset` to `search` (rejects negative values with exit 2).
- Surface a "more available — --offset N" note in markdown output.
- Update SKILL.md (authored + packaged template + the three installed copies) to document paging via --offset, replacing the stale "CLI search is single-page" guidance.
- Regression tests at the CLI layer for paging and negative-offset rejection.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
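A minimal sketch of how a caller might follow `pagination.next_offset` until the results are exhausted. The response shape (a `results` list plus a `pagination` object whose `next_offset` is absent or `None` on the last page) is an assumption inferred from the field names in this commit message, and `iter_results` and `fake_fetch` are hypothetical helpers, not part of codebase-index.

```python
from typing import Any, Callable, Dict, Iterator

Page = Dict[str, Any]  # assumed shape: {"results": [...], "pagination": {"next_offset": int | None}}


def iter_results(fetch_page: Callable[[int], Page], max_pages: int = 10) -> Iterator[dict]:
    """Yield results page by page, following pagination.next_offset."""
    offset = 0
    for _ in range(max_pages):  # hard cap so a misbehaving backend cannot loop forever
        page = fetch_page(offset)
        yield from page["results"]
        next_offset = page.get("pagination", {}).get("next_offset")
        if next_offset is None:  # assumption: no next_offset means this was the last page
            return
        offset = next_offset


def fake_fetch(offset: int, _data=tuple(range(5)), _size: int = 2) -> Page:
    """Stand-in for a `search --offset N` call, serving five items two at a time."""
    if offset < 0:
        raise ValueError("offset must be non-negative")  # mirrors the CLI rejecting negatives
    end = offset + _size
    return {
        "results": [{"id": i} for i in _data[offset:end]],
        "pagination": {"next_offset": end if end < len(_data) else None},
    }


all_ids = [r["id"] for r in iter_results(fake_fetch)]
```

In real use, `fetch_page` would wrap one `search` invocation with the given offset; the page cap keeps an agent from paging indefinitely on a very broad query.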

Import/inheritance edges are only extracted for the hand-tuned (Tier-A) languages. A symbol or file in a Tier-B language (generic tree-sitter walk, e.g. Lua) yields symbols and best-effort call sites but no import/extends/implements edges, so `refs`/`impact` can silently undercount. An empty result read as "nothing references this" is a footgun for an agent.

- Add a `GraphCoverage` model (`partial`, `languages`, `reason`) and attach it to RefsResponse / ImpactResponse. `for_paths` classifies the defining language(s) of a symbol or target: Tier-B (tree-sitter routed, no LangSpec) -> partial.
- refs_lookup judges coverage by the language of the symbol's definition; impact_lookup by the language of the resolved target file(s).
- Markdown output prints a "Partial graph coverage" warning (including on the empty-result path, where it matters most).
- Document the `coverage` field in SKILL.md (authored + packaged template + the four installed/plugin copies); regenerate the refs/impact goldens (Tier-A Python = partial:false).
- Regression test over a mixed Lua/Python repo asserts partial for the Tier-B symbol/file and full coverage for the Tier-A one.

Also syncs the plugin skill copy with the prior --offset doc change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Extends the Tier-B graph-coverage honesty from per-query (refs/impact `coverage`) to a repo-wide, upfront signal.

- `stats`: each tree-sitter language now carries `graph: full|partial` (`full` = Tier-A spec with import/inheritance edges; `partial` = Tier-B, symbols only). Human output appends "· partial graph (Tier-B)".
- `doctor`: new informational `graph_coverage` finding listing Tier-B languages present in the index, so refs/impact undercounting is visible during diagnostics rather than only when an answer comes back empty.
- Add `languages.has_full_graph(lang)` as the single source of truth for the Tier-A/Tier-B distinction, shared by stats and doctor.
- Document the field in SKILL.md (all copies); regenerate the stats golden.
- Regression tests: Tier-B (Lua) index flags partial; Tier-A-only is full.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
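The coverage signal from the two commit messages above could be sketched roughly as follows. Only the names `GraphCoverage`, `for_paths`, `has_full_graph` and the fields `partial`, `languages`, `reason` come from this PR. The Tier-A set, the extension map, the meaning of `languages` (here: the Tier-B languages involved) and the reason text are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Iterable, Optional, Tuple

# Assumption: placeholder tables; the real project keeps its own language registry.
TIER_A = {"python"}
EXT_TO_LANG = {".py": "python", ".lua": "lua"}


def has_full_graph(lang: str) -> bool:
    """True when import/inheritance edges are extracted for this language (Tier-A)."""
    return lang in TIER_A


@dataclass(frozen=True)
class GraphCoverage:
    partial: bool
    languages: Tuple[str, ...] = ()
    reason: Optional[str] = None

    @classmethod
    def for_paths(cls, paths: Iterable[str]) -> "GraphCoverage":
        # Map each defining file to a language, ignoring unknown extensions.
        langs = {EXT_TO_LANG.get(PurePosixPath(p).suffix) for p in paths}
        tier_b = tuple(sorted(l for l in langs if l and not has_full_graph(l)))
        if tier_b:
            return cls(True, tier_b, "import/inheritance edges are not extracted for these languages")
        return cls(False)


def interpret_empty_refs(coverage: GraphCoverage) -> str:
    """How an agent might treat an empty refs result, given the coverage block."""
    return "inconclusive: fall back to text search" if coverage.partial else "no references found"
```

Keeping the tier decision in a single predicate means per-query coverage, `stats` and `doctor` cannot disagree about which languages are partial.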

…changed chunks

Chunk ids churn on every full rebuild (replace_chunks), so a chunk-id-keyed skip alone re-embeds the entire repo each time. The embedding pass now hashes each chunk's content (sha256) and consults a `vec_cache` table keyed by (model, content_sha): only text never embedded under the active model hits the (potentially slow or paid) backend; unchanged content reuses its cached vector.

- New `vec_cache` table + repo helpers (cached_embeddings, store_cached_embeddings, upsert_chunk_vector_blob); orphan vectors pruned in a single batched executemany.
- `_embed_chunks` reports cache misses (vectors actually computed) as its count.
- Schema/pipeline docs updated for the cache table and reuse flow.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
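A minimal sketch of the content-addressed cache idea, using SQLite and SHA-256 from the standard library. The table name and the (model, content_sha) key come from the commit message; the column layout, the `embed_with_cache` helper and the fake backend are assumptions for illustration and do not reproduce the project's actual repo helpers.

```python
import hashlib
import sqlite3
from typing import Callable, Dict, List, Sequence, Tuple


def content_sha(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def embed_with_cache(
    conn: sqlite3.Connection,
    model: str,
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[bytes]],
) -> Tuple[List[bytes], int]:
    """Return one vector blob per text plus the number of cache misses."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS vec_cache ("
        "model TEXT NOT NULL, content_sha TEXT NOT NULL, vector BLOB NOT NULL, "
        "PRIMARY KEY (model, content_sha))"
    )
    shas = [content_sha(t) for t in texts]

    # Look up every distinct hash under the active model.
    cached: Dict[str, bytes] = {}
    for sha in set(shas):
        row = conn.execute(
            "SELECT vector FROM vec_cache WHERE model = ? AND content_sha = ?", (model, sha)
        ).fetchone()
        if row is not None:
            cached[sha] = row[0]

    # Only content never embedded under this model reaches the backend (deduplicated).
    misses: Dict[str, str] = {}
    for sha, text in zip(shas, texts):
        if sha not in cached:
            misses.setdefault(sha, text)
    if misses:
        vectors = embed_batch(list(misses.values()))
        rows = [(model, sha, vec) for sha, vec in zip(misses, vectors)]
        conn.executemany("INSERT OR REPLACE INTO vec_cache VALUES (?, ?, ?)", rows)
        conn.commit()
        cached.update({sha: vec for _, sha, vec in rows})

    return [cached[sha] for sha in shas], len(misses)


# Usage with an in-memory database and a fake backend that records its inputs.
backend_calls: List[List[str]] = []


def fake_backend(batch: List[str]) -> List[bytes]:
    backend_calls.append(list(batch))
    return [t.upper().encode("utf-8") for t in batch]


conn = sqlite3.connect(":memory:")
first_vectors, first_misses = embed_with_cache(conn, "model-a", ["a", "b", "a"], fake_backend)
second_vectors, second_misses = embed_with_cache(conn, "model-a", ["a", "c"], fake_backend)
```

Because the key includes the model name, switching models naturally invalidates nothing and reuses nothing: each model builds its own entries for the same content hash.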

The package is at 1.2.2 (pyproject/__init__), but the installed skill `.skill_version` stamps still read 1.2.1. Sync them so the auto-update check doesn't see phantom drift.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

`_parse_one` runs in ProcessPoolExecutor workers and reads the module global `_PARSE_CONFIG` (typed `Optional[Config]`, set by the pool initializer `_pool_init`). Passing it straight to `_parse`, which expects `Config`, tripped mypy (`Config | None` vs `Config`). The global is always set before any worker parses, so assert that invariant: this documents the contract and satisfies the type checker. Restores a clean `mypy src/codebase_index`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

denfry and others added 8 commits June 7, 2026 16:33

denfry merged commit 9213506 into main Jun 8, 2026
1 of 2 checks passed

denfry deleted the fix/skill-audit-contract branch June 8, 2026 04:50
