Skill audit: freshness, pagination, graph-coverage honesty + vec_cache #6

Merged
denfry merged 8 commits into main from fix/skill-audit-contract on Jun 8, 2026

denfry (Owner) commented Jun 8, 2026

Summary

Acts on a skill/CLI audit of codebase-index, plus folds in two in-progress workstreams (embedding cache + release-sync). Each change ships with tests; full suite passes at 80.9% coverage and mypy src/codebase_index is clean.

Audit fixes

  • fix(skill) (eccfcac): explain now honors the index-freshness contract (passes root/config into the pipeline; blends vectors when embeddings are on). Added doctor to the cbx whitelist. Narrowed the skill allowed-tools from Bash(python *) to Bash(python -m codebase_index *). Documented intent/mode/pagination, --mode vector, and graph --open (human-only).
  • feat(search) (ace0f9b): expose --offset so the CLI/skill can actually page — the pipeline & MCP already supported it, but the CLI never surfaced the flag (every call silently returned page one). Markdown notes when more results exist.
  • feat(graph) (3d86326): refs/impact now report a coverage block. Import/inheritance edges are only extracted for Tier-A languages, so an empty/short result for a Tier-B language (e.g. Lua) is inconclusive, not authoritative; coverage.partial flags it so agents fall back to Grep.
  • feat(diagnostics) (9262ced): stats tags each language graph: full|partial; doctor adds a graph_coverage finding. Same honesty signal, surfaced repo-wide and upfront.

Folded-in workstreams

  • feat(embeddings) (19df9d4): content-addressed vec_cache keyed by (model, content_sha) — chunk ids churn on every rebuild, so this stops re-embedding unchanged content (only cache misses hit the backend).
  • chore(release) (e92afe4): stamp installed skill .skill_version copies to 1.2.2.
  • fix(types) (e1c0850): assert the _PARSE_CONFIG worker-global invariant to restore a clean mypy (preexisting latent error, unmasked by cache invalidation).

Notes

  • The skill contract (SKILL.md) is kept byte-identical across the authored copy, the packaged template, the three installed copies, and the plugin copy (parity tests enforce this).
  • Goldens for refs/impact/stats were regenerated deliberately (new coverage/graph fields; Tier-A = full/non-partial).
  • Preexisting ruff format drift in a few files (cli.py, pipeline.py, repo.py, doctor.py, markdown.py) was left untouched to keep the diff focused; new code in this PR is format-clean.

Known failing test (preexisting, unrelated)

  • tests/test_bootstrap.py::test_cold_run_installs_from_requirements_lock fails on Windows due to a path-separator assertion (root/requirements.lock vs root\). Present on main; not touched by this PR.

🤖 Generated with Claude Code

denfry and others added 8 commits June 7, 2026 16:33
The 1.2.2 version bump (e8714a2) left .claude-plugin/plugin.json at 1.2.1
and requirements.lock pinned to the v1.2.0 release tarball, breaking the
version-consistency contract enforced by test_plugin_manifest.py.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ools

explain() now passes root/config into the retrieval pipeline so the index
freshness block (stale / files_changed_since_build) is real instead of a
hardcoded "fresh" fallback — the skill freshness check silently never fired
for "how does X work" questions. explain also blends vectors when embeddings
are enabled, matching search --mode hybrid.

Skill (all targets: claude/codex/opencode + plugin skills/ + bin/ wrappers):
- add `doctor` to the cbx whitelist (the skill fallback already invokes it)
- narrow allowed-tools python to `-m codebase_index` so the skill cannot run
  arbitrary Python
- document the --mode vector path, the intent/mode/pagination response fields,
  and clarify graph --open is a human-facing HTML view (use impact/refs for
  agent-readable dependency answers)

Regenerated the three installed skill copies via skill-update so they match
the authored skill/ and wheel-bundled skill_template/ sources.

Tests: regression test that explain reports staleness after an edit; update
the packaging whitelist assertion for the new `doctor` entry.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
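
To make the freshness block described above concrete, here is a minimal sketch of how `stale` and `files_changed_since_build` could be derived. It assumes a plain mtime comparison against the index build time; the function name `freshness_block` and the `built_at` parameter are hypothetical, and the real pipeline may detect changes differently (for example by content hash).

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def freshness_block(root: Path, built_at: float) -> dict[str, object]:
    """Count files under root modified after built_at (epoch seconds).

    Assumption: mtime is a good enough change signal for this sketch.
    """
    changed = sum(
        1
        for path in root.rglob("*")
        if path.is_file() and path.stat().st_mtime > built_at
    )
    return {"stale": changed > 0, "files_changed_since_build": changed}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "module.py"
        source.write_text("x = 1\n")
        os.utime(source, (3000, 3000))  # pretend the edit happened at t=3000
        print(freshness_block(Path(tmp), built_at=2000.0))
```

The point of the fix is that a block like this is only meaningful when the caller actually supplies the repository root; without it the only option is a hardcoded "fresh" answer.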
The retrieval pipeline and MCP server already supported result paging
(offset -> pagination.next_offset), but the CLI `search` command never
surfaced an --offset flag. Every invocation silently returned page one and
the advertised `pagination.next_offset` was a dead end for the skill.

- Add `--offset` to `search` (rejects negative values with exit 2).
- Surface a "more available — --offset N" note in markdown output.
- Update SKILL.md (authored + packaged template + the three installed
  copies) to document paging via --offset, replacing the stale
  "CLI search is single-page" guidance.
- Regression tests at the CLI layer for paging and negative-offset rejection.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
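
The paging contract above can be sketched as follows. This is an illustration only: it assumes an `argparse`-style parser (the project's CLI framework is not stated here), and `non_negative_offset`, `build_parser`, `paginate`, and the `--limit` flag are hypothetical names. `argparse` exits with status 2 on a rejected argument, which matches the behaviour described for negative offsets.

```python
from __future__ import annotations

import argparse
from typing import Any, Sequence


def non_negative_offset(raw: str) -> int:
    """argparse type hook: accept only integers >= 0."""
    value = int(raw)  # a ValueError here becomes an argparse usage error
    if value < 0:
        raise argparse.ArgumentTypeError("--offset must be >= 0")
    return value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="search-sketch")
    parser.add_argument("query")
    parser.add_argument("--offset", type=non_negative_offset, default=0)
    parser.add_argument("--limit", type=int, default=10)  # hypothetical flag
    return parser


def paginate(results: Sequence[Any], offset: int, limit: int) -> dict[str, Any]:
    """Slice one page and report where the next page starts, if any."""
    page = list(results[offset:offset + limit])
    end = offset + len(page)
    next_offset = end if end < len(results) else None
    return {
        "results": page,
        "pagination": {"offset": offset, "next_offset": next_offset},
    }
```

A caller keeps requesting pages with `--offset` set to the previous `pagination.next_offset` until it comes back as `None`.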
Import/inheritance edges are only extracted for the hand-tuned (Tier-A)
languages. A symbol or file in a Tier-B language (generic tree-sitter walk,
e.g. Lua) yields symbols and best-effort call sites but no import/extends/
implements edges, so `refs`/`impact` can silently undercount — an empty
result read as "nothing references this" is a footgun for an agent.

- Add a `GraphCoverage` model (`partial`, `languages`, `reason`) and attach it
  to RefsResponse / ImpactResponse. `for_paths` classifies a symbol/target's
  defining language(s): Tier-B (tree-sitter routed, no LangSpec) -> partial.
- refs_lookup judges coverage by the symbol's definition language; impact_lookup
  by the language(s) of the resolved target file(s).
- Markdown output prints a "Partial graph coverage" warning (including on the
  empty-result path, where it matters most).
- Document the `coverage` field in SKILL.md (authored + packaged template + the
  four installed/plugin copies); regenerate the refs/impact goldens (Tier-A
  Python = partial:false).
- Regression test over a mixed Lua/Python repo asserts partial for the Tier-B
  symbol/file and full coverage for the Tier-A one.

Also syncs the plugin skill copy with the prior --offset doc change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
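
As a rough illustration of the coverage model described above, the sketch below uses a plain dataclass with the three named fields and a `for_paths` classifier. The extension table, the Tier-A set, the reason text, and the choice to list only the Tier-B languages in `languages` are all assumptions made for the example; the real model and language routing may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Iterable, Optional

# Hypothetical tables, for illustration only.
EXTENSION_TO_LANGUAGE = {".py": "python", ".lua": "lua"}
TIER_A_LANGUAGES = frozenset({"python"})


@dataclass(frozen=True)
class GraphCoverage:
    partial: bool
    languages: list[str] = field(default_factory=list)
    reason: Optional[str] = None

    @classmethod
    def for_paths(cls, paths: Iterable[str]) -> "GraphCoverage":
        """Flag coverage as partial if any defining file is Tier-B."""
        langs = {EXTENSION_TO_LANGUAGE.get(PurePosixPath(p).suffix) for p in paths}
        tier_b = sorted(
            lang for lang in langs
            if lang is not None and lang not in TIER_A_LANGUAGES
        )
        if not tier_b:
            return cls(partial=False)
        return cls(
            partial=True,
            languages=tier_b,
            reason="import/inheritance edges are not extracted for these languages",
        )
```

An agent reading a response would treat an empty `refs` result as conclusive only when `coverage.partial` is false, and fall back to a text search otherwise.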
Extends the Tier-B graph-coverage honesty from per-query (refs/impact
`coverage`) to a repo-wide, upfront signal.

- `stats`: each tree-sitter language now carries `graph: full|partial`
  (`full` = Tier-A spec with import/inheritance edges; `partial` = Tier-B,
  symbols only). Human output appends "· partial graph (Tier-B)".
- `doctor`: new informational `graph_coverage` finding listing Tier-B
  languages present in the index, so refs/impact undercounting is visible
  during diagnostics rather than only when an answer comes back empty.
- Add `languages.has_full_graph(lang)` as the single source of truth for the
  Tier-A/Tier-B distinction, shared by stats and doctor.
- Document the field in SKILL.md (all copies); regenerate the stats golden.
- Regression tests: Tier-B (Lua) index flags partial; Tier-A-only is full.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
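
The value of a single `has_full_graph` predicate is that `stats` and `doctor` cannot drift apart. A minimal sketch of that arrangement follows, assuming a hypothetical Tier-A set and hypothetical output shapes (`stats_languages`, `graph_coverage_finding`, and the `severity` key are illustrative, not the project's actual schema).

```python
from __future__ import annotations

from typing import Iterable, Optional

# Hypothetical Tier-A set; the real list lives in the project's language specs.
TIER_A_LANGUAGES = frozenset({"python", "typescript"})


def has_full_graph(lang: str) -> bool:
    """Single source of truth for the Tier-A / Tier-B distinction."""
    return lang in TIER_A_LANGUAGES


def stats_languages(file_counts: dict[str, int]) -> dict[str, dict[str, object]]:
    """Tag each indexed language with graph: full|partial."""
    return {
        lang: {"files": count, "graph": "full" if has_full_graph(lang) else "partial"}
        for lang, count in file_counts.items()
    }


def graph_coverage_finding(indexed: Iterable[str]) -> Optional[dict[str, object]]:
    """Informational finding listing Tier-B languages, or None if there are none."""
    tier_b = sorted(lang for lang in set(indexed) if not has_full_graph(lang))
    if not tier_b:
        return None
    return {"id": "graph_coverage", "severity": "info", "languages": tier_b}
```

Both consumers call the same predicate, so promoting a language to Tier-A changes the `stats` tag and the `doctor` finding in one place.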
…changed chunks

Chunk ids churn on every full rebuild (replace_chunks), so a chunk-id-keyed
skip alone re-embeds the entire repo each time. The embedding pass now hashes
each chunk's content (sha256) and consults a `vec_cache` table keyed by
(model, content_sha): only text never embedded under the active model hits the
(potentially slow or paid) backend; unchanged content reuses its cached vector.

- New `vec_cache` table + repo helpers (cached_embeddings, store_cached_embeddings,
  upsert_chunk_vector_blob); orphan vectors pruned in a single batched executemany.
- `_embed_chunks` reports cache misses (vectors actually computed) as its count.
- Schema/pipeline docs updated for the cache table and reuse flow.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
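
The reuse flow above can be sketched with an in-memory SQLite table. This is a simplified illustration, not the project's schema or helpers: `open_cache`, `embed_with_cache`, the column names, and the `backend` callable (which returns one serialized vector per input text) are assumptions made for the example.

```python
from __future__ import annotations

import hashlib
import sqlite3
from typing import Callable, Sequence


def open_cache() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE vec_cache ("
        " model TEXT NOT NULL,"
        " content_sha TEXT NOT NULL,"
        " vector BLOB NOT NULL,"
        " PRIMARY KEY (model, content_sha))"
    )
    return conn


def embed_with_cache(
    conn: sqlite3.Connection,
    model: str,
    texts: Sequence[str],
    backend: Callable[[list[str]], list[bytes]],
) -> tuple[list[bytes], int]:
    """Return one vector per text plus the number of cache misses."""
    shas = [hashlib.sha256(t.encode("utf-8")).hexdigest() for t in texts]
    cached: dict[str, bytes] = {}
    for sha in set(shas):
        row = conn.execute(
            "SELECT vector FROM vec_cache WHERE model = ? AND content_sha = ?",
            (model, sha),
        ).fetchone()
        if row is not None:
            cached[sha] = row[0]
    # Deduplicate misses so identical content is embedded only once.
    missing: dict[str, str] = {}
    for sha, text in zip(shas, texts):
        if sha not in cached and sha not in missing:
            missing[sha] = text
    if missing:
        vectors = backend(list(missing.values()))
        conn.executemany(
            "INSERT OR REPLACE INTO vec_cache (model, content_sha, vector)"
            " VALUES (?, ?, ?)",
            [(model, sha, vec) for sha, vec in zip(missing, vectors)],
        )
        cached.update(zip(missing, vectors))
    return [cached[sha] for sha in shas], len(missing)
```

Keying on the content hash rather than the chunk id is what lets a full rebuild reuse vectors, and including the model in the key keeps vectors from different embedding models from being mixed.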
The package is at 1.2.2 (pyproject/__init__), but the installed skill
`.skill_version` stamps still read 1.2.1. Sync them so the auto-update check
doesn't see phantom drift.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
`_parse_one` runs in ProcessPoolExecutor workers and reads the module global
`_PARSE_CONFIG` (typed `Optional[Config]`, set by the pool initializer
`_pool_init`). Passing it straight to `_parse`, which expects `Config`, tripped
mypy (`Config | None` vs `Config`). The global is always set before any worker
parses, so assert that invariant — documents the contract and satisfies the
type checker. Restores a clean `mypy src/codebase_index`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
denfry merged commit 9213506 into main on Jun 8, 2026
1 of 2 checks passed
denfry deleted the fix/skill-audit-contract branch on Jun 8, 2026 04:50