
fix(analyze): re-dispatch a plugin's files when its tag schema moves (clarion-e12d424f1d)#71

Merged
tachyon-beep merged 7 commits into
mainfrom
fix/incremental-reanalyze-ontology-marker
Jun 25, 2026

@tachyon-beep


Bug (clarion-e12d424f1d, P1)

Incremental analyze skipped a file purely on its whole-file content hash, with
no plugin/ontology-version component, and the plugin-declared
ontology_version was never persisted (handshake-only). entity_tags rows
are rewritten (DELETE+INSERT) only when a file is dispatched.

So after an operator upgrades a plugin whose emitted vocabulary changed (e.g. the
ADR-053/054 reachability-root tags), every unchanged file is silently skipped
and keeps its pre-upgrade rows — which carry no root tags. The MCP dead-code
survey's per-plugin honest-empty guard (plugins_with_root_tags) is
all-or-nothing: it fires only when a plugin emits zero roots index-wide. On
the realistic upgrade + small edit path the one changed file emits a few roots,
so the guard disengages and the unchanged public surface plus its transitive
callees are reported as dead code with unwarranted confidence: the exact
false-positive-dead asymmetry the feature exists to prevent. This bites in the
default RootsMode::Explicit.
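
A minimal sketch of why an all-or-nothing guard cannot see partial coverage. The `Entity` struct and `guard_engaged` function are hypothetical stand-ins for illustration, not the survey's real types; only the "zero roots index-wide" rule is taken from the description above.

```rust
/// Hypothetical model of an indexed entity: its owning plugin and whether
/// its stored `entity_tags` rows include any reachability-root tag.
pub struct Entity {
    pub plugin: &'static str,
    pub has_root_tag: bool,
}

/// All-or-nothing guard: engaged (survey withheld as honest-empty) only when
/// the plugin has zero root tags across the whole index.
pub fn guard_engaged(index: &[Entity], plugin: &str) -> bool {
    !index.iter().any(|e| e.plugin == plugin && e.has_root_tag)
}

/// Builds a three-entity index for one plugin where only the first
/// `retagged` entities carry root tags (the rest are pre-upgrade rows).
pub fn sample_index(retagged: usize) -> Vec<Entity> {
    (0..3)
        .map(|i| Entity { plugin: "rust", has_root_tag: i < retagged })
        .collect()
}
```

With `sample_index(0)` the guard stays engaged; with `sample_index(1)` a single re-tagged entity disengages it even though the other two rows are still stale, which is the upgrade + small edit path described above.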

Fix (the ticket's preferred option)

Persist a per-plugin (version, ontology_version) marker
(plugin_index_meta, migration 0011) and force a full re-dispatch of a
plugin's files when either component moves (or no marker exists) — even on an
incremental run.

  • In-memory partition override only — the stored per-file hashes (keyed on
    the core file entity, not per language plugin) are never mutated.
  • Marker is rewritten in the same transaction as the prior-index snapshot,
    and only on a fully-successful run, so it can never advance past an index
    the new vocabulary was actually written into (no marker-vs-tags skew on crash).
  • The comparison keys on the pair because ontology_version is bumped but not
    gate-enforced for the Rust plugin; [plugin].version bumps on any release and
    is the conservative backstop.
  • An index with no marker (first run after this ships) re-dispatches once,
    healing any skew an earlier silent upgrade already left.
  • The MCP guard is left untouched: it structurally cannot detect partial
    coverage; this fix restores the index consistency its all-or-nothing assumption
    depends on. Heals both Explicit and Auto roots modes (the fix is upstream
    of the survey).
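
The decision rule in the bullets above can be sketched as follows. This is an illustration under assumptions, not the code in this PR: `PluginMarker`, `must_force_full`, and `files_to_dispatch` are hypothetical names, and the stored markers are modelled as an in-memory map rather than the `plugin_index_meta` table.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one stored marker row.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PluginMarker {
    pub version: String,
    pub ontology_version: String,
}

/// True when every file owned by `plugin` must be re-dispatched regardless
/// of content hash: no marker stored yet, or either component moved.
pub fn must_force_full(
    stored: &HashMap<String, PluginMarker>,
    plugin: &str,
    declared: &PluginMarker,
) -> bool {
    match stored.get(plugin) {
        None => true,
        Some(prev) => prev != declared,
    }
}

/// In-memory partition override: each file is `(path, hash_unchanged)`.
/// The stored hashes are only read here, never rewritten.
pub fn files_to_dispatch<'a>(files: &'a [(String, bool)], force_full: bool) -> Vec<&'a str> {
    files
        .iter()
        .filter(|(_, unchanged)| force_full || !*unchanged)
        .map(|(path, _)| path.as_str())
        .collect()
}
```

Comparing the whole pair with `!=` means a `version`-only bump also forces a re-dispatch, matching the "conservative backstop" intent. Once the declared marker has been stored, `must_force_full` returns `false` and the hash-based skip applies again.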

Tests

Integration test (analyze_ontology_bump_forces_full_reanalysis) bumps
ontology_version (then version) between byte-identical runs and asserts a full
re-analyse, and that the force-full is a one-shot per marker change (marker
persisted, byte-hash skip re-engages — guards against permanently disabling
incremental analysis). Storage round-trip + atomic-commit + per-plugin-upsert
unit tests + a migration-0011 table-shape test.

Test boundary (reasoned): the fixture plugin can't emit tags, so the tests
verify the root cause (re-dispatch on marker move) not the harm directly. The
harm follows by a verified chain: re-dispatch → entity_tags DELETE+INSERT
(writer.rs:750) → fresh root tags → survey correct.

Verification

  • Full Rust CI floor green: fmt, clippy -D warnings, build, nextest 1902
    passed, doc -D warnings, cargo deny.
  • All 7 scripts/check-*.py lockstep gates pass, incl. the active
    migration-retirement guard (0011 additive, 0001 untouched).

🤖 Generated with Claude Code

tachyon-beep and others added 7 commits June 24, 2026 11:39
Add docs/operator/language-support.md — a side-by-side of what each language
plugin extracts and tags: entity kinds, structural + relation edge kinds,
categorisation/reachability-root tags, resolver backend, Wardline-awareness, and
which tools work per language. Makes explicit that Python emits dead-code
reachability roots (incl. the no-__all__ `public-surface` heuristic, ADR-053)
while the Rust plugin emits no categorisation tags today, so `entity_dead_list`
is signal-unavailable on a pure-Rust index (tracked in clarion-05fdd0490e).

- rust-known-limitations.md: enrich the pure-Rust dead-code section with the
  Python contrast + the Rust root model, fix the stale ticket ref
  (e1899a109f → 05fdd0490e), link the matrix.
- getting-started.md + operator/README.md: link the new matrix (and
  rust-known-limitations from the index).
- roadmap.md: point the Rust categorisation-tag item at the superseding ticket.
- CLAUDE.md: note the plugins differ in coverage; point at the matrix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…gold complete

- PDR-0003: Now bet = extract loomweave-llm from loomweave-core (clarion-141e9c08c8)
- PDR-0004: accept the 1.1.0 / Rust-plugin-gold bet as complete (PDR-0002 gate satisfied)
- roadmap: promote extraction Next->Now; bank shipped 1.1-1.3 work out of horizons
- metrics: collision-families 4->0 TARGET MET; add trust-surface guardrail; tools/list NEEDS RE-CHECK
- vision: grant Last reviewed 2026-06-24 (confirmed unchanged)
- dispatch artifacts: PRD-0001 + docs/plans/2026-06-24-loomweave-llm-extraction.md

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Rust plugin emitted zero categorisation tags, so dead-code analysis was
signal-unavailable on a pure-Rust index (clarion-05fdd0490e). Increment 1 of
ADR-054 derives the root vocabulary from Rust's explicit semantics:

- exported-api: unrestricted `pub` whose whole module chain is `pub`, in lib
  targets (pub(crate)/restricted excluded; bin targets suppressed via the
  `@bin(...)` module-path discriminator; `#[macro_export]` for macros)
- entry-point: `fn main`, `#[tokio::main]`/runtime-entry attrs, FFI
  `#[no_mangle]`/`#[export_name]`
- test: `#[test]`/`#[bench]`, items under `#[cfg(test)]`
- allow-dead-code: `#[allow(dead_code)]`/`#[expect(dead_code)]` keep-signal
  (new DEAD_CODE_ROOT_TAGS entry)
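
A small lib-target file annotated with the tags the rules above would be expected to produce. The item names are invented for illustration, and the annotations are read off the rules listed in this message, not captured from plugin output.

```rust
pub mod api {
    // Unrestricted `pub`, and every ancestor module is `pub`: exported-api.
    pub fn exported() -> u32 {
        internal() + 1
    }

    // `pub(crate)` is restricted, so it is excluded from exported-api.
    pub(crate) fn internal() -> u32 {
        41
    }
}

mod private {
    // `pub` item, but the enclosing module is private: the module chain is
    // not all-`pub`, so no exported-api tag.
    pub fn hidden() -> u32 {
        7
    }
}

// Keep-signal: allow-dead-code.
#[allow(dead_code)]
fn kept_on_purpose() {}

#[cfg(test)]
mod tests {
    // Under `#[cfg(test)]` and marked `#[test]`: test.
    #[test]
    fn exported_adds_one() {
        assert_eq!(super::api::exported(), 42);
    }
}
```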

Engine: the no-roots envelope + LOW-confidence advisory are now language-aware
(a Rust corpus gets Rust levers, never `__all__`); modules are excluded from
dead-code candidacy (DEAD_CODE_CONTAINER_KINDS — the containment spine is never
"dead", which the dogfood showed would otherwise dominate the candidate set).

Ontology bump 0.5.0 -> 0.6.0 (plugin.toml + wheel-data copy + serve.rs + doc).
TDD throughout; dogfooded on a lib+bin crate (genuine orphan only, moderate
confidence; find_entry_points/find_dead_code light up for Rust; exclusion lifts).

Deferred to increment 2: framework-attribute handlers, `pub use` re-export
resolution, `pub`-method rooting.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…increment 2)

Increment 2 of ADR-054, informed by a 6-agent framework-attribute taxonomy
sweep (precision-first, collision-aware):

- http-route (+framework-handler) — actix/ntex/rocket route attrs (get/post/
  put/patch/delete/head/options/trace/connect/route, last-segment match)
- cli-command (+framework-handler) — clap/structopt derives (Parser/Subcommand/
  Args/ValueEnum/StructOpt, derive-list match)
- entry-point — pyo3 FFI host exports (pyfunction/pyfn/pyclass/pymodule) and
  proc-macro entry points (proc_macro/_derive/_attribute)
- test — std-replacement runners (rstest/test_case/quickcheck)
- pub-method rooting — a pub method of an inherent impl whose module chain is
  pub (lib) is exported-api; trait-impl methods stay unrooted (inherited vis)

Correctness fact the survey caught + verified against shortcuts.rs:
framework-handler is in DEAD_CODE_EXCLUDED_TAGS, NOT _ROOT_TAGS — it excludes
the tagged entity but does not root its callees, so http-route/cli-command are
the real roots and framework-handler rides as the self-exclusion companion
(mirroring Python). FFI exports map to entry-point (a real root) so their
callees are traversed. The catastrophic typetag::serde collision is avoided
(no bare `serde` last-segment match; CLI is derive-gated).

Also excludes `impl` from dead-code candidacy (DEAD_CODE_CONTAINER_KINDS, with
`module`) — an impl block is a container, never actionable "dead code".

Ontology bump 0.6.0 -> 0.7.0 (4 locations + wheel copy). TDD throughout;
dogfooded end-to-end (every framework attribute tags correctly via the real
plugin->host->store pipeline). Floor green (nextest 1894).

Still deferred: pub use re-export resolution, trait-impl-method rooting, and
lower-prevalence frameworks (wasm/napi/uniffi/cxx, poem/salvo handlers, rocket
catch, tarpc/jsonrpsee, argh/gumdrop). Builder-pattern frameworks (axum/warp/
tonic) are a permanent parse-only limitation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Five confirmed findings from the multi-dimension review of increments 1+2:

1. [MAJOR, bug] #[macro_export] is chain-INDEPENDENT — it lifts a macro to the
   crate root regardless of module privacy. The exported-api gate wrongly
   required ancestors_all_pub, so the standard `mod macros { #[macro_export]
   macro_rules! … }` idiom read as dead (under-rooting — the trust-killing
   direction). Fixed via TagCtx::with_export_chain_satisfied(), applied only to
   exported macros; dogfooded end-to-end.
2. [MAJOR, test] the serde catastrophe guard never fed an actual #[serde(...)] —
   half-vacuous. Now pins that a real #[serde(rename_all=…)] yields no root tag.
3. [MAJOR, test] #[export_name] FFI entry-point had zero coverage (only
   #[no_mangle] was tested) — added.
4. [MAJOR, docs] language-support.md intro omitted http-route/cli-command —
   now lists all six emitted tags.
5. [MINOR, docs] pub-method-rooting line lacked the lib/bin qualifier — added.

Floor green (nextest 1897).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Analysed the loomweave workspace (181 files, 4231 entities): entity_dead_list
is available + plausible (357 dead / 2384 ≈ 15%, moderate confidence). Trait-
method deferral vindicated (only 8/357 dead are impl methods). The candidate
set is dominated by private const/struct/enum reached via uncaptured value-
references — a pre-existing reference-extraction + reachability gap surfaced
(not caused) by making Rust analysable, tracked as clarion-a325bab42f. Deferred
root extensions tracked as clarion-cbbd971eb1.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…(clarion-e12d424f1d)

The incremental `analyze` fast path skipped a file purely on its whole-file
content hash, with no plugin/ontology-version component, and the plugin-declared
`ontology_version` was never persisted. So after an operator upgraded a plugin
whose emitted vocabulary changed (e.g. the ADR-053/054 reachability-root tags),
every UNCHANGED file was silently skipped and kept its pre-upgrade `entity_tags`
rows — which carry no root tags. The MCP dead-code survey's per-plugin
honest-empty guard is all-or-nothing (a plugin with >=1 root tag index-wide is
surveyed in full), so the moment one re-edited file re-tags, the guard
disengages and the unchanged public surface plus its transitive callees are
reported as confidently-wrong dead-code — the exact false-positive-dead the
feature exists to prevent.

Fix (the ticket's preferred option): persist a per-plugin
`(version, ontology_version)` marker (`plugin_index_meta`, migration 0011) and,
on the next `analyze`, force a full re-dispatch of that plugin's files when
either component moved (or no marker exists yet) — even on an incremental run.
The marker is rewritten in the SAME transaction as the prior-index snapshot and
only on a fully-successful run, so it can never advance past an index the new
vocabulary was actually written into. The partition override is in-memory only
(the stored per-file hashes — keyed on the core `file` entity, not per language
plugin — are never mutated). An index with no marker re-dispatches once, healing
any skew an earlier silent upgrade left behind. The comparison keys on the pair
because `ontology_version` is not gate-enforced for every plugin; `version`
bumps on any release and is the conservative backstop.

The MCP guard is left untouched: it structurally cannot detect partial coverage,
and this fix restores the index consistency its all-or-nothing assumption needs.

Tests: integration test bumps ontology_version (then version) between
byte-identical runs and asserts a full re-analyse, plus that the force-full is a
one-shot per marker change (the marker is persisted; incremental skip
re-engages). Storage round-trip + atomic-commit + per-plugin-upsert unit tests.
ADR-054 records the new invariant.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@tachyon-beep tachyon-beep merged commit 3d398e4 into main Jun 25, 2026
4 checks passed
@tachyon-beep tachyon-beep deleted the fix/incremental-reanalyze-ontology-marker branch June 25, 2026 10:18

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7e975c97af


&mut child,
root_tags(
&m.sig.ident.to_string(),
is_unrestricted_pub(&m.vis),


P2 Badge Require receiver visibility before rooting methods

In a library module with `struct S; impl S { pub fn unused(&self) {} }`, this passes `true` because the method itself is `pub`, while `ctx` only checks the module chain and bin target. That emits `exported-api` for methods on private or `pub(crate)` receiver types even though they are not externally callable; since `find_dead_code` treats `exported-api` as a reachability root, such private unused methods and their callees get excluded from the dead-code survey. Please also require the implemented type to be exported before passing `is_public` here.
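
One way to express the requested gate, as a sketch only. `Vis` and `method_is_exported_api` are hypothetical names rather than the plugin's API; the point is that the receiver type's visibility joins the existing conditions.

```rust
/// Hypothetical three-way visibility model for illustration.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Vis {
    Private,
    Crate,
    Public,
}

/// A method of an inherent impl counts as exported API only when the method,
/// its receiver type, and the whole module chain are public, in a lib target.
pub fn method_is_exported_api(
    method_vis: Vis,
    receiver_vis: Vis,
    ancestors_all_pub: bool,
    is_bin_target: bool,
) -> bool {
    method_vis == Vis::Public
        && receiver_vis == Vis::Public
        && ancestors_all_pub
        && !is_bin_target
}
```

Under this rule the `struct S; impl S { pub fn unused(&self) {} }` case yields `false`, because `S` is private even though `unused` is `pub`.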


pub fn for_file(module_path: &str) -> Self {
Self {
ancestors_all_pub: true, // the crate root is the public boundary
under_cfg_test: false,


P2 Badge Carry out-of-line module visibility into tag context

For an out-of-line module declared as `mod internal;`, this initializes the file with `ancestors_all_pub = true` even though the declaring module is private. A common layout like `src/lib.rs` containing `mod internal;` and `src/internal.rs` containing `pub fn helper() {}` will therefore tag `helper` as `exported-api`; `find_dead_code` treats that tag as a root, so unused private-module helpers and their callees are hidden from the survey. Please derive the file-root context from the declaring `mod` visibility chain before emitting `exported-api` tags.
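
A sketch of deriving the file-root flag from the declaring chain instead of a constant `true`. `ModDecl` and this `FileCtx::for_file` signature are assumptions for illustration; the real `for_file` shown in the diff context takes a `module_path: &str`, and how the declaring chain would be resolved across files is not shown here.

```rust
/// Hypothetical record of one `mod` declaration on the path from the crate
/// root down to the file being analysed.
pub struct ModDecl {
    pub name: &'static str,
    pub is_pub: bool,
}

/// Reduced tag context holding only the flag discussed here.
pub struct FileCtx {
    pub ancestors_all_pub: bool,
}

impl FileCtx {
    /// The crate root itself has an empty chain, which folds to `true`;
    /// any private declaration along the way makes the flag `false`.
    pub fn for_file(decl_chain: &[ModDecl]) -> Self {
        FileCtx {
            ancestors_all_pub: decl_chain.iter().all(|d| d.is_pub),
        }
    }
}
```

For `src/internal.rs` declared by a private `mod internal;`, the chain is `[ModDecl { name: "internal", is_pub: false }]`, so `helper` would no longer qualify as `exported-api`.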


}
// entry-point: a bare module-level `fn main` (fns only), OR an entry
// attribute (runtime entry / FFI host export / proc-macro) on any item.
if (is_fn && ctx.at_file_top && name == "main") || has_entry_attr(attrs) {


P2 Badge Limit fn main roots to actual crate/bin roots

Because `ctx.at_file_top` is true for every analyzed file root, this tags any top-level function named `main` in `src/lib.rs` or `src/foo.rs` as an `entry-point`, even though only the crate root/binary target `main.rs` is a program entry. In those non-entry files, an otherwise unused private `fn main()` and everything reachable from it become live roots and disappear from `find_dead_code`. Please also check that the current file is an actual crate/bin root before adding this tag.
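
A sketch of a file-level check that could gate the `fn main` rule. It assumes Cargo's default binary layout (`src/main.rs`, `src/bin/<name>.rs`, `src/bin/<name>/main.rs`) and paths relative to the package root; targets declared with an explicit `path` in `Cargo.toml` would need manifest data, which this sketch ignores. Both function names are hypothetical.

```rust
use std::path::Path;

/// True when `rel` (relative to the package root) is a binary crate root
/// under Cargo's default layout.
pub fn is_default_bin_root(rel: &Path) -> bool {
    let parts: Vec<&str> = rel.iter().filter_map(|c| c.to_str()).collect();
    match parts.as_slice() {
        ["src", "main.rs"] => true,
        ["src", "bin", file] => file.ends_with(".rs"),
        ["src", "bin", _dir, "main.rs"] => true,
        _ => false,
    }
}

/// The bare-`main` half of the entry-point rule, with the extra file check.
pub fn main_is_entry_point(is_fn: bool, at_file_top: bool, name: &str, rel: &Path) -> bool {
    is_fn && at_file_top && name == "main" && is_default_bin_root(rel)
}
```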


fn has_entry_attr(attrs: &[Attribute]) -> bool {
attr_last_seg_in(attrs, &["main"])
|| attr_last_seg_in(attrs, PYO3_ENTRY_ATTRS)
|| attr_is_ident_in(attrs, &["no_mangle", "export_name"])


P2 Badge Recognize Rust 2024 unsafe export attributes

Rust 2024 code writes these exports as `#[unsafe(no_mangle)]` or `#[unsafe(export_name = "...")]`; syn represents that as an `unsafe` list, not a bare `no_mangle`/`export_name` path, so this check misses the FFI entry signal. In bin targets where `exported-api` is suppressed, or for private FFI exports, those externally invoked functions will not get `entry-point` and can be reported as dead. Please inspect the nested `unsafe` attribute tokens as well as bare attributes.
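
The shape of the requested check, sketched over a deliberately simplified attribute model. `Meta` below is a hypothetical enum for illustration and is not syn's type; a real implementation would walk syn's parsed attribute instead.

```rust
/// Simplified, hypothetical attribute model (not syn's `Meta`).
pub enum Meta {
    /// `#[no_mangle]`
    Path(&'static str),
    /// `#[export_name = "foo"]`
    NameValue(&'static str, &'static str),
    /// `#[unsafe(no_mangle)]`, `#[derive(...)]`, and other list forms
    List(&'static str, Vec<Meta>),
}

fn names_ffi_export(name: &str) -> bool {
    matches!(name, "no_mangle" | "export_name")
}

/// Accepts the bare forms and the same forms wrapped in `unsafe(...)`.
pub fn is_ffi_export(meta: &Meta) -> bool {
    match meta {
        Meta::Path(name) | Meta::NameValue(name, _) => names_ffi_export(name),
        Meta::List(outer, inner) if *outer == "unsafe" => inner.iter().any(is_ffi_export),
        Meta::List(..) => false,
    }
}
```

Only an `unsafe` wrapper is unwrapped, so an unrelated list attribute that happens to contain `no_mangle` as an argument does not match.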


Comment on lines +144 to +145
if has_allow_dead_code(attrs) {
tags.insert("allow-dead-code");


P2 Badge Propagate inherited dead-code suppressions

When the suppression is applied at module or crate scope, such as `#[allow(dead_code)] mod generated { ... }` or a file-level `#![allow(dead_code)]`, descendants do not carry that attribute themselves and modules are not tagged here. Those functions can then be reported by `find_dead_code` even though rustc suppresses the same dead-code lint for the whole scope. Please carry this suppression through the tag context instead of checking only the item’s own attributes.
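
A sketch of carrying the suppression through a context value, assuming a context that is copied and extended on entering each scope. `Ctx`, `enter`, and `gets_allow_dead_code` are hypothetical names, not the plugin's `TagCtx` API.

```rust
/// Hypothetical context carrying only the inherited-suppression flag.
#[derive(Clone, Copy, Default)]
pub struct Ctx {
    pub dead_code_allowed: bool,
}

impl Ctx {
    /// Called when descending into a file or module; the flag is sticky, so
    /// once any enclosing scope allows dead code every descendant inherits it.
    pub fn enter(self, scope_has_allow: bool) -> Ctx {
        Ctx {
            dead_code_allowed: self.dead_code_allowed || scope_has_allow,
        }
    }
}

/// An item gets the keep-signal from its own attribute or any enclosing scope.
pub fn gets_allow_dead_code(ctx: Ctx, item_has_allow: bool) -> bool {
    ctx.dead_code_allowed || item_has_allow
}
```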


Comment on lines +124 to +125
if ctx.under_cfg_test || has_test_attr(attrs) {
tags.insert("test");


P2 Badge Treat item-level cfg(test) as test code

For code such as `#[cfg(test)] fn helper() {}` or `#[cfg(test)] struct Fixture;`, the item is test-only but this condition only sees `#[test]`/`#[bench]` attrs or a cfg-test ancestor module. Such item-level test helpers stay untagged and can show up as app dead code, especially with `app_only`, even though they are not part of the application build. Please include `cfg(test)` on the item’s own attributes when deciding the `test` tag.


Comment on lines +140 to +142
if derive_last_seg_in(attrs, CLI_COMMAND_DERIVES) {
tags.insert("cli-command");
tags.insert("framework-handler");


P2 Badge Avoid treating every Parser derive as a CLI root

A non-CLI derive named `Parser` (for example `pest_derive::Parser` on a grammar parser struct) satisfies this name-only check and gets `cli-command` plus `framework-handler`. `find_dead_code` then treats that private parser type as a root/excluded handler even when it is not an externally invoked CLI command, hiding genuinely unused parser types from the survey. Please restrict this to clap/structopt derives or require a corroborating CLI attribute before tagging it.
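
One possible tightening, sketched over plain string slices. The constants and `is_cli_derive` are hypothetical; derive paths are modelled as segment lists (for example `["clap", "Parser"]`) and helper attributes as bare names. The helper list assumes clap's `#[command(...)]`, `#[arg(...)]`, and `#[clap(...)]` and structopt's `#[structopt(...)]`.

```rust
const CLI_DERIVES: &[&str] = &["Parser", "Subcommand", "Args", "ValueEnum", "StructOpt"];
const CLI_CRATES: &[&str] = &["clap", "structopt"];
const CLI_HELPER_ATTRS: &[&str] = &["command", "arg", "clap", "structopt"];

/// A qualified derive must come from a known CLI crate; a bare derive name
/// needs a corroborating CLI helper attribute on the same item.
pub fn is_cli_derive(derive_path: &[&str], helper_attrs: &[&str]) -> bool {
    let Some((last, prefix)) = derive_path.split_last() else {
        return false;
    };
    if !CLI_DERIVES.contains(last) {
        return false;
    }
    match prefix.first() {
        Some(krate) => CLI_CRATES.contains(krate),
        None => helper_attrs.iter().any(|a| CLI_HELPER_ATTRS.contains(a)),
    }
}
```

The trade-off is precision over recall: a bare `#[derive(Parser)]` imported via `use clap::Parser` with no helper attribute on the item would go untagged under this rule.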

