From 8839ce8f60e7d960e508bbc4d634080cc82ef91d Mon Sep 17 00:00:00 2001 From: John Morrissey <544926+tachyon-beep@users.noreply.github.com> Date: Wed, 24 Jun 2026 11:39:33 +1000 Subject: [PATCH 1/7] docs: clarify per-plugin coverage (Python vs Rust) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add docs/operator/language-support.md — a side-by-side of what each language plugin extracts and tags: entity kinds, structural + relation edge kinds, categorisation/reachability-root tags, resolver backend, Wardline-awareness, and which tools work per language. Makes explicit that Python emits dead-code reachability roots (incl. the no-__all__ `public-surface` heuristic, ADR-053) while the Rust plugin emits no categorisation tags today, so `entity_dead_list` is signal-unavailable on a pure-Rust index (tracked in clarion-05fdd0490e). - rust-known-limitations.md: enrich the pure-Rust dead-code section with the Python contrast + the Rust root model, fix the stale ticket ref (e1899a109f → 05fdd0490e), link the matrix. - getting-started.md + operator/README.md: link the new matrix (and rust-known-limitations from the index). - roadmap.md: point the Rust categorisation-tag item at the superseding ticket. - CLAUDE.md: note the plugins differ in coverage; point at the matrix. Co-Authored-By: Claude Opus 4.8 (1M context) --- docs/operator/README.md | 5 ++ docs/operator/getting-started.md | 3 ++ docs/operator/language-support.md | 69 +++++++++++++++++++++++++ docs/operator/rust-known-limitations.md | 16 ++++-- docs/product/roadmap.md | 5 +- 5 files changed, 91 insertions(+), 7 deletions(-) create mode 100644 docs/operator/language-support.md diff --git a/docs/operator/README.md b/docs/operator/README.md index 4ecd746a..409205fb 100644 --- a/docs/operator/README.md +++ b/docs/operator/README.md @@ -5,6 +5,11 @@ Practical notes for configuring and running Loomweave. 
- [Getting started](./getting-started.md) — single-flow walkthrough: install, analyse a small repo, connect an MCP client, ask three questions, verify the secret-block. Target ≤15 minutes end-to-end. +- [Language support](./language-support.md) — what each language plugin (Python, + Rust) extracts and tags, side by side: entity/edge kinds, categorisation tags, + and which tools work per language (e.g. dead-code is Python-only today). +- [Rust analysis: known limitations](./rust-known-limitations.md) — what Rust + analysis does and does not resolve (macros, external edges, dead-code roots). - [OpenRouter LLM provider](./openrouter.md) — API key, model ID, attribution headers, and token-ceiling configuration. - [Coding-agent LLM providers](./coding-agent-llm-providers.md) — Codex CLI diff --git a/docs/operator/getting-started.md b/docs/operator/getting-started.md index fd71fc2c..202995ee 100644 --- a/docs/operator/getting-started.md +++ b/docs/operator/getting-started.md @@ -421,6 +421,9 @@ for the v1.0 → v2.0 trajectory. ## Where to go next +- [Language support](./language-support.md) — what each language plugin (Python, + Rust) extracts and tags, side by side: entity/edge kinds, categorisation tags, + and which tools (e.g. dead-code) work per language. - [Operator notes index](./README.md) — OpenRouter, runtime topology, secret scanning, federation contracts, coding-agent LLM providers. - [Design ladder](../loomweave/1.0/README.md) — requirements → system-design → diff --git a/docs/operator/language-support.md b/docs/operator/language-support.md new file mode 100644 index 00000000..13a77ae7 --- /dev/null +++ b/docs/operator/language-support.md @@ -0,0 +1,69 @@ +# Language support: what each plugin covers + +Loomweave's structural graph is produced by language **plugins**, one per +language, each a subprocess the core launches over JSON-RPC (ADR-002). v1.x ships +two first-party plugins. 
They do **not** cover the same surface — this page is the +single place to see what each emits, so a missing edge kind or an unavailable +tool reads as *expected* rather than *broken*. + +The MCP read tools, summaries, SEI identity, findings, and the pre-ingest secret +scanner are **plugin-agnostic** — they work the same regardless of which plugin +produced an entity. The differences below are entirely in what the plugins +*extract and tag*. + +## At a glance + +| Capability | Python (`loomweave-plugin-python`) | Rust (`loomweave-plugin-rust`) | +|---|---|---| +| Status | first-party, v1.0 | first-party, 1.x | +| Source backend | `pyright` (type-resolved) | `syn` (parse-only, in-project symbol table) | +| Ontology version | 0.9.0 | 0.5.0 | +| Wardline-aware | **yes** (`wardline:*` trust tags) | no | +| **Entity kinds** | `function`, `class`, `module` | `module`, `struct`, `enum`, `trait`, `function`, `impl`, `type_alias`, `const`, `static`, `macro` | +| **Structural edges** | `contains`, `calls`, `references`, `imports` | `contains`, `calls`, `references`, `imports` | +| **Relation edges** | `inherits_from`, `decorates` | `implements`, `derives` | +| Call/ref resolution tiers | `resolved` / `ambiguous` / `inferred` (pyright) | `resolved` (in-project only; external targets dropped) | +| **Categorisation / reachability-root tags** | **yes** — see below | **none today** | +| Dead-code analysis (`entity_dead_list`) | **works** | **unavailable** (no roots — see below) | +| Summaries (`entity_summary_get`) | on-demand, any entity | on-demand, any entity | + +## Categorisation & reachability-root tags + +These `entity_tags` drive the dead-code and faceted views. They are what makes +`entity_dead_list`, `entity_entry_point_list`, `entity_http_route_list`, etc. +return data. 
+ +**Python emits:** `entry-point`, `exported-api`, `public-surface`, `test`, +`data-model`, `http-route`, `cli-command`, `framework-handler`, and the +Wardline-derived `wardline:external_boundary` / `wardline:trusted`. Notable: a +module that declares no `__all__` gets its non-underscore module-level +defs/classes tagged `public-surface` — a lower-confidence reachability root than +a declared `exported-api` (ADR-053 / clarion-4ec50f3d92), so a Python codebase is +not over-reported as dead just because it does not exhaustively declare `__all__`. + +**Rust emits none of these today.** The plugin extracts entities and edges but no +categorisation tags. Consequently `entity_dead_list` on a **pure-Rust** index is +**signal-unavailable**: the dead-code engine excludes a plugin's entities when it +emits no reachability roots (rather than false-flagging the entire crate dead). +The structural tools (`entity_find`, `entity_callers_list`, +`entity_neighborhood_get`, the edge surfaces) are unaffected. Adding the Rust +root model (visibility → `exported-api`, `fn main`/bin → `entry-point`, +`#[test]` → `test`, route/CLI attribute macros → handlers) is tracked in +**clarion-05fdd0490e**. See [rust-known-limitations.md](./rust-known-limitations.md) +for the full list of what Rust analysis does and does not resolve. + +## Mixed-language repositories + +A repo with both Python and Rust is analysed by both plugins in one pass; each +file is routed to the plugin that claims its extension. Dead-code reachability +runs over the union, so in a mixed repo Python's roots can make Python entities +reachable while Rust entities remain in the "no roots for this plugin" exclusion +until the Rust root model lands. The low-confidence dead-code advisory's lever +copy is Python-centric today (it names `__all__`); making it language-aware is +folded into clarion-05fdd0490e. + +## Other languages + +Java and TypeScript are v2.0+ scope. 
Because plugins are subprocesses speaking a +stable JSON-RPC contract (ADR-002) with a manifest-declared ontology (ADR-022), +a new language is an additive plugin, not a core change. diff --git a/docs/operator/rust-known-limitations.md b/docs/operator/rust-known-limitations.md index 1a846105..69302e22 100644 --- a/docs/operator/rust-known-limitations.md +++ b/docs/operator/rust-known-limitations.md @@ -126,11 +126,17 @@ override only if a real repo/analyzer trips it in practice. This is a known limitation, not an error. **Why.** The dead-code (and related categorisation) views are driven by -tags the language plugin emits; the Rust plugin does not yet emit those -categorisation tags, so a pure-Rust index has no data to populate them -(tracked as **clarion-e1899a109f**). The structural tools (`entity_find`, -`entity_callers_list`, `entity_neighborhood_get`, and the edge surfaces) are -unaffected. +reachability-root tags the language plugin emits (`exported-api`, `entry-point`, +`test`, …); the Rust plugin emits **no** categorisation tags, so a pure-Rust +index has no roots and the engine excludes its entities rather than +false-flagging the whole crate dead. The Python plugin does emit these (including +the no-`__all__` `public-surface` heuristic, ADR-053); the Rust analog — +visibility → `exported-api`, `fn main`/bin → `entry-point`, `#[test]` → `test`, +route/CLI attribute macros → handlers — is tracked as **clarion-05fdd0490e**. The +structural tools (`entity_find`, `entity_callers_list`, +`entity_neighborhood_get`, and the edge surfaces) are unaffected. For a +side-by-side of what each plugin extracts and tags, see +[language-support.md](./language-support.md). ## Unnamed `const _` items are not entities diff --git a/docs/product/roadmap.md b/docs/product/roadmap.md index 26ddc501..b311c3e6 100644 --- a/docs/product/roadmap.md +++ b/docs/product/roadmap.md @@ -55,8 +55,9 @@ count → 0. 
- Python entity-kind coverage beyond function/class/module — module-level consts/vars, type aliases (clarion-a0ecac062f; additive under ADR-027). -- Rust plugin categorisation-tag parity so pure-Rust dead-code analysis works - (clarion-e1899a109f). +- Rust plugin categorisation-tag parity so pure-Rust dead-code analysis works — + visibility/entry-point/test/handler reachability roots, the Rust analog of the + Python `public-surface` work (clarion-05fdd0490e; supersedes clarion-e1899a109f). - ADR-021's `plugin_limits.*` loomweave.yaml config surface (clarion-271287b54b). - `references` envelope extension: match/let pattern paths + discriminant From a7109be3dc7b580391142b0f1ab1aa80c34ec1ac Mon Sep 17 00:00:00 2001 From: John Morrissey <544926+tachyon-beep@users.noreply.github.com> Date: Wed, 24 Jun 2026 15:43:41 +1000 Subject: [PATCH 2/7] product: turn over Now bet to loomweave-llm extraction; accept 1.1.0/gold complete - PDR-0003: Now bet = extract loomweave-llm from loomweave-core (clarion-141e9c08c8) - PDR-0004: accept the 1.1.0 / Rust-plugin-gold bet as complete (PDR-0002 gate satisfied) - roadmap: promote extraction Next->Now; bank shipped 1.1-1.3 work out of horizons - metrics: collision-families 4->0 TARGET MET; add trust-surface guardrail; tools/list NEEDS RE-CHECK - vision: grant Last reviewed 2026-06-24 (confirmed unchanged) - dispatch artifacts: PRD-0001 + docs/plans/2026-06-24-loomweave-llm-extraction.md Co-Authored-By: Claude Opus 4.8 (1M context) --- .../2026-06-24-loomweave-llm-extraction.md | 422 ++++++++++++++++++ docs/product/current-state.md | 90 ++-- .../0003-now-bet-loomweave-llm-extraction.md | 46 ++ .../0004-accept-1.1.0-gold-bet-complete.md | 37 ++ docs/product/metrics.md | 69 ++- .../prd/PRD-0001-loomweave-llm-extraction.md | 176 ++++++++ docs/product/roadmap.md | 101 +++-- docs/product/vision.md | 3 +- 8 files changed, 835 insertions(+), 109 deletions(-) create mode 100644 docs/plans/2026-06-24-loomweave-llm-extraction.md create mode 100644 
docs/product/decisions/0003-now-bet-loomweave-llm-extraction.md create mode 100644 docs/product/decisions/0004-accept-1.1.0-gold-bet-complete.md create mode 100644 docs/product/prd/PRD-0001-loomweave-llm-extraction.md diff --git a/docs/plans/2026-06-24-loomweave-llm-extraction.md b/docs/plans/2026-06-24-loomweave-llm-extraction.md new file mode 100644 index 00000000..cfd27d69 --- /dev/null +++ b/docs/plans/2026-06-24-loomweave-llm-extraction.md @@ -0,0 +1,422 @@ +# `loomweave-llm` Crate Extraction — Implementation Plan + +> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task. + +**Goal:** Move the LLM + embedding provider code out of `loomweave-core` into a new pure-leaf crate `loomweave-llm`, so the plugin-supervisor + SEI crate (`loomweave-core`) no longer links an outbound HTTP client (`reqwest`). + +**Architecture:** Behavior-preserving lift-and-shift. The two provider modules (`llm_provider.rs`, `embedding_provider.rs`) move verbatim — including their `#[cfg(test)]` modules — into a new leaf crate that depends on **no** workspace crate. `loomweave-core` drops `reqwest`/`async-trait`/`fs2`. The two current consumers (`loomweave-cli`, `loomweave-mcp`) repoint their provider imports from `loomweave_core::` to `loomweave_llm::`. **No provider behavior changes; no per-provider split** (that is the downstream bet clarion-4328c5c757, explicitly out of scope). + +**Tech Stack:** Rust workspace (resolver 3, edition 2024, rust-version 1.88), `cargo nextest`, `cargo-deny`, clippy pedantic `-D warnings`, `unsafe_code = "deny"`. Source PRD: `docs/product/prd/PRD-0001-loomweave-llm-extraction.md`. Tracker: clarion-141e9c08c8. + +**Prerequisites:** +- Clean working tree on a feature branch (e.g. `feat/loomweave-llm-extraction`). Do **not** work on `main`. +- Toolchain already pinned via `rust-toolchain`; the Python venv at `plugins/python/.venv` exists (Python gates are unaffected by this change but run in the floor). 
+- Read the source PRD and this plan's **Ground truth** section before starting. + +--- + +## Ground truth (verified against the codebase 2026-06-24 — trust these, do not re-derive) + +**Files to move** (wholesale, with their test modules): +- `crates/loomweave-core/src/llm_provider.rs` (3198 LOC) +- `crates/loomweave-core/src/embedding_provider.rs` (460 LOC) + +**`reqwest` lives only in those two files** within `loomweave-core/src` (4 occurrences total). Nothing else in core links HTTP. + +**Deps after the move:** +- `loomweave-core/Cargo.toml` — REMOVE `async-trait`, `fs2`, `reqwest` (verified now-unused by remaining core code). KEEP `tempfile`, `tracing`, `serde_json`, `which`, `tokio`, `serde`, `thiserror`, `toml`, `nix`. +- New `loomweave-llm/Cargo.toml` — needs `async-trait`, `fs2`, `reqwest`, `serde`, `serde_json`, `tempfile`, `thiserror`, `tokio`, `tracing`, `which` (all already in `[workspace.dependencies]`). + +**The provider modules import zero workspace code** — only `std`, `async-trait`, `fs2`, `reqwest`, `serde`, `serde_json`, `tempfile`, `thiserror`, `tokio`, `tracing`, `which`. `embedding_provider.rs` references `crate::llm_provider` **only in doc-comments**, which stay valid because both modules move together. ⇒ `loomweave-llm` is a pure leaf; no `loomweave-core → loomweave-llm` edge, no cycle. 
+ +**The complete set of provider symbols that move** (the current `pub use` lists in `loomweave-core/src/lib.rs` lines 17–31): +- from `embedding_provider`: `ApiEmbeddingProvider, ApiEmbeddingProviderConfig, EmbeddingProvider, EmbeddingProviderError, EmbeddingRecording, RecordingEmbeddingProvider` +- from `llm_provider`: `CachingModel, ClaudeCliProvider, ClaudeCliProviderConfig, CodexCliProvider, CodexCliProviderConfig, INFERRED_CALLS_PROMPT_VERSION, InferredCallsPromptInput, LEAF_SUMMARY_PROMPT_TEMPLATE_ID, LeafSummaryPromptInput, LlmProvider, LlmProviderError, LlmPurpose, LlmRequest, LlmResponse, OpenRouterProvider, OpenRouterProviderConfig, PromptTemplate, Recording, RecordingProvider, TrafficLoggingProvider, build_coding_agent_provider_prompt, build_inferred_calls_prompt, build_leaf_summary_prompt` + +**Symbols that STAY in `loomweave-core`** (do not touch their imports): `McpErrorCode`, `EdgeConfidence`, `HttpErrorCode`, `EntityId`, everything under `loomweave_core::{plugin, store, entity_id, errors, hardened_git}`. + +**The complete consumer import-site set** (8 edit sites, counted per file and covering 11 import locations, across 2 crates — verified exhaustive via per-file reads + a global backstop grep): +- `loomweave-cli/src/serve.rs:9–14` · `loomweave-cli/src/analyze.rs:26–30` and `:7984` and `:8053` +- `loomweave-mcp/src/lib.rs:18–21` and `:6099` · `loomweave-mcp/src/tools/summary.rs:11–15` · `loomweave-mcp/src/tools/status.rs:9` +- `loomweave-cli/tests/serve.rs:13–16` · `loomweave-mcp/tests/storage_tools.rs:11–16` · `loomweave-mcp/tests/catalogue_tools.rs:9` + +**Confirmed NON-consumers (do NOT edit):** `loomweave-federation`, `loomweave-storage`, `loomweave-cli/src/config.rs`, `loomweave-cli/src/doctor.rs`, `loomweave-mcp/src/catalogue/semantic.rs`. The first two never construct a provider; the last three use `loomweave_federation::config::{LlmProviderKind, ProviderSelection, …}` (federation config enums) and/or concrete in-crate types — **not** the moved traits. 
+ +**Path-dep style in this workspace:** `loomweave-llm = { path = "../loomweave-llm", version = "1.3.1" }`. + +**CI floor lives in** `.github/workflows/verify.yml`. **`scripts/check-workspace-version-lockstep.py` needs NO change** (it tracks `pyproject.toml` files, not per-crate `Cargo.toml`; `version.workspace = true` satisfies lockstep). + +--- + +## Task 1: Scaffold `loomweave-llm` and copy the provider modules in (workspace stays green) + +This task **copies** the modules so both crates compile at the commit boundary. Task 2 removes core's copies and flips consumers. Copy-then-flip keeps every commit green. + +**Files:** +- Create: `crates/loomweave-llm/Cargo.toml` +- Create: `crates/loomweave-llm/src/lib.rs` +- Create: `crates/loomweave-llm/src/llm_provider.rs` (copy of core's) +- Create: `crates/loomweave-llm/src/embedding_provider.rs` (copy of core's) +- Modify: `Cargo.toml` (root — workspace `members`) + +**Step 1: Create the crate's `Cargo.toml`** + +```toml +[package] +name = "loomweave-llm" +description = "Loomweave LLM + embedding provider traits, concrete providers (OpenRouter / Codex CLI / Claude CLI), and the outbound HTTP/CLI transport for summaries and embeddings." +version.workspace = true +edition.workspace = true +license.workspace = true +repository.workspace = true +rust-version.workspace = true + +[lints] +workspace = true + +[dependencies] +async-trait.workspace = true +fs2.workspace = true +reqwest.workspace = true +serde.workspace = true +serde_json.workspace = true +tempfile.workspace = true +thiserror.workspace = true +tokio.workspace = true +tracing.workspace = true +which.workspace = true +``` + +**Why no `[dev-dependencies]`:** the moved test modules use only `std` + crates already in `[dependencies]` (`tempfile`, `tokio`'s test macros via the workspace feature set). If `cargo nextest` later reports a missing test-only crate, add it then — but it is not expected. 
+ +**Step 2: Copy the two module files into the new crate** + +```bash +cp crates/loomweave-core/src/llm_provider.rs crates/loomweave-llm/src/llm_provider.rs +cp crates/loomweave-core/src/embedding_provider.rs crates/loomweave-llm/src/embedding_provider.rs +``` + +Do **not** edit their contents — they are self-contained (no `crate::` code references; doc-links resolve within the new crate). + +**Step 3: Create `crates/loomweave-llm/src/lib.rs`** (mirrors core's exact re-export lists) + +```rust +//! loomweave-llm — LLM + embedding provider traits, concrete providers, and the +//! outbound HTTP/CLI transport for Loomweave summaries and embeddings. +//! +//! Extracted from `loomweave-core` (PRD-0001, clarion-141e9c08c8) so the +//! plugin-supervisor + SEI crate does not link an outbound HTTP client. + +pub mod embedding_provider; +pub mod llm_provider; + +pub use embedding_provider::{ + ApiEmbeddingProvider, ApiEmbeddingProviderConfig, EmbeddingProvider, EmbeddingProviderError, + EmbeddingRecording, RecordingEmbeddingProvider, +}; +pub use llm_provider::{ + CachingModel, ClaudeCliProvider, ClaudeCliProviderConfig, CodexCliProvider, + CodexCliProviderConfig, INFERRED_CALLS_PROMPT_VERSION, InferredCallsPromptInput, + LEAF_SUMMARY_PROMPT_TEMPLATE_ID, LeafSummaryPromptInput, LlmProvider, LlmProviderError, + LlmPurpose, LlmRequest, LlmResponse, OpenRouterProvider, OpenRouterProviderConfig, + PromptTemplate, Recording, RecordingProvider, TrafficLoggingProvider, + build_coding_agent_provider_prompt, build_inferred_calls_prompt, build_leaf_summary_prompt, +}; +``` + +**Step 4: Register the crate in the workspace** — add `"crates/loomweave-llm",` to the `members` array in the root `Cargo.toml` (keep it grouped with the other crates, e.g. right after `"crates/loomweave-core",`). 
+ +**Step 5: Verify the new crate compiles, lints, and its tests pass in isolation** + +Run: +```bash +cargo build -p loomweave-llm +cargo clippy -p loomweave-llm --all-targets --all-features -- -D warnings +cargo nextest run -p loomweave-llm +cargo build --workspace # core still has its own copies → whole workspace still green +``` + +Expected: all green. The workspace builds because `loomweave-core` is unchanged (it still owns its copies) and `loomweave-llm` is a new, not-yet-consumed leaf. + +**Step 6: Commit** + +```bash +git add crates/loomweave-llm Cargo.toml +git commit -m "feat(loomweave-llm): scaffold crate with copied provider modules + +Pure-leaf crate holding the LLM + embedding providers, copied from +loomweave-core. Consumers are flipped and core's copies removed in the +next commit (PRD-0001, clarion-141e9c08c8). + +Co-Authored-By: Claude Opus 4.8 (1M context) " +``` + +**Definition of Done:** +- [ ] `loomweave-llm` builds, clippies clean, and its moved tests pass in isolation. +- [ ] `cargo build --workspace` still green (core unchanged). +- [ ] Committed. + +--- + +## Task 2: Flip core + consumers to `loomweave-llm`; remove the providers from core (green → green) + +This is the load-bearing transition. Its sub-steps are **not** independently compilable — the workspace goes red between Step 1 and the last edit, then green again. 
**Commit only at the end, when the floor passes.** + +**Files:** +- Delete: `crates/loomweave-core/src/llm_provider.rs`, `crates/loomweave-core/src/embedding_provider.rs` +- Modify: `crates/loomweave-core/src/lib.rs`, `crates/loomweave-core/Cargo.toml` +- Modify: `crates/loomweave-cli/Cargo.toml`, `crates/loomweave-mcp/Cargo.toml` +- Modify: the 8 consumer import sites listed under Ground truth + +**Step 1: Remove the modules from `loomweave-core`** + +```bash +git rm crates/loomweave-core/src/llm_provider.rs crates/loomweave-core/src/embedding_provider.rs +``` + +**Step 2: Strip the provider declarations + re-exports from `crates/loomweave-core/src/lib.rs`** + +- Delete line 9 `pub mod embedding_provider;` and line 13 `pub mod llm_provider;`. +- Delete the `pub use embedding_provider::{ … };` block (lines 17–20) and the `pub use llm_provider::{ … };` block (lines 24–31). +- **Keep** `pub use entity_id::{…}`, `pub use errors::{…}`, `pub use hardened_git::{…}`, and the whole `pub use plugin::{…}` block. + +After editing, the top of `lib.rs` reads (module list): +```rust +pub mod entity_id; +pub mod errors; +pub mod hardened_git; +pub mod plugin; +pub mod store; +``` + +**Step 3: Drop the now-unused deps from `crates/loomweave-core/Cargo.toml`** + +Remove these three lines from `[dependencies]`: +```toml +async-trait.workspace = true +fs2.workspace = true +reqwest.workspace = true +``` +Update the `description` to drop the provider mention, e.g.: +```toml +description = "Loomweave core: entity-ID assembler, sandboxed JSON-RPC plugin host, and manifest parser." +``` +(Leave `[dev-dependencies] tempfile` as-is.) 
+ +**Step 4: Add the `loomweave-llm` dependency to the two consumer crates** + +In `crates/loomweave-cli/Cargo.toml` and `crates/loomweave-mcp/Cargo.toml`, add to `[dependencies]` (next to the existing `loomweave-core` line): +```toml +loomweave-llm = { path = "../loomweave-llm", version = "1.3.1" } +``` + +**Step 5: Rewire the 8 import sites.** Each edit below is exact (old → new). + +**5a — `loomweave-cli/src/serve.rs:9–14`** (whole block moves; change the path only): +```rust +use loomweave_llm::{ + ApiEmbeddingProvider, ApiEmbeddingProviderConfig, ClaudeCliProvider, ClaudeCliProviderConfig, + CodexCliProvider, CodexCliProviderConfig, EmbeddingProvider, EmbeddingProviderError, + LlmProvider, OpenRouterProvider, OpenRouterProviderConfig, Recording, RecordingProvider, + TrafficLoggingProvider, +}; +``` + +**5b — `loomweave-cli/src/analyze.rs:26–30`** (split out `EmbeddingProvider`): +```rust +use loomweave_core::{ + AcceptedEdge, AcceptedEntity, AnalyzeFileOutcome, CrashLoopBreaker, CrashLoopState, + DiscoveredPlugin, FINDING_DISABLED_CRASH_LOOP, HostError, HostFinding, UnresolvedCallSite, + discover, +}; +use loomweave_llm::EmbeddingProvider; +``` + +**5c — `loomweave-cli/src/analyze.rs:7984` and `:8053`** (identical lines, both inside test fns — change the path; an exact-string replace-all hits both): +```rust + use loomweave_llm::{EmbeddingProvider, EmbeddingRecording, RecordingEmbeddingProvider}; +``` + +**5d — `loomweave-cli/tests/serve.rs:13–16`** (split `LEAF_SUMMARY_PROMPT_TEMPLATE_ID` from the kept `plugin::` path): +```rust +use loomweave_core::plugin::{ContentLengthCeiling, Frame, read_frame, write_frame}; +use loomweave_llm::LEAF_SUMMARY_PROMPT_TEMPLATE_ID; +``` + +**5e — `loomweave-mcp/src/lib.rs:18–21`** (split — `EdgeConfidence`/`McpErrorCode` stay): +```rust +use loomweave_core::{EdgeConfidence, McpErrorCode}; +use loomweave_llm::{EmbeddingProvider, LlmProvider, LlmProviderError, LlmRequest, LlmResponse}; +``` + +**5f — 
`loomweave-mcp/src/lib.rs:6099`** (test module; whole set moves): +```rust + use loomweave_llm::{CachingModel, LlmProvider, LlmProviderError, LlmRequest, LlmResponse}; +``` + +**5g — `loomweave-mcp/src/tools/summary.rs:11–15`** (split — `EdgeConfidence`/`McpErrorCode` stay): +```rust +use loomweave_core::{EdgeConfidence, McpErrorCode}; +use loomweave_llm::{ + INFERRED_CALLS_PROMPT_VERSION, InferredCallsPromptInput, LEAF_SUMMARY_PROMPT_TEMPLATE_ID, + LeafSummaryPromptInput, LlmPurpose, LlmRequest, build_inferred_calls_prompt, + build_leaf_summary_prompt, +}; +``` + +**5h — `loomweave-mcp/src/tools/status.rs:9`** (split — `McpErrorCode` stays): +```rust +use loomweave_core::McpErrorCode; +use loomweave_llm::{LeafSummaryPromptInput, build_leaf_summary_prompt}; +``` + +**5i — `loomweave-mcp/tests/storage_tools.rs:11–16`** (whole set moves; change the path only): +```rust +use loomweave_llm::{ + CachingModel, INFERRED_CALLS_PROMPT_VERSION, InferredCallsPromptInput, + LEAF_SUMMARY_PROMPT_TEMPLATE_ID, LeafSummaryPromptInput, LlmProvider, LlmProviderError, + LlmPurpose, LlmRequest, LlmResponse, OpenRouterProvider, OpenRouterProviderConfig, Recording, + RecordingProvider, build_inferred_calls_prompt, build_leaf_summary_prompt, +}; +``` + +**5j — `loomweave-mcp/tests/catalogue_tools.rs:9`** (both move): +```rust +use loomweave_llm::{EmbeddingRecording, RecordingEmbeddingProvider}; +``` + +**Step 6: Compiler backstop — catch anything the enumerated edits missed** + +Run, in order: +```bash +# (i) No moved symbol should still be referenced via loomweave_core:: anywhere. 
+grep -rnE 'loomweave_core::[^;]*(LlmProvider|EmbeddingProvider|OpenRouterProvider|ApiEmbeddingProvider|CodexCliProvider|ClaudeCliProvider|TrafficLoggingProvider|RecordingEmbeddingProvider|RecordingProvider|EmbeddingRecording|CachingModel|LlmRequest|LlmResponse|LlmPurpose|LeafSummaryPromptInput|InferredCallsPromptInput|build_leaf_summary_prompt|build_inferred_calls_prompt|build_coding_agent_provider_prompt|PromptTemplate|LEAF_SUMMARY_PROMPT_TEMPLATE_ID|INFERRED_CALLS_PROMPT_VERSION)' crates/*/src crates/*/tests --include='*.rs' | grep -v 'crates/loomweave-llm/' || echo "CLEAN" +# (ii) Full workspace build (the exhaustive backstop). +cargo build --workspace --all-targets +``` +If (i) is not `CLEAN`, repoint each remaining site `loomweave_core::X` → `loomweave_llm::X`. **Watch item:** `loomweave-mcp/src/catalogue/semantic.rs` calls `state.provider.model_id()` (an `EmbeddingProvider` trait method) but does not import the trait from `loomweave_core`; it is expected to keep compiling unchanged. If (ii) reports the trait is not in scope there, add `use loomweave_llm::EmbeddingProvider;` to that file — the only anticipated surprise. + +**Step 7: Run the full CI floor** + +```bash +cargo fmt --all -- --check +cargo clippy --workspace --all-targets --all-features -- -D warnings +cargo build --workspace --bins +cargo nextest run --workspace --all-features +RUSTDOCFLAGS="-D warnings" cargo doc --workspace --no-deps --all-features +cargo deny check +plugins/python/.venv/bin/ruff check plugins/python +plugins/python/.venv/bin/ruff format --check plugins/python +plugins/python/.venv/bin/mypy --strict plugins/python +plugins/python/.venv/bin/pytest plugins/python +``` +Expected: all green. Then confirm the trust-surface invariant: +```bash +cargo tree -p loomweave-core --edges normal --prefix none | grep -q '^reqwest' && echo "FAIL: core still links reqwest" || echo "PASS: core has no reqwest" +``` +Expected: `PASS`. 
+ +**Step 8: Commit** + +```bash +git add -A +git commit -m "refactor(core): extract LLM/embedding providers into loomweave-llm + +loomweave-core no longer links reqwest/async-trait/fs2. The two provider +modules now live in the pure-leaf loomweave-llm crate; cli and mcp repoint +their provider imports. Behavior-preserving lift-and-shift; no per-provider +split (clarion-4328c5c757 remains separate). PRD-0001, clarion-141e9c08c8. + +Co-Authored-By: Claude Opus 4.8 (1M context) " +``` + +**Definition of Done:** +- [ ] `git rm` of both core modules; core `lib.rs` + `Cargo.toml` stripped of providers and `reqwest`/`async-trait`/`fs2`. +- [ ] `loomweave-llm` dep added to cli + mcp; all 8 import sites rewired; backstop grep CLEAN. +- [ ] Full floor green; `cargo tree -p loomweave-core` shows no `reqwest`. +- [ ] Committed. + +--- + +## Task 3: Add the trust-surface CI gate + +Make the invariant standing, not a one-time check. + +**Files:** +- Modify: `.github/workflows/verify.yml` + +**Step 1: Add a gate step to the Rust job** (place it after the build/clippy steps, before or alongside `cargo deny`): + +```yaml + - name: Trust-surface — loomweave-core must not link an HTTP client + run: | + if cargo tree -p loomweave-core --edges normal --prefix none | grep -q '^reqwest'; then + echo "::error::loomweave-core links reqwest; the provider HTTP must stay in loomweave-llm (PRD-0001)" + exit 1 + fi + echo "OK: loomweave-core has no reqwest in its dependency tree" +``` + +**Step 2: Verify the step's command passes locally** + +```bash +if cargo tree -p loomweave-core --edges normal --prefix none | grep -q '^reqwest'; then echo FAIL; exit 1; else echo OK; fi +``` +Expected: `OK`. + +**Step 3: Commit** + +```bash +git add .github/workflows/verify.yml +git commit -m "ci(verify): assert loomweave-core links no outbound HTTP client + +Standing trust-surface gate for PRD-0001: fails CI if reqwest re-enters +loomweave-core's dependency tree. 
+ +Co-Authored-By: Claude Opus 4.8 (1M context) " +``` + +**Definition of Done:** +- [ ] Gate step added to `verify.yml`; command verified locally. +- [ ] Committed. + +--- + +## Task 4: Acceptance verification + close the tracker item + +No code changes — verify every PRD-0001 acceptance criterion and bank the bet. + +**Step 1: Re-run the full floor** (Task 2 Step 7) and confirm all green. + +**Step 2: Verify each acceptance criterion explicitly:** +- **Criterion 1/2 (trust-surface):** `cargo tree -p loomweave-core --edges normal | grep reqwest` → no output. +- **Criterion 3 (CI floor):** all gates green (above). +- **Criterion 4 (identity stability):** `git diff --name-only main...HEAD | grep entity_id.rs` → **empty** (entity_id.rs untouched). SEI churn: not expected — no identity code moved. If a reference corpus is handy, a before/after `loomweave analyze` should show 0 SEI churn; otherwise the untouched-`entity_id.rs` check is the proxy the PRD allows. +- **Criterion 5 (no consumer regression):** cli + mcp tests pass (covered by `cargo nextest run --workspace`); the `RecordingProvider` / `RecordingEmbeddingProvider` replay tests now run in `loomweave-llm` and pass **unchanged** (their source was not edited). +- **Criterion 6 (pure lift-and-shift):** `git diff main...HEAD -- crates/loomweave-llm/src/llm_provider.rs crates/loomweave-llm/src/embedding_provider.rs` shows **only** the file relocation (no content delta vs. the deleted core copies). Confirm with `git log --follow` / a rename-aware diff. + +**Step 3: Update the tracker** + +```bash +filigree close clarion-141e9c08c8 --actor claude +# clarion-4328c5c757 (per-provider split) is now unblocked — leave it for the next bet. +``` + +**Step 4 (product loop):** report back so `/product-checkpoint` can bank the acceptance and add the trust-surface guardrail to `metrics.md` (BASELINE `loomweave-core links reqwest: yes` → TARGET `no`, now met). 
+ +**Definition of Done:** +- [ ] All six acceptance criteria verified with the commands above. +- [ ] `clarion-141e9c08c8` closed; `clarion-4328c5c757` noted as unblocked. +- [ ] Hand back to the product loop for checkpoint. + +--- + +## Risks & rollback + +- **Largest risk:** a missed fully-qualified `loomweave_core::` reference. Mitigated by the Task 2 Step 6 backstop grep + `cargo build --workspace --all-targets`; the compiler names any straggler exactly. +- **`semantic.rs` trait scope:** the one anticipated surprise (see Task 2 Step 6 watch item) — a one-line `use loomweave_llm::EmbeddingProvider;` fix if it surfaces. +- **Rollback:** the change is two code commits + one CI commit on a feature branch; `git revert` or branch-delete restores the prior state cleanly. No data, schema, or protocol surface is touched. + +## Validate before execution (recommended) + +This is a structural refactor touching a load-bearing crate boundary. **RECOMMENDED SUB-SKILL:** run `/review-plan docs/plans/2026-06-24-loomweave-llm-extraction.md` (reality / architecture / quality / systems reviewers) before executing. Proceed on `APPROVED` / `APPROVED_WITH_WARNINGS`; revise on `CHANGES_REQUESTED`. diff --git a/docs/product/current-state.md b/docs/product/current-state.md index 28f39ebf..7ee7ed04 100644 --- a/docs/product/current-state.md +++ b/docs/product/current-state.md @@ -1,49 +1,57 @@ # Loomweave — Current State (resume brief) -> Written at bootstrap, 2026-06-11. Next session: start here, then -> `vision.md` (grant), `roadmap.md` + `metrics.md`, then reconcile the -> tracker IDs below against Filigree. +> Refreshed at checkpoint **2026-06-24**. Next session: start here, then +> `vision.md` (grant), `roadmap.md` + `metrics.md`, then reconcile the tracker +> IDs below against Filigree. 
## The bet right now -Ship the **1.1.0 release line (rc4)** with the Rust language plugin at -**gold** — the Sprint-4 closeout verdict was *not gold*; four entity-ID -collision families (self-type-path, trait-path, `#[path]`-module, `const _`) -remain as the gold blockers. Alongside: MCP-surface polish (4 of 6 audit -tickets already shipped) and incremental-analyze correctness bugs. - -## In flight (tracker is authoritative for status) - -- Nothing. The bootstrap-time WIP (clarion-7c9336163e) closed the same day: - the dormant `wardline.yaml` manifest ingest was retired (rc4 @ 1bd27b0, - retarget evaluated and rejected). clarion-f3eb3852d6 (Python deep-nesting - characterization, a roadmap Next item) also landed (d5baac5). - -## Recently landed (context, not work) - -- MCP/command-surface audit follow-ups shipped and closed: callers-honesty - (e5327dc), skill-dialect (43b7b25), token-budget (13b20bc), - finding-filter validation (7722942). -- ADR-052 duplicate-qualname first-wins semantics frozen (4cd6c4f). -- Rust plugin merged to rc4 (2380c88); sprints 1–4 (hardening, edges, - scale-QA, gold closeout) landed; ADR-049 Amendments 4–9 shipped. - -## Decided at bootstrap (2026-06-11, owner-confirmed) - -- **Authority grant CONFIRMED** as drafted (`vision.md`). -- **Gold gates 1.1.0** (PDR-0002): all 4 collision families fixed before the - cut; reversal trigger 2026-06-30. - -## Open questions - -1. **North-star TARGET date** (gold by 2026-06-30) is still a placeholder. -2. **Wardline handoff** (Amendments 4–9 corpus re-vendor) is prepared but - not pushed — outward-facing, escalation-gated. -3. **Adoption metric** — does the owner want one at all, given local-first? +**Extract `loomweave-llm` from `loomweave-core`** (clarion-141e9c08c8, PDR-0003) +— pay the head-of-critical-path architecture debt and remove outbound HTTP from +the plugin-supervisor + SEI crate. 
**Dispatched, not yet executed:** +- PRD: `docs/product/prd/PRD-0001-loomweave-llm-extraction.md` (ready-for-planning). +- Boundary ratified by a solution-architect pass (pure-leaf crate, no cycle). +- Implementation plan: `docs/plans/2026-06-24-loomweave-llm-extraction.md`. +- **Next action:** `/review-plan` that plan, then execute (subagent-driven or a + fresh `executing-plans` session). Metric it moves: trust-surface + `loomweave-core links reqwest: yes → no`. + +## In flight (tracker authoritative for status) + +- Nothing claimed/in-progress. The Now bet's tracker item (clarion-141e9c08c8) + is dispatched (PRD + plan) but not started — no code changed this session. +- Active defect cluster (Now/Next): clarion-feab311907, clarion-14398b2536, + clarion-a65cb18b02 (all confirmed); clarion-abda98c869, clarion-c20593d0d8 + (triage). + +## Decided this session (2026-06-24) + +- **Authority grant CONFIRMED as-is** by owner; `Last reviewed` stamped + 2026-06-24 (content unchanged). +- **PDR-0003** — Now bet = `loomweave-llm` extraction. +- **PDR-0004** — accepted the 1.1.0 / Rust-plugin-gold bet as **complete** + (PDR-0002 gate satisfied; all 4 collision families fixed; now v1.3.1). + +## Metric signals + +- North star (open collision families) **4 → 0, TARGET MET** — needs a fresh + successor target (owner). See `metrics.md`. +- New guardrail: trust-surface (`loomweave-core` HTTP) — currently `yes`, target + `no`, met when the Now bet lands. +- **Re-check needed:** `tools/list` 22 KB budget (MCP surface grew; margin was 13 + bytes — reading unknown). CI floor presumed green at 1.3.1, not re-verified. + +## Open questions / awaiting owner + +1. **Fresh north-star target** now that the collision-family target is met. +2. **`tools/list` byte budget** — re-measure; may be breached. +3. **Adoption metric** — still undecided; telemetry is escalation-gated (local-first). +4. 
**ESCALATION (carried forward):** the Wardline Amendments 4–9 corpus + re-vendor handoff is **prepared but not pushed** — outward-facing, gated. Do + not push without owner sign-off. ## Where the next session starts -1. DECIDE/DISPATCH the top Now bet: the 4 collision-family fixes (check - whether their Filigree issues are filed and ready — Sprint-4 memo says - filed; they did not appear in the open-issue list at bootstrap, so verify - where they were filed). +1. `/review-plan docs/plans/2026-06-24-loomweave-llm-extraction.md`, then execute + the `loomweave-llm` extraction (the recorded Now bet). On completion, run the + PRD-0001 acceptance gate and bank the trust-surface metric flip. diff --git a/docs/product/decisions/0003-now-bet-loomweave-llm-extraction.md b/docs/product/decisions/0003-now-bet-loomweave-llm-extraction.md new file mode 100644 index 00000000..d72083c4 --- /dev/null +++ b/docs/product/decisions/0003-now-bet-loomweave-llm-extraction.md @@ -0,0 +1,46 @@ +# PDR-0003: Now bet — extract `loomweave-llm` from `loomweave-core` + +- **Date:** 2026-06-24 +- **Status:** accepted (autonomous within grant; Now-bet selection confirmed by owner this session) +- **PRD:** PRD-0001 (`docs/product/prd/PRD-0001-loomweave-llm-extraction.md`) +- **Tracker:** clarion-141e9c08c8 (head of critical path) → unblocks clarion-4328c5c757 + +## Context + +The 2026-06-11 bootstrap Now bet ("ship 1.1.0/rc4 with the Rust plugin at +gold") is complete and shipped — the product is now at v1.3.1 (see PDR-0004). +With that done, the workspace had **no recorded Now bet**. The Filigree +critical path's head is unchanged from bootstrap: extract `loomweave-llm` +(clarion-141e9c08c8), which unblocks the per-provider split +(clarion-4328c5c757). + +## Options + +1. **Extract `loomweave-llm`** — pay the head-of-critical-path architecture + debt; a behavior-preserving lift-and-shift. +2. 
**Incremental-analyze correctness cluster first** — close the 5 open graph + re-analyze bugs (defends the north-star directly). +3. **Triage the B.4* 24× analyze perf regression first** (clarion-c20593d0d8). + +## The call + +Option 1. It is the head of the critical path (unblocks the most downstream +work) and carries a real **trust-surface** argument: `loomweave-core` is the +crate that forks sandboxed plugin subprocesses and mints stable entity +identity (SEI), yet it also links an outbound HTTP client (`reqwest`) purely +for the LLM/embedding providers. Moving the providers to a dedicated leaf crate +removes HTTP from the plugin-supervisor + SEI crate. Owner confirmed this as the +Now bet this session. + +The bet was de-risked the same session: a solution-architect trace confirmed the +two provider modules import **no** workspace code, so `loomweave-llm` is a pure +leaf crate (no `core → llm` dependency, no cycle) — a clean lift-and-shift. + +## Reversal trigger + +Reopen / re-shape the bet if extraction is found to force a +`loomweave-core → loomweave-llm` dependency (which would re-link `reqwest` +transitively and void the trust-surface goal). Measured by the PRD-0001 +acceptance gate: `cargo tree -p loomweave-core` must resolve **no** `reqwest`. +If it cannot, the bet is not a clean move and returns to `DECIDE`. (Trace says +this is low-risk, but the gate is the falsifier.) 
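
The falsifier above can be sketched as a pure check over captured `cargo tree` text. This is an illustration only, not the gate this bet uses: `tree_links_crate` is a hypothetical helper, and the sample trees in the usage note are invented. It assumes the default `cargo tree` layout, where dependency lines are indented with tree-drawing characters, and it also accepts `--prefix none` output, where each line starts with the package name.

```rust
/// Returns true if `crate_name` appears as a package in `cargo tree` output.
///
/// Hypothetical helper for illustration. It accepts the default indented
/// layout (lines such as "├── reqwest v0.12.0") and `--prefix none` output
/// (lines such as "reqwest v0.12.0").
pub fn tree_links_crate(tree_output: &str, crate_name: &str) -> bool {
    tree_output.lines().any(|line| {
        // Drop the tree-drawing prefix: box characters, ASCII fallbacks, spaces.
        let entry = line.trim_start_matches(|c: char| {
            c.is_whitespace() || matches!(c, '│' | '├' | '└' | '─' | '|' | '`' | '-')
        });
        // The first whitespace-separated token on a line is the package name,
        // so "reqwest-middleware" does not count as "reqwest".
        entry.split_whitespace().next() == Some(crate_name)
    })
}
```

Fed the output of `cargo tree -p loomweave-core --edges normal`, the helper should return `false` for `reqwest` once the extraction has landed, for example `tree_links_crate("loomweave-core v1.3.1\n└── serde v1.0.0", "reqwest")` on an invented tree.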
diff --git a/docs/product/decisions/0004-accept-1.1.0-gold-bet-complete.md b/docs/product/decisions/0004-accept-1.1.0-gold-bet-complete.md new file mode 100644 index 00000000..5f1960db --- /dev/null +++ b/docs/product/decisions/0004-accept-1.1.0-gold-bet-complete.md @@ -0,0 +1,37 @@ +# PDR-0004: Accept the 1.1.0 / Rust-plugin-gold bet as complete + +- **Date:** 2026-06-24 +- **Status:** accepted (ACCEPT against PDR-0002's criteria) +- **Relates to:** PDR-0002 (gold-gates-1.1.0) — its gate is now satisfied + +## Context + +PDR-0002 gated the 1.1.0 cut on fixing four entity-ID collision families +(self-type-path, trait-path, `#[path]`-module, `const _`) and recording a gold +verdict, with a 2026-06-30 reversal trigger. The 2026-06-11 `current-state.md` +recorded this bet as in-flight. At this session's RESUME, reality had moved 13 +days and 112 commits ahead — the bet was never checkpointed as done. + +## What was observed (git, this session) + +- All four collision families fixed: ADR-049 Amendments 6+7 (`c4791aa`, + self-type + trait-path), Amendment 8 (`05b44f3`, `#[path]`-module), Amendment + 9 (`f7f8a69`, `const _`), plus a `LMWV-DUPLICATE-LOCATOR` runtime guardrail + (`be0e780`). +- 1.1.0 GA cut via PR #57 (`a97e1d8`); 1.2.0 / 1.2.1 / 1.3.0 / 1.3.1 shipped on + top. Workspace version is now `1.3.1`. + +## The call + +**Accept the bet as complete.** The PDR-0002 gate is satisfied; the +north-star reading it gated moved from 4 open collision families to **0** (see +`metrics.md`, dated 2026-06-24). The 2026-06-30 reversal trigger was never +needed and is now moot. The bet leaves "in flight" and is banked as accepted. + +## Reversal trigger + +If a regression re-opens any collision family (the `LMWV-DUPLICATE-LOCATOR` +runtime alarm or the adversarial QA sweep surfaces a new collision on the +reference corpora), re-open identity-correctness as a fresh bet — but the 1.1.0 +acceptance itself stands; identity correctness is now a standing guardrail, not +an open bet. 
diff --git a/docs/product/metrics.md b/docs/product/metrics.md index b750f35f..92f3e1ad 100644 --- a/docs/product/metrics.md +++ b/docs/product/metrics.md @@ -1,40 +1,65 @@ # Loomweave — Metrics -> Bootstrapped 2026-06-11. Baselines are real observed readings; **TARGET -> numbers and dates are placeholders for the human owner to set** — they are -> drafted falsifiably (a number and a date) so they can be confirmed or -> rejected, never left directional. +> Bootstrapped 2026-06-11. **Updated 2026-06-24** (checkpoint). Baselines are +> real observed readings; targets are falsifiable (a number/boolean and a date). ## North star -**Graph trustworthiness on the reference QA sweep** — a consult agent can -only prefer the graph over grep if the graph is correct. Proxy: open +**Graph trustworthiness on the reference QA sweep** — a consult agent can only +prefer the graph over grep if the graph is correct. Proxy: open entity-identity defect families (collisions, fabricated edges, dropped files) -found by the adversarial 4-corpus QA sweep (ripgrep / tokio / + 2). +found by the adversarial 4-corpus QA sweep. - `BASELINE (2026-06-11): 4 open collision families (Sprint-4 gold verdict)` -- `TARGET: 0 open families, gold verdict recorded — by 2026-06-30 [PLACEHOLDER — confirm]` +- `READING (2026-06-24): 0 open families` — **TARGET MET.** All four fixed + (ADR-049 Amendments 6–9; PDR-0004). The original `0 by 2026-06-30` target is + achieved ahead of its date. +- **OPEN QUESTION (owner):** the north-star target is now met and needs a fresh + falsifiable successor. Candidate: fabricated-edge / dropped-file defect count + on the same sweep stays 0 across the 1.3.x line — but the owner should set the + real next target rather than have it invented here. ## Guardrails -1. **CI floor stays green** (ADR-023): fmt, clippy `-D warnings`, build, - nextest, doc, deny, ruff, mypy --strict, pytest, e2e scripts. +1. 
**Trust-surface — does `loomweave-core` (plugin-host + SEI crate) link an + outbound HTTP client (`reqwest`)?** Added 2026-06-24 for the Now bet + (PDR-0003 / PRD-0001). The plugin-supervisor + identity crate must not also + carry HTTP. *(Scope: `loomweave-core`-specific — `reqwest` is legitimate in + `loomweave-federation` and `loomweave-cli`; this is not a workspace-wide ban.)* + - `BASELINE (2026-06-24): yes` — `loomweave-core` links `reqwest` directly. + - `TARGET: no` — verified by `cargo tree -p loomweave-core` resolving no + `reqwest`; becomes met when PRD-0001 lands. **Currently: yes (bet open).** +2. **CI floor stays green** (ADR-023): fmt, clippy `-D warnings`, build, nextest, + doc, deny, ruff, ruff-format, mypy --strict, pytest, e2e scripts. - `BASELINE (2026-06-11): green, ~1450 nextest tests` - - `TARGET: green on every rc4 merge — standing, no end date` -2. **MCP context tax stays under budget**: `tools/list` payload has a - CI-enforced 22,000-byte budget. + - `READING (2026-06-24): presumed green` — five releases (1.2.0–1.3.1) cut + since, each implying a green floor; **not independently re-verified this + session.** One security event handled: msgpack GHSA-6v7p-g79w-8964 bumped + (1.3.1). + - `TARGET: green on every release/merge — standing, no end date` +3. **MCP context tax under budget**: `tools/list` payload has a CI-enforced + 22,000-byte budget. - `BASELINE (2026-06-11): 13 bytes under budget` - - `TARGET: never exceeds budget; any schema growth buys bytes elsewhere first — standing` -3. **Identity stability**: SEI churn on unchanged re-analyze of reference - corpora. + - `READING (2026-06-24): UNKNOWN — NEEDS RE-CHECK.` The MCP surface grew since + bootstrap (entity dossier, app_only filters, caller-honesty fields, config + tools). Margin was 13 bytes; growth may have breached or re-tightened it. + - `TARGET: never exceeds budget; any schema growth buys bytes elsewhere first` +4. 
**Identity stability**: SEI churn on unchanged re-analyze of reference corpora. - `BASELINE (2026-06-11): 0 SEI churn (Sprint-3 sweep)` - - `TARGET: stays 0 across the rc4 line — standing` + - `READING (2026-06-24): presumed 0` — no identity code changed; not re-swept + this session. + - `TARGET: stays 0 across the 1.3.x line — standing` ## Watchlist (not yet a target) -- **Subsystem-count drift** on unchanged re-analyze (ripgrep 9→14, - tokio 28→42 in the Sprint-3 sweep) — clustering instability, - clarion-14398b2536. Promote to a guardrail when the fix lands. +- **Subsystem-count drift** on unchanged re-analyze (clustering instability, + clarion-14398b2536, confirmed). Promote to a guardrail when the fix lands. +- **B.4* analyze wall-time 24× regression** on elspeth_mini (3.99s → 96.99s; + next-tier projection 3.1min → 96min), clarion-c20593d0d8 (triage). **Added + 2026-06-24.** Bears directly on the "graph fast enough to prefer over grep" + north-star; promote to a perf guardrail once root-caused. (Note a B.4* week-2 + gate refresh read GREEN 2026-06-18 — reconcile the conflicting signals during + triage.) - **Adoption / operator installs** — no instrumentation exists; local-first - design makes telemetry an explicit product decision (escalate before - adding any). `BASELINE: unknown → TARGET: TBD by owner`. + design makes telemetry an explicit, escalation-gated product decision. + `BASELINE: unknown → TARGET: TBD by owner`. 
diff --git a/docs/product/prd/PRD-0001-loomweave-llm-extraction.md b/docs/product/prd/PRD-0001-loomweave-llm-extraction.md new file mode 100644 index 00000000..06c64c05 --- /dev/null +++ b/docs/product/prd/PRD-0001-loomweave-llm-extraction.md @@ -0,0 +1,176 @@ +# PRD-0001 — Extract `loomweave-llm` crate from `loomweave-core` Status: ready-for-planning +Decision: PDR-0003 (pending — confirmed in-session 2026-06-24, written at next /product-checkpoint) +Bet (roadmap.md): Now (promoted from Next this session) Target metric (metrics.md): Trust-surface — does `loomweave-core` (plugin-host/SEI crate) link outbound HTTP (NEW) + +## Problem + +**Who:** Loomweave's maintainers and anyone reasoning about its trust posture — +the property the product sells is a *trustworthy* local-first graph (stable SEI, +sandboxed plugin host, credential-free `analyze`). + +**The pain:** `loomweave-core` is two crates wearing one coat. It is the +**plugin-supervisor** crate — it forks each language plugin as a sandboxed +subprocess (`plugin/host.rs`, `jail.rs`, `limits.rs`, `breaker.rs`, the only +permitted `unsafe`) and it owns **stable entity identity** (`entity_id.rs`, +SEI). It *also* carries ~3,660 LOC of **outbound LLM/embedding HTTP** +(`llm_provider.rs`, `embedding_provider.rs`), and pulls `reqwest` into the +dependency tree for them. The crate that runs untrusted forked children and +mints identity tokens should not also be the crate that opens network sockets to +a model provider — every dependent of `loomweave-core` (storage, mcp, cli, the +plugin crates) transitively links that HTTP stack whether or not it ever calls a +model. + +**Desired outcome:** Outbound model HTTP lives in one dedicated crate +(`loomweave-llm`); `loomweave-core` no longer links `reqwest`; the +plugin-host / SEI crate is back to a single, defensible job. Consumers that +genuinely need a provider depend on `loomweave-llm` directly. 
+ +**Why now:** This is the **head of the tracker's critical path** +(clarion-141e9c08c8). Nothing downstream — most directly the per-provider split +(clarion-4328c5c757) — can proceed cleanly until the provider code has its own +crate. The 1.1.0/gold bet is done and shipped (now at 1.3.1); this is the +next-largest unpaid structural debt and it is load-bearing for the trust story. + +## Success metric (the signal the bet paid off) + +**Trust-surface — does the plugin-supervisor + SEI crate (`loomweave-core`) link +an outbound HTTP client (`reqwest`)?** This metric does not yet exist on the +`metrics.md` scoreboard; adding it (BASELINE observed today) is a precondition of +ACCEPT, not a fabrication. + +- `BASELINE (2026-06-24): yes` — `loomweave-core` links `reqwest` directly (verified in its Cargo.toml). +- `TARGET: no` — `loomweave-core`'s dependency tree resolves no `reqwest`, verified on the merge commit. + +> **Scope of the invariant (corrected by trace, 2026-06-24):** the goal is *not* +> "centralize all HTTP in one crate." `reqwest` is legitimately used by +> `loomweave-federation` (sibling HTTP) and `loomweave-cli` (sarif/sei-git/doctor +> HTTP), and those crates neither fork sandboxed plugins nor mint SEI. The +> trust-surface invariant is specifically that the crate which *does* fork +> untrusted children and own identity must not also carry an HTTP client. + +Falsification: if `cargo tree -p loomweave-core` still resolves `reqwest` after +the bet lands, the bet did not pay off, regardless of how much code moved. + +## Acceptance criteria (falsifiable) + +> Observation window for every criterion below is **the CI run on the merge +> commit of this bet** — a concrete event, not "eventually." The *calendar* +> forecast for when that merge happens is `/axiom-program-management`'s, not this +> PRD's. + +1. 
**SUCCESS (structural)** — On the merge commit, `cargo tree -p loomweave-core` + resolves **no `reqwest`** dependency, and both provider modules + (`llm_provider`, `embedding_provider`) with their traits and impls live in a + new `loomweave-llm` crate. + *Reject branch:* `reqwest` still in `loomweave-core`'s tree → bet **not + accepted**; the trust boundary was not achieved; open a follow-up PDR. + +2. **METRIC (trust-surface, from amended `metrics.md`)** — Trust-surface reading + flips **yes → no** on the merge commit: `loomweave-core` no longer links + `reqwest`. Enforced by a **CI assertion** that `cargo tree -p loomweave-core + --edges normal` resolves no `reqwest` (a per-dependent ban that `cargo-deny`'s + `[bans]` cannot express — `reqwest` stays legitimate for federation/cli, so it + is *not* denied workspace-wide). + *Reject branch:* `loomweave-core` still resolves `reqwest` → bet **rejected + even if (1) reads done**. + +3. **GUARDRAIL — CI floor green (`metrics.md`, ADR-023)** — every floor gate + passes on the merge commit: `fmt`, `clippy --all-targets --all-features + -D warnings`, `build`, `nextest`, `doc -D warnings`, `deny`, plus `ruff`, + `ruff format --check`, `mypy --strict`, `pytest`, and the three e2e scripts. + *Reject branch:* any gate red → bet **rejected even if (1)+(2) pass**. + +4. **GUARDRAIL — identity stability (`metrics.md`, SEI-churn-0)** — re-analyze of + the reference corpora shows **0 SEI churn** vs. pre-bet, and `entity_id.rs` is + **not modified** by the extraction (the LLM crate must not pull in or alter + entity-ID code). + *Reject branch:* any SEI churn, or `entity_id.rs` touched by the move → bet + **rejected** (identity is the product's core promise). + +5. 
**GUARDRAIL — no consumer regression** — every current provider consumer + (`loomweave-cli` analyze/config/doctor/serve, `loomweave-mcp` + semantic/summary, `loomweave-federation/config`) compiles and its tests pass + against the new boundary; provider-replay tests (`RecordingProvider`, + `RecordingEmbeddingProvider`) pass **unchanged**. + *Reject branch:* any consumer behavior change, or a test that had to be + loosened to pass → bet **rejected**; it stopped being a lift-and-shift. + +6. **SCOPE — pure lift-and-shift** — no provider *behavior* changes: no new + providers, no transport/retry/caching/timeout logic changes, no config-schema + changes. The diff is relocation + re-wiring only. + *Reject branch:* any provider behavior change → out of scope for this bet; + carve it into a separate bet. + +## Non-goals (this bet) + +- **Not** the per-provider split of `llm_provider.rs` (OpenRouter / Codex-CLI + into separate modules) — that is the downstream bet **clarion-4328c5c757**, + unblocked *by* this one. This bet only relocates the existing module wholesale. +- **No** new provider behavior, retry/caching/transport/timeout changes, or + `llm_policy` config-schema changes. +- **No** changes to `entity_id`/SEI, the plugin host (`plugin/`), `storage`, or + the MCP/HTTP read surface. +- **No** change to summary/embedding *semantics* (lazy, per-entity, opt-in; + `analyze` stays credential-free). + +## Constraints & guardrails + +- **Dependency direction is the load-bearing boundary:** `loomweave-core` must + **not** gain a dependency on `loomweave-llm` (that would re-introduce `reqwest` + transitively and void the whole bet). A compat re-export living *in* + `loomweave-core` is therefore **ruled out**. *Trace finding (2026-06-24):* the + two provider modules import **no** workspace code (`std` + `async-trait` / + `fs2` / `serde` / `thiserror` / `reqwest` / `tokio` only), so `loomweave-llm` + is a **pure leaf crate** — the direction is trivially acyclic. 
Consumers (`cli`, + `mcp`, and any federation use) repoint provider imports `loomweave_core::…` → + `loomweave_llm::…`. *The exact mechanism (shared error crate? type placement) + is `/axiom-solution-architect`'s to ratify — but the trace says none is needed.* +- **Version lockstep:** the new crate uses `version.workspace = true`; all + `scripts/check-*.py` lockstep guards (workspace version, entity cap, pyright + pin, ontology version, migration retirement) stay green. +- **Workspace hygiene:** `unsafe_code = "deny"` (the move introduces no unsafe); + clippy `pedantic -D warnings` workspace-wide; edition 2024, `rust-version 1.88`. +- **Anti-goals preserved (`vision.md`):** local-first, no eager LLM spend, no + mandatory cloud — unchanged by relocation. + +## Open questions / assumptions + +- **ASSUMPTION (provenance):** the decision to make this the Now bet (PDR-0003) + was confirmed by the owner this session but is **not yet written** to + `decisions/` — it lands at `/product-checkpoint`. This PRD's authority rests on + that in-session DECIDE. +- **ASSUMPTION (metric gate):** the trust-surface guardrail is **added to + `metrics.md`** (BASELINE `loomweave-core links reqwest: yes` → TARGET `no`) as + part of accepting this bet. Until it is, criterion 2 references a metric not yet + on the scoreboard — that amendment is an ACCEPT precondition (also a checkpoint + action). The BASELINE is observed, not invented. +- **CORRECTION (metric scope, 2026-06-24):** the metric was initially drafted as + "no crate outside `loomweave-llm` links `reqwest`" and **corrected by trace** — + `reqwest` is legitimately used by `loomweave-federation` and `loomweave-cli`. + The invariant is `loomweave-core`-specific. The bet is unchanged; only the + measurement was made achievable. 
+- **RESOLVED (consumer set, ratified by solution-architect 2026-06-24):** the
+  re-wire is **`loomweave-cli`** (`src/serve.rs`, `src/analyze.rs`,
+  `tests/serve.rs`) and **`loomweave-mcp`** (`src/lib.rs`, `src/tools/summary.rs`,
+  `src/tools/status.rs`, `tests/storage_tools.rs`, `tests/catalogue_tools.rs`) —
+  both already link their own `reqwest`, so no new HTTP surface.
+  **`loomweave-federation` and `loomweave-storage` are confirmed NON-consumers**
+  (federation defines its own config enums; it never constructs a provider).
+  Planning input, not an acceptance gate.
+- **ASSUMPTION (cohesion):** `CodexCliProvider` (CLI-based, no `reqwest`) moves
+  with the LLM module for cohesion, so the whole `LlmProvider` abstraction lives
+  in one crate.
+
+## Handoff
+
+- **Top item → `/axiom-planning`:** the crate extraction + consumer re-wire as a
+  **behavior-preserving move** (tracker **clarion-141e9c08c8**) — the executable,
+  codebase-validated plan for the lift-and-shift and the `cargo tree` CI assertion.
+- **Solution shape → `/axiom-solution-architect`:** the crate boundary and
+  dependency direction (compat re-export vs. direct consumer imports; where the
+  shared error/types live; cycle-avoidance), and the design of the
+  `loomweave-core`-specific no-`reqwest` check. The PRD names the constraints; the
+  design chooses within them.
+- **Sequencing & dated forecast → `/axiom-program-management`** (no date here).
+- **Tracker linkage:** clarion-141e9c08c8 (this bet) → unblocks
+  clarion-4328c5c757 (per-provider split, the next bet).
diff --git a/docs/product/roadmap.md b/docs/product/roadmap.md
index b311c3e6..4f26181d 100644
--- a/docs/product/roadmap.md
+++ b/docs/product/roadmap.md
@@ -4,63 +4,74 @@
 > horizon, and why. Sequencing, WSJF scoring, and dated forecasts are produced
 > by `/axiom-program-management`, never here. No dates, no commitments.
 >
-> Bootstrapped 2026-06-11 from observed direction (rc4 commit history, open
-> tracker items, sprint memos).
Tracker IDs are Filigree issues. +> Bootstrapped 2026-06-11. **Updated: 2026-06-24 (PDR-0003, PDR-0004)** — the +> 1.1.0/gold bet shipped (now v1.3.1); the Now horizon turns over to the +> architecture-debt paydown. Tracker IDs are Filigree issues. -## Now — ship the 1.1.0 release line (rc4) with the Rust plugin at gold +## Now — extract `loomweave-llm` (pay the head-of-critical-path debt) -The dominant observed bet: the first-party Rust language plugin merged into -rc4 and four sprints (hardening, edges, scale-QA, gold closeout) drove it -toward "gold" — but the Sprint-4 verdict was **not gold**: four new entity-ID -collision families (self-type-path, trait-path, `#[path]`-module, `const _`) -were found and filed as the remaining gold blockers. +The recorded bet. `loomweave-core` is the plugin-supervisor + SEI crate (it +forks sandboxed plugin subprocesses and mints stable entity identity) yet also +links an outbound HTTP client (`reqwest`) purely for the LLM/embedding +providers. Extracting the providers into a dedicated pure-leaf crate +(`loomweave-llm`) removes HTTP from that crate and unblocks the per-provider +split. -- Close the 4 collision families blocking the Rust-plugin gold verdict. -- Finish the in-flight orphaned-input fix: Wardline manifest-ingest reads a - `wardline.yaml` format Wardline no longer produces (clarion-7c9336163e, - in progress with uncommitted working-tree changes). -- Incremental-analyze correctness: stale anchored edges from deleted files - never pruned (clarion-feab311907, major), subsystem clustering unstable on - unchanged re-analyze (clarion-14398b2536), wrong-language double - syntax-error findings (clarion-a65cb18b02). -- MCP/HTTP surface convergence remainder: X-6 pagination idiom + slim row - projection (clarion-b24df21158), version the wardline HTTP group before the - contract freeze (clarion-29b3ddcb0c), schema polish (clarion-e323e32b53). 
+- Extract `loomweave-llm` from `loomweave-core` (clarion-141e9c08c8) — head of + the tracker critical path; unblocks the per-provider split + (clarion-4328c5c757). **Dispatched** this session: PRD-0001 (ready-for- + planning), solution-architect-ratified boundary, implementation plan at + `docs/plans/2026-06-24-loomweave-llm-extraction.md`. Next action: `/review-plan` + then execute. +- Incremental-analyze correctness cluster (defends the north-star directly): + stale anchored edges from deleted files never pruned (clarion-feab311907, + confirmed), subsystem clustering unstable on unchanged re-analyze + (clarion-14398b2536, confirmed), wrong-language double syntax-error findings + (clarion-a65cb18b02, confirmed), incremental-move PARENT-CONTAINS-MISMATCH + (clarion-abda98c869, triage). +- B.4* analyze wall-time 24× regression on elspeth_mini (clarion-c20593d0d8, + triage) — bears on the "graph fast enough to prefer over grep" north-star. -**Metric this moves:** Rust-plugin gold blockers → 0; graph-correctness -defects on the 4-corpus QA sweep (see `metrics.md`). +**Metric this moves:** trust-surface (`loomweave-core` links outbound HTTP: +yes → no); critical-path length → 0 open; graph-correctness defects on the +4-corpus re-analyze sweep (see `metrics.md`). -## Next — pay the architecture debt and reach Python/Rust launch parity +## Shipped since 2026-06-11 (banked, no longer open bets) -- Extract `loomweave-llm` crate from `loomweave-core` (clarion-141e9c08c8) — - head of the tracker's critical path; unblocks the per-provider split - (clarion-4328c5c757). Trust-surface argument: LLM HTTP transport out of the - plugin-supervisor crate. +- **1.1.0 GA + the 1.2/1.3 line** — PR #57; Rust plugin at gold (4 collision + families fixed, PDR-0004). Now v1.3.1. +- **Dead-code public-surface reachability** (clarion-4ec50f3d92, done) — was a + Later item; the no-`__all__` fallback root shipped early. 
+- **Doctor index-integrity repair** (PR #64) — `doctor --fix` repairs + stale-file / parent-contains corruption. +- **Session-start auto-analyze + staleness refresh discipline** (1.3.x). +- **msgpack security bump** GHSA-6v7p-g79w-8964 (1.3.1). +- **Default write-tools-on** for the local agent loop; **public website** (`www/`). + +## Next — finish launch parity and the federation-audit remainder + +- Per-provider split of `llm_provider.rs` (clarion-4328c5c757) — unblocked once + the Now bet lands. - Split `analyze.rs` `run_with_options` (clarion-cb9676de57). -- Python plugin launch parity: pin the calls/references resolution envelope - with audit tests (clarion-e9cfde2773); characterize deep-recursion behavior - on hostile input (clarion-f3eb3852d6). -- Federation-audit G-series gaps from the 2026-06-10 weft-hub audit - (G2 historical-locator resolve, G10 project selector, G14 canonical-JSON - SEI oracle, G15 serde alias test, G16 rename-parser vectors, G24/G25). -- Shared `weft.toml` key-layout proposal for the hub to bless - (clarion-00abdf2fcb). +- Python plugin launch parity: pin the calls/references resolution envelope with + audit tests (clarion-e9cfde2773). +- Federation-audit G-series gaps (G2 historical-locator resolve + clarion-3c47f53e99, G10 project selector clarion-c37e1714fd, G14 canonical-JSON + SEI oracle clarion-9d0e82513c, G16 rename-parser vectors clarion-73dff1d2d1). +- Shared `weft.toml` key-layout proposal for the hub to bless (clarion-00abdf2fcb). - Wardline handoff for Amendments 4–9 corpus re-vendor (prepared, not pushed; - escalation-gated — outward-facing). - -**Metric this moves:** critical-path length → 0 open; launch-parity label -count → 0. + **escalation-gated — outward-facing**, see `current-state.md`). ## Later — coverage expansion and deferred surfaces -- Python entity-kind coverage beyond function/class/module — module-level - consts/vars, type aliases (clarion-a0ecac062f; additive under ADR-027). 
+- Python entity-kind coverage beyond function/class/module (clarion-a0ecac062f; + additive under ADR-027). - Rust plugin categorisation-tag parity so pure-Rust dead-code analysis works — visibility/entry-point/test/handler reachability roots, the Rust analog of the - Python `public-surface` work (clarion-05fdd0490e; supersedes clarion-e1899a109f). -- ADR-021's `plugin_limits.*` loomweave.yaml config surface - (clarion-271287b54b). -- `references` envelope extension: match/let pattern paths + discriminant - exprs (clarion-efc8715d98). + shipped Python `public-surface` work (clarion-05fdd0490e). Plus public-method + reachability roots (clarion-961a1acb2c). +- ADR-021's `plugin_limits.*` loomweave.yaml config surface (clarion-271287b54b). +- `references` envelope extension: match/let pattern paths + discriminant exprs + (clarion-efc8715d98). - Guidance staleness-review UI (deferred from v1.0). - Other-language plugins (TypeScript, Java) — v2.0+ (NG-15). diff --git a/docs/product/vision.md b/docs/product/vision.md index f05643fc..d6f8f055 100644 --- a/docs/product/vision.md +++ b/docs/product/vision.md @@ -69,5 +69,6 @@ The agent **escalates before**: Wardline/Weft-hub repos — note the existing standing authorization covers *tool use*, not outward-facing publication) -Last reviewed: 2026-06-11. Review cadence: every 30 days or at each +Last reviewed: 2026-06-24 (confirmed unchanged by owner at `/own-product` +resume; first reviewed 2026-06-11). Review cadence: every 30 days or at each `/own-product` resume, whichever comes first. 
From e1790a40478dfa7b0a2622b87e97441f4d18d26c Mon Sep 17 00:00:00 2001 From: John Morrissey <544926+tachyon-beep@users.noreply.github.com> Date: Thu, 25 Jun 2026 10:53:53 +1000 Subject: [PATCH 3/7] feat(plugin-rust): emit reachability-root tags (ADR-054, increment 1) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The Rust plugin emitted zero categorisation tags, so dead-code analysis was signal-unavailable on a pure-Rust index (clarion-05fdd0490e). Increment 1 of ADR-054 derives the root vocabulary from Rust's explicit semantics: - exported-api: unrestricted `pub` whose whole module chain is `pub`, in lib targets (pub(crate)/restricted excluded; bin targets suppressed via the `@bin(...)` module-path discriminator; `#[macro_export]` for macros) - entry-point: `fn main`, `#[tokio::main]`/runtime-entry attrs, FFI `#[no_mangle]`/`#[export_name]` - test: `#[test]`/`#[bench]`, items under `#[cfg(test)]` - allow-dead-code: `#[allow(dead_code)]`/`#[expect(dead_code)]` keep-signal (new DEAD_CODE_ROOT_TAGS entry) Engine: the no-roots envelope + LOW-confidence advisory are now language-aware (a Rust corpus gets Rust levers, never `__all__`); modules are excluded from dead-code candidacy (DEAD_CODE_CONTAINER_KINDS — the containment spine is never "dead", which the dogfood showed would otherwise dominate the candidate set). Ontology bump 0.5.0 -> 0.6.0 (plugin.toml + wheel-data copy + serve.rs + doc). TDD throughout; dogfooded on a lib+bin crate (genuine orphan only, moderate confidence; find_entry_points/find_dead_code light up for Rust; exclusion lifts). Deferred to increment 2: framework-attribute handlers, `pub use` re-export resolution, `pub`-method rooting. 
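The root vocabulary above can be sketched as a small, self-contained model. This is an illustrative simplification, not the plugin's code: `Item`, `Ctx`, and `model_root_tags` are hypothetical names, attributes are plain strings rather than `syn` attributes, and the bin-target flag is assumed to be precomputed from the `@bin(...)` discriminator.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified stand-in for a parsed item (the real plugin walks `syn` items).
pub struct Item<'a> {
    pub name: &'a str,
    pub is_fn: bool,
    /// Unrestricted `pub` only; `pub(crate)` / `pub(super)` count as false.
    pub unrestricted_pub: bool,
    /// Attribute text as written, e.g. "test", "tokio::main", "allow(dead_code)".
    pub attrs: &'a [&'a str],
}

/// Hypothetical lexical context of the item (assumed precomputed by the walk).
#[derive(Clone, Copy)]
pub struct Ctx {
    pub ancestors_all_pub: bool,
    pub under_cfg_test: bool,
    pub in_bin_target: bool,
    pub at_file_top: bool,
}

/// "tokio::main" -> "main"; a path-less attribute is returned unchanged.
fn last_segment(attr: &str) -> &str {
    attr.rsplit("::").next().unwrap_or(attr)
}

/// Derive the sorted, deduplicated root tags for one item.
pub fn model_root_tags(item: &Item, ctx: Ctx) -> Vec<String> {
    let has_last = |wanted: &str| item.attrs.iter().any(|a| last_segment(a) == wanted);
    let mut tags: BTreeSet<&'static str> = BTreeSet::new();

    // exported-api: unrestricted `pub`, whole module chain `pub`, lib target only.
    if item.unrestricted_pub && ctx.ancestors_all_pub && !ctx.in_bin_target {
        tags.insert("exported-api");
    }
    // test: `#[test]` / `#[bench]` (last segment), or anything under `#[cfg(test)]`.
    if ctx.under_cfg_test || has_last("test") || has_last("bench") {
        tags.insert("test");
    }
    // entry-point: file-top `fn main`, a runtime `main` attribute, or an FFI export.
    let ffi_export = item
        .attrs
        .iter()
        .any(|a| *a == "no_mangle" || a.starts_with("export_name"));
    if item.is_fn && ((ctx.at_file_top && item.name == "main") || has_last("main") || ffi_export) {
        tags.insert("entry-point");
    }
    // allow-dead-code: an explicit author keep-signal.
    if item
        .attrs
        .iter()
        .any(|a| *a == "allow(dead_code)" || *a == "expect(dead_code)")
    {
        tags.insert("allow-dead-code");
    }
    tags.into_iter().map(str::to_owned).collect()
}
```

In this model a `pub fn` in a lib target yields only `exported-api`, while the same item under a bin-target context yields no tags; collecting through a `BTreeSet` keeps the output order deterministic.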
Co-Authored-By: Claude Opus 4.8 (1M context) --- .../loomweave-mcp/src/catalogue/shortcuts.rs | 177 ++++++++++++-- crates/loomweave-mcp/tests/catalogue_tools.rs | 177 +++++++++++++- crates/loomweave-plugin-rust/plugin.toml | 7 +- crates/loomweave-plugin-rust/src/extract.rs | 98 +++++++- crates/loomweave-plugin-rust/src/lib.rs | 1 + crates/loomweave-plugin-rust/src/root_tags.rs | 223 +++++++++++++++++ crates/loomweave-plugin-rust/src/serve.rs | 2 +- .../tests/analyze_e2e.rs | 2 +- .../loomweave-plugin-rust/tests/root_tags.rs | 186 ++++++++++++++ .../ADR-054-rust-reachability-root-model.md | 229 ++++++++++++++++++ docs/loomweave/adr/README.md | 1 + docs/operator/language-support.md | 53 ++-- .../share/loomweave/plugins/rust/plugin.toml | 7 +- 13 files changed, 1097 insertions(+), 66 deletions(-) create mode 100644 crates/loomweave-plugin-rust/src/root_tags.rs create mode 100644 crates/loomweave-plugin-rust/tests/root_tags.rs create mode 100644 docs/loomweave/adr/ADR-054-rust-reachability-root-model.md diff --git a/crates/loomweave-mcp/src/catalogue/shortcuts.rs b/crates/loomweave-mcp/src/catalogue/shortcuts.rs index 9b945ab7..f8bbf0bd 100644 --- a/crates/loomweave-mcp/src/catalogue/shortcuts.rs +++ b/crates/loomweave-mcp/src/catalogue/shortcuts.rs @@ -71,6 +71,12 @@ const DEAD_CODE_ROOT_TAGS: &[&str] = &[ // a module *with* `__all__` emits no `public-surface`, so well-declared // modules are byte-identical to before. "public-surface", + // `allow-dead-code` (ADR-054): the Rust plugin emits this for an item + // carrying `#[allow(dead_code)]` / `#[expect(dead_code)]` — an explicit + // author "keep this" assertion that suppresses rustc's own dead-code lint. + // The lowest-confidence root class (an explicit local suppression, not a + // structural surface), but fail-toward-live and consistent with rustc. 
+ "allow-dead-code", "wardline:external_boundary", "wardline:trusted", ]; @@ -93,6 +99,17 @@ const DEAD_CODE_EXCLUDED_TAGS: &[&str] = &["framework-handler", "plugin-hook"]; /// kind, not plugin, so a plugin-emitted non-code kind is covered too. const DEAD_CODE_NON_CODE_KINDS: &[&str] = &["file", "project", "subsystem", "guidance"]; +/// Code-adjacent CONTAINER kinds that are never dead-code candidates (ADR-054). +/// A `module` is the containment spine rooted at the always-live crate root: +/// reachability-by-containment reaches every module by construction, so a module +/// is never "dead" in any actionable sense — you remove its contents, not the +/// namespace. Reachability proper runs over call+import edges only, and the Rust +/// plugin emits no module-targeting `imports` edges (its import edges target +/// items), so without this exclusion every Rust module would read as dead and +/// dominate the candidate set. Kept distinct from [`DEAD_CODE_NON_CODE_KINDS`] +/// (modules are code, not non-code anchors) and disclosed separately. +const DEAD_CODE_CONTAINER_KINDS: &[&str] = &["module"]; + /// Runtime import predicate used by graph shortcuts. Missing or malformed /// properties fail toward inclusion; explicit `type_only=true` or /// `scope="function"` marks an import as non-module-runtime evidence. @@ -349,8 +366,10 @@ impl ServerState { .with_reader(move |conn| { let filter = scope.resolve(conn)?; let (in_scope, scope_truncated) = filter.in_scope_ids(conn, &project_root)?; + // The languages present localise every advisory lever (ADR-054). + let langs = plugins_present(conn)?; let Some(reachability) = dead_code_reachability(conn, app_only)? 
else { - return Ok(dead_code_no_roots_envelope(&page, scope_truncated)); + return Ok(dead_code_no_roots_envelope(&page, scope_truncated, &langs)); }; let DeadCodeReachability { app_excluded, @@ -428,6 +447,7 @@ impl ServerState { unresolved_call_site_suppressed, withheld_count, roots_mode, + &langs, ), "app_only": app_only, "dead_code": dead_code, @@ -443,6 +463,7 @@ impl ServerState { }, "excluded": { "non_code_kinds": DEAD_CODE_NON_CODE_KINDS, + "container_kinds": DEAD_CODE_CONTAINER_KINDS, "plugins_without_roots": plugins_without_roots_json(excluded_by_plugin), }, // Aggregate-level actionability without identity @@ -477,11 +498,65 @@ const SHORTCUT_PAGE_DEFAULT: usize = 50; const SHORTCUT_PAGE_MAX: usize = 200; const CHURN_SCAN_CAP: usize = 50_000; +/// The distinct plugin ids that own at least one entity in this index — the +/// languages whose source-level levers a dead-code advisory should name +/// (ADR-054: a Rust corpus must never be handed Python-only `__all__` advice). +fn plugins_present(conn: &rusqlite::Connection) -> loomweave_storage::Result> { + let mut stmt = conn.prepare("SELECT DISTINCT plugin_id FROM entities")?; + let mut rows = stmt.query([])?; + let mut out = BTreeSet::new(); + while let Some(row) = rows.next()? { + out.insert(row.get::<_, String>(0)?); + } + Ok(out) +} + +/// The source-level levers that emit reachability roots, phrased per language +/// from the plugin ids actually present (ADR-054 / ADR-053). Sorted-set input → +/// deterministic ordering; an unknown/empty plugin set yields a generic phrase. 
+fn root_tag_levers(plugin_ids: &BTreeSet<String>) -> String { + let phrases: Vec<&'static str> = plugin_ids + .iter() + .filter_map(|p| match p.as_str() { + "python" => Some( + "for Python, declare `__all__` to mark a module's public surface \ + (with no `__all__`, public module-level defs/classes are auto-tagged \ + `public-surface`) and add entry-point / cli-command / http-route \ + decorators to externally-invoked functions", + ), + "rust" => Some( + "for Rust, make an item `pub` (exported API), add a `fn main` / \ + `[[bin]]` or a `#[tokio::main]` entry point, and mark tests with \ + `#[test]` / `#[cfg(test)]`", + ), + _ => None, + }) + .collect(); + if phrases.is_empty() { + return "the levers are source-level: emit reachability-root tags (exported \ + API, entry points, tests) for the analysed languages" + .to_owned(); + } + format!("the levers are source-level — {}", phrases.join("; ")) +} + /// The honest signal-unavailable envelope for `find_dead_code` when no /// reachability root tags exist — zero candidates, never a whole-corpus false /// positive. Identical in both `explicit` and `auto` modes: `auto` cannot -/// fabricate roots from an empty tag set. -fn dead_code_no_roots_envelope(page: &Page, scope_truncated: bool) -> Value { +/// fabricate roots from an empty tag set. `plugin_ids` localises the lever copy +/// to the languages present (ADR-054). +fn dead_code_no_roots_envelope( + page: &Page, + scope_truncated: bool, + plugin_ids: &BTreeSet<String>, +) -> Value { + let levers = root_tag_levers(plugin_ids); + let signal_msg = format!( + "this index has no reachability root tags (entry-point / http-route / test / \ + data-model / cli-command / exported-api / public-surface / allow-dead-code), \ + so dead code cannot be determined — this is NOT a guarantee there is no dead \ + code. 
{levers}" + ); success_envelope(json!({ "dead_code": [], "page": { @@ -490,16 +565,7 @@ fn dead_code_no_roots_envelope(page: &Page, scope_truncated: bool) -> Value { }, "scope_truncated": scope_truncated, "scan_truncated": false, - "signal": missing_signal( - "entity_tags", - "this index has no reachability root tags (entry-point / http-route / \ - test / data-model / cli-command / exported-api / public-surface), so \ - dead code cannot be determined — this is NOT a guarantee there is no \ - dead code. The levers are source-level: declare `__all__` (or, with \ - no `__all__`, public module-level defs/classes are auto-tagged \ - `public-surface`) and add entry-point / cli-command / http-route \ - decorators to public entry functions", - ), + "signal": missing_signal("entity_tags", &signal_msg), })) } @@ -517,6 +583,7 @@ fn dead_code_summary( shielded_by_unresolved_calls: usize, withheld_secret: usize, roots_mode: RootsMode, + plugin_ids: &BTreeSet, ) -> Value { let analysed = reachable.saturating_add(dead_candidates); let dead_pct = dead_candidates @@ -525,16 +592,14 @@ fn dead_code_summary( .unwrap_or(0); let low_confidence = dead_pct > 25; let advisory = low_confidence.then(|| { + let levers = root_tag_levers(plugin_ids); format!( "{dead_pct}% of analysed entities ({dead_candidates}/{analysed}) are unreachable \ from the reachability roots — implausibly high, so the roots likely do not cover \ this corpus (e.g. code reached through framework dispatch, dependency injection, a \ CLI, or tests that static analysis cannot follow). Treat these candidates as LOW \ - CONFIDENCE. There is no roots config knob; the levers are source-level: declare \ - `__all__` to mark a module's public surface (non-underscore module-level \ - defs/classes are auto-tagged `public-surface` roots when a module declares no \ - `__all__`), and add entry-point / cli-command / http-route decorators to \ - externally-invoked entry points, before relying on dead-code detection." + CONFIDENCE. 
There is no roots config knob; {levers}, before relying on dead-code \ + detection." ) }); json!({ @@ -1033,6 +1098,7 @@ fn dead_code_candidate_set( let mut candidates: Vec = all_rows .into_iter() .filter(|row| !DEAD_CODE_NON_CODE_KINDS.contains(&row.kind.as_str())) + .filter(|row| !DEAD_CODE_CONTAINER_KINDS.contains(&row.kind.as_str())) .filter(|row| match &plugins_with_roots { None => true, Some(plugins_with_roots) => { @@ -1397,12 +1463,17 @@ fn strongly_connected_cycles(adjacency: &HashMap>) -> Vec BTreeSet { + ids.iter().map(|s| (*s).to_owned()).collect() + } + #[test] fn dead_code_summary_flags_low_confidence_when_unreachable_share_is_high() { // The lacuna shape: 141 candidates of ~391 analysed (36%) — implausibly // high, so the roots don't cover the corpus. Confidence must read LOW and // recruit the operator, not present 141 as a confident headline. - let s = dead_code_summary(141, 250, 1, 3, 5, RootsMode::Explicit); + let py = plugin_set(&["python"]); + let s = dead_code_summary(141, 250, 1, 3, 5, RootsMode::Explicit, &py); assert_eq!(s["confidence"], "low"); assert!(s["advisory"].as_str().unwrap().contains("LOW CONFIDENCE")); assert_eq!(s["dead_candidates"], 141); @@ -1415,17 +1486,17 @@ mod tests { // A plausible dead share (a few orphans in a well-rooted corpus) reads // MODERATE with no advisory — the breakdown still leads, but no alarm. - let s = dead_code_summary(5, 400, 0, 0, 0, RootsMode::Explicit); + let s = dead_code_summary(5, 400, 0, 0, 0, RootsMode::Explicit, &py); assert_eq!(s["confidence"], "moderate"); assert!(s["advisory"].is_null()); // Degenerate: nothing analysed → no division panic, no false alarm. - let s = dead_code_summary(0, 0, 0, 0, 0, RootsMode::Explicit); + let s = dead_code_summary(0, 0, 0, 0, 0, RootsMode::Explicit, &py); assert_eq!(s["confidence"], "moderate"); assert!(s["advisory"].is_null()); // Auto mode declares the derived-confidence roots. 
- let s = dead_code_summary(5, 400, 0, 0, 0, RootsMode::Auto); + let s = dead_code_summary(5, 400, 0, 0, 0, RootsMode::Auto, &py); assert_eq!(s["roots_mode"], "auto"); assert_eq!(s["roots_confidence"], "derived"); } @@ -1435,8 +1506,16 @@ mod tests { // clarion-4ec50f3d92: the LOW-confidence advisory must recruit the // *actual* levers — declaring `__all__`, adding entry-point/cli/http // decorators — never a "configure roots" knob that loomweave.yaml does - // not have. - let s = dead_code_summary(141, 250, 1, 3, 5, RootsMode::Explicit); + // not have. (Python corpus.) + let s = dead_code_summary( + 141, + 250, + 1, + 3, + 5, + RootsMode::Explicit, + &plugin_set(&["python"]), + ); let advisory = s["advisory"] .as_str() .expect("low-confidence advisory present"); @@ -1455,15 +1534,42 @@ mod tests { ); } + #[test] + fn dead_code_advisory_is_language_aware_for_rust() { + // ADR-054: a Rust corpus must be handed Rust levers (`pub` / `#[test]` / + // an entry point), NEVER Python-only `__all__` advice. + let s = dead_code_summary( + 141, + 250, + 0, + 0, + 0, + RootsMode::Explicit, + &plugin_set(&["rust"]), + ); + let advisory = s["advisory"] + .as_str() + .expect("low-confidence advisory present"); + assert!(advisory.contains("LOW CONFIDENCE")); + assert!( + advisory.contains("#[test]"), + "advisory names a Rust lever: {advisory}" + ); + assert!( + !advisory.contains("__all__"), + "no Python-only advice for a Rust corpus: {advisory}" + ); + } + #[test] fn dead_code_no_roots_envelope_recruits_real_levers() { // clarion-4ec50f3d92: the signal-unavailable envelope must point at the - // source-level levers, not imply a config knob. + // source-level levers, not imply a config knob. (Python corpus.) 
let page = Page { limit: 50, offset: 0, }; - let envelope = dead_code_no_roots_envelope(&page, false); + let envelope = dead_code_no_roots_envelope(&page, false, &plugin_set(&["python"])); let rendered = serde_json::to_string(&envelope).expect("serialise envelope"); assert!( rendered.contains("__all__"), @@ -1475,6 +1581,25 @@ mod tests { ); } + #[test] + fn dead_code_no_roots_envelope_is_language_aware_for_rust() { + // ADR-054: a Rust-only no-roots corpus gets Rust levers, not `__all__`. + let page = Page { + limit: 50, + offset: 0, + }; + let envelope = dead_code_no_roots_envelope(&page, false, &plugin_set(&["rust"])); + let rendered = serde_json::to_string(&envelope).expect("serialise envelope"); + assert!( + rendered.contains("#[test]"), + "envelope names a Rust lever: {rendered}" + ); + assert!( + !rendered.contains("__all__"), + "no Python-only advice for a Rust corpus: {rendered}" + ); + } + fn edge_scan_conn() -> rusqlite::Connection { let conn = rusqlite::Connection::open_in_memory().expect("open in-memory db"); conn.execute_batch( diff --git a/crates/loomweave-mcp/tests/catalogue_tools.rs b/crates/loomweave-mcp/tests/catalogue_tools.rs index 7b8b8076..c6cb4aee 100644 --- a/crates/loomweave-mcp/tests/catalogue_tools.rs +++ b/crates/loomweave-mcp/tests/catalogue_tools.rs @@ -201,6 +201,14 @@ fn insert_tag(conn: &Connection, entity_id: &str, tag: &str) { .expect("insert tag"); } +fn insert_tag_with_plugin(conn: &Connection, entity_id: &str, plugin_id: &str, tag: &str) { + conn.execute( + "INSERT INTO entity_tags (entity_id, plugin_id, tag) VALUES (?1, ?2, ?3)", + params![entity_id, plugin_id, tag], + ) + .expect("insert tag"); +} + fn insert_contains_edge(conn: &Connection, parent: &str, child: &str) { conn.execute( "INSERT INTO edges (kind, from_id, to_id, confidence) VALUES ('contains', ?1, ?2, 'resolved')", @@ -3636,15 +3644,17 @@ async fn find_dead_code_excludes_non_code_entities() { ); } -/// B2(2) failing-first: entities owned by a plugin that emitted 
NO reachability -/// root tags (the Rust plugin today — binary/lib roots unsupported, PDR-0012 -/// keeps the Rust line out of the launch envelope) must be EXCLUDED with an -/// in-band marker, never false-flagged dead. A wrong answer is worse than an -/// honest scope statement. +/// B2(2): entities owned by a plugin that emitted NO reachability root tags must +/// be EXCLUDED with an in-band marker, never false-flagged dead. A wrong answer +/// is worse than an honest scope statement. (Since ADR-054 the Rust plugin DOES +/// emit roots — see `find_dead_code_surveys_rust_once_it_emits_roots` — so the +/// untagged `rust` entity here stands in for any hypothetical rootless plugin; +/// the exclusion mechanism is plugin-name-agnostic, keyed on emitted tags.) #[tokio::test] async fn find_dead_code_excludes_plugins_without_root_coverage_with_marker() { let (project, db, conn) = open_project(); - // Python emits roots; rust emits none (true to the live plugins). + // One plugin emits roots (python); the other emits none — its entities are + // withheld rather than false-flagged dead. insert_entity( &conn, "python:function:main", @@ -4178,6 +4188,161 @@ async fn find_dead_code_public_surface_root_rescues_library_api_in_app_only() { assert!(env["result"]["summary"]["advisory"].is_null(), "{env}"); } +/// ADR-054 acceptance: once the Rust plugin emits reachability roots, a Rust-only +/// index is SURVEYED (not withheld by the no-roots exclusion); a genuinely-unused +/// private fn is flagged dead; and a `pub` lib fn reached only via a test stays +/// live through its `exported-api` root even in `app_only` (where tests are +/// excluded). The exclusion lift is automatic — `plugins_without_roots` drops to +/// zero the moment Rust owns a root-tagged entity. +#[tokio::test] +async fn find_dead_code_surveys_rust_once_it_emits_roots() { + let (project, db, conn) = open_project(); + // A pub lib fn → `exported-api` root. 
+ insert_entity_with_plugin( + &conn, + "rust:function:lib.public_api", + "rust", + "function", + "src/lib.rs", + "{}", + ); + insert_tag_with_plugin( + &conn, + "rust:function:lib.public_api", + "rust", + "exported-api", + ); + // A private internal reached only from the public API — kept live transitively. + insert_entity_with_plugin( + &conn, + "rust:function:lib.internal_used", + "rust", + "function", + "src/lib.rs", + "{}", + ); + insert_calls_edge( + &conn, + "rust:function:lib.public_api", + "rust:function:lib.internal_used", + "resolved", + ); + // A pub lib fn whose ONLY caller is a test — its `exported-api` root must keep + // it live in app_only, where the test root is excluded. + insert_entity_with_plugin( + &conn, + "rust:function:lib.tested_only", + "rust", + "function", + "src/lib.rs", + "{}", + ); + insert_tag_with_plugin( + &conn, + "rust:function:lib.tested_only", + "rust", + "exported-api", + ); + insert_entity_with_plugin( + &conn, + "rust:function:tests.it_works", + "rust", + "function", + "src/lib.rs", + "{}", + ); + insert_tag_with_plugin(&conn, "rust:function:tests.it_works", "rust", "test"); + insert_calls_edge( + &conn, + "rust:function:tests.it_works", + "rust:function:lib.tested_only", + "resolved", + ); + // Genuinely dead: a private fn nothing calls. 
+ insert_entity_with_plugin( + &conn, + "rust:function:lib.dead_helper", + "rust", + "function", + "src/lib.rs", + "{}", + ); + drop(conn); + let state = state_for(project.path(), &db); + + for app_only in [false, true] { + let env = call_tool(&state, "find_dead_code", json!({ "app_only": app_only })).await; + assert_eq!(env["ok"], true, "{env}"); + let dead: Vec = env["result"]["dead_code"] + .as_array() + .unwrap() + .iter() + .map(|c| c["entity"]["id"].as_str().unwrap().to_owned()) + .collect(); + assert_eq!( + dead, + vec!["rust:function:lib.dead_helper".to_owned()], + "only the genuine orphan is dead (app_only={app_only}); the exported-api root \ + keeps the test-only-reached pub fn live: {env}" + ); + // The exclusion has lifted: Rust is surveyed, not withheld. + assert_eq!( + env["result"]["summary"]["not_analysed"]["plugins_without_roots"], 0, + "rust emits roots → no plugin is withheld (app_only={app_only}): {env}" + ); + } +} + +/// ADR-054: a `module` is the containment spine rooted at the always-live crate +/// root, not removable code — so a module entity is never a dead-code candidate, +/// even with no inbound call/import edge. (Rust modules systematically lack +/// module-targeting import edges; without this exclusion every Rust module would +/// read as dead and dominate the candidate set.) +#[tokio::test] +async fn find_dead_code_excludes_module_containers() { + let (project, db, conn) = open_project(); + insert_entity( + &conn, + "python:function:main", + "function", + "app.py", + Some((1, 5)), + ); + insert_tag(&conn, "python:function:main", "entry-point"); + // A module with no inbound call/import edge — a container, never "dead code". + insert_entity( + &conn, + "python:module:orphan_mod", + "module", + "orphan.py", + Some((1, 10)), + ); + // A genuinely dead function — the only legitimate candidate. 
+ insert_entity( + &conn, + "python:function:dead", + "function", + "app.py", + Some((6, 9)), + ); + drop(conn); + let state = state_for(project.path(), &db); + + let env = call_tool(&state, "find_dead_code", json!({})).await; + assert_eq!(env["ok"], true, "{env}"); + let dead: Vec = env["result"]["dead_code"] + .as_array() + .unwrap() + .iter() + .map(|c| c["entity"]["id"].as_str().unwrap().to_owned()) + .collect(); + assert_eq!( + dead, + vec!["python:function:dead".to_owned()], + "a module container is never a dead-code candidate: {env}" + ); +} + /// `app_only: true` on coupling excludes test-tagged callers from the ranking so /// a hub's coupling drops to reflect only first-party app fan-in/out. #[tokio::test] diff --git a/crates/loomweave-plugin-rust/plugin.toml b/crates/loomweave-plugin-rust/plugin.toml index 6e921b1a..b3632557 100644 --- a/crates/loomweave-plugin-rust/plugin.toml +++ b/crates/loomweave-plugin-rust/plugin.toml @@ -53,8 +53,13 @@ rule_id_prefix = "LMWV-RUST-" # no gate enforces the Rust plugin ontology_version (every check-*.py confirmed). # 0.5.0: additive edge kinds `derives` + `references` (Phase 2 completion, # plan 2026-06-10 — ADR-027 MINOR). +# 0.6.0: additive reachability-root categorisation tags `exported-api` / +# `entry-point` / `test` / `allow-dead-code` (ADR-054, clarion-05fdd0490e — +# ADR-027 MINOR). Tags are not manifest-gated (the host validates size, not +# membership), so this bump is documentation + cache-invalidation, not a wire +# contract; bumped anyway per ADR-027. # Lockstep with the `serve.rs` handshake constant. 
-ontology_version = "0.5.0" +ontology_version = "0.6.0" [ontology.roles] file_scope = ["module"] diff --git a/crates/loomweave-plugin-rust/src/extract.rs b/crates/loomweave-plugin-rust/src/extract.rs index 6a767ad0..543edae1 100644 --- a/crates/loomweave-plugin-rust/src/extract.rs +++ b/crates/loomweave-plugin-rust/src/extract.rs @@ -42,6 +42,7 @@ use crate::references::{ fields_reference_sites, signature_reference_sites, type_reference_sites, }; use crate::resolve::{Resolution, Resolver}; +use crate::root_tags::{TagCtx, has_macro_export, is_unrestricted_pub, root_tags}; use crate::signature::{function_signature, impl_signature, struct_signature}; use crate::spans::{SourceRange, source_range_of}; @@ -176,6 +177,7 @@ fn extract_file_on_pinned_stack( &module_id, file_path, resolution, + TagCtx::for_file(module_path), &mut entities, &mut edges, &mut acc, @@ -392,12 +394,16 @@ fn degraded_module_tuple( // near-identical dispatch arm over the item enum. Splitting it would obscure the // one-arm-per-syn-Item structure the reader relies on. #[allow(clippy::too_many_lines)] +// The walk threads file context (path/module/resolver) + the two output sinks + +// the ADR-054 tag context; bundling would only hide the data flow. +#[allow(clippy::too_many_arguments)] fn walk_items( items: &[Item], module_path: &str, parent_id: &str, file_path: &str, resolution: Option<(&str, &Resolver)>, + ctx: TagCtx, out: &mut Vec, edges: &mut Vec, acc: &mut Phase2Acc, @@ -502,7 +508,11 @@ fn walk_items( for item in items { match item { Item::Fn(ItemFn { - sig, attrs, block, .. + vis, + sig, + attrs, + block, + .. 
}) => { let name = sig.ident.to_string(); let mut q = free_item_qualname(module_path, &name); @@ -511,7 +521,7 @@ fn walk_items( { q.push_str(&disc); } - let child = entity( + let mut child = entity( "function", &q, file_path, @@ -519,6 +529,10 @@ fn walk_items( Some(parent_id), Some(function_signature(sig)), )?; + attach_tags( + &mut child, + root_tags(&name, is_unrestricted_pub(vis), true, attrs, ctx), + ); let fn_id = build_id("function", &q)?; push_with_contains(parent_id, child, out, edges); // Phase 2: walk the body for call sites, ONLY with a resolver @@ -543,6 +557,7 @@ fn walk_items( } } Item::Struct(ItemStruct { + vis, ident, fields, attrs, @@ -555,7 +570,7 @@ fn walk_items( { q.push_str(&disc); } - let child = entity( + let mut child = entity( "struct", &q, file_path, @@ -563,6 +578,10 @@ fn walk_items( Some(parent_id), Some(struct_signature(fields)), )?; + attach_tags( + &mut child, + root_tags(&name, is_unrestricted_pub(vis), false, attrs, ctx), + ); push_with_contains(parent_id, child, out, edges); // Phase 2: anchored `derives` edges, ONLY with a resolver (the // edges-aware entry point) — parity with `imports`/`implements`. @@ -592,6 +611,7 @@ fn walk_items( )?; } Item::Mod(ItemMod { + vis, ident, content: Some((_, inner)), attrs, @@ -615,8 +635,18 @@ fn walk_items( None, )?); let nested_id = build_id("module", &nested)?; + // ADR-054: the pub-chain and cfg(test) ancestry descend with the + // module nesting; `in_bin_target` is fixed per file. walk_items( - inner, &nested, &nested_id, file_path, resolution, out, edges, acc, + inner, + &nested, + &nested_id, + file_path, + resolution, + ctx.descend_into_mod(vis, attrs), + out, + edges, + acc, )?; } // Phase 1b leaf kinds: free items riding the same qualname + entity + @@ -624,6 +654,7 @@ fn walk_items( // signature builder yet — trait/impl SEI signatures are a later task). // Trait *bodies* are deliberately NOT walked here (matching 1a). 
Item::Enum(ItemEnum { + vis, ident, attrs, variants, @@ -636,7 +667,7 @@ fn walk_items( { q.push_str(&disc); } - let child = entity( + let mut child = entity( "enum", &q, file_path, @@ -644,6 +675,10 @@ fn walk_items( Some(parent_id), None, )?; + attach_tags( + &mut child, + root_tags(&name, is_unrestricted_pub(vis), false, attrs, ctx), + ); push_with_contains(parent_id, child, out, edges); // Phase 2: `derives` edges for enums too (structs + enums are // the only derive targets in the walk — no `Item::Union` arm). @@ -660,7 +695,9 @@ fn walk_items( emit_reference_edges(&ref_sites, &enum_id, from_crate, resolver, acc, edges); } } - Item::Trait(ItemTrait { ident, attrs, .. }) => { + Item::Trait(ItemTrait { + vis, ident, attrs, .. + }) => { let name = ident.to_string(); let mut q = free_item_qualname(module_path, &name); if is_cfg_twin("trait", &name) @@ -668,7 +705,7 @@ fn walk_items( { q.push_str(&disc); } - let child = entity( + let mut child = entity( "trait", &q, file_path, @@ -676,10 +713,18 @@ fn walk_items( Some(parent_id), None, )?; + attach_tags( + &mut child, + root_tags(&name, is_unrestricted_pub(vis), false, attrs, ctx), + ); push_with_contains(parent_id, child, out, edges); } Item::Type(ItemType { - ident, attrs, ty, .. + vis, + ident, + attrs, + ty, + .. }) => { let name = ident.to_string(); let mut q = free_item_qualname(module_path, &name); @@ -688,7 +733,7 @@ fn walk_items( { q.push_str(&disc); } - let child = entity( + let mut child = entity( "type_alias", &q, file_path, @@ -696,6 +741,10 @@ fn walk_items( Some(parent_id), None, )?; + attach_tags( + &mut child, + root_tags(&name, is_unrestricted_pub(vis), false, attrs, ctx), + ); push_with_contains(parent_id, child, out, edges); // Phase 2: the alias RHS is a type position — `references` // sites from the type_alias entity (D3), resolver-gated. 
@@ -707,6 +756,7 @@ fn walk_items( } } Item::Const(ItemConst { + vis, ident, attrs, ty, @@ -734,7 +784,7 @@ fn walk_items( { q.push_str(&disc); } - let child = entity( + let mut child = entity( "const", &q, file_path, @@ -742,6 +792,10 @@ fn walk_items( Some(parent_id), None, )?; + attach_tags( + &mut child, + root_tags(&name, is_unrestricted_pub(vis), false, attrs, ctx), + ); push_with_contains(parent_id, child, out, edges); // Phase 2: declared type (type position) + initializer // (expression position) both mint `references` sites from the @@ -755,6 +809,7 @@ fn walk_items( } } Item::Static(ItemStatic { + vis, ident, attrs, ty, @@ -768,7 +823,7 @@ fn walk_items( { q.push_str(&disc); } - let child = entity( + let mut child = entity( "static", &q, file_path, @@ -776,6 +831,10 @@ fn walk_items( Some(parent_id), None, )?; + attach_tags( + &mut child, + root_tags(&name, is_unrestricted_pub(vis), false, attrs, ctx), + ); push_with_contains(parent_id, child, out, edges); // Phase 2: same channel as `const` — declared type + // initializer, from the static entity (D3), resolver-gated. @@ -801,7 +860,7 @@ fn walk_items( { q.push_str(&disc); } - let child = entity( + let mut child = entity( "macro", &q, file_path, @@ -809,6 +868,12 @@ fn walk_items( Some(parent_id), None, )?; + // `macro_rules!` has no `Visibility`; `#[macro_export]` is its + // external-surface marker (ADR-054 §1). + attach_tags( + &mut child, + root_tags(&name, has_macro_export(attrs), false, attrs, ctx), + ); push_with_contains(parent_id, child, out, edges); } // `use` items resolve to anchored `imports` edges (Phase 1b, Task 7) @@ -1337,6 +1402,15 @@ fn push_with_contains(from_id: &str, child: Value, out: &mut Vec, edges: out.push(child); } +/// Attach ADR-054 reachability-root `tags` to an entity, omitting the key when +/// the item carries none (default-empty keeps the wire addition non-breaking, +/// parity with the Python plugin's `_attach_optional_entity_metadata`). 
+fn attach_tags(child: &mut Value, tags: Vec<String>) { + if !tags.is_empty() { + child["tags"] = Value::from(tags); + } +} + /// A structural `contains` edge. Per ADR-026 decision 3 a structural edge /// carries NULL byte offsets (omitted here → wire default `None`); confidence /// is `resolved` (the relationship is syntactically certain). diff --git a/crates/loomweave-plugin-rust/src/lib.rs b/crates/loomweave-plugin-rust/src/lib.rs index c5f75062..40a6e0c0 100644 --- a/crates/loomweave-plugin-rust/src/lib.rs +++ b/crates/loomweave-plugin-rust/src/lib.rs @@ -10,6 +10,7 @@ pub mod parse_guard; pub mod qualname; pub mod references; pub mod resolve; +pub mod root_tags; pub mod scope; pub mod serve; pub mod signature; diff --git a/crates/loomweave-plugin-rust/src/root_tags.rs b/crates/loomweave-plugin-rust/src/root_tags.rs new file mode 100644 index 00000000..f2b43bfa --- /dev/null +++ b/crates/loomweave-plugin-rust/src/root_tags.rs @@ -0,0 +1,223 @@ +//! ADR-054 reachability-root tagging. Pure derivation of the +//! `exported-api` / `entry-point` / `test` / `allow-dead-code` root tags from a +//! `syn` item's visibility + attributes and the threaded module context. No +//! I/O, no resolver, no cross-file resolution (increment 1, clarion-05fdd0490e). +//! +//! Provenance lives in the tag value, mirroring ADR-053: `exported-api` is a +//! declared `pub` surface; `entry-point` / `test` are structural; the +//! lowest-confidence `allow-dead-code` is an explicit `#[allow(dead_code)]` +//! suppression. The engine (`loomweave-mcp`) unions them all into the dead-code +//! root set. + +use std::collections::BTreeSet; + +use syn::{Attribute, Meta, Visibility}; + +/// Module context threaded down the recursive item walk for tag derivation. +/// Four independent lexical facts about the current position — a context bag, +/// not a state machine; two-variant enums would obscure rather than clarify. 
+#[allow(clippy::struct_excessive_bools)] +#[derive(Clone, Copy)] +pub struct TagCtx { + /// Every enclosing module back to the crate root is `pub` — a precondition + /// for `exported-api` (the visibility chain must reach the external surface). + ancestors_all_pub: bool, + /// An enclosing module carries `#[cfg(test)]` → everything inside is `test`. + under_cfg_test: bool, + /// The file routes to a `@bin()` target (ADR-049 / scope.rs): + /// its `pub` items are internal, so `exported-api` is suppressed. + in_bin_target: bool, + /// The item is a direct child of the file root (where a bare `fn main` is + /// the program entry; a nested `fn main` is just a function). + at_file_top: bool, +} + +impl TagCtx { + /// The root context for a freshly-parsed file. A `@bin(` segment in the + /// file's root `module_path` marks a binary target (ADR-049 / scope.rs). + #[must_use] + pub fn for_file(module_path: &str) -> Self { + Self { + ancestors_all_pub: true, // the crate root is the public boundary + under_cfg_test: false, + in_bin_target: module_path.contains("@bin("), + at_file_top: true, + } + } + + /// The context for the body of an inline `mod` nested in this one. + #[must_use] + pub fn descend_into_mod(self, vis: &Visibility, attrs: &[Attribute]) -> Self { + Self { + ancestors_all_pub: self.ancestors_all_pub && is_unrestricted_pub(vis), + under_cfg_test: self.under_cfg_test || has_cfg_test(attrs), + in_bin_target: self.in_bin_target, + at_file_top: false, + } + } +} + +/// Reachability-root tags for a walked item, sorted + deduplicated (ADR-054). +/// +/// * `is_public` — the item exposes external visibility: unrestricted `pub` for +/// value/type items, `#[macro_export]` for `macro_rules!` (macros carry no +/// [`Visibility`]). +/// * `is_fn` — `entry-point` applies only to functions. +/// * `name` — the item identifier (for the bare `fn main` entry rule). 
+#[must_use] +pub fn root_tags( + name: &str, + is_public: bool, + is_fn: bool, + attrs: &[Attribute], + ctx: TagCtx, +) -> Vec<String> { + let mut tags: BTreeSet<&'static str> = BTreeSet::new(); + if is_public && ctx.ancestors_all_pub && !ctx.in_bin_target { + tags.insert("exported-api"); + } + if ctx.under_cfg_test || has_test_attr(attrs) { + tags.insert("test"); + } + if is_fn && is_entry_point(name, attrs, ctx) { + tags.insert("entry-point"); + } + if has_allow_dead_code(attrs) { + tags.insert("allow-dead-code"); + } + tags.into_iter().map(str::to_owned).collect() +} + +/// [`Visibility::Public`] only — `pub(crate)` / `pub(super)` / `pub(in ..)` are +/// [`Visibility::Restricted`] (intra-crate, not external API). +#[must_use] +pub fn is_unrestricted_pub(vis: &Visibility) -> bool { + matches!(vis, Visibility::Public(_)) +} + +/// `#[macro_export]` — the only export marker for a `macro_rules!` item (macros +/// have no [`Visibility`]). +#[must_use] +pub fn has_macro_export(attrs: &[Attribute]) -> bool { + attrs.iter().any(|a| a.path().is_ident("macro_export")) +} + +/// `#[test]` / `#[bench]`, including last-segment variants like `#[tokio::test]`. +fn has_test_attr(attrs: &[Attribute]) -> bool { + attrs + .iter() + .any(|a| last_segment_is(a, "test") || last_segment_is(a, "bench")) +} + +/// A module-level `fn main`, an async-runtime entry attribute (`#[tokio::main]` +/// / `#[actix_web::main]` / `#[async_std::main]`, matched on the `main` last +/// segment), or an FFI export (`#[no_mangle]` / `#[export_name]`). +fn is_entry_point(name: &str, attrs: &[Attribute], ctx: TagCtx) -> bool { + (ctx.at_file_top && name == "main") + || attrs.iter().any(|a| { + last_segment_is(a, "main") + || a.path().is_ident("no_mangle") + || a.path().is_ident("export_name") + }) +} + +/// `#[allow(dead_code)]` or `#[expect(dead_code)]` — an explicit author keep +/// signal that suppresses rustc's own dead-code lint.
+fn has_allow_dead_code(attrs: &[Attribute]) -> bool { + attrs.iter().any(|a| { + if let Meta::List(list) = &a.meta { + (list.path.is_ident("allow") || list.path.is_ident("expect")) + && list + .tokens + .to_string() + .split(|c: char| !c.is_alphanumeric() && c != '_') + .any(|t| t == "dead_code") + } else { + false + } + }) +} + +/// `#[cfg(test)]` exactly (a bare `test` predicate). Compound forms like +/// `cfg(all(test, ..))` are out of increment-1 scope (fail-toward-live: a missed +/// cfg-test item is merely surveyed, never mis-rooted). +fn has_cfg_test(attrs: &[Attribute]) -> bool { + attrs.iter().any(|a| { + if let Meta::List(list) = &a.meta { + list.path.is_ident("cfg") && list.tokens.to_string().trim() == "test" + } else { + false + } + }) +} + +/// The attribute's final path segment equals `name` (so `#[test]` and +/// `#[tokio::test]` both match `"test"`). +fn last_segment_is(attr: &Attribute, name: &str) -> bool { + attr.path().segments.last().is_some_and(|s| s.ident == name) +} + +#[cfg(test)] +mod tests { + use super::*; + + fn attrs(src: &str) -> Vec<Attribute> { + // Parse the attributes off a throwaway item.
+ let item: syn::ItemFn = syn::parse_str(&format!("{src}\nfn f() {{}}")).unwrap(); + item.attrs + } + + fn lib_ctx() -> TagCtx { + TagCtx::for_file("k.m") + } + + #[test] + fn unrestricted_pub_only() { + let pub_vis: Visibility = syn::parse_str("pub").unwrap(); + let crate_vis: Visibility = syn::parse_str("pub(crate)").unwrap(); + let inherited = Visibility::Inherited; + assert!(is_unrestricted_pub(&pub_vis)); + assert!(!is_unrestricted_pub(&crate_vis)); + assert!(!is_unrestricted_pub(&inherited)); + } + + #[test] + fn for_file_reads_bin_segment() { + let lib = TagCtx::for_file("k.m"); + let bin = TagCtx::for_file("k@bin(k)"); + assert!(!lib.in_bin_target); + assert!(bin.in_bin_target); + } + + #[test] + fn private_mod_breaks_pub_chain() { + let descended = lib_ctx().descend_into_mod(&Visibility::Inherited, &[]); + assert!(!descended.ancestors_all_pub); + // a pub item under a private mod is NOT exported-api + assert!(root_tags("x", true, false, &[], descended).is_empty()); + } + + #[test] + fn pub_mod_preserves_pub_chain() { + let pub_vis: Visibility = syn::parse_str("pub").unwrap(); + let descended = lib_ctx().descend_into_mod(&pub_vis, &[]); + assert!(descended.ancestors_all_pub); + assert_eq!( + root_tags("x", true, false, &[], descended), + ["exported-api"] + ); + } + + #[test] + fn allow_dead_code_detected_in_list() { + assert!(has_allow_dead_code(&attrs("#[allow(unused, dead_code)]"))); + assert!(has_allow_dead_code(&attrs("#[expect(dead_code)]"))); + assert!(!has_allow_dead_code(&attrs("#[allow(unused)]"))); + } + + #[test] + fn cfg_test_only_matches_bare_test() { + assert!(has_cfg_test(&attrs("#[cfg(test)]"))); + assert!(!has_cfg_test(&attrs("#[cfg(feature = \"x\")]"))); + } +} diff --git a/crates/loomweave-plugin-rust/src/serve.rs b/crates/loomweave-plugin-rust/src/serve.rs index 68c86f09..b53185f5 100644 --- a/crates/loomweave-plugin-rust/src/serve.rs +++ b/crates/loomweave-plugin-rust/src/serve.rs @@ -129,7 +129,7 @@ pub fn run() -> ! 
{ version: env!("CARGO_PKG_VERSION").to_owned(), // Lockstep with plugin.toml `[ontology].ontology_version` // (ADR-027). Bump both together. - ontology_version: "0.5.0".to_owned(), + ontology_version: "0.6.0".to_owned(), capabilities: serde_json::json!({}), }; send_result(&mut writer, id, serde_json::to_value(result).unwrap()); diff --git a/crates/loomweave-plugin-rust/tests/analyze_e2e.rs b/crates/loomweave-plugin-rust/tests/analyze_e2e.rs index d5e25686..4c9b974d 100644 --- a/crates/loomweave-plugin-rust/tests/analyze_e2e.rs +++ b/crates/loomweave-plugin-rust/tests/analyze_e2e.rs @@ -224,7 +224,7 @@ fn analyze_e2e_stored_rust_entity_set_excludes_out_of_src_files() { "rust:impl:e2e_crate.Widget.impl[Display]".to_owned(), "rust:function:e2e_crate.Widget.impl[Display].fmt".to_owned(), // Task 11 (exit gate): the remaining leaf kinds, so the analyzed crate - // exercises EVERY entity kind in the 0.5.0 ontology. (`module`, `struct`, + // exercises EVERY entity kind in the ontology. (`module`, `struct`, // `function`, `trait`, `impl` above; `enum`/`type_alias`/`const`/`static`/ // `macro` here.) Enum VARIANTS do not emit as separate entities. "rust:enum:e2e_crate.Color".to_owned(), diff --git a/crates/loomweave-plugin-rust/tests/root_tags.rs b/crates/loomweave-plugin-rust/tests/root_tags.rs new file mode 100644 index 00000000..6dcda032 --- /dev/null +++ b/crates/loomweave-plugin-rust/tests/root_tags.rs @@ -0,0 +1,186 @@ +//! ADR-054 reachability-root tagging: `exported-api` / `entry-point` / `test` / +//! `allow-dead-code`. Drives the extractor's tag emission (clarion-05fdd0490e). +//! +//! Tests pass the file's root `module_path` directly so a `@bin(...)` root can +//! be simulated without a real Cargo layout (the bin discriminator is the +//! ADR-049 module-path segment, see `scope.rs`). + +use std::collections::BTreeMap; + +use loomweave_plugin_rust::extract::extract_file; +use serde_json::Value; + +/// Every emitted entity id → its `tags` array (empty when none). 
Tags are +/// emitted sorted, so equality against a sorted literal is order-stable. +fn tags_by_id(crate_name: &str, module_path: &str, src: &str) -> BTreeMap<String, Vec<String>> { + extract_file(crate_name, module_path, "/p/src/lib.rs", src) + .unwrap() + .iter() + .map(|e| { + let id = e["id"].as_str().unwrap().to_owned(); + let tags = e + .get("tags") + .and_then(Value::as_array) + .map(|a| { + a.iter() + .map(|t| t.as_str().unwrap().to_owned()) + .collect::<Vec<_>>() + }) + .unwrap_or_default(); + (id, tags) + }) + .collect() +} + +fn tags(map: &BTreeMap<String, Vec<String>>, id: &str) -> Vec<String> { + map.get(id) + .cloned() + .unwrap_or_else(|| panic!("entity {id} not emitted; ids: {:?}", map.keys())) +} + +// ---- exported-api (visibility chain → external surface) -------------------- + +#[test] +fn pub_lib_fn_is_exported_api() { + let m = tags_by_id("k", "k.m", "pub fn helper() {}\n"); + assert_eq!(tags(&m, "rust:function:k.m.helper"), ["exported-api"]); +} + +#[test] +fn private_fn_has_no_tags() { + let m = tags_by_id("k", "k.m", "fn helper() {}\n"); + assert!(tags(&m, "rust:function:k.m.helper").is_empty()); +} + +#[test] +fn pub_crate_fn_is_not_exported_api() { + // pub(crate)/pub(super)/pub(in ..) are intra-crate, statically analysable — + // not the external API surface (ADR-054 §1). + let m = tags_by_id("k", "k.m", "pub(crate) fn helper() {}\n"); + assert!( + tags(&m, "rust:function:k.m.helper").is_empty(), + "pub(crate) is not exported-api" + ); +} + +#[test] +fn pub_fn_in_private_mod_is_not_exported_api() { + // The visibility chain is broken by the private `mod internal`.
+ let m = tags_by_id("k", "k.m", "mod internal { pub fn helper() {} }\n"); + assert!( + tags(&m, "rust:function:k.m.internal.helper").is_empty(), + "pub item under a private mod is not external surface" + ); +} + +#[test] +fn pub_fn_in_pub_mod_is_exported_api() { + let m = tags_by_id("k", "k.m", "pub mod api { pub fn helper() {} }\n"); + assert_eq!(tags(&m, "rust:function:k.m.api.helper"), ["exported-api"]); +} + +#[test] +fn pub_leaf_kinds_are_exported_api() { + let src = "pub struct S;\npub enum E { A }\npub trait T {}\n\ + pub type A = i32;\npub const C: i32 = 1;\npub static ST: i32 = 1;\n"; + let m = tags_by_id("k", "k.m", src); + assert_eq!(tags(&m, "rust:struct:k.m.S"), ["exported-api"]); + assert_eq!(tags(&m, "rust:enum:k.m.E"), ["exported-api"]); + assert_eq!(tags(&m, "rust:trait:k.m.T"), ["exported-api"]); + assert_eq!(tags(&m, "rust:type_alias:k.m.A"), ["exported-api"]); + assert_eq!(tags(&m, "rust:const:k.m.C"), ["exported-api"]); + assert_eq!(tags(&m, "rust:static:k.m.ST"), ["exported-api"]); +} + +// ---- entry-point ---------------------------------------------------------- + +#[test] +fn fn_main_is_entry_point() { + let m = tags_by_id("k", "k", "fn main() {}\n"); + assert_eq!(tags(&m, "rust:function:k.main"), ["entry-point"]); +} + +#[test] +fn tokio_main_attr_is_entry_point() { + let m = tags_by_id("k", "k", "#[tokio::main]\nasync fn run() {}\n"); + assert_eq!(tags(&m, "rust:function:k.run"), ["entry-point"]); +} + +#[test] +fn no_mangle_ffi_export_is_entry_point_and_exported_api() { + // FFI export in a lib file: an entry from outside the Rust graph AND pub. 
+ let m = tags_by_id("k", "k.m", "#[no_mangle]\npub extern \"C\" fn ffi() {}\n"); + assert_eq!( + tags(&m, "rust:function:k.m.ffi"), + ["entry-point", "exported-api"] + ); +} + +// ---- test ----------------------------------------------------------------- + +#[test] +fn test_attr_fn_is_test() { + let m = tags_by_id("k", "k.m", "#[test]\nfn it_works() {}\n"); + assert_eq!(tags(&m, "rust:function:k.m.it_works"), ["test"]); +} + +#[test] +fn bench_attr_fn_is_test() { + let m = tags_by_id("k", "k.m", "#[bench]\nfn bench_it(b: &mut Bencher) {}\n"); + assert_eq!(tags(&m, "rust:function:k.m.bench_it"), ["test"]); +} + +#[test] +fn items_under_cfg_test_mod_are_test() { + let src = "#[cfg(test)]\nmod tests {\n fn helper() {}\n struct Fixture;\n}\n"; + let m = tags_by_id("k", "k.m", src); + assert_eq!(tags(&m, "rust:function:k.m.tests.helper"), ["test"]); + assert_eq!(tags(&m, "rust:struct:k.m.tests.Fixture"), ["test"]); +} + +// ---- allow-dead-code (explicit author keep-signal) ------------------------ + +#[test] +fn allow_dead_code_is_root() { + let m = tags_by_id("k", "k.m", "#[allow(dead_code)]\nfn kept() {}\n"); + assert_eq!(tags(&m, "rust:function:k.m.kept"), ["allow-dead-code"]); +} + +#[test] +fn allow_dead_code_combines_with_exported_api() { + let m = tags_by_id("k", "k.m", "#[allow(dead_code)]\npub fn kept() {}\n"); + assert_eq!( + tags(&m, "rust:function:k.m.kept"), + ["allow-dead-code", "exported-api"] + ); +} + +// ---- bin targets (pub is internal; main is the entry) --------------------- + +#[test] +fn bin_target_pub_fn_is_not_exported_api() { + let m = tags_by_id("k", "k@bin(k)", "pub fn helper() {}\nfn main() {}\n"); + assert!( + tags(&m, "rust:function:k@bin(k).helper").is_empty(), + "a bin target's pub item is internal, not external API" + ); + assert_eq!(tags(&m, "rust:function:k@bin(k).main"), ["entry-point"]); +} + +// ---- macros (exported via #[macro_export], not `pub`) --------------------- + +#[test] +fn macro_export_is_exported_api() { + let m = 
tags_by_id( + "k", + "k.m", + "#[macro_export]\nmacro_rules! mac { () => {}; }\n", + ); + assert_eq!(tags(&m, "rust:macro:k.m.mac"), ["exported-api"]); +} + +#[test] +fn non_exported_macro_has_no_tags() { + let m = tags_by_id("k", "k.m", "macro_rules! mac { () => {}; }\n"); + assert!(tags(&m, "rust:macro:k.m.mac").is_empty()); +} diff --git a/docs/loomweave/adr/ADR-054-rust-reachability-root-model.md b/docs/loomweave/adr/ADR-054-rust-reachability-root-model.md new file mode 100644 index 00000000..5818e4b7 --- /dev/null +++ b/docs/loomweave/adr/ADR-054-rust-reachability-root-model.md @@ -0,0 +1,229 @@ +# ADR-054: Rust Reachability-Root Tag Model — Visibility, Entry-Points, Tests, Handlers + +**Status**: Accepted +**Date**: 2026-06-25 +**Deciders**: john@foundryside.dev +**Context**: clarion-05fdd0490e. Sibling to ADR-053 (the Python `public-surface` +fallback); coordinates with ADR-049 (Rust qualname canonicalization) for the +`@bin()` / `@cfg(...)` namespace segments this model reads. Closes the +limitation recorded as PDR-0012 ("binary/lib roots unsupported"). + +## Context + +The Rust language plugin (`crates/loomweave-plugin-rust`) extracts entities and +edges but emits **zero categorisation / reachability-root tags**. The dead-code +engine (`loomweave-mcp` `catalogue/shortcuts.rs`) excludes a plugin's entire +entity set from the survey when that plugin emits no root tags — a deliberate +honesty posture (signal-*unavailable* beats false-flagging an entire crate +dead, `dead_code_candidate_set` → `plugins_with_root_tags`). The consequence is +that `entity_dead_list` / `find_dead_code` simply **does not work for Rust**, and +the faceted surfaces (`entity_entry_point_list`, `entity_http_route_list`, …) +return nothing for Rust entities. + +This is the Rust counterpart of ADR-053, but it is **net-new, not a port**. +Python needed a PEP 8 *inference* (`public-surface`) because `__all__` is +optional and usually absent. 
Rust's visibility is **explicit in the grammar** +(`pub`), so there is no inference gap to paper over — the plugin simply needs to +read the visibility, entry-point, and test signals already present in the AST +and emit the root vocabulary the engine already consumes. + +The engine side is ready. `DEAD_CODE_ROOT_TAGS` already contains +`entry-point` / `exported-api` / `test` / `http-route` / `cli-command` / +`framework-handler`; the per-plugin no-roots exclusion is keyed on +`entity_tags`, not on a plugin name, so it **auto-lifts** the moment Rust emits +any root tag — no new MCP root plumbing is required. The wire already carries +`tags: Vec<String>` on every plugin entity (`loomweave-core` `plugin/host.rs`). + +## The grounding principle: error-cost asymmetry (fail-toward-live) + +Reachability roots exist to stop *live* code being reported *dead*. The two +error directions are not symmetric: + +- **Over-rooting** (tag something that is actually intra-crate-reachable) → the + item merely reads **live** → we under-report some dead code. Safe. +- **Under-rooting** (miss a genuine external-API root) → real API reads **dead** + → a false positive that erodes trust in the whole signal. This is the exact + failure ADR-053 fought. + +So every judgement call below resolves toward rooting. Precision (Cargo.toml +target parsing, full re-export resolution, method-level rooting) is deferred +where it would only *narrow* the root set, because narrowing is the unsafe +direction and the safe default already covers the case. + +## Decision + +The Rust plugin emits four reachability-root tag classes, derived from the +`syn` AST with no cross-file resolution (increment 1). All are computed +per-item during the existing recursive item walk (`extract.rs` `walk_items`), +which already carries the enclosing module path and attribute list. + +### 1.
`exported-api` — external public surface (lib targets) + +An item is `exported-api` iff **all** hold: + +- its visibility is `pub` **without restriction** — `syn::Visibility::Public`. + `pub(crate)` / `pub(super)` / `pub(in path)` are **not** `exported-api`: their + reachability is intra-crate and statically analysable, so the normal + call/import reachability handles them (and missing them only over-reports, the + safe direction is already covered by leaving them out — they are reachable + from a rooted caller if used); +- **every enclosing module is `pub`** (the visibility chain reaches the crate + root). A `pub fn` inside a private `mod` is *not* part of the crate's external + surface; a `pub fn` inside a `pub mod` is. The walk threads a single + `ancestors_all_pub` boolean; the file-root module is the crate boundary and + counts as public; +- the file is **not a binary-target file**. Binary targets route to a + `@bin()` module-path root (ADR-049 / `scope.rs`), which can never + collide with a real module — so `module_path` containing `@bin(` is a reliable + "this is a bin target" discriminator. `pub` in a bin is internal; the real + entry is `fn main` (rooted separately). + +Applies to the leaf value/type item kinds (`function`, `struct`, `enum`, +`trait`, `type_alias`, `const`, `static`, `macro`). **Module entities are not +tagged, and are excluded from dead-code candidacy engine-side** (a new +`DEAD_CODE_CONTAINER_KINDS = ["module"]` in `loomweave-mcp`): a module is the +*containment spine* rooted at the always-live crate root, so it is never "dead" +in any actionable sense — you remove its contents, not the namespace. +Reachability proper runs over call+import edges only, and the Rust plugin emits +no module-targeting `imports` edges (its import edges target items), so without +this exclusion **every** Rust module would read as dead and dominate the +candidate set (the dogfood confirmed: a 3-of-7 over-report tripping the +LOW-confidence band, vs. 
the clean 1-of-5 once modules are excluded). The +exclusion is kind-based and language-agnostic — it also closes the same latent +over-report for any never-imported Python module. + +**Accepted imprecision (documented, fail-toward-live):** a *pure-binary* crate +(`src/main.rs` with no sibling `src/lib.rs`) routes its files to the **bare** +crate root, not `@bin(...)` (`scope.rs`: "main.rs IS its canonical crate root"). +So a pure-bin crate's `pub` items are indistinguishable from a lib's at the +module-path level and will receive `exported-api`. This over-roots (their pub is +really internal) — the safe direction. Precise lib-vs-bin classification from +`Cargo.toml` `[lib]`/`[[bin]]` targets is deferred; it would only *remove* roots. + +### 2. `entry-point` — program entry + +An item is `entry-point` iff any hold: + +- it is a module-level `fn main` (covers both the lib+bin `@bin` root and the + pure-bin bare root); +- it carries a runtime-entry attribute macro: `#[tokio::main]`, + `#[actix_web::main]`, `#[async_std::main]` (last path segment `main` under a + known async-runtime path); +- it carries `#[no_mangle]` or `#[export_name = "…"]` — an FFI / C-ABI export + reachable only from outside the Rust call graph. + +### 3. `test` — test / bench entry + +An item is `test` iff any hold: + +- it carries `#[test]` or `#[bench]`; +- it is under a `#[cfg(test)]` ancestor module (the walk threads an + `under_cfg_test` boolean, set when descending into a module whose attrs carry + a literal `cfg(test)` predicate). + +Test items are roots (they are entry points the harness invokes) and are +excluded from `app_only` reachability by the engine, exactly as Python's `test` +tag. + +### 4. `allow-dead-code` — explicit author "keep" assertion + +An item carrying `#[allow(dead_code)]` is tagged `allow-dead-code`, a **new +additive entry in `DEAD_CODE_ROOT_TAGS`**. 
`#[allow(dead_code)]` is the author +explicitly suppressing rustc's own dead-code lint — an "I am keeping this on +purpose" assertion. Rooting it is fail-toward-live and consistent with rustc's +own behaviour (it will not warn). It is the lowest-confidence class (an explicit +suppression, not a structural surface); the provenance lives in the distinct tag +value, per the ADR-053 precedent. + +### Provenance by tag value, not by plumbing + +As in ADR-053, the declared-vs-inferred distinction lives in the **tag value** +(`exported-api` = declared `pub` surface; `entry-point`/`test` = structural; +`allow-dead-code` = explicit suppression), not in new wire fields. For +reachability the union is what matters, so all four simply join the root set. + +### Advisory copy is language-aware + +`dead_code_no_roots_envelope` and the LOW-confidence advisory in `shortcuts.rs` +currently name `__all__` and Python decorators as the levers. That is correct +only while a Rust-only index hits the *no-roots* exclusion. **Once Rust emits +roots**, the advisory can fire for Rust corpora and MUST name Rust levers (`pub` +an item, add a `[[bin]]` / `fn main`, add `#[test]`) instead of `__all__`. The +lever phrasing is sourced per-plugin / per-language so a Rust corpus is never +handed Python-only advice. This ships **with** the roots, not after. + +### Ontology bump + +Additive tag-vocabulary change: Rust plugin `ontology_version` **0.5.0 → +0.6.0**, in lockstep across the four locations that carry it — +`crates/loomweave-plugin-rust/plugin.toml`, its byte-identical wheel-data copy +(`packaging/rust-plugin-dist/wheel-data/.../plugin.toml`, guarded by +`scripts/check-rust-plugin-manifest-lockstep.py`), `serve.rs`'s `initialize` +response, and the `docs/operator/language-support.md` table. 
+ +## Deferred to increment 2 (noted here, like ADR-053's Alternative 3) + +These are real follow-ups, deferred because each either needs cross-file +resolution or would only *narrow* the root set: + +- **Framework-attribute handlers** — `http-route` (axum/actix/rocket route + attribute macros, e.g. `#[get("/…")]`), `cli-command` (clap/structopt derives + and `#[command]`), `framework-handler` (proc-macro registration attrs). Rust's + web frameworks are heterogeneous (many use the builder pattern, not + attributes), so this is best-effort attribute detection with a documented + coverage limit — breadth research, deferred to a focused increment. +- **`pub use` re-export resolution** — a privately-defined item re-exported + `pub` is part of the API surface. Resolving the re-export target needs the + cross-file symbol table (the resolver). The common facade case (`pub use + internal::Thing` where `Thing` is itself `pub`) is **already covered** by + `Thing`'s own `exported-api` tag at its definition; only a `pub(crate)` item + re-exported `pub` is under-rooted, a narrow residual. Deferred. +- **`pub`-method rooting** — a `pub` method of a `pub` type (an `impl` item) is + external API, but reachability traverses call+import edges only, so rooting + the type does not root its methods. This is the Rust analog of the Python + follow-up clarion-961a1acb2c; deferred with it. + +## Alternatives considered + +### Alternative 1: reuse `exported-api` for the `#[allow(dead_code)]` keep-signal + +**Pros**: one fewer tag, no `DEAD_CODE_ROOT_TAGS` entry. **Cons**: conflates an +external-API claim with an explicit local suppression — an agent inspecting the +tag could no longer tell a public export from a privately-kept dead item. The +distinct tag costs one const entry and keeps provenance legible (the ADR-053 +reasoning). Rejected. + +### Alternative 2: classify lib vs bin from `Cargo.toml` targets now + +**Pros**: precise `exported-api` suppression for pure-bin crates. 
**Cons**: +larger (parse `[lib]`/`[[bin]]`/`[[example]]`/`[[bench]]`, thread target kind +into extraction) and it only *removes* roots — the unsafe direction. The +`@bin(...)` module-path discriminator already handles the lib+bin and multi-bin +cases for free; the residual (pure-bin over-rooting) is fail-toward-live. Left as +a precision follow-up, not built. + +### Alternative 3: require the full `pub` visibility chain via type resolution + +The chosen model approximates the pub-chain with a threaded `ancestors_all_pub` +boolean over the lexical module nesting. A fully precise model would resolve +re-exports and `pub use` paths to compute true external reachability. That is the +deferred re-export work; the lexical approximation is safe (it can only +over-root via the pure-bin case, never under-root a lexically-public item). + +## Consequences + +- `entity_dead_list` / `find_dead_code` becomes **available for Rust**: a + Rust-only index is surveyed instead of wholesale-excluded, and the + `plugins_without_roots` exclusion stops withholding Rust entities + automatically (no engine change beyond the advisory copy). +- The faceted surfaces (`entity_entry_point_list`, `entity_test_list`, etc.) + light up for Rust with no read-side change — they are plugin-agnostic queries + over `entity_tags`. +- Mixed Python+Rust repos get a unified root set across both plugins; a Rust + `pub` API reached only from a sibling crate or a test stays live via its + `exported-api` / `test` root. +- A Rust corpus that genuinely has little rooted surface still gets an **honest, + language-correct** advisory (Rust levers, not `__all__`). +- Increment 2's additions (handler tags, re-export resolution, method rooting) + are all additive: new tags join `DEAD_CODE_ROOT_TAGS`; no existing tag + semantics change. The Rust ontology bumps again then. 
diff --git a/docs/loomweave/adr/README.md b/docs/loomweave/adr/README.md index 2593f1d5..a0b88708 100644 --- a/docs/loomweave/adr/README.md +++ b/docs/loomweave/adr/README.md @@ -54,6 +54,7 @@ This folder is the canonical home for authored Loomweave architecture decision r | [ADR-051](./ADR-051-relation-edge-direction-and-anchor.md) | Relation edge direction + anchor semantics — `inherits_from` runs subclass → base, `decorates` runs decorator → decorated (kind name read as a sentence; supersedes the `decorated_by` sketch); both anchor the reduced dotted path token (Rust trait-path parity, factory args excluded); precise-entity resolution only (no module-id fallback), class-kind filter on `inherits_from` targets, self-edge drop; ambiguous `decorates` candidates are FROM-side. First emitter: Python plugin ontology 0.8.0 (clarion-43416be550). Relates to ADR-026/027/028 | Accepted | | [ADR-052](./ADR-052-python-duplicate-qualname-first-wins.md) | Python duplicate-qualname semantics — first-wins frozen at launch (first definition owns the bare qualname; later same-id `def`/`class` drops entity + `contains` edge + whole subtree, stderr line + `duplicate_entities_dropped_total`); attribution invariant: emit boundary and pyright function index apply the identical rule, so a dropped body's call/reference sites are absent, never mis-attributed to the survivor; additive evolution clause — future setter/deleter surfacing must keep the survivor's bare qualname and use declaration-derived discriminants (source-order ordinals rejected, ADR-049 precedent); uniqueness gates = unit pins (property pair, singledispatch `def _`, conditional `def`) + dogfood test over `plugins/python` (Rust identity/dogfood-uniqueness parity) (clarion-12dd19c6a1). 
Relates to ADR-003/049 | Accepted | | [ADR-053](./ADR-053-public-surface-reachability-root.md) | `public-surface` reachability root — PEP 8 fail-toward-live fallback so corpora (apps or libraries) are not over-reported dead when their public surface is reached through framework dispatch / DI / CLI / tests the static graph can't follow: when a module declares **no** `__all__`, the Python plugin tags its non-underscore module-level defs/classes `public-surface` (an *inferred*, lower-confidence root, distinct from declared `exported-api`); a module with `__all__` (incl. empty) emits none, so well-declared modules are byte-identical. Joins `DEAD_CODE_ROOT_TAGS`; ontology bump 0.8.0 → 0.9.0. Also rewords the LOW-confidence advisory + no-roots envelope to name real levers (`__all__` / decorators) instead of a phantom "configure roots" knob. Dogfood (elspeth, a web app + CLI): app_only dead 64% → 48% — material but does not exit the >25% band alone (public methods of public classes = tracked follow-up) (clarion-4ec50f3d92). Relates to ADR-003/011 | Accepted | +| [ADR-054](./ADR-054-rust-reachability-root-model.md) | Rust reachability-root tag model — the Rust analog of ADR-053, but net-new (Rust visibility is explicit, no inference gap). The Rust plugin emits `exported-api` (unrestricted `pub` whose whole module chain is `pub`, lib targets only — `pub(crate)`/restricted excluded, bin targets suppressed via the `@bin(…)` module-path discriminator, `#[macro_export]` for macros), `entry-point` (`fn main` / `#[tokio::main]` / FFI `#[no_mangle]`/`#[export_name]`), `test` (`#[test]`/`#[bench]`/`#[cfg(test)]`), and `allow-dead-code` (an explicit `#[allow(dead_code)]` keep-signal; new `DEAD_CODE_ROOT_TAGS` entry). Grounded in error-cost asymmetry (over-root = safe, under-root = false-positive). Modules are excluded from dead-code candidacy engine-side (`DEAD_CODE_CONTAINER_KINDS` — the containment spine is never "dead"). 
The no-roots envelope + LOW-confidence advisory are now language-aware (Rust corpora get Rust levers, never `__all__`). Ontology bump 0.5.0 → 0.6.0. Deferred to increment 2: framework-attribute handlers, `pub use` re-export resolution, `pub`-method rooting. Dogfood: a lib+bin crate reports the genuine orphan only (moderate confidence) (clarion-05fdd0490e). Relates to ADR-049/053 | Accepted | ## Backlog still tracked in the detailed design diff --git a/docs/operator/language-support.md b/docs/operator/language-support.md index 13a77ae7..cf10fde6 100644 --- a/docs/operator/language-support.md +++ b/docs/operator/language-support.md @@ -17,14 +17,14 @@ produced an entity. The differences below are entirely in what the plugins |---|---|---| | Status | first-party, v1.0 | first-party, 1.x | | Source backend | `pyright` (type-resolved) | `syn` (parse-only, in-project symbol table) | -| Ontology version | 0.9.0 | 0.5.0 | +| Ontology version | 0.9.0 | 0.6.0 | | Wardline-aware | **yes** (`wardline:*` trust tags) | no | | **Entity kinds** | `function`, `class`, `module` | `module`, `struct`, `enum`, `trait`, `function`, `impl`, `type_alias`, `const`, `static`, `macro` | | **Structural edges** | `contains`, `calls`, `references`, `imports` | `contains`, `calls`, `references`, `imports` | | **Relation edges** | `inherits_from`, `decorates` | `implements`, `derives` | | Call/ref resolution tiers | `resolved` / `ambiguous` / `inferred` (pyright) | `resolved` (in-project only; external targets dropped) | -| **Categorisation / reachability-root tags** | **yes** — see below | **none today** | -| Dead-code analysis (`entity_dead_list`) | **works** | **unavailable** (no roots — see below) | +| **Categorisation / reachability-root tags** | **yes** — see below | **yes** — see below (ADR-054) | +| Dead-code analysis (`entity_dead_list`) | **works** | **works** (lib/bin roots, ADR-054) | | Summaries (`entity_summary_get`) | on-demand, any entity | on-demand, any entity | ## 
Categorisation & reachability-root tags @@ -41,26 +41,43 @@ defs/classes tagged `public-surface` — a lower-confidence reachability root th a declared `exported-api` (ADR-053 / clarion-4ec50f3d92), so a Python codebase is not over-reported as dead just because it does not exhaustively declare `__all__`. -**Rust emits none of these today.** The plugin extracts entities and edges but no -categorisation tags. Consequently `entity_dead_list` on a **pure-Rust** index is -**signal-unavailable**: the dead-code engine excludes a plugin's entities when it -emits no reachability roots (rather than false-flagging the entire crate dead). -The structural tools (`entity_find`, `entity_callers_list`, -`entity_neighborhood_get`, the edge surfaces) are unaffected. Adding the Rust -root model (visibility → `exported-api`, `fn main`/bin → `entry-point`, -`#[test]` → `test`, route/CLI attribute macros → handlers) is tracked in -**clarion-05fdd0490e**. See [rust-known-limitations.md](./rust-known-limitations.md) -for the full list of what Rust analysis does and does not resolve. +**Rust emits** (ADR-054, clarion-05fdd0490e): `exported-api`, `entry-point`, +`test`, and `allow-dead-code`, derived from Rust's explicit semantics rather than +inferred — so `entity_dead_list` now **works** on a pure-Rust index. + +- `exported-api` — an unrestricted `pub` value/type item whose whole enclosing + module chain is `pub` (the visibility chain reaches the crate's external + surface), in a **library** target. `pub(crate)`/`pub(super)`/`pub(in …)` are + intra-crate, not external API, and are excluded; a **binary** target's `pub` + items are internal (their entry is `fn main`), detected via the `@bin(…)` + module-path root. A `macro_rules!` is `exported-api` when it carries + `#[macro_export]`. +- `entry-point` — `fn main`; a runtime-entry attribute (`#[tokio::main]` / + `#[actix_web::main]` / `#[async_std::main]`); an FFI export (`#[no_mangle]` / + `#[export_name]`). 
+- `test` — `#[test]` / `#[bench]`, or any item under a `#[cfg(test)]` module. +- `allow-dead-code` — an item carrying `#[allow(dead_code)]` / + `#[expect(dead_code)]` (an explicit author keep-signal; the lowest-confidence + root class). + +Not yet emitted by Rust (tracked, increment 2): framework-attribute handlers +(`http-route` / `cli-command` / `framework-handler` from axum/actix/rocket/clap +attributes), `pub use` re-export resolution, and `pub`-method rooting of `pub` +types. A `pub(crate)` item re-exported `pub` is therefore under-rooted today (a +narrow, fail-toward-live residual). The structural tools (`entity_find`, +`entity_callers_list`, `entity_neighborhood_get`, the edge surfaces) are +unaffected. See [rust-known-limitations.md](./rust-known-limitations.md) for the +full list of what Rust analysis does and does not resolve. ## Mixed-language repositories A repo with both Python and Rust is analysed by both plugins in one pass; each file is routed to the plugin that claims its extension. Dead-code reachability -runs over the union, so in a mixed repo Python's roots can make Python entities -reachable while Rust entities remain in the "no roots for this plugin" exclusion -until the Rust root model lands. The low-confidence dead-code advisory's lever -copy is Python-centric today (it names `__all__`); making it language-aware is -folded into clarion-05fdd0490e. +runs over the union of both plugins' roots. The low-confidence dead-code advisory +and the no-roots envelope are **language-aware** (ADR-054): the lever copy is +sourced from the plugins actually present, so a Rust corpus is handed Rust levers +(`pub` an item, add a `[[bin]]` / `fn main`, `#[test]`) and a Python corpus is +handed Python levers (`__all__`, decorators) — never the other language's advice. 
## Other languages diff --git a/packaging/rust-plugin-dist/wheel-data/data/share/loomweave/plugins/rust/plugin.toml b/packaging/rust-plugin-dist/wheel-data/data/share/loomweave/plugins/rust/plugin.toml index 6e921b1a..b3632557 100644 --- a/packaging/rust-plugin-dist/wheel-data/data/share/loomweave/plugins/rust/plugin.toml +++ b/packaging/rust-plugin-dist/wheel-data/data/share/loomweave/plugins/rust/plugin.toml @@ -53,8 +53,13 @@ rule_id_prefix = "LMWV-RUST-" # no gate enforces the Rust plugin ontology_version (every check-*.py confirmed). # 0.5.0: additive edge kinds `derives` + `references` (Phase 2 completion, # plan 2026-06-10 — ADR-027 MINOR). +# 0.6.0: additive reachability-root categorisation tags `exported-api` / +# `entry-point` / `test` / `allow-dead-code` (ADR-054, clarion-05fdd0490e — +# ADR-027 MINOR). Tags are not manifest-gated (the host validates size, not +# membership), so this bump is documentation + cache-invalidation, not a wire +# contract; bumped anyway per ADR-027. # Lockstep with the `serve.rs` handshake constant. 
-ontology_version = "0.5.0" +ontology_version = "0.6.0" [ontology.roles] file_scope = ["module"] From c64fa6e87d6814e60c2609b7f31b20dfc584ca2f Mon Sep 17 00:00:00 2001 From: John Morrissey <544926+tachyon-beep@users.noreply.github.com> Date: Thu, 25 Jun 2026 11:16:56 +1000 Subject: [PATCH 4/7] feat(plugin-rust): framework handlers + pub-method rooting (ADR-054, increment 2) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Increment 2 of ADR-054, informed by a 6-agent framework-attribute taxonomy sweep (precision-first, collision-aware): - http-route (+framework-handler) — actix/ntex/rocket route attrs (get/post/ put/patch/delete/head/options/trace/connect/route, last-segment match) - cli-command (+framework-handler) — clap/structopt derives (Parser/Subcommand/ Args/ValueEnum/StructOpt, derive-list match) - entry-point — pyo3 FFI host exports (pyfunction/pyfn/pyclass/pymodule) and proc-macro entry points (proc_macro/_derive/_attribute) - test — std-replacement runners (rstest/test_case/quickcheck) - pub-method rooting — a pub method of an inherent impl whose module chain is pub (lib) is exported-api; trait-impl methods stay unrooted (inherited vis) Correctness fact the survey caught + verified against shortcuts.rs: framework-handler is in DEAD_CODE_EXCLUDED_TAGS, NOT _ROOT_TAGS — it excludes the tagged entity but does not root its callees, so http-route/cli-command are the real roots and framework-handler rides as the self-exclusion companion (mirroring Python). FFI exports map to entry-point (a real root) so their callees are traversed. The catastrophic typetag::serde collision is avoided (no bare `serde` last-segment match; CLI is derive-gated). Also excludes `impl` from dead-code candidacy (DEAD_CODE_CONTAINER_KINDS, with `module`) — an impl block is a container, never actionable "dead code". Ontology bump 0.6.0 -> 0.7.0 (4 locations + wheel copy). 
TDD throughout; dogfooded end-to-end (every framework attribute tags correctly via the real plugin->host->store pipeline). Floor green (nextest 1894). Still deferred: pub use re-export resolution, trait-impl-method rooting, and lower-prevalence frameworks (wasm/napi/uniffi/cxx, poem/salvo handlers, rocket catch, tarpc/jsonrpsee, argh/gumdrop). Builder-pattern frameworks (axum/warp/ tonic) are a permanent parse-only limitation. Co-Authored-By: Claude Opus 4.8 (1M context) --- .../loomweave-mcp/src/catalogue/shortcuts.rs | 20 ++- crates/loomweave-mcp/tests/catalogue_tools.rs | 23 ++- crates/loomweave-plugin-rust/plugin.toml | 14 +- crates/loomweave-plugin-rust/src/extract.rs | 20 ++- crates/loomweave-plugin-rust/src/root_tags.rs | 109 ++++++++++-- crates/loomweave-plugin-rust/src/serve.rs | 2 +- .../loomweave-plugin-rust/tests/root_tags.rs | 159 ++++++++++++++++++ .../ADR-054-rust-reachability-root-model.md | 72 ++++++-- docs/operator/language-support.md | 35 ++-- .../share/loomweave/plugins/rust/plugin.toml | 14 +- 10 files changed, 403 insertions(+), 65 deletions(-) diff --git a/crates/loomweave-mcp/src/catalogue/shortcuts.rs b/crates/loomweave-mcp/src/catalogue/shortcuts.rs index f8bbf0bd..c7a939f4 100644 --- a/crates/loomweave-mcp/src/catalogue/shortcuts.rs +++ b/crates/loomweave-mcp/src/catalogue/shortcuts.rs @@ -100,15 +100,17 @@ const DEAD_CODE_EXCLUDED_TAGS: &[&str] = &["framework-handler", "plugin-hook"]; const DEAD_CODE_NON_CODE_KINDS: &[&str] = &["file", "project", "subsystem", "guidance"]; /// Code-adjacent CONTAINER kinds that are never dead-code candidates (ADR-054). -/// A `module` is the containment spine rooted at the always-live crate root: -/// reachability-by-containment reaches every module by construction, so a module -/// is never "dead" in any actionable sense — you remove its contents, not the -/// namespace. 
Reachability proper runs over call+import edges only, and the Rust -/// plugin emits no module-targeting `imports` edges (its import edges target -/// items), so without this exclusion every Rust module would read as dead and -/// dominate the candidate set. Kept distinct from [`DEAD_CODE_NON_CODE_KINDS`] -/// (modules are code, not non-code anchors) and disclosed separately. -const DEAD_CODE_CONTAINER_KINDS: &[&str] = &["module"]; +/// A `module` (and a Rust `impl` block) is the containment spine rooted at the +/// always-live crate root: reachability-by-containment reaches every container by +/// construction, so it is never "dead" in any actionable sense — you remove its +/// contents (which ARE surveyed individually), not the namespace/block. +/// Reachability proper runs over call+import edges only, and the Rust plugin +/// emits no module/impl-targeting `imports` edges (its import edges target +/// items), so without this exclusion every Rust module and `impl` would read as +/// dead and dominate the candidate set. Kept distinct from +/// [`DEAD_CODE_NON_CODE_KINDS`] (these are code, not non-code anchors) and +/// disclosed separately. +const DEAD_CODE_CONTAINER_KINDS: &[&str] = &["module", "impl"]; /// Runtime import predicate used by graph shortcuts. Missing or malformed /// properties fail toward inclusion; explicit `type_only=true` or diff --git a/crates/loomweave-mcp/tests/catalogue_tools.rs b/crates/loomweave-mcp/tests/catalogue_tools.rs index c6cb4aee..a9aea1c2 100644 --- a/crates/loomweave-mcp/tests/catalogue_tools.rs +++ b/crates/loomweave-mcp/tests/catalogue_tools.rs @@ -4293,11 +4293,12 @@ async fn find_dead_code_surveys_rust_once_it_emits_roots() { } } -/// ADR-054: a `module` is the containment spine rooted at the always-live crate -/// root, not removable code — so a module entity is never a dead-code candidate, -/// even with no inbound call/import edge. 
(Rust modules systematically lack -/// module-targeting import edges; without this exclusion every Rust module would -/// read as dead and dominate the candidate set.) +/// ADR-054: `module` and `impl` are containment-spine containers rooted at the +/// always-live crate root, not removable code — so they are never dead-code +/// candidates, even with no inbound call/import edge. (Rust modules/impls +/// systematically lack module-targeting import edges; without this exclusion +/// every Rust module and impl block would read as dead and dominate the +/// candidate set.) #[tokio::test] async fn find_dead_code_excludes_module_containers() { let (project, db, conn) = open_project(); @@ -4309,7 +4310,7 @@ async fn find_dead_code_excludes_module_containers() { Some((1, 5)), ); insert_tag(&conn, "python:function:main", "entry-point"); - // A module with no inbound call/import edge — a container, never "dead code". + // Container entities with no inbound call/import edge — never "dead code". insert_entity( &conn, "python:module:orphan_mod", @@ -4317,6 +4318,14 @@ async fn find_dead_code_excludes_module_containers() { "orphan.py", Some((1, 10)), ); + insert_entity_with_plugin( + &conn, + "rust:impl:orphan.Widget.impl#<>", + "rust", + "impl", + "src/orphan.rs", + "{}", + ); // A genuinely dead function — the only legitimate candidate. 
insert_entity( &conn, @@ -4339,7 +4348,7 @@ async fn find_dead_code_excludes_module_containers() { assert_eq!( dead, vec!["python:function:dead".to_owned()], - "a module container is never a dead-code candidate: {env}" + "module / impl containers are never dead-code candidates: {env}" ); } diff --git a/crates/loomweave-plugin-rust/plugin.toml b/crates/loomweave-plugin-rust/plugin.toml index b3632557..2444518f 100644 --- a/crates/loomweave-plugin-rust/plugin.toml +++ b/crates/loomweave-plugin-rust/plugin.toml @@ -54,12 +54,16 @@ rule_id_prefix = "LMWV-RUST-" # 0.5.0: additive edge kinds `derives` + `references` (Phase 2 completion, # plan 2026-06-10 — ADR-027 MINOR). # 0.6.0: additive reachability-root categorisation tags `exported-api` / -# `entry-point` / `test` / `allow-dead-code` (ADR-054, clarion-05fdd0490e — -# ADR-027 MINOR). Tags are not manifest-gated (the host validates size, not -# membership), so this bump is documentation + cache-invalidation, not a wire -# contract; bumped anyway per ADR-027. +# `entry-point` / `test` / `allow-dead-code` (ADR-054 increment 1, +# clarion-05fdd0490e — ADR-027 MINOR). Tags are not manifest-gated (the host +# validates size, not membership), so this bump is documentation + +# cache-invalidation, not a wire contract; bumped anyway per ADR-027. +# 0.7.0: additive framework-attribute tags `http-route` / `cli-command` / +# `framework-handler` (actix/rocket routes, clap derives, pyo3/proc-macro +# entry-points, rstest runners) + `pub`-method rooting (ADR-054 increment 2 — +# ADR-027 MINOR). # Lockstep with the `serve.rs` handshake constant. 
-ontology_version = "0.6.0" +ontology_version = "0.7.0" [ontology.roles] file_scope = ["module"] diff --git a/crates/loomweave-plugin-rust/src/extract.rs b/crates/loomweave-plugin-rust/src/extract.rs index 543edae1..0e815fda 100644 --- a/crates/loomweave-plugin-rust/src/extract.rs +++ b/crates/loomweave-plugin-rust/src/extract.rs @@ -605,6 +605,7 @@ fn walk_items( &method_is_cfg_twin, &mut seen_impl_ids, resolution, + ctx, out, edges, acc, @@ -1198,6 +1199,7 @@ fn emit_impl( method_is_cfg_twin: &dyn Fn(&str, &str) -> bool, seen_impl_ids: &mut std::collections::BTreeSet, resolution: Option<(&str, &Resolver)>, + ctx: TagCtx, out: &mut Vec, edges: &mut Vec, acc: &mut Phase2Acc, @@ -1269,7 +1271,7 @@ fn emit_impl( { q.push_str(&disc); } - let child = entity( + let mut child = entity( "function", &q, file_path, @@ -1277,6 +1279,22 @@ fn emit_impl( Some(&impl_id), Some(function_signature(&m.sig)), )?; + // ADR-054 increment 2: a `pub` method of a `pub` type is external API + // (`exported-api`). Trait-impl methods carry inherited visibility (no + // `pub`), so the pub rule leaves them unrooted — a deferred follow-up. + // The pub-chain + cfg(test) gating is inherited from the enclosing + // module via `ctx`; `at_file_top` is cleared so a method named `main` + // is not an entry point. + attach_tags( + &mut child, + root_tags( + &m.sig.ident.to_string(), + is_unrestricted_pub(&m.vis), + true, + &m.attrs, + ctx.descend_into_impl(), + ), + ); let method_id = build_id("function", &q)?; push_with_contains(&impl_id, child, out, edges); // impl -> method // Phase 2: walk the method body for call sites, ONLY with a resolver. 
diff --git a/crates/loomweave-plugin-rust/src/root_tags.rs b/crates/loomweave-plugin-rust/src/root_tags.rs index f2b43bfa..9c6b56cb 100644 --- a/crates/loomweave-plugin-rust/src/root_tags.rs +++ b/crates/loomweave-plugin-rust/src/root_tags.rs @@ -13,6 +13,26 @@ use std::collections::BTreeSet; use syn::{Attribute, Meta, Visibility}; +/// actix-web / ntex / rocket route attribute macros (last-segment match). All +/// cross-crate collisions are benign — every match means an HTTP route — and +/// over-rooting is fail-toward-live, so a generic last-segment match is safe. +const HTTP_ROUTE_ATTRS: &[&str] = &[ + "get", "post", "put", "patch", "delete", "head", "options", "trace", "connect", "route", +]; +/// clap (v3/v4) + structopt CLI command/arg derive macros (derive-list match). +/// Distinctive, derive-position-unambiguous names (collision-safe per the +/// framework-taxonomy survey, ADR-054 increment 2). +const CLI_COMMAND_DERIVES: &[&str] = &["Parser", "Subcommand", "Args", "ValueEnum", "StructOpt"]; +/// pyo3 FFI host-export attributes (last-segment) — callable from a Python host, +/// so a genuine entry point from outside the Rust call graph. pyo3-unique names, +/// zero collision. +const PYO3_ENTRY_ATTRS: &[&str] = &["pyfunction", "pyfn", "pyclass", "pymodule"]; +/// proc-macro entry points (bare single ident, never path-qualified) — +/// compiler-invoked, so reachability roots. +const PROC_MACRO_ATTRS: &[&str] = &["proc_macro", "proc_macro_derive", "proc_macro_attribute"]; +/// std-replacement test runners beyond `#[test]`/`#[bench]` (last-segment). +const TEST_RUNNER_ATTRS: &[&str] = &["rstest", "test_case", "quickcheck"]; + /// Module context threaded down the recursive item walk for tag derivation. /// Four independent lexical facts about the current position — a context bag, /// not a state machine; two-variant enums would obscure rather than clarify. @@ -55,6 +75,18 @@ impl TagCtx { at_file_top: false, } } + + /// The context for an `impl` block's methods. 
The pub-chain and cfg(test) + /// ancestry are inherited from the enclosing module unchanged (an `impl` + /// adds no visibility of its own); only `at_file_top` clears, so a method + /// named `main` is never mistaken for the program entry (ADR-054 increment 2). + #[must_use] + pub fn descend_into_impl(self) -> Self { + Self { + at_file_top: false, + ..self + } + } } /// Reachability-root tags for a walked item, sorted + deduplicated (ADR-054). @@ -79,9 +111,23 @@ pub fn root_tags( if ctx.under_cfg_test || has_test_attr(attrs) { tags.insert("test"); } - if is_fn && is_entry_point(name, attrs, ctx) { + // entry-point: a bare module-level `fn main` (fns only), OR an entry + // attribute (runtime entry / FFI host export / proc-macro) on any item. + if (is_fn && ctx.at_file_top && name == "main") || has_entry_attr(attrs) { tags.insert("entry-point"); } + // Framework-dispatched handlers — reached by the framework, not by a static + // caller, so roots regardless of visibility/crate-type. `framework-handler` + // rides as the excluded-tag companion (mirroring the Python plugin); the + // ROOT is `http-route` / `cli-command` (ADR-054 increment 2). + if attr_last_seg_in(attrs, HTTP_ROUTE_ATTRS) { + tags.insert("http-route"); + tags.insert("framework-handler"); + } + if derive_last_seg_in(attrs, CLI_COMMAND_DERIVES) { + tags.insert("cli-command"); + tags.insert("framework-handler"); + } if has_allow_dead_code(attrs) { tags.insert("allow-dead-code"); } @@ -102,23 +148,62 @@ pub fn has_macro_export(attrs: &[Attribute]) -> bool { attrs.iter().any(|a| a.path().is_ident("macro_export")) } -/// `#[test]` / `#[bench]`, including last-segment variants like `#[tokio::test]`. +/// `#[test]` / `#[bench]` (incl. last-segment variants like `#[tokio::test]`), +/// and the std-replacement test-runner attributes (`#[rstest]`, etc.). 
fn has_test_attr(attrs: &[Attribute]) -> bool { attrs .iter() .any(|a| last_segment_is(a, "test") || last_segment_is(a, "bench")) + || attr_last_seg_in(attrs, TEST_RUNNER_ATTRS) +} + +/// An attribute that reaches an item from OUTSIDE the Rust call graph: an +/// async-runtime entry (`#[tokio::main]` / `#[actix_web::main]`, matched on the +/// `main` last segment), an FFI host export (pyo3, `#[no_mangle]` / +/// `#[export_name]`), or a compiler-invoked proc-macro entry point. +fn has_entry_attr(attrs: &[Attribute]) -> bool { + attr_last_seg_in(attrs, &["main"]) + || attr_last_seg_in(attrs, PYO3_ENTRY_ATTRS) + || attr_is_ident_in(attrs, &["no_mangle", "export_name"]) + || attr_is_ident_in(attrs, PROC_MACRO_ATTRS) +} + +/// Any attribute whose final path segment is one of `names` (so `#[get]` and +/// `#[actix_web::get]` both match `"get"`). +fn attr_last_seg_in(attrs: &[Attribute], names: &[&str]) -> bool { + attrs.iter().any(|a| { + a.path() + .segments + .last() + .is_some_and(|s| names.iter().any(|n| s.ident == n)) + }) } -/// A module-level `fn main`, an async-runtime entry attribute (`#[tokio::main]` -/// / `#[actix_web::main]` / `#[async_std::main]`, matched on the `main` last -/// segment), or an FFI export (`#[no_mangle]` / `#[export_name]`). -fn is_entry_point(name: &str, attrs: &[Attribute], ctx: TagCtx) -> bool { - (ctx.at_file_top && name == "main") - || attrs.iter().any(|a| { - last_segment_is(a, "main") - || a.path().is_ident("no_mangle") - || a.path().is_ident("export_name") - }) +/// Any attribute that is the bare single ident `name` (no path) for one of +/// `names` — used where the attribute is never path-qualified (`#[proc_macro]`). 
+fn attr_is_ident_in(attrs: &[Attribute], names: &[&str]) -> bool { + attrs + .iter() + .any(|a| names.iter().any(|n| a.path().is_ident(n))) +} + +/// Any `#[derive(...)]` whose derive list contains a path with a final segment +/// (or any segment) in `names` — catches `#[derive(Parser)]` and +/// `#[derive(clap::Parser)]`. The `names` are distinctive enough that a +/// path-prefix token can never be one of them. +fn derive_last_seg_in(attrs: &[Attribute], names: &[&str]) -> bool { + attrs.iter().any(|a| { + if let Meta::List(list) = &a.meta { + list.path.is_ident("derive") + && list + .tokens + .to_string() + .split(|c: char| !c.is_alphanumeric() && c != '_') + .any(|t| names.contains(&t)) + } else { + false + } + }) } /// `#[allow(dead_code)]` or `#[expect(dead_code)]` — an explicit author keep diff --git a/crates/loomweave-plugin-rust/src/serve.rs b/crates/loomweave-plugin-rust/src/serve.rs index b53185f5..d65bf3a8 100644 --- a/crates/loomweave-plugin-rust/src/serve.rs +++ b/crates/loomweave-plugin-rust/src/serve.rs @@ -129,7 +129,7 @@ pub fn run() -> ! { version: env!("CARGO_PKG_VERSION").to_owned(), // Lockstep with plugin.toml `[ontology].ontology_version` // (ADR-027). Bump both together. - ontology_version: "0.6.0".to_owned(), + ontology_version: "0.7.0".to_owned(), capabilities: serde_json::json!({}), }; send_result(&mut writer, id, serde_json::to_value(result).unwrap()); diff --git a/crates/loomweave-plugin-rust/tests/root_tags.rs b/crates/loomweave-plugin-rust/tests/root_tags.rs index 6dcda032..b199300b 100644 --- a/crates/loomweave-plugin-rust/tests/root_tags.rs +++ b/crates/loomweave-plugin-rust/tests/root_tags.rs @@ -184,3 +184,162 @@ fn non_exported_macro_has_no_tags() { let m = tags_by_id("k", "k.m", "macro_rules! 
mac { () => {}; }\n"); assert!(tags(&m, "rust:macro:k.m.mac").is_empty()); } + +// ---- impl-method rooting (increment 2: pub methods of pub types) ----------- + +/// Find the single emitted method entity whose id ends in `.<name>` (robust to +/// the exact `…impl[…]` qualname rendering). +fn method_tags(map: &BTreeMap<String, Vec<String>>, name: &str) -> Vec<String> { + let id = map + .keys() + .find(|k| k.starts_with("rust:function:") && k.ends_with(&format!(".{name}"))) + .unwrap_or_else(|| panic!("method {name} not emitted; ids: {:?}", map.keys())); + map.get(id).cloned().unwrap_or_default() +} + +#[test] +fn pub_inherent_method_is_exported_api() { + let src = "pub struct S;\nimpl S { pub fn doit(&self) {} fn helper(&self) {} }\n"; + let m = tags_by_id("k", "k.m", src); + assert_eq!( + method_tags(&m, "doit"), + ["exported-api"], + "a pub inherent method is external API" + ); + assert!( + method_tags(&m, "helper").is_empty(), + "a private inherent method is not rooted" + ); +} + +#[test] +fn trait_impl_method_is_not_exported_api() { + // Trait methods carry inherited visibility (no `pub`), so the pub rule does + // not root them — their dispatch-reachability is a deferred follow-up. + let src = "pub struct S;\n\ + impl std::fmt::Display for S {\n\ + fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result { Ok(()) }\n\ + }\n"; + let m = tags_by_id("k", "k.m", src); + assert!( + method_tags(&m, "fmt").is_empty(), + "a trait-impl method is not rooted by the pub rule" + ); +} + +#[test] +fn pub_method_in_bin_target_is_not_exported_api() { + let src = "pub struct S;\nimpl S { pub fn doit(&self) {} }\n"; + let m = tags_by_id("k", "k@bin(k)", src); + assert!( + method_tags(&m, "doit").is_empty(), + "a bin target's pub method is internal, not external API" + ); +} + +#[test] +fn pub_method_under_private_mod_is_not_exported_api() { + // The impl's enclosing module chain must be pub (same rule as free items). 
+ let src = "mod internal { pub struct S; impl S { pub fn doit(&self) {} } }\n"; + let m = tags_by_id("k", "k.m", src); + assert!( + method_tags(&m, "doit").is_empty(), + "a pub method under a private mod is not external surface" + ); +} + +// ---- framework-attribute handlers (increment 2) --------------------------- +// http-route / cli-command emit `framework-handler` as a companion (mirroring +// the Python plugin); FFI host exports + proc-macros map to `entry-point` (a +// real root — their callees are traversed). framework-handler is an +// excluded-tag, never a standalone root. + +#[test] +fn actix_route_attr_is_http_route() { + let m = tags_by_id("k", "k.m", "#[get(\"/\")]\nasync fn index() {}\n"); + assert_eq!( + tags(&m, "rust:function:k.m.index"), + ["framework-handler", "http-route"] + ); +} + +#[test] +fn post_and_generic_route_attrs_are_http_route() { + let src = "#[post(\"/x\")]\nfn create() {}\n#[route(\"/y\")]\nfn multi() {}\n"; + let m = tags_by_id("k", "k.m", src); + assert_eq!( + tags(&m, "rust:function:k.m.create"), + ["framework-handler", "http-route"] + ); + assert_eq!( + tags(&m, "rust:function:k.m.multi"), + ["framework-handler", "http-route"] + ); +} + +#[test] +fn clap_parser_derive_is_cli_command() { + let m = tags_by_id("k", "k.m", "#[derive(Parser)]\nstruct Cli { v: i32 }\n"); + assert_eq!( + tags(&m, "rust:struct:k.m.Cli"), + ["cli-command", "framework-handler"] + ); +} + +#[test] +fn clap_subcommand_enum_is_cli_command() { + let m = tags_by_id("k", "k.m", "#[derive(Subcommand)]\nenum Cmd { A, B }\n"); + assert_eq!( + tags(&m, "rust:enum:k.m.Cmd"), + ["cli-command", "framework-handler"] + ); +} + +#[test] +fn pyo3_pyfunction_is_entry_point() { + let m = tags_by_id( + "k", + "k.m", + "#[pyfunction]\nfn add(a: i64, b: i64) -> i64 { a + b }\n", + ); + assert_eq!(tags(&m, "rust:function:k.m.add"), ["entry-point"]); +} + +#[test] +fn pyo3_pyclass_is_entry_point() { + let m = tags_by_id("k", "k.m", "#[pyclass]\nstruct PyThing { v: i32 
}\n"); + assert_eq!(tags(&m, "rust:struct:k.m.PyThing"), ["entry-point"]); +} + +#[test] +fn proc_macro_is_entry_point() { + let m = tags_by_id( + "k", + "k.m", + "#[proc_macro]\npub fn my_macro(input: TokenStream) -> TokenStream { input }\n", + ); + assert_eq!( + tags(&m, "rust:function:k.m.my_macro"), + ["entry-point", "exported-api"] + ); +} + +#[test] +fn rstest_attr_is_test() { + let m = tags_by_id("k", "k.m", "#[rstest]\nfn checks() {}\n"); + assert_eq!(tags(&m, "rust:function:k.m.checks"), ["test"]); +} + +#[test] +fn serde_and_plain_derives_are_not_roots() { + // Guard against the typetag::serde catastrophic-collision class: neither a + // bare `#[serde(...)]` nor a non-framework derive may produce a root tag. + let plain = tags_by_id( + "k", + "k.m", + "#[derive(Clone, Debug)]\nstruct Plain { v: i32 }\n", + ); + let serded = tags_by_id("k", "k.m", "struct P { v: i32 }\n"); + assert!(tags(&plain, "rust:struct:k.m.Plain").is_empty()); + assert!(tags(&serded, "rust:struct:k.m.P").is_empty()); +} diff --git a/docs/loomweave/adr/ADR-054-rust-reachability-root-model.md b/docs/loomweave/adr/ADR-054-rust-reachability-root-model.md index 5818e4b7..604ff689 100644 --- a/docs/loomweave/adr/ADR-054-rust-reachability-root-model.md +++ b/docs/loomweave/adr/ADR-054-rust-reachability-root-model.md @@ -161,27 +161,69 @@ Additive tag-vocabulary change: Rust plugin `ontology_version` **0.5.0 → `scripts/check-rust-plugin-manifest-lockstep.py`), `serve.rs`'s `initialize` response, and the `docs/operator/language-support.md` table. -## Deferred to increment 2 (noted here, like ADR-053's Alternative 3) +## Increment 2 (implemented): framework handlers + `pub`-method rooting + +A focused framework-attribute survey (a 6-agent taxonomy sweep across the Rust +web / CLI / FFI / RPC / test ecosystems) produced a precision-first, collision- +aware detection set. 
The Rust plugin now also emits: + +- **`http-route`** (+ `framework-handler` companion) — actix-web / ntex / rocket + route attribute macros, matched on the attribute's last path segment (`get`, + `post`, `put`, `patch`, `delete`, `head`, `options`, `trace`, `connect`, + `route`). Cross-crate last-segment collisions are benign (every match means a + route) and over-rooting is fail-toward-live. +- **`cli-command`** (+ `framework-handler`) — clap / structopt CLI derives, + matched on the `#[derive(...)]` list (`Parser`, `Subcommand`, `Args`, + `ValueEnum`, `StructOpt`). +- **`entry-point`** also from pyo3 FFI host exports (`#[pyfunction]` / `#[pyfn]` + / `#[pyclass]` / `#[pymodule]`) and proc-macro entry points (`#[proc_macro]` / + `#[proc_macro_derive]` / `#[proc_macro_attribute]`) — items reached from a + non-Rust host or the compiler. +- **`test`** also from the std-replacement runners `#[rstest]` / `#[test_case]` + / `#[quickcheck]`. +- **`pub`-method rooting** — a `pub` method of an inherent `impl` whose enclosing + module chain is `pub` (lib target) is `exported-api`. Trait-impl methods carry + inherited visibility (no `pub`) so the rule leaves them unrooted (see the + residual below). The pub-chain + cfg(test) gating is inherited from the + enclosing module; a method named `main` is never an entry point. + +**Critical tag-semantics fact (verified against `shortcuts.rs`):** +`framework-handler` is in `DEAD_CODE_EXCLUDED_TAGS`, **not** `DEAD_CODE_ROOT_TAGS` +— it excludes the *tagged entity itself* from dead-code candidacy but does NOT +root its callees. So `http-route` / `cli-command` (the real roots) carry +`framework-handler` only as the self-exclusion companion, exactly as the Python +plugin does. FFI host exports map to `entry-point` (a real root) precisely so +their callees ARE traversed. 
+ +The collision survey's one **catastrophic** finding is honoured: a bare +last-segment match on `serde` (from `typetag::serde`) would match every +`#[serde(...)]`, so no such match exists; CLI detection is derive-gated and the +deferred FFI frameworks that need it use full-path matching. + +This is a further additive ontology change: Rust plugin `ontology_version` +**0.6.0 → 0.7.0**. + +## Still deferred (second-pass extensions) -These are real follow-ups, deferred because each either needs cross-file -resolution or would only *narrow* the root set: - -- **Framework-attribute handlers** — `http-route` (axum/actix/rocket route - attribute macros, e.g. `#[get("/…")]`), `cli-command` (clap/structopt derives - and `#[command]`), `framework-handler` (proc-macro registration attrs). Rust's - web frameworks are heterogeneous (many use the builder pattern, not - attributes), so this is best-effort attribute detection with a documented - coverage limit — breadth research, deferred to a focused increment. - **`pub use` re-export resolution** — a privately-defined item re-exported `pub` is part of the API surface. Resolving the re-export target needs the cross-file symbol table (the resolver). The common facade case (`pub use internal::Thing` where `Thing` is itself `pub`) is **already covered** by `Thing`'s own `exported-api` tag at its definition; only a `pub(crate)` item - re-exported `pub` is under-rooted, a narrow residual. Deferred. -- **`pub`-method rooting** — a `pub` method of a `pub` type (an `impl` item) is - external API, but reachability traverses call+import edges only, so rooting - the type does not root its methods. This is the Rust analog of the Python - follow-up clarion-961a1acb2c; deferred with it. + re-exported `pub` is under-rooted, a narrow residual. +- **Trait-impl-method rooting** — a method reached only via trait dispatch + (the Rust framework-dispatch case) has no inbound call edge; the pub rule does + not root it (inherited visibility). 
`framework-handler` on a `#[handler]` would + exclude the handler but not its private callees — the documented under-rooting + residual this model's error-cost asymmetry otherwise fights. Left as an open + follow-up rather than silently promoting handlers to `entry-point`. +- **Lower-prevalence frameworks** — wasm-bindgen / napi / uniffi / cxx FFI, + poem/salvo `#[handler]`, rocket `#[catch]`, tarpc / jsonrpsee, argh / gumdrop / + bpaf. Several need full-path matching to stay collision-safe; deferred to a + second pass. Builder-pattern frameworks (axum `Router::route`, warp filters, + tonic codegen) register routes via runtime calls, not attributes, so a + parse-only (`syn`) extractor cannot detect them — a permanent limitation, not a + deferral. ## Alternatives considered diff --git a/docs/operator/language-support.md b/docs/operator/language-support.md index cf10fde6..a65e72ef 100644 --- a/docs/operator/language-support.md +++ b/docs/operator/language-support.md @@ -17,7 +17,7 @@ produced an entity. The differences below are entirely in what the plugins |---|---|---| | Status | first-party, v1.0 | first-party, 1.x | | Source backend | `pyright` (type-resolved) | `syn` (parse-only, in-project symbol table) | -| Ontology version | 0.9.0 | 0.6.0 | +| Ontology version | 0.9.0 | 0.7.0 | | Wardline-aware | **yes** (`wardline:*` trust tags) | no | | **Entity kinds** | `function`, `class`, `module` | `module`, `struct`, `enum`, `trait`, `function`, `impl`, `type_alias`, `const`, `static`, `macro` | | **Structural edges** | `contains`, `calls`, `references`, `imports` | `contains`, `calls`, `references`, `imports` | @@ -55,19 +55,34 @@ inferred — so `entity_dead_list` now **works** on a pure-Rust index. - `entry-point` — `fn main`; a runtime-entry attribute (`#[tokio::main]` / `#[actix_web::main]` / `#[async_std::main]`); an FFI export (`#[no_mangle]` / `#[export_name]`). -- `test` — `#[test]` / `#[bench]`, or any item under a `#[cfg(test)]` module. 
+- `test` — `#[test]` / `#[bench]`, the std-replacement runners (`#[rstest]`, + `#[test_case]`, `#[quickcheck]`), or any item under a `#[cfg(test)]` module. - `allow-dead-code` — an item carrying `#[allow(dead_code)]` / `#[expect(dead_code)]` (an explicit author keep-signal; the lowest-confidence root class). +- `http-route` (+ `framework-handler`) — actix-web / ntex / rocket route + attribute macros (`#[get("/")]`, `#[post]`, …, `#[route]`). +- `cli-command` (+ `framework-handler`) — clap / structopt CLI derives + (`#[derive(Parser)]` / `Subcommand` / `Args` / `ValueEnum` / `StructOpt`). +- `entry-point` also covers pyo3 FFI host exports (`#[pyfunction]` / + `#[pyclass]` / `#[pymodule]`) and proc-macro entry points (`#[proc_macro]` / + `#[proc_macro_derive]` / `#[proc_macro_attribute]`) — items reached from a + non-Rust host or the compiler. +- `pub` methods of `pub` types (inherent `impl` blocks) are `exported-api`. -Not yet emitted by Rust (tracked, increment 2): framework-attribute handlers -(`http-route` / `cli-command` / `framework-handler` from axum/actix/rocket/clap -attributes), `pub use` re-export resolution, and `pub`-method rooting of `pub` -types. A `pub(crate)` item re-exported `pub` is therefore under-rooted today (a -narrow, fail-toward-live residual). The structural tools (`entity_find`, -`entity_callers_list`, `entity_neighborhood_get`, the edge surfaces) are -unaffected. See [rust-known-limitations.md](./rust-known-limitations.md) for the -full list of what Rust analysis does and does not resolve. 
+Not yet emitted by Rust (tracked second-pass extensions): `pub use` re-export +resolution (a `pub(crate)` item re-exported `pub` is under-rooted — a narrow, +fail-toward-live residual); trait-impl-method rooting (a method reached only via +trait dispatch — `framework-handler` excludes the *handler* but not its private +callees, the documented under-rooting residual); and the lower-prevalence +frameworks (wasm-bindgen / napi / uniffi / cxx FFI, poem/salvo `#[handler]`, +rocket `#[catch]`, tarpc/jsonrpsee, argh/gumdrop). Builder-pattern frameworks +(axum `Router::route`, warp filters, tonic codegen) register routes via runtime +calls, not attributes, so a parse-only extractor cannot detect them. The +structural tools (`entity_find`, `entity_callers_list`, +`entity_neighborhood_get`, the edge surfaces) are unaffected. See +[rust-known-limitations.md](./rust-known-limitations.md) for the full list of +what Rust analysis does and does not resolve. ## Mixed-language repositories diff --git a/packaging/rust-plugin-dist/wheel-data/data/share/loomweave/plugins/rust/plugin.toml b/packaging/rust-plugin-dist/wheel-data/data/share/loomweave/plugins/rust/plugin.toml index b3632557..2444518f 100644 --- a/packaging/rust-plugin-dist/wheel-data/data/share/loomweave/plugins/rust/plugin.toml +++ b/packaging/rust-plugin-dist/wheel-data/data/share/loomweave/plugins/rust/plugin.toml @@ -54,12 +54,16 @@ rule_id_prefix = "LMWV-RUST-" # 0.5.0: additive edge kinds `derives` + `references` (Phase 2 completion, # plan 2026-06-10 — ADR-027 MINOR). # 0.6.0: additive reachability-root categorisation tags `exported-api` / -# `entry-point` / `test` / `allow-dead-code` (ADR-054, clarion-05fdd0490e — -# ADR-027 MINOR). Tags are not manifest-gated (the host validates size, not -# membership), so this bump is documentation + cache-invalidation, not a wire -# contract; bumped anyway per ADR-027. +# `entry-point` / `test` / `allow-dead-code` (ADR-054 increment 1, +# clarion-05fdd0490e — ADR-027 MINOR). 
Tags are not manifest-gated (the host +# validates size, not membership), so this bump is documentation + +# cache-invalidation, not a wire contract; bumped anyway per ADR-027. +# 0.7.0: additive framework-attribute tags `http-route` / `cli-command` / +# `framework-handler` (actix/rocket routes, clap derives, pyo3/proc-macro +# entry-points, rstest runners) + `pub`-method rooting (ADR-054 increment 2 — +# ADR-027 MINOR). # Lockstep with the `serve.rs` handshake constant. -ontology_version = "0.6.0" +ontology_version = "0.7.0" [ontology.roles] file_scope = ["module"] From 5d0583a5b33fc5e0aecae8f3cf2e359121d35b00 Mon Sep 17 00:00:00 2001 From: John Morrissey <544926+tachyon-beep@users.noreply.github.com> Date: Thu, 25 Jun 2026 11:32:05 +1000 Subject: [PATCH 5/7] fix(plugin-rust): address adversarial-review findings (ADR-054) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Five confirmed findings from the multi-dimension review of increments 1+2: 1. [MAJOR, bug] #[macro_export] is chain-INDEPENDENT — it lifts a macro to the crate root regardless of module privacy. The exported-api gate wrongly required ancestors_all_pub, so the standard `mod macros { #[macro_export] macro_rules! … }` idiom read as dead (under-rooting — the trust-killing direction). Fixed via TagCtx::with_export_chain_satisfied(), applied only to exported macros; dogfooded end-to-end. 2. [MAJOR, test] the serde catastrophe guard never fed an actual #[serde(...)] — half-vacuous. Now pins that a real #[serde(rename_all=…)] yields no root tag. 3. [MAJOR, test] #[export_name] FFI entry-point had zero coverage (only #[no_mangle] was tested) — added. 4. [MAJOR, docs] language-support.md intro omitted http-route/cli-command — now lists all six emitted tags. 5. [MINOR, docs] pub-method-rooting line lacked the lib/bin qualifier — added. Floor green (nextest 1897). 
Co-Authored-By: Claude Opus 4.8 (1M context) --- crates/loomweave-plugin-rust/src/extract.rs | 12 +++- crates/loomweave-plugin-rust/src/root_tags.rs | 13 ++++ .../loomweave-plugin-rust/tests/root_tags.rs | 59 +++++++++++++++++++ docs/operator/language-support.md | 8 ++- 4 files changed, 87 insertions(+), 5 deletions(-) diff --git a/crates/loomweave-plugin-rust/src/extract.rs b/crates/loomweave-plugin-rust/src/extract.rs index 0e815fda..5fe4df0e 100644 --- a/crates/loomweave-plugin-rust/src/extract.rs +++ b/crates/loomweave-plugin-rust/src/extract.rs @@ -870,10 +870,18 @@ fn walk_items( None, )?; // `macro_rules!` has no `Visibility`; `#[macro_export]` is its - // external-surface marker (ADR-054 §1). + // external-surface marker (ADR-054 §1). It is CHAIN-INDEPENDENT — + // a `#[macro_export]` in a private mod is still crate-root API — + // so the pub-chain gate is waived for an exported macro. + let exported_macro = has_macro_export(attrs); + let macro_ctx = if exported_macro { + ctx.with_export_chain_satisfied() + } else { + ctx + }; attach_tags( &mut child, - root_tags(&name, has_macro_export(attrs), false, attrs, ctx), + root_tags(&name, exported_macro, false, attrs, macro_ctx), ); push_with_contains(parent_id, child, out, edges); } diff --git a/crates/loomweave-plugin-rust/src/root_tags.rs b/crates/loomweave-plugin-rust/src/root_tags.rs index 9c6b56cb..7a5936b8 100644 --- a/crates/loomweave-plugin-rust/src/root_tags.rs +++ b/crates/loomweave-plugin-rust/src/root_tags.rs @@ -87,6 +87,19 @@ impl TagCtx { ..self } } + + /// The export chain is satisfied independently of the lexical `pub` chain. + /// `#[macro_export]` lifts a macro to the crate root regardless of how deeply + /// it is nested in private modules, so its `exported-api` status must not be + /// gated by `ancestors_all_pub` (the standard `mod macros { #[macro_export] + /// … }` idiom would otherwise read as dead — under-rooting). 
+ #[must_use] + pub fn with_export_chain_satisfied(self) -> Self { + Self { + ancestors_all_pub: true, + ..self + } + } } /// Reachability-root tags for a walked item, sorted + deduplicated (ADR-054). diff --git a/crates/loomweave-plugin-rust/tests/root_tags.rs b/crates/loomweave-plugin-rust/tests/root_tags.rs index b199300b..fb7bdd26 100644 --- a/crates/loomweave-plugin-rust/tests/root_tags.rs +++ b/crates/loomweave-plugin-rust/tests/root_tags.rs @@ -185,6 +185,65 @@ fn non_exported_macro_has_no_tags() { assert!(tags(&m, "rust:macro:k.m.mac").is_empty()); } +#[test] +fn macro_export_in_private_mod_is_still_exported_api() { + // `#[macro_export]` lifts a macro to the crate root regardless of module + // privacy — chain-INDEPENDENT, unlike `pub` visibility. The common idiom is + // `mod macros { #[macro_export] macro_rules! ... }`; without this the macro + // (real external API) reads as dead — the under-rooting failure ADR-054 fights. + let m = tags_by_id( + "k", + "k.m", + "mod internal { #[macro_export] macro_rules! mac { () => {}; } }\n", + ); + let id = m + .keys() + .find(|k| k.starts_with("rust:macro:")) + .expect("macro emitted") + .clone(); + assert_eq!( + tags(&m, &id), + ["exported-api"], + "#[macro_export] is chain-independent" + ); +} + +// ---- entry-point: export_name FFI (distinct from no_mangle) ---------------- + +#[test] +fn export_name_ffi_export_is_entry_point() { + let m = tags_by_id( + "k", + "k.m", + "#[export_name = \"my_export\"]\npub extern \"C\" fn exported() {}\n", + ); + assert_eq!( + tags(&m, "rust:function:k.m.exported"), + ["entry-point", "exported-api"] + ); +} + +// ---- regression guard: serde/typetag must NOT be mistaken for a root ------- + +#[test] +fn serde_attribute_is_not_a_root() { + // The typetag::serde catastrophe class: a bare last-segment match on `serde` + // would tag every `#[serde(...)]`. This pins that a real serde attribute (and + // a non-framework derive) produces NO reachability-root tag. 
+ let serded = tags_by_id( + "k", + "k.m", + "#[serde(rename_all = \"camelCase\")]\nstruct P { v: i32 }\n", + ); + let derived = tags_by_id( + "k", + "k.m", + "#[derive(Clone, Debug, Serialize)]\nstruct Q { v: i32 }\n", + ); + assert!(tags(&serded, "rust:struct:k.m.P").is_empty()); + assert!(tags(&derived, "rust:struct:k.m.Q").is_empty()); +} + // ---- impl-method rooting (increment 2: pub methods of pub types) ----------- /// Find the single emitted method entity whose id ends in `.` (robust to diff --git a/docs/operator/language-support.md b/docs/operator/language-support.md index a65e72ef..cac51570 100644 --- a/docs/operator/language-support.md +++ b/docs/operator/language-support.md @@ -42,8 +42,9 @@ a declared `exported-api` (ADR-053 / clarion-4ec50f3d92), so a Python codebase i not over-reported as dead just because it does not exhaustively declare `__all__`. **Rust emits** (ADR-054, clarion-05fdd0490e): `exported-api`, `entry-point`, -`test`, and `allow-dead-code`, derived from Rust's explicit semantics rather than -inferred — so `entity_dead_list` now **works** on a pure-Rust index. +`test`, `allow-dead-code`, `http-route`, and `cli-command` (the last two with a +`framework-handler` companion), derived from Rust's explicit semantics rather +than inferred — so `entity_dead_list` now **works** on a pure-Rust index. - `exported-api` — an unrestricted `pub` value/type item whose whole enclosing module chain is `pub` (the visibility chain reaches the crate's external @@ -68,7 +69,8 @@ inferred — so `entity_dead_list` now **works** on a pure-Rust index. `#[pyclass]` / `#[pymodule]`) and proc-macro entry points (`#[proc_macro]` / `#[proc_macro_derive]` / `#[proc_macro_attribute]`) — items reached from a non-Rust host or the compiler. -- `pub` methods of `pub` types (inherent `impl` blocks) are `exported-api`. 
+- `pub` methods of `pub` types in **library** targets (inherent `impl` blocks) + are `exported-api` (same lib/bin and pub-chain gating as regular `pub` items). Not yet emitted by Rust (tracked second-pass extensions): `pub use` re-export resolution (a `pub(crate)` item re-exported `pub` is under-rooted — a narrow, From 50d41d9b5d83505f20a8c872e1dba395e2e64131 Mon Sep 17 00:00:00 2001 From: John Morrissey <544926+tachyon-beep@users.noreply.github.com> Date: Thu, 25 Jun 2026 11:39:13 +1000 Subject: [PATCH 6/7] docs(adr-054): record real-corpus dogfood result (AC#6) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Analysed the loomweave workspace (181 files, 4231 entities): entity_dead_list is available + plausible (357 dead / 2384 ≈ 15%, moderate confidence). Trait- method deferral vindicated (only 8/357 dead are impl methods). The candidate set is dominated by private const/struct/enum reached via uncaptured value- references — a pre-existing reference-extraction + reachability gap surfaced (not caused) by making Rust analysable, tracked as clarion-a325bab42f. Deferred root extensions tracked as clarion-cbbd971eb1. Co-Authored-By: Claude Opus 4.8 (1M context) --- .../ADR-054-rust-reachability-root-model.md | 24 +++++++++++++++++++ 1 file changed, 24 insertions(+) diff --git a/docs/loomweave/adr/ADR-054-rust-reachability-root-model.md b/docs/loomweave/adr/ADR-054-rust-reachability-root-model.md index 604ff689..da8c38ae 100644 --- a/docs/loomweave/adr/ADR-054-rust-reachability-root-model.md +++ b/docs/loomweave/adr/ADR-054-rust-reachability-root-model.md @@ -225,6 +225,30 @@ This is a further additive ontology change: Rust plugin `ontology_version` parse-only (`syn`) extractor cannot detect them — a permanent limitation, not a deferral. 
+## Empirical result (dogfood, the loomweave workspace itself) + +Analysed a copy of the loomweave Rust workspace (181 files, **4231 entities**, +5755 edges) with the increment-1+2 plugin (acceptance criterion AC#6): + +- `entity_dead_list` is **available and plausible**: **357 dead / 2384 analysed + ≈ 15%** → **moderate** confidence (well inside the plausible band; nowhere near + the >25% "implausible" band ADR-053 fought for Python). The no-roots exclusion + lifts; `find_entry_points` / `find_http_routes` light up for Rust. +- **Trait-impl-method rooting deferral is vindicated by the data:** of the 357 + candidates only **8** are `impl` methods. Trait-heavy real Rust does *not* + pathologically inflate the dead share, so deferring trait-method rooting was + the right call, not a hidden weakness. +- The candidate set is dominated by **private value items** — `const` (124), + `struct` (113), `enum` (25) — reached through *value references* (a const read + in an expression, a struct used as a field type) that the Rust plugin does not + yet emit as reachability edges, AND the dead-code adjacency counts only + call+import (not `references`). So these read as dead though they are used: a + **pre-existing reference-extraction + reachability-coverage gap surfaced (not + caused) by making Rust analysable.** It is the natural next accuracy lever + (the Rust analogue of ADR-053's method-rooting follow-up) and is tracked + separately; it does not block this model, whose headline number is already + honest and plausible. 
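The adjacency half of that gap can be sketched as follows. This is an illustration under stated assumptions, not the survey's implementation: `EdgeKind`, `reachable` and the entity ids are hypothetical, and the sketch assumes a `references` edge to the constant has already been emitted, which the section notes the Rust plugin does not yet do for value references.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical edge kinds, named after the structural edges listed in the
/// language-support matrix.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum EdgeKind {
    Calls,
    Imports,
    References,
}

/// Ids reachable from `roots` when only edges whose kind is in `followed`
/// are traversed.
fn reachable<'a>(
    roots: &[&'a str],
    edges: &[(&'a str, EdgeKind, &'a str)],
    followed: &[EdgeKind],
) -> HashSet<&'a str> {
    let mut live: HashSet<&'a str> = roots.iter().copied().collect();
    let mut queue: VecDeque<&'a str> = roots.iter().copied().collect();
    while let Some(from) = queue.pop_front() {
        for (src, kind, dst) in edges {
            // Follow an edge only if its kind is in the allowed set and the
            // target has not been seen yet.
            if *src == from && followed.contains(kind) && live.insert(*dst) {
                queue.push_back(*dst);
            }
        }
    }
    live
}
```

With a `run` function that reads a private `MAX_RETRIES` constant, following only `Calls` and `Imports` leaves the constant outside the live set, so it would read as dead; adding `References` to the followed kinds brings it back in.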
+ ## Alternatives considered ### Alternative 1: reuse `exported-api` for the `#[allow(dead_code)]` keep-signal From 7e975c97af471d4a961ff75f97fff3dfa9bd68d8 Mon Sep 17 00:00:00 2001 From: John Morrissey <544926+tachyon-beep@users.noreply.github.com> Date: Thu, 25 Jun 2026 19:55:34 +1000 Subject: [PATCH 7/7] fix(analyze): re-dispatch a plugin's files when its tag schema moves (clarion-e12d424f1d) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The incremental `analyze` fast path skipped a file purely on its whole-file content hash, with no plugin/ontology-version component, and the plugin-declared `ontology_version` was never persisted. So after an operator upgraded a plugin whose emitted vocabulary changed (e.g. the ADR-053/054 reachability-root tags), every UNCHANGED file was silently skipped and kept its pre-upgrade `entity_tags` rows — which carry no root tags. The MCP dead-code survey's per-plugin honest-empty guard is all-or-nothing (a plugin with >=1 root tag index-wide is surveyed in full), so the moment one re-edited file re-tags, the guard disengages and the unchanged public surface plus its transitive callees are reported as confidently-wrong dead-code — the exact false-positive-dead the feature exists to prevent. Fix (the ticket's preferred option): persist a per-plugin `(version, ontology_version)` marker (`plugin_index_meta`, migration 0011) and, on the next `analyze`, force a full re-dispatch of that plugin's files when either component moved (or no marker exists yet) — even on an incremental run. The marker is rewritten in the SAME transaction as the prior-index snapshot and only on a fully-successful run, so it can never advance past an index the new vocabulary was actually written into. The partition override is in-memory only (the stored per-file hashes — keyed on the core `file` entity, not per language plugin — are never mutated). 
An index with no marker re-dispatches once, healing any skew an earlier silent upgrade left behind. The comparison keys on the pair because `ontology_version` is not gate-enforced for every plugin; `version` bumps on any release and is the conservative backstop. The MCP guard is left untouched: it structurally cannot detect partial coverage, and this fix restores the index consistency its all-or-nothing assumption needs. Tests: integration test bumps ontology_version (then version) between byte-identical runs and asserts a full re-analyse, plus that the force-full is a one-shot per marker change (the marker is persisted; incremental skip re-engages). Storage round-trip + atomic-commit + per-plugin-upsert unit tests. ADR-054 records the new invariant. Co-Authored-By: Claude Opus 4.8 (1M context) --- crates/loomweave-cli/src/analyze.rs | 77 +++++++- crates/loomweave-cli/tests/analyze.rs | 117 +++++++++++ .../migrations/0011_plugin_index_meta.sql | 41 ++++ crates/loomweave-storage/src/commands.rs | 16 +- crates/loomweave-storage/src/lib.rs | 5 +- crates/loomweave-storage/src/prior_index.rs | 187 +++++++++++++++++- crates/loomweave-storage/src/schema.rs | 7 +- crates/loomweave-storage/src/writer.rs | 8 +- .../loomweave-storage/tests/schema_apply.rs | 41 +++- .../ADR-054-rust-reachability-root-model.md | 26 +++ 10 files changed, 509 insertions(+), 16 deletions(-) create mode 100644 crates/loomweave-storage/migrations/0011_plugin_index_meta.sql diff --git a/crates/loomweave-cli/src/analyze.rs b/crates/loomweave-cli/src/analyze.rs index 04b02939..4e6bb6ee 100644 --- a/crates/loomweave-cli/src/analyze.rs +++ b/crates/loomweave-cli/src/analyze.rs @@ -686,6 +686,27 @@ pub(crate) async fn run_with_options(project_path: PathBuf, options: AnalyzeOpti HashMap::new(), ) }; + // clarion-e12d424f1d: the per-plugin tag-schema markers from the last + // successful run, keyed by plugin_id. 
Each plugin's live manifest + // (version, ontology_version) is compared against its stored marker below; + // a mismatch (or absence) forces a full re-dispatch of that plugin's files, + // so a plugin upgrade that changed the emitted vocabulary (e.g. ADR-053/054 + // root tags) never leaves unchanged files carrying stale entity_tags. Empty + // under `--no-incremental` (everything re-dispatches regardless), but the + // markers are still re-stamped after the run so the next incremental run + // sees current values. + let prior_plugin_markers: HashMap = if incremental + { + Connection::open(&db_path) + .ok() + .and_then(|conn| loomweave_storage::load_plugin_index_markers(&conn).ok()) + .unwrap_or_default() + } else { + HashMap::new() + }; + // The markers to persist after a fully-successful run — one per dispatched + // plugin, recording the (version, ontology_version) the index now reflects. + let mut current_plugin_markers: Vec = Vec::new(); // Locators of skipped-unchanged entities — fed into the SEI matcher's // current-locator union AND re-appended to the prior-index rebuild below. let mut retained_locators: HashSet = HashSet::new(); @@ -785,6 +806,43 @@ pub(crate) async fn run_with_options(project_path: PathBuf, options: AnalyzeOpti continue; } + // clarion-e12d424f1d: record the marker this plugin is analysing the + // index under (persisted only on a fully-successful run, alongside the + // prior-index snapshot), and decide whether its tag schema moved since + // the last run. The comparison keys on BOTH `version` (bumps on any + // release) and `ontology_version` (the declared vocabulary version): a + // mismatch on EITHER — or no stored marker at all — forces a full + // re-dispatch of this plugin's files, even on an incremental run. This + // is the only signal that distinguishes "plugin emits no roots" from + // "plugin emits roots but this index's unchanged files predate them", + // which the MCP dead-code survey's all-or-nothing guard cannot. 
+ let plugin_version = plugin.manifest.plugin.version.clone(); + let ontology_version = plugin.manifest.ontology.ontology_version.clone(); + let plugin_tag_schema_changed = match prior_plugin_markers.get(&plugin_id) { + Some(prior) => { + prior.plugin_version != plugin_version || prior.ontology_version != ontology_version + } + // No recorded marker: a fresh plugin, or the first run after this + // fix shipped against an index built by a (possibly already + // upgraded) plugin. Re-dispatch to heal any pre-existing skew — + // the safe, fail-toward-work direction. + None => true, + }; + if incremental && plugin_tag_schema_changed && !prior_plugin_markers.is_empty() { + tracing::info!( + plugin_id = %plugin_id, + plugin_version = %plugin_version, + ontology_version = %ontology_version, + "plugin tag-schema marker changed since last run; forcing full re-dispatch \ + of this plugin's files (clarion-e12d424f1d)" + ); + } + current_plugin_markers.push(loomweave_storage::PluginIndexMarker { + plugin_id: plugin_id.clone(), + plugin_version, + ontology_version, + }); + // Wave 2 / T3.1: partition into files to re-analyse (changed, new, // unhashable → fail toward work) and files to skip (whole-file hash // identical to the prior run). Each skipped file's prior entities stay @@ -793,9 +851,21 @@ pub(crate) async fn run_with_options(project_path: PathBuf, options: AnalyzeOpti // A secret-bearing UNCHANGED file skips too (weft-4165f1ed71): its // finding anchor is seeded from the committed rows below, so the skip // no longer re-anchors (and thereby duplicates) the finding. - let (plugin_files, skipped_files): (Vec, Vec) = plugin_files - .into_iter() - .partition(|path| file_needs_reanalysis(&project_root, path, &prior_file_hashes)); + // + // clarion-e12d424f1d: when this plugin's tag schema moved, skip the + // byte-hash consultation entirely and re-dispatch every file (in-memory + // only — the stored per-file hashes are never mutated). 
The hashes are + // keyed on the core `file` entity, not per language plugin, so there is + // nothing plugin-scoped to clear; overriding the partition is both the + // correct scope and the safe one. + let (plugin_files, skipped_files): (Vec, Vec) = + if plugin_tag_schema_changed { + (plugin_files, Vec::new()) + } else { + plugin_files.into_iter().partition(|path| { + file_needs_reanalysis(&project_root, path, &prior_file_hashes) + }) + }; // Locators of THIS plugin's skipped-unchanged entities. These rows stay in // the committed DB untouched this run (they are guarded against orphan // deletion via `retained_locators` below — see the SEI matcher's @@ -1479,6 +1549,7 @@ pub(crate) async fn run_with_options(project_path: PathBuf, options: AnalyzeOpti if let Err(e) = writer .send_wait(|ack| WriterCmd::UpsertPriorIndex { entries: prior_index_entries, + plugin_markers: current_plugin_markers, recorded_at: iso8601_now(), ack, }) diff --git a/crates/loomweave-cli/tests/analyze.rs b/crates/loomweave-cli/tests/analyze.rs index 70048e95..e79ff1e2 100644 --- a/crates/loomweave-cli/tests/analyze.rs +++ b/crates/loomweave-cli/tests/analyze.rs @@ -3962,6 +3962,123 @@ fn analyze_no_incremental_forces_full_reanalysis() { ); } +#[test] +#[cfg_attr(not(unix), ignore = "fixture plugin script is a unix shebang")] +fn analyze_ontology_bump_forces_full_reanalysis() { + // clarion-e12d424f1d: the incremental skip keys ONLY on a file's byte + // content (`file_needs_reanalysis` → whole-file hash), with no plugin + // tag-schema component. After a plugin upgrade that changes the emitted + // vocabulary (e.g. ADR-053/054 reachability-root tags), every UNCHANGED + // file keeps its pre-upgrade `entity_tags` rows — which carry no root tags — + // because the file is byte-identical and so silently skipped. The dead-code + // survey then false-flags the unchanged public surface as dead. 
The fix + // persists a per-plugin (version, ontology_version) marker and forces a full + // re-dispatch of that plugin's files when the marker moves, even WITHOUT + // --no-incremental. + // + // Here we bump ONLY the manifest's `ontology_version` between two + // byte-identical runs and assert the second (plain incremental) run + // re-analyses everything (skipped_files == 0) rather than skipping on the + // stale hash. Before the fix the second run skips both files. + let (project_dir, plugin_dir, plugin_path) = phase3_env(); + std::fs::write(project_dir.path().join("bump_a.p3"), b"module\n").unwrap(); + std::fs::write(project_dir.path().join("bump_b.p3"), b"module\n").unwrap(); + + // Run 1: the PHASE3 manifest ships ontology_version 0.6.0. + loomweave_bin() + .args(["analyze"]) + .arg(project_dir.path()) + .env("PATH", &plugin_path) + .assert() + .success(); + assert_eq!( + latest_run_stats(project_dir.path())["skipped_files"].as_u64(), + Some(0), + "first run has no prior index, so it skips nothing" + ); + + // Sanity: a plain incremental re-run with the SAME manifest skips both + // unchanged files — proving the skip is live and the next assertion is not + // vacuously true. + loomweave_bin() + .args(["analyze"]) + .arg(project_dir.path()) + .env("PATH", &plugin_path) + .assert() + .success(); + assert_eq!( + latest_run_stats(project_dir.path())["skipped_files"].as_u64(), + Some(2), + "an unchanged incremental re-run with an unchanged manifest skips both files" + ); + + // Upgrade the plugin in place: bump ONLY ontology_version, leave both source + // files byte-identical. + let bumped = PHASE3_PLUGIN_MANIFEST.replace( + "ontology_version = \"0.6.0\"", + "ontology_version = \"0.7.0\"", + ); + assert_ne!( + bumped, PHASE3_PLUGIN_MANIFEST, + "the manifest bump must actually change the ontology_version line" + ); + std::fs::write(plugin_dir.path().join("plugin.toml"), &bumped).unwrap(); + + // Run 3: plain incremental (NO --no-incremental). 
The tag-schema marker + // moved, so every file must re-dispatch to refresh its entity_tags. + loomweave_bin() + .args(["analyze"]) + .arg(project_dir.path()) + .env("PATH", &plugin_path) + .assert() + .success(); + assert_eq!( + latest_run_stats(project_dir.path())["skipped_files"].as_u64(), + Some(0), + "an ontology_version bump must force a full re-analyse despite byte-identical \ + source — otherwise unchanged files keep stale (rootless) entity_tags and the \ + dead-code survey false-flags the unchanged public surface (clarion-e12d424f1d)" + ); + + // Run 4: same (bumped) manifest, byte-identical source. The marker was + // persisted by run 3, so it now MATCHES and the skip re-engages — proving + // the force-full fires once per upgrade, not forever (which would silently + // disable incremental analysis after any upgrade). + loomweave_bin() + .args(["analyze"]) + .arg(project_dir.path()) + .env("PATH", &plugin_path) + .assert() + .success(); + assert_eq!( + latest_run_stats(project_dir.path())["skipped_files"].as_u64(), + Some(2), + "once the marker is persisted, an unchanged re-run skips again — the force-full \ + is a one-shot per marker change, not a permanent full-reanalyse" + ); + + // Run 5: bump the plugin `version` (ontology unchanged). The marker keys on + // BOTH components, so a version-only move must also force a full re-dispatch. 
+ let bumped_version = bumped.replace("version = \"0.1.0\"", "version = \"0.2.0\""); + assert_ne!( + bumped_version, bumped, + "the version bump must actually change the version line" + ); + std::fs::write(plugin_dir.path().join("plugin.toml"), &bumped_version).unwrap(); + loomweave_bin() + .args(["analyze"]) + .arg(project_dir.path()) + .env("PATH", &plugin_path) + .assert() + .success(); + assert_eq!( + latest_run_stats(project_dir.path())["skipped_files"].as_u64(), + Some(0), + "a plugin version bump must also force a full re-analyse (the marker keys on the \ + (version, ontology_version) pair)" + ); +} + // ── REQ-ANALYZE-06: parse-failure findings are persisted, not just logged ──── /// Mirrors the real Python plugin: every file yields one `module` entity, and a diff --git a/crates/loomweave-storage/migrations/0011_plugin_index_meta.sql b/crates/loomweave-storage/migrations/0011_plugin_index_meta.sql new file mode 100644 index 00000000..05459841 --- /dev/null +++ b/crates/loomweave-storage/migrations/0011_plugin_index_meta.sql @@ -0,0 +1,41 @@ +-- Migration 0011: per-plugin tag-schema marker for incremental re-analysis +-- (clarion-e12d424f1d). +-- +-- The incremental skip keys ONLY on a file's byte content +-- (`file_needs_reanalysis` -> whole-file hash), with no plugin tag-schema +-- component, and the plugin-advertised `ontology_version` was never persisted +-- (handshake-only). So after an operator upgrades a plugin whose emitted +-- vocabulary changed (e.g. the ADR-053/054 reachability-root tags), every +-- UNCHANGED file is silently skipped and keeps its pre-upgrade `entity_tags` +-- rows -- which carry no root tags. The dead-code survey then false-flags the +-- unchanged public surface as dead (the survey's per-plugin honest-empty guard +-- is all-or-nothing and is defeated the moment one re-edited file re-tags). +-- +-- This table records the (plugin_version, ontology_version) each plugin last +-- analysed the index under. 
`analyze` compares the live manifest marker against +-- the stored one and forces a full re-dispatch of that plugin's files when +-- EITHER component moves (or no row exists yet), so an upgrade can never leave +-- a mix of pre- and post-upgrade tag rows. The marker is rewritten in the SAME +-- transaction as the prior-index snapshot, so the two can never disagree after +-- a crash. +-- +-- Keyed by `plugin_id` (one row per plugin). A plugin that has never run leaves +-- no row -> treated as "marker absent" -> full re-dispatch (the safe, +-- fail-toward-work direction). Project isolation is by DB file. + +-- Wrapped in a single transaction (mirroring 0007) so the CREATE and the +-- migration record commit together; an interruption mid-way must not leave the +-- table in place without the schema_migrations.version=11 row. +BEGIN; + +CREATE TABLE plugin_index_meta ( + plugin_id TEXT PRIMARY KEY, + plugin_version TEXT NOT NULL, + ontology_version TEXT NOT NULL, + recorded_at TEXT NOT NULL +); + +INSERT INTO schema_migrations (version, name, applied_at) +VALUES (11, '0011_plugin_index_meta', strftime('%Y-%m-%dT%H:%M:%fZ', 'now')); + +COMMIT; diff --git a/crates/loomweave-storage/src/commands.rs b/crates/loomweave-storage/src/commands.rs index 14703679..5c7dda70 100644 --- a/crates/loomweave-storage/src/commands.rs +++ b/crates/loomweave-storage/src/commands.rs @@ -15,7 +15,7 @@ pub use loomweave_core::EdgeConfidence; use crate::cache::{InferredEdgeCacheEntry, SummaryCacheEntry, SummaryCacheKey}; use crate::error::StorageError; -use crate::prior_index::PriorIndexEntry; +use crate::prior_index::{PluginIndexMarker, PriorIndexEntry}; use crate::sei::{SeiBindingRecord, SeiLineageEntry}; use crate::unresolved::UnresolvedCallSiteRecord; use crate::wardline_taint::TaintFact; @@ -264,14 +264,22 @@ pub enum WriterCmd { /// resolve qualnames. 
UpsertWardlineTaintFact { fact: Box, ack: Ack<()> }, /// Rewrite the prior-index snapshot to exactly the current run's entities - /// (Wave 0 / WS3). FULL-SNAPSHOT REPLACE — clears `sei_prior_index` and - /// inserts every entry in one transaction, so stale rows from the prior run - /// are removed (despite the `Upsert` name, this is a whole-table replace). + /// (Wave 0 / WS3) AND upsert the per-plugin tag-schema markers + /// (clarion-e12d424f1d), in ONE transaction. The snapshot is a FULL-SNAPSHOT + /// REPLACE — clears `sei_prior_index` and inserts every entry, so stale rows + /// from the prior run are removed (despite the `Upsert` name, this part is a + /// whole-table replace). `plugin_markers` are upserted per `plugin_id`, so a + /// run only advances the markers of the plugins it actually dispatched. + /// Bundling both in one transaction keeps the marker from ever advancing + /// without the snapshot it describes (see [`replace_prior_index_and_markers`]). /// Query-time write: it runs after `CommitRun` (no active run transaction), /// best-effort, and never gates the run's own outcome. `recorded_at` is the /// run-completion timestamp stamped onto every row. 
+ /// + /// [`replace_prior_index_and_markers`]: crate::replace_prior_index_and_markers UpsertPriorIndex { entries: Vec, + plugin_markers: Vec, recorded_at: String, ack: Ack<()>, }, diff --git a/crates/loomweave-storage/src/lib.rs b/crates/loomweave-storage/src/lib.rs index 7773fc50..c30a1622 100644 --- a/crates/loomweave-storage/src/lib.rs +++ b/crates/loomweave-storage/src/lib.rs @@ -46,8 +46,9 @@ pub use guidance::{ list_guidance_sheets, rule_match, slugify_guidance_name, upsert_guidance_sheet, }; pub use prior_index::{ - PriorIndexEntry, clear_prior_index, load_prior_index, previously_analyzed_files, - prior_locators_by_file, replace_prior_index, upsert_prior_index_entry, + PluginIndexMarker, PriorIndexEntry, clear_prior_index, load_plugin_index_markers, + load_prior_index, previously_analyzed_files, prior_locators_by_file, replace_prior_index, + replace_prior_index_and_markers, upsert_plugin_index_marker, upsert_prior_index_entry, }; pub use query::{ CallEdgeMatch, CanonicalProjectPath, ContainedEntities, EntityRow, EntitySubsystem, diff --git a/crates/loomweave-storage/src/prior_index.rs b/crates/loomweave-storage/src/prior_index.rs index f055b4da..a5e2ccdd 100644 --- a/crates/loomweave-storage/src/prior_index.rs +++ b/crates/loomweave-storage/src/prior_index.rs @@ -42,6 +42,30 @@ pub struct PriorIndexEntry { pub signature: Option, } +/// The tag-schema marker a plugin last analysed the index under +/// (`plugin_index_meta`, migration 0011 — clarion-e12d424f1d). +/// +/// `analyze` compares the live manifest's `(version, ontology_version)` against +/// the stored marker and forces a full re-dispatch of that plugin's files when +/// EITHER component moves (or no row exists yet). The incremental skip otherwise +/// keys only on a file's byte content, so a plugin upgrade that changes the +/// emitted vocabulary (e.g. ADR-053/054 reachability-root tags) would leave +/// unchanged files carrying stale `entity_tags` and false-flag the public +/// surface as dead. 
The marker is rewritten in the SAME transaction as the +/// prior-index snapshot (see [`replace_prior_index_and_markers`]). +#[derive(Debug, Clone, PartialEq, Eq)] +pub struct PluginIndexMarker { + /// `[plugin].plugin_id` — the marker's primary key. + pub plugin_id: String, + /// `[plugin].version` — bumps on any plugin release (enforced by the + /// workspace/rust-manifest lockstep checks). + pub plugin_version: String, + /// `[ontology].ontology_version` — the declared vocabulary version. The + /// semantic tag-schema signal, but not gate-enforced for every plugin, so + /// the comparison keys on the PAIR (re-dispatch if either moved). + pub ontology_version: String, +} + /// Upsert one prior-index row (`INSERT OR REPLACE` on the `locator` PK). /// `recorded_at` is the ISO-8601 UTC stamp written to the row (the run's /// completion time). @@ -165,6 +189,78 @@ pub fn clear_prior_index(conn: &Connection) -> Result<()> { Ok(()) } +/// Load the per-plugin tag-schema markers (`plugin_index_meta`, migration +/// 0011), keyed by `plugin_id`. Read once at the start of a re-index by the +/// incremental fast path; a plugin with no row is absent from the map and +/// treated as "marker unknown → force full re-dispatch" by the caller. +/// +/// # Errors +/// +/// Returns [`StorageError::Sqlite`] if the query fails. +pub fn load_plugin_index_markers(conn: &Connection) -> Result> { + let mut stmt = + conn.prepare("SELECT plugin_id, plugin_version, ontology_version FROM plugin_index_meta")?; + let rows = stmt.query_map([], |row| { + Ok(PluginIndexMarker { + plugin_id: row.get::<_, String>(0)?, + plugin_version: row.get::<_, String>(1)?, + ontology_version: row.get::<_, String>(2)?, + }) + })?; + let mut out = HashMap::new(); + for row in rows { + let marker = row.map_err(StorageError::from)?; + out.insert(marker.plugin_id.clone(), marker); + } + Ok(out) +} + +/// Upsert one plugin tag-schema marker (`INSERT … ON CONFLICT(plugin_id)`). 
+/// Unlike the prior-index snapshot (a full replace), markers are upserted: a +/// run only touches the plugins it actually dispatched, leaving other plugins' +/// markers intact. `recorded_at` is the run's completion stamp. +/// +/// # Errors +/// +/// Returns [`StorageError::Sqlite`] if the statement fails. +pub fn upsert_plugin_index_marker( + conn: &Connection, + marker: &PluginIndexMarker, + recorded_at: &str, +) -> Result<()> { + conn.execute( + "INSERT INTO plugin_index_meta \ + (plugin_id, plugin_version, ontology_version, recorded_at) \ + VALUES (?1, ?2, ?3, ?4) \ + ON CONFLICT(plugin_id) DO UPDATE SET \ + plugin_version = excluded.plugin_version, \ + ontology_version = excluded.ontology_version, \ + recorded_at = excluded.recorded_at", + params![ + marker.plugin_id, + marker.plugin_version, + marker.ontology_version, + recorded_at + ], + )?; + Ok(()) +} + +/// Clear + re-insert the prior-index snapshot inside an already-open +/// transaction. Shared by [`replace_prior_index`] and +/// [`replace_prior_index_and_markers`] so the two cannot drift. +fn write_prior_index_rows( + tx: &Connection, + entries: &[PriorIndexEntry], + recorded_at: &str, +) -> Result<()> { + clear_prior_index(tx)?; + for entry in entries { + upsert_prior_index_entry(tx, entry, recorded_at)?; + } + Ok(()) +} + /// Replace the entire prior-index snapshot with `entries`, atomically: a single /// transaction clears the table and inserts every entry, so a mid-flush failure /// rolls back to the previous snapshot rather than leaving a half-cleared one. 
@@ -186,9 +282,35 @@ pub fn replace_prior_index( recorded_at: &str, ) -> Result<()> { let tx = conn.unchecked_transaction()?; - clear_prior_index(&tx)?; - for entry in entries { - upsert_prior_index_entry(&tx, entry, recorded_at)?; + write_prior_index_rows(&tx, entries, recorded_at)?; + tx.commit()?; + Ok(()) +} + +/// Rewrite the prior-index snapshot AND upsert the per-plugin tag-schema markers +/// in ONE transaction (the end-of-run index commit — clarion-e12d424f1d). Doing +/// both atomically guarantees the marker never advances without the snapshot it +/// describes: a crash between the two cannot leave a plugin marked "current" at +/// a vocabulary the index was never re-dispatched under, which would re-arm the +/// false-dead bug this fix closes. +/// +/// `entries` fully replace the snapshot (stale rows dropped); `markers` are +/// upserted (only the plugins that ran this time are touched). +/// +/// # Errors +/// +/// Returns [`StorageError::Sqlite`] if any statement fails; the transaction is +/// rolled back without commit on error. 
+pub fn replace_prior_index_and_markers( + conn: &Connection, + entries: &[PriorIndexEntry], + markers: &[PluginIndexMarker], + recorded_at: &str, +) -> Result<()> { + let tx = conn.unchecked_transaction()?; + write_prior_index_rows(&tx, entries, recorded_at)?; + for marker in markers { + upsert_plugin_index_marker(&tx, marker, recorded_at)?; } tx.commit()?; Ok(()) @@ -266,6 +388,65 @@ mod tests { assert!(load_prior_index(&conn).unwrap().is_empty()); } + fn marker(plugin_id: &str, version: &str, ontology: &str) -> PluginIndexMarker { + PluginIndexMarker { + plugin_id: plugin_id.to_owned(), + plugin_version: version.to_owned(), + ontology_version: ontology.to_owned(), + } + } + + #[test] + fn plugin_marker_upsert_then_load_roundtrips() { + let conn = migrated_conn(); + upsert_plugin_index_marker(&conn, &marker("python", "1.3.1", "0.9.0"), "t0").unwrap(); + let loaded = load_plugin_index_markers(&conn).unwrap(); + assert_eq!(loaded.len(), 1); + assert_eq!( + loaded.get("python"), + Some(&marker("python", "1.3.1", "0.9.0")) + ); + } + + #[test] + fn plugin_marker_upsert_overwrites_per_plugin_and_leaves_others() { + // A re-run touches only the plugin it dispatched: the upsert advances + // that plugin's marker and leaves every other plugin's marker intact — + // the per-plugin (not table-wide) semantics the force-full decision + // relies on. + let conn = migrated_conn(); + upsert_plugin_index_marker(&conn, &marker("python", "1.3.1", "0.9.0"), "t0").unwrap(); + upsert_plugin_index_marker(&conn, &marker("rust", "1.3.1", "0.6.0"), "t0").unwrap(); + + // The rust plugin upgrades its ontology; python is untouched this run. 
+ upsert_plugin_index_marker(&conn, &marker("rust", "1.3.1", "0.7.0"), "t1").unwrap(); + + let loaded = load_plugin_index_markers(&conn).unwrap(); + assert_eq!(loaded.len(), 2); + assert_eq!(loaded["rust"].ontology_version, "0.7.0"); + assert_eq!( + loaded["python"].ontology_version, "0.9.0", + "the unrelated plugin's marker must survive a single-plugin upsert" + ); + } + + #[test] + fn replace_prior_index_and_markers_commits_both_atomically() { + let conn = migrated_conn(); + replace_prior_index_and_markers( + &conn, + &[entry("python:function:a", "a0")], + &[marker("python", "1.3.1", "0.9.0")], + "t0", + ) + .unwrap(); + assert_eq!(load_prior_index(&conn).unwrap().len(), 1); + assert_eq!( + load_plugin_index_markers(&conn).unwrap()["python"], + marker("python", "1.3.1", "0.9.0") + ); + } + #[test] fn replace_makes_the_snapshot_equal_the_new_set_and_drops_stale_rows() { // The load-bearing WS3 behaviour: the snapshot after a run must be diff --git a/crates/loomweave-storage/src/schema.rs b/crates/loomweave-storage/src/schema.rs index e3e92eb3..38b85947 100644 --- a/crates/loomweave-storage/src/schema.rs +++ b/crates/loomweave-storage/src/schema.rs @@ -65,12 +65,17 @@ const MIGRATIONS: &[Migration] = &[ name: "0010_dedupe_findings_drop_run_scoped_ids", sql: include_str!("../migrations/0010_dedupe_findings_drop_run_scoped_ids.sql"), }, + Migration { + version: 11, + name: "0011_plugin_index_meta", + sql: include_str!("../migrations/0011_plugin_index_meta.sql"), + }, ]; /// Highest migration version known to this build. Mirrored into the /// `SQLite` `user_version` header (STO-02) so a future-built database is /// refused at open instead of silently corrupting state. 
-pub const CURRENT_SCHEMA_VERSION: u32 = 10; +pub const CURRENT_SCHEMA_VERSION: u32 = 11; const _CURRENT_SCHEMA_VERSION_MATCHES_LAST_MIGRATION: () = { // Compile-time check: `CURRENT_SCHEMA_VERSION` must equal the highest diff --git a/crates/loomweave-storage/src/writer.rs b/crates/loomweave-storage/src/writer.rs index e5c04106..16edfac4 100644 --- a/crates/loomweave-storage/src/writer.rs +++ b/crates/loomweave-storage/src/writer.rs @@ -278,11 +278,17 @@ fn run_actor( } WriterCmd::UpsertPriorIndex { entries, + plugin_markers, recorded_at, ack, } => { let res = query_time_write(conn, &mut state, commits_observed, |conn| { - crate::prior_index::replace_prior_index(conn, &entries, &recorded_at) + crate::prior_index::replace_prior_index_and_markers( + conn, + &entries, + &plugin_markers, + &recorded_at, + ) }); reply(ack, res); } diff --git a/crates/loomweave-storage/tests/schema_apply.rs b/crates/loomweave-storage/tests/schema_apply.rs index 03328bb9..f0dea431 100644 --- a/crates/loomweave-storage/tests/schema_apply.rs +++ b/crates/loomweave-storage/tests/schema_apply.rs @@ -842,7 +842,7 @@ fn migrations_are_idempotent() { let tempdir = tempfile::tempdir().unwrap(); let mut conn = open_fresh(&tempdir); schema::apply_migrations(&mut conn).expect("second apply should be a no-op"); - assert_eq!(schema::applied_count(&conn).unwrap(), 10); + assert_eq!(schema::applied_count(&conn).unwrap(), 11); let tables_after = table_names(&conn); assert!(tables_after.contains(&"entities".to_owned())); } @@ -856,7 +856,7 @@ fn schema_migrations_records_each_applied_migration() { row.get(0) }) .unwrap(); - assert_eq!(count, 10); + assert_eq!(count, 11); let names: Vec = { let mut stmt = conn .prepare("SELECT name FROM schema_migrations ORDER BY version") @@ -877,10 +877,47 @@ fn schema_migrations_records_each_applied_migration() { "0008_run_owner_heartbeat", "0009_drop_fts_content_text", "0010_dedupe_findings_drop_run_scoped_ids", + "0011_plugin_index_meta", ] ); } +// 
---------------------------------------------------------------------------- +// Migration 0011 — per-plugin tag-schema marker (clarion-e12d424f1d). Records +// the (plugin_version, ontology_version) each plugin last analysed the index +// under, so `analyze` can force a full re-dispatch of a plugin's files when its +// emitted tag vocabulary moves. Pins the table shape the analyze force-full +// decision and the marker round-trip depend on. +// ---------------------------------------------------------------------------- + +#[test] +fn migration_0011_creates_plugin_index_meta_table() { + let tempdir = tempfile::tempdir().unwrap(); + let conn = open_fresh(&tempdir); + assert!( + table_names(&conn).contains(&"plugin_index_meta".to_owned()), + "migration 0011 must create plugin_index_meta" + ); + let columns: Vec = { + let mut stmt = conn + .prepare("SELECT name FROM pragma_table_info('plugin_index_meta')") + .unwrap(); + let rows = stmt.query_map([], |row| row.get(0)).unwrap(); + rows.map(std::result::Result::unwrap).collect() + }; + for expected in &[ + "plugin_id", + "plugin_version", + "ontology_version", + "recorded_at", + ] { + assert!( + columns.iter().any(|column| column == expected), + "missing plugin_index_meta.{expected} in {columns:?}" + ); + } +} + // ---------------------------------------------------------------------------- // Migration 0005 — SEI identity store (Wave 1 / WS1, ADR-038). 
The identity // store lives in `sei_bindings` (NOT a column on the cumulative `entities` diff --git a/docs/loomweave/adr/ADR-054-rust-reachability-root-model.md b/docs/loomweave/adr/ADR-054-rust-reachability-root-model.md index da8c38ae..ed230544 100644 --- a/docs/loomweave/adr/ADR-054-rust-reachability-root-model.md +++ b/docs/loomweave/adr/ADR-054-rust-reachability-root-model.md @@ -161,6 +161,32 @@ Additive tag-vocabulary change: Rust plugin `ontology_version` **0.5.0 → `scripts/check-rust-plugin-manifest-lockstep.py`), `serve.rs`'s `initialize` response, and the `docs/operator/language-support.md` table. +### Incremental re-analysis must re-dispatch when the tag schema moves (clarion-e12d424f1d) + +A tag-vocabulary change has a second, load-bearing obligation. The incremental +`analyze` fast path (Wave 2 / T3.1) skips a file purely on its whole-file +content hash. Adding the root tags above changes only the *emitted tags*, not +the *source bytes*, so a plain `analyze` after a plugin upgrade would silently +skip every unchanged file and keep its pre-upgrade `entity_tags` — which carry +no root tags. The dead-code survey then false-flags the unchanged public +surface as dead (its per-plugin honest-empty guard is all-or-nothing and is +defeated the moment one re-edited file re-tags). Because false-dead public API +is the exact error this whole feature exists to prevent (error-cost asymmetry, +above), this cannot be left to operator discipline. + +**Invariant.** Any plugin change to the emitted tag/entity/edge vocabulary +**MUST** bump `ontology_version` (or, failing that, `[plugin].version`). Loomweave +persists a per-plugin `(version, ontology_version)` marker (`plugin_index_meta`, +migration 0011) and, on the next `analyze`, **forces a full re-dispatch of that +plugin's files when either component moved** — even on an incremental run. 
The +marker is rewritten in the same transaction as the prior-index snapshot, and +only on a fully-successful run, so it can never advance past an index the new +vocabulary was actually written into. An index with no marker yet (first run +after this shipped) re-dispatches once, healing any skew an earlier silent +upgrade left behind. The comparison keys on the *pair* because `ontology_version` +is not gate-enforced for every plugin; `[plugin].version` bumps on any release +and is the conservative backstop. + ## Increment 2 (implemented): framework handlers + `pub`-method rooting A focused framework-attribute survey (a 6-agent taxonomy sweep across the Rust