fix: disclose same-locator collisions on the entity read path (clarion-48af930f2a)#74
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9c30ce047f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if let Some(entity_id) = hf.metadata.get("anchor_entity_id") { | ||
| return entity_id.clone(); |
There was a problem hiding this comment.
Gate anchor_entity_id to duplicate locator findings
When a plugin-reported finding includes a metadata key named anchor_entity_id, this new precedence path will treat that untrusted metadata as the storage FK instead of the host-added anchor_file_path (plugin metadata is copied through before anchor_file_path is inserted in host_validate.rs). In that scenario, an otherwise valid plugin finding can anchor to a nonexistent entity and make InsertFinding fail the entire analyze run, or anchor to the wrong entity if the value happens to exist. This should be limited to the host-generated duplicate-locator finding (or use a reserved/internal metadata channel).
Useful? React with 👍 / 👎.
| "SELECT evidence FROM findings \ | ||
| WHERE entity_id = ?1 AND rule_id = ?2 \ | ||
| ORDER BY updated_at DESC, id DESC \ |
There was a problem hiding this comment.
Avoid disclosing resolved suppressed collisions
Because this lookup ignores finding status and only checks for any duplicate-locator row, a collision that was suppressed or promoted will keep showing collision even after a clean full pass stops reproducing it: the stale-finding sweep explicitly preserves non-open and Filigree-linked findings, so those rows are never deleted. The result is a permanently stale chimera warning for users who suppress/acknowledge the finding and then fix the duplicate locator. Consider adding freshness criteria to this disclosure path or giving this rule a lifecycle that can retire resolved non-open rows.
Useful? React with 👍 / 👎.
| .flatten()?; | ||
| Some(json!({ | ||
| "shadowed": true, | ||
| "declarations": collision.declarations, |
There was a problem hiding this comment.
Redact blocked paths from collision declarations
When a locator collision spans an unblocked file and a briefing-blocked file, and the unblocked declaration is the survivor row, this unconditional declarations projection exposes the blocked file path. The briefing guard at the top of entity_json only considers the survivor entity's briefing_blocked flag, while these paths come from both declarations' finding metadata; redact or drop declaration paths whose file anchor is briefing-blocked before returning this field.
Useful? React with 👍 / 👎.
… path (clarion-48af930f2a) The entity store keeps one row per id — a cross-product PK (SEI, Filigree associations, Wardline taint). When two source declarations assemble the same locator (Python module `colliding.py` vs package `colliding/__init__.py`, conditional redefinition, re-exports), the writer's load-bearing `ON CONFLICT(id) DO UPDATE` absorbs the collision as last-write-wins: the survivor row is a chimera and one declaration is silently shadowed. `DuplicateLocatorGuard` (clarion-b19fe90c3e) already detects this and files an ERROR `LMWV-DUPLICATE-LOCATOR` finding — but that finding anchored to the FILE, so entity-scoped reads (`entity_finding_list`, `entity_json`) gave no signal: a consumer reading the shadowed id silently got a clean-looking row. Keying colliding declarations as distinct rows (the issue's option 1) is rejected: `id` is the cross-product PK, and the collision is genuinely ambiguous (the package shadows the module at import) so two rows give no principled edge routing — false correctness. This is option 2 (disclosure), the honest ceiling: the shadowed declaration's edges were already collapsed at analyze and are NOT recoverable; disclosure flags the unreliability rather than pretending to fix it. - Re-anchor the finding to the COLLIDING ENTITY (anchor_entity_id metadata), not the file. `entity_finding_list(survivor)` now surfaces it; the Filigree emit still gets a real path via the entity's own source_file_path (findings_for_emit join), subsuming the old file-anchor workaround. - New `duplicate_locator_collision` storage query + a `collision` field on the `entity_json` read choke point (covers entity_at / entity_find / entity_callers_list / entity_relation_list neighbors). Read regardless of finding status, so suppression (a status flip; the row persists) does not hide the chimera. Lifecycle is the standing full-pass stale sweep: survives a no-op incremental run (loomweave's normal mode), clears on a clean full pass that no longer reproduces the collision. - Hoist the rule-id constant to loomweave-core so the analyze-side detector and the storage-side disclosure query share one string (no new cross-crate magic string; cf. clarion-570e9ac203). Tests: re-anchor + 3-run lifecycle E2E (duplicate_locator.rs); storage query (open / suppressed / other-rule / none); entity_json disclosure incl. the suppressed-survival case. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
9c30ce0 to
69bad9c
Compare
What
When two source declarations assemble the same entity locator/id (e.g. Python module
colliding.pyvs packagecolliding/__init__.py, conditional redefinition, re-exports), the writer's load-bearingON CONFLICT(id) DO UPDATEabsorbs the collision as last-write-wins: the surviving row is a chimera and one declaration is silently shadowed.DuplicateLocatorGuard(clarion-b19fe90c3e) already detects this and files an ERRORLMWV-DUPLICATE-LOCATORfinding — but the finding anchored to the file, so entity-scoped reads (entity_finding_list,entity_json) gave no signal. A consumer reading the shadowed id silently got a clean-looking row (the reported bug).Why disclosure, not distinct rows
The issue offered two directions. Option 1 (key colliding declarations as distinct rows by path) is rejected:
idis a cross-product PK (SEI, Filigree associations, Wardline taint), and the collision is genuinely ambiguous (in real Python the package shadows the module at import), so two rows give no principled edge/association routing — false correctness. This is option 2 (disclosure), the honest ceiling: the shadowed declaration's edges were already collapsed during analyze and are not recoverable; disclosure flags the unreliability rather than pretending to fix it.Changes
anchor_entity_id), not the file.entity_finding_list(survivor)now surfaces it; the Filigree emit still gets a real path via the entity's ownsource_file_path(thefindings_for_emitjoin), subsuming the old file-anchor workaround.duplicate_locator_collisionstorage query + acollisionfield on theentity_jsonread choke point — coversentity_at,entity_find,entity_callers_list,entity_relation_listneighbors (verified all four route throughentity_json). Read regardless of finding status, so suppression (a status flip; the row persists) does not hide the chimera.loomweave-coreso the analyze-side detector and the storage-side query share one string (no new cross-crate magic string; cf. clarion-570e9ac203).Tests
duplicate_locator.rs): collision → finding anchored to the entity; survives no-op incremental; clears on full re-pass with one declaration removed.entity_jsondisclosure incl. the suppressed-survival case.Full CI floor green locally (fmt, clippy on touched crates, build, nextest 1948 passed, doc, deny).
Known limitation (disclosure-not-recovery, stated honestly)
Anchoring to the survivor means the flag clears only on a clean full pass. In normal incremental operation, after a user resolves a collision the flag can persist stale until a full
analyze --no-incremental. Consistent with ADR-047 (findings are regenerable current-state).🤖 Generated with Claude Code