Skip to content

fix: disclose same-locator collisions on the entity read path (clarion-48af930f2a)#74

Merged
tachyon-beep merged 1 commit into
mainfrom
fix/duplicate-locator-disclosure
Jun 25, 2026
Merged

fix: disclose same-locator collisions on the entity read path (clarion-48af930f2a)#74
tachyon-beep merged 1 commit into
mainfrom
fix/duplicate-locator-disclosure

Conversation

@tachyon-beep

Copy link
Copy Markdown
Collaborator

What

When two source declarations assemble the same entity locator/id (e.g. Python module colliding.py vs package colliding/__init__.py, conditional redefinition, re-exports), the writer's load-bearing ON CONFLICT(id) DO UPDATE absorbs the collision as last-write-wins: the surviving row is a chimera and one declaration is silently shadowed. DuplicateLocatorGuard (clarion-b19fe90c3e) already detects this and files an ERROR LMWV-DUPLICATE-LOCATOR finding — but the finding anchored to the file, so entity-scoped reads (entity_finding_list, entity_json) gave no signal. A consumer reading the shadowed id silently got a clean-looking row (the reported bug).

Why disclosure, not distinct rows

The issue offered two directions. Option 1 (key colliding declarations as distinct rows by path) is rejected: id is a cross-product PK (SEI, Filigree associations, Wardline taint), and the collision is genuinely ambiguous (in real Python the package shadows the module at import), so two rows give no principled edge/association routing — false correctness. This is option 2 (disclosure), the honest ceiling: the shadowed declaration's edges were already collapsed during analyze and are not recoverable; disclosure flags the unreliability rather than pretending to fix it.

Changes

  • Re-anchor the finding to the colliding entity (anchor_entity_id), not the file. entity_finding_list(survivor) now surfaces it; the Filigree emit still gets a real path via the entity's own source_file_path (the findings_for_emit join), subsuming the old file-anchor workaround.
  • New duplicate_locator_collision storage query + a collision field on the entity_json read choke point — covers entity_at, entity_find, entity_callers_list, entity_relation_list neighbors (verified all four route through entity_json). Read regardless of finding status, so suppression (a status flip; the row persists) does not hide the chimera.
  • Lifecycle is the standing full-pass stale sweep: the disclosure survives a no-op incremental run (loomweave's normal mode) and clears on a clean full pass that no longer reproduces the collision.
  • Hoist the rule-id constant to loomweave-core so the analyze-side detector and the storage-side query share one string (no new cross-crate magic string; cf. clarion-570e9ac203).

Tests

  • E2E re-anchor + 3-run lifecycle (duplicate_locator.rs): collision → finding anchored to the entity; survives no-op incremental; clears on full re-pass with one declaration removed.
  • Storage query unit tests (open / suppressed / other-rule / none).
  • entity_json disclosure incl. the suppressed-survival case.

Full CI floor green locally (fmt, clippy on touched crates, build, nextest 1948 passed, doc, deny).

Known limitation (disclosure-not-recovery, stated honestly)

Anchoring to the survivor means the flag clears only on a clean full pass. In normal incremental operation, after a user resolves a collision the flag can persist stale until a full analyze --no-incremental. Consistent with ADR-047 (findings are regenerable current-state).

🤖 Generated with Claude Code

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9c30ce047f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +3737 to +3738
if let Some(entity_id) = hf.metadata.get("anchor_entity_id") {
return entity_id.clone();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Gate anchor_entity_id to duplicate locator findings

When a plugin-reported finding includes a metadata key named anchor_entity_id, this new precedence path will treat that untrusted metadata as the storage FK instead of the host-added anchor_file_path (plugin metadata is copied through before anchor_file_path is inserted in host_validate.rs). In that scenario, an otherwise valid plugin finding can anchor to a nonexistent entity and make InsertFinding fail the entire analyze run, or anchor to the wrong entity if the value happens to exist. This should be limited to the host-generated duplicate-locator finding (or use a reserved/internal metadata channel).

Useful? React with 👍 / 👎.

Comment on lines +1206 to +1208
"SELECT evidence FROM findings \
WHERE entity_id = ?1 AND rule_id = ?2 \
ORDER BY updated_at DESC, id DESC \

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid disclosing resolved suppressed collisions

Because this lookup ignores finding status and only checks for any duplicate-locator row, a collision that was suppressed or promoted will keep showing collision even after a clean full pass stops reproducing it: the stale-finding sweep explicitly preserves non-open and Filigree-linked findings, so those rows are never deleted. The result is a permanently stale chimera warning for users who suppress/acknowledge the finding and then fix the duplicate locator. Consider adding freshness criteria to this disclosure path or giving this rule a lifecycle that can retire resolved non-open rows.

Useful? React with 👍 / 👎.

.flatten()?;
Some(json!({
"shadowed": true,
"declarations": collision.declarations,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Redact blocked paths from collision declarations

When a locator collision spans an unblocked file and a briefing-blocked file, and the unblocked declaration is the survivor row, this unconditional declarations projection exposes the blocked file path. The briefing guard at the top of entity_json only considers the survivor entity's briefing_blocked flag, while these paths come from both declarations' finding metadata; redact or drop declaration paths whose file anchor is briefing-blocked before returning this field.

Useful? React with 👍 / 👎.

… path (clarion-48af930f2a)

The entity store keeps one row per id — a cross-product PK (SEI, Filigree
associations, Wardline taint). When two source declarations assemble the same
locator (Python module `colliding.py` vs package `colliding/__init__.py`,
conditional redefinition, re-exports), the writer's load-bearing
`ON CONFLICT(id) DO UPDATE` absorbs the collision as last-write-wins: the
survivor row is a chimera and one declaration is silently shadowed.
`DuplicateLocatorGuard` (clarion-b19fe90c3e) already detects this and files an
ERROR `LMWV-DUPLICATE-LOCATOR` finding — but that finding anchored to the FILE,
so entity-scoped reads (`entity_finding_list`, `entity_json`) gave no signal: a
consumer reading the shadowed id silently got a clean-looking row.

Keying colliding declarations as distinct rows (the issue's option 1) is
rejected: `id` is the cross-product PK, and the collision is genuinely ambiguous
(the package shadows the module at import) so two rows give no principled edge
routing — false correctness. This is option 2 (disclosure), the honest ceiling:
the shadowed declaration's edges were already collapsed at analyze and are NOT
recoverable; disclosure flags the unreliability rather than pretending to fix it.

- Re-anchor the finding to the COLLIDING ENTITY (anchor_entity_id metadata),
  not the file. `entity_finding_list(survivor)` now surfaces it; the Filigree
  emit still gets a real path via the entity's own source_file_path
  (findings_for_emit join), subsuming the old file-anchor workaround.
- New `duplicate_locator_collision` storage query + a `collision` field on the
  `entity_json` read choke point (covers entity_at / entity_find /
  entity_callers_list / entity_relation_list neighbors). Read regardless of
  finding status, so suppression (a status flip; the row persists) does not hide
  the chimera. Lifecycle is the standing full-pass stale sweep: survives a no-op
  incremental run (loomweave's normal mode), clears on a clean full pass that no
  longer reproduces the collision.
- Hoist the rule-id constant to loomweave-core so the analyze-side detector and
  the storage-side disclosure query share one string (no new cross-crate magic
  string; cf. clarion-570e9ac203).

Tests: re-anchor + 3-run lifecycle E2E (duplicate_locator.rs); storage query
(open / suppressed / other-rule / none); entity_json disclosure incl. the
suppressed-survival case.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@tachyon-beep tachyon-beep force-pushed the fix/duplicate-locator-disclosure branch from 9c30ce0 to 69bad9c Compare June 25, 2026 20:06
@tachyon-beep tachyon-beep merged commit ceee560 into main Jun 25, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant