fix(storage): heal stale contains edge on file_scope claim move (clarion-abda98c869) #75

Merged
tachyon-beep merged 1 commit into main from fix/parent-contains-dual-claim-prune
Jun 25, 2026

@tachyon-beep

Collaborator

What

A file_scope entity's parent_id and its contains edge are dual encodings of one fact (ADR-026). When the claiming file moves between runs — a module/package dual-claim whose claimer flips (Python colliding.py vs colliding/__init__.py both declaring module specimen.colliding), or a genuine entity move — the new claimer sets parent_id + its own contains edge, but the previous claimer's contains edge is never pruned. The two edges then contradict the single parent_id, and parent_contains_mismatch aborts the whole run at flush/commit with LMWV-INFRA-PARENT-CONTAINS-MISMATCH.

The collect_source_files sort (PR #57) only made the move case deterministic; the dual-claim case fails regardless of order, and any index already carrying a stale edge fails every subsequent full analyze.
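To make the failure concrete, here is a minimal sketch of the kind of consistency check that `parent_contains_mismatch` performs. It is an in-memory illustration only: the `ContainsEdge` struct, the `parent_contains_mismatches` function, and the "exactly one incoming edge from the parent" rule are assumptions for this example, not the project's actual schema or query.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a stored `contains` edge.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ContainsEdge {
    pub from_id: String,
    pub to_id: String,
}

/// Returns the sorted ids of entities whose incoming `contains` edges
/// disagree with their `parent_id`.
/// `parents` maps entity id -> optional parent id (None means root).
pub fn parent_contains_mismatches(
    parents: &HashMap<String, Option<String>>,
    edges: &[ContainsEdge],
) -> Vec<String> {
    let mut bad: Vec<String> = Vec::new();
    for (id, parent) in parents {
        // Collect the source of every `contains` edge pointing at this entity.
        let incoming: Vec<&str> = edges
            .iter()
            .filter(|e| &e.to_id == id)
            .map(|e| e.from_id.as_str())
            .collect();
        let consistent = match parent {
            // Assumed rule: a parented entity has exactly one edge, from its parent.
            Some(p) => incoming.len() == 1 && incoming[0] == p.as_str(),
            // Assumed rule: a root entity has no incoming `contains` edge.
            None => incoming.is_empty(),
        };
        if !consistent {
            bad.push(id.clone());
        }
    }
    bad.sort();
    bad
}
```

With a module whose `parent_id` names the new claimer but which still has a `contains` edge from the old claimer, this check reports the module; with only the new claimer's edge it reports nothing.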

Root cause (systematic-debugging)

Established from real database state and a controlled reproduction, not from assumption:

  • A single fresh full analyze is consistent (one contains edge), so the corruption does not arise within a single run.
  • The corruption is cross-run accumulation: when the claim flips, the old claimer's contains edge lingers. Reproduced deterministically (writer-level + the lacuna specimen).
  • The periodic auto-commit (bump_writes_and_maybe_commit) commits intermediate state without validation, which is how an inconsistent state becomes durable (and why a later full run trips the global check).
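The cross-run accumulation can be reproduced in a toy model. The sketch below assumes a hypothetical in-memory `NaiveIndex` that mimics the pre-fix writer: each emit upserts the parent and adds the claimer's edge, but nothing removes the previous claimer's edge. The type and method names are invented for illustration.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical in-memory stand-in for the index; not the real storage schema.
#[derive(Default)]
pub struct NaiveIndex {
    /// entity id -> parent id (the claiming file)
    pub parent_of: HashMap<String, String>,
    /// (from_id, to_id) pairs for `contains` edges
    pub contains: BTreeSet<(String, String)>,
}

impl NaiveIndex {
    /// Upserts the entity's parent and adds the claimer's edge, but never
    /// removes an edge written by a previous claimer (the pre-fix behaviour).
    pub fn emit(&mut self, entity: &str, claimer: &str) {
        self.parent_of.insert(entity.to_string(), claimer.to_string());
        self.contains.insert((claimer.to_string(), entity.to_string()));
    }

    /// Number of `contains` edges pointing at `entity`.
    pub fn incoming(&self, entity: &str) -> usize {
        self.contains
            .iter()
            .filter(|(_, to)| to.as_str() == entity)
            .count()
    }
}
```

Emitting `specimen.colliding` once from `colliding.py` leaves one incoming edge. Emitting it again from `colliding/__init__.py` in a later run leaves two edges against a single `parent_id`, which is the state the final check rejects.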

Fix

In the writer's insert_entity, after the entity upsert, prune any contains edge into the entity whose from_id != parent_id, in the same transaction as the parent write. parent_id is authoritative, so this only removes contradictions, never the matching edge (from_id == parent_id); a root entity (no parent_id) is untouched.

  • Order-independent: the matching edge is never pruned regardless of entity-vs-edge insert order.
  • Self-healing: an already-corrupted index recovers on the next analysis that re-emits the entity.
  • Safe at scale: the periodic auto-commit does not validate, so a commit that lands between the prune and the new claimer's edge insert cannot trip the mismatch check mid-batch.
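The prune rule and its order-independence can be sketched against an in-memory model. This is an illustration under stated assumptions, not the writer's actual code: the `Index` type and its fields are hypothetical, and the real `insert_entity` operates on the database inside a transaction.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical in-memory stand-in for the index.
#[derive(Default)]
pub struct Index {
    /// entity id -> optional parent id (None means root)
    pub parent_of: HashMap<String, Option<String>>,
    /// (from_id, to_id) pairs for `contains` edges
    pub contains: BTreeSet<(String, String)>,
}

impl Index {
    /// Upserts the entity, then prunes every `contains` edge into it whose
    /// source is not the authoritative `parent_id`.
    pub fn insert_entity(&mut self, id: &str, parent_id: Option<&str>) {
        self.parent_of
            .insert(id.to_string(), parent_id.map(str::to_string));
        if let Some(parent) = parent_id {
            // Keep edges into other entities, and the matching edge into this one.
            self.contains
                .retain(|(from, to)| to.as_str() != id || from.as_str() == parent);
        }
        // A root entity (no parent_id) is left untouched, as in the PR.
    }

    /// Adds a `contains` edge; inserting the same edge twice is a no-op.
    pub fn insert_contains(&mut self, from_id: &str, to_id: &str) {
        self.contains.insert((from_id.to_string(), to_id.to_string()));
    }
}
```

Whether the new claimer's edge is inserted before or after the entity upsert, the model ends with only the edge from the current parent, because the prune never matches `from_id == parent_id`.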

Verification

  • Writer-level regression test (deterministic, no file-order dependence): reclaiming_entity_under_a_new_parent_prunes_the_stale_contains_edge.
  • Both existing parent-contains validations still reject genuinely-broken graphs (missing edge; orphan edge).
  • Full floor green: fmt, clippy --all-targets, nextest 1948, doc, deny.
  • End-to-end: re-running the original failing repro (analyze <lacuna> --no-incremental) now completes and collapses the module's two stale contains edges to the single consistent one.

Caveats (recorded honestly)

  • Existing corrupt indexes heal on the next full re-analysis, or on any run that re-emits the entity (the move case is covered, since the changed new file re-emits). A plain incremental run that re-emits neither colliding file will not heal the index until a full pass.
  • Deeper systemic note (out of scope here): periodic auto-commit persists intermediate state without validation, so a run that fails the final check can still leave durable partial state. This fix makes such state self-heal on re-emit but does not change the commit/validation cadence.

🤖 Generated with Claude Code

… (clarion-abda98c869)

A file_scope entity's `parent_id` and its `contains` edge are dual encodings of
one fact (ADR-026). When the claiming file moves between runs — a module/package
dual-claim whose claimer flips (e.g. Python `colliding.py` vs
`colliding/__init__.py` both declaring module `specimen.colliding`), or a
genuine entity move — the new claimer sets `parent_id` + its own `contains`
edge, but the PREVIOUS claimer's `contains` edge is never pruned. The two edges
then disagree with the single `parent_id`, and `parent_contains_mismatch` aborts
the entire run at flush/commit with LMWV-INFRA-PARENT-CONTAINS-MISMATCH. The
collect_source_files sort (PR #57) only made the *move* case deterministic; the
dual-claim case fails regardless of order, and any index already carrying a
stale edge fails every subsequent full `analyze`.

Fix: in the writer's `insert_entity`, after the entity upsert, prune any
`contains` edge into the entity whose `from_id != parent_id` — in the same
transaction as the parent write. `parent_id` is authoritative, so this only
removes contradictions, never the matching edge (`from_id == parent_id`); a
root entity (no parent_id) is left untouched. The result is order-independent
(the matching edge is never pruned regardless of entity-vs-edge insert order)
and self-healing: an already-corrupted index recovers on the next analysis that
re-emits the entity. Verified: re-running the original failing repro
(`analyze <lacuna> --no-incremental`) now completes and collapses the module's
two stale contains edges to the single consistent one.

Both existing parent-contains validations still reject genuinely-broken graphs
(missing edge; orphan edge / no parent) — the prune fires only when parent_id is
set and only deletes from_id != parent_id.

Regression test (writer-level, deterministic, no file-order dependence):
reclaiming_entity_under_a_new_parent_prunes_the_stale_contains_edge.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8a733289bb

// The claiming file's own contains edge (from_id == parent_id) is never
// touched, so this only removes contradictions, never the matching edge; an
// entity with no parent_id (a root) is left entirely alone.
if let Some(parent_id) = entity.parent_id.as_deref() {

[P2] Prune stale contains edges when an entity becomes a root

When an existing contained entity is re-emitted as a root (parent_id = None), this new self-heal path is skipped entirely, leaving the old contains edge into that entity in the database. The commit/flush inverse check rejects contains rows whose child now has parent_id IS NULL, so legitimate moves/promotions from nested-to-root still fail with LMWV-INFRA-PARENT-CONTAINS-MISMATCH instead of self-healing; the None case should delete all incoming contains edges for this entity.
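A sketch of what this suggestion might look like, written against a plain edge list rather than the real writer. The `Edge` alias and `prune_contains` function are hypothetical names for illustration. The `None` arm reflects the reviewer's proposal, not behaviour the PR description claims.

```rust
/// Hypothetical edge representation: (from_id, to_id).
pub type Edge = (String, String);

/// Removes `contains` edges into `entity_id` that contradict `parent_id`.
/// With a parent, only the edge from that parent survives; with no parent
/// (a root), every incoming edge is removed, as the review proposes.
pub fn prune_contains(edges: &mut Vec<Edge>, entity_id: &str, parent_id: Option<&str>) {
    edges.retain(|(from, to)| {
        if to.as_str() != entity_id {
            // Edges into other entities are never touched.
            return true;
        }
        match parent_id {
            Some(parent) => from.as_str() == parent,
            None => false,
        }
    });
}
```

Under this variant, re-emitting a previously nested entity as a root would drop its old incoming edge instead of leaving it for the inverse check to reject.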

@tachyon-beep tachyon-beep merged commit 88bdf4d into main Jun 25, 2026
4 checks passed

1 participant