fix(storage): heal stale contains edge on file_scope claim move (clarion-abda98c869) #75
… (clarion-abda98c869)

A `file_scope` entity's `parent_id` and its `contains` edge are dual encodings of one fact (ADR-026). When the claiming file moves between runs (a module/package dual-claim whose claimer flips, e.g. Python `colliding.py` vs `colliding/__init__.py` both declaring module `specimen.colliding`, or a genuine entity move), the new claimer sets `parent_id` + its own `contains` edge, but the PREVIOUS claimer's `contains` edge is never pruned. The two edges then disagree with the single `parent_id`, and `parent_contains_mismatch` aborts the entire run at flush/commit with `LMWV-INFRA-PARENT-CONTAINS-MISMATCH`.

The `collect_source_files` sort (PR #57) only made the *move* case deterministic; the dual-claim case fails regardless of order, and any index already carrying a stale edge fails every subsequent full `analyze`.

Fix: in the writer's `insert_entity`, after the entity upsert, prune any `contains` edge into the entity whose `from_id != parent_id`, in the same transaction as the parent write. `parent_id` is authoritative, so this only removes contradictions, never the matching edge (`from_id == parent_id`); a root entity (no `parent_id`) is left untouched. The result is order-independent (the matching edge is never pruned regardless of entity-vs-edge insert order) and self-healing: an already-corrupted index recovers on the next analysis that re-emits the entity.

Verified: re-running the original failing repro (`analyze <lacuna> --no-incremental`) now completes and collapses the module's two stale `contains` edges to the single consistent one. Both existing parent-contains validations still reject genuinely broken graphs (missing edge; orphan edge / no parent): the prune fires only when `parent_id` is set and only deletes edges with `from_id != parent_id`.

Regression test (writer-level, deterministic, no file-order dependence): `reclaiming_entity_under_a_new_parent_prunes_the_stale_contains_edge`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
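A minimal in-memory sketch of the prune rule described above, assuming a toy `Index` (a `HashMap` of `parent_id`s plus a `Vec` of `(from_id, to_id)` `contains` edges) in place of the real writer's tables. `Index`, `insert_contains`, `incoming`, and `parent_of` are hypothetical names; only `insert_entity` and the `from_id != parent_id` condition come from the PR. Like the commit, it leaves a root entity untouched.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the writer's entity and edge tables.
#[derive(Default)]
struct Index {
    /// entity id -> parent_id (None for a root entity)
    parents: HashMap<String, Option<String>>,
    /// `contains` edges stored as (from_id, to_id)
    contains: Vec<(String, String)>,
}

impl Index {
    /// Upsert the entity, then prune every `contains` edge into it whose
    /// from_id disagrees with parent_id. A root (parent_id == None) is left
    /// alone, mirroring the commit as reviewed.
    fn insert_entity(&mut self, id: &str, parent_id: Option<&str>) {
        self.parents.insert(id.to_string(), parent_id.map(str::to_string));
        if let Some(parent) = parent_id {
            // Keep edges into other entities, and the edge matching parent_id.
            self.contains.retain(|(from, to)| to != id || from == parent);
        }
    }

    /// Add a `contains` edge; re-adding an existing edge is a no-op.
    fn insert_contains(&mut self, from: &str, to: &str) {
        let edge = (from.to_string(), to.to_string());
        if !self.contains.contains(&edge) {
            self.contains.push(edge);
        }
    }

    /// from_ids of all `contains` edges pointing at `id`.
    fn incoming(&self, id: &str) -> Vec<String> {
        self.contains
            .iter()
            .filter(|(_, to)| to == id)
            .map(|(from, _)| from.clone())
            .collect()
    }

    /// Current parent_id of `id`, if the entity exists and is not a root.
    fn parent_of(&self, id: &str) -> Option<&str> {
        self.parents.get(id).and_then(|p| p.as_deref())
    }
}
```

Because the matching edge always survives the `retain`, it does not matter whether the new claimer's edge is written before or after the entity upsert, and a stale edge left by an earlier run is dropped the next time the entity is re-emitted.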
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8a733289bb
```rust
// The claiming file's own contains edge (from_id == parent_id) is never
// touched, so this only removes contradictions, never the matching edge; an
// entity with no parent_id (a root) is left entirely alone.
if let Some(parent_id) = entity.parent_id.as_deref() {
```
Prune stale contains edges when an entity becomes a root
When an existing contained entity is re-emitted as a root (`parent_id = None`), this new self-heal path is skipped entirely, leaving the old `contains` edge into that entity in the database. The commit/flush inverse check rejects `contains` rows whose child now has `parent_id IS NULL`, so legitimate moves/promotions from nested to root still fail with `LMWV-INFRA-PARENT-CONTAINS-MISMATCH` instead of self-healing; the `None` case should delete all incoming `contains` edges for this entity.
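One way to fold the reviewer's suggestion into the same rule, sketched as a pure predicate over the `from_id`s of an entity's incoming `contains` edges. The function name and signature are hypothetical and not part of the PR: with a parent, only edges from some other parent contradict it; with no parent, every incoming edge does.

```rust
/// Hypothetical helper: given the from_ids of an entity's incoming `contains`
/// edges, return the ones that contradict its parent_id and should be deleted.
fn stale_contains_sources<'a>(incoming_from: &[&'a str], parent_id: Option<&str>) -> Vec<&'a str> {
    incoming_from
        .iter()
        .copied()
        .filter(|from| match parent_id {
            // Nested entity: only an edge from a different parent is stale.
            Some(parent) => *from != parent,
            // Root entity (parent_id IS NULL): every incoming edge is stale.
            None => true,
        })
        .collect()
}
```

In SQL terms, the `None` arm amounts to dropping the `from_id != parent_id` condition and deleting every `contains` row that points at the entity.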
## What

A `file_scope` entity's `parent_id` and its `contains` edge are dual encodings of one fact (ADR-026). When the claiming file moves between runs (a module/package dual-claim whose claimer flips, e.g. Python `colliding.py` vs `colliding/__init__.py` both declaring module `specimen.colliding`, or a genuine entity move), the new claimer sets `parent_id` + its own `contains` edge, but the previous claimer's `contains` edge is never pruned. The two edges then contradict the single `parent_id`, and `parent_contains_mismatch` aborts the whole run at flush/commit with `LMWV-INFRA-PARENT-CONTAINS-MISMATCH`.

The `collect_source_files` sort (PR #57) only made the move case deterministic; the dual-claim case fails regardless of order, and any index already carrying a stale edge fails every subsequent full `analyze`.

## Root cause (systematic-debugging)
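To make the failure concrete, here is a hypothetical in-memory version of the two-directional check that `parent_contains_mismatch` is described as enforcing. As assumptions for illustration: an entity with a `parent_id` must have exactly one incoming `contains` edge, from that parent, and no `contains` edge may point at an entity whose `parent_id` is `NULL`. The function and its signature are not the project's; the real check runs against the index at flush/commit.

```rust
use std::collections::HashMap;

/// Hypothetical check: returns ids of entities whose parent_id and incoming
/// `contains` edges disagree. `contains` holds (from_id, to_id) pairs.
fn parent_contains_mismatches(
    parents: &HashMap<&str, Option<&str>>,
    contains: &[(&str, &str)],
) -> Vec<String> {
    let mut bad = Vec::new();
    // Forward: an entity with a parent needs exactly one edge, from that parent.
    for (&id, &parent) in parents {
        if let Some(parent) = parent {
            let incoming: Vec<&str> = contains
                .iter()
                .filter(|(_, to)| *to == id)
                .map(|(from, _)| *from)
                .collect();
            if incoming != [parent] {
                bad.push(id.to_string());
            }
        }
    }
    // Inverse: a contains edge into an entity whose parent_id is NULL.
    for &(_, to) in contains {
        if let Some(None) = parents.get(to) {
            bad.push(to.to_string());
        }
    }
    // HashMap iteration order is unspecified, so normalise the result.
    bad.sort();
    bad.dedup();
    bad
}
```

With the previous claimer's stale edge still present, `specimen.colliding` is reported; once only the edge matching `parent_id` remains, the check passes.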
Established from real DB state + controlled reproduction, not assumption:
- … `contains` edge lingers. Reproduced deterministically (writer-level + the lacuna specimen).
- … `bump_writes_and_maybe_commit`) commits intermediate state without validation, which is how an inconsistent state becomes durable (and why a later full run trips the global check).

## Fix
In the writer's `insert_entity`, after the entity upsert, prune any `contains` edge into the entity whose `from_id != parent_id`, in the same transaction as the parent write. `parent_id` is authoritative, so this only removes contradictions, never the matching edge (`from_id == parent_id`); a root entity (no `parent_id`) is untouched.

## Verification
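- Regression test (writer-level, deterministic, no file-order dependence): `reclaiming_entity_under_a_new_parent_prunes_the_stale_contains_edge`.
- … `--all-targets`, nextest 1948, doc, deny.
- Re-running the original failing repro (`analyze <lacuna> --no-incremental`) now completes and collapses the module's two stale `contains` edges to the single consistent one.

## Caveats (recorded honestly)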
🤖 Generated with Claude Code