
[critical] Durability vector over-advances from the live index after a push #103

Description

@mbertschler

Summary

Every handler advances the durability vector by reading the live present set at advance time (AdvanceDestinationVector → presentOriginMaxima). The advance runs after the transfer/upload, and index runs are not mutually exclusive with syncs. As a result, a row committed between the push's enumeration and the advance is claimed durable on the destination even though it was never transferred. For the content-addressed layout this never self-heals.
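A minimal Go sketch of the over-advance under a simplified model, assuming each vector component is just the per-origin maximum status_changed_run_id. Row and originMaxima are hypothetical stand-ins (originMaxima plays the role attributed to presentOriginMaxima, but over a slice rather than a live table). The only difference between the buggy and fixed advance is which set the maxima are computed from.

```go
package main

import "fmt"

// Row is a hypothetical stand-in for an index row: the origin that wrote it
// and the run that last changed its status (status_changed_run_id).
type Row struct {
	Origin string
	RunID  int64
}

// originMaxima returns, per origin, the highest RunID among rows.
// Hypothetical analogue of presentOriginMaxima over an explicit slice.
func originMaxima(rows []Row) map[string]int64 {
	m := make(map[string]int64)
	for _, r := range rows {
		if r.RunID > m[r.Origin] {
			m[r.Origin] = r.RunID
		}
	}
	return m
}

func main() {
	live := []Row{{"self", 90}, {"self", 95}}

	// T1: the push enumerates; this copy is exactly what it will transfer.
	snapshot := append([]Row(nil), live...)

	// A concurrent index run commits a new row after T1.
	live = append(live, Row{"self", 100})

	// ... the transfer of snapshot completes ...
	buggy := originMaxima(live)     // re-reads the live set post-transfer
	fixed := originMaxima(snapshot) // uses the push's own enumeration
	fmt.Println(buggy["self"], fixed["self"]) // 100 95
}
```

Advancing to 100 claims run 100's row durable although it was never in the snapshot; advancing to 95 claims only what was actually pushed.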

Where

  • sync/sync.go (RunPair advances from the live set after a verified push)
  • sync/node.go (peer close-phase advance)
  • sync/content_addressed.go (transactional advance; delta fixed at push start, advance reads live set)
  • store/runs.go (BeginSyncRunIfClear blocks only same-pair syncs; BeginIndexRunIfClear blocks only index/audit runs; there is no cross-kind exclusion)

Scenario

Content-addressed (permanent): index run 100 and content-addressed sync run 101 run concurrently (nothing blocks the pair). Sync 101 fixes its delta at T1, uploads, and writes its segment. Index 100 commits a new file row stamped status_changed_run_id = 100 after T1. Sync 101 then advances the vector from the live set → self component ≥ 100, even though that object was never uploaded. The next delta watermark is 101, and only rows with status_changed_run_id > 101 are exported, so the row (100 ≤ 101) is never exported in any future manifest. The vector permanently claims durability for an object that does not exist on the destination → offload deletes the only copy.
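The permanent case can be reproduced with a toy model of the watermark export. This is a sketch under stated assumptions, not the real exporter: delta and maxChanged are hypothetical helpers, the export rule "status_changed_run_id > watermark" is inferred from the scenario above, and the previous watermark of 90 is an arbitrary choice.

```go
package main

import (
	"fmt"
	"sort"
)

// delta returns the paths whose status_changed_run_id exceeds watermark.
// Hypothetical model of the content-addressed manifest export.
func delta(rows map[string]int64, watermark int64) []string {
	var out []string
	for path, changed := range rows {
		if changed > watermark {
			out = append(out, path)
		}
	}
	sort.Strings(out) // map iteration order is random; sort for stable output
	return out
}

// maxChanged mimics advancing the self component from the live set (the bug).
func maxChanged(rows map[string]int64) int64 {
	var m int64
	for _, changed := range rows {
		if changed > m {
			m = changed
		}
	}
	return m
}

func main() {
	rows := map[string]int64{"a.jpg": 95}
	uploaded := map[string]bool{}

	// Sync 101 fixes its delta at T1 (previous watermark assumed to be 90).
	for _, p := range delta(rows, 90) {
		uploaded[p] = true
	}

	// Index run 100 commits a new row after T1.
	rows["b.jpg"] = 100

	// Sync 101 advances from the live set and moves the watermark to 101.
	selfComponent := maxChanged(rows)
	nextWatermark := int64(101)

	fmt.Println(delta(rows, nextWatermark))                        // []
	fmt.Println(rows["b.jpg"] <= selfComponent, uploaded["b.jpg"]) // true false
}
```

The last line is the data-loss condition: the vector covers b.jpg, no delta will ever include it, and it was never uploaded.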

Mirror/kopia (self-heals, but with an offload window): a file that lands and is indexed during a multi-hour rclone transfer is covered by the post-transfer advance even though rclone never copied it. The next sync re-uploads it, but an offload that runs inside that window deletes a falsely durable file.

Fix shape

  1. Compute the origin maxima to advance once, from the push's own enumeration snapshot (the delta for content-addressed, the plan for peer, the pre-transfer listing for mirror), and advance to exactly those components; never re-read the live table after the transfer.
  2. Make content-addressed sync and index/audit runs mutually exclusive per volume (extend both *IfClear gates cross-kind, the way BeginOffloadRunIfClear already blocks on every running kind).

The freshness condition in the sibling gate issue is the belt to these suspenders: even if a stale advance slips through, a path that became present after the last push fails the gate.

Source: adversarial audit of offload-v1 (auditor B: CRITICAL-1 / HIGH-2; auditor D: F2).

Metadata


Assignees

No one assigned

Labels

bug (Something isn't working)
data-loss (Could cause silent data loss)
security (Security / data-integrity finding)
