offload: time-based durability-evidence staleness policy (#131)#134
Open
mbertschler wants to merge 1 commit into
The offload gate weighed only version-vector staleness: a component was stale when its covered origin run fell below the file's origin run. It had no notion of freshness in wall-clock time, so evidence for a destination dead or unreachable for months still gated offload indefinitely, letting the laptop delete local bytes on the strength of a claim never since re-confirmed (issue #131). This is defence-in-depth, not a live data-loss path: coverage was already correct, only freshness was missing.

Migration v23 adds a nullable verified_at_ns to destination_run_ids. updated_at_ns keeps its meaning (the last applied write, bumped even by an equal-value re-confirmation); verified_at_ns advances only on a write backed by genuine re-verification (a content-verified method or a strict run advance), so a no-op touch never makes stale evidence look freshly checked. The history table is unchanged (it already records each advance's at_ns) and the gate reads only the live vector. NULL is fail-closed: an unknown verification time reads as infinitely stale.

The gate gains an injected nowNs (captured once per invocation) and an opt-in offload_max_evidence_age knob (per volume, default disabled so existing configs are unaffected). When set, a required target whose verified_at_ns is unknown or older than the max age is refused, fail-closed, naming the age and provenance.

For a peer-asserted component, verified_at_ns records when this node last pulled a fresh assertion (the peer's own verification instant never travels the wire), so the policy bounds how long relayed evidence is trusted without hearing from the peer again, the dead-peer defence the issue calls for.

The staleness policy only refuses to offload; it never deletes or expires any evidence row.

Closes #131
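The freshness check described above can be sketched roughly as follows. This is an illustration only and does not reproduce the code in the PR: the `evidence` type, the `staleReason` function, and the target and peer names are hypothetical, and two behaviours are assumptions on my part (an empty `SourceNodeID` means locally verified evidence, and evidence exactly at the max age still passes).

```go
package main

import (
	"fmt"
	"time"
)

// evidence is a hypothetical stand-in for one component of the live vector.
type evidence struct {
	Target       string
	VerifiedAtNs *int64 // nil models a NULL verified_at_ns
	SourceNodeID string // assumption: empty means locally verified
}

// staleReason returns a non-empty refusal reason when the evidence for a
// target is too old, and "" when the target passes the freshness check.
func staleReason(e evidence, nowNs int64, maxAge time.Duration) string {
	if maxAge <= 0 {
		return "" // zero = no max age: the policy is disabled
	}
	provenance := "local verification"
	if e.SourceNodeID != "" {
		provenance = "asserted by peer " + e.SourceNodeID
	}
	if e.VerifiedAtNs == nil {
		// Fail-closed: an unknown verification time reads as infinitely stale.
		return fmt.Sprintf("%s: verification time unknown (%s)", e.Target, provenance)
	}
	age := time.Duration(nowNs - *e.VerifiedAtNs)
	if age > maxAge {
		return fmt.Sprintf("%s: evidence is %s old, max %s (%s)", e.Target, age, maxAge, provenance)
	}
	return ""
}

func main() {
	// The clock is read once, so every candidate is judged against one instant.
	nowNs := time.Now().UnixNano()
	fresh := nowNs - int64(time.Hour)
	old := nowNs - int64(45*24*time.Hour)
	maxAge := 720 * time.Hour

	candidates := []evidence{
		{Target: "nas", VerifiedAtNs: &fresh},
		{Target: "cloud", VerifiedAtNs: &old, SourceNodeID: "desktop"},
		{Target: "usb"}, // never verified: NULL
	}
	for _, e := range candidates {
		if reason := staleReason(e, nowNs, maxAge); reason != "" {
			fmt.Println("refuse:", reason)
			continue
		}
		fmt.Println("ok:", e.Target)
	}
}
```

Passing `nowNs` in as a parameter, rather than reading the clock inside the check, is what lets a test pin the instant and assert on exact ages.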
Stacked on #133 — merge #133 first
This PR is stacked on #133 (branch `issue-104-durability-provenance`, the per-peer provenance tagging). Its base is that branch, so the diff here shows only the #131 work. Merge #133 before this. Once #133 merges to `main`, GitHub will retarget this PR to `main`.

Closes #131
What this does
The offload durability gate weighed only version-vector staleness (a component is stale when its covered origin run < the file's origin run). It had no notion of wall-clock freshness, so evidence for a destination dead/unreachable for months still gated `offload` indefinitely: the laptop kept deleting local bytes on the strength of a "durable" claim never since re-confirmed. This is defence-in-depth, not a live data-loss path: coverage was already correct; only time-based freshness was missing. It pairs with the periodic re-verification story (squirrel verify, scrub).

Store (migration v23)
- Adds a nullable `verified_at_ns` to `destination_run_ids`. `updated_at_ns` keeps its meaning (last applied write, bumped even by an equal-value re-confirmation). `verified_at_ns` advances only on a write backed by genuine re-verification (a content-verified method, or a strict run advance), so a no-op touch never makes stale evidence look freshly checked.
- `destination_run_ids_history` is unchanged (it already records each advance's `at_ns`); the gate reads only the live vector. No index (the gate loads the whole vector and filters in Go).

Gate (`offload/gate.go`)

- Injected `nowNs` (captured once per invocation in `Offload`, so all candidates are judged against one instant and tests are deterministic) plus a `maxEvidenceAge`.
- A `staleEvidenceFailure` check, applied per target after coverage passes: refuses when `verified_at_ns` is unknown or older than `now − maxEvidenceAge`, fail-closed, naming the age and provenance.

Config + CLI
- Per-volume `offload_max_evidence_age` knob (e.g. `"720h"`), plumbed through `Options.MaxEvidenceAge`. Default disabled (zero = no max age).

Decisions (need your sign-off)
- Default: disabled (cf. `offload_requires`); existing configs are unaffected and no one is surprised by new refusals. Opt-in via the knob.
- Separate `verified_at_ns` column vs reuse `updated_at_ns`. I added a separate column. Reusing `updated_at_ns` is unsound because an equal-value re-advance bumps it without any re-verification (the issue's own concern), so its timestamp wouldn't imply "recently checked." `verified_at_ns` advances only on genuine re-verification, leaving `updated_at_ns` semantics intact for any existing reader. Tradeoff: +1 nullable column + a migration vs. zero schema change; I judged the soundness worth it. NULL (pre-v23 rows, or methodless-only advances) is fail-closed.
- Peer-asserted evidence. For a peer-asserted component, `verified_at_ns` records when this node last pulled a fresh assertion; the peer's own verification instant never travels the wire. So the policy bounds how long this node trusts relayed evidence without hearing from the peer again (the dead-peer defence). The refusal names the asserting peer, not the local node. A live peer's pull carrying a verified method re-stamps `verified_at_ns`, keeping its relayed evidence fresh.

#104 / #133 regression check
No change to #133's provenance logic; I only read `VerifiedAtNs` alongside `SourceNodeID` in the gate and SELECTs. The fail-closed freshness behavior (push-watermark / push-freshness) is untouched; the new check is an additional, independent refusal. The wire protocol (`syncproto`) is unchanged.

Tests (deterministic, injected clock)
- `store`: v22→v23 migration adds `verified_at_ns` NULL on carried rows; verified advance stamps it; an equal-value methodless re-confirm bumps `updated_at_ns` but not `verified_at_ns`; a strict methodless advance re-stamps it.
- `offload`: stale evidence refuses; fresh passes; disabled policy is a no-op; NULL is fail-closed; peer-relayed staleness ages out and names the peer; end-to-end `Offload` with a max age deletes a freshly-verified file.
- `config`: knob parses; absent defaults to zero; garbage/sub-second/unitless rejected.
- `go vet ./...`, `go test ./...`, `golangci-lint run` all green; `store/schema.sql` regenerated at v23 (`TestSchemaSnapshot` passes).
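The three store cases in the test list above (verified advance, equal-value methodless re-confirm, strict methodless advance) follow from one stamping rule. Below is a minimal sketch of that rule, assuming a hypothetical in-memory `row` type and `apply` function in place of the PR's actual store code and SQL. Ignoring a write whose run moves backwards is also an assumption, since the description does not cover that case.

```go
package main

import "fmt"

// row is a hypothetical in-memory stand-in for one destination_run_ids row.
type row struct {
	CoveredRun   int64
	UpdatedAtNs  int64
	VerifiedAtNs *int64 // nil models NULL (for example a pre-v23 row)
}

// apply records a write claiming coverage up to run at time nowNs.
// contentVerified models a write backed by a content-verified method.
func apply(r row, run int64, contentVerified bool, nowNs int64) row {
	if run < r.CoveredRun {
		return r // assumption: the live vector never moves backwards
	}
	strictAdvance := run > r.CoveredRun
	r.CoveredRun = run
	r.UpdatedAtNs = nowNs // bumped by every applied write, even a no-op touch
	if contentVerified || strictAdvance {
		v := nowNs
		r.VerifiedAtNs = &v // stamped only on genuine re-verification
	}
	return r
}

func main() {
	r := apply(row{}, 5, true, 100) // verified advance: stamps verified_at_ns
	r = apply(r, 5, false, 200)     // equal-value methodless touch: only updated_at_ns moves
	fmt.Println("updated:", r.UpdatedAtNs, "verified:", *r.VerifiedAtNs)
	r = apply(r, 6, false, 300) // strict methodless advance: re-stamps verified_at_ns
	fmt.Println("updated:", r.UpdatedAtNs, "verified:", *r.VerifiedAtNs)
}
```

Keeping the two timestamps separate means an existing reader of `updated_at_ns` sees no change in behaviour, while the gate gets a timestamp that a no-op touch cannot refresh.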