offload: time-based durability-evidence staleness policy (#131) by mbertschler · Pull Request #134 · mbertschler/squirrel

mbertschler · 2026-06-20T23:50:49Z

Stacked on #133 — merge #133 first

This PR is stacked on #133 (branch issue-104-durability-provenance, the per-peer provenance tagging). Its base is that branch, so the diff here shows only the #131 work. Merge #133 before this. Once #133 merges to main, GitHub will retarget this PR to main.

Closes #131

What this does

The offload durability gate weighed only version-vector staleness: a component counted as stale when its covered origin run was lower than the file's origin run. It had no notion of wall-clock freshness, so evidence for a destination that had been dead or unreachable for months still satisfied the gate indefinitely, and the laptop kept deleting local bytes on the strength of a "durable" claim that had never been re-confirmed. This is defence-in-depth, not a live data-loss path: coverage was already correct; only time-based freshness was missing. It pairs with the periodic re-verification story (squirrel verify, scrub).

Store (migration v23)

Adds a nullable verified_at_ns to destination_run_ids.
updated_at_ns keeps its meaning (last applied write, bumped even by an equal-value re-confirmation). verified_at_ns advances only on a write backed by genuine re-verification — a content-verified method, or a strict run advance — so a no-op touch never makes stale evidence look freshly checked.
destination_run_ids_history is unchanged (it already records each advance's at_ns); the gate reads only the live vector. No index (the gate loads the whole vector and filters in Go).
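The stamping rule above can be sketched as a small pure function. This is only an illustration under stated assumptions: `Row`, `Advance`, and the `contentVerified` flag are hypothetical names rather than the store's real API, and dropping a run regression is assumed, not taken from this PR.

```go
package main

// Row is a hypothetical, simplified stand-in for one destination_run_ids entry.
type Row struct {
	CoveredRun   int64  // covered origin run
	UpdatedAtNs  int64  // last applied write
	VerifiedAtNs *int64 // nil models SQL NULL (never genuinely verified)
}

// Advance applies a write observed at nowNs and returns the updated row.
// contentVerified marks a write backed by a content-verified method
// (an assumed flag, not the real store signature).
func Advance(r Row, newRun int64, contentVerified bool, nowNs int64) Row {
	if newRun < r.CoveredRun {
		return r // assumption: a regression is ignored entirely
	}
	strict := newRun > r.CoveredRun
	r.CoveredRun = newRun
	r.UpdatedAtNs = nowNs // bumped on every applied write, even an equal-value one
	if contentVerified || strict {
		v := nowNs
		r.VerifiedAtNs = &v // only genuine re-verification refreshes this
	}
	return r
}
```

With this shape, an equal-value methodless re-confirm moves `UpdatedAtNs` but leaves `VerifiedAtNs` untouched, which is the property the separate column exists to guarantee.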

Gate (`offload/gate.go`)

Injected nowNs (captured once per invocation in Offload, so all candidates are judged against one instant and tests are deterministic) + a maxEvidenceAge.
New staleEvidenceFailure check, applied per target after coverage passes: refuses when verified_at_ns is unknown or older than now − maxEvidenceAge, fail-closed, naming the age and provenance.

Config + CLI

New per-volume offload_max_evidence_age knob (e.g. "720h"), plumbed through Options.MaxEvidenceAge. Default disabled (zero = no max age).

Decisions (need your sign-off)

Refuse vs warn → refuse (fail-closed). A staleness policy must refuse to offload when evidence is stale, never delete/expire evidence rows. The policy only blocks the delete; no rows are touched.
Default → disabled (zero). Matches the existing "explicit opt-in, no default" philosophy of offload_requires; existing configs are unaffected and no one is surprised by new refusals. Opt-in via the knob.
New verified_at_ns column vs reuse updated_at_ns. I added a separate column. Reusing updated_at_ns is unsound because an equal-value re-advance bumps it without any re-verification (the issue's own concern) — a timestamp wouldn't imply "recently checked." verified_at_ns advances only on genuine re-verification, leaving updated_at_ns semantics intact for any existing reader. Tradeoff: +1 nullable column + a migration vs. zero schema change; I judged the soundness worth it. NULL (pre-v23 rows, or methodless-only advances) is fail-closed.
Peer-relayed freshness. For a peer-asserted component (built on #133's provenance), verified_at_ns records when this node last pulled a fresh assertion; the peer's own verification instant never travels the wire. So the policy bounds how long this node trusts relayed evidence without hearing from the peer again (the dead-peer defence). The refusal names the asserting peer, not the local node. A live peer's pull carrying a verified method re-stamps verified_at_ns, keeping its relayed evidence fresh.

#104 / #133 regression check

No change to #133's provenance logic; I only read VerifiedAtNs alongside SourceNodeID in the gate and SELECTs. The fail-closed freshness behavior (push-watermark / push-freshness) is untouched — the new check is an additional, independent refusal. The wire protocol (syncproto) is unchanged.

Tests (deterministic, injected clock)

store: v22→v23 migration adds verified_at_ns NULL on carried rows; verified advance stamps it; an equal-value methodless re-confirm bumps updated_at_ns but not verified_at_ns; a strict methodless advance re-stamps it.
offload: stale evidence refuses; fresh passes; disabled policy is a no-op; NULL is fail-closed; peer-relayed staleness ages out and names the peer; end-to-end Offload with a max age deletes a freshly-verified file.
config: knob parses; absent defaults to zero; garbage/sub-second/unitless rejected.

go vet ./..., go test ./..., golangci-lint run all green; store/schema.sql regenerated at v23 (TestSchemaSnapshot passes).

The offload gate weighed only version-vector staleness: a component was stale when its covered origin run fell below the file's origin run. It had no notion of freshness in wall-clock time, so evidence for a destination dead or unreachable for months still gated offload indefinitely, letting the laptop delete local bytes on the strength of a claim never since re-confirmed (issue #131). This is defence-in-depth, not a live data-loss path: coverage was already correct, only freshness was missing.

Migration v23 adds a nullable verified_at_ns to destination_run_ids. updated_at_ns keeps its meaning (the last applied write, bumped even by an equal-value re-confirmation); verified_at_ns advances only on a write backed by genuine re-verification (a content-verified method or a strict run advance), so a no-op touch never makes stale evidence look freshly checked. The history table is unchanged (it already records each advance's at_ns) and the gate reads only the live vector. NULL is fail-closed: an unknown verification time reads as infinitely stale.

The gate gains an injected nowNs (captured once per invocation) and an opt-in offload_max_evidence_age knob (per volume, default disabled so existing configs are unaffected). When set, a required target whose verified_at_ns is unknown or older than the max age is refused, fail-closed, naming the age and provenance.

For a peer-asserted component verified_at_ns records when this node last pulled a fresh assertion (the peer's own verification instant never travels the wire), so the policy bounds how long relayed evidence is trusted without hearing from the peer again, the dead-peer defence the issue calls for. The staleness policy only refuses to offload; it never deletes or expires any evidence row.

Closes #131

mbertschler mentioned this pull request Jun 21, 2026

store: harden v13→v14 backfill + broaden migration integrity corpus (#113) #135

Open

mbertschler wants to merge 1 commit into issue-104-durability-provenance from issue-131-offload-staleness
