
Schema & robustness hardening: contents immutability triggers (v21) + #114 cluster #125

Merged
mbertschler merged 7 commits into offload-v1 from fix-schema-robustness
Jun 11, 2026

@mbertschler

Owner

Final hardening PR of the adversarial-audit follow-up. Schema defense-in-depth (#113) plus the robustness cluster (#114). Every change is additive or a refusal; nothing relaxes an existing guarantee.

Addresses #113
Addresses #114

Schema change (v21)

New migration migrateV20ToV21 (additive; no existing migration edited) installs two triggers on contents, restoring the schema-level immutability the AGENTS.md append-only contract implies and that the v13→v14 reshape dropped along with files_blake3_immutable:

  • contents_no_update: BEFORE UPDATE ON contents → RAISE(ABORT, …)
  • contents_no_delete: BEFORE DELETE ON contents → RAISE(ABORT, …)

store/schema.sql regenerated via go test ./store -update-schema; TestSchemaSnapshot green. Defense-in-depth: the auditor confirmed there is no UPDATE/DELETE path on contents today.
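
As a rough illustration of the trigger shape described above, here is a minimal sketch that only builds the DDL strings. It assumes SQLite syntax (suggested by RAISE(ABORT, …)); the helper name immutabilityTriggers and the abort messages are placeholders, not what migrateV20ToV21 actually contains.

```go
package main

import (
	"fmt"
	"strings"
)

// immutabilityTriggers returns SQLite DDL that rejects UPDATE and DELETE on
// table. The trigger names follow the <table>_no_<op> pattern from the PR;
// the abort messages are placeholders, not the real ones.
func immutabilityTriggers(table string) []string {
	var stmts []string
	for _, op := range []string{"UPDATE", "DELETE"} {
		stmts = append(stmts, fmt.Sprintf(
			"CREATE TRIGGER %s_no_%s BEFORE %s ON %s BEGIN SELECT RAISE(ABORT, '%s is append-only: %s refused'); END;",
			table, strings.ToLower(op), op, table, table, op))
	}
	return stmts
}

func main() {
	// Print the two statements a v21-style migration might execute.
	for _, stmt := range immutabilityTriggers("contents") {
		fmt.Println(stmt)
	}
}
```

In a real migration these statements would run inside the same transaction that bumps schema_version, so that the triggers and the version row land together.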

#113 — index/migration integrity

| Item | Disposition |
| --- | --- |
| 12a contents immutability trigger | Landed: v21 triggers above. |
| 12b v14 backfill same-hash-different-size guard | N/A in v14: cannot edit the v14 migration (immutable-migration rule). The runtime equivalent (item 2 below) is degenerate post-v14. |
| 12c migration test shapes | Landed: TestMigrateV18ChainToV21 drives a populated v18 fixture (contents, files, remote_objects, destination_run_ids) through v19→v21, asserts the offload-substrate rows survive (verify_method NULL-backfilled, remote_objects fingerprint intact), and asserts the contents triggers ABORT an UPDATE and a DELETE. |

Skipped with note — contents/files blake3↔size consistency check (db check): post-v14, contents.blake3 is UNIQUE and files reference content_id (not blake3/size), so a blake3→size disagreement is unrepresentable by construction; a runtime check would have nothing to find. The v21 immutability trigger plus the merged #107 stat-after-hash fix already prevent new occurrences. Did not force a degenerate check in; happy to file a follow-up if a non-degenerate framing is wanted.

#114 — robustness cluster

All five items in scope landed (each a focused change with a test):

  • Run-gate asymmetry — added 'offload' to the blocking sets of BeginSyncRunIfClear and BeginIndexRunIfClear so index/sync refuse while an offload is in flight (offload already blocks on every kind). Tested both directions.
  • FinishHookRun terminal guard — mirrored FinishRun's first-terminal-write-wins guard (read status in the finishing tx, refuse with ErrAlreadyFinished when already terminal). Double-finish test asserts the recorded outcome survives.
  • --checksum blake3 silent degrade — detect rclone's no hashes in common fallback notice in parseJSONLog at any log level (NOTICE-level events were dropped before), flag it on RunResult, and downgrade rcloneVerification to size+mtime/unverified so the durability vector does not advance on a copy rclone never content-checked. The guard only ever downgrades, so a false match can never wrongly mark a run verified (the chosen err-toward-refusing option for this vector-gating item). Detection keys on the stable substring no hashes in common (the trailing verb has varied across rclone versions).
  • config-path ↔ DB-volume-path agreement — added the v.Path != vol.Path cross-check (the same one offload and restore already make) to the shared requireIndexedVolume gate, covering every push handler (bucket, kopia, content-addressed, peer). Mismatch refused before rclone is invoked.
  • kopia repo re-create guard: ensureRepository now creates on connect-fail only when --init is set (the same flag that already gates the local-destination marker bootstrap). Without it, a connect failure is a hard error rather than minting a fresh empty repository while the monotonic, un-rewindable durability vector keeps claiming coverage. Test asserts connect-fail without --init refuses and never creates; --init help text and the kopia/integration tests updated accordingly.

Out of scope (note): the issue body's sixth bullet — durability-evidence staleness policy — was not in this PR's task list and is untouched here; it can stay open on #114 or move to its own follow-up.

Verification

  • go vet ./... clean
  • go test ./... clean (full module)
  • golangci-lint run — 0 issues
  • TestSchemaSnapshot green after regenerating store/schema.sql

Judgment calls

  • v21 is purely additive (trigger DDL + schema_version row), so it uses the plain single-tx recipe (like v18→v20), not the FK-off table rebuild.
  • Extended the minimal v18 fixture in destination_run_ids_test.go with the contents table a real v18 DB carries, since the chain now attaches triggers to it.
  • Reused the existing ErrAlreadyFinished sentinel for the hook guard (it is the runs-family terminal-write sentinel; callers already errors.Is it).
  • For the silent-degrade item, chose output detection over a per-destination hash-capability preflight: detection is self-contained on the JSON log we already parse and is fail-safe (downgrade-only). No follow-up filed — the implemented guard is the conservative one the task asked for.

The v13->v14 reshape dropped files_blake3_immutable without installing
an equivalent guard on the new contents table. v21 re-asserts the
append-only contract the AGENTS.md guarantee implies: BEFORE UPDATE and
BEFORE DELETE triggers on contents that RAISE(ABORT). Defense in depth --
there is no UPDATE/DELETE path on contents today.

Regenerated store/schema.sql via -update-schema; golden test green.
Extended the minimal v18 fixture with the contents table a real v18 DB
carries, so the chain to v21 can attach the triggers.

Addresses #113.

Adds a populated v18 fixture (contents, files, remote_objects,
destination_run_ids) and drives it through the v19-v21 chain, asserting
the offload-substrate rows survive intact (verify_method NULL-backfilled,
remote_objects fingerprint preserved). Asserts the new v21 contents
triggers abort an in-place UPDATE and a DELETE, leaving the row unchanged.

Addresses #113 (#12c migration-shape coverage).

Offload defers to every run kind, but index and sync did not block on an
in-flight offload, so a sync could enumerate (or an index could
observe-and-flip) a tree mid-unlink. Add 'offload' to both begin-gate
blocking sets to make the exclusion symmetric. Tests both directions:
offload blocks a new sync and a new index/audit, and admits them again
once it finishes.
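
The symmetric exclusion can be sketched in memory as below. This is only an illustration of the blocking-set idea: the real gates (BeginSyncRunIfClear, BeginIndexRunIfClear) check the database, and the RunKind type, the beginIfClear helper, the ErrBlocked sentinel, and the non-offload members of each set are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// RunKind and its values are hypothetical names for this sketch.
type RunKind string

const (
	KindIndex   RunKind = "index"
	KindSync    RunKind = "sync"
	KindOffload RunKind = "offload"
)

// ErrBlocked is a hypothetical sentinel for "another run is in flight".
var ErrBlocked = errors.New("blocked by an in-flight run")

// blockingSets maps a run kind to the in-flight kinds that block it.
// The "offload" entries mirror the PR; the other members are placeholders.
var blockingSets = map[RunKind][]RunKind{
	KindIndex:   {KindIndex, KindOffload},
	KindSync:    {KindSync, KindOffload},
	KindOffload: {KindIndex, KindSync, KindOffload}, // offload blocks on every kind
}

// beginIfClear refuses to start kind while any blocking kind is in flight.
func beginIfClear(kind RunKind, inFlight []RunKind) error {
	for _, running := range inFlight {
		for _, blocker := range blockingSets[kind] {
			if running == blocker {
				return fmt.Errorf("%w: %s blocks %s", ErrBlocked, running, kind)
			}
		}
	}
	return nil
}

func main() {
	fmt.Println(beginIfClear(KindSync, []RunKind{KindOffload}))  // refused
	fmt.Println(beginIfClear(KindIndex, []RunKind{KindOffload})) // refused
	fmt.Println(beginIfClear(KindSync, nil))                     // admitted once offload finishes
}
```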

Addresses #114 (run-gate asymmetry).

hook_runs was the only runs-like table without FinishRun's terminal-state
guard: a double finish silently overwrote the first terminal record. Read
the status inside the finishing transaction and refuse with
ErrAlreadyFinished when it is already terminal, leaving the row untouched.
Test a double-finish is refused and the recorded outcome survives.
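
A first-terminal-write-wins guard of this kind might look like the sketch below. It is an in-memory stand-in, not the store code: a mutex plays the role of the finishing transaction, and the hookRun and store types and the status strings are hypothetical. Only the ErrAlreadyFinished name comes from the PR.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrAlreadyFinished mirrors the sentinel named in the PR.
var ErrAlreadyFinished = errors.New("run already finished")

// hookRun and the status strings are hypothetical.
type hookRun struct {
	Status string
	Detail string
}

// store is an in-memory stand-in; the mutex plays the role of the transaction.
type store struct {
	mu   sync.Mutex
	runs map[int64]*hookRun
}

func newStore() *store { return &store{runs: make(map[int64]*hookRun)} }

func isTerminal(status string) bool {
	return status == "success" || status == "failed"
}

// FinishHookRun records a terminal status unless one is already recorded.
func (s *store) FinishHookRun(id int64, status, detail string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	run, ok := s.runs[id]
	if !ok {
		return fmt.Errorf("hook run %d not found", id)
	}
	if isTerminal(run.Status) {
		return ErrAlreadyFinished // leave the first terminal record untouched
	}
	run.Status, run.Detail = status, detail
	return nil
}

func main() {
	s := newStore()
	s.runs[1] = &hookRun{Status: "running"}
	fmt.Println(s.FinishHookRun(1, "success", "hook exited 0"))
	fmt.Println(s.FinishHookRun(1, "failed", "late duplicate"))
	fmt.Println(s.runs[1].Status)
}
```

Reading the status and writing the new one under the same lock (or transaction) is what makes the guard race-free; a check outside it could let two finishers both pass.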

Addresses #114 (FinishHookRun terminal guard).

The shared requireIndexedVolume gate looked the volume up by name only. A
stale volumes.path would let a handler enumerate the config path's tree
while the durability advance covered the DB volume's rows — a wholesale
false durability claim. Refuse on mismatch, the same cross-check offload
and restore already make. Covers every push handler (bucket, kopia,
content-addressed, peer) since they share the gate. Test the mismatch is
refused before rclone is invoked.
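
The cross-check itself is small. A sketch under stated assumptions: the Volume type, the map standing in for the volumes table, the ErrPathMismatch sentinel, and this function signature are invented for illustration; only the v.Path != vol.Path comparison is taken from the change.

```go
package main

import (
	"errors"
	"fmt"
)

// Volume is a hypothetical shape shared by the config entry and the DB row.
type Volume struct {
	Name string
	Path string
}

// ErrPathMismatch is a hypothetical sentinel for this sketch.
var ErrPathMismatch = errors.New("config path and indexed volume path disagree")

// requireIndexedVolume looks the volume up by name, then refuses when the
// configured path differs from the path recorded at index time.
// The indexed map stands in for the volumes table.
func requireIndexedVolume(v Volume, indexed map[string]Volume) (Volume, error) {
	vol, ok := indexed[v.Name]
	if !ok {
		return Volume{}, fmt.Errorf("volume %q is not indexed", v.Name)
	}
	if v.Path != vol.Path {
		return Volume{}, fmt.Errorf("%w: config %q, database %q", ErrPathMismatch, v.Path, vol.Path)
	}
	return vol, nil
}

func main() {
	indexed := map[string]Volume{"photos": {Name: "photos", Path: "/mnt/old/photos"}}

	_, err := requireIndexedVolume(Volume{Name: "photos", Path: "/mnt/new/photos"}, indexed)
	fmt.Println(err) // stale volumes.path: refused before any transfer starts

	_, err = requireIndexedVolume(Volume{Name: "photos", Path: "/mnt/old/photos"}, indexed)
	fmt.Println(err)
}
```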

Addresses #114 (config-path vs DB-volume-path agreement).

ensureRepository created a fresh repository on any connect failure. On a
transient outage or a mistyped path that mints an empty repository while
the destination's monotonic durability vector keeps claiming coverage the
new repository cannot honour — and there is no CLI to rewind the vector.
Require opt-in (the existing --init flag, which already gates the local
marker bootstrap): without it, a connect failure is an error, not a silent
create. Test connect-fail without --init refuses and never creates.
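
The opt-in create can be sketched against a small interface. Everything here except the ensureRepository name and the --init semantics is an assumption: the repo interface, the fakeRepo test double, and the error text are illustrative and do not reflect the kopia client API.

```go
package main

import (
	"errors"
	"fmt"
)

// repo is a hypothetical abstraction over the repository client.
type repo interface {
	Connect() error
	Create() error
}

// ensureRepository connects, and creates a repository only when the caller
// opted in via initFlag (standing in for --init).
func ensureRepository(r repo, initFlag bool) error {
	err := r.Connect()
	if err == nil {
		return nil
	}
	if !initFlag {
		return fmt.Errorf("connect failed and --init not set, refusing to create: %w", err)
	}
	if cerr := r.Create(); cerr != nil {
		return fmt.Errorf("create repository: %w", cerr)
	}
	return nil
}

// errConnect and fakeRepo are test doubles for this sketch.
var errConnect = errors.New("connect: repository unreachable")

type fakeRepo struct {
	connectErr error
	created    int
}

func (f *fakeRepo) Connect() error { return f.connectErr }
func (f *fakeRepo) Create() error  { f.created++; return nil }

func main() {
	f := &fakeRepo{connectErr: errConnect}
	fmt.Println(ensureRepository(f, false), f.created) // refused, nothing created
	fmt.Println(ensureRepository(f, true), f.created)  // explicit opt-in creates
}
```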

Addresses #114 (kopia repo re-create guard).

rcloneVerification set verified purely from flags + exit-0. When a backend
shares no hash with the source, rclone emits a NOTICE and silently falls
back to a size comparison; parseJSONLog dropped NOTICE-level events, so
the run was still recorded as blake3-verified. Detect the stable
'no hashes in common' phrase at any log level, flag it on RunResult, and
downgrade the verification to size+mtime (unverified) so the durability
vector does not advance on a copy rclone never content-checked. The guard
only ever downgrades, so a false match cannot wrongly mark a run verified.
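
A downgrade-only detector along these lines is sketched below. It assumes one JSON object per log line with level and msg fields; the helper names (sawHashFallback, verification), the verification labels, and the sample log lines are hypothetical. Only the substring it keys on comes from the change.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// noCommonHash is the stable substring the guard keys on.
const noCommonHash = "no hashes in common"

// logEvent assumes one JSON object per log line with these two fields.
type logEvent struct {
	Level string `json:"level"`
	Msg   string `json:"msg"`
}

// sawHashFallback scans every event, regardless of level, for the notice.
func sawHashFallback(log string) bool {
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		var ev logEvent
		if err := json.Unmarshal(sc.Bytes(), &ev); err != nil {
			continue // skip lines that are not JSON events
		}
		if strings.Contains(ev.Msg, noCommonHash) {
			return true
		}
	}
	return false
}

// verification only ever downgrades: a match can turn a requested method
// into size+mtime, but nothing here can upgrade a run to verified.
func verification(requested string, hashFallback bool) string {
	if hashFallback {
		return "size+mtime"
	}
	return requested
}

func main() {
	log := `{"level":"info","msg":"Copied (new)"}
{"level":"notice","msg":"--checksum is in use but the source and destination have no hashes in common; falling back to --size-only"}
`
	fallback := sawHashFallback(log)
	fmt.Println(fallback, verification("blake3", fallback))
}
```

Because the only effect of a match is a weaker verification label, a false positive costs a re-verification at worst, while a false negative is the case the guard exists to shrink.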

Addresses #114 (--checksum blake3 silent degrade).

@mbertschler merged commit 985e7a4 into offload-v1 on Jun 11, 2026
2 checks passed
@mbertschler deleted the fix-schema-robustness branch June 11, 2026 04:35