store: harden v13→v14 backfill + broaden migration integrity corpus (#113) #135

Open
mbertschler wants to merge 1 commit into issue-131-offload-staleness from issue-113-index-integrity

@mbertschler

Index/migration integrity hardening (issue #113)

Closes #113

Schema- and migration-level defenses for the append-only guarantee. All four
sub-items of #113 are covered; two were already landed on main by earlier
work in this stack, so this PR adds the remaining guard + the missing test
corpus around all four.

Stacked PR. Base is issue-131-offload-staleness (PR #134), which stacks
on issue-104-durability-provenance (PR #133). Merge order: #133 → #134 → this.
The diff here shows only the #113 changes relative to #131.

12a — contents immutability triggers — already on main

The contents_no_update / contents_no_delete BEFORE UPDATE/BEFORE DELETE
ABORT triggers (and their fresh-baseline + schema.sql snapshot) already
shipped as migration v20→v21 (contentsImmutableTriggers()), with
assertContentsTriggersAbort asserting that both abort. No new migration was
added, since a v24 duplicating v21 would be redundant. SchemaVersion stays 23.

12b — v13→v14 backfill consistency guard — NEW

createAndSeedContentsV14 now runs refuseSameHashDifferentSizeV14 before
the contents seed: if any blake3 in the old files table carries more than one
size_bytes, the migration refuses loudly instead of silently coalescing to
the earliest observation's size. A BLAKE3 digest is over the bytes, so this
shape is only reachable via prior corruption or a stat/hash TOCTOU. The guard
fires only on that genuinely-corrupt shape; valid same-hash-same-size
duplicates still migrate to one contents row (negative control test included).
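
A minimal sketch of the shape the guard looks for, modeled over in-memory rows rather than the real database. The `fileRow` type, the helper names, and the `conflictQuery` string are assumptions for illustration only; the actual refuseSameHashDifferentSizeV14 may be written quite differently.

```go
package main

import (
	"fmt"
	"sort"
)

// fileRow is a hypothetical stand-in for one row of the legacy files table;
// only the two columns the guard cares about are modeled.
type fileRow struct {
	Blake3    string
	SizeBytes int64
}

// conflictQuery is one way the same check could be phrased in SQL.
// It is an assumption, not the query the migration actually runs.
const conflictQuery = `SELECT blake3 FROM files GROUP BY blake3 HAVING COUNT(DISTINCT size_bytes) > 1`

// conflictingHashes returns, in sorted order, every hash that was observed
// with more than one size.
func conflictingHashes(rows []fileRow) []string {
	first := make(map[string]int64)
	bad := make(map[string]bool)
	for _, r := range rows {
		if size, seen := first[r.Blake3]; !seen {
			first[r.Blake3] = r.SizeBytes
		} else if size != r.SizeBytes {
			bad[r.Blake3] = true
		}
	}
	out := make([]string, 0, len(bad))
	for h := range bad {
		out = append(out, h)
	}
	sort.Strings(out)
	return out
}

// refuseIfConflicting mirrors the guard's contract as described above:
// an error for the corrupt shape, nil for everything else.
func refuseIfConflicting(rows []fileRow) error {
	if bad := conflictingHashes(rows); len(bad) > 0 {
		return fmt.Errorf("refusing contents seed: %d hash(es) carry more than one size_bytes (first: %s)", len(bad), bad[0])
	}
	return nil
}

func main() {
	// Same hash, same size: a valid duplicate that should still migrate.
	valid := []fileRow{{"aa", 10}, {"aa", 10}, {"bb", 20}}
	// Same hash, different size: the corrupt shape.
	corrupt := []fileRow{{"aa", 10}, {"aa", 11}}

	fmt.Println("valid refused:", refuseIfConflicting(valid) != nil)
	fmt.Println("corrupt refused:", refuseIfConflicting(corrupt) != nil)
}
```

The `valid` case plays the role of the negative control: duplicates with matching sizes pass through untouched, so the check cannot reject honest data.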

12c — migration test corpus — NEW

  • same-hash-different-size refusal + same-size-accepted negative control
  • orphaned files→runs FK → caught by the v13→v14 foreign_key_check
  • duplicate live row in a legacy DB missing uniq_files_live_per_path → caught
    when the rebuild recreates that partial unique index
  • populated v16 remote_objects driven through the v16→v17 table rebuild
    (the v18 fixture seeded the table but started after that rebuild)
  • status_changed_run_id backfill values asserted on the migrated v13 rows
    (previously unasserted anywhere)
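
To illustrate the orphaned files→runs case from the list above, here is a rough in-memory analogue of what a foreign-key check reports: child rows whose parent is missing. The `legacyFile` type and its `RunID` field are hypothetical; the real column names and the fixture layout are not shown in this PR description.

```go
package main

import "fmt"

// legacyFile is a hypothetical model of a files row that references a run.
type legacyFile struct {
	Path  string
	RunID int64
}

// orphanedFiles returns every file whose RunID has no matching run,
// which is the kind of row a foreign-key check would flag.
func orphanedFiles(files []legacyFile, runIDs []int64) []legacyFile {
	known := make(map[int64]struct{}, len(runIDs))
	for _, id := range runIDs {
		known[id] = struct{}{}
	}
	var orphans []legacyFile
	for _, f := range files {
		if _, ok := known[f.RunID]; !ok {
			orphans = append(orphans, f)
		}
	}
	return orphans
}

func main() {
	runs := []int64{1, 2}
	files := []legacyFile{
		{Path: "a.txt", RunID: 1},
		{Path: "b.txt", RunID: 7}, // run 7 does not exist
	}
	for _, o := range orphanedFiles(files, runs) {
		fmt.Printf("orphan: %s references missing run %d\n", o.Path, o.RunID)
	}
}
```

A fixture in this spirit would seed one such orphan into the legacy database and assert that the migration fails rather than carrying the dangling reference forward.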

12d — indexer stat-after-hash — already on main, test added

hashFile already stats the open handle after hashing (commit "index: pin
row size/mtime to the hashed-handle stat"). Added two index tests: one pinning
the (digest, size) consistency contract, one showing an append between index
runs supersedes to a row whose size matches its hash.

Notes / decisions for review

  • Modified the shipped v13→v14 migration function to add the 12b guard
    (additive, no version bump — the issue's prescribed approach). It only changes
    behavior for corrupt input; all existing migration tests pass.
  • Touched an existing test fixture: TestMigrateV2ToV3 seeded one hash at
    two different sizes for convenience. That is exactly the corrupt shape 12b
    now rejects, so the two distinct files got distinct hashes. The test's purpose
    (v2→v3 run synthesis) is unchanged.
  • No new migration, so store/schema.sql is unchanged and TestSchemaSnapshot
    passes with no drift.

Invariants

Append-only/immutability is strengthened, never weakened: the guard turns a
silent loss-of-size into a loud refusal; no history is deleted or overwritten.
No regression to the #104/#131 work below.

Gates: go vet ./..., go test ./..., golangci-lint run (0 issues),
TestSchemaSnapshot all green locally.

…en migration corpus

12b: refuse the v13→v14 contents seed when two files rows share a blake3
with differing size_bytes — corruption (or a stat/hash TOCTOU) that the
seed would otherwise silently coalesce to the earliest observation's size.
A BLAKE3 digest is over the bytes, so this shape is impossible from honest
indexing; turning it into a loud pre-migration failure lets the operator
recover from the pre-migration snapshot. The existing v2→v3 fixture seeded
one hash at two differing sizes for convenience; give the two distinct
files distinct hashes so the data is physically valid.

12c: add migration-corpus fixtures for previously untested legacy shapes —
same-hash-different-size refusal (with a same-size negative control), an
orphaned files→runs FK caught by the v13→v14 foreign_key_check, a
duplicate live row caught when the rebuild recreates uniq_files_live_per_path,
a populated v16 remote_objects table driven through the v16→v17 rebuild,
and status_changed_run_id backfill assertions on the migrated v13 rows.

12d: add an index test pinning hashFile's stat-after-hash contract (size
pairs with the hashed bytes) and one showing an append between index runs
supersedes to a row whose size matches its hash.

Refs #113