diff --git a/AGENTS.md b/AGENTS.md index a707ea6..ea3ae8b 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -3,11 +3,15 @@ Squirrel indexes **content** (BLAKE3 hashes), not paths. A hash ever observed must stay retrievable. Paths are observations of content; content is the entity. -So `Upsert` never rewrites a row's `blake3` in place: when content at a path -changes it marks the prior row `superseded` and inserts a new one, keeping at -most one live (non-`superseded`) row per path. The schema enforces this on -`files` — the `files_blake3_immutable` trigger and the `uniq_files_live_per_path` -partial unique index (`store/migrations.go`). +The schema makes this literal: `contents` is the append-only content entity +(one row per BLAKE3, with size and origin), and `files` rows are path↔content +observations referencing it. `Upsert` never rewrites a row's `content_id` in +place: when content at a path changes it marks the prior row `superseded` and +inserts a new one, keeping at most one live (non-`superseded`) row per path — +enforced by the `uniq_files_live_per_path` partial unique index +(`store/migrations.go`); the id↔hash binding itself is immutable by +construction (`contents.blake3` is UNIQUE and contents rows are never +updated). The `runs` table follows the same no-loss spirit by policy, not schema: squirrel never auto-prunes runs — they're an audit trail, and any retention is explicit diff --git a/README.md b/README.md index bb66078..65440c0 100644 --- a/README.md +++ b/README.md @@ -55,13 +55,174 @@ bucket = "squirrel-backup" root = "/squirrel" ``` -Supported destination types: `local`, `sftp`, `s3`, `b2`, `gcs`. Secrets accept either a literal string or an inline `{ env = "VAR_NAME" }` table that is resolved at load time. Unknown fields, missing required fields, and unset env vars are rejected immediately — squirrel will not invoke rclone with a misconfigured destination. 
+Supported destination types: `local`, `sftp`, `s3`, `b2`, `gcs` (rclone-backed), and `kopia` (see [kopia destinations](#kopia-destinations)). Secrets accept either a literal string or an inline `{ env = "VAR_NAME" }` table that is resolved at load time. Unknown fields, missing required fields, and unset env vars are rejected immediately — squirrel will not invoke rclone with a misconfigured destination. + +Some optional params are specific to one backend type and rejected on the others (as an unknown field): + +- **`sftp` host-key validation** — `known_hosts_file` points rclone at a known_hosts file so it validates the server's host key before transferring; `host_key_algorithms` is rclone's space-separated list pinning the accepted host-key algorithms. Both map to the rclone sftp options of the same name. **Without `known_hosts_file`, rclone does not validate the server's host key** and will connect to whatever host answers — set it (recommended) so a redirected or impersonated server is rejected. + + ```toml + [destinations.nas] + type = "sftp" + host = "nas.local" + user = "martin" + password = { env = "NAS_PASSWORD" } + root = "/volume1/squirrel" + known_hosts_file = "~/.ssh/known_hosts" # validate the server host key (recommended) + host_key_algorithms = "ssh-ed25519 ssh-rsa" # optional: pin accepted host-key algorithms + ``` + +- **`s3` storage class** — `storage_class` maps to rclone's s3 `storage_class` config key and accepts whatever value the chosen s3-compatible backend supports (typically a default tier plus one or more cheaper archive/cold tiers); absent, the backend's default class is used. Use the exact value string your provider documents. + + ```toml + [destinations.offsite] + type = "s3" + # ... + storage_class = "" # archive tiers cost less to store, more to read + ``` Squirrel writes its own `rclone.conf` next to the config (`~/.squirrel/rclone.conf`, mode 0600) on every sync invocation. 
You do not run `rclone config` and you should not edit `rclone.conf` by hand. +### Encrypted destinations + +Any non-`local` destination can add a `crypt` block to encrypt file contents client-side before upload, via rclone's [crypt](https://rclone.org/crypt/) overlay: + +```toml +[destinations.offsite.crypt] +password = { env = "OFFSITE_CRYPT_PASSWORD" } +password2 = { env = "OFFSITE_CRYPT_SALT" } # salt — optional but recommended +``` + +`password` and `password2` are **rclone-obscured** values, the same representation `rclone config` stores for its own crypt remotes — generate one with `rclone obscure `. Both accept a literal or `{ env = "VAR" }`. Squirrel renders two sections into its `rclone.conf` — the underlying remote plus a crypt remote wrapping it — and addresses all sync and restore transfers through the crypt remote. Keep the passwords safe: restoring from an encrypted destination requires them. + +Two properties to be aware of: + +- **Contents only.** File and directory names are stored in clear at the destination (`filename_encryption = off`, fixed by design) — the tree stays browsable and keeps the same layout as an unencrypted destination. If the names themselves are sensitive, this overlay does not hide them. +- **Verification falls back to size+mtime.** rclone crypt remotes cannot expose content hashes, so the end-to-end BLAKE3 check (`--checksum --hash blake3`) cannot pass through the overlay. Transfers to and from an encrypted destination compare by size+mtime instead — the same comparison `--shallow` uses — and say so in the run output; the runs row records the transfer as shallow. Content-addressed destinations regain deeper verification through provider-side ciphertext fingerprints — see [Offsite verification](#offsite-verification-squirrel-verify). 
+ +### Kopia destinations + +A `kopia` destination pushes a volume into a local [kopia](https://kopia.io) repository instead of an rclone remote — useful as a second, independently-verifiable backup format on another disk: + +```toml +[destinations.mirror] +type = "kopia" +root = "/mnt/backup/kopia-repo" # repository path +password = { env = "KOPIA_REPO_PASSWORD" } + +[volumes.pictures] +path = "~/Pictures" +sync_to = ["nas", "mirror"] +``` + +Like rclone, the kopia binary is driven as an opaque child process with squirrel owning the command line: each sync connects to the repository at `root` (creating it on first use), runs `kopia snapshot create` on the volume path, then `kopia snapshot verify` on the new snapshot. The repository password is passed to kopia via its environment, never on the command line, and the per-destination kopia config file lives next to squirrel's own config — your personal kopia configuration is never touched. + +Properties that differ from rclone destinations: + +- **kopia verifies its own content hashes**, so the runs row is never recorded as shallow and `--shallow` has no effect on kopia pairs; whether a given run counts as verified comes from kopia itself (a clean snapshot plus a passing `snapshot verify`). `--dry-run` is refused — kopia has no equivalent. +- **A `crypt` block is rejected**: kopia encrypts its repository itself. Keep the repository password safe; the repository is unreadable without it. +- **Restore goes through the kopia CLI** (`kopia snapshot restore`), since the repository is kopia's own format. `squirrel restore` refuses kopia destinations and says so. + +### Content-addressed destinations + +By default a destination mirrors the volume's tree (see [Destination layout](#destination-layout)). 
Any rclone-remote destination — with or without a `crypt` block — can instead opt into an **append-only, content-addressed** layout, built for cold archive storage where objects should never be rewritten or moved: + +```toml +[destinations.archive] +type = "sftp" +host = "archive.example" +user = "u" +root = "/data" +layout = "content-addressed" +``` + +Instead of a browsable tree, the destination holds two streams: + +- **`objects/<hash>`** (at the destination root, shared by all volumes) — one object per BLAKE3 content hash (lowercase hex), the raw file bytes (encrypted client-side when the destination has a `crypt` block). Each hash is uploaded **exactly once** per destination and never moved, overwritten, or deleted. A local rename or reorg changes only the path mapping — no re-upload, no server-side copy — and content duplicated across volumes is stored once. +- **`<volume>/index/run-<id>`** — one immutable **manifest segment** per sync run, per volume: the path-level delta of that run (see the format below). Replaying a volume's segments in run order yields its full current path→content mapping, and any past state. + +Durability is **transactional per run**: the run only counts as successful — and only then feeds the durability evidence squirrel records per destination — once *both* all its content objects *and* its manifest segment are confirmed on the remote (each transfer's success plus a follow-up presence/size listing). A failed run may leave objects without a segment; they are harmless (nothing maps them) and the next run skips re-uploading anything already recorded, pushing only what's missing. + +Properties that differ from mirrored destinations: + +- **Verification is presence+size**, recorded as such: per-object transfers can't carry the end-to-end BLAKE3 check (and `crypt` remotes expose no hashes at all), so the runs row is recorded shallow and the push never claims content verification. 
On top of that, each upload's provider-side ciphertext fingerprint is recorded in the index and re-checked by [`squirrel verify`](#offsite-verification-squirrel-verify). +- **Pick the layout when the destination is first used.** Switching an existing mirrored destination to `content-addressed` (or back) is not supported — point the new layout at a fresh destination or root. The push detects a mirrored history (a recorded successful sync without its manifest segment) and refuses. +- **`squirrel restore` refuses the layout** for now; recovery tooling ships separately. The format is deliberately simple enough to recover without squirrel — see below. +- `--dry-run` is not supported yet. + +#### Offsite verification (`squirrel verify`) + +Cold archive storage is exactly the copy you can't cheaply re-download and re-hash. Content-addressed destinations therefore get a metadata-only integrity check, the **scan-back fingerprint**: after each object upload is confirmed, squirrel reads the *provider's own checksum* of the stored bytes (the ciphertext, for `crypt` destinations) back from the remote via `rclone lsjson --hash` and records it in the index next to the upload. Verification then re-fetches the same metadata later and compares **provider value then vs provider value now** — squirrel never recomputes a provider checksum, so provider-specific composite forms are handled as opaque strings, and no object body is ever transferred. + +What gets recorded depends on the backend type: + +- **`s3`** — the object **ETag**, recorded as `etag-md5` (or `etag-md5-composite` for multipart-style values). Reading it is a listing/metadata operation, so it works on archive-tier objects without a restore. +- **`sftp`** — the checksum computed server-side by the remote's hash command. 
Content-addressed sftp destinations default to **SHA-256** (`hash_algo = "sha256"`, rendered as rclone's sftp `hashes` option so the selection is explicit rather than rclone's md5/sha1 preference); set `hash_algo` if your server only offers another type. +- **other backends** — whatever hash `rclone lsjson --hash` exposes, recorded under its rclone hash name (e.g. `sha1` on b2). A backend exposing no checksum leaves the fingerprint pending, with a warning in the sync output. + +Re-verify a destination (or all content-addressed destinations) at any time: + +``` +squirrel verify archive +squirrel verify +``` + +The pass lists the destination's `objects/` directory once (batched, metadata-only), then per recorded object: a **match** stamps the object verified in the index; an object **without a fingerprint yet** (uploaded before this feature, or whose capture failed) gets one recorded and is counted separately; a **mismatch or missing object** prints one loud line per object and exits non-zero — that is potential offsite corruption or tampering, and squirrel deliberately leaves both the destination and the recorded fingerprint untouched for inspection. Each pass is recorded as an `audit` run, with the destination and counters in the run's audit trail. + +Because crypt encrypts with a random per-file nonce, the fingerprint is a property of the *uploaded ciphertext*, not of the content — which is exactly right here: the layout is append-only and each object is uploaded once, so the fingerprint is stable for the life of the object. + +Two related destination knobs (both optional): + +```toml +[destinations.archive] +# ... 
+hash_algo = "sha256" # sftp only: which server-side hash the fingerprint uses +checkers = 4 # cap rclone's concurrent checkers (providers that limit connections) +``` + +`checkers` flows into `--checkers` on the rclone invocations squirrel runs against that destination — useful when a provider caps simultaneous connections (server-side hashing typically uses one connection per concurrent check). + +#### Manifest segment format + +Each `<volume>/index/run-<id>` segment is JSONL — one JSON object per line, lines sorted by `(path, status)`: + +```json +{"path":"2024/cat.jpg","blake3":"26e7…e5ad","status":"present","size_bytes":123,"mtime_ns":1712345678901234567} +``` + +- `path` — volume-relative path +- `blake3` — 64-char lowercase hex BLAKE3-256 of the file content; the bytes live at `objects/<blake3>` +- `status` — `present`, `superseded`, `missing`, or `offloaded` +- `size_bytes`, `mtime_ns` — as indexed + +To replay: process segments in ascending run id; each line with status `present`, `missing`, or `offloaded` sets that path's current `(content, status)` — last write wins per path. `superseded` lines are history only (the outgoing content of a path that changed) and update no mapping. A full recovery script is: replay every segment, then for each `present`/`offloaded` path download `objects/<blake3>` (decrypting with the `crypt` password if one was set). `missing` paths are known-but-lost at the origin — the object may still exist from an earlier upload. + +### Offloading + +`squirrel offload` deletes the **local** copy of files whose content is provably stored on every target the volume's offload policy requires — never a blind delete. It is the only squirrel command that deletes user data. 
+ +```toml +[volumes.pictures] +path = "~/Pictures" +sync_to = ["nas", "offsite"] +offload_requires = ["nas", "offsite"] +``` + +`offload_requires` is the explicit per-volume policy: every named target's recorded durability must cover a file's content before its bytes may go, and a volume without the key refuses to offload entirely. The names share the flat destination/node namespace that `sync_to` uses. They may also name targets only a *peer* pushes to: evidence about those arrives through the peer durability pull (`squirrel peer-sync pull-durability`), and a name with no recorded evidence simply keeps the gate closed. + +``` +squirrel offload pictures 2019/ # a subtree +squirrel offload pictures --older-than 90d # by age (indexed mtime) +squirrel offload pictures . --dry-run # print the gate decisions, touch nothing +``` + +Selectors are volume-relative paths/prefixes plus `--older-than` (combinable); selecting the whole volume takes an explicit `.`. The durability gate is evaluated per file, entirely offline, against the durability version vectors in the local index: content with origin `(node, run)` passes for a target iff the target's recorded vector component for that node is ≥ `run`, for **every** required target. Files failing the gate are skipped and reported per target (`missing component for origin X` / `stale: have 40 need 45`). + +Immediately before each unlink, squirrel re-verifies the on-disk bytes against the indexed row — size, mtime, and BLAKE3, with symlink-refusing traversal — and skips loudly on any difference: the disk is newer than the index, and unindexed bytes are never deleted. Offloaded files flip `present → offloaded` in the index under one `kind='offload'` run. The indexer treats an offloaded row's on-disk absence as expected (it never becomes `missing`), and re-acquiring the bytes (restore or copy-back) flips the row back to `present`. 
+ ### Hooks -A volume can declare a per-volume **hook** — a command the agent runs to nudge an external tool when the volume's content changes. squirrel stays tool-agnostic: it never learns what the command does (a backup with kopia/restic, an `rclone copy`, a shell script — all the same to squirrel). It exec's the command **without a shell**, passes context through environment variables, and records only the generic outcome (exit code, timestamps). +A volume can declare a per-volume **hook** — a command the agent runs to nudge an external tool when the volume's content changes. squirrel stays tool-agnostic: it never learns what the command does (a backup with kopia/restic, an `rclone copy`, a shell script — all the same to squirrel). It exec's the command **without a shell**, passes context through environment variables, and records only the generic outcome (exit code, timestamps). That generic outcome is the ceiling: only the built-in destination types report verification results; a hook's exit code never counts as one. (For kopia specifically, squirrel can own the snapshot end-to-end instead via a [kopia destination](#kopia-destinations).) ```toml [volumes.pictures.hook] @@ -126,7 +287,7 @@ squirrel sync pictures --to nas # just one squirrel sync # every (volume, destination) pair in config ``` -Sync verifies each uploaded file's BLAKE3 against the destination (using rclone's `--checksum --hash blake3`). Mismatches abort that file before the runs row is marked success. Use `--shallow` to fall back to rclone's default size+mtime comparison if you want speed over integrity for a big initial push. +Sync verifies each uploaded file's BLAKE3 against the destination (using rclone's `--checksum --hash blake3`). Mismatches abort that file before the runs row is marked success. Use `--shallow` to fall back to rclone's default size+mtime comparison if you want speed over integrity for a big initial push. 
Encrypted (`crypt`) destinations always use the size+mtime comparison (see [Encrypted destinations](#encrypted-destinations)). Look up a file by its BLAKE3 hex hash: @@ -167,6 +328,8 @@ squirrel # bare invocation opens the TUI when stdin/stdout are a terminal ``` squirrel index <volume> [--shallow] [--dry-run] [--workers N] squirrel sync [<volume>] [--to DEST] [--shallow] [--dry-run] +squirrel verify [<destination>] +squirrel offload <volume> [path...] [--older-than DUR] [--dry-run] squirrel query <hash-or-path> [--history] squirrel query --duplicates squirrel query --missing @@ -185,7 +348,7 @@ squirrel tui ## Destination layout -Each destination is a tree shaped like the local volumes: +Each mirrored destination (`layout = "mirror"`, the default) is a tree shaped like the local volumes: ``` <dest.root>/ @@ -202,6 +365,8 @@ Each destination is a tree shaped like the local volumes: `.squirrel-index/` holds the index snapshots ridden along after each successful sync (see [Index snapshots](#index-snapshots)). Like `.squirrel-history`, it is filtered out of all sync and restore transfers and from peer-sync, so a snapshot is never mistaken for user content. +A [content-addressed destination](#content-addressed-destinations) holds a shared `objects/` directory at its root and `index/` under each per-volume directory instead of a mirrored tree (plus the same `.squirrel-index/` ride-along). + ## Notes - Hash: BLAKE3-256 via `github.com/zeebo/blake3`. Stored as a 32-byte `BLOB` in the `blake3` column. The CLI accepts and prints hex. diff --git a/SAFETY-AUDIT.md b/SAFETY-AUDIT.md index 4004240..ba926f8 100644 --- a/SAFETY-AUDIT.md +++ b/SAFETY-AUDIT.md @@ -1022,6 +1022,143 @@ when restoreFromNode lands. --- +## Durability evidence & offsite verification (offload-v1) + +Findings specific to the offload-v1 feature set — the durability version +vectors that gate `offload`, the peer durability pull that carries +evidence between nodes, and the content-addressed offsite push. 
These +are framed for the intended deployment: a **single operator** whose +nodes (laptop, NAS) are all machines they control and hold the only +credentials to. Under that model the adversary is overwhelmingly +*entropy and bugs*, not a hostile peer; the findings are sized +accordingly. + +### D1: Durability-pull trust boundary (relayed offsite evidence) + +**Severity:** Medium (defence-in-depth) • **Likelihood:** N/A +(documented assumption, not a live defect). + +**Where** + +- `sync/durability.go` — `PullDurability` / `pullDurability`, + `validateComponent`, `validateFreshness`. +- `offload/gate.go` — `check`, `methodVerified`, `freshnessFailure`. +- `store/nodes.go` — `GetOrCreateOriginNode`. + +**The boundary** + +A durability component is recorded one of two ways, and they differ in +what they trust: + +- **Direct, self-verified.** When this node pushes to a target itself — + the NAS via peer sync (`sync/node.go`, tagged `peer-blake3` after the + receiver re-hashes every path) or a bucket via rclone (`sync/sync.go`) + — it writes the component into its **own** store from its **own** + confirmed transfer (`AdvanceDestinationVectorTo`). No peer is trusted. +- **Relayed, peer-asserted.** For a target this node never pushes to (an + offsite only the NAS reaches), the only evidence is what the NAS + reports over the durability pull (`UpsertDestinationRunIDVerified`, + reached only from `pullDurability`). Putting such a target in a + volume's `offload_requires` means the local delete decision trusts the + NAS's recorded `(origin, run, method)` assertion. The pull validates + shape (positive run, valid origin name, recognised method) and is + monotonic, but carries **no proof of possession** — a peer that + asserts an inflated run for a destination in the accepted set would be + believed. + +**Decision (intended):** the relayed-evidence trust is **accepted**. 
The +NAS is in the same trust domain as the laptop; a NAS that lies about +durability is a compromised-or-broken NAS, in which case the archive it +holds is already in question and `offload` is a footnote. The gate fails +*closed* on absent evidence, so a peer can only ever *withhold* +offload-eligibility, and the redundancy decision (gate on **all** copies +via `offload_requires`, not the fewest-trusted subset) is what protects +against data loss — see the offload section of `README.md`. + +**Defence-in-depth implemented in this branch** (cheap; turns *bugs* +into loud failures, not a security control): + +- **Verify-method allow-list at the pull boundary** — `validateComponent` + refuses a non-empty `verify_method` that isn't one this build defines + (`store.KnownVerifyMethod`). Previously an unknown method was stored + and then silently treated as unverified by the gate; now a peer bug or + version-skew string is rejected at receipt. Empty (legitimately + "unverified") still passes. +- **Origin-node creation cap** — `pullDurability` refuses a pull that + names more than `maxOriginNodesPerPull` (256) distinct origins, so a + runaway peer cannot grow the local `nodes` table without bound via + `GetOrCreateOriginNode`. A real volume references a handful of origins; + the cap only converts a flood into an observable refusal. + +**Not done (deliberately):** no proof-of-possession protocol, no +laptop-side independent verification of relayed offsites. Those defend +against a malicious NAS, which is out of model. The random per-file +nonce in the rclone crypt overlay also makes "the NAS proves the stored +ciphertext decrypts to the right content" impractical without either a +content-derived nonce or the laptop downloading and decrypting the +object — neither warranted here. 
+ +**Issue:** `durability: document the relayed-evidence trust boundary; add verify-method allow-list and origin-node cap as defence-in-depth` (implemented in this branch) + +### D2: Content-addressed offsite push proves presence+size, not decrypt-correctness + +**Severity:** Medium • **Likelihood:** Low (requires a transfer-time +corruption that preserves decrypted size, or the documented +re-hash→read TOCTOU window to fire). + +**Where** + +- `sync/content_addressed.go` — `uploadOneObject` (re-hash → `copyTo` → + `statRemote` size check), `captureFingerprints`. +- `sync/verify_remote.go` — `VerifyRemote` (scan-back re-check). + +**What it does / doesn't establish** + +At upload the push: (1) re-hashes the **local plaintext** and refuses on +drift, so the encryption input is the right content; (2) confirms the +object is **present** and its **decrypted size** (stat is through the +crypt overlay) matches the index; (3) records the provider's checksum of +the ciphertext as the scan-back baseline. The underlying backend's own +transfer integrity (e.g. S3 Content-MD5 on PUT) covers "the ciphertext +rclone sent is the ciphertext stored." + +It does **not** confirm that the stored ciphertext *decrypts back* to the +indexed hash — there is no post-upload decrypt-and-rehash. The +unguarded slivers are the documented fork/exec window between the +re-hash and rclone's open (`uploadOneObject` "Residual:" comment) and a +hypothetical crypt bug that produces a right-decrypted-size, wrong +content object. Ongoing bitrot is caught by the scan-back re-verify; a +*wrong-at-upload* object is the gap. + +**Proposed mitigation (opt-in, NAS-local — sketch, not built):** + +- Add a `--verify` mode to the content-addressed push that, after an + object lands, downloads it back **through the crypt overlay**, + BLAKE3s the plaintext, and compares to the indexed hash. 
This is the + only check that closes decrypt-correctness, and it lives entirely on + the pushing node (which holds the plaintext) with no protocol or + laptop change. +- Scope it to the **initial upload** of each content hash (or a sampled + subset), not every run — the object is append-only and immutable, so + one read-back per object is sufficient. Cost is one download per + verified object (egress), so it must be opt-in and never run against + cold-tier targets (e.g. Glacier Deep Archive, where a read needs a + restore). +- Tightening the re-hash→read TOCTOU window further (snapshot/lock the + source) is **not** recommended — the window is one fork/exec, the + indexer and scrub already surface drift, and chasing it is + disproportionate. + +**Acceptance** + +- With `--verify`, a seeded object whose stored ciphertext decrypts to + the wrong content is caught and the run fails before the durability + vector advances; without it, behaviour is unchanged. + +**Issue:** `sync: optional read-back-decrypt-rehash verification for content-addressed uploads` + +--- + ## Cross-cutting recommendations ### Tests we should add now diff --git a/agent/durability.go b/agent/durability.go new file mode 100644 index 0000000..fe0c0e2 --- /dev/null +++ b/agent/durability.go @@ -0,0 +1,102 @@ +package agent + +import ( + "context" + "fmt" + "net/http" + + "github.com/mbertschler/squirrel/store" + "github.com/mbertschler/squirrel/syncproto" +) + +// handleDurability implements POST /v1/sync/durability: a session-less, +// read-only listing of this node's recorded destination durability +// vectors for one volume. Peers pull it (after a sync, or standalone) +// to hold offline evidence about destinations only this node can see. +// Node identity travels as names — local node ids mean nothing to the +// caller. 
+func (r *peerSyncRouter) handleDurability(w http.ResponseWriter, req *http.Request) { + var body syncproto.DurabilityRequest + if err := decodeJSON(w, req, &body); err != nil { + writeError(w, http.StatusBadRequest, err.Error()) + return + } + if body.Volume == "" { + writeError(w, http.StatusBadRequest, "volume is required") + return + } + if _, ok := r.volumes[body.Volume]; !ok { + writeError(w, http.StatusNotFound, fmt.Sprintf("volume %q is not declared on this node", body.Volume)) + return + } + resp, err := r.durabilityResponse(req.Context(), body.Volume) + if err != nil { + writeError(w, http.StatusInternalServerError, err.Error()) + return + } + writeJSON(w, http.StatusOK, resp) +} + +// durabilityResponse assembles the wire components for one volume. A +// declared volume with no store row (never indexed or synced) yields an +// empty component list rather than an error — "no recorded durability" +// is a valid answer. +func (r *peerSyncRouter) durabilityResponse(ctx context.Context, volumeName string) (syncproto.DurabilityResponse, error) { + v, err := r.srv.store.GetVolumeByName(ctx, volumeName) + if store.IsNotFound(err) { + return syncproto.DurabilityResponse{}, nil + } + if err != nil { + return syncproto.DurabilityResponse{}, fmt.Errorf("lookup volume: %w", err) + } + rows, err := r.srv.store.ListVolumeDestinationRunIDs(ctx, v.ID) + if err != nil { + return syncproto.DurabilityResponse{}, fmt.Errorf("list destination vectors: %w", err) + } + fresh, err := r.srv.store.ListVolumeDestinationPushFreshness(ctx, v.ID) + if err != nil { + return syncproto.DurabilityResponse{}, fmt.Errorf("list push freshness: %w", err) + } + names := make(map[int64]string, 4) + resolve := func(nodeID int64) (string, error) { + if name, ok := names[nodeID]; ok { + return name, nil + } + node, err := r.srv.store.GetNodeByID(ctx, nodeID) + if err != nil { + return "", fmt.Errorf("resolve origin node %d: %w", nodeID, err) + } + names[nodeID] = node.Name + return node.Name, nil 
+ } + resp := syncproto.DurabilityResponse{ + Components: make([]syncproto.DurabilityComponent, 0, len(rows)), + Freshness: make([]syncproto.DurabilityFreshness, 0, len(fresh)), + } + for _, row := range rows { + name, err := resolve(row.OriginNodeID) + if err != nil { + return syncproto.DurabilityResponse{}, err + } + resp.Components = append(resp.Components, syncproto.DurabilityComponent{ + Destination: row.Destination, + OriginNode: name, + OriginRun: row.OriginRunID, + UpdatedAtNs: row.UpdatedAtNs, + VerifyMethod: row.VerifyMethod, + }) + } + for _, row := range fresh { + name, err := resolve(row.OriginNodeID) + if err != nil { + return syncproto.DurabilityResponse{}, err + } + resp.Freshness = append(resp.Freshness, syncproto.DurabilityFreshness{ + Destination: row.Destination, + OriginNode: name, + OriginRun: row.OriginRunID, + UpdatedAtNs: row.UpdatedAtNs, + }) + } + return resp, nil +} diff --git a/agent/durability_test.go b/agent/durability_test.go new file mode 100644 index 0000000..0bbfbf3 --- /dev/null +++ b/agent/durability_test.go @@ -0,0 +1,116 @@ +package agent + +import ( + "bytes" + "context" + "encoding/json" + "net/http" + "net/http/httptest" + "testing" + + "github.com/mbertschler/squirrel/config" + "github.com/mbertschler/squirrel/syncproto" +) + +// postDurability drives POST /v1/sync/durability against the server's +// handler and decodes the response into out. Returns the HTTP status. 
+func postDurability(t *testing.T, srv *Server, body syncproto.DurabilityRequest, out any) int { + t.Helper() + encoded, err := json.Marshal(body) + if err != nil { + t.Fatalf("marshal request: %v", err) + } + req := httptest.NewRequest(http.MethodPost, "/v1/sync/durability", bytes.NewReader(encoded)) + req.Header.Set("Authorization", "Bearer test-token") + req.Header.Set("Content-Type", "application/json") + rec := httptest.NewRecorder() + srv.Handler().ServeHTTP(rec, req) + if out != nil && rec.Code == http.StatusOK { + if err := json.Unmarshal(rec.Body.Bytes(), out); err != nil { + t.Fatalf("decode response: %v (%s)", err, rec.Body.String()) + } + } + return rec.Code +} + +// TestDurabilityEndpointListsComponents: the endpoint returns every +// recorded vector component for the volume with origin nodes resolved +// to names — the cross-node identity the caller can map locally. +func TestDurabilityEndpointListsComponents(t *testing.T) { + ctx := context.Background() + vol := &config.Volume{Name: "pics", Path: t.TempDir()} + srv := newTestServer(t, Config{Volumes: map[string]*config.Volume{vol.Name: vol}}) + + v, err := srv.store.CreateVolume(ctx, vol.Name, vol.Path) + if err != nil { + t.Fatalf("CreateVolume: %v", err) + } + self, err := srv.store.GetSelfNode(ctx) + if err != nil { + t.Fatalf("GetSelfNode: %v", err) + } + ext, err := srv.store.CreateNode(ctx, "ext", "peer://ext") + if err != nil { + t.Fatalf("CreateNode: %v", err) + } + for _, seed := range []struct { + dest string + nodeID int64 + run int64 + }{ + {"offsite-a", self.ID, 12}, + {"offsite-a", ext.ID, 4}, + {"mirror", self.ID, 9}, + } { + if err := srv.store.UpsertDestinationRunID(ctx, v.ID, seed.dest, seed.nodeID, seed.run, false); err != nil { + t.Fatalf("seed %+v: %v", seed, err) + } + } + + var resp syncproto.DurabilityResponse + if code := postDurability(t, srv, syncproto.DurabilityRequest{Volume: "pics"}, &resp); code != http.StatusOK { + t.Fatalf("status = %d, want 200", code) + } + if 
len(resp.Components) != 3 { + t.Fatalf("components = %d, want 3: %+v", len(resp.Components), resp.Components) + } + got := map[string]int64{} + for _, c := range resp.Components { + if c.UpdatedAtNs == 0 { + t.Fatalf("component %+v has zero updated_at_ns", c) + } + got[c.Destination+"/"+c.OriginNode] = c.OriginRun + } + want := map[string]int64{ + "offsite-a/" + self.Name: 12, + "offsite-a/ext": 4, + "mirror/" + self.Name: 9, + } + for k, w := range want { + if got[k] != w { + t.Fatalf("component %s = %d, want %d (full: %+v)", k, got[k], w, got) + } + } +} + +// TestDurabilityEndpointGuards: an undeclared volume 404s, a missing +// volume name 400s, and a declared volume with no store row answers +// with an empty component list (a valid "nothing recorded yet"). +func TestDurabilityEndpointGuards(t *testing.T) { + vol := &config.Volume{Name: "pics", Path: t.TempDir()} + srv := newTestServer(t, Config{Volumes: map[string]*config.Volume{vol.Name: vol}}) + + if code := postDurability(t, srv, syncproto.DurabilityRequest{Volume: "ghost"}, nil); code != http.StatusNotFound { + t.Fatalf("undeclared volume status = %d, want 404", code) + } + if code := postDurability(t, srv, syncproto.DurabilityRequest{}, nil); code != http.StatusBadRequest { + t.Fatalf("missing volume status = %d, want 400", code) + } + var resp syncproto.DurabilityResponse + if code := postDurability(t, srv, syncproto.DurabilityRequest{Volume: "pics"}, &resp); code != http.StatusOK { + t.Fatalf("declared-but-unmaterialised volume status = %d, want 200", code) + } + if len(resp.Components) != 0 { + t.Fatalf("components = %+v, want empty", resp.Components) + } +} diff --git a/agent/sync.go b/agent/sync.go index e677a6a..d8765d2 100644 --- a/agent/sync.go +++ b/agent/sync.go @@ -2,6 +2,7 @@ package agent import ( "context" + "database/sql" "encoding/hex" "encoding/json" "errors" @@ -37,6 +38,35 @@ const HistoryDirName = ".squirrel-history" // (conflicts) without parsing run-id semantics. 
const ConflictsDirName = ".squirrel-conflicts" +// RestoreHistoryDirName and IndexDirName mirror sync.RestoreHistoryDirName +// and sync.IndexDirName at the agent side (lowercase-duplicated for the +// same reason as HistoryDirName). The receiver never writes into either +// itself, but a peer's wire path can name them; the path validators +// reject them so the initiator-side reserved filter +// (sync.isReservedSyncPath / isReservedFolderPath) and the receiver +// allow-list cover the same four names. A wire path under +// .squirrel-restore-history could otherwise overwrite the receiver's +// only pre-restore backup. +const ( + RestoreHistoryDirName = ".squirrel-restore-history" + IndexDirName = ".squirrel-index" +) + +// maxPlanBodyBytes caps a decoded request body. decodeJSON applies it to +// every sync endpoint's body (begin/plan/verify/close/durability); plan +// bodies are the largest, carrying one IndexEntry per differing file at a +// few hundred bytes each, so 256 MiB is a generous ceiling for even a +// full-volume flat plan while still refusing the unbounded body a +// token-holding peer could otherwise stream to OOM the agent (#110c). A +// var so tests can drive the boundary without a 256 MiB payload. +var maxPlanBodyBytes int64 = 256 << 20 + +// maxPlanEntries caps len(PlanRequest.Entries) so a single /plan can't +// pin an unbounded slice in memory even within the byte ceiling. One +// million entries is far above any realistic differing-folder slice the +// Merkle walk produces and still bounds the worst case. +var maxPlanEntries = 1 << 20 + // peerSyncRouter holds per-server state shared by all peer-sync // endpoints: the volume-level lock map (one in-flight session per // volume) and the session table (transient state between /begin and @@ -54,11 +84,19 @@ type peerSyncRouter struct { // drops all in-flight sessions (acceptable for v1 — the next sync // replans from scratch). 
type peerSession struct { - receiverRunID int64 - volume *config.Volume - volumeID int64 - peerNodeID int64 - correlatedRunID int64 + receiverRunID int64 + volume *config.Volume + volumeID int64 + peerNodeID int64 + // initiatorNodeName is the caller identity declared at /begin, + // recorded so a phase call can be bound back to the node that + // opened the session. Under the single shared agent token there is + // no per-request authenticated identity to compare it against yet + // (see #110d); lookupSession is the chokepoint where that + // comparison lands once per-peer tokens make a caller identity + // recoverable. + initiatorNodeName string + correlatedRunID int64 // dedupStrategy is the initiator-supplied preference applied by // classify: "copy" enables the CopyFromExisting branch, "off" // disables it (every missing path stays a Transfer). Validated at @@ -92,6 +130,14 @@ type sessionEntry struct { blake3 []byte size int64 mtimeNs int64 + // originNode + originRun are the content's global origin + // coordinate as declared on the wire (node name + origin-space + // run id), recorded verbatim on the contents row at /close. Both + // empty when the initiator predates the origin exchange; /close + // then falls back to attributing the content to the initiator at + // its declared sync run. + originNode string + originRun int64 // priorRow is the receiver's pre-stage view of the row at this // path. Populated for supersede and conflict; nil otherwise. priorRow *store.FileRow @@ -120,8 +166,8 @@ func newPeerSyncRouter(srv *Server, volumes map[string]*config.Volume) *peerSync } } -// register attaches the four /v1/sync/* routes to mux. Health and -// the placeholder /v1/plan stay where buildHandler put them; this +// register attaches the /v1/sync/* routes to mux. Health and the +// placeholder /v1/plan stay where buildHandler put them; this // function is the only place new routes land. 
func (r *peerSyncRouter) register(mux *http.ServeMux) { mux.Handle("POST /v1/sync/begin", r.srv.requireBearer(http.HandlerFunc(r.handleBegin))) @@ -129,6 +175,7 @@ func (r *peerSyncRouter) register(mux *http.ServeMux) { mux.Handle("POST /v1/sync/plan-folders", r.srv.requireBearer(http.HandlerFunc(r.handlePlanFolders))) mux.Handle("POST /v1/sync/verify", r.srv.requireBearer(http.HandlerFunc(r.handleVerify))) mux.Handle("POST /v1/sync/close", r.srv.requireBearer(http.HandlerFunc(r.handleClose))) + mux.Handle("POST /v1/sync/durability", r.srv.requireBearer(http.HandlerFunc(r.handleDurability))) } // acquireVolumeLock takes the per-volume lock or returns false. The @@ -155,28 +202,64 @@ func (r *peerSyncRouter) storeSession(s *peerSession) { r.sessions[s.receiverRunID] = s } -func (r *peerSyncRouter) takeSession(receiverRunID int64) *peerSession { +// takeSession resolves and removes the session for receiverRunID, +// binding it to the caller the same way lookupSession does: a non-empty +// callerNode that doesn't match the begin-recorded initiator leaves the +// session in place and returns errSessionCallerMismatch, so a foreign +// token-holder cannot /close (and thereby abort) another node's session. +// ok is false when no session exists. +func (r *peerSyncRouter) takeSession(receiverRunID int64, callerNode string) (sess *peerSession, ok bool, err error) { r.mu.Lock() defer r.mu.Unlock() - s, ok := r.sessions[receiverRunID] + sess, ok = r.sessions[receiverRunID] if !ok { - return nil + return nil, false, nil + } + if callerNode != "" && callerNode != sess.initiatorNodeName { + return nil, true, errSessionCallerMismatch } delete(r.sessions, receiverRunID) - return s + return sess, true, nil } -func (r *peerSyncRouter) lookupSession(receiverRunID int64) *peerSession { +// errSessionCallerMismatch is returned when a phase call presents a +// caller identity that differs from the node that opened the session. 
+var errSessionCallerMismatch = errors.New("caller node does not own this session") + +// lookupSession resolves the session for receiverRunID and binds it to +// the caller. A non-empty callerNode must equal the initiator name +// recorded at /begin, so a second node holding the shared token cannot +// drive another node's in-flight session. callerNode is empty today +// because the single shared agent token carries no per-request identity +// (#110d): the comparison is a no-op until per-peer tokens make a caller +// identity recoverable, at which point this is the single place it is +// enforced. ok is false when no session exists; err is non-nil only on a +// caller mismatch. +func (r *peerSyncRouter) lookupSession(receiverRunID int64, callerNode string) (sess *peerSession, ok bool, err error) { r.mu.Lock() defer r.mu.Unlock() - return r.sessions[receiverRunID] + sess, ok = r.sessions[receiverRunID] + if !ok { + return nil, false, nil + } + if callerNode != "" && callerNode != sess.initiatorNodeName { + return nil, true, errSessionCallerMismatch + } + return sess, true, nil } +// callerNodeName returns the authenticated initiator identity for a +// phase request, or "" when none is recoverable. The single shared +// agent token authenticates every peer identically, so no per-request +// identity exists yet (#110d); this returns "" until per-peer tokens +// land, keeping the lookupSession binding point in one spot. +func callerNodeName(*http.Request) string { return "" } + // handleBegin implements POST /v1/sync/begin. The handler is the // thin HTTP shell over beginSession, which carries the actual flow. 
func (r *peerSyncRouter) handleBegin(w http.ResponseWriter, req *http.Request) { var body syncproto.BeginRequest - if err := decodeJSON(req, &body); err != nil { + if err := decodeJSON(w, req, &body); err != nil { writeError(w, http.StatusBadRequest, err.Error()) return } @@ -259,7 +342,7 @@ func (r *peerSyncRouter) ensureVolumeRow(ctx context.Context, name, absPath stri // insertion, in-memory session registration. The caller releases the // volume lock on any non-nil error. func (r *peerSyncRouter) finishBegin(ctx context.Context, body syncproto.BeginRequest, vol *config.Volume, v store.Volume, dedupStrategy string) (syncproto.BeginResponse, int, error) { - peer, err := r.srv.store.GetOrCreatePeerNode(ctx, body.InitiatorNodeName, peerEndpoint(body)) + peer, err := r.srv.store.GetOrCreatePeerNode(ctx, body.InitiatorNodeName, peerEndpoint(body), false) if err != nil { return syncproto.BeginResponse{}, http.StatusConflict, err } @@ -277,14 +360,15 @@ func (r *peerSyncRouter) finishBegin(ctx context.Context, body syncproto.BeginRe } protocol := negotiateProtocol(body.ProtocolVersion) r.storeSession(&peerSession{ - receiverRunID: runID, - volume: vol, - volumeID: v.ID, - peerNodeID: peer.ID, - correlatedRunID: body.InitiatorRunID, - dedupStrategy: dedupStrategy, - protocolVersion: protocol, - dispositions: make(map[string]*sessionEntry), + receiverRunID: runID, + volume: vol, + volumeID: v.ID, + peerNodeID: peer.ID, + initiatorNodeName: body.InitiatorNodeName, + correlatedRunID: body.InitiatorRunID, + dedupStrategy: dedupStrategy, + protocolVersion: protocol, + dispositions: make(map[string]*sessionEntry), }) return syncproto.BeginResponse{ ReceiverRunID: runID, @@ -355,14 +439,14 @@ func (r *peerSyncRouter) collectDriftWarnings(ctx context.Context, volumeName st } // peerEndpoint resolves the endpoint string to store on the peer -// nodes row. 
Single-writer initiators don't expose an agent of their -// own, so the empty case yields a stable name-derived placeholder -// that satisfies the "non-empty endpoint" invariant without leaking a -// real URL onto the wire. +// nodes row at /begin. It always yields the stable name-derived +// "peer://<name>" placeholder rather than the wire-supplied +// InitiatorEndpoint: that field is unauthenticated, and binding it to a +// node row would let a peer point an arbitrary node-name's dial-back URL +// at an attacker address (latent until peer-initiated pulls land, an +// SSRF/redirect then). A real endpoint is bound only by operator +// configuration through the initiator-side GetOrCreatePeerNode upgrade. func peerEndpoint(body syncproto.BeginRequest) string { - if body.InitiatorEndpoint != "" { - return body.InitiatorEndpoint - } return "peer://" + body.InitiatorNodeName } @@ -370,12 +454,20 @@ func peerEndpoint(body syncproto.BeginRequest) string { // slice against the receiver's store and pre-move the supersede paths. 
func (r *peerSyncRouter) handlePlan(w http.ResponseWriter, req *http.Request) { var body syncproto.PlanRequest - if err := decodeJSON(req, &body); err != nil { + if err := decodeJSON(w, req, &body); err != nil { writeError(w, http.StatusBadRequest, err.Error()) return } - sess := r.lookupSession(body.ReceiverRunID) - if sess == nil { + if len(body.Entries) > maxPlanEntries { + writeError(w, http.StatusBadRequest, fmt.Sprintf("plan carries %d entries, exceeding the %d cap", len(body.Entries), maxPlanEntries)) + return + } + sess, ok, err := r.lookupSession(body.ReceiverRunID, callerNodeName(req)) + if err != nil { + writeError(w, http.StatusForbidden, err.Error()) + return + } + if !ok { writeError(w, http.StatusNotFound, "no session for receiver_run_id") return } @@ -402,6 +494,10 @@ func (r *peerSyncRouter) handlePlan(w http.ResponseWriter, req *http.Request) { // only ever move bytes off paths the initiator is overwriting, never // onto paths the dedup branch needs. func (r *peerSyncRouter) planSession(ctx context.Context, sess *peerSession, entries []syncproto.IndexEntry) (syncproto.PlanResponse, error) { + self, err := r.srv.store.GetSelfNode(ctx) + if err != nil { + return syncproto.PlanResponse{}, fmt.Errorf("look up self node: %w", err) + } for _, e := range entries { if err := validateRelPath(e.Path); err != nil { return syncproto.PlanResponse{}, fmt.Errorf("path %q: %w", e.Path, err) @@ -410,7 +506,13 @@ func (r *peerSyncRouter) planSession(ctx context.Context, sess *peerSession, ent if err != nil || len(digest) != 32 { return syncproto.PlanResponse{}, fmt.Errorf("invalid blake3 hex %q for path %q", e.Blake3Hex, e.Path) } - entry := &sessionEntry{blake3: digest, size: e.SizeBytes, mtimeNs: e.MtimeNs} + if err := validateEntryOrigin(e, self.Name, sess.receiverRunID); err != nil { + return syncproto.PlanResponse{}, fmt.Errorf("path %q: %w", e.Path, err) + } + entry := &sessionEntry{ + blake3: digest, size: e.SizeBytes, mtimeNs: e.MtimeNs, + originNode: 
e.OriginNode, originRun: e.OriginRun, + } disp, err := r.classify(ctx, sess, e.Path, entry) if err != nil { return syncproto.PlanResponse{}, fmt.Errorf("classify %q: %w", e.Path, err) @@ -437,6 +539,12 @@ func (r *peerSyncRouter) planSession(ctx context.Context, sess *peerSession, ent if err := r.preStageConflicts(ctx, sess); err != nil { return syncproto.PlanResponse{}, fmt.Errorf("pre-stage conflicts: %w", err) } + // A failure here can leave already-preserved out-of-band files under + // .squirrel-history/run-<id>/; that history is recoverable, so the + // pre-stage is intentionally not rolled back on a later planning error. + if err := r.preStageTransfers(sess); err != nil { + return syncproto.PlanResponse{}, fmt.Errorf("pre-stage transfers: %w", err) + } resp := syncproto.PlanResponse{ Dispositions: make([]syncproto.PlanDisposition, 0, len(entries)), } @@ -468,12 +576,16 @@ func (r *peerSyncRouter) planSession(ctx context.Context, sess *peerSession, ent // abort the whole walk. func (r *peerSyncRouter) handlePlanFolders(w http.ResponseWriter, req *http.Request) { var body syncproto.PlanFoldersRequest - if err := decodeJSON(req, &body); err != nil { + if err := decodeJSON(w, req, &body); err != nil { writeError(w, http.StatusBadRequest, err.Error()) return } - sess := r.lookupSession(body.ReceiverRunID) - if sess == nil { + sess, ok, err := r.lookupSession(body.ReceiverRunID, callerNodeName(req)) + if err != nil { + writeError(w, http.StatusForbidden, err.Error()) + return + } + if !ok { writeError(w, http.StatusNotFound, "no session for receiver_run_id") return } @@ -562,8 +674,7 @@ func validateFolderPath(p string) error { if cleaned == ".." 
|| strings.HasPrefix(cleaned, "../") { return errors.New("path escapes the volume root") } - if cleaned == HistoryDirName || strings.HasPrefix(cleaned, HistoryDirName+"/") || - cleaned == ConflictsDirName || strings.HasPrefix(cleaned, ConflictsDirName+"/") { + if isReservedSyncDir(cleaned) { return errors.New("path is under a reserved sync directory") } return nil @@ -591,13 +702,62 @@ func validateRelPath(p string) error { if cleaned == ".." || strings.HasPrefix(cleaned, "../") { return errors.New("path escapes the volume root") } - if cleaned == HistoryDirName || strings.HasPrefix(cleaned, HistoryDirName+"/") || - cleaned == ConflictsDirName || strings.HasPrefix(cleaned, ConflictsDirName+"/") { + if isReservedSyncDir(cleaned) { return errors.New("path is under a reserved sync directory") } return nil } +// isReservedSyncDir reports whether a cleaned, slash-separated path is +// one of the four reserved sync directories or lives under one. The +// allow-list matches the initiator-side filter +// (sync.isReservedSyncPath / isReservedFolderPath) so the receiver +// can't be made to write where the initiator would never publish — +// including .squirrel-restore-history, whose contents are the +// receiver's only pre-restore backup. +func isReservedSyncDir(cleaned string) bool { + for _, dir := range []string{HistoryDirName, ConflictsDirName, RestoreHistoryDirName, IndexDirName} { + if cleaned == dir || strings.HasPrefix(cleaned, dir+"/") { + return true + } + } + return false +} + +// validateEntryOrigin rejects malformed or self-attributed content-origin +// declarations at /plan so they never reach the /close commit: the pair +// must be set together, the run id must be a positive origin-space id, and +// the name must satisfy the node-name rule (it becomes a local nodes row +// on commit). An entry with neither field is a pre-origin-exchange +// initiator and is accepted. 
+// +// An origin naming the receiver's own self node is refused: a peer is +// never authoritative about the receiver's own introductions, so it must +// re-introduce locally with a NULL origin. selfRunCeiling bounds a +// self-named run to the receiver's latest allocated local run id (the +// run committing this session): a legitimate self introduction is a prior +// local run, so a self-named run beyond the ceiling is a forged +// origin-space coordinate that would poison the self component of the +// durability vector. Mirrors GetOrCreatePeerNode's self-name refusal. +func validateEntryOrigin(e syncproto.IndexEntry, selfName string, selfRunCeiling int64) error { + if e.OriginNode == "" && e.OriginRun == 0 { + return nil + } + if e.OriginNode == "" || e.OriginRun <= 0 { + return fmt.Errorf("origin_node %q and origin_run %d must be set together with a positive run id", e.OriginNode, e.OriginRun) + } + if !store.ValidNodeName(e.OriginNode) { + return fmt.Errorf("origin_node %q is not a valid node name", e.OriginNode) + } + if e.OriginNode == selfName { + if e.OriginRun > selfRunCeiling { + return fmt.Errorf("origin_run %d for the receiver's own node %q exceeds the latest local run id %d", e.OriginRun, selfName, selfRunCeiling) + } + return fmt.Errorf("origin_node %q is the receiver's own node; a peer must re-introduce locally with a NULL origin", selfName) + } + return nil +} + // collectConflicts builds the wire-format conflict list from the // post-pre-stage session entries in the order the initiator sent // them. Iterating sess.conflictOrder (a slice) instead of @@ -674,19 +834,23 @@ func (r *peerSyncRouter) classifyMissingPath(ctx context.Context, sess *peerSess return syncproto.DispositionCopyFromExisting, nil } -// dispositionForExisting is the provenance check that distinguishes +// dispositionForExisting is the delivery check that distinguishes // supersede from conflict (per CLAUDE.md "check authoritative state -// first"). Rules: +// first"). 
The deciding question is who *delivered* the receiver's row +// — the local run that first materialised it (first_seen_run_id and +// its runs row's peer linkage) — not the content's origin: +// contents.origin_* carries the global introduction coordinate +// verbatim across hops, so a row forwarded along a chain names a node +// that never spoke to this receiver directly. Rules: // -// - source_node_id IS NULL → local write on receiver → conflict. -// - source_node_id != this initiator → another peer wrote it → -// conflict. -// - source_node_id == this initiator → supersede, provided the -// row's correlated initiator run-id is ≤ the per-(volume, peer) -// watermark. Translating the row's local source_run_id back into -// the initiator's id space requires looking up the receiver-side -// runs row and reading its correlated_run_id (the two columns are -// in different id spaces — receiver-local vs. initiator-local). +// - delivery run has no peer linkage → the row was written by a +// local index/audit run on the receiver → conflict. +// - delivery run's peer != this initiator → another peer delivered +// it → conflict. +// - delivered by this initiator → supersede, provided the delivery +// run's correlated initiator run-id is ≤ the per-(volume, peer) +// watermark (the two run columns are in different id spaces — +// receiver-local vs. initiator-local). // // All three branches can fire in the multi-writer flow: the // receiver may have local writes (a NAS web app dropped a file in, @@ -694,10 +858,14 @@ func (r *peerSyncRouter) classifyMissingPath(ctx context.Context, sess *peerSess // it may carry rows from a different peer that haven't synced // through us yet. 
func (r *peerSyncRouter) dispositionForExisting(ctx context.Context, sess *peerSession, existing store.FileRow) (string, string) { - if !existing.SourceNodeID.Valid { + deliveryRun, err := r.srv.store.GetRun(ctx, existing.FirstSeenRunID) + if err != nil { + return syncproto.DispositionConflict, fmt.Sprintf("delivery run lookup error: %v", err) + } + if !deliveryRun.PeerNodeID.Valid { return syncproto.DispositionConflict, "local write on receiver" } - if existing.SourceNodeID.Int64 != sess.peerNodeID { + if deliveryRun.PeerNodeID.Int64 != sess.peerNodeID { return syncproto.DispositionConflict, "sourced from a different peer" } state, err := r.srv.store.GetPeerSyncState(ctx, sess.volumeID, sess.peerNodeID) @@ -705,23 +873,16 @@ func (r *peerSyncRouter) dispositionForExisting(ctx context.Context, sess *peerS return syncproto.DispositionConflict, fmt.Sprintf("peer_sync_state lookup error: %v", err) } // No watermark yet (first sync with this peer): the only way a - // peer-sourced row materialised is via a prior /close, so trust + // peer-delivered row materialised is via a prior /close, so trust // it and treat as supersede. 
if !state.LastSharedRunID.Valid { return syncproto.DispositionSupersede, "" } - if !existing.SourceRunID.Valid { - return syncproto.DispositionConflict, "peer-sourced row has no run attribution" + if !deliveryRun.CorrelatedRunID.Valid { + return syncproto.DispositionConflict, "delivery run has no correlated initiator id" } - sourceRun, err := r.srv.store.GetRun(ctx, existing.SourceRunID.Int64) - if err != nil { - return syncproto.DispositionConflict, fmt.Sprintf("source run lookup error: %v", err) - } - if !sourceRun.CorrelatedRunID.Valid { - return syncproto.DispositionConflict, "source run has no correlated initiator id" - } - if sourceRun.CorrelatedRunID.Int64 > state.LastSharedRunID.Int64 { - return syncproto.DispositionConflict, "peer attribution newer than the last shared watermark" + if deliveryRun.CorrelatedRunID.Int64 > state.LastSharedRunID.Int64 { + return syncproto.DispositionConflict, "peer delivery newer than the last shared watermark" } return syncproto.DispositionSupersede, "" } @@ -1020,9 +1181,13 @@ func (r *peerSyncRouter) preStageConflicts(ctx context.Context, sess *peerSessio // index-recorded blake3, relabels entry.priorRow (the single source of // truth the wire response also reads) to the digest/size actually on // disk so the preserved content is described truthfully everywhere. -// When the file has already vanished the index values are used -// unchanged: there is nothing to re-hash, and the row still records the -// loser's identity so the prior blake3 stays queryable. +// Drifted bytes are an out-of-band local write, so the relabelled row +// drops the prior content's origin — a fresh contents row for them +// records a local introduction, keeping the durability vector honest +// (the prior origin's coordinate belongs to bytes that no longer +// exist here). 
When the file has already vanished the index values +// are used unchanged: there is nothing to re-hash, and the row still +// records the loser's identity so the prior blake3 stays queryable. func (r *peerSyncRouter) conflictRowForPreStage(sess *peerSession, path, srcAbs, preservedRel string, entry *sessionEntry) (store.FileRow, error) { state, err := rehashSource(srcAbs, entry.priorRow.Blake3) if err != nil { @@ -1032,6 +1197,8 @@ func (r *peerSyncRouter) conflictRowForPreStage(sess *peerSession, path, srcAbs, relabelled := *entry.priorRow relabelled.Blake3 = state.digest relabelled.SizeBytes = state.size + relabelled.OriginNodeID = sql.NullInt64{} + relabelled.OriginRunID = sql.NullInt64{} entry.priorRow = &relabelled } return store.FileRow{ @@ -1047,28 +1214,82 @@ func (r *peerSyncRouter) conflictRowForPreStage(sess *peerSession, path, srcAbs, }, nil } -// priorProvenance lifts the prior row's (source_node_id, source_run_id) -// into a *store.Provenance the way Upsert expects: nil for a local -// write (both NULLs), pointer-carrying for peer-sourced rows. Either -// half being NULL is treated as "local write" — partial provenance is -// a schema-impossible state today, but degrading gracefully here keeps -// the conflict path open if a future migration ever ends up with one. +// priorProvenance lifts the prior row's (origin_node_id, origin_run_id) +// into a *store.Provenance the way Upsert expects: nil for locally +// introduced content (both NULLs), pointer-carrying for peer-sourced +// rows. Either half being NULL is treated as "introduced locally" — +// partial provenance is a schema-impossible state today, but degrading +// gracefully here keeps the conflict path open if a future migration +// ever ends up with one. 
func priorProvenance(r *store.FileRow) *store.Provenance { - if r == nil || !r.SourceNodeID.Valid || !r.SourceRunID.Valid { + if r == nil || !r.OriginNodeID.Valid || !r.OriginRunID.Valid { return nil } - return &store.Provenance{NodeID: r.SourceNodeID.Int64, RunID: r.SourceRunID.Int64} + return &store.Provenance{NodeID: r.OriginNodeID.Int64, RunID: r.OriginRunID.Int64} +} + +// preStageTransfers preserves out-of-band bytes that rclone is about to +// overwrite at a Transfer destination. classify chose Transfer because +// the receiver has no live (present) index row at the path, yet a +// regular file can still exist there (dropped in by a web app, scp, or +// created since the last index). Without this move the upcoming rclone +// copy — which runs with no --backup-dir for node syncs — would destroy +// those bytes with no history. The guard mirrors the one +// preStageCopyFromExisting applies to its own destinations: Lstat, +// then move any regular file into .squirrel-history/run-<receiverRunID>/. +// +// This is a move-only pass (rclone delivers the bytes after /plan +// returns), so a failure aborts the plan with the already-moved files +// left under run-<id>/ — recoverable by the operator, and the next +// /plan replans the same Transfer. 
+func (r *peerSyncRouter) preStageTransfers(sess *peerSession) error { + histRoot := filepath.Join(sess.volume.Path, HistoryDirName, "run-"+strconv.FormatInt(sess.receiverRunID, 10)) + for relPath, entry := range sess.dispositions { + if entry.disposition != syncproto.DispositionTransfer { + continue + } + dstAbs := filepath.Join(sess.volume.Path, relPath) + info, err := os.Lstat(dstAbs) + if err != nil { + if errors.Is(err, os.ErrNotExist) { + continue + } + return fmt.Errorf("stat transfer dst %s: %w", relPath, err) + } + if !info.Mode().IsRegular() { + continue + } + histDst := filepath.Join(histRoot, relPath) + if err := os.MkdirAll(filepath.Dir(histDst), 0o755); err != nil { + return fmt.Errorf("mkdir history for %s: %w", relPath, err) + } + if err := os.Rename(dstAbs, histDst); err != nil { + // Tolerate a concurrent unlink between Lstat and Rename: the + // file we wanted to preserve is gone, so the Transfer can + // proceed against the now-empty destination. Mirrors the same + // race handling in preStageCopyFromExisting and preMoveSupersedes. + if errors.Is(err, os.ErrNotExist) { + continue + } + return fmt.Errorf("preserve out-of-band %s → %s: %w", relPath, histDst, err) + } + } + return nil } // handleVerify implements POST /v1/sync/verify. 
func (r *peerSyncRouter) handleVerify(w http.ResponseWriter, req *http.Request) { var body syncproto.VerifyRequest - if err := decodeJSON(req, &body); err != nil { + if err := decodeJSON(w, req, &body); err != nil { writeError(w, http.StatusBadRequest, err.Error()) return } - sess := r.lookupSession(body.ReceiverRunID) - if sess == nil { + sess, ok, err := r.lookupSession(body.ReceiverRunID, callerNodeName(req)) + if err != nil { + writeError(w, http.StatusForbidden, err.Error()) + return + } + if !ok { writeError(w, http.StatusNotFound, "no session for receiver_run_id") return } @@ -1190,12 +1411,16 @@ func rehashSource(srcAbs string, expected []byte) (onDiskState, error) { // handleClose implements POST /v1/sync/close. func (r *peerSyncRouter) handleClose(w http.ResponseWriter, req *http.Request) { var body syncproto.CloseRequest - if err := decodeJSON(req, &body); err != nil { + if err := decodeJSON(w, req, &body); err != nil { writeError(w, http.StatusBadRequest, err.Error()) return } - sess := r.takeSession(body.ReceiverRunID) - if sess == nil { + sess, ok, err := r.takeSession(body.ReceiverRunID, callerNodeName(req)) + if err != nil { + writeError(w, http.StatusForbidden, err.Error()) + return + } + if !ok { writeError(w, http.StatusNotFound, "no session for receiver_run_id") return } @@ -1243,12 +1468,16 @@ func (r *peerSyncRouter) finalizeFailedClose(ctx context.Context, runID int64, c // Returns the number of file rows the function wrote, distinct from // the original plan size when some paths were dropped due to verify // mismatch. +// +// Each committed row carries its entry's declared content origin — +// the wire-carried (node name, origin-space run) pair mapped to a +// local nodes row — so a forwarded origin survives every hop verbatim. 
func (r *peerSyncRouter) closeSession(ctx context.Context, sess *peerSession, status string, failedPaths []string) (int, error) { skip := make(map[string]struct{}, len(failedPaths)) for _, p := range failedPaths { skip[p] = struct{}{} } - prov := &store.Provenance{NodeID: sess.peerNodeID, RunID: sess.receiverRunID} + origins := newOriginResolver(r.srv.store, sess) committed := 0 for path, entry := range sess.dispositions { if !materializesAtPath(entry.disposition) { @@ -1257,6 +1486,10 @@ func (r *peerSyncRouter) closeSession(ctx context.Context, sess *peerSession, st if _, dropped := skip[path]; dropped { continue } + prov, err := origins.provenance(ctx, entry) + if err != nil { + return committed, fmt.Errorf("resolve origin for %s: %w", path, err) + } row := store.FileRow{ VolumeID: sess.volumeID, Path: path, @@ -1288,11 +1521,48 @@ func (r *peerSyncRouter) closeSession(ctx context.Context, sess *peerSession, st return committed, nil } +// originResolver maps wire-carried origin node names to local nodes +// rows for one /close commit, caching get-or-create lookups so a plan +// full of same-origin entries costs one row resolution. +type originResolver struct { + store *store.Store + sess *peerSession + ids map[string]int64 +} + +func newOriginResolver(s *store.Store, sess *peerSession) *originResolver { + return &originResolver{store: s, sess: sess, ids: make(map[string]int64)} +} + +// provenance returns the contents-origin attribution for one entry: +// the declared origin verbatim when the wire carried one, else the +// pre-origin-exchange fallback — the initiator itself at its declared +// sync run (correlatedRunID is in the initiator's run space, the same +// coordinate system declared origins use). 
+func (o *originResolver) provenance(ctx context.Context, entry *sessionEntry) (*store.Provenance, error) { + if entry.originNode == "" { + return &store.Provenance{NodeID: o.sess.peerNodeID, RunID: o.sess.correlatedRunID}, nil + } + id, ok := o.ids[entry.originNode] + if !ok { + node, err := o.store.GetOrCreateOriginNode(ctx, entry.originNode) + if err != nil { + return nil, err + } + id = node.ID + o.ids[entry.originNode] = id + } + return &store.Provenance{NodeID: id, RunID: entry.originRun}, nil +} + // decodeJSON parses the request body with strict field handling so // initiator typos surface as 400 rather than as a silently-ignored -// field. -func decodeJSON(req *http.Request, v any) error { - dec := json.NewDecoder(req.Body) +// field. The body is wrapped in http.MaxBytesReader at maxPlanBodyBytes +// so a token-holding peer can't OOM the agent with one huge body +// (#110c); a body over the cap surfaces as a decode error the handler +// returns as 400. +func decodeJSON(w http.ResponseWriter, req *http.Request, v any) error { + dec := json.NewDecoder(http.MaxBytesReader(w, req.Body, maxPlanBodyBytes)) dec.DisallowUnknownFields() if err := dec.Decode(v); err != nil { return fmt.Errorf("decode body: %w", err) diff --git a/agent/sync_test.go b/agent/sync_test.go index 9f01dcd..89e3684 100644 --- a/agent/sync_test.go +++ b/agent/sync_test.go @@ -1,8 +1,12 @@ package agent import ( + "bytes" "context" "encoding/hex" + "encoding/json" + "net/http" + "net/http/httptest" "os" "path/filepath" "strconv" @@ -26,13 +30,14 @@ func blakeHex(b []byte) string { // seed the index and the volume tree, then call plan with one initiator // entry per path. 
type preStageFixture struct { - router *peerSyncRouter - store *store.Store - vol *config.Volume - volID int64 - peerID int64 - peerRun int64 - recvRun int64 + router *peerSyncRouter + store *store.Store + vol *config.Volume + volID int64 + peerID int64 + peerRun int64 + localRun int64 + recvRun int64 } // newPreStageFixture builds the fixture: a fresh volume directory, a @@ -67,18 +72,29 @@ func newPreStageFixture(t *testing.T) *preStageFixture { if err := srv.store.UpsertPeerSyncState(ctx, v.ID, peer.ID, correlated, false); err != nil { t.Fatalf("UpsertPeerSyncState: %v", err) } + // A finished local index run for rows that must classify as + // receiver-local writes (delivery is judged by the first-seen run's + // peer linkage, and an index run has none). + localRun, err := srv.store.BeginIndexRun(ctx, store.RunKindIndex, v.ID, false) + if err != nil { + t.Fatalf("BeginIndexRun: %v", err) + } + if err := srv.store.FinishRun(ctx, localRun, store.RunStatusSuccess, "", 0); err != nil { + t.Fatalf("FinishRun local: %v", err) + } recvRun, err := srv.store.BeginPeerSyncRun(ctx, v.ID, peer.ID, correlated+1, "peerA") if err != nil { t.Fatalf("BeginPeerSyncRun (receiver): %v", err) } return &preStageFixture{ - router: srv.router, - store: srv.store, - vol: vol, - volID: v.ID, - peerID: peer.ID, - peerRun: peerRun, - recvRun: recvRun, + router: srv.router, + store: srv.store, + vol: vol, + volID: v.ID, + peerID: peer.ID, + peerRun: peerRun, + localRun: localRun, + recvRun: recvRun, } } @@ -223,8 +239,8 @@ func TestPreStageReHashRelabelsConflictDrift(t *testing.T) { contentY := []byte("out-of-band replacement bytes Y Y Y") contentZ := []byte("incoming initiator bytes Z") - // Seed a present row with NO provenance (local write) so classify - // returns Conflict rather than Supersede. + // Seed a present row first-seen by a local index run (no peer + // linkage) so classify returns Conflict rather than Supersede. 
abs := filepath.Join(f.vol.Path, "local.txt") if err := os.WriteFile(abs, contentX, 0o644); err != nil { t.Fatalf("write: %v", err) @@ -238,8 +254,8 @@ func TestPreStageReHashRelabelsConflictDrift(t *testing.T) { SizeBytes: int64(len(contentX)), MtimeNs: store.NowNs(), Status: store.StatusPresent, - FirstSeenRunID: f.peerRun, - LastSeenRunID: f.peerRun, + FirstSeenRunID: f.localRun, + LastSeenRunID: f.localRun, IndexedAtNs: store.NowNs(), }, nil); err != nil { t.Fatalf("Upsert local: %v", err) @@ -260,6 +276,9 @@ func TestPreStageReHashRelabelsConflictDrift(t *testing.T) { if len(plan.Conflicts) != 1 { t.Fatalf("conflicts = %d, want 1: %+v", len(plan.Conflicts), plan.Conflicts) } + if plan.Conflicts[0].Reason != "local write on receiver" { + t.Fatalf("reason = %q, want 'local write on receiver' (classify-time conflict, not the drift downgrade)", plan.Conflicts[0].Reason) + } preservedRel := plan.Conflicts[0].PreservedAtPath row, err := f.store.GetByPath(ctx, f.volID, preservedRel) if err != nil { @@ -313,3 +332,388 @@ func TestPreStageSupersedeNoDriftStillBuckets(t *testing.T) { t.Fatalf("conflicts dir exists for a clean supersede (err=%v)", err) } } + +// TestCloseRecordsVerbatimOrigin: a /plan entry declaring a forwarded +// origin lands on the contents row exactly as declared — the origin +// node name is resolved (created) locally and the origin-space run id +// is stored untranslated. The receiver has never peered with "delta"; +// only the forwarding initiator has. 
+func TestCloseRecordsVerbatimOrigin(t *testing.T) { + ctx := context.Background() + f := newPreStageFixture(t) + + content := []byte("forwarded bytes") + sess := f.newSession() + if _, err := f.router.planSession(ctx, sess, []syncproto.IndexEntry{ + {Path: "fwd.txt", Blake3Hex: blakeHex(content), SizeBytes: int64(len(content)), + OriginNode: "delta", OriginRun: 5}, + }); err != nil { + t.Fatalf("planSession: %v", err) + } + if _, err := f.router.closeSession(ctx, sess, store.RunStatusSuccess, nil); err != nil { + t.Fatalf("closeSession: %v", err) + } + + row, err := f.store.GetByPath(ctx, f.volID, "fwd.txt") + if err != nil { + t.Fatalf("GetByPath fwd.txt: %v", err) + } + delta, err := f.store.GetNodeByName(ctx, "delta") + if err != nil { + t.Fatalf("origin node row for delta was not created: %v", err) + } + if !row.OriginNodeID.Valid || row.OriginNodeID.Int64 != delta.ID { + t.Fatalf("OriginNodeID = %+v, want delta's row %d (verbatim, not the initiator %d)", + row.OriginNodeID, delta.ID, f.peerID) + } + if !row.OriginRunID.Valid || row.OriginRunID.Int64 != 5 { + t.Fatalf("OriginRunID = %+v, want 5 (origin run space, untranslated)", row.OriginRunID) + } +} + +// TestCloseFallbackAttributesToInitiator: an entry without the origin +// pair (a pre-origin-exchange initiator) is attributed to the +// initiator itself at its declared sync run — the initiator-run-space +// coordinate the session already carries. 
+func TestCloseFallbackAttributesToInitiator(t *testing.T) { + ctx := context.Background() + f := newPreStageFixture(t) + + content := []byte("legacy entry without origin") + sess := f.newSession() + if _, err := f.router.planSession(ctx, sess, []syncproto.IndexEntry{ + {Path: "old.txt", Blake3Hex: blakeHex(content), SizeBytes: int64(len(content))}, + }); err != nil { + t.Fatalf("planSession: %v", err) + } + if _, err := f.router.closeSession(ctx, sess, store.RunStatusSuccess, nil); err != nil { + t.Fatalf("closeSession: %v", err) + } + + row, err := f.store.GetByPath(ctx, f.volID, "old.txt") + if err != nil { + t.Fatalf("GetByPath old.txt: %v", err) + } + if !row.OriginNodeID.Valid || row.OriginNodeID.Int64 != f.peerID { + t.Fatalf("OriginNodeID = %+v, want the initiator %d", row.OriginNodeID, f.peerID) + } + if !row.OriginRunID.Valid || row.OriginRunID.Int64 != sess.correlatedRunID { + t.Fatalf("OriginRunID = %+v, want the initiator's sync run %d", row.OriginRunID, sess.correlatedRunID) + } +} + +// TestPlanRejectsMalformedOrigin: half-declared or invalid origins are +// refused at /plan, before any bytes move or rows commit. 
+func TestPlanRejectsMalformedOrigin(t *testing.T) { + ctx := context.Background() + f := newPreStageFixture(t) + content := []byte("x") + + cases := []struct { + name string + entry syncproto.IndexEntry + }{ + {"node without run", syncproto.IndexEntry{ + Path: "a.txt", Blake3Hex: blakeHex(content), OriginNode: "delta"}}, + {"run without node", syncproto.IndexEntry{ + Path: "a.txt", Blake3Hex: blakeHex(content), OriginRun: 9}}, + {"negative run", syncproto.IndexEntry{ + Path: "a.txt", Blake3Hex: blakeHex(content), OriginNode: "delta", OriginRun: -1}}, + {"invalid name", syncproto.IndexEntry{ + Path: "a.txt", Blake3Hex: blakeHex(content), OriginNode: "../up", OriginRun: 9}}, + } + for _, c := range cases { + t.Run(c.name, func(t *testing.T) { + sess := f.newSession() + if _, err := f.router.planSession(ctx, sess, []syncproto.IndexEntry{c.entry}); err == nil { + t.Fatalf("planSession accepted %+v, want error", c.entry) + } + }) + } +} + +// TestConflictDriftRelabelClearsOrigin: when the conflict pre-stage +// finds drifted bytes, the preserved row describes a fresh, locally +// introduced content — the prior content's origin coordinate must not +// be copied onto bytes that never travelled from that origin, or the +// durability vector would vouch for content no destination holds. 
+func TestConflictDriftRelabelClearsOrigin(t *testing.T) { + ctx := context.Background() + f := newPreStageFixture(t) + + contentX := []byte("peer-delivered bytes") + contentY := []byte("out-of-band replacement") + contentZ := []byte("incoming initiator bytes") + f.seedIndexedFile(t, ctx, "doc.md", contentX) + if err := os.WriteFile(filepath.Join(f.vol.Path, "doc.md"), contentY, 0o644); err != nil { + t.Fatalf("drift write: %v", err) + } + + sess := f.newSession() + plan, err := f.router.planSession(ctx, sess, []syncproto.IndexEntry{ + {Path: "doc.md", Blake3Hex: blakeHex(contentZ), SizeBytes: int64(len(contentZ))}, + }) + if err != nil { + t.Fatalf("planSession: %v", err) + } + if len(plan.Conflicts) != 1 { + t.Fatalf("conflicts = %d, want 1", len(plan.Conflicts)) + } + row, err := f.store.GetByPath(ctx, f.volID, plan.Conflicts[0].PreservedAtPath) + if err != nil { + t.Fatalf("GetByPath preserved: %v", err) + } + if row.OriginNodeID.Valid || row.OriginRunID.Valid { + t.Fatalf("preserved drifted row origin = (%+v, %+v), want NULLs (local introduction)", + row.OriginNodeID, row.OriginRunID) + } +} + +// TestPlanRejectsSelfAttributedOrigin (#105): a peer is never +// authoritative about the receiver's own introductions, so an origin +// naming the receiver's self node is refused at /plan — whether it +// carries a plausible run id (a peer must re-introduce locally with a +// NULL origin) or an absurd one above the receiver's latest allocated +// run id (the durability-vector poisoning attack). A legitimate +// forwarded third-party origin under the same plan still commits. 
+func TestPlanRejectsSelfAttributedOrigin(t *testing.T) { + ctx := context.Background() + f := newPreStageFixture(t) + self, err := f.store.GetSelfNode(ctx) + if err != nil { + t.Fatalf("GetSelfNode: %v", err) + } + content := []byte("incoming bytes") + + t.Run("self name with plausible run refused", func(t *testing.T) { + sess := f.newSession() + _, err := f.router.planSession(ctx, sess, []syncproto.IndexEntry{ + {Path: "a.txt", Blake3Hex: blakeHex(content), SizeBytes: int64(len(content)), + OriginNode: self.Name, OriginRun: 1}, + }) + if err == nil { + t.Fatalf("planSession accepted a self-attributed origin, want refusal") + } + }) + + t.Run("self name with absurd run refused", func(t *testing.T) { + sess := f.newSession() + _, err := f.router.planSession(ctx, sess, []syncproto.IndexEntry{ + {Path: "a.txt", Blake3Hex: blakeHex(content), SizeBytes: int64(len(content)), + OriginNode: self.Name, OriginRun: sess.receiverRunID + 1_000_000}, + }) + if err == nil { + t.Fatalf("planSession accepted an absurd self origin_run, want refusal") + } + }) + + t.Run("forwarded third-party origin accepted", func(t *testing.T) { + sess := f.newSession() + if _, err := f.router.planSession(ctx, sess, []syncproto.IndexEntry{ + {Path: "fwd.txt", Blake3Hex: blakeHex(content), SizeBytes: int64(len(content)), + OriginNode: "delta", OriginRun: 7}, + }); err != nil { + t.Fatalf("planSession refused a legitimate third-party origin: %v", err) + } + if _, err := f.router.closeSession(ctx, sess, store.RunStatusSuccess, nil); err != nil { + t.Fatalf("closeSession: %v", err) + } + row, err := f.store.GetByPath(ctx, f.volID, "fwd.txt") + if err != nil { + t.Fatalf("GetByPath fwd.txt: %v", err) + } + delta, err := f.store.GetNodeByName(ctx, "delta") + if err != nil { + t.Fatalf("origin node delta not created: %v", err) + } + if !row.OriginNodeID.Valid || row.OriginNodeID.Int64 != delta.ID { + t.Fatalf("OriginNodeID = %+v, want delta's row %d", row.OriginNodeID, delta.ID) + } + if 
!row.OriginRunID.Valid || row.OriginRunID.Int64 != 7 { + t.Fatalf("OriginRunID = %+v, want 7 (origin run space, untranslated)", row.OriginRunID) + } + }) +} + +// TestPreStageTransferPreservesOutOfBandFile (#106a): a regular file +// dropped at a Transfer destination out-of-band (no live index row) must +// be moved into .squirrel-history/run-<id>/ before the rclone pass +// overwrites it, since node syncs run without --backup-dir. The destination +// is freed for the incoming bytes and the prior bytes stay recoverable. +func TestPreStageTransferPreservesOutOfBandFile(t *testing.T) { + ctx := context.Background() + f := newPreStageFixture(t) + + outOfBand := []byte("bytes a web app dropped in, never indexed") + incoming := []byte("the initiator's incoming content") + abs := filepath.Join(f.vol.Path, "drop.bin") + if err := os.WriteFile(abs, outOfBand, 0o644); err != nil { + t.Fatalf("write out-of-band file: %v", err) + } + + sess := f.newSession() + plan, err := f.router.planSession(ctx, sess, []syncproto.IndexEntry{ + {Path: "drop.bin", Blake3Hex: blakeHex(incoming), SizeBytes: int64(len(incoming))}, + }) + if err != nil { + t.Fatalf("planSession: %v", err) + } + if got := sess.dispositions["drop.bin"].disposition; got != syncproto.DispositionTransfer { + t.Fatalf("disposition = %q, want transfer", got) + } + if len(plan.Conflicts) != 0 { + t.Fatalf("conflicts = %d, want 0 (a plain out-of-band file is history, not a conflict)", len(plan.Conflicts)) + } + + // The destination is now free for rclone to deliver the incoming bytes. + if _, err := os.Lstat(abs); !os.IsNotExist(err) { + t.Fatalf("Lstat drop.bin err = %v, want the destination cleared for the transfer", err) + } + + // The prior bytes are preserved verbatim under this run's history dir. 
+ histPath := filepath.Join(f.vol.Path, HistoryDirName, "run-"+strconv.FormatInt(f.recvRun, 10), "drop.bin") + got, err := os.ReadFile(histPath) + if err != nil { + t.Fatalf("read preserved history file: %v", err) + } + if string(got) != string(outOfBand) { + t.Fatalf("preserved bytes = %q, want the out-of-band bytes", got) + } +} + +// TestValidateRelPathRejectsAllReservedDirs (#106b): the receiver's wire +// path allow-list must reject all four reserved sync directories, matching +// the initiator-side filter. A path under .squirrel-restore-history or +// .squirrel-index could otherwise let a peer overwrite the receiver's only +// pre-restore backup or its index ride-along. +func TestValidateRelPathRejectsAllReservedDirs(t *testing.T) { + reserved := []string{ + HistoryDirName + "/run-1/x", + ConflictsDirName + "/run-1/x", + RestoreHistoryDirName + "/run-1/x", + IndexDirName + "/index.db", + RestoreHistoryDirName, + IndexDirName, + } + for _, p := range reserved { + t.Run(p, func(t *testing.T) { + if err := validateRelPath(p); err == nil { + t.Fatalf("validateRelPath(%q) = nil, want a reserved-dir rejection", p) + } + if err := validateFolderPath(p); err == nil { + t.Fatalf("validateFolderPath(%q) = nil, want a reserved-dir rejection", p) + } + }) + } + if err := validateRelPath("photos/2024/img.jpg"); err != nil { + t.Fatalf("validateRelPath rejected an ordinary path: %v", err) + } +} + +// TestSessionBoundToCaller (#110a): a phase call presenting a caller +// identity that differs from the node that opened the session is refused. +// The single shared agent token carries no per-request identity yet +// (#110d), so the production phase handlers pass "" (no binding) — this +// exercises the binding directly to prove the chokepoint is correct for +// when per-peer tokens make a caller identity recoverable. 
+func TestSessionBoundToCaller(t *testing.T) { + f := newPreStageFixture(t) + r := f.router + r.storeSession(&peerSession{ + receiverRunID: f.recvRun, + volume: f.vol, + volumeID: f.volID, + peerNodeID: f.peerID, + initiatorNodeName: "owner", + dispositions: make(map[string]*sessionEntry), + }) + + if _, _, err := r.lookupSession(f.recvRun, "intruder"); err == nil { + t.Fatalf("lookupSession bound to a foreign caller, want %v", errSessionCallerMismatch) + } + if sess, ok, err := r.lookupSession(f.recvRun, "owner"); err != nil || !ok || sess == nil { + t.Fatalf("lookupSession for the owning caller = (%v, %v, %v), want the session", sess, ok, err) + } + if sess, ok, err := r.lookupSession(f.recvRun, ""); err != nil || !ok || sess == nil { + t.Fatalf("lookupSession with no caller identity = (%v, %v, %v), want the session (pre-#110d)", sess, ok, err) + } + // A foreign caller must not be able to take (and thereby abort) the + // session: it stays in place for the legitimate owner. + if _, _, err := r.takeSession(f.recvRun, "intruder"); err == nil { + t.Fatalf("takeSession bound to a foreign caller, want %v", errSessionCallerMismatch) + } + if sess, ok, err := r.takeSession(f.recvRun, "owner"); err != nil || !ok || sess == nil { + t.Fatalf("takeSession for the owning caller = (%v, %v, %v), want the session removed", sess, ok, err) + } +} + +// TestPlanRejectsOversizedBody (#110c): /plan wraps the request body in +// http.MaxBytesReader, so a body past the cap is refused with 400 before +// it can be buffered into memory — a token-holding peer can't OOM the +// agent with one huge body. A separate len(Entries) cap guards against an +// entry count that stays within the byte ceiling. 
+func TestPlanRejectsOversizedBody(t *testing.T) { + vol := &config.Volume{Name: "pics", Path: t.TempDir()} + srv := newTestServer(t, Config{Volumes: map[string]*config.Volume{vol.Name: vol}}) + + t.Run("body over the byte cap", func(t *testing.T) { + prev := maxPlanBodyBytes + maxPlanBodyBytes = 64 + defer func() { maxPlanBodyBytes = prev }() + + body := append([]byte(`{"receiver_run_id":1,"entries":[`), bytes.Repeat([]byte(" "), 256)...) + body = append(body, ']', '}') + if code := postRaw(t, srv, "/v1/sync/plan", body); code != http.StatusBadRequest { + t.Fatalf("status = %d, want 400 for an over-cap body", code) + } + }) + + t.Run("entry count over the cap", func(t *testing.T) { + prev := maxPlanEntries + maxPlanEntries = 1 + defer func() { maxPlanEntries = prev }() + + req := syncproto.PlanRequest{ + ReceiverRunID: 1, + Entries: []syncproto.IndexEntry{ + {Path: "a.txt", Blake3Hex: blakeHex([]byte("a"))}, + {Path: "b.txt", Blake3Hex: blakeHex([]byte("b"))}, + }, + } + encoded, err := json.Marshal(req) + if err != nil { + t.Fatalf("marshal: %v", err) + } + if code := postRaw(t, srv, "/v1/sync/plan", encoded); code != http.StatusBadRequest { + t.Fatalf("status = %d, want 400 for an over-cap entry count", code) + } + }) +} + +// TestPeerEndpointIgnoresWireEndpoint (#110b): the receiver derives the +// peer-row endpoint from the node name alone, never the unauthenticated +// InitiatorEndpoint, so a peer cannot bind an arbitrary dial-back URL at +// /begin. A real endpoint is bound only by operator config on the +// initiator side. 
+func TestPeerEndpointIgnoresWireEndpoint(t *testing.T) { + got := peerEndpoint(syncproto.BeginRequest{ + InitiatorNodeName: "owner", + InitiatorEndpoint: "https://attacker.example:8443", + }) + if got != "peer://owner" { + t.Fatalf("peerEndpoint = %q, want the name-derived placeholder peer://owner", got) + } +} + +// postRaw POSTs body verbatim to urlPath with the test bearer token and +// returns the HTTP status, so a malformed or oversized body can be driven +// without the typed marshal helpers rejecting it first. +func postRaw(t *testing.T, srv *Server, urlPath string, body []byte) int { + t.Helper() + req := httptest.NewRequest(http.MethodPost, urlPath, bytes.NewReader(body)) + req.Header.Set("Authorization", "Bearer test-token") + req.Header.Set("Content-Type", "application/json") + rec := httptest.NewRecorder() + srv.Handler().ServeHTTP(rec, req) + return rec.Code +} diff --git a/cmd/squirrel-desktop/app/handlers/runs.go b/cmd/squirrel-desktop/app/handlers/runs.go index 7ac8cbe..c32a7b7 100644 --- a/cmd/squirrel-desktop/app/handlers/runs.go +++ b/cmd/squirrel-desktop/app/handlers/runs.go @@ -161,7 +161,7 @@ func (h *Runs) StartSync(w http.ResponseWriter, r *http.Request) { return } - pair, rcl, err := h.resolveSyncTarget(r.Context(), name, dest) + pair, tools, err := h.resolveSyncTarget(r.Context(), name, dest) if err != nil { log.Printf("desktop: sync %s → %s: %v", name, dest, err) http.Redirect(w, r, "/runs", http.StatusSeeOther) @@ -178,7 +178,7 @@ func (h *Runs) StartSync(w http.ResponseWriter, r *http.Request) { priorMax = h.latestSyncRunID(r.Context(), v.ID, dest) } - go h.runSyncGoroutine(name, dest, pair, rcl) + go h.runSyncGoroutine(name, dest, pair, tools) if volumeID == 0 { // No volumes row yet means the volume has never been @@ -225,26 +225,30 @@ func (h *Runs) runIndexGoroutine(path, name string) { } // resolveSyncTarget converts the validated (name, dest) into the -// concrete sync.Pair plus a configured Rclone, isolating the +// concrete 
sync.Pair plus the configured tool wrappers, isolating the // fail-fast surface that should redirect to /runs from the // validate-and-redirect-to-detail flow in StartSync. -func (h *Runs) resolveSyncTarget(ctx context.Context, name, dest string) (syncpkg.Pair, *syncpkg.Rclone, error) { +func (h *Runs) resolveSyncTarget(ctx context.Context, name, dest string) (syncpkg.Pair, syncpkg.Tools, error) { pairs, err := syncpkg.PairsFor(h.Config, name, dest) if err != nil { - return syncpkg.Pair{}, nil, fmt.Errorf("pairs: %w", err) + return syncpkg.Pair{}, syncpkg.Tools{}, fmt.Errorf("pairs: %w", err) } rcl, err := h.prepareRclone(ctx) if err != nil { - return syncpkg.Pair{}, nil, fmt.Errorf("prepare rclone: %w", err) + return syncpkg.Pair{}, syncpkg.Tools{}, fmt.Errorf("prepare rclone: %w", err) } - return pairs[0], rcl, nil + tools, err := syncpkg.ToolsFor(h.Config, pairs[:1], rcl) + if err != nil { + return syncpkg.Pair{}, syncpkg.Tools{}, err + } + return pairs[0], tools, nil } // runSyncGoroutine is the body of the background sync goroutine. It // uses a fresh context.Background() because the sync may outlive the // HTTP request that kicked it off, and surfaces non-success outcomes // via the request log — the runs table carries the durable state. 
-func (h *Runs) runSyncGoroutine(name, dest string, pair syncpkg.Pair, rcl *syncpkg.Rclone) { +func (h *Runs) runSyncGoroutine(name, dest string, pair syncpkg.Pair, tools syncpkg.Tools) { ctx := context.Background() var runID int64 opts := syncpkg.Options{ @@ -263,7 +267,7 @@ func (h *Runs) runSyncGoroutine(name, dest string, pair syncpkg.Pair, rcl *syncp h.hub.Close(runID) } }() - rep, err := syncpkg.RunPair(ctx, h.Store, rcl, pair, opts) + rep, err := syncpkg.RunPair(ctx, h.Store, tools, pair, opts) switch { case err != nil: log.Printf("desktop: sync %s → %s: %v", name, dest, err) @@ -287,9 +291,14 @@ func (h *Runs) prepareRclone(ctx context.Context) (*syncpkg.Rclone, error) { if err != nil { return nil, err } - // Shallow=false matches CLI defaults — the desktop trigger runs - // the full integrity-checking sync, not a fast path. - if err := syncpkg.EnsureMinVersion(ctx, rcl, io.Discard, false); err != nil { + // Desktop triggers run the full integrity-checking sync (the CLI + // default, Shallow=false); the pairs scope the version preflight to + // the blake3 hashing the configured targets will actually use. + pairs, err := syncpkg.PairsFor(h.Config, "", "") + if err != nil { + return nil, err + } + if err := syncpkg.EnsureMinVersion(ctx, rcl, io.Discard, syncpkg.ShallowForPairs(pairs, false)); err != nil { return nil, err } confPath := filepath.Join(filepath.Dir(h.Config.Path), "rclone.conf") diff --git a/cmd/squirrel/agent.go b/cmd/squirrel/agent.go index d4a5ff5..943be49 100644 --- a/cmd/squirrel/agent.go +++ b/cmd/squirrel/agent.go @@ -132,12 +132,12 @@ func resolveAgentDBPath(cmd *cobra.Command, cfg *config.Config) (string, error) // lookup so a host without rclone installed can still run the agent // for its peer-sync surface or its index cadences. // -// EnsureMinVersion runs with shallow=false because scheduled syncs go -// through sync.RunPair with the default sync.Options{} (Shallow=false) -// — i.e.
they will pass `--hash blake3`, which is only available in -// rclone ≥ MinRcloneVersion. Failing here means the operator gets a -// clear startup error rather than a midnight pager when the first -// scheduled sync fires and rclone rejects the flag. +// The version preflight mirrors what scheduled syncs will invoke: they +// run with the default sync.Options{} (Shallow=false), so `--hash blake3` +// requires rclone ≥ MinRcloneVersion unless every configured target is a +// crypt destination, which forces shallow. Failing here means the +// operator gets a clear startup error rather than a midnight pager when +// the first scheduled sync fires and rclone rejects the flag. func resolveSchedulerRclone(cmd *cobra.Command, cfg *config.Config) (*sync.Rclone, error) { if !anyVolumeNeedsScheduledSync(cfg) { return nil, nil @@ -146,7 +146,11 @@ func resolveSchedulerRclone(cmd *cobra.Command, cfg *config.Config) (*sync.Rclon if err != nil { return nil, fmt.Errorf("scheduler needs rclone for scheduled syncs: %w", err) } - if err := sync.EnsureMinVersion(cmd.Context(), rcl, cmd.ErrOrStderr(), false); err != nil { + pairs, err := sync.PairsFor(cfg, "", "") + if err != nil { + return nil, fmt.Errorf("scheduler rclone preflight: %w", err) + } + if err := sync.EnsureMinVersion(cmd.Context(), rcl, cmd.ErrOrStderr(), sync.ShallowForPairs(pairs, false)); err != nil { return nil, fmt.Errorf("scheduler rclone preflight: %w", err) } if _, err := rcl.WriteRcloneConfig(rcloneConfigPathFor(cfg), cfg.Destinations); err != nil { @@ -180,6 +184,13 @@ func buildSchedulerSyncRunner(cfg *config.Config, s *store.Store, rcl *sync.Rclo if err != nil { return agent.SyncRunReport{Err: err} } + // Per-kick because the kopia lookup belongs to the kicks that + // target a kopia destination: a host whose schedule never + // touches one runs fine without the binary installed. 
+ tools, err := sync.ToolsFor(cfg, []sync.Pair{pair}, rcl) + if err != nil { + return agent.SyncRunReport{Err: err} + } // Snapshot-on-sync fires on each node's scheduled syncs too (#75): // this is the operating cadence the catalog churns on. Each kick is // a single pair, so a fresh Snapshotter per kick is the right unit. @@ -187,7 +198,7 @@ func buildSchedulerSyncRunner(cfg *config.Config, s *store.Store, rcl *sync.Rclo if cfg.Backups.Enabled { opts.Snapshot = sync.NewSnapshotter(s, rcl, snapshotConfig(cfg, s.Path())) } - rep, runErr := sync.RunPair(ctx, s, rcl, pair, opts) + rep, runErr := sync.RunPair(ctx, s, tools, pair, opts) return agent.SyncRunReport{RunID: rep.RunID, Status: rep.Status, Err: runErr} } } diff --git a/cmd/squirrel/db.go b/cmd/squirrel/db.go index 8e9ac4e..263771f 100644 --- a/cmd/squirrel/db.go +++ b/cmd/squirrel/db.go @@ -171,20 +171,85 @@ func runDBRestore(cmd *cobra.Command, snapshotPath string, force bool) error { } } - // Atomic-ish swap: rename the snapshot over the live DB and remove - // any stale -wal/-shm sidecars. os.Rename is atomic within one - // filesystem; if the snapshot and the live DB are on different - // filesystems the rename will fail and the user can copy first. + preRestore, err := preserveLiveDB(liveAbs) + if err != nil { + return err + } + + // os.Rename is atomic within one filesystem; if the snapshot and the + // live DB are on different filesystems the rename fails and the user + // can copy first. preserveLiveDB has already moved the live DB and its + // sidecars aside, so no stale -wal can attach to the incoming snapshot. + // On failure, move the preserved DB back so the command doesn't strand + // the install with a missing live DB. 
if err := os.Rename(snapshotAbs, liveAbs); err != nil { + if preRestore != "" { + rollbackLiveDB(preRestore, liveAbs) + } return fmt.Errorf("replace live DB with snapshot: %w", err) } - for _, suffix := range []string{"-wal", "-shm"} { - _ = os.Remove(liveAbs + suffix) - } fmt.Fprintf(cmd.OutOrStdout(), "restored %s from %s\n", liveAbs, snapshotAbs) + if preRestore != "" { + fmt.Fprintf(cmd.OutOrStdout(), "preserved prior live DB at %s\n", preRestore) + } return nil } +// dbSidecarSuffixes are SQLite's WAL-mode side files. A stale one left +// beside the live path would replay into whatever DB takes that path next. +var dbSidecarSuffixes = []string{"-wal", "-shm"} + +// preserveLiveDB clears the live database path before a restore overwrites +// it. The main file (when present) is renamed aside to +// "<liveAbs>.pre-restore-<unixnano>" so a wrong restore is recoverable, +// and its -wal/-shm sidecars move with it, keeping the preserved copy's +// full state. Any sidecar without a main file (an orphan from a crash or +// manual move) is removed instead — leaving it would let the incoming +// snapshot replay a stale WAL. The timestamp follows how snapshot +// filenames are stamped elsewhere. Returns the preserved main-file path, +// or "" when no main live DB existed. +// +// Preserved copies are kept indefinitely — no rotation matches their +// name — so an operator who restores repeatedly reaps them by hand. 
+func preserveLiveDB(liveAbs string) (string, error) { + mainExists := true + if _, err := os.Stat(liveAbs); err != nil { + if !os.IsNotExist(err) { + return "", fmt.Errorf("stat live db: %w", err) + } + mainExists = false + } + if !mainExists { + for _, suffix := range dbSidecarSuffixes { + if err := os.Remove(liveAbs + suffix); err != nil && !os.IsNotExist(err) { + return "", fmt.Errorf("remove orphan %s sidecar: %w", suffix, err) + } + } + return "", nil + } + preRestore := fmt.Sprintf("%s.pre-restore-%d", liveAbs, time.Now().UTC().UnixNano()) + if err := os.Rename(liveAbs, preRestore); err != nil { + return "", fmt.Errorf("preserve live DB: %w", err) + } + for _, suffix := range dbSidecarSuffixes { + if err := os.Rename(liveAbs+suffix, preRestore+suffix); err != nil && !os.IsNotExist(err) { + rollbackLiveDB(preRestore, liveAbs) + return "", fmt.Errorf("preserve live DB %s sidecar: %w", suffix, err) + } + } + return preRestore, nil +} + +// rollbackLiveDB best-effort moves a preserved DB (and its sidecars) back +// to liveAbs after a later step fails, so a failed restore leaves the live +// DB where it started rather than stranded at the pre-restore path. +func rollbackLiveDB(preRestore, liveAbs string) { + _ = os.Rename(preRestore, liveAbs) + for _, suffix := range dbSidecarSuffixes { + _ = os.Rename(preRestore+suffix, liveAbs+suffix) + } +} + // defaultBackupsDir returns the parent directory squirrel uses for its // own backups, mirroring store.defaultBackupDir but lifted into the CLI // so subcommands don't have to open the store first to derive it. @@ -196,7 +261,10 @@ func defaultBackupsDir(dbPath string) string { // rotateBackups deletes the oldest snapshots in dir until only `keep` // remain. Snapshots are identified by the index-* and pre-migration-* // filename prefixes the store and CLI write — unknown files are left -// alone so we never delete something we didn't put there. +// alone so we never delete something we didn't put there. 
This is the +// explicit, operator-driven `db backup --keep` retention, so it does +// include pre-migration-* snapshots; the routine snapshot-on-sync +// rotation (sync.rotateSnapshots) exempts them. func rotateBackups(dir string, keep int) ([]string, error) { if keep <= 0 { return nil, nil diff --git a/cmd/squirrel/db_test.go b/cmd/squirrel/db_test.go index df2d363..3609527 100644 --- a/cmd/squirrel/db_test.go +++ b/cmd/squirrel/db_test.go @@ -92,6 +92,140 @@ func TestCLIDBRestoreSwapsLiveDB(t *testing.T) { } } +// TestCLIDBRestorePreservesPriorLiveDB is the #111a guard: a restore +// renames the prior live DB aside (recoverable) and prints its path. The +// preserved copy still carries the mutation the snapshot predates, so a +// wrong restore can be rolled back by moving it back. +func TestCLIDBRestorePreservesPriorLiveDB(t *testing.T) { + f := writeSyncFixture(t) + + snap := filepath.Join(t.TempDir(), "before.db") + runCLI(t, "--config", f.configPath, "db", "backup", "--to", snap) + + writeTestFile(t, filepath.Join(f.volumeDir, "a.txt"), "alpha") + runCLI(t, "--config", f.configPath, "index", f.volumeName) + + out := runCLI(t, "--config", f.configPath, "db", "restore", snap) + preserved := parsePreservedPath(t, out) + if !strings.HasPrefix(filepath.Base(preserved), "index.db.pre-restore-") { + t.Fatalf("preserved path %q lacks the pre-restore- stem", preserved) + } + if _, err := os.Stat(preserved); err != nil { + t.Fatalf("preserved prior live DB missing at reported path %q: %v", preserved, err) + } + + // The preserved copy still has the post-snapshot volume row, so the + // restore is reversible: swap it back and the volume reappears. 
+ if err := os.Rename(preserved, f.dbPath); err != nil { + t.Fatalf("roll back to preserved DB: %v", err) + } + listing := runCLI(t, "--config", f.configPath, "volumes") + if !strings.Contains(listing, f.volumeName) { + t.Fatalf("rolled-back DB lost the volume row; restore was not reversible:\n%s", listing) + } +} + +// TestCLIDBRestoreClearsLiveSidecarsBeforeRename is the #111b guard: any +// -wal/-shm beside the live DB is moved aside with it before the snapshot +// is renamed in, so no stale WAL can be replayed into the restored +// snapshot. We pre-seed sidecars at the live path and restore with +// --force (the clean-open probe would otherwise checkpoint them away), +// then assert none remain at the live path while the preserved copy +// carries them — proving the clearing happens at the rename, not via the +// probe. +func TestCLIDBRestoreClearsLiveSidecarsBeforeRename(t *testing.T) { + f := writeSyncFixture(t) + + snap := filepath.Join(t.TempDir(), "before.db") + runCLI(t, "--config", f.configPath, "db", "backup", "--to", snap) + + // Force the live DB into existence, then plant stale sidecars beside it. 
+ writeTestFile(t, filepath.Join(f.volumeDir, "a.txt"), "alpha") + runCLI(t, "--config", f.configPath, "index", f.volumeName) + for _, suffix := range []string{"-wal", "-shm"} { + if err := os.WriteFile(f.dbPath+suffix, []byte("stale"), 0o644); err != nil { + t.Fatal(err) + } + } + + out := runCLI(t, "--config", f.configPath, "db", "restore", "--force", snap) + preserved := parsePreservedPath(t, out) + + for _, suffix := range []string{"-wal", "-shm"} { + if _, err := os.Stat(f.dbPath + suffix); !os.IsNotExist(err) { + t.Fatalf("stale %s sidecar still beside live DB after restore (err=%v)", suffix, err) + } + if _, err := os.Stat(preserved + suffix); err != nil { + t.Fatalf("preserved DB missing its %s sidecar: %v", suffix, err) + } + } +} + +// TestPreserveLiveDBRemovesOrphanSidecars guards the case Copilot flagged: +// a missing main DB but lingering -wal/-shm (crash or manual move). The +// sidecars must be cleared so the incoming snapshot can't replay a stale +// WAL, even though there is no main file to move aside. +func TestPreserveLiveDBRemovesOrphanSidecars(t *testing.T) { + dir := t.TempDir() + live := filepath.Join(dir, "index.db") + for _, suffix := range []string{"-wal", "-shm"} { + if err := os.WriteFile(live+suffix, []byte("stale"), 0o644); err != nil { + t.Fatal(err) + } + } + preRestore, err := preserveLiveDB(live) + if err != nil { + t.Fatalf("preserveLiveDB: %v", err) + } + if preRestore != "" { + t.Fatalf("preRestore = %q, want empty (no main DB to preserve)", preRestore) + } + for _, suffix := range []string{"-wal", "-shm"} { + if _, err := os.Stat(live + suffix); !os.IsNotExist(err) { + t.Fatalf("orphan %s sidecar not cleared (err=%v)", suffix, err) + } + } +} + +// TestRollbackLiveDBRestoresPath asserts rollbackLiveDB moves a preserved +// DB and its sidecars back to the live path, so a failed restore leaves +// the live DB where it started. 
+func TestRollbackLiveDBRestoresPath(t *testing.T) { + dir := t.TempDir() + live := filepath.Join(dir, "index.db") + preRestore := live + ".pre-restore-1" + if err := os.WriteFile(preRestore, []byte("main"), 0o644); err != nil { + t.Fatal(err) + } + if err := os.WriteFile(preRestore+"-wal", []byte("wal"), 0o644); err != nil { + t.Fatal(err) + } + rollbackLiveDB(preRestore, live) + if b, err := os.ReadFile(live); err != nil || string(b) != "main" { + t.Fatalf("live DB not restored: content=%q err=%v", b, err) + } + if b, err := os.ReadFile(live + "-wal"); err != nil || string(b) != "wal" { + t.Fatalf("live -wal not restored: content=%q err=%v", b, err) + } + if _, err := os.Stat(preRestore); !os.IsNotExist(err) { + t.Fatalf("preserved main still present after rollback (err=%v)", err) + } +} + +// parsePreservedPath extracts the path from the +// "preserved prior live DB at <path>" restore output line. +func parsePreservedPath(t *testing.T, out string) string { + t.Helper() + const marker = "preserved prior live DB at " + for _, line := range strings.Split(out, "\n") { + if strings.HasPrefix(line, marker) { + return strings.TrimSpace(strings.TrimPrefix(line, marker)) + } + } + t.Fatalf("restore output did not report a preserved path:\n%s", out) + return "" +} + // TestCLIDBRestoreRejectsSchemaMismatch covers the safety property: // a snapshot whose schema_version differs from the binary is refused. // We simulate this by feeding a file that isn't a squirrel DB at all. @@ -109,14 +243,14 @@ func TestCLIDBRestoreRejectsSchemaMismatch(t *testing.T) { // TestCLIDBSchemaPrintsDDL confirms `squirrel db schema` dumps the // opened database's DDL, including the invariants the schema enforces: -// the foundational volumes table, the blake3-immutability trigger, and -// the one-live-row-per-path partial unique index. +// the foundational volumes table, the content entity table, and the +// one-live-row-per-path partial unique index. 
func TestCLIDBSchemaPrintsDDL(t *testing.T) { f := writeSyncFixture(t) out := runCLI(t, "--config", f.configPath, "db", "schema") for _, want := range []string{ "CREATE TABLE volumes", - "CREATE TRIGGER files_blake3_immutable", + "CREATE TABLE contents", "uniq_files_live_per_path", } { if !strings.Contains(out, want) { diff --git a/cmd/squirrel/offload.go b/cmd/squirrel/offload.go new file mode 100644 index 0000000..0f5a5e7 --- /dev/null +++ b/cmd/squirrel/offload.go @@ -0,0 +1,151 @@ +package main + +import ( + "fmt" + "io" + "regexp" + "strconv" + "time" + + "github.com/spf13/cobra" + + "github.com/mbertschler/squirrel/offload" +) + +// newOffloadCmd returns the `squirrel offload <volume> [path...]` cobra +// command: delete the local bytes of files whose content is provably +// durable on every target the volume's offload_requires policy names. +// Paths are volume-relative files or directory prefixes; --older-than +// narrows by indexed mtime and combines with paths. At least one of the +// two selectors is required, and a volume without an offload_requires +// policy is refused outright. 
+func newOffloadCmd() *cobra.Command { + var ( + olderThan string + dryRun bool + ) + cmd := &cobra.Command{ + Use: "offload <volume> [path...]", + Short: "Delete local bytes whose content is durable on every required target", + Args: cobra.MinimumNArgs(1), + RunE: func(cmd *cobra.Command, args []string) error { + return runOffload(cmd, args[0], args[1:], olderThan, dryRun) + }, + } + cmd.Flags().StringVar(&olderThan, "older-than", "", "only files whose indexed mtime is older than this duration (Go durations like 720h, or whole days like 90d)") + cmd.Flags().BoolVar(&dryRun, "dry-run", false, "print the per-file durability gate decisions without deleting anything") + return cmd +} + +func runOffload(cmd *cobra.Command, volumeName string, paths []string, olderThanStr string, dryRun bool) error { + cfg, err := requireConfig(cmd) + if err != nil { + return err + } + vol, ok := cfg.Volumes[volumeName] + if !ok { + return fmt.Errorf("unknown volume %q (declare it in %s)", volumeName, cfg.Path) + } + if len(vol.OffloadRequires) == 0 { + return fmt.Errorf("volume %q has no offload_requires policy in %s; offload refuses to delete without an explicit list of required targets", volumeName, cfg.Path) + } + olderThan, err := parseOlderThan(olderThanStr) + if err != nil { + return err + } + + s, err := openStore(cmd, cfg) + if err != nil { + return err + } + defer s.Close() + + rep, err := offload.Offload(cmd.Context(), s, vol.Path, offload.Options{ + Name: volumeName, + Paths: paths, + OlderThan: olderThan, + Require: vol.OffloadRequires, + DryRun: dryRun, + }) + printOffloadReport(cmd.OutOrStdout(), cmd.ErrOrStderr(), rep, dryRun) + if err != nil { + return err + } + if rep.Errors > 0 { + return fmt.Errorf("%d file(s) failed during offload", rep.Errors) + } + return nil +} + +// olderThanDaysRE accepts the whole-day shorthand (e.g. "90d") that +// time.ParseDuration lacks. 
+var olderThanDaysRE = regexp.MustCompile(`^(\d+)d$`) + +// parseOlderThan parses the --older-than flag: empty means no age +// selector, "<n>d" means n whole days, anything else must parse as a +// positive Go duration. +func parseOlderThan(s string) (time.Duration, error) { + if s == "" { + return 0, nil + } + var d time.Duration + if m := olderThanDaysRE.FindStringSubmatch(s); m != nil { + days, err := strconv.Atoi(m[1]) + if err != nil { + return 0, fmt.Errorf("--older-than %q: %w", s, err) + } + d = time.Duration(days) * 24 * time.Hour + } else { + var err error + d, err = time.ParseDuration(s) + if err != nil { + return 0, fmt.Errorf("--older-than %q: %w (use a Go duration like 720h, or whole days like 90d)", s, err) + } + } + if d <= 0 { + return 0, fmt.Errorf("--older-than must be a positive duration, got %s", s) + } + return d, nil +} + +// printOffloadReport renders the per-file lines and the summary. Skips +// (gate failures, drift) go to stdout — they are decisions, part of the +// normal report — while selector warnings and per-file errors go to +// stderr. 
+func printOffloadReport(out, errOut io.Writer, rep offload.Report, dryRun bool) { + for _, miss := range rep.SelectorMisses { + fmt.Fprintf(errOut, "warning: selector %q matched no present files\n", miss) + } + verb := "offloaded" + if dryRun { + verb = "would offload" + } + for _, r := range rep.Results { + switch r.Outcome { + case offload.OutcomeOffloaded: + fmt.Fprintf(out, "%s %s\n", verb, r.Path) + case offload.OutcomeNotDurable: + fmt.Fprintf(out, "skipped %s: not durable\n", r.Path) + for _, reason := range r.Reasons { + fmt.Fprintf(out, " %s\n", reason) + } + case offload.OutcomeDrift: + fmt.Fprintf(out, "skipped %s: disk differs from index: %s\n", r.Path, r.Reasons[0]) + case offload.OutcomeError: + fmt.Fprintf(errOut, "error %s: %s\n", r.Path, r.Reasons[0]) + } + } + if rep.FinishErr != nil { + fmt.Fprintf(errOut, "warning: failed to record terminal run state: %v\n", rep.FinishErr) + } + prefix := "" + if dryRun { + prefix = "(dry-run) " + } + fmt.Fprintf(out, "%soffloaded=%d not_durable=%d drift=%d errors=%d", prefix, + rep.Offloaded, rep.NotDurable, rep.Drift, rep.Errors) + if rep.RunID != 0 { + fmt.Fprintf(out, " run=%d", rep.RunID) + } + fmt.Fprintln(out) +} diff --git a/cmd/squirrel/offload_test.go b/cmd/squirrel/offload_test.go new file mode 100644 index 0000000..5cb5787 --- /dev/null +++ b/cmd/squirrel/offload_test.go @@ -0,0 +1,184 @@ +package main + +import ( + "context" + "fmt" + "os" + "path/filepath" + "strings" + "testing" + "time" + + "github.com/mbertschler/squirrel/store" +) + +// writeOffloadConfig builds a config with one `pics` volume whose +// offload policy requires the listed targets, returning the fixture +// paths plus the volume directory. 
+func writeOffloadConfig(t *testing.T, requires []string) (configFixture, string) { + t.Helper() + dir := t.TempDir() + volumeDir := filepath.Join(dir, "pics") + if err := os.MkdirAll(volumeDir, 0o755); err != nil { + t.Fatalf("mkdir volume: %v", err) + } + dbPath := filepath.Join(dir, "index.db") + configPath := filepath.Join(dir, "config.toml") + var body strings.Builder + fmt.Fprintf(&body, "db = %q\n\n[volumes.pics]\npath = %q\n", dbPath, volumeDir) + if len(requires) != 0 { + fmt.Fprintf(&body, "offload_requires = [") + for i, r := range requires { + if i > 0 { + fmt.Fprint(&body, ", ") + } + fmt.Fprintf(&body, "%q", r) + } + fmt.Fprintln(&body, "]") + } + if err := os.WriteFile(configPath, []byte(body.String()), 0o600); err != nil { + t.Fatalf("write config: %v", err) + } + return configFixture{configPath: configPath, dbPath: dbPath}, volumeDir +} + +// seedOffloadEvidence opens the fixture DB directly and records the +// durability a verified whole-volume push leaves: a content-verified +// (blake3) vector component for the self node at the file's introduction +// run, plus a successful kind='sync' run that advances the freshness +// watermark past the file's became-present run — the same evidence the +// destination handlers leave behind. 
+func seedOffloadEvidence(t *testing.T, dbPath, relPath string, targets []string) { + t.Helper() + ctx := context.Background() + s, err := store.Open(dbPath) + if err != nil { + t.Fatalf("store.Open: %v", err) + } + defer s.Close() + v, err := s.GetVolumeByName(ctx, "pics") + if err != nil { + t.Fatalf("GetVolumeByName: %v", err) + } + row, err := s.GetByPath(ctx, v.ID, relPath) + if err != nil { + t.Fatalf("GetByPath: %v", err) + } + self, err := s.GetSelfNode(ctx) + if err != nil { + t.Fatalf("GetSelfNode: %v", err) + } + for _, target := range targets { + if err := s.UpsertDestinationRunIDVerified(ctx, v.ID, target, self.ID, row.FirstSeenRunID, store.VerifyMethodBlake3, false); err != nil { + t.Fatalf("UpsertDestinationRunID(%s): %v", target, err) + } + id, blocker, err := s.BeginSyncRunIfClear(ctx, store.SyncRunSpec{VolumeID: v.ID, Destination: target}) + if err != nil || blocker != nil { + t.Fatalf("BeginSyncRunIfClear(%s): err=%v blocker=%+v", target, err, blocker) + } + if err := s.FinishRun(ctx, id, store.RunStatusSuccess, "", 0); err != nil { + t.Fatalf("FinishRun(%s): %v", target, err) + } + } +} + +func TestCLIOffloadRefusesWithoutPolicy(t *testing.T) { + f, volumeDir := writeOffloadConfig(t, nil) + writeTestFile(t, filepath.Join(volumeDir, "a.txt"), "alpha") + runCLI(t, "--config", f.configPath, "index", "pics") + + out, err := runCLIExpectErr(t, "--config", f.configPath, "offload", "pics", ".") + if !strings.Contains(err.Error(), "offload_requires") { + t.Fatalf("err = %v (output %s), want offload_requires refusal", err, out) + } + if _, statErr := os.Stat(filepath.Join(volumeDir, "a.txt")); statErr != nil { + t.Fatalf("a.txt should be untouched: %v", statErr) + } +} + +func TestCLIOffloadHappyPath(t *testing.T) { + f, volumeDir := writeOffloadConfig(t, []string{"vault"}) + writeTestFile(t, filepath.Join(volumeDir, "a.txt"), "alpha") + runCLI(t, "--config", f.configPath, "index", "pics") + seedOffloadEvidence(t, f.dbPath, "a.txt", []string{"vault"}) + 
+ out := runCLI(t, "--config", f.configPath, "offload", "pics", ".") + if !strings.Contains(out, "offloaded a.txt") || !strings.Contains(out, "offloaded=1 not_durable=0 drift=0 errors=0") { + t.Fatalf("unexpected output:\n%s", out) + } + if _, err := os.Stat(filepath.Join(volumeDir, "a.txt")); err == nil { + t.Fatalf("a.txt should be deleted") + } +} + +func TestCLIOffloadDryRun(t *testing.T) { + f, volumeDir := writeOffloadConfig(t, []string{"vault"}) + writeTestFile(t, filepath.Join(volumeDir, "a.txt"), "alpha") + runCLI(t, "--config", f.configPath, "index", "pics") + seedOffloadEvidence(t, f.dbPath, "a.txt", []string{"vault"}) + + out := runCLI(t, "--config", f.configPath, "offload", "pics", ".", "--dry-run") + if !strings.Contains(out, "would offload a.txt") || !strings.Contains(out, "(dry-run) offloaded=1") { + t.Fatalf("unexpected output:\n%s", out) + } + if _, err := os.Stat(filepath.Join(volumeDir, "a.txt")); err != nil { + t.Fatalf("a.txt should survive a dry-run: %v", err) + } +} + +func TestCLIOffloadReportsGateFailures(t *testing.T) { + f, volumeDir := writeOffloadConfig(t, []string{"vault", "second"}) + writeTestFile(t, filepath.Join(volumeDir, "a.txt"), "alpha") + runCLI(t, "--config", f.configPath, "index", "pics") + seedOffloadEvidence(t, f.dbPath, "a.txt", []string{"vault"}) + + out := runCLI(t, "--config", f.configPath, "offload", "pics", ".") + if !strings.Contains(out, "skipped a.txt: not durable") || + !strings.Contains(out, "second: missing component for origin") || + !strings.Contains(out, "offloaded=0 not_durable=1") { + t.Fatalf("unexpected output:\n%s", out) + } + if _, err := os.Stat(filepath.Join(volumeDir, "a.txt")); err != nil { + t.Fatalf("a.txt should be untouched: %v", err) + } +} + +func TestCLIOffloadRejectsBadOlderThan(t *testing.T) { + f, volumeDir := writeOffloadConfig(t, []string{"vault"}) + writeTestFile(t, filepath.Join(volumeDir, "a.txt"), "alpha") + runCLI(t, "--config", f.configPath, "index", "pics") + + _, err := 
runCLIExpectErr(t, "--config", f.configPath, "offload", "pics", "--older-than", "soon") + if !strings.Contains(err.Error(), "--older-than") { + t.Fatalf("err = %v, want --older-than parse error", err) + } +} + +func TestParseOlderThan(t *testing.T) { + cases := []struct { + in string + want time.Duration + wantErr bool + }{ + {"", 0, false}, + {"90d", 90 * 24 * time.Hour, false}, + {"720h", 720 * time.Hour, false}, + {"1h30m", 90 * time.Minute, false}, + {"0d", 0, true}, + {"-5h", 0, true}, + {"soon", 0, true}, + {"5", 0, true}, + } + for _, c := range cases { + got, err := parseOlderThan(c.in) + if c.wantErr { + if err == nil { + t.Fatalf("parseOlderThan(%q) = %v, want error", c.in, got) + } + continue + } + if err != nil || got != c.want { + t.Fatalf("parseOlderThan(%q) = (%v, %v), want %v", c.in, got, err, c.want) + } + } +} diff --git a/cmd/squirrel/peer_sync.go b/cmd/squirrel/peer_sync.go index b070fcb..2c15166 100644 --- a/cmd/squirrel/peer_sync.go +++ b/cmd/squirrel/peer_sync.go @@ -5,19 +5,21 @@ import ( ) // newPeerSyncCmd returns the `squirrel peer-sync` parent command. It is a -// namespace for the node-sync forensic subcommands; on its own it has no +// namespace for the node-sync auxiliary subcommands; on its own it has no // behaviour and prints help. Today it carries `history` (the append-only -// watermark transition log added with SAFETY-AUDIT H6); future per-peer -// inspection verbs belong here too rather than at the top level. +// watermark transition log added with SAFETY-AUDIT H6) and +// `pull-durability` (the standalone durability metadata pull); future +// per-peer verbs belong here too rather than at the top level. 
func newPeerSyncCmd() *cobra.Command { cmd := &cobra.Command{ Use: "peer-sync", - Short: "Inspect node-sync state and history", + Short: "Inspect node-sync state and exchange peer metadata", Args: cobra.NoArgs, RunE: func(cmd *cobra.Command, _ []string) error { return cmd.Help() }, } cmd.AddCommand(newPeerSyncHistoryCmd()) + cmd.AddCommand(newPeerSyncPullDurabilityCmd()) return cmd } diff --git a/cmd/squirrel/peer_sync_history_test.go b/cmd/squirrel/peer_sync_history_test.go index 4929b1a..1bd0f5d 100644 --- a/cmd/squirrel/peer_sync_history_test.go +++ b/cmd/squirrel/peer_sync_history_test.go @@ -67,7 +67,7 @@ func seedPeerSyncHistory(t *testing.T, dbPath, volumeName, peerName string, wate if err != nil { t.Fatalf("look up volume %q: %v", volumeName, err) } - peer, err := s.GetOrCreatePeerNode(ctx, peerName, "http://nas.example") + peer, err := s.GetOrCreatePeerNode(ctx, peerName, "http://nas.example", true) if err != nil { t.Fatalf("create peer %q: %v", peerName, err) } diff --git a/cmd/squirrel/peer_sync_pull_durability.go b/cmd/squirrel/peer_sync_pull_durability.go new file mode 100644 index 0000000..31679fc --- /dev/null +++ b/cmd/squirrel/peer_sync_pull_durability.go @@ -0,0 +1,76 @@ +package main + +import ( + "fmt" + "io" + + "github.com/spf13/cobra" + + "github.com/mbertschler/squirrel/sync" +) + +// newPeerSyncPullDurabilityCmd returns the `squirrel peer-sync +// pull-durability <volume> <peer>` subcommand: a standalone run of the +// metadata-only durability pull that also fires automatically after a +// successful node sync. It fetches the peer's destination durability +// vectors for the volume and merges them into the local +// destination_run_ids — monotonic, with refused rewinds reported and an +// --allow-rewind override mirroring the watermark store's opt-in. 
+func newPeerSyncPullDurabilityCmd() *cobra.Command { + var allowRewind bool + cmd := &cobra.Command{ + Use: "pull-durability <volume> <peer>", + Short: "Fetch a peer's destination durability vectors for a volume into the local index", + Args: cobra.ExactArgs(2), + RunE: func(cmd *cobra.Command, args []string) error { + return runPeerSyncPullDurability(cmd, args[0], args[1], allowRewind) + }, + } + cmd.Flags().BoolVar(&allowRewind, "allow-rewind", false, + "accept peer components below the locally recorded value (recovery override)") + return cmd +} + +func runPeerSyncPullDurability(cmd *cobra.Command, volumeName, peerName string, allowRewind bool) error { + cfg, err := requireConfig(cmd) + if err != nil { + return err + } + vol, ok := cfg.Volumes[volumeName] + if !ok { + return fmt.Errorf("unknown volume %q", volumeName) + } + node, ok := cfg.Nodes[peerName] + if !ok { + return fmt.Errorf("unknown node %q", peerName) + } + s, err := openStore(cmd, cfg) + if err != nil { + return err + } + defer s.Close() + + rep, err := sync.PullDurability(cmd.Context(), s, vol, node, allowRewind) + if err != nil { + return err + } + printDurabilityPull(cmd.OutOrStdout(), rep) + if len(rep.Rewinds) > 0 { + return fmt.Errorf("%d component(s) refused as rewinds; re-run with --allow-rewind to accept the peer's values", len(rep.Rewinds)) + } + return nil +} + +func printDurabilityPull(w io.Writer, rep sync.DurabilityPullReport) { + fmt.Fprintf(w, "%s ← %s fetched=%d applied=%d dropped=%d\n", + rep.Volume, rep.Peer, rep.Fetched, rep.Applied, rep.Dropped) + for _, rw := range rep.Rewinds { + fmt.Fprintf(w, " refused rewind: %s\n", rw) + } + for _, dr := range rep.Drops { + fmt.Fprintf(w, " dropped %s\n", dr) + } + if more := rep.Dropped - len(rep.Drops); more > 0 { + fmt.Fprintf(w, " … and %d more dropped\n", more) + } +} diff --git a/cmd/squirrel/peer_sync_pull_durability_test.go b/cmd/squirrel/peer_sync_pull_durability_test.go new file mode 100644 index 0000000..51f6fec --- /dev/null 
+++ b/cmd/squirrel/peer_sync_pull_durability_test.go @@ -0,0 +1,159 @@ +package main + +import ( + "context" + "fmt" + "net/http/httptest" + "os" + "path/filepath" + "strings" + "testing" + + "github.com/mbertschler/squirrel/agent" + "github.com/mbertschler/squirrel/config" + "github.com/mbertschler/squirrel/store" +) + +// pullDurabilityFixture wires an in-process receiver agent (own store, +// seeded destination vector) and an initiator config whose [nodes.nas] +// block dials the httptest URL, so the CLI command runs end-to-end +// without a network listener. +type pullDurabilityFixture struct { + configPath string + dbPath string + recvStore *store.Store +} + +func newPullDurabilityFixture(t *testing.T) pullDurabilityFixture { + t.Helper() + ctx := context.Background() + root := t.TempDir() + + recvVolPath := filepath.Join(root, "recv", "pics") + if err := os.MkdirAll(recvVolPath, 0o755); err != nil { + t.Fatalf("mkdir receiver volume: %v", err) + } + recvStore, err := store.OpenWithOptions(filepath.Join(root, "recv.db"), store.OpenOptions{NodeName: "nas"}) + if err != nil { + t.Fatalf("open receiver store: %v", err) + } + t.Cleanup(func() { _ = recvStore.Close() }) + srv, err := agent.New(agent.Config{ + Listen: "127.0.0.1:0", + Token: "test-token", + Version: "test", + Volumes: map[string]*config.Volume{"pics": {Name: "pics", Path: recvVolPath}}, + }, recvStore) + if err != nil { + t.Fatalf("agent.New: %v", err) + } + ts := httptest.NewServer(srv.Handler()) + t.Cleanup(ts.Close) + + v, err := recvStore.CreateVolume(ctx, "pics", recvVolPath) + if err != nil { + t.Fatalf("seed receiver volume: %v", err) + } + self, err := recvStore.GetSelfNode(ctx) + if err != nil { + t.Fatalf("GetSelfNode: %v", err) + } + if err := recvStore.UpsertDestinationRunID(ctx, v.ID, "offsite-a", self.ID, 12, false); err != nil { + t.Fatalf("seed receiver component: %v", err) + } + + srcVol := filepath.Join(root, "src") + if err := os.MkdirAll(srcVol, 0o755); err != nil { + 
t.Fatalf("mkdir source volume: %v", err) + } + writeTestFile(t, filepath.Join(srcVol, "a.txt"), "alpha") + dbPath := filepath.Join(root, "index.db") + configPath := filepath.Join(root, "config.toml") + body := fmt.Sprintf(`db = %q + +[volumes.pics] +path = %q +offload_requires = ["offsite-a"] + +[nodes.nas] +endpoint = %q +path = %q +auth = { bearer = "test-token" } +`, dbPath, srcVol, ts.URL, filepath.Join(root, "recv")) + if err := os.WriteFile(configPath, []byte(body), 0o600); err != nil { + t.Fatalf("write config: %v", err) + } + return pullDurabilityFixture{configPath: configPath, dbPath: dbPath, recvStore: recvStore} +} + +// TestCLIPeerSyncPullDurability drives the standalone pull end-to-end: +// the peer's component lands in the local destination_run_ids and the +// summary line reports it. A locally higher component then makes the +// re-pull fail with a rewind refusal, and --allow-rewind accepts it. +func TestCLIPeerSyncPullDurability(t *testing.T) { + f := newPullDurabilityFixture(t) + ctx := context.Background() + runCLI(t, "--config", f.configPath, "index", "pics") + + out := runCLI(t, "--config", f.configPath, "peer-sync", "pull-durability", "pics", "nas") + if !strings.Contains(out, "fetched=1 applied=1") { + t.Fatalf("output = %q, want fetched=1 applied=1", out) + } + + s, err := store.Open(f.dbPath) + if err != nil { + t.Fatalf("open local store: %v", err) + } + v, err := s.GetVolumeByName(ctx, "pics") + if err != nil { + t.Fatalf("local volume: %v", err) + } + origin, err := s.GetNodeByName(ctx, "nas") + if err != nil { + t.Fatalf("local origin row for nas: %v", err) + } + got, err := s.GetDestinationRunID(ctx, v.ID, "offsite-a", origin.ID) + if err != nil { + t.Fatalf("GetDestinationRunID: %v", err) + } + if got.OriginRunID != 12 { + t.Fatalf("offsite-a component = %d, want 12", got.OriginRunID) + } + // Raise the local floor above the peer's value, then close so the + // CLI invocations below see the row (the store serialises on one + // 
connection). + if err := s.UpsertDestinationRunID(ctx, v.ID, "offsite-a", origin.ID, 20, false); err != nil { + t.Fatalf("raise local floor: %v", err) + } + if err := s.Close(); err != nil { + t.Fatalf("close local store: %v", err) + } + + out, err = runCLIExpectErr(t, "--config", f.configPath, "peer-sync", "pull-durability", "pics", "nas") + if !strings.Contains(err.Error(), "--allow-rewind") { + t.Fatalf("err = %v, want a pointer at --allow-rewind", err) + } + if !strings.Contains(out, "refused rewind") { + t.Fatalf("output = %q, want a refused-rewind line", out) + } + + out = runCLI(t, "--config", f.configPath, "peer-sync", "pull-durability", "pics", "nas", "--allow-rewind") + if !strings.Contains(out, "fetched=1 applied=1") { + t.Fatalf("override output = %q, want fetched=1 applied=1", out) + } +} + +// TestCLIPeerSyncPullDurabilityUnknownNames: unknown volume and node +// names fail fast with the same diagnostics the sync pairing uses. +func TestCLIPeerSyncPullDurabilityUnknownNames(t *testing.T) { + f := newPullDurabilityFixture(t) + + _, err := runCLIExpectErr(t, "--config", f.configPath, "peer-sync", "pull-durability", "ghost", "nas") + if !strings.Contains(err.Error(), `unknown volume "ghost"`) { + t.Fatalf("err = %v, want unknown volume", err) + } + _, err = runCLIExpectErr(t, "--config", f.configPath, "peer-sync", "pull-durability", "pics", "ghost") + if !strings.Contains(err.Error(), `unknown node "ghost"`) { + t.Fatalf("err = %v, want unknown node", err) + } +} diff --git a/cmd/squirrel/query.go b/cmd/squirrel/query.go index fd119e9..ea3f2de 100644 --- a/cmd/squirrel/query.go +++ b/cmd/squirrel/query.go @@ -39,7 +39,7 @@ func newQueryCmd() *cobra.Command { } defer s.Close() - filter, err := resolveSourceFilter(cmd, s, fromNode) + filter, err := resolveOriginFilter(cmd, s, fromNode) if err != nil { return err } @@ -60,7 +60,7 @@ func newQueryCmd() *cobra.Command { return queryArg(cmd, s, args[0], history, filter) } if filter.active { - return 
queryBySource(cmd, s, filter) + return queryByOrigin(cmd, s, filter) } return errors.New("query requires <hash>, <path>, --duplicates, --missing, or --from") } @@ -69,53 +69,54 @@ func newQueryCmd() *cobra.Command { cmd.Flags().BoolVar(&duplicates, "duplicates", false, "list hashes that appear at more than one path") cmd.Flags().BoolVar(&missing, "missing", false, "list previously-indexed paths no longer on disk") cmd.Flags().BoolVar(&history, "history", false, "when querying a path, also print the full content history at that path") - cmd.Flags().StringVar(&fromNode, "from", "", "restrict results to rows whose source_node_id matches this node name (use the self-node name for local writes)") + cmd.Flags().StringVar(&fromNode, "from", "", "restrict results to rows whose content originates at this node (use the self-node name for locally introduced content)") return cmd } -// sourceFilter encodes the result of `--from <name>`. active=false means +// originFilter encodes the result of `--from <name>`. active=false means // no filter; nodeID.Valid==true filters to that node id; nodeID.Valid==false -// (and active==true) means "self / local writes" (source_node_id IS NULL). -type sourceFilter struct { +// (and active==true) means "self / locally introduced" (origin_node_id IS +// NULL). +type originFilter struct { active bool nodeID sql.NullInt64 } -// matches reports whether the row's source_node_id passes the filter. +// matches reports whether the row's origin_node_id passes the filter. // A non-active filter matches everything. -func (f sourceFilter) matches(rowSource sql.NullInt64) bool { +func (f originFilter) matches(rowOrigin sql.NullInt64) bool { if !f.active { return true } if !f.nodeID.Valid { - return !rowSource.Valid + return !rowOrigin.Valid } - return rowSource.Valid && rowSource.Int64 == f.nodeID.Int64 + return rowOrigin.Valid && rowOrigin.Int64 == f.nodeID.Int64 } -// resolveSourceFilter turns the --from <name> argument into a -// sourceFilter. 
The self-node's name resolves to "match NULL source" -// (local writes); any other named node resolves to that node's id. -// An empty name produces an inactive filter. -func resolveSourceFilter(cmd *cobra.Command, s *store.Store, name string) (sourceFilter, error) { +// resolveOriginFilter turns the --from <name> argument into an +// originFilter. The self-node's name resolves to "match NULL origin" +// (locally introduced content); any other named node resolves to that +// node's id. An empty name produces an inactive filter. +func resolveOriginFilter(cmd *cobra.Command, s *store.Store, name string) (originFilter, error) { if name == "" { - return sourceFilter{}, nil + return originFilter{}, nil } self, err := s.GetSelfNode(cmd.Context()) if err != nil { - return sourceFilter{}, fmt.Errorf("lookup self node: %w", err) + return originFilter{}, fmt.Errorf("lookup self node: %w", err) } if name == self.Name { - return sourceFilter{active: true}, nil + return originFilter{active: true}, nil } node, err := s.GetNodeByName(cmd.Context(), name) if err != nil { if store.IsNotFound(err) { - return sourceFilter{}, fmt.Errorf("no node named %q (use the self-node name %q for local writes)", name, self.Name) + return originFilter{}, fmt.Errorf("no node named %q (use the self-node name %q for locally introduced content)", name, self.Name) } - return sourceFilter{}, fmt.Errorf("lookup node %q: %w", name, err) + return originFilter{}, fmt.Errorf("lookup node %q: %w", name, err) } - return sourceFilter{active: true, nodeID: sql.NullInt64{Int64: node.ID, Valid: true}}, nil + return originFilter{active: true, nodeID: sql.NullInt64{Int64: node.ID, Valid: true}}, nil } // queryArg disambiguates between a path lookup and a hex digest lookup. A @@ -124,7 +125,7 @@ func resolveSourceFilter(cmd *cobra.Command, s *store.Store, name string) (sourc // protects content-addressed workloads where filenames are themselves hex. 
 // withHistory is only meaningful for path queries — hash lookups already
 // list every row for the digest.
-func queryArg(cmd *cobra.Command, s *store.Store, arg string, withHistory bool, filter sourceFilter) error {
+func queryArg(cmd *cobra.Command, s *store.Store, arg string, withHistory bool, filter originFilter) error {
 	if !looksLikePath(arg) && isHashLike(arg) {
 		if withHistory {
 			return errors.New("--history applies to path queries, not hash queries (hash queries already list every row)")
@@ -142,7 +143,7 @@ func looksLikePath(arg string) bool {
 	return err == nil
 }
 
-func queryByHash(cmd *cobra.Command, s *store.Store, hexDigest string, filter sourceFilter) error {
+func queryByHash(cmd *cobra.Command, s *store.Store, hexDigest string, filter originFilter) error {
 	digest, err := hex.DecodeString(hexDigest)
 	if err != nil {
 		return fmt.Errorf("decode hash: %w", err)
@@ -154,7 +155,7 @@ func queryByHash(cmd *cobra.Command, s *store.Store, hexDigest string, filter so
 	out := cmd.OutOrStdout()
 	var any bool
 	for _, r := range rows {
-		if !filter.matches(r.File.SourceNodeID) {
+		if !filter.matches(r.File.OriginNodeID) {
 			continue
 		}
 		fmt.Fprintf(out, "%s\t%s\t%d\n", r.File.Status, joinVolumePath(r.Volume.Path, r.File.Path), r.File.SizeBytes)
@@ -169,7 +170,7 @@ func queryByHash(cmd *cobra.Command, s *store.Store, hexDigest string, filter so
 	return nil
 }
 
-func queryByPath(cmd *cobra.Command, s *store.Store, arg string, withHistory bool, filter sourceFilter) error {
+func queryByPath(cmd *cobra.Command, s *store.Store, arg string, withHistory bool, filter originFilter) error {
 	absPath, err := filepath.Abs(arg)
 	if err != nil {
 		return err
@@ -181,7 +182,7 @@ func queryByPath(cmd *cobra.Command, s *store.Store, arg string, withHistory boo
 		}
 		return err
 	}
-	if !filter.matches(fv.File.SourceNodeID) {
+	if !filter.matches(fv.File.OriginNodeID) {
 		return fmt.Errorf("row at %s does not match --from filter", absPath)
 	}
 	out := cmd.OutOrStdout()
@@ -227,7 +228,7 @@ func printPathHistory(cmd *cobra.Command, s *store.Store, volumeID int64, relPat
 	return tw.Flush()
 }
 
-func queryDuplicates(cmd *cobra.Command, s *store.Store, filter sourceFilter) error {
+func queryDuplicates(cmd *cobra.Command, s *store.Store, filter originFilter) error {
 	rows, err := s.ListDuplicates(cmd.Context())
 	if err != nil {
 		return err
@@ -235,7 +236,7 @@ func queryDuplicates(cmd *cobra.Command, s *store.Store, filter sourceFilter) er
 	out := cmd.OutOrStdout()
 	var lastHex string
 	for _, r := range rows {
-		if !filter.matches(r.File.SourceNodeID) {
+		if !filter.matches(r.File.OriginNodeID) {
 			continue
 		}
 		h := hex.EncodeToString(r.File.Blake3)
@@ -251,14 +252,14 @@ func queryDuplicates(cmd *cobra.Command, s *store.Store, filter sourceFilter) er
 	return nil
 }
 
-func queryMissing(cmd *cobra.Command, s *store.Store, filter sourceFilter) error {
+func queryMissing(cmd *cobra.Command, s *store.Store, filter originFilter) error {
 	rows, err := s.ListMissing(cmd.Context())
 	if err != nil {
 		return err
 	}
 	out := cmd.OutOrStdout()
 	for _, r := range rows {
-		if !filter.matches(r.File.SourceNodeID) {
+		if !filter.matches(r.File.OriginNodeID) {
 			continue
 		}
 		fmt.Fprintf(out, "%s\t%s\n", hex.EncodeToString(r.File.Blake3), joinVolumePath(r.Volume.Path, r.File.Path))
@@ -266,19 +267,19 @@ func queryMissing(cmd *cobra.Command, s *store.Store, filter sourceFilter) error
 	return nil
 }
 
-// queryBySource lists every present row across volumes whose source
-// matches the filter — the bare `--from <name>` case with no
+// queryByOrigin lists every present row across volumes whose content
+// origin matches the filter — the bare `--from <name>` case with no
 // positional, duplicates, or missing flag. The underlying
-// ListPresentBySource is per-volume so we iterate (today's volume
+// ListPresentByOrigin is per-volume so we iterate (today's volume
 // counts are small); cross-volume widening is out of scope for #15.
-func queryBySource(cmd *cobra.Command, s *store.Store, filter sourceFilter) error {
+func queryByOrigin(cmd *cobra.Command, s *store.Store, filter originFilter) error {
 	vols, err := s.ListVolumes(cmd.Context())
 	if err != nil {
 		return fmt.Errorf("list volumes: %w", err)
 	}
 	out := cmd.OutOrStdout()
 	for _, v := range vols {
-		for row, err := range s.ListPresentBySource(cmd.Context(), v.ID, filter.nodeID) {
+		for row, err := range s.ListPresentByOrigin(cmd.Context(), v.ID, filter.nodeID) {
 			if err != nil {
 				return err
 			}
diff --git a/cmd/squirrel/restore.go b/cmd/squirrel/restore.go
index 82ea065..deab82e 100644
--- a/cmd/squirrel/restore.go
+++ b/cmd/squirrel/restore.go
@@ -18,7 +18,7 @@ import (
 // is there on a hash mismatch. Use --to to point at a scratch directory.
 // --from accepts either a destination name (pick which bucket to pull
 // from when the volume has multiple) or a node name (filter to paths
-// whose source_node_id matches that node — handy on a receiver that
+// whose content originates at that node — handy on a receiver that
 // holds files pushed by multiple peers). Destination and node names
 // share one namespace (config validation enforces uniqueness), so the
 // argument disambiguates by lookup, with no extra flag needed.
@@ -43,7 +43,7 @@ func newRestoreCmd() *cobra.Command {
 			})
 		},
 	}
-	cmd.Flags().StringVar(&from, "from", "", "destination name to pull from, or peer node name to filter source attribution (overloaded; names are unique across both kinds)")
+	cmd.Flags().StringVar(&from, "from", "", "destination name to pull from, or peer node name to filter by content origin (overloaded; names are unique across both kinds)")
 	cmd.Flags().StringVar(&to, "to", "", "local target path (default: the volume's declared path)")
 	cmd.Flags().BoolVar(&shallow, "shallow", false, "skip BLAKE3 verification on the way down")
 	cmd.Flags().BoolVar(&dryRun, "dry-run", false, "preview rclone actions without transferring")
@@ -67,13 +67,13 @@ func runRestore(cmd *cobra.Command, volumeName, fromName string, opts sync.Resto
 	}
 	defer s.Close()
 
-	dest, sourceNode, err := resolveRestoreTarget(cmd, s, cfg, vol, fromName)
+	dest, originNode, err := resolveRestoreTarget(cmd, s, cfg, vol, fromName)
 	if err != nil {
 		return err
 	}
 
-	if sourceNode.active {
-		includeFile, cleanup, err := writeRestorePathFilter(cmd, s, vol, sourceNode.nodeID)
+	if originNode.active {
+		includeFile, cleanup, err := writeRestorePathFilter(cmd, s, vol, originNode.nodeID)
 		if err != nil {
 			return err
 		}
@@ -92,7 +92,7 @@ func runRestore(cmd *cobra.Command, volumeName, fromName string, opts sync.Resto
 	if opts.Shallow {
 		fmt.Fprintln(out, shallowSyncWarning)
 	}
-	if err := sync.EnsureMinVersion(cmd.Context(), rcl, out, opts.Shallow); err != nil {
+	if err := sync.EnsureMinVersion(cmd.Context(), rcl, out, sync.EffectiveShallow(dest, opts.Shallow)); err != nil {
 		return err
 	}
 	if err := writeRcloneConfigLogged(out, rcl, cfg); err != nil {
@@ -107,18 +107,18 @@ func runRestore(cmd *cobra.Command, volumeName, fromName string, opts sync.Resto
 	return nil
 }
 
-// restoreSourceFilter is the resolution of `--from <name>` against the
+// restoreOriginFilter is the resolution of `--from <name>` against the
 // `nodes` table. active=false means the name (if any) is a
 // destination, not a node; active=true with nodeID.Valid filters to
-// that peer; active=true with nodeID zero filters to NULL source
-// (local writes).
-type restoreSourceFilter struct {
+// that peer; active=true with nodeID zero filters to NULL origin
+// (locally introduced content).
+type restoreOriginFilter struct {
 	active bool
 	nodeID sql.NullInt64
 }
 
 // resolveRestoreTarget decides what `--from <name>` means: a node name
-// (returns a source filter, picks the destination from sync_to), a
+// (returns an origin filter, picks the destination from sync_to), a
 // destination name (returns the destination, no filter), or empty
 // (auto-picks the destination).
 //
@@ -128,33 +128,33 @@ type restoreSourceFilter struct {
 // [nodes.X] and [destinations.X], but does not check the top-level
 // node_name. An honest collision is surfaced rather than picked one
 // way or the other.
-func resolveRestoreTarget(cmd *cobra.Command, s *store.Store, cfg *config.Config, vol *config.Volume, fromName string) (*config.Destination, restoreSourceFilter, error) {
+func resolveRestoreTarget(cmd *cobra.Command, s *store.Store, cfg *config.Config, vol *config.Volume, fromName string) (*config.Destination, restoreOriginFilter, error) {
 	if fromName == "" {
 		dest, err := pickSingleRestoreDestination(cfg, vol)
-		return dest, restoreSourceFilter{}, err
+		return dest, restoreOriginFilter{}, err
 	}
 	filter, nodeFound, err := lookupNodeFilter(cmd, s, fromName)
 	if err != nil {
-		return nil, restoreSourceFilter{}, err
+		return nil, restoreOriginFilter{}, err
 	}
 	destMatch, destOK := cfg.Destinations[fromName]
 	if nodeFound && destOK {
-		return nil, restoreSourceFilter{}, fmt.Errorf("--from %q is ambiguous: it names both a node (self or peer) and a destination — rename one to disambiguate", fromName)
+		return nil, restoreOriginFilter{}, fmt.Errorf("--from %q is ambiguous: it names both a node (self or peer) and a destination — rename one to disambiguate", fromName)
 	}
 	if nodeFound {
 		dest, err := pickSingleRestoreDestination(cfg, vol)
 		if err != nil {
-			return nil, restoreSourceFilter{}, fmt.Errorf("--from %q is a node name; %w", fromName, err)
+			return nil, restoreOriginFilter{}, fmt.Errorf("--from %q is a node name; %w", fromName, err)
 		}
 		return dest, filter, nil
 	}
 	if destOK {
 		if !slices.Contains(vol.SyncTo, fromName) {
-			return nil, restoreSourceFilter{}, fmt.Errorf("destination %q is not in sync_to for volume %q", fromName, vol.Name)
+			return nil, restoreOriginFilter{}, fmt.Errorf("destination %q is not in sync_to for volume %q", fromName, vol.Name)
 		}
-		return destMatch, restoreSourceFilter{}, nil
+		return destMatch, restoreOriginFilter{}, nil
 	}
-	return nil, restoreSourceFilter{}, fmt.Errorf("--from %q matches neither a configured destination nor a known node", fromName)
+	return nil, restoreOriginFilter{}, fmt.Errorf("--from %q matches neither a configured destination nor a known node", fromName)
 }
 
 // lookupNodeFilter asks the store whether name refers to a node — the
@@ -163,22 +163,22 @@ func resolveRestoreTarget(cmd *cobra.Command, s *store.Store, cfg *config.Config
 // "node lookup itself failed" so the caller can keep the dispatch
 // flat. A surfaced error reflects an underlying store failure, not a
 // missing row.
-func lookupNodeFilter(cmd *cobra.Command, s *store.Store, name string) (restoreSourceFilter, bool, error) {
+func lookupNodeFilter(cmd *cobra.Command, s *store.Store, name string) (restoreOriginFilter, bool, error) {
 	self, err := s.GetSelfNode(cmd.Context())
 	if err != nil {
-		return restoreSourceFilter{}, false, fmt.Errorf("lookup self node: %w", err)
+		return restoreOriginFilter{}, false, fmt.Errorf("lookup self node: %w", err)
 	}
 	if name == self.Name {
-		return restoreSourceFilter{active: true}, true, nil
+		return restoreOriginFilter{active: true}, true, nil
 	}
 	node, err := s.GetNodeByName(cmd.Context(), name)
 	if err != nil {
 		if store.IsNotFound(err) {
-			return restoreSourceFilter{}, false, nil
+			return restoreOriginFilter{}, false, nil
 		}
-		return restoreSourceFilter{}, false, fmt.Errorf("lookup node %q: %w", name, err)
+		return restoreOriginFilter{}, false, fmt.Errorf("lookup node %q: %w", name, err)
 	}
-	return restoreSourceFilter{active: true, nodeID: sql.NullInt64{Int64: node.ID, Valid: true}}, true, nil
+	return restoreOriginFilter{active: true, nodeID: sql.NullInt64{Int64: node.ID, Valid: true}}, true, nil
 }
 
 // pickSingleRestoreDestination resolves the destination when --from
@@ -204,7 +204,7 @@ func pickSingleRestoreDestination(cfg *config.Config, vol *config.Volume) (*conf
 
 // writeRestorePathFilter materialises the path subset implied by
 // `--from <node>` to a tempfile suitable for rclone's --files-from
-// flag. It iterates ListPresentBySource against the volume row in the
+// flag. It iterates ListPresentByOrigin against the volume row in the
 // DB (not the volume from config) so a missing volume row surfaces
 // before rclone gets invoked. Returns the file path, a cleanup func,
 // and an error. The cleanup is non-nil even on error so deferring it
@@ -223,11 +223,11 @@ func writeRestorePathFilter(cmd *cobra.Command, s *store.Store, vol *config.Volu
 	}
 	cleanup := func() { _ = os.Remove(f.Name()) }
 	var count int
-	for row, iterErr := range s.ListPresentBySource(cmd.Context(), v.ID, nodeID) {
+	for row, iterErr := range s.ListPresentByOrigin(cmd.Context(), v.ID, nodeID) {
 		if iterErr != nil {
 			_ = f.Close()
 			cleanup()
-			return "", func() {}, fmt.Errorf("list present by source: %w", iterErr)
+			return "", func() {}, fmt.Errorf("list present by origin: %w", iterErr)
 		}
 		if _, err := fmt.Fprintln(f, row.Path); err != nil {
 			_ = f.Close()
diff --git a/cmd/squirrel/restore_test.go b/cmd/squirrel/restore_test.go
index 478d9d9..b83b67f 100644
--- a/cmd/squirrel/restore_test.go
+++ b/cmd/squirrel/restore_test.go
@@ -7,6 +7,8 @@ import (
 	"strings"
 	"testing"
 
+	"github.com/zeebo/blake3"
+
 	"github.com/mbertschler/squirrel/store"
 	"github.com/mbertschler/squirrel/volmark"
 )
@@ -123,29 +125,29 @@ func TestCLIRestoreInfersDestinationWhenUnambiguous(t *testing.T) {
 
 // TestCLIRestoreFromNodeFiltersByAttribution covers the issue-#15
 // acceptance criterion: a receiver-side restore with --from <peer>
-// produces a tree containing only that peer's source-attributed
-// paths, even when other peers / local writes share the volume. The
-// fixture indexes three local files, then re-stamps two of them with
-// distinct peer provenance via Upsert(prov). Restoring with
-// --from peer-a should land only the from-a path in the target tree.
+// produces a tree containing only the paths whose content originates
+// at that peer, even when other peers / local writes share the volume.
+// The fixture indexes one local file, introduces two peer-origin files
+// via Upsert(prov) (origin is recorded on the contents row at first
+// introduction), and syncs the tree. Restoring with --from peer-a
+// should land only the from-a path in the target tree.
 func TestCLIRestoreFromNodeFiltersByAttribution(t *testing.T) {
 	requireRcloneCLI(t)
 	f := writeSyncFixture(t)
 
-	writeTestFile(t, filepath.Join(f.volumeDir, "from-a.txt"), "alpha")
-	writeTestFile(t, filepath.Join(f.volumeDir, "from-b.txt"), "beta")
 	writeTestFile(t, filepath.Join(f.volumeDir, "local.txt"), "local")
 	runCLI(t, "--config", f.configPath, "index", f.volumeName)
-	runCLI(t, "--config", f.configPath, "sync", "pics")
 
-	// Inject peer attribution onto the from-* paths. The destination
-	// tree was just written by sync, so the rclone-side content is
-	// unchanged — restore will pull only the path subset we ask for
-	// via --files-from-raw. The store handle is closed before the
-	// subsequent runCLI so there's exactly one process holding the
-	// SQLite file when the CLI runs.
+	// Introduce the from-* files as peer-origin content before sync
+	// pushes the tree to the destination. The store handle is closed
+	// before the subsequent runCLI so there's exactly one process
+	// holding the SQLite file when the CLI runs.
+	writeTestFile(t, filepath.Join(f.volumeDir, "from-a.txt"), "alpha")
+	writeTestFile(t, filepath.Join(f.volumeDir, "from-b.txt"), "beta")
 	stampPeerProvenance(t, f.dbPath)
 
+	runCLI(t, "--config", f.configPath, "sync", "pics")
+
 	target := filepath.Join(t.TempDir(), "recovered")
 	out := runCLI(t, "--config", f.configPath, "restore", "pics", "--from", "peer-a", "--to", target)
 	if !strings.Contains(out, "status=success") {
@@ -163,9 +165,9 @@ func TestCLIRestoreFromNodeFiltersByAttribution(t *testing.T) {
 }
 
 // stampPeerProvenance opens the index DB at dbPath, creates peer-a /
-// peer-b nodes plus a sync-kind run for each, then promotes the
-// from-{a,b}.txt rows (already indexed under the "pics" volume) to
-// the respective peer's source attribution via Upsert. The store
+// peer-b nodes plus a sync-kind run for each, then records the
+// on-disk from-{a,b}.txt files under the "pics" volume as content
+// introduced by the respective peer via Upsert(prov). The store
 // handle is closed before returning so the next CLI invocation has
 // no concurrent SQLite connection from this process.
 func stampPeerProvenance(t *testing.T, dbPath string) {
@@ -204,17 +206,23 @@ func stampPeerProvenance(t *testing.T, dbPath string) {
 		t.Fatalf("FinishRun b: %v", err)
 	}
 	for _, c := range []struct {
-		path string
-		prov *store.Provenance
+		path  string
+		runID int64
+		prov  *store.Provenance
 	}{
-		{"from-a.txt", &store.Provenance{NodeID: peerA.ID, RunID: runA}},
-		{"from-b.txt", &store.Provenance{NodeID: peerB.ID, RunID: runB}},
+		{"from-a.txt", runA, &store.Provenance{NodeID: peerA.ID, RunID: runA}},
+		{"from-b.txt", runB, &store.Provenance{NodeID: peerB.ID, RunID: runB}},
 	} {
-		row, err := s.GetByPath(ctx, vol.ID, c.path)
+		data, err := os.ReadFile(filepath.Join(vol.Path, c.path))
 		if err != nil {
-			t.Fatalf("GetByPath %s: %v", c.path, err)
+			t.Fatalf("read %s: %v", c.path, err)
 		}
-		if err := s.Upsert(ctx, row, c.prov); err != nil {
+		digest := blake3.Sum256(data)
+		if err := s.Upsert(ctx, store.FileRow{
+			VolumeID: vol.ID, Path: c.path, Blake3: digest[:],
+			SizeBytes: int64(len(data)), MtimeNs: 1, Status: store.StatusPresent,
+			FirstSeenRunID: c.runID, LastSeenRunID: c.runID, IndexedAtNs: 1,
+		}, c.prov); err != nil {
 			t.Fatalf("Upsert %s: %v", c.path, err)
 		}
 	}
diff --git a/cmd/squirrel/root.go b/cmd/squirrel/root.go
index 3be292f..4bb9def 100644
--- a/cmd/squirrel/root.go
+++ b/cmd/squirrel/root.go
@@ -48,8 +48,10 @@ func newRootCmd() *cobra.Command {
 	root.AddCommand(newVolumesCmd())
 	root.AddCommand(newSyncCmd())
 	root.AddCommand(newRestoreCmd())
+	root.AddCommand(newOffloadCmd())
 	root.AddCommand(newAgentCmd())
 	root.AddCommand(newAuditCmd())
+	root.AddCommand(newVerifyCmd())
 	root.AddCommand(newPeerSyncCmd())
 	root.AddCommand(newTUICmd())
 	root.AddCommand(newDBCmd())
diff --git a/cmd/squirrel/source_test.go b/cmd/squirrel/source_test.go
index 998c787..a6b229a 100644
--- a/cmd/squirrel/source_test.go
+++ b/cmd/squirrel/source_test.go
@@ -7,22 +7,23 @@ import (
 	"strings"
 	"testing"
 
+	"github.com/zeebo/blake3"
+
 	"github.com/mbertschler/squirrel/store"
 )
 
-// sourceFixture lays out a config with a `pics` volume, indexes a few
-// files, then injects synthetic peer-attribution rows directly into
-// the store so the read-side CLI can be exercised without a full
-// agent round-trip. The peer's name is "peer-a"; the self-node's
-// name comes from the config's `node_name = "self-host"`.
+// sourceFixture lays out a config with a `pics` volume, indexes one
+// local file, then introduces two peer-origin files directly into the
+// store (the way a receiver's /close records freshly transferred
+// content) so the read-side CLI can be exercised without a full agent
+// round-trip. The peers are "peer-a" / "peer-b"; the self-node's name
+// comes from the config's `node_name = "self-host"`.
 type sourceFixture struct {
 	configPath string
-	dbPath     string
 	volumeDir  string
 	selfName   string
 	peerAName  string
 	peerBName  string
-	peerBID    int64
 	localPath  string
 	peerAPath  string
 	peerBPath  string
@@ -39,8 +40,6 @@ func writeSourceFixture(t *testing.T) sourceFixture {
 	peerAPath := "from-a.txt"
 	peerBPath := "from-b.txt"
 	writeTestFile(t, filepath.Join(volumeDir, localPath), "local content")
-	writeTestFile(t, filepath.Join(volumeDir, peerAPath), "from-a content")
-	writeTestFile(t, filepath.Join(volumeDir, peerBPath), "from-b content")
 
 	dbPath := filepath.Join(root, "index.db")
 	configPath := filepath.Join(root, "config.toml")
@@ -52,24 +51,26 @@ func writeSourceFixture(t *testing.T) sourceFixture {
 		t.Fatal(err)
 	}
 
-	// Index once so the DB has rows; provenance is initially NULL for all.
+	// Index the local file so its content carries a NULL (local) origin.
 	runCLI(t, "--config", configPath, "index", "pics")
 
-	// Promote the from-a / from-b rows to peer attribution by re-upserting
-	// with an explicit Provenance pointer. Upsert preserves blake3 and only
-	// rewrites the mutable columns. The store handle is opened and closed
-	// here, before any subsequent runCLI runs, so the test CLI has the DB
-	// to itself when it opens its own store.
-	peerBID := stampSourceFixture(t, dbPath, peerAPath, peerBPath)
+	// Introduce the from-a / from-b files as peer-origin content: bytes
+	// on disk plus an Upsert with an explicit Provenance, the same shape
+	// a receiver's /close write has. Origin is recorded on the contents
+	// row at first introduction, so these files must enter the store via
+	// the peer write, not via a local index run. The store handle is
+	// opened and closed here, before any subsequent runCLI runs, so the
+	// test CLI has the DB to itself when it opens its own store.
+	writeTestFile(t, filepath.Join(volumeDir, peerAPath), "from-a content")
+	writeTestFile(t, filepath.Join(volumeDir, peerBPath), "from-b content")
+	stampSourceFixture(t, dbPath, peerAPath, peerBPath)
 
 	return sourceFixture{
 		configPath: configPath,
-		dbPath:     dbPath,
 		volumeDir:  volumeDir,
 		selfName:   "self-host",
 		peerAName:  "peer-a",
 		peerBName:  "peer-b",
-		peerBID:    peerBID,
 		localPath:  localPath,
 		peerAPath:  peerAPath,
 		peerBPath:  peerBPath,
@@ -77,12 +78,12 @@ func writeSourceFixture(t *testing.T) sourceFixture {
 }
 
 // stampSourceFixture creates peer-a / peer-b nodes plus a sync run
-// for each, attributes peerAPath to peer-a and peerBPath to peer-b,
-// and returns peer-b's id (used by tests that need to inject further
-// rows). The store handle is opened, used, and closed within the
-// function so concurrent connections from runCLI calls can't race
-// with this fixture phase.
-func stampSourceFixture(t *testing.T, dbPath, peerAPath, peerBPath string) int64 {
+// for each and records peerAPath as content introduced by peer-a and
+// peerBPath by peer-b (hashing the on-disk bytes so a later index run
+// re-observes them unchanged). The store handle is opened, used, and
+// closed within the function so concurrent connections from runCLI
+// calls can't race with this fixture phase.
+func stampSourceFixture(t *testing.T, dbPath, peerAPath, peerBPath string) {
 	t.Helper()
 	s, err := store.OpenWithOptions(dbPath, store.OpenOptions{NodeName: "self-host"})
 	if err != nil {
@@ -118,21 +119,26 @@ func stampSourceFixture(t *testing.T, dbPath, peerAPath, peerBPath string) int64
 		t.Fatalf("FinishRun b: %v", err)
 	}
 	for _, c := range []struct {
-		path string
-		prov *store.Provenance
+		path  string
+		runID int64
+		prov  *store.Provenance
 	}{
-		{peerAPath, &store.Provenance{NodeID: peerA.ID, RunID: peerARun}},
-		{peerBPath, &store.Provenance{NodeID: peerB.ID, RunID: peerBRun}},
+		{peerAPath, peerARun, &store.Provenance{NodeID: peerA.ID, RunID: peerARun}},
+		{peerBPath, peerBRun, &store.Provenance{NodeID: peerB.ID, RunID: peerBRun}},
 	} {
-		row, err := s.GetByPath(ctx, vol.ID, c.path)
+		data, err := os.ReadFile(filepath.Join(vol.Path, c.path))
 		if err != nil {
-			t.Fatalf("GetByPath %s: %v", c.path, err)
+			t.Fatalf("read %s: %v", c.path, err)
 		}
-		if err := s.Upsert(ctx, row, c.prov); err != nil {
+		digest := blake3.Sum256(data)
+		if err := s.Upsert(ctx, store.FileRow{
+			VolumeID: vol.ID, Path: c.path, Blake3: digest[:],
+			SizeBytes: int64(len(data)), MtimeNs: 1, Status: store.StatusPresent,
+			FirstSeenRunID: c.runID, LastSeenRunID: c.runID, IndexedAtNs: 1,
+		}, c.prov); err != nil {
 			t.Fatalf("Upsert %s: %v", c.path, err)
 		}
 	}
-	return peerB.ID
 }
 
 // TestCLIQueryFromNodeListsAttributedRows is the acceptance test for
@@ -212,63 +218,27 @@ func TestCLIRunsRendersPeerAndCorrelated(t *testing.T) {
 }
 
 // TestCLIQueryDuplicatesAndFromComposes verifies the post-filter
-// against --duplicates: rows that share a hash are filtered down to
-// the ones whose source matches.
+// against --duplicates under content-level origin: every path sharing
+// a hash points at one contents row, so --from <peer> lists all of the
+// duplicate paths when the content originates at that peer and none of
+// them otherwise.
 func TestCLIQueryDuplicatesAndFromComposes(t *testing.T) {
 	f := writeSourceFixture(t)
-	// Add a duplicate of from-a content under a new path attributed to
-	// peer-b — that creates two hash-matched rows with different
-	// sources, which is exactly what --from must disambiguate.
+	// Duplicate the peer-a content under a new path. The new path's row
+	// resolves to the existing contents row, so it inherits peer-a's
+	// origin even though a local index run observed it.
 	dupRel := "dup.txt"
 	writeTestFile(t, filepath.Join(f.volumeDir, dupRel), "from-a content")
 	runCLI(t, "--config", f.configPath, "index", "pics")
-	stampOneRow(t, f.dbPath, f.selfName, dupRel, f.peerBID)
-
-	out := runCLI(t, "--config", f.configPath, "query", "--duplicates", "--from", f.peerBName)
-	if !strings.Contains(out, dupRel) {
-		t.Fatalf("duplicates --from peer-b missing %s:\n%s", dupRel, out)
-	}
-	if strings.Contains(out, f.peerAPath) {
-		t.Fatalf("duplicates --from peer-b leaked peer-a row:\n%s", out)
-	}
-}
 
-// stampOneRow attributes a single existing path to the given peer
-// node. Same single-handle-at-a-time discipline as the rest of the
-// fixture so concurrent runCLI store opens never overlap.
-func stampOneRow(t *testing.T, dbPath, selfName, relPath string, peerNodeID int64) {
-	t.Helper()
-	s, err := store.OpenWithOptions(dbPath, store.OpenOptions{NodeName: selfName})
-	if err != nil {
-		t.Fatalf("Open: %v", err)
-	}
-	defer s.Close()
-	ctx := context.Background()
-	vol, err := s.GetVolumeByName(ctx, "pics")
-	if err != nil {
-		t.Fatalf("GetVolumeByName: %v", err)
-	}
-	row, err := s.GetByPath(ctx, vol.ID, relPath)
-	if err != nil {
-		t.Fatalf("GetByPath %s: %v", relPath, err)
-	}
-	runID := mustLatestPeerRun(t, s, ctx, peerNodeID)
-	if err := s.Upsert(ctx, row, &store.Provenance{NodeID: peerNodeID, RunID: runID}); err != nil {
-		t.Fatalf("Upsert %s: %v", relPath, err)
-	}
-}
-
-// mustLatestPeerRun returns the highest runs.id for kind='sync' with
-// the given peer_node_id. Used to satisfy the Provenance.RunID FK
-// when stamping a synthetic row in tests.
-func mustLatestPeerRun(t *testing.T, s *store.Store, ctx context.Context, peerNodeID int64) int64 {
-	t.Helper()
-	runs, err := s.ListRunsByPeer(ctx, peerNodeID, 1)
-	if err != nil {
-		t.Fatalf("ListRunsByPeer: %v", err)
+	out := runCLI(t, "--config", f.configPath, "query", "--duplicates", "--from", f.peerAName)
+	for _, want := range []string{f.peerAPath, dupRel} {
+		if !strings.Contains(out, want) {
+			t.Fatalf("duplicates --from peer-a missing %s:\n%s", want, out)
+		}
 	}
-	if len(runs) == 0 {
-		t.Fatalf("no peer-sync runs for peer id=%d", peerNodeID)
+	outB := runCLI(t, "--config", f.configPath, "query", "--duplicates", "--from", f.peerBName)
+	if strings.Contains(outB, dupRel) || strings.Contains(outB, f.peerAPath) {
+		t.Fatalf("duplicates --from peer-b leaked peer-a-origin content:\n%s", outB)
 	}
-	return runs[0].ID
 }
diff --git a/cmd/squirrel/sync.go b/cmd/squirrel/sync.go
index bb5c4d1..83a1297 100644
--- a/cmd/squirrel/sync.go
+++ b/cmd/squirrel/sync.go
@@ -41,7 +41,7 @@ func newSyncCmd() *cobra.Command {
 	cmd.Flags().StringVar(&to, "to", "", "limit to this destination name (default: every destination declared on the volume)")
 	cmd.Flags().BoolVar(&shallow, "shallow", false, "skip BLAKE3 verification; trust rclone's default size+mtime comparison")
 	cmd.Flags().BoolVar(&dryRun, "dry-run", false, "preview rclone actions without transferring; no runs row is written")
-	cmd.Flags().BoolVar(&initDst, "init", false, "bootstrap a .squirrel-volume marker at the destination on first sync (refused subsequently if the marker mismatches)")
+	cmd.Flags().BoolVar(&initDst, "init", false, "authorise first-use destination bootstrap: write a .squirrel-volume marker, or create a kopia repository when connect finds none (refused without --init so a typo or outage can't mint a fresh empty target)")
 	return cmd
 }
 
@@ -64,11 +64,15 @@ func runSync(cmd *cobra.Command, volumeName, destinationName string, opts sync.O
 	if err != nil {
 		return err
 	}
+	tools, err := sync.ToolsFor(cfg, pairs, rcl)
+	if err != nil {
+		return err
+	}
 	out := cmd.OutOrStdout()
 	if opts.Shallow {
 		fmt.Fprintln(out, shallowSyncWarning)
 	}
-	if err := sync.EnsureMinVersion(cmd.Context(), rcl, out, opts.Shallow); err != nil {
+	if err := sync.EnsureMinVersion(cmd.Context(), rcl, out, sync.ShallowForPairs(pairs, opts.Shallow)); err != nil {
 		return err
 	}
 	if err := writeRcloneConfigLogged(out, rcl, cfg); err != nil {
@@ -84,7 +88,7 @@ func runSync(cmd *cobra.Command, volumeName, destinationName string, opts sync.O
 
 	var anyFailed bool
 	for _, p := range pairs {
-		rep, err := sync.RunPair(cmd.Context(), s, rcl, p, opts)
+		rep, err := sync.RunPair(cmd.Context(), s, tools, p, opts)
 		printSyncReport(out, rep, err)
 		if err != nil || rep.Status != "success" {
 			anyFailed = true
@@ -151,10 +155,31 @@ func printSyncReport(w io.Writer, rep sync.Report, runErr error) {
 	for _, msg := range rep.NodePendingWarnings {
 		fmt.Fprintf(w, "warning: peer reports %s\n", msg)
 	}
-	fmt.Fprintf(w, "%s → %s status=%s transferred=%d checked=%d errors=%d bytes=%d run=%d\n",
-		rep.Volume, rep.Destination, rep.Status,
-		r.Transferred, r.Checked, r.Errors, r.Bytes, rep.RunID,
-	)
+	switch rep.Verification.Method {
+	case sync.VerifyMethodKopia:
+		// Kopia pushes have no rclone counters; render the snapshot's
+		// own numbers instead.
+		fmt.Fprintf(w, "%s → %s status=%s files=%d bytes=%d snapshot=%s verified=%t run=%d\n",
+			rep.Volume, rep.Destination, rep.Status,
+			rep.Verification.Files, rep.Verification.Bytes,
+			rep.Verification.SnapshotID, rep.Verification.Verified(), rep.RunID,
+		)
+	case sync.VerifyMethodPresenceSize:
+		// Content-addressed pushes count objects, with skipped = hashes
+		// the destination already recorded, entries = manifest segment
+		// lines, and fingerprints = provider checksums captured for the
+		// fresh uploads.
+		fmt.Fprintf(w, "%s → %s status=%s objects=%d skipped=%d errors=%d bytes=%d entries=%d fingerprints=%d run=%d\n",
+			rep.Volume, rep.Destination, rep.Status,
+			r.Transferred, r.Checked, r.Errors, r.Bytes,
+			rep.Verification.Files, rep.Fingerprints, rep.RunID,
+		)
+	default:
+		fmt.Fprintf(w, "%s → %s status=%s transferred=%d checked=%d errors=%d bytes=%d run=%d\n",
+			rep.Volume, rep.Destination, rep.Status,
+			r.Transferred, r.Checked, r.Errors, r.Bytes, rep.RunID,
+		)
+	}
 	if rep.NodeReceiverRunID != 0 {
 		fmt.Fprintf(w, " receiver_run=%d matched=%d mismatched=%d missing=%d conflicts=%d\n",
 			rep.NodeReceiverRunID,
@@ -166,6 +191,15 @@ func printSyncReport(w io.Writer, rep sync.Report, runErr error) {
 		for _, m := range rep.NodeVerify.Mismatched {
 			fmt.Fprintf(w, " mismatched %s: expected %s, actual %s\n", m.Path, m.ExpectedHex, m.ActualHex)
 		}
+		if rep.DurabilityPull.Fetched > 0 {
+			fmt.Fprintf(w, " durability: applied %d/%d peer entries",
+				rep.DurabilityPull.Applied, rep.DurabilityPull.Fetched)
+			if rep.DurabilityPull.Dropped > 0 {
+				fmt.Fprintf(w, " (dropped %d for unconfigured destinations)",
+					rep.DurabilityPull.Dropped)
+			}
+			fmt.Fprintln(w)
+		}
 	}
 	for _, c := range rep.NodeConflicts {
 		fmt.Fprintf(w, " conflict %s: %s — was %s, now %s\n",
diff --git a/cmd/squirrel/sync_test.go b/cmd/squirrel/sync_test.go
index 6d331f2..5902cbe 100644
--- a/cmd/squirrel/sync_test.go
+++ b/cmd/squirrel/sync_test.go
@@ -3,6 +3,7 @@ package main
 import (
 	"os"
 	"path/filepath"
+	"runtime"
 	"strings"
 	"testing"
 )
@@ -49,6 +50,63 @@ func TestCLISyncUnknownDestinationFlag(t *testing.T) {
 	}
 }
 
+// TestCLISyncKopiaDestination drives `squirrel sync` against a
+// kopia-typed destination through a fake kopia binary on PATH, pinning
+// the wiring end to end: PairsFor accepts the destination as a sync_to
+// target, the handler runs connect → snapshot create → verify, and the
+// per-pair output line renders the snapshot's own numbers.
+func TestCLISyncKopiaDestination(t *testing.T) {
+	requireRcloneCLI(t)
+	if runtime.GOOS == "windows" {
+		t.Skip("fake kopia shim is a POSIX shell script")
+	}
+	root := t.TempDir()
+	volumeDir := filepath.Join(root, "pics")
+	if err := os.MkdirAll(volumeDir, 0o755); err != nil {
+		t.Fatal(err)
+	}
+	writeTestFile(t, filepath.Join(volumeDir, "a.txt"), "alpha")
+
+	binDir := filepath.Join(root, "bin")
+	if err := os.MkdirAll(binDir, 0o755); err != nil {
+		t.Fatal(err)
+	}
+	shim := "#!/bin/sh\n" + `case "$1 $2" in
+"repository connect"|"snapshot verify") exit 0 ;;
+"snapshot create") echo '{"id":"snap123","rootEntry":{"summ":{"size":5,"files":1}}}' ;;
+*) echo "unexpected kopia subcommand: $*" >&2; exit 64 ;;
+esac
+`
+	writeTestFile(t, filepath.Join(binDir, "kopia"), shim)
+	if err := os.Chmod(filepath.Join(binDir, "kopia"), 0o755); err != nil {
+		t.Fatal(err)
+	}
+	t.Setenv("PATH", binDir+string(os.PathListSeparator)+os.Getenv("PATH"))
+
+	configPath := filepath.Join(root, "config.toml")
+	writeTestFile(t, configPath, `
+db = "`+filepath.Join(root, "index.db")+`"
+
+[destinations.mirror]
+type = "kopia"
+root = "`+filepath.Join(root, "repo")+`"
+password = "hunter2"
+
+[volumes.pics]
+path = "`+volumeDir+`"
+sync_to = ["mirror"]
+`)
+
+	runCLI(t, "--config", configPath, "index", "pics")
+	out := runCLI(t, "--config", configPath, "sync", "pics")
+	if !strings.Contains(out, "pics → mirror") || !strings.Contains(out, "status=success") {
+		t.Fatalf("sync did not report success for the kopia pair:\n%s", out)
+	}
+	if !strings.Contains(out, "snapshot=snap123") || !strings.Contains(out, "verified=true") {
+		t.Fatalf("output missing the kopia snapshot summary:\n%s", out)
+	}
+}
+
 func TestCLISyncRequiresConfig(t *testing.T) {
 	// No config file at the chosen --config path: sync errors with a
 	// pointer to the missing file instead of a generic IO error.
diff --git a/cmd/squirrel/verify.go b/cmd/squirrel/verify.go
new file mode 100644
index 0000000..15cea5c
--- /dev/null
+++ b/cmd/squirrel/verify.go
@@ -0,0 +1,130 @@
+package main
+
+import (
+	"fmt"
+	"io"
+	"sort"
+
+	"github.com/spf13/cobra"
+
+	"github.com/mbertschler/squirrel/config"
+	"github.com/mbertschler/squirrel/sync"
+)
+
+// newVerifyCmd returns the `squirrel verify [<destination>]` cobra
+// command: re-read the provider checksums of every object recorded on a
+// content-addressed destination and compare them against the
+// fingerprints captured at upload time. Matches stamp the object
+// verified; objects uploaded before fingerprint capture (or whose
+// capture failed) get their fingerprint recorded on the first pass. A
+// mismatch or a missing object is potential offsite corruption or
+// tampering: it is reported per object and fails the command, with the
+// destination and the recorded fingerprint left exactly as found so the
+// operator inspects the evidence.
+func newVerifyCmd() *cobra.Command { + return &cobra.Command{ + Use: "verify [<destination>]", + Short: "Re-check recorded offsite objects against their upload fingerprints", + Args: cobra.MaximumNArgs(1), + RunE: func(cmd *cobra.Command, args []string) error { + destName := "" + if len(args) == 1 { + destName = args[0] + } + return runVerify(cmd, destName) + }, + } +} + +func runVerify(cmd *cobra.Command, destName string) error { + cfg, err := requireConfig(cmd) + if err != nil { + return err + } + names, err := verifyTargetNames(cfg, destName) + if err != nil { + return err + } + s, err := openStore(cmd, cfg) + if err != nil { + return err + } + defer s.Close() + + rcl, err := sync.Find() + if err != nil { + return err + } + out := cmd.OutOrStdout() + if err := writeRcloneConfigLogged(out, rcl, cfg); err != nil { + return err + } + + var anyFailed bool + for _, name := range names { + rep, err := sync.VerifyRemote(cmd.Context(), s, rcl, cfg.Destinations[name]) + printVerifyReport(out, cmd.ErrOrStderr(), rep, err) + if err != nil || !rep.Clean() { + anyFailed = true + } + } + if anyFailed { + return fmt.Errorf("one or more destinations failed verification") + } + return nil +} + +// verifyTargetNames resolves the verification subjects in deterministic +// order: an explicit destination (validated to exist and be +// content-addressed), or every content-addressed destination in config. 
+func verifyTargetNames(cfg *config.Config, destName string) ([]string, error) { + if destName != "" { + d, ok := cfg.Destinations[destName] + if !ok { + return nil, fmt.Errorf("unknown destination %q (declare it in %s)", destName, cfg.Path) + } + if d.Layout != config.LayoutContentAddressed { + return nil, fmt.Errorf("destination %q has layout %q — verify covers the recorded objects of content-addressed destinations", destName, d.Layout) + } + return []string{destName}, nil + } + var names []string + for name, d := range cfg.Destinations { + if d.Layout == config.LayoutContentAddressed { + names = append(names, name) + } + } + if len(names) == 0 { + return nil, fmt.Errorf("no content-addressed destinations declared in %s", cfg.Path) + } + sort.Strings(names) + return names, nil +} + +// printVerifyReport renders one destination's pass: a loud stderr line +// per missing or mismatched object, then the summary counters. +func printVerifyReport(out, errOut io.Writer, rep sync.RemoteVerifyReport, runErr error) { + for _, hash := range rep.Missing { + fmt.Fprintf(errOut, "error: object %s on %q: recorded as uploaded but absent from the remote\n", hash, rep.Destination) + } + for _, m := range rep.Mismatched { + if m.Actual == "" { + fmt.Fprintf(errOut, "error: object %s on %q: recorded %s %s, but the remote no longer exposes a %s checksum\n", + m.Hash, rep.Destination, m.Algo, m.Recorded, m.Algo) + continue + } + fmt.Fprintf(errOut, "error: object %s on %q: recorded %s %s, remote now reports %s — possible corruption or tampering\n", + m.Hash, rep.Destination, m.Algo, m.Recorded, m.Actual) + } + if runErr != nil { + fmt.Fprintf(errOut, "verify %s: %v\n", rep.Destination, runErr) + return + } + if rep.Objects == 0 { + fmt.Fprintf(out, "verify %s: no recorded objects\n", rep.Destination) + return + } + fmt.Fprintf(out, "verify %s: run=%d objects=%d verified=%d fingerprinted=%d pending=%d mismatched=%d missing=%d unrecorded=%d\n", + rep.Destination, rep.RunID, rep.Objects, 
rep.Verified, rep.Populated, rep.Pending, + len(rep.Mismatched), len(rep.Missing), rep.Unrecorded) +} diff --git a/cmd/squirrel/verify_test.go b/cmd/squirrel/verify_test.go new file mode 100644 index 0000000..f7483fb --- /dev/null +++ b/cmd/squirrel/verify_test.go @@ -0,0 +1,57 @@ +package main + +import ( + "errors" + "strings" + "testing" + + "github.com/mbertschler/squirrel/sync" +) + +func TestVerifyUnknownDestination(t *testing.T) { + fx := writeSyncFixture(t) + _, err := runCLIExpectErr(t, "verify", "nope", "--config", fx.configPath) + if !strings.Contains(err.Error(), `unknown destination "nope"`) { + t.Fatalf("err = %v, want unknown destination", err) + } +} + +func TestVerifyRefusesMirrorDestination(t *testing.T) { + fx := writeSyncFixture(t) + _, err := runCLIExpectErr(t, "verify", "scratch", "--config", fx.configPath) + if !strings.Contains(err.Error(), "content-addressed") { + t.Fatalf("err = %v, want content-addressed refusal", err) + } +} + +func TestVerifyNoContentAddressedDestinations(t *testing.T) { + fx := writeSyncFixture(t) + _, err := runCLIExpectErr(t, "verify", "--config", fx.configPath) + if !strings.Contains(err.Error(), "no content-addressed destinations") { + t.Fatalf("err = %v, want no-destinations refusal", err) + } +} + +// TestPrintVerifyReportErrorSuppressesSummary: when the pass errored +// before producing object counts, the report shows only the error on +// stderr — never the misleading "no recorded objects" summary. 
+func TestPrintVerifyReportErrorSuppressesSummary(t *testing.T) { + var out, errOut strings.Builder + printVerifyReport(&out, &errOut, sync.RemoteVerifyReport{Destination: "offsite"}, errors.New("rclone exploded")) + if strings.Contains(out.String(), "no recorded objects") || out.Len() != 0 { + t.Fatalf("stdout = %q, want empty on an error run", out.String()) + } + if !strings.Contains(errOut.String(), "rclone exploded") { + t.Fatalf("stderr = %q, want the error surfaced", errOut.String()) + } +} + +// TestPrintVerifyReportCleanEmptyShowsSummary: a clean run with no +// recorded objects still prints its summary line. +func TestPrintVerifyReportCleanEmptyShowsSummary(t *testing.T) { + var out, errOut strings.Builder + printVerifyReport(&out, &errOut, sync.RemoteVerifyReport{Destination: "offsite"}, nil) + if !strings.Contains(out.String(), "no recorded objects") { + t.Fatalf("stdout = %q, want the no-objects summary on a clean run", out.String()) + } +} diff --git a/config/backups.go b/config/backups.go index 060c77c..e2e63f0 100644 --- a/config/backups.go +++ b/config/backups.go @@ -23,8 +23,10 @@ type Backups struct { // rather than here. Dir string // Keep bounds the local snapshot directory: after writing, the oldest - // snapshots are rotated away until at most Keep remain. Zero means no - // rotation. + // index-* snapshots are rotated away until at most Keep remain. Zero + // means no rotation. Pre-migration snapshots in the same directory are + // exempt from this sync-time rotation — only an explicit `db backup + // --keep` retention removes them. Keep int // Cloud gates the destination ride-along. Ignored when Enabled is // false (no snapshot is taken to upload). 
diff --git a/config/config.go b/config/config.go index 568a4bb..d6f2431 100644 --- a/config/config.go +++ b/config/config.go @@ -69,6 +69,17 @@ type Volume struct { Name string Path string // absolute, ~ expanded SyncTo []string // destination names declared on this volume + // OffloadRequires is the volume's offload policy: the target names + // (destinations or peer nodes, the same flat namespace sync_to and + // runs.destination use) whose recorded durability must each cover a + // file's content before `squirrel offload` may delete its local + // bytes. An empty list means offload is refused for the volume — + // the policy is an explicit opt-in, there is no default target set. + // Entries may name targets beyond this config's destinations and + // nodes because durability evidence can arrive via a peer's + // durability pull about targets only that peer reaches; a name with + // no recorded evidence keeps the gate closed. + OffloadRequires []string // SyncEvery is the agent-scheduler cadence for full syncs of this // volume. Zero means "no scheduled sync" — the agent never auto- // triggers a sync for this volume; manual `squirrel sync` still @@ -123,16 +134,73 @@ type VolumeHook struct { // skipped rather than stacked. const DefaultHookTimeout = time.Hour -// Destination is one rclone-backed remote. Type drives which Params are -// required and how the destination is rendered into rclone.conf. +// Destination is one sync target driven by a curated external tool. Type +// selects the handler and drives which Params are required: the rclone +// types (local, sftp, s3, b2, gcs) render into rclone.conf, while +// type=kopia drives the kopia binary against a local repository. type Destination struct { Name string - Type string // local, sftp, s3, b2, gcs - Root string // remote-side base directory for syncing volumes into - // Params are type-specific rclone backend parameters with any - // { env = "VAR" } references already resolved to literal strings. 
- // Empty for type=local (no rclone remote needed). + Type string // local, sftp, s3, b2, gcs, kopia + Root string // remote-side base directory; for kopia, the repository path + // Layout selects how the destination stores a volume's data: + // LayoutMirror replicates the volume's tree, LayoutContentAddressed + // stores append-only content objects plus per-run manifest segments. + // Resolved at load time to one of the Layout* constants; an absent + // `layout` key resolves to LayoutMirror. + Layout string + // Params are type-specific parameters with any { env = "VAR" } + // references already resolved to literal strings: rclone backend + // parameters for the rclone types, the repository password for + // kopia. Empty for type=local (no rclone remote needed). Params map[string]string + // Crypt is non-nil when the destination declares a + // [destinations.<name>.crypt] block: client-side encryption through + // rclone's crypt overlay. Transfers then address the overlay remote + // (CryptRemoteName) instead of the underlying remote. + Crypt *Crypt + // HashAlgo is the provider checksum type (rclone hash name, e.g. + // "sha256") that scan-back fingerprints record for this destination. + // Settable on sftp destinations only — the one backend where rclone + // must be told which server-side hash command to run (rendered as + // the sftp `hashes` option). Defaults to "sha256" for + // content-addressed sftp destinations; empty otherwise. + HashAlgo string + // Checkers caps rclone's concurrent checkers (--checkers) on + // invocations against this destination, for providers that cap + // simultaneous connections. Zero leaves rclone's default in force. + Checkers int +} + +// Destination layout values. The layout shapes what sync writes under +// the destination's per-volume directory; it never changes how the +// local volume is indexed. 
+const ( + // LayoutMirror is the default: the destination holds a tree shaped + // like the local volume, with overwrites preserved under + // .squirrel-history/run-<id>/. + LayoutMirror = "mirror" + // LayoutContentAddressed stores one immutable object per BLAKE3 + // content hash under objects/ plus one manifest segment per sync + // run under index/, an append-only archive layout where a local + // rename re-uploads nothing. Valid for rclone-remote destinations + // only. + LayoutContentAddressed = "content-addressed" +) + +// Crypt is the optional client-side encryption overlay for a destination. +// squirrel renders it as an rclone crypt remote stacked on the underlying +// remote and addresses sync/restore transfers through it, so file contents +// are encrypted before they leave the machine. Contents only: +// filename_encryption is fixed off, keeping the destination tree layout +// identical to an unencrypted destination. +type Crypt struct { + // Password is the content-encryption password in rclone-obscured form, + // the same representation rclone's own crypt config stores (generate + // one with `rclone obscure`). Accepts a literal or { env = "VAR" }. + Password string + // Password2 is the salt, also rclone-obscured. Optional but + // recommended, matching rclone's crypt config. + Password2 string } // nameRE is the syntactic rule for volume and destination names. 
We pick a @@ -198,11 +266,12 @@ type rawConfig struct { } type rawVolume struct { - Path string `toml:"path"` - SyncTo []string `toml:"sync_to"` - SyncEvery string `toml:"sync_every"` - IndexEvery string `toml:"index_every"` - Hook *rawVolumeHook `toml:"hook"` + Path string `toml:"path"` + SyncTo []string `toml:"sync_to"` + OffloadRequires []string `toml:"offload_requires"` + SyncEvery string `toml:"sync_every"` + IndexEvery string `toml:"index_every"` + Hook *rawVolumeHook `toml:"hook"` } type rawVolumeHook struct { @@ -238,6 +307,9 @@ func (r *rawConfig) resolve(path string) (*Config, error) { } cfg.Destinations[name] = dest } + if err := validateCryptRemoteNames(cfg.Destinations); err != nil { + return nil, err + } for name, raw := range r.Nodes { if _, clash := cfg.Destinations[name]; clash { return nil, fmt.Errorf("nodes.%s: name also declared as a destination — names must be unique across both kinds", name) @@ -290,6 +362,9 @@ func resolveVolume(name string, raw rawVolume, dests map[string]*Destination, no } return nil, fmt.Errorf("sync_to references unknown destination or node %q", dst) } + if err := validateOffloadRequires(raw.OffloadRequires); err != nil { + return nil, err + } syncEvery, err := parseVolumeCadence("sync_every", raw.SyncEvery) if err != nil { return nil, err @@ -310,15 +385,36 @@ func resolveVolume(name string, raw rawVolume, dests map[string]*Destination, no return nil, err } return &Volume{ - Name: name, - Path: abs, - SyncTo: raw.SyncTo, - SyncEvery: syncEvery, - IndexEvery: indexEvery, - Hook: hook, + Name: name, + Path: abs, + SyncTo: raw.SyncTo, + OffloadRequires: raw.OffloadRequires, + SyncEvery: syncEvery, + IndexEvery: indexEvery, + Hook: hook, }, nil } +// validateOffloadRequires checks the offload policy entries +// syntactically: each must be a well-formed target name and appear +// once. 
Membership in this config's destinations/nodes is deliberately +// looser than sync_to's (see Volume.OffloadRequires): a typo'd name is +// still fail-safe because a target without recorded durability evidence +// can never let the gate pass. +func validateOffloadRequires(names []string) error { + seen := make(map[string]struct{}, len(names)) + for _, n := range names { + if !nameRE.MatchString(n) { + return fmt.Errorf("offload_requires entry %q is invalid (must match %s)", n, nameRE) + } + if _, dup := seen[n]; dup { + return fmt.Errorf("offload_requires lists %q more than once", n) + } + seen[n] = struct{}{} + } + return nil +} + // resolveVolumeHook validates an optional `[volumes.X.hook]` block. A nil // raw (no block) yields a nil hook. When present, command is required and // every argv element must be non-empty (an empty element is almost always diff --git a/config/config_test.go b/config/config_test.go index 6a46508..cc1ef3c 100644 --- a/config/config_test.go +++ b/config/config_test.go @@ -237,6 +237,55 @@ sync_to = ["does-not-exist"] } } +// TestLoadOffloadRequires: the per-volume offload policy is parsed +// verbatim, including names with no matching local destination or node +// — durability evidence for such targets can be peer-pulled, and an +// unknown name fails closed at the gate. 
+func TestLoadOffloadRequires(t *testing.T) { + p := writeConfig(t, ` +[destinations.scratch] +type = "local" +root = "/tmp/dst" + +[volumes.pictures] +path = "/tmp/pictures" +sync_to = ["scratch"] +offload_requires = ["scratch", "peer-only-offsite"] +`) + cfg, err := Load(p) + if err != nil { + t.Fatalf("Load: %v", err) + } + got := cfg.Volumes["pictures"].OffloadRequires + if len(got) != 2 || got[0] != "scratch" || got[1] != "peer-only-offsite" { + t.Fatalf("OffloadRequires = %v, want [scratch peer-only-offsite]", got) + } +} + +func TestLoadRejectsInvalidOffloadRequiresName(t *testing.T) { + p := writeConfig(t, ` +[volumes.pictures] +path = "/tmp/pictures" +offload_requires = ["has space"] +`) + _, err := Load(p) + if err == nil || !strings.Contains(err.Error(), "offload_requires entry") { + t.Fatalf("expected invalid offload_requires error, got %v", err) + } +} + +func TestLoadRejectsDuplicateOffloadRequires(t *testing.T) { + p := writeConfig(t, ` +[volumes.pictures] +path = "/tmp/pictures" +offload_requires = ["nas", "nas"] +`) + _, err := Load(p) + if err == nil || !strings.Contains(err.Error(), "more than once") { + t.Fatalf("expected duplicate offload_requires error, got %v", err) + } +} + func TestLoadRejectsInvalidName(t *testing.T) { // Names that wouldn't survive being a filesystem subfolder or an // rclone.conf section are rejected at load time. @@ -356,6 +405,310 @@ root = "/p" } } +// TestLoadDestinationS3StorageClass parses the optional s3 storage_class and +// confirms it renders verbatim into the s3 section. 
+func TestLoadDestinationS3StorageClass(t *testing.T) { + p := writeConfig(t, ` +[destinations.archive] +type = "s3" +provider = "AWS" +bucket = "squirrel" +root = "/p" +storage_class = "DEEP_ARCHIVE" +`) + cfg, err := Load(p) + if err != nil { + t.Fatalf("Load: %v", err) + } + d := cfg.Destinations["archive"] + if d.Params["storage_class"] != "DEEP_ARCHIVE" { + t.Fatalf("storage_class not resolved: %v", d.Params) + } + if !strings.Contains(d.RcloneSection(), "storage_class = DEEP_ARCHIVE") { + t.Fatalf("section missing storage_class:\n%s", d.RcloneSection()) + } +} + +// TestLoadRejectsStorageClassOnNonS3 confirms storage_class is confined to +// the s3 type by the unknown-field check — it has no meaning on an sftp +// destination. +func TestLoadRejectsStorageClassOnNonS3(t *testing.T) { + p := writeConfig(t, ` +[destinations.nas] +type = "sftp" +host = "h" +user = "u" +root = "/r" +storage_class = "GLACIER" +`) + _, err := Load(p) + if err == nil || !strings.Contains(err.Error(), `unknown field "storage_class"`) { + t.Fatalf("expected storage_class rejected on sftp, got %v", err) + } +} + +// TestLoadDestinationSFTPHostKeyValidation parses the optional sftp +// known_hosts_file and host_key_algorithms params and confirms both render +// verbatim into the sftp section. Pointing rclone at a known_hosts file is +// what turns on server host-key validation; absent, rclone accepts any host +// key the server presents. 
+func TestLoadDestinationSFTPHostKeyValidation(t *testing.T) { + p := writeConfig(t, ` +[destinations.nas] +type = "sftp" +host = "h" +user = "u" +root = "/r" +password = "p" +known_hosts_file = "~/.ssh/known_hosts" +host_key_algorithms = "ssh-ed25519 ssh-rsa" +`) + cfg, err := Load(p) + if err != nil { + t.Fatalf("Load: %v", err) + } + d := cfg.Destinations["nas"] + if d.Params["known_hosts_file"] != "~/.ssh/known_hosts" { + t.Fatalf("known_hosts_file not resolved: %v", d.Params) + } + if d.Params["host_key_algorithms"] != "ssh-ed25519 ssh-rsa" { + t.Fatalf("host_key_algorithms not resolved: %v", d.Params) + } + section := d.RcloneSection() + for _, want := range []string{ + "known_hosts_file = ~/.ssh/known_hosts", + "host_key_algorithms = ssh-ed25519 ssh-rsa", + } { + if !strings.Contains(section, want) { + t.Fatalf("section missing %q:\n%s", want, section) + } + } +} + +// TestLoadRejectsKnownHostsFileOnNonSFTP confirms the host-key params are +// confined to the sftp type by the unknown-field check. +func TestLoadRejectsKnownHostsFileOnNonSFTP(t *testing.T) { + p := writeConfig(t, ` +[destinations.s3] +type = "s3" +provider = "AWS" +bucket = "squirrel" +root = "/p" +known_hosts_file = "~/.ssh/known_hosts" +`) + _, err := Load(p) + if err == nil || !strings.Contains(err.Error(), `unknown field "known_hosts_file"`) { + t.Fatalf("expected known_hosts_file rejected on s3, got %v", err) + } +} + +// TestLoadDestinationCrypt parses a crypt block with one env-resolved and +// one literal password, the same secret forms destination credentials +// accept. 
+func TestLoadDestinationCrypt(t *testing.T) { + t.Setenv("CRYPT_PASSWORD", "obscured-pw") + p := writeConfig(t, ` +[destinations.offsite] +type = "sftp" +host = "host.example" +user = "u" +root = "/data" +password = "transport-pw" + +[destinations.offsite.crypt] +password = { env = "CRYPT_PASSWORD" } +password2 = "obscured-salt" +`) + cfg, err := Load(p) + if err != nil { + t.Fatalf("Load: %v", err) + } + d := cfg.Destinations["offsite"] + if d.Crypt == nil { + t.Fatalf("Crypt not parsed: %+v", d) + } + if d.Crypt.Password != "obscured-pw" || d.Crypt.Password2 != "obscured-salt" { + t.Fatalf("Crypt = %+v, want resolved password + literal salt", d.Crypt) + } + if d.CryptRemoteName() != "offsite-crypt" { + t.Fatalf("CryptRemoteName = %q, want offsite-crypt", d.CryptRemoteName()) + } +} + +// TestRcloneSectionCryptStacked pins the exact two-section render for a +// crypt destination: the underlying remote exactly as without crypt, then +// the overlay wrapping it at the destination root with the fixed +// filename-encryption settings. +func TestRcloneSectionCryptStacked(t *testing.T) { + p := writeConfig(t, ` +[destinations.offsite] +type = "sftp" +host = "host.example" +user = "u" +root = "/data" +password = "transport-pw" + +[destinations.offsite.crypt] +password = "obscured-pw" +password2 = "obscured-salt" +`) + cfg, err := Load(p) + if err != nil { + t.Fatalf("Load: %v", err) + } + want := `[offsite] +type = sftp +host = host.example +user = u +blake3sum_command = b3sum +password = transport-pw + +[offsite-crypt] +type = crypt +remote = offsite:/data +filename_encryption = off +directory_name_encryption = false +password = obscured-pw +password2 = obscured-salt +` + if got := cfg.Destinations["offsite"].RcloneSection(); got != want { + t.Fatalf("RcloneSection:\n%s\nwant:\n%s", got, want) + } +} + +// TestRcloneSectionSFTPEmitsBlake3sumCommand pins that every sftp section +// carries a blake3sum_command. 
rclone never autodetects one, so without it +// squirrel's `--hash blake3` syncs fail with "hash type not supported". The +// line is sftp-only: backends with a fixed provider checksum must not get it. +func TestRcloneSectionSFTPEmitsBlake3sumCommand(t *testing.T) { + p := writeConfig(t, ` +[destinations.nas] +type = "sftp" +host = "h" +user = "u" +root = "/r" + +[destinations.s3] +type = "s3" +provider = "AWS" +bucket = "b" +root = "/r" +`) + cfg, err := Load(p) + if err != nil { + t.Fatalf("Load: %v", err) + } + if got := cfg.Destinations["nas"].RcloneSection(); !strings.Contains(got, "blake3sum_command = b3sum") { + t.Fatalf("sftp section missing blake3sum_command:\n%s", got) + } + if got := cfg.Destinations["s3"].RcloneSection(); strings.Contains(got, "blake3sum_command") { + t.Fatalf("non-sftp section should not carry blake3sum_command:\n%s", got) + } +} + +// TestRcloneSectionCryptOmitsEmptySalt: password2 is optional, mirroring +// rclone's own crypt config, and an absent salt renders no password2 line. 
+func TestRcloneSectionCryptOmitsEmptySalt(t *testing.T) { + p := writeConfig(t, ` +[destinations.offsite] +type = "sftp" +host = "h" +user = "u" +root = "/data" + +[destinations.offsite.crypt] +password = "obscured-pw" +`) + cfg, err := Load(p) + if err != nil { + t.Fatalf("Load: %v", err) + } + section := cfg.Destinations["offsite"].RcloneSection() + if strings.Contains(section, "password2") { + t.Fatalf("section has a password2 line for an absent salt:\n%s", section) + } + if !strings.Contains(section, "password = obscured-pw") { + t.Fatalf("section missing crypt password:\n%s", section) + } +} + +func TestLoadDestinationCryptMissingPassword(t *testing.T) { + p := writeConfig(t, ` +[destinations.offsite] +type = "sftp" +host = "h" +user = "u" +root = "/r" + +[destinations.offsite.crypt] +password2 = "salt-only" +`) + _, err := Load(p) + if err == nil || !strings.Contains(err.Error(), "crypt.password is required") { + t.Fatalf("expected crypt.password-required error, got %v", err) + } +} + +func TestLoadRejectsCryptOnLocalDestination(t *testing.T) { + p := writeConfig(t, ` +[destinations.scratch] +type = "local" +root = "/tmp/scratch" + +[destinations.scratch.crypt] +password = "obscured-pw" +`) + _, err := Load(p) + if err == nil || !strings.Contains(err.Error(), `type "local"`) { + t.Fatalf("expected crypt-on-local rejection, got %v", err) + } +} + +// TestLoadRejectsUnknownCryptField doubles as the "filename encryption is +// fixed, not configurable" pin: a user trying to switch it on gets a +// load-time error. 
+func TestLoadRejectsUnknownCryptField(t *testing.T) { + p := writeConfig(t, ` +[destinations.offsite] +type = "sftp" +host = "h" +user = "u" +root = "/r" + +[destinations.offsite.crypt] +password = "obscured-pw" +filename_encryption = "standard" +`) + _, err := Load(p) + if err == nil || !strings.Contains(err.Error(), `unknown field "filename_encryption"`) { + t.Fatalf("expected unknown-crypt-field error, got %v", err) + } +} + +// TestLoadRejectsCryptRemoteNameCollision: the overlay's rclone.conf +// section is named <dest>-crypt, so a sibling destination already holding +// that name would render two sections under one header. +func TestLoadRejectsCryptRemoteNameCollision(t *testing.T) { + p := writeConfig(t, ` +[destinations.offsite] +type = "sftp" +host = "h" +user = "u" +root = "/r" + +[destinations.offsite.crypt] +password = "obscured-pw" + +[destinations.offsite-crypt] +type = "local" +root = "/tmp/x" +`) + _, err := Load(p) + if err == nil || !strings.Contains(err.Error(), `crypt remote name "offsite-crypt"`) { + t.Fatalf("expected crypt-name collision error, got %v", err) + } +} + // TestLoadNodeName checks that the top-level node_name key is parsed and // surfaced on Config.NodeName for the store to consume on first migration. func TestLoadNodeName(t *testing.T) { diff --git a/config/destinations.go b/config/destinations.go index 4193bd0..b995836 100644 --- a/config/destinations.go +++ b/config/destinations.go @@ -10,14 +10,16 @@ import ( // destSchema declares the parameter schema for one destination type. The // schema drives validation (required/optional fields, secret handling) and -// the rclone.conf rendering — every param key here that resolves to a -// non-empty value is written verbatim to rclone.conf, except for "root" -// which is squirrel's own concept (we use it to compose the destination URI -// passed to rclone, not as an rclone backend param). 
+// the rclone.conf rendering — for rclone-backed types, every param key here +// that resolves to a non-empty value is written verbatim to rclone.conf, +// except for "root" which is squirrel's own concept (we use it to compose +// the destination URI passed to rclone, not as an rclone backend param). +// Types with an empty rcloneType render no section, so their params stay +// out of rclone.conf entirely. type destSchema struct { // rcloneType is the value written as `type = ...` in rclone.conf for - // this destination. Empty means "no rclone remote" (used by the local - // backend, which is addressed by absolute path). + // this destination. Empty means "no rclone remote" (the local backend + // is addressed by absolute path; kopia drives its own binary). rcloneType string // requiredString fields must be present as plain strings and non-empty. requiredString []string @@ -26,6 +28,9 @@ type destSchema struct { // secretFields accept either a string literal or an inline table // { env = "VAR" }. The resolved literal is written to rclone.conf. secretFields []string + // requiredSecret fields accept the same forms as secretFields but + // must resolve to a non-empty value. + requiredSecret []string } // destSchemas registers every supported destination type. Adding a new type @@ -38,15 +43,25 @@ var destSchemas = map[string]destSchema{ // an rclone backend param for the local case. }, "sftp": { + // known_hosts_file points rclone at a known_hosts file so it + // validates the server's host key before transferring; absent, rclone + // accepts whatever host key the server presents. host_key_algorithms + // pins the accepted host-key algorithms (rclone's space-separated + // list). Both map straight to the rclone sftp options of the same + // name. The unknown-field check confines them to this type. 
rcloneType: "sftp", requiredString: []string{"host", "user"}, - optionalString: []string{"port", "key_file"}, + optionalString: []string{"port", "key_file", "known_hosts_file", "host_key_algorithms"}, secretFields: []string{"password"}, }, "s3": { + // storage_class maps to rclone's s3 storage_class config key; its + // accepted values are whatever the backend supports (commonly + // STANDARD and various archive tiers). The unknown-field check + // confines it to this type. rcloneType: "s3", requiredString: []string{"provider", "bucket"}, - optionalString: []string{"region", "endpoint"}, + optionalString: []string{"region", "endpoint", "storage_class"}, secretFields: []string{"access_key_id", "secret_access_key"}, }, "b2": { @@ -60,11 +75,23 @@ var destSchemas = map[string]destSchema{ optionalString: []string{"service_account_file"}, secretFields: []string{"service_account_credentials"}, }, + "kopia": { + // root is the local filesystem path of the kopia repository. + // The password unlocks the repository (and creates it on first + // use); kopia encrypts the repository contents itself, which is + // also why a crypt block is rejected for this type. + // verify_files_percent is the fraction of snapshot file bytes + // `kopia snapshot verify` reads back when this destination gates + // offload (default applied by the kopia handler when unset). + rcloneType: "", + optionalString: []string{"verify_files_percent"}, + requiredSecret: []string{"password"}, + }, } // SupportedTypes returns the sorted list of destination types squirrel -// knows how to render into rclone.conf. Used by error messages so users -// see what they could have typed. +// supports. Used by error messages so users see what they could have +// typed. 
func SupportedTypes() []string { out := make([]string, 0, len(destSchemas)) for t := range destSchemas { @@ -94,11 +121,176 @@ func resolveDestination(name string, raw map[string]any) (*Destination, error) { if !ok || root == "" { return nil, errors.New("root must be a non-empty string") } + crypt, err := resolveCrypt(raw, typ) + if err != nil { + return nil, err + } + layout, err := resolveLayout(raw, typ) + if err != nil { + return nil, err + } + hashAlgo, err := resolveHashAlgo(raw, typ, layout) + if err != nil { + return nil, err + } + checkers, err := resolveCheckers(raw, typ) + if err != nil { + return nil, err + } params, err := validateAndResolveParams(schema, raw) if err != nil { return nil, err } - return &Destination{Name: name, Type: typ, Root: root, Params: params}, nil + return &Destination{ + Name: name, Type: typ, Root: root, Layout: layout, Params: params, + Crypt: crypt, HashAlgo: hashAlgo, Checkers: checkers, + }, nil +} + +// sftpHashAlgos are the checksum types rclone's sftp backend can read +// via a server-side sum command, the valid values for `hash_algo`. +var sftpHashAlgos = map[string]bool{ + "md5": true, "sha1": true, "sha256": true, "crc32": true, + "blake3": true, "xxh3": true, "xxh128": true, +} + +// resolveHashAlgo validates the optional `hash_algo` key. sftp is the +// one backend where rclone must be told which server-side hash command +// to run; every other type exposes a fixed checksum, so the key is +// rejected there. Content-addressed sftp destinations default to +// "sha256" so scan-back fingerprints get a strong checksum without +// relying on rclone's md5/sha1 preference. 
+func resolveHashAlgo(raw map[string]any, typ, layout string) (string, error) { + v, err := optionalString(raw, "hash_algo") + if err != nil { + return "", err + } + if v == "" { + if typ == "sftp" && layout == LayoutContentAddressed { + return "sha256", nil + } + return "", nil + } + if typ != "sftp" { + return "", fmt.Errorf(`hash_algo is only supported on type "sftp" destinations; type %q exposes a fixed checksum`, typ) + } + if !sftpHashAlgos[v] { + return "", fmt.Errorf("unknown hash_algo %q (supported: %v)", v, sortedKeys(sftpHashAlgos)) + } + return v, nil +} + +// resolveCheckers validates the optional `checkers` key: a positive +// integer cap on rclone's concurrent checkers for this destination. +func resolveCheckers(raw map[string]any, typ string) (int, error) { + v, ok := raw["checkers"] + if !ok { + return 0, nil + } + switch typ { + case "local", "kopia": + return 0, fmt.Errorf("checkers requires an rclone-remote destination type, not %q", typ) + } + n, isInt := v.(int64) + if !isInt || n <= 0 { + return 0, errors.New("checkers must be a positive integer") + } + return int(n), nil +} + +func sortedKeys(m map[string]bool) []string { + out := make([]string, 0, len(m)) + for k := range m { + out = append(out, k) + } + sort.Strings(out) + return out +} + +// resolveLayout validates the optional `layout` key of a destination. An +// absent key resolves to LayoutMirror. LayoutContentAddressed drives +// per-object rclone transfers, so it requires an rclone-remote type: +// type "local" is addressed by filesystem path, and "kopia" repositories +// already use kopia's own content-addressed format. 
+func resolveLayout(raw map[string]any, typ string) (string, error) { + v, err := optionalString(raw, "layout") + if err != nil { + return "", err + } + switch v { + case "", LayoutMirror: + return LayoutMirror, nil + case LayoutContentAddressed: + switch typ { + case "local": + return "", fmt.Errorf(`layout %q requires an rclone-remote destination; type "local" is addressed by filesystem path`, LayoutContentAddressed) + case "kopia": + return "", fmt.Errorf(`layout %q requires an rclone-remote destination; type "kopia" repositories are content-addressed by kopia itself`, LayoutContentAddressed) + } + return LayoutContentAddressed, nil + default: + return "", fmt.Errorf("unknown layout %q (supported: %q, %q)", v, LayoutMirror, LayoutContentAddressed) + } +} + +// resolveCrypt validates the optional `crypt` sub-table of a destination. +// A missing key yields nil (no encryption overlay). The two password +// fields go through the same secret resolution as destination credentials; +// password is required, password2 (the salt) is optional. +func resolveCrypt(raw map[string]any, typ string) (*Crypt, error) { + v, ok := raw["crypt"] + if !ok { + return nil, nil + } + switch typ { + case "local": + return nil, errors.New(`crypt requires an rclone-remote destination; type "local" is addressed by filesystem path`) + case "kopia": + return nil, errors.New(`crypt requires an rclone-remote destination; type "kopia" repositories are encrypted by kopia itself`) + } + table, ok := v.(map[string]any) + if !ok { + return nil, errors.New("crypt must be a table, e.g. 
[destinations.<name>.crypt]") + } + password, err := resolveSecret(table, "password") + if err != nil { + return nil, fmt.Errorf("crypt: %w", err) + } + if password == "" { + return nil, errors.New("crypt.password is required (rclone-obscured; generate with `rclone obscure`)") + } + password2, err := resolveSecret(table, "password2") + if err != nil { + return nil, fmt.Errorf("crypt: %w", err) + } + for k := range table { + if k != "password" && k != "password2" { + return nil, fmt.Errorf("crypt: unknown field %q", k) + } + } + return &Crypt{Password: password, Password2: password2}, nil +} + +// validateCryptRemoteNames rejects a config where one destination's crypt +// remote name is itself a declared destination — both would render an +// rclone.conf section under the same name, and rclone would resolve the +// shared name to whichever section comes last. +func validateCryptRemoteNames(dests map[string]*Destination) error { + names := make([]string, 0, len(dests)) + for name := range dests { + names = append(names, name) + } + sort.Strings(names) + for _, name := range names { + d := dests[name] + if d.Crypt == nil { + continue + } + if _, clash := dests[d.CryptRemoteName()]; clash { + return fmt.Errorf("destinations.%s: crypt remote name %q is already taken by another destination — rename one of them", name, d.CryptRemoteName()) + } + } + return nil } // validateAndResolveParams walks the schema, pulling each declared field @@ -108,7 +300,10 @@ func resolveDestination(name string, raw map[string]any) (*Destination, error) { // silently disabling a field at rclone time. 
func validateAndResolveParams(schema destSchema, raw map[string]any) (map[string]string, error) { out := make(map[string]string) - seen := map[string]bool{"type": true, "root": true} + seen := map[string]bool{ + "type": true, "root": true, "crypt": true, "layout": true, + "hash_algo": true, "checkers": true, + } for _, key := range schema.requiredString { v, err := requireString(raw, key) if err != nil { @@ -137,6 +332,17 @@ func validateAndResolveParams(schema destSchema, raw map[string]any) (map[string } seen[key] = true } + for _, key := range schema.requiredSecret { + v, err := resolveSecret(raw, key) + if err != nil { + return nil, err + } + if v == "" { + return nil, fmt.Errorf("%s is required", key) + } + out[key] = v + seen[key] = true + } for k := range raw { if !seen[k] { return nil, fmt.Errorf("unknown field %q", k) @@ -205,7 +411,10 @@ func resolveSecret(raw map[string]any, key string) (string, error) { // RcloneSection returns the rclone.conf section body for this destination, // or the empty string for type=local (which doesn't need a named remote — // rclone treats absolute paths as local-filesystem destinations directly). -// The returned bytes do not include a trailing newline. +// A destination with a crypt block renders two sections: the underlying +// remote exactly as without crypt, then the crypt overlay wrapping it. +// Each rendered section ends with a trailing newline, so sections +// concatenate directly into a valid rclone.conf. func (d *Destination) RcloneSection() string { schema := destSchemas[d.Type] if schema.rcloneType == "" { @@ -226,11 +435,50 @@ func (d *Destination) RcloneSection() string { fmt.Fprintf(&b, "%s = %s\n", key, v) } } + if d.Type == "sftp" { + // rclone's sftp backend only autodetects md5sum/sha1sum, so BLAKE3 + // must be named explicitly or squirrel's `--hash blake3` syncs abort + // with "hash type not supported". b3sum is the canonical BLAKE3 CLI + // and must be on the remote's PATH. 
+ fmt.Fprintf(&b, "blake3sum_command = b3sum\n") + if d.HashAlgo != "" { + fmt.Fprintf(&b, "hashes = %s\n", d.HashAlgo) + } + } for _, key := range sortedSubset(schema.secretFields) { if v, ok := d.Params[key]; ok { fmt.Fprintf(&b, "%s = %s\n", key, v) } } + if d.Crypt != nil { + b.WriteString("\n") + b.WriteString(d.cryptSection()) + } + return b.String() +} + +// CryptRemoteName is the rclone.conf section name of the crypt overlay +// stacked on this destination. Meaningful only when Crypt is non-nil. +func (d *Destination) CryptRemoteName() string { + return d.Name + "-crypt" +} + +// cryptSection renders the crypt overlay remote. Its remote line bakes the +// destination root in, so transfers through the overlay address +// volume-relative paths directly. filename_encryption is fixed off: the +// overlay encrypts file contents only, and the destination keeps the same +// browsable tree layout as an unencrypted destination. +func (d *Destination) cryptSection() string { + var b strings.Builder + fmt.Fprintf(&b, "[%s]\n", d.CryptRemoteName()) + b.WriteString("type = crypt\n") + fmt.Fprintf(&b, "remote = %s:%s\n", d.Name, d.Root) + b.WriteString("filename_encryption = off\n") + b.WriteString("directory_name_encryption = false\n") + fmt.Fprintf(&b, "password = %s\n", d.Crypt.Password) + if d.Crypt.Password2 != "" { + fmt.Fprintf(&b, "password2 = %s\n", d.Crypt.Password2) + } return b.String() } diff --git a/config/fingerprint_knobs_test.go b/config/fingerprint_knobs_test.go new file mode 100644 index 0000000..37c8930 --- /dev/null +++ b/config/fingerprint_knobs_test.go @@ -0,0 +1,161 @@ +package config + +import ( + "strings" + "testing" +) + +// TestLoadHashAlgoDefaultsForContentAddressedSFTP: a content-addressed +// sftp destination defaults to sha256 fingerprints (rendered as the sftp +// `hashes` option) while a mirrored one keeps rclone's own hash +// behaviour untouched. 
+func TestLoadHashAlgoDefaultsForContentAddressedSFTP(t *testing.T) { + cfg, err := Load(writeConfig(t, ` +[destinations.archive] +type = "sftp" +host = "host.example" +user = "u" +root = "/data" +layout = "content-addressed" + +[destinations.mirror] +type = "sftp" +host = "host.example" +user = "u" +root = "/data" +`)) + if err != nil { + t.Fatalf("Load: %v", err) + } + archive := cfg.Destinations["archive"] + if archive.HashAlgo != "sha256" { + t.Fatalf("HashAlgo = %q, want default sha256 for content-addressed sftp", archive.HashAlgo) + } + if !strings.Contains(archive.RcloneSection(), "hashes = sha256\n") { + t.Fatalf("section lacks hashes line:\n%s", archive.RcloneSection()) + } + mirror := cfg.Destinations["mirror"] + if mirror.HashAlgo != "" { + t.Fatalf("HashAlgo = %q, want empty for mirrored sftp", mirror.HashAlgo) + } + if strings.Contains(mirror.RcloneSection(), "hashes") { + t.Fatalf("mirrored section unexpectedly renders hashes:\n%s", mirror.RcloneSection()) + } +} + +// TestLoadHashAlgoExplicit: an explicit hash_algo overrides the default +// and renders on mirrored sftp destinations too. 
+func TestLoadHashAlgoExplicit(t *testing.T) { + cfg, err := Load(writeConfig(t, ` +[destinations.archive] +type = "sftp" +host = "host.example" +user = "u" +root = "/data" +layout = "content-addressed" +hash_algo = "md5" + +[destinations.mirror] +type = "sftp" +host = "host.example" +user = "u" +root = "/data" +hash_algo = "sha256" +`)) + if err != nil { + t.Fatalf("Load: %v", err) + } + if got := cfg.Destinations["archive"].HashAlgo; got != "md5" { + t.Fatalf("HashAlgo = %q, want md5", got) + } + if got := cfg.Destinations["mirror"].HashAlgo; got != "sha256" { + t.Fatalf("HashAlgo = %q, want sha256", got) + } + if !strings.Contains(cfg.Destinations["mirror"].RcloneSection(), "hashes = sha256\n") { + t.Fatalf("mirror section lacks hashes line:\n%s", cfg.Destinations["mirror"].RcloneSection()) + } +} + +func TestLoadRejectsHashAlgoOnNonSFTP(t *testing.T) { + _, err := Load(writeConfig(t, ` +[destinations.bucket] +type = "s3" +provider = "AWS" +bucket = "b" +root = "p" +hash_algo = "sha256" +`)) + if err == nil || !strings.Contains(err.Error(), "hash_algo") { + t.Fatalf("err = %v, want hash_algo rejection on s3", err) + } +} + +func TestLoadRejectsUnknownHashAlgo(t *testing.T) { + _, err := Load(writeConfig(t, ` +[destinations.archive] +type = "sftp" +host = "host.example" +user = "u" +root = "/data" +hash_algo = "sha512" +`)) + if err == nil || !strings.Contains(err.Error(), "hash_algo") { + t.Fatalf("err = %v, want unknown hash_algo rejection", err) + } +} + +func TestLoadCheckers(t *testing.T) { + cfg, err := Load(writeConfig(t, ` +[destinations.archive] +type = "sftp" +host = "host.example" +user = "u" +root = "/data" +checkers = 4 +`)) + if err != nil { + t.Fatalf("Load: %v", err) + } + d := cfg.Destinations["archive"] + if d.Checkers != 4 { + t.Fatalf("Checkers = %d, want 4", d.Checkers) + } + if strings.Contains(d.RcloneSection(), "checkers") { + t.Fatalf("checkers leaked into rclone.conf (it is an invocation flag):\n%s", d.RcloneSection()) + } +} + +func 
TestLoadRejectsBadCheckers(t *testing.T) { + cases := []struct{ name, body string }{ + {"zero", "checkers = 0"}, + {"negative", "checkers = -2"}, + {"string", `checkers = "4"`}, + } + for _, c := range cases { + t.Run(c.name, func(t *testing.T) { + _, err := Load(writeConfig(t, ` +[destinations.archive] +type = "sftp" +host = "host.example" +user = "u" +root = "/data" +`+c.body+"\n")) + if err == nil || !strings.Contains(err.Error(), "checkers") { + t.Fatalf("err = %v, want checkers rejection", err) + } + }) + } +} + +func TestLoadRejectsCheckersOnKopia(t *testing.T) { + _, err := Load(writeConfig(t, ` +[destinations.repo] +type = "kopia" +root = "/repo" +password = "pw" +checkers = 4 +`)) + if err == nil || !strings.Contains(err.Error(), "checkers") { + t.Fatalf("err = %v, want checkers rejection on kopia", err) + } +} diff --git a/config/kopia_test.go b/config/kopia_test.go new file mode 100644 index 0000000..b3e4df3 --- /dev/null +++ b/config/kopia_test.go @@ -0,0 +1,100 @@ +package config + +import ( + "strings" + "testing" +) + +func TestLoadDestinationKopia(t *testing.T) { + t.Setenv("REPO_PASSWORD", "hunter2") + p := writeConfig(t, ` +[destinations.mirror] +type = "kopia" +root = "/tmp/kopia-repo" +password = { env = "REPO_PASSWORD" } + +[volumes.pictures] +path = "/tmp/pictures" +sync_to = ["mirror"] +`) + cfg, err := Load(p) + if err != nil { + t.Fatalf("Load: %v", err) + } + d, ok := cfg.Destinations["mirror"] + if !ok { + t.Fatalf("missing destination") + } + if d.Type != "kopia" || d.Root != "/tmp/kopia-repo" { + t.Fatalf("unexpected destination: %+v", d) + } + if d.Params["password"] != "hunter2" { + t.Fatalf("password not resolved: %v", d.Params) + } + // type=kopia drives the kopia binary, so the repository password + // must never leak into the rendered rclone.conf. 
+ if d.RcloneSection() != "" { + t.Fatalf("kopia destination should produce empty rclone section, got:\n%s", d.RcloneSection()) + } + if got := cfg.Volumes["pictures"].SyncTo; len(got) != 1 || got[0] != "mirror" { + t.Fatalf("SyncTo = %v, want [mirror]", got) + } +} + +func TestLoadDestinationKopiaPasswordLiteral(t *testing.T) { + p := writeConfig(t, ` +[destinations.mirror] +type = "kopia" +root = "/tmp/kopia-repo" +password = "literal-pw" +`) + cfg, err := Load(p) + if err != nil { + t.Fatalf("Load: %v", err) + } + if got := cfg.Destinations["mirror"].Params["password"]; got != "literal-pw" { + t.Fatalf("password = %q, want literal-pw", got) + } +} + +func TestLoadDestinationKopiaRequiresPassword(t *testing.T) { + p := writeConfig(t, ` +[destinations.mirror] +type = "kopia" +root = "/tmp/kopia-repo" +`) + _, err := Load(p) + if err == nil || !strings.Contains(err.Error(), "password is required") { + t.Fatalf("expected password-required error, got %v", err) + } +} + +func TestLoadRejectsCryptOnKopiaDestination(t *testing.T) { + p := writeConfig(t, ` +[destinations.mirror] +type = "kopia" +root = "/tmp/kopia-repo" +password = "pw" + +[destinations.mirror.crypt] +password = "obscured-pw" +`) + _, err := Load(p) + if err == nil || !strings.Contains(err.Error(), `type "kopia"`) { + t.Fatalf("expected crypt-on-kopia rejection, got %v", err) + } +} + +func TestLoadRejectsUnknownFieldOnKopia(t *testing.T) { + p := writeConfig(t, ` +[destinations.mirror] +type = "kopia" +root = "/tmp/kopia-repo" +password = "pw" +host = "nas.local" +`) + _, err := Load(p) + if err == nil || !strings.Contains(err.Error(), `unknown field "host"`) { + t.Fatalf("expected unknown-field error, got %v", err) + } +} diff --git a/config/layout_test.go b/config/layout_test.go new file mode 100644 index 0000000..8f43aa2 --- /dev/null +++ b/config/layout_test.go @@ -0,0 +1,106 @@ +package config + +import ( + "strings" + "testing" +) + +func TestLoadDestinationLayoutDefaultsToMirror(t *testing.T) { + 
p := writeConfig(t, ` +[destinations.offsite] +type = "sftp" +host = "example" +user = "u" +root = "/data" +`) + cfg, err := Load(p) + if err != nil { + t.Fatalf("Load: %v", err) + } + if got := cfg.Destinations["offsite"].Layout; got != LayoutMirror { + t.Fatalf("Layout = %q, want %q", got, LayoutMirror) + } +} + +func TestLoadDestinationContentAddressedLayout(t *testing.T) { + p := writeConfig(t, ` +[destinations.offsite] +type = "sftp" +host = "example" +user = "u" +root = "/data" +layout = "content-addressed" + +[destinations.offsite.crypt] +password = "obscured-pw" +`) + cfg, err := Load(p) + if err != nil { + t.Fatalf("Load: %v", err) + } + d := cfg.Destinations["offsite"] + if d.Layout != LayoutContentAddressed { + t.Fatalf("Layout = %q, want %q", d.Layout, LayoutContentAddressed) + } + if d.Crypt == nil { + t.Fatalf("crypt block lost when combined with layout") + } +} + +func TestLoadDestinationExplicitMirrorLayout(t *testing.T) { + p := writeConfig(t, ` +[destinations.scratch] +type = "local" +root = "/tmp/dst" +layout = "mirror" +`) + cfg, err := Load(p) + if err != nil { + t.Fatalf("Load: %v", err) + } + if got := cfg.Destinations["scratch"].Layout; got != LayoutMirror { + t.Fatalf("Layout = %q, want %q", got, LayoutMirror) + } +} + +func TestLoadRejectsContentAddressedOnLocal(t *testing.T) { + p := writeConfig(t, ` +[destinations.scratch] +type = "local" +root = "/tmp/dst" +layout = "content-addressed" +`) + _, err := Load(p) + if err == nil || !strings.Contains(err.Error(), "rclone-remote") { + t.Fatalf("expected rclone-remote requirement error, got %v", err) + } +} + +func TestLoadRejectsContentAddressedOnKopia(t *testing.T) { + p := writeConfig(t, ` +[destinations.mirror] +type = "kopia" +root = "/tmp/repo" +password = "hunter2" +layout = "content-addressed" +`) + _, err := Load(p) + if err == nil || !strings.Contains(err.Error(), "kopia") { + t.Fatalf("expected kopia rejection, got %v", err) + } +} + +func TestLoadRejectsUnknownLayout(t *testing.T) 
{ + p := writeConfig(t, ` +[destinations.offsite] +type = "sftp" +host = "example" +user = "u" +root = "/data" +layout = "object-store" +`) + _, err := Load(p) + if err == nil || !strings.Contains(err.Error(), `unknown layout "object-store"`) { + t.Fatalf("expected unknown-layout error, got %v", err) + } +} diff --git a/index/index.go b/index/index.go index b34552c..d945cdc 100644 --- a/index/index.go +++ b/index/index.go @@ -555,16 +555,16 @@ func (i *indexer) process(w workItem, buf []byte) resultItem { return resultItem{row: existing, kind: kindUnchanged} } - digest, err := hashFile(w.absPath, buf) + hashed, err := hashFile(w.absPath, buf) if err != nil { return resultItem{err: fmt.Errorf("hash %s: %w", w.absPath, err)} } - row := i.rowFor(w, digest) + row := i.rowFor(w, hashed) if !hasExisting { return resultItem{row: row, kind: kindAdded} } - if bytes.Equal(existing.Blake3, digest) && existing.Status == store.StatusPresent { + if bytes.Equal(existing.Blake3, hashed.digest) && existing.Status == store.StatusPresent { return resultItem{row: existing, kind: kindUnchanged} } return resultItem{row: row, kind: kindModified} @@ -579,13 +579,18 @@ func metadataMatches(existing store.FileRow, w workItem) bool { existing.MtimeNs == w.mtimeNs } -func (i *indexer) rowFor(w workItem, digest []byte) store.FileRow { +// rowFor builds the file row from the hashed-file result rather than the +// walk-time workItem: SizeBytes and MtimeNs come from a Stat of the open +// handle taken after hashing, so the digest and the metadata describe the +// same inode state — keeping the minted contents row internally +// consistent against the immutable-contents size cross-check. 
+func (i *indexer) rowFor(w workItem, hashed hashedFile) store.FileRow { return store.FileRow{ VolumeID: i.volumeID, Path: w.relPath, - Blake3: digest, - SizeBytes: w.sizeBytes, - MtimeNs: w.mtimeNs, + Blake3: hashed.digest, + SizeBytes: hashed.sizeBytes, + MtimeNs: hashed.mtimeNs, Status: store.StatusPresent, FirstSeenRunID: i.runID, LastSeenRunID: i.runID, @@ -676,15 +681,41 @@ func resolveNamedVolume(ctx context.Context, s *store.Store, name, absRoot strin // the run; allocating per-file made GC pressure outweigh the syscall win. const hashReadBufferSize = 1 << 20 -func hashFile(path string, buf []byte) ([]byte, error) { +// hashedFile is the digest of a file's bytes paired with the size and +// mtime read from the same open handle immediately after hashing. +type hashedFile struct { + digest []byte + sizeBytes int64 + mtimeNs int64 +} + +// hashFile hashes the file at path and reads its size and mtime from the +// open handle after the hash completes. Stat-after-hash on the live handle +// (rather than re-opening by path, which would reintroduce a race) pins the +// metadata to the same bytes that produced the digest, even if the file was +// growing during the walk-to-hash window. +// +// Residual: an append landing between the hash reaching EOF and the Stat +// can still report a size above the bytes hashed; the window is a single +// syscall gap (vs. the whole walk-to-hash span this closes), and a later +// re-index of the settled file supersedes to a consistent row. 
+func hashFile(path string, buf []byte) (hashedFile, error) { f, err := os.Open(path) if err != nil { - return nil, err + return hashedFile{}, err } defer f.Close() h := blake3.New() if _, err := io.CopyBuffer(h, f, buf); err != nil { - return nil, err + return hashedFile{}, err + } + fi, err := f.Stat() + if err != nil { + return hashedFile{}, err } - return h.Sum(nil), nil + return hashedFile{ + digest: h.Sum(nil), + sizeBytes: fi.Size(), + mtimeNs: fi.ModTime().UnixNano(), + }, nil } diff --git a/index/index_test.go b/index/index_test.go index 5b0f38d..e409af5 100644 --- a/index/index_test.go +++ b/index/index_test.go @@ -12,10 +12,18 @@ import ( "testing" "time" + "github.com/zeebo/blake3" + "github.com/mbertschler/squirrel/store" "github.com/mbertschler/squirrel/volmark" ) +func blake3Of(t *testing.T, content string) []byte { + t.Helper() + sum := blake3.Sum256([]byte(content)) + return sum[:] +} + func setupStore(t *testing.T) *store.Store { t.Helper() dsn := filepath.Join(t.TempDir(), "test.db") @@ -984,3 +992,172 @@ func TestIndexRefusesMismatchedVolumeMarker(t *testing.T) { t.Fatalf("err type = %T (%v), want *volmark.ErrMismatch", err, err) } } + +// markOffloaded flips the live row at relPath to status='offloaded' +// via an Upsert carrying the same content — the status transition the +// future offload command records once durability is proven. +func markOffloaded(t *testing.T, s *store.Store, volumeID int64, relPath string) { + t.Helper() + ctx := context.Background() + row, err := s.GetByPath(ctx, volumeID, relPath) + if err != nil { + t.Fatalf("GetByPath %s: %v", relPath, err) + } + row.Status = store.StatusOffloaded + if err := s.Upsert(ctx, row, nil); err != nil { + t.Fatalf("Upsert offloaded %s: %v", relPath, err) + } +} + +// TestIndexLeavesOffloadedRowsAlone: an offloaded row's on-disk absence +// is intentional, so a re-index neither flips it to missing nor counts +// it in the report's Missing tally. 
+func TestIndexLeavesOffloadedRowsAlone(t *testing.T) { + root := t.TempDir() + writeFile(t, filepath.Join(root, "keep.txt"), "kept") + writeFile(t, filepath.Join(root, "cold.txt"), "rarely needed") + + s := setupStore(t) + ctx := context.Background() + if _, err := Index(ctx, s, root, Options{}); err != nil { + t.Fatalf("first Index: %v", err) + } + absRoot, _ := filepath.Abs(root) + vol := volumeFor(t, s, absRoot) + + markOffloaded(t, s, vol.ID, "cold.txt") + if err := os.Remove(filepath.Join(root, "cold.txt")); err != nil { + t.Fatal(err) + } + + rep, err := Index(ctx, s, root, Options{}) + if err != nil { + t.Fatalf("re-Index: %v", err) + } + if rep.Missing != 0 { + t.Fatalf("report.Missing = %d, want 0 (offloaded absence is expected)", rep.Missing) + } + row, err := s.GetByPath(ctx, vol.ID, "cold.txt") + if err != nil { + t.Fatalf("GetByPath cold.txt: %v", err) + } + if row.Status != store.StatusOffloaded { + t.Fatalf("cold.txt status = %q after re-index, want offloaded", row.Status) + } +} + +// TestIndexFlipsOffloadedBackToPresent: when the file reappears on disk +// with its recorded content (a restore or manual copy-back), the next +// index run flips the row back to present, preserving first_seen. 
+func TestIndexFlipsOffloadedBackToPresent(t *testing.T) { + root := t.TempDir() + writeFile(t, filepath.Join(root, "cold.txt"), "rarely needed") + + s := setupStore(t) + ctx := context.Background() + if _, err := Index(ctx, s, root, Options{}); err != nil { + t.Fatalf("first Index: %v", err) + } + absRoot, _ := filepath.Abs(root) + vol := volumeFor(t, s, absRoot) + before, err := s.GetByPath(ctx, vol.ID, "cold.txt") + if err != nil { + t.Fatalf("GetByPath before: %v", err) + } + + markOffloaded(t, s, vol.ID, "cold.txt") + if err := os.Remove(filepath.Join(root, "cold.txt")); err != nil { + t.Fatal(err) + } + if _, err := Index(ctx, s, root, Options{}); err != nil { + t.Fatalf("Index while offloaded: %v", err) + } + + writeFile(t, filepath.Join(root, "cold.txt"), "rarely needed") + rep, err := Index(ctx, s, root, Options{}) + if err != nil { + t.Fatalf("Index after reappearance: %v", err) + } + if rep.Errors != 0 { + t.Fatalf("report errors = %+v", rep.ErrorList) + } + + after, err := s.GetByPath(ctx, vol.ID, "cold.txt") + if err != nil { + t.Fatalf("GetByPath after: %v", err) + } + if after.Status != store.StatusPresent { + t.Fatalf("cold.txt status = %q after reappearance, want present", after.Status) + } + if !bytes.Equal(after.Blake3, before.Blake3) { + t.Fatalf("cold.txt content changed across offload round trip") + } + if after.FirstSeenRunID != before.FirstSeenRunID { + t.Fatalf("first_seen_run_id = %d, want %d (reappearance must not rewrite it)", + after.FirstSeenRunID, before.FirstSeenRunID) + } +} + +// TestHashStatPinnedToHashedBytes simulates a file that grows between the +// walker's stat and the worker's hash: process must record the size read +// from the open handle after hashing (matching the hashed content), not +// the stale walk size. Binding the new digest to the old size would mint a +// contents row whose size_bytes can never match the honest content again. 
+func TestHashStatPinnedToHashedBytes(t *testing.T) { + root := t.TempDir() + abs := filepath.Join(root, "growing.txt") + content := "the full on-disk content after the append" + writeFile(t, abs, content) + + s := setupStore(t) + ctx := context.Background() + idx, err := newIndexer(ctx, s, root, Options{Name: "vol"}) + if err != nil { + t.Fatalf("newIndexer: %v", err) + } + + stale := workItem{ + absPath: abs, + relPath: "growing.txt", + sizeBytes: 3, // what the walker stat saw before the append + mtimeNs: 1, + } + res := idx.process(stale, make([]byte, hashReadBufferSize)) + if res.err != nil { + t.Fatalf("process: %v", res.err) + } + if res.row.SizeBytes != int64(len(content)) { + t.Fatalf("row size = %d, want %d (the hashed bytes, not the stale walk size %d)", + res.row.SizeBytes, len(content), stale.sizeBytes) + } + want := blake3Of(t, content) + if !bytes.Equal(res.row.Blake3, want) { + t.Fatalf("row digest = %x, want %x", res.row.Blake3, want) + } +} + +// TestReindexStableBytesNoSizeMismatch indexes a file, then re-indexes the +// same untouched bytes. The contents row minted on the first pass carries +// the size of the bytes that were hashed, so the second pass resolves the +// same (digest, size) pair and ApplyIndexBatch does not hit the immutable +// size cross-check that aborts the whole batch. 
+func TestReindexStableBytesNoSizeMismatch(t *testing.T) { + root := t.TempDir() + writeFile(t, filepath.Join(root, "stable.txt"), "stable content that never changes") + + s := setupStore(t) + ctx := context.Background() + if _, err := Index(ctx, s, root, Options{Name: "vol"}); err != nil { + t.Fatalf("first Index: %v", err) + } + rep, err := Index(ctx, s, root, Options{Name: "vol"}) + if err != nil { + t.Fatalf("second Index aborted (size cross-check?): %v", err) + } + if rep.Errors != 0 { + t.Fatalf("re-index reported errors: %+v", rep.ErrorList) + } + if rep.Unchanged != 1 { + t.Fatalf("re-index unchanged = %d, want 1", rep.Unchanged) + } +} diff --git a/offload/durability_soundness_test.go b/offload/durability_soundness_test.go new file mode 100644 index 0000000..dcc5721 --- /dev/null +++ b/offload/durability_soundness_test.go @@ -0,0 +1,409 @@ +package offload + +import ( + "context" + "os" + "path/filepath" + "strings" + "testing" + + "github.com/mbertschler/squirrel/store" +) + +// seedVerifiedComponent records only the vector component (content- +// verified, blake3) for a target, without the freshness push run — +// isolating the freshness condition. +func seedVerifiedComponent(t *testing.T, s *store.Store, volumeID int64, target string, nodeID, run int64) { + t.Helper() + if err := s.UpsertDestinationRunIDVerified(context.Background(), volumeID, target, nodeID, run, store.VerifyMethodBlake3, false); err != nil { + t.Fatalf("UpsertDestinationRunIDVerified(%s): %v", target, err) + } +} + +// TestOffloadFreshnessRefusesReacquiredFile is the headline #115 fix: a +// path deleted and re-acquired after the last whole-volume push is held +// on disk — the origin vector covers its content, but the freshness +// watermark (last successful push in local run space) is behind the run +// in which the path became present again. A fresh push then clears it. 
+func TestOffloadFreshnessRefusesReacquiredFile(t *testing.T) { + root := t.TempDir() + writeFile(t, filepath.Join(root, "a.txt"), "alpha") + s := setupStore(t) + ctx := context.Background() + idx := indexVolume(t, s, root) + v := testVolume(t, s) + self := selfNode(t, s) + + // A full verified push at index time: vector covers the content and + // the freshness watermark sits at this push. + seedVector(t, s, v.ID, "t1", self.ID, idx.RunID) + + // The user deletes the file on disk, an index run flips it missing, + // then the file is restored and re-indexed — reviving the row with a + // fresh status_changed_run_id past the last push. The origin vector + // is unchanged (same content, same origin run), so only the freshness + // condition can catch this. + if err := os.Remove(filepath.Join(root, "a.txt")); err != nil { + t.Fatal(err) + } + indexVolume(t, s, root) // flips a.txt -> missing + writeFile(t, filepath.Join(root, "a.txt"), "alpha") + reacquired := indexVolume(t, s, root) // revives a.txt -> present + + row := rowAt(t, s, v.ID, "a.txt") + if !row.StatusChangedRunID.Valid || row.StatusChangedRunID.Int64 != reacquired.RunID { + t.Fatalf("status_changed_run_id = %v, want revive run %d", row.StatusChangedRunID, reacquired.RunID) + } + + rep, err := Offload(ctx, s, root, Options{ + Name: volName, Paths: []string{"."}, Require: []string{"t1"}, + }) + if err != nil { + t.Fatalf("Offload: %v", err) + } + res := oneResult(t, rep, "a.txt", OutcomeNotDurable) + if len(res.Reasons) != 1 || !strings.Contains(res.Reasons[0], "not freshly pushed") { + t.Fatalf("reasons = %v, want one freshness failure", res.Reasons) + } + mustExist(t, filepath.Join(root, "a.txt")) + + // A fresh whole-volume push now covers the re-acquired path. 
+ recordPush(t, s, v.ID, "t1") + rep, err = Offload(ctx, s, root, Options{ + Name: volName, Paths: []string{"."}, Require: []string{"t1"}, + }) + if err != nil { + t.Fatalf("Offload after fresh push: %v", err) + } + oneResult(t, rep, "a.txt", OutcomeOffloaded) + mustBeGone(t, filepath.Join(root, "a.txt")) +} + +// TestOffloadFreshnessRefusesUnpushedTarget: a target with a covering +// vector component but neither a local whole-volume push nor pulled +// push-freshness evidence never gates. With no local push the target is +// treated as relayed, and with no freshness evidence the relayed branch +// refuses — the safe direction. This also covers the #103 over-advance +// windows: a row indexed mid-push leaves a target whose freshness can't +// reach it. +func TestOffloadFreshnessRefusesUnpushedTarget(t *testing.T) { + root := t.TempDir() + writeFile(t, filepath.Join(root, "a.txt"), "alpha") + s := setupStore(t) + idx := indexVolume(t, s, root) + v := testVolume(t, s) + self := selfNode(t, s) + + // Vector covers the content (e.g. a stale advance slipped through), + // but no push run and no freshness evidence exist. + seedVerifiedComponent(t, s, v.ID, "t1", self.ID, idx.RunID) + + rep, err := Offload(context.Background(), s, root, Options{ + Name: volName, Paths: []string{"."}, Require: []string{"t1"}, + }) + if err != nil { + t.Fatalf("Offload: %v", err) + } + res := oneResult(t, rep, "a.txt", OutcomeNotDurable) + if len(res.Reasons) != 1 || !strings.Contains(res.Reasons[0], "no whole-volume push freshness") { + t.Fatalf("reasons = %v, want a no-freshness-evidence failure", res.Reasons) + } + mustExist(t, filepath.Join(root, "a.txt")) +} + +// seedRelayedFreshness records pulled origin-space push-freshness for a +// relayed target (one the local node never pushes to): the coordinate the +// peer durability pull merges from the pushing node's most recent +// whole-volume push. 
+func seedRelayedFreshness(t *testing.T, s *store.Store, volumeID int64, target string, nodeID, run int64) { + t.Helper() + if err := s.MergeDestinationPushFreshness(context.Background(), volumeID, target, nodeID, run); err != nil { + t.Fatalf("MergeDestinationPushFreshness(%s): %v", target, err) + } +} + +// TestOffloadPeerRelayedTargetGatesOnPulledFreshness is the anti-wedge +// proof. A peer-relayed target — named in offload_requires but never +// pushed to by this node, so its local whole-volume push watermark is +// always 0 — must NOT wedge into a permanent no-op. It offloads when the +// pulled origin-space push-freshness covers the content's origin run, and +// refuses when the freshness is below it or absent. Without the pulled +// freshness coordinate every file on the offsite tier would fail +// freshness forever — exactly the workflow the feature exists for. +func TestOffloadPeerRelayedTargetGatesOnPulledFreshness(t *testing.T) { + const target = "remote-archive" + + t.Run("passes when freshness covers origin", func(t *testing.T) { + root := t.TempDir() + writeFile(t, filepath.Join(root, "a.txt"), "alpha") + s := setupStore(t) + idx := indexVolume(t, s, root) + v := testVolume(t, s) + self := selfNode(t, s) + + // Pulled vector + pulled freshness both cover the content's origin + // run; there is no local push to this target (watermark 0). 
+ seedVerifiedComponent(t, s, v.ID, target, self.ID, idx.RunID) + seedRelayedFreshness(t, s, v.ID, target, self.ID, idx.RunID) + + rep, err := Offload(context.Background(), s, root, Options{ + Name: volName, Paths: []string{"."}, Require: []string{target}, + }) + if err != nil { + t.Fatalf("Offload: %v", err) + } + oneResult(t, rep, "a.txt", OutcomeOffloaded) + mustBeGone(t, filepath.Join(root, "a.txt")) + }) + + t.Run("refuses when freshness is below origin", func(t *testing.T) { + root := t.TempDir() + writeFile(t, filepath.Join(root, "a.txt"), "alpha") + s := setupStore(t) + idx := indexVolume(t, s, root) + v := testVolume(t, s) + self := selfNode(t, s) + + seedVerifiedComponent(t, s, v.ID, target, self.ID, idx.RunID) + // Freshness predates the content's origin run: the pushing node's + // latest whole-volume push did not cover it. + seedRelayedFreshness(t, s, v.ID, target, self.ID, idx.RunID-1) + + rep, err := Offload(context.Background(), s, root, Options{ + Name: volName, Paths: []string{"."}, Require: []string{target}, + }) + if err != nil { + t.Fatalf("Offload: %v", err) + } + res := oneResult(t, rep, "a.txt", OutcomeNotDurable) + if len(res.Reasons) != 1 || !strings.Contains(res.Reasons[0], "push freshness") { + t.Fatalf("reasons = %v, want a stale-freshness failure", res.Reasons) + } + mustExist(t, filepath.Join(root, "a.txt")) + }) + + t.Run("refuses when no freshness evidence", func(t *testing.T) { + root := t.TempDir() + writeFile(t, filepath.Join(root, "a.txt"), "alpha") + s := setupStore(t) + idx := indexVolume(t, s, root) + v := testVolume(t, s) + self := selfNode(t, s) + + // Vector covers the content but no freshness was ever pulled. 
+ seedVerifiedComponent(t, s, v.ID, target, self.ID, idx.RunID) + + rep, err := Offload(context.Background(), s, root, Options{ + Name: volName, Paths: []string{"."}, Require: []string{target}, + }) + if err != nil { + t.Fatalf("Offload: %v", err) + } + res := oneResult(t, rep, "a.txt", OutcomeNotDurable) + if len(res.Reasons) != 1 || !strings.Contains(res.Reasons[0], "no whole-volume push freshness") { + t.Fatalf("reasons = %v, want a no-freshness-evidence failure", res.Reasons) + } + mustExist(t, filepath.Join(root, "a.txt")) + }) +} + +// TestOffloadLocalPushTargetIgnoresRelayedFreshness: a target this node +// pushes to directly still gates on the local-run-space watermark, not on +// pulled push-freshness. A local push watermark behind the path's +// became-present run refuses even when relayed freshness would cover it — +// the local determination wins for a locally-pushed target. +func TestOffloadLocalPushTargetIgnoresRelayedFreshness(t *testing.T) { + const target = "t1" + root := t.TempDir() + writeFile(t, filepath.Join(root, "a.txt"), "alpha") + s := setupStore(t) + idx := indexVolume(t, s, root) + v := testVolume(t, s) + self := selfNode(t, s) + + seedVerifiedComponent(t, s, v.ID, target, self.ID, idx.RunID) + // Relayed freshness would cover the origin, but this node pushes to + // the target directly, so the gate uses the local watermark instead. + seedRelayedFreshness(t, s, v.ID, target, self.ID, idx.RunID) + + // A local push exists but it predates the re-acquisition of the path. 
+ recordPush(t, s, v.ID, target) + if err := os.Remove(filepath.Join(root, "a.txt")); err != nil { + t.Fatal(err) + } + indexVolume(t, s, root) + writeFile(t, filepath.Join(root, "a.txt"), "alpha") + indexVolume(t, s, root) // re-acquire past the local push + + rep, err := Offload(context.Background(), s, root, Options{ + Name: volName, Paths: []string{"."}, Require: []string{target}, + }) + if err != nil { + t.Fatalf("Offload: %v", err) + } + res := oneResult(t, rep, "a.txt", OutcomeNotDurable) + if len(res.Reasons) != 1 || !strings.Contains(res.Reasons[0], "last whole-volume push run") { + t.Fatalf("reasons = %v, want a local-push freshness failure", res.Reasons) + } + mustExist(t, filepath.Join(root, "a.txt")) +} + +// TestOffloadPresenceSizeHeldOutUntilFingerprint is the #109 fix: a +// content-addressed component advanced with the presence+size method +// does not gate on its own; once a verified scan-back fingerprint backs +// the object (remote_objects.checksum + verified_at_ns), it does. +func TestOffloadPresenceSizeHeldOutUntilFingerprint(t *testing.T) { + root := t.TempDir() + writeFile(t, filepath.Join(root, "a.txt"), "alpha") + s := setupStore(t) + ctx := context.Background() + idx := indexVolume(t, s, root) + v := testVolume(t, s) + self := selfNode(t, s) + + // Whole-volume push + freshness watermark satisfied, but the + // component is presence+size only (crypt offsite, no content hash). 
+ if err := s.UpsertDestinationRunIDVerified(ctx, v.ID, "offsite", self.ID, idx.RunID, store.VerifyMethodPresenceSize, false); err != nil { + t.Fatalf("UpsertDestinationRunIDVerified: %v", err) + } + recordPush(t, s, v.ID, "offsite") + + row := rowAt(t, s, v.ID, "a.txt") + if err := s.InsertRemoteObject(ctx, store.RemoteObject{ + ContentID: row.ContentID, + Destination: "offsite", + UploadedRunID: idx.RunID, + }); err != nil { + t.Fatalf("InsertRemoteObject: %v", err) + } + + rep, err := Offload(ctx, s, root, Options{ + Name: volName, Paths: []string{"."}, Require: []string{"offsite"}, + }) + if err != nil { + t.Fatalf("Offload (pending fingerprint): %v", err) + } + res := oneResult(t, rep, "a.txt", OutcomeNotDurable) + if len(res.Reasons) != 1 || !strings.Contains(res.Reasons[0], "not content-verified") { + t.Fatalf("reasons = %v, want a not-content-verified failure", res.Reasons) + } + mustExist(t, filepath.Join(root, "a.txt")) + + // The scan-back pass records a fingerprint and confirms it: now the + // presence+size component gates. + if err := s.SetRemoteObjectChecksum(ctx, row.ContentID, "offsite", "sftp-sha256", "deadbeef"); err != nil { + t.Fatalf("SetRemoteObjectChecksum: %v", err) + } + if err := s.MarkRemoteObjectVerified(ctx, row.ContentID, "offsite", store.NowNs()); err != nil { + t.Fatalf("MarkRemoteObjectVerified: %v", err) + } + + rep, err = Offload(ctx, s, root, Options{ + Name: volName, Paths: []string{"."}, Require: []string{"offsite"}, + }) + if err != nil { + t.Fatalf("Offload (verified fingerprint): %v", err) + } + oneResult(t, rep, "a.txt", OutcomeOffloaded) + mustBeGone(t, filepath.Join(root, "a.txt")) +} + +// TestOffloadPresenceSizeUnverifiedFingerprintHeldOut: a recorded but +// not-yet-verified fingerprint (checksum present, verified_at_ns NULL) +// is not enough — the gate requires the re-read confirmation, not just +// the upload-time record. 
+func TestOffloadPresenceSizeUnverifiedFingerprintHeldOut(t *testing.T) { + root := t.TempDir() + writeFile(t, filepath.Join(root, "a.txt"), "alpha") + s := setupStore(t) + ctx := context.Background() + idx := indexVolume(t, s, root) + v := testVolume(t, s) + self := selfNode(t, s) + + if err := s.UpsertDestinationRunIDVerified(ctx, v.ID, "offsite", self.ID, idx.RunID, store.VerifyMethodPresenceSize, false); err != nil { + t.Fatalf("UpsertDestinationRunIDVerified: %v", err) + } + recordPush(t, s, v.ID, "offsite") + row := rowAt(t, s, v.ID, "a.txt") + if err := s.InsertRemoteObject(ctx, store.RemoteObject{ + ContentID: row.ContentID, + Destination: "offsite", + UploadedRunID: idx.RunID, + }); err != nil { + t.Fatalf("InsertRemoteObject: %v", err) + } + if err := s.SetRemoteObjectChecksum(ctx, row.ContentID, "offsite", "sftp-sha256", "deadbeef"); err != nil { + t.Fatalf("SetRemoteObjectChecksum: %v", err) + } + + rep, err := Offload(ctx, s, root, Options{ + Name: volName, Paths: []string{"."}, Require: []string{"offsite"}, + }) + if err != nil { + t.Fatalf("Offload: %v", err) + } + oneResult(t, rep, "a.txt", OutcomeNotDurable) + mustExist(t, filepath.Join(root, "a.txt")) +} + +// TestOffloadContentVerifiedMethodsGate: blake3, peer-blake3, and +// kopia-verify components each gate on their own (no fingerprint needed) +// once the vector and freshness conditions hold — the stricter gate does +// not refuse legitimately content-verified copies. 
+func TestOffloadContentVerifiedMethodsGate(t *testing.T) { + for _, method := range []string{store.VerifyMethodBlake3, store.VerifyMethodPeer, store.VerifyMethodKopia} { + t.Run(method, func(t *testing.T) { + root := t.TempDir() + writeFile(t, filepath.Join(root, "a.txt"), "alpha") + s := setupStore(t) + ctx := context.Background() + idx := indexVolume(t, s, root) + v := testVolume(t, s) + self := selfNode(t, s) + + if err := s.UpsertDestinationRunIDVerified(ctx, v.ID, "t1", self.ID, idx.RunID, method, false); err != nil { + t.Fatalf("UpsertDestinationRunIDVerified: %v", err) + } + recordPush(t, s, v.ID, "t1") + + rep, err := Offload(ctx, s, root, Options{ + Name: volName, Paths: []string{"."}, Require: []string{"t1"}, + }) + if err != nil { + t.Fatalf("Offload: %v", err) + } + oneResult(t, rep, "a.txt", OutcomeOffloaded) + mustBeGone(t, filepath.Join(root, "a.txt")) + }) + } +} + +// TestOffloadDurableFileStillPasses is the anti-wedge guard: a file with +// a fresh whole-volume push, a content-verified method, and a vector +// that covers its origin still offloads — the stricter gate refuses more +// only where durability is actually in question. 
+func TestOffloadDurableFileStillPasses(t *testing.T) { + root := t.TempDir() + writeFile(t, filepath.Join(root, "a.txt"), "alpha") + writeFile(t, filepath.Join(root, "sub", "b.txt"), "bravo") + s := setupStore(t) + ctx := context.Background() + idx := indexVolume(t, s, root) + v := testVolume(t, s) + self := selfNode(t, s) + seedVector(t, s, v.ID, "t1", self.ID, idx.RunID) + seedVector(t, s, v.ID, "t2", self.ID, idx.RunID) + + rep, err := Offload(ctx, s, root, Options{ + Name: volName, Paths: []string{"."}, Require: []string{"t1", "t2"}, + }) + if err != nil { + t.Fatalf("Offload: %v", err) + } + if rep.Offloaded != 2 || rep.NotDurable != 0 { + t.Fatalf("report = %+v, want 2 offloaded 0 not-durable", rep) + } + oneResult(t, rep, "a.txt", OutcomeOffloaded) + oneResult(t, rep, "sub/b.txt", OutcomeOffloaded) +} diff --git a/offload/gate.go b/offload/gate.go new file mode 100644 index 0000000..3cc20c3 --- /dev/null +++ b/offload/gate.go @@ -0,0 +1,249 @@ +package offload + +import ( + "context" + "database/sql" + "errors" + "fmt" + + "github.com/mbertschler/squirrel/store" +) + +// component is one loaded durability-vector entry: the highest origin +// run covered for an origin node, plus the verification method that +// advanced it. The method lets the gate refuse a presence-only +// component that has no content verification behind it. +type component struct { + coveredRun int64 + method string +} + +// gate is the offline durability evidence for one invocation: the self +// node (the coordinate content with NULL origin counts under) and one +// durability vector per required target, loaded once up front, plus two +// freshness sources per target — the last-successful-whole-volume-push +// watermark in local run space (for a target this node pushes to +// directly) and the pulled origin-space push-freshness coordinates (for a +// relayed target this node never pushes to). 
These locally stored rows — +// including evidence pulled from peers about targets only they can reach +// — are the entire evidence base; the gate makes no network calls. +type gate struct { + store *store.Store + volumeID int64 + self store.Node + require []string + vectors map[string]map[int64]component // target → origin node id → component + lastPush map[string]int64 // target → last whole-volume push run (local space) + freshness map[string]map[int64]int64 // target → origin node id → pulled push-freshness origin run + nodeNames map[int64]string +} + +func loadGate(ctx context.Context, s *store.Store, volumeID int64, require []string) (*gate, error) { + self, err := s.GetSelfNode(ctx) + if err != nil { + return nil, fmt.Errorf("lookup self node: %w", err) + } + g := &gate{ + store: s, + volumeID: volumeID, + self: self, + require: require, + vectors: make(map[string]map[int64]component, len(require)), + lastPush: make(map[string]int64, len(require)), + freshness: make(map[string]map[int64]int64, len(require)), + nodeNames: map[int64]string{self.ID: self.Name}, + } + for _, target := range require { + components, err := s.ListDestinationRunIDs(ctx, volumeID, target) + if err != nil { + return nil, fmt.Errorf("load durability vector for %q: %w", target, err) + } + vector := make(map[int64]component, len(components)) + for _, c := range components { + vector[c.OriginNodeID] = component{coveredRun: c.OriginRunID, method: c.VerifyMethod} + } + g.vectors[target] = vector + + push, err := s.LastSuccessfulWholeVolumePushRunID(ctx, volumeID, target) + if err != nil { + return nil, fmt.Errorf("load last push watermark for %q: %w", target, err) + } + g.lastPush[target] = push + + fresh, err := s.ListDestinationPushFreshness(ctx, volumeID, target) + if err != nil { + return nil, fmt.Errorf("load push freshness for %q: %w", target, err) + } + coords := make(map[int64]int64, len(fresh)) + for _, f := range fresh { + coords[f.OriginNodeID] = f.OriginRunID + } + 
g.freshness[target] = coords + } + return g, nil +} + +// check evaluates the gate for one present row. Content with origin +// (N, r) is durable on a target only when all three conditions hold: +// +// - origin vector: the target's component for N covers r; +// - freshness: a successful whole-volume push covers the run in which +// the path last became present, so a path re-acquired after the last +// push is held until a fresh push covers it. For a target this node +// pushes to directly the watermark is the last push in local run +// space; for a relayed target it never pushes to, the watermark is +// the pulled origin-space push-freshness coordinate for N; +// - method: that component is content-verified, or — for a presence- +// only content-addressed component — a verified scan-back +// fingerprint backs the gated object. +// +// The file passes only when every required target satisfies all three. +// The returned failures name each failing target and reason; an empty +// slice means the gate passed. 
+func (g *gate) check(ctx context.Context, row store.FileRow) ([]string, error) { + originNode, originRun, err := g.origin(ctx, row) + if err != nil { + return nil, err + } + var failures []string + for _, target := range g.require { + comp, ok := g.vectors[target][originNode] + switch { + case !ok: + failures = append(failures, + fmt.Sprintf("%s: missing component for origin %s (need %d)", target, g.nodeName(ctx, originNode), originRun)) + continue + case comp.coveredRun < originRun: + failures = append(failures, + fmt.Sprintf("%s: stale: have %d need %d (origin %s)", target, comp.coveredRun, originRun, g.nodeName(ctx, originNode))) + continue + } + if reason := g.freshnessFailure(ctx, target, row, originNode, originRun); reason != "" { + failures = append(failures, reason) + continue + } + verified, err := g.methodVerified(ctx, target, comp, row) + if err != nil { + return nil, err + } + if !verified { + failures = append(failures, + fmt.Sprintf("%s: not content-verified (method %q); a verified fingerprint must back the object before offload", target, displayMethod(comp.method))) + } + } + return failures, nil +} + +// freshnessFailure refuses the target when no successful whole-volume +// push covers the run in which the path last became present, closing the +// re-acquisition hole: a path deleted, re-introduced, and re-indexed must +// not be claimed durable on the strength of an origin-vector component +// alone. +// +// Two coordinate spaces, by whether this node pushes to the target +// directly: +// +// - Local push (lastPush > 0): the watermark is the last successful +// whole-volume push in local run space, compared against the path's +// status_changed_run_id. A row with no recorded status_changed_run_id +// (a pre-v18 row never re-stamped) is treated as "became present at +// first_seen" — the conservative floor. 
+// - Relayed target (no local push): the watermark is the pulled +// origin-space push-freshness coordinate for the content's origin +// node, compared against the content's origin run. The pushing node +// determines freshness in its own run space and reports the maxima of +// its latest whole-volume push per origin; the gate compares the +// gated content's origin run against it. Absence of freshness +// evidence refuses — a relayed target with no recorded push never +// gates. +func (g *gate) freshnessFailure(ctx context.Context, target string, row store.FileRow, originNode, originRun int64) string { + if g.lastPush[target] > 0 { + changed := row.FirstSeenRunID + if row.StatusChangedRunID.Valid { + changed = row.StatusChangedRunID.Int64 + } + if g.lastPush[target] < changed { + return fmt.Sprintf("%s: not freshly pushed: last whole-volume push run %d < became-present run %d", target, g.lastPush[target], changed) + } + return "" + } + fresh, ok := g.freshness[target][originNode] + if !ok { + return fmt.Sprintf("%s: not freshly pushed: no whole-volume push freshness for origin %s (need %d)", + target, g.nodeName(ctx, originNode), originRun) + } + if fresh < originRun { + return fmt.Sprintf("%s: not freshly pushed: push freshness %d < origin run %d (origin %s)", + target, fresh, originRun, g.nodeName(ctx, originNode)) + } + return "" +} + +// methodVerified reports whether the target's component for this row +// rests on genuine content verification. A blake3 / peer-blake3 / +// kopia-verify component passes directly. A presence+size component (a +// content-addressed offsite, where crypt hides the content hash) passes +// only once a verified scan-back fingerprint backs the gated object: +// remote_objects must carry a checksum and a verified_at_ns for this +// (content, destination). Any other method (including a size+mtime push +// or an unknown/pre-v19 component) does not gate. 
+func (g *gate) methodVerified(ctx context.Context, target string, comp component, row store.FileRow) (bool, error) { + if store.ContentVerifiedMethod(comp.method) { + return true, nil + } + if comp.method != store.VerifyMethodPresenceSize { + return false, nil + } + obj, err := g.store.GetRemoteObject(ctx, row.ContentID, target) + if errors.Is(err, sql.ErrNoRows) { + return false, nil + } + if err != nil { + return false, fmt.Errorf("load fingerprint for content %d on %q: %w", row.ContentID, target, err) + } + return obj.Checksum.Valid && obj.Checksum.String != "" && obj.VerifiedAtNs.Valid, nil +} + +// displayMethod renders a possibly-empty method for a failure message. +func displayMethod(method string) string { + if method == "" { + return "unknown" + } + return method +} + +// origin resolves the row's content to its origin coordinate (node, +// run). Content with a recorded origin uses it verbatim; content with +// NULL (or partially NULL) origin is locally introduced and counts +// under the self node at its introduction run — the content's earliest +// first_seen_run_id in the volume, the same coordinate +// AdvanceDestinationVector and the peer-sync sender use, so the gate +// compares against exactly what the vectors were advanced with. +func (g *gate) origin(ctx context.Context, row store.FileRow) (int64, int64, error) { + if row.OriginNodeID.Valid && row.OriginRunID.Valid { + return row.OriginNodeID.Int64, row.OriginRunID.Int64, nil + } + intro, err := g.store.ContentIntroductionRunID(ctx, g.volumeID, row.ContentID) + if err != nil { + return 0, 0, fmt.Errorf("introduction run for content %d: %w", row.ContentID, err) + } + return g.self.ID, intro, nil +} + +// nodeName resolves an origin node id to its name for the failure +// messages, cached per invocation. A lookup failure degrades to the +// numeric id — the gate decision is already made, naming is cosmetic. 
+func (g *gate) nodeName(ctx context.Context, nodeID int64) string { + if name, ok := g.nodeNames[nodeID]; ok { + return name + } + name := fmt.Sprintf("node-%d", nodeID) + node, err := g.store.GetNodeByID(ctx, nodeID) + if err == nil { + name = node.Name + } else if !errors.Is(err, sql.ErrNoRows) { + return name + } + g.nodeNames[nodeID] = name + return name +} diff --git a/offload/offload.go b/offload/offload.go new file mode 100644 index 0000000..129e9e1 --- /dev/null +++ b/offload/offload.go @@ -0,0 +1,396 @@ +// Package offload deletes local file bytes whose content is provably +// durable on every target the volume's offload policy requires. It is +// the only place squirrel ever deletes user data: the gate is evaluated +// entirely offline against the local index (the durability version +// vectors in destination_run_ids, including components pulled from +// peers), every candidate is re-verified against the on-disk bytes +// immediately before the unlink, and the operation is recorded as a +// kind='offload' run with each touched row flipped present → offloaded. +package offload + +import ( + "context" + "errors" + "fmt" + "os" + "path" + "path/filepath" + "sort" + "strings" + "time" + + "github.com/mbertschler/squirrel/store" + "github.com/mbertschler/squirrel/volmark" +) + +// Options shapes one Offload invocation. +type Options struct { + // Name is the config-declared volume name. + Name string + // Paths are volume-relative path or prefix selectors. A selector + // matches the file at exactly that path plus every file under it as + // a directory prefix; "." selects the whole volume. Multiple + // selectors are ORed. + Paths []string + // OlderThan, when positive, narrows the selection to files whose + // indexed mtime is older than now − OlderThan. ANDed with Paths. 
+ OlderThan time.Duration + // Require is the volume's offload policy: the target names whose + // durability vectors must each cover a file's content origin before + // its bytes may be deleted. Offload refuses to run when empty — the + // policy is an explicit precondition, there is no default target + // set. + Require []string + // DryRun evaluates and reports the per-file gate decisions from the + // index alone: no runs row, no file reads, no deletions, no status + // flips. Disk-drift checks only happen on a real run, immediately + // before each unlink. + DryRun bool +} + +// Outcome classifies one file's result. +type Outcome int + +const ( + // OutcomeOffloaded: the gate passed, the on-disk bytes matched the + // index exactly, and the file was unlinked and recorded (in dry-run + // mode: the gate passed and the file would be offloaded). + OutcomeOffloaded Outcome = iota + // OutcomeNotDurable: at least one required target's vector fails to + // cover the file's content origin; nothing was touched. + OutcomeNotDurable + // OutcomeDrift: the on-disk state disagrees with the indexed row + // (presence, type, size, mtime, or content hash) — the disk is + // newer than the index, so the file was skipped. Re-index, re-sync, + // and re-run to offload it. + OutcomeDrift + // OutcomeError: an operational failure (open, unlink, or the status + // flip) left this file unprocessed; details in Reasons. + OutcomeError +) + +// FileResult is one per-file decision; Results are reported in path +// order. +type FileResult struct { + Path string + Outcome Outcome + // Reasons carries the per-target gate failures for + // OutcomeNotDurable (one entry per failing target) and the single + // drift or error detail otherwise. Empty for OutcomeOffloaded. + Reasons []string +} + +// Report summarises one Offload invocation. Offloaded + NotDurable + +// Drift + Errors equals len(Results). 
+type Report struct { + // RunID is the kind='offload' runs row recorded for this + // invocation. Zero in dry-run mode. + RunID int64 + Offloaded int + NotDurable int + Drift int + Errors int + Results []FileResult + // SelectorMisses lists path selectors that matched no present file + // — usually a typo'd path — so a no-op invocation explains itself. + SelectorMisses []string + // FinishErr is set when the terminal runs-row write failed; the + // per-file work already happened and stands. + FinishErr error +} + +func (r *Report) record(res FileResult) { + switch res.Outcome { + case OutcomeOffloaded: + r.Offloaded++ + case OutcomeNotDurable: + r.NotDurable++ + case OutcomeDrift: + r.Drift++ + case OutcomeError: + r.Errors++ + } + r.Results = append(r.Results, res) +} + +// Offload runs the durability-gated deletion against the volume rooted +// at root. Per-file refusals (gate failures, disk drift) are reported +// on the Report and never abort the run; per-file operational errors +// are likewise reported and counted, leaving the runs row 'partial'. A +// returned error is fatal — preconditions failed or the run had to stop +// — and finalises the runs row as 'failed' with whatever per-file +// progress the Report carries. 
+func Offload(ctx context.Context, s *store.Store, root string, opts Options) (report Report, err error) { + selectors, err := validateOptions(opts) + if err != nil { + return Report{}, err + } + vol, err := resolveVolume(ctx, s, opts.Name, root) + if err != nil { + return Report{}, err + } + g, err := loadGate(ctx, s, vol.ID, opts.Require) + if err != nil { + return Report{}, err + } + rows, err := s.LoadVolumeIndex(ctx, vol.ID) + if err != nil { + return Report{}, err + } + candidates, misses := selectCandidates(rows, selectors, opts.OlderThan) + report.SelectorMisses = misses + + if opts.DryRun { + err := evaluateOnly(ctx, g, candidates, &report) + return report, err + } + + runID, err := beginRun(ctx, s, vol.ID, opts.Name) + if err != nil { + return report, err + } + report.RunID = runID + defer func() { finishRun(ctx, s, runID, &report, err) }() + + err = offloadFiles(ctx, s, g, root, vol.ID, runID, candidates, &report) + return report, err +} + +// validateOptions enforces the two invocation preconditions — an +// explicit policy and an explicit selector — and normalises the path +// selectors. +func validateOptions(opts Options) ([]string, error) { + if len(opts.Require) == 0 { + return nil, fmt.Errorf("volume %q declares no offload policy; offload refuses to delete without an explicit list of required targets (offload_requires)", opts.Name) + } + if opts.OlderThan < 0 { + return nil, fmt.Errorf("--older-than %s is negative; the age cutoff must be a positive duration", opts.OlderThan) + } + if len(opts.Paths) == 0 && opts.OlderThan == 0 { + return nil, errors.New(`offload needs a selector: volume-relative paths/prefixes ("." for the whole volume) and/or an --older-than age`) + } + return cleanSelectors(opts.Paths) +} + +// cleanSelectors normalises the volume-relative selectors and refuses +// anything that could reach outside the volume root. 
+func cleanSelectors(paths []string) ([]string, error) { + out := make([]string, 0, len(paths)) + for _, p := range paths { + if p == "" { + return nil, errors.New(`empty path selector (use "." to select the whole volume)`) + } + if filepath.IsAbs(p) { + return nil, fmt.Errorf("selector %q must be volume-relative", p) + } + c := path.Clean(filepath.ToSlash(p)) + if c == ".." || strings.HasPrefix(c, "../") { + return nil, fmt.Errorf("selector %q escapes the volume root", p) + } + if c == "." && strings.Contains(p, "..") { + return nil, fmt.Errorf(`selector %q collapses to the whole volume; spell out "." to select everything`, p) + } + out = append(out, c) + } + return out, nil +} + +// resolveVolume looks up the volume row by its config-declared name and +// cross-checks both identities the deletion is about to trust: the DB +// row's recorded path must equal the root the caller resolved from +// config, and the on-disk .squirrel-volume marker must name this +// volume. Either mismatch means config, index, and disk disagree about +// what tree this is — exactly the state in which a delete must refuse. +func resolveVolume(ctx context.Context, s *store.Store, name, root string) (store.Volume, error) { + v, err := s.GetVolumeByName(ctx, name) + if err != nil { + if store.IsNotFound(err) { + return store.Volume{}, fmt.Errorf("volume %q has no index rows; index it before offloading", name) + } + return store.Volume{}, fmt.Errorf("lookup volume %q: %w", name, err) + } + if v.Path != root { + return store.Volume{}, fmt.Errorf("volume %q is at %q in the DB but config says %q — resolve the conflict before offloading", name, v.Path, root) + } + if err := volmark.Validate(root, name); err != nil { + return store.Volume{}, fmt.Errorf("volume marker at %s: %w", root, err) + } + return v, nil +} + +// reservedSubtrees are the squirrel-owned preservation directories. 
+// They are excluded from sync transfers and from +// AdvanceDestinationVector's present set, so a row under them can carry +// an origin coordinate the vectors cover through *other* paths' content +// even though these bytes never travelled — the gate math is unsound +// for them, and they hold preserved history besides. Offload never +// selects them. +var reservedSubtrees = []string{ + ".squirrel-history", + ".squirrel-conflicts", + ".squirrel-restore-history", + ".squirrel-index", +} + +func underReservedSubtree(p string) bool { + for _, r := range reservedSubtrees { + if p == r || strings.HasPrefix(p, r+"/") { + return true + } + } + return false +} + +// selectCandidates filters the volume's live rows down to the offload +// candidates: status 'present', outside the reserved subtrees, matching at +// least one path selector (none means every path), and — when olderThan +// is set — with an indexed mtime older than the cutoff. The age cutoff +// is applied after selector-hit tracking so a selector that only +// matched younger files still counts as matched. Candidates come back +// in path order for deterministic reports. 
+func selectCandidates(rows map[string]store.FileRow, selectors []string, olderThan time.Duration) ([]store.FileRow, []string) { + ageFiltered := olderThan > 0 + var cutoffNs int64 + if ageFiltered { + cutoffNs = time.Now().Add(-olderThan).UnixNano() + } + hit := make(map[string]bool, len(selectors)) + var out []store.FileRow + for p, row := range rows { + if row.Status != store.StatusPresent || underReservedSubtree(p) { + continue + } + if len(selectors) > 0 { + sel, ok := matchSelector(p, selectors) + if !ok { + continue + } + hit[sel] = true + } + if ageFiltered && row.MtimeNs >= cutoffNs { + continue + } + out = append(out, row) + } + sort.Slice(out, func(i, j int) bool { return out[i].Path < out[j].Path }) + var misses []string + for _, sel := range selectors { + if !hit[sel] { + misses = append(misses, sel) + } + } + return out, misses +} + +// matchSelector returns the first selector that matches p: "." matches +// everything, an exact path matches itself, and any selector matches +// the files under it as a directory prefix. +func matchSelector(p string, selectors []string) (string, bool) { + for _, sel := range selectors { + if sel == "." || p == sel || strings.HasPrefix(p, sel+"/") { + return sel, true + } + } + return "", false +} + +// evaluateOnly is the dry-run body: gate decisions straight from the +// index. 
+func evaluateOnly(ctx context.Context, g *gate, candidates []store.FileRow, report *Report) error { + for _, row := range candidates { + if err := ctx.Err(); err != nil { + return err + } + failures, err := g.check(ctx, row) + if err != nil { + return err + } + if len(failures) > 0 { + report.record(FileResult{Path: row.Path, Outcome: OutcomeNotDurable, Reasons: failures}) + continue + } + report.record(FileResult{Path: row.Path, Outcome: OutcomeOffloaded}) + } + return nil +} + +func beginRun(ctx context.Context, s *store.Store, volumeID int64, volumeName string) (int64, error) { + runID, blocker, err := s.BeginOffloadRunIfClear(ctx, volumeID) + if err != nil { + return 0, fmt.Errorf("begin offload run: %w", err) + } + if blocker != nil { + return 0, fmt.Errorf("offload of %s refused: %s run %d is already running (started %s)", + volumeName, blocker.Kind, blocker.ID, + time.Unix(0, blocker.StartedAtNs).UTC().Format(time.RFC3339)) + } + return runID, nil +} + +// finishRun finalises the kind='offload' runs row: 'failed' carrying +// the fatal error, 'partial' when any per-file operation errored, and +// 'success' otherwise (skipped files are decisions, so a run that only +// skipped is still a success). file_count records how many files were +// actually offloaded. A failed terminal write is surfaced on the report +// — the per-file flips already committed individually and stand. +func finishRun(ctx context.Context, s *store.Store, runID int64, report *Report, fatalErr error) { + status, errMsg := store.RunStatusSuccess, "" + switch { + case fatalErr != nil: + status, errMsg = store.RunStatusFailed, fatalErr.Error() + case report.Errors > 0: + status = store.RunStatusPartial + } + if err := s.FinishRun(ctx, runID, status, errMsg, int64(report.Offloaded)); err != nil { + report.FinishErr = fmt.Errorf("finish offload run %d: %w", runID, err) + } +} + +// offloadFiles is the real-run body. 
Each candidate independently +// passes the gate, survives the pre-unlink verification, is unlinked, +// and only then has its row flipped present → offloaded with +// last_seen_run_id = the offload run. The unlink-then-flip order means +// an 'offloaded' row is always the record of a deletion that actually +// happened; a crash in the window between the two leaves a 'present' +// row whose file is gone, which the next index run surfaces as +// 'missing' — a loud, truthful drift signal for an unrecorded but +// durability-gated deletion. +func offloadFiles(ctx context.Context, s *store.Store, g *gate, root string, volumeID, runID int64, candidates []store.FileRow, report *Report) error { + dir, err := os.OpenRoot(root) + if err != nil { + return fmt.Errorf("open volume root: %w", err) + } + defer func() { _ = dir.Close() }() + + buf := make([]byte, hashReadBufferSize) + for _, row := range candidates { + if err := ctx.Err(); err != nil { + return err + } + failures, err := g.check(ctx, row) + if err != nil { + return fmt.Errorf("gate %s: %w", row.Path, err) + } + if len(failures) > 0 { + report.record(FileResult{Path: row.Path, Outcome: OutcomeNotDurable, Reasons: failures}) + continue + } + drift, opErr := verifyAndRemove(dir, row, buf) + switch { + case opErr != nil: + report.record(FileResult{Path: row.Path, Outcome: OutcomeError, Reasons: []string{opErr.Error()}}) + continue + case drift != "": + report.record(FileResult{Path: row.Path, Outcome: OutcomeDrift, Reasons: []string{drift}}) + continue + } + if err := s.MarkOffloaded(ctx, volumeID, row.Path, row.ContentID, runID); err != nil { + report.record(FileResult{Path: row.Path, Outcome: OutcomeError, Reasons: []string{ + fmt.Sprintf("bytes removed but the status flip failed — the next index run will report the path as missing: %v", err), + }}) + continue + } + report.record(FileResult{Path: row.Path, Outcome: OutcomeOffloaded}) + } + return nil +} diff --git a/offload/offload_test.go b/offload/offload_test.go 
new file mode 100644 index 0000000..254f1c5 --- /dev/null +++ b/offload/offload_test.go @@ -0,0 +1,733 @@ +package offload + +import ( + "context" + "fmt" + "os" + "path/filepath" + "strings" + "testing" + "time" + + "github.com/zeebo/blake3" + + "github.com/mbertschler/squirrel/index" + "github.com/mbertschler/squirrel/store" + "github.com/mbertschler/squirrel/volmark" +) + +func setupStore(t *testing.T) *store.Store { + t.Helper() + s, err := store.Open(filepath.Join(t.TempDir(), "test.db")) + if err != nil { + t.Fatalf("store.Open: %v", err) + } + t.Cleanup(func() { s.Close() }) + return s +} + +func writeFile(t *testing.T, path, content string) { + t.Helper() + if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil { + t.Fatalf("mkdir: %v", err) + } + if err := os.WriteFile(path, []byte(content), 0o644); err != nil { + t.Fatalf("write %s: %v", path, err) + } +} + +// volName is the single volume name every test in this package indexes +// and offloads under. +const volName = "vol" + +func indexVolume(t *testing.T, s *store.Store, root string) index.Report { + t.Helper() + rep, err := index.Index(context.Background(), s, root, index.Options{Name: volName, Workers: 2}) + if err != nil { + t.Fatalf("Index %s: %v", root, err) + } + if rep.Errors > 0 { + t.Fatalf("index errors: %v", rep.ErrorList) + } + return rep +} + +func testVolume(t *testing.T, s *store.Store) store.Volume { + t.Helper() + v, err := s.GetVolumeByName(context.Background(), volName) + if err != nil { + t.Fatalf("GetVolumeByName(%q): %v", volName, err) + } + return v +} + +func selfNode(t *testing.T, s *store.Store) store.Node { + t.Helper() + n, err := s.GetSelfNode(context.Background()) + if err != nil { + t.Fatalf("GetSelfNode: %v", err) + } + return n +} + +// seedVector records durability evidence the same way a verified +// whole-volume push does: a content-verified (blake3) vector component +// plus a successful kind='sync' run for the target, so the gate's +// origin-vector, 
freshness, and method conditions all clear. Tests that +// want to exercise a single failing condition seed the rest through this +// and break the one under test explicitly. +func seedVector(t *testing.T, s *store.Store, volumeID int64, target string, nodeID, run int64) { + t.Helper() + if err := s.UpsertDestinationRunIDVerified(context.Background(), volumeID, target, nodeID, run, store.VerifyMethodBlake3, false); err != nil { + t.Fatalf("UpsertDestinationRunID(%s): %v", target, err) + } + recordPush(t, s, volumeID, target) +} + +// recordPush records a successful whole-volume push run for (volume, +// target), advancing the freshness watermark the gate reads to "now" +// (the latest run id), past every present row's status_changed_run_id. +func recordPush(t *testing.T, s *store.Store, volumeID int64, target string) int64 { + t.Helper() + id, blocker, err := s.BeginSyncRunIfClear(context.Background(), store.SyncRunSpec{ + VolumeID: volumeID, + Destination: target, + }) + if err != nil || blocker != nil { + t.Fatalf("BeginSyncRunIfClear(%s): err=%v blocker=%+v", target, err, blocker) + } + if err := s.FinishRun(context.Background(), id, store.RunStatusSuccess, "", 0); err != nil { + t.Fatalf("FinishRun(%s): %v", target, err) + } + return id +} + +func rowAt(t *testing.T, s *store.Store, volumeID int64, relPath string) store.FileRow { + t.Helper() + r, err := s.GetByPath(context.Background(), volumeID, relPath) + if err != nil { + t.Fatalf("GetByPath(%s): %v", relPath, err) + } + return r +} + +func mustExist(t *testing.T, path string) { + t.Helper() + if _, err := os.Lstat(path); err != nil { + t.Fatalf("expected %s on disk: %v", path, err) + } +} + +func mustBeGone(t *testing.T, path string) { + t.Helper() + if _, err := os.Lstat(path); err == nil { + t.Fatalf("expected %s to be deleted", path) + } +} + +func countRuns(t *testing.T, s *store.Store) int { + t.Helper() + runs, err := s.ListRuns(context.Background(), store.ListRunsOpts{}) + if err != nil { + 
t.Fatalf("ListRuns: %v", err) + } + return len(runs) +} + +// oneResult asserts the report carries exactly one result for relPath +// with the given outcome and returns it. +func oneResult(t *testing.T, rep Report, relPath string, outcome Outcome) FileResult { + t.Helper() + for _, r := range rep.Results { + if r.Path == relPath { + if r.Outcome != outcome { + t.Fatalf("%s outcome = %d (%v), want %d", relPath, r.Outcome, r.Reasons, outcome) + } + return r + } + } + t.Fatalf("no result for %s in %+v", relPath, rep.Results) + return FileResult{} +} + +// TestOffloadHappyPath: with every required target's vector covering +// the volume's content, all selected files are unlinked and their rows +// flipped present → offloaded with last_seen_run_id stamped to the one +// kind='offload' run that wraps the invocation. +func TestOffloadHappyPath(t *testing.T) { + root := t.TempDir() + writeFile(t, filepath.Join(root, "a.txt"), "alpha") + writeFile(t, filepath.Join(root, "sub", "b.txt"), "bravo") + s := setupStore(t) + ctx := context.Background() + idx := indexVolume(t, s, root) + v := testVolume(t, s) + self := selfNode(t, s) + seedVector(t, s, v.ID, "t1", self.ID, idx.RunID) + seedVector(t, s, v.ID, "t2", self.ID, idx.RunID) + + rep, err := Offload(ctx, s, root, Options{ + Name: volName, Paths: []string{"."}, Require: []string{"t1", "t2"}, + }) + if err != nil { + t.Fatalf("Offload: %v", err) + } + if rep.Offloaded != 2 || rep.NotDurable != 0 || rep.Drift != 0 || rep.Errors != 0 { + t.Fatalf("report = %+v", rep) + } + oneResult(t, rep, "a.txt", OutcomeOffloaded) + oneResult(t, rep, "sub/b.txt", OutcomeOffloaded) + mustBeGone(t, filepath.Join(root, "a.txt")) + mustBeGone(t, filepath.Join(root, "sub", "b.txt")) + + for _, p := range []string{"a.txt", "sub/b.txt"} { + row := rowAt(t, s, v.ID, p) + if row.Status != store.StatusOffloaded { + t.Fatalf("%s status = %q, want offloaded", p, row.Status) + } + if row.LastSeenRunID != rep.RunID { + t.Fatalf("%s last_seen_run_id = %d, 
want offload run %d", p, row.LastSeenRunID, rep.RunID) + } + if row.FirstSeenRunID != idx.RunID { + t.Fatalf("%s first_seen_run_id = %d, want index run %d", p, row.FirstSeenRunID, idx.RunID) + } + } + + run, err := s.GetRun(ctx, rep.RunID) + if err != nil { + t.Fatalf("GetRun: %v", err) + } + if run.Kind != store.RunKindOffload || run.Destination.Valid || + run.Status != store.RunStatusSuccess || run.FileCount != 2 { + t.Fatalf("offload run = %+v, want kind=offload destination=NULL status=success file_count=2", run) + } +} + +// TestOffloadGateMissingComponent: a required target with no vector +// component for the content's origin keeps the file on disk, with the +// failure reported per target. +func TestOffloadGateMissingComponent(t *testing.T) { + root := t.TempDir() + writeFile(t, filepath.Join(root, "a.txt"), "alpha") + s := setupStore(t) + idx := indexVolume(t, s, root) + v := testVolume(t, s) + self := selfNode(t, s) + seedVector(t, s, v.ID, "t1", self.ID, idx.RunID) + + rep, err := Offload(context.Background(), s, root, Options{ + Name: volName, Paths: []string{"."}, Require: []string{"t1", "t2"}, + }) + if err != nil { + t.Fatalf("Offload: %v", err) + } + if rep.Offloaded != 0 || rep.NotDurable != 1 || rep.Errors != 0 { + t.Fatalf("report = %+v", rep) + } + res := oneResult(t, rep, "a.txt", OutcomeNotDurable) + if len(res.Reasons) != 1 || !strings.Contains(res.Reasons[0], "t2: missing component for origin "+self.Name) { + t.Fatalf("reasons = %v, want one t2 missing-component failure", res.Reasons) + } + mustExist(t, filepath.Join(root, "a.txt")) + if row := rowAt(t, s, v.ID, "a.txt"); row.Status != store.StatusPresent { + t.Fatalf("status = %q, want present", row.Status) + } + + run, err := s.GetRun(context.Background(), rep.RunID) + if err != nil { + t.Fatalf("GetRun: %v", err) + } + if run.Status != store.RunStatusSuccess || run.FileCount != 0 { + t.Fatalf("skip-only run = %+v, want status=success file_count=0", run) + } +} + +// 
TestOffloadGateStaleComponent: content introduced after the target's +// recorded watermark is refused with the have/need pair, while content +// the watermark covers offloads in the same invocation. +func TestOffloadGateStaleComponent(t *testing.T) { + root := t.TempDir() + writeFile(t, filepath.Join(root, "a.txt"), "alpha") + s := setupStore(t) + first := indexVolume(t, s, root) + v := testVolume(t, s) + self := selfNode(t, s) + seedVector(t, s, v.ID, "t1", self.ID, first.RunID) + + writeFile(t, filepath.Join(root, "b.txt"), "bravo") + second := indexVolume(t, s, root) + + rep, err := Offload(context.Background(), s, root, Options{ + Name: volName, Paths: []string{"."}, Require: []string{"t1"}, + }) + if err != nil { + t.Fatalf("Offload: %v", err) + } + oneResult(t, rep, "a.txt", OutcomeOffloaded) + res := oneResult(t, rep, "b.txt", OutcomeNotDurable) + want := fmt.Sprintf("t1: stale: have %d need %d", first.RunID, second.RunID) + if len(res.Reasons) != 1 || !strings.Contains(res.Reasons[0], want) { + t.Fatalf("reasons = %v, want %q", res.Reasons, want) + } + mustBeGone(t, filepath.Join(root, "a.txt")) + mustExist(t, filepath.Join(root, "b.txt")) +} + +// TestOffloadPeerOriginContent: content carrying a recorded origin +// (node, run) is gated against that origin's vector component — the +// self component is irrelevant for it. 
+func TestOffloadPeerOriginContent(t *testing.T) { + root := t.TempDir() + writeFile(t, filepath.Join(root, "seed.txt"), "seed") + s := setupStore(t) + ctx := context.Background() + idx := indexVolume(t, s, root) + v := testVolume(t, s) + peer, err := s.CreateNode(ctx, "peer1", "http://peer1.internal") + if err != nil { + t.Fatalf("CreateNode: %v", err) + } + + for _, f := range []struct { + name string + content string + originRun int64 + }{ + {"covered.txt", "from peer covered", 7}, + {"ahead.txt", "from peer ahead", 9}, + } { + p := filepath.Join(root, f.name) + writeFile(t, p, f.content) + fi, err := os.Stat(p) + if err != nil { + t.Fatal(err) + } + sum := blake3.Sum256([]byte(f.content)) + err = s.Upsert(ctx, store.FileRow{ + VolumeID: v.ID, Path: f.name, Blake3: sum[:], SizeBytes: fi.Size(), + MtimeNs: fi.ModTime().UnixNano(), Status: store.StatusPresent, + FirstSeenRunID: idx.RunID, LastSeenRunID: idx.RunID, IndexedAtNs: store.NowNs(), + }, &store.Provenance{NodeID: peer.ID, RunID: f.originRun}) + if err != nil { + t.Fatalf("Upsert %s: %v", f.name, err) + } + } + seedVector(t, s, v.ID, "t1", peer.ID, 7) + + rep, err := Offload(ctx, s, root, Options{ + Name: volName, Paths: []string{"covered.txt", "ahead.txt"}, Require: []string{"t1"}, + }) + if err != nil { + t.Fatalf("Offload: %v", err) + } + oneResult(t, rep, "covered.txt", OutcomeOffloaded) + res := oneResult(t, rep, "ahead.txt", OutcomeNotDurable) + if !strings.Contains(res.Reasons[0], "t1: stale: have 7 need 9 (origin peer1)") { + t.Fatalf("reasons = %v, want stale failure naming origin peer1", res.Reasons) + } + mustBeGone(t, filepath.Join(root, "covered.txt")) + mustExist(t, filepath.Join(root, "ahead.txt")) +} + +// TestOffloadPeerPulledEvidenceTarget: the policy may require a target +// whose vector component arrives via the peer durability pull, landing +// in destination_run_ids under the target name. 
The gate consumes those +// rows like any other — here the target was also pushed to locally +// (seedVector records the push), so its freshness watermark is current. +// The peer-relayed-only case (no local push, gated on pulled freshness) +// is covered by TestOffloadPeerRelayedTargetGatesOnPulledFreshness. +func TestOffloadPeerPulledEvidenceTarget(t *testing.T) { + root := t.TempDir() + writeFile(t, filepath.Join(root, "a.txt"), "alpha") + s := setupStore(t) + idx := indexVolume(t, s, root) + v := testVolume(t, s) + seedVector(t, s, v.ID, "remote-archive", selfNode(t, s).ID, idx.RunID) + + rep, err := Offload(context.Background(), s, root, Options{ + Name: volName, Paths: []string{"a.txt"}, Require: []string{"remote-archive"}, + }) + if err != nil { + t.Fatalf("Offload: %v", err) + } + oneResult(t, rep, "a.txt", OutcomeOffloaded) + mustBeGone(t, filepath.Join(root, "a.txt")) +} + +// TestOffloadEmptyPolicyRefused: without an explicit offload policy the +// command refuses before touching the index, the disk, or the runs +// table. +func TestOffloadEmptyPolicyRefused(t *testing.T) { + root := t.TempDir() + writeFile(t, filepath.Join(root, "a.txt"), "alpha") + s := setupStore(t) + indexVolume(t, s, root) + runsBefore := countRuns(t, s) + + _, err := Offload(context.Background(), s, root, Options{ + Name: volName, Paths: []string{"."}, + }) + if err == nil || !strings.Contains(err.Error(), "no offload policy") { + t.Fatalf("err = %v, want empty-policy refusal", err) + } + mustExist(t, filepath.Join(root, "a.txt")) + if got := countRuns(t, s); got != runsBefore { + t.Fatalf("runs count = %d, want %d (refusal must record nothing)", got, runsBefore) + } +} + +// TestOffloadRequiresSelector: a bare invocation with neither paths nor +// an age bound is refused — selecting the whole volume takes an +// explicit ".". 
+func TestOffloadRequiresSelector(t *testing.T) { + s := setupStore(t) + _, err := Offload(context.Background(), s, t.TempDir(), Options{ + Name: volName, Require: []string{"t1"}, + }) + if err == nil || !strings.Contains(err.Error(), "needs a selector") { + t.Fatalf("err = %v, want selector refusal", err) + } +} + +// TestOffloadSelectorSemantics: an exact file path selects one file, a +// directory prefix selects its subtree on path-segment boundaries, and +// unmatched siblings stay untouched. +func TestOffloadSelectorSemantics(t *testing.T) { + root := t.TempDir() + files := []string{"Photos/2019/a.jpg", "Photos/2019-extra/b.jpg", "Photos/2020/c.jpg", "docs/d.txt"} + for i, f := range files { + writeFile(t, filepath.Join(root, filepath.FromSlash(f)), fmt.Sprintf("content-%d", i)) + } + s := setupStore(t) + idx := indexVolume(t, s, root) + v := testVolume(t, s) + seedVector(t, s, v.ID, "t1", selfNode(t, s).ID, idx.RunID) + opts := Options{Name: volName, Require: []string{"t1"}} + + opts.Paths = []string{"Photos/2019", "docs/d.txt"} + rep, err := Offload(context.Background(), s, root, opts) + if err != nil { + t.Fatalf("Offload: %v", err) + } + if rep.Offloaded != 2 || len(rep.Results) != 2 { + t.Fatalf("report = %+v, want exactly Photos/2019/a.jpg and docs/d.txt", rep) + } + oneResult(t, rep, "Photos/2019/a.jpg", OutcomeOffloaded) + oneResult(t, rep, "docs/d.txt", OutcomeOffloaded) + mustExist(t, filepath.Join(root, "Photos", "2019-extra", "b.jpg")) + mustExist(t, filepath.Join(root, "Photos", "2020", "c.jpg")) +} + +// TestOffloadOlderThanSelector: the age bound filters on the indexed +// mtime and ANDs with path selectors. 
+func TestOffloadOlderThanSelector(t *testing.T) { + root := t.TempDir() + old := time.Now().Add(-48 * time.Hour) + writeFile(t, filepath.Join(root, "sub", "old.txt"), "old in sub") + writeFile(t, filepath.Join(root, "sub", "new.txt"), "new in sub") + writeFile(t, filepath.Join(root, "top-old.txt"), "old at top") + for _, p := range []string{filepath.Join(root, "sub", "old.txt"), filepath.Join(root, "top-old.txt")} { + if err := os.Chtimes(p, old, old); err != nil { + t.Fatalf("chtimes: %v", err) + } + } + s := setupStore(t) + idx := indexVolume(t, s, root) + v := testVolume(t, s) + seedVector(t, s, v.ID, "t1", selfNode(t, s).ID, idx.RunID) + + rep, err := Offload(context.Background(), s, root, Options{ + Name: volName, Paths: []string{"sub"}, OlderThan: 24 * time.Hour, Require: []string{"t1"}, + }) + if err != nil { + t.Fatalf("Offload: %v", err) + } + if rep.Offloaded != 1 || len(rep.Results) != 1 { + t.Fatalf("report = %+v, want exactly sub/old.txt", rep) + } + oneResult(t, rep, "sub/old.txt", OutcomeOffloaded) + mustExist(t, filepath.Join(root, "sub", "new.txt")) + mustExist(t, filepath.Join(root, "top-old.txt")) + + rep, err = Offload(context.Background(), s, root, Options{ + Name: volName, OlderThan: 24 * time.Hour, Require: []string{"t1"}, + }) + if err != nil { + t.Fatalf("Offload age-only: %v", err) + } + if rep.Offloaded != 1 { + t.Fatalf("age-only report = %+v, want exactly top-old.txt", rep) + } + oneResult(t, rep, "top-old.txt", OutcomeOffloaded) + mustExist(t, filepath.Join(root, "sub", "new.txt")) + + if _, err := Offload(context.Background(), s, root, Options{ + Name: volName, Paths: []string{"sub"}, OlderThan: -time.Hour, Require: []string{"t1"}, + }); err == nil || !strings.Contains(err.Error(), "negative") { + t.Fatalf("negative --older-than err = %v, want refusal naming the negative duration", err) + } +} + +// TestOffloadSelectorMissReported: a selector matching nothing is +// surfaced instead of silently producing an empty run. 
+func TestOffloadSelectorMissReported(t *testing.T) { + root := t.TempDir() + writeFile(t, filepath.Join(root, "a.txt"), "alpha") + s := setupStore(t) + indexVolume(t, s, root) + + rep, err := Offload(context.Background(), s, root, Options{ + Name: volName, Paths: []string{"nope"}, Require: []string{"t1"}, + }) + if err != nil { + t.Fatalf("Offload: %v", err) + } + if len(rep.Results) != 0 { + t.Fatalf("results = %+v, want none", rep.Results) + } + if len(rep.SelectorMisses) != 1 || rep.SelectorMisses[0] != "nope" { + t.Fatalf("SelectorMisses = %v, want [nope]", rep.SelectorMisses) + } +} + +// TestOffloadDryRunTouchesNothing: dry-run reports the gate decision +// per file and leaves the disk, the rows, and the runs table exactly as +// they were. +func TestOffloadDryRunTouchesNothing(t *testing.T) { + root := t.TempDir() + writeFile(t, filepath.Join(root, "a.txt"), "alpha") + s := setupStore(t) + first := indexVolume(t, s, root) + v := testVolume(t, s) + seedVector(t, s, v.ID, "t1", selfNode(t, s).ID, first.RunID) + writeFile(t, filepath.Join(root, "b.txt"), "bravo") + indexVolume(t, s, root) + runsBefore := countRuns(t, s) + + rep, err := Offload(context.Background(), s, root, Options{ + Name: volName, Paths: []string{"."}, Require: []string{"t1"}, DryRun: true, + }) + if err != nil { + t.Fatalf("Offload dry-run: %v", err) + } + if rep.RunID != 0 { + t.Fatalf("dry-run RunID = %d, want 0", rep.RunID) + } + oneResult(t, rep, "a.txt", OutcomeOffloaded) + oneResult(t, rep, "b.txt", OutcomeNotDurable) + mustExist(t, filepath.Join(root, "a.txt")) + mustExist(t, filepath.Join(root, "b.txt")) + for _, p := range []string{"a.txt", "b.txt"} { + if row := rowAt(t, s, v.ID, p); row.Status != store.StatusPresent { + t.Fatalf("%s status = %q after dry-run, want present", p, row.Status) + } + } + if got := countRuns(t, s); got != runsBefore { + t.Fatalf("runs count = %d, want %d (dry-run must record nothing)", got, runsBefore) + } +} + +// TestOffloadIndexerIntegration: a 
follow-up index run treats the +// offloaded row's on-disk absence as expected, and re-acquiring the +// bytes flips it back to present with first_seen preserved. +func TestOffloadIndexerIntegration(t *testing.T) { + root := t.TempDir() + writeFile(t, filepath.Join(root, "a.txt"), "alpha") + writeFile(t, filepath.Join(root, "keep.txt"), "kept") + s := setupStore(t) + ctx := context.Background() + idx := indexVolume(t, s, root) + v := testVolume(t, s) + seedVector(t, s, v.ID, "t1", selfNode(t, s).ID, idx.RunID) + + rep, err := Offload(ctx, s, root, Options{ + Name: volName, Paths: []string{"a.txt"}, Require: []string{"t1"}, + }) + if err != nil || rep.Offloaded != 1 { + t.Fatalf("Offload: %v, report %+v", err, rep) + } + + after := indexVolume(t, s, root) + if after.Missing != 0 { + t.Fatalf("re-index Missing = %d, want 0 (offloaded absence is expected)", after.Missing) + } + if row := rowAt(t, s, v.ID, "a.txt"); row.Status != store.StatusOffloaded { + t.Fatalf("status after re-index = %q, want offloaded", row.Status) + } + + writeFile(t, filepath.Join(root, "a.txt"), "alpha") + indexVolume(t, s, root) + row := rowAt(t, s, v.ID, "a.txt") + if row.Status != store.StatusPresent { + t.Fatalf("status after re-acquire = %q, want present", row.Status) + } + if row.FirstSeenRunID != idx.RunID { + t.Fatalf("first_seen_run_id = %d, want %d (re-acquire must not rewrite it)", row.FirstSeenRunID, idx.RunID) + } +} + +// TestOffloadMissingRowsNeverConsidered: a 'missing' row is no longer +// on disk to delete and must stay out of the candidate set entirely. 
+func TestOffloadMissingRowsNeverConsidered(t *testing.T) { + root := t.TempDir() + writeFile(t, filepath.Join(root, "gone.txt"), "gone") + writeFile(t, filepath.Join(root, "here.txt"), "here") + s := setupStore(t) + indexVolume(t, s, root) + if err := os.Remove(filepath.Join(root, "gone.txt")); err != nil { + t.Fatal(err) + } + second := indexVolume(t, s, root) + v := testVolume(t, s) + seedVector(t, s, v.ID, "t1", selfNode(t, s).ID, second.RunID) + + rep, err := Offload(context.Background(), s, root, Options{ + Name: volName, Paths: []string{"."}, Require: []string{"t1"}, + }) + if err != nil { + t.Fatalf("Offload: %v", err) + } + if len(rep.Results) != 1 || rep.Offloaded != 1 { + t.Fatalf("report = %+v, want exactly here.txt offloaded", rep) + } + oneResult(t, rep, "here.txt", OutcomeOffloaded) + if row := rowAt(t, s, v.ID, "gone.txt"); row.Status != store.StatusMissing { + t.Fatalf("gone.txt status = %q, want missing (untouched)", row.Status) + } +} + +// TestOffloadSupersededContentNeverGates: when a path's content +// changed since the watermark, the live row is gated on the *new* +// content's origin — the superseded predecessor's covered coordinate +// must not let the new bytes be deleted. 
+func TestOffloadSupersededContentNeverGates(t *testing.T) { + root := t.TempDir() + p := filepath.Join(root, "a.txt") + writeFile(t, p, "version one") + s := setupStore(t) + first := indexVolume(t, s, root) + v := testVolume(t, s) + seedVector(t, s, v.ID, "t1", selfNode(t, s).ID, first.RunID) + + writeFile(t, p, "version two") + indexVolume(t, s, root) + + rep, err := Offload(context.Background(), s, root, Options{ + Name: volName, Paths: []string{"a.txt"}, Require: []string{"t1"}, + }) + if err != nil { + t.Fatalf("Offload: %v", err) + } + oneResult(t, rep, "a.txt", OutcomeNotDurable) + mustExist(t, p) + + history, err := s.ListHistoryByPath(context.Background(), v.ID, "a.txt") + if err != nil { + t.Fatalf("ListHistoryByPath: %v", err) + } + if len(history) != 2 || history[0].Status != store.StatusSuperseded { + t.Fatalf("history = %+v, want superseded v1 + present v2", history) + } +} + +// TestOffloadReservedSubtreesExcluded: squirrel-owned preservation +// directories are never offload candidates even when a selector covers +// them and the gate would pass. +func TestOffloadReservedSubtreesExcluded(t *testing.T) { + root := t.TempDir() + writeFile(t, filepath.Join(root, ".squirrel-history", "run-1", "x.bin"), "preserved") + writeFile(t, filepath.Join(root, "a.txt"), "alpha") + s := setupStore(t) + idx := indexVolume(t, s, root) + v := testVolume(t, s) + seedVector(t, s, v.ID, "t1", selfNode(t, s).ID, idx.RunID) + + rep, err := Offload(context.Background(), s, root, Options{ + Name: volName, Paths: []string{"."}, Require: []string{"t1"}, + }) + if err != nil { + t.Fatalf("Offload: %v", err) + } + if len(rep.Results) != 1 || rep.Offloaded != 1 { + t.Fatalf("report = %+v, want exactly a.txt", rep) + } + oneResult(t, rep, "a.txt", OutcomeOffloaded) + mustExist(t, filepath.Join(root, ".squirrel-history", "run-1", "x.bin")) +} + +// TestOffloadRefusedWhileRunInFlight: offload defers to any running run +// on the volume. 
+func TestOffloadRefusedWhileRunInFlight(t *testing.T) { + root := t.TempDir() + writeFile(t, filepath.Join(root, "a.txt"), "alpha") + s := setupStore(t) + ctx := context.Background() + idx := indexVolume(t, s, root) + v := testVolume(t, s) + seedVector(t, s, v.ID, "t1", selfNode(t, s).ID, idx.RunID) + if _, err := s.BeginIndexRun(ctx, store.RunKindIndex, v.ID, false); err != nil { + t.Fatalf("BeginIndexRun: %v", err) + } + + _, err := Offload(ctx, s, root, Options{ + Name: volName, Paths: []string{"."}, Require: []string{"t1"}, + }) + if err == nil || !strings.Contains(err.Error(), "already running") { + t.Fatalf("err = %v, want already-running refusal", err) + } + mustExist(t, filepath.Join(root, "a.txt")) +} + +// TestOffloadIdentityMismatchRefused: a DB/config path disagreement or +// a volume marker naming another volume aborts before any deletion. +func TestOffloadIdentityMismatchRefused(t *testing.T) { + root := t.TempDir() + writeFile(t, filepath.Join(root, "a.txt"), "alpha") + s := setupStore(t) + indexVolume(t, s, root) + + _, err := Offload(context.Background(), s, t.TempDir(), Options{ + Name: volName, Paths: []string{"."}, Require: []string{"t1"}, + }) + if err == nil || !strings.Contains(err.Error(), "resolve the conflict") { + t.Fatalf("err = %v, want DB/config path mismatch refusal", err) + } + + if err := volmark.Write(root, volmark.Marker{Volume: "other"}); err != nil { + t.Fatalf("tamper marker: %v", err) + } + _, err = Offload(context.Background(), s, root, Options{ + Name: volName, Paths: []string{"."}, Require: []string{"t1"}, + }) + if err == nil || !strings.Contains(err.Error(), "volume marker") { + t.Fatalf("err = %v, want marker refusal", err) + } + mustExist(t, filepath.Join(root, "a.txt")) +} + +// TestOffloadUnindexedVolumeRefused: a volume with no index rows has no +// evidence to gate on. 
+func TestOffloadUnindexedVolumeRefused(t *testing.T) { + s := setupStore(t) + _, err := Offload(context.Background(), s, t.TempDir(), Options{ + Name: volName, Paths: []string{"."}, Require: []string{"t1"}, + }) + if err == nil || !strings.Contains(err.Error(), "no index rows") { + t.Fatalf("err = %v, want unindexed refusal", err) + } +} + +func TestCleanSelectors(t *testing.T) { + got, err := cleanSelectors([]string{"a/b/", "./c", ".", "./"}) + if err != nil { + t.Fatalf("cleanSelectors: %v", err) + } + if len(got) != 4 || got[0] != "a/b" || got[1] != "c" || got[2] != "." || got[3] != "." { + t.Fatalf("cleaned = %v, want [a/b c . .]", got) + } + for _, bad := range []string{"", "/abs/path", "../escape", "a/../../b", "a/..", "b/c/../.."} { + if _, err := cleanSelectors([]string{bad}); err == nil { + t.Fatalf("cleanSelectors(%q) succeeded, want refusal", bad) + } + } +} diff --git a/offload/remove.go b/offload/remove.go new file mode 100644 index 0000000..4e6189a --- /dev/null +++ b/offload/remove.go @@ -0,0 +1,125 @@ +package offload + +import ( + "bytes" + "errors" + "fmt" + "io" + "os" + + "github.com/zeebo/blake3" + + "github.com/mbertschler/squirrel/store" +) + +// hashReadBufferSize matches the indexer's read buffer for the same +// reason: 1 MiB reads keep the syscall count low and let filesystem +// readahead engage on the multi-MB files offload typically targets. +const hashReadBufferSize = 1 << 20 + +// verifyAndRemove re-verifies that the on-disk bytes at row.Path are +// exactly the bytes the indexed row describes, then unlinks the file — +// and only the file. Every disagreement between disk and index is a +// refusal returned as a drift reason ("the disk is newer than the +// index"): bytes the index never recorded must survive the run. 
+// +// Traversal safety mirrors the indexer's walk, which treats symlinks as +// opaque: every parent component is Lstat'ed through the os.Root (which +// also confines the whole resolution to the volume) and must be a real +// directory, and the final component must Lstat as a regular file. The +// opened handle is then bound to the Lstat'ed inode with os.SameFile — +// the O_NOFOLLOW-equivalent check — the size, mtime, and BLAKE3 are +// verified from that handle, and a final Lstat+SameFile narrows the +// verify→unlink race window to the Remove call itself. +func verifyAndRemove(dir *os.Root, row store.FileRow, buf []byte) (drift string, err error) { + if drift, err := checkParents(dir, row.Path); drift != "" || err != nil { + return drift, err + } + fi, err := dir.Lstat(row.Path) + if errors.Is(err, os.ErrNotExist) { + return "missing on disk while the index row is 'present'; re-index", nil + } + if err != nil { + return "", fmt.Errorf("lstat: %w", err) + } + if fi.Mode()&os.ModeSymlink != 0 { + return "path is a symlink on disk, indexed as a regular file", nil + } + if !fi.Mode().IsRegular() { + return fmt.Sprintf("path is %s on disk, indexed as a regular file", fi.Mode().Type()), nil + } + + f, err := dir.Open(row.Path) + if err != nil { + return "", fmt.Errorf("open: %w", err) + } + defer f.Close() + if drift, err := verifyHandle(f, fi, row, buf); drift != "" || err != nil { + return drift, err + } + + post, err := dir.Lstat(row.Path) + if err != nil || !os.SameFile(fi, post) { + return "file replaced during verification", nil + } + if err := dir.Remove(row.Path); err != nil { + return "", fmt.Errorf("remove: %w", err) + } + return "", nil +} + +// verifyHandle checks the opened handle against both the just-taken +// Lstat (same inode) and the indexed row (size, mtime, BLAKE3). The +// hash is computed from the handle itself, so the verified bytes are +// the ones behind the inode the surrounding checks pin down. 
+func verifyHandle(f *os.File, lstat os.FileInfo, row store.FileRow, buf []byte) (string, error) { + hfi, err := f.Stat() + if err != nil { + return "", fmt.Errorf("stat open handle: %w", err) + } + if !os.SameFile(lstat, hfi) { + return "file replaced between check and open", nil + } + if hfi.Size() != row.SizeBytes { + return fmt.Sprintf("size changed: disk %d, indexed %d", hfi.Size(), row.SizeBytes), nil + } + if hfi.ModTime().UnixNano() != row.MtimeNs { + return fmt.Sprintf("mtime changed: disk %d, indexed %d", hfi.ModTime().UnixNano(), row.MtimeNs), nil + } + h := blake3.New() + if _, err := io.CopyBuffer(h, f, buf); err != nil { + return "", fmt.Errorf("hash: %w", err) + } + if digest := h.Sum(nil); !bytes.Equal(digest, row.Blake3) { + return fmt.Sprintf("content hash changed: disk %x, indexed %x", digest, row.Blake3), nil + } + return "", nil +} + +// checkParents Lstats every parent component of relPath inside the +// root, shallowest first, requiring each to be a real directory. The +// indexer's walk records paths whose every component was a directory, +// so a symlink (or anything else) appearing in the chain since is +// drift. 
+func checkParents(dir *os.Root, relPath string) (string, error) { + for i := 0; i < len(relPath); i++ { + if relPath[i] != '/' { + continue + } + prefix := relPath[:i] + fi, err := dir.Lstat(prefix) + if errors.Is(err, os.ErrNotExist) { + return fmt.Sprintf("parent %s missing on disk", prefix), nil + } + if err != nil { + return "", fmt.Errorf("lstat parent %s: %w", prefix, err) + } + if fi.Mode()&os.ModeSymlink != 0 { + return fmt.Sprintf("parent %s is a symlink", prefix), nil + } + if !fi.IsDir() { + return fmt.Sprintf("parent %s is not a directory", prefix), nil + } + } + return "", nil +} diff --git a/offload/remove_test.go b/offload/remove_test.go new file mode 100644 index 0000000..25e37b7 --- /dev/null +++ b/offload/remove_test.go @@ -0,0 +1,222 @@ +package offload + +import ( + "context" + "os" + "path/filepath" + "strings" + "testing" + "time" + + "github.com/mbertschler/squirrel/store" +) + +// skipIfRoot mirrors the index package's guard: uid 0 bypasses DAC +// permission checks, so chmod-based failure simulation silently stops +// failing. +func skipIfRoot(t *testing.T) { + t.Helper() + if os.Geteuid() == 0 { + t.Skip("test relies on permission denial; root bypasses DAC checks") + } +} + +// restoreMtime resets a file's timestamps to the indexed mtime so a +// test can change bytes while keeping the (size, mtime) shortcut +// inputs identical. +func restoreMtime(t *testing.T, path string, mtimeNs int64) { + t.Helper() + ts := time.Unix(0, mtimeNs) + if err := os.Chtimes(path, ts, ts); err != nil { + t.Fatalf("chtimes %s: %v", path, err) + } +} + +// TestOffloadDriftSkips: any disagreement between the on-disk file and +// the indexed row — content with matching size+mtime, size, or mtime — +// refuses the deletion, leaving disk and row untouched. 
+func TestOffloadDriftSkips(t *testing.T) { + root := t.TempDir() + writeFile(t, filepath.Join(root, "content.txt"), "alpha") + writeFile(t, filepath.Join(root, "size.txt"), "bravo") + writeFile(t, filepath.Join(root, "mtime.txt"), "carol") + s := setupStore(t) + ctx := context.Background() + idx := indexVolume(t, s, root) + v := testVolume(t, s) + seedVector(t, s, v.ID, "t1", selfNode(t, s).ID, idx.RunID) + + contentRow := rowAt(t, s, v.ID, "content.txt") + writeFile(t, filepath.Join(root, "content.txt"), "ALPHA") + restoreMtime(t, filepath.Join(root, "content.txt"), contentRow.MtimeNs) + + writeFile(t, filepath.Join(root, "size.txt"), "bravo plus more") + sizeRow := rowAt(t, s, v.ID, "size.txt") + restoreMtime(t, filepath.Join(root, "size.txt"), sizeRow.MtimeNs) + + restoreMtime(t, filepath.Join(root, "mtime.txt"), rowAt(t, s, v.ID, "mtime.txt").MtimeNs+1) + + rep, err := Offload(ctx, s, root, Options{ + Name: volName, Paths: []string{"."}, Require: []string{"t1"}, + }) + if err != nil { + t.Fatalf("Offload: %v", err) + } + if rep.Drift != 3 || rep.Offloaded != 0 || rep.Errors != 0 { + t.Fatalf("report = %+v, want 3 drift skips", rep) + } + for path, wantReason := range map[string]string{ + "content.txt": "content hash changed", + "size.txt": "size changed", + "mtime.txt": "mtime changed", + } { + res := oneResult(t, rep, path, OutcomeDrift) + if !strings.Contains(res.Reasons[0], wantReason) { + t.Fatalf("%s reason = %v, want %q", path, res.Reasons, wantReason) + } + mustExist(t, filepath.Join(root, path)) + if row := rowAt(t, s, v.ID, path); row.Status != store.StatusPresent { + t.Fatalf("%s status = %q, want present", path, row.Status) + } + } +} + +// TestOffloadMissingOnDiskSkips: a file already gone from disk (with +// the row still 'present') is drift, never an offload. 
+func TestOffloadMissingOnDiskSkips(t *testing.T) { + root := t.TempDir() + writeFile(t, filepath.Join(root, "a.txt"), "alpha") + s := setupStore(t) + idx := indexVolume(t, s, root) + v := testVolume(t, s) + seedVector(t, s, v.ID, "t1", selfNode(t, s).ID, idx.RunID) + if err := os.Remove(filepath.Join(root, "a.txt")); err != nil { + t.Fatal(err) + } + + rep, err := Offload(context.Background(), s, root, Options{ + Name: volName, Paths: []string{"a.txt"}, Require: []string{"t1"}, + }) + if err != nil { + t.Fatalf("Offload: %v", err) + } + res := oneResult(t, rep, "a.txt", OutcomeDrift) + if !strings.Contains(res.Reasons[0], "missing on disk") { + t.Fatalf("reason = %v, want missing-on-disk drift", res.Reasons) + } + if row := rowAt(t, s, v.ID, "a.txt"); row.Status != store.StatusPresent { + t.Fatalf("status = %q, want present (only the indexer records absence)", row.Status) + } +} + +// TestOffloadSymlinkFileRefused: a symlink where the index recorded a +// regular file is refused and its target survives. 
+func TestOffloadSymlinkFileRefused(t *testing.T) { + root := t.TempDir() + writeFile(t, filepath.Join(root, "a.txt"), "alpha") + writeFile(t, filepath.Join(root, "target.txt"), "target bytes") + s := setupStore(t) + idx := indexVolume(t, s, root) + v := testVolume(t, s) + seedVector(t, s, v.ID, "t1", selfNode(t, s).ID, idx.RunID) + + if err := os.Remove(filepath.Join(root, "a.txt")); err != nil { + t.Fatal(err) + } + if err := os.Symlink(filepath.Join(root, "target.txt"), filepath.Join(root, "a.txt")); err != nil { + t.Fatalf("symlink: %v", err) + } + + rep, err := Offload(context.Background(), s, root, Options{ + Name: volName, Paths: []string{"a.txt"}, Require: []string{"t1"}, + }) + if err != nil { + t.Fatalf("Offload: %v", err) + } + res := oneResult(t, rep, "a.txt", OutcomeDrift) + if !strings.Contains(res.Reasons[0], "symlink") { + t.Fatalf("reason = %v, want symlink refusal", res.Reasons) + } + mustExist(t, filepath.Join(root, "a.txt")) + mustExist(t, filepath.Join(root, "target.txt")) +} + +// TestOffloadSymlinkParentRefused: a directory component that became a +// symlink since indexing refuses the deletion even though the link +// resolves to the recorded bytes. 
+func TestOffloadSymlinkParentRefused(t *testing.T) { + root := t.TempDir() + writeFile(t, filepath.Join(root, "dir", "c.txt"), "charlie") + s := setupStore(t) + idx := indexVolume(t, s, root) + v := testVolume(t, s) + seedVector(t, s, v.ID, "t1", selfNode(t, s).ID, idx.RunID) + + if err := os.Rename(filepath.Join(root, "dir"), filepath.Join(root, "dir2")); err != nil { + t.Fatal(err) + } + if err := os.Symlink(filepath.Join(root, "dir2"), filepath.Join(root, "dir")); err != nil { + t.Fatalf("symlink: %v", err) + } + + rep, err := Offload(context.Background(), s, root, Options{ + Name: volName, Paths: []string{"dir/c.txt"}, Require: []string{"t1"}, + }) + if err != nil { + t.Fatalf("Offload: %v", err) + } + res := oneResult(t, rep, "dir/c.txt", OutcomeDrift) + if !strings.Contains(res.Reasons[0], "parent dir is a symlink") { + t.Fatalf("reason = %v, want parent-symlink refusal", res.Reasons) + } + mustExist(t, filepath.Join(root, "dir2", "c.txt")) +} + +// TestOffloadPartialRun: a per-file unlink failure is reported and +// counted, the other files still offload, and the runs row lands in +// 'partial' with file_count = the files actually offloaded. 
+func TestOffloadPartialRun(t *testing.T) { + skipIfRoot(t) + root := t.TempDir() + writeFile(t, filepath.Join(root, "a.txt"), "alpha") + writeFile(t, filepath.Join(root, "ro", "b.txt"), "bravo") + s := setupStore(t) + ctx := context.Background() + idx := indexVolume(t, s, root) + v := testVolume(t, s) + seedVector(t, s, v.ID, "t1", selfNode(t, s).ID, idx.RunID) + + roDir := filepath.Join(root, "ro") + if err := os.Chmod(roDir, 0o555); err != nil { + t.Fatalf("chmod: %v", err) + } + t.Cleanup(func() { _ = os.Chmod(roDir, 0o755) }) + + rep, err := Offload(ctx, s, root, Options{ + Name: volName, Paths: []string{"."}, Require: []string{"t1"}, + }) + if err != nil { + t.Fatalf("Offload: %v", err) + } + if rep.Offloaded != 1 || rep.Errors != 1 { + t.Fatalf("report = %+v, want 1 offloaded + 1 error", rep) + } + oneResult(t, rep, "a.txt", OutcomeOffloaded) + res := oneResult(t, rep, "ro/b.txt", OutcomeError) + if !strings.Contains(res.Reasons[0], "remove") { + t.Fatalf("reason = %v, want remove failure", res.Reasons) + } + mustBeGone(t, filepath.Join(root, "a.txt")) + mustExist(t, filepath.Join(root, "ro", "b.txt")) + if row := rowAt(t, s, v.ID, "ro/b.txt"); row.Status != store.StatusPresent { + t.Fatalf("ro/b.txt status = %q, want present", row.Status) + } + + run, err := s.GetRun(ctx, rep.RunID) + if err != nil { + t.Fatalf("GetRun: %v", err) + } + if run.Status != store.RunStatusPartial || run.FileCount != 1 { + t.Fatalf("run = %+v, want status=partial file_count=1", run) + } +} diff --git a/store/contents_test.go b/store/contents_test.go new file mode 100644 index 0000000..6dc50c7 --- /dev/null +++ b/store/contents_test.go @@ -0,0 +1,98 @@ +package store + +import ( + "context" + "strings" + "testing" +) + +// TestUpsertRejectsSizeMismatchForKnownDigest: a digest the index +// already maps to a different size means corruption or a mis-hashing +// caller, so the upsert errors instead of recording the observation. 
+func TestUpsertRejectsSizeMismatchForKnownDigest(t *testing.T) { + s := openTestStore(t) + ctx := context.Background() + vID := makeVolume(t, s, "/v") + run := makeRun(t, s, vID) + + d := digest(0x7a) + row := FileRow{ + VolumeID: vID, Path: "a.txt", Blake3: d, + SizeBytes: 10, MtimeNs: 1, Status: StatusPresent, + FirstSeenRunID: run, LastSeenRunID: run, IndexedAtNs: 1, + } + if err := s.Upsert(ctx, row, nil); err != nil { + t.Fatalf("Upsert: %v", err) + } + + row.Path = "b.txt" + row.SizeBytes = 11 + err := s.Upsert(ctx, row, nil) + if err == nil { + t.Fatalf("same digest with different size accepted, want error") + } + if !strings.Contains(err.Error(), "size") { + t.Fatalf("error = %v, want one naming the size disagreement", err) + } +} + +// TestContentIntroductionRunID: the introduction run is the earliest +// first_seen_run_id across every observation of the content in the +// volume — later duplicate paths don't move it, and neither does the +// original observation being superseded (introduction is history, not +// the live set). 
+func TestContentIntroductionRunID(t *testing.T) { + s := openTestStore(t) + ctx := context.Background() + vID := makeVolume(t, s, "/v") + run1 := makeRun(t, s, vID) + run2 := makeRun(t, s, vID) + + d := digest(0x31) + row := FileRow{ + VolumeID: vID, Path: "a.txt", Blake3: d, + SizeBytes: 3, MtimeNs: 1, Status: StatusPresent, + FirstSeenRunID: run1, LastSeenRunID: run1, IndexedAtNs: 1, + } + if err := s.Upsert(ctx, row, nil); err != nil { + t.Fatalf("Upsert a.txt: %v", err) + } + row.Path = "b.txt" + row.FirstSeenRunID, row.LastSeenRunID = run2, run2 + if err := s.Upsert(ctx, row, nil); err != nil { + t.Fatalf("Upsert b.txt: %v", err) + } + + live, err := s.GetByPath(ctx, vID, "a.txt") + if err != nil { + t.Fatalf("GetByPath: %v", err) + } + intro, err := s.ContentIntroductionRunID(ctx, vID, live.ContentID) + if err != nil { + t.Fatalf("ContentIntroductionRunID: %v", err) + } + if intro != run1 { + t.Fatalf("introduction run = %d, want %d", intro, run1) + } + + // Superseding the original observation keeps the introduction run. 
+ newer := FileRow{ + VolumeID: vID, Path: "a.txt", Blake3: digest(0x32), + SizeBytes: 3, MtimeNs: 2, Status: StatusPresent, + FirstSeenRunID: run2, LastSeenRunID: run2, IndexedAtNs: 2, + } + if err := s.Upsert(ctx, newer, nil); err != nil { + t.Fatalf("Upsert newer a.txt: %v", err) + } + intro, err = s.ContentIntroductionRunID(ctx, vID, live.ContentID) + if err != nil { + t.Fatalf("ContentIntroductionRunID after supersede: %v", err) + } + if intro != run1 { + t.Fatalf("introduction run after supersede = %d, want %d", intro, run1) + } + + if _, err := s.ContentIntroductionRunID(ctx, vID, live.ContentID+999); !IsNotFound(err) { + t.Fatalf("unknown content err = %v, want sql.ErrNoRows", err) + } +} diff --git a/store/destination_push_freshness.go b/store/destination_push_freshness.go new file mode 100644 index 0000000..6c940c5 --- /dev/null +++ b/store/destination_push_freshness.go @@ -0,0 +1,108 @@ +package store + +import ( + "context" + "fmt" +) + +// DestinationPushFreshness is one origin-space freshness coordinate: the +// highest origin run of OriginNodeID's content that was present on the +// pushing node when it last completed a successful whole-volume push of +// VolumeID to Destination. Coordinates live in the origin node's run +// space, like contents.origin_run_id and DestinationRunID.OriginRunID, so +// a node that never pushes to Destination directly can still compare a +// gated content's origin run against it. +// +// Unlike the monotonic durability vector (DestinationRunID), this maximum +// is overwritten per push: it tracks the most recent push's coverage, not +// the all-time maximum, so it answers "is this content's origin run +// covered by a *fresh* push" rather than "was it ever durable". 
+type DestinationPushFreshness struct { + VolumeID int64 + Destination string + OriginNodeID int64 + OriginRunID int64 + UpdatedAtNs int64 +} + +// UpsertDestinationPushFreshness sets the freshness coordinate for +// (volume, destination, origin node) to originRunID, overwriting any +// prior value. Callers invoke it once per successful whole-volume push +// with that push's present-set origin maxima, so the row always reflects +// the latest push rather than a monotonic floor. +func (s *Store) UpsertDestinationPushFreshness(ctx context.Context, volumeID int64, destination string, originNodeID, originRunID int64) error { + if destination == "" { + return fmt.Errorf("UpsertDestinationPushFreshness: destination must be non-empty") + } + _, err := s.db.ExecContext(ctx, ` + INSERT INTO destination_push_freshness (volume_id, destination, origin_node_id, origin_run_id, updated_at_ns) + VALUES (?, ?, ?, ?, ?) + ON CONFLICT(volume_id, destination, origin_node_id) DO UPDATE SET + origin_run_id = excluded.origin_run_id, + updated_at_ns = excluded.updated_at_ns + `, volumeID, destination, originNodeID, originRunID, NowNs()) + if err != nil { + return fmt.Errorf("upsert destination_push_freshness: %w", err) + } + return nil +} + +// MergeDestinationPushFreshness raises the freshness coordinate for +// (volume, destination, origin node) to originRunID only when it exceeds +// the recorded value, leaving a higher recorded value in place. The +// durability pull uses it to accumulate freshness evidence about a +// relayed target across pulls: a target is append-only, so once a push +// covered origin run R every coordinate up to R was pushed and persists, +// making the highest coordinate ever observed the soundest cached fact. +// A stale pull reporting a lower value never rewinds the puller — the +// monotonic accumulation mirrors the durability vector's pull merge. 
+func (s *Store) MergeDestinationPushFreshness(ctx context.Context, volumeID int64, destination string, originNodeID, originRunID int64) error { + if destination == "" { + return fmt.Errorf("MergeDestinationPushFreshness: destination must be non-empty") + } + _, err := s.db.ExecContext(ctx, ` + INSERT INTO destination_push_freshness (volume_id, destination, origin_node_id, origin_run_id, updated_at_ns) + VALUES (?, ?, ?, ?, ?) + ON CONFLICT(volume_id, destination, origin_node_id) DO UPDATE SET + origin_run_id = excluded.origin_run_id, + updated_at_ns = excluded.updated_at_ns + WHERE excluded.origin_run_id > destination_push_freshness.origin_run_id + `, volumeID, destination, originNodeID, originRunID, NowNs()) + if err != nil { + return fmt.Errorf("merge destination_push_freshness: %w", err) + } + return nil +} + +// ListDestinationPushFreshness returns every freshness coordinate for one +// (volume, destination), ordered by origin node id. An empty slice means +// the destination has no recorded whole-volume push for the volume yet — +// which the offload gate reads as "no freshness evidence" and refuses a +// relayed target on. +func (s *Store) ListDestinationPushFreshness(ctx context.Context, volumeID int64, destination string) ([]DestinationPushFreshness, error) { + return queryRows(ctx, s.db, + `SELECT volume_id, destination, origin_node_id, origin_run_id, updated_at_ns + FROM destination_push_freshness + WHERE volume_id = ? AND destination = ? + ORDER BY origin_node_id`, + scanDestinationPushFreshness, volumeID, destination) +} + +// ListVolumeDestinationPushFreshness returns every freshness coordinate +// for the volume across all destinations, ordered by destination then +// origin node id. The peer durability endpoint serves it so a relayed +// target's freshness evidence travels to a node that never pushes there. 
+func (s *Store) ListVolumeDestinationPushFreshness(ctx context.Context, volumeID int64) ([]DestinationPushFreshness, error) { + return queryRows(ctx, s.db, + `SELECT volume_id, destination, origin_node_id, origin_run_id, updated_at_ns + FROM destination_push_freshness + WHERE volume_id = ? + ORDER BY destination, origin_node_id`, + scanDestinationPushFreshness, volumeID) +} + +func scanDestinationPushFreshness(s rowScanner) (DestinationPushFreshness, error) { + var f DestinationPushFreshness + err := s.Scan(&f.VolumeID, &f.Destination, &f.OriginNodeID, &f.OriginRunID, &f.UpdatedAtNs) + return f, err +} diff --git a/store/destination_push_freshness_test.go b/store/destination_push_freshness_test.go new file mode 100644 index 0000000..8f59d82 --- /dev/null +++ b/store/destination_push_freshness_test.go @@ -0,0 +1,141 @@ +package store + +import ( + "context" + "testing" +) + +// TestUpsertDestinationPushFreshnessOverwrites: the push-side upsert +// overwrites the coordinate to the latest push's value, including +// lowering it when a later push covered less (content removed from the +// pushing node's present set). The non-monotonic overwrite is what lets +// the offload gate refuse a relayed file the most recent push no longer +// covers. 
+func TestUpsertDestinationPushFreshnessOverwrites(t *testing.T) { + s := openTestStore(t) + ctx := context.Background() + vID := makeVolume(t, s, "/v") + self, err := s.GetSelfNode(ctx) + if err != nil { + t.Fatalf("GetSelfNode: %v", err) + } + + if err := s.UpsertDestinationPushFreshness(ctx, vID, "offsite", self.ID, 10); err != nil { + t.Fatalf("upsert 10: %v", err) + } + if err := s.UpsertDestinationPushFreshness(ctx, vID, "offsite", self.ID, 4); err != nil { + t.Fatalf("upsert 4: %v", err) + } + fresh, err := s.ListDestinationPushFreshness(ctx, vID, "offsite") + if err != nil { + t.Fatalf("ListDestinationPushFreshness: %v", err) + } + if len(fresh) != 1 || fresh[0].OriginRunID != 4 { + t.Fatalf("freshness = %+v, want one coordinate at 4 (overwrite, not max)", fresh) + } +} + +// TestMergeDestinationPushFreshnessMonotonic: the pull-side merge raises +// the coordinate only, so a stale pull never lowers the puller's cached +// evidence. Append-only targets make the highest coordinate ever observed +// the soundest cached fact. 
+func TestMergeDestinationPushFreshnessMonotonic(t *testing.T) { + s := openTestStore(t) + ctx := context.Background() + vID := makeVolume(t, s, "/v") + self, err := s.GetSelfNode(ctx) + if err != nil { + t.Fatalf("GetSelfNode: %v", err) + } + + if err := s.MergeDestinationPushFreshness(ctx, vID, "offsite", self.ID, 10); err != nil { + t.Fatalf("merge 10: %v", err) + } + if err := s.MergeDestinationPushFreshness(ctx, vID, "offsite", self.ID, 4); err != nil { + t.Fatalf("merge 4: %v", err) + } + if err := s.MergeDestinationPushFreshness(ctx, vID, "offsite", self.ID, 12); err != nil { + t.Fatalf("merge 12: %v", err) + } + fresh, err := s.ListDestinationPushFreshness(ctx, vID, "offsite") + if err != nil { + t.Fatalf("ListDestinationPushFreshness: %v", err) + } + if len(fresh) != 1 || fresh[0].OriginRunID != 12 { + t.Fatalf("freshness = %+v, want one coordinate at 12 (monotonic max)", fresh) + } +} + +// TestAdvanceDestinationVectorRecordsFreshness: the snapshot-pinned +// advance path records push freshness from the same components it +// advances the vector with, so every gating whole-volume push leaves +// origin-space freshness behind for a downstream relayed target. 
+func TestAdvanceDestinationVectorRecordsFreshness(t *testing.T) { + s := openTestStore(t) + ctx := context.Background() + vID := makeVolume(t, s, "/v") + self, err := s.GetSelfNode(ctx) + if err != nil { + t.Fatalf("GetSelfNode: %v", err) + } + ext, err := s.CreateNode(ctx, "ext", "peer://ext") + if err != nil { + t.Fatalf("CreateNode: %v", err) + } + + components := []OriginComponent{ + {OriginNodeID: self.ID, OriginRunID: 5}, + {OriginNodeID: ext.ID, OriginRunID: 42}, + } + if err := s.AdvanceDestinationVectorTo(ctx, vID, "offsite", VerifyMethodBlake3, components); err != nil { + t.Fatalf("AdvanceDestinationVectorTo: %v", err) + } + fresh, err := s.ListDestinationPushFreshness(ctx, vID, "offsite") + if err != nil { + t.Fatalf("ListDestinationPushFreshness: %v", err) + } + byNode := map[int64]int64{} + for _, f := range fresh { + byNode[f.OriginNodeID] = f.OriginRunID + } + if len(byNode) != 2 || byNode[self.ID] != 5 || byNode[ext.ID] != 42 { + t.Fatalf("freshness = %+v, want self→5 ext→42", byNode) + } +} + +// TestListVolumeDestinationPushFreshness returns coordinates across every +// destination of the volume, ordered by destination then origin node — +// the listing the durability endpoint serves. 
+func TestListVolumeDestinationPushFreshness(t *testing.T) { + s := openTestStore(t) + ctx := context.Background() + vID := makeVolume(t, s, "/v") + other := makeVolume(t, s, "/other") + self, err := s.GetSelfNode(ctx) + if err != nil { + t.Fatalf("GetSelfNode: %v", err) + } + for _, a := range []struct { + volID int64 + dest string + run int64 + }{ + {vID, "offsite-b", 7}, + {vID, "offsite-a", 3}, + {other, "offsite-a", 99}, + } { + if err := s.UpsertDestinationPushFreshness(ctx, a.volID, a.dest, self.ID, a.run); err != nil { + t.Fatalf("upsert %s: %v", a.dest, err) + } + } + got, err := s.ListVolumeDestinationPushFreshness(ctx, vID) + if err != nil { + t.Fatalf("ListVolumeDestinationPushFreshness: %v", err) + } + if len(got) != 2 { + t.Fatalf("got %d coordinates, want 2 (the other volume excluded)", len(got)) + } + if got[0].Destination != "offsite-a" || got[1].Destination != "offsite-b" { + t.Fatalf("order = %q, %q, want offsite-a then offsite-b", got[0].Destination, got[1].Destination) + } +} diff --git a/store/destination_run_ids.go b/store/destination_run_ids.go new file mode 100644 index 0000000..c8d4746 --- /dev/null +++ b/store/destination_run_ids.go @@ -0,0 +1,404 @@ +package store + +import ( + "context" + "database/sql" + "errors" + "fmt" +) + +// Verification methods recorded on a durability component's +// VerifyMethod. They name the comparison that last advanced the +// component, so the offload gate can require a genuinely content-checked +// method before it deletes the only local copy. sync re-exports these as +// its VerifyMethod* identifiers, keeping one source of truth. +const ( + // VerifyMethodBlake3 is rclone's end-to-end content check + // (--checksum --hash blake3). + VerifyMethodBlake3 = "blake3" + // VerifyMethodSizeMtime is rclone's default size+mtime comparison, + // used for --shallow runs and forced by crypt destinations. Not a + // content check. 
+ VerifyMethodSizeMtime = "size+mtime" + // VerifyMethodPeer is the node-sync handshake's receiver-side BLAKE3 + // re-hash of every delivered path. + VerifyMethodPeer = "peer-blake3" + // VerifyMethodKopia is kopia's own repository verification + // (`kopia snapshot verify`). + VerifyMethodKopia = "kopia-verify" + // VerifyMethodPresenceSize is the content-addressed push's check: + // presence plus the expected ciphertext size, no content hash. Not a + // content check on its own — a verified scan-back fingerprint must + // back the object before such a component gates offload. + VerifyMethodPresenceSize = "presence+size" +) + +// ContentVerifiedMethod reports whether a durability component advanced +// by method carries genuine content verification — the precondition the +// offload gate applies before deleting a local copy. A presence-only or +// size+mtime method is not content-verified; an empty method (a pre-v19 +// component, or one whose provenance is unknown) is treated as +// unverified so the gate refuses rather than over-claims. +func ContentVerifiedMethod(method string) bool { + switch method { + case VerifyMethodBlake3, VerifyMethodPeer, VerifyMethodKopia: + return true + default: + return false + } +} + +// KnownVerifyMethod reports whether method is one of the defined +// verification-method identifiers above. It exists for the wire +// boundary: a pulled durability component carries its origin's +// VerifyMethod verbatim, and the offload gate later switches on it +// (ContentVerifiedMethod), so an unrecognised non-empty method should be +// refused on receipt rather than stored and silently ignored. The empty +// method (a pre-v19 row, or one whose provenance is unknown) is a +// legitimate "unverified" state and is deliberately excluded here; +// callers that accept it test for "" explicitly. 
+func KnownVerifyMethod(method string) bool { + switch method { + case VerifyMethodBlake3, VerifyMethodSizeMtime, VerifyMethodPeer, VerifyMethodKopia, VerifyMethodPresenceSize: + return true + default: + return false + } +} + +// DestinationRunID is one component of a destination's durability +// version vector: the highest origin-space run id of OriginNodeID's +// content known durable on Destination for VolumeID. Destination is the +// unified target name — a bucket destination or a peer node name, the +// same namespace runs.destination uses. OriginRunID is in the origin +// node's run space, so like contents.origin_run_id it is not a local +// runs FK. Content with origin (N, r) is durable on a destination iff +// the vector's component for N is ≥ r. VerifyMethod names the comparison +// that last advanced the component (empty for a pre-v19 row). +type DestinationRunID struct { + VolumeID int64 + Destination string + OriginNodeID int64 + OriginRunID int64 + UpdatedAtNs int64 + VerifyMethod string +} + +// DestinationRunIDHistory is one row of the insert-only +// destination_run_ids_history log: a single vector-component advance. +// AtNs is the insertion timestamp; rows written in the same tick still +// order by id. VerifyMethod records the method behind this advance. +type DestinationRunIDHistory struct { + ID int64 + VolumeID int64 + Destination string + OriginNodeID int64 + OriginRunID int64 + AtNs int64 + VerifyMethod string +} + +// DestinationRewindError carries the rejected and current vector +// components when UpsertDestinationRunID refuses a backwards move. It +// wraps ErrWatermarkRewind so errors.Is matches the shared sentinel. 
+type DestinationRewindError struct { + VolumeID int64 + Destination string + OriginNodeID int64 + Current int64 + Attempted int64 +} + +func (e *DestinationRewindError) Error() string { + return fmt.Sprintf( + "destination %q watermark for volume %d origin node %d would move backwards from %d to %d; pass allowRewind to override", + e.Destination, e.VolumeID, e.OriginNodeID, e.Current, e.Attempted) +} + +func (e *DestinationRewindError) Unwrap() error { return ErrWatermarkRewind } + +// GetDestinationRunID returns one vector component, or sql.ErrNoRows +// when the destination has never recorded durability for content +// originating at originNodeID. "No row" imposes no floor: any origin +// run id advances from it. +func (s *Store) GetDestinationRunID(ctx context.Context, volumeID int64, destination string, originNodeID int64) (DestinationRunID, error) { + row := s.db.QueryRowContext(ctx, + `SELECT volume_id, destination, origin_node_id, origin_run_id, updated_at_ns, verify_method + FROM destination_run_ids + WHERE volume_id = ? AND destination = ? AND origin_node_id = ?`, + volumeID, destination, originNodeID) + return scanDestinationRunID(row) +} + +// ListDestinationRunIDs returns the full durability vector for one +// (volume, destination), ordered by origin node id. An empty slice +// means the destination has no recorded durability yet. +func (s *Store) ListDestinationRunIDs(ctx context.Context, volumeID int64, destination string) ([]DestinationRunID, error) { + return queryRows(ctx, s.db, + `SELECT volume_id, destination, origin_node_id, origin_run_id, updated_at_ns, verify_method + FROM destination_run_ids + WHERE volume_id = ? AND destination = ? + ORDER BY origin_node_id`, + scanDestinationRunID, volumeID, destination) +} + +// ListVolumeDestinationRunIDs returns every recorded vector component +// for the volume across all destinations, ordered by destination then +// origin node id. 
The peer durability endpoint serves this listing so +// a peer can hold offline evidence about destinations only this node +// can see. +func (s *Store) ListVolumeDestinationRunIDs(ctx context.Context, volumeID int64) ([]DestinationRunID, error) { + return queryRows(ctx, s.db, + `SELECT volume_id, destination, origin_node_id, origin_run_id, updated_at_ns, verify_method + FROM destination_run_ids + WHERE volume_id = ? + ORDER BY destination, origin_node_id`, + scanDestinationRunID, volumeID) +} + +func scanDestinationRunID(s rowScanner) (DestinationRunID, error) { + var d DestinationRunID + var method sql.NullString + err := s.Scan(&d.VolumeID, &d.Destination, &d.OriginNodeID, &d.OriginRunID, &d.UpdatedAtNs, &method) + d.VerifyMethod = method.String + return d, err +} + +// AdvanceDestinationVectorTo advances the destination's durability +// vector to exactly the supplied components, tagging each with +// verifyMethod. Callers compute the components once from the push's own +// enumeration snapshot (a content-addressed delta, a peer plan, a +// pre-transfer listing) so the advance reflects only what was actually +// transferred — never a wider live set re-read after the transfer, which +// would claim durability for rows committed mid-push. Each component +// routes through the monotonic upsert; an attempted rewind is skipped +// (the recorded floor already covers it). This is the single +// advancement path the destination handlers and the peer-sync initiator +// use rather than writing components directly. 
+func (s *Store) AdvanceDestinationVectorTo(ctx context.Context, volumeID int64, destination, verifyMethod string, components []OriginComponent) error { + for _, c := range components { + err := s.upsertDestinationRunID(ctx, volumeID, destination, c.OriginNodeID, c.OriginRunID, verifyMethod, false) + if errors.Is(err, ErrWatermarkRewind) { + continue + } + if err != nil { + return err + } + } + return s.recordPushFreshness(ctx, volumeID, destination, components) +} + +// recordPushFreshness overwrites the destination's push-freshness maxima +// to exactly the supplied snapshot — the per-origin-node maxima of the +// present set this push enumerated. Distinct from the monotonic vector +// advance above: freshness reflects only the latest push, so a push that +// dropped content from the present set lowers the maxima. The offload +// gate reads it as origin-space freshness for a relayed target. +// +// A node absent from the snapshot keeps its prior freshness row: the push +// enumerated no present content for that origin, which says nothing about +// whether that node's earlier content stopped being fresh, so leaving the +// row is the conservative choice (the monotonic vector still governs +// durability). +func (s *Store) recordPushFreshness(ctx context.Context, volumeID int64, destination string, components []OriginComponent) error { + for _, c := range components { + if err := s.UpsertDestinationPushFreshness(ctx, volumeID, destination, c.OriginNodeID, c.OriginRunID); err != nil { + return err + } + } + return nil +} + +// OriginComponent is one (origin node, max origin run) pair computed +// over a volume's present rows by PresentOriginMaxima. +type OriginComponent struct { + OriginNodeID int64 + OriginRunID int64 +} + +// PresentOriginMaxima computes the per-origin-node maximum origin run +// over the volume's present files, deduplicated to one coordinate per +// content. 
Content whose origin is NULL (or partially NULL — degraded +// the same way the conflict pre-stage treats partial provenance) maps +// to selfNodeID at its introduction run, mirroring +// ContentIntroductionRunID (the introduction MIN spans every +// observation of the content, any status). The reserved sync subtrees +// are excluded from the present set — they never travel to a +// destination, so they must not advance its evidence. +// +// Handlers capture this snapshot before the transfer and feed it to +// AdvanceDestinationVectorTo, so a row committed between the snapshot and +// the advance is never claimed durable. +func (s *Store) PresentOriginMaxima(ctx context.Context, volumeID, selfNodeID int64) ([]OriginComponent, error) { + return queryRows(ctx, s.db, ` + WITH present_contents AS ( + SELECT DISTINCT f.content_id, c.origin_node_id, c.origin_run_id + FROM files f + JOIN folders fo ON fo.id = f.folder_id + JOIN contents c ON c.id = f.content_id + WHERE fo.volume_id = ? AND f.status = 'present' + AND `+reservedSubtreeFilter+` + ) + SELECT + CASE WHEN pc.origin_node_id IS NULL OR pc.origin_run_id IS NULL + THEN ? ELSE pc.origin_node_id END AS origin_node, + MAX(CASE WHEN pc.origin_node_id IS NULL OR pc.origin_run_id IS NULL + THEN (SELECT MIN(f2.first_seen_run_id) + FROM files f2 + JOIN folders fo2 ON fo2.id = f2.folder_id + WHERE fo2.volume_id = ? AND f2.content_id = pc.content_id) + ELSE pc.origin_run_id END) AS origin_run + FROM present_contents pc + GROUP BY origin_node + ORDER BY origin_node + `, scanOriginComponent, volumeID, selfNodeID, volumeID) +} + +func scanOriginComponent(s rowScanner) (OriginComponent, error) { + var c OriginComponent + err := s.Scan(&c.OriginNodeID, &c.OriginRunID) + return c, err +} + +// UpsertDestinationRunID advances one component of a destination's +// durability vector to originRunID, recording no verification method +// (the component reads as unverified to the offload gate until a typed +// advance re-stamps it). 
Callers invoke it only once the destination has +// verifiably landed every piece of content up to that origin run — a +// failed or partial push leaves the prior value in place. +// +// The component is meant to advance monotonically: the upsert statement +// itself only applies when originRunID is at or above the recorded value +// (or allowRewind is set — the opt-in for genuine recovery, mirroring +// UpsertPeerSyncState), so a racing writer cannot regress the vector. A +// refused rewind surfaces as a *DestinationRewindError (wrapping +// ErrWatermarkRewind). +// +// The upsert and an insert-only destination_run_ids_history row are +// written in one transaction so the append-only advance log can never +// diverge from the live vector. +// +// verify_method follows the component: a non-empty method always wins +// (an advance or a re-confirmation that upgrades the recorded method); an +// empty method clears it when the run strictly advances (a new, +// unverified coordinate) but preserves the existing method when the run +// is unchanged, so a methodless re-confirmation (e.g. a pull from a +// pre-v19 peer) never degrades a content-verified component to unknown. +func (s *Store) UpsertDestinationRunID(ctx context.Context, volumeID int64, destination string, originNodeID, originRunID int64, allowRewind bool) error { + return s.upsertDestinationRunID(ctx, volumeID, destination, originNodeID, originRunID, "", allowRewind) +} + +// UpsertDestinationRunIDVerified is UpsertDestinationRunID with an +// explicit verification method recorded on the component — the entry +// point the durability pull uses to carry a peer's reported method +// verbatim, so the puller's offload gate weighs a pulled component +// exactly as the responder did. 
+func (s *Store) UpsertDestinationRunIDVerified(ctx context.Context, volumeID int64, destination string, originNodeID, originRunID int64, verifyMethod string, allowRewind bool) error { + return s.upsertDestinationRunID(ctx, volumeID, destination, originNodeID, originRunID, verifyMethod, allowRewind) +} + +func (s *Store) upsertDestinationRunID(ctx context.Context, volumeID int64, destination string, originNodeID, originRunID int64, verifyMethod string, allowRewind bool) error { + if destination == "" { + return fmt.Errorf("UpsertDestinationRunID: destination must be non-empty") + } + tx, err := s.db.BeginTx(ctx, nil) + if err != nil { + return fmt.Errorf("begin upsert destination_run_ids: %w", err) + } + defer func() { _ = tx.Rollback() }() + + atNs := NowNs() + method := nullableString(verifyMethod) + res, err := tx.ExecContext(ctx, ` + INSERT INTO destination_run_ids (volume_id, destination, origin_node_id, origin_run_id, updated_at_ns, verify_method) + VALUES (?, ?, ?, ?, ?, ?) + ON CONFLICT(volume_id, destination, origin_node_id) DO UPDATE SET + origin_run_id = excluded.origin_run_id, + updated_at_ns = excluded.updated_at_ns, + verify_method = CASE + WHEN excluded.verify_method IS NOT NULL THEN excluded.verify_method + WHEN excluded.origin_run_id > destination_run_ids.origin_run_id THEN NULL + ELSE destination_run_ids.verify_method + END + WHERE excluded.origin_run_id >= destination_run_ids.origin_run_id OR ? 
+ `, volumeID, destination, originNodeID, originRunID, atNs, method, allowRewind) + if err != nil { + return fmt.Errorf("upsert destination_run_ids: %w", err) + } + n, err := res.RowsAffected() + if err != nil { + return fmt.Errorf("upsert destination_run_ids rows: %w", err) + } + if n == 0 { + if err := guardDestinationMonotonicTx(ctx, tx, volumeID, destination, originNodeID, originRunID); err != nil { + return err + } + return fmt.Errorf("upsert destination_run_ids: conditional update applied no row for (%d, %q, %d)", volumeID, destination, originNodeID) + } + if _, err := tx.ExecContext(ctx, ` + INSERT INTO destination_run_ids_history + (volume_id, destination, origin_node_id, origin_run_id, at_ns, verify_method) + VALUES (?, ?, ?, ?, ?, ?) + `, volumeID, destination, originNodeID, originRunID, atNs, method); err != nil { + return fmt.Errorf("append destination_run_ids_history: %w", err) + } + if err := tx.Commit(); err != nil { + return fmt.Errorf("commit upsert destination_run_ids: %w", err) + } + return nil +} + +// nullableString maps "" to a SQL NULL so an unset verify method stays +// NULL rather than an empty string the gate would have to special-case. +func nullableString(s string) sql.NullString { + return sql.NullString{String: s, Valid: s != ""} +} + +// guardDestinationMonotonicTx reads the current vector component inside +// tx and returns a *DestinationRewindError when attempted is strictly +// below it. No row imposes no floor. Called after the conditional +// upsert applied nothing, to turn the refusal into a precise error. +func guardDestinationMonotonicTx(ctx context.Context, tx *sql.Tx, volumeID int64, destination string, originNodeID, attempted int64) error { + var current int64 + err := tx.QueryRowContext(ctx, + `SELECT origin_run_id FROM destination_run_ids + WHERE volume_id = ? AND destination = ? 
AND origin_node_id = ?`, + volumeID, destination, originNodeID).Scan(&current) + switch { + case errors.Is(err, sql.ErrNoRows): + return nil + case err != nil: + return fmt.Errorf("read current destination run id: %w", err) + } + if attempted < current { + return &DestinationRewindError{ + VolumeID: volumeID, + Destination: destination, + OriginNodeID: originNodeID, + Current: current, + Attempted: attempted, + } + } + return nil +} + +// ListDestinationRunIDHistory returns every advance recorded for the +// (volume, destination) pair across all origin nodes, oldest first +// (ascending id, which is insertion order). An empty slice means no +// recorded advances. +func (s *Store) ListDestinationRunIDHistory(ctx context.Context, volumeID int64, destination string) ([]DestinationRunIDHistory, error) { + return queryRows(ctx, s.db, ` + SELECT id, volume_id, destination, origin_node_id, origin_run_id, at_ns, verify_method + FROM destination_run_ids_history + WHERE volume_id = ? AND destination = ? + ORDER BY id + `, scanDestinationRunIDHistory, volumeID, destination) +} + +func scanDestinationRunIDHistory(s rowScanner) (DestinationRunIDHistory, error) { + var h DestinationRunIDHistory + var method sql.NullString + err := s.Scan(&h.ID, &h.VolumeID, &h.Destination, &h.OriginNodeID, &h.OriginRunID, &h.AtNs, &method) + h.VerifyMethod = method.String + return h, err +} diff --git a/store/destination_run_ids_test.go b/store/destination_run_ids_test.go new file mode 100644 index 0000000..79ad41c --- /dev/null +++ b/store/destination_run_ids_test.go @@ -0,0 +1,651 @@ +package store + +import ( + "context" + "database/sql" + "errors" + "path/filepath" + "testing" +) + +// TestMigrateV18ToV19AddsVerifyMethod builds a minimal v18 database with +// a pre-existing durability component and confirms the migration adds +// verify_method (NULL on the carried-over row) without disturbing the +// recorded coordinate. 
+func TestMigrateV18ToV19AddsVerifyMethod(t *testing.T) { + dsn := filepath.Join(t.TempDir(), "test.db") + rawDB, err := sql.Open("sqlite", dsn) + if err != nil { + t.Fatalf("raw sql.Open: %v", err) + } + v18DDL := []string{ + `CREATE TABLE schema_version (version INTEGER NOT NULL PRIMARY KEY)`, + `CREATE TABLE volumes (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE, path TEXT NOT NULL)`, + `CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE, endpoint TEXT, public_key_fingerprint TEXT)`, + // contents exists from v14 on; a real v18 DB carries it, and the + // v20→v21 triggers attach to it, so the minimal fixture must too. + `CREATE TABLE contents ( + id INTEGER PRIMARY KEY, + blake3 BLOB NOT NULL UNIQUE CHECK (length(blake3) = 32), + size_bytes INTEGER NOT NULL, + origin_node_id INTEGER REFERENCES nodes(id), + origin_run_id INTEGER + )`, + `CREATE TABLE destination_run_ids ( + volume_id INTEGER NOT NULL REFERENCES volumes(id), + destination TEXT NOT NULL, + origin_node_id INTEGER NOT NULL REFERENCES nodes(id), + origin_run_id INTEGER NOT NULL, + updated_at_ns INTEGER NOT NULL, + PRIMARY KEY (volume_id, destination, origin_node_id) + )`, + `CREATE TABLE destination_run_ids_history ( + id INTEGER PRIMARY KEY, + volume_id INTEGER NOT NULL, + destination TEXT NOT NULL, + origin_node_id INTEGER NOT NULL, + origin_run_id INTEGER NOT NULL, + at_ns INTEGER NOT NULL + )`, + `INSERT INTO schema_version (version) VALUES (18)`, + `INSERT INTO volumes (id, name, path) VALUES (1, 'v', '/v')`, + `INSERT INTO nodes (id, name) VALUES (1, 'self')`, + `INSERT INTO destination_run_ids (volume_id, destination, origin_node_id, origin_run_id, updated_at_ns) + VALUES (1, 'bucket', 1, 7, 100)`, + } + for _, q := range v18DDL { + if _, err := rawDB.Exec(q); err != nil { + t.Fatalf("v18 DDL %q: %v", q, err) + } + } + rawDB.Close() + + s, err := Open(dsn) + if err != nil { + t.Fatalf("Open (migrates v18→v19): %v", err) + } + defer s.Close() + ctx := context.Background() 
+ if v, _ := s.CurrentSchemaVersion(ctx); v != SchemaVersion { + t.Fatalf("schema_version = %d, want %d", v, SchemaVersion) + } + got, err := s.GetDestinationRunID(ctx, 1, "bucket", 1) + if err != nil { + t.Fatalf("GetDestinationRunID: %v", err) + } + if got.OriginRunID != 7 { + t.Fatalf("origin run = %d, want 7 (carried over)", got.OriginRunID) + } + if got.VerifyMethod != "" { + t.Fatalf("verify method = %q, want empty (NULL backfill)", got.VerifyMethod) + } +} + +// TestUpsertDestinationRunIDWritesHistory: every successful advance +// appends one destination_run_ids_history row alongside updating the +// live vector component — the same append-only contract the peer-sync +// watermark has. +func TestUpsertDestinationRunIDWritesHistory(t *testing.T) { + s := openTestStore(t) + ctx := context.Background() + vID := makeVolume(t, s, "/v") + node, err := s.GetSelfNode(ctx) + if err != nil { + t.Fatalf("GetSelfNode: %v", err) + } + + for _, run := range []int64{7, 20, 42} { + if err := s.UpsertDestinationRunID(ctx, vID, "bucket-a", node.ID, run, false); err != nil { + t.Fatalf("UpsertDestinationRunID(%d): %v", run, err) + } + } + + history, err := s.ListDestinationRunIDHistory(ctx, vID, "bucket-a") + if err != nil { + t.Fatalf("ListDestinationRunIDHistory: %v", err) + } + if len(history) != 3 { + t.Fatalf("history rows = %d, want 3", len(history)) + } + want := []int64{7, 20, 42} + for i, h := range history { + if h.OriginRunID != want[i] { + t.Fatalf("history[%d] origin run = %d, want %d", i, h.OriginRunID, want[i]) + } + if h.VolumeID != vID || h.Destination != "bucket-a" || h.OriginNodeID != node.ID { + t.Fatalf("history[%d] key = (%d,%q,%d), want (%d,%q,%d)", + i, h.VolumeID, h.Destination, h.OriginNodeID, vID, "bucket-a", node.ID) + } + } + + got, err := s.GetDestinationRunID(ctx, vID, "bucket-a", node.ID) + if err != nil { + t.Fatalf("GetDestinationRunID: %v", err) + } + if got.OriginRunID != 42 { + t.Fatalf("live origin run = %d, want 42", got.OriginRunID) + } 
+} + +// TestUpsertDestinationRunIDRefusesRewind: a component move below the +// recorded value is refused by default with a *DestinationRewindError +// (wrapping the shared ErrWatermarkRewind), the live row is left +// untouched, and no history row is appended for the rejected move. +// allowRewind overrides for genuine recovery and is logged to history. +func TestUpsertDestinationRunIDRefusesRewind(t *testing.T) { + s := openTestStore(t) + ctx := context.Background() + vID := makeVolume(t, s, "/v") + node, err := s.GetSelfNode(ctx) + if err != nil { + t.Fatalf("GetSelfNode: %v", err) + } + + if err := s.UpsertDestinationRunID(ctx, vID, "bucket-a", node.ID, 40, false); err != nil { + t.Fatalf("seed advance: %v", err) + } + + err = s.UpsertDestinationRunID(ctx, vID, "bucket-a", node.ID, 3, false) + if !errors.Is(err, ErrWatermarkRewind) { + t.Fatalf("rewind err = %v, want ErrWatermarkRewind", err) + } + var rewErr *DestinationRewindError + if !errors.As(err, &rewErr) { + t.Fatalf("err = %v, want *DestinationRewindError", err) + } + if rewErr.Current != 40 || rewErr.Attempted != 3 || rewErr.Destination != "bucket-a" { + t.Fatalf("rewind detail = %+v, want current=40 attempted=3 destination=bucket-a", rewErr) + } + + live, err := s.GetDestinationRunID(ctx, vID, "bucket-a", node.ID) + if err != nil { + t.Fatalf("GetDestinationRunID after refusal: %v", err) + } + if live.OriginRunID != 40 { + t.Fatalf("live origin run = %d after refused rewind, want 40", live.OriginRunID) + } + history, err := s.ListDestinationRunIDHistory(ctx, vID, "bucket-a") + if err != nil { + t.Fatalf("ListDestinationRunIDHistory: %v", err) + } + if len(history) != 1 { + t.Fatalf("history rows = %d after refused rewind, want 1", len(history)) + } + + if err := s.UpsertDestinationRunID(ctx, vID, "bucket-a", node.ID, 3, true); err != nil { + t.Fatalf("allowRewind override: %v", err) + } + live, _ = s.GetDestinationRunID(ctx, vID, "bucket-a", node.ID) + if live.OriginRunID != 3 { + t.Fatalf("live 
origin run = %d after override, want 3", live.OriginRunID) + } + history, _ = s.ListDestinationRunIDHistory(ctx, vID, "bucket-a") + if len(history) != 2 { + t.Fatalf("history rows = %d after override, want 2 (override is logged)", len(history)) + } +} + +// TestDestinationRunIDVector: the vector for one destination carries +// one independent component per origin node, scoped per destination +// and per volume. +func TestDestinationRunIDVector(t *testing.T) { + s := openTestStore(t) + ctx := context.Background() + vID := makeVolume(t, s, "/v") + self, err := s.GetSelfNode(ctx) + if err != nil { + t.Fatalf("GetSelfNode: %v", err) + } + peer, err := s.CreateNode(ctx, "peer", "https://peer.example") + if err != nil { + t.Fatalf("CreateNode: %v", err) + } + + advances := []struct { + destination string + nodeID int64 + run int64 + }{ + {"bucket-a", self.ID, 10}, + {"bucket-a", peer.ID, 4}, + {"bucket-b", self.ID, 2}, + } + for _, a := range advances { + if err := s.UpsertDestinationRunID(ctx, vID, a.destination, a.nodeID, a.run, false); err != nil { + t.Fatalf("advance %+v: %v", a, err) + } + } + + vector, err := s.ListDestinationRunIDs(ctx, vID, "bucket-a") + if err != nil { + t.Fatalf("ListDestinationRunIDs: %v", err) + } + if len(vector) != 2 { + t.Fatalf("vector components = %d, want 2", len(vector)) + } + byNode := map[int64]int64{} + for _, c := range vector { + byNode[c.OriginNodeID] = c.OriginRunID + } + if byNode[self.ID] != 10 || byNode[peer.ID] != 4 { + t.Fatalf("vector = %+v, want self→10 peer→4", byNode) + } + + // bucket-b's component for self is independent of bucket-a's. + got, err := s.GetDestinationRunID(ctx, vID, "bucket-b", self.ID) + if err != nil { + t.Fatalf("GetDestinationRunID bucket-b: %v", err) + } + if got.OriginRunID != 2 { + t.Fatalf("bucket-b self component = %d, want 2", got.OriginRunID) + } +} + +// TestUpsertDestinationRunIDRejectsEmptyDestination: the destination is +// the vector's identity, so it must be non-empty. 
+func TestUpsertDestinationRunIDRejectsEmptyDestination(t *testing.T) { + s := openTestStore(t) + ctx := context.Background() + vID := makeVolume(t, s, "/v") + node, err := s.GetSelfNode(ctx) + if err != nil { + t.Fatalf("GetSelfNode: %v", err) + } + if err := s.UpsertDestinationRunID(ctx, vID, "", node.ID, 1, false); err == nil { + t.Fatalf("empty destination accepted, want error") + } +} + +// advanceFromPresentSet snapshots the volume's present-set origin maxima +// and advances the destination's vector to exactly that snapshot, the +// snapshot-pinned path every handler drives. Tests use it to exercise the +// PresentOriginMaxima → AdvanceDestinationVectorTo pair the way production +// does. +func advanceFromPresentSet(t *testing.T, s *Store, volumeID int64, destination string) { + t.Helper() + ctx := context.Background() + self, err := s.GetSelfNode(ctx) + if err != nil { + t.Fatalf("GetSelfNode: %v", err) + } + components, err := s.PresentOriginMaxima(ctx, volumeID, self.ID) + if err != nil { + t.Fatalf("PresentOriginMaxima: %v", err) + } + if err := s.AdvanceDestinationVectorTo(ctx, volumeID, destination, VerifyMethodPeer, components); err != nil { + t.Fatalf("AdvanceDestinationVectorTo: %v", err) + } +} + +// TestAdvanceDestinationVector: the advance computes one component per +// origin node over the volume's present rows — locally-introduced +// content under the self node at its introduction run (the content's +// earliest first_seen, so a duplicate path observed later doesn't move +// the coordinate), forwarded content under its recorded origin +// verbatim — and excludes non-present rows and the reserved sync +// subtrees. 
+func TestAdvanceDestinationVector(t *testing.T) { + s := openTestStore(t) + ctx := context.Background() + vID := makeVolume(t, s, "/v") + run1 := makeRun(t, s, vID) + run2 := makeRun(t, s, vID) + run3 := makeRun(t, s, vID) + self, err := s.GetSelfNode(ctx) + if err != nil { + t.Fatalf("GetSelfNode: %v", err) + } + ext, err := s.CreateNode(ctx, "ext", "peer://ext") + if err != nil { + t.Fatalf("CreateNode: %v", err) + } + + upsert := func(path string, b byte, status string, firstSeen int64, prov *Provenance) { + t.Helper() + if err := s.Upsert(ctx, FileRow{ + VolumeID: vID, Path: path, Blake3: digest(b), + SizeBytes: 1, MtimeNs: 1, Status: status, + FirstSeenRunID: firstSeen, LastSeenRunID: firstSeen, IndexedAtNs: 1, + }, prov); err != nil { + t.Fatalf("Upsert %s: %v", path, err) + } + } + upsert("a.txt", 0xA1, StatusPresent, run1, nil) + upsert("b.txt", 0xA2, StatusPresent, run2, nil) + upsert("c.txt", 0xA3, StatusPresent, run1, &Provenance{NodeID: ext.ID, RunID: 50}) + // A duplicate path of a.txt's content first seen at run3: the + // content's introduction run stays run1 — the coordinate the sender + // materialises on the wire — so the self component must stay at + // run2 (b.txt's introduction). + upsert("a-dup.txt", 0xA1, StatusPresent, run3, nil) + // Non-present and reserved-subtree rows must not advance anything: + // gone.txt would push the self component to run3, and the conflict + // leftover would push ext to 999. 
+ upsert("gone.txt", 0xA4, StatusMissing, run3, nil) + upsert(".squirrel-conflicts/run-1/x.bin", 0xA5, StatusPresent, run3, &Provenance{NodeID: ext.ID, RunID: 999}) + + advanceFromPresentSet(t, s, vID, "nas") + vector, err := s.ListDestinationRunIDs(ctx, vID, "nas") + if err != nil { + t.Fatalf("ListDestinationRunIDs: %v", err) + } + byNode := map[int64]int64{} + for _, c := range vector { + byNode[c.OriginNodeID] = c.OriginRunID + } + if len(byNode) != 2 || byNode[self.ID] != run2 || byNode[ext.ID] != 50 { + t.Fatalf("vector = %+v, want self→%d ext→50", byNode, run2) + } +} + +// TestAdvanceDestinationVectorKeepsHigherComponent: a recorded +// component above the computed present-set maximum stays in place (the +// destination is append-only, so the higher floor still holds) and the +// advance reports no error — componentwise max, not a rewind. +func TestAdvanceDestinationVectorKeepsHigherComponent(t *testing.T) { + s := openTestStore(t) + ctx := context.Background() + vID := makeVolume(t, s, "/v") + run1 := makeRun(t, s, vID) + ext, err := s.CreateNode(ctx, "ext", "peer://ext") + if err != nil { + t.Fatalf("CreateNode: %v", err) + } + if err := s.Upsert(ctx, FileRow{ + VolumeID: vID, Path: "c.txt", Blake3: digest(0xB1), + SizeBytes: 1, MtimeNs: 1, Status: StatusPresent, + FirstSeenRunID: run1, LastSeenRunID: run1, IndexedAtNs: 1, + }, &Provenance{NodeID: ext.ID, RunID: 50}); err != nil { + t.Fatalf("Upsert: %v", err) + } + if err := s.UpsertDestinationRunID(ctx, vID, "nas", ext.ID, 60, false); err != nil { + t.Fatalf("seed component: %v", err) + } + + advanceFromPresentSet(t, s, vID, "nas") + got, err := s.GetDestinationRunID(ctx, vID, "nas", ext.ID) + if err != nil { + t.Fatalf("GetDestinationRunID: %v", err) + } + if got.OriginRunID != 60 { + t.Fatalf("ext component = %d, want 60 (higher recorded floor kept)", got.OriginRunID) + } +} + +// TestAdvanceDestinationVectorToPeerSnapshotPinned proves the peer-path +// advance covers only the captured snapshot: a row 
that becomes present +// between snapshot capture and the advance is not folded in. The advance +// is fed the snapshot taken before the row existed, tagged peer-blake3, +// so the later row's higher origin run never reaches the vector. +func TestAdvanceDestinationVectorToPeerSnapshotPinned(t *testing.T) { + s := openTestStore(t) + ctx := context.Background() + vID := makeVolume(t, s, "/v") + run1 := makeRun(t, s, vID) + self, err := s.GetSelfNode(ctx) + if err != nil { + t.Fatalf("GetSelfNode: %v", err) + } + + if err := s.Upsert(ctx, FileRow{ + VolumeID: vID, Path: "a.txt", Blake3: digest(0xC1), + SizeBytes: 1, MtimeNs: 1, Status: StatusPresent, + FirstSeenRunID: run1, LastSeenRunID: run1, IndexedAtNs: 1, + }, nil); err != nil { + t.Fatalf("Upsert a.txt: %v", err) + } + + // Snapshot captured before the second row exists — the peer driver + // takes this before the transfer. + snapshot, err := s.PresentOriginMaxima(ctx, vID, self.ID) + if err != nil { + t.Fatalf("PresentOriginMaxima: %v", err) + } + + // A row committed mid-transfer with a strictly higher introduction run. 
+ run2 := makeRun(t, s, vID) + if err := s.Upsert(ctx, FileRow{ + VolumeID: vID, Path: "b.txt", Blake3: digest(0xC2), + SizeBytes: 1, MtimeNs: 1, Status: StatusPresent, + FirstSeenRunID: run2, LastSeenRunID: run2, IndexedAtNs: 1, + }, nil); err != nil { + t.Fatalf("Upsert b.txt: %v", err) + } + + if err := s.AdvanceDestinationVectorTo(ctx, vID, "nas", VerifyMethodPeer, snapshot); err != nil { + t.Fatalf("AdvanceDestinationVectorTo: %v", err) + } + got, err := s.GetDestinationRunID(ctx, vID, "nas", self.ID) + if err != nil { + t.Fatalf("GetDestinationRunID: %v", err) + } + if got.OriginRunID != run1 { + t.Fatalf("self component = %d, want run1 %d (the mid-transfer row at run2 %d must not be covered)", got.OriginRunID, run1, run2) + } + if got.VerifyMethod != VerifyMethodPeer { + t.Fatalf("verify method = %q, want %q", got.VerifyMethod, VerifyMethodPeer) + } +} + +// TestListVolumeDestinationRunIDs returns components across every +// destination of the volume, ordered by destination then origin node. 
+func TestListVolumeDestinationRunIDs(t *testing.T) { + s := openTestStore(t) + ctx := context.Background() + vID := makeVolume(t, s, "/v") + otherVol := makeVolume(t, s, "/other") + self, err := s.GetSelfNode(ctx) + if err != nil { + t.Fatalf("GetSelfNode: %v", err) + } + for _, a := range []struct { + volID int64 + dest string + run int64 + }{ + {vID, "bucket-b", 7}, + {vID, "bucket-a", 3}, + {otherVol, "bucket-a", 99}, + } { + if err := s.UpsertDestinationRunID(ctx, a.volID, a.dest, self.ID, a.run, false); err != nil { + t.Fatalf("seed %+v: %v", a, err) + } + } + + rows, err := s.ListVolumeDestinationRunIDs(ctx, vID) + if err != nil { + t.Fatalf("ListVolumeDestinationRunIDs: %v", err) + } + if len(rows) != 2 { + t.Fatalf("rows = %d, want 2 (other volume excluded)", len(rows)) + } + if rows[0].Destination != "bucket-a" || rows[0].OriginRunID != 3 || + rows[1].Destination != "bucket-b" || rows[1].OriginRunID != 7 { + t.Fatalf("rows = %+v, want bucket-a→3 then bucket-b→7", rows) + } +} + +// TestAdvanceDestinationVectorToSnapshot is the #103 fix: the advance +// reflects exactly the captured enumeration snapshot, not the live +// present set re-read after a transfer. A content row inserted between +// the snapshot and the advance is NOT claimed durable. +func TestAdvanceDestinationVectorToSnapshot(t *testing.T) { + s := openTestStore(t) + ctx := context.Background() + vID := makeVolume(t, s, "/v") + run1 := makeRun(t, s, vID) + self, err := s.GetSelfNode(ctx) + if err != nil { + t.Fatalf("GetSelfNode: %v", err) + } + + if err := s.Upsert(ctx, FileRow{ + VolumeID: vID, Path: "a.txt", Blake3: digest(0xA1), SizeBytes: 1, MtimeNs: 1, + Status: StatusPresent, FirstSeenRunID: run1, LastSeenRunID: run1, IndexedAtNs: 1, + }, nil); err != nil { + t.Fatalf("Upsert a.txt: %v", err) + } + + // Snapshot captured here, before a second row lands. 
+ snapshot, err := s.PresentOriginMaxima(ctx, vID, self.ID) + if err != nil { + t.Fatalf("PresentOriginMaxima: %v", err) + } + + // A row committed after the snapshot (a mid-push index) advances the + // live present set to run2 — but the snapshot still reads run1. + run2 := makeRun(t, s, vID) + if err := s.Upsert(ctx, FileRow{ + VolumeID: vID, Path: "b.txt", Blake3: digest(0xA2), SizeBytes: 1, MtimeNs: 1, + Status: StatusPresent, FirstSeenRunID: run2, LastSeenRunID: run2, IndexedAtNs: 1, + }, nil); err != nil { + t.Fatalf("Upsert b.txt: %v", err) + } + + if err := s.AdvanceDestinationVectorTo(ctx, vID, "nas", VerifyMethodBlake3, snapshot); err != nil { + t.Fatalf("AdvanceDestinationVectorTo: %v", err) + } + got, err := s.GetDestinationRunID(ctx, vID, "nas", self.ID) + if err != nil { + t.Fatalf("GetDestinationRunID: %v", err) + } + if got.OriginRunID != run1 { + t.Fatalf("self component = %d, want %d (snapshot, not the live run2)", got.OriginRunID, run1) + } + if got.VerifyMethod != VerifyMethodBlake3 { + t.Fatalf("verify method = %q, want %q", got.VerifyMethod, VerifyMethodBlake3) + } +} + +// TestUpsertDestinationRunIDRecordsMethod: the verified entry point +// records the method on the live row and in history; the legacy entry +// point records none. 
+func TestUpsertDestinationRunIDRecordsMethod(t *testing.T) { + s := openTestStore(t) + ctx := context.Background() + vID := makeVolume(t, s, "/v") + node, err := s.GetSelfNode(ctx) + if err != nil { + t.Fatalf("GetSelfNode: %v", err) + } + + if err := s.UpsertDestinationRunIDVerified(ctx, vID, "bucket", node.ID, 5, VerifyMethodKopia, false); err != nil { + t.Fatalf("UpsertDestinationRunIDVerified: %v", err) + } + got, err := s.GetDestinationRunID(ctx, vID, "bucket", node.ID) + if err != nil { + t.Fatalf("GetDestinationRunID: %v", err) + } + if got.VerifyMethod != VerifyMethodKopia { + t.Fatalf("verify method = %q, want %q", got.VerifyMethod, VerifyMethodKopia) + } + hist, err := s.ListDestinationRunIDHistory(ctx, vID, "bucket") + if err != nil { + t.Fatalf("ListDestinationRunIDHistory: %v", err) + } + if len(hist) != 1 || hist[0].VerifyMethod != VerifyMethodKopia { + t.Fatalf("history = %+v, want one row with method %q", hist, VerifyMethodKopia) + } + + if err := s.UpsertDestinationRunID(ctx, vID, "bucket2", node.ID, 5, false); err != nil { + t.Fatalf("UpsertDestinationRunID: %v", err) + } + plain, err := s.GetDestinationRunID(ctx, vID, "bucket2", node.ID) + if err != nil { + t.Fatalf("GetDestinationRunID: %v", err) + } + if plain.VerifyMethod != "" { + t.Fatalf("verify method = %q, want empty (no method recorded)", plain.VerifyMethod) + } +} + +// TestUpsertDestinationRunIDPreservesMethodOnMethodlessReconfirm: a +// methodless re-confirmation at the same origin run (e.g. a pull from a +// pre-v19 peer) must not degrade a recorded content-verified method to +// unknown — provenance is preserved when the run does not advance. 
+func TestUpsertDestinationRunIDPreservesMethodOnMethodlessReconfirm(t *testing.T) { + s := openTestStore(t) + ctx := context.Background() + vID := makeVolume(t, s, "/v") + node, err := s.GetSelfNode(ctx) + if err != nil { + t.Fatalf("GetSelfNode: %v", err) + } + + if err := s.UpsertDestinationRunIDVerified(ctx, vID, "bucket", node.ID, 5, VerifyMethodBlake3, false); err != nil { + t.Fatalf("seed verified: %v", err) + } + // Methodless re-confirm at the same run. + if err := s.UpsertDestinationRunID(ctx, vID, "bucket", node.ID, 5, false); err != nil { + t.Fatalf("methodless reconfirm: %v", err) + } + got, err := s.GetDestinationRunID(ctx, vID, "bucket", node.ID) + if err != nil { + t.Fatalf("GetDestinationRunID: %v", err) + } + if got.VerifyMethod != VerifyMethodBlake3 { + t.Fatalf("verify method = %q, want %q preserved", got.VerifyMethod, VerifyMethodBlake3) + } + + // A methodless advance to a strictly higher run clears the method — + // the new coordinate is genuinely unverified. + if err := s.UpsertDestinationRunID(ctx, vID, "bucket", node.ID, 9, false); err != nil { + t.Fatalf("methodless advance: %v", err) + } + got, err = s.GetDestinationRunID(ctx, vID, "bucket", node.ID) + if err != nil { + t.Fatalf("GetDestinationRunID: %v", err) + } + if got.OriginRunID != 9 || got.VerifyMethod != "" { + t.Fatalf("after methodless advance: run=%d method=%q, want 9 and empty", got.OriginRunID, got.VerifyMethod) + } +} + +// TestDestinationRunIDNullVerifyMethodReadsUnverified pins the v19 +// backfill contract: a component with a NULL verify_method (a pre-v19 +// row, or a legacy upsert) scans back as an empty method, which +// ContentVerifiedMethod treats as not content-verified — so the offload +// gate refuses such a component rather than over-claiming. 
+func TestDestinationRunIDNullVerifyMethodReadsUnverified(t *testing.T) { + s := openTestStore(t) + ctx := context.Background() + vID := makeVolume(t, s, "/v") + node, err := s.GetSelfNode(ctx) + if err != nil { + t.Fatalf("GetSelfNode: %v", err) + } + + if _, err := s.db.ExecContext(ctx, ` + INSERT INTO destination_run_ids (volume_id, destination, origin_node_id, origin_run_id, updated_at_ns, verify_method) + VALUES (?, ?, ?, ?, ?, NULL) + `, vID, "legacy", node.ID, 5, NowNs()); err != nil { + t.Fatalf("insert NULL-method component: %v", err) + } + + got, err := s.GetDestinationRunID(ctx, vID, "legacy", node.ID) + if err != nil { + t.Fatalf("GetDestinationRunID: %v", err) + } + if got.VerifyMethod != "" { + t.Fatalf("verify method = %q, want empty for a NULL column", got.VerifyMethod) + } + if ContentVerifiedMethod(got.VerifyMethod) { + t.Fatalf("a NULL/empty method must not count as content-verified") + } +} + +// TestContentVerifiedMethod pins which methods the offload gate accepts +// as genuine content verification. +func TestContentVerifiedMethod(t *testing.T) { + verified := []string{VerifyMethodBlake3, VerifyMethodPeer, VerifyMethodKopia} + for _, m := range verified { + if !ContentVerifiedMethod(m) { + t.Fatalf("method %q should be content-verified", m) + } + } + for _, m := range []string{VerifyMethodPresenceSize, VerifyMethodSizeMtime, "", "bogus"} { + if ContentVerifiedMethod(m) { + t.Fatalf("method %q must not be content-verified", m) + } + } +} diff --git a/store/files.go b/store/files.go index 4985845..107a2c6 100644 --- a/store/files.go +++ b/store/files.go @@ -10,32 +10,44 @@ import ( "strings" ) -// FileRow is a single indexed file. Path is the file's volume-relative path -// — reconstructed from the underlying (folder_id, name) storage on every -// read. VolumeID references volumes(id). FirstSeenRunID is the run that -// first inserted this row and is never overwritten on subsequent updates; -// LastSeenRunID advances on every observation. 
SourceNodeID and SourceRunID -// record provenance for peer-syncs (NULL means "local write" — today's only -// path). They are populated on read for inspection; writes go through Upsert -// with an explicit *Provenance. +// FileRow is a single indexed file: one path↔content observation joined +// with its contents row. Path is the file's volume-relative path — +// reconstructed from the underlying (folder_id, name) storage on every +// read. VolumeID references volumes(id). ContentID, Blake3, SizeBytes, +// OriginNodeID, and OriginRunID come from the joined contents row; +// origin is where the bytes first entered the system (NULL means +// "introduced locally"), with OriginRunID in the origin node's run +// space. FirstSeenRunID is the run that first inserted this row and is +// never overwritten on subsequent updates; LastSeenRunID advances on +// every observation. Writes go through Upsert, which resolves Blake3 + +// SizeBytes to a contents row (creating one, with the supplied +// *Provenance as its origin, on first contact) and ignores ContentID. type FileRow struct { - VolumeID int64 - Path string - Blake3 []byte // raw 32-byte BLAKE3-256 digest - SizeBytes int64 - MtimeNs int64 - Status string - FirstSeenRunID int64 - LastSeenRunID int64 - IndexedAtNs int64 - SourceNodeID sql.NullInt64 - SourceRunID sql.NullInt64 -} - -// Provenance carries the "who wrote this row" attribution that Upsert -// records on a peer-sourced write. NodeID references nodes(id) and RunID -// references runs(id) on the receiver's side. A nil *Provenance to Upsert -// records NULLs, the convention for local writes (today's only path). 
+ VolumeID int64 + Path string + ContentID int64 + Blake3 []byte // raw 32-byte BLAKE3-256 digest + SizeBytes int64 + MtimeNs int64 + Status string + FirstSeenRunID int64 + LastSeenRunID int64 + IndexedAtNs int64 + OriginNodeID sql.NullInt64 + OriginRunID sql.NullInt64 + StatusChangedRunID sql.NullInt64 +} + +// Provenance carries the "where did this content first enter the +// system" attribution that Upsert records as (origin_node_id, +// origin_run_id) when a write creates a new contents row. NodeID +// references nodes(id); RunID is in the origin node's run space — the +// run at which that node introduced the content — so it is not a local +// runs FK. The pair is propagated verbatim across peer hops, never +// relabelled to the immediate sender. A nil *Provenance records NULLs, +// the convention for locally introduced content. Content that already +// has a contents row keeps its recorded origin — origin is content- +// level first-introduction provenance, not per-observation attribution. type Provenance struct { NodeID int64 RunID int64 @@ -48,22 +60,31 @@ type FileWithVolume struct { Volume Volume } +// File statuses. 'present' and 'missing' describe whether the live +// content was found on disk; 'offloaded' is the intentional sibling of +// 'missing' — the content is the path's current content but its local +// bytes were deliberately removed after being secured elsewhere, so +// indexing and audit treat the on-disk absence as expected. 'superseded' +// rows are the append-only content history of a path. const ( StatusPresent = "present" StatusMissing = "missing" StatusSuperseded = "superseded" + StatusOffloaded = "offloaded" ) // fileSelectColumns is the projection used by every files read. The path -// column is reconstructed from the joined folders row so callers see the -// same FileRow shape as in v7 even though storage is keyed off -// (folder_id, name). 
Pair every new SELECT with this list and the -// fileFromJoin clause below so columns stay in lockstep with scanDests. -const fileSelectColumns = `fo.volume_id, ` + pathFromFolderAndName + `, f.blake3, f.size_bytes, f.mtime_ns, f.status, f.first_seen_run_id, f.last_seen_run_id, f.indexed_at_ns, f.source_node_id, f.source_run_id` +// column is reconstructed from the joined folders row, and the content +// columns (blake3, size, origin) from the joined contents row, so callers +// see one flat FileRow even though storage is split. Pair every new +// SELECT with this list and the fileFromJoin clause below so columns stay +// in lockstep with scanDests. +const fileSelectColumns = `fo.volume_id, ` + pathFromFolderAndName + `, f.content_id, c.blake3, c.size_bytes, f.mtime_ns, f.status, f.first_seen_run_id, f.last_seen_run_id, f.indexed_at_ns, c.origin_node_id, c.origin_run_id, f.status_changed_run_id` // fileFromJoin is the FROM clause every file read uses. files is the inner -// table; folders is joined for volume_id + path reconstruction. -const fileFromJoin = `files f JOIN folders fo ON fo.id = f.folder_id` +// table; folders is joined for volume_id + path reconstruction and +// contents for the content columns. +const fileFromJoin = `files f JOIN folders fo ON fo.id = f.folder_id JOIN contents c ON c.id = f.content_id` // joinedColumns extends fileSelectColumns with the volume row for SELECTs // that pre-resolve the user-facing filesystem path. The volumes JOIN is @@ -88,9 +109,9 @@ func (r *FileRow) scanFrom(s rowScanner) error { // pointers on top so files-half scanning still has one source of truth. 
func (r *FileRow) scanDests() []any { return []any{ - &r.VolumeID, &r.Path, &r.Blake3, &r.SizeBytes, &r.MtimeNs, + &r.VolumeID, &r.Path, &r.ContentID, &r.Blake3, &r.SizeBytes, &r.MtimeNs, &r.Status, &r.FirstSeenRunID, &r.LastSeenRunID, &r.IndexedAtNs, - &r.SourceNodeID, &r.SourceRunID, + &r.OriginNodeID, &r.OriginRunID, &r.StatusChangedRunID, } } @@ -128,9 +149,10 @@ func queryRows[T any](ctx context.Context, db *sql.DB, query string, scan func(r } // GetByPath returns the currently-live row for (volumeID, relPath) — i.e. -// the row with status='present' or status='missing'. Superseded rows -// (historical content at this path) are skipped; use ListHistoryByPath to -// see them. Returns sql.ErrNoRows when no live row exists for the path. +// the row with status 'present', 'missing', or 'offloaded'. Superseded +// rows (historical content at this path) are skipped; use +// ListHistoryByPath to see them. Returns sql.ErrNoRows when no live row +// exists for the path. // // The path-level invariant (at most one non-superseded row per // (folder_id, name)) is enforced by Upsert and the uniq_files_live_per_path @@ -261,7 +283,7 @@ func relPathUnder(base, abs string) (string, bool) { func (s *Store) GetPresentByBlake3InVolume(ctx context.Context, volumeID int64, digest []byte) (FileRow, error) { row := s.db.QueryRowContext(ctx, `SELECT `+fileSelectColumns+` FROM `+fileFromJoin+` - WHERE f.blake3 = ? AND fo.volume_id = ? AND f.status = 'present' + WHERE c.blake3 = ? AND fo.volume_id = ? 
AND f.status = 'present' AND fo.path != '.squirrel-history' AND fo.path NOT LIKE '.squirrel-history/%' AND fo.path != '.squirrel-conflicts' AND fo.path NOT LIKE '.squirrel-conflicts/%' AND fo.path != '.squirrel-restore-history' AND fo.path NOT LIKE '.squirrel-restore-history/%' @@ -272,46 +294,69 @@ func (s *Store) GetPresentByBlake3InVolume(ctx context.Context, volumeID int64, return r, err } +// ContentIntroductionRunID returns the earliest first_seen_run_id among +// every files row (any status) observing contentID in volumeID — the +// local run at which the content was introduced to the volume. This is +// the origin-run coordinate the peer-sync sender materialises for +// locally-introduced content (contents.origin_* NULL). Returns +// sql.ErrNoRows when the content has never been observed in the volume. +func (s *Store) ContentIntroductionRunID(ctx context.Context, volumeID, contentID int64) (int64, error) { + var run sql.NullInt64 + err := s.db.QueryRowContext(ctx, + `SELECT MIN(f.first_seen_run_id) FROM files f + JOIN folders fo ON fo.id = f.folder_id + WHERE fo.volume_id = ? AND f.content_id = ?`, + volumeID, contentID).Scan(&run) + if err != nil { + return 0, fmt.Errorf("content introduction run: %w", err) + } + if !run.Valid { + return 0, sql.ErrNoRows + } + return run.Int64, nil +} + // GetByBlake3 returns all rows matching the given BLAKE3 digest (raw 32 bytes), // joined with their volume. func (s *Store) GetByBlake3(ctx context.Context, digest []byte) ([]FileWithVolume, error) { return queryRows(ctx, s.db, ` SELECT `+joinedColumns+` FROM `+fileFromJoin+` JOIN volumes v ON v.id = fo.volume_id - WHERE f.blake3 = ? + WHERE c.blake3 = ? ORDER BY v.name, `+pathFromFolderAndName+` `, scanFileWithVolume, digest) } // Upsert records an observation of content at a path. It is the only // supported write path for the files table because it enforces the -// "never overwrite a hash" rule: blake3 on an existing row is immutable. 
+// "never overwrite a hash" rule: a row's content_id is immutable. // -// There are three cases, all handled atomically in a single transaction: +// The row's Blake3 + SizeBytes are first resolved to a contents row, +// creating one when the hash has never been seen (with prov as its +// origin). Then there are three cases, all handled atomically in a +// single transaction: // -// 1. A row with the exact (folder_id, name, blake3) already exists and is -// the live row — update its mutable fields (touch / restore from -// missing). first_seen_run_id is preserved. -// 2. A row with the exact (folder_id, name, blake3) exists but is +// 1. A row with the exact (folder_id, name, content_id) already exists +// and is the live row — update its mutable fields (touch / restore +// from missing or offloaded). first_seen_run_id is preserved. +// 2. A row with the exact (folder_id, name, content_id) exists but is // superseded (content has reverted to a previously-seen value) — flip // the currently-live row at this path to 'superseded' and revive the // matched row to the requested status (first_seen_run_id preserved). -// 3. No row exists at (folder_id, name, blake3) — flip the currently-live -// row at (folder_id, name), if any, to 'superseded' and insert the new -// row. +// 3. No row exists at (folder_id, name, content_id) — flip the +// currently-live row at (folder_id, name), if any, to 'superseded' +// and insert the new row. // -// In all cases, blake3 is never rewritten in place; content history at a -// path grows append-only. After the row write succeeds, the affected +// In all cases, content_id is never rewritten in place; content history +// at a path grows append-only. After the row write succeeds, the affected // folder's shallow + deep hashes and every ancestor's deep hash are // recomputed inside the same transaction so the folder Merkle stays // consistent with the live file set (#44). 
// -// prov carries the per-write provenance recorded as (source_node_id, -// source_run_id) on the affected row. A nil *Provenance records NULLs — -// "local write", today's only path. The provenance reflects the current -// observation: cases 1 and 2 rewrite the live row's source columns to the -// new prov (the previous attribution is preserved on the superseded -// row in case 2). +// prov is recorded as the new contents row's (origin_node_id, +// origin_run_id) when this write introduces the content; a nil +// *Provenance records NULLs — "introduced locally". Content that already +// has a contents row keeps its recorded origin. func (s *Store) Upsert(ctx context.Context, r FileRow, prov *Provenance) error { tx, err := s.db.BeginTx(ctx, nil) if err != nil { @@ -350,33 +395,37 @@ func upsertRowInTx(ctx context.Context, tx *sql.Tx, r FileRow, prov *Provenance) if err != nil { return 0, err } + contentID, err := getOrCreateContentTx(ctx, tx, r.Blake3, r.SizeBytes, prov) + if err != nil { + return 0, err + } var existingStatus string err = tx.QueryRowContext(ctx, - `SELECT status FROM files WHERE folder_id = ? AND name = ? AND blake3 = ?`, - folderID, name, r.Blake3).Scan(&existingStatus) + `SELECT status FROM files WHERE folder_id = ? AND name = ? AND content_id = ?`, + folderID, name, contentID).Scan(&existingStatus) switch { case err == nil && existingStatus != StatusSuperseded: // Case 1: exact row exists and is live — touch it. - if err := updateLiveRow(ctx, tx, folderID, name, r, prov); err != nil { + if err := updateLiveRow(ctx, tx, folderID, name, contentID, r); err != nil { return 0, err } case err == nil && existingStatus == StatusSuperseded: // Case 2: content revert — supersede whatever is live now, then // revive the matched (formerly superseded) row. 
- if err := supersedeLiveRow(ctx, tx, folderID, name); err != nil { + if err := supersedeLiveRow(ctx, tx, folderID, name, r.LastSeenRunID); err != nil { return 0, err } - if err := updateLiveRow(ctx, tx, folderID, name, r, prov); err != nil { + if err := updateLiveRow(ctx, tx, folderID, name, contentID, r); err != nil { return 0, err } case errors.Is(err, sql.ErrNoRows): // Case 3: brand new content at this path (possibly first-ever). - if err := supersedeLiveRow(ctx, tx, folderID, name); err != nil { + if err := supersedeLiveRow(ctx, tx, folderID, name, r.LastSeenRunID); err != nil { return 0, err } - if err := insertNewRow(ctx, tx, folderID, name, r, prov); err != nil { + if err := insertNewRow(ctx, tx, folderID, name, contentID, r); err != nil { return 0, err } default: @@ -385,10 +434,10 @@ func upsertRowInTx(ctx context.Context, tx *sql.Tx, r FileRow, prov *Provenance) return folderID, nil } -// provColumns returns the (source_node_id, source_run_id) pair as -// sql.NullInt64 so callers can splat them into UPDATE/INSERT bind lists. -// A nil *Provenance yields two invalid NullInt64 — the binding renders as -// NULL columns, the "local write" convention. +// provColumns returns the (origin_node_id, origin_run_id) pair as +// sql.NullInt64 so callers can splat them into INSERT bind lists. A nil +// *Provenance yields two invalid NullInt64 — the binding renders as NULL +// columns, the "introduced locally" convention. func provColumns(p *Provenance) (sql.NullInt64, sql.NullInt64) { if p == nil { return sql.NullInt64{}, sql.NullInt64{} @@ -397,15 +446,77 @@ func provColumns(p *Provenance) (sql.NullInt64, sql.NullInt64) { sql.NullInt64{Int64: p.RunID, Valid: true} } +// getOrCreateContentTx resolves a blake3 digest to its contents row id, +// inserting the row on first contact with this content. 
The insert +// records sizeBytes and the supplied provenance as the content's origin; +// a digest that already has a row keeps its stored size and origin (the +// contents table is append-only and rows are immutable). A stored size +// that disagrees with sizeBytes surfaces as an error. +func getOrCreateContentTx(ctx context.Context, tx *sql.Tx, digest []byte, sizeBytes int64, prov *Provenance) (int64, error) { + id, err := lookupContentTx(ctx, tx, digest, sizeBytes) + if err == nil { + return id, nil + } + if !errors.Is(err, sql.ErrNoRows) { + return 0, err + } + originNode, originRun := provColumns(prov) + res, err := tx.ExecContext(ctx, + `INSERT INTO contents (blake3, size_bytes, origin_node_id, origin_run_id) + VALUES (?, ?, ?, ?) + ON CONFLICT(blake3) DO NOTHING`, + digest, sizeBytes, originNode, originRun) + if err != nil { + return 0, fmt.Errorf("insert content: %w", err) + } + n, err := res.RowsAffected() + if err != nil { + return 0, fmt.Errorf("insert content rows: %w", err) + } + if n == 1 { + id, err = res.LastInsertId() + if err != nil { + return 0, fmt.Errorf("content last insert id: %w", err) + } + return id, nil + } + // The digest landed via a concurrent writer between lookup and insert. + id, err = lookupContentTx(ctx, tx, digest, sizeBytes) + if err != nil { + return 0, fmt.Errorf("re-lookup content after conflict: %w", err) + } + return id, nil +} + +// lookupContentTx returns the contents row id for digest. A stored +// size_bytes that disagrees with sizeBytes means index corruption or a +// mis-hashing caller, so it surfaces loudly instead of returning the row. 
+func lookupContentTx(ctx context.Context, tx *sql.Tx, digest []byte, sizeBytes int64) (int64, error) { + var id, storedSize int64 + err := tx.QueryRowContext(ctx, + `SELECT id, size_bytes FROM contents WHERE blake3 = ?`, digest).Scan(&id, &storedSize) + if errors.Is(err, sql.ErrNoRows) { + return 0, err + } + if err != nil { + return 0, fmt.Errorf("lookup content: %w", err) + } + if storedSize != sizeBytes { + return 0, fmt.Errorf("content %x: stored size %d disagrees with observed size %d", digest, storedSize, sizeBytes) + } + return id, nil +} + // supersedeLiveRow flips the single non-superseded row at (folderID, name) -// (if any) to status='superseded'. A no-op when there is no live row, e.g. -// the very first observation of a path. last_seen_run_id stays frozen at the +// (if any) to status='superseded', stamping runID as the row's +// status-change run. A no-op when there is no live row, e.g. the very +// first observation of a path. last_seen_run_id stays frozen at the // value it had — that is the run during which the row was last seen alive. -func supersedeLiveRow(ctx context.Context, tx *sql.Tx, folderID int64, name string) error { +func supersedeLiveRow(ctx context.Context, tx *sql.Tx, folderID int64, name string, runID int64) error { _, err := tx.ExecContext(ctx, ` - UPDATE files SET status = 'superseded' + UPDATE files SET status = 'superseded', status_changed_run_id = ? WHERE folder_id = ? AND name = ? AND status != 'superseded' - `, folderID, name) + `, runID, folderID, name) if err != nil { return fmt.Errorf("supersede live row: %w", err) } @@ -414,10 +525,12 @@ func supersedeLiveRow(ctx context.Context, tx *sql.Tx, folderID int64, name stri // RecordConflictPreStage atomically supersedes the live row at // originalPath and inserts a new 'present' row at conflictRow.Path -// carrying the prior blake3 and the supplied provenance. 
The two -// updates run inside one transaction so an agent crash between them -// rolls both back rather than leaving the receiver in a state where -// the prior content is reachable only by path or only by hash. +// carrying the prior content. prov becomes the content's origin only +// when the conflict carries bytes never seen before (an out-of-band +// drift); known content keeps its recorded origin. The two updates run +// inside one transaction so an agent crash between them rolls both back +// rather than leaving the receiver in a state where the prior content +// is reachable only by path or only by hash. // // The on-disk rename that moves the bytes from originalPath to // conflictRow.Path is NOT part of this transaction (the filesystem @@ -442,7 +555,7 @@ func (s *Store) RecordConflictPreStage(ctx context.Context, volumeID int64, orig if err != nil { return err } - if err := supersedeLiveRow(ctx, tx, origFolderID, origName); err != nil { + if err := supersedeLiveRow(ctx, tx, origFolderID, origName, conflictRow.LastSeenRunID); err != nil { return err } @@ -451,7 +564,11 @@ func (s *Store) RecordConflictPreStage(ctx context.Context, volumeID int64, orig if err != nil { return err } - if err := insertNewRow(ctx, tx, conflictFolderID, conflictName, conflictRow, prov); err != nil { + conflictContentID, err := getOrCreateContentTx(ctx, tx, conflictRow.Blake3, conflictRow.SizeBytes, prov) + if err != nil { + return err + } + if err := insertNewRow(ctx, tx, conflictFolderID, conflictName, conflictContentID, conflictRow); err != nil { return err } @@ -467,36 +584,36 @@ func (s *Store) RecordConflictPreStage(ctx context.Context, volumeID int64, orig } // updateLiveRow refreshes the mutable fields on an existing row matching -// (folder_id, name, blake3). blake3 and first_seen_run_id are never touched. -// The (source_node_id, source_run_id) provenance pair is rewritten to the -// caller-supplied prov so the row tracks the most recent attribution. 
-func updateLiveRow(ctx context.Context, tx *sql.Tx, folderID int64, name string, r FileRow, prov *Provenance) error { - srcNode, srcRun := provColumns(prov) +// (folder_id, name, content_id). content_id and first_seen_run_id are +// never touched. status_changed_run_id advances exactly when the write +// changes the row's status (the CASE reads the pre-update status, per +// SQL UPDATE semantics), covering the revive transitions Case 2 routes +// through here. +func updateLiveRow(ctx context.Context, tx *sql.Tx, folderID int64, name string, contentID int64, r FileRow) error { _, err := tx.ExecContext(ctx, ` UPDATE files SET - size_bytes = ?, mtime_ns = ?, status = ?, - last_seen_run_id = ?, indexed_at_ns = ?, - source_node_id = ?, source_run_id = ? - WHERE folder_id = ? AND name = ? AND blake3 = ? - `, r.SizeBytes, r.MtimeNs, r.Status, r.LastSeenRunID, r.IndexedAtNs, - srcNode, srcRun, - folderID, name, r.Blake3) + mtime_ns = ?, + status_changed_run_id = CASE WHEN status = ? THEN status_changed_run_id ELSE ? END, + status = ?, + last_seen_run_id = ?, indexed_at_ns = ? + WHERE folder_id = ? AND name = ? AND content_id = ? 
+ `, r.MtimeNs, r.Status, r.LastSeenRunID, r.Status, r.LastSeenRunID, r.IndexedAtNs, + folderID, name, contentID) if err != nil { return fmt.Errorf("update live row: %w", err) } return nil } -func insertNewRow(ctx context.Context, tx *sql.Tx, folderID int64, name string, r FileRow, prov *Provenance) error { - srcNode, srcRun := provColumns(prov) +func insertNewRow(ctx context.Context, tx *sql.Tx, folderID int64, name string, contentID int64, r FileRow) error { _, err := tx.ExecContext(ctx, - `INSERT INTO files (folder_id, name, blake3, size_bytes, mtime_ns, + `INSERT INTO files (folder_id, name, content_id, mtime_ns, status, first_seen_run_id, last_seen_run_id, indexed_at_ns, - source_node_id, source_run_id) - VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`, - folderID, name, r.Blake3, r.SizeBytes, r.MtimeNs, + status_changed_run_id) + VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)`, + folderID, name, contentID, r.MtimeNs, r.Status, r.FirstSeenRunID, r.LastSeenRunID, r.IndexedAtNs, - srcNode, srcRun) + r.FirstSeenRunID) if err != nil { return fmt.Errorf("insert new row: %w", err) } @@ -558,9 +675,11 @@ func touchSeenRowInTx(ctx context.Context, tx *sql.Tx, volumeID int64, relPath s return 0, fmt.Errorf("lookup folder: %w", err) } if _, err := tx.ExecContext(ctx, - `UPDATE files SET last_seen_run_id = ?, status = 'present' + `UPDATE files SET last_seen_run_id = ?, + status_changed_run_id = CASE WHEN status = 'present' THEN status_changed_run_id ELSE ? END, + status = 'present' WHERE folder_id = ? AND name = ? 
AND status != 'superseded'`, - runID, folderID, name); err != nil { + runID, runID, folderID, name); err != nil { return 0, fmt.Errorf("touch seen: %w", err) } return folderID, nil @@ -648,9 +767,12 @@ func (s *Store) ApplyIndexBatch(ctx context.Context, runID int64, entries []Inde // MarkMissing flips every row in the given volume that was not touched by the // given run (last_seen_run_id != currentRunID) and is currently 'present' to // 'missing', stamping last_seen_run_id with the current run as part of the -// flip. The stamp captures the audit run that *observed* the absence so -// drift surfacing (#17) can count "files newly missing during run N" via -// CountMissingFilesByRun without any audit-specific schema column. +// flip. Only 'present' rows are eligible: an 'offloaded' row's on-disk +// absence is intentional, so it keeps its status and stays out of the +// drift counts. The stamp captures the audit run that *observed* the +// absence so drift surfacing (#17) can count "files newly missing during +// run N" via CountMissingFilesByRun without any audit-specific schema +// column. // // The caller is responsible for only invoking this after the run has fully // scanned the volume: any path the run failed to visit (per-file error, @@ -697,11 +819,11 @@ func (s *Store) MarkMissing(ctx context.Context, volumeID int64, currentRunID in affRows.Close() res, err := tx.ExecContext(ctx, ` - UPDATE files SET status = 'missing', last_seen_run_id = ? + UPDATE files SET status = 'missing', last_seen_run_id = ?, status_changed_run_id = ? WHERE status = 'present' AND folder_id IN ( SELECT id FROM folders WHERE volume_id = ? ) AND last_seen_run_id != ? 
- `, currentRunID, volumeID, currentRunID) + `, currentRunID, currentRunID, volumeID, currentRunID) if err != nil { return 0, fmt.Errorf("mark missing: %w", err) } @@ -720,18 +842,62 @@ func (s *Store) MarkMissing(ctx context.Context, volumeID int64, currentRunID in return n, nil } -// ListDuplicates returns rows whose blake3 digest appears at more than one +// MarkOffloaded flips the live 'present' row at (volumeID, relPath) +// carrying contentID to status='offloaded', stamping last_seen_run_id +// with the offload run that removed the bytes. Matching on the exact +// content id is part of the offload safety contract: the caller +// verified the on-disk bytes against this content immediately before +// unlinking, so a row whose content or status changed underfoot matches +// nothing here and surfaces as an error instead of mislabelling a +// different observation. The folder Merkle and ancestor chain are +// recomputed in the same transaction because the flip removes the file +// from the live set, mirroring MarkMissing. +func (s *Store) MarkOffloaded(ctx context.Context, volumeID int64, relPath string, contentID, runID int64) error { + tx, err := s.db.BeginTx(ctx, nil) + if err != nil { + return fmt.Errorf("begin mark offloaded: %w", err) + } + defer tx.Rollback() + + folderPath, name := splitFilePath(relPath) + var folderID int64 + if err := tx.QueryRowContext(ctx, + `SELECT id FROM folders WHERE volume_id = ? AND path = ?`, + volumeID, folderPath).Scan(&folderID); err != nil { + return fmt.Errorf("mark offloaded %s: lookup folder: %w", relPath, err) + } + res, err := tx.ExecContext(ctx, ` + UPDATE files SET status = 'offloaded', last_seen_run_id = ?, status_changed_run_id = ? + WHERE folder_id = ? AND name = ? AND content_id = ? 
AND status = 'present' + `, runID, runID, folderID, name, contentID) + if err != nil { + return fmt.Errorf("mark offloaded %s: %w", relPath, err) + } + n, err := res.RowsAffected() + if err != nil { + return fmt.Errorf("mark offloaded rows %s: %w", relPath, err) + } + if n != 1 { + return fmt.Errorf("mark offloaded %s: no live 'present' row with content id %d", relPath, contentID) + } + if err := recomputeFolderAndAncestors(ctx, tx, folderID, runID); err != nil { + return err + } + return tx.Commit() +} + +// ListDuplicates returns rows whose content appears at more than one // (volume_id, path), joined with their volume. func (s *Store) ListDuplicates(ctx context.Context) ([]FileWithVolume, error) { return queryRows(ctx, s.db, ` SELECT `+joinedColumns+` FROM `+fileFromJoin+` JOIN volumes v ON v.id = fo.volume_id - WHERE f.blake3 IN ( - SELECT blake3 FROM files WHERE status = 'present' - GROUP BY blake3 HAVING COUNT(*) > 1 + WHERE f.content_id IN ( + SELECT content_id FROM files WHERE status = 'present' + GROUP BY content_id HAVING COUNT(*) > 1 ) AND f.status = 'present' - ORDER BY f.blake3, v.name, `+pathFromFolderAndName+` + ORDER BY c.blake3, v.name, `+pathFromFolderAndName+` `, scanFileWithVolume) } @@ -770,19 +936,19 @@ func (s *Store) ListPresentFilesInFolder(ctx context.Context, folderID int64) ([ scanFileRow, folderID) } -// ListPresentBySource yields every present row in volumeID whose -// source_node_id matches nodeID. A valid nodeID matches that node id -// and exploits idx_files_source_node (the partial index on -// (source_node_id) WHERE status='present' AND source_node_id IS NOT -// NULL); a zero NullInt64 filters to rows with source_node_id IS NULL — -// the "local write" convention — and falls back to a status-scoped scan -// because the partial index excludes those rows by construction. +// ListPresentByOrigin yields every present row in volumeID whose +// content's origin_node_id matches nodeID. 
A valid nodeID matches that +// node id and exploits idx_contents_origin_node (the partial index on +// contents(origin_node_id) WHERE origin_node_id IS NOT NULL); a zero +// NullInt64 filters to content with origin_node_id IS NULL — the +// "introduced locally" convention — and falls back to a status-scoped +// scan because the partial index excludes those rows by construction. // // Yielded in path order so a caller streaming to `rclone --files-from` // produces a stable, diffable listing. iter.Seq2 is used so large // volumes don't materialise the whole row set in memory before the // caller starts consuming it. -func (s *Store) ListPresentBySource(ctx context.Context, volumeID int64, nodeID sql.NullInt64) iter.Seq2[FileRow, error] { +func (s *Store) ListPresentByOrigin(ctx context.Context, volumeID int64, nodeID sql.NullInt64) iter.Seq2[FileRow, error] { return func(yield func(FileRow, error) bool) { var ( rows *sql.Rows @@ -791,25 +957,25 @@ func (s *Store) ListPresentBySource(ctx context.Context, volumeID int64, nodeID if nodeID.Valid { rows, err = s.db.QueryContext(ctx, `SELECT `+fileSelectColumns+` FROM `+fileFromJoin+` - WHERE fo.volume_id = ? AND f.status = 'present' AND f.source_node_id = ? + WHERE fo.volume_id = ? AND f.status = 'present' AND c.origin_node_id = ? ORDER BY `+pathFromFolderAndName, volumeID, nodeID.Int64) } else { rows, err = s.db.QueryContext(ctx, `SELECT `+fileSelectColumns+` FROM `+fileFromJoin+` - WHERE fo.volume_id = ? AND f.status = 'present' AND f.source_node_id IS NULL + WHERE fo.volume_id = ? 
AND f.status = 'present' AND c.origin_node_id IS NULL ORDER BY `+pathFromFolderAndName, volumeID) } if err != nil { - yield(FileRow{}, fmt.Errorf("query present by source: %w", err)) + yield(FileRow{}, fmt.Errorf("query present by origin: %w", err)) return } defer rows.Close() for rows.Next() { var r FileRow if err := r.scanFrom(rows); err != nil { - yield(FileRow{}, fmt.Errorf("scan present by source: %w", err)) + yield(FileRow{}, fmt.Errorf("scan present by origin: %w", err)) return } if !yield(r, nil) { diff --git a/store/folders.go b/store/folders.go index 479a10c..ed99573 100644 --- a/store/folders.go +++ b/store/folders.go @@ -19,7 +19,7 @@ const folderHashContext = "squirrel-folder-v1" // folderHashKey is the 32-byte key seeded once at package init. blake3 keyed // hashing with this key produces digests that cannot collide with raw -// file-content BLAKE3 digests stored on files.blake3, even if the +// file-content BLAKE3 digests stored on contents.blake3, even if the // observable bytes coincide. var folderHashKey [32]byte @@ -398,9 +398,10 @@ func recomputeFolderAndAncestors(ctx context.Context, tx *sql.Tx, folderID int64 // value, never NULL — and zero aggregates. func computeShallowAndDirectAggregatesTx(ctx context.Context, tx *sql.Tx, folderID int64) ([]byte, int64, int64, error) { rows, err := tx.QueryContext(ctx, - `SELECT name, blake3, size_bytes FROM files - WHERE folder_id = ? AND status = 'present' - ORDER BY name`, + `SELECT f.name, c.blake3, c.size_bytes + FROM files f JOIN contents c ON c.id = f.content_id + WHERE f.folder_id = ? 
AND f.status = 'present' + ORDER BY f.name`, folderID) if err != nil { return nil, 0, 0, fmt.Errorf("read folder %d files: %w", folderID, err) diff --git a/store/folders_test.go b/store/folders_test.go index abca213..9ddffe8 100644 --- a/store/folders_test.go +++ b/store/folders_test.go @@ -584,10 +584,10 @@ func upsertMissing(ctx context.Context, s *Store, volumeID int64, relPath string return err } // MarkMissing won't help here (it acts on whole-run staleness). Use - // a raw UPDATE inside the same supersede contract: blake3 stays, status - // flips to missing. The folder hash recompute lives behind TouchSeen / - // Upsert; we trigger one explicitly to mirror what the real indexer - // would do after marking absent. + // a raw UPDATE inside the same supersede contract: content_id stays, + // status flips to missing. The folder hash recompute lives behind + // TouchSeen / Upsert; we trigger one explicitly to mirror what the + // real indexer would do after marking absent. folderPath, name := splitFilePath(relPath) tx, err := s.db.BeginTx(ctx, nil) if err != nil { @@ -602,8 +602,8 @@ func upsertMissing(ctx context.Context, s *Store, volumeID int64, relPath string } if _, err := tx.ExecContext(ctx, `UPDATE files SET status = 'missing', last_seen_run_id = ? - WHERE folder_id = ? AND name = ? AND blake3 = ?`, - runID, folderID, name, row.Blake3); err != nil { + WHERE folder_id = ? AND name = ? AND content_id = ?`, + runID, folderID, name, row.ContentID); err != nil { return err } if err := recomputeFolderAndAncestors(ctx, tx, folderID, runID); err != nil { @@ -646,8 +646,9 @@ func snapshotFolderHashes(t *testing.T, s *Store, volumeID int64, shallow bool) // migration round-trip test and drift test. 
func freshRecomputeShallow(ctx context.Context, s *Store, volumeID int64, folderPath string) ([]byte, error) { rows, err := s.db.QueryContext(ctx, ` - SELECT f.name, f.blake3 FROM files f + SELECT f.name, c.blake3 FROM files f JOIN folders fo ON fo.id = f.folder_id + JOIN contents c ON c.id = f.content_id WHERE fo.volume_id = ? AND fo.path = ? AND f.status = 'present' ORDER BY f.name`, volumeID, folderPath) diff --git a/store/hookruns.go b/store/hookruns.go index 0289605..76ea0fa 100644 --- a/store/hookruns.go +++ b/store/hookruns.go @@ -3,6 +3,7 @@ package store import ( "context" "database/sql" + "errors" "fmt" ) @@ -112,32 +113,58 @@ func (s *Store) BeginHookRun(ctx context.Context, spec HookRunSpec) (int64, erro return id, nil } +// isTerminalHookStatus reports whether status is one of the two terminal +// hook states. A row in either must not be re-finalised by FinishHookRun. +func isTerminalHookStatus(status string) bool { + return status == HookStatusSuccess || status == HookStatusFailed +} + // FinishHookRun records the terminal state of a hook run. exitCode is // stored as-is (pass an invalid sql.NullInt64 when the process produced // no code, e.g. spawn failure or timeout); errMsg is stored as NULL when // empty. Returns an error if id matches no row so a hook is never left // stuck in 'running'. +// +// Like FinishRun, the transition is guarded: a hook run already in a +// terminal status is never re-finalised — the first terminal write wins +// and FinishHookRun returns ErrAlreadyFinished (matchable via errors.Is) +// without touching the row, so a double-finish bug or a buggy retry can't +// silently rewrite the recorded status, exit code, and end timestamp. The +// read and the update share one transaction so the check and the write +// can't race. 
func (s *Store) FinishHookRun(ctx context.Context, id int64, status string, exitCode sql.NullInt64, errMsg string) error { if status != HookStatusSuccess && status != HookStatusFailed { return fmt.Errorf("FinishHookRun: status must be %q or %q, got %q", HookStatusSuccess, HookStatusFailed, status) } + tx, err := s.db.BeginTx(ctx, nil) + if err != nil { + return fmt.Errorf("begin finish hook run %d: %w", id, err) + } + defer func() { _ = tx.Rollback() }() + + var current string + switch err := tx.QueryRowContext(ctx, `SELECT status FROM hook_runs WHERE id = ?`, id).Scan(&current); { + case errors.Is(err, sql.ErrNoRows): + return fmt.Errorf("finish hook run %d: no such hook run", id) + case err != nil: + return fmt.Errorf("finish hook run %d read status: %w", id, err) + } + if isTerminalHookStatus(current) { + return fmt.Errorf("finish hook run %d (status %s): %w", id, current, ErrAlreadyFinished) + } + var errVal sql.NullString if errMsg != "" { errVal = sql.NullString{String: errMsg, Valid: true} } - res, err := s.db.ExecContext(ctx, ` + if _, err := tx.ExecContext(ctx, ` UPDATE hook_runs SET ended_at_ns = ?, status = ?, exit_code = ?, error = ? WHERE id = ? - `, NowNs(), status, exitCode, errVal, id) - if err != nil { + `, NowNs(), status, exitCode, errVal, id); err != nil { return fmt.Errorf("finish hook run %d: %w", id, err) } - n, err := res.RowsAffected() - if err != nil { - return fmt.Errorf("finish hook run %d rows affected: %w", id, err) - } - if n == 0 { - return fmt.Errorf("finish hook run %d: no such hook run", id) + if err := tx.Commit(); err != nil { + return fmt.Errorf("commit finish hook run %d: %w", id, err) } return nil } diff --git a/store/hookruns_test.go b/store/hookruns_test.go index db3ea9e..3545c1a 100644 --- a/store/hookruns_test.go +++ b/store/hookruns_test.go @@ -131,6 +131,45 @@ func TestFinishHookRunUnknownID(t *testing.T) { } } +// TestFinishHookRunRefusesTerminalRow: the first terminal write wins. 
A +// second finish is refused with ErrAlreadyFinished and leaves the +// recorded status, exit code, and end timestamp untouched (#114) — the +// same first-write-wins guard FinishRun has. +func TestFinishHookRunRefusesTerminalRow(t *testing.T) { + s := openTestStore(t) + ctx := context.Background() + vol, _ := s.CreateVolume(ctx, "v", "/tmp/v") + id, err := s.BeginHookRun(ctx, HookRunSpec{VolumeID: vol.ID, Trigger: HookTriggerInterval}) + if err != nil { + t.Fatalf("BeginHookRun: %v", err) + } + + firstExit := sql.NullInt64{Int64: 0, Valid: true} + if err := s.FinishHookRun(ctx, id, HookStatusSuccess, firstExit, ""); err != nil { + t.Fatalf("first FinishHookRun: %v", err) + } + before, err := s.hookRunByID(ctx, id) + if err != nil { + t.Fatalf("read back after first finish: %v", err) + } + + err = s.FinishHookRun(ctx, id, HookStatusFailed, sql.NullInt64{Int64: 7, Valid: true}, "second finish") + if !errors.Is(err, ErrAlreadyFinished) { + t.Fatalf("second FinishHookRun err = %v, want ErrAlreadyFinished", err) + } + + after, err := s.hookRunByID(ctx, id) + if err != nil { + t.Fatalf("read back after refused finish: %v", err) + } + if after.Status != HookStatusSuccess { + t.Fatalf("status = %q after refused second finish, want success", after.Status) + } + if after.ExitCode != before.ExitCode || after.EndedAtNs != before.EndedAtNs || after.Error != before.Error { + t.Fatalf("terminal row mutated by refused finish: before=%+v after=%+v", before, after) + } +} + func TestListHookRuns(t *testing.T) { s := openTestStore(t) ctx := context.Background() diff --git a/store/migrations.go b/store/migrations.go index 63163b0..592039f 100644 --- a/store/migrations.go +++ b/store/migrations.go @@ -10,7 +10,7 @@ import ( ) // SchemaVersion is the schema version this binary writes and reads. -const SchemaVersion = 13 +const SchemaVersion = 21 // freshSchemaBaseline is the version applied to a brand-new database. The // chain in `migrations` continues from here. 
v1 is no longer reachable from @@ -49,6 +49,14 @@ func buildMigrations(mctx migrationCtx) []migration { {version: 11, up: migrateV10ToV11}, {version: 12, up: migrateV11ToV12}, {version: 13, up: migrateV12ToV13}, + {version: 14, up: migrateV13ToV14}, + {version: 15, up: migrateV14ToV15}, + {version: 16, up: migrateV15ToV16}, + {version: 17, up: migrateV16ToV17}, + {version: 18, up: migrateV17ToV18}, + {version: 19, up: migrateV18ToV19}, + {version: 20, up: migrateV19ToV20}, + {version: 21, up: migrateV20ToV21}, } } @@ -1374,3 +1382,490 @@ func migrateV12ToV13(ctx context.Context, db *sql.DB) error { } return tx.Commit() } + +// --- v13 → v14 --- + +// migrateV13ToV14 splits the files table into `contents` (the content +// entity: one row per distinct blake3, carrying size and origin) and a +// reshaped `files` (the path↔content observation, keyed on content_id +// instead of blake3). The files status CHECK gains 'offloaded' — a +// sibling of 'missing' for content intentionally removed from local +// disk while it stays durable elsewhere. +// +// Backfill mapping: each distinct blake3 becomes one contents row whose +// size_bytes and (origin_node_id, origin_run_id) come from the files +// row with the earliest first_seen_run_id for that hash. The old +// source_* columns recorded per-observation sender attribution while +// origin_* is content-level first-introduction provenance, so the +// earliest observation is the closest available approximation. +// +// The files_blake3_immutable trigger is dropped and not recreated: the +// id↔blake3 binding on contents is immutable by construction (blake3 is +// UNIQUE and contents rows are never updated), so a trigger guarding +// in-place hash rewrites has nothing left to guard. +// +// FK enforcement is disabled across the rebuild (files references +// folders, runs, and now contents; the old table is dropped +// mid-migration) with the usual foreign_key_check verification before +// commit. 
+func migrateV13ToV14(ctx context.Context, db *sql.DB) error { + conn, restore, err := disableForeignKeys(ctx, db) + if err != nil { + return err + } + defer restore() + + tx, err := conn.BeginTx(ctx, nil) + if err != nil { + return err + } + defer tx.Rollback() + + if err := createAndSeedContentsV14(ctx, tx); err != nil { + return err + } + if err := rebuildFilesV14(ctx, tx); err != nil { + return err + } + if _, err := tx.ExecContext(ctx, `INSERT INTO schema_version (version) VALUES (14)`); err != nil { + return fmt.Errorf("record schema v14: %w", err) + } + if err := verifyForeignKeysClean(ctx, tx, "v13→v14"); err != nil { + return err + } + return tx.Commit() +} + +// createAndSeedContentsV14 creates the contents table and inserts one +// row per distinct blake3 in the old files table. The seed row per hash +// is chosen by (first_seen_run_id, rowid) ascending so the backfill is +// deterministic when several rows share the earliest run. +func createAndSeedContentsV14(ctx context.Context, tx *sql.Tx) error { + stmts := []string{ + // origin_run_id is in the origin node's run space (NULL together + // with origin_node_id means "introduced locally"), so it is + // deliberately not a FK to the local runs table. + `CREATE TABLE contents ( + id INTEGER PRIMARY KEY, + blake3 BLOB NOT NULL UNIQUE CHECK (length(blake3) = 32), + size_bytes INTEGER NOT NULL, + origin_node_id INTEGER REFERENCES nodes(id), + origin_run_id INTEGER + )`, + `INSERT INTO contents (blake3, size_bytes, origin_node_id, origin_run_id) + SELECT f.blake3, f.size_bytes, f.source_node_id, f.source_run_id + FROM files f + WHERE f.rowid = ( + SELECT f2.rowid FROM files f2 WHERE f2.blake3 = f.blake3 + ORDER BY f2.first_seen_run_id, f2.rowid LIMIT 1 + )`, + // Partial index backing "find content introduced by node X" + // (ListPresentByOrigin); excluding the local-origin majority keeps + // the index sized to the peer-sourced subset, the same trade the + // old idx_files_source_node made. 
+ `CREATE INDEX idx_contents_origin_node ON contents(origin_node_id) + WHERE origin_node_id IS NOT NULL`, + } + for _, q := range stmts { + if _, err := tx.ExecContext(ctx, q); err != nil { + return fmt.Errorf("create contents: %w", err) + } + } + return nil +} + +// rebuildFilesV14 stages the reshaped files table, copies every old row +// with its blake3 resolved to the freshly seeded contents id, and swaps +// the new table into place. blake3↔content_id is one-to-one, so the PK +// widening from (folder_id, name, blake3) to (folder_id, name, +// content_id) is conflict-free and row counts are preserved. +func rebuildFilesV14(ctx context.Context, tx *sql.Tx) error { + stmts := []string{ + `CREATE TABLE files_v14 ( + folder_id INTEGER NOT NULL REFERENCES folders(id), + name TEXT NOT NULL, + content_id INTEGER NOT NULL REFERENCES contents(id), + mtime_ns INTEGER NOT NULL, + status TEXT NOT NULL CHECK (status IN ('present','missing','superseded','offloaded')), + first_seen_run_id INTEGER NOT NULL REFERENCES runs(id), + last_seen_run_id INTEGER NOT NULL REFERENCES runs(id), + indexed_at_ns INTEGER NOT NULL, + PRIMARY KEY (folder_id, name, content_id) + )`, + `INSERT INTO files_v14 ( + folder_id, name, content_id, mtime_ns, status, + first_seen_run_id, last_seen_run_id, indexed_at_ns + ) + SELECT f.folder_id, f.name, c.id, f.mtime_ns, f.status, + f.first_seen_run_id, f.last_seen_run_id, f.indexed_at_ns + FROM files f JOIN contents c ON c.blake3 = f.blake3`, + `DROP TABLE files`, + `ALTER TABLE files_v14 RENAME TO files`, + `CREATE INDEX idx_files_content ON files(content_id)`, + `CREATE INDEX idx_files_missing ON files(folder_id, name) WHERE status = 'missing'`, + `CREATE UNIQUE INDEX uniq_files_live_per_path ON files(folder_id, name) WHERE status != 'superseded'`, + } + for _, q := range stmts { + if _, err := tx.ExecContext(ctx, q); err != nil { + return fmt.Errorf("rebuild files: %w", err) + } + } + return nil +} + +// --- v14 → v15 --- + +// migrateV14ToV15 
rebuilds the runs table to add 'offload' to the kind +// CHECK. An offload run records the local "delete on-disk bytes whose +// content is durable elsewhere" operation, so it joins index and audit +// in the destination-NULL branch of the kind↔destination coupling. +// Same FK-off rebuild recipe as v6→v7 (runs is referenced by files and +// hook_runs). +func migrateV14ToV15(ctx context.Context, db *sql.DB) error { + conn, restore, err := disableForeignKeys(ctx, db) + if err != nil { + return err + } + defer restore() + + tx, err := conn.BeginTx(ctx, nil) + if err != nil { + return err + } + defer tx.Rollback() + + if err := rebuildRunsTableV15(ctx, tx); err != nil { + return err + } + if err := verifyForeignKeysClean(ctx, tx, "v14→v15"); err != nil { + return err + } + return tx.Commit() +} + +func rebuildRunsTableV15(ctx context.Context, tx *sql.Tx) error { + stmts := []string{ + `CREATE TABLE runs_v15 ( + id INTEGER PRIMARY KEY, + kind TEXT NOT NULL CHECK (kind IN ('index','sync','restore','audit','offload')), + volume_id INTEGER REFERENCES volumes(id), + destination TEXT, + started_at_ns INTEGER NOT NULL, + ended_at_ns INTEGER, + status TEXT NOT NULL CHECK (status IN ('running','success','failed','partial')), + error TEXT, + file_count INTEGER NOT NULL DEFAULT 0, + peer_node_id INTEGER REFERENCES nodes(id), + correlated_run_id INTEGER, + shallow INTEGER CHECK (shallow IS NULL OR shallow IN (0, 1)), + CHECK ( + (kind IN ('index','audit','offload') AND destination IS NULL) OR + (kind IN ('sync','restore') AND destination IS NOT NULL AND destination != '') + ) + )`, + `INSERT INTO runs_v15 ( + id, kind, volume_id, destination, started_at_ns, ended_at_ns, + status, error, file_count, peer_node_id, correlated_run_id, shallow + ) + SELECT id, kind, volume_id, destination, started_at_ns, ended_at_ns, + status, error, file_count, peer_node_id, correlated_run_id, shallow + FROM runs`, + `DROP TABLE runs`, + `ALTER TABLE runs_v15 RENAME TO runs`, + `CREATE INDEX 
idx_runs_volume_started ON runs(volume_id, started_at_ns)`, + `CREATE INDEX idx_runs_destination ON runs(destination) WHERE destination IS NOT NULL`, + `INSERT INTO schema_version (version) VALUES (15)`, + } + for _, q := range stmts { + if _, err := tx.ExecContext(ctx, q); err != nil { + return fmt.Errorf("rebuild runs: %w", err) + } + } + return nil +} + +// --- v15 → v16 --- + +// migrateV15ToV16 adds the offload substrate tables, all additive: +// +// - destination_run_ids: the per-destination durability version +// vector. One row per (volume, destination, origin node) carrying +// the highest origin-space run id known durable on that +// destination. `destination` is the unified target name — a bucket +// destination or a peer node name, matching what runs.destination +// already stores. origin_run_id is in the origin node's run space, +// so like contents.origin_run_id it is not a local FK. +// - destination_run_ids_history: append-only log of every vector +// advance, written in the same transaction as the live-row upsert +// (the same recoverability contract peer_sync_state_history gives +// the peer-sync watermark). +// - remote_objects: per-(content, destination) upload fingerprints +// for destinations that can't be cheaply re-read — the provider's +// stored checksum recorded at upload time and compared verbatim on +// later verification passes. 
+func migrateV15ToV16(ctx context.Context, db *sql.DB) error { + tx, err := db.BeginTx(ctx, nil) + if err != nil { + return err + } + defer tx.Rollback() + + stmts := []string{ + `CREATE TABLE destination_run_ids ( + volume_id INTEGER NOT NULL REFERENCES volumes(id), + destination TEXT NOT NULL, + origin_node_id INTEGER NOT NULL REFERENCES nodes(id), + origin_run_id INTEGER NOT NULL, + updated_at_ns INTEGER NOT NULL, + PRIMARY KEY (volume_id, destination, origin_node_id) + )`, + `CREATE TABLE destination_run_ids_history ( + id INTEGER PRIMARY KEY, + volume_id INTEGER NOT NULL, + destination TEXT NOT NULL, + origin_node_id INTEGER NOT NULL, + origin_run_id INTEGER NOT NULL, + at_ns INTEGER NOT NULL + )`, + `CREATE INDEX idx_destination_run_ids_history + ON destination_run_ids_history(volume_id, destination)`, + `CREATE TABLE remote_objects ( + content_id INTEGER NOT NULL REFERENCES contents(id), + destination TEXT NOT NULL, + uploaded_run_id INTEGER NOT NULL REFERENCES runs(id), + checksum_algo TEXT NOT NULL, + checksum TEXT NOT NULL, + verified_at_ns INTEGER, + PRIMARY KEY (content_id, destination) + )`, + `INSERT INTO schema_version (version) VALUES (16)`, + } + for _, q := range stmts { + if _, err := tx.ExecContext(ctx, q); err != nil { + return fmt.Errorf("v15→v16: %w", err) + } + } + return tx.Commit() +} + +// --- v16 → v17 --- + +// migrateV16ToV17 relaxes remote_objects so the upload record and the +// fingerprint are two separate facts: the content-addressed offsite +// push records the upload immediately (checksum_algo and checksum both +// NULL — uploaded, fingerprint pending) and a later scan-back pass +// fills the provider checksum in. A CHECK keeps the pair atomic — a +// checksum without its algorithm (or vice versa) is uninterpretable. +// +// remote_objects is a leaf table (nothing references it), so the +// rebuild needs no FK-off recipe: the staged table is populated while +// the old one still satisfies every constraint, then swapped in. 
+// Existing rows carry over verbatim. +func migrateV16ToV17(ctx context.Context, db *sql.DB) error { + tx, err := db.BeginTx(ctx, nil) + if err != nil { + return err + } + defer tx.Rollback() + + stmts := []string{ + `CREATE TABLE remote_objects_v17 ( + content_id INTEGER NOT NULL REFERENCES contents(id), + destination TEXT NOT NULL, + uploaded_run_id INTEGER NOT NULL REFERENCES runs(id), + checksum_algo TEXT, + checksum TEXT, + verified_at_ns INTEGER, + PRIMARY KEY (content_id, destination), + CHECK ((checksum_algo IS NULL) = (checksum IS NULL)) + )`, + `INSERT INTO remote_objects_v17 ( + content_id, destination, uploaded_run_id, checksum_algo, checksum, verified_at_ns + ) + SELECT content_id, destination, uploaded_run_id, checksum_algo, checksum, verified_at_ns + FROM remote_objects`, + `DROP TABLE remote_objects`, + `ALTER TABLE remote_objects_v17 RENAME TO remote_objects`, + `INSERT INTO schema_version (version) VALUES (17)`, + } + for _, q := range stmts { + if _, err := tx.ExecContext(ctx, q); err != nil { + return fmt.Errorf("v16→v17: %w", err) + } + } + return tx.Commit() +} + +// --- v17 → v18 --- + +// migrateV17ToV18 adds files.status_changed_run_id: the run during +// which the row last changed status. last_seen_run_id can't answer +// "what changed since run W" — it advances on every observation of a +// present row and freezes at the last-alive run on supersession — so +// the per-destination manifest delta needs a stamp that moves exactly +// when status does. Every status writer maintains it from v18 on: +// inserts stamp their first_seen run, supersession/missing flips stamp +// the flipping run, and re-observations leave it alone. 
+// +// The backfill approximates history the old columns retain: a present +// row last changed status when it was inserted (first_seen_run_id; +// re-flips to present weren't recorded), and a superseded/missing row's +// last_seen_run_id is the closest recorded coordinate to its flip (the +// flip stamp for missing rows, the last-alive run for superseded ones). +// The approximation only affects pre-v18 history; no content-addressed +// destination can have synced before v18 exists, so every first delta +// is computed against watermark 0 and reads the full live state anyway. +func migrateV17ToV18(ctx context.Context, db *sql.DB) error { + tx, err := db.BeginTx(ctx, nil) + if err != nil { + return err + } + defer tx.Rollback() + + stmts := []string{ + `ALTER TABLE files ADD COLUMN status_changed_run_id INTEGER REFERENCES runs(id)`, + `UPDATE files SET status_changed_run_id = CASE + WHEN status = 'present' THEN first_seen_run_id + ELSE last_seen_run_id + END`, + `INSERT INTO schema_version (version) VALUES (18)`, + } + for _, q := range stmts { + if _, err := tx.ExecContext(ctx, q); err != nil { + return fmt.Errorf("v17→v18: %w", err) + } + } + return tx.Commit() +} + +// --- v18 → v19 --- + +// migrateV18ToV19 adds the verification-method provenance the offload +// gate needs to tell a content-verified durability component apart from +// a presence-only one. destination_run_ids.verify_method records the +// method that advanced a live component (blake3, peer-blake3, +// kopia-verify, or presence+size); destination_run_ids_history.verify_method +// records it per advance so the audit log keeps the same fact. +// +// Both columns are additive and nullable. Existing components are left +// NULL: the gate reads a NULL method as "not content-verified" and holds +// the target out until a fresh verified push re-stamps it. 
That is the +// strictly-stricter reading — a pre-v19 component can only ever start +// refusing offload, never start permitting it — and is harmless because +// a verified push re-advances the component cheaply. +func migrateV18ToV19(ctx context.Context, db *sql.DB) error { + tx, err := db.BeginTx(ctx, nil) + if err != nil { + return err + } + defer tx.Rollback() + + stmts := []string{ + `ALTER TABLE destination_run_ids ADD COLUMN verify_method TEXT`, + `ALTER TABLE destination_run_ids_history ADD COLUMN verify_method TEXT`, + `INSERT INTO schema_version (version) VALUES (19)`, + } + for _, q := range stmts { + if _, err := tx.ExecContext(ctx, q); err != nil { + return fmt.Errorf("v18→v19: %w", err) + } + } + return tx.Commit() +} + +// --- v19 → v20 --- + +// migrateV19ToV20 adds destination_push_freshness: the per-origin-node +// maxima of the present set captured at a destination's most recent +// successful whole-volume push, in origin-space coordinates. The offload +// gate's freshness condition reads it for a target the local node does +// not push to directly (a peer-relayed offsite, named in offload_requires +// but absent from the local sync_to). The local-run-space watermark +// LastSuccessfulWholeVolumePushRunID is always 0 for such a target, so the +// freshness condition would refuse every file forever; this table lets a +// pulled-evidence target satisfy freshness from the pushing node's own +// determination, expressed in the origin coordinates the gate already +// holds for the gated content. +// +// origin_run_id is the snapshot maxima of the *latest* push — overwritten +// per push (non-monotonic), distinct from destination_run_ids.origin_run_id +// which is the monotonic durability vector. 
A push removing content from +// the pushing node's present set lowers the freshness maxima even though +// the append-only target still holds the bytes (the monotonic vector keeps +// covering them), so a relayed file above the freshness watermark is held +// out — the safe direction. +// +// The table is empty after migration: only a successful whole-volume push +// writes a row. A target with no row yields no freshness evidence, which +// the gate reads as "refuse" for a relayed target. +func migrateV19ToV20(ctx context.Context, db *sql.DB) error { + tx, err := db.BeginTx(ctx, nil) + if err != nil { + return err + } + defer tx.Rollback() + + stmts := []string{ + `CREATE TABLE destination_push_freshness ( + volume_id INTEGER NOT NULL REFERENCES volumes(id), + destination TEXT NOT NULL, + origin_node_id INTEGER NOT NULL REFERENCES nodes(id), + origin_run_id INTEGER NOT NULL, + updated_at_ns INTEGER NOT NULL, + PRIMARY KEY (volume_id, destination, origin_node_id) + )`, + `INSERT INTO schema_version (version) VALUES (20)`, + } + for _, q := range stmts { + if _, err := tx.ExecContext(ctx, q); err != nil { + return fmt.Errorf("v19→v20: %w", err) + } + } + return tx.Commit() +} + +// --- v20 → v21 --- + +// migrateV20ToV21 restores schema-level immutability for the contents +// table. contents is the append-only content entity: one row per BLAKE3, +// carrying its size and origin. The id↔blake3 binding is already immutable +// by construction (blake3 is UNIQUE), but the v13→v14 reshape dropped the +// files_blake3_immutable trigger without installing an equivalent guard on +// the new table, so a future bug could UPDATE a row's size_bytes/origin_* +// in place or DELETE a row whose hash other rows still reference. +// +// Two triggers re-assert the guarantee the append-only contract implies: +// any UPDATE or DELETE on a contents row aborts. 
The sanctioned way to +// record different content at a path is to supersede the files row and +// insert a new one (see Upsert), which leaves the contents row untouched. +func migrateV20ToV21(ctx context.Context, db *sql.DB) error { + tx, err := db.BeginTx(ctx, nil) + if err != nil { + return err + } + defer tx.Rollback() + + for _, q := range append(contentsImmutableTriggers(), `INSERT INTO schema_version (version) VALUES (21)`) { + if _, err := tx.ExecContext(ctx, q); err != nil { + return fmt.Errorf("v20→v21: %w", err) + } + } + return tx.Commit() +} + +// contentsImmutableTriggers returns the DDL for the two triggers that make +// the contents table append-only at the schema level: a row's size and +// origin are fixed once written, and a row is never removed. Shared with +// any future fresh-baseline so the guarantee survives a schema rebase. +func contentsImmutableTriggers() []string { + return []string{ + `CREATE TRIGGER contents_no_update BEFORE UPDATE ON contents + BEGIN + SELECT RAISE(ABORT, 'contents is append-only; supersede the files row and insert new content instead of updating'); + END`, + `CREATE TRIGGER contents_no_delete BEFORE DELETE ON contents + BEGIN + SELECT RAISE(ABORT, 'contents is append-only; a content row is never deleted'); + END`, + } +} diff --git a/store/nodes.go b/store/nodes.go index f92bc03..63bf907 100644 --- a/store/nodes.go +++ b/store/nodes.go @@ -35,7 +35,7 @@ func (s *Store) GetSelfNode(ctx context.Context) (Node, error) { } // GetNodeByID returns the node row with the given id, or -// sql.ErrNoRows. The id is the surrogate key used by `files.source_node_id` +// sql.ErrNoRows. The id is the surrogate key used by `contents.origin_node_id` // and `runs.peer_node_id`. 
func (s *Store) GetNodeByID(ctx context.Context, id int64) (Node, error) { var n Node @@ -82,6 +82,33 @@ func (s *Store) CreateNode(ctx context.Context, name, endpoint string) (Node, er return Node{ID: id, Name: name, Endpoint: endpointVal}, nil } +// ValidNodeName reports whether name satisfies the node-name rule +// (nodeNameRE). Exposed so wire-facing layers can validate +// peer-declared node names before handing them to CreateNode, failing +// the request instead of surfacing a store error mid-commit. +func ValidNodeName(name string) bool { + return nodeNameRE.MatchString(name) +} + +// GetOrCreateOriginNode resolves a node *name* — the cross-node +// identity content origins travel under — to a local nodes row, +// creating one on first contact. Unlike GetOrCreatePeerNode it matches +// purely by name: a forwarded origin may name the self-row, a known +// peer, or a node this host has never peered with. Created rows carry +// the same "peer://<name>" placeholder endpoint the peer-sync handshake +// records for initiators that expose no URL, so a later real handshake +// under the same name finds a row it agrees with. +func (s *Store) GetOrCreateOriginNode(ctx context.Context, name string) (Node, error) { + existing, err := s.GetNodeByName(ctx, name) + if err == nil { + return existing, nil + } + if !errors.Is(err, sql.ErrNoRows) { + return Node{}, fmt.Errorf("lookup origin node: %w", err) + } + return s.CreateNode(ctx, name, placeholderEndpoint(name)) +} + // GetOrCreatePeerNode looks up a peer node by name. If absent, a new // row is inserted with the supplied endpoint. If present, the // endpoint must agree with the stored value: a name re-used across @@ -90,11 +117,22 @@ func (s *Store) CreateNode(ctx context.Context, name, endpoint string) (Node, er // so a name collision with a real peer would also be an auth // boundary issue). 
// +// allowEndpointUpgrade gates the one mutating case: the name-derived +// "peer://<name>" placeholder (written for initiators that expose no +// URL, and for nodes first met as a forwarded origin) is replaced by +// the presented endpoint on first real contact. Only an +// operator-configured caller (the initiator dialling its own +// config-declared peer) passes true; the receiver-side /begin path +// passes false because its endpoint derives from unauthenticated wire +// input, so a peer must not be able to bind an arbitrary dial-back URL +// to a placeholder. With it false an existing row is returned verbatim +// and the presented endpoint is used only to create an absent row. +// // The self-row is intentionally NOT returned by this function — its // endpoint is NULL, and a peer claiming the self-name would be // caught here by the "different endpoint" comparison (NULL vs. // non-empty) and rejected. -func (s *Store) GetOrCreatePeerNode(ctx context.Context, name, endpoint string) (Node, error) { +func (s *Store) GetOrCreatePeerNode(ctx context.Context, name, endpoint string, allowEndpointUpgrade bool) (Node, error) { if endpoint == "" { return Node{}, errors.New("peer endpoint must not be empty") } @@ -103,14 +141,30 @@ func (s *Store) GetOrCreatePeerNode(ctx context.Context, name, endpoint string) if !existing.Endpoint.Valid { return Node{}, fmt.Errorf("node %q is the local self-row; refusing to overwrite with peer endpoint %q", name, endpoint) } - if existing.Endpoint.String != endpoint { - return Node{}, fmt.Errorf("node %q already has endpoint %q in the local index; peer presented %q — resolve the collision before continuing", - name, existing.Endpoint.String, endpoint) + if existing.Endpoint.String == endpoint { + return existing, nil } - return existing, nil + if !allowEndpointUpgrade { + return existing, nil + } + if existing.Endpoint.String == placeholderEndpoint(name) { + if _, err := s.db.ExecContext(ctx, + `UPDATE nodes SET endpoint = ? 
WHERE id = ?`, endpoint, existing.ID); err != nil { + return Node{}, fmt.Errorf("upgrade placeholder endpoint for %q: %w", name, err) + } + existing.Endpoint = sql.NullString{String: endpoint, Valid: true} + return existing, nil + } + return Node{}, fmt.Errorf("node %q already has endpoint %q in the local index; peer presented %q — resolve the collision before continuing", + name, existing.Endpoint.String, endpoint) } if !errors.Is(err, sql.ErrNoRows) { return Node{}, fmt.Errorf("lookup peer node: %w", err) } return s.CreateNode(ctx, name, endpoint) } + +// placeholderEndpoint is the synthetic endpoint stored for nodes known +// only by name: peer-sync initiators that expose no URL of their own, +// and nodes first encountered as a forwarded content origin. +func placeholderEndpoint(name string) string { return "peer://" + name } diff --git a/store/nodes_test.go b/store/nodes_test.go index eeb8fa5..a665180 100644 --- a/store/nodes_test.go +++ b/store/nodes_test.go @@ -49,11 +49,11 @@ func TestGetOrCreatePeerNodeIdempotent(t *testing.T) { defer s.Close() ctx := context.Background() - first, err := s.GetOrCreatePeerNode(ctx, "nas", "https://nas.local") + first, err := s.GetOrCreatePeerNode(ctx, "nas", "https://nas.local", true) if err != nil { t.Fatalf("first: %v", err) } - again, err := s.GetOrCreatePeerNode(ctx, "nas", "https://nas.local") + again, err := s.GetOrCreatePeerNode(ctx, "nas", "https://nas.local", true) if err != nil { t.Fatalf("second: %v", err) } @@ -71,10 +71,10 @@ func TestGetOrCreatePeerNodeRejectsEndpointMismatch(t *testing.T) { defer s.Close() ctx := context.Background() - if _, err := s.GetOrCreatePeerNode(ctx, "nas", "https://nas.local"); err != nil { + if _, err := s.GetOrCreatePeerNode(ctx, "nas", "https://nas.local", true); err != nil { t.Fatalf("first: %v", err) } - _, err := s.GetOrCreatePeerNode(ctx, "nas", "https://nas.different") + _, err := s.GetOrCreatePeerNode(ctx, "nas", "https://nas.different", true) if err == nil || 
!strings.Contains(err.Error(), "already has endpoint") {
 		t.Fatalf("error = %v, want collision-message", err)
 	}
@@ -92,7 +92,7 @@ func TestGetOrCreatePeerNodeRefusesSelfNameCollision(t *testing.T) {
 	defer s.Close()
 	ctx := context.Background()
-	_, err = s.GetOrCreatePeerNode(ctx, "me", "http://attacker.example")
+	_, err = s.GetOrCreatePeerNode(ctx, "me", "http://attacker.example", true)
 	if err == nil || !strings.Contains(err.Error(), "self-row") {
 		t.Fatalf("error = %v, want self-row refusal", err)
 	}
@@ -107,7 +107,7 @@ func TestPeerSyncStateUpsertRoundtrip(t *testing.T) {
 	ctx := context.Background()
 	vID := makeVolume(t, s, "/v")
-	peer, _ := s.GetOrCreatePeerNode(ctx, "nas", "http://nas.example")
+	peer, _ := s.GetOrCreatePeerNode(ctx, "nas", "http://nas.example", true)
 	if err := s.UpsertPeerSyncState(ctx, vID, peer.ID, 7, false); err != nil {
 		t.Fatalf("first upsert: %v", err)
@@ -139,7 +139,7 @@ func TestBeginPeerSyncRunStampsLinkage(t *testing.T) {
 	ctx := context.Background()
 	vID := makeVolume(t, s, "/v")
-	peer, _ := s.GetOrCreatePeerNode(ctx, "nas", "http://nas.example")
+	peer, _ := s.GetOrCreatePeerNode(ctx, "nas", "http://nas.example", true)
 	id, err := s.BeginPeerSyncRun(ctx, vID, peer.ID, 99, "nas")
 	if err != nil {
@@ -170,7 +170,7 @@ func TestSetCorrelatedRunID(t *testing.T) {
 	ctx := context.Background()
 	vID := makeVolume(t, s, "/v")
-	peer, _ := s.GetOrCreatePeerNode(ctx, "nas", "http://nas.example")
+	peer, _ := s.GetOrCreatePeerNode(ctx, "nas", "http://nas.example", true)
 	id, _ := s.BeginPeerSyncRun(ctx, vID, peer.ID, 0, "nas")
 	if err := s.SetCorrelatedRunID(ctx, id, 1234); err != nil {
@@ -184,3 +184,142 @@ func TestSetCorrelatedRunID(t *testing.T) {
 		t.Fatalf("expected no-such-run error")
 	}
 }
+
+// TestGetOrCreateOriginNode covers the three name-resolution outcomes
+// the verbatim origin-propagation path needs: an unknown name creates a
+// placeholder-endpoint row (a forwarded origin may name a node this
+// host has never peered with), a known peer row is returned as-is, and
+// the self name resolves to the self-row rather than colliding with it.
+func TestGetOrCreateOriginNode(t *testing.T) {
+	dsn := filepath.Join(t.TempDir(), "test.db")
+	s, err := OpenWithOptions(dsn, OpenOptions{NodeName: "local"})
+	if err != nil {
+		t.Fatalf("Open: %v", err)
+	}
+	defer s.Close()
+	ctx := context.Background()
+
+	created, err := s.GetOrCreateOriginNode(ctx, "far-away")
+	if err != nil {
+		t.Fatalf("GetOrCreateOriginNode(far-away): %v", err)
+	}
+	if !created.Endpoint.Valid || created.Endpoint.String != "peer://far-away" {
+		t.Fatalf("created endpoint = %+v, want the peer://far-away placeholder", created.Endpoint)
+	}
+	again, err := s.GetOrCreateOriginNode(ctx, "far-away")
+	if err != nil {
+		t.Fatalf("GetOrCreateOriginNode(far-away) again: %v", err)
+	}
+	if again.ID != created.ID {
+		t.Fatalf("second resolve created a new row: %d → %d", created.ID, again.ID)
+	}
+
+	peer, err := s.GetOrCreatePeerNode(ctx, "nas", "https://nas.example", true)
+	if err != nil {
+		t.Fatalf("GetOrCreatePeerNode: %v", err)
+	}
+	byOrigin, err := s.GetOrCreateOriginNode(ctx, "nas")
+	if err != nil {
+		t.Fatalf("GetOrCreateOriginNode(nas): %v", err)
+	}
+	if byOrigin.ID != peer.ID || byOrigin.Endpoint.String != "https://nas.example" {
+		t.Fatalf("origin resolve of a peer = %+v, want the existing peer row %+v", byOrigin, peer)
+	}
+
+	self, _ := s.GetSelfNode(ctx)
+	bySelfName, err := s.GetOrCreateOriginNode(ctx, "local")
+	if err != nil {
+		t.Fatalf("GetOrCreateOriginNode(local): %v", err)
+	}
+	if bySelfName.ID != self.ID {
+		t.Fatalf("self name resolved to row %d, want the self-row %d", bySelfName.ID, self.ID)
+	}
+}
+
+// TestGetOrCreateOriginNodeRejectsInvalidName pins that the node-name
+// rule guards origin creation too — a wire-supplied name that fails
+// nodeNameRE must not land a row. ValidNodeName is the predicate the
+// protocol layer uses to refuse such names up front.
+func TestGetOrCreateOriginNodeRejectsInvalidName(t *testing.T) {
+	dsn := filepath.Join(t.TempDir(), "test.db")
+	s, _ := Open(dsn)
+	defer s.Close()
+
+	if ValidNodeName("../etc") {
+		t.Fatalf("ValidNodeName accepted a traversal-shaped name")
+	}
+	if !ValidNodeName("node-a_2") {
+		t.Fatalf("ValidNodeName rejected a compliant name")
+	}
+	if _, err := s.GetOrCreateOriginNode(context.Background(), "../etc"); err == nil {
+		t.Fatalf("invalid origin node name accepted, want error")
+	}
+}
+
+// TestGetOrCreatePeerNodeUpgradesPlaceholder: a row created from a
+// name-only context (a forwarded origin, or a durability pull before
+// any sync) carries the peer:// placeholder; an operator-configured
+// (trusted) caller presenting an actual endpoint upgrades it in place
+// instead of refusing the collision. Real-endpoint mismatches stay
+// refused.
+func TestGetOrCreatePeerNodeUpgradesPlaceholder(t *testing.T) {
+	dsn := filepath.Join(t.TempDir(), "test.db")
+	s, _ := Open(dsn)
+	defer s.Close()
+	ctx := context.Background()
+
+	seeded, err := s.GetOrCreateOriginNode(ctx, "nas")
+	if err != nil {
+		t.Fatalf("GetOrCreateOriginNode: %v", err)
+	}
+	upgraded, err := s.GetOrCreatePeerNode(ctx, "nas", "https://nas.example:8443", true)
+	if err != nil {
+		t.Fatalf("GetOrCreatePeerNode after placeholder: %v", err)
+	}
+	if upgraded.ID != seeded.ID {
+		t.Fatalf("upgrade created a new row: %d → %d", seeded.ID, upgraded.ID)
+	}
+	if upgraded.Endpoint.String != "https://nas.example:8443" {
+		t.Fatalf("endpoint = %q, want the upgraded real endpoint", upgraded.Endpoint.String)
+	}
+	persisted, _ := s.GetNodeByName(ctx, "nas")
+	if persisted.Endpoint.String != "https://nas.example:8443" {
+		t.Fatalf("persisted endpoint = %q, want the upgrade written through", persisted.Endpoint.String)
+	}
+
+	if _, err := s.GetOrCreatePeerNode(ctx, "nas", "https://other.example", true); err == nil {
+		t.Fatalf("real-endpoint mismatch accepted, want refusal")
+	}
+}
+
+// TestGetOrCreatePeerNodeUntrustedKeepsPlaceholder is the #110b guard:
+// an untrusted caller (allowEndpointUpgrade=false, the receiver-side
+// /begin path whose endpoint derives from wire input) must not rebind a
+// placeholder row to a presented endpoint. The placeholder stays put so
+// a peer cannot point an arbitrary node-name's dial-back URL at an
+// attacker address; the existing row is returned unchanged.
+func TestGetOrCreatePeerNodeUntrustedKeepsPlaceholder(t *testing.T) {
+	dsn := filepath.Join(t.TempDir(), "test.db")
+	s, _ := Open(dsn)
+	defer s.Close()
+	ctx := context.Background()
+
+	seeded, err := s.GetOrCreateOriginNode(ctx, "nas")
+	if err != nil {
+		t.Fatalf("GetOrCreateOriginNode: %v", err)
+	}
+	got, err := s.GetOrCreatePeerNode(ctx, "nas", "https://attacker.example:8443", false)
+	if err != nil {
+		t.Fatalf("GetOrCreatePeerNode untrusted: %v", err)
+	}
+	if got.ID != seeded.ID {
+		t.Fatalf("untrusted call created a new row: %d → %d", seeded.ID, got.ID)
+	}
+	if got.Endpoint.String != "peer://nas" {
+		t.Fatalf("returned endpoint = %q, want the untouched peer://nas placeholder", got.Endpoint.String)
+	}
+	persisted, _ := s.GetNodeByName(ctx, "nas")
+	if persisted.Endpoint.String != "peer://nas" {
+		t.Fatalf("persisted endpoint = %q, want the placeholder left in place", persisted.Endpoint.String)
+	}
+}
diff --git a/store/offload_test.go b/store/offload_test.go
new file mode 100644
index 0000000..cc17d57
--- /dev/null
+++ b/store/offload_test.go
@@ -0,0 +1,159 @@
+package store
+
+import (
+	"context"
+	"testing"
+)
+
+// upsertPresent writes one 'present' row and returns it re-read, so
+// tests get the resolved ContentID.
+func upsertPresent(t *testing.T, s *Store, volumeID, runID int64, relPath string, d byte) FileRow {
+	t.Helper()
+	ctx := context.Background()
+	r := FileRow{
+		VolumeID: volumeID, Path: relPath, Blake3: digest(d), SizeBytes: 10,
+		MtimeNs: 1, Status: StatusPresent,
+		FirstSeenRunID: runID, LastSeenRunID: runID, IndexedAtNs: 100,
+	}
+	if err := s.Upsert(ctx, r, nil); err != nil {
+		t.Fatalf("Upsert %s: %v", relPath, err)
+	}
+	got, err := s.GetByPath(ctx, volumeID, relPath)
+	if err != nil {
+		t.Fatalf("GetByPath %s: %v", relPath, err)
+	}
+	return got
+}
+
+// TestMarkOffloaded: the present → offloaded flip stamps
+// last_seen_run_id with the offload run, preserves first_seen_run_id
+// and the content binding, and updates the folder's live aggregates in
+// the same transaction (the offloaded file leaves the live set).
+func TestMarkOffloaded(t *testing.T) {
+	s := openTestStore(t)
+	ctx := context.Background()
+	vID := makeVolume(t, s, "/v")
+	indexRun := makeRun(t, s, vID)
+	row := upsertPresent(t, s, vID, indexRun, "sub/a.txt", 0x01)
+	upsertPresent(t, s, vID, indexRun, "sub/b.txt", 0x02)
+
+	offloadRun := makeRun(t, s, vID)
+	if err := s.MarkOffloaded(ctx, vID, "sub/a.txt", row.ContentID, offloadRun); err != nil {
+		t.Fatalf("MarkOffloaded: %v", err)
+	}
+
+	got, err := s.GetByPath(ctx, vID, "sub/a.txt")
+	if err != nil {
+		t.Fatalf("GetByPath after flip: %v", err)
+	}
+	if got.Status != StatusOffloaded {
+		t.Fatalf("status = %q, want offloaded", got.Status)
+	}
+	if got.LastSeenRunID != offloadRun {
+		t.Fatalf("last_seen_run_id = %d, want offload run %d", got.LastSeenRunID, offloadRun)
+	}
+	if got.FirstSeenRunID != row.FirstSeenRunID || got.ContentID != row.ContentID {
+		t.Fatalf("first_seen/content changed: got %+v, want %+v", got, row)
+	}
+
+	folder, err := s.GetFolderByPath(ctx, vID, "sub")
+	if err != nil {
+		t.Fatalf("GetFolderByPath: %v", err)
+	}
+	if folder.FileCount != 1 {
+		t.Fatalf("folder file_count = %d, want 1 (only b.txt stays live)", folder.FileCount)
+	}
+}
+
+// TestMarkOffloadedRefusals: the flip only ever applies to the exact
+// live 'present' (folder, name, content) row the caller verified. A
+// row in any other status, a different content id, or an unknown path
+// must error instead of mislabelling.
+func TestMarkOffloadedRefusals(t *testing.T) {
+	s := openTestStore(t)
+	ctx := context.Background()
+	vID := makeVolume(t, s, "/v")
+	run := makeRun(t, s, vID)
+	row := upsertPresent(t, s, vID, run, "a.txt", 0x01)
+
+	if err := s.MarkOffloaded(ctx, vID, "a.txt", row.ContentID+99, run); err == nil {
+		t.Fatalf("MarkOffloaded with wrong content id succeeded, want error")
+	}
+	if err := s.MarkOffloaded(ctx, vID, "nope/missing.txt", row.ContentID, run); err == nil {
+		t.Fatalf("MarkOffloaded for unknown path succeeded, want error")
+	}
+
+	if err := s.MarkOffloaded(ctx, vID, "a.txt", row.ContentID, run); err != nil {
+		t.Fatalf("first MarkOffloaded: %v", err)
+	}
+	if err := s.MarkOffloaded(ctx, vID, "a.txt", row.ContentID, run); err == nil {
+		t.Fatalf("second MarkOffloaded on an offloaded row succeeded, want error")
+	}
+}
+
+// TestBeginOffloadRunIfClear: offload defers to every other run kind on
+// the volume — any 'running' row blocks, finished rows clear the gate,
+// and the inserted row is kind='offload' with destination NULL.
+func TestBeginOffloadRunIfClear(t *testing.T) {
+	s := openTestStore(t)
+	ctx := context.Background()
+	vID := makeVolume(t, s, "/v")
+
+	indexRun, err := s.BeginIndexRun(ctx, RunKindIndex, vID, false)
+	if err != nil {
+		t.Fatalf("BeginIndexRun: %v", err)
+	}
+	id, blocker, err := s.BeginOffloadRunIfClear(ctx, vID)
+	if err != nil {
+		t.Fatalf("BeginOffloadRunIfClear: %v", err)
+	}
+	if id != 0 || blocker == nil || blocker.ID != indexRun {
+		t.Fatalf("running index run did not block: id=%d blocker=%+v", id, blocker)
+	}
+	if err := s.FinishRun(ctx, indexRun, RunStatusSuccess, "", 0); err != nil {
+		t.Fatalf("FinishRun index: %v", err)
+	}
+
+	id, blocker, err = s.BeginOffloadRunIfClear(ctx, vID)
+	if err != nil {
+		t.Fatalf("BeginOffloadRunIfClear after finish: %v", err)
+	}
+	if blocker != nil || id == 0 {
+		t.Fatalf("clear volume refused: id=%d blocker=%+v", id, blocker)
+	}
+	run, err := s.GetRun(ctx, id)
+	if err != nil {
+		t.Fatalf("GetRun: %v", err)
+	}
+	if run.Kind != RunKindOffload || run.Destination.Valid || run.Status != RunStatusRunning {
+		t.Fatalf("offload run = %+v, want kind=offload destination=NULL status=running", run)
+	}
+
+	id2, blocker, err := s.BeginOffloadRunIfClear(ctx, vID)
+	if err != nil {
+		t.Fatalf("BeginOffloadRunIfClear while offload running: %v", err)
+	}
+	if id2 != 0 || blocker == nil || blocker.Kind != RunKindOffload {
+		t.Fatalf("running offload run did not block: id=%d blocker=%+v", id2, blocker)
+	}
+}
+
+// TestBeginOffloadRunIfClearScopedToVolume: a running run on another
+// volume never blocks this volume's offload.
+func TestBeginOffloadRunIfClearScopedToVolume(t *testing.T) {
+	s := openTestStore(t)
+	ctx := context.Background()
+	aID := makeVolume(t, s, "/a")
+	bID := makeVolume(t, s, "/b")
+	if _, err := s.BeginIndexRun(ctx, RunKindIndex, aID, false); err != nil {
+		t.Fatalf("BeginIndexRun: %v", err)
+	}
+
+	id, blocker, err := s.BeginOffloadRunIfClear(ctx, bID)
+	if err != nil {
+		t.Fatalf("BeginOffloadRunIfClear: %v", err)
+	}
+	if blocker != nil || id == 0 {
+		t.Fatalf("other volume's run blocked offload: id=%d blocker=%+v", id, blocker)
+	}
+}
diff --git a/store/path_delta.go b/store/path_delta.go
new file mode 100644
index 0000000..b222207
--- /dev/null
+++ b/store/path_delta.go
@@ -0,0 +1,55 @@
+package store
+
+import (
+	"context"
+)
+
+// PathDelta is one path-level state change, as exported into a
+// content-addressed destination's manifest segment: the volume-relative
+// path, its content coordinates, and the status the change left the row
+// in. Status is one of the files statuses — 'present' (the path's
+// current content), 'superseded' (an outgoing content observation),
+// 'missing' or 'offloaded' (the path's current content with its local
+// bytes gone — unexpectedly or intentionally).
+type PathDelta struct {
+	Path      string
+	ContentID int64
+	Blake3    []byte // raw 32-byte BLAKE3-256 digest
+	SizeBytes int64
+	MtimeNs   int64
+	Status    string
+}
+
+// reservedSubtreeFilter excludes the squirrel-reserved sync subtrees
+// from a files read (table alias fo). Content under them never travels
+// to a destination, so destination-facing reads — the durability vector
+// and the manifest delta — must not see it.
+const reservedSubtreeFilter = `fo.path != '.squirrel-history' AND fo.path NOT LIKE '.squirrel-history/%'
+	AND fo.path != '.squirrel-conflicts' AND fo.path NOT LIKE '.squirrel-conflicts/%'
+	AND fo.path != '.squirrel-restore-history' AND fo.path NOT LIKE '.squirrel-restore-history/%'
+	AND fo.path != '.squirrel-index' AND fo.path NOT LIKE '.squirrel-index/%'`
+
+// ListPathDeltaSince returns every row in the volume whose status last
+// changed after sinceRunID, ordered by (path, status) so the export is
+// deterministic. sinceRunID = 0 reads the volume's full recorded state.
+// The reserved sync subtrees are excluded — they never travel to a
+// destination. status_changed_run_id is maintained by every status
+// writer from v18 on; the COALESCE covers rows a pre-v18 binary wrote
+// after the backfill ran (their insert run is the one recorded
+// coordinate).
+func (s *Store) ListPathDeltaSince(ctx context.Context, volumeID, sinceRunID int64) ([]PathDelta, error) {
+	return queryRows(ctx, s.db, `
+		SELECT `+pathFromFolderAndName+`, f.content_id, c.blake3, c.size_bytes, f.mtime_ns, f.status
+		FROM `+fileFromJoin+`
+		WHERE fo.volume_id = ?
+		  AND COALESCE(f.status_changed_run_id, f.first_seen_run_id) > ?
+		  AND `+reservedSubtreeFilter+`
+		ORDER BY `+pathFromFolderAndName+`, f.status
+	`, scanPathDelta, volumeID, sinceRunID)
+}
+
+func scanPathDelta(s rowScanner) (PathDelta, error) {
+	var d PathDelta
+	err := s.Scan(&d.Path, &d.ContentID, &d.Blake3, &d.SizeBytes, &d.MtimeNs, &d.Status)
+	return d, err
+}
diff --git a/store/path_delta_test.go b/store/path_delta_test.go
new file mode 100644
index 0000000..97f3948
--- /dev/null
+++ b/store/path_delta_test.go
@@ -0,0 +1,173 @@
+package store
+
+import (
+	"context"
+	"database/sql"
+	"errors"
+	"testing"
+)
+
+// deltaKey flattens a PathDelta to the (path, status) pair assertions
+// key on.
+type deltaKey struct{ path, status string }
+
+func deltaKeys(delta []PathDelta) []deltaKey {
+	out := make([]deltaKey, 0, len(delta))
+	for _, d := range delta {
+		out = append(out, deltaKey{d.Path, d.Status})
+	}
+	return out
+}
+
+// TestListPathDeltaSince drives one volume through two "sync epochs"
+// and checks the delta read returns exactly the rows whose status
+// changed after the watermark: the full state at watermark 0, and the
+// add/supersede/missing slice after the second index pass.
+func TestListPathDeltaSince(t *testing.T) {
+	s := openTestStore(t)
+	ctx := context.Background()
+	vID := makeVolume(t, s, "/v")
+
+	upsert := func(runID int64, path string, digestByte byte) {
+		t.Helper()
+		if err := s.Upsert(ctx, FileRow{
+			VolumeID: vID, Path: path, Blake3: digest(digestByte), SizeBytes: 1,
+			MtimeNs: runID, Status: StatusPresent,
+			FirstSeenRunID: runID, LastSeenRunID: runID, IndexedAtNs: runID,
+		}, nil); err != nil {
+			t.Fatalf("Upsert %s: %v", path, err)
+		}
+	}
+
+	r1 := makeRun(t, s, vID)
+	upsert(r1, "a.txt", 0xaa)
+	upsert(r1, "c.txt", 0xcc)
+
+	full, err := s.ListPathDeltaSince(ctx, vID, 0)
+	if err != nil {
+		t.Fatalf("ListPathDeltaSince(0): %v", err)
+	}
+	wantFull := []deltaKey{{"a.txt", StatusPresent}, {"c.txt", StatusPresent}}
+	if got := deltaKeys(full); len(got) != 2 || got[0] != wantFull[0] || got[1] != wantFull[1] {
+		t.Fatalf("full delta = %v, want %v", got, wantFull)
+	}
+
+	// Second epoch: a.txt changes content, c.txt disappears, d.txt is
+	// new. TouchSeen on the changed rows mirrors what a real index run
+	// does for unchanged files (none here beyond the upserts).
+	r2 := makeRun(t, s, vID)
+	upsert(r2, "a.txt", 0xab)
+	upsert(r2, "d.txt", 0xdd)
+	if _, err := s.MarkMissing(ctx, vID, r2); err != nil {
+		t.Fatalf("MarkMissing: %v", err)
+	}
+
+	delta, err := s.ListPathDeltaSince(ctx, vID, r1)
+	if err != nil {
+		t.Fatalf("ListPathDeltaSince(%d): %v", r1, err)
+	}
+	want := []deltaKey{
+		{"a.txt", StatusPresent},
+		{"a.txt", StatusSuperseded},
+		{"c.txt", StatusMissing},
+		{"d.txt", StatusPresent},
+	}
+	got := deltaKeys(delta)
+	if len(got) != len(want) {
+		t.Fatalf("delta = %v, want %v", got, want)
+	}
+	for i := range want {
+		if got[i] != want[i] {
+			t.Fatalf("delta[%d] = %v, want %v", i, got[i], want[i])
+		}
+	}
+
+	// A watermark past every change yields the empty delta.
+	empty, err := s.ListPathDeltaSince(ctx, vID, r2)
+	if err != nil {
+		t.Fatalf("ListPathDeltaSince(%d): %v", r2, err)
+	}
+	if len(empty) != 0 {
+		t.Fatalf("delta past every change = %v, want empty", deltaKeys(empty))
+	}
+}
+
+// TestListPathDeltaSinceExcludesReservedSubtrees: rows under the
+// squirrel-reserved directories never travel to a destination, so they
+// must not surface in the manifest delta either.
+func TestListPathDeltaSinceExcludesReservedSubtrees(t *testing.T) {
+	s := openTestStore(t)
+	ctx := context.Background()
+	vID := makeVolume(t, s, "/v")
+	r1 := makeRun(t, s, vID)
+
+	reserved := []string{
+		".squirrel-history/run-1/old.txt",
+		".squirrel-conflicts/run-2/x.txt",
+		".squirrel-restore-history/run-3/y.txt",
+		".squirrel-index/index-snapshot.db",
+	}
+	for i, p := range reserved {
+		if err := s.Upsert(ctx, FileRow{
+			VolumeID: vID, Path: p, Blake3: digest(byte(0x50 + i)), SizeBytes: 1,
+			MtimeNs: 1, Status: StatusPresent,
+			FirstSeenRunID: r1, LastSeenRunID: r1, IndexedAtNs: 1,
+		}, nil); err != nil {
+			t.Fatalf("Upsert %s: %v", p, err)
+		}
+	}
+	if err := s.Upsert(ctx, FileRow{
+		VolumeID: vID, Path: "real.txt", Blake3: digest(0x60), SizeBytes: 1,
+		MtimeNs: 1, Status: StatusPresent,
+		FirstSeenRunID: r1, LastSeenRunID: r1, IndexedAtNs: 1,
+	}, nil); err != nil {
+		t.Fatalf("Upsert real.txt: %v", err)
+	}
+
+	delta, err := s.ListPathDeltaSince(ctx, vID, 0)
+	if err != nil {
+		t.Fatalf("ListPathDeltaSince: %v", err)
+	}
+	if len(delta) != 1 || delta[0].Path != "real.txt" {
+		t.Fatalf("delta = %v, want only real.txt", deltaKeys(delta))
+	}
+}
+
+// TestLatestSuccessfulSyncRun pins the watermark choice: only a
+// status='success' sync of the same (volume, destination) counts —
+// failed and partial runs left no confirmed segment, and other
+// destinations' successes are someone else's watermark.
+func TestLatestSuccessfulSyncRun(t *testing.T) {
+	s := openTestStore(t)
+	ctx := context.Background()
+	vID := makeVolume(t, s, "/v")
+
+	if _, err := s.LatestSuccessfulSyncRun(ctx, vID, "offsite"); !errors.Is(err, sql.ErrNoRows) {
+		t.Fatalf("err = %v, want sql.ErrNoRows before any sync", err)
+	}
+
+	finish := func(dest, status string) int64 {
+		t.Helper()
+		id, err := s.BeginRun(ctx, RunKindSync, vID, dest, true)
+		if err != nil {
+			t.Fatalf("BeginRun: %v", err)
+		}
+		if err := s.FinishRun(ctx, id, status, "", 0); err != nil {
+			t.Fatalf("FinishRun: %v", err)
+		}
+		return id
+	}
+
+	want := finish("offsite", RunStatusSuccess)
+	finish("offsite", RunStatusFailed)
+	finish("offsite", RunStatusPartial)
+	finish("other", RunStatusSuccess)
+
+	got, err := s.LatestSuccessfulSyncRun(ctx, vID, "offsite")
+	if err != nil {
+		t.Fatalf("LatestSuccessfulSyncRun: %v", err)
+	}
+	if got.ID != want {
+		t.Fatalf("watermark run = %d, want %d", got.ID, want)
+	}
+}
diff --git a/store/peer_sync.go b/store/peer_sync.go
index 9a21d20..1aab585 100644
--- a/store/peer_sync.go
+++ b/store/peer_sync.go
@@ -21,15 +21,16 @@ type PeerSyncState struct {
 	LastSyncedAtNs int64
 }

-// ErrWatermarkRewind is returned by UpsertPeerSyncState when the
-// supplied last_shared_run_id is below the watermark already recorded
-// for the (volume, peer) pair and allowRewind was not set. The watermark
-// is meant to advance monotonically — a backwards move usually signals a
-// misordered close or a hostile peer claiming a run id we never agreed
-// to, and silently accepting it would re-anchor drift detection against
-// the bad value (SAFETY-AUDIT H6). Genuine recovery passes allowRewind
-// to override. Matchable via errors.Is.
-var ErrWatermarkRewind = errors.New("peer-sync watermark would move backwards")
+// ErrWatermarkRewind is the shared sentinel for a refused backwards
+// watermark move, returned (wrapped) by UpsertPeerSyncState and
+// UpsertDestinationRunID when the supplied value is below the one
+// already recorded and allowRewind was not set. Watermarks are meant to
+// advance monotonically — a backwards move usually signals a misordered
+// close or a hostile peer claiming a run id we never agreed to, and
+// silently accepting it would re-anchor drift detection against the bad
+// value (SAFETY-AUDIT H6). Genuine recovery passes allowRewind to
+// override. Matchable via errors.Is.
+var ErrWatermarkRewind = errors.New("watermark would move backwards")

 // WatermarkRewindError carries the rejected and current watermarks
 // alongside ErrWatermarkRewind so a caller (or CLI) can report exactly
diff --git a/store/peer_sync_test.go b/store/peer_sync_test.go
index 6907cb2..b915f3f 100644
--- a/store/peer_sync_test.go
+++ b/store/peer_sync_test.go
@@ -16,7 +16,7 @@ func TestUpsertPeerSyncStateWritesHistory(t *testing.T) {
 	s := openTestStore(t)
 	ctx := context.Background()
 	vID := makeVolume(t, s, "/v")
-	peer, err := s.GetOrCreatePeerNode(ctx, "nas", "http://nas.example")
+	peer, err := s.GetOrCreatePeerNode(ctx, "nas", "http://nas.example", true)
 	if err != nil {
 		t.Fatalf("GetOrCreatePeerNode: %v", err)
 	}
@@ -61,7 +61,7 @@ func TestUpsertPeerSyncStateRefusesRewind(t *testing.T) {
 	s := openTestStore(t)
 	ctx := context.Background()
 	vID := makeVolume(t, s, "/v")
-	peer, _ := s.GetOrCreatePeerNode(ctx, "nas", "http://nas.example")
+	peer, _ := s.GetOrCreatePeerNode(ctx, "nas", "http://nas.example", true)
 	if err := s.UpsertPeerSyncState(ctx, vID, peer.ID, 42, false); err != nil {
 		t.Fatalf("seed watermark: %v", err)
@@ -97,7 +97,7 @@ func TestUpsertPeerSyncStateAllowRewind(t *testing.T) {
 	s := openTestStore(t)
 	ctx := context.Background()
 	vID := makeVolume(t, s, "/v")
-	peer, _ := s.GetOrCreatePeerNode(ctx, "nas", "http://nas.example")
+	peer, _ := s.GetOrCreatePeerNode(ctx, "nas", "http://nas.example", true)
 	if err := s.UpsertPeerSyncState(ctx, vID, peer.ID, 42, false); err != nil {
 		t.Fatalf("seed watermark: %v", err)
@@ -124,7 +124,7 @@ func TestSetCorrelatedRunIDWritesAudit(t *testing.T) {
 	s := openTestStore(t)
 	ctx := context.Background()
 	vID := makeVolume(t, s, "/v")
-	peer, _ := s.GetOrCreatePeerNode(ctx, "nas", "http://nas.example")
+	peer, _ := s.GetOrCreatePeerNode(ctx, "nas", "http://nas.example", true)
 	// Use the initiator's real path: BeginSyncRunIfClear leaves
 	// correlated_run_id NULL, so the first stamp transitions none->value.
 	runID, blocker, err := s.BeginSyncRunIfClear(ctx, SyncRunSpec{
@@ -222,11 +222,12 @@ func TestMigrateV11ToV12CreatesPeerSyncHistory(t *testing.T) {
 	}
 }

-// v11SchemaDDL returns the minimal v11-shape DDL the v12 migration needs:
-// the FK targets (volumes, nodes) for peer_sync_state_history, plus a
-// self-row peer so the upsert's node FK resolves, and the prior tables
-// the open path expects to find. Only the columns the migration and the
-// post-migration insert touch are modelled.
+// v11SchemaDDL returns the minimal v11-shape DDL the v12 migration needs
+// — the FK targets (volumes, nodes) for peer_sync_state_history, plus a
+// self-row peer so the upsert's node FK resolves — and empty v11-shape
+// files and runs tables so the later chain (the v14 contents split and
+// the v15 runs rebuild) has its inputs. Only the columns the migrations
+// and the post-migration insert touch are modelled.
 func v11SchemaDDL() []string {
 	return []string{
 		`CREATE TABLE schema_version (version INTEGER NOT NULL PRIMARY KEY)`,
@@ -244,6 +245,34 @@ func v11SchemaDDL() []string {
 			last_synced_at INTEGER NOT NULL,
 			PRIMARY KEY (volume_id, peer_node_id)
 		)`,
+		`CREATE TABLE runs (
+			id INTEGER PRIMARY KEY,
+			kind TEXT NOT NULL CHECK (kind IN ('index','sync','restore','audit')),
+			volume_id INTEGER REFERENCES volumes(id),
+			destination TEXT,
+			started_at_ns INTEGER NOT NULL,
+			ended_at_ns INTEGER,
+			status TEXT NOT NULL CHECK (status IN ('running','success','failed','partial')),
+			error TEXT,
+			file_count INTEGER NOT NULL DEFAULT 0,
+			peer_node_id INTEGER,
+			correlated_run_id INTEGER,
+			shallow INTEGER CHECK (shallow IS NULL OR shallow IN (0, 1))
+		)`,
+		`CREATE TABLE files (
+			folder_id INTEGER NOT NULL,
+			name TEXT NOT NULL,
+			blake3 BLOB NOT NULL CHECK (length(blake3) = 32),
+			size_bytes INTEGER NOT NULL,
+			mtime_ns INTEGER NOT NULL,
+			status TEXT NOT NULL CHECK (status IN ('present','missing','superseded')),
+			first_seen_run_id INTEGER NOT NULL,
+			last_seen_run_id INTEGER NOT NULL,
+			indexed_at_ns INTEGER NOT NULL,
+			source_node_id INTEGER,
+			source_run_id INTEGER,
+			PRIMARY KEY (folder_id, name, blake3)
+		)`,
 		`INSERT INTO schema_version (version) VALUES (11)`,
 		`INSERT INTO volumes (id, name, path) VALUES (1, 'v', '/v')`,
 		`INSERT INTO nodes (id, name, endpoint) VALUES (2, 'nas', 'http://nas.example')`,
diff --git a/store/remote_objects.go b/store/remote_objects.go
new file mode 100644
index 0000000..5b78a2c
--- /dev/null
+++ b/store/remote_objects.go
@@ -0,0 +1,177 @@
+package store
+
+import (
+	"context"
+	"database/sql"
+	"errors"
+	"fmt"
+)
+
+// RemoteObject is the per-(content, destination) upload record for
+// destinations whose stored bytes can't be cheaply re-read. The row is
+// written once at upload time; ChecksumAlgo and Checksum carry the
+// provider's own checksum for the uploaded object, compared verbatim
+// ("value then" vs "value now") on later verification passes. The pair
+// is NULL together while the fingerprint is still pending — the upload
+// happened, the scan-back pass hasn't filled the provider checksum in
+// yet (a CHECK in the schema keeps the two columns paired).
+// UploadedRunID references the local run that performed the upload;
+// VerifiedAtNs is NULL until the first re-verification confirms the
+// object unchanged.
+type RemoteObject struct {
+	ContentID     int64
+	Destination   string
+	UploadedRunID int64
+	ChecksumAlgo  sql.NullString
+	Checksum      sql.NullString
+	VerifiedAtNs  sql.NullInt64
+}
+
+// InsertRemoteObject records one freshly uploaded object, with its
+// fingerprint when the caller already holds one and with the checksum
+// pair NULL when the fingerprint comes later. Content is uploaded at
+// most once per destination (the offsite layout is content-addressed
+// and append-only), so a second insert for the same (content,
+// destination) fails on the primary key rather than silently replacing
+// the record future verifications compare against.
+func (s *Store) InsertRemoteObject(ctx context.Context, o RemoteObject) error {
+	if o.Destination == "" {
+		return fmt.Errorf("InsertRemoteObject: destination must be non-empty")
+	}
+	if o.ChecksumAlgo.Valid != o.Checksum.Valid {
+		return fmt.Errorf("InsertRemoteObject: checksum_algo and checksum must be set together")
+	}
+	if o.ChecksumAlgo.Valid && (o.ChecksumAlgo.String == "" || o.Checksum.String == "") {
+		return fmt.Errorf("InsertRemoteObject: checksum_algo and checksum must be non-empty when set")
+	}
+	_, err := s.db.ExecContext(ctx, `
+		INSERT INTO remote_objects (content_id, destination, uploaded_run_id, checksum_algo, checksum, verified_at_ns)
+		VALUES (?, ?, ?, ?, ?, ?)
+	`, o.ContentID, o.Destination, o.UploadedRunID, o.ChecksumAlgo, o.Checksum, o.VerifiedAtNs)
+	if err != nil {
+		return fmt.Errorf("insert remote object: %w", err)
+	}
+	return nil
+}
+
+// SetRemoteObjectChecksum fills the pending checksum pair on an upload
+// record: the scan-back fingerprint read from the provider after the
+// upload was confirmed. Only a NULL pair is filled — the recorded
+// fingerprint is what every later verification compares against, so a
+// second write for the same (content, destination) fails instead of
+// replacing it, and a missing record errors as a caller bug.
+func (s *Store) SetRemoteObjectChecksum(ctx context.Context, contentID int64, destination, algo, checksum string) error {
+	if algo == "" || checksum == "" {
+		return fmt.Errorf("SetRemoteObjectChecksum: algo and checksum must be non-empty")
+	}
+	res, err := s.db.ExecContext(ctx, `
+		UPDATE remote_objects SET checksum_algo = ?, checksum = ?
+		WHERE content_id = ? AND destination = ?
+		  AND checksum_algo IS NULL AND checksum IS NULL
+	`, algo, checksum, contentID, destination)
+	if err != nil {
+		return fmt.Errorf("set remote object checksum: %w", err)
+	}
+	n, err := res.RowsAffected()
+	if err != nil {
+		return fmt.Errorf("set remote object checksum rows: %w", err)
+	}
+	if n == 0 {
+		if _, getErr := s.GetRemoteObject(ctx, contentID, destination); errors.Is(getErr, sql.ErrNoRows) {
+			return fmt.Errorf("set remote object checksum: no remote object for content %d on %q", contentID, destination)
+		} else if getErr != nil {
+			return fmt.Errorf("set remote object checksum: %w", getErr)
+		}
+		return fmt.Errorf("set remote object checksum: content %d on %q already has a recorded fingerprint", contentID, destination)
+	}
+	return nil
+}
+
+// RemoteObjectRecord pairs an upload record with its content hash; the
+// destination-side object key is the lowercase hex of Blake3.
+type RemoteObjectRecord struct {
+	RemoteObject
+	Blake3 []byte
+}
+
+// ListRemoteObjects returns every upload record for the destination with
+// the content hash joined in, ordered by hash so verification output is
+// deterministic.
+func (s *Store) ListRemoteObjects(ctx context.Context, destination string) ([]RemoteObjectRecord, error) {
+	rows, err := s.db.QueryContext(ctx, `
+		SELECT r.content_id, r.destination, r.uploaded_run_id,
+		       r.checksum_algo, r.checksum, r.verified_at_ns, c.blake3
+		FROM remote_objects r
+		JOIN contents c ON c.id = r.content_id
+		WHERE r.destination = ?
+		ORDER BY c.blake3
+	`, destination)
+	if err != nil {
+		return nil, fmt.Errorf("list remote objects for %q: %w", destination, err)
+	}
+	defer rows.Close()
+	var out []RemoteObjectRecord
+	for rows.Next() {
+		var r RemoteObjectRecord
+		if err := rows.Scan(&r.ContentID, &r.Destination, &r.UploadedRunID,
+			&r.ChecksumAlgo, &r.Checksum, &r.VerifiedAtNs, &r.Blake3); err != nil {
+			return nil, fmt.Errorf("scan remote object row: %w", err)
+		}
+		out = append(out, r)
+	}
+	return out, rows.Err()
+}
+
+// GetRemoteObject returns the upload record for one (content,
+// destination), or sql.ErrNoRows when the content was never recorded as
+// uploaded there.
+func (s *Store) GetRemoteObject(ctx context.Context, contentID int64, destination string) (RemoteObject, error) {
+	var o RemoteObject
+	err := s.db.QueryRowContext(ctx, `
+		SELECT content_id, destination, uploaded_run_id, checksum_algo, checksum, verified_at_ns
+		FROM remote_objects
+		WHERE content_id = ? AND destination = ?
+	`, contentID, destination).
+		Scan(&o.ContentID, &o.Destination, &o.UploadedRunID, &o.ChecksumAlgo, &o.Checksum, &o.VerifiedAtNs)
+	return o, err
+}
+
+// HasRemoteObject reports whether an upload record exists for the
+// (content, destination) pair. The content-addressed push uses it as
+// its upload-once gate: a recorded content hash is skipped, fingerprint
+// present or pending.
+func (s *Store) HasRemoteObject(ctx context.Context, contentID int64, destination string) (bool, error) {
+	var one int
+	err := s.db.QueryRowContext(ctx, `
+		SELECT 1 FROM remote_objects WHERE content_id = ? AND destination = ?
+	`, contentID, destination).Scan(&one)
+	if errors.Is(err, sql.ErrNoRows) {
+		return false, nil
+	}
+	if err != nil {
+		return false, fmt.Errorf("lookup remote object: %w", err)
+	}
+	return true, nil
+}
+
+// MarkRemoteObjectVerified stamps verified_at_ns after a verification
+// pass re-read the provider checksum and found it equal to the recorded
+// one. Returns an error when no record exists for the pair — verifying
+// an unrecorded upload is a caller bug.
+func (s *Store) MarkRemoteObjectVerified(ctx context.Context, contentID int64, destination string, atNs int64) error {
+	res, err := s.db.ExecContext(ctx, `
+		UPDATE remote_objects SET verified_at_ns = ?
+		WHERE content_id = ? AND destination = ?
+	`, atNs, contentID, destination)
+	if err != nil {
+		return fmt.Errorf("mark remote object verified: %w", err)
+	}
+	n, err := res.RowsAffected()
+	if err != nil {
+		return fmt.Errorf("mark remote object verified rows: %w", err)
+	}
+	if n == 0 {
+		return fmt.Errorf("mark remote object verified: no remote object for content %d on %q", contentID, destination)
+	}
+	return nil
+}
diff --git a/store/remote_objects_test.go b/store/remote_objects_test.go
new file mode 100644
index 0000000..86790e8
--- /dev/null
+++ b/store/remote_objects_test.go
@@ -0,0 +1,340 @@
+package store
+
+import (
+	"context"
+	"database/sql"
+	"errors"
+	"testing"
+)
+
+// remoteObjectFixture upserts one file so a contents row exists, and
+// returns its content id plus the run that observed it.
+func remoteObjectFixture(t *testing.T, s *Store) (contentID, runID int64) { + t.Helper() + ctx := context.Background() + vID := makeVolume(t, s, "/v") + runID = makeRun(t, s, vID) + if err := s.Upsert(ctx, FileRow{ + VolumeID: vID, Path: "a.txt", Blake3: digest(0xaa), SizeBytes: 1, MtimeNs: 1, + Status: StatusPresent, FirstSeenRunID: runID, LastSeenRunID: runID, IndexedAtNs: 1, + }, nil); err != nil { + t.Fatalf("Upsert: %v", err) + } + row, err := s.GetByPath(ctx, vID, "a.txt") + if err != nil { + t.Fatalf("GetByPath: %v", err) + } + return row.ContentID, runID +} + +// nullStr wraps a literal into a valid sql.NullString for fixture +// brevity. +func nullStr(s string) sql.NullString { + return sql.NullString{String: s, Valid: true} +} + +// TestRemoteObjectRoundTrip: insert records the fingerprint verbatim, +// Get returns it, and a verification pass stamps verified_at_ns. +func TestRemoteObjectRoundTrip(t *testing.T) { + s := openTestStore(t) + ctx := context.Background() + contentID, runID := remoteObjectFixture(t, s) + + obj := RemoteObject{ + ContentID: contentID, + Destination: "bucket-a", + UploadedRunID: runID, + ChecksumAlgo: nullStr("etag-md5"), + Checksum: nullStr("9e107d9d372bb6826bd81d3542a419d6"), + } + if err := s.InsertRemoteObject(ctx, obj); err != nil { + t.Fatalf("InsertRemoteObject: %v", err) + } + + got, err := s.GetRemoteObject(ctx, contentID, "bucket-a") + if err != nil { + t.Fatalf("GetRemoteObject: %v", err) + } + if got.ChecksumAlgo != obj.ChecksumAlgo || got.Checksum != obj.Checksum || got.UploadedRunID != runID { + t.Fatalf("round trip = %+v, want %+v", got, obj) + } + if got.VerifiedAtNs.Valid { + t.Fatalf("fresh upload already verified: %+v", got.VerifiedAtNs) + } + + if err := s.MarkRemoteObjectVerified(ctx, contentID, "bucket-a", 12345); err != nil { + t.Fatalf("MarkRemoteObjectVerified: %v", err) + } + got, err = s.GetRemoteObject(ctx, contentID, "bucket-a") + if err != nil { + t.Fatalf("GetRemoteObject after verify: %v", err) + } 
+ if !got.VerifiedAtNs.Valid || got.VerifiedAtNs.Int64 != 12345 { + t.Fatalf("VerifiedAtNs = %+v, want 12345", got.VerifiedAtNs) + } +} + +// TestSetRemoteObjectChecksumFillsPendingPair: the scan-back pass fills +// the NULL pair left by a fingerprint-pending upload, leaving the rest +// of the row untouched. +func TestSetRemoteObjectChecksumFillsPendingPair(t *testing.T) { + s := openTestStore(t) + ctx := context.Background() + contentID, runID := remoteObjectFixture(t, s) + + if err := s.InsertRemoteObject(ctx, RemoteObject{ + ContentID: contentID, Destination: "bucket-a", UploadedRunID: runID, + }); err != nil { + t.Fatalf("InsertRemoteObject: %v", err) + } + if err := s.SetRemoteObjectChecksum(ctx, contentID, "bucket-a", "sha256", "deadbeef"); err != nil { + t.Fatalf("SetRemoteObjectChecksum: %v", err) + } + got, err := s.GetRemoteObject(ctx, contentID, "bucket-a") + if err != nil { + t.Fatalf("GetRemoteObject: %v", err) + } + if got.ChecksumAlgo != nullStr("sha256") || got.Checksum != nullStr("deadbeef") { + t.Fatalf("pair = (%+v, %+v), want (sha256, deadbeef)", got.ChecksumAlgo, got.Checksum) + } + if got.UploadedRunID != runID || got.VerifiedAtNs.Valid { + t.Fatalf("row = %+v, want run %d and no verification stamp", got, runID) + } +} + +// TestSetRemoteObjectChecksumRefusesOverwrite: a recorded fingerprint is +// the comparison baseline for every later verification and must never be +// silently replaced. 
+func TestSetRemoteObjectChecksumRefusesOverwrite(t *testing.T) { + s := openTestStore(t) + ctx := context.Background() + contentID, runID := remoteObjectFixture(t, s) + + if err := s.InsertRemoteObject(ctx, RemoteObject{ + ContentID: contentID, Destination: "bucket-a", UploadedRunID: runID, + ChecksumAlgo: nullStr("sha256"), Checksum: nullStr("original"), + }); err != nil { + t.Fatalf("InsertRemoteObject: %v", err) + } + err := s.SetRemoteObjectChecksum(ctx, contentID, "bucket-a", "sha256", "tampered") + if err == nil { + t.Fatalf("overwrite of a recorded fingerprint succeeded") + } + got, err := s.GetRemoteObject(ctx, contentID, "bucket-a") + if err != nil { + t.Fatalf("GetRemoteObject: %v", err) + } + if got.Checksum != nullStr("original") { + t.Fatalf("checksum = %+v after refused overwrite, want original", got.Checksum) + } +} + +func TestSetRemoteObjectChecksumUnknownPair(t *testing.T) { + s := openTestStore(t) + if err := s.SetRemoteObjectChecksum(context.Background(), 1, "bucket-a", "sha256", "x"); err == nil { + t.Fatalf("checksum for unrecorded upload succeeded, want error") + } +} + +// TestListRemoteObjects: the listing joins the content hash in, filters +// by destination, and orders by hash. 
+func TestListRemoteObjects(t *testing.T) { + s := openTestStore(t) + ctx := context.Background() + vID := makeVolume(t, s, "/v") + runID := makeRun(t, s, vID) + for i, b := range []byte{0xbb, 0xaa} { + path := []string{"b.txt", "a.txt"}[i] + if err := s.Upsert(ctx, FileRow{ + VolumeID: vID, Path: path, Blake3: digest(b), SizeBytes: 1, MtimeNs: 1, + Status: StatusPresent, FirstSeenRunID: runID, LastSeenRunID: runID, IndexedAtNs: 1, + }, nil); err != nil { + t.Fatalf("Upsert %s: %v", path, err) + } + row, err := s.GetByPath(ctx, vID, path) + if err != nil { + t.Fatalf("GetByPath %s: %v", path, err) + } + if err := s.InsertRemoteObject(ctx, RemoteObject{ + ContentID: row.ContentID, Destination: "bucket-a", UploadedRunID: runID, + }); err != nil { + t.Fatalf("InsertRemoteObject %s: %v", path, err) + } + if i == 0 { + if err := s.InsertRemoteObject(ctx, RemoteObject{ + ContentID: row.ContentID, Destination: "bucket-b", UploadedRunID: runID, + ChecksumAlgo: nullStr("sha1"), Checksum: nullStr("ff"), + }); err != nil { + t.Fatalf("InsertRemoteObject bucket-b: %v", err) + } + } + } + + got, err := s.ListRemoteObjects(ctx, "bucket-a") + if err != nil { + t.Fatalf("ListRemoteObjects: %v", err) + } + if len(got) != 2 { + t.Fatalf("len = %d, want 2 (bucket-b row filtered out): %+v", len(got), got) + } + if string(got[0].Blake3) != string(digest(0xaa)) || string(got[1].Blake3) != string(digest(0xbb)) { + t.Fatalf("order = %x, %x; want ascending by hash", got[0].Blake3, got[1].Blake3) + } + if got[0].Destination != "bucket-a" || got[0].ChecksumAlgo.Valid { + t.Fatalf("row = %+v, want pending bucket-a record", got[0]) + } +} + +// TestBeginRemoteVerifyRun: the verification pass rides on a kind='audit' +// run with no volume and no destination, finishable like any other run. 
+func TestBeginRemoteVerifyRun(t *testing.T) { + s := openTestStore(t) + ctx := context.Background() + id, err := s.BeginRemoteVerifyRun(ctx) + if err != nil { + t.Fatalf("BeginRemoteVerifyRun: %v", err) + } + run, err := s.GetRun(ctx, id) + if err != nil { + t.Fatalf("GetRun: %v", err) + } + if run.Kind != RunKindAudit || run.VolumeID.Valid || run.Destination.Valid || run.Status != RunStatusRunning { + t.Fatalf("run = %+v, want a running audit run with NULL volume and destination", run) + } + if err := s.FinishRun(ctx, id, RunStatusSuccess, "", 3); err != nil { + t.Fatalf("FinishRun: %v", err) + } +} + +// TestRemoteObjectInsertRefusesDuplicate: the fingerprint recorded at +// upload time is what later verifications compare against, so a second +// insert for the same (content, destination) must fail loudly instead +// of replacing it. +func TestRemoteObjectInsertRefusesDuplicate(t *testing.T) { + s := openTestStore(t) + ctx := context.Background() + contentID, runID := remoteObjectFixture(t, s) + + obj := RemoteObject{ + ContentID: contentID, Destination: "bucket-a", UploadedRunID: runID, + ChecksumAlgo: nullStr("etag-md5"), Checksum: nullStr("aaaa"), + } + if err := s.InsertRemoteObject(ctx, obj); err != nil { + t.Fatalf("first insert: %v", err) + } + obj.Checksum = nullStr("bbbb") + if err := s.InsertRemoteObject(ctx, obj); err == nil { + t.Fatalf("duplicate insert succeeded; fingerprint silently replaced") + } + got, err := s.GetRemoteObject(ctx, contentID, "bucket-a") + if err != nil { + t.Fatalf("GetRemoteObject: %v", err) + } + if got.Checksum != nullStr("aaaa") { + t.Fatalf("checksum = %+v after refused duplicate, want original %q", got.Checksum, "aaaa") + } +} + +// TestRemoteObjectFingerprintPending: the content-addressed push +// records the upload with the checksum pair NULL; the record gates +// upload-once dedup (HasRemoteObject) until the scan-back pass fills +// the fingerprint in. 
+func TestRemoteObjectFingerprintPending(t *testing.T) { + s := openTestStore(t) + ctx := context.Background() + contentID, runID := remoteObjectFixture(t, s) + + has, err := s.HasRemoteObject(ctx, contentID, "bucket-a") + if err != nil || has { + t.Fatalf("HasRemoteObject before insert = (%t, %v), want (false, nil)", has, err) + } + if err := s.InsertRemoteObject(ctx, RemoteObject{ + ContentID: contentID, Destination: "bucket-a", UploadedRunID: runID, + }); err != nil { + t.Fatalf("InsertRemoteObject (pending fingerprint): %v", err) + } + got, err := s.GetRemoteObject(ctx, contentID, "bucket-a") + if err != nil { + t.Fatalf("GetRemoteObject: %v", err) + } + if got.ChecksumAlgo.Valid || got.Checksum.Valid { + t.Fatalf("pending upload carries a fingerprint: %+v", got) + } + has, err = s.HasRemoteObject(ctx, contentID, "bucket-a") + if err != nil || !has { + t.Fatalf("HasRemoteObject after insert = (%t, %v), want (true, nil)", has, err) + } +} + +// TestRemoteObjectChecksumPairEnforced: a checksum without its +// algorithm (or vice versa) is uninterpretable, refused by both the Go +// validation and the schema CHECK. +func TestRemoteObjectChecksumPairEnforced(t *testing.T) { + s := openTestStore(t) + ctx := context.Background() + contentID, runID := remoteObjectFixture(t, s) + + if err := s.InsertRemoteObject(ctx, RemoteObject{ + ContentID: contentID, Destination: "bucket-a", UploadedRunID: runID, + ChecksumAlgo: nullStr("etag-md5"), + }); err == nil { + t.Fatalf("algo without checksum accepted") + } + if err := s.InsertRemoteObject(ctx, RemoteObject{ + ContentID: contentID, Destination: "bucket-a", UploadedRunID: runID, + Checksum: nullStr("aaaa"), + }); err == nil { + t.Fatalf("checksum without algo accepted") + } + // The schema CHECK is the backstop when a write bypasses + // InsertRemoteObject's validation. 
+ if _, err := s.db.ExecContext(ctx, ` + INSERT INTO remote_objects (content_id, destination, uploaded_run_id, checksum_algo, checksum) + VALUES (?, 'bucket-a', ?, 'etag-md5', NULL) + `, contentID, runID); err == nil { + t.Fatalf("schema CHECK accepted a half-set checksum pair") + } +} + +// TestRemoteObjectFKsEnforced: content_id and uploaded_run_id are real +// FKs — a fingerprint for content or a run the index doesn't know is a +// caller bug. +func TestRemoteObjectFKsEnforced(t *testing.T) { + s := openTestStore(t) + ctx := context.Background() + contentID, runID := remoteObjectFixture(t, s) + + if err := s.InsertRemoteObject(ctx, RemoteObject{ + ContentID: 99999, Destination: "bucket-a", UploadedRunID: runID, + ChecksumAlgo: nullStr("etag-md5"), Checksum: nullStr("aaaa"), + }); err == nil { + t.Fatalf("bogus content id accepted; FK not enforced") + } + if err := s.InsertRemoteObject(ctx, RemoteObject{ + ContentID: contentID, Destination: "bucket-a", UploadedRunID: 99999, + ChecksumAlgo: nullStr("etag-md5"), Checksum: nullStr("aaaa"), + }); err == nil { + t.Fatalf("bogus run id accepted; FK not enforced") + } +} + +// TestMarkRemoteObjectVerifiedUnknownPair: verifying an unrecorded +// upload errors. +func TestMarkRemoteObjectVerifiedUnknownPair(t *testing.T) { + s := openTestStore(t) + ctx := context.Background() + if err := s.MarkRemoteObjectVerified(ctx, 1, "bucket-a", 1); err == nil { + t.Fatalf("verify of unrecorded upload succeeded, want error") + } +} + +// TestGetRemoteObjectNotFound: the missing pair surfaces as +// sql.ErrNoRows so callers share the store's IsNotFound convention. 
+func TestGetRemoteObjectNotFound(t *testing.T) { + s := openTestStore(t) + ctx := context.Background() + _, err := s.GetRemoteObject(ctx, 1, "bucket-a") + if !errors.Is(err, sql.ErrNoRows) { + t.Fatalf("err = %v, want sql.ErrNoRows", err) + } +} diff --git a/store/runs.go b/store/runs.go index 1dcc334..ed48fa0 100644 --- a/store/runs.go +++ b/store/runs.go @@ -11,15 +11,20 @@ import ( // Run kinds. The runs.volume_id column is nullable so a future sync run can // span volumes; index runs are always scoped to a single volume. Sync and // restore runs additionally carry a non-empty runs.destination naming the -// rclone destination; index and audit runs leave destination NULL. Audit -// runs share the index run-kind's shape — they walk a volume root and -// reconcile the index with on-disk reality — but are tagged separately so -// out-of-band drift detections don't dilute the index-run history. +// rclone destination; index, audit, and offload runs leave destination +// NULL. Audit runs share the index run-kind's shape — they walk a volume +// root and reconcile the index with on-disk reality — but are tagged +// separately so out-of-band drift detections don't dilute the index-run +// history. Offload runs record the local "remove on-disk bytes whose +// content is durable on the required destinations" operation; the files +// rows they touch flip 'present' → 'offloaded' with last_seen_run_id set +// to the offload run. const ( RunKindIndex = "index" RunKindSync = "sync" RunKindRestore = "restore" RunKindAudit = "audit" + RunKindOffload = "offload" ) // Run statuses. A run begins in 'running' and is moved to a terminal state by @@ -138,6 +143,28 @@ func (s *Store) BeginIndexRun(ctx context.Context, kind string, volumeID int64, return id, nil } +// BeginRemoteVerifyRun records the start of a remote-object verification +// pass as a kind='audit' run. 
The pass is destination-scoped rather than +// volume-scoped — the content-addressed objects/ space is shared by +// every volume — so volume_id is NULL; and the runs CHECK keeps +// destination NULL on audit rows, so the verified destination is +// recorded in the run's 'verify-destination' runs_audit note instead. +// Callers must pair it with FinishRun. +func (s *Store) BeginRemoteVerifyRun(ctx context.Context) (int64, error) { + res, err := s.db.ExecContext(ctx, ` + INSERT INTO runs (kind, volume_id, destination, started_at_ns, status, file_count) + VALUES ('audit', NULL, NULL, ?, 'running', 0) + `, NowNs()) + if err != nil { + return 0, fmt.Errorf("insert remote-verify run: %w", err) + } + id, err := res.LastInsertId() + if err != nil { + return 0, fmt.Errorf("remote-verify run last insert id: %w", err) + } + return id, nil +} + // BeginPeerSyncRun is BeginRun's sibling for kind='sync' rows tied to a // peer node. It records the (peer_node_id, correlated_run_id) pair // alongside the regular destination name (the peer's name from the @@ -481,12 +508,26 @@ type SyncRunSpec struct { } // BeginSyncRunIfClear atomically inserts a 'running' kind='sync' row for -// (volume, destination) iff no other such row is currently in flight. -// The check and the insert run inside a single BEGIN IMMEDIATE -// transaction (the store's DSN sets `_txlock=immediate`), so two -// concurrent callers cannot both observe "no running run" and both -// insert — the second one's transaction sees the first one's row and -// returns it as the blocker. +// (volume, destination) iff no sync of the same pair and no index or +// audit run on the volume is currently in flight. The check and the +// insert run inside a single BEGIN IMMEDIATE transaction (the store's +// DSN sets `_txlock=immediate`), so two concurrent callers cannot both +// observe "no running run" and both insert — the second one's +// transaction sees the first one's row and returns it as the blocker. 
+// +// Cross-kind exclusion: an in-flight index or audit run on the volume +// blocks a sync. A sync advances the destination's durability vector from +// the present set its own enumeration captured; an index or audit +// committing a new present row concurrently would otherwise let that +// advance claim durability for content the sync never transferred. The +// block is one-directional — a running sync does NOT block a new index +// (see BeginIndexRunIfClear for why) — so the guard lives only on the +// sync side. Syncs to *different* destinations stay free to overlap — +// they touch disjoint vectors. +// +// An in-flight offload also blocks: offload unlinks on-disk bytes the sync +// is enumerating, and offload itself blocks on every kind, so the sync and +// index gates name it too to keep the exclusion symmetric. // // Returns (newID, nil, nil) when the row was inserted; (0, &blocker, // nil) when refused — the caller is expected to render a diagnostic @@ -507,8 +548,11 @@ func (s *Store) BeginSyncRunIfClear(ctx context.Context, spec SyncRunSpec) (int6 row := tx.QueryRowContext(ctx, ` SELECT `+runColumns+` FROM runs - WHERE kind = 'sync' AND status = 'running' - AND volume_id = ? AND destination = ? + WHERE status = 'running' AND volume_id = ? + AND ( + (kind = 'sync' AND destination = ?) + OR kind IN ('index', 'audit', 'offload') + ) ORDER BY id LIMIT 1 `, spec.VolumeID, spec.Destination) blocker, scanErr := scanRun(row.Scan) @@ -539,14 +583,26 @@ func (s *Store) BeginSyncRunIfClear(ctx context.Context, spec SyncRunSpec) (int6 } // BeginIndexRunIfClear atomically inserts a 'running' kind='index' or -// kind='audit' row for volumeID iff no other index- or audit-kind run -// is currently in flight against the same volume. Symmetric to +// kind='audit' row for volumeID iff no index-, audit-, or offload-kind +// run is currently in flight against the same volume. 
Symmetric to // BeginSyncRunIfClear (BEGIN IMMEDIATE + check + insert in one tx) so // two concurrent callers cannot both observe "no running run" and both // insert. Cross-kind: an in-flight 'index' blocks a new 'audit' and // vice versa because both walk the volume and call MarkMissing with // their own run-id — letting them overlap is exactly the bug this -// guards against. +// guards against. An in-flight offload blocks too: it unlinks bytes the +// walk would otherwise observe and flip, and offload defers to every +// kind, so the block is symmetric. +// +// A running sync does not block an index here, while a running index +// does block a new sync (BeginSyncRunIfClear). The asymmetry is +// deliberate: the sync's durability advance is pinned to the present-set +// snapshot it captured at the start (AdvanceDestinationVectorTo), so an +// index committing rows mid-sync cannot be folded into that advance, and +// the agent scheduler's invariant of always indexing before a sync would +// otherwise wedge whenever an unrelated sync is in flight. The guard +// that matters for soundness — keeping an index from mutating the tree +// while a sync captures its enumeration — lives on the sync side. // // Returns (newID, nil, nil) when the row was inserted; (0, &blocker, // nil) when refused. Stale rows from crashed runs keep blocking here @@ -564,7 +620,7 @@ func (s *Store) BeginIndexRunIfClear(ctx context.Context, kind string, volumeID row := tx.QueryRowContext(ctx, ` SELECT `+runColumns+` FROM runs - WHERE kind IN ('index', 'audit') AND status = 'running' + WHERE kind IN ('index', 'audit', 'offload') AND status = 'running' AND volume_id = ? ORDER BY id LIMIT 1 `, volumeID) @@ -593,6 +649,56 @@ func (s *Store) BeginIndexRunIfClear(ctx context.Context, kind string, volumeID return id, nil, nil } +// BeginOffloadRunIfClear atomically inserts a 'running' kind='offload' +// row for volumeID iff no run of any kind is currently in flight +// against the volume. 
Offload is the one operation that deletes user +// data, so it defers to everything else touching the volume: a +// concurrent index or audit would race its walk against the unlinks, +// and a concurrent sync or restore could rewrite a file between the +// pre-unlink verification and the unlink. Same BEGIN IMMEDIATE +// check+insert shape as BeginSyncRunIfClear; stale 'running' rows from +// crashed runs keep blocking until cleared via `runs fail`. +// +// Returns (newID, nil, nil) when the row was inserted; (0, &blocker, +// nil) when refused. +func (s *Store) BeginOffloadRunIfClear(ctx context.Context, volumeID int64) (int64, *Run, error) { + tx, err := s.db.BeginTx(ctx, nil) + if err != nil { + return 0, nil, fmt.Errorf("begin offload-run tx: %w", err) + } + defer func() { _ = tx.Rollback() }() + + row := tx.QueryRowContext(ctx, ` + SELECT `+runColumns+` + FROM runs + WHERE status = 'running' AND volume_id = ? + ORDER BY id LIMIT 1 + `, volumeID) + blocker, scanErr := scanRun(row.Scan) + if scanErr == nil { + return 0, &blocker, nil + } + if !errors.Is(scanErr, sql.ErrNoRows) { + return 0, nil, fmt.Errorf("check running runs: %w", scanErr) + } + + res, err := tx.ExecContext(ctx, ` + INSERT INTO runs (kind, volume_id, destination, started_at_ns, status, file_count, shallow) + VALUES ('offload', ?, NULL, ?, 'running', 0, 0) + `, volumeID, NowNs()) + if err != nil { + return 0, nil, fmt.Errorf("insert offload run: %w", err) + } + id, err := res.LastInsertId() + if err != nil { + return 0, nil, fmt.Errorf("offload run last insert id: %w", err) + } + if err := tx.Commit(); err != nil { + return 0, nil, fmt.Errorf("commit offload run: %w", err) + } + return id, nil, nil +} + // LatestSuccessfulIndexRun returns the most recent index run for the given // volume that finished in status 'success' or 'partial'. 
Used by the sync // command as a prerequisite check: refusing to sync a volume that has never @@ -643,6 +749,49 @@ func (s *Store) LatestFinishedRun(ctx context.Context, kind string, volumeID int return scanRun(row.Scan) } +// LatestSuccessfulSyncRun returns the most recent kind='sync' run for +// the (volume, destination) pair that finished in status 'success'. +// The content-addressed push uses it as the destination's manifest +// watermark: a success means that run's objects and segment were +// confirmed landed, so the next delta starts after it; failed and +// partial runs left no segment and must stay covered by the next delta. +// Returns sql.ErrNoRows when the destination has no successful sync of +// the volume yet. +func (s *Store) LatestSuccessfulSyncRun(ctx context.Context, volumeID int64, destination string) (Run, error) { + row := s.db.QueryRowContext(ctx, ` + SELECT `+runColumns+` + FROM runs + WHERE kind = 'sync' AND volume_id = ? AND destination = ? + AND status = 'success' + ORDER BY id DESC LIMIT 1 + `, volumeID, destination) + return scanRun(row.Scan) +} + +// LastSuccessfulWholeVolumePushRunID returns the highest local run id of +// a successful whole-volume push of the volume to destination, or 0 when +// the destination has no such push yet. Every curated push (rclone +// bucket, content-addressed, kopia, and the peer handshake) records a +// kind='sync' run whose status reaches 'success' only on a fully landed, +// verified transfer of the volume's present set, so the max successful +// sync id is the freshness watermark in local run space: a path that +// became present after this run has not been pushed to destination since +// it last changed. The offload gate requires this watermark to be at or +// beyond a gated path's files.status_changed_run_id before it deletes +// the local copy. 
+func (s *Store) LastSuccessfulWholeVolumePushRunID(ctx context.Context, volumeID int64, destination string) (int64, error) { + var id sql.NullInt64 + err := s.db.QueryRowContext(ctx, ` + SELECT MAX(id) FROM runs + WHERE kind = 'sync' AND status = 'success' + AND volume_id = ? AND destination = ? + `, volumeID, destination).Scan(&id) + if err != nil { + return 0, fmt.Errorf("last successful whole-volume push to %q: %w", destination, err) + } + return id.Int64, nil +} + // LatestSuccessfulRunsByVolumeAndKind returns the most recent success or // partial run for each (volume_id, kind) pair in one SQL pass, as a // nested map keyed first by volume id and then by run kind. Used by the diff --git a/store/runs_audit.go b/store/runs_audit.go index ea72e61..e74fb48 100644 --- a/store/runs_audit.go +++ b/store/runs_audit.go @@ -27,6 +27,12 @@ const ( // trusting the live, overwrite-in-place runs.correlated_run_id column // (SAFETY-AUDIT H6). TransitionSetCorrelatedRunID = "set-correlated-run-id" + // TransitionVerifyDestination records a remote-object verification + // pass against its kind='audit' run (see BeginRemoteVerifyRun). The + // note carries the destination name and the pass counters — the runs + // CHECK keeps destination NULL on audit rows, so this entry is where + // the audit trail names the verified destination. + TransitionVerifyDestination = "verify-destination" ) // RunAudit is one row of the insert-only runs_audit log: a single diff --git a/store/runs_test.go b/store/runs_test.go index b6dfb40..462fad3 100644 --- a/store/runs_test.go +++ b/store/runs_test.go @@ -166,9 +166,11 @@ func TestRunsAuditForeignKey(t *testing.T) { } // TestMigrateV10ToV11CreatesRunsAudit seeds a v10-shape database by hand -// (schema_version + volumes + runs, the FK target the migration needs), -// opens it to trigger the v10→v11 step, and asserts the runs_audit table -// exists, is openable, and accepts an insert against the seeded run. 
+// (schema_version + volumes + runs, the FK target the migration needs, +// plus an empty v10-shape files table so the later v14 contents split +// has its input), opens it to trigger the v10→v11 step, and asserts the +// runs_audit table exists, is openable, and accepts an insert against +// the seeded run. func TestMigrateV10ToV11CreatesRunsAudit(t *testing.T) { dsn := filepath.Join(t.TempDir(), "test.db") rawDB, err := sql.Open("sqlite", dsn) @@ -192,6 +194,20 @@ func TestMigrateV10ToV11CreatesRunsAudit(t *testing.T) { correlated_run_id INTEGER, shallow INTEGER CHECK (shallow IS NULL OR shallow IN (0, 1)) )`, + `CREATE TABLE files ( + folder_id INTEGER NOT NULL, + name TEXT NOT NULL, + blake3 BLOB NOT NULL CHECK (length(blake3) = 32), + size_bytes INTEGER NOT NULL, + mtime_ns INTEGER NOT NULL, + status TEXT NOT NULL CHECK (status IN ('present','missing','superseded')), + first_seen_run_id INTEGER NOT NULL, + last_seen_run_id INTEGER NOT NULL, + indexed_at_ns INTEGER NOT NULL, + source_node_id INTEGER, + source_run_id INTEGER, + PRIMARY KEY (folder_id, name, blake3) + )`, `INSERT INTO schema_version (version) VALUES (10)`, `INSERT INTO volumes (id, name, path) VALUES (1, 'v', '/v')`, `INSERT INTO runs (id, kind, volume_id, started_at_ns, status, file_count) VALUES (1, 'index', 1, 100, 'success', 5)`, diff --git a/store/schema.sql b/store/schema.sql index 1424807..af2afc2 100644 --- a/store/schema.sql +++ b/store/schema.sql @@ -1,40 +1,80 @@ -- Generated by `go test ./store -update-schema` — DO NOT EDIT. -- --- Flattened snapshot of the squirrel index schema at version 13, for humans +-- Flattened snapshot of the squirrel index schema at version 21, for humans -- and agents who want the current shape without replaying the migration -- chain in migrations.go. It is NOT used to create or migrate databases — -- a fresh DB is built by applyV5 plus the migration registry. The golden -- test TestSchemaSnapshot fails if this file drifts from that chain. 
-CREATE TABLE "files" ( - folder_id INTEGER NOT NULL REFERENCES folders(id), - name TEXT NOT NULL, - blake3 BLOB NOT NULL CHECK (length(blake3) = 32), - size_bytes INTEGER NOT NULL, - mtime_ns INTEGER NOT NULL, - status TEXT NOT NULL CHECK (status IN ('present','missing','superseded')), - first_seen_run_id INTEGER NOT NULL REFERENCES runs(id), - last_seen_run_id INTEGER NOT NULL REFERENCES runs(id), - indexed_at_ns INTEGER NOT NULL, - source_node_id INTEGER REFERENCES nodes(id), - source_run_id INTEGER REFERENCES runs(id), - PRIMARY KEY (folder_id, name, blake3) - ); - -CREATE INDEX idx_files_blake3 ON files(blake3, folder_id, name); - -CREATE INDEX idx_files_missing ON files(folder_id, name) WHERE status = 'missing'; +CREATE TABLE contents ( + id INTEGER PRIMARY KEY, + blake3 BLOB NOT NULL UNIQUE CHECK (length(blake3) = 32), + size_bytes INTEGER NOT NULL, + origin_node_id INTEGER REFERENCES nodes(id), + origin_run_id INTEGER + ); -CREATE INDEX idx_files_source_node ON files(source_node_id) - WHERE status = 'present' AND source_node_id IS NOT NULL; +CREATE INDEX idx_contents_origin_node ON contents(origin_node_id) + WHERE origin_node_id IS NOT NULL; -CREATE UNIQUE INDEX uniq_files_live_per_path ON files(folder_id, name) WHERE status != 'superseded'; +CREATE TRIGGER contents_no_delete BEFORE DELETE ON contents + BEGIN + SELECT RAISE(ABORT, 'contents is append-only; a content row is never deleted'); + END; -CREATE TRIGGER files_blake3_immutable BEFORE UPDATE OF blake3 ON files +CREATE TRIGGER contents_no_update BEFORE UPDATE ON contents BEGIN - SELECT RAISE(ABORT, 'blake3 is immutable; supersede the row and insert a new one'); + SELECT RAISE(ABORT, 'contents is append-only; supersede the files row and insert new content instead of updating'); END; +CREATE TABLE destination_push_freshness ( + volume_id INTEGER NOT NULL REFERENCES volumes(id), + destination TEXT NOT NULL, + origin_node_id INTEGER NOT NULL REFERENCES nodes(id), + origin_run_id INTEGER NOT NULL, + 
updated_at_ns INTEGER NOT NULL, + PRIMARY KEY (volume_id, destination, origin_node_id) + ); + +CREATE TABLE destination_run_ids ( + volume_id INTEGER NOT NULL REFERENCES volumes(id), + destination TEXT NOT NULL, + origin_node_id INTEGER NOT NULL REFERENCES nodes(id), + origin_run_id INTEGER NOT NULL, + updated_at_ns INTEGER NOT NULL, verify_method TEXT, + PRIMARY KEY (volume_id, destination, origin_node_id) + ); + +CREATE TABLE destination_run_ids_history ( + id INTEGER PRIMARY KEY, + volume_id INTEGER NOT NULL, + destination TEXT NOT NULL, + origin_node_id INTEGER NOT NULL, + origin_run_id INTEGER NOT NULL, + at_ns INTEGER NOT NULL + , verify_method TEXT); + +CREATE INDEX idx_destination_run_ids_history + ON destination_run_ids_history(volume_id, destination); + +CREATE TABLE "files" ( + folder_id INTEGER NOT NULL REFERENCES folders(id), + name TEXT NOT NULL, + content_id INTEGER NOT NULL REFERENCES contents(id), + mtime_ns INTEGER NOT NULL, + status TEXT NOT NULL CHECK (status IN ('present','missing','superseded','offloaded')), + first_seen_run_id INTEGER NOT NULL REFERENCES runs(id), + last_seen_run_id INTEGER NOT NULL REFERENCES runs(id), + indexed_at_ns INTEGER NOT NULL, status_changed_run_id INTEGER REFERENCES runs(id), + PRIMARY KEY (folder_id, name, content_id) + ); + +CREATE INDEX idx_files_content ON files(content_id); + +CREATE INDEX idx_files_missing ON files(folder_id, name) WHERE status = 'missing'; + +CREATE UNIQUE INDEX uniq_files_live_per_path ON files(folder_id, name) WHERE status != 'superseded'; + CREATE TABLE folders ( id INTEGER PRIMARY KEY, volume_id INTEGER NOT NULL REFERENCES volumes(id), @@ -94,9 +134,20 @@ CREATE TABLE peer_sync_state_history ( CREATE INDEX idx_peer_sync_history_pair ON peer_sync_state_history(volume_id, peer_node_id); +CREATE TABLE "remote_objects" ( + content_id INTEGER NOT NULL REFERENCES contents(id), + destination TEXT NOT NULL, + uploaded_run_id INTEGER NOT NULL REFERENCES runs(id), + checksum_algo TEXT, + checksum 
TEXT, + verified_at_ns INTEGER, + PRIMARY KEY (content_id, destination), + CHECK ((checksum_algo IS NULL) = (checksum IS NULL)) + ); + CREATE TABLE "runs" ( id INTEGER PRIMARY KEY, - kind TEXT NOT NULL CHECK (kind IN ('index','sync','restore','audit')), + kind TEXT NOT NULL CHECK (kind IN ('index','sync','restore','audit','offload')), volume_id INTEGER REFERENCES volumes(id), destination TEXT, started_at_ns INTEGER NOT NULL, @@ -105,9 +156,10 @@ CREATE TABLE "runs" ( error TEXT, file_count INTEGER NOT NULL DEFAULT 0, peer_node_id INTEGER REFERENCES nodes(id), - correlated_run_id INTEGER, shallow INTEGER CHECK (shallow IS NULL OR shallow IN (0, 1)), + correlated_run_id INTEGER, + shallow INTEGER CHECK (shallow IS NULL OR shallow IN (0, 1)), CHECK ( - (kind IN ('index','audit') AND destination IS NULL) OR + (kind IN ('index','audit','offload') AND destination IS NULL) OR (kind IN ('sync','restore') AND destination IS NOT NULL AND destination != '') ) ); diff --git a/store/status_changed_test.go b/store/status_changed_test.go new file mode 100644 index 0000000..e8e7c24 --- /dev/null +++ b/store/status_changed_test.go @@ -0,0 +1,112 @@ +package store + +import ( + "context" + "database/sql" + "testing" +) + +// statusChangedRun reads the stamp for the single a.txt row carrying +// the given status. The test keeps at most one row per status at the +// path so the read is unambiguous. +func statusChangedRun(t *testing.T, s *Store, volumeID int64, status string) int64 { + t.Helper() + var v sql.NullInt64 + err := s.db.QueryRowContext(context.Background(), ` + SELECT f.status_changed_run_id FROM files f + JOIN folders fo ON fo.id = f.folder_id + WHERE fo.volume_id = ? AND fo.path = '' AND f.name = 'a.txt' AND f.status = ? 
+ `, volumeID, status).Scan(&v) + if err != nil { + t.Fatalf("read status_changed_run_id for a.txt (%s): %v", status, err) + } + if !v.Valid { + t.Fatalf("status_changed_run_id for a.txt (%s) is NULL", status) + } + return v.Int64 +} + +// TestStatusChangedRunStamps walks one path through every transition +// the row state machine supports and pins where the stamp lands: +// insert, unchanged re-observation, supersession, missing flip, +// reappearance, content revert, and offload. The stamp is what the +// content-addressed manifest delta keys on, so each transition must +// move it exactly once, and the no-op touch must leave it alone. +func TestStatusChangedRunStamps(t *testing.T) { + s := openTestStore(t) + ctx := context.Background() + vID := makeVolume(t, s, "/v") + + upsert := func(runID int64, digestByte byte) { + t.Helper() + if err := s.Upsert(ctx, FileRow{ + VolumeID: vID, Path: "a.txt", Blake3: digest(digestByte), SizeBytes: 1, + MtimeNs: runID, Status: StatusPresent, + FirstSeenRunID: runID, LastSeenRunID: runID, IndexedAtNs: runID, + }, nil); err != nil { + t.Fatalf("Upsert run %d: %v", runID, err) + } + } + + r1 := makeRun(t, s, vID) + upsert(r1, 0x11) + if got := statusChangedRun(t, s, vID, StatusPresent); got != r1 { + t.Fatalf("insert stamp = %d, want %d", got, r1) + } + + r2 := makeRun(t, s, vID) + if err := s.TouchSeen(ctx, vID, "a.txt", r2); err != nil { + t.Fatalf("TouchSeen: %v", err) + } + if got := statusChangedRun(t, s, vID, StatusPresent); got != r1 { + t.Fatalf("unchanged re-observation moved the stamp to %d, want %d", got, r1) + } + + r3 := makeRun(t, s, vID) + upsert(r3, 0x22) + if got := statusChangedRun(t, s, vID, StatusSuperseded); got != r3 { + t.Fatalf("superseded stamp = %d, want %d", got, r3) + } + if got := statusChangedRun(t, s, vID, StatusPresent); got != r3 { + t.Fatalf("replacement stamp = %d, want %d", got, r3) + } + + r4 := makeRun(t, s, vID) + if _, err := s.MarkMissing(ctx, vID, r4); err != nil { + t.Fatalf("MarkMissing: %v", 
err) + } + if got := statusChangedRun(t, s, vID, StatusMissing); got != r4 { + t.Fatalf("missing stamp = %d, want %d", got, r4) + } + + r5 := makeRun(t, s, vID) + if err := s.TouchSeen(ctx, vID, "a.txt", r5); err != nil { + t.Fatalf("TouchSeen reappear: %v", err) + } + if got := statusChangedRun(t, s, vID, StatusPresent); got != r5 { + t.Fatalf("reappear stamp = %d, want %d", got, r5) + } + + // Content revert: the original digest comes back, reviving the + // superseded row and superseding the one that displaced it. + r6 := makeRun(t, s, vID) + upsert(r6, 0x11) + if got := statusChangedRun(t, s, vID, StatusPresent); got != r6 { + t.Fatalf("revived stamp = %d, want %d", got, r6) + } + if got := statusChangedRun(t, s, vID, StatusSuperseded); got != r6 { + t.Fatalf("displaced stamp = %d, want %d", got, r6) + } + + row, err := s.GetByPath(ctx, vID, "a.txt") + if err != nil { + t.Fatalf("GetByPath: %v", err) + } + r7 := makeRun(t, s, vID) + if err := s.MarkOffloaded(ctx, vID, "a.txt", row.ContentID, r7); err != nil { + t.Fatalf("MarkOffloaded: %v", err) + } + if got := statusChangedRun(t, s, vID, StatusOffloaded); got != r7 { + t.Fatalf("offloaded stamp = %d, want %d", got, r7) + } +} diff --git a/store/store_test.go b/store/store_test.go index 7ba4e07..c765b37 100644 --- a/store/store_test.go +++ b/store/store_test.go @@ -904,16 +904,21 @@ func TestMigrateV3ToV4(t *testing.T) { t.Fatalf("schema_version = %d, want %d", v, SchemaVersion) } - // PK now includes blake3 — confirm by inserting a second row at the - // same (folder, name) but different blake3, which would have collided - // pre-v4. v8 keys files off (folder_id, name) but the same widening - // invariant applies. + // PK now includes the content identity — confirm by inserting a second + // row at the same (folder, name) with different content, which would + // have collided pre-v4. v14 keys files off (folder_id, name, + // content_id) but the same widening invariant applies. 
d2 := digest(0x66) + if _, err := s.db.ExecContext(ctx, + `INSERT INTO contents (blake3, size_bytes) VALUES (?, 1024)`, d2); err != nil { + t.Fatalf("insert second content: %v", err) + } if _, err := s.db.ExecContext(ctx, ` - INSERT INTO files (folder_id, name, blake3, size_bytes, mtime_ns, status, first_seen_run_id, last_seen_run_id, indexed_at_ns) - VALUES ((SELECT id FROM folders WHERE volume_id = 1 AND path = ''), 'photo.jpg', ?, 1024, 60, 'superseded', 1, 1, 60) + INSERT INTO files (folder_id, name, content_id, mtime_ns, status, first_seen_run_id, last_seen_run_id, indexed_at_ns) + VALUES ((SELECT id FROM folders WHERE volume_id = 1 AND path = ''), 'photo.jpg', + (SELECT id FROM contents WHERE blake3 = ?), 60, 'superseded', 1, 1, 60) `, d2); err != nil { - t.Fatalf("insert second blake3 at same path failed (PK not widened?): %v", err) + t.Fatalf("insert second content at same path failed (PK not widened?): %v", err) } // Status CHECK should accept 'superseded'. @@ -1040,11 +1045,12 @@ func TestUpsertRevertContent(t *testing.T) { } } -// TestTriggerRejectsBlake3Update guards the schema-level "blake3 is -// immutable" rule. The trigger must reject any UPDATE that mentions blake3 -// in its SET clause, even if invoked outside of Upsert (e.g. via raw SQL in -// some future code path). -func TestTriggerRejectsBlake3Update(t *testing.T) { +// TestContentsRowSharedAcrossPaths pins the id↔hash construction the v14 +// split rests on: every path observing the same bytes resolves to the +// same contents row, and the UNIQUE constraint on contents.blake3 rejects +// a second row for the same digest, so a content id can never silently +// change which hash it stands for. 
+func TestContentsRowSharedAcrossPaths(t *testing.T) { dsn := filepath.Join(t.TempDir(), "test.db") s, err := Open(dsn) if err != nil { @@ -1055,33 +1061,42 @@ func TestTriggerRejectsBlake3Update(t *testing.T) { vID := makeVolume(t, s, "/v") run := makeRun(t, s, vID) - if err := s.Upsert(ctx, FileRow{ - VolumeID: vID, Path: "x", Blake3: digest(0xaa), SizeBytes: 1, MtimeNs: 1, - Status: StatusPresent, FirstSeenRunID: run, LastSeenRunID: run, IndexedAtNs: 1, - }, nil); err != nil { - t.Fatalf("Upsert: %v", err) + d := digest(0xaa) + for _, p := range []string{"x", "sub/y"} { + if err := s.Upsert(ctx, FileRow{ + VolumeID: vID, Path: p, Blake3: d, SizeBytes: 1, MtimeNs: 1, + Status: StatusPresent, FirstSeenRunID: run, LastSeenRunID: run, IndexedAtNs: 1, + }, nil); err != nil { + t.Fatalf("Upsert %s: %v", p, err) + } } - // Direct UPDATE bypassing the Upsert state machine — the trigger must - // abort it. - _, err = s.db.ExecContext(ctx, - `UPDATE files SET blake3 = ? - WHERE folder_id = (SELECT id FROM folders WHERE volume_id = ? AND path = '') AND name = ?`, - digest(0xbb), vID, "x") - if err == nil { - t.Fatalf("direct UPDATE of blake3 succeeded; trigger did not fire") + rowX, err := s.GetByPath(ctx, vID, "x") + if err != nil { + t.Fatalf("GetByPath x: %v", err) + } + rowY, err := s.GetByPath(ctx, vID, "sub/y") + if err != nil { + t.Fatalf("GetByPath sub/y: %v", err) } - if !strings.Contains(err.Error(), "blake3 is immutable") { - t.Fatalf("got error %q, want one mentioning blake3 immutability", err) + if rowX.ContentID == 0 || rowX.ContentID != rowY.ContentID { + t.Fatalf("content ids = (%d, %d), want one shared non-zero id", rowX.ContentID, rowY.ContentID) } - // Untouched: the original row still has its original hash. 
- row, err := s.GetByPath(ctx, vID, "x") - if err != nil { - t.Fatalf("GetByPath: %v", err) + var n int + if err := s.db.QueryRowContext(ctx, `SELECT COUNT(*) FROM contents`).Scan(&n); err != nil { + t.Fatalf("count contents: %v", err) } - if !bytes.Equal(row.Blake3, digest(0xaa)) { - t.Fatalf("blake3 = %x, want %x (trigger should have aborted the UPDATE)", row.Blake3, digest(0xaa)) + if n != 1 { + t.Fatalf("contents rows = %d, want 1 (one row per distinct hash)", n) + } + + _, err = s.db.ExecContext(ctx, `INSERT INTO contents (blake3, size_bytes) VALUES (?, 1)`, d) + if err == nil { + t.Fatalf("second contents row for the same blake3 succeeded; UNIQUE did not fire") + } + if !strings.Contains(err.Error(), "UNIQUE") { + t.Fatalf("got error %q, want one mentioning UNIQUE constraint", err) } } @@ -1115,10 +1130,20 @@ func TestUniqueIndexRejectsSecondLiveRow(t *testing.T) { `SELECT id FROM folders WHERE volume_id = ? AND path = ''`, vID).Scan(&rootFolderID); err != nil { t.Fatalf("lookup root folder: %v", err) } + insertContent := func(d []byte) int64 { + t.Helper() + res, err := s.db.ExecContext(ctx, + `INSERT INTO contents (blake3, size_bytes) VALUES (?, 1)`, d) + if err != nil { + t.Fatalf("insert content: %v", err) + } + id, _ := res.LastInsertId() + return id + } _, err = s.db.ExecContext(ctx, ` - INSERT INTO files (folder_id, name, blake3, size_bytes, mtime_ns, status, first_seen_run_id, last_seen_run_id, indexed_at_ns) - VALUES (?, ?, ?, ?, ?, 'present', ?, ?, ?) - `, rootFolderID, "x", digest(0xbb), 1, 2, run, run, 2) + INSERT INTO files (folder_id, name, content_id, mtime_ns, status, first_seen_run_id, last_seen_run_id, indexed_at_ns) + VALUES (?, ?, ?, ?, 'present', ?, ?, ?) 
+ `, rootFolderID, "x", insertContent(digest(0xbb)), 2, run, run, 2) if err == nil { t.Fatalf("direct INSERT of second live row succeeded; unique index did not fire") } @@ -1130,17 +1155,18 @@ func TestUniqueIndexRejectsSecondLiveRow(t *testing.T) { // superseded rows are exempt from the partial unique constraint, so the // schema supports unbounded historical depth per path. if _, err := s.db.ExecContext(ctx, ` - INSERT INTO files (folder_id, name, blake3, size_bytes, mtime_ns, status, first_seen_run_id, last_seen_run_id, indexed_at_ns) - VALUES (?, ?, ?, ?, ?, 'superseded', ?, ?, ?) - `, rootFolderID, "x", digest(0xcc), 1, 3, run, run, 3); err != nil { + INSERT INTO files (folder_id, name, content_id, mtime_ns, status, first_seen_run_id, last_seen_run_id, indexed_at_ns) + VALUES (?, ?, ?, ?, 'superseded', ?, ?, ?) + `, rootFolderID, "x", insertContent(digest(0xcc)), 3, run, run, 3); err != nil { t.Fatalf("inserting superseded row should be allowed, got: %v", err) } } -// TestMigrateV3ToV4InstallsSchemaGuards verifies that a v3 database upgraded -// to v4 ends up with the trigger and the partial unique index, not just the -// widened PK and the superseded status. Without these, the migration would -// leave existing databases lacking the enforcement that fresh installs get. +// TestMigrateV3ToV4InstallsSchemaGuards verifies that a v3 database +// migrated through the full chain ends up with the same enforcement a +// fresh install gets: the partial unique index keeps one live row per +// path, and the seeded content survives the v14 split with its hash +// resolvable through contents. func TestMigrateV3ToV4InstallsSchemaGuards(t *testing.T) { dsn := filepath.Join(t.TempDir(), "test.db") rawDB, err := sql.Open("sqlite", dsn) @@ -1171,20 +1197,25 @@ func TestMigrateV3ToV4InstallsSchemaGuards(t *testing.T) { defer s.Close() ctx := context.Background() - // Trigger must reject blake3 updates on the migrated DB. 
- _, err = s.db.ExecContext(ctx, - `UPDATE files SET blake3 = ? - WHERE folder_id = (SELECT id FROM folders WHERE volume_id = 1 AND path = '') AND name = 'x'`, - digest(0xbb)) - if err == nil || !strings.Contains(err.Error(), "blake3 is immutable") { - t.Fatalf("trigger missing after migration; err = %v", err) + // The seeded row's hash resolves through the v14 contents table. + row, err := s.GetByPath(ctx, 1, "x") + if err != nil { + t.Fatalf("GetByPath after migration: %v", err) + } + if !bytes.Equal(row.Blake3, digest(0xaa)) { + t.Fatalf("migrated row blake3 = %x, want %x", row.Blake3, digest(0xaa)) } // Partial UNIQUE index must reject a second live row at the same // (folder, name). + if _, err := s.db.ExecContext(ctx, + `INSERT INTO contents (blake3, size_bytes) VALUES (?, 1)`, digest(0xcc)); err != nil { + t.Fatalf("insert content: %v", err) + } _, err = s.db.ExecContext(ctx, ` - INSERT INTO files (folder_id, name, blake3, size_bytes, mtime_ns, status, first_seen_run_id, last_seen_run_id, indexed_at_ns) - VALUES ((SELECT id FROM folders WHERE volume_id = 1 AND path = ''), 'x', ?, 1, 2, 'present', 1, 1, 2) + INSERT INTO files (folder_id, name, content_id, mtime_ns, status, first_seen_run_id, last_seen_run_id, indexed_at_ns) + VALUES ((SELECT id FROM folders WHERE volume_id = 1 AND path = ''), 'x', + (SELECT id FROM contents WHERE blake3 = ?), 2, 'present', 1, 1, 2) `, digest(0xcc)) if err == nil || !strings.Contains(err.Error(), "UNIQUE") { t.Fatalf("unique index missing after migration; err = %v", err) @@ -1853,9 +1884,9 @@ func TestMigrateV5ToV6(t *testing.T) { if !bytes.Equal(row.Blake3, d) || row.SizeBytes != 10 || row.Status != StatusPresent { t.Fatalf("file row mangled by migration: %+v", row) } - if row.SourceNodeID.Valid || row.SourceRunID.Valid { + if row.OriginNodeID.Valid || row.OriginRunID.Valid { t.Fatalf("migrated row has non-NULL provenance %+v / %+v, want NULL", - row.SourceNodeID, row.SourceRunID) + row.OriginNodeID, row.OriginRunID) } var 
peerNode, correlated sql.NullInt64 @@ -1890,11 +1921,11 @@ func TestMigrateV5ToV6(t *testing.T) { } } -// TestUpsertWithProvenance verifies that a non-nil *Provenance lands the -// source_node_id and source_run_id columns on the inserted row, that a -// subsequent provenance-aware overwrite supersedes the prior row, and -// that the supersede flow itself is unchanged (the prior row's -// provenance survives on the historical record). +// TestUpsertWithProvenance verifies that a non-nil *Provenance lands as +// the new content's (origin_node_id, origin_run_id), that a subsequent +// local overwrite supersedes the prior row, and that the supersede flow +// itself is unchanged (the prior content's origin survives on the +// historical record). func TestUpsertWithProvenance(t *testing.T) { dsn := filepath.Join(t.TempDir(), "test.db") s, err := OpenWithOptions(dsn, OpenOptions{NodeName: "local"}) @@ -1927,11 +1958,11 @@ func TestUpsertWithProvenance(t *testing.T) { if err != nil { t.Fatalf("GetByPath: %v", err) } - if !live.SourceNodeID.Valid || live.SourceNodeID.Int64 != peerID { - t.Fatalf("SourceNodeID = %+v, want %d", live.SourceNodeID, peerID) + if !live.OriginNodeID.Valid || live.OriginNodeID.Int64 != peerID { + t.Fatalf("OriginNodeID = %+v, want %d", live.OriginNodeID, peerID) } - if !live.SourceRunID.Valid || live.SourceRunID.Int64 != run1 { - t.Fatalf("SourceRunID = %+v, want %d", live.SourceRunID, run1) + if !live.OriginRunID.Valid || live.OriginRunID.Int64 != run1 { + t.Fatalf("OriginRunID = %+v, want %d", live.OriginRunID, run1) } // New content + nil provenance — supersede the peer-sourced row with a @@ -1955,22 +1986,24 @@ func TestUpsertWithProvenance(t *testing.T) { if old.Status != StatusSuperseded || !bytes.Equal(old.Blake3, digest(0xaa)) { t.Fatalf("old row = %+v, want hashA superseded", old) } - if !old.SourceNodeID.Valid || old.SourceNodeID.Int64 != peerID { - t.Fatalf("superseded row lost provenance: %+v", old.SourceNodeID) + if 
!old.OriginNodeID.Valid || old.OriginNodeID.Int64 != peerID { + t.Fatalf("superseded row lost provenance: %+v", old.OriginNodeID) } if newRow.Status != StatusPresent || !bytes.Equal(newRow.Blake3, digest(0xbb)) { t.Fatalf("new row = %+v, want hashB present", newRow) } - if newRow.SourceNodeID.Valid || newRow.SourceRunID.Valid { + if newRow.OriginNodeID.Valid || newRow.OriginRunID.Valid { t.Fatalf("local-write row has non-NULL provenance: %+v / %+v", - newRow.SourceNodeID, newRow.SourceRunID) + newRow.OriginNodeID, newRow.OriginRunID) } } -// TestUpsertProvenanceFKRejected guards the FK enforcement on both new -// provenance columns: pointing at a node or run id that does not exist -// must fail rather than silently land a dangling reference. -func TestUpsertProvenanceFKRejected(t *testing.T) { +// TestUpsertProvenanceFK pins the FK shape of the contents origin +// columns: origin_node_id is a real FK (a bogus node id must fail +// rather than land a dangling reference), while origin_run_id is in the +// origin node's run space and deliberately not FK-bound to local runs — +// a run id with no local row must be accepted. 
+func TestUpsertProvenanceFK(t *testing.T) { dsn := filepath.Join(t.TempDir(), "test.db") s, err := Open(dsn) if err != nil { @@ -1988,26 +2021,24 @@ func TestUpsertProvenanceFKRejected(t *testing.T) { } peerID, _ := res.LastInsertId() - cases := []struct { - name string - prov *Provenance - }{ - {"bogus node id", &Provenance{NodeID: 99999, RunID: run}}, - {"bogus run id", &Provenance{NodeID: peerID, RunID: 99999}}, + mkRow := func(path string, hash byte) FileRow { + return FileRow{ + VolumeID: vID, Path: path, Blake3: digest(hash), SizeBytes: 1, MtimeNs: 1, + Status: StatusPresent, FirstSeenRunID: run, LastSeenRunID: run, IndexedAtNs: 1, + } } - for i, c := range cases { - t.Run(c.name, func(t *testing.T) { - err := s.Upsert(ctx, FileRow{ - // Distinct paths per case so a prior failure can't shadow a - // later one through the live-row state machine. - VolumeID: vID, Path: fmt.Sprintf("x-%d", i), Blake3: digest(byte(0x10 + i)), - SizeBytes: 1, MtimeNs: 1, - Status: StatusPresent, FirstSeenRunID: run, LastSeenRunID: run, IndexedAtNs: 1, - }, c.prov) - if err == nil { - t.Fatalf("Upsert with %s succeeded; FK not enforced", c.name) - } - }) + if err := s.Upsert(ctx, mkRow("x-node", 0x10), &Provenance{NodeID: 99999, RunID: run}); err == nil { + t.Fatalf("Upsert with bogus node id succeeded; FK not enforced") + } + if err := s.Upsert(ctx, mkRow("x-run", 0x11), &Provenance{NodeID: peerID, RunID: 99999}); err != nil { + t.Fatalf("Upsert with foreign-space run id rejected: %v (origin_run_id must not be a local FK)", err) + } + row, err := s.GetByPath(ctx, vID, "x-run") + if err != nil { + t.Fatalf("GetByPath: %v", err) + } + if !row.OriginRunID.Valid || row.OriginRunID.Int64 != 99999 { + t.Fatalf("OriginRunID = %+v, want 99999", row.OriginRunID) } } @@ -2043,11 +2074,10 @@ func TestPeerSyncStateAcceptsForeignRunID(t *testing.T) { } } -// TestPartialIndexOnSourceNodeExistsV6 verifies the schema-introspection -// expectation called out in the PR description: the partial 
index on -// files(source_node_id) WHERE status='present' exists on v6 (and is -// absent on a v5 fixture that hasn't been migrated yet). -func TestPartialIndexOnSourceNodeExistsV6(t *testing.T) { +// TestPartialIndexOnOriginNodeExists verifies the partial index backing +// ListPresentByOrigin: contents(origin_node_id) WHERE origin_node_id IS +// NOT NULL, excluding the locally-introduced majority. +func TestPartialIndexOnOriginNodeExists(t *testing.T) { dsn := filepath.Join(t.TempDir(), "test.db") s, err := Open(dsn) if err != nil { @@ -2058,13 +2088,13 @@ func TestPartialIndexOnSourceNodeExistsV6(t *testing.T) { var ddl string err = s.db.QueryRowContext(ctx, - `SELECT sql FROM sqlite_master WHERE type='index' AND name='idx_files_source_node'`).Scan(&ddl) + `SELECT sql FROM sqlite_master WHERE type='index' AND name='idx_contents_origin_node'`).Scan(&ddl) if err != nil { t.Fatalf("look up partial index: %v", err) } - for _, want := range []string{"source_node_id", "status = 'present'", "source_node_id IS NOT NULL"} { + for _, want := range []string{"origin_node_id", "origin_node_id IS NOT NULL"} { if !strings.Contains(ddl, want) { - t.Fatalf("idx_files_source_node SQL = %q, missing %q (partial index must exclude local-write NULLs)", ddl, want) + t.Fatalf("idx_contents_origin_node SQL = %q, missing %q (partial index must exclude local-origin NULLs)", ddl, want) } } } @@ -2311,7 +2341,7 @@ func TestFileRowScanInsertRoundTrip(t *testing.T) { volID := makeVolume(t, s, "/photos") runID := makeRun(t, s, volID) - peerNode, err := s.GetOrCreatePeerNode(ctx, "peer-x", "https://peer-x.example") + peerNode, err := s.GetOrCreatePeerNode(ctx, "peer-x", "https://peer-x.example", true) if err != nil { t.Fatalf("GetOrCreatePeerNode: %v", err) } @@ -2327,11 +2357,11 @@ func TestFileRowScanInsertRoundTrip(t *testing.T) { FirstSeenRunID: runID, LastSeenRunID: runID, IndexedAtNs: 1_700_000_500_000_000_000, - SourceNodeID: sql.NullInt64{Int64: peerNodeID, Valid: true}, - SourceRunID: 
sql.NullInt64{Int64: runID, Valid: true}, + OriginNodeID: sql.NullInt64{Int64: peerNodeID, Valid: true}, + OriginRunID: sql.NullInt64{Int64: runID, Valid: true}, } - if err := s.Upsert(ctx, want, &Provenance{NodeID: want.SourceNodeID.Int64, RunID: want.SourceRunID.Int64}); err != nil { + if err := s.Upsert(ctx, want, &Provenance{NodeID: want.OriginNodeID.Int64, RunID: want.OriginRunID.Int64}); err != nil { t.Fatalf("Upsert: %v", err) } @@ -2346,7 +2376,7 @@ func TestFileRowScanInsertRoundTrip(t *testing.T) { got.Status != want.Status || got.FirstSeenRunID != want.FirstSeenRunID || got.LastSeenRunID != want.LastSeenRunID || got.IndexedAtNs != want.IndexedAtNs || - got.SourceNodeID != want.SourceNodeID || got.SourceRunID != want.SourceRunID { + got.OriginNodeID != want.OriginNodeID || got.OriginRunID != want.OriginRunID { t.Fatalf("round-trip mismatch:\n got=%+v\nwant=%+v", got, want) } } @@ -2491,11 +2521,11 @@ func TestCountFilesFirstSeenByRunWithPathPrefix(t *testing.T) { } } -// TestListPresentBySource pins the two filter modes: valid nodeID +// TestListPresentByOrigin pins the two filter modes: valid nodeID // returns rows attributed to that peer, NULL nodeID returns rows // without provenance (local writes). Superseded and missing rows // must be excluded under either mode. 
-func TestListPresentBySource(t *testing.T) { +func TestListPresentByOrigin(t *testing.T) { dsn := filepath.Join(t.TempDir(), "test.db") s, err := OpenWithOptions(dsn, OpenOptions{NodeName: "self"}) if err != nil { @@ -2533,7 +2563,7 @@ func TestListPresentBySource(t *testing.T) { collect := func(nodeID sql.NullInt64) []string { var got []string - for row, err := range s.ListPresentBySource(ctx, vID, nodeID) { + for row, err := range s.ListPresentByOrigin(ctx, vID, nodeID) { if err != nil { t.Fatalf("iter: %v", err) } @@ -2559,11 +2589,11 @@ func TestListPresentBySource(t *testing.T) { } } -// TestListPresentBySourceEarlyBreakClosesRows confirms the iter.Seq2 +// TestListPresentByOriginEarlyBreakClosesRows confirms the iter.Seq2 // implementation closes its underlying rows when the consumer breaks // early. Without this guarantee a long-running CLI could leak a // statement handle whenever the user pages results. -func TestListPresentBySourceEarlyBreakClosesRows(t *testing.T) { +func TestListPresentByOriginEarlyBreakClosesRows(t *testing.T) { dsn := filepath.Join(t.TempDir(), "test.db") s, err := OpenWithOptions(dsn, OpenOptions{NodeName: "self"}) if err != nil { @@ -2585,7 +2615,7 @@ func TestListPresentBySourceEarlyBreakClosesRows(t *testing.T) { } var seen int - for _, err := range s.ListPresentBySource(ctx, vID, sql.NullInt64{Int64: peer.ID, Valid: true}) { + for _, err := range s.ListPresentByOrigin(ctx, vID, sql.NullInt64{Int64: peer.ID, Valid: true}) { if err != nil { t.Fatalf("iter: %v", err) } @@ -2601,7 +2631,7 @@ func TestListPresentBySourceEarlyBreakClosesRows(t *testing.T) { // rows handle (the store pins MaxOpenConns=1, so a leak would block // the next QueryContext indefinitely — guard with a separate scan). 
again := 0 - for _, err := range s.ListPresentBySource(ctx, vID, sql.NullInt64{Int64: peer.ID, Valid: true}) { + for _, err := range s.ListPresentByOrigin(ctx, vID, sql.NullInt64{Int64: peer.ID, Valid: true}) { if err != nil { t.Fatalf("second iter: %v", err) } @@ -3379,6 +3409,139 @@ func TestBeginIndexRunIfClearRejectsWrongKind(t *testing.T) { } } +// TestBeginSyncRunIfClearBlockedByIndex is the #103 cross-kind guard: a +// running index (or audit) on the volume refuses a new sync, so a sync +// never captures its enumeration snapshot against a tree an index is +// mutating. Once the index finishes, the sync is admitted. +func TestBeginSyncRunIfClearBlockedByIndex(t *testing.T) { + s := openTestStore(t) + ctx := context.Background() + vID := makeVolume(t, s, "/v") + + for _, indexKind := range []string{RunKindIndex, RunKindAudit} { + idxID, blocker, err := s.BeginIndexRunIfClear(ctx, indexKind, vID, false) + if err != nil || blocker != nil { + t.Fatalf("%s: begin index: id=%d blocker=%+v err=%v", indexKind, idxID, blocker, err) + } + + syncID, syncBlocker, err := s.BeginSyncRunIfClear(ctx, SyncRunSpec{VolumeID: vID, Destination: "backup"}) + if err != nil { + t.Fatalf("%s: begin sync: %v", indexKind, err) + } + if syncBlocker == nil || syncID != 0 { + t.Fatalf("%s: sync admitted (id=%d) while index running, want blocked", indexKind, syncID) + } + if syncBlocker.Kind != indexKind { + t.Fatalf("%s: blocker kind = %q, want %q", indexKind, syncBlocker.Kind, indexKind) + } + + if err := s.FinishRun(ctx, idxID, RunStatusSuccess, "", 0); err != nil { + t.Fatalf("%s: finish index: %v", indexKind, err) + } + syncID, syncBlocker, err = s.BeginSyncRunIfClear(ctx, SyncRunSpec{VolumeID: vID, Destination: "backup"}) + if err != nil || syncBlocker != nil || syncID == 0 { + t.Fatalf("%s: sync refused after index finished: id=%d blocker=%+v err=%v", indexKind, syncID, syncBlocker, err) + } + if err := s.FinishRun(ctx, syncID, RunStatusSuccess, "", 0); err != nil { + t.Fatalf("%s: 
finish sync: %v", indexKind, err) + } + } +} + +// TestBeginIndexRunIfClearAllowsConcurrentSync pins the deliberate +// asymmetry to the sync→index block (#103): a running sync does NOT +// block a new index, because the sync's advance is pinned to the +// snapshot it already captured and the agent scheduler must be free to +// index before a sync even while an unrelated sync is in flight. +func TestBeginIndexRunIfClearAllowsConcurrentSync(t *testing.T) { + s := openTestStore(t) + ctx := context.Background() + vID := makeVolume(t, s, "/v") + + syncID, blocker, err := s.BeginSyncRunIfClear(ctx, SyncRunSpec{VolumeID: vID, Destination: "backup"}) + if err != nil || blocker != nil || syncID == 0 { + t.Fatalf("begin sync: id=%d blocker=%+v err=%v", syncID, blocker, err) + } + + idxID, idxBlocker, err := s.BeginIndexRunIfClear(ctx, RunKindIndex, vID, false) + if err != nil { + t.Fatalf("begin index during sync: %v", err) + } + if idxBlocker != nil || idxID == 0 { + t.Fatalf("index blocked by running sync (id=%d blocker=%+v), want admitted", idxID, idxBlocker) + } +} + +// TestBeginSyncRunIfClearBlockedByOffload makes the run gate symmetric: +// offload already blocks on every kind, so a sync must refuse to start +// while an offload is in flight (#114). A concurrent unlink would +// otherwise race the sync's enumeration. 
+func TestBeginSyncRunIfClearBlockedByOffload(t *testing.T) { + s := openTestStore(t) + ctx := context.Background() + vID := makeVolume(t, s, "/v") + + offID, blocker, err := s.BeginOffloadRunIfClear(ctx, vID) + if err != nil || blocker != nil || offID == 0 { + t.Fatalf("begin offload: id=%d blocker=%+v err=%v", offID, blocker, err) + } + + syncID, syncBlocker, err := s.BeginSyncRunIfClear(ctx, SyncRunSpec{VolumeID: vID, Destination: "backup"}) + if err != nil { + t.Fatalf("begin sync during offload: %v", err) + } + if syncBlocker == nil || syncID != 0 { + t.Fatalf("sync admitted (id=%d) while offload running, want blocked", syncID) + } + if syncBlocker.Kind != RunKindOffload { + t.Fatalf("blocker kind = %q, want %q", syncBlocker.Kind, RunKindOffload) + } + + if err := s.FinishRun(ctx, offID, RunStatusSuccess, "", 0); err != nil { + t.Fatalf("finish offload: %v", err) + } + syncID, syncBlocker, err = s.BeginSyncRunIfClear(ctx, SyncRunSpec{VolumeID: vID, Destination: "backup"}) + if err != nil || syncBlocker != nil || syncID == 0 { + t.Fatalf("sync refused after offload finished: id=%d blocker=%+v err=%v", syncID, syncBlocker, err) + } +} + +// TestBeginIndexRunIfClearBlockedByOffload: an in-flight offload blocks a +// new index or audit so the walk can't observe-and-flip a row mid-unlink +// (#114). 
+func TestBeginIndexRunIfClearBlockedByOffload(t *testing.T) { + s := openTestStore(t) + ctx := context.Background() + + for _, indexKind := range []string{RunKindIndex, RunKindAudit} { + vID := makeVolume(t, s, "/v-"+indexKind) + + offID, blocker, err := s.BeginOffloadRunIfClear(ctx, vID) + if err != nil || blocker != nil || offID == 0 { + t.Fatalf("%s: begin offload: id=%d blocker=%+v err=%v", indexKind, offID, blocker, err) + } + + idxID, idxBlocker, err := s.BeginIndexRunIfClear(ctx, indexKind, vID, false) + if err != nil { + t.Fatalf("%s: begin index during offload: %v", indexKind, err) + } + if idxBlocker == nil || idxID != 0 { + t.Fatalf("%s: index admitted (id=%d) while offload running, want blocked", indexKind, idxID) + } + if idxBlocker.Kind != RunKindOffload { + t.Fatalf("%s: blocker kind = %q, want %q", indexKind, idxBlocker.Kind, RunKindOffload) + } + + if err := s.FinishRun(ctx, offID, RunStatusSuccess, "", 0); err != nil { + t.Fatalf("%s: finish offload: %v", indexKind, err) + } + idxID, idxBlocker, err = s.BeginIndexRunIfClear(ctx, indexKind, vID, false) + if err != nil || idxBlocker != nil || idxID == 0 { + t.Fatalf("%s: index refused after offload finished: id=%d blocker=%+v err=%v", indexKind, idxID, idxBlocker, err) + } + } +} + // TestBackupVacuumIntoProducesValidSnapshot exercises Backup against // a populated store, then opens the snapshot as a regular DB and // verifies it carries the same volume row. Cheapest reliable check @@ -3550,3 +3713,505 @@ func TestMigratePreMigrationBackup(t *testing.T) { t.Fatalf("backup name = %q, want pre-migration-v5-to-v*", entries[0].Name()) } } + +// v13Fixture returns the DDL + seed for a fully populated v13 database +// — the last schema before the contents split. 
The seed exercises every +// backfill rule of migrateV13ToV14: +// +// - hash X lives at two paths; the earliest observation (a.txt, +// first_seen_run_id=1, local write) donates the contents row's size +// and NULL origin even though the later sub/b.txt observation +// carries peer attribution. +// - hash Y is peer-sourced (node 2, run 2) and live at c.txt. +// - hash Z is c.txt's superseded predecessor. +// - hash W is a missing row. +func v13Fixture() []string { + return []string{ + `CREATE TABLE schema_version (version INTEGER NOT NULL PRIMARY KEY)`, + `CREATE TABLE volumes (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE, path TEXT NOT NULL)`, + `CREATE TABLE nodes ( + id INTEGER PRIMARY KEY, + name TEXT NOT NULL UNIQUE, + endpoint TEXT, + public_key_fingerprint TEXT + )`, + `CREATE TABLE runs ( + id INTEGER PRIMARY KEY, + kind TEXT NOT NULL CHECK (kind IN ('index','sync','restore','audit')), + volume_id INTEGER REFERENCES volumes(id), + destination TEXT, + started_at_ns INTEGER NOT NULL, + ended_at_ns INTEGER, + status TEXT NOT NULL CHECK (status IN ('running','success','failed','partial')), + error TEXT, + file_count INTEGER NOT NULL DEFAULT 0, + peer_node_id INTEGER REFERENCES nodes(id), + correlated_run_id INTEGER, + shallow INTEGER CHECK (shallow IS NULL OR shallow IN (0, 1)), + CHECK ( + (kind IN ('index','audit') AND destination IS NULL) OR + (kind IN ('sync','restore') AND destination IS NOT NULL AND destination != '') + ) + )`, + `CREATE TABLE folders ( + id INTEGER PRIMARY KEY, + volume_id INTEGER NOT NULL REFERENCES volumes(id), + parent_id INTEGER REFERENCES folders(id), + path TEXT NOT NULL, + shallow_blake3 BLOB CHECK (shallow_blake3 IS NULL OR length(shallow_blake3) = 32), + deep_blake3 BLOB CHECK (deep_blake3 IS NULL OR length(deep_blake3) = 32), + last_changed_run_id INTEGER REFERENCES runs(id), + file_count INTEGER NOT NULL DEFAULT 0, + cumulative_size INTEGER NOT NULL DEFAULT 0, + UNIQUE (volume_id, path) + )`, + `CREATE TABLE files ( + 
folder_id INTEGER NOT NULL REFERENCES folders(id), + name TEXT NOT NULL, + blake3 BLOB NOT NULL CHECK (length(blake3) = 32), + size_bytes INTEGER NOT NULL, + mtime_ns INTEGER NOT NULL, + status TEXT NOT NULL CHECK (status IN ('present','missing','superseded')), + first_seen_run_id INTEGER NOT NULL REFERENCES runs(id), + last_seen_run_id INTEGER NOT NULL REFERENCES runs(id), + indexed_at_ns INTEGER NOT NULL, + source_node_id INTEGER REFERENCES nodes(id), + source_run_id INTEGER REFERENCES runs(id), + PRIMARY KEY (folder_id, name, blake3) + )`, + `CREATE UNIQUE INDEX uniq_files_live_per_path ON files(folder_id, name) WHERE status != 'superseded'`, + `CREATE TRIGGER files_blake3_immutable BEFORE UPDATE OF blake3 ON files + BEGIN + SELECT RAISE(ABORT, 'blake3 is immutable; supersede the row and insert a new one'); + END`, + `INSERT INTO schema_version (version) VALUES (13)`, + `INSERT INTO volumes (id, name, path) VALUES (1, 'photos', '/photos')`, + `INSERT INTO nodes (id, name) VALUES (1, 'self'), (2, 'peer')`, + `INSERT INTO runs (id, kind, volume_id, destination, started_at_ns, status, peer_node_id, correlated_run_id) + VALUES (1, 'index', 1, NULL, 100, 'success', NULL, NULL), + (2, 'sync', 1, 'peer', 200, 'success', 2, 900), + (3, 'index', 1, NULL, 300, 'success', NULL, NULL)`, + `INSERT INTO folders (id, volume_id, parent_id, path) VALUES (1, 1, NULL, ''), (2, 1, 1, 'sub')`, + `INSERT INTO files (folder_id, name, blake3, size_bytes, mtime_ns, status, first_seen_run_id, last_seen_run_id, indexed_at_ns, source_node_id, source_run_id) VALUES + (1, 'a.txt', X'` + strings.Repeat("11", 32) + `', 10, 1, 'present', 1, 3, 1, NULL, NULL), + (2, 'b.txt', X'` + strings.Repeat("11", 32) + `', 10, 2, 'present', 3, 3, 2, 2, 2), + (1, 'c.txt', X'` + strings.Repeat("22", 32) + `', 20, 3, 'present', 2, 3, 3, 2, 2), + (1, 'c.txt', X'` + strings.Repeat("33", 32) + `', 30, 4, 'superseded', 1, 1, 4, NULL, NULL), + (2, 'd.txt', X'` + strings.Repeat("44", 32) + `', 40, 5, 'missing', 1, 
3, 5, NULL, NULL)`, + } +} + +// TestMigrateV13ContentsSplit drives a populated v13 database through +// the v14–v16 chain and verifies the reshape end to end: row counts, +// the hash↔content mapping with its size/origin backfill, preserved +// statuses and run stamps, the surviving partial unique index, the +// dropped immutability trigger, the widened runs CHECK, and the new +// destination watermark store with its rewind refusal. +func TestMigrateV13ContentsSplit(t *testing.T) { + dsn := filepath.Join(t.TempDir(), "test.db") + rawDB, err := sql.Open("sqlite", dsn) + if err != nil { + t.Fatalf("raw sql.Open: %v", err) + } + for _, q := range v13Fixture() { + if _, err := rawDB.Exec(q); err != nil { + rawDB.Close() + t.Fatalf("v13 DDL %q: %v", q, err) + } + } + rawDB.Close() + + s, err := OpenWithOptions(dsn, OpenOptions{NodeName: "self"}) + if err != nil { + t.Fatalf("Open (migrates v13→v%d): %v", SchemaVersion, err) + } + defer s.Close() + ctx := context.Background() + + if v, _ := s.CurrentSchemaVersion(ctx); v != SchemaVersion { + t.Fatalf("schema_version = %d, want %d", v, SchemaVersion) + } + + assertContentsBackfill(t, s) + assertFilesReshape(t, s) + assertSchemaGuardsAfterSplit(t, s) + assertRunsOffloadCheck(t, s) + assertDestinationStoreAfterMigration(t, s) +} + +// assertContentsBackfill checks the distinct-hash → contents mapping: +// one row per hash, size and origin taken from the earliest observation. 
+func assertContentsBackfill(t *testing.T, s *Store) { + t.Helper() + ctx := context.Background() + + var fileCount, contentCount int + if err := s.db.QueryRowContext(ctx, `SELECT COUNT(*) FROM files`).Scan(&fileCount); err != nil { + t.Fatalf("count files: %v", err) + } + if fileCount != 5 { + t.Fatalf("files rows = %d, want 5 (no row lost in the rebuild)", fileCount) + } + if err := s.db.QueryRowContext(ctx, `SELECT COUNT(*) FROM contents`).Scan(&contentCount); err != nil { + t.Fatalf("count contents: %v", err) + } + if contentCount != 4 { + t.Fatalf("contents rows = %d, want 4 (one per distinct hash)", contentCount) + } + + cases := []struct { + hash []byte + size int64 + originNode sql.NullInt64 + originRun sql.NullInt64 + }{ + // X: earliest observation is the local a.txt row, so the later + // peer-attributed duplicate does not become the origin. + {digest(0x11), 10, sql.NullInt64{}, sql.NullInt64{}}, + {digest(0x22), 20, sql.NullInt64{Int64: 2, Valid: true}, sql.NullInt64{Int64: 2, Valid: true}}, + {digest(0x33), 30, sql.NullInt64{}, sql.NullInt64{}}, + {digest(0x44), 40, sql.NullInt64{}, sql.NullInt64{}}, + } + for _, c := range cases { + var size int64 + var originNode, originRun sql.NullInt64 + if err := s.db.QueryRowContext(ctx, + `SELECT size_bytes, origin_node_id, origin_run_id FROM contents WHERE blake3 = ?`, + c.hash).Scan(&size, &originNode, &originRun); err != nil { + t.Fatalf("contents row for %x: %v", c.hash[:2], err) + } + if size != c.size || originNode != c.originNode || originRun != c.originRun { + t.Fatalf("contents %x = (size=%d, origin=%+v/%+v), want (size=%d, origin=%+v/%+v)", + c.hash[:2], size, originNode, originRun, c.size, c.originNode, c.originRun) + } + } +} + +// assertFilesReshape checks the per-path view through the store API: +// statuses, run stamps, the supersede chain, and duplicate detection +// across the shared content row. 
+func assertFilesReshape(t *testing.T, s *Store) { + t.Helper() + ctx := context.Background() + + a, err := s.GetByPath(ctx, 1, "a.txt") + if err != nil { + t.Fatalf("GetByPath a.txt: %v", err) + } + if !bytes.Equal(a.Blake3, digest(0x11)) || a.Status != StatusPresent || + a.SizeBytes != 10 || a.FirstSeenRunID != 1 || a.LastSeenRunID != 3 { + t.Fatalf("a.txt = %+v, want X present first=1 last=3 size=10", a) + } + + history, err := s.ListHistoryByPath(ctx, 1, "c.txt") + if err != nil { + t.Fatalf("ListHistoryByPath c.txt: %v", err) + } + if len(history) != 2 { + t.Fatalf("c.txt history rows = %d, want 2", len(history)) + } + if !bytes.Equal(history[0].Blake3, digest(0x33)) || history[0].Status != StatusSuperseded { + t.Fatalf("c.txt history[0] = %+v, want Z superseded", history[0]) + } + if !bytes.Equal(history[1].Blake3, digest(0x22)) || history[1].Status != StatusPresent { + t.Fatalf("c.txt history[1] = %+v, want Y present", history[1]) + } + if !history[1].OriginNodeID.Valid || history[1].OriginNodeID.Int64 != 2 { + t.Fatalf("c.txt live origin = %+v, want node 2", history[1].OriginNodeID) + } + + d, err := s.GetByPath(ctx, 1, "sub/d.txt") + if err != nil { + t.Fatalf("GetByPath sub/d.txt: %v", err) + } + if d.Status != StatusMissing { + t.Fatalf("sub/d.txt status = %q, want missing", d.Status) + } + + dups, err := s.ListDuplicates(ctx) + if err != nil { + t.Fatalf("ListDuplicates: %v", err) + } + if len(dups) != 2 { + t.Fatalf("duplicates = %d rows, want 2 (a.txt + sub/b.txt share X)", len(dups)) + } + if dups[0].File.ContentID != dups[1].File.ContentID { + t.Fatalf("duplicate rows carry different content ids: %d vs %d", + dups[0].File.ContentID, dups[1].File.ContentID) + } +} + +// assertSchemaGuardsAfterSplit checks the post-split schema shape: the +// partial unique index still guards one live row per path, and the +// blake3-immutability trigger is gone (id↔hash is immutable by +// construction on contents). 
+func assertSchemaGuardsAfterSplit(t *testing.T, s *Store) { + t.Helper() + ctx := context.Background() + + if _, err := s.db.ExecContext(ctx, + `INSERT INTO contents (blake3, size_bytes) VALUES (?, 1)`, digest(0x55)); err != nil { + t.Fatalf("insert content: %v", err) + } + _, err := s.db.ExecContext(ctx, ` + INSERT INTO files (folder_id, name, content_id, mtime_ns, status, first_seen_run_id, last_seen_run_id, indexed_at_ns) + VALUES (1, 'a.txt', (SELECT id FROM contents WHERE blake3 = ?), 9, 'present', 1, 1, 9) + `, digest(0x55)) + if err == nil || !strings.Contains(err.Error(), "UNIQUE") { + t.Fatalf("second live row at a.txt: err = %v, want UNIQUE violation", err) + } + + var triggers int + if err := s.db.QueryRowContext(ctx, + `SELECT COUNT(*) FROM sqlite_master WHERE type = 'trigger' AND name = 'files_blake3_immutable'`).Scan(&triggers); err != nil { + t.Fatalf("count triggers: %v", err) + } + if triggers != 0 { + t.Fatalf("files_blake3_immutable still present after v14, want dropped") + } +} + +// assertRunsOffloadCheck checks the v15 kind CHECK: offload joins the +// destination-NULL branch. +func assertRunsOffloadCheck(t *testing.T, s *Store) { + t.Helper() + ctx := context.Background() + + if _, err := s.db.ExecContext(ctx, ` + INSERT INTO runs (kind, volume_id, destination, started_at_ns, status) + VALUES ('offload', 1, NULL, 400, 'running') + `); err != nil { + t.Fatalf("offload run with NULL destination rejected: %v", err) + } + if _, err := s.db.ExecContext(ctx, ` + INSERT INTO runs (kind, volume_id, destination, started_at_ns, status) + VALUES ('offload', 1, 'bucket', 500, 'running') + `); err == nil { + t.Fatalf("offload run with a destination accepted, want CHECK violation") + } +} + +// assertDestinationStoreAfterMigration checks the v16 watermark store +// against the migrated DB: an advance lands with history, and a rewind +// is refused. 
+func assertDestinationStoreAfterMigration(t *testing.T, s *Store) { + t.Helper() + ctx := context.Background() + + if err := s.UpsertDestinationRunID(ctx, 1, "bucket", 2, 5, false); err != nil { + t.Fatalf("UpsertDestinationRunID after migration: %v", err) + } + if err := s.UpsertDestinationRunID(ctx, 1, "bucket", 2, 4, false); !errors.Is(err, ErrWatermarkRewind) { + t.Fatalf("rewind err = %v, want ErrWatermarkRewind", err) + } + history, err := s.ListDestinationRunIDHistory(ctx, 1, "bucket") + if err != nil { + t.Fatalf("ListDestinationRunIDHistory: %v", err) + } + if len(history) != 1 || history[0].OriginRunID != 5 { + t.Fatalf("history = %+v, want one advance to 5", history) + } +} + +// v18Fixture is a populated v18 database covering the offload-substrate +// tables (contents, remote_objects, destination_run_ids) so the +// v18→v19→v20→v21 chain can be exercised against real rows. The runs +// kind CHECK already carries 'offload' (v15) and status_changed_run_id +// exists on files (v18); verify_method, destination_push_freshness, and +// the contents triggers are what the chain still adds. 
+func v18Fixture() []string { + return []string{ + `CREATE TABLE schema_version (version INTEGER NOT NULL PRIMARY KEY)`, + `CREATE TABLE volumes (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE, path TEXT NOT NULL)`, + `CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE, endpoint TEXT, public_key_fingerprint TEXT)`, + `CREATE TABLE runs ( + id INTEGER PRIMARY KEY, + kind TEXT NOT NULL CHECK (kind IN ('index','sync','restore','audit','offload')), + volume_id INTEGER REFERENCES volumes(id), + destination TEXT, + started_at_ns INTEGER NOT NULL, + ended_at_ns INTEGER, + status TEXT NOT NULL CHECK (status IN ('running','success','failed','partial')), + error TEXT, + file_count INTEGER NOT NULL DEFAULT 0, + peer_node_id INTEGER REFERENCES nodes(id), + correlated_run_id INTEGER, + shallow INTEGER CHECK (shallow IS NULL OR shallow IN (0, 1)), + CHECK ( + (kind IN ('index','audit','offload') AND destination IS NULL) OR + (kind IN ('sync','restore') AND destination IS NOT NULL AND destination != '') + ) + )`, + `CREATE TABLE folders ( + id INTEGER PRIMARY KEY, + volume_id INTEGER NOT NULL REFERENCES volumes(id), + parent_id INTEGER REFERENCES folders(id), + path TEXT NOT NULL, + shallow_blake3 BLOB, + deep_blake3 BLOB, + last_changed_run_id INTEGER REFERENCES runs(id), + file_count INTEGER NOT NULL DEFAULT 0, + cumulative_size INTEGER NOT NULL DEFAULT 0, + UNIQUE (volume_id, path) + )`, + `CREATE TABLE contents ( + id INTEGER PRIMARY KEY, + blake3 BLOB NOT NULL UNIQUE CHECK (length(blake3) = 32), + size_bytes INTEGER NOT NULL, + origin_node_id INTEGER REFERENCES nodes(id), + origin_run_id INTEGER + )`, + `CREATE TABLE files ( + folder_id INTEGER NOT NULL REFERENCES folders(id), + name TEXT NOT NULL, + content_id INTEGER NOT NULL REFERENCES contents(id), + mtime_ns INTEGER NOT NULL, + status TEXT NOT NULL CHECK (status IN ('present','missing','superseded','offloaded')), + first_seen_run_id INTEGER NOT NULL REFERENCES runs(id), + last_seen_run_id INTEGER 
NOT NULL REFERENCES runs(id), + indexed_at_ns INTEGER NOT NULL, + status_changed_run_id INTEGER REFERENCES runs(id), + PRIMARY KEY (folder_id, name, content_id) + )`, + `CREATE UNIQUE INDEX uniq_files_live_per_path ON files(folder_id, name) WHERE status != 'superseded'`, + `CREATE TABLE destination_run_ids ( + volume_id INTEGER NOT NULL REFERENCES volumes(id), + destination TEXT NOT NULL, + origin_node_id INTEGER NOT NULL REFERENCES nodes(id), + origin_run_id INTEGER NOT NULL, + updated_at_ns INTEGER NOT NULL, + PRIMARY KEY (volume_id, destination, origin_node_id) + )`, + `CREATE TABLE destination_run_ids_history ( + id INTEGER PRIMARY KEY, + volume_id INTEGER NOT NULL, + destination TEXT NOT NULL, + origin_node_id INTEGER NOT NULL, + origin_run_id INTEGER NOT NULL, + at_ns INTEGER NOT NULL + )`, + `CREATE TABLE remote_objects ( + content_id INTEGER NOT NULL REFERENCES contents(id), + destination TEXT NOT NULL, + uploaded_run_id INTEGER NOT NULL REFERENCES runs(id), + checksum_algo TEXT, + checksum TEXT, + verified_at_ns INTEGER, + PRIMARY KEY (content_id, destination), + CHECK ((checksum_algo IS NULL) = (checksum IS NULL)) + )`, + `INSERT INTO schema_version (version) VALUES (18)`, + `INSERT INTO volumes (id, name, path) VALUES (1, 'photos', '/photos')`, + `INSERT INTO nodes (id, name) VALUES (1, 'self'), (2, 'peer')`, + `INSERT INTO runs (id, kind, volume_id, destination, started_at_ns, status) + VALUES (1, 'index', 1, NULL, 100, 'success'), + (2, 'sync', 1, 'bucket', 200, 'success')`, + `INSERT INTO folders (id, volume_id, parent_id, path) VALUES (1, 1, NULL, '')`, + `INSERT INTO contents (id, blake3, size_bytes, origin_node_id, origin_run_id) VALUES + (1, X'` + strings.Repeat("11", 32) + `', 10, NULL, NULL), + (2, X'` + strings.Repeat("22", 32) + `', 20, 2, 9)`, + `INSERT INTO files (folder_id, name, content_id, mtime_ns, status, first_seen_run_id, last_seen_run_id, indexed_at_ns, status_changed_run_id) VALUES + (1, 'a.txt', 1, 1, 'present', 1, 1, 1, 1), + (1, 
'b.txt', 2, 2, 'present', 1, 1, 2, 1)`, + `INSERT INTO destination_run_ids (volume_id, destination, origin_node_id, origin_run_id, updated_at_ns) + VALUES (1, 'bucket', 1, 7, 100)`, + `INSERT INTO destination_run_ids_history (volume_id, destination, origin_node_id, origin_run_id, at_ns) + VALUES (1, 'bucket', 1, 7, 100)`, + `INSERT INTO remote_objects (content_id, destination, uploaded_run_id, checksum_algo, checksum, verified_at_ns) + VALUES (1, 'bucket', 2, 'blake3', 'deadbeef', 150)`, + } +} + +// TestMigrateV18ChainToV21 drives a populated v18 database through the +// v19–v21 chain and confirms the offload-substrate rows survive intact +// (destination_run_ids with its NULL-backfilled verify_method, +// remote_objects with its fingerprint) and that the v21 contents triggers +// actually abort an UPDATE and a DELETE on a contents row. +func TestMigrateV18ChainToV21(t *testing.T) { + dsn := filepath.Join(t.TempDir(), "test.db") + rawDB, err := sql.Open("sqlite", dsn) + if err != nil { + t.Fatalf("raw sql.Open: %v", err) + } + for _, q := range v18Fixture() { + if _, err := rawDB.Exec(q); err != nil { + rawDB.Close() + t.Fatalf("v18 DDL %q: %v", q, err) + } + } + rawDB.Close() + + s, err := OpenWithOptions(dsn, OpenOptions{NodeName: "self"}) + if err != nil { + t.Fatalf("Open (migrates v18→v%d): %v", SchemaVersion, err) + } + defer s.Close() + ctx := context.Background() + + if v, _ := s.CurrentSchemaVersion(ctx); v != SchemaVersion { + t.Fatalf("schema_version = %d, want %d", v, SchemaVersion) + } + + assertV18SubstrateSurvived(t, s) + assertContentsTriggersAbort(t, s) +} + +// assertV18SubstrateSurvived checks the offload-substrate rows carried +// through the v19–v21 chain: the durability vector keeps its coordinate +// with a NULL-backfilled verify_method, and the remote_objects fingerprint +// is intact. 
+func assertV18SubstrateSurvived(t *testing.T, s *Store) { + t.Helper() + ctx := context.Background() + + got, err := s.GetDestinationRunID(ctx, 1, "bucket", 1) + if err != nil { + t.Fatalf("GetDestinationRunID: %v", err) + } + if got.OriginRunID != 7 { + t.Fatalf("origin run = %d, want 7 (carried over)", got.OriginRunID) + } + if got.VerifyMethod != "" { + t.Fatalf("verify method = %q, want empty (NULL backfill)", got.VerifyMethod) + } + + var algo, checksum string + if err := s.db.QueryRowContext(ctx, + `SELECT checksum_algo, checksum FROM remote_objects WHERE content_id = 1 AND destination = 'bucket'`). + Scan(&algo, &checksum); err != nil { + t.Fatalf("remote_objects row: %v", err) + } + if algo != "blake3" || checksum != "deadbeef" { + t.Fatalf("remote_objects fingerprint = (%q,%q), want (blake3,deadbeef)", algo, checksum) + } +} + +// assertContentsTriggersAbort checks the v21 schema-level immutability: +// an in-place UPDATE and a DELETE on a contents row both abort, while the +// row stays exactly as written. 
+func assertContentsTriggersAbort(t *testing.T, s *Store) { + t.Helper() + ctx := context.Background() + + if _, err := s.db.ExecContext(ctx, `UPDATE contents SET size_bytes = 999 WHERE id = 1`); err == nil { + t.Fatalf("UPDATE on contents succeeded, want trigger ABORT") + } + if _, err := s.db.ExecContext(ctx, `DELETE FROM contents WHERE id = 1`); err == nil { + t.Fatalf("DELETE on contents succeeded, want trigger ABORT") + } + + var size int64 + var count int + if err := s.db.QueryRowContext(ctx, `SELECT size_bytes FROM contents WHERE id = 1`).Scan(&size); err != nil { + t.Fatalf("contents row after refused mutations: %v", err) + } + if size != 10 { + t.Fatalf("size_bytes = %d after refused UPDATE, want 10", size) + } + if err := s.db.QueryRowContext(ctx, `SELECT COUNT(*) FROM contents`).Scan(&count); err != nil { + t.Fatalf("count contents: %v", err) + } + if count != 2 { + t.Fatalf("contents rows = %d after refused DELETE, want 2", count) + } +} diff --git a/sync/content_addressed.go b/sync/content_addressed.go new file mode 100644 index 0000000..80911c3 --- /dev/null +++ b/sync/content_addressed.go @@ -0,0 +1,449 @@ +package sync + +import ( + "bytes" + "context" + "encoding/hex" + "encoding/json" + "errors" + "fmt" + "io" + "os" + "path" + "path/filepath" + "slices" + "strconv" + + "github.com/zeebo/blake3" + + "github.com/mbertschler/squirrel/config" + "github.com/mbertschler/squirrel/store" +) + +// Content-addressed destination layout: an append-only store of content +// objects plus the manifest segments that map paths onto them. The +// layout has no mirrored user tree — every byte under the destination +// root is squirrel-written. +const ( + // ObjectsDirName holds one immutable object per BLAKE3 content + // hash at the destination root: objects/<lowercase hex>, raw file + // bytes (encrypted by the crypt overlay when the destination has + // one). 
The directory is destination-global — shared by every + // volume, matching remote_objects' (content, destination) key — + // so duplicated content across volumes uploads once. An object is + // uploaded once and never moved, overwritten, or deleted. + ObjectsDirName = "objects" + // ManifestDirName holds one immutable manifest segment per sync + // run, per volume: <volume>/index/run-<run id>, the JSONL + // path-level delta of that run (see ManifestEntry). Replaying a + // volume's segments in run-id order reconstructs its full + // path→content mapping with no SQLite required. Distinct from + // IndexDirName, the dot-directory the snapshot ride-along writes. + ManifestDirName = "index" +) + +// ManifestEntry is one line of a manifest segment: a single path-level +// state change, JSON-encoded with exactly these fields in this order, +// one object per line (JSONL), lines sorted by (path, status). +// +// {"path":"2024/cat.jpg","blake3":"<64 hex chars>","status":"present","size_bytes":123,"mtime_ns":456} +// +// status is one of present, superseded, missing, offloaded. To replay a +// segment log: process segments in ascending run id; for each line with +// status present, missing, or offloaded, set the path's current +// (content, status) — the bytes for a present or offloaded path live at +// objects/<blake3>; a missing path's content is known but was lost at +// the origin. Lines with status superseded are history only (the +// outgoing content of a path that changed) and update no mapping. The +// format is stable so a small external script can recover data from the +// destination without squirrel. +type ManifestEntry struct { + Path string `json:"path"` + Blake3 string `json:"blake3"` + Status string `json:"status"` + SizeBytes int64 `json:"size_bytes"` + MtimeNs int64 `json:"mtime_ns"` +} + +// encodeManifestSegment renders the delta as JSONL. 
The input order +// (path, status — as ListPathDeltaSince returns it) is preserved, so +// identical deltas encode byte-for-byte identically. An empty delta +// encodes to an empty segment; the segment still uploads so every +// successful run leaves its landing evidence. +func encodeManifestSegment(delta []store.PathDelta) ([]byte, error) { + var out []byte + for _, d := range delta { + line, err := json.Marshal(ManifestEntry{ + Path: d.Path, + Blake3: hex.EncodeToString(d.Blake3), + Status: d.Status, + SizeBytes: d.SizeBytes, + MtimeNs: d.MtimeNs, + }) + if err != nil { + return nil, fmt.Errorf("encode manifest entry for %s: %w", d.Path, err) + } + out = append(out, line...) + out = append(out, '\n') + } + return out, nil +} + +// contentAddressedHandler pushes a volume to a content-addressed rclone +// destination: per-hash `rclone copyto` for each content object the +// destination lacks, then the run's manifest segment. The landing is +// transactional from the durability gate's point of view — the runs row +// reaches success and the destination vector advances only once both +// the objects and the segment are confirmed present at the expected +// size; any earlier failure leaves orphaned objects that are recorded +// (or re-uploaded idempotently) and harmless without a segment mapping +// them. +type contentAddressedHandler struct { + store *store.Store + rcl *Rclone + vol *config.Volume + dest *config.Destination +} + +func (h *contentAddressedHandler) TargetName() string { return h.dest.Name } + +func (h *contentAddressedHandler) Push(ctx context.Context, opts Options) (Report, error) { + rep := Report{Volume: h.vol.Name, Destination: h.dest.Name} + // Stamped up front so output renderers key content-addressed + // formatting off the method even when the push fails early. 
+ rep.Verification.Method = VerifyMethodPresenceSize + if h.vol.Name == ObjectsDirName { + return rep, fmt.Errorf("volume %q: the name collides with the destination-root %s/ directory of content-addressed destination %q — rename the volume or use a mirrored destination", h.vol.Name, ObjectsDirName, h.dest.Name) + } + volID, err := requireIndexedVolume(ctx, h.store, h.vol) + if err != nil { + return rep, err + } + if opts.DryRun { + return rep, fmt.Errorf("destination %q: the content-addressed push has no dry-run mode yet — run without --dry-run", h.dest.Name) + } + // shallow=true on the runs row: the per-object transfers carry no + // BLAKE3 end-to-end check (crypt remotes expose no hashes), and the + // audit trail stays honest about that. + runID, err := beginSyncRunGuarded(ctx, h.store, false, store.SyncRunSpec{ + VolumeID: volID, + Destination: h.dest.Name, + Shallow: true, + }, h.vol.Name) + if err != nil { + return rep, err + } + rep.RunID = runID + if opts.OnRunID != nil { + opts.OnRunID(runID) + } + + err = h.push(ctx, &rep, volID, runID) + finishHandlerRun(ctx, h.store, &rep, err) + opts.Snapshot.afterSync(ctx, &rep, h.vol, h.dest) + return rep, err +} + +func (h *contentAddressedHandler) sealed() {} + +// push runs the transactional landing: delta → objects → segment → +// vector. rep.Status starts failed and is promoted to success only at +// the end, after the destination vector advanced — a confirmed landing +// whose evidence failed to record must not present as success, or the +// next run's watermark would skip past it. 
+func (h *contentAddressedHandler) push(ctx context.Context, rep *Report, volID, runID int64) error { + rep.Status = store.RunStatusFailed + watermark, err := h.watermark(ctx, volID) + if err != nil { + return err + } + advance, err := captureDurabilityAdvance(ctx, h.store, volID) + if err != nil { + return err + } + delta, err := h.store.ListPathDeltaSince(ctx, volID, watermark) + if err != nil { + return fmt.Errorf("compute path delta since run %d: %w", watermark, err) + } + rep.Verification.Files = int64(len(delta)) + if err := h.uploadObjects(ctx, rep, runID, delta); err != nil { + return err + } + if err := h.uploadSegment(ctx, delta, runID); err != nil { + return err + } + // presence+size is not a content-verified method (crypt remotes + // expose no hashes): the component advances so a later scan-back + // fingerprint can upgrade it, but the offload gate holds this target + // out until a verified fingerprint backs the gated object. + if err := h.store.AdvanceDestinationVectorTo(ctx, volID, h.dest.Name, store.VerifyMethodPresenceSize, advance); err != nil { + return fmt.Errorf("advance destination vector for %s: %w", h.dest.Name, err) + } + rep.Status = store.RunStatusSuccess + rep.Verification.Bytes = rep.RcloneResult.Bytes + return nil +} + +// watermark resolves the run id the delta starts after: the last +// successful sync of this (volume, destination), or 0 for a fresh +// destination. The last success must still have its manifest segment at +// the destination — every successful content-addressed run uploads one, +// so its absence means the recorded history belongs to a different +// layout (a destination flipped from mirror) and a delta computed +// against it would silently skip everything the mirror era covered. 
+func (h *contentAddressedHandler) watermark(ctx context.Context, volID int64) (int64, error) { + last, err := h.store.LatestSuccessfulSyncRun(ctx, volID, h.dest.Name) + if store.IsNotFound(err) { + return 0, nil + } + if err != nil { + return 0, fmt.Errorf("lookup last successful sync of %s: %w", h.dest.Name, err) + } + segURI := h.segmentURI(last.ID) + if _, err := h.rcl.statRemote(ctx, segURI, checkersArgs(h.dest)...); err != nil { + return 0, fmt.Errorf("destination %q: the last successful sync (run %d) left no manifest segment at %s — its history does not look content-addressed; point the layout at a fresh destination or root instead of switching an existing one: %w", h.dest.Name, last.ID, segURI, err) + } + return last.ID, nil +} + +// uploadObjects lands every content object the delta needs that the +// destination has no upload record for. Counters land on +// rep.RcloneResult so the run report reads like the other rclone +// flows: Transferred = objects uploaded, Checked = objects skipped as +// already recorded, Errors/FailedFiles = per-object failures. Per-object +// failures don't stop the loop — every object that lands now is recorded +// and saves work on the retry — but any failure fails the run before +// the segment is written. +// +// A source whose bytes drifted from the indexed hash is refused without a +// remote_objects row (errContentDrift): it is surfaced as a warning and +// fails the run, so the segment is not written and the watermark does not +// advance. The next run recomputes the same delta and re-offers the +// object, letting the honest bytes land once they are restored — without +// the drifted bytes ever being recorded under the hash. 
+func (h *contentAddressedHandler) uploadObjects(ctx context.Context, rep *Report, runID int64, delta []store.PathDelta) error { + var confirmed []store.PathDelta + var drifted int + for _, d := range plannedUploads(delta) { + recorded, err := h.store.HasRemoteObject(ctx, d.ContentID, h.dest.Name) + if err != nil { + return fmt.Errorf("lookup upload record for %s: %w", d.Path, err) + } + if recorded { + rep.RcloneResult.Checked++ + continue + } + if err := h.uploadOneObject(ctx, runID, d); err != nil { + if errors.Is(err, errContentDrift) { + drifted++ + rep.Warnings = append(rep.Warnings, err.Error()) + continue + } + rep.RcloneResult.Errors++ + if int64(len(rep.RcloneResult.FailedFiles)) < maxFailedFiles { + rep.RcloneResult.FailedFiles = append(rep.RcloneResult.FailedFiles, + FailedFile{Object: d.Path, Message: err.Error()}) + } + continue + } + confirmed = append(confirmed, d) + rep.RcloneResult.Transferred++ + rep.RcloneResult.Bytes += d.SizeBytes + } + h.captureFingerprints(ctx, rep, confirmed) + if rep.RcloneResult.Errors > 0 { + return fmt.Errorf("%d object(s) failed to land on %q; the manifest segment for run %d was not written and the durability vector did not advance", rep.RcloneResult.Errors, h.dest.Name, runID) + } + if drifted > 0 { + return fmt.Errorf("%d object(s) on %q were refused for drifting from their indexed hash; re-index the volume and sync again — the manifest segment for run %d was not written and the durability vector did not advance", drifted, h.dest.Name, runID) + } + return nil +} + +// fingerprintBatchSize caps how many freshly confirmed objects one +// lsjson invocation covers, bounding argv growth from the per-object +// --include filters. 
+const fingerprintBatchSize = 200 + +// captureFingerprints fills the pending checksum pair of every object +// confirmed during this run with the provider checksum read back from +// the underlying remote — batched into one `lsjson --hash` per chunk, +// scoped by --include filters so the backend hashes only this run's +// uploads. Capture problems are warnings, not failures: the upload is +// already confirmed and recorded, and `squirrel verify` fills any +// fingerprint left pending. +func (h *contentAddressedHandler) captureFingerprints(ctx context.Context, rep *Report, confirmed []store.PathDelta) { + dirURI := underlyingObjectsURI(h.dest) + types := captureHashTypes(h.dest) + for batch := range slices.Chunk(confirmed, fingerprintBatchSize) { + extra := checkersArgs(h.dest) + for _, d := range batch { + extra = append(extra, "--include", hex.EncodeToString(d.Blake3)) + } + entries, err := h.rcl.listHashes(ctx, dirURI, types, extra...) + if err != nil { + rep.Warnings = append(rep.Warnings, fmt.Sprintf("fingerprint capture on %q failed: %v — checksums stay pending until `squirrel verify`", h.dest.Name, err)) + return + } + byName := make(map[string]map[string]string, len(entries)) + present := make(map[string]bool, len(entries)) + for _, e := range entries { + byName[e.Name] = e.Hashes + present[e.Name] = true + } + for _, d := range batch { + hash := hex.EncodeToString(d.Blake3) + if !present[hash] { + rep.Warnings = append(rep.Warnings, fmt.Sprintf("object %s on %q: not yet returned by the remote listing; fingerprint stays pending", hash, h.dest.Name)) + continue + } + cs, ok := extractChecksum(h.dest, byName[hash]) + if !ok { + rep.Warnings = append(rep.Warnings, fmt.Sprintf("object %s on %q: remote exposes no usable checksum (e.g. 
a multipart object whose ETag rclone does not surface as a hash); fingerprint stays pending", hash, h.dest.Name)) + continue + } + if err := h.store.SetRemoteObjectChecksum(ctx, d.ContentID, h.dest.Name, cs.Algo, cs.Value); err != nil { + rep.Warnings = append(rep.Warnings, fmt.Sprintf("record fingerprint for %s: %v", hash, err)) + continue + } + rep.Fingerprints++ + } + } +} + +// plannedUploads selects the delta rows that need a content object — +// status present, the bytes are on local disk — deduplicated to one +// source path per content hash. Delta order is deterministic, so so is +// the chosen source path. +func plannedUploads(delta []store.PathDelta) []store.PathDelta { + seen := make(map[int64]bool, len(delta)) + var out []store.PathDelta + for _, d := range delta { + if d.Status != store.StatusPresent || seen[d.ContentID] { + continue + } + seen[d.ContentID] = true + out = append(out, d) + } + return out +} + +// errContentDrift marks a source file whose bytes no longer match the +// content hash the index bound them to. The upload path raises it instead +// of recording an object; uploadObjects turns it into a warning and fails +// the run so the watermark holds and the object is re-offered next run. +var errContentDrift = errors.New("source content drifted from its indexed hash") + +// uploadOneObject lands one content object and records the upload. It +// guards the content-addressed invariant — the bytes stored under a hash +// must be the bytes that produced it — by re-hashing the source file +// immediately before the transfer and refusing (errContentDrift) when the +// digest no longer matches the indexed hash, catching a +// size+mtime-preserving in-place edit that a metadata stat would pass. 
+// The post-transfer stat confirms presence and size on the remote, and the +// upload record is written only after that confirmation, so a recorded +// hash is always a confirmed one; a crash in between re-uploads the same +// bytes idempotently on the next run. +// +// Residual: rclone reads the file in a separate child process after the +// re-hash, so a writer that edits the file in the window between the hash +// and rclone's read could still upload drifted bytes. The window is the +// fork/exec of one rclone invocation rather than the whole walk-to-push +// span, and the scan-back fingerprint pass (#109) re-reads the landed +// object to upgrade the durability vector, catching any byte that slipped +// through before the object is treated as content-verified. +func (h *contentAddressedHandler) uploadOneObject(ctx context.Context, runID int64, d store.PathDelta) error { + src := filepath.Join(h.vol.Path, filepath.FromSlash(d.Path)) + digest, err := hashLocalFile(src) + if err != nil { + return fmt.Errorf("re-hash %s before upload: %w", src, err) + } + if !bytes.Equal(digest, d.Blake3) { + return fmt.Errorf("%w: %s now hashes to %s, indexed as %s — run `squirrel index %s` and sync again", + errContentDrift, d.Path, hex.EncodeToString(digest), hex.EncodeToString(d.Blake3), h.vol.Name) + } + hash := hex.EncodeToString(d.Blake3) + uri := h.objectURI(hash) + if err := h.rcl.copyTo(ctx, src, uri, checkersArgs(h.dest)...); err != nil { + return err + } + size, err := h.rcl.statRemote(ctx, uri, checkersArgs(h.dest)...) 
+ if err != nil { + return fmt.Errorf("confirm object %s after upload: %w", hash, err) + } + if size != d.SizeBytes { + return fmt.Errorf("object %s landed with size %d, want %d", hash, size, d.SizeBytes) + } + if err := h.store.InsertRemoteObject(ctx, store.RemoteObject{ + ContentID: d.ContentID, + Destination: h.dest.Name, + UploadedRunID: runID, + }); err != nil { + return fmt.Errorf("record upload of %s: %w", hash, err) + } + return nil +} + +// hashLocalFile streams the file at path through BLAKE3 and returns the +// raw 32-byte digest, the same hash the indexer binds content under. +func hashLocalFile(path string) ([]byte, error) { + f, err := os.Open(path) + if err != nil { + return nil, err + } + defer f.Close() + h := blake3.New() + if _, err := io.Copy(h, f); err != nil { + return nil, err + } + return h.Sum(nil), nil +} + +// uploadSegment writes the run's manifest segment and confirms it +// landed at the expected size. Every run uploads one — an unchanged +// volume yields an empty segment — so each successful run leaves the +// landing evidence the next watermark check looks for. +func (h *contentAddressedHandler) uploadSegment(ctx context.Context, delta []store.PathDelta, runID int64) error { + body, err := encodeManifestSegment(delta) + if err != nil { + return err + } + tmp, err := os.CreateTemp("", "squirrel-manifest-*") + if err != nil { + return fmt.Errorf("stage manifest segment: %w", err) + } + defer func() { _ = os.Remove(tmp.Name()) }() + if _, err := tmp.Write(body); err != nil { + tmp.Close() + return fmt.Errorf("write manifest segment: %w", err) + } + if err := tmp.Close(); err != nil { + return fmt.Errorf("close manifest segment: %w", err) + } + + uri := h.segmentURI(runID) + if err := h.rcl.copyTo(ctx, tmp.Name(), uri, checkersArgs(h.dest)...); err != nil { + return fmt.Errorf("upload manifest segment to %s: %w", uri, err) + } + size, err := h.rcl.statRemote(ctx, uri, checkersArgs(h.dest)...) 
+ if err != nil { + return fmt.Errorf("confirm manifest segment at %s: %w", uri, err) + } + if size != int64(len(body)) { + return fmt.Errorf("manifest segment at %s landed with size %d, want %d", uri, size, len(body)) + } + return nil +} + +// objectURI addresses one content object under the destination-root +// objects/ directory, through the crypt overlay when the destination +// has one. +func (h *contentAddressedHandler) objectURI(hash string) string { + return remoteSubpathURI(h.dest, path.Join(ObjectsDirName, hash)) +} + +// segmentURI addresses one run's manifest segment under the +// destination's per-volume index/ directory. +func (h *contentAddressedHandler) segmentURI(runID int64) string { + return remoteSubpathURI(h.dest, path.Join(h.vol.Name, ManifestDirName, "run-"+strconv.FormatInt(runID, 10))) +} diff --git a/sync/content_addressed_test.go b/sync/content_addressed_test.go new file mode 100644 index 0000000..1d7dfae --- /dev/null +++ b/sync/content_addressed_test.go @@ -0,0 +1,770 @@ +package sync + +import ( + "context" + "encoding/hex" + "encoding/json" + "fmt" + "os" + "path/filepath" + "runtime" + "strings" + "testing" + "time" + + "github.com/zeebo/blake3" + + "github.com/mbertschler/squirrel/config" + "github.com/mbertschler/squirrel/index" + "github.com/mbertschler/squirrel/store" +) + +// fakeRcloneScript is the PATH-shim stand-in for the rclone binary, +// mirroring the kopia shim: it logs every argv line to +// $RCLONE_FAKE_LOG, then plays back the two subcommands the +// content-addressed push and the verify pass drive. Remote URIs +// (`remote:path`) map onto the local directory $RCLONE_FAKE_ROOT +// (after stripping the $RCLONE_FAKE_STRIP prefix, so overlay and +// underlying URIs land in the same tree); $RCLONE_FAKE_FAIL_GLOB +// injects per-destination copyto failures. 
lsjson hashes are derived +// from the file bytes via cksum, emitted under each requested +// --hash-type (or md5+sha1 when none is requested), with +// $RCLONE_FAKE_HASH_PREFIX simulating remote-side tampering, +// $RCLONE_FAKE_HASH_VALUE forcing an exact value, and +// $RCLONE_FAKE_NO_HASHES a backend that exposes no checksums; +// $RCLONE_FAKE_EMPTY_LISTING a directory lsjson that returns no entries. +const fakeRcloneScript = `#!/bin/sh +{ + printf 'argv:' + for a in "$@"; do printf ' %s' "$a"; done + printf '\n' +} >> "$RCLONE_FAKE_LOG" +if [ "$1" = "--config" ]; then shift 2; fi +cmd=$1; shift +resolve() { + case "$1" in + *:*) + p="${1#*:}" + case "$p" in + "${RCLONE_FAKE_STRIP:-//none//}"/*) p="${p#"${RCLONE_FAKE_STRIP}"/}" ;; + esac + printf '%s/%s' "$RCLONE_FAKE_ROOT" "$p" ;; + *) printf '%s' "$1" ;; + esac +} +hashtypes="" includes="" stat=0 a1="" a2="" +while [ $# -gt 0 ]; do + case "$1" in + --stat) stat=1 ;; + --hash-type) shift; hashtypes="$hashtypes $1" ;; + --include) shift; includes="$includes $1" ;; + --checkers) shift ;; + --*) ;; + *) if [ -z "$a1" ]; then a1="$1"; else a2="$1"; fi ;; + esac + shift +done +hashes_json() { + [ -n "$RCLONE_FAKE_NO_HASHES" ] && return + v="$RCLONE_FAKE_HASH_VALUE" + [ -z "$v" ] && v="${RCLONE_FAKE_HASH_PREFIX}$(cksum < "$1" | cut -d' ' -f1)" + printf ',"Hashes":{' + sep="" + for t in ${hashtypes:-md5 sha1}; do + printf '%s"%s":"%s"' "$sep" "$t" "$v" + sep="," + done + printf '}' +} +entry_json() { + size=$(wc -c < "$1" | tr -d '[:space:]') + printf '{"Path":"%s","Name":"%s","Size":%s,"IsDir":false' "$(basename "$1")" "$(basename "$1")" "$size" + hashes_json "$1" + printf '}' +} +case "$cmd" in +copyto) + case "$a2" in + ${RCLONE_FAKE_FAIL_GLOB:-//none//}) echo "fake copyto failure for $a2" >&2; exit 1 ;; + esac + dst=$(resolve "$a2") + mkdir -p "$(dirname "$dst")" && cp "$(resolve "$a1")" "$dst" + ;; +lsjson) + if [ "$stat" = 1 ]; then + f=$(resolve "$a1") + if [ ! 
-f "$f" ]; then echo "object not found: $a1" >&2; exit 3; fi + entry_json "$f"; printf '\n' + else + dir=$(resolve "$a1") + if [ ! -d "$dir" ]; then echo "directory not found: $a1" >&2; exit 3; fi + if [ -n "$RCLONE_FAKE_EMPTY_LISTING" ]; then printf '[]\n'; exit 0; fi + printf '[' + sep="" + for f in "$dir"/*; do + [ -f "$f" ] || continue + name=$(basename "$f") + if [ -n "$includes" ]; then + m=0 + for inc in $includes; do [ "$name" = "$inc" ] && m=1; done + [ "$m" = 1 ] || continue + fi + printf '%s' "$sep"; entry_json "$f"; sep="," + done + printf ']\n' + fi + ;; +*) echo "unexpected rclone subcommand: $cmd $*" >&2; exit 64 ;; +esac +` + +// caFixture is the content-addressed analogue of syncFixture: a store, +// a fake-rclone wrapper, and one volume syncing to one crypt sftp +// destination with layout = "content-addressed". fakeRoot is the local +// directory the shim materialises the remote into. +type caFixture struct { + store *store.Store + rcl *Rclone + cfg *config.Config + pair Pair + fakeRoot string + logPath string +} + +func setupContentAddressedFixture(t *testing.T) *caFixture { + t.Helper() + return setupCAFixture(t, `[destinations.offsite] +type = "sftp" +host = "remote.invalid" +user = "u" +root = "/data" +layout = "content-addressed" + +[destinations.offsite.crypt] +password = "obscured-pw" +`, "/data") +} + +// setupCAFixture is the destination-configurable body of +// setupContentAddressedFixture. destBlock declares the `offsite` +// destination; strip is the destination root the shim removes from +// underlying-remote URIs so they land in the same fake tree as the +// crypt overlay's root-relative paths. 
+func setupCAFixture(t *testing.T, destBlock, strip string) *caFixture { + t.Helper() + if runtime.GOOS == "windows" { + t.Skip("fake rclone shim is a POSIX shell script") + } + dir := t.TempDir() + binPath := filepath.Join(dir, "rclone") + if err := os.WriteFile(binPath, []byte(fakeRcloneScript), 0o755); err != nil { + t.Fatalf("write fake rclone: %v", err) + } + fakeRoot := filepath.Join(dir, "remote") + logPath := filepath.Join(dir, "calls.log") + t.Setenv("RCLONE_FAKE_LOG", logPath) + t.Setenv("RCLONE_FAKE_ROOT", fakeRoot) + t.Setenv("RCLONE_FAKE_FAIL_GLOB", "") + t.Setenv("RCLONE_FAKE_STRIP", strip) + t.Setenv("RCLONE_FAKE_NO_HASHES", "") + t.Setenv("RCLONE_FAKE_HASH_VALUE", "") + t.Setenv("RCLONE_FAKE_HASH_PREFIX", "") + + root := t.TempDir() + volPath := filepath.Join(root, "src") + docsPath := filepath.Join(root, "docs-src") + for _, p := range []string{volPath, docsPath} { + if err := os.MkdirAll(p, 0o755); err != nil { + t.Fatalf("mkdir %s: %v", p, err) + } + } + s, err := store.Open(filepath.Join(root, "test.db")) + if err != nil { + t.Fatalf("store.Open: %v", err) + } + t.Cleanup(func() { s.Close() }) + + cfgPath := filepath.Join(root, "config.toml") + cfgBody := destBlock + ` +[volumes.pics] +path = "` + volPath + `" +sync_to = ["offsite"] + +[volumes.docs] +path = "` + docsPath + `" +sync_to = ["offsite"] +` + if err := os.WriteFile(cfgPath, []byte(cfgBody), 0o600); err != nil { + t.Fatalf("write config: %v", err) + } + cfg, err := config.Load(cfgPath) + if err != nil { + t.Fatalf("config.Load: %v", err) + } + pairs, err := PairsFor(cfg, "pics", "") + if err != nil { + t.Fatalf("PairsFor: %v", err) + } + rcl := &Rclone{Binary: binPath, Config: filepath.Join(root, "rclone.conf")} + if err := os.WriteFile(rcl.Config, []byte{}, 0o600); err != nil { + t.Fatalf("seed rclone.conf: %v", err) + } + return &caFixture{store: s, rcl: rcl, cfg: cfg, pair: pairs[0], fakeRoot: fakeRoot, logPath: logPath} +} + +func (f *caFixture) write(t *testing.T, name, content 
string) { + t.Helper() + p := filepath.Join(f.pair.Volume.Path, name) + if err := os.MkdirAll(filepath.Dir(p), 0o755); err != nil { + t.Fatal(err) + } + if err := os.WriteFile(p, []byte(content), 0o644); err != nil { + t.Fatal(err) + } +} + +func (f *caFixture) mtimeNs(t *testing.T, name string) int64 { + t.Helper() + fi, err := os.Stat(filepath.Join(f.pair.Volume.Path, name)) + if err != nil { + t.Fatalf("stat %s: %v", name, err) + } + return fi.ModTime().UnixNano() +} + +func (f *caFixture) index(t *testing.T) { + t.Helper() + if _, err := index.Index(context.Background(), f.store, f.pair.Volume.Path, index.Options{Name: "pics"}); err != nil { + t.Fatalf("index.Index: %v", err) + } +} + +func (f *caFixture) sync(t *testing.T) (Report, error) { + t.Helper() + return RunPair(context.Background(), f.store, Tools{Rclone: f.rcl}, f.pair, Options{}) +} + +func (f *caFixture) volumeID(t *testing.T) int64 { + t.Helper() + v, err := f.store.GetVolumeByName(context.Background(), "pics") + if err != nil { + t.Fatalf("GetVolumeByName: %v", err) + } + return v.ID +} + +// remotePath maps a destination subpath to where the shim materialised +// it: objects/ lives at the root, manifest segments per volume. +func (f *caFixture) remotePath(parts ...string) string { + return filepath.Join(append([]string{f.fakeRoot}, parts...)...) 
+} + +func (f *caFixture) readSegment(t *testing.T, runID int64) []ManifestEntry { + t.Helper() + data, err := os.ReadFile(f.remotePath("pics", ManifestDirName, fmt.Sprintf("run-%d", runID))) + if err != nil { + t.Fatalf("read manifest segment: %v", err) + } + var entries []ManifestEntry + for _, line := range strings.Split(strings.TrimSuffix(string(data), "\n"), "\n") { + if line == "" { + continue + } + var e ManifestEntry + if err := json.Unmarshal([]byte(line), &e); err != nil { + t.Fatalf("parse manifest line %q: %v", line, err) + } + entries = append(entries, e) + } + return entries +} + +func blake3Hex(content string) string { + sum := blake3.Sum256([]byte(content)) + return hex.EncodeToString(sum[:]) +} + +func TestContentAddressedPushHappyPath(t *testing.T) { + f := setupContentAddressedFixture(t) + f.write(t, "a.txt", "alpha") + f.write(t, "b.txt", "beta") + f.index(t) + + rep, err := f.sync(t) + if err != nil { + t.Fatalf("sync: %v (rep=%+v)", err, rep) + } + if rep.Status != store.RunStatusSuccess { + t.Fatalf("Status = %q, want success", rep.Status) + } + if rep.Verification.Verified() || rep.Verification.Method != VerifyMethodPresenceSize { + t.Fatalf("Verification = %+v, want unverified %q (presence is weaker than a content check)", rep.Verification, VerifyMethodPresenceSize) + } + if rep.Verification.Files != 2 || rep.RcloneResult.Transferred != 2 || rep.RcloneResult.Checked != 0 { + t.Fatalf("counts = files=%d transferred=%d checked=%d, want 2/2/0", rep.Verification.Files, rep.RcloneResult.Transferred, rep.RcloneResult.Checked) + } + if rep.RcloneResult.Bytes != int64(len("alpha")+len("beta")) { + t.Fatalf("Bytes = %d, want %d", rep.RcloneResult.Bytes, len("alpha")+len("beta")) + } + + for name, content := range map[string]string{"a.txt": "alpha", "b.txt": "beta"} { + obj := f.remotePath(ObjectsDirName, blake3Hex(content)) + got, err := os.ReadFile(obj) + if err != nil { + t.Fatalf("object for %s missing at %s: %v", name, obj, err) + } + if 
string(got) != content { + t.Fatalf("object for %s = %q, want %q", name, got, content) + } + } + + entries := f.readSegment(t, rep.RunID) + if len(entries) != 2 || entries[0].Path != "a.txt" || entries[1].Path != "b.txt" { + t.Fatalf("segment entries = %+v, want a.txt and b.txt", entries) + } + if entries[0].Blake3 != blake3Hex("alpha") || entries[0].Status != store.StatusPresent || entries[0].SizeBytes != 5 { + t.Fatalf("segment entry = %+v, want present alpha", entries[0]) + } + + run, err := f.store.GetRun(context.Background(), rep.RunID) + if err != nil { + t.Fatalf("GetRun: %v", err) + } + if run.Status != store.RunStatusSuccess || run.FileCount != 2 || !run.Shallow.Valid || !run.Shallow.Bool { + t.Fatalf("run = %+v, want success file_count=2 shallow=true", run) + } + + row, err := f.store.GetByPath(context.Background(), f.volumeID(t), "a.txt") + if err != nil { + t.Fatalf("GetByPath: %v", err) + } + obj, err := f.store.GetRemoteObject(context.Background(), row.ContentID, "offsite") + if err != nil { + t.Fatalf("GetRemoteObject: %v", err) + } + if obj.UploadedRunID != rep.RunID || obj.ChecksumAlgo.String != "sha256" || !obj.Checksum.Valid { + t.Fatalf("remote object = %+v, want a sha256-fingerprinted record for run %d", obj, rep.RunID) + } + if obj.VerifiedAtNs.Valid { + t.Fatalf("fresh upload already verified: %+v", obj) + } + if rep.Fingerprints != 2 { + t.Fatalf("Fingerprints = %d, want 2", rep.Fingerprints) + } + + vector, err := f.store.ListDestinationRunIDs(context.Background(), f.volumeID(t), "offsite") + if err != nil { + t.Fatalf("ListDestinationRunIDs: %v", err) + } + if len(vector) != 1 { + t.Fatalf("vector = %+v, want one self component", vector) + } + self, err := f.store.GetSelfNode(context.Background()) + if err != nil { + t.Fatalf("GetSelfNode: %v", err) + } + if vector[0].OriginNodeID != self.ID || vector[0].OriginRunID == 0 { + t.Fatalf("vector component = %+v, want self node at the introduction run", vector[0]) + } + // The content-addressed 
advance records presence+size, not a + // content-verified method: the offload gate holds this target out + // until a verified fingerprint backs the object (#109). + if vector[0].VerifyMethod != VerifyMethodPresenceSize { + t.Fatalf("verify method = %q, want %q", vector[0].VerifyMethod, VerifyMethodPresenceSize) + } + if store.ContentVerifiedMethod(vector[0].VerifyMethod) { + t.Fatalf("presence+size must not count as content-verified") + } + + // Transfers and confirmations address the crypt overlay remote. + log, err := os.ReadFile(f.logPath) + if err != nil { + t.Fatalf("read shim log: %v", err) + } + if !strings.Contains(string(log), "copyto") || !strings.Contains(string(log), "offsite-crypt:"+ObjectsDirName+"/") { + t.Fatalf("shim log lacks crypt-addressed copyto lines:\n%s", log) + } +} + +// TestContentAddressedUploadOnce: a second run uploads only hashes the +// destination has no record of — a new path carrying already-recorded +// content transfers nothing and still lands in the manifest. +func TestContentAddressedUploadOnce(t *testing.T) { + f := setupContentAddressedFixture(t) + f.write(t, "a.txt", "alpha") + f.index(t) + if _, err := f.sync(t); err != nil { + t.Fatalf("first sync: %v", err) + } + + f.write(t, "copy.txt", "alpha") + f.index(t) + rep, err := f.sync(t) + if err != nil { + t.Fatalf("second sync: %v", err) + } + if rep.RcloneResult.Transferred != 0 || rep.RcloneResult.Checked != 1 { + t.Fatalf("transferred=%d checked=%d, want 0/1 (hash already recorded)", rep.RcloneResult.Transferred, rep.RcloneResult.Checked) + } + entries := f.readSegment(t, rep.RunID) + if len(entries) != 1 || entries[0].Path != "copy.txt" || entries[0].Blake3 != blake3Hex("alpha") { + t.Fatalf("segment = %+v, want one copy.txt line", entries) + } +} + +// TestContentAddressedManifestSegmentGolden pins the documented segment +// format byte-for-byte across an add + supersede + missing delta. 
+func TestContentAddressedManifestSegmentGolden(t *testing.T) { + f := setupContentAddressedFixture(t) + f.write(t, "a.txt", "v1") + f.write(t, "c.txt", "cc") + aOldMtime := f.mtimeNs(t, "a.txt") + cMtime := f.mtimeNs(t, "c.txt") + f.index(t) + if _, err := f.sync(t); err != nil { + t.Fatalf("first sync: %v", err) + } + + f.write(t, "a.txt", "v2-longer") + if err := os.Remove(filepath.Join(f.pair.Volume.Path, "c.txt")); err != nil { + t.Fatal(err) + } + f.write(t, "d.txt", "dd") + f.index(t) + rep, err := f.sync(t) + if err != nil { + t.Fatalf("second sync: %v", err) + } + + line := func(path, content, status string, mtime int64) string { + return fmt.Sprintf(`{"path":%q,"blake3":%q,"status":%q,"size_bytes":%d,"mtime_ns":%d}`, + path, blake3Hex(content), status, len(content), mtime) + } + want := strings.Join([]string{ + line("a.txt", "v2-longer", store.StatusPresent, f.mtimeNs(t, "a.txt")), + line("a.txt", "v1", store.StatusSuperseded, aOldMtime), + line("c.txt", "cc", store.StatusMissing, cMtime), + line("d.txt", "dd", store.StatusPresent, f.mtimeNs(t, "d.txt")), + }, "\n") + "\n" + + got, err := os.ReadFile(f.remotePath("pics", ManifestDirName, fmt.Sprintf("run-%d", rep.RunID))) + if err != nil { + t.Fatalf("read segment: %v", err) + } + if string(got) != want { + t.Fatalf("segment = \n%s\nwant\n%s", got, want) + } +} + +// TestContentAddressedObjectFailureIsTransactional: when an object +// fails to land, the manifest segment is never written, the runs row is +// failed, no upload is recorded, and the durability vector stays put. 
+func TestContentAddressedObjectFailureIsTransactional(t *testing.T) { + f := setupContentAddressedFixture(t) + t.Setenv("RCLONE_FAKE_FAIL_GLOB", "*"+ObjectsDirName+"*") + f.write(t, "a.txt", "alpha") + f.index(t) + + rep, err := f.sync(t) + if err == nil || !strings.Contains(err.Error(), "did not advance") { + t.Fatalf("expected transactional-landing failure, got %v", err) + } + if rep.Status != store.RunStatusFailed { + t.Fatalf("Status = %q, want failed", rep.Status) + } + if _, statErr := os.Stat(f.remotePath("pics", ManifestDirName, fmt.Sprintf("run-%d", rep.RunID))); statErr == nil { + t.Fatalf("manifest segment written despite object failure") + } + run, err := f.store.GetRun(context.Background(), rep.RunID) + if err != nil { + t.Fatalf("GetRun: %v", err) + } + if run.Status != store.RunStatusFailed || !run.Error.Valid { + t.Fatalf("run = %+v, want failed with an error message", run) + } + row, err := f.store.GetByPath(context.Background(), f.volumeID(t), "a.txt") + if err != nil { + t.Fatalf("GetByPath: %v", err) + } + if has, _ := f.store.HasRemoteObject(context.Background(), row.ContentID, "offsite"); has { + t.Fatalf("failed upload was recorded") + } + vector, err := f.store.ListDestinationRunIDs(context.Background(), f.volumeID(t), "offsite") + if err != nil || len(vector) != 0 { + t.Fatalf("vector = %+v (err=%v), want empty", vector, err) + } +} + +// TestContentAddressedSegmentFailureThenRecovery: a failed segment +// upload leaves the vector unchanged even though every object landed, +// and the retry transfers no objects: the recorded objects are skipped +// and only the missing segment is re-pushed. 
+func TestContentAddressedSegmentFailureThenRecovery(t *testing.T) { + f := setupContentAddressedFixture(t) + t.Setenv("RCLONE_FAKE_FAIL_GLOB", "*pics/"+ManifestDirName+"/*") + f.write(t, "a.txt", "alpha") + f.index(t) + + rep, err := f.sync(t) + if err == nil || !strings.Contains(err.Error(), "manifest segment") { + t.Fatalf("expected segment-upload failure, got %v", err) + } + if rep.Status != store.RunStatusFailed { + t.Fatalf("Status = %q, want failed", rep.Status) + } + if _, err := os.Stat(f.remotePath(ObjectsDirName, blake3Hex("alpha"))); err != nil { + t.Fatalf("object should have landed before the segment failed: %v", err) + } + vector, err := f.store.ListDestinationRunIDs(context.Background(), f.volumeID(t), "offsite") + if err != nil || len(vector) != 0 { + t.Fatalf("vector = %+v (err=%v), want empty after segment failure", vector, err) + } + + t.Setenv("RCLONE_FAKE_FAIL_GLOB", "") + rep2, err := f.sync(t) + if err != nil { + t.Fatalf("retry sync: %v", err) + } + if rep2.RcloneResult.Transferred != 0 || rep2.RcloneResult.Checked != 1 { + t.Fatalf("retry transferred=%d checked=%d, want 0/1 (object already recorded)", rep2.RcloneResult.Transferred, rep2.RcloneResult.Checked) + } + entries := f.readSegment(t, rep2.RunID) + if len(entries) != 1 || entries[0].Path != "a.txt" { + t.Fatalf("retry segment = %+v, want the a.txt line", entries) + } + vector, err = f.store.ListDestinationRunIDs(context.Background(), f.volumeID(t), "offsite") + if err != nil || len(vector) != 1 { + t.Fatalf("vector = %+v (err=%v), want one component after recovery", vector, err) + } +} + +// TestContentAddressedEmptyDeltaStillLandsSegment: an unchanged volume +// produces an empty segment — uploaded anyway so the run leaves the +// landing evidence the next watermark check looks for. 
+func TestContentAddressedEmptyDeltaStillLandsSegment(t *testing.T) { + f := setupContentAddressedFixture(t) + f.write(t, "a.txt", "alpha") + f.index(t) + if _, err := f.sync(t); err != nil { + t.Fatalf("first sync: %v", err) + } + + rep, err := f.sync(t) + if err != nil { + t.Fatalf("second sync: %v", err) + } + if rep.Status != store.RunStatusSuccess || rep.Verification.Files != 0 { + t.Fatalf("rep = status=%q files=%d, want success with an empty delta", rep.Status, rep.Verification.Files) + } + data, err := os.ReadFile(f.remotePath("pics", ManifestDirName, fmt.Sprintf("run-%d", rep.RunID))) + if err != nil { + t.Fatalf("empty segment missing: %v", err) + } + if len(data) != 0 { + t.Fatalf("empty delta produced segment content: %q", data) + } +} + +// TestContentAddressedReservedDirsStayHome: indexed rows under the +// reserved sync subtrees never reach the destination — no object, no +// manifest line. +func TestContentAddressedReservedDirsStayHome(t *testing.T) { + f := setupContentAddressedFixture(t) + f.write(t, "a.txt", "alpha") + f.write(t, HistoryDirName+"/secret.txt", "do-not-upload") + f.index(t) + + rep, err := f.sync(t) + if err != nil { + t.Fatalf("sync: %v", err) + } + entries := f.readSegment(t, rep.RunID) + if len(entries) != 1 || entries[0].Path != "a.txt" { + t.Fatalf("segment = %+v, want only a.txt", entries) + } + if _, err := os.Stat(f.remotePath(ObjectsDirName, blake3Hex("do-not-upload"))); err == nil { + t.Fatalf("reserved-subtree content was uploaded as an object") + } +} + +// TestContentAddressedWatermarkGuard: a destination whose recorded +// last success left no manifest segment (a mirror-era run) is refused +// rather than silently diffed against the wrong baseline. 
+func TestContentAddressedWatermarkGuard(t *testing.T) { + f := setupContentAddressedFixture(t) + f.write(t, "a.txt", "alpha") + f.index(t) + + ctx := context.Background() + mirrorRun, err := f.store.BeginRun(ctx, store.RunKindSync, f.volumeID(t), "offsite", false) + if err != nil { + t.Fatalf("seed mirror-era run: %v", err) + } + if err := f.store.FinishRun(ctx, mirrorRun, store.RunStatusSuccess, "", 1); err != nil { + t.Fatalf("finish mirror-era run: %v", err) + } + + rep, err := f.sync(t) + if err == nil || !strings.Contains(err.Error(), "does not look content-addressed") { + t.Fatalf("expected layout-flip refusal, got %v", err) + } + if rep.Status != store.RunStatusFailed { + t.Fatalf("Status = %q, want failed", rep.Status) + } +} + +func TestContentAddressedDryRunRefused(t *testing.T) { + f := setupContentAddressedFixture(t) + f.write(t, "a.txt", "alpha") + f.index(t) + + _, err := RunPair(context.Background(), f.store, Tools{Rclone: f.rcl}, f.pair, Options{DryRun: true}) + if err == nil || !strings.Contains(err.Error(), "dry-run") { + t.Fatalf("expected dry-run refusal, got %v", err) + } + runs, _ := f.store.ListRuns(context.Background(), store.ListRunsOpts{}) + for _, r := range runs { + if r.Kind == store.RunKindSync { + t.Fatalf("dry-run wrote a sync runs row: %+v", r) + } + } +} + +func TestRestoreRefusesContentAddressedDestination(t *testing.T) { + f := setupContentAddressedFixture(t) + _, err := Restore(context.Background(), f.store, f.rcl, f.pair.Volume, f.pair.Destination, RestoreOptions{}) + if err == nil || !strings.Contains(err.Error(), "content-addressed") { + t.Fatalf("expected content-addressed restore refusal, got %v", err) + } +} + +// TestContentAddressedCrossVolumeDedup: objects/ is destination-global, +// matching remote_objects' (content, destination) key — a second volume +// carrying already-recorded content uploads nothing, and its manifest +// still maps the path onto the shared object. 
+func TestContentAddressedCrossVolumeDedup(t *testing.T) { + f := setupContentAddressedFixture(t) + f.write(t, "a.txt", "shared-bytes") + f.index(t) + if _, err := f.sync(t); err != nil { + t.Fatalf("pics sync: %v", err) + } + + docs := f.cfg.Volumes["docs"] + if err := os.WriteFile(filepath.Join(docs.Path, "report.txt"), []byte("shared-bytes"), 0o644); err != nil { + t.Fatal(err) + } + if _, err := index.Index(context.Background(), f.store, docs.Path, index.Options{Name: "docs"}); err != nil { + t.Fatalf("index docs: %v", err) + } + docsPairs, err := PairsFor(f.cfg, "docs", "") + if err != nil { + t.Fatalf("PairsFor docs: %v", err) + } + rep, err := RunPair(context.Background(), f.store, Tools{Rclone: f.rcl}, docsPairs[0], Options{}) + if err != nil { + t.Fatalf("docs sync: %v", err) + } + if rep.RcloneResult.Transferred != 0 || rep.RcloneResult.Checked != 1 { + t.Fatalf("docs transferred=%d checked=%d, want 0/1 (object shared across volumes)", rep.RcloneResult.Transferred, rep.RcloneResult.Checked) + } + data, err := os.ReadFile(f.remotePath("docs", ManifestDirName, fmt.Sprintf("run-%d", rep.RunID))) + if err != nil { + t.Fatalf("docs segment missing: %v", err) + } + if !strings.Contains(string(data), blake3Hex("shared-bytes")) || !strings.Contains(string(data), "report.txt") { + t.Fatalf("docs segment = %q, want report.txt mapped onto the shared object", data) + } + if _, err := os.Stat(f.remotePath(ObjectsDirName, blake3Hex("shared-bytes"))); err != nil { + t.Fatalf("shared object missing at the destination root: %v", err) + } +} + +// TestContentAddressedDriftRefusesObject: a source file whose on-disk +// bytes drift from the indexed hash (here a same-length, mtime-preserving +// in-place edit the metadata stat alone cannot catch) is never recorded in +// remote_objects. 
The run is refused and warned, the watermark holds, and +// once the honest bytes are restored the next run lands and records the +// object normally — the drifted bytes never bound to the hash. +func TestContentAddressedDriftRefusesObject(t *testing.T) { + f := setupContentAddressedFixture(t) + f.write(t, "a.txt", "alpha") + f.index(t) + + indexedMtime := f.mtimeNs(t, "a.txt") + src := filepath.Join(f.pair.Volume.Path, "a.txt") + if err := os.WriteFile(src, []byte("ALPHA"), 0o644); err != nil { + t.Fatalf("in-place edit: %v", err) + } + restoreMtime := time.Unix(0, indexedMtime) + if err := os.Chtimes(src, restoreMtime, restoreMtime); err != nil { + t.Fatalf("restore mtime: %v", err) + } + + rep, err := f.sync(t) + if err == nil || !strings.Contains(err.Error(), "drifting from their indexed hash") { + t.Fatalf("expected a drift refusal, got %v", err) + } + if rep.Status != store.RunStatusFailed { + t.Fatalf("Status = %q, want failed", rep.Status) + } + if rep.RcloneResult.Transferred != 0 || rep.RcloneResult.Errors != 0 { + t.Fatalf("counts = transferred=%d errors=%d, want 0/0 (drift is a refusal, not a transfer error)", + rep.RcloneResult.Transferred, rep.RcloneResult.Errors) + } + if len(rep.Warnings) == 0 || !strings.Contains(strings.Join(rep.Warnings, "\n"), "drifted") { + t.Fatalf("Warnings = %+v, want a drift advisory", rep.Warnings) + } + if _, statErr := os.Stat(f.remotePath(ObjectsDirName, blake3Hex("alpha"))); statErr == nil { + t.Fatalf("drifted source uploaded an object under the indexed hash") + } + + row, err := f.store.GetByPath(context.Background(), f.volumeID(t), "a.txt") + if err != nil { + t.Fatalf("GetByPath: %v", err) + } + if has, _ := f.store.HasRemoteObject(context.Background(), row.ContentID, "offsite"); has { + t.Fatalf("drifted object was recorded in remote_objects") + } + if _, statErr := os.Stat(f.remotePath("pics", ManifestDirName, fmt.Sprintf("run-%d", rep.RunID))); statErr == nil { + t.Fatalf("manifest segment written despite a 
refused object") + } + if vector, err := f.store.ListDestinationRunIDs(context.Background(), f.volumeID(t), "offsite"); err != nil || len(vector) != 0 { + t.Fatalf("vector = %+v (err=%v), want empty after a refused object", vector, err) + } + + if err := os.WriteFile(src, []byte("alpha"), 0o644); err != nil { + t.Fatalf("restore honest bytes: %v", err) + } + if err := os.Chtimes(src, restoreMtime, restoreMtime); err != nil { + t.Fatalf("restore mtime again: %v", err) + } + rep2, err := f.sync(t) + if err != nil { + t.Fatalf("retry sync: %v", err) + } + if rep2.RcloneResult.Transferred != 1 { + t.Fatalf("retry transferred = %d, want 1 (honest bytes upload after drift cleared)", rep2.RcloneResult.Transferred) + } + if has, _ := f.store.HasRemoteObject(context.Background(), row.ContentID, "offsite"); !has { + t.Fatalf("honest re-upload was not recorded") + } + got, err := os.ReadFile(f.remotePath(ObjectsDirName, blake3Hex("alpha"))) + if err != nil { + t.Fatalf("object missing after honest re-upload: %v", err) + } + if string(got) != "alpha" { + t.Fatalf("recorded object = %q, want the honest bytes %q", got, "alpha") + } +} + +// TestContentAddressedRefusesVolumeNamedObjects: a volume named like +// the destination-root objects/ directory would collide with it. 
+func TestContentAddressedRefusesVolumeNamedObjects(t *testing.T) { + f := setupContentAddressedFixture(t) + pair := Pair{Volume: &config.Volume{Name: ObjectsDirName, Path: t.TempDir()}, Destination: f.pair.Destination} + _, err := RunPair(context.Background(), f.store, Tools{Rclone: f.rcl}, pair, Options{}) + if err == nil || !strings.Contains(err.Error(), "collides") { + t.Fatalf("expected volume-name collision refusal, got %v", err) + } +} diff --git a/sync/durability.go b/sync/durability.go new file mode 100644 index 0000000..265db81 --- /dev/null +++ b/sync/durability.go @@ -0,0 +1,249 @@ +package sync + +import ( + "context" + "errors" + "fmt" + + "github.com/mbertschler/squirrel/config" + "github.com/mbertschler/squirrel/store" + "github.com/mbertschler/squirrel/syncproto" +) + +// maxDurabilityDropSamples caps how many dropped entries the report +// retains as detail. The Dropped count stays exact; only the sampled +// Drops slice is bounded, so an adversarial peer flooding out-of-scope +// destinations cannot blow up the report or the output that renders it. +const maxDurabilityDropSamples = 16 + +// maxOriginNodesPerPull bounds how many distinct origin-node names one +// durability pull will resolve (and so, for novel names, create as local +// nodes rows). A real volume's content originates on a handful of nodes; +// the cap is deliberately generous and exists only to convert a runaway +// peer into a loud refusal rather than to police a legitimate topology. +const maxOriginNodesPerPull = 256 + +// DurabilityPullReport summarises one durability metadata pull from a +// peer: how many entries (vector components and freshness coordinates) +// were fetched, how many components landed in the local +// destination_run_ids (advanced or re-confirmed), how many were refused +// as rewinds, and how many entries were dropped because the peer named a +// destination outside this volume's accepted target set. 
Every fetched +// entry lands in exactly one of the applied / rewind / dropped buckets +// (a merged freshness coordinate counts as applied). +type DurabilityPullReport struct { + Volume string + Peer string + Fetched int + Applied int + Dropped int + Rewinds []DurabilityRewind + // Drops samples the dropped entries up to maxDurabilityDropSamples; + // Dropped is the exact total. + Drops []DurabilityDrop +} + +// recordDrop counts a dropped entry and samples it into Drops up to the +// cap, so the exact total survives while the detail stays bounded. +func (r *DurabilityPullReport) recordDrop(d DurabilityDrop) { + r.Dropped++ + if len(r.Drops) < maxDurabilityDropSamples { + r.Drops = append(r.Drops, d) + } +} + +// DurabilityDrop is one pulled entry the merge discarded because its +// destination falls outside the volume's accepted target set +// (offload_requires ∪ sync_to). Drops are counted and sampled so a peer +// asserting evidence for destinations this node uses for neither offload +// nor sync stays observable. +type DurabilityDrop struct { + Destination string + OriginNode string + Kind string // "component" or "freshness" +} + +func (d DurabilityDrop) String() string { + return fmt.Sprintf("%s for unconfigured destination %s origin %s", + d.Kind, d.Destination, d.OriginNode) +} + +// DurabilityRewind is one component the pull refused because the peer +// reported a value below the locally recorded one. The local value +// stays; re-running with the allow-rewind opt-in accepts the peer's. 
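Reviewer note: the "exact count, bounded detail" behaviour of `recordDrop` above can be tried in isolation. The following is a minimal sketch, not the package's code: `dropReport`, `floodReport`, and the cap of 3 are hypothetical stand-ins chosen to keep the demo short (the diff uses `maxDurabilityDropSamples = 16`).

```go
package main

import "fmt"

// maxSamples plays the role of maxDurabilityDropSamples; the value 3 is
// an assumption made only to keep the demo small.
const maxSamples = 3

// dropReport is a hypothetical stand-in for DurabilityPullReport.
type dropReport struct {
	Dropped int      // exact total of dropped entries
	Drops   []string // sampled detail, never longer than maxSamples
}

// recordDrop always counts the drop, but only samples it while the
// detail slice is below the cap.
func (r *dropReport) recordDrop(name string) {
	r.Dropped++
	if len(r.Drops) < maxSamples {
		r.Drops = append(r.Drops, name)
	}
}

// floodReport simulates a peer naming n out-of-scope destinations.
func floodReport(n int) dropReport {
	var r dropReport
	for i := 0; i < n; i++ {
		r.recordDrop(fmt.Sprintf("junk-%02d", i))
	}
	return r
}

func main() {
	r := floodReport(50)
	fmt.Printf("dropped=%d sampled=%d\n", r.Dropped, len(r.Drops))
}
```

The count and the sample are deliberately decoupled: a consumer that needs the total reads `Dropped`, and anything that renders detail iterates the bounded slice.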
+type DurabilityRewind struct { + Destination string + OriginNode string + Current int64 + Attempted int64 +} + +func (r DurabilityRewind) String() string { + return fmt.Sprintf("destination %s origin %s: recorded %d, peer reports %d", + r.Destination, r.OriginNode, r.Current, r.Attempted) +} + +// PullDurability fetches the peer's destination durability vectors for +// the volume and merges them into the local destination_run_ids under +// the same destination names — peers and buckets share one flat target +// namespace, so a component about a destination only the peer can see +// lands as locally cached evidence for offline decisions. The merge is +// metadata-only and monotonic: each component routes through the +// watermark store, refused rewinds are reported on the result (not +// applied), and allowRewind is the explicit recovery override. +// +// The merge is scoped to destinations this volume actually references +// (offload_requires ∪ sync_to); evidence for any other destination is +// dropped, so a buggy or compromised peer cannot pollute the local +// vector with rows for destinations this node neither requires for +// offload nor syncs to. +// +// The standalone `peer-sync pull-durability` command and the automatic +// post-close pull share this implementation. 
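Reviewer note: the monotonic merge described in the comment above (refuse a lower value, report it, accept it only under the explicit override) can be sketched with an in-memory map. This is an illustration under assumptions: `watermarks`, `rewindError`, and `asRewind` are hypothetical names, and the real store behind `UpsertDestinationRunIDVerified` is not shown in this diff.

```go
package main

import (
	"errors"
	"fmt"
)

// rewindError is a hypothetical stand-in for store.DestinationRewindError.
type rewindError struct{ Current, Attempted int64 }

func (e *rewindError) Error() string {
	return fmt.Sprintf("rewind refused: recorded %d, attempted %d", e.Current, e.Attempted)
}

// watermarks is an in-memory stand-in for the per-destination,
// per-origin run ids, keyed here by "destination/origin".
type watermarks map[string]int64

// upsert advances or re-confirms the value for key. A lower value is
// refused with *rewindError unless allowRewind is set.
func (w watermarks) upsert(key string, run int64, allowRewind bool) error {
	if cur, ok := w[key]; ok && run < cur && !allowRewind {
		return &rewindError{Current: cur, Attempted: run}
	}
	w[key] = run
	return nil
}

// asRewind unwraps a refused rewind, the same errors.As shape the pull
// loop uses to turn a refusal into a report entry instead of a failure.
func asRewind(err error) (*rewindError, bool) {
	var rw *rewindError
	ok := errors.As(err, &rw)
	return rw, ok
}

func main() {
	w := watermarks{"offsite-b/laptop": 9}
	if rw, ok := asRewind(w.upsert("offsite-b/laptop", 5, false)); ok {
		fmt.Printf("refused: recorded %d, peer reports %d\n", rw.Current, rw.Attempted)
	}
	_ = w.upsert("offsite-b/laptop", 5, true) // explicit recovery override
	fmt.Println("after override:", w["offsite-b/laptop"])
}
```

The values 9 and 5 match the scenario exercised by `TestPullDurabilityRefusesRewind` later in this diff.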
+func PullDurability(ctx context.Context, s *store.Store, vol *config.Volume, node *config.Node, allowRewind bool) (DurabilityPullReport, error) { + v, err := s.GetVolumeByName(ctx, vol.Name) + if err != nil { + if store.IsNotFound(err) { + return DurabilityPullReport{}, fmt.Errorf("volume %q has no local index row; index it before pulling durability", vol.Name) + } + return DurabilityPullReport{}, fmt.Errorf("lookup volume %q: %w", vol.Name, err) + } + return pullDurability(ctx, s, newNodeClient(node), vol.Name, v.ID, node.Name, acceptedDestinations(vol), allowRewind) +} + +// acceptedDestinations is the set of destination names this volume +// references: the union of its offload_requires and sync_to entries. +// A pulled durability entry for any name outside this set has no bearing +// on the volume's local decisions and is dropped by the pull. +func acceptedDestinations(vol *config.Volume) map[string]struct{} { + accepted := make(map[string]struct{}, len(vol.OffloadRequires)+len(vol.SyncTo)) + for _, name := range vol.OffloadRequires { + accepted[name] = struct{}{} + } + for _, name := range vol.SyncTo { + accepted[name] = struct{}{} + } + return accepted +} + +// pullDurability is the transport-injected body of PullDurability, +// shared with the node-sync driver (which already holds a client). +// accepted scopes which destinations the merge will store (see +// acceptedDestinations). 
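Reviewer note: the scoping rule (accept only names in `offload_requires ∪ sync_to`, drop the rest without aborting) is easy to check on its own. A minimal sketch follows; `acceptedSet` and `scope` are hypothetical helpers written for this note, and the destination names are borrowed from the tests in this diff.

```go
package main

import "fmt"

// acceptedSet builds the union of the two configured name lists as a set.
func acceptedSet(offloadRequires, syncTo []string) map[string]struct{} {
	set := make(map[string]struct{}, len(offloadRequires)+len(syncTo))
	for _, name := range offloadRequires {
		set[name] = struct{}{}
	}
	for _, name := range syncTo {
		set[name] = struct{}{}
	}
	return set
}

// scope keeps incoming names that are in the accepted set and counts
// the rest as dropped. Dropping never stops the loop.
func scope(accepted map[string]struct{}, incoming []string) (kept []string, dropped int) {
	for _, name := range incoming {
		if _, ok := accepted[name]; !ok {
			dropped++
			continue
		}
		kept = append(kept, name)
	}
	return kept, dropped
}

func main() {
	accepted := acceptedSet([]string{"offload-target"}, []string{"sync-target"})
	kept, dropped := scope(accepted, []string{"offload-target", "junk", "sync-target"})
	fmt.Println(kept, dropped)
}
```

Because the membership check runs before any validation or store write, an out-of-scope entry costs one map lookup and a counter increment.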
+func pullDurability(ctx context.Context, s *store.Store, client *nodeClient, volumeName string, volumeID int64, peerName string, accepted map[string]struct{}, allowRewind bool) (DurabilityPullReport, error) { + rep := DurabilityPullReport{Volume: volumeName, Peer: peerName} + resp, err := client.durability(ctx, syncproto.DurabilityRequest{Volume: volumeName}) + if err != nil { + return rep, err + } + rep.Fetched = len(resp.Components) + len(resp.Freshness) + originIDs := make(map[string]int64, 4) + resolveOrigin := func(name string) (int64, error) { + if id, ok := originIDs[name]; ok { + return id, nil + } + // GetOrCreateOriginNode creates a local nodes row for any name + // not seen before. Bound how many distinct origins one pull will + // resolve so a peer bug (or hostile peer) flooding novel names + // cannot grow the local nodes table without limit — a real + // volume references only a handful of origins. Fails the pull + // rather than truncating, so the cap is observable. + if len(originIDs) >= maxOriginNodesPerPull { + return 0, fmt.Errorf("durability pull names more than %d distinct origin nodes; refusing to create unbounded node rows from one pull", maxOriginNodesPerPull) + } + node, err := s.GetOrCreateOriginNode(ctx, name) + if err != nil { + return 0, fmt.Errorf("resolve origin node %q: %w", name, err) + } + originIDs[name] = node.ID + return node.ID, nil + } + for _, c := range resp.Components { + if _, ok := accepted[c.Destination]; !ok { + rep.recordDrop(DurabilityDrop{Destination: c.Destination, OriginNode: c.OriginNode, Kind: "component"}) + continue + } + if err := validateComponent(c); err != nil { + return rep, fmt.Errorf("component %+v: %w", c, err) + } + nodeID, err := resolveOrigin(c.OriginNode) + if err != nil { + return rep, err + } + err = s.UpsertDestinationRunIDVerified(ctx, volumeID, c.Destination, nodeID, c.OriginRun, c.VerifyMethod, allowRewind) + var rewind *store.DestinationRewindError + if errors.As(err, &rewind) { + 
rep.Rewinds = append(rep.Rewinds, DurabilityRewind{ + Destination: c.Destination, + OriginNode: c.OriginNode, + Current: rewind.Current, + Attempted: rewind.Attempted, + }) + continue + } + if err != nil { + return rep, fmt.Errorf("apply component for destination %q origin %q: %w", c.Destination, c.OriginNode, err) + } + rep.Applied++ + } + for _, f := range resp.Freshness { + if _, ok := accepted[f.Destination]; !ok { + rep.recordDrop(DurabilityDrop{Destination: f.Destination, OriginNode: f.OriginNode, Kind: "freshness"}) + continue + } + if err := validateFreshness(f); err != nil { + return rep, fmt.Errorf("freshness %+v: %w", f, err) + } + nodeID, err := resolveOrigin(f.OriginNode) + if err != nil { + return rep, err + } + if err := s.MergeDestinationPushFreshness(ctx, volumeID, f.Destination, nodeID, f.OriginRun); err != nil { + return rep, fmt.Errorf("apply freshness for destination %q origin %q: %w", f.Destination, f.OriginNode, err) + } + rep.Applied++ + } + return rep, nil +} + +// validateComponent guards the wire-supplied component before it +// touches the local vector: destination and origin names are +// identities, the run id must be a positive origin-space id, and the +// verify method must be empty or a method this build recognises. The +// method check is defence-in-depth, not a trust boundary (the peer is +// trusted to assert its own durability — see SAFETY-AUDIT.md D1): the +// gate already refuses to offload on an unrecognised method, so the +// only effect of an unknown non-empty method reaching the store is a +// silently-inert row. Refusing it here turns a peer bug or a +// version-skew method string into a loud error at the pull instead. 
+func validateComponent(c syncproto.DurabilityComponent) error { + if c.Destination == "" { + return errors.New("destination must be non-empty") + } + if !store.ValidNodeName(c.OriginNode) { + return fmt.Errorf("origin_node %q is not a valid node name", c.OriginNode) + } + if c.OriginRun <= 0 { + return fmt.Errorf("origin_run %d must be positive", c.OriginRun) + } + if c.VerifyMethod != "" && !store.KnownVerifyMethod(c.VerifyMethod) { + return fmt.Errorf("verify_method %q is not a recognised verification method", c.VerifyMethod) + } + return nil +} + +// validateFreshness guards a wire-supplied freshness coordinate before it +// merges into the local table: same identity and positive-run-id rules as +// validateComponent. +func validateFreshness(f syncproto.DurabilityFreshness) error { + if f.Destination == "" { + return errors.New("destination must be non-empty") + } + if !store.ValidNodeName(f.OriginNode) { + return fmt.Errorf("origin_node %q is not a valid node name", f.OriginNode) + } + if f.OriginRun <= 0 { + return fmt.Errorf("origin_run %d must be positive", f.OriginRun) + } + return nil +} diff --git a/sync/durability_test.go b/sync/durability_test.go new file mode 100644 index 0000000..b9557a7 --- /dev/null +++ b/sync/durability_test.go @@ -0,0 +1,380 @@ +package sync + +import ( + "context" + "fmt" + "strings" + "testing" + + "github.com/mbertschler/squirrel/store" + "github.com/mbertschler/squirrel/syncproto" +) + +// seedReceiverDurability records vector components on the receiver so +// the pull tests have something to fetch. Returns the receiver's self +// name (the origin-node identity its components travel under). 
+func seedReceiverDurability(t *testing.T, f *nodeFixture, components map[string]int64) string { + t.Helper() + ctx := context.Background() + v, err := f.recvStore.CreateVolume(ctx, f.recvVol.Name, f.recvVol.Path) + if err != nil { + t.Fatalf("CreateVolume on receiver: %v", err) + } + self, err := f.recvStore.GetSelfNode(ctx) + if err != nil { + t.Fatalf("GetSelfNode on receiver: %v", err) + } + for dest, run := range components { + if err := f.recvStore.UpsertDestinationRunID(ctx, v.ID, dest, self.ID, run, false); err != nil { + t.Fatalf("seed %s→%d: %v", dest, run, err) + } + } + return self.Name +} + +// seedReceiverFreshness records push-freshness coordinates on the +// receiver, mirroring seedReceiverDurability for the freshness table. +// Reuses the receiver volume the durability seed created when present. +func seedReceiverFreshness(t *testing.T, f *nodeFixture, coords map[string]int64) string { + t.Helper() + ctx := context.Background() + v, err := f.recvStore.GetOrCreateVolume(ctx, f.recvVol.Path) + if err != nil { + t.Fatalf("GetOrCreateVolume on receiver: %v", err) + } + self, err := f.recvStore.GetSelfNode(ctx) + if err != nil { + t.Fatalf("GetSelfNode on receiver: %v", err) + } + for dest, run := range coords { + if err := f.recvStore.UpsertDestinationPushFreshness(ctx, v.ID, dest, self.ID, run); err != nil { + t.Fatalf("seed freshness %s→%d: %v", dest, run, err) + } + } + return self.Name +} + +// TestPullDurabilityMergesFreshness: the pull fetches the peer's +// push-freshness coordinates alongside the vector and merges them into +// the LOCAL destination_push_freshness, so a relayed target's freshness +// evidence reaches a node that never pushes there. The merge is +// monotonic: a stale pull below a higher local value is ignored. 
+func TestPullDurabilityMergesFreshness(t *testing.T) { + f := setupNodeFixtureNoRclone(t) + ctx := context.Background() + f.initVol.OffloadRequires = []string{"offsite-a"} + f.initVol.SyncTo = []string{"offsite-b"} + originName := seedReceiverFreshness(t, f, map[string]int64{ + "offsite-a": 12, + "offsite-b": 5, + }) + + v, err := f.initStore.CreateVolume(ctx, f.initVol.Name, f.initVol.Path) + if err != nil { + t.Fatalf("CreateVolume on initiator: %v", err) + } + origin, err := f.initStore.GetOrCreateOriginNode(ctx, originName) + if err != nil { + t.Fatalf("GetOrCreateOriginNode: %v", err) + } + // A higher local floor on offsite-b: the stale pull must not lower it. + if err := f.initStore.MergeDestinationPushFreshness(ctx, v.ID, "offsite-b", origin.ID, 8); err != nil { + t.Fatalf("seed local freshness floor: %v", err) + } + + if _, err := PullDurability(ctx, f.initStore, f.initVol, f.node, false); err != nil { + t.Fatalf("PullDurability: %v", err) + } + + for dest, want := range map[string]int64{"offsite-a": 12, "offsite-b": 8} { + fresh, err := f.initStore.ListDestinationPushFreshness(ctx, v.ID, dest) + if err != nil { + t.Fatalf("ListDestinationPushFreshness %s: %v", dest, err) + } + if len(fresh) != 1 || fresh[0].OriginRunID != want { + t.Fatalf("%s freshness = %+v, want one coordinate at %d", dest, fresh, want) + } + } +} + +// TestPullDurabilityCopiesComponents: the pull fetches the peer's +// vector components and lands them in the LOCAL destination_run_ids +// under the same destination names, with origin node names mapped to +// local rows (created on first contact). 
+func TestPullDurabilityCopiesComponents(t *testing.T) { + f := setupNodeFixtureNoRclone(t) + ctx := context.Background() + f.initVol.OffloadRequires = []string{"offsite-a"} + f.initVol.SyncTo = []string{"offsite-b"} + originName := seedReceiverDurability(t, f, map[string]int64{ + "offsite-a": 12, + "offsite-b": 5, + }) + + v, err := f.initStore.CreateVolume(ctx, f.initVol.Name, f.initVol.Path) + if err != nil { + t.Fatalf("CreateVolume on initiator: %v", err) + } + rep, err := PullDurability(ctx, f.initStore, f.initVol, f.node, false) + if err != nil { + t.Fatalf("PullDurability: %v", err) + } + if rep.Fetched != 2 || rep.Applied != 2 || len(rep.Rewinds) != 0 { + t.Fatalf("report = %+v, want fetched=2 applied=2 no rewinds", rep) + } + + origin, err := f.initStore.GetNodeByName(ctx, originName) + if err != nil { + t.Fatalf("origin node %q was not created locally: %v", originName, err) + } + for dest, want := range map[string]int64{"offsite-a": 12, "offsite-b": 5} { + got, err := f.initStore.GetDestinationRunID(ctx, v.ID, dest, origin.ID) + if err != nil { + t.Fatalf("GetDestinationRunID %s: %v", dest, err) + } + if got.OriginRunID != want { + t.Fatalf("%s component = %d, want %d", dest, got.OriginRunID, want) + } + } +} + +// TestPullDurabilityDropsUnconfiguredDestinations: the pull merges +// components for destinations the volume references (one via +// offload_requires, one via sync_to) and drops one for an unconfigured +// destination — counted and reported, never stored, and without +// aborting the merge of the legitimate components. 
+func TestPullDurabilityDropsUnconfiguredDestinations(t *testing.T) { + f := setupNodeFixtureNoRclone(t) + ctx := context.Background() + f.initVol.OffloadRequires = []string{"offload-target"} + f.initVol.SyncTo = []string{"sync-target"} + originName := seedReceiverDurability(t, f, map[string]int64{ + "offload-target": 12, + "sync-target": 5, + "junk": 99, + }) + + v, err := f.initStore.CreateVolume(ctx, f.initVol.Name, f.initVol.Path) + if err != nil { + t.Fatalf("CreateVolume on initiator: %v", err) + } + rep, err := PullDurability(ctx, f.initStore, f.initVol, f.node, false) + if err != nil { + t.Fatalf("PullDurability: %v", err) + } + if rep.Fetched != 3 || rep.Applied != 2 || rep.Dropped != 1 || len(rep.Rewinds) != 0 { + t.Fatalf("report = %+v, want fetched=3 applied=2 dropped=1 no rewinds", rep) + } + if len(rep.Drops) != 1 || rep.Drops[0].Destination != "junk" || rep.Drops[0].Kind != "component" { + t.Fatalf("drops = %+v, want one component drop for junk", rep.Drops) + } + + origin, err := f.initStore.GetNodeByName(ctx, originName) + if err != nil { + t.Fatalf("origin node %q was not created locally: %v", originName, err) + } + for dest, want := range map[string]int64{"offload-target": 12, "sync-target": 5} { + got, err := f.initStore.GetDestinationRunID(ctx, v.ID, dest, origin.ID) + if err != nil { + t.Fatalf("GetDestinationRunID %s: %v", dest, err) + } + if got.OriginRunID != want { + t.Fatalf("%s component = %d, want %d", dest, got.OriginRunID, want) + } + } + if _, err := f.initStore.GetDestinationRunID(ctx, v.ID, "junk", origin.ID); err == nil { + t.Fatal("junk component was stored, want it dropped") + } +} + +// TestPullDurabilityDropsUnconfiguredFreshness: a freshness coordinate +// for a destination outside the volume's accepted set is dropped and +// counted (Kind "freshness") just like a stray vector component, so a +// peer can't seed push-freshness for a destination this node never uses. 
+func TestPullDurabilityDropsUnconfiguredFreshness(t *testing.T) { + f := setupNodeFixtureNoRclone(t) + ctx := context.Background() + f.initVol.OffloadRequires = []string{"offsite-a"} + f.initVol.SyncTo = nil + seedReceiverFreshness(t, f, map[string]int64{ + "offsite-a": 12, + "junk": 99, + }) + + if _, err := f.initStore.CreateVolume(ctx, f.initVol.Name, f.initVol.Path); err != nil { + t.Fatalf("CreateVolume on initiator: %v", err) + } + rep, err := PullDurability(ctx, f.initStore, f.initVol, f.node, false) + if err != nil { + t.Fatalf("PullDurability: %v", err) + } + if rep.Dropped != 1 || len(rep.Drops) != 1 { + t.Fatalf("report = %+v, want exactly one drop", rep) + } + if rep.Drops[0].Destination != "junk" || rep.Drops[0].Kind != "freshness" { + t.Fatalf("drop = %+v, want a freshness drop for junk", rep.Drops[0]) + } + if rep.Fetched < 1 || rep.Applied < 1 { + t.Fatalf("report = %+v, want the accepted offsite-a freshness still applied and counted", rep) + } +} + +// TestPullDurabilityCapsDropSamples: a peer flooding many out-of-scope +// destinations keeps the exact Dropped count but bounds the sampled +// Drops slice, so neither the report nor the output it feeds can grow +// unbounded under an adversarial peer. 
+func TestPullDurabilityCapsDropSamples(t *testing.T) { + f := setupNodeFixtureNoRclone(t) + ctx := context.Background() + junk := make(map[string]int64, 50) + for i := range 50 { + junk[fmt.Sprintf("junk-%02d", i)] = 1 + } + seedReceiverDurability(t, f, junk) + + if _, err := f.initStore.CreateVolume(ctx, f.initVol.Name, f.initVol.Path); err != nil { + t.Fatalf("CreateVolume on initiator: %v", err) + } + rep, err := PullDurability(ctx, f.initStore, f.initVol, f.node, false) + if err != nil { + t.Fatalf("PullDurability: %v", err) + } + if rep.Fetched != 50 || rep.Applied != 0 || rep.Dropped != 50 { + t.Fatalf("report = fetched=%d applied=%d dropped=%d, want 50/0/50", rep.Fetched, rep.Applied, rep.Dropped) + } + if len(rep.Drops) != maxDurabilityDropSamples { + t.Fatalf("len(Drops) = %d, want exactly the cap of %d samples", len(rep.Drops), maxDurabilityDropSamples) + } +} + +// TestPullDurabilityRefusesRewind: a peer component below the locally +// recorded value is refused and reported, leaving the local value in +// place; the allow-rewind opt-in accepts it.
+func TestPullDurabilityRefusesRewind(t *testing.T) { + f := setupNodeFixtureNoRclone(t) + ctx := context.Background() + f.initVol.OffloadRequires = []string{"offsite-a"} + f.initVol.SyncTo = []string{"offsite-b"} + originName := seedReceiverDurability(t, f, map[string]int64{ + "offsite-a": 12, + "offsite-b": 5, + }) + + v, err := f.initStore.CreateVolume(ctx, f.initVol.Name, f.initVol.Path) + if err != nil { + t.Fatalf("CreateVolume on initiator: %v", err) + } + origin, err := f.initStore.GetOrCreateOriginNode(ctx, originName) + if err != nil { + t.Fatalf("GetOrCreateOriginNode: %v", err) + } + if err := f.initStore.UpsertDestinationRunID(ctx, v.ID, "offsite-b", origin.ID, 9, false); err != nil { + t.Fatalf("seed local floor: %v", err) + } + + rep, err := PullDurability(ctx, f.initStore, f.initVol, f.node, false) + if err != nil { + t.Fatalf("PullDurability: %v", err) + } + if rep.Applied != 1 || len(rep.Rewinds) != 1 { + t.Fatalf("report = %+v, want applied=1 rewinds=1", rep) + } + rw := rep.Rewinds[0] + if rw.Destination != "offsite-b" || rw.OriginNode != originName || rw.Current != 9 || rw.Attempted != 5 { + t.Fatalf("rewind = %+v, want offsite-b/%s 9→5 refused", rw, originName) + } + got, err := f.initStore.GetDestinationRunID(ctx, v.ID, "offsite-b", origin.ID) + if err != nil { + t.Fatalf("GetDestinationRunID: %v", err) + } + if got.OriginRunID != 9 { + t.Fatalf("offsite-b component = %d after refused rewind, want 9", got.OriginRunID) + } + + rep, err = PullDurability(ctx, f.initStore, f.initVol, f.node, true) + if err != nil { + t.Fatalf("PullDurability allowRewind: %v", err) + } + if rep.Applied != 2 || len(rep.Rewinds) != 0 { + t.Fatalf("override report = %+v, want applied=2 no rewinds", rep) + } + got, _ = f.initStore.GetDestinationRunID(ctx, v.ID, "offsite-b", origin.ID) + if got.OriginRunID != 5 { + t.Fatalf("offsite-b component = %d after override, want 5", got.OriginRunID) + } +} + +// TestPullDurabilityRequiresLocalVolume: the pull lands rows under a 
+// local volume id, so a volume with no local index row fails fast with +// a pointer at `index` rather than inventing a row. +func TestPullDurabilityRequiresLocalVolume(t *testing.T) { + f := setupNodeFixtureNoRclone(t) + seedReceiverDurability(t, f, map[string]int64{"offsite-a": 12}) + + _, err := PullDurability(context.Background(), f.initStore, f.initVol, f.node, false) + if err == nil || !strings.Contains(err.Error(), "no local index row") { + t.Fatalf("err = %v, want the no-local-index-row guard", err) + } +} + +// TestValidateComponentVerifyMethod: the wire-boundary guard accepts the +// empty method (a legitimate "unverified" state) and every method this +// build defines, and refuses an unrecognised non-empty method so a peer +// bug or version-skew string is loud at the pull rather than a +// silently-inert local row (SAFETY-AUDIT.md D1). Freshness carries no +// method and is unaffected. +func TestValidateComponentVerifyMethod(t *testing.T) { + base := syncproto.DurabilityComponent{Destination: "offsite-a", OriginNode: "laptop", OriginRun: 5} + for _, method := range []string{ + "", + store.VerifyMethodBlake3, + store.VerifyMethodSizeMtime, + store.VerifyMethodPeer, + store.VerifyMethodKopia, + store.VerifyMethodPresenceSize, + } { + c := base + c.VerifyMethod = method + if err := validateComponent(c); err != nil { + t.Errorf("validateComponent(method=%q) = %v, want nil", method, err) + } + } + c := base + c.VerifyMethod = "totally-bogus" + if err := validateComponent(c); err == nil || !strings.Contains(err.Error(), "recognised verification method") { + t.Fatalf("validateComponent(unknown method) = %v, want the unrecognised-method refusal", err) + } +} + +// TestPullDurabilityCapsOriginNodeCreation: a pull that names more than +// maxOriginNodesPerPull distinct origins is refused before it grows the +// local nodes table without bound (SAFETY-AUDIT.md D1). 
Seeds cap+1 +// distinct-origin components on the receiver, all on one accepted +// destination, and asserts the pull fails with the cap message. +func TestPullDurabilityCapsOriginNodeCreation(t *testing.T) { + f := setupNodeFixtureNoRclone(t) + ctx := context.Background() + f.initVol.OffloadRequires = []string{"offsite-a"} + + rv, err := f.recvStore.CreateVolume(ctx, f.recvVol.Name, f.recvVol.Path) + if err != nil { + t.Fatalf("CreateVolume on receiver: %v", err) + } + for i := 0; i <= maxOriginNodesPerPull; i++ { + origin, err := f.recvStore.GetOrCreateOriginNode(ctx, fmt.Sprintf("origin-%04d", i)) + if err != nil { + t.Fatalf("seed origin %d: %v", i, err) + } + if err := f.recvStore.UpsertDestinationRunID(ctx, rv.ID, "offsite-a", origin.ID, int64(i+1), false); err != nil { + t.Fatalf("seed component %d: %v", i, err) + } + } + + if _, err := f.initStore.CreateVolume(ctx, f.initVol.Name, f.initVol.Path); err != nil { + t.Fatalf("CreateVolume on initiator: %v", err) + } + _, err = PullDurability(ctx, f.initStore, f.initVol, f.node, false) + if err == nil || !strings.Contains(err.Error(), "distinct origin nodes") { + t.Fatalf("PullDurability = %v, want the origin-node cap refusal", err) + } +} diff --git a/sync/fingerprint.go b/sync/fingerprint.go new file mode 100644 index 0000000..65b951b --- /dev/null +++ b/sync/fingerprint.go @@ -0,0 +1,128 @@ +package sync + +import ( + "maps" + "path" + "slices" + "strconv" + "strings" + + "github.com/mbertschler/squirrel/config" +) + +// Checksum algo labels recorded in remote_objects.checksum_algo for the +// s3 backend, whose provider checksum is the object ETag. rclone surfaces +// the ETag in the md5 hash slot only for objects it can treat as an MD5 — +// single-part uploads, or multipart objects that carry an MD5 in their +// metadata. A multipart object without that metadata exposes no md5 hash +// through lsjson, so its fingerprint stays pending (see the capture path). 
+// Recorded values are compared verbatim on verification — squirrel never +// recomputes a provider checksum. Every other backend records the plain +// rclone hash name (sha256, sha1, …). +// +// The exact reach of ETag capture against a live multipart-splitting +// backend still needs confirmation — see the follow-up issue. +const ( + AlgoEtagMD5 = "etag-md5" + AlgoEtagMD5Composite = "etag-md5-composite" +) + +// remoteChecksum is one provider checksum read back from a destination's +// underlying remote: the algo label recorded in remote_objects plus the +// provider's canonical value. +type remoteChecksum struct { + Algo string + Value string +} + +// hashPreference orders the rclone hash names picked when a backend +// exposes several and the destination configures no hash_algo: strongest +// first, with anything unlisted falling back to name order. +var hashPreference = []string{"sha256", "sha1", "md5", "crc32"} + +// extractChecksum maps one object's lsjson hashes onto the fingerprint +// recorded for dest's backend type: the ETag under an etag flavor for +// s3, the configured hash_algo where one is set, and the strongest +// exposed hash otherwise. ok is false when the listing exposes no usable +// checksum for the object. 
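Reviewer note: the default branch described above (strongest listed hash first, then any unlisted hash in name order) can be exercised standalone. This is a sketch under assumptions: `pickHash` is a hypothetical name covering only the default branch, and it uses `sort.Strings` instead of the diff's `slices.Sorted(maps.Keys(...))` so it also builds on older Go toolchains.

```go
package main

import (
	"fmt"
	"sort"
)

// preference mirrors hashPreference from the diff: strongest first.
var preference = []string{"sha256", "sha1", "md5", "crc32"}

// pickHash returns the first non-empty hash in preference order, then
// falls back to unlisted hash names in sorted order. ok is false when
// no usable value exists.
func pickHash(hashes map[string]string) (algo, value string, ok bool) {
	for _, name := range preference {
		if v := hashes[name]; v != "" {
			return name, v, true
		}
	}
	names := make([]string, 0, len(hashes))
	for name := range hashes {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic fallback despite random map order
	for _, name := range names {
		if v := hashes[name]; v != "" {
			return name, v, true
		}
	}
	return "", "", false
}

func main() {
	algo, value, ok := pickHash(map[string]string{"md5": "bb", "sha1": "aa"})
	fmt.Println(algo, value, ok)
}
```

Sorting the fallback names matters because Go map iteration order is randomized; without it the recorded algo could differ between runs for the same listing.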
+func extractChecksum(dest *config.Destination, hashes map[string]string) (remoteChecksum, bool) { + switch { + case dest.Type == "s3": + v := hashes["md5"] + if v == "" { + return remoteChecksum{}, false + } + return remoteChecksum{Algo: etagFlavor(v), Value: v}, true + case dest.HashAlgo != "": + v := hashes[dest.HashAlgo] + if v == "" { + return remoteChecksum{}, false + } + return remoteChecksum{Algo: dest.HashAlgo, Value: v}, true + default: + for _, name := range hashPreference { + if v := hashes[name]; v != "" { + return remoteChecksum{Algo: name, Value: v}, true + } + } + for _, name := range slices.Sorted(maps.Keys(hashes)) { + if v := hashes[name]; v != "" { + return remoteChecksum{Algo: name, Value: v}, true + } + } + return remoteChecksum{}, false + } +} + +// etagFlavor labels an s3 ETag value by shape: a "-<parts>" suffix marks +// the multipart composite form, otherwise the value is a whole-object md5. +// The label is descriptive only — both are stored and compared verbatim. +func etagFlavor(v string) string { + if strings.Contains(v, "-") { + return AlgoEtagMD5Composite + } + return AlgoEtagMD5 +} + +// captureHashTypes returns the --hash-type set fingerprint capture +// requests from dest's underlying remote; nil means "whatever the +// backend exposes". Narrowing matters on backends that compute hashes +// per request (sftp runs one server-side sum command per file per hash +// type). +func captureHashTypes(dest *config.Destination) []string { + switch { + case dest.Type == "s3": + return []string{"md5"} + case dest.HashAlgo != "": + return []string{dest.HashAlgo} + } + return nil +} + +// algoHashType maps a recorded checksum_algo back to the rclone hash +// name re-verification must request: the etag flavors ride on the md5 +// hash, every other algo is the hash name itself. 
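Reviewer note: the labeling and the reverse mapping form a small round trip: an ETag value is labeled by shape at capture time, and either label maps back to the `md5` hash type at verification time. The sketch below restates the two helpers from this diff with lower-cased local names so it stays self-contained; the sample ETag values are the ones used in the tests.

```go
package main

import (
	"fmt"
	"strings"
)

// Local copies of the two algo labels introduced in the diff.
const (
	algoEtagMD5          = "etag-md5"
	algoEtagMD5Composite = "etag-md5-composite"
)

// etagFlavor labels an ETag by shape: a "-<parts>" suffix marks the
// multipart composite form. The label is descriptive only.
func etagFlavor(v string) string {
	if strings.Contains(v, "-") {
		return algoEtagMD5Composite
	}
	return algoEtagMD5
}

// algoHashType maps a recorded label back to the hash name to request:
// both etag flavors ride on md5, anything else is its own name.
func algoHashType(algo string) string {
	if algo == algoEtagMD5 || algo == algoEtagMD5Composite {
		return "md5"
	}
	return algo
}

func main() {
	for _, etag := range []string{
		"9e107d9d372bb6826bd81d3542a419d6",
		"9e107d9d372bb6826bd81d3542a419d6-12",
	} {
		label := etagFlavor(etag)
		fmt.Printf("%s -> %s -> request %s\n", etag, label, algoHashType(label))
	}
}
```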
+func algoHashType(algo string) string { + if algo == AlgoEtagMD5 || algo == AlgoEtagMD5Composite { + return "md5" + } + return algo +} + +// checkersArgs renders dest's optional concurrent-checkers cap as rclone +// argv. +func checkersArgs(dest *config.Destination) []string { + if dest.Checkers <= 0 { + return nil + } + return []string{"--checkers", strconv.Itoa(dest.Checkers)} +} + +// underlyingObjectsURI addresses the destination-root objects/ directory +// on the underlying remote, bypassing any crypt overlay: the scan-back +// fingerprint is over the stored ciphertext, and with filename +// encryption fixed off the underlying object key equals the overlay +// path. +func underlyingObjectsURI(dest *config.Destination) string { + return dest.Name + ":" + path.Join(dest.Root, ObjectsDirName) +} diff --git a/sync/fingerprint_test.go b/sync/fingerprint_test.go new file mode 100644 index 0000000..8c491c9 --- /dev/null +++ b/sync/fingerprint_test.go @@ -0,0 +1,308 @@ +package sync + +import ( + "context" + "os" + "strings" + "testing" + + "github.com/mbertschler/squirrel/config" + "github.com/mbertschler/squirrel/store" +) + +func TestExtractChecksum(t *testing.T) { + cases := []struct { + name string + dest *config.Destination + hashes map[string]string + want remoteChecksum + ok bool + }{ + { + name: "s3 plain etag", + dest: &config.Destination{Type: "s3"}, + hashes: map[string]string{"md5": "9e107d9d372bb6826bd81d3542a419d6"}, + want: remoteChecksum{Algo: AlgoEtagMD5, Value: "9e107d9d372bb6826bd81d3542a419d6"}, + ok: true, + }, + { + name: "s3 multipart composite etag is opaque", + dest: &config.Destination{Type: "s3"}, + hashes: map[string]string{"md5": "9e107d9d372bb6826bd81d3542a419d6-12"}, + want: remoteChecksum{Algo: AlgoEtagMD5Composite, Value: "9e107d9d372bb6826bd81d3542a419d6-12"}, + ok: true, + }, + { + name: "s3 without etag", + dest: &config.Destination{Type: "s3"}, + hashes: map[string]string{"sha1": "aa"}, + ok: false, + }, + { + name: "sftp 
configured algo", + dest: &config.Destination{Type: "sftp", HashAlgo: "sha256"}, + hashes: map[string]string{"sha256": "aa", "md5": "bb"}, + want: remoteChecksum{Algo: "sha256", Value: "aa"}, + ok: true, + }, + { + name: "sftp configured algo not exposed", + dest: &config.Destination{Type: "sftp", HashAlgo: "sha256"}, + hashes: map[string]string{"md5": "bb"}, + ok: false, + }, + { + name: "preference picks the strongest exposed hash", + dest: &config.Destination{Type: "b2"}, + hashes: map[string]string{"md5": "bb", "sha1": "aa"}, + want: remoteChecksum{Algo: "sha1", Value: "aa"}, + ok: true, + }, + { + name: "unlisted hash names fall back to name order", + dest: &config.Destination{Type: "b2"}, + hashes: map[string]string{"quickxor": "qq", "dropbox": "dd"}, + want: remoteChecksum{Algo: "dropbox", Value: "dd"}, + ok: true, + }, + { + name: "no hashes", + dest: &config.Destination{Type: "b2"}, + ok: false, + }, + } + for _, c := range cases { + t.Run(c.name, func(t *testing.T) { + got, ok := extractChecksum(c.dest, c.hashes) + if ok != c.ok || got != c.want { + t.Fatalf("extractChecksum = (%+v, %t), want (%+v, %t)", got, ok, c.want, c.ok) + } + }) + } +} + +// shimLog returns the recorded fake-rclone argv lines. +func (f *caFixture) shimLog(t *testing.T) []string { + t.Helper() + data, err := os.ReadFile(f.logPath) + if err != nil { + t.Fatalf("read shim log: %v", err) + } + return strings.Split(strings.TrimSpace(string(data)), "\n") +} + +// captureLines filters the shim log to the fingerprint/verify listing +// invocations: lsjson directory listings, as opposed to the --stat +// presence confirms. 
+func (f *caFixture) captureLines(t *testing.T) []string { + t.Helper() + var out []string + for _, line := range f.shimLog(t) { + if strings.Contains(line, " lsjson ") && !strings.Contains(line, "--stat") { + out = append(out, line) + } + } + return out +} + +// TestCaptureReadsUnderlyingRemote pins the privacy-critical argv shape +// of fingerprint capture on a crypt destination: the listing addresses +// the base remote (the fingerprint is over the stored ciphertext; with +// filename encryption off the underlying key equals the overlay path), +// requests exactly the configured hash type, and scopes the listing to +// this run's uploads. +func TestCaptureReadsUnderlyingRemote(t *testing.T) { + f := setupContentAddressedFixture(t) + f.write(t, "a.txt", "alpha") + f.index(t) + if _, err := f.sync(t); err != nil { + t.Fatalf("sync: %v", err) + } + + lines := f.captureLines(t) + if len(lines) != 1 { + t.Fatalf("capture lsjson invocations = %d, want one batched call:\n%s", len(lines), strings.Join(lines, "\n")) + } + line := lines[0] + for _, want := range []string{ + "offsite:/data/objects", + "--hash-type sha256", + "--include " + blake3Hex("alpha"), + } { + if !strings.Contains(line, want) { + t.Fatalf("capture argv lacks %q:\n%s", want, line) + } + } + if strings.Contains(line, "offsite-crypt:") { + t.Fatalf("capture addressed the crypt overlay:\n%s", line) + } +} + +// TestCaptureRecordsEtagFlavorForS3: on an s3 destination the recorded +// fingerprint is the ETag rclone surfaces as the md5 hash, labeled with +// its etag flavor, and the listing requests only the md5 hash type. 
+func TestCaptureRecordsEtagFlavorForS3(t *testing.T) { + f := setupCAFixture(t, `[destinations.offsite] +type = "s3" +provider = "Other" +bucket = "b" +root = "data" +layout = "content-addressed" +`, "") + f.write(t, "a.txt", "alpha") + f.index(t) + if _, err := f.sync(t); err != nil { + t.Fatalf("sync: %v", err) + } + + obj := f.remoteObject(t, "a.txt") + if obj.ChecksumAlgo.String != AlgoEtagMD5 || !obj.Checksum.Valid { + t.Fatalf("remote object = %+v, want an etag-md5 fingerprint", obj) + } + lines := f.captureLines(t) + if len(lines) != 1 || !strings.Contains(lines[0], "--hash-type md5") { + t.Fatalf("capture argv should request md5 only:\n%s", strings.Join(lines, "\n")) + } +} + +// TestCaptureCompositeEtagRecordedOpaquely: a multipart-style composite +// ETag is recorded as-is under its own flavor and verifies by verbatim +// comparison — no recomputation anywhere. +func TestCaptureCompositeEtagRecordedOpaquely(t *testing.T) { + f := setupCAFixture(t, `[destinations.offsite] +type = "s3" +provider = "Other" +bucket = "b" +root = "data" +layout = "content-addressed" +`, "") + t.Setenv("RCLONE_FAKE_HASH_VALUE", "9e107d9d372bb6826bd81d3542a419d6-12") + f.write(t, "a.txt", "alpha") + f.index(t) + if _, err := f.sync(t); err != nil { + t.Fatalf("sync: %v", err) + } + + obj := f.remoteObject(t, "a.txt") + if obj.ChecksumAlgo.String != AlgoEtagMD5Composite || obj.Checksum.String != "9e107d9d372bb6826bd81d3542a419d6-12" { + t.Fatalf("remote object = %+v, want the composite etag recorded verbatim", obj) + } + + rep, err := VerifyRemote(context.Background(), f.store, f.rcl, f.pair.Destination) + if err != nil { + t.Fatalf("VerifyRemote: %v", err) + } + if rep.Verified != 1 || !rep.Clean() { + t.Fatalf("rep = %+v, want the composite etag verified by verbatim compare", rep) + } +} + +// TestCaptureNoChecksumWarns: a backend exposing no checksum leaves the +// pair pending with a run-report warning — the push still succeeds. 
+func TestCaptureNoChecksumWarns(t *testing.T) { + f := setupContentAddressedFixture(t) + t.Setenv("RCLONE_FAKE_NO_HASHES", "1") + f.write(t, "a.txt", "alpha") + f.index(t) + + rep, err := f.sync(t) + if err != nil { + t.Fatalf("sync: %v", err) + } + if rep.Status != store.RunStatusSuccess || rep.Fingerprints != 0 { + t.Fatalf("rep = status=%q fingerprints=%d, want success with none recorded", rep.Status, rep.Fingerprints) + } + var warned bool + for _, w := range rep.Warnings { + if strings.Contains(w, "fingerprint stays pending") { + warned = true + } + } + if !warned { + t.Fatalf("Warnings = %v, want a fingerprint-pending advisory", rep.Warnings) + } + obj := f.remoteObject(t, "a.txt") + if obj.ChecksumAlgo.Valid || obj.Checksum.Valid { + t.Fatalf("remote object = %+v, want a pending pair", obj) + } +} + +// TestCaptureMissingFromListingStaysPending: an object the remote +// listing does not return yet leaves the pair pending with a distinct +// "not yet returned" advisory — separate from a returned-but-no-hash +// object — and the push still succeeds. 
+func TestCaptureMissingFromListingStaysPending(t *testing.T) { + f := setupContentAddressedFixture(t) + t.Setenv("RCLONE_FAKE_EMPTY_LISTING", "1") + f.write(t, "a.txt", "alpha") + f.index(t) + + rep, err := f.sync(t) + if err != nil { + t.Fatalf("sync: %v", err) + } + if rep.Status != store.RunStatusSuccess || rep.Fingerprints != 0 { + t.Fatalf("rep = status=%q fingerprints=%d, want success with none recorded", rep.Status, rep.Fingerprints) + } + var warned bool + for _, w := range rep.Warnings { + if strings.Contains(w, "not yet returned by the remote listing") { + warned = true + } + } + if !warned { + t.Fatalf("Warnings = %v, want a not-yet-listed advisory distinct from no-checksum", rep.Warnings) + } + obj := f.remoteObject(t, "a.txt") + if obj.ChecksumAlgo.Valid || obj.Checksum.Valid { + t.Fatalf("remote object = %+v, want a pending pair", obj) + } +} + +// TestCheckersCapInArgv: a destination's checkers cap reaches every +// rclone invocation the content-addressed push and the verify pass run +// against it. +func TestCheckersCapInArgv(t *testing.T) { + f := setupCAFixture(t, `[destinations.offsite] +type = "sftp" +host = "remote.invalid" +user = "u" +root = "/data" +layout = "content-addressed" +checkers = 3 + +[destinations.offsite.crypt] +password = "obscured-pw" +`, "/data") + f.write(t, "a.txt", "alpha") + f.index(t) + if _, err := f.sync(t); err != nil { + t.Fatalf("sync: %v", err) + } + if _, err := VerifyRemote(context.Background(), f.store, f.rcl, f.pair.Destination); err != nil { + t.Fatalf("VerifyRemote: %v", err) + } + + for _, line := range f.shimLog(t) { + if strings.Contains(line, " copyto ") || strings.Contains(line, " lsjson ") { + if !strings.Contains(line, "--checkers 3") { + t.Fatalf("argv lacks --checkers 3:\n%s", line) + } + } + } +} + +// remoteObject fetches the upload record for a path on the fixture's +// offsite destination. 
+func (f *caFixture) remoteObject(t *testing.T, path string) store.RemoteObject { + t.Helper() + row, err := f.store.GetByPath(context.Background(), f.volumeID(t), path) + if err != nil { + t.Fatalf("GetByPath %s: %v", path, err) + } + obj, err := f.store.GetRemoteObject(context.Background(), row.ContentID, "offsite") + if err != nil { + t.Fatalf("GetRemoteObject %s: %v", path, err) + } + return obj +} diff --git a/sync/handler.go b/sync/handler.go new file mode 100644 index 0000000..4c8b2eb --- /dev/null +++ b/sync/handler.go @@ -0,0 +1,220 @@ +package sync + +import ( + "context" + "errors" + "fmt" + "path/filepath" + + "github.com/mbertschler/squirrel/config" + "github.com/mbertschler/squirrel/store" +) + +// Verification methods reported by the curated handlers on +// VerifyResult.Method. They alias the canonical identifiers in store so +// the durability vector's recorded method and the handler's reported +// method are the same strings — store owns them because the offload gate +// reads them to decide whether a component is content-verified. +const ( + // VerifyMethodBlake3 is rclone's end-to-end content check + // (--checksum --hash blake3). + VerifyMethodBlake3 = store.VerifyMethodBlake3 + // VerifyMethodSizeMtime is rclone's default comparison, used for + // --shallow runs and forced by crypt destinations. + VerifyMethodSizeMtime = store.VerifyMethodSizeMtime + // VerifyMethodPeer is the node-sync handshake's receiver-side + // BLAKE3 re-hash of every delivered path. + VerifyMethodPeer = store.VerifyMethodPeer + // VerifyMethodKopia is kopia's own repository consistency check + // (`kopia snapshot verify`). + VerifyMethodKopia = store.VerifyMethodKopia + // VerifyMethodPresenceSize is the content-addressed push's check: + // rclone reported every transfer succeeded, and a follow-up listing + // confirmed each object and the manifest segment present at the + // expected size. 
Presence evidence is weaker than a content check + // (crypt remotes expose no hashes), so results carrying it stay + // unverified until the provider-checksum fingerprint pass lands. + VerifyMethodPresenceSize = store.VerifyMethodPresenceSize +) + +// VerifyResult is the typed durability report of one handler push: how +// the destination's copy was checked and what the tool counted. The +// verified flag is unexported, so a positive result can only be minted +// by the curated handlers in this package — that keeps durability +// reporting structurally separate from the hook mechanism, whose +// outcomes are exit-code-only by design. +type VerifyResult struct { + verified bool + // Method names the comparison that backed this push. + Method string + // SnapshotID identifies the snapshot for snapshot-based handlers. + SnapshotID string + // Files and Bytes are the counts the tool reported for this push. + Files int64 + Bytes int64 +} + +// Verified reports whether the destination's copy of this push was +// content-verified. +func (v VerifyResult) Verified() bool { return v.verified } + +// Tools bundles the configured external-tool wrappers the curated +// handlers drive. Rclone backs bucket and peer targets; Kopia backs +// kopia targets and is filled in by ToolsFor exactly when a pair needs +// it. +type Tools struct { + Rclone *Rclone + Kopia *Kopia +} + +// ToolsFor bundles the wrappers pairs need: the caller's configured +// rclone wrapper plus, when any pair targets a kopia destination, a +// kopia wrapper whose destination config files live next to the +// squirrel config — the same directory rclone.conf is managed in. 
+func ToolsFor(cfg *config.Config, pairs []Pair, rcl *Rclone) (Tools, error) { + tools := Tools{Rclone: rcl} + for _, p := range pairs { + if p.Destination != nil && p.Destination.Type == "kopia" { + kop, err := FindKopia(filepath.Dir(cfg.Path)) + if err != nil { + return Tools{}, err + } + tools.Kopia = kop + break + } + } + return tools, nil +} + +// Handler is the curated, type-determined driver for one (volume, +// target) pair. A handler owns the external tool invocation end to end +// — verb, safety flags, source/destination composition — so config can +// only supply declarative parameters, never alter the operation. +// +// The interface is sealed: every implementation lives in this package, +// which keeps the ability to produce a VerifyResult a curated-handler +// capability. +type Handler interface { + // TargetName names the destination or node for run rows and output. + TargetName() string + // Push transfers the pair's volume to its target, verifies the + // result, and records the runs row. The typed durability outcome + // lands on Report.Verification. + Push(ctx context.Context, opts Options) (Report, error) + + sealed() +} + +// HandlerFor returns the curated handler for p, chosen by the target's +// declared type. 
+func HandlerFor(s *store.Store, tools Tools, p Pair) (Handler, error) { + switch { + case p.IsNode(): + if tools.Rclone == nil { + return nil, fmt.Errorf("node %q: rclone wrapper is required", p.Node.Name) + } + return &peerHandler{store: s, rcl: tools.Rclone, vol: p.Volume, node: p.Node}, nil + case p.Destination == nil: + return nil, errors.New("pair names no destination or node") + case p.Destination.Type == "kopia": + if tools.Kopia == nil { + return nil, fmt.Errorf("destination %q: kopia wrapper is required (build Tools via ToolsFor)", p.Destination.Name) + } + return &kopiaHandler{store: s, kopia: tools.Kopia, vol: p.Volume, dest: p.Destination}, nil + case p.Destination.Layout == config.LayoutContentAddressed: + if tools.Rclone == nil { + return nil, fmt.Errorf("destination %q: rclone wrapper is required", p.Destination.Name) + } + return &contentAddressedHandler{store: s, rcl: tools.Rclone, vol: p.Volume, dest: p.Destination}, nil + default: + if tools.Rclone == nil { + return nil, fmt.Errorf("destination %q: rclone wrapper is required", p.Destination.Name) + } + return &rcloneHandler{store: s, rcl: tools.Rclone, vol: p.Volume, dest: p.Destination}, nil + } +} + +// rcloneHandler pushes to an rclone-backed bucket destination via Sync. +type rcloneHandler struct { + store *store.Store + rcl *Rclone + vol *config.Volume + dest *config.Destination +} + +func (h *rcloneHandler) TargetName() string { return h.dest.Name } + +func (h *rcloneHandler) Push(ctx context.Context, opts Options) (Report, error) { + return Sync(ctx, h.store, h.rcl, h.vol, h.dest, opts) +} + +func (h *rcloneHandler) sealed() {} + +// peerHandler pushes to a peer node via the SyncNode handshake. 
+type peerHandler struct { + store *store.Store + rcl *Rclone + vol *config.Volume + node *config.Node +} + +func (h *peerHandler) TargetName() string { return h.node.Name } + +func (h *peerHandler) Push(ctx context.Context, opts Options) (Report, error) { + return SyncNode(ctx, h.store, h.rcl, h.vol, h.node, opts) +} + +func (h *peerHandler) sealed() {} + +// finishHandlerRun writes a handler-driven run's terminal state from +// rep.Status, mirroring the rclone scaffold's finishRun contract: a +// FinishRun failure lands on rep.FinishErr so the caller surfaces it +// next to the push outcome. The kopia and content-addressed handlers +// share it; their file counts ride on rep.Verification.Files. +func finishHandlerRun(ctx context.Context, s *store.Store, rep *Report, runErr error) { + if rep.RunID == 0 { + return + } + errMsg := "" + if runErr != nil { + errMsg = runErr.Error() + } + if err := s.FinishRun(ctx, rep.RunID, rep.Status, errMsg, rep.Verification.Files); err != nil { + rep.FinishErr = err + } +} + +// rcloneVerification derives the typed durability report for one rclone +// bucket transfer: BLAKE3 end-to-end when the integrity flags were in +// force, rclone's size+mtime comparison otherwise. Only a fully +// successful BLAKE3 run counts as verified. +// +// A run that asked for BLAKE3 but hit rclone's "no hashes in common" +// fallback is downgraded to size+mtime here even though the flags were +// set and rclone exited 0: rclone silently compared by size, so the copy +// was not content-verified and must not advance the durability vector. 
+func rcloneVerification(dest *config.Destination, opts Options, rep *Report) VerifyResult { + v := VerifyResult{ + Method: VerifyMethodBlake3, + Files: rep.RcloneResult.Transferred + rep.RcloneResult.Checked, + Bytes: rep.RcloneResult.Bytes, + } + if EffectiveShallow(dest, opts.Shallow) || rep.RcloneResult.HashFallback { + v.Method = VerifyMethodSizeMtime + } + v.verified = v.Method == VerifyMethodBlake3 && rep.Status == store.RunStatusSuccess + return v +} + +// peerVerification derives the typed durability report for one node +// sync. The receiver re-hashes every delivered path with BLAKE3 during +// the handshake's verify phase, so a fully successful session is +// content-verified even when the rclone transfer itself ran shallow. +func peerVerification(rep *Report) VerifyResult { + return VerifyResult{ + verified: rep.Status == store.RunStatusSuccess, + Method: VerifyMethodPeer, + Files: int64(len(rep.NodeVerify.Matched)), + Bytes: rep.RcloneResult.Bytes, + } +} diff --git a/sync/handler_test.go b/sync/handler_test.go new file mode 100644 index 0000000..ba288a9 --- /dev/null +++ b/sync/handler_test.go @@ -0,0 +1,94 @@ +package sync + +import ( + "testing" + + "github.com/mbertschler/squirrel/config" + "github.com/mbertschler/squirrel/store" +) + +func TestHandlerForDispatch(t *testing.T) { + vol := &config.Volume{Name: "pics", Path: "/tmp/pics"} + bucket := &config.Destination{Name: "scratch", Type: "local", Root: "/tmp/dst"} + node := &config.Node{Name: "nas"} + tools := Tools{Rclone: &Rclone{Binary: "rclone"}} + + h, err := HandlerFor(nil, tools, Pair{Volume: vol, Destination: bucket}) + if err != nil { + t.Fatalf("bucket pair: %v", err) + } + if _, ok := h.(*rcloneHandler); !ok || h.TargetName() != "scratch" { + t.Fatalf("bucket pair resolved to %T (%q), want *rcloneHandler scratch", h, h.TargetName()) + } + + h, err = HandlerFor(nil, tools, Pair{Volume: vol, Node: node}) + if err != nil { + t.Fatalf("node pair: %v", err) + } + if _, ok := 
h.(*peerHandler); !ok || h.TargetName() != "nas" { + t.Fatalf("node pair resolved to %T (%q), want *peerHandler nas", h, h.TargetName()) + } + + ca := &config.Destination{Name: "offsite", Type: "sftp", Root: "/data", Layout: config.LayoutContentAddressed} + h, err = HandlerFor(nil, tools, Pair{Volume: vol, Destination: ca}) + if err != nil { + t.Fatalf("content-addressed pair: %v", err) + } + if _, ok := h.(*contentAddressedHandler); !ok || h.TargetName() != "offsite" { + t.Fatalf("content-addressed pair resolved to %T (%q), want *contentAddressedHandler offsite", h, h.TargetName()) + } + + if _, err := HandlerFor(nil, tools, Pair{Volume: vol}); err == nil { + t.Fatalf("expected error for pair without a target") + } + if _, err := HandlerFor(nil, Tools{}, Pair{Volume: vol, Destination: bucket}); err == nil { + t.Fatalf("expected error for bucket pair without an rclone wrapper") + } +} + +func TestRcloneVerification(t *testing.T) { + plain := &config.Destination{Name: "d", Type: "sftp"} + crypt := &config.Destination{Name: "d", Type: "sftp", Crypt: &config.Crypt{Password: "x"}} + cases := []struct { + name string + dest *config.Destination + opts Options + status string + hashFallback bool + wantVerified bool + wantMethod string + }{ + {"checksum success", plain, Options{}, store.RunStatusSuccess, false, true, VerifyMethodBlake3}, + {"checksum partial", plain, Options{}, store.RunStatusPartial, false, false, VerifyMethodBlake3}, + {"shallow success", plain, Options{Shallow: true}, store.RunStatusSuccess, false, false, VerifyMethodSizeMtime}, + {"crypt forces shallow", crypt, Options{}, store.RunStatusSuccess, false, false, VerifyMethodSizeMtime}, + // rclone exited 0 with the integrity flags set, but reported the + // no-common-hash fallback: the copy was compared by size, so the + // result must be size+mtime and unverified. 
+ {"hash fallback downgrades", plain, Options{}, store.RunStatusSuccess, true, false, VerifyMethodSizeMtime}, + } + for _, c := range cases { + rep := &Report{Status: c.status} + rep.RcloneResult = RunResult{Transferred: 2, Checked: 3, Bytes: 42, HashFallback: c.hashFallback} + v := rcloneVerification(c.dest, c.opts, rep) + if v.Verified() != c.wantVerified || v.Method != c.wantMethod { + t.Errorf("%s: verified=%t method=%q, want %t %q", c.name, v.Verified(), v.Method, c.wantVerified, c.wantMethod) + } + if v.Files != 5 || v.Bytes != 42 { + t.Errorf("%s: files=%d bytes=%d, want 5 42", c.name, v.Files, v.Bytes) + } + } +} + +func TestPeerVerification(t *testing.T) { + rep := &Report{Status: store.RunStatusSuccess} + rep.NodeVerify.Matched = []string{"a", "b"} + v := peerVerification(rep) + if !v.Verified() || v.Method != VerifyMethodPeer || v.Files != 2 { + t.Fatalf("verified=%t method=%q files=%d, want true %q 2", v.Verified(), v.Method, v.Files, VerifyMethodPeer) + } + rep.Status = store.RunStatusPartial + if peerVerification(rep).Verified() { + t.Fatalf("partial peer session must report unverified") + } +} diff --git a/sync/kopia.go b/sync/kopia.go new file mode 100644 index 0000000..26fe2e4 --- /dev/null +++ b/sync/kopia.go @@ -0,0 +1,284 @@ +package sync + +import ( + "bytes" + "context" + "encoding/json" + "fmt" + "os" + "os/exec" + "path/filepath" + "strconv" + "strings" + + "github.com/mbertschler/squirrel/config" + "github.com/mbertschler/squirrel/store" +) + +// Kopia is a configured kopia wrapper. Like rclone, kopia is treated as +// an opaque child process: squirrel owns the argv, points every +// invocation at a destination-scoped config file under ConfigDir +// (kopia-<destination>.config, sibling to rclone.conf), and hands the +// repository password to the child via KOPIA_PASSWORD in its +// environment — the password stays out of argv, logs, and error +// strings. The user's own kopia configuration is left untouched. 
+type Kopia struct { + Binary string + ConfigDir string +} + +// FindKopia locates the kopia binary on PATH and roots the wrapper's +// destination config files at configDir. +func FindKopia(configDir string) (*Kopia, error) { + bin, err := exec.LookPath("kopia") + if err != nil { + return nil, fmt.Errorf("kopia not found on PATH (required for kopia destinations): %w", err) + } + return &Kopia{Binary: bin, ConfigDir: configDir}, nil +} + +// configFile returns the per-destination kopia config file path. One +// file per destination because each kopia destination is its own +// repository. +func (k *Kopia) configFile(destName string) string { + return filepath.Join(k.ConfigDir, "kopia-"+destName+".config") +} + +// environWithout returns the process environment with every entry for +// key removed, so a single appended override is the one value the child +// sees regardless of what the parent shell exported. +func environWithout(key string) []string { + env := os.Environ() + out := env[:0] + prefix := key + "=" + for _, e := range env { + if !strings.HasPrefix(e, prefix) { + out = append(out, e) + } + } + return out +} + +// run executes one kopia subcommand against the given config file, +// returning captured stdout. Stderr is folded into the error on failure +// for diagnostics. +func (k *Kopia) run(ctx context.Context, cfgFile, password string, args ...string) ([]byte, error) { + full := append(append([]string(nil), args...), "--config-file", cfgFile) + cmd := exec.CommandContext(ctx, k.Binary, full...) 
+ cmd.Env = append(environWithout("KOPIA_PASSWORD"), "KOPIA_PASSWORD="+password) + var stdout, stderr bytes.Buffer + cmd.Stdout = &stdout + cmd.Stderr = &stderr + if err := cmd.Run(); err != nil { + verb := strings.Join(args[:min(2, len(args))], " ") + if msg := strings.TrimSpace(stderr.String()); msg != "" { + return stdout.Bytes(), fmt.Errorf("kopia %s: %w: %s", verb, err, msg) + } + return stdout.Bytes(), fmt.Errorf("kopia %s: %w", verb, err) + } + return stdout.Bytes(), nil +} + +// ensureRepository connects the destination-scoped config file to the +// filesystem repository at repoPath. Connect runs on every push so a +// repository path changed in squirrel's config is re-pointed rather than +// silently snapshotting into the old one. --no-persist-credentials keeps +// the password scoped to each invocation's environment; kopia's default +// would write it to a sidecar file next to the config on keyring-less +// hosts. +// +// A connect failure creates the repository only when init is set. Without +// it, a failed connect is an error: creating on every connect failure +// would mint a fresh, empty repository on a transient outage or a +// mistyped path, and the destination's durability vector — monotonic and +// without a CLI to rewind — would keep claiming coverage the new +// repository cannot honour. init mirrors the --init gate the local +// destination marker uses for first-use bootstrap. 
+func (k *Kopia) ensureRepository(ctx context.Context, cfgFile, password, repoPath string, init bool) error { + _, connectErr := k.run(ctx, cfgFile, password, "repository", "connect", "filesystem", "--path", repoPath, "--no-persist-credentials") + if connectErr == nil { + return nil + } + if !init { + return fmt.Errorf("kopia repository at %s: connect failed (%w) — re-run with --init to create a new repository (refusing to auto-create in case the path is wrong or the destination is temporarily unreachable)", repoPath, connectErr) + } + if _, createErr := k.run(ctx, cfgFile, password, "repository", "create", "filesystem", "--path", repoPath, "--no-persist-credentials"); createErr != nil { + return fmt.Errorf("kopia repository at %s: connect failed (%w); create failed: %w", repoPath, connectErr, createErr) + } + return nil +} + +// kopiaSnapshot is the subset of the manifest `kopia snapshot create +// --json` prints that squirrel reports on: the manifest id plus the +// root directory summary's counts. Field names follow kopia's JSON +// casing exactly. +type kopiaSnapshot struct { + ID string `json:"id"` + RootEntry struct { + Summary struct { + Size int64 `json:"size"` + Files int64 `json:"files"` + FatalErrors int64 `json:"numFailed"` + IgnoredErrors int64 `json:"numIgnoredErrors"` + } `json:"summ"` + } `json:"rootEntry"` +} + +// snapshotCreate snapshots sourcePath into the connected repository and +// parses the resulting manifest. 
+func (k *Kopia) snapshotCreate(ctx context.Context, cfgFile, password, sourcePath string) (kopiaSnapshot, error) { + out, err := k.run(ctx, cfgFile, password, "snapshot", "create", sourcePath, "--json") + if err != nil { + return kopiaSnapshot{}, err + } + var snap kopiaSnapshot + if err := json.Unmarshal(bytes.TrimSpace(out), &snap); err != nil { + return kopiaSnapshot{}, fmt.Errorf("parse kopia snapshot manifest: %w", err) + } + if snap.ID == "" { + return kopiaSnapshot{}, fmt.Errorf("kopia snapshot manifest carries no id: %q", bytes.TrimSpace(out)) + } + return snap, nil +} + +// DefaultVerifyFilesPercent is the fraction of snapshot file bytes +// `kopia snapshot verify` reads back when a kopia destination does not +// configure verify_files_percent. kopia's own default is 0 — manifest +// and object-existence only, no file bytes — which would let a kopia +// component gate offload on a check that read none of the content. A +// non-zero default makes every kopia advance rest on a real, if +// sampled, content read. +const DefaultVerifyFilesPercent = 10 + +// snapshotVerify runs kopia's own consistency check, scoped to the given +// snapshot manifest id, reading back verifyFilesPercent of file bytes so +// the verification covers real content rather than object existence +// alone. +func (k *Kopia) snapshotVerify(ctx context.Context, cfgFile, password, snapshotID string, verifyFilesPercent float64) error { + _, err := k.run(ctx, cfgFile, password, "snapshot", "verify", + "--verify-files-percent", strconv.FormatFloat(verifyFilesPercent, 'f', -1, 64), snapshotID) + return err +} + +// kopiaHandler pushes a volume into a kopia repository: connect (or +// first-use create), `kopia snapshot create <volume path>`, then +// `kopia snapshot verify`. The runs row matches the other sync targets +// (kind='sync', destination=name); shallow is always false because +// kopia verifies its own content hashes. 
+type kopiaHandler struct { + store *store.Store + kopia *Kopia + vol *config.Volume + dest *config.Destination +} + +func (h *kopiaHandler) TargetName() string { return h.dest.Name } + +func (h *kopiaHandler) Push(ctx context.Context, opts Options) (Report, error) { + rep := Report{Volume: h.vol.Name, Destination: h.dest.Name} + // Stamped up front so output renderers key kopia formatting off the + // method even when the push fails before a snapshot exists. + rep.Verification.Method = VerifyMethodKopia + volID, err := requireIndexedVolume(ctx, h.store, h.vol) + if err != nil { + return rep, err + } + if opts.DryRun { + return rep, fmt.Errorf("destination %q: kopia has no dry-run mode — run without --dry-run", h.dest.Name) + } + runID, err := beginSyncRunGuarded(ctx, h.store, false, store.SyncRunSpec{ + VolumeID: volID, + Destination: h.dest.Name, + }, h.vol.Name) + if err != nil { + return rep, err + } + rep.RunID = runID + if opts.OnRunID != nil { + opts.OnRunID(runID) + } + + // Captured before the snapshot walk so RunPair advances the vector + // over the indexed present set this push was scoped to, not whatever + // kopia's independent live walk happened to include. + if rep.durabilityAdvance, err = captureDurabilityAdvance(ctx, h.store, volID); err != nil { + rep.Status = store.RunStatusFailed + finishHandlerRun(ctx, h.store, &rep, err) + return rep, err + } + + err = h.snapshotAndVerify(ctx, &rep, opts.Init) + finishHandlerRun(ctx, h.store, &rep, err) + // Local index snapshot only: the repository is kopia's own format, + // so the rclone ride-along stays out of it (dest=nil, mirroring the + // peer flow). + opts.Snapshot.afterSync(ctx, &rep, h.vol, nil) + return rep, err +} + +func (h *kopiaHandler) sealed() {} + +// kopiaVerifyFilesPercent resolves the destination's verify_files_percent +// param, falling back to DefaultVerifyFilesPercent when unset. 
The value +// is a percentage in (0, 100]; a malformed, out-of-range, or zero value +// is a configuration error rather than a silent fallback, since it +// governs how much content a gating verification actually reads. Zero is +// rejected explicitly: kopia accepts it (verify manifests and object +// existence, read no file bytes) but a kopia component gates offload as +// content-verified, so a zero-byte verify would let the gate delete the +// only local copy on the strength of a check that read none of the +// content. +func kopiaVerifyFilesPercent(dest *config.Destination) (float64, error) { + raw, ok := dest.Params["verify_files_percent"] + if !ok || raw == "" { + return DefaultVerifyFilesPercent, nil + } + pct, err := strconv.ParseFloat(raw, 64) + if err != nil { + return 0, fmt.Errorf("destination %q: verify_files_percent %q is not a number", dest.Name, raw) + } + if pct <= 0 || pct > 100 { + return 0, fmt.Errorf("destination %q: verify_files_percent %v is outside (0, 100] — a kopia verify that gates offload must read a non-zero fraction of file bytes", dest.Name, pct) + } + return pct, nil +} + +// snapshotAndVerify drives the kopia binary and derives rep.Status and +// rep.Verification. Status starts failed and is promoted: success for a +// clean verified snapshot, partial when the snapshot landed with +// per-file errors kopia tolerated. Verified is reserved for the clean +// path — a snapshot with skipped files is durable but incomplete. init +// authorises first-use repository creation on a connect failure. 
+func (h *kopiaHandler) snapshotAndVerify(ctx context.Context, rep *Report, init bool) error { + rep.Status = store.RunStatusFailed + cfgFile := h.kopia.configFile(h.dest.Name) + password := h.dest.Params["password"] + if err := h.kopia.ensureRepository(ctx, cfgFile, password, h.dest.Root, init); err != nil { + return err + } + verifyFilesPercent, err := kopiaVerifyFilesPercent(h.dest) + if err != nil { + return err + } + snap, err := h.kopia.snapshotCreate(ctx, cfgFile, password, h.vol.Path) + if err != nil { + return err + } + summ := snap.RootEntry.Summary + rep.Verification = VerifyResult{ + Method: VerifyMethodKopia, + SnapshotID: snap.ID, + Files: summ.Files, + Bytes: summ.Size, + } + if err := h.kopia.snapshotVerify(ctx, cfgFile, password, snap.ID, verifyFilesPercent); err != nil { + return err + } + if summ.FatalErrors+summ.IgnoredErrors > 0 { + rep.Status = store.RunStatusPartial + return nil + } + rep.Status = store.RunStatusSuccess + rep.Verification.verified = true + return nil +} diff --git a/sync/kopia_test.go b/sync/kopia_test.go new file mode 100644 index 0000000..e8e838d --- /dev/null +++ b/sync/kopia_test.go @@ -0,0 +1,610 @@ +package sync + +import ( + "context" + "errors" + "os" + "os/exec" + "path/filepath" + "runtime" + "strings" + "testing" + + "github.com/mbertschler/squirrel/config" + "github.com/mbertschler/squirrel/index" + "github.com/mbertschler/squirrel/store" +) + +// fakeKopiaScript is the PATH-shim stand-in for the kopia binary. It +// appends one argv line and one env line per invocation to +// $KOPIA_FAKE_LOG, then plays back behaviour keyed on the subcommand +// via KOPIA_FAKE_*_EXIT variables, so tests can assert exactly what +// squirrel asked kopia to do without a real repository. 
+const fakeKopiaScript = `#!/bin/sh +{ + printf 'argv:' + for a in "$@"; do printf ' %s' "$a"; done + printf '\n' + printf 'env:KOPIA_PASSWORD=%s\n' "$KOPIA_PASSWORD" +} >> "$KOPIA_FAKE_LOG" +case "$1 $2" in +"repository connect") exit "${KOPIA_FAKE_CONNECT_EXIT:-0}" ;; +"repository create") exit "${KOPIA_FAKE_CREATE_EXIT:-0}" ;; +"snapshot create") + if [ "${KOPIA_FAKE_SNAPSHOT_EXIT:-0}" != 0 ]; then + echo "fake snapshot failure" >&2 + exit "${KOPIA_FAKE_SNAPSHOT_EXIT}" + fi + cat "$KOPIA_FAKE_SNAPSHOT_JSON" + ;; +"snapshot verify") + if [ "${KOPIA_FAKE_VERIFY_EXIT:-0}" != 0 ]; then + echo "fake verify failure" >&2 + exit "${KOPIA_FAKE_VERIFY_EXIT}" + fi + ;; +*) echo "unexpected kopia subcommand: $*" >&2; exit 64 ;; +esac +` + +// fakeSnapshotJSON mirrors the manifest shape `kopia snapshot create +// --json` prints (trimmed to the fields squirrel reads, captured from +// kopia 0.23). +const fakeSnapshotJSON = `{"id":"snap123","source":{"host":"h","userName":"u","path":"/v"},` + + `"rootEntry":{"name":"src","type":"d","obj":"k1","summ":{"size":1234,"files":3,"dirs":1,"numFailed":0}}}` + "\n" + +// installFakeKopia puts a fake kopia shim at the head of PATH and +// returns the log file it records invocations into. Behaviour knobs are +// plain env vars so individual tests tune them with t.Setenv. 
+func installFakeKopia(t *testing.T) (logPath string) { + t.Helper() + if runtime.GOOS == "windows" { + t.Skip("fake kopia shim is a POSIX shell script") + } + dir := t.TempDir() + if err := os.WriteFile(filepath.Join(dir, "kopia"), []byte(fakeKopiaScript), 0o755); err != nil { + t.Fatalf("write fake kopia: %v", err) + } + jsonPath := filepath.Join(dir, "snapshot.json") + if err := os.WriteFile(jsonPath, []byte(fakeSnapshotJSON), 0o644); err != nil { + t.Fatalf("write fake snapshot json: %v", err) + } + logPath = filepath.Join(dir, "calls.log") + t.Setenv("PATH", dir+string(os.PathListSeparator)+os.Getenv("PATH")) + t.Setenv("KOPIA_FAKE_LOG", logPath) + t.Setenv("KOPIA_FAKE_SNAPSHOT_JSON", jsonPath) + return logPath +} + +// kopiaFixture is the kopia analogue of syncFixture: a store, a config +// with one volume syncing to one kopia destination, and the Tools built +// the way the CLI builds them. No rclone involved. +type kopiaFixture struct { + store *store.Store + cfg *config.Config + tools Tools + pair Pair +} + +func setupKopiaFixture(t *testing.T) *kopiaFixture { + t.Helper() + root := t.TempDir() + volPath := filepath.Join(root, "src") + if err := os.MkdirAll(volPath, 0o755); err != nil { + t.Fatalf("mkdir src: %v", err) + } + if err := os.WriteFile(filepath.Join(volPath, "a.txt"), []byte("alpha"), 0o644); err != nil { + t.Fatal(err) + } + + dbPath := filepath.Join(root, "test.db") + s, err := store.Open(dbPath) + if err != nil { + t.Fatalf("store.Open: %v", err) + } + t.Cleanup(func() { s.Close() }) + + cfgPath := filepath.Join(root, "config.toml") + cfgBody := "[destinations.mirror]\ntype = \"kopia\"\nroot = \"" + filepath.Join(root, "repo") + "\"\npassword = \"hunter2\"\n\n" + + "[volumes.pics]\npath = \"" + volPath + "\"\nsync_to = [\"mirror\"]\n" + if err := os.WriteFile(cfgPath, []byte(cfgBody), 0o600); err != nil { + t.Fatalf("write config: %v", err) + } + cfg, err := config.Load(cfgPath) + if err != nil { + t.Fatalf("config.Load: %v", err) + } + + 
pairs, err := PairsFor(cfg, "", "") + if err != nil { + t.Fatalf("PairsFor: %v", err) + } + if len(pairs) != 1 || pairs[0].Destination == nil || pairs[0].Destination.Type != "kopia" { + t.Fatalf("pairs = %+v, want one kopia pair", pairs) + } + tools, err := ToolsFor(cfg, pairs, nil) + if err != nil { + t.Fatalf("ToolsFor: %v", err) + } + if tools.Kopia == nil { + t.Fatalf("ToolsFor left Kopia nil for a kopia pair") + } + + if _, err := index.Index(context.Background(), s, volPath, index.Options{Name: "pics"}); err != nil { + t.Fatalf("index.Index: %v", err) + } + return &kopiaFixture{store: s, cfg: cfg, tools: tools, pair: pairs[0]} +} + +// readCallLog splits the fake binary's log into argv lines and env +// lines for assertion. +func readCallLog(t *testing.T, logPath string) (argv, env []string) { + t.Helper() + data, err := os.ReadFile(logPath) + if err != nil { + t.Fatalf("read fake kopia log: %v", err) + } + for _, line := range strings.Split(strings.TrimSpace(string(data)), "\n") { + switch { + case strings.HasPrefix(line, "argv: "): + argv = append(argv, strings.TrimPrefix(line, "argv: ")) + case strings.HasPrefix(line, "env:"): + env = append(env, strings.TrimPrefix(line, "env:")) + } + } + return argv, env +} + +func TestKopiaPushHappyPath(t *testing.T) { + logPath := installFakeKopia(t) + // A stale parent-shell export must not shadow the configured + // password: the wrapper strips it before appending its own. 
+ t.Setenv("KOPIA_PASSWORD", "stale-parent-value") + f := setupKopiaFixture(t) + + rep, err := RunPair(context.Background(), f.store, f.tools, f.pair, Options{}) + if err != nil { + t.Fatalf("RunPair: %v (rep=%+v)", err, rep) + } + if rep.Status != store.RunStatusSuccess { + t.Fatalf("Status = %q, want success", rep.Status) + } + if !rep.Verification.Verified() || rep.Verification.Method != VerifyMethodKopia { + t.Fatalf("Verification = %+v, want verified kopia result", rep.Verification) + } + if rep.Verification.SnapshotID != "snap123" || rep.Verification.Files != 3 || rep.Verification.Bytes != 1234 { + t.Fatalf("Verification = %+v, want snap123 / 3 files / 1234 bytes", rep.Verification) + } + + run, err := f.store.GetRun(context.Background(), rep.RunID) + if err != nil { + t.Fatalf("GetRun: %v", err) + } + if run.Kind != store.RunKindSync || !run.Destination.Valid || run.Destination.String != "mirror" { + t.Fatalf("run = %+v, want kind=sync destination=mirror", run) + } + if run.Status != store.RunStatusSuccess || run.FileCount != 3 { + t.Fatalf("run = %+v, want success with file_count=3", run) + } + if !run.Shallow.Valid || run.Shallow.Bool { + t.Fatalf("run.Shallow = %+v, want false (kopia verifies its own hashes)", run.Shallow) + } + + argv, env := readCallLog(t, logPath) + cfgFile := filepath.Join(filepath.Dir(f.cfg.Path), "kopia-mirror.config") + repo := f.pair.Destination.Root + wantArgv := []string{ + "repository connect filesystem --path " + repo + " --no-persist-credentials --config-file " + cfgFile, + "snapshot create " + f.pair.Volume.Path + " --json --config-file " + cfgFile, + "snapshot verify --verify-files-percent 10 snap123 --config-file " + cfgFile, + } + if len(argv) != len(wantArgv) { + t.Fatalf("argv lines = %q, want %q", argv, wantArgv) + } + for i := range wantArgv { + if argv[i] != wantArgv[i] { + t.Fatalf("argv[%d] = %q, want %q", i, argv[i], wantArgv[i]) + } + if strings.Contains(argv[i], "hunter2") { + t.Fatalf("argv[%d] leaks the 
repository password: %q", i, argv[i]) + } + } + for i, e := range env { + if e != "KOPIA_PASSWORD=hunter2" { + t.Fatalf("env[%d] = %q, want the password via KOPIA_PASSWORD", i, e) + } + } +} + +func TestKopiaConnectFallsBackToCreate(t *testing.T) { + logPath := installFakeKopia(t) + t.Setenv("KOPIA_FAKE_CONNECT_EXIT", "1") + f := setupKopiaFixture(t) + + rep, err := RunPair(context.Background(), f.store, f.tools, f.pair, Options{Init: true}) + if err != nil { + t.Fatalf("RunPair: %v (rep=%+v)", err, rep) + } + argv, _ := readCallLog(t, logPath) + if len(argv) != 4 || !strings.HasPrefix(argv[1], "repository create filesystem --path ") { + t.Fatalf("argv = %q, want connect, create, snapshot create, snapshot verify", argv) + } +} + +// TestKopiaConnectFailWithoutInitRefuses: a connect failure without --init +// is an error, not a silent re-create (#114). Auto-creating on every +// connect failure would mint a fresh empty repository on a transient +// outage or a mistyped path while the monotonic durability vector keeps +// claiming coverage the new repository cannot honour. 
+func TestKopiaConnectFailWithoutInitRefuses(t *testing.T) { + logPath := installFakeKopia(t) + t.Setenv("KOPIA_FAKE_CONNECT_EXIT", "1") + f := setupKopiaFixture(t) + + rep, err := RunPair(context.Background(), f.store, f.tools, f.pair, Options{}) + if err == nil { + t.Fatalf("expected connect-fail error without --init, got rep=%+v", rep) + } + if !strings.Contains(err.Error(), "--init") { + t.Fatalf("error should point at --init, got %v", err) + } + if rep.Status != store.RunStatusFailed { + t.Fatalf("Status = %q, want failed", rep.Status) + } + argv, _ := readCallLog(t, logPath) + for _, line := range argv { + if strings.HasPrefix(line, "repository create") { + t.Fatalf("repository was created despite no --init: argv = %q", argv) + } + } +} + +func TestKopiaCreateFailureRecordsFailedRun(t *testing.T) { + installFakeKopia(t) + t.Setenv("KOPIA_FAKE_CONNECT_EXIT", "1") + t.Setenv("KOPIA_FAKE_CREATE_EXIT", "1") + f := setupKopiaFixture(t) + + rep, err := RunPair(context.Background(), f.store, f.tools, f.pair, Options{Init: true}) + if err == nil { + t.Fatalf("expected error, got rep=%+v", rep) + } + if strings.Contains(err.Error(), "hunter2") { + t.Fatalf("error leaks the repository password: %v", err) + } + if rep.Status != store.RunStatusFailed { + t.Fatalf("Status = %q, want failed", rep.Status) + } + if rep.Verification.Method != VerifyMethodKopia { + t.Fatalf("Method = %q on early failure, want %q so output renders kopia-shaped", rep.Verification.Method, VerifyMethodKopia) + } + run, getErr := f.store.GetRun(context.Background(), rep.RunID) + if getErr != nil { + t.Fatalf("GetRun: %v", getErr) + } + if run.Status != store.RunStatusFailed || !run.Error.Valid || run.Error.String == "" { + t.Fatalf("run = %+v, want failed with an error message", run) + } + if strings.Contains(run.Error.String, "hunter2") { + t.Fatalf("runs row leaks the repository password: %q", run.Error.String) + } +} + +func TestKopiaSnapshotWithFailedFilesIsPartial(t *testing.T) { + 
installFakeKopia(t) + f := setupKopiaFixture(t) + partial := `{"id":"snap123","rootEntry":{"summ":{"size":1000,"files":2,"numFailed":1}}}` + if err := os.WriteFile(os.Getenv("KOPIA_FAKE_SNAPSHOT_JSON"), []byte(partial), 0o644); err != nil { + t.Fatal(err) + } + + rep, err := RunPair(context.Background(), f.store, f.tools, f.pair, Options{}) + if err != nil { + t.Fatalf("RunPair: %v (rep=%+v)", err, rep) + } + if rep.Status != store.RunStatusPartial { + t.Fatalf("Status = %q, want partial", rep.Status) + } + if rep.Verification.Verified() { + t.Fatalf("a snapshot with failed files must stay unverified") + } + run, err := f.store.GetRun(context.Background(), rep.RunID) + if err != nil { + t.Fatalf("GetRun: %v", err) + } + if run.Status != store.RunStatusPartial { + t.Fatalf("run status = %q, want partial", run.Status) + } +} + +func TestKopiaVerifyFailureFailsRun(t *testing.T) { + installFakeKopia(t) + t.Setenv("KOPIA_FAKE_VERIFY_EXIT", "1") + f := setupKopiaFixture(t) + + rep, err := RunPair(context.Background(), f.store, f.tools, f.pair, Options{}) + if err == nil || !strings.Contains(err.Error(), "snapshot verify") { + t.Fatalf("expected snapshot-verify error, got %v", err) + } + if rep.Status != store.RunStatusFailed { + t.Fatalf("Status = %q, want failed", rep.Status) + } + if rep.Verification.Verified() { + t.Fatalf("verification must stay unverified when kopia's verify fails") + } + if rep.Verification.SnapshotID != "snap123" { + t.Fatalf("SnapshotID = %q, want the created snapshot for forensics", rep.Verification.SnapshotID) + } +} + +func TestKopiaDryRunRefused(t *testing.T) { + installFakeKopia(t) + f := setupKopiaFixture(t) + + rep, err := RunPair(context.Background(), f.store, f.tools, f.pair, Options{DryRun: true}) + if err == nil || !strings.Contains(err.Error(), "dry-run") { + t.Fatalf("expected dry-run refusal, got err=%v rep=%+v", err, rep) + } + runs, _ := f.store.ListRuns(context.Background(), store.ListRunsOpts{}) + for _, r := range runs { + if 
r.Kind == store.RunKindSync { + t.Fatalf("dry-run wrote a sync runs row: %+v", r) + } + } +} + +func TestKopiaRequiresIndexedVolume(t *testing.T) { + installFakeKopia(t) + f := setupKopiaFixture(t) + // A second volume in config but never indexed. + vol := &config.Volume{Name: "fresh", Path: t.TempDir()} + pair := Pair{Volume: vol, Destination: f.pair.Destination} + + _, err := RunPair(context.Background(), f.store, f.tools, pair, Options{}) + if err == nil || !strings.Contains(err.Error(), "never been indexed") { + t.Fatalf("expected unindexed-volume refusal, got %v", err) + } +} + +func TestRestoreRefusesKopiaDestination(t *testing.T) { + installFakeKopia(t) + f := setupKopiaFixture(t) + _, err := Restore(context.Background(), f.store, nil, f.pair.Volume, f.pair.Destination, RestoreOptions{}) + if err == nil || !strings.Contains(err.Error(), "kopia") { + t.Fatalf("expected kopia-restore refusal, got %v", err) + } +} + +// TestKopiaIntegrationRealBinary exercises the full +// connect→create→snapshot→verify cycle against a real kopia repository +// in a temp directory. Runs only where the kopia binary is installed. +func TestKopiaIntegrationRealBinary(t *testing.T) { + if _, err := exec.LookPath("kopia"); err != nil { + t.Skip("kopia not on PATH; install kopia to run this test") + } + f := setupKopiaFixture(t) + + // First push bootstraps a fresh repository, so it needs --init; the + // gate now refuses a silent re-create on connect failure. 
+ rep, err := RunPair(context.Background(), f.store, f.tools, f.pair, Options{Init: true}) + if err != nil { + t.Fatalf("RunPair: %v (rep=%+v)", err, rep) + } + if rep.Status != store.RunStatusSuccess || !rep.Verification.Verified() { + t.Fatalf("rep = %+v, want verified success", rep) + } + if rep.Verification.SnapshotID == "" || rep.Verification.Files < 1 { + t.Fatalf("Verification = %+v, want a snapshot id and file count", rep.Verification) + } + cfgFile := filepath.Join(filepath.Dir(f.cfg.Path), "kopia-mirror.config") + if _, err := os.Stat(cfgFile + ".kopia-password"); !errors.Is(err, os.ErrNotExist) { + t.Fatalf("kopia persisted the repository password to %s (stat err=%v); --no-persist-credentials must prevent the sidecar", cfgFile+".kopia-password", err) + } + + // Second push re-connects and snapshots again without error. + rep2, err := RunPair(context.Background(), f.store, f.tools, f.pair, Options{}) + if err != nil { + t.Fatalf("second RunPair: %v (rep=%+v)", err, rep2) + } + if rep2.Status != store.RunStatusSuccess { + t.Fatalf("second push status = %q, want success", rep2.Status) + } +} + +// setupKopiaFixtureWithPercent is setupKopiaFixture with an explicit +// verify_files_percent on the kopia destination, for the #108 +// configurable-depth test. 
+func setupKopiaFixtureWithPercent(t *testing.T, percent string) *kopiaFixture { + t.Helper() + root := t.TempDir() + volPath := filepath.Join(root, "src") + if err := os.MkdirAll(volPath, 0o755); err != nil { + t.Fatalf("mkdir src: %v", err) + } + if err := os.WriteFile(filepath.Join(volPath, "a.txt"), []byte("alpha"), 0o644); err != nil { + t.Fatal(err) + } + dbPath := filepath.Join(root, "test.db") + s, err := store.Open(dbPath) + if err != nil { + t.Fatalf("store.Open: %v", err) + } + t.Cleanup(func() { s.Close() }) + + cfgPath := filepath.Join(root, "config.toml") + cfgBody := "[destinations.mirror]\ntype = \"kopia\"\nroot = \"" + filepath.Join(root, "repo") + + "\"\npassword = \"hunter2\"\nverify_files_percent = \"" + percent + "\"\n\n" + + "[volumes.pics]\npath = \"" + volPath + "\"\nsync_to = [\"mirror\"]\n" + if err := os.WriteFile(cfgPath, []byte(cfgBody), 0o600); err != nil { + t.Fatalf("write config: %v", err) + } + cfg, err := config.Load(cfgPath) + if err != nil { + t.Fatalf("config.Load: %v", err) + } + pairs, err := PairsFor(cfg, "", "") + if err != nil { + t.Fatalf("PairsFor: %v", err) + } + tools, err := ToolsFor(cfg, pairs, nil) + if err != nil { + t.Fatalf("ToolsFor: %v", err) + } + if _, err := index.Index(context.Background(), s, volPath, index.Options{Name: "pics"}); err != nil { + t.Fatalf("index.Index: %v", err) + } + return &kopiaFixture{store: s, cfg: cfg, tools: tools, pair: pairs[0]} +} + +// TestKopiaVerifyFilesPercentConfigurable is the #108 depth fix: the +// destination's verify_files_percent flows into the snapshot verify argv +// so the verification reads a configured fraction of file bytes rather +// than kopia's bytes-free default. 
+func TestKopiaVerifyFilesPercentConfigurable(t *testing.T) { + logPath := installFakeKopia(t) + f := setupKopiaFixtureWithPercent(t, "42.5") + + if _, err := RunPair(context.Background(), f.store, f.tools, f.pair, Options{}); err != nil { + t.Fatalf("RunPair: %v", err) + } + argv, _ := readCallLog(t, logPath) + var verify string + for _, line := range argv { + if strings.HasPrefix(line, "snapshot verify") { + verify = line + } + } + if !strings.Contains(verify, "--verify-files-percent 42.5") { + t.Fatalf("verify argv = %q, want --verify-files-percent 42.5", verify) + } +} + +// TestKopiaVerifyFilesPercentDefault: an unset verify_files_percent uses +// the non-zero default, so a kopia advance never rests on a zero-byte +// verification. +func TestKopiaVerifyFilesPercentDefault(t *testing.T) { + logPath := installFakeKopia(t) + f := setupKopiaFixture(t) + + if _, err := RunPair(context.Background(), f.store, f.tools, f.pair, Options{}); err != nil { + t.Fatalf("RunPair: %v", err) + } + argv, _ := readCallLog(t, logPath) + var verify string + for _, line := range argv { + if strings.HasPrefix(line, "snapshot verify") { + verify = line + } + } + if !strings.Contains(verify, "--verify-files-percent 10") { + t.Fatalf("verify argv = %q, want the non-zero default --verify-files-percent 10", verify) + } +} + +// TestKopiaVerifyFilesPercentRejectsZero pins the chosen finding-2 +// behavior: verify_files_percent = "0" is a configuration error, not an +// accepted value. kopia accepts 0 (verify manifests and object existence, +// read no file bytes), but a kopia component gates offload as +// content-verified, so a zero-byte verify would let the gate delete the +// only local copy on the strength of a check that read none of the +// content. Negative and out-of-range values are rejected the same way; a +// positive value passes. 
+func TestKopiaVerifyFilesPercentRejectsZero(t *testing.T) { + for _, tc := range []struct { + raw string + wantErr bool + }{ + {"0", true}, + {"0.0", true}, + {"-1", true}, + {"100.5", true}, + {"not-a-number", true}, + {"0.5", false}, + {"100", false}, + } { + t.Run(tc.raw, func(t *testing.T) { + dest := &config.Destination{ + Name: "mirror", + Params: map[string]string{"verify_files_percent": tc.raw}, + } + _, err := kopiaVerifyFilesPercent(dest) + if tc.wantErr && err == nil { + t.Fatalf("verify_files_percent %q: want error, got nil", tc.raw) + } + if !tc.wantErr && err != nil { + t.Fatalf("verify_files_percent %q: unexpected error: %v", tc.raw, err) + } + }) + } +} + +// TestKopiaPushFailsOnZeroVerifyPercent: end-to-end, a kopia push with +// verify_files_percent = "0" fails rather than landing a content-verified +// advance off a zero-byte verification. +func TestKopiaPushFailsOnZeroVerifyPercent(t *testing.T) { + installFakeKopia(t) + f := setupKopiaFixtureWithPercent(t, "0") + + rep, err := RunPair(context.Background(), f.store, f.tools, f.pair, Options{}) + if err == nil { + t.Fatalf("RunPair: want error for verify_files_percent 0, got nil (status %v)", rep.Status) + } + if !strings.Contains(err.Error(), "verify_files_percent") { + t.Fatalf("error = %v, want it to name verify_files_percent", err) + } + vector, verr := f.store.ListDestinationRunIDs(context.Background(), volIDForKopia(t, f), "mirror") + if verr != nil { + t.Fatalf("ListDestinationRunIDs: %v", verr) + } + if len(vector) != 0 { + t.Fatalf("vector = %+v, want no advance on a rejected verify percent", vector) + } +} + +// volIDForKopia resolves the kopia fixture volume's id. 
+func volIDForKopia(t *testing.T, f *kopiaFixture) int64 { + t.Helper() + v, err := f.store.GetVolumeByName(context.Background(), "pics") + if err != nil { + t.Fatalf("GetVolumeByName: %v", err) + } + return v.ID +} + +// TestKopiaAdvanceScopedToCapturedPresentSet is the #108 scope fix: the +// kopia push advances the vector to the present-set snapshot captured at +// push start, recorded with the kopia-verify method — not whatever +// kopia's independent live walk happened to include. +func TestKopiaAdvanceScopedToCapturedPresentSet(t *testing.T) { + installFakeKopia(t) + f := setupKopiaFixture(t) + + if _, err := RunPair(context.Background(), f.store, f.tools, f.pair, Options{}); err != nil { + t.Fatalf("RunPair: %v", err) + } + v, err := f.store.GetVolumeByName(context.Background(), "pics") + if err != nil { + t.Fatalf("GetVolumeByName: %v", err) + } + self, err := f.store.GetSelfNode(context.Background()) + if err != nil { + t.Fatalf("GetSelfNode: %v", err) + } + snapshot, err := f.store.PresentOriginMaxima(context.Background(), v.ID, self.ID) + if err != nil { + t.Fatalf("PresentOriginMaxima: %v", err) + } + vector, err := f.store.ListDestinationRunIDs(context.Background(), v.ID, "mirror") + if err != nil { + t.Fatalf("ListDestinationRunIDs: %v", err) + } + if len(vector) != len(snapshot) || len(vector) != 1 { + t.Fatalf("vector = %+v, want one component matching the captured snapshot %+v", vector, snapshot) + } + if vector[0].OriginNodeID != snapshot[0].OriginNodeID || vector[0].OriginRunID != snapshot[0].OriginRunID { + t.Fatalf("vector component %+v != captured snapshot %+v", vector[0], snapshot[0]) + } + if vector[0].VerifyMethod != store.VerifyMethodKopia { + t.Fatalf("verify method = %q, want %q", vector[0].VerifyMethod, store.VerifyMethodKopia) + } +} diff --git a/sync/node.go b/sync/node.go index 75b3527..ba92d79 100644 --- a/sync/node.go +++ b/sync/node.go @@ -45,6 +45,9 @@ func SyncNode(ctx context.Context, s *store.Store, rcl *Rclone, vol 
*config.Volu return rep, err } err = runNodeSession(ctx, s, rcl, vol, volID, node, opts, &rep) + if !opts.DryRun { + rep.Verification = peerVerification(&rep) + } // runNodeSession's deferred finishRun has committed the run's // terminal state by now, so the snapshot reflects this run's own row. // Peer-sync takes the local snapshot only — there is no ride-along to @@ -115,6 +118,90 @@ type nodeSyncDriver struct { // /begin. Defaults to ProtocolVersionFlat so a missing field in // the receiver's response (older agent) keeps today's behaviour. protocolVersion int + // selfNodeName caches the self-row's name for origin + // materialisation; filled lazily by selfName. + selfNodeName string + // originNodeNames caches local node id → name lookups so a plan + // full of same-origin entries resolves each origin node once. + originNodeNames map[int64]string + // durabilityAdvance is the present-set origin maxima captured before + // the transfer. phaseClose advances the peer's durability vector to + // exactly this snapshot, so a row committed between enumeration and + // close is never claimed durable — matching the bucket, content- + // addressed, and kopia handlers. + durabilityAdvance []store.OriginComponent +} + +// selfName returns this node's name — the identity locally-introduced +// content travels under. Cached after the first lookup. +func (d *nodeSyncDriver) selfName() (string, error) { + if d.selfNodeName != "" { + return d.selfNodeName, nil + } + self, err := d.store.GetSelfNode(d.ctx) + if err != nil { + return "", fmt.Errorf("look up self node: %w", err) + } + d.selfNodeName = self.Name + return d.selfNodeName, nil +} + +// entryOrigin materialises one row's content-origin coordinate for the +// wire. Content with a recorded origin forwards it verbatim — origin +// node id resolved to its name (names are the cross-node identity; +// local ids differ per node), run id untranslated. 
Locally-introduced +// content (origin NULLs, or the degraded partial-NULL state) is +// materialised as (this node's name, the content's introduction run in +// this volume). +func (d *nodeSyncDriver) entryOrigin(row store.FileRow) (string, int64, error) { + if !row.OriginNodeID.Valid || !row.OriginRunID.Valid { + name, err := d.selfName() + if err != nil { + return "", 0, err + } + intro, err := d.store.ContentIntroductionRunID(d.ctx, d.volID, row.ContentID) + if err != nil { + return "", 0, fmt.Errorf("introduction run for %s: %w", row.Path, err) + } + return name, intro, nil + } + name, err := d.originNodeName(row.OriginNodeID.Int64) + if err != nil { + return "", 0, err + } + return name, row.OriginRunID.Int64, nil +} + +func (d *nodeSyncDriver) originNodeName(nodeID int64) (string, error) { + if name, ok := d.originNodeNames[nodeID]; ok { + return name, nil + } + node, err := d.store.GetNodeByID(d.ctx, nodeID) + if err != nil { + return "", fmt.Errorf("resolve origin node %d: %w", nodeID, err) + } + if d.originNodeNames == nil { + d.originNodeNames = make(map[int64]string) + } + d.originNodeNames[nodeID] = node.Name + return node.Name, nil +} + +// indexEntryForRow converts one local index row to its wire form, +// attaching the materialised content origin. +func (d *nodeSyncDriver) indexEntryForRow(row store.FileRow) (syncproto.IndexEntry, error) { + originNode, originRun, err := d.entryOrigin(row) + if err != nil { + return syncproto.IndexEntry{}, err + } + return syncproto.IndexEntry{ + Path: row.Path, + Blake3Hex: hex.EncodeToString(row.Blake3), + SizeBytes: row.SizeBytes, + MtimeNs: row.MtimeNs, + OriginNode: originNode, + OriginRun: originRun, + }, nil } func (d *nodeSyncDriver) run() error { @@ -130,6 +217,13 @@ func (d *nodeSyncDriver) run() error { // the original path is empty, so rclone treats the entry like a // fresh transfer. 
d.report.NodeConflicts = plan.Conflicts + if !d.opts.DryRun { + advance, err := captureDurabilityAdvance(d.ctx, d.store, d.volID) + if err != nil { + return d.abortWithError("capture durability advance", err) + } + d.durabilityAdvance = advance + } if err := d.phaseTransfer(plan); err != nil { return d.abortWithError("transfer", err) } @@ -150,7 +244,7 @@ func (d *nodeSyncDriver) run() error { // invocations against the same (volume, peer) can't both proceed — // the loser surfaces the same diagnostic the bucket path does. func (d *nodeSyncDriver) phaseBegin() error { - peer, err := d.store.GetOrCreatePeerNode(d.ctx, d.node.Name, d.node.Endpoint.String()) + peer, err := d.store.GetOrCreatePeerNode(d.ctx, d.node.Name, d.node.Endpoint.String(), true) if err != nil { return fmt.Errorf("record peer node: %w", err) } @@ -386,12 +480,11 @@ func (d *nodeSyncDriver) entriesFromFolders(folderIDs []int64) ([]syncproto.Inde if isReservedSyncPath(row.Path) { continue } - entries = append(entries, syncproto.IndexEntry{ - Path: row.Path, - Blake3Hex: hex.EncodeToString(row.Blake3), - SizeBytes: row.SizeBytes, - MtimeNs: row.MtimeNs, - }) + entry, err := d.indexEntryForRow(row) + if err != nil { + return nil, err + } + entries = append(entries, entry) } } return entries, nil @@ -420,12 +513,11 @@ func (d *nodeSyncDriver) collectIndexEntries() ([]syncproto.IndexEntry, error) { if err != nil { return nil, fmt.Errorf("lookup %s: %w", p, err) } - entries = append(entries, syncproto.IndexEntry{ - Path: row.Path, - Blake3Hex: hex.EncodeToString(row.Blake3), - SizeBytes: row.SizeBytes, - MtimeNs: row.MtimeNs, - }) + entry, err := d.indexEntryForRow(row) + if err != nil { + return nil, err + } + entries = append(entries, entry) } return entries, nil } @@ -552,13 +644,83 @@ func (d *nodeSyncDriver) phaseVerify() error { // verify report's failing paths as 'failed_paths' so the receiver // skips them on commit — they'll be picked up by the next sync's // /plan when their on-disk content 
reappears. +// +// On a verified successful close (status success acknowledged by the +// receiver; never dry-run) the initiator records the durability +// consequence: the peer is a destination in the flat target namespace, +// so the vector advances over the present-set origin maxima captured +// before the transfer (tagged peer-blake3 — the receiver re-hashed every +// delivered path). Pinning to that snapshot keeps a row committed between +// enumeration and close from being claimed durable, matching the other +// handlers. A failed advance fails the run — the bytes are on the peer +// but the evidence isn't recorded, and the next sync re-plans (everything +// already-correct) and re-advances cheaply. The durability pull that +// follows is metadata-only and merely warns on failure. func (d *nodeSyncDriver) phaseClose() error { failed := failingPaths(d.report.NodeVerify) - return d.client.close(d.ctx, syncproto.CloseRequest{ + err := d.client.close(d.ctx, syncproto.CloseRequest{ ReceiverRunID: d.receiverRunID, Status: d.report.Status, FailedPaths: failed, }) + if err != nil { + return err + } + if d.report.Status != store.RunStatusSuccess || d.opts.DryRun { + return nil + } + if err := d.store.AdvanceDestinationVectorTo(d.ctx, d.volID, d.node.Name, store.VerifyMethodPeer, d.durabilityAdvance); err != nil { + return fmt.Errorf("advance destination vector for %s: %w", d.node.Name, err) + } + d.pullPeerDurability() + return nil +} + +// pullPeerDurability fetches the peer's destination vectors and merges +// them into the local store. Failures, refused rewinds, and components +// dropped for unconfigured destinations surface as report warnings +// rather than failing the run: the sync itself succeeded, and the pull +// can be retried any time via the standalone `peer-sync pull-durability` +// command. 
+func (d *nodeSyncDriver) pullPeerDurability() { + rep, err := pullDurability(d.ctx, d.store, d.client, d.vol.Name, d.volID, d.node.Name, acceptedDestinations(d.vol), false) + d.report.DurabilityPull = rep + if err != nil { + d.report.Warnings = append(d.report.Warnings, + fmt.Sprintf("durability pull from %s: %v", d.node.Name, err)) + return + } + for _, rw := range rep.Rewinds { + d.report.Warnings = append(d.report.Warnings, + fmt.Sprintf("durability pull from %s refused rewind: %s", d.node.Name, rw)) + } + if rep.Dropped > 0 { + d.report.Warnings = append(d.report.Warnings, + fmt.Sprintf("durability pull from %s dropped %d entr%s for unconfigured destinations (e.g. %s)", + d.node.Name, rep.Dropped, plural(rep.Dropped, "y", "ies"), dropSample(rep.Drops))) + } +} + +func plural(n int, one, many string) string { + if n == 1 { + return one + } + return many +} + +// dropSample renders the sampled destinations from a drop list as a +// compact, deduplicated, comma-separated string for one summary line. +func dropSample(drops []DurabilityDrop) string { + seen := make(map[string]struct{}, len(drops)) + var names []string + for _, d := range drops { + if _, ok := seen[d.Destination]; ok { + continue + } + seen[d.Destination] = struct{}{} + names = append(names, d.Destination) + } + return strings.Join(names, ", ") } func (d *nodeSyncDriver) abortWithError(phase string, err error) error { @@ -648,6 +810,11 @@ func (c *nodeClient) close(ctx context.Context, body syncproto.CloseRequest) err return c.do(ctx, "/v1/sync/close", body, nil) } +func (c *nodeClient) durability(ctx context.Context, body syncproto.DurabilityRequest) (syncproto.DurabilityResponse, error) { + var resp syncproto.DurabilityResponse + return resp, c.do(ctx, "/v1/sync/durability", body, &resp) +} + // do is the shared "POST JSON, decode JSON" implementation. 
The URL // is built by joining the configured endpoint's path with urlPath // (rather than concatenating raw strings, per CLAUDE.md) — a node diff --git a/sync/node_origin_test.go b/sync/node_origin_test.go new file mode 100644 index 0000000..4abac66 --- /dev/null +++ b/sync/node_origin_test.go @@ -0,0 +1,382 @@ +package sync + +import ( + "context" + "net/http/httptest" + "net/url" + "os" + "path/filepath" + "testing" + + "github.com/mbertschler/squirrel/agent" + "github.com/mbertschler/squirrel/config" + "github.com/mbertschler/squirrel/index" + "github.com/mbertschler/squirrel/store" +) + +// TestCollectIndexEntriesMaterialisesOrigins pins the sender side of +// origin propagation at the /plan boundary: locally-introduced content +// travels as (self name, introduction run — the earliest first_seen of +// the content in the volume, not the duplicate's), and content with a +// recorded origin travels verbatim under the origin node's name. +func TestCollectIndexEntriesMaterialisesOrigins(t *testing.T) { + f := setupNodeFixtureNoRclone(t) + ctx := context.Background() + + v, err := f.initStore.CreateVolume(ctx, f.initVol.Name, f.initVol.Path) + if err != nil { + t.Fatalf("CreateVolume: %v", err) + } + run1, err := f.initStore.BeginIndexRun(ctx, store.RunKindIndex, v.ID, false) + if err != nil { + t.Fatalf("BeginIndexRun run1: %v", err) + } + run2, err := f.initStore.BeginIndexRun(ctx, store.RunKindIndex, v.ID, false) + if err != nil { + t.Fatalf("BeginIndexRun run2: %v", err) + } + ext, err := f.initStore.CreateNode(ctx, "ext", "peer://ext") + if err != nil { + t.Fatalf("CreateNode ext: %v", err) + } + upsert := func(path string, b byte, firstSeen int64, prov *store.Provenance) { + t.Helper() + if err := f.initStore.Upsert(ctx, store.FileRow{ + VolumeID: v.ID, Path: path, Blake3: bytesDigest(b), + SizeBytes: 1, MtimeNs: 1, Status: store.StatusPresent, + FirstSeenRunID: firstSeen, LastSeenRunID: firstSeen, IndexedAtNs: 1, + }, prov); err != nil { + t.Fatalf("upsert 
%s: %v", path, err) + } + } + upsert("dup1.txt", 0xC1, run1, nil) + upsert("dup2.txt", 0xC1, run2, nil) + upsert("fwd.bin", 0xC2, run2, &store.Provenance{NodeID: ext.ID, RunID: 77}) + + self, err := f.initStore.GetSelfNode(ctx) + if err != nil { + t.Fatalf("GetSelfNode: %v", err) + } + driver := &nodeSyncDriver{ctx: ctx, store: f.initStore, vol: f.initVol, volID: v.ID} + entries, err := driver.collectIndexEntries() + if err != nil { + t.Fatalf("collectIndexEntries: %v", err) + } + type origin struct { + node string + run int64 + } + got := map[string]origin{} + for _, e := range entries { + got[e.Path] = origin{e.OriginNode, e.OriginRun} + } + want := map[string]origin{ + "dup1.txt": {self.Name, run1}, + "dup2.txt": {self.Name, run1}, + "fwd.bin": {"ext", 77}, + } + for path, w := range want { + if got[path] != w { + t.Fatalf("origin[%s] = %+v, want %+v (full: %+v)", path, got[path], w, got) + } + } +} + +// chainPeer is one receiver in the 3-node chain: its store, on-disk +// volume, and the node config a forwarder dials it with. +type chainPeer struct { + store *store.Store + vol *config.Volume + node *config.Node +} + +// newChainPeer stands up one agent-backed receiver named name under +// root, mirroring buildNodeFixture's receiver half. 
+func newChainPeer(t *testing.T, root, name string) *chainPeer { + t.Helper() + volParent := filepath.Join(root, name) + volPath := filepath.Join(volParent, "pics") + if err := os.MkdirAll(volPath, 0o755); err != nil { + t.Fatalf("mkdir %s: %v", volPath, err) + } + s := openStoreWithName(t, filepath.Join(root, name+".db"), name) + vol := &config.Volume{Name: "pics", Path: volPath} + srv, err := agent.New(agent.Config{ + Listen: "127.0.0.1:0", + Token: "test-token", + Version: "test", + Volumes: map[string]*config.Volume{"pics": vol}, + }, s) + if err != nil { + t.Fatalf("agent.New(%s): %v", name, err) + } + ts := httptest.NewServer(srv.Handler()) + t.Cleanup(ts.Close) + endpoint, err := url.Parse(ts.URL) + if err != nil { + t.Fatalf("parse URL: %v", err) + } + return &chainPeer{ + store: s, + vol: vol, + node: &config.Node{Name: name, Endpoint: endpoint, Token: "test-token", Path: volParent}, + } +} + +// TestNodeSyncOriginCarriedVerbatimAcrossChain is the acceptance test +// for verbatim origin propagation: alpha introduces content, syncs to +// bravo, and bravo forwards to charlie. Charlie's contents row must +// record alpha's origin coordinate — alpha's name (a node charlie has +// never peered with; a row is created for it) and alpha's introduction +// run — never a relabel to bravo. The second hop must also classify +// cleanly (no conflicts): supersede-vs-conflict is judged by delivery, +// not by the forwarded origin. 
+func TestNodeSyncOriginCarriedVerbatimAcrossChain(t *testing.T) { + rcl := requireRclone(t) + root := t.TempDir() + rcl.Config = filepath.Join(root, "rclone.conf") + if err := os.WriteFile(rcl.Config, []byte{}, 0o600); err != nil { + t.Fatalf("write rclone.conf: %v", err) + } + ctx := context.Background() + + volAPath := filepath.Join(root, "alpha", "pics") + if err := os.MkdirAll(volAPath, 0o755); err != nil { + t.Fatal(err) + } + storeA := openStoreWithName(t, filepath.Join(root, "alpha.db"), "alpha") + volA := &config.Volume{Name: "pics", Path: volAPath} + bravo := newChainPeer(t, root, "bravo") + charlie := newChainPeer(t, root, "charlie") + + if err := os.WriteFile(filepath.Join(volAPath, "photo.jpg"), []byte("the travelling bytes"), 0o644); err != nil { + t.Fatal(err) + } + if _, err := index.Index(ctx, storeA, volAPath, index.Options{Name: "pics"}); err != nil { + t.Fatalf("index alpha: %v", err) + } + vA, err := storeA.GetVolumeByName(ctx, "pics") + if err != nil { + t.Fatalf("alpha volume: %v", err) + } + rowA, err := storeA.GetByPath(ctx, vA.ID, "photo.jpg") + if err != nil { + t.Fatalf("alpha row: %v", err) + } + introRun := rowA.FirstSeenRunID + + // Hop 1: alpha → bravo. 
+ rep1, err := SyncNode(ctx, storeA, rcl, volA, bravo.node, Options{Shallow: true}) + if err != nil || rep1.Status != store.RunStatusSuccess { + t.Fatalf("hop 1: err=%v status=%q", err, rep1.Status) + } + vB, err := bravo.store.GetVolumeByName(ctx, "pics") + if err != nil { + t.Fatalf("bravo volume: %v", err) + } + alphaOnB, err := bravo.store.GetNodeByName(ctx, "alpha") + if err != nil { + t.Fatalf("bravo has no alpha row: %v", err) + } + rowB, err := bravo.store.GetByPath(ctx, vB.ID, "photo.jpg") + if err != nil { + t.Fatalf("bravo row: %v", err) + } + if !rowB.OriginNodeID.Valid || rowB.OriginNodeID.Int64 != alphaOnB.ID || + !rowB.OriginRunID.Valid || rowB.OriginRunID.Int64 != introRun { + t.Fatalf("bravo origin = (%+v, %+v), want (alpha=%d, %d)", + rowB.OriginNodeID, rowB.OriginRunID, alphaOnB.ID, introRun) + } + + // Hop 2: bravo forwards to charlie. Bravo indexes first (the + // initiator prerequisite); re-observing the synced file keeps its + // row and content origin untouched. + if _, err := index.Index(ctx, bravo.store, bravo.vol.Path, index.Options{Name: "pics"}); err != nil { + t.Fatalf("index bravo: %v", err) + } + rep2, err := SyncNode(ctx, bravo.store, rcl, bravo.vol, charlie.node, Options{Shallow: true}) + if err != nil || rep2.Status != store.RunStatusSuccess { + t.Fatalf("hop 2: err=%v status=%q", err, rep2.Status) + } + if len(rep2.NodeConflicts) != 0 { + t.Fatalf("hop 2 conflicts = %+v, want none", rep2.NodeConflicts) + } + + vC, err := charlie.store.GetVolumeByName(ctx, "pics") + if err != nil { + t.Fatalf("charlie volume: %v", err) + } + alphaOnC, err := charlie.store.GetNodeByName(ctx, "alpha") + if err != nil { + t.Fatalf("charlie has no nodes row for alpha (never peered, must be created from the wire origin): %v", err) + } + bravoOnC, err := charlie.store.GetNodeByName(ctx, "bravo") + if err != nil { + t.Fatalf("charlie has no bravo row: %v", err) + } + rowC, err := charlie.store.GetByPath(ctx, vC.ID, "photo.jpg") + if err != nil { + 
t.Fatalf("charlie row: %v", err) + } + if !rowC.OriginNodeID.Valid || rowC.OriginNodeID.Int64 != alphaOnC.ID { + t.Fatalf("charlie OriginNodeID = %+v, want alpha's row %d (bravo's is %d — relabel forbidden)", + rowC.OriginNodeID, alphaOnC.ID, bravoOnC.ID) + } + if !rowC.OriginRunID.Valid || rowC.OriginRunID.Int64 != introRun { + t.Fatalf("charlie OriginRunID = %+v, want alpha's introduction run %d", rowC.OriginRunID, introRun) + } +} + +// TestNodeSyncAdvancesVectorAndPullsDurabilityAtClose covers the +// initiator's successful-close bookkeeping: the peer's destination +// vector advances over the volume's present set (self component at the +// local introduction run, forwarded-origin component at its origin run, +// verbatim), and the automatic metadata pull lands the peer's own +// destination components in the initiator's store under the same +// names. +func TestNodeSyncAdvancesVectorAndPullsDurabilityAtClose(t *testing.T) { + f := setupNodeFixture(t) + ctx := context.Background() + + // The peer knows about a destination only it can see; the volume + // requires it for offload, so the pull accepts the peer's evidence. + f.initVol.OffloadRequires = []string{"offsite-x"} + recvSelfName := seedReceiverDurability(t, f, map[string]int64{"offsite-x": 7}) + + // Initiator: one forwarded-origin file seeded before indexing (a + // content's origin is recorded at first contact and immutable), + // plus one locally-introduced file picked up by the index run. 
+ v, err := f.initStore.CreateVolume(ctx, f.initVol.Name, f.initVol.Path) + if err != nil { + t.Fatalf("CreateVolume on initiator: %v", err) + } + seedRun, err := f.initStore.BeginIndexRun(ctx, store.RunKindIndex, v.ID, false) + if err != nil { + t.Fatalf("BeginIndexRun: %v", err) + } + _ = f.initStore.FinishRun(ctx, seedRun, store.RunStatusSuccess, "", 1) + ext, err := f.initStore.CreateNode(ctx, "ext", "peer://ext") + if err != nil { + t.Fatalf("CreateNode ext: %v", err) + } + fwdBody := []byte("forwarded content") + fwdAbs := filepath.Join(f.initVol.Path, "fwd.bin") + if err := os.WriteFile(fwdAbs, fwdBody, 0o644); err != nil { + t.Fatal(err) + } + if err := f.initStore.Upsert(ctx, store.FileRow{ + VolumeID: v.ID, Path: "fwd.bin", Blake3: hashFile(t, fwdAbs), + SizeBytes: int64(len(fwdBody)), MtimeNs: 1, Status: store.StatusPresent, + FirstSeenRunID: seedRun, LastSeenRunID: seedRun, IndexedAtNs: 1, + }, &store.Provenance{NodeID: ext.ID, RunID: 42}); err != nil { + t.Fatalf("seed forwarded row: %v", err) + } + if err := os.WriteFile(filepath.Join(f.initVol.Path, "local.txt"), []byte("locally introduced"), 0o644); err != nil { + t.Fatal(err) + } + f.indexInitiator(t) + + rep, err := SyncNode(ctx, f.initStore, f.rcl, f.initVol, f.node, Options{Shallow: true}) + if err != nil || rep.Status != store.RunStatusSuccess { + t.Fatalf("SyncNode: err=%v status=%q", err, rep.Status) + } + + // Vector advanced for the peer destination, in origin space. 
+ self, _ := f.initStore.GetSelfNode(ctx) + localRow, err := f.initStore.GetByPath(ctx, v.ID, "local.txt") + if err != nil { + t.Fatalf("GetByPath local.txt: %v", err) + } + vector, err := f.initStore.ListDestinationRunIDs(ctx, v.ID, f.node.Name) + if err != nil { + t.Fatalf("ListDestinationRunIDs: %v", err) + } + byNode := map[int64]int64{} + for _, c := range vector { + byNode[c.OriginNodeID] = c.OriginRunID + } + if byNode[self.ID] != localRow.FirstSeenRunID { + t.Fatalf("self component = %d, want local.txt's introduction run %d (vector = %+v)", + byNode[self.ID], localRow.FirstSeenRunID, byNode) + } + if byNode[ext.ID] != 42 { + t.Fatalf("ext component = %d, want 42 (forwarded origin, verbatim)", byNode[ext.ID]) + } + + // The automatic pull cached the peer's offsite component locally. + if rep.DurabilityPull.Fetched != 1 || rep.DurabilityPull.Applied != 1 { + t.Fatalf("DurabilityPull = %+v, want fetched=1 applied=1", rep.DurabilityPull) + } + originOnInit, err := f.initStore.GetNodeByName(ctx, recvSelfName) + if err != nil { + t.Fatalf("initiator has no row for the peer origin %q: %v", recvSelfName, err) + } + got, err := f.initStore.GetDestinationRunID(ctx, v.ID, "offsite-x", originOnInit.ID) + if err != nil { + t.Fatalf("GetDestinationRunID offsite-x: %v", err) + } + if got.OriginRunID != 7 { + t.Fatalf("offsite-x component = %d, want 7", got.OriginRunID) + } +} + +// TestNodeSyncPeerAdvanceSnapshotPinned is the finding-3 fix: the peer +// close-phase advance is pinned to the present-set origin maxima captured +// before the transfer (and tagged peer-blake3), the same snapshot the +// bucket, content-addressed, and kopia handlers use — not a live read of +// the present set after the transfer. The advance equals +// PresentOriginMaxima taken before the sync, and the same push records +// the origin-space push-freshness coordinate so a downstream puller can +// satisfy its relayed-target freshness condition. 
+func TestNodeSyncPeerAdvanceSnapshotPinned(t *testing.T) { + f := setupNodeFixture(t) + ctx := context.Background() + + if err := os.WriteFile(filepath.Join(f.initVol.Path, "a.txt"), []byte("alpha"), 0o644); err != nil { + t.Fatal(err) + } + if err := os.WriteFile(filepath.Join(f.initVol.Path, "b.txt"), []byte("bravo"), 0o644); err != nil { + t.Fatal(err) + } + f.indexInitiator(t) + + v, err := f.initStore.GetVolumeByName(ctx, "pics") + if err != nil { + t.Fatalf("GetVolumeByName: %v", err) + } + self, err := f.initStore.GetSelfNode(ctx) + if err != nil { + t.Fatalf("GetSelfNode: %v", err) + } + snapshot, err := f.initStore.PresentOriginMaxima(ctx, v.ID, self.ID) + if err != nil { + t.Fatalf("PresentOriginMaxima: %v", err) + } + + rep, err := SyncNode(ctx, f.initStore, f.rcl, f.initVol, f.node, Options{Shallow: true}) + if err != nil || rep.Status != store.RunStatusSuccess { + t.Fatalf("SyncNode: err=%v status=%q", err, rep.Status) + } + + vector, err := f.initStore.ListDestinationRunIDs(ctx, v.ID, f.node.Name) + if err != nil { + t.Fatalf("ListDestinationRunIDs: %v", err) + } + if len(vector) != len(snapshot) || len(vector) != 1 { + t.Fatalf("vector = %+v, want one component matching the captured snapshot %+v", vector, snapshot) + } + if vector[0].OriginNodeID != snapshot[0].OriginNodeID || vector[0].OriginRunID != snapshot[0].OriginRunID { + t.Fatalf("vector component %+v != captured snapshot %+v", vector[0], snapshot[0]) + } + if vector[0].VerifyMethod != store.VerifyMethodPeer { + t.Fatalf("verify method = %q, want %q", vector[0].VerifyMethod, store.VerifyMethodPeer) + } + + fresh, err := f.initStore.ListDestinationPushFreshness(ctx, v.ID, f.node.Name) + if err != nil { + t.Fatalf("ListDestinationPushFreshness: %v", err) + } + if len(fresh) != 1 || fresh[0].OriginNodeID != snapshot[0].OriginNodeID || fresh[0].OriginRunID != snapshot[0].OriginRunID { + t.Fatalf("push freshness = %+v, want it to match the captured snapshot %+v", fresh, snapshot) + } +} diff 
--git a/sync/node_test.go b/sync/node_test.go index 7f6d4c9..429350d 100644 --- a/sync/node_test.go +++ b/sync/node_test.go @@ -183,18 +183,42 @@ func TestNodeSyncTransfersFiles(t *testing.T) { } } - // Receiver-side index reflects the new rows. + // Receiver-side index reflects the new rows, each carrying the + // content's origin coordinate: the initiator's node name (mapped + // to the receiver's row for it) and the initiator-side run that + // introduced the content — not the receiver's own run. v, err := f.recvStore.GetVolumeByName(context.Background(), "pics") if err != nil { t.Fatalf("GetVolumeByName on receiver: %v", err) } + initSelfPre, err := f.initStore.GetSelfNode(context.Background()) + if err != nil { + t.Fatalf("GetSelfNode on initiator: %v", err) + } + initVolRow, err := f.initStore.GetVolumeByName(context.Background(), "pics") + if err != nil { + t.Fatalf("GetVolumeByName on initiator: %v", err) + } + originNode, err := f.recvStore.GetNodeByName(context.Background(), initSelfPre.Name) + if err != nil { + t.Fatalf("receiver has no nodes row named %q: %v", initSelfPre.Name, err) + } for name := range files { row, err := f.recvStore.GetByPath(context.Background(), v.ID, name) if err != nil { t.Fatalf("GetByPath %s on receiver: %v", name, err) } - if !row.SourceNodeID.Valid { - t.Fatalf("%s row has NULL source_node_id; want initiator attribution", name) + if !row.OriginNodeID.Valid || row.OriginNodeID.Int64 != originNode.ID { + t.Fatalf("%s OriginNodeID = %+v, want %d (the initiator's row by name)", + name, row.OriginNodeID, originNode.ID) + } + initRow, err := f.initStore.GetByPath(context.Background(), initVolRow.ID, name) + if err != nil { + t.Fatalf("GetByPath %s on initiator: %v", name, err) + } + if !row.OriginRunID.Valid || row.OriginRunID.Int64 != initRow.FirstSeenRunID { + t.Fatalf("%s OriginRunID = %+v, want the initiator's introduction run %d", + name, row.OriginRunID, initRow.FirstSeenRunID) } } @@ -366,8 +390,8 @@ func 
TestNodeSyncResolvesConflictOnLocalWriteOnReceiver(t *testing.T) { if hex.EncodeToString(liveRow.Blake3) == hex.EncodeToString(receiverDigest) { t.Fatalf("live row still carries the prior blake3; want initiator's") } - if !liveRow.SourceNodeID.Valid { - t.Fatalf("live doc.md row has NULL source_node_id; want initiator attribution") + if !liveRow.OriginNodeID.Valid { + t.Fatalf("live doc.md row has NULL origin; want the initiator-introduced content's origin") } preservedRow, err := f.recvStore.GetByPath(ctx, v.ID, preservedRel) if err != nil { @@ -377,9 +401,9 @@ func TestNodeSyncResolvesConflictOnLocalWriteOnReceiver(t *testing.T) { t.Fatalf("preserved row blake3 = %x, want %x (the prior content)", preservedRow.Blake3, receiverDigest) } - if preservedRow.SourceNodeID.Valid { + if preservedRow.OriginNodeID.Valid { t.Fatalf("preserved row source_node_id = %d, want NULL (prior was a local write)", - preservedRow.SourceNodeID.Int64) + preservedRow.OriginNodeID.Int64) } // Loser is reachable by hash too — `squirrel query <prior>` @@ -508,12 +532,6 @@ func TestVerifyReportsMismatch(t *testing.T) { f := setupNodeFixture(t) ctx := context.Background() - // Plant a file on disk at the receiver. This represents bytes - // that arrived after a (mocked) rclone transfer. - if err := os.WriteFile(filepath.Join(f.recvVol.Path, "a.txt"), []byte("not-what-initiator-claims"), 0o644); err != nil { - t.Fatal(err) - } - initSelf, _ := f.initStore.GetSelfNode(ctx) client := newNodeClient(f.node) begin, err := client.begin(ctx, syncproto.BeginRequest{ @@ -541,6 +559,14 @@ func TestVerifyReportsMismatch(t *testing.T) { if err != nil { t.Fatalf("/plan: %v", err) } + // Plant a file on disk at the receiver after /plan: this represents + // the bytes a (mocked) rclone transfer delivered to the Transfer + // destination, with content that doesn't match the initiator's claim. 
+ // Planting it post-plan keeps it clear of the Transfer pre-stage, + // which moves any out-of-band pre-existing file aside. + if err := os.WriteFile(filepath.Join(f.recvVol.Path, "a.txt"), []byte("not-what-initiator-claims"), 0o644); err != nil { + t.Fatal(err) + } v, err := client.verify(ctx, syncproto.VerifyRequest{ReceiverRunID: begin.ReceiverRunID}) if err != nil { t.Fatalf("/verify: %v", err) @@ -601,13 +627,13 @@ func TestPlanResponseContainsAllDispositions(t *testing.T) { // Receiver-side: prepare three pre-existing rows under one // volume, three dispositions ahead of /plan: // - "same.txt" → same blake3 ⇒ already-correct - // - "evolved.txt" → different blake3 sourced from initiator - // (we plant peer_sync_state to make the - // provenance trace back to the initiator - // at run ≤ watermark) ⇒ supersede + // - "evolved.txt" → different blake3 delivered by the + // initiator (its first-seen run is a + // peer-sync run correlated at ≤ the + // watermark) ⇒ supersede // - "novel.txt" → no receiver-side row ⇒ transfer - // - "local.txt" → different blake3, source_node_id NULL - // ⇒ conflict + // - "local.txt" → different blake3, first seen by a local + // index run (no peer linkage) ⇒ conflict v, err := f.recvStore.CreateVolume(ctx, f.recvVol.Name, f.recvVol.Path) if err != nil { t.Fatalf("seed receiver volume: %v", err) @@ -632,13 +658,20 @@ func TestPlanResponseContainsAllDispositions(t *testing.T) { t.Fatalf("BeginPeerSyncRun: %v", err) } _ = f.recvStore.FinishRun(ctx, priorRun, store.RunStatusSuccess, "", 1) + // A receiver-local index run: rows first seen by it have no peer + // linkage, which is what makes local.txt a conflict. 
+ localRun, err := f.recvStore.BeginIndexRun(ctx, store.RunKindIndex, v.ID, false) + if err != nil { + t.Fatalf("BeginIndexRun: %v", err) + } + _ = f.recvStore.FinishRun(ctx, localRun, store.RunStatusSuccess, "", 1) - mustUpsert := func(path string, digest []byte, prov *store.Provenance) { + mustUpsert := func(path string, digest []byte, firstSeen int64, prov *store.Provenance) { t.Helper() if err := f.recvStore.Upsert(ctx, store.FileRow{ VolumeID: v.ID, Path: path, Blake3: digest, SizeBytes: 1, MtimeNs: 1, Status: store.StatusPresent, - FirstSeenRunID: priorRun, LastSeenRunID: priorRun, IndexedAtNs: 1, + FirstSeenRunID: firstSeen, LastSeenRunID: firstSeen, IndexedAtNs: 1, }, prov); err != nil { t.Fatalf("upsert receiver %s: %v", path, err) } @@ -649,19 +682,20 @@ func TestPlanResponseContainsAllDispositions(t *testing.T) { // (supersede) and local.txt (conflict) both get re-hashed before // their bytes are moved, so their recorded digests must match what's // on disk — a real receiver indexes its own content; a synthetic - // digest would look like out-of-band drift. The three files share the - // same "seed" bytes, so one digest serves both. same.txt is - // already-correct and never re-hashed, so a synthetic digest is fine. + // digest would look like out-of-band drift. same.txt is + // already-correct and never re-hashed, so a synthetic digest is + // fine. 
for _, p := range []string{"same.txt", "evolved.txt", "local.txt"} { - if err := os.WriteFile(filepath.Join(f.recvVol.Path, p), []byte("seed"), 0o644); err != nil { + if err := os.WriteFile(filepath.Join(f.recvVol.Path, p), []byte("seed-"+p), 0o644); err != nil { t.Fatal(err) } } - seedDigest := hashFile(t, filepath.Join(f.recvVol.Path, "evolved.txt")) + evolvedDigest := hashFile(t, filepath.Join(f.recvVol.Path, "evolved.txt")) + localDigest := hashFile(t, filepath.Join(f.recvVol.Path, "local.txt")) - mustUpsert("same.txt", bytesDigest(0xAA), nil) - mustUpsert("evolved.txt", seedDigest, &store.Provenance{NodeID: peer.ID, RunID: priorRun}) - mustUpsert("local.txt", seedDigest, nil) + mustUpsert("same.txt", bytesDigest(0xAA), priorRun, nil) + mustUpsert("evolved.txt", evolvedDigest, priorRun, &store.Provenance{NodeID: peer.ID, RunID: priorInitiatorRunID}) + mustUpsert("local.txt", localDigest, localRun, nil) // peer_sync_state watermark (in initiator-id space) high enough // to cover the prior row's correlated id. 
@@ -739,8 +773,8 @@ func TestPlanResponseContainsAllDispositions(t *testing.T) { if err != nil { t.Fatalf("GetByPath %s: %v", plan.Conflicts[0].PreservedAtPath, err) } - if hex.EncodeToString(conflictRow.Blake3) != hex.EncodeToString(seedDigest) { - t.Fatalf("conflict-path row blake3 = %x, want the prior on-disk digest %x", conflictRow.Blake3, seedDigest) + if hex.EncodeToString(conflictRow.Blake3) != hex.EncodeToString(localDigest) { + t.Fatalf("conflict-path row blake3 = %x, want the prior on-disk digest %x", conflictRow.Blake3, localDigest) } } @@ -789,9 +823,9 @@ func TestNodeSyncEndToEndConflictAfterAgentSideIndex(t *testing.T) { if err != nil { t.Fatalf("GetByPath before round 2: %v", err) } - if beforeRound2.SourceNodeID.Valid { + if beforeRound2.OriginNodeID.Valid { t.Fatalf("post-index row has source_node_id %d; want NULL (local write)", - beforeRound2.SourceNodeID.Int64) + beforeRound2.OriginNodeID.Int64) } // Round 2: initiator writes Z and re-syncs. The receiver's row @@ -873,17 +907,22 @@ func TestNodeSyncConflictWhenPriorRowFromDifferentPeer(t *testing.T) { t.Fatalf("BeginPeerSyncRun: %v", err) } _ = f.recvStore.FinishRun(ctx, priorRun, store.RunStatusSuccess, "", 1) - priorDigest := bytesDigest(0x77) + // Write the bytes first and record their real digest: the conflict + // pre-stage re-hashes before moving, and a synthetic digest would + // look like out-of-band drift (which deliberately drops the prior + // origin). The recorded origin is the third party's own coordinate + // (its node row + a run id in *its* run space), carried verbatim. 
+ if err := os.WriteFile(filepath.Join(f.recvVol.Path, "shared.md"), []byte("from-third-party"), 0o644); err != nil { + t.Fatal(err) + } + priorDigest := hashFile(t, filepath.Join(f.recvVol.Path, "shared.md")) if err := f.recvStore.Upsert(ctx, store.FileRow{ VolumeID: v.ID, Path: "shared.md", Blake3: priorDigest, - SizeBytes: 1, MtimeNs: 1, Status: store.StatusPresent, + SizeBytes: int64(len("from-third-party")), MtimeNs: 1, Status: store.StatusPresent, FirstSeenRunID: priorRun, LastSeenRunID: priorRun, IndexedAtNs: 1, - }, &store.Provenance{NodeID: otherPeer.ID, RunID: priorRun}); err != nil { + }, &store.Provenance{NodeID: otherPeer.ID, RunID: 7}); err != nil { t.Fatalf("seed third-party row: %v", err) } - if err := os.WriteFile(filepath.Join(f.recvVol.Path, "shared.md"), []byte("from-third-party"), 0o644); err != nil { - t.Fatal(err) - } // Initiator writes a *different* blake3. if err := os.WriteFile(filepath.Join(f.initVol.Path, "shared.md"), []byte("from-our-initiator"), 0o644); err != nil { @@ -913,9 +952,9 @@ func TestNodeSyncConflictWhenPriorRowFromDifferentPeer(t *testing.T) { if err != nil { t.Fatalf("GetByPath %s: %v", rep.NodeConflicts[0].PreservedAtPath, err) } - if !preservedRow.SourceNodeID.Valid || preservedRow.SourceNodeID.Int64 != otherPeer.ID { + if !preservedRow.OriginNodeID.Valid || preservedRow.OriginNodeID.Int64 != otherPeer.ID { t.Fatalf("preserved row source_node_id = %+v, want %d (third-party)", - preservedRow.SourceNodeID, otherPeer.ID) + preservedRow.OriginNodeID, otherPeer.ID) } } @@ -1338,8 +1377,8 @@ func TestNodeSyncCopyFromExistingDedup(t *testing.T) { if err != nil { t.Fatalf("GetByPath pets/a.jpg: %v", err) } - if !newRow.SourceNodeID.Valid { - t.Fatalf("pets/a.jpg row has NULL source_node_id; want initiator attribution") + if !newRow.OriginNodeID.Valid { + t.Fatalf("pets/a.jpg row has NULL origin; want the initiator-introduced content's origin") } } @@ -1550,7 +1589,7 @@ func TestPlanSupersedeWinsOverDedup(t *testing.T) { // 
watermark that puts the target row's provenance "at or before" // the shared watermark. initSelf, _ := f.initStore.GetSelfNode(ctx) - peer, err := f.recvStore.GetOrCreatePeerNode(ctx, initSelf.Name, "peer://"+initSelf.Name) + peer, err := f.recvStore.GetOrCreatePeerNode(ctx, initSelf.Name, "peer://"+initSelf.Name, false) if err != nil { t.Fatalf("GetOrCreatePeerNode: %v", err) } diff --git a/sync/rclone.go b/sync/rclone.go index 7f305a6..e1ad97b 100644 --- a/sync/rclone.go +++ b/sync/rclone.go @@ -232,6 +232,12 @@ type RunResult struct { // FatalError is true when the run failed in a way that did not produce // per-file errors — e.g. source root missing, auth failure. FatalError bool + // HashFallback is true when rclone reported that --checksum could not + // use the requested hash because source and destination share none, + // and silently fell back to a size-based comparison. A run that asked + // for BLAKE3 verification but hit this path was not content-verified, + // however rclone exited, so the caller must not record it as verified. + HashFallback bool } // FailedFile is one per-object error from the JSON log. Object may be @@ -340,10 +346,13 @@ func (r *Rclone) runPlain(ctx context.Context, args ...string) ([]byte, error) { // copyTo copies a single source file to a single destination path via // `rclone copyto`, creating intermediate directories as needed. Used by -// the snapshot ride-along to land one .db file at a fixed destination -// name (copy, by contrast, would treat the destination as a directory). -func (r *Rclone) copyTo(ctx context.Context, src, dst string) error { - _, err := r.runPlain(ctx, "copyto", src, dst) +// the snapshot ride-along and the content-addressed push to land one +// file at a fixed destination name (copy, by contrast, would treat the +// destination as a directory). extraArgs carries per-destination flags +// such as the --checkers cap. 
+func (r *Rclone) copyTo(ctx context.Context, src, dst string, extraArgs ...string) error { + args := append([]string{"copyto"}, extraArgs...) + _, err := r.runPlain(ctx, append(args, src, dst)...) return err } @@ -379,6 +388,69 @@ func (r *Rclone) deleteFile(ctx context.Context, fileURI string) error { return err } +// statRemote returns the size of the single object at uri via +// `rclone lsjson --stat`. The content-addressed push uses it to confirm +// presence and size of each uploaded object and manifest segment; +// through a crypt overlay the reported size is the decrypted length, so +// it compares directly against local byte counts. A missing object +// surfaces as an error (rclone exits non-zero; defensively, a `null` +// stat on a tolerant rclone build is mapped to the same outcome). +func (r *Rclone) statRemote(ctx context.Context, uri string, extraArgs ...string) (int64, error) { + args := append([]string{"lsjson", "--stat"}, extraArgs...) + out, err := r.runPlain(ctx, append(args, uri)...) + if err != nil { + return 0, err + } + trimmed := bytes.TrimSpace(out) + if len(trimmed) == 0 || string(trimmed) == "null" { + return 0, fmt.Errorf("rclone lsjson: no object at %s", uri) + } + var entry struct { + Size int64 `json:"Size"` + IsDir bool `json:"IsDir"` + } + if err := json.Unmarshal(trimmed, &entry); err != nil { + return 0, fmt.Errorf("parse lsjson --stat output for %s: %w", uri, err) + } + if entry.IsDir { + return 0, fmt.Errorf("%s is a directory, expected a single object", uri) + } + return entry.Size, nil +} + +// lsjsonEntry is one object from an `rclone lsjson` listing: its name, +// size, and — when hashes were requested — the provider checksums keyed +// by rclone hash name. 
+type lsjsonEntry struct { + Name string `json:"Name"` + Size int64 `json:"Size"` + IsDir bool `json:"IsDir"` + Hashes map[string]string `json:"Hashes"` +} + +// listHashes runs `rclone lsjson --hash --files-only` over dirURI and +// returns the entries with their provider checksums. hashTypes narrows +// which hashes rclone computes (--hash-type, repeated); nil requests +// every hash the backend exposes. extraArgs carries per-destination +// flags such as the --checkers cap and --include filters scoping the +// listing. +func (r *Rclone) listHashes(ctx context.Context, dirURI string, hashTypes []string, extraArgs ...string) ([]lsjsonEntry, error) { + args := []string{"lsjson", "--hash", "--files-only"} + for _, ht := range hashTypes { + args = append(args, "--hash-type", ht) + } + args = append(args, extraArgs...) + out, err := r.runPlain(ctx, append(args, dirURI)...) + if err != nil { + return nil, err + } + var entries []lsjsonEntry + if err := json.Unmarshal(bytes.TrimSpace(out), &entries); err != nil { + return nil, fmt.Errorf("parse lsjson output for %s: %w", dirURI, err) + } + return entries, nil +} + // rcloneEvent captures the subset of rclone's JSON log we care about: the // level (for error filtering), the per-object message and object name (for // failed-file lists), and the stats object that rclone emits at the end of @@ -408,6 +480,16 @@ var retrySummaryRE = regexp.MustCompile(`^Attempt \d+/\d+ failed`) func isRetrySummary(msg string) bool { return retrySummaryRE.MatchString(msg) } +// hashFallbackRE matches rclone's notice that --checksum has no common +// hash to compare with and is degrading to a size-based check, e.g. +// "--checksum is in use but the source and destination have no hashes in +// common; falling back to --size-only". The trailing verb has varied +// across rclone versions ("falling back"/"failing back") so the match +// keys on the stable phrase "no hashes in common", at any log level. 
+var hashFallbackRE = regexp.MustCompile(`no hashes in common`) + +func isHashFallback(msg string) bool { return hashFallbackRE.MatchString(msg) } + // parseJSONLog reads JSON-per-line events from r and updates result in // place. Non-JSON lines (e.g. an early startup notice on an older rclone) // are skipped — we cannot make decisions on them and surfacing them as @@ -425,6 +507,12 @@ func parseJSONLog(r io.Reader, result *RunResult, onProgress func(runevents.Prog if err := json.Unmarshal(line, &ev); err != nil { continue } + if isHashFallback(ev.Msg) { + // Emitted at NOTICE level (which the level filter below drops), + // so it is detected here before that filter: a run that asked + // for BLAKE3 but lost the hash must not be recorded as verified. + result.HashFallback = true + } if ev.Stats != nil { result.Transferred = ev.Stats.TotalTransfers result.Checked = ev.Stats.TotalChecks diff --git a/sync/rclone_test.go b/sync/rclone_test.go index 2155872..27a019f 100644 --- a/sync/rclone_test.go +++ b/sync/rclone_test.go @@ -72,6 +72,37 @@ func TestParseJSONLogCapturesObjectlessErrors(t *testing.T) { } } +// TestParseJSONLogDetectsHashFallback: rclone's no-common-hash notice is +// emitted at NOTICE level, which the error filter drops; parseJSONLog +// still flags it so a flags-set, exit-0 run that silently degraded to a +// size comparison is not later recorded as content-verified. 
+func TestParseJSONLogDetectsHashFallback(t *testing.T) { + stream := strings.Join([]string{ + `{"level":"notice","msg":"--checksum is in use but the source and destination have no hashes in common; falling back to --size-only","source":"x"}`, + `{"stats":{"errors":0,"fatalError":false,"totalTransfers":2,"totalChecks":0,"bytes":10}}`, + }, "\n") + var r RunResult + parseJSONLog(strings.NewReader(stream), &r, nil) + + if !r.HashFallback { + t.Fatalf("HashFallback = false, want true (no-common-hash notice should be detected)") + } + if len(r.FailedFiles) != 0 { + t.Fatalf("FailedFiles = %+v, want none (the notice is not a per-file error)", r.FailedFiles) + } +} + +// TestParseJSONLogNoFalseHashFallback: an ordinary run never trips the +// fallback flag. +func TestParseJSONLogNoFalseHashFallback(t *testing.T) { + stream := `{"stats":{"errors":0,"fatalError":false,"totalTransfers":2,"totalChecks":1,"bytes":10}}` + var r RunResult + parseJSONLog(strings.NewReader(stream), &r, nil) + if r.HashFallback { + t.Fatalf("HashFallback = true on a clean run, want false") + } +} + func TestIsRetrySummary(t *testing.T) { cases := []struct { in string @@ -159,6 +190,61 @@ password = "p" } } +// TestWriteRcloneConfigRendersSFTPHostKeyValidation confirms the optional +// sftp host-key params reach the written rclone.conf: known_hosts_file is +// what enables server host-key validation, and host_key_algorithms pins the +// accepted algorithms. Absent these, rclone does no host-key validation. 
+func TestWriteRcloneConfigRendersSFTPHostKeyValidation(t *testing.T) { + cfg := writeFakeConfig(t, ` +[destinations.nas] +type = "sftp" +host = "nas.local" +user = "martin" +root = "/data" +password = "p" +known_hosts_file = "~/.ssh/known_hosts" +host_key_algorithms = "ssh-ed25519 ssh-rsa" +`) + r := &Rclone{} + target := filepath.Join(t.TempDir(), "rclone.conf") + if _, err := r.WriteRcloneConfig(target, cfg.Destinations); err != nil { + t.Fatalf("WriteRcloneConfig: %v", err) + } + body, _ := os.ReadFile(target) + for _, want := range []string{ + "known_hosts_file = ~/.ssh/known_hosts", + "host_key_algorithms = ssh-ed25519 ssh-rsa", + } { + if !strings.Contains(string(body), want) { + t.Fatalf("rclone.conf missing %q:\n%s", want, body) + } + } +} + +// TestWriteRcloneConfigRendersS3StorageClass confirms the optional s3 +// storage_class reaches the written rclone.conf. +func TestWriteRcloneConfigRendersS3StorageClass(t *testing.T) { + cfg := writeFakeConfig(t, ` +[destinations.archive] +type = "s3" +provider = "AWS" +bucket = "squirrel" +root = "/p" +storage_class = "DEEP_ARCHIVE" +access_key_id = "AK" +secret_access_key = "sk" +`) + r := &Rclone{} + target := filepath.Join(t.TempDir(), "rclone.conf") + if _, err := r.WriteRcloneConfig(target, cfg.Destinations); err != nil { + t.Fatalf("WriteRcloneConfig: %v", err) + } + body, _ := os.ReadFile(target) + if !strings.Contains(string(body), "storage_class = DEEP_ARCHIVE") { + t.Fatalf("rclone.conf missing storage_class:\n%s", body) + } +} + // TestWriteRcloneConfigTightensExistingPermissions exercises the chmod // path. OpenFile's perm argument is only honored on create, so a file // that already exists with looser perms (e.g., 0644 from a previous @@ -210,6 +296,51 @@ root = "/tmp/scratch" } } +// TestWriteRcloneConfigRendersCryptOverlay is the file-level golden for a +// crypt destination: the underlying remote section exactly as without +// crypt, then the overlay section wrapping it at the destination root. 
+func TestWriteRcloneConfigRendersCryptOverlay(t *testing.T) { + cfg := writeFakeConfig(t, ` +[destinations.offsite] +type = "sftp" +host = "host.example" +user = "u" +root = "/data" +password = "transport-pw" + +[destinations.offsite.crypt] +password = "obscured-pw" +password2 = "obscured-salt" +`) + r := &Rclone{} + target := filepath.Join(t.TempDir(), "rclone.conf") + if _, err := r.WriteRcloneConfig(target, cfg.Destinations); err != nil { + t.Fatalf("WriteRcloneConfig: %v", err) + } + body, err := os.ReadFile(target) + if err != nil { + t.Fatalf("read rclone.conf: %v", err) + } + want := `[offsite] +type = sftp +host = host.example +user = u +blake3sum_command = b3sum +password = transport-pw + +[offsite-crypt] +type = crypt +remote = offsite:/data +filename_encryption = off +directory_name_encryption = false +password = obscured-pw +password2 = obscured-salt +` + if string(body) != want { + t.Fatalf("rclone.conf:\n%s\nwant:\n%s", body, want) + } +} + // TestWriteRcloneConfigSkipsUnchanged verifies the content-comparison // short-circuit: a second render of identical destinations reports // wrote=false and leaves the file's mtime untouched. The mtime is pinned diff --git a/sync/snapshot.go b/sync/snapshot.go index d18bdd8..097d854 100644 --- a/sync/snapshot.go +++ b/sync/snapshot.go @@ -180,24 +180,21 @@ func (sn *Snapshotter) rotateCloud(ctx context.Context, dirURI string) error { } // indexDirURI returns the rclone URI of the per-volume .squirrel-index/ -// directory under dest, mirroring backupDirURI: an absolute filesystem -// path for type=local, "<name>:<root>/<volume>/.squirrel-index" otherwise. +// directory under dest, addressed the same way the data transfer is +// (through the crypt overlay when the destination has one). 
func indexDirURI(dest *config.Destination, volumeName string) string { - subpath := path.Join(volumeName, IndexDirName) - switch dest.Type { - case "local": - return filepath.ToSlash(filepath.Join(dest.Root, subpath)) - default: - return dest.Name + ":" + path.Join(dest.Root, subpath) - } + return remoteSubpathURI(dest, path.Join(volumeName, IndexDirName)) } -// rotateSnapshots deletes the oldest snapshots in dir until only keep -// remain, mirroring the CLI's `db backup --keep` rotation. Files are -// matched by the index-/pre-migration- prefixes squirrel writes — the -// snapshot-on-sync directory defaults to the same backups/ dir the store -// and CLI use, so both prefixes share the bound. Unknown files are left -// untouched. keep<=0 means "no rotation". +// rotateSnapshots deletes the oldest snapshot-on-sync files in dir until +// only keep remain. Only the index-* files this routine writes are in the +// pool: the snapshot-on-sync directory defaults to the same backups/ dir +// the migration runner writes pre-migration-* snapshots to, and those are +// a buggy migration's only rollback surface — at the default keep=7 a +// sync cadence could rotate one away within days of a schema upgrade, so +// they are exempt here and only an explicit `db backup --keep` retention +// ever removes them. Unknown files are left untouched. keep<=0 means "no +// rotation". 
func rotateSnapshots(dir string, keep int) ([]string, error) { if keep <= 0 { return nil, nil @@ -219,7 +216,7 @@ func rotateSnapshots(dir string, keep int) ([]string, error) { continue } name := e.Name() - if !strings.HasPrefix(name, snapshotPrefix) && !strings.HasPrefix(name, "pre-migration-") { + if !strings.HasPrefix(name, snapshotPrefix) { continue } info, err := e.Info() diff --git a/sync/snapshot_test.go b/sync/snapshot_test.go index 75d9efd..586cfd6 100644 --- a/sync/snapshot_test.go +++ b/sync/snapshot_test.go @@ -290,6 +290,43 @@ func TestLocalRotationBoundsDir(t *testing.T) { } } +// TestRotationExemptsPreMigrationSnapshots is the #112 guard: routine +// snapshot-on-sync rotation must never delete a pre-migration snapshot, +// even when far more than keep sync-time rotations run, because it is a +// buggy migration's only rollback surface. index-* snapshots still rotate. +func TestRotationExemptsPreMigrationSnapshots(t *testing.T) { + dir := t.TempDir() + preMig := "pre-migration-v5-to-v6-20260101T000000.000Z.db" + if err := os.WriteFile(filepath.Join(dir, preMig), []byte("x"), 0o644); err != nil { + t.Fatal(err) + } + // Many more index-* snapshots than keep, written after the pre-migration + // one so by modtime they would sort newer — the pre-migration snapshot + // must survive on prefix, not on age. 
+ for i := 0; i < 5; i++ { + name := fmt.Sprintf("index-2026020%dT000000.000Z-run-%d.db", i, i) + if err := os.WriteFile(filepath.Join(dir, name), []byte("x"), 0o644); err != nil { + t.Fatal(err) + } + } + removed, err := rotateSnapshots(dir, 2) + if err != nil { + t.Fatalf("rotateSnapshots: %v", err) + } + for _, r := range removed { + if strings.HasPrefix(filepath.Base(r), "pre-migration-") { + t.Fatalf("rotation removed a pre-migration snapshot: %s", r) + } + } + if _, err := os.Stat(filepath.Join(dir, preMig)); err != nil { + t.Fatalf("pre-migration snapshot was deleted by sync-time rotation: %v", err) + } + left, _ := filepath.Glob(filepath.Join(dir, "index-*.db")) + if len(left) != 2 { + t.Fatalf("index-* files left = %d, want 2 (keep): %v", len(left), left) + } +} + // TestSyncFiltersOutIndexDirFromSource is the reserved-path guard for // sync: a .squirrel-index dir that incidentally exists in the source // volume must not be uploaded. diff --git a/sync/sync.go b/sync/sync.go index 47d9ace..e00d057 100644 --- a/sync/sync.go +++ b/sync/sync.go @@ -98,6 +98,11 @@ type Report struct { RunID int64 RcloneResult RunResult Status string // success / partial / failed + // Verification is the handler's typed durability report for this + // push: which comparison backed it, what the tool counted, and — + // via Verified() — whether the destination's copy was + // content-verified. Zero for restores and dry runs. + Verification VerifyResult // FinishErr captures a failure to write the runs row's terminal state. // It is independent of rclone success — the bytes may have transferred // correctly but the audit-trail row got stuck in 'running'. Callers @@ -108,6 +113,12 @@ type Report struct { // .squirrel-history directory" so the user knows that content was // silently filtered from the upload. 
Warnings []string + // Fingerprints counts the provider checksums recorded for this run's + // freshly uploaded content-addressed objects (the scan-back + // fingerprint later `squirrel verify` passes compare against). Zero + // for other handler types; objects whose backend exposed no checksum + // surface in Warnings instead. + Fingerprints int64 // NodeReceiverRunID is set on a successful node-sync handshake and // echoed in the CLI output so the operator can join the two halves // of one logical sync against the receiver's `squirrel runs` @@ -138,27 +149,67 @@ type Report struct { // so the operator can distinguish source-side warnings from // receiver-side ones. NodePendingWarnings []string + // DurabilityPull summarises the automatic post-close durability + // metadata pull from the peer. Zero-valued for bucket syncs and + // for node syncs that didn't reach a successful close. Refused + // rewinds are mirrored into Warnings so the CLI surfaces them + // without special-casing this field. + DurabilityPull DurabilityPullReport + // durabilityAdvance is the origin-coordinate snapshot a handler + // captured from the volume's present set before its transfer. + // RunPair advances the destination vector to exactly these + // components after a verified success, so the advance reflects only + // what the push enumerated — never a live set re-read after the + // transfer. Handlers that advance the vector themselves (content- + // addressed, peer) leave it nil. + durabilityAdvance []store.OriginComponent } // RunPair is the single entry point for one sync invocation. It -// dispatches between bucket-destination and node-destination flows -// based on which slot of the Pair is populated. CLI callers use it -// directly so the per-Pair printing loop is a one-liner; the -// per-flavour functions (Sync, SyncNode) remain exported for tests -// and for callers that already have the typed destination in hand. 
+// resolves the curated Handler for the pair's target type and runs its +// Push. CLI callers use it directly so the per-Pair printing loop is a +// one-liner; the per-flavour functions (Sync, SyncNode) remain exported +// for tests and for callers that already have the typed destination in +// hand. // -// Concurrency: both flows allocate the 'running' kind='sync' row via -// store.BeginSyncRunIfClear, which does the check + insert atomically -// inside a BEGIN IMMEDIATE transaction. Two concurrent RunPair calls -// against the same (volume, target) cannot both win — the loser sees -// the winner's row and returns the "already running" diagnostic from -// alreadyRunningErr. Stale 'running' rows from crashed runs keep -// blocking here until cleared by `squirrel runs fail` (#37). -func RunPair(ctx context.Context, s *store.Store, rcl *Rclone, p Pair, opts Options) (Report, error) { - if p.IsNode() { - return SyncNode(ctx, s, rcl, p.Volume, p.Node, opts) +// A verified successful bucket push (BLAKE3 for rclone, repository +// verify for kopia) advances the destination's durability vector; +// peer pushes advance it inside the handshake's close phase instead. +// +// Concurrency: every handler allocates the 'running' kind='sync' row +// via store.BeginSyncRunIfClear, which does the check + insert +// atomically inside a BEGIN IMMEDIATE transaction. Two concurrent +// RunPair calls against the same (volume, target) cannot both win — the +// loser sees the winner's row and returns the "already running" +// diagnostic from alreadyRunningErr. Stale 'running' rows from crashed +// runs keep blocking here until cleared by `squirrel runs fail` (#37). 
+func RunPair(ctx context.Context, s *store.Store, tools Tools, p Pair, opts Options) (Report, error) { + h, err := HandlerFor(s, tools, p) + if err != nil { + rep := Report{Destination: p.TargetName()} + if p.Volume != nil { + rep.Volume = p.Volume.Name + } + return rep, err + } + rep, err := h.Push(ctx, opts) + if err != nil || opts.DryRun || p.IsNode() || rep.FinishErr != nil || + rep.Status != store.RunStatusSuccess || !rep.Verification.Verified() { + return rep, err + } + vol, verr := s.GetVolumeByName(ctx, rep.Volume) + if verr != nil { + return rep, fmt.Errorf("advance durability vector: resolve volume %q: %w", rep.Volume, verr) } - return Sync(ctx, s, rcl, p.Volume, p.Destination, opts) + // A failed advance surfaces as the command's error even though the + // runs row already closed as success: the bytes are on the + // destination, and the next verified push re-advances cheaply. The + // advance reflects the snapshot the handler captured before its + // transfer, tagged with the verification method that backed it. 
+ if aerr := s.AdvanceDestinationVectorTo(ctx, vol.ID, p.TargetName(), rep.Verification.Method, rep.durabilityAdvance); aerr != nil { + return rep, fmt.Errorf("advance durability vector for %s → %s: %w", rep.Volume, p.TargetName(), aerr) + } + return rep, nil } // alreadyRunningErr formats the diagnostic returned when a sync is @@ -184,6 +235,9 @@ func Sync(ctx context.Context, s *store.Store, rcl *Rclone, vol *config.Volume, if w := historyDirInSourceWarning(vol); w != "" { rep.Warnings = append(rep.Warnings, w) } + if w := cryptVerificationWarning(dest, opts.Shallow); w != "" { + rep.Warnings = append(rep.Warnings, w) + } volID, err := requireIndexedVolume(ctx, s, vol) if err != nil { @@ -206,7 +260,7 @@ func Sync(ctx context.Context, s *store.Store, rcl *Rclone, vol *config.Volume, runID, err := beginSyncRunGuarded(ctx, s, opts.DryRun, store.SyncRunSpec{ VolumeID: volID, Destination: dest.Name, - Shallow: opts.Shallow, + Shallow: EffectiveShallow(dest, opts.Shallow), }, vol.Name) if err != nil { return rep, err @@ -215,10 +269,26 @@ func Sync(ctx context.Context, s *store.Store, rcl *Rclone, vol *config.Volume, opts.OnRunID(runID) } + if !opts.DryRun { + if rep.durabilityAdvance, err = captureDurabilityAdvance(ctx, s, volID); err != nil { + // The runs row is already allocated; close it as failed so a + // capture error (context cancel, transient DB) before rclone + // starts cannot leave it stuck in 'running'. 
+ rep.RunID = runID + rep.RcloneResult.FatalError = true + rep.RcloneResult.FailedFiles = []FailedFile{{Message: err.Error()}} + finishRun(ctx, s, opts.DryRun, runID, &rep) + return rep, err + } + } + err = runRcloneOperation(ctx, s, rcl, opts.DryRun, runID, &rep, opts.Progress, func(runID int64) ([]string, error) { return buildRcloneArgs(vol, dest, runID, opts) }) + if !opts.DryRun { + rep.Verification = rcloneVerification(dest, opts, &rep) + } // runRcloneOperation's deferred finishRun has committed the run's // terminal state by now, so the snapshot reflects this run's own row. // Destination syncs are eligible for the cloud ride-along; the @@ -227,6 +297,25 @@ func Sync(ctx context.Context, s *store.Store, rcl *Rclone, vol *config.Volume, return rep, err } +// captureDurabilityAdvance snapshots the volume's present-set origin +// maxima before a transfer begins. RunPair advances the destination +// vector to exactly this snapshot on a verified success, so the advance +// covers only content the push enumerated — the cross-kind run guard +// keeps an index from committing new present rows during the push, and +// pinning to the snapshot keeps even an out-of-band write from being +// folded into the advance. +func captureDurabilityAdvance(ctx context.Context, s *store.Store, volumeID int64) ([]store.OriginComponent, error) { + self, err := s.GetSelfNode(ctx) + if err != nil { + return nil, fmt.Errorf("capture durability advance: self node: %w", err) + } + components, err := s.PresentOriginMaxima(ctx, volumeID, self.ID) + if err != nil { + return nil, fmt.Errorf("capture durability advance: %w", err) + } + return components, nil +} + // beginSyncRunGuarded is the sync-allocator the bucket and peer paths // share. It honours dry-run (returns 0 with no DB write) and delegates // to store.BeginSyncRunIfClear for the atomic gate. A blocked attempt @@ -293,6 +382,13 @@ func runRcloneOperation( // least one success-or-partial index run exists for it. 
Sync of an // unindexed volume is refused: without an index, we have no record of // what should be at the destination after the run. +// +// The DB row's recorded path must equal the config-declared path. A +// handler enumerates the config path's tree while the durability advance +// covers the rows the DB volume holds; if the two paths disagree (a stale +// volumes.path) the push would claim durability for one tree while +// transferring another. Offload and restore already make this +// cross-check; the push handlers share it through this gate. func requireIndexedVolume(ctx context.Context, s *store.Store, vol *config.Volume) (int64, error) { v, err := s.GetVolumeByName(ctx, vol.Name) if err != nil { @@ -301,6 +397,9 @@ func requireIndexedVolume(ctx context.Context, s *store.Store, vol *config.Volum } return 0, fmt.Errorf("lookup volume %q: %w", vol.Name, err) } + if v.Path != vol.Path { + return 0, fmt.Errorf("volume %q is at %q in the DB but config says %q — resolve the conflict before syncing", vol.Name, v.Path, vol.Path) + } if _, err := s.LatestSuccessfulIndexRun(ctx, v.ID); err != nil { if store.IsNotFound(err) { return 0, fmt.Errorf("volume %q has no successful index run — run `squirrel index %s` first", vol.Name, vol.Name) @@ -499,7 +598,8 @@ func buildRcloneArgs(vol *config.Volume, dest *config.Destination, runID int64, // back down on restore). "--filter", "- /" + IndexDirName + "/**", } - if !opts.Shallow { + args = append(args, checkersArgs(dest)...) + if !EffectiveShallow(dest, opts.Shallow) { args = append(args, "--checksum", "--hash", "blake3") } if opts.DryRun { @@ -519,19 +619,27 @@ func withTrailingSlash(p string) string { return p + "/" } -// destinationVolumeURI returns the rclone destination spec for the given -// volume under dest. For type=local this is an absolute filesystem path; -// for other types it is "<name>:<root>/<volume>/". 
-func destinationVolumeURI(dest *config.Destination, volumeName string) string { - switch dest.Type { - case "local": - return filepath.ToSlash(filepath.Join(dest.Root, volumeName)) + "/" +// remoteSubpathURI returns the rclone URI for subpath under dest's root. +// type=local is an absolute filesystem path; a crypt destination is +// addressed through its overlay remote, whose remote line already carries +// the root; plain remotes prefix the root themselves. +func remoteSubpathURI(dest *config.Destination, subpath string) string { + switch { + case dest.Type == "local": + return filepath.ToSlash(filepath.Join(dest.Root, subpath)) + case dest.Crypt != nil: + return dest.CryptRemoteName() + ":" + subpath default: - joined := path.Join(dest.Root, volumeName) - return dest.Name + ":" + joined + "/" + return dest.Name + ":" + path.Join(dest.Root, subpath) } } +// destinationVolumeURI returns the rclone destination spec for the given +// volume under dest. +func destinationVolumeURI(dest *config.Destination, volumeName string) string { + return remoteSubpathURI(dest, volumeName) + "/" +} + // backupDirURI returns the destination spec for rclone's --backup-dir for // this run. Path is per-volume per-run; for dry-run we still pass a // placeholder since rclone insists on the flag if --backup-dir is wanted @@ -541,13 +649,54 @@ func backupDirURI(dest *config.Destination, volumeName string, runID int64, dryR if dryRun || runID == 0 { id = "dry-run" } - subpath := path.Join(volumeName, HistoryDirName, "run-"+id) - switch dest.Type { - case "local": - return filepath.ToSlash(filepath.Join(dest.Root, subpath)) - default: - return dest.Name + ":" + path.Join(dest.Root, subpath) + return remoteSubpathURI(dest, path.Join(volumeName, HistoryDirName, "run-"+id)) +} + +// EffectiveShallow reports whether a transfer to dest runs without BLAKE3 +// verification. 
A crypt destination forces shallow: rclone crypt remotes +// expose no content hashes, so --checksum --hash blake3 cannot pass +// through the overlay and rclone falls back to its size+mtime comparison. +// The result is what the runs row records, keeping the audit trail honest +// about which transfers were content-verified. +func EffectiveShallow(dest *config.Destination, shallow bool) bool { + return shallow || dest.Crypt != nil +} + +// ShallowForPairs reports whether an invocation covering pairs runs +// rclone entirely without BLAKE3 verification: either the operator +// passed --shallow, or every rclone-driven target is a crypt +// destination that forces it. Kopia pairs are skipped — they drive the +// kopia binary, so they put no constraint on rclone. Content-addressed +// pairs are skipped for the same reason: their per-object copyto and +// lsjson calls never pass --hash blake3. Used to scope the rclone +// version preflight to what the run will actually invoke. +func ShallowForPairs(pairs []Pair, shallow bool) bool { + if shallow { + return true } + for _, p := range pairs { + if p.Destination != nil && p.Destination.Type == "kopia" { + continue + } + if p.Destination != nil && p.Destination.Layout == config.LayoutContentAddressed { + continue + } + if p.Destination == nil || p.Destination.Crypt == nil { + return false + } + } + return true +} + +// cryptVerificationWarning returns the advisory for a non-shallow transfer +// to a crypt destination, where EffectiveShallow downgrades verification +// without the operator having asked for --shallow. An explicit --shallow +// run already gets the CLI's shallow warning, so this stays empty then. 
+func cryptVerificationWarning(dest *config.Destination, shallow bool) string { + if dest.Crypt == nil || shallow { + return "" + } + return fmt.Sprintf("destination %q is encrypted (crypt): BLAKE3 verification cannot pass through the crypt overlay — comparing by size+mtime for this run, recorded as shallow", dest.Name) } // EnsureMinVersion checks the installed rclone against MinRcloneVersion. @@ -683,6 +832,15 @@ type RestoreOptions struct { // unless they explicitly intend to restore in place. func Restore(ctx context.Context, s *store.Store, rcl *Rclone, vol *config.Volume, dest *config.Destination, opts RestoreOptions) (rep Report, err error) { rep = Report{Volume: vol.Name, Destination: dest.Name} + if dest.Type == "kopia" { + return rep, fmt.Errorf("destination %q is a kopia repository — restore from it with the kopia CLI (`kopia snapshot restore`)", dest.Name) + } + if dest.Layout == config.LayoutContentAddressed { + return rep, fmt.Errorf("destination %q uses the content-addressed layout — its restore tooling ships separately; the data is recoverable by replaying the manifest segments under %s/%s/ against the destination-root %s/ (see the README's manifest format)", dest.Name, vol.Name, ManifestDirName, ObjectsDirName) + } + if w := cryptVerificationWarning(dest, opts.Shallow); w != "" { + rep.Warnings = append(rep.Warnings, w) + } // "In-place" is the dangerous direction: writing into the live // volume path. 
Unsetting ToPath is the canonical request, but a @@ -734,7 +892,7 @@ func Restore(ctx context.Context, s *store.Store, rcl *Rclone, vol *config.Volum return rep, err } - runID, err := beginRestoreRun(ctx, s, opts.DryRun, v.ID, dest.Name, opts.Shallow) + runID, err := beginRestoreRun(ctx, s, opts.DryRun, v.ID, dest.Name, EffectiveShallow(dest, opts.Shallow)) if err != nil { return rep, err } @@ -830,7 +988,8 @@ func buildRestoreArgs(vol *config.Volume, dest *config.Destination, runID int64, args = append(args, "--filter", "- /"+RestoreHistoryDirName+"/**") args = append(args, "--filter", "- /"+IndexDirName+"/**") } - if !opts.Shallow { + args = append(args, checkersArgs(dest)...) + if !EffectiveShallow(dest, opts.Shallow) { args = append(args, "--checksum", "--hash", "blake3") } if opts.DryRun { diff --git a/sync/sync_test.go b/sync/sync_test.go index e0c77a4..a035564 100644 --- a/sync/sync_test.go +++ b/sync/sync_test.go @@ -108,6 +108,24 @@ func TestSyncRequiresIndexedVolume(t *testing.T) { } } +// TestSyncRefusesOnVolumePathMismatch mirrors the restore and offload +// cross-checks (#114): a DB volumes.path that no longer matches the +// config-declared path makes the push handler refuse, so it cannot push +// one tree while the durability advance covers another. +func TestSyncRefusesOnVolumePathMismatch(t *testing.T) { + f := setupFixture(t) + // Seed the volume row with a path that differs from f.vol.Path so the + // shared requireIndexedVolume gate fails before rclone is invoked. 
+ staleDir := t.TempDir() + if _, err := f.store.CreateVolume(context.Background(), f.vol.Name, staleDir); err != nil { + t.Fatalf("seed stale volume row: %v", err) + } + _, err := Sync(context.Background(), f.store, f.rcl, f.vol, f.dest, Options{}) + if err == nil || !strings.Contains(err.Error(), "resolve the conflict") { + t.Fatalf("expected path-mismatch error, got %v", err) + } +} + func TestSyncHappyPath(t *testing.T) { f := setupFixture(t) if err := os.WriteFile(filepath.Join(f.vol.Path, "a.txt"), []byte("alpha"), 0o644); err != nil { @@ -318,6 +336,39 @@ func TestSyncDryRunPath(t *testing.T) { if _, err := os.Stat(filepath.Join(f.dest.Root, f.vol.Name, "a.txt")); err == nil { t.Fatalf("dry-run wrote to destination; want no-op") } + if rep.Verification.Verified() { + t.Fatalf("dry-run must report an unverified result; got %+v", rep.Verification) + } +} + +// TestSyncHappyPathStampsVerification rides on the happy-path fixture +// to pin the typed durability report a default (BLAKE3) bucket sync +// produces. 
+func TestSyncHappyPathStampsVerification(t *testing.T) { + f := setupFixture(t) + if err := os.WriteFile(filepath.Join(f.vol.Path, "a.txt"), []byte("alpha"), 0o644); err != nil { + t.Fatal(err) + } + f.runIndex(t) + + rep, err := Sync(context.Background(), f.store, f.rcl, f.vol, f.dest, Options{}) + if err != nil { + t.Fatalf("Sync: %v", err) + } + if !rep.Verification.Verified() || rep.Verification.Method != VerifyMethodBlake3 { + t.Fatalf("Verification = %+v, want verified blake3", rep.Verification) + } + if rep.Verification.Files != 1 { + t.Fatalf("Verification.Files = %d, want 1", rep.Verification.Files) + } + + shallowRep, err := Sync(context.Background(), f.store, f.rcl, f.vol, f.dest, Options{Shallow: true}) + if err != nil { + t.Fatalf("shallow Sync: %v", err) + } + if shallowRep.Verification.Verified() || shallowRep.Verification.Method != VerifyMethodSizeMtime { + t.Fatalf("shallow Verification = %+v, want unverified size+mtime", shallowRep.Verification) + } } // TestSyncWarnsAboutHistoryDirInSource exercises the advisory path: a @@ -381,7 +432,7 @@ func TestRunPairRefusesWhenAnotherIsRunning(t *testing.T) { beforeRuns, _ := f.store.ListRuns(context.Background(), store.ListRunsOpts{}) p := Pair{Volume: f.vol, Destination: f.dest} - rep, err := RunPair(context.Background(), f.store, f.rcl, p, Options{}) + rep, err := RunPair(context.Background(), f.store, Tools{Rclone: f.rcl}, p, Options{}) if err == nil { t.Fatalf("expected refusal while a run is in flight; got rep=%+v", rep) } @@ -400,7 +451,7 @@ func TestRunPairRefusesWhenAnotherIsRunning(t *testing.T) { if err := f.store.FinishRun(context.Background(), stuckID, store.RunStatusFailed, "test cleanup", 0); err != nil { t.Fatalf("FinishRun: %v", err) } - if _, err := RunPair(context.Background(), f.store, f.rcl, p, Options{}); err != nil { + if _, err := RunPair(context.Background(), f.store, Tools{Rclone: f.rcl}, p, Options{}); err != nil { t.Fatalf("RunPair after clearing stuck row: %v", err) } } @@ 
-435,7 +486,7 @@ func TestRunPairRefusesConcurrentInvocations(t *testing.T) { go func() { defer wg.Done() <-start - _, err := RunPair(context.Background(), f.store, f.rcl, p, Options{}) + _, err := RunPair(context.Background(), f.store, Tools{Rclone: f.rcl}, p, Options{}) mu.Lock() defer mu.Unlock() switch { @@ -618,6 +669,156 @@ func TestBuildRcloneArgsRefusesZeroRunIDOutsideDryRun(t *testing.T) { } } +// cryptFixtureDest returns a remote destination with a crypt overlay, the +// shape the crypt addressing/verification tests share; these tests stop +// at argument construction. +func cryptFixtureDest() *config.Destination { + return &config.Destination{ + Name: "offsite", + Type: "sftp", + Root: "/data", + Crypt: &config.Crypt{Password: "obscured-pw"}, + } +} + +// TestBuildRcloneArgsCryptAddressing pins the two crypt behaviours of the +// sync args builder: transfers and the backup-dir address the overlay +// remote (whose remote line carries the root, so paths are +// volume-relative), and the BLAKE3 flags are dropped because crypt +// remotes expose no content hashes. 
+func TestBuildRcloneArgsCryptAddressing(t *testing.T) { + vol := &config.Volume{Name: "pics", Path: "/tmp/pics"} + + args, err := buildRcloneArgs(vol, cryptFixtureDest(), 7, Options{}) + if err != nil { + t.Fatalf("buildRcloneArgs: %v", err) + } + joined := strings.Join(args, " ") + if got := args[len(args)-1]; got != "offsite-crypt:pics/" { + t.Fatalf("dst arg = %q, want offsite-crypt:pics/", got) + } + if !strings.Contains(joined, "--backup-dir offsite-crypt:pics/"+HistoryDirName+"/run-7") { + t.Fatalf("backup-dir not addressed through the crypt remote: %s", joined) + } + if strings.Contains(joined, "--checksum") || strings.Contains(joined, "blake3") { + t.Fatalf("BLAKE3 flags passed to a crypt destination: %s", joined) + } + + plain := cryptFixtureDest() + plain.Crypt = nil + plainArgs, err := buildRcloneArgs(vol, plain, 7, Options{}) + if err != nil { + t.Fatalf("buildRcloneArgs (plain): %v", err) + } + plainJoined := strings.Join(plainArgs, " ") + if got := plainArgs[len(plainArgs)-1]; got != "offsite:/data/pics/" { + t.Fatalf("plain dst arg = %q, want offsite:/data/pics/", got) + } + if !strings.Contains(plainJoined, "--checksum --hash blake3") { + t.Fatalf("plain destination lost its BLAKE3 flags: %s", plainJoined) + } +} + +// TestBuildRestoreArgsCryptAddressing mirrors the sync case for the pull +// direction: the source is the crypt remote and the BLAKE3 flags stay off. 
+func TestBuildRestoreArgsCryptAddressing(t *testing.T) { + vol := &config.Volume{Name: "pics", Path: "/tmp/pics"} + args := buildRestoreArgs(vol, cryptFixtureDest(), 3, RestoreOptions{ToPath: "/tmp/scratch"}) + joined := strings.Join(args, " ") + if got := args[len(args)-2]; got != "offsite-crypt:pics/" { + t.Fatalf("src arg = %q, want offsite-crypt:pics/", got) + } + if strings.Contains(joined, "--checksum") || strings.Contains(joined, "blake3") { + t.Fatalf("BLAKE3 flags passed for a crypt source: %s", joined) + } +} + +// TestIndexDirURICrypt: the snapshot ride-along lands inside the encrypted +// tree, addressed through the same overlay as the data transfer. +func TestIndexDirURICrypt(t *testing.T) { + if got := indexDirURI(cryptFixtureDest(), "pics"); got != "offsite-crypt:pics/"+IndexDirName { + t.Fatalf("indexDirURI = %q, want offsite-crypt:pics/%s", got, IndexDirName) + } +} + +// TestEffectiveShallowCrypt pins that a crypt destination downgrades a +// non-shallow request (and that the runs row will say so), while a plain +// destination passes the flag through. +func TestEffectiveShallowCrypt(t *testing.T) { + crypt := cryptFixtureDest() + plain := cryptFixtureDest() + plain.Crypt = nil + cases := []struct { + dest *config.Destination + shallow bool + want bool + }{ + {crypt, false, true}, + {crypt, true, true}, + {plain, false, false}, + {plain, true, true}, + } + for _, c := range cases { + if got := EffectiveShallow(c.dest, c.shallow); got != c.want { + t.Errorf("EffectiveShallow(crypt=%v, shallow=%v) = %v, want %v", + c.dest.Crypt != nil, c.shallow, got, c.want) + } + } +} + +// TestShallowForPairs pins the version-preflight scope: only an +// invocation with at least one blake3-verified target (a plain bucket +// or a peer node) requires the full rclone floor. 
+func TestShallowForPairs(t *testing.T) { + crypt := cryptFixtureDest() + plain := cryptFixtureDest() + plain.Crypt = nil + node := Pair{Node: &config.Node{Name: "peer"}} + kopia := Pair{Destination: &config.Destination{Name: "mirror", Type: "kopia", Root: "/tmp/repo"}} + contentAddressed := Pair{Destination: &config.Destination{Name: "archive", Type: "sftp", Root: "/data", Layout: config.LayoutContentAddressed}} + cases := []struct { + name string + pairs []Pair + shallow bool + want bool + }{ + {"user shallow wins", []Pair{{Destination: plain}}, true, true}, + {"all crypt", []Pair{{Destination: crypt}, {Destination: crypt}}, false, true}, + {"mixed crypt and plain", []Pair{{Destination: crypt}, {Destination: plain}}, false, false}, + {"node target verifies", []Pair{{Destination: crypt}, node}, false, false}, + {"kopia pair puts no constraint on rclone", []Pair{kopia, {Destination: crypt}}, false, true}, + {"kopia beside plain still verifies", []Pair{kopia, {Destination: plain}}, false, false}, + {"content-addressed pair puts no constraint on rclone", []Pair{contentAddressed, {Destination: crypt}}, false, true}, + {"content-addressed beside plain still verifies", []Pair{contentAddressed, {Destination: plain}}, false, false}, + {"no pairs", nil, false, true}, + } + for _, c := range cases { + if got := ShallowForPairs(c.pairs, c.shallow); got != c.want { + t.Errorf("%s: ShallowForPairs = %v, want %v", c.name, got, c.want) + } + } +} + +// TestCryptVerificationWarning: the advisory fires exactly when the +// fallback is implicit — a crypt destination without --shallow. An +// explicit --shallow run already gets the CLI's shallow warning, and a +// plain destination has nothing to warn about. 
+func TestCryptVerificationWarning(t *testing.T) { + crypt := cryptFixtureDest() + w := cryptVerificationWarning(crypt, false) + if !strings.Contains(w, "size+mtime") || !strings.Contains(w, crypt.Name) { + t.Fatalf("warning = %q, want one naming the destination and the size+mtime fallback", w) + } + if w := cryptVerificationWarning(crypt, true); w != "" { + t.Fatalf("warning = %q for an explicit --shallow run, want empty", w) + } + plain := cryptFixtureDest() + plain.Crypt = nil + if w := cryptVerificationWarning(plain, false); w != "" { + t.Fatalf("warning = %q for a plain destination, want empty", w) + } +} + // TestSyncRefusesUninitialisedDestination removes the bootstrap // marker that setupFixture seeded and confirms Sync refuses to run // without --init. This is the threat model: a typo in dest.Root diff --git a/sync/vector_wiring_test.go b/sync/vector_wiring_test.go new file mode 100644 index 0000000..7bca984 --- /dev/null +++ b/sync/vector_wiring_test.go @@ -0,0 +1,115 @@ +package sync + +import ( + "context" + "os" + "path/filepath" + "testing" + + "github.com/mbertschler/squirrel/store" +) + +func volumeComponents(t *testing.T, s *store.Store, volName, dest string) []store.DestinationRunID { + t.Helper() + v, err := s.GetVolumeByName(context.Background(), volName) + if err != nil { + t.Fatalf("GetVolumeByName: %v", err) + } + rows, err := s.ListVolumeDestinationRunIDs(context.Background(), v.ID) + if err != nil { + t.Fatalf("ListVolumeDestinationRunIDs: %v", err) + } + var out []store.DestinationRunID + for _, r := range rows { + if r.Destination == dest { + out = append(out, r) + } + } + return out +} + +// TestRunPairAdvancesVectorOnVerifiedPush: a BLAKE3-verified successful +// bucket push advances the destination's durability vector for the +// volume's origins. 
+func TestRunPairAdvancesVectorOnVerifiedPush(t *testing.T) { + f := setupFixture(t) + if err := os.WriteFile(filepath.Join(f.vol.Path, "a.txt"), []byte("alpha"), 0o644); err != nil { + t.Fatal(err) + } + f.runIndex(t) + + p := Pair{Volume: f.vol, Destination: f.dest} + rep, err := RunPair(context.Background(), f.store, Tools{Rclone: f.rcl}, p, Options{}) + if err != nil { + t.Fatalf("RunPair: %v (rep=%+v)", err, rep) + } + if !rep.Verification.Verified() { + t.Fatalf("Verification = %+v, want verified", rep.Verification) + } + comps := volumeComponents(t, f.store, f.vol.Name, f.dest.Name) + if len(comps) != 1 { + t.Fatalf("components = %+v, want exactly one self component", comps) + } + if comps[0].OriginRunID < 1 { + t.Fatalf("origin_run_id = %d, want >= 1", comps[0].OriginRunID) + } +} + +// TestRunPairShallowPushLeavesVectorAlone: a shallow push is not +// content-verified, so the vector keeps its prior state. +func TestRunPairShallowPushLeavesVectorAlone(t *testing.T) { + f := setupFixture(t) + if err := os.WriteFile(filepath.Join(f.vol.Path, "a.txt"), []byte("alpha"), 0o644); err != nil { + t.Fatal(err) + } + f.runIndex(t) + + p := Pair{Volume: f.vol, Destination: f.dest} + rep, err := RunPair(context.Background(), f.store, Tools{Rclone: f.rcl}, p, Options{Shallow: true}) + if err != nil { + t.Fatalf("RunPair: %v (rep=%+v)", err, rep) + } + if rep.Verification.Verified() { + t.Fatalf("Verification = %+v, want unverified for shallow", rep.Verification) + } + if comps := volumeComponents(t, f.store, f.vol.Name, f.dest.Name); len(comps) != 0 { + t.Fatalf("components = %+v, want none after shallow push", comps) + } +} + +// TestKopiaPushAdvancesVector: a kopia push whose snapshot verify +// succeeds advances the vector exactly like a verified bucket push. 
+func TestKopiaPushAdvancesVector(t *testing.T) { + installFakeKopia(t) + f := setupKopiaFixture(t) + + rep, err := RunPair(context.Background(), f.store, f.tools, f.pair, Options{}) + if err != nil { + t.Fatalf("RunPair: %v (rep=%+v)", err, rep) + } + if rep.Status != store.RunStatusSuccess || !rep.Verification.Verified() { + t.Fatalf("rep = %+v, want verified success backing the advance", rep) + } + comps := volumeComponents(t, f.store, f.pair.Volume.Name, f.pair.Destination.Name) + if len(comps) != 1 { + t.Fatalf("components = %+v, want exactly one self component", comps) + } + if comps[0].OriginRunID < 1 { + t.Fatalf("origin_run_id = %d, want >= 1", comps[0].OriginRunID) + } +} + +// TestKopiaVerifyFailureLeavesVectorAlone: a failed verify means no +// durability claim, so the vector keeps its prior state. +func TestKopiaVerifyFailureLeavesVectorAlone(t *testing.T) { + installFakeKopia(t) + t.Setenv("KOPIA_FAKE_VERIFY_EXIT", "1") + f := setupKopiaFixture(t) + + if _, err := RunPair(context.Background(), f.store, f.tools, f.pair, Options{}); err == nil { + t.Fatalf("RunPair succeeded, want verify failure") + } + if comps := volumeComponents(t, f.store, f.pair.Volume.Name, f.pair.Destination.Name); len(comps) != 0 { + t.Fatalf("components = %+v, want none after failed verify", comps) + } +} diff --git a/sync/verify_boundary_test.go b/sync/verify_boundary_test.go new file mode 100644 index 0000000..89914c0 --- /dev/null +++ b/sync/verify_boundary_test.go @@ -0,0 +1,25 @@ +// Package sync_test pins the handler/hook boundary from the outside: +// VerifyResult's verified flag is unexported, so the strongest claim any +// code outside the sync package — the hook mechanism included — can +// construct is an unverified result. Hooks stay exit-code-only +// (hook.Outcome), and durability reporting stays with the curated +// handlers by construction. 
+package sync_test + +import ( + "testing" + + "github.com/mbertschler/squirrel/sync" +) + +func TestVerifyResultUnmintableOutsidePackage(t *testing.T) { + v := sync.VerifyResult{ + Method: sync.VerifyMethodBlake3, + SnapshotID: "abc", + Files: 100, + Bytes: 1 << 20, + } + if v.Verified() { + t.Fatalf("a VerifyResult built outside the sync package must report unverified") + } +} diff --git a/sync/verify_remote.go b/sync/verify_remote.go new file mode 100644 index 0000000..23f84a2 --- /dev/null +++ b/sync/verify_remote.go @@ -0,0 +1,212 @@ +package sync + +import ( + "context" + "encoding/hex" + "fmt" + "maps" + "slices" + + "github.com/mbertschler/squirrel/config" + "github.com/mbertschler/squirrel/store" +) + +// RemoteVerifyReport summarises one verification pass over a +// destination's recorded content-addressed objects. Every recorded +// object lands in exactly one of Verified, Populated, Pending, Missing, +// or Mismatched. +type RemoteVerifyReport struct { + Destination string + RunID int64 + // Objects is the number of recorded upload rows examined. + Objects int + // Verified objects matched their recorded fingerprint and had + // verified_at_ns stamped. + Verified int + // Populated objects had no fingerprint yet (uploaded before capture + // existed, or a capture that failed) and got one recorded on this + // pass. + Populated int + // Pending objects still have no fingerprint: the backend exposes no + // checksum for them. + Pending int + // Unrecorded counts objects present under the destination's objects/ + // directory with no upload record — orphans of runs that failed + // before recording, harmless without a manifest mapping them. + Unrecorded int + // Missing lists recorded objects (hex hash keys) absent from the + // remote listing. 
+ Missing []string + Mismatched []RemoteObjectMismatch +} + +// RemoteObjectMismatch is one object whose provider checksum no longer +// matches the fingerprint recorded at upload time — potential corruption +// or tampering at the destination. The recorded fingerprint stays +// untouched so the evidence survives the pass. +type RemoteObjectMismatch struct { + Hash string + Algo string + Recorded string + // Actual is the provider's current value, or empty when the remote + // no longer exposes the recorded algo for the object. + Actual string +} + +// Clean reports whether every recorded object was accounted for without +// a mismatch. +func (r RemoteVerifyReport) Clean() bool { + return len(r.Missing) == 0 && len(r.Mismatched) == 0 +} + +// VerifyRemote re-reads the provider checksums of every object recorded +// on dest's underlying remote — one batched `lsjson --hash` over the +// destination-global objects/ directory — and compares them verbatim +// against the fingerprints recorded at upload time. Matches stamp +// verified_at_ns; objects with a pending fingerprint get one recorded; +// mismatches and missing objects land loudly on the report. The pass +// reads destination metadata and updates local verification state only. +// +// The pass is recorded as a kind='audit' run: success when every object +// checked out, partial when objects mismatched or went missing, failed +// when the pass itself aborted. A 'verify-destination' runs_audit entry +// carries the destination name and counters. 
+func VerifyRemote(ctx context.Context, s *store.Store, rcl *Rclone, dest *config.Destination) (RemoteVerifyReport, error) { + rep := RemoteVerifyReport{Destination: dest.Name} + if dest.Layout != config.LayoutContentAddressed { + return rep, fmt.Errorf("destination %q has layout %q — verify covers the recorded objects of content-addressed destinations", dest.Name, dest.Layout) + } + rows, err := s.ListRemoteObjects(ctx, dest.Name) + if err != nil { + return rep, fmt.Errorf("list recorded objects for %q: %w", dest.Name, err) + } + rep.Objects = len(rows) + if len(rows) == 0 { + return rep, nil + } + runID, err := s.BeginRemoteVerifyRun(ctx) + if err != nil { + return rep, fmt.Errorf("record verify run: %w", err) + } + rep.RunID = runID + + verifyErr := verifyRecordedObjects(ctx, s, rcl, dest, rows, &rep) + if err := recordVerifyOutcome(ctx, s, &rep, verifyErr); err != nil { + return rep, err + } + return rep, verifyErr +} + +// verifyRecordedObjects compares the remote listing against the recorded +// rows and applies the per-object outcome to the store and the report. +func verifyRecordedObjects(ctx context.Context, s *store.Store, rcl *Rclone, dest *config.Destination, rows []store.RemoteObjectRecord, rep *RemoteVerifyReport) error { + entries, err := rcl.listHashes(ctx, underlyingObjectsURI(dest), verifyHashTypes(dest, rows), checkersArgs(dest)...) 
+ if err != nil { + return fmt.Errorf("read object checksums from %q: %w", dest.Name, err) + } + byName := make(map[string]map[string]string, len(entries)) + for _, e := range entries { + byName[e.Name] = e.Hashes + } + matched := 0 + for _, row := range rows { + hash := hex.EncodeToString(row.Blake3) + hashes, ok := byName[hash] + if !ok { + rep.Missing = append(rep.Missing, hash) + continue + } + matched++ + if !row.ChecksumAlgo.Valid { + if err := populateFingerprint(ctx, s, dest, row, hashes, rep); err != nil { + return err + } + continue + } + actual := hashes[algoHashType(row.ChecksumAlgo.String)] + if actual == row.Checksum.String { + if err := s.MarkRemoteObjectVerified(ctx, row.ContentID, dest.Name, store.NowNs()); err != nil { + return fmt.Errorf("stamp verification of %s: %w", hash, err) + } + rep.Verified++ + continue + } + rep.Mismatched = append(rep.Mismatched, RemoteObjectMismatch{ + Hash: hash, + Algo: row.ChecksumAlgo.String, + Recorded: row.Checksum.String, + Actual: actual, + }) + } + rep.Unrecorded = len(entries) - matched + return nil +} + +// populateFingerprint records the first fingerprint for a row whose pair +// is still pending; a backend exposing no checksum keeps it pending. +func populateFingerprint(ctx context.Context, s *store.Store, dest *config.Destination, row store.RemoteObjectRecord, hashes map[string]string, rep *RemoteVerifyReport) error { + cs, ok := extractChecksum(dest, hashes) + if !ok { + rep.Pending++ + return nil + } + if err := s.SetRemoteObjectChecksum(ctx, row.ContentID, dest.Name, cs.Algo, cs.Value); err != nil { + return fmt.Errorf("record fingerprint for %s: %w", hex.EncodeToString(row.Blake3), err) + } + rep.Populated++ + return nil +} + +// verifyHashTypes plans the --hash-type set one pass requests: the hash +// names behind every recorded algo, plus the capture set when pending +// rows need a first fingerprint. 
nil requests every hash the backend +// exposes — required when pending rows exist on a backend with no +// configured selection. +func verifyHashTypes(dest *config.Destination, rows []store.RemoteObjectRecord) []string { + set := map[string]bool{} + pending := false + for _, row := range rows { + if row.ChecksumAlgo.Valid { + set[algoHashType(row.ChecksumAlgo.String)] = true + } else { + pending = true + } + } + if pending { + capture := captureHashTypes(dest) + if capture == nil { + return nil + } + for _, t := range capture { + set[t] = true + } + } + return slices.Sorted(maps.Keys(set)) +} + +// recordVerifyOutcome finishes the pass's audit run and appends the +// 'verify-destination' runs_audit entry naming the destination and +// counters. +func recordVerifyOutcome(ctx context.Context, s *store.Store, rep *RemoteVerifyReport, verifyErr error) error { + status := store.RunStatusSuccess + errMsg := "" + switch { + case verifyErr != nil: + status = store.RunStatusFailed + errMsg = verifyErr.Error() + case !rep.Clean(): + status = store.RunStatusPartial + errMsg = fmt.Sprintf("%d object(s) failed verification on %q", len(rep.Missing)+len(rep.Mismatched), rep.Destination) + } + note := fmt.Sprintf("destination=%s objects=%d verified=%d fingerprinted=%d pending=%d mismatched=%d missing=%d unrecorded=%d", + rep.Destination, rep.Objects, rep.Verified, rep.Populated, rep.Pending, len(rep.Mismatched), len(rep.Missing), rep.Unrecorded) + if err := s.AppendRunAudit(ctx, store.RunAuditEntry{ + RunID: rep.RunID, Transition: store.TransitionVerifyDestination, Note: note, + }); err != nil { + return err + } + if err := s.FinishRun(ctx, rep.RunID, status, errMsg, int64(rep.Objects)); err != nil { + return fmt.Errorf("finish verify run %d: %w", rep.RunID, err) + } + return nil +} diff --git a/sync/verify_remote_test.go b/sync/verify_remote_test.go new file mode 100644 index 0000000..68e4a17 --- /dev/null +++ b/sync/verify_remote_test.go @@ -0,0 +1,231 @@ +package sync + 
+import ( + "context" + "os" + "strings" + "testing" + + "github.com/mbertschler/squirrel/config" + "github.com/mbertschler/squirrel/store" +) + +// TestVerifyRemoteMatchStampsVerified: a clean pass stamps every +// recorded object verified and records an audit run whose +// 'verify-destination' note names the destination. +func TestVerifyRemoteMatchStampsVerified(t *testing.T) { + f := setupContentAddressedFixture(t) + f.write(t, "a.txt", "alpha") + f.write(t, "b.txt", "beta") + f.index(t) + if _, err := f.sync(t); err != nil { + t.Fatalf("sync: %v", err) + } + + ctx := context.Background() + rep, err := VerifyRemote(ctx, f.store, f.rcl, f.pair.Destination) + if err != nil { + t.Fatalf("VerifyRemote: %v", err) + } + if rep.Objects != 2 || rep.Verified != 2 || !rep.Clean() { + t.Fatalf("rep = %+v, want 2/2 verified and clean", rep) + } + for _, path := range []string{"a.txt", "b.txt"} { + if obj := f.remoteObject(t, path); !obj.VerifiedAtNs.Valid { + t.Fatalf("object for %s not stamped verified: %+v", path, obj) + } + } + + run, err := f.store.GetRun(ctx, rep.RunID) + if err != nil { + t.Fatalf("GetRun: %v", err) + } + if run.Kind != store.RunKindAudit || run.Status != store.RunStatusSuccess || + run.VolumeID.Valid || run.Destination.Valid || run.FileCount != 2 { + t.Fatalf("run = %+v, want a successful destination-less audit run over 2 objects", run) + } + audits, err := f.store.ListRunAudit(ctx, rep.RunID) + if err != nil { + t.Fatalf("ListRunAudit: %v", err) + } + var note string + for _, a := range audits { + if a.Transition == store.TransitionVerifyDestination { + note = a.Note.String + } + } + if !strings.Contains(note, "destination=offsite") || !strings.Contains(note, "verified=2") { + t.Fatalf("verify-destination note = %q, want destination and counters", note) + } +} + +// TestVerifyRemoteMismatchIsLoudAndPreservesEvidence: a changed provider +// checksum is reported per object, marks the run partial, and leaves +// both the recorded fingerprint and the 
verification stamp untouched. +func TestVerifyRemoteMismatchIsLoudAndPreservesEvidence(t *testing.T) { + f := setupContentAddressedFixture(t) + f.write(t, "a.txt", "alpha") + f.index(t) + if _, err := f.sync(t); err != nil { + t.Fatalf("sync: %v", err) + } + recorded := f.remoteObject(t, "a.txt") + + t.Setenv("RCLONE_FAKE_HASH_PREFIX", "tampered-") + ctx := context.Background() + rep, err := VerifyRemote(ctx, f.store, f.rcl, f.pair.Destination) + if err != nil { + t.Fatalf("VerifyRemote: %v", err) + } + if rep.Clean() || len(rep.Mismatched) != 1 || rep.Verified != 0 { + t.Fatalf("rep = %+v, want exactly one mismatch", rep) + } + m := rep.Mismatched[0] + if m.Hash != blake3Hex("alpha") || m.Algo != "sha256" || + m.Recorded != recorded.Checksum.String || m.Actual != "tampered-"+recorded.Checksum.String { + t.Fatalf("mismatch = %+v, want recorded vs tampered values", m) + } + + after := f.remoteObject(t, "a.txt") + if after.Checksum != recorded.Checksum || after.VerifiedAtNs.Valid { + t.Fatalf("object = %+v after mismatch, want the recorded fingerprint preserved and no stamp", after) + } + run, err := f.store.GetRun(ctx, rep.RunID) + if err != nil { + t.Fatalf("GetRun: %v", err) + } + if run.Status != store.RunStatusPartial || !run.Error.Valid { + t.Fatalf("run = %+v, want partial with an error message", run) + } +} + +// TestVerifyRemotePopulatesPendingPair: objects uploaded without a +// fingerprint get one recorded on the first pass — counted separately, +// not stamped verified — and verify as matches on the next. 
+func TestVerifyRemotePopulatesPendingPair(t *testing.T) { + f := setupContentAddressedFixture(t) + t.Setenv("RCLONE_FAKE_NO_HASHES", "1") + f.write(t, "a.txt", "alpha") + f.index(t) + if _, err := f.sync(t); err != nil { + t.Fatalf("sync: %v", err) + } + + t.Setenv("RCLONE_FAKE_NO_HASHES", "") + ctx := context.Background() + rep, err := VerifyRemote(ctx, f.store, f.rcl, f.pair.Destination) + if err != nil { + t.Fatalf("first VerifyRemote: %v", err) + } + if rep.Populated != 1 || rep.Verified != 0 || !rep.Clean() { + t.Fatalf("rep = %+v, want one populated fingerprint and none verified", rep) + } + obj := f.remoteObject(t, "a.txt") + if obj.ChecksumAlgo.String != "sha256" || obj.VerifiedAtNs.Valid { + t.Fatalf("object = %+v, want a fresh sha256 fingerprint without a stamp", obj) + } + + rep, err = VerifyRemote(ctx, f.store, f.rcl, f.pair.Destination) + if err != nil { + t.Fatalf("second VerifyRemote: %v", err) + } + if rep.Verified != 1 || rep.Populated != 0 { + t.Fatalf("rep = %+v, want the populated fingerprint verified on the second pass", rep) + } +} + +// TestVerifyRemotePendingStaysPendingWithoutChecksums: a backend that +// exposes no checksums keeps the pair pending without failing the pass. +func TestVerifyRemotePendingStaysPendingWithoutChecksums(t *testing.T) { + f := setupContentAddressedFixture(t) + t.Setenv("RCLONE_FAKE_NO_HASHES", "1") + f.write(t, "a.txt", "alpha") + f.index(t) + if _, err := f.sync(t); err != nil { + t.Fatalf("sync: %v", err) + } + + rep, err := VerifyRemote(context.Background(), f.store, f.rcl, f.pair.Destination) + if err != nil { + t.Fatalf("VerifyRemote: %v", err) + } + if rep.Pending != 1 || rep.Populated != 0 || !rep.Clean() { + t.Fatalf("rep = %+v, want one still-pending object on a clean pass", rep) + } +} + +// TestVerifyRemoteMissingObject: a recorded object absent from the +// remote is reported loudly and marks the run partial. 
+func TestVerifyRemoteMissingObject(t *testing.T) { + f := setupContentAddressedFixture(t) + f.write(t, "a.txt", "alpha") + f.index(t) + if _, err := f.sync(t); err != nil { + t.Fatalf("sync: %v", err) + } + if err := os.Remove(f.remotePath(ObjectsDirName, blake3Hex("alpha"))); err != nil { + t.Fatalf("remove remote object: %v", err) + } + + ctx := context.Background() + rep, err := VerifyRemote(ctx, f.store, f.rcl, f.pair.Destination) + if err != nil { + t.Fatalf("VerifyRemote: %v", err) + } + if len(rep.Missing) != 1 || rep.Missing[0] != blake3Hex("alpha") || rep.Clean() { + t.Fatalf("rep = %+v, want the missing object reported", rep) + } + run, err := f.store.GetRun(ctx, rep.RunID) + if err != nil { + t.Fatalf("GetRun: %v", err) + } + if run.Status != store.RunStatusPartial { + t.Fatalf("run status = %q, want partial", run.Status) + } +} + +// TestVerifyRemoteCountsUnrecordedObjects: orphan objects from runs that +// failed before recording are counted, not failed on — they are harmless +// without a manifest mapping them. +func TestVerifyRemoteCountsUnrecordedObjects(t *testing.T) { + f := setupContentAddressedFixture(t) + f.write(t, "a.txt", "alpha") + f.index(t) + if _, err := f.sync(t); err != nil { + t.Fatalf("sync: %v", err) + } + orphan := f.remotePath(ObjectsDirName, strings.Repeat("ab", 32)) + if err := os.WriteFile(orphan, []byte("orphan"), 0o644); err != nil { + t.Fatalf("write orphan: %v", err) + } + + rep, err := VerifyRemote(context.Background(), f.store, f.rcl, f.pair.Destination) + if err != nil { + t.Fatalf("VerifyRemote: %v", err) + } + if rep.Unrecorded != 1 || rep.Verified != 1 || !rep.Clean() { + t.Fatalf("rep = %+v, want one unrecorded orphan on a clean pass", rep) + } +} + +// TestVerifyRemoteNoRecordedObjects: a destination with no upload +// records reports zero objects and writes no run. 
+func TestVerifyRemoteNoRecordedObjects(t *testing.T) { + f := setupContentAddressedFixture(t) + rep, err := VerifyRemote(context.Background(), f.store, f.rcl, f.pair.Destination) + if err != nil { + t.Fatalf("VerifyRemote: %v", err) + } + if rep.Objects != 0 || rep.RunID != 0 { + t.Fatalf("rep = %+v, want no objects and no run", rep) + } +} + +func TestVerifyRemoteRefusesMirrorDestination(t *testing.T) { + f := setupContentAddressedFixture(t) + dest := &config.Destination{Name: "mirror", Type: "sftp", Root: "/data", Layout: config.LayoutMirror} + _, err := VerifyRemote(context.Background(), f.store, f.rcl, dest) + if err == nil || !strings.Contains(err.Error(), "content-addressed") { + t.Fatalf("err = %v, want content-addressed refusal", err) + } +} diff --git a/syncproto/syncproto.go b/syncproto/syncproto.go index 3d9e45f..1ba3656 100644 --- a/syncproto/syncproto.go +++ b/syncproto/syncproto.go @@ -13,6 +13,12 @@ // POST /v1/sync/close initiator finalises; receiver commits the // index updates and advances the watermark // +// One session-less metadata endpoint shares the namespace: +// +// POST /v1/sync/durability returns the receiver's recorded +// destination durability vectors for a +// volume; reads only, writes nothing +// // Wire-level invariants: // // - Paths are always relative to the volume root (matches the @@ -21,6 +27,13 @@ // - BLAKE3 digests travel as 64-char lowercase hex strings (not raw // bytes), so payloads stay JSON-clean and human-readable in // captured traffic. +// - Node identity travels as the node NAME. Numeric node ids are +// local surrogate keys and differ per node; every side maps a +// wire name to its own nodes row. +// - Content origin (origin node name + origin-space run id) is +// carried verbatim across every hop. A forwarder never relabels +// an origin to itself — the durability version-vector math +// depends on origins staying in their introduction coordinates. 
// - Field names are part of the protocol; do not rename them // without bumping the version path. package syncproto @@ -34,21 +47,24 @@ const DispositionAlreadyCorrect = "already-correct" const DispositionTransfer = "transfer" // DispositionSupersede — receiver has a live row at this path with a -// different blake3, *and* the row's provenance traces back to this -// initiator at or before the last shared watermark. The receiver -// pre-moves the prior bytes to `.squirrel-history/run-<id>/` before -// responding. Ordinary supersession. +// different blake3, *and* the row was delivered by this initiator at +// or before the last shared watermark (delivery is judged by the run +// that first materialised the row, not by the content's origin — a +// forwarded origin still supersedes through its forwarder). The +// receiver pre-moves the prior bytes to `.squirrel-history/run-<id>/` +// before responding. Ordinary supersession. const DispositionSupersede = "supersede" // DispositionConflict — receiver has a live row at this path with a -// different blake3 that is not traceable to this initiator's prior -// writes (local write on the receiver, or sourced from a different -// peer post-watermark). The receiver pre-moves the prior bytes to -// `.squirrel-conflicts/run-<id>/<path>` and seeds a new `present` row -// at that path carrying the prior blake3 + prior provenance, so both -// versions remain reachable by hash and by path. The initiator wins -// live: rclone delivers its bytes to the original path and /close -// inserts a new `present` row there with `source_node_id = initiator`. +// different blake3 that this initiator did not deliver (content +// written locally on the receiver, or delivered by a different peer, +// or by this peer past the watermark). 
The receiver pre-moves the +// prior bytes to `.squirrel-conflicts/run-<id>/<path>` and seeds a new +// `present` row at that path carrying the prior blake3 + prior content +// origin, so both versions remain reachable by hash and by path. The +// initiator wins live: rclone delivers its bytes to the original path +// and /close inserts a new `present` row there carrying the entry's +// declared content origin. const DispositionConflict = "conflict" // DispositionCopyFromExisting — receiver has no live row at this path @@ -57,9 +73,10 @@ const DispositionConflict = "conflict" // network, the receiver materialises the new path locally by copying // from `CopyFromPath` (an independent inode — not a hardlink). The // initiator excludes the path from the rclone scope but still verifies -// the post-copy hash and writes a `present` row on /close, with -// `source_node_id = initiator` (the path is logically initiator-owned -// from the receiver's view, identical to a successful Transfer). +// the post-copy hash and writes a `present` row on /close carrying the +// entry's declared content origin (the path is logically +// initiator-owned from the receiver's view, identical to a successful +// Transfer). const DispositionCopyFromExisting = "copy-from-existing" // DedupStrategyCopy enables receiver-side local dedup: when a /plan @@ -101,8 +118,10 @@ type BeginRequest struct { // row fails the handshake. InitiatorNodeName string `json:"initiator_node_name"` // InitiatorEndpoint is the URL the receiver could in theory dial - // back to reach the initiator. Stored on the peer nodes row for - // future symmetry; not used by PR 3. + // back to reach the initiator. The receiver currently ignores it — + // a peer-supplied endpoint must not bind a dial-back URL from + // unauthenticated wire input (#110b) — so it stays only for wire + // compatibility until an operator-verified dial-back path exists. 
InitiatorEndpoint string `json:"initiator_endpoint,omitempty"` // InitiatorRunID is the initiator's *local* runs.id for this // sync. The receiver records it as runs.correlated_run_id and @@ -207,8 +226,8 @@ type PlanRequest struct { Entries []IndexEntry `json:"entries"` } -// IndexEntry is one (path, blake3, size, mtime) tuple sent from the -// initiator's index. mtime_ns travels for symmetry with the schema +// IndexEntry is one (path, blake3, size, mtime, origin) tuple sent from +// the initiator's index. mtime_ns travels for symmetry with the schema // even though the plan decision is by blake3 + path + provenance, not // timestamp. type IndexEntry struct { @@ -216,6 +235,19 @@ type IndexEntry struct { Blake3Hex string `json:"blake3"` SizeBytes int64 `json:"size_bytes"` MtimeNs int64 `json:"mtime_ns"` + // OriginNode and OriginRun are the content's global origin + // coordinate: the NAME of the node where the bytes first entered + // the system and that node's local run id at introduction. The + // sender materialises them for its own content (its node name + + // the content's introduction run) and forwards a stored origin + // verbatim — never relabelled to the immediate sender. The + // receiver records them on its contents row; the destination + // durability vectors are expressed in these coordinates. Both + // fields are set together; an initiator predating the origin + // exchange omits them and the receiver falls back to attributing + // the content to the initiator at its declared sync run. + OriginNode string `json:"origin_node,omitempty"` + OriginRun int64 `json:"origin_run,omitempty"` } // PlanResponse carries the receiver's per-path verdict. @@ -326,6 +358,61 @@ type CloseResponse struct { Committed int `json:"committed"` } +// DurabilityRequest asks the receiver for its recorded destination +// durability vectors for one volume (matched by name, like /begin). 
+// The endpoint is session-less and read-only: it can be called outside +// any sync, which is how a node holds offline evidence about +// destinations only the peer can see. +type DurabilityRequest struct { + Volume string `json:"volume"` +} + +// DurabilityResponse carries every vector component the receiver has +// recorded for the volume, across all of its destinations, plus the +// freshness coordinates of each destination's most recent whole-volume +// push. Empty when the receiver has never advanced a vector for the +// volume. +type DurabilityResponse struct { + Components []DurabilityComponent `json:"components,omitempty"` + Freshness []DurabilityFreshness `json:"freshness,omitempty"` +} + +// DurabilityComponent is one destination-vector component: the highest +// origin-space run of OriginNode's content the responding node has +// verified durable on Destination. Destination names live in the flat +// target namespace shared by buckets and peers; OriginNode is a node +// name (the cross-node identity), and OriginRun is in that node's run +// space. UpdatedAtNs is when the responding node last advanced or +// re-confirmed the component. VerifyMethod names the comparison that +// advanced it on the responding node, carried verbatim so the puller's +// offload gate weighs a pulled component exactly as the responder did +// (empty for a pre-v19 responder, which the gate reads as unverified). +type DurabilityComponent struct { + Destination string `json:"destination"` + OriginNode string `json:"origin_node"` + OriginRun int64 `json:"origin_run"` + UpdatedAtNs int64 `json:"updated_at_ns"` + VerifyMethod string `json:"verify_method,omitempty"` +} + +// DurabilityFreshness is one origin-space freshness coordinate: the +// highest origin run of OriginNode's content the responding node held +// present when it last completed a successful whole-volume push to +// Destination. 
It travels alongside the monotonic vector components so a +// puller that never pushes to Destination directly (a peer-relayed +// offsite) can satisfy its offload gate's freshness condition from the +// pushing node's own determination — the local-run-space push watermark +// is always zero for such a target. Coordinates are in OriginNode's run +// space, the same coordinates the gated content carries. Absent from an +// older responder's reply; a puller that finds no freshness for a +// relayed target refuses to offload (the safe direction). +type DurabilityFreshness struct { + Destination string `json:"destination"` + OriginNode string `json:"origin_node"` + OriginRun int64 `json:"origin_run"` + UpdatedAtNs int64 `json:"updated_at_ns"` +} + // ErrorResponse is the uniform error body. Mirrors agent's // errorResponse type but is exported so client-side decoding can name it. type ErrorResponse struct { diff --git a/tui/browse.go b/tui/browse.go index 2603656..d2759fa 100644 --- a/tui/browse.go +++ b/tui/browse.go @@ -239,11 +239,11 @@ func (m *browseModel) renderFileDetail(f store.FileRow) string { {"First seen", fmt.Sprintf("run #%d", f.FirstSeenRunID)}, {"Last seen", fmt.Sprintf("run #%d", f.LastSeenRunID)}, } - if f.SourceRunID.Valid { - lines = append(lines, [2]string{"Source run", fmt.Sprintf("#%d", f.SourceRunID.Int64)}) + if f.OriginRunID.Valid { + lines = append(lines, [2]string{"Origin run", fmt.Sprintf("#%d", f.OriginRunID.Int64)}) } - if f.SourceNodeID.Valid { - lines = append(lines, [2]string{"Source node", fmt.Sprintf("#%d", f.SourceNodeID.Int64)}) + if f.OriginNodeID.Valid { + lines = append(lines, [2]string{"Origin node", fmt.Sprintf("#%d", f.OriginNodeID.Int64)}) } var b strings.Builder b.WriteString(styleHeader.Render(path.Base(f.Path)))