feat(core): add pack content checksum verification on install and update#240
Conversation
Compute and emit a sha256: prefixed checksum for each pack version's files map. The canonical form is compact JSON with sorted keys, matching the weave client's checksum::compute() function. Companion to breferrari/weave#240.
|
Registry companion PR: PackWeave/registry#4 — adds |
There was a problem hiding this comment.
Pull request overview
Adds pack content integrity checking to the core install/update/profile-switch flows by introducing a canonical JSON SHA-256 checksum and plumbing an optional checksum field through registry releases.
Changes:
- Introduce
src/core/checksum.rswith deterministic canonicalization + SHA-256 computation and verification. - Add
checksum: Option<String>toPackRelease(serde default) and aWeaveError::ChecksumMismatcherror. - Invoke checksum verification before writing registry-provided pack files to the local store during install/update/profile switching fetch paths.
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
src/core/checksum.rs |
New canonical JSON hashing + verification logic with unit tests. |
src/core/install.rs |
Verifies registry release checksum prior to dry-run processing and prior to Store::fetch(). |
src/core/update.rs |
Verifies registry release checksum prior to fetching/storing updated versions. |
src/core/use_profile.rs |
Verifies checksum when profile switching needs to fetch a missing pack version from the registry. |
src/core/registry.rs |
Adds checksum field to PackRelease and updates comments/tests accordingly. |
src/error.rs |
Adds ChecksumMismatch error variant and user-facing message. |
src/core/mod.rs |
Exposes the new core::checksum module. |
src/core/store.rs |
Updates test fixtures for the new PackRelease field. |
src/core/resolver.rs |
Updates test helpers to populate checksum: None. |
src/core/publish.rs |
Updates tests constructing PackRelease to include checksum: None. |
Cargo.toml |
Adds sha2 dependency. |
Cargo.lock |
Locks transitive dependencies for sha2. |
| // Only verify algorithms we understand; warn and skip for unknown ones | ||
| // so future algorithm upgrades don't break old clients. | ||
| if !expected.starts_with(SHA256_PREFIX) { | ||
| log::warn!( | ||
| "pack '{pack_name}' v{version} uses unknown checksum algorithm \ | ||
| '{}'; skipping verification", | ||
| expected.split(':').next().unwrap_or("unknown") | ||
| ); | ||
| return Ok(()); | ||
| } |
There was a problem hiding this comment.
[security] verify() treats any checksum string that doesn’t start with the exact "sha256:" prefix as an “unknown algorithm” and silently skips verification. This means malformed/typo’d values like "sha256" (missing colon) or "sha256 :..." will bypass integrity checks instead of failing fast. It would be safer to strictly parse expected as sha256:{64 hex} and return an error (or at least a hard failure) for malformed SHA-256 checksums, reserving warn+skip only for genuinely different algorithms (e.g., blake3:).
| //! release's `files` map (sorted keys, no trailing whitespace). Both the | ||
| //! registry generator (`scripts/generate.py`) and this module use the same | ||
| //! canonical form so hashes match cross-platform. |
There was a problem hiding this comment.
[low-priority] The module docs reference scripts/generate.py, but this repository doesn’t have a scripts/ directory. Consider rewording to point at the companion registry repo (or describe the canonicalization requirements without a path) so the comment doesn’t become misleading/stale.
| //! release's `files` map (sorted keys, no trailing whitespace). Both the | |
| //! registry generator (`scripts/generate.py`) and this module use the same | |
| //! canonical form so hashes match cross-platform. | |
| //! release's `files` map (sorted keys, no trailing whitespace). The registry | |
| //! generator and this module must use the same canonical form so hashes | |
| //! match cross-platform. |
| // Fetch from registry and verify integrity | ||
| let release = registry.fetch_version(name, version)?; | ||
| checksum::verify(name, version, &release.files, release.checksum.as_deref())?; | ||
| Store::fetch(name, &release, Some(source))?; | ||
| Store::load_pack(name, version, Some(source)) |
There was a problem hiding this comment.
[correctness] load_or_fetch_pack() only calls checksum::verify() when a pack is missing and needs to be fetched. When the pack is already present in the store (the common case during weave use / profile switch), integrity verification is skipped entirely, which doesn’t match the PR’s stated behavior (“verification on … use”) and won’t detect local store corruption/tampering. Consider persisting the expected checksum in the lockfile/store and verifying on load, or fetching the release metadata (checksum) and verifying the on-disk pack before apply during profile switching.
- Fix unicode divergence (SEC-5): document that both Python and Rust use raw UTF-8 (ensure_ascii=False); add cross-language non-ASCII test - Add remediation hint to ChecksumMismatch error message (PO-1) - Add debug log on successful verification (PO-2) - Improve unknown algorithm warning with upgrade hint (SEC-1) - Add module-level doc noting checksums verify integrity not authorship (SEC-2) - Optimize hex encoder to single pre-allocated string (ENG-1) - Add tests: empty files map, compact separators, JSON special chars, non-ASCII raw UTF-8, cross-language non-ASCII pinned hash - Update ARCHITECTURE.md: fix "no SHA256 ceremony", add checksum module to map, add Checksum step to data flow, add field to JSON example - Update AGENTS.md: add core/checksum.rs to module map - Update REGISTRY.md: fix stale references, add checksum to examples
The blocked-by step was silently failing because NODE_ID and BLOCKER_ID were set in separate Bash calls — shell variables don't persist across tool invocations. Chain all GraphQL calls with && in a single shell.
The test was flaky because it called acquire() twice in the same process. On some Linux CI runners (overlayfs in containers), flock same-process semantics allowed the second lock to succeed. Fix: open the lock file and hold the exclusive lock directly via fs2, then call acquire() which opens a second file descriptor. Two distinct open-file-descriptions on the same file reliably trigger EWOULDBLOCK.
Summary
weave install,weave update, andweave use(profile switch)src/core/checksum.rsmodule:compute()producessha256:{hex}checksums over canonical sorted JSON of the files map;verify()compares against the registry-provided checksumchecksum: Option<String>field toPackReleasewith#[serde(default)]for backward compatibility with registries that predate checksumsChecksumMismatcherror variant with clear "may have been tampered with" messagingCompanion PR: PackWeave/registry —
generate.pynow computes and emits checksums for all pack versions.Closes #175
Test plan
compute_known_hash— checksum has correct formatcompute_is_deterministic— same input always produces same outputverify_matching_checksum— matching hash passesverify_mismatching_checksum— wrong hash returnsChecksumMismatchverify_none_checksum_ok— absent checksum warns and proceedsverify_unknown_algorithm_ok— unrecognized prefix warns and proceedscanonical_json_sorts_keys— key ordering is deterministiccross_language_checksum_matches_python— Rust and Python produce identical hashes for the same input