Add SHA-256 checksums to pack version entries#14
Conversation
The regex-based parse_pack_toml() worked for simple flat fields but would break on multiline strings, inline tables, or other valid TOML constructs. Switch to Python's stdlib tomllib (3.11+) for spec-compliant parsing. Pin Python 3.12 in CI via actions/setup-python to guarantee tomllib availability regardless of runner image updates. Output is byte-for-byte identical to the regex-based parser for all 13 current packs. Closes #5 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add validate_pack() and validate_unique_server_names() that check: - Required fields (name, version, description) - Name format (lowercase, digits, hyphens, no trailing hyphen) - Semver format (X.Y.Z) - authors/keywords are string lists - [[servers]] syntax (not [servers]) - Server names: format, uniqueness within and across packs - Server command/url required based on transport type Validation runs before generation on every invocation. Add --validate-only flag for local/CI use without writing files. Add validate.yml workflow that runs on PRs touching src/** or scripts/generate.py so contributors get feedback before merge. Closes #6 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add deprecation support:
- Read deprecated/deprecated_message from [pack] in pack.toml
- Propagate to packs/{name}.json and index.json
- Print [DEPRECATED] warnings during generation
Add removal support:
- Remove orphaned packs/*.json when src/ directory is deleted
- Ensures removed packs disappear from index.json
Document both workflows in CONTRIBUTING.md.
New fields are safe for Weave — PackMetadata and PackListing structs
use serde defaults and silently ignore unknown fields.
Closes #8
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Compute a sha256-prefixed checksum over each version's files map using the same canonical form as Weave's checksum::compute() — compact sorted JSON with raw UTF-8. Cross-language compatibility verified against Weave's pinned test vectors. Weave already has PackRelease.checksum: Option<String> with full verify() support (warns on None, errors on mismatch). This change populates the field so integrity verification activates automatically. Aligns with Weave roadmap issue #175 (Milestone 6 — security hardening). Closes #9 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR updates the registry generation pipeline to add SHA-256 integrity checksums to each pack release entry, ensuring consumers (e.g., Weave) can verify pack content integrity based on a canonicalized representation of embedded pack files.
Changes:
- Add
checksum: "sha256:..."to each generated pack version entry (computed from a canonical JSON form of the embeddedfilesmap). - Switch pack manifest parsing/validation to
tomlliband add a--validate-onlyCI validation workflow. - Update publishing workflow to use Python 3.12 and extend contributor docs with deprecation/removal guidance.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| scripts/generate.py | Adds checksum computation and integrates TOML parsing + validation + generation flow updates. |
| CONTRIBUTING.md | Documents pack deprecation and removal workflow. |
| .github/workflows/validate.yml | Adds PR-time validation job running generate.py --validate-only on Python 3.12. |
| .github/workflows/publish.yml | Ensures publish job runs with Python 3.12 (needed for tomllib). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| name = pack.get("name", "") | ||
| if name and not _NAME_RE.match(name): | ||
| errors.append( | ||
| f"{pack_dir_name}: invalid pack name '{name}' — " | ||
| "must be lowercase letters, digits, and hyphens only" | ||
| ) | ||
| if name and name != pack_dir_name: | ||
| errors.append( | ||
| f"{pack_dir_name}: pack name '{name}' does not match directory name" | ||
| ) | ||
|
|
||
| version = pack.get("version", "") | ||
| if version and not _SEMVER_RE.match(version): | ||
| errors.append( | ||
| f"{pack_dir_name}: invalid version '{version}' — must be X.Y.Z" | ||
| ) |
There was a problem hiding this comment.
validate_pack() can raise TypeError when name/version are not strings (e.g., name = 123 in TOML). _NAME_RE.match(name) and _SEMVER_RE.match(version) expect str, so this should explicitly validate types and emit a readable validation error instead of crashing.
| seen_server_names: set[str] = set() | ||
| for i, server in enumerate(servers): | ||
| srv_name = server.get("name") | ||
| if not srv_name: | ||
| errors.append(f"{pack_dir_name}: server #{i + 1} is missing 'name'") | ||
| elif not _NAME_RE.match(srv_name): |
There was a problem hiding this comment.
In validate_pack(), the loop assumes each [[servers]] entry is a dict (server.get(...)). If servers contains a non-table value (or a TOML parser returns a non-dict item), this will raise AttributeError and abort validation. Consider checking isinstance(server, dict) and reporting an error like "server #N must be a table" before accessing keys.
| cross_errors = validate_unique_server_names(src_dir) | ||
| all_errors.extend(cross_errors) | ||
| for err in cross_errors: | ||
| print(f" ERROR: {err}", file=sys.stderr) | ||
|
|
||
| if all_errors: | ||
| print(f"\nValidation failed with {len(all_errors)} error(s).", file=sys.stderr) | ||
| sys.exit(1) | ||
|
|
There was a problem hiding this comment.
main() runs validate_unique_server_names() even when earlier per-pack validation has already produced errors. Since validate_unique_server_names() assumes data["servers"] is a list of dicts, an invalid pack (e.g. [servers] table) can cause a crash before you reach the if all_errors: sys.exit(1) block. Either (a) skip cross-pack checks when all_errors is non-empty, or (b) make validate_unique_server_names() robust to non-list / non-dict servers values.
| cross_errors = validate_unique_server_names(src_dir) | |
| all_errors.extend(cross_errors) | |
| for err in cross_errors: | |
| print(f" ERROR: {err}", file=sys.stderr) | |
| if all_errors: | |
| print(f"\nValidation failed with {len(all_errors)} error(s).", file=sys.stderr) | |
| sys.exit(1) | |
| if not all_errors: | |
| cross_errors = validate_unique_server_names(src_dir) | |
| all_errors.extend(cross_errors) | |
| for err in cross_errors: | |
| print(f" ERROR: {err}", file=sys.stderr) | |
| if all_errors: | |
| print(f"\nValidation failed with {len(all_errors)} error(s).", file=sys.stderr) | |
| sys.exit(1) |
| - uses: actions/setup-python@v5 | ||
| with: | ||
| python-version: "3.12" | ||
|
|
||
| - name: Regenerate packs/*.json and index.json | ||
| run: python3 scripts/generate.py |
There was a problem hiding this comment.
publish.yml is still configured to run on pushes to main only when src/** changes. This PR changes scripts/generate.py (checksum generation), so after merge the publish workflow won’t run to regenerate/commit packs/*.json with the new checksum field unless someone also changes src/** or manually dispatches the workflow. Consider adding scripts/generate.py (and possibly .github/workflows/publish.yml) to the on.push.paths list, or removing the paths filter so generator/schema changes get published automatically.
- Add isinstance check for [pack] section (must be a table, not scalar) - Add string type checks for required fields (name, version, description) - Guard server entries with isinstance(server, dict) checks - Harden validate_unique_server_names() against non-list/non-dict servers - Skip cross-pack validation when per-pack errors already exist - Use consistent `python` (not `python3`) in CI workflows - Add scripts/generate.py to publish.yml path triggers Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
compute_checksum()togenerate.pyusing the same canonical form as Weave'schecksum::compute()packs/{name}.jsonnow includes achecksum: "sha256:..."fieldcross_language_checksum_matches_python)Weave compatibility
PackReleasealready haspub checksum: Option<String>with#[serde(default)]checksum::verify()activates when the field is present; warns and skips whenNoneRelated
schema_versionto REGISTRY.md JSON schemas)Test plan
index.jsonunchanged (checksums are only inpacks/*.json)Closes #9
🤖 Generated with Claude Code