Skip to content

Add SHA-256 checksums to pack version entries#14

Open
breferrari wants to merge 5 commits into
mainfrom
feat/checksums
Open

Add SHA-256 checksums to pack version entries#14
breferrari wants to merge 5 commits into
mainfrom
feat/checksums

Conversation

@breferrari
Copy link
Copy Markdown
Contributor

Summary

Weave compatibility

  • PackRelease already has pub checksum: Option<String> with #[serde(default)]
  • Weave's checksum::verify() activates when the field is present; warns and skips when None
  • No Weave code changes needed — checksums start working immediately once this registry change is merged

Related

Test plan

  • All 13 packs generate successfully with checksums
  • index.json unchanged (checksums are only in packs/*.json)
  • Python checksum output matches Weave's Rust pinned test vectors
  • CI passes

Closes #9

🤖 Generated with Claude Code

breferrari and others added 4 commits March 25, 2026 09:54
The regex-based parse_pack_toml() worked for simple flat fields but would
break on multiline strings, inline tables, or other valid TOML constructs.
Switch to Python's stdlib tomllib (3.11+) for spec-compliant parsing.

Pin Python 3.12 in CI via actions/setup-python to guarantee tomllib
availability regardless of runner image updates.

Output is byte-for-byte identical to the regex-based parser for all 13
current packs.

Closes #5

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add validate_pack() and validate_unique_server_names() that check:
- Required fields (name, version, description)
- Name format (lowercase, digits, hyphens, no trailing hyphen)
- Semver format (X.Y.Z)
- authors/keywords are string lists
- [[servers]] syntax (not [servers])
- Server names: format, uniqueness within and across packs
- Server command/url required based on transport type

Validation runs before generation on every invocation. Add
--validate-only flag for local/CI use without writing files.

Add validate.yml workflow that runs on PRs touching src/** or
scripts/generate.py so contributors get feedback before merge.

Closes #6

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add deprecation support:
- Read deprecated/deprecated_message from [pack] in pack.toml
- Propagate to packs/{name}.json and index.json
- Print [DEPRECATED] warnings during generation

Add removal support:
- Remove orphaned packs/*.json when src/ directory is deleted
- Ensures removed packs disappear from index.json

Document both workflows in CONTRIBUTING.md.

New fields are safe for Weave — PackMetadata and PackListing structs
use serde defaults and silently ignore unknown fields.

Closes #8

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Compute a sha256-prefixed checksum over each version's files map using
the same canonical form as Weave's checksum::compute() — compact sorted
JSON with raw UTF-8. Cross-language compatibility verified against
Weave's pinned test vectors.

Weave already has PackRelease.checksum: Option<String> with full
verify() support (warns on None, errors on mismatch). This change
populates the field so integrity verification activates automatically.

Aligns with Weave roadmap issue #175 (Milestone 6 — security hardening).

Closes #9

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 25, 2026 08:58
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the registry generation pipeline to add SHA-256 integrity checksums to each pack release entry, ensuring consumers (e.g., Weave) can verify pack content integrity based on a canonicalized representation of embedded pack files.

Changes:

  • Add checksum: "sha256:..." to each generated pack version entry (computed from a canonical JSON form of the embedded files map).
  • Switch pack manifest parsing/validation to tomllib and add a --validate-only CI validation workflow.
  • Update publishing workflow to use Python 3.12 and extend contributor docs with deprecation/removal guidance.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
scripts/generate.py Adds checksum computation and integrates TOML parsing + validation + generation flow updates.
CONTRIBUTING.md Documents pack deprecation and removal workflow.
.github/workflows/validate.yml Adds PR-time validation job running generate.py --validate-only on Python 3.12.
.github/workflows/publish.yml Ensures publish job runs with Python 3.12 (needed for tomllib).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread scripts/generate.py
Comment on lines +82 to +97
name = pack.get("name", "")
if name and not _NAME_RE.match(name):
errors.append(
f"{pack_dir_name}: invalid pack name '{name}' — "
"must be lowercase letters, digits, and hyphens only"
)
if name and name != pack_dir_name:
errors.append(
f"{pack_dir_name}: pack name '{name}' does not match directory name"
)

version = pack.get("version", "")
if version and not _SEMVER_RE.match(version):
errors.append(
f"{pack_dir_name}: invalid version '{version}' — must be X.Y.Z"
)
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validate_pack() can raise TypeError when name/version are not strings (e.g., name = 123 in TOML). _NAME_RE.match(name) and _SEMVER_RE.match(version) expect str, so this should explicitly validate types and emit a readable validation error instead of crashing.

Copilot uses AI. Check for mistakes.
Comment thread scripts/generate.py
Comment on lines +115 to +120
seen_server_names: set[str] = set()
for i, server in enumerate(servers):
srv_name = server.get("name")
if not srv_name:
errors.append(f"{pack_dir_name}: server #{i + 1} is missing 'name'")
elif not _NAME_RE.match(srv_name):
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In validate_pack(), the loop assumes each [[servers]] entry is a dict (server.get(...)). If servers contains a non-table value (or a TOML parser returns a non-dict item), this will raise AttributeError and abort validation. Consider checking isinstance(server, dict) and reporting an error like "server #N must be a table" before accessing keys.

Copilot uses AI. Check for mistakes.
Comment thread scripts/generate.py Outdated
Comment on lines +375 to +383
cross_errors = validate_unique_server_names(src_dir)
all_errors.extend(cross_errors)
for err in cross_errors:
print(f" ERROR: {err}", file=sys.stderr)

if all_errors:
print(f"\nValidation failed with {len(all_errors)} error(s).", file=sys.stderr)
sys.exit(1)

Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

main() runs validate_unique_server_names() even when earlier per-pack validation has already produced errors. Since validate_unique_server_names() assumes data["servers"] is a list of dicts, an invalid pack (e.g. [servers] table) can cause a crash before you reach the if all_errors: sys.exit(1) block. Either (a) skip cross-pack checks when all_errors is non-empty, or (b) make validate_unique_server_names() robust to non-list / non-dict servers values.

Suggested change
cross_errors = validate_unique_server_names(src_dir)
all_errors.extend(cross_errors)
for err in cross_errors:
print(f" ERROR: {err}", file=sys.stderr)
if all_errors:
print(f"\nValidation failed with {len(all_errors)} error(s).", file=sys.stderr)
sys.exit(1)
if not all_errors:
cross_errors = validate_unique_server_names(src_dir)
all_errors.extend(cross_errors)
for err in cross_errors:
print(f" ERROR: {err}", file=sys.stderr)
if all_errors:
print(f"\nValidation failed with {len(all_errors)} error(s).", file=sys.stderr)
sys.exit(1)

Copilot uses AI. Check for mistakes.
Comment thread .github/workflows/publish.yml Outdated
Comment on lines 22 to 27
- uses: actions/setup-python@v5
with:
python-version: "3.12"

- name: Regenerate packs/*.json and index.json
run: python3 scripts/generate.py
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

publish.yml is still configured to run on pushes to main only when src/** changes. This PR changes scripts/generate.py (checksum generation), so after merge the publish workflow won’t run to regenerate/commit packs/*.json with the new checksum field unless someone also changes src/** or manually dispatches the workflow. Consider adding scripts/generate.py (and possibly .github/workflows/publish.yml) to the on.push.paths list, or removing the paths filter so generator/schema changes get published automatically.

Copilot uses AI. Check for mistakes.
- Add isinstance check for [pack] section (must be a table, not scalar)
- Add string type checks for required fields (name, version, description)
- Guard server entries with isinstance(server, dict) checks
- Harden validate_unique_server_names() against non-list/non-dict servers
- Skip cross-pack validation when per-pack errors already exist
- Use consistent `python` (not `python3`) in CI workflows
- Add scripts/generate.py to publish.yml path triggers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add file checksums to packs/*.json

2 participants