feat(release): release control-plane tooling foundation (Task 21)#822
feat(release): release control-plane tooling foundation (Task 21)#822law-chain-hot wants to merge 3 commits into
Conversation
Pure-logic, fully unit-tested building blocks for the standardized release
pipeline — no AWS/registry/CI dependencies, so they verify in isolation:
- manifest: validate release.yaml (WHAT-to-ship contract) + CLI
- artifact_manifest: immutable build-once, digest-pinned artifact record
- preflight: resource-diff / DB-baseline / DNS-collision gates
(DB baseline encodes the 1741/1753 drift incident class)
- ledger: release ledger + last-good-SHA for Cloud rollback
- versions: OSS single-version-source drift check
37 pytest tests, mutation-checked. Workflows that feed these real sst-diff/
DB/build inputs are intentionally out of scope (need CI to verify).
…k 21) - cicd/release_plan.py + release_plan.py: read-only dry-run that runs all pre-apply gates (resource diff / DB baseline / DNS) against a manifest + artifact manifest + evidence, emitting can_apply/blocked_by. Exit 0/1. - versions.py writeback (set_first_version / set_json_version_text) + write_versions.py CLI: --check reports OSS lockstep drift, --write aligns. 13 new tests (50 total). All pure-logic / fixture-verified; running write against the live tree and wiring into CI workflows remain out of scope.
📝 WalkthroughWalkthroughThis PR introduces a complete ChangesRelease Control-Plane Tooling
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 71f1bae595
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if INFRA_TRAINS & set(report["trains"]): | ||
| _merge(report, pf.resource_gate(pf.parse_sst_diff(sst_diff_text), allow_infra_change), "resource") | ||
|
|
||
| if "cloud" in report["trains"] and migrations_in_db is not None and expected_baseline is not None: |
There was a problem hiding this comment.
Require DB evidence before approving cloud plans
When the cloud train is enabled but the caller omits either --migrations or --baseline, this condition skips the DB baseline gate entirely and can_apply can stay true. Since the CLI flags are optional, a CI wiring mistake would silently bypass the drift check that this release plan is meant to enforce; missing baseline evidence for a cloud deploy should block or require an explicit override.
Useful? React with 👍 / 👎.
| if isinstance(data, dict): | ||
| if (data.get("oss") or {}).get("release"): | ||
| out.append("oss") | ||
| if (data.get("cloud") or {}).get("deploy"): |
There was a problem hiding this comment.
Reject non-boolean train flags
If a manifest contains a quoted false such as cloud: {deploy: "false", stage: dev} or any other non-empty string, this truthiness check activates the cloud train and validation passes, so a PR-reviewed release file can request a deploy without setting a boolean true. The train flags should be validated as booleans and enabled only when the value is exactly true.
Useful? React with 👍 / 👎.
| for key in sorted(set(data) - set(TRAINS)): | ||
| errors.append(ManifestError(key, "unknown top-level key (allowed: oss/cloud/runner)")) | ||
|
|
||
| enabled = enabled_trains(data) |
There was a problem hiding this comment.
Guard train sections before deriving enabled trains
When oss, cloud, or runner is a non-mapping YAML value like oss: [bad], validate() calls enabled_trains() before the section type checks below, and enabled_trains() calls .get on that list/string. That makes validate_manifest.py crash instead of returning the intended must be a mapping validation error for malformed release files.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Actionable comments posted: 11
🧹 Nitpick comments (1)
scripts/release/tests/test_artifact_manifest.py (1)
18-48: ⚡ Quick winAdd regression tests for non-string/unhashable
nameandkindinputs.Current tests don’t cover malformed types that can crash
validate(). Add a case wherenameorkindis a list/dict and assert validation returns errors (not exceptions).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/release/tests/test_artifact_manifest.py` around lines 18 - 48, Add new regression test functions to handle malformed types for the name and kind fields. Create test functions test_invalid_name_type_rejected and test_invalid_kind_type_rejected that each use the _artifact() helper to pass non-string/unhashable types (such as lists or dictionaries) for the name and kind parameters respectively, then call am.build() followed by am.validate() and assert that validation returns error messages containing the relevant field names rather than raising exceptions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/release-tooling.yml:
- Around line 27-28: The GitHub Actions workflow file uses mutable version tags
instead of immutable commit SHAs for both the actions/checkout and
actions/setup-python steps, and the checkout step does not disable credential
persistence. Replace the `@v4` tag on the actions/checkout action with a specific
commit SHA, replace the `@v5` tag on the actions/setup-python action with a
specific commit SHA, and add the persist-credentials: false option to the
actions/checkout step to prevent credentials from being available to subsequent
workflow steps.
In `@scripts/release/cicd/artifact_manifest.py`:
- Around line 31-40: The validator is missing a required check for the `schema`
field in the data dictionary, which allows incompatible payloads to pass
validation. Add a validation check for `schema` alongside the existing checks
for `version` and `commit` fields, using `data.get("schema")` and appending an
error message to the errors list if the schema is missing or invalid, similar to
how the other required field validations are implemented.
- Around line 48-56: The validator can crash with TypeError when name or kind
are unhashable types (list/dict) because the membership checks at line 51 (name
in seen) and line 55 (art.get("kind") not in KINDS) don't validate types first.
Add type guards before these membership checks to verify that name and kind are
strings or hashable types, and append an error message for invalid types instead
of allowing the membership operation to proceed and crash the validator.
In `@scripts/release/cicd/manifest.py`:
- Around line 47-64: The toggle fields oss.get("release"), cloud.get("deploy"),
and runner.get("rollout") are evaluated as truthy values, which allows
non-boolean values like the string "false" to be incorrectly treated as enabled.
Add validation checks to ensure these toggle fields are explicitly boolean types
before using them in conditional statements. For each toggle field (in the
sections checking oss, cloud, and runner respectively), validate that the value
is a bool type and only proceed with the deployment checks when the value is
literally True, otherwise append an appropriate ManifestError indicating the
field must be a boolean.
In `@scripts/release/cicd/preflight.py`:
- Around line 99-112: The dns_preflight function performs DNS ownership lookups
using raw FQDN strings without normalization. Since DNS names are
case-insensitive and may include trailing dots, this can cause collision
detection to fail. Normalize the FQDN extracted from rec.get("fqdn") by
converting it to lowercase and removing any trailing dot before using it in the
owned_by_stage.get(fqdn) lookup. Apply this same normalization consistently to
ensure the lookup matches records that may have been stored with different
casing or trailing dot conventions.
In `@scripts/release/cicd/README.md`:
- Around line 58-63: Update the "Not done" section in the README.md status block
to reflect that the CI job for running the release tooling suite has been
implemented in `.github/workflows/release-tooling.yml`. Remove or modify the
last bullet point about the missing `make test:release-tooling` target and CI
job, either by separating the `make` target requirement from the CI workflow
status (if the make target still needs to be created), or by moving that item to
a "Done" section if both components are now complete. Ensure the documentation
accurately represents what has been implemented and what still remains to be
done.
In `@scripts/release/cicd/release_plan.py`:
- Around line 54-61: The conditional gate checks in this block (for
db_baseline_gate and dns_preflight functions) are skipped entirely when their
required inputs are absent (migrations_in_db, expected_baseline, owned_dns, or
stage are None), allowing plans to report can_apply=True even without safety
evidence. Instead of skipping these gates when evidence is missing, ensure that
missing evidence is treated as a gate failure by either invoking the gate
functions with a default failure state or explicitly merging a failure result
when required inputs are absent, so that plans cannot be applied without
complete safety gate evidence.
In `@scripts/release/cicd/versions.py`:
- Around line 14-15: The regex patterns _CARGO_VERSION and _PYPROJECT_VERSION
are section-agnostic and match the first version field anywhere in the TOML
file, which can target the wrong version if dependency or tool blocks appear
before the intended section. Modify these patterns to be section-aware by
matching version fields only within their intended TOML section bodies: for
_CARGO_VERSION, match within the [workspace.package] section, and for
_PYPROJECT_VERSION, match within the [project] section. This requires updating
the regex patterns to include section header context or using multi-line
matching that anchors to the specific section before extracting the version
field. Apply the same section-aware approach to any other version regex patterns
in the file (around lines 19-22 and 35-45) to ensure all version extraction and
replacement operations target the correct fields.
In `@scripts/release/release_plan.py`:
- Around line 26-32: The functions _read_text and _read_json lack exception
handling for file-read and JSON parsing operations. Add try-except blocks around
the Path(p).read_text() call in _read_text and the
json.loads(Path(p).read_text()) call in _read_json to catch FileNotFoundError,
IOError, and json.JSONDecodeError exceptions respectively. When exceptions
occur, either log a controlled error message and return appropriate defaults
(empty string for _read_text, None for _read_json) or raise a custom exception
with a clear error message. Apply the same exception handling pattern to the
similar code present at lines 48-58.
In `@scripts/release/validate_manifest.py`:
- Around line 21-25: The exception handling in the try-except block around
m.load(path) only catches FileNotFoundError, but YAML parsing errors and other
file-read permission errors will cause uncaught tracebacks. Expand the exception
handling to catch additional exception types beyond FileNotFoundError, such as
YAML parsing errors and general exceptions that may occur during m.load(path)
execution, and provide appropriate error messages and controlled exit codes for
each error type instead of allowing tracebacks to propagate.
In `@scripts/release/write_versions.py`:
- Around line 28-30: The _current() function incorrectly routes pyproject files
to cargo_workspace_version instead of the appropriate extractor. Update the
conditional logic in _current() to explicitly handle the "pyproject" kind
separately from "json" and "cargo" kinds, routing pyproject files to the correct
parser (likely json_version based on the comment context). This will ensure
pyproject files are parsed with their proper extractor instead of relying on
incidental regex compatibility between different version extractors.
---
Nitpick comments:
In `@scripts/release/tests/test_artifact_manifest.py`:
- Around line 18-48: Add new regression test functions to handle malformed types
for the name and kind fields. Create test functions
test_invalid_name_type_rejected and test_invalid_kind_type_rejected that each
use the _artifact() helper to pass non-string/unhashable types (such as lists or
dictionaries) for the name and kind parameters respectively, then call
am.build() followed by am.validate() and assert that validation returns error
messages containing the relevant field names rather than raising exceptions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: ed6a1923-4cfa-40b0-b741-1ddd4686246d
📒 Files selected for processing (21)
.github/workflows/release-tooling.ymlscripts/release/cicd/README.mdscripts/release/cicd/__init__.pyscripts/release/cicd/artifact_manifest.pyscripts/release/cicd/ledger.pyscripts/release/cicd/manifest.pyscripts/release/cicd/preflight.pyscripts/release/cicd/release_plan.pyscripts/release/cicd/versions.pyscripts/release/conftest.pyscripts/release/release.example.yamlscripts/release/release_plan.pyscripts/release/tests/test_artifact_manifest.pyscripts/release/tests/test_ledger.pyscripts/release/tests/test_manifest.pyscripts/release/tests/test_preflight.pyscripts/release/tests/test_release_plan.pyscripts/release/tests/test_versions.pyscripts/release/tests/test_versions_writeback.pyscripts/release/validate_manifest.pyscripts/release/write_versions.py
| - uses: actions/checkout@v4 | ||
| - uses: actions/setup-python@v5 |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '24,32p' .github/workflows/release-tooling.yml
rg -n 'uses:\s*actions/(checkout|setup-python)@' .github/workflows/release-tooling.yml
rg -n 'persist-credentials:' .github/workflows/release-tooling.yml || trueRepository: boxlite-ai/boxlite
Length of output: 431
Pin the GitHub Actions to immutable SHAs and disable checkout credentials.
Lines 27-28 use mutable version tags (@v4, @v5) for both actions/checkout and actions/setup-python. Additionally, checkout does not set persist-credentials: false, leaving credentials available to subsequent steps. Pin both actions to specific commit SHAs and add persist-credentials: false to the checkout step.
🧰 Tools
🪛 zizmor (1.25.2)
[warning] 27-27: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false
(artipacked)
[error] 27-27: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)
(unpinned-uses)
[error] 28-28: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)
(unpinned-uses)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/workflows/release-tooling.yml around lines 27 - 28, The GitHub
Actions workflow file uses mutable version tags instead of immutable commit SHAs
for both the actions/checkout and actions/setup-python steps, and the checkout
step does not disable credential persistence. Replace the `@v4` tag on the
actions/checkout action with a specific commit SHA, replace the `@v5` tag on the
actions/setup-python action with a specific commit SHA, and add the
persist-credentials: false option to the actions/checkout step to prevent
credentials from being available to subsequent workflow steps.
Source: Linters/SAST tools
| errors: list[str] = [] | ||
| if not data.get("version"): | ||
| errors.append("version: required") | ||
| if not data.get("commit"): | ||
| errors.append("commit: required") | ||
|
|
||
| artifacts = data.get("artifacts") | ||
| if not isinstance(artifacts, list) or not artifacts: | ||
| errors.append("artifacts: must be a non-empty list") | ||
| return errors |
There was a problem hiding this comment.
Enforce schema in validator to preserve manifest compatibility contract.
Line 31 onward validates required top-level fields but omits schema, so incompatible payloads can pass validation and still be trusted downstream.
Suggested patch
def validate(data) -> list[str]:
@@
errors: list[str] = []
+ if data.get("schema") != 1:
+ errors.append("schema: required and must equal 1")
if not data.get("version"):
errors.append("version: required")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| errors: list[str] = [] | |
| if not data.get("version"): | |
| errors.append("version: required") | |
| if not data.get("commit"): | |
| errors.append("commit: required") | |
| artifacts = data.get("artifacts") | |
| if not isinstance(artifacts, list) or not artifacts: | |
| errors.append("artifacts: must be a non-empty list") | |
| return errors | |
| errors: list[str] = [] | |
| if data.get("schema") != 1: | |
| errors.append("schema: required and must equal 1") | |
| if not data.get("version"): | |
| errors.append("version: required") | |
| if not data.get("commit"): | |
| errors.append("commit: required") | |
| artifacts = data.get("artifacts") | |
| if not isinstance(artifacts, list) or not artifacts: | |
| errors.append("artifacts: must be a non-empty list") | |
| return errors |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/release/cicd/artifact_manifest.py` around lines 31 - 40, The
validator is missing a required check for the `schema` field in the data
dictionary, which allows incompatible payloads to pass validation. Add a
validation check for `schema` alongside the existing checks for `version` and
`commit` fields, using `data.get("schema")` and appending an error message to
the errors list if the schema is missing or invalid, similar to how the other
required field validations are implemented.
| name = art.get("name") | ||
| if not name: | ||
| errors.append(f"{where}.name: required") | ||
| elif name in seen: | ||
| errors.append(f"{where}.name: duplicate {name!r}") | ||
| else: | ||
| seen.add(name) | ||
| if art.get("kind") not in KINDS: | ||
| errors.append(f"{where}.kind: must be one of {sorted(KINDS)} (got {art.get('kind')!r})") |
There was a problem hiding this comment.
Guard name/kind types before set-membership checks to avoid validator crashes.
Line 51 and Line 55 can raise TypeError when name or kind is unhashable (e.g., list/dict), causing validate() to crash on malformed input instead of returning errors.
Suggested patch
name = art.get("name")
- if not name:
+ if not isinstance(name, str) or not name:
errors.append(f"{where}.name: required")
- elif name in seen:
+ elif name in seen:
errors.append(f"{where}.name: duplicate {name!r}")
else:
seen.add(name)
- if art.get("kind") not in KINDS:
- errors.append(f"{where}.kind: must be one of {sorted(KINDS)} (got {art.get('kind')!r})")
+ kind = art.get("kind")
+ if not isinstance(kind, str) or kind not in KINDS:
+ errors.append(f"{where}.kind: must be one of {sorted(KINDS)} (got {kind!r})")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/release/cicd/artifact_manifest.py` around lines 48 - 56, The
validator can crash with TypeError when name or kind are unhashable types
(list/dict) because the membership checks at line 51 (name in seen) and line 55
(art.get("kind") not in KINDS) don't validate types first. Add type guards
before these membership checks to verify that name and kind are strings or
hashable types, and append an error message for invalid types instead of
allowing the membership operation to proceed and crash the validator.
| elif oss.get("release"): | ||
| version = oss.get("version") | ||
| if not version: | ||
| errors.append(ManifestError("oss.version", "required when oss.release is true")) | ||
| elif not SEMVER_RE.match(str(version)): | ||
| errors.append(ManifestError("oss.version", f"not a valid semver: {version!r}")) | ||
|
|
||
| cloud = data.get("cloud") or {} | ||
| if not isinstance(cloud, dict): | ||
| errors.append(ManifestError("cloud", "must be a mapping")) | ||
| elif cloud.get("deploy"): | ||
| errors += _check_stage("cloud.stage", cloud.get("stage")) | ||
|
|
||
| runner = data.get("runner") or {} | ||
| if not isinstance(runner, dict): | ||
| errors.append(ManifestError("runner", "must be a mapping")) | ||
| elif runner.get("rollout"): | ||
| errors += _check_stage("runner.stage", runner.get("stage")) |
There was a problem hiding this comment.
Enforce boolean types for train toggles to avoid accidental enablement.
At Line 47/57/63 and Line 89-94, truthy non-boolean values (for example "false") are treated as enabled. That can unintentionally activate trains and produce incorrect plan outcomes. Validate toggle fields as booleans and only treat literal True as enabled.
Suggested fix
@@
- elif oss.get("release"):
+ else:
+ release = oss.get("release")
+ if release is not None and not isinstance(release, bool):
+ errors.append(ManifestError("oss.release", "must be a boolean"))
+ elif release is True:
version = oss.get("version")
if not version:
errors.append(ManifestError("oss.version", "required when oss.release is true"))
elif not SEMVER_RE.match(str(version)):
errors.append(ManifestError("oss.version", f"not a valid semver: {version!r}"))
@@
- elif cloud.get("deploy"):
- errors += _check_stage("cloud.stage", cloud.get("stage"))
+ else:
+ deploy = cloud.get("deploy")
+ if deploy is not None and not isinstance(deploy, bool):
+ errors.append(ManifestError("cloud.deploy", "must be a boolean"))
+ elif deploy is True:
+ errors += _check_stage("cloud.stage", cloud.get("stage"))
@@
- elif runner.get("rollout"):
- errors += _check_stage("runner.stage", runner.get("stage"))
+ else:
+ rollout = runner.get("rollout")
+ if rollout is not None and not isinstance(rollout, bool):
+ errors.append(ManifestError("runner.rollout", "must be a boolean"))
+ elif rollout is True:
+ errors += _check_stage("runner.stage", runner.get("stage"))
@@
- if (data.get("oss") or {}).get("release"):
+ if (data.get("oss") or {}).get("release") is True:
out.append("oss")
- if (data.get("cloud") or {}).get("deploy"):
+ if (data.get("cloud") or {}).get("deploy") is True:
out.append("cloud")
- if (data.get("runner") or {}).get("rollout"):
+ if (data.get("runner") or {}).get("rollout") is True:
out.append("runner")Also applies to: 85-94
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/release/cicd/manifest.py` around lines 47 - 64, The toggle fields
oss.get("release"), cloud.get("deploy"), and runner.get("rollout") are evaluated
as truthy values, which allows non-boolean values like the string "false" to be
incorrectly treated as enabled. Add validation checks to ensure these toggle
fields are explicitly boolean types before using them in conditional statements.
For each toggle field (in the sections checking oss, cloud, and runner
respectively), validate that the value is a bool type and only proceed with the
deployment checks when the value is literally True, otherwise append an
appropriate ManifestError indicating the field must be a boolean.
| def dns_preflight(planned_records: list[dict], owned_by_stage: dict[str, str], stage: str) -> Verdict: | ||
| """Block when a planned DNS record's fqdn is already owned by another stage. | ||
|
|
||
| planned_records : [{"fqdn": "api.dev.boxlite.ai"}, ...] | ||
| owned_by_stage : {"api.dev.boxlite.ai": "dev", ...} (existing ownership) | ||
| stage : the stage we are deploying | ||
| """ | ||
| v = Verdict() | ||
| for rec in planned_records: | ||
| fqdn = rec.get("fqdn") | ||
| owner = owned_by_stage.get(fqdn) | ||
| if owner and owner != stage: | ||
| v.block(f"DNS record {fqdn} already owned by stage {owner!r} (deploying {stage!r})") | ||
| return v |
There was a problem hiding this comment.
Normalize FQDNs before DNS ownership lookup.
At Line 108-110, lookup is raw-string based. DNS names are case-insensitive (and often appear with/without trailing .), so a mixed-case or dotted input can bypass collision detection.
Suggested fix
@@
def dns_preflight(planned_records: list[dict], owned_by_stage: dict[str, str], stage: str) -> Verdict:
@@
v = Verdict()
+ def _norm_fqdn(value):
+ if not isinstance(value, str):
+ return None
+ return value.rstrip(".").lower()
+
+ normalized_owned = {
+ _norm_fqdn(fqdn): owner
+ for fqdn, owner in owned_by_stage.items()
+ if _norm_fqdn(fqdn)
+ }
+
for rec in planned_records:
- fqdn = rec.get("fqdn")
- owner = owned_by_stage.get(fqdn)
+ fqdn = _norm_fqdn(rec.get("fqdn"))
+ owner = normalized_owned.get(fqdn)
if owner and owner != stage:
v.block(f"DNS record {fqdn} already owned by stage {owner!r} (deploying {stage!r})")
return v📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def dns_preflight(planned_records: list[dict], owned_by_stage: dict[str, str], stage: str) -> Verdict: | |
| """Block when a planned DNS record's fqdn is already owned by another stage. | |
| planned_records : [{"fqdn": "api.dev.boxlite.ai"}, ...] | |
| owned_by_stage : {"api.dev.boxlite.ai": "dev", ...} (existing ownership) | |
| stage : the stage we are deploying | |
| """ | |
| v = Verdict() | |
| for rec in planned_records: | |
| fqdn = rec.get("fqdn") | |
| owner = owned_by_stage.get(fqdn) | |
| if owner and owner != stage: | |
| v.block(f"DNS record {fqdn} already owned by stage {owner!r} (deploying {stage!r})") | |
| return v | |
| def dns_preflight(planned_records: list[dict], owned_by_stage: dict[str, str], stage: str) -> Verdict: | |
| """Block when a planned DNS record's fqdn is already owned by another stage. | |
| planned_records : [{"fqdn": "api.dev.boxlite.ai"}, ...] | |
| owned_by_stage : {"api.dev.boxlite.ai": "dev", ...} (existing ownership) | |
| stage : the stage we are deploying | |
| """ | |
| v = Verdict() | |
| def _norm_fqdn(value): | |
| if not isinstance(value, str): | |
| return None | |
| return value.rstrip(".").lower() | |
| normalized_owned = { | |
| _norm_fqdn(fqdn): owner | |
| for fqdn, owner in owned_by_stage.items() | |
| if _norm_fqdn(fqdn) | |
| } | |
| for rec in planned_records: | |
| fqdn = _norm_fqdn(rec.get("fqdn")) | |
| owner = normalized_owned.get(fqdn) | |
| if owner and owner != stage: | |
| v.block(f"DNS record {fqdn} already owned by stage {owner!r} (deploying {stage!r})") | |
| return v |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/release/cicd/preflight.py` around lines 99 - 112, The dns_preflight
function performs DNS ownership lookups using raw FQDN strings without
normalization. Since DNS names are case-insensitive and may include trailing
dots, this can cause collision detection to fail. Normalize the FQDN extracted
from rec.get("fqdn") by converting it to lowercase and removing any trailing dot
before using it in the owned_by_stage.get(fqdn) lookup. Apply this same
normalization consistently to ensure the lookup matches records that may have
been stored with different casing or trailing dot conventions.
| if INFRA_TRAINS & set(report["trains"]): | ||
| _merge(report, pf.resource_gate(pf.parse_sst_diff(sst_diff_text), allow_infra_change), "resource") | ||
|
|
||
| if "cloud" in report["trains"] and migrations_in_db is not None and expected_baseline is not None: | ||
| _merge(report, pf.db_baseline_gate(migrations_in_db, expected_baseline), "db-baseline") | ||
|
|
||
| if planned_dns and owned_dns is not None and stage: | ||
| _merge(report, pf.dns_preflight(planned_dns, owned_dns, stage), "dns") |
There was a problem hiding this comment.
Prevent fail-open plans when gate evidence is missing.
At Line 54-61, infra/cloud checks are conditionally skipped if evidence inputs are absent. That allows can_apply=True for infra trains with no diff/baseline evidence, which defeats the safety gates.
Suggested fix
@@
if INFRA_TRAINS & set(report["trains"]):
- _merge(report, pf.resource_gate(pf.parse_sst_diff(sst_diff_text), allow_infra_change), "resource")
+ if not sst_diff_text.strip():
+ report["can_apply"] = False
+ report["blocked_by"].append("resource: missing sst diff evidence")
+ else:
+ _merge(report, pf.resource_gate(pf.parse_sst_diff(sst_diff_text), allow_infra_change), "resource")
if "cloud" in report["trains"] and migrations_in_db is not None and expected_baseline is not None:
_merge(report, pf.db_baseline_gate(migrations_in_db, expected_baseline), "db-baseline")
+ elif "cloud" in report["trains"]:
+ report["can_apply"] = False
+ report["blocked_by"].append("db-baseline: missing migrations/baseline evidence")
- if planned_dns and owned_dns is not None and stage:
+ if planned_dns and owned_dns is not None and stage:
_merge(report, pf.dns_preflight(planned_dns, owned_dns, stage), "dns")
+ elif planned_dns and "cloud" in report["trains"]:
+ report["can_apply"] = False
+ report["blocked_by"].append("dns: missing ownership evidence or stage")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/release/cicd/release_plan.py` around lines 54 - 61, The conditional
gate checks in this block (for db_baseline_gate and dns_preflight functions) are
skipped entirely when their required inputs are absent (migrations_in_db,
expected_baseline, owned_dns, or stage are None), allowing plans to report
can_apply=True even without safety evidence. Instead of skipping these gates
when evidence is missing, ensure that missing evidence is treated as a gate
failure by either invoking the gate functions with a default failure state or
explicitly merging a failure result when required inputs are absent, so that
plans cannot be applied without complete safety gate evidence.
| _CARGO_VERSION = re.compile(r'(?m)^\s*version\s*=\s*"([^"]+)"') | ||
| _PYPROJECT_VERSION = re.compile(r'(?m)^\s*version\s*=\s*"([^"]+)"') |
There was a problem hiding this comment.
Section-agnostic TOML matching can rewrite/read the wrong version key.
The current regexes pick the first version = "..." anywhere in the file. If a dependency/tool block appears before [workspace.package] or [project], drift checks and writeback can target the wrong field.
Suggested direction
- _CARGO_VERSION = re.compile(r'(?m)^\s*version\s*=\s*"([^"]+)"')
- _PYPROJECT_VERSION = re.compile(r'(?m)^\s*version\s*=\s*"([^"]+)"')
+ _TOML_VERSION = re.compile(r'(?m)^\s*version\s*=\s*"([^"]+)"')
+ _CARGO_SECTION = re.compile(r'(?ms)^\s*\[(?:workspace\.package|package)\]\s*(.*?)(?=^\s*\[|\Z)')
+ _PYPROJECT_SECTION = re.compile(r'(?ms)^\s*\[project\]\s*(.*?)(?=^\s*\[|\Z)')Then extract/replace version only inside the intended section body.
Also applies to: 19-22, 35-45
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/release/cicd/versions.py` around lines 14 - 15, The regex patterns
_CARGO_VERSION and _PYPROJECT_VERSION are section-agnostic and match the first
version field anywhere in the TOML file, which can target the wrong version if
dependency or tool blocks appear before the intended section. Modify these
patterns to be section-aware by matching version fields only within their
intended TOML section bodies: for _CARGO_VERSION, match within the
[workspace.package] section, and for _PYPROJECT_VERSION, match within the
[project] section. This requires updating the regex patterns to include section
header context or using multi-line matching that anchors to the specific section
before extracting the version field. Apply the same section-aware approach to
any other version regex patterns in the file (around lines 19-22 and 35-45) to
ensure all version extraction and replacement operations target the correct
fields.
| def _read_text(p): | ||
| return Path(p).read_text() if p else "" | ||
|
|
||
|
|
||
| def _read_json(p): | ||
| return json.loads(Path(p).read_text()) if p else None | ||
|
|
There was a problem hiding this comment.
Return deterministic CLI errors for unreadable/invalid fixture inputs.
At Line 26-32 and Line 48-58, file-read and JSON parse failures are uncaught, so the CLI can terminate with a traceback instead of a controlled error message and exit code.
Suggested fix
@@
def main(argv: list[str]) -> int:
@@
- report = plan(
- m.load(args.manifest),
- artifact_manifest_data=_read_json(args.artifact_manifest),
- sst_diff_text=_read_text(args.sst_diff),
- migrations_in_db=_read_json(args.migrations),
- expected_baseline=_read_json(args.baseline),
- planned_dns=_read_json(args.dns_planned),
- owned_dns=_read_json(args.dns_owned),
- stage=args.stage,
- allow_infra_change=args.allow_infra_change,
- )
+ try:
+ report = plan(
+ m.load(args.manifest),
+ artifact_manifest_data=_read_json(args.artifact_manifest),
+ sst_diff_text=_read_text(args.sst_diff),
+ migrations_in_db=_read_json(args.migrations),
+ expected_baseline=_read_json(args.baseline),
+ planned_dns=_read_json(args.dns_planned),
+ owned_dns=_read_json(args.dns_owned),
+ stage=args.stage,
+ allow_infra_change=args.allow_infra_change,
+ )
+ except (OSError, json.JSONDecodeError) as exc:
+ print(f"error: {exc}", file=sys.stderr)
+ return 2Also applies to: 48-58
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/release/release_plan.py` around lines 26 - 32, The functions
_read_text and _read_json lack exception handling for file-read and JSON parsing
operations. Add try-except blocks around the Path(p).read_text() call in
_read_text and the json.loads(Path(p).read_text()) call in _read_json to catch
FileNotFoundError, IOError, and json.JSONDecodeError exceptions respectively.
When exceptions occur, either log a controlled error message and return
appropriate defaults (empty string for _read_text, None for _read_json) or raise
a custom exception with a clear error message. Apply the same exception handling
pattern to the similar code present at lines 48-58.
| try: | ||
| data = m.load(path) | ||
| except FileNotFoundError: | ||
| print(f"error: file not found: {path}", file=sys.stderr) | ||
| return 2 |
There was a problem hiding this comment.
Handle YAML parse and general file-read failures without uncaught traceback.
At Line 22-25, only FileNotFoundError is caught. Invalid YAML or permission/read errors will currently crash the CLI with a traceback instead of returning a controlled exit code.
Suggested fix
@@
import sys
from pathlib import Path
+import yaml
@@
try:
data = m.load(path)
- except FileNotFoundError:
+ except FileNotFoundError:
print(f"error: file not found: {path}", file=sys.stderr)
return 2
+ except yaml.YAMLError as exc:
+ print(f"error: invalid YAML in {path}: {exc}", file=sys.stderr)
+ return 1
+ except OSError as exc:
+ print(f"error: unable to read {path}: {exc}", file=sys.stderr)
+ return 2🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/release/validate_manifest.py` around lines 21 - 25, The exception
handling in the try-except block around m.load(path) only catches
FileNotFoundError, but YAML parsing errors and other file-read permission errors
will cause uncaught tracebacks. Expand the exception handling to catch
additional exception types beyond FileNotFoundError, such as YAML parsing errors
and general exceptions that may occur during m.load(path) execution, and provide
appropriate error messages and controlled exit codes for each error type instead
of allowing tracebacks to propagate.
| def _current(kind: str, text: str): | ||
| return v.json_version(text) if kind == "json" else v.cargo_workspace_version(text) | ||
|
|
There was a problem hiding this comment.
pyproject files are parsed with the wrong extractor.
_current() sends pyproject to cargo_workspace_version, which is a logic error and couples correctness to incidental regex parity.
Proposed fix
def _current(kind: str, text: str):
- return v.json_version(text) if kind == "json" else v.cargo_workspace_version(text)
+ if kind == "json":
+ return v.json_version(text)
+ if kind == "pyproject":
+ return v.pyproject_version(text)
+ return v.cargo_workspace_version(text)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/release/write_versions.py` around lines 28 - 30, The _current()
function incorrectly routes pyproject files to cargo_workspace_version instead
of the appropriate extractor. Update the conditional logic in _current() to
explicitly handle the "pyproject" kind separately from "json" and "cargo" kinds,
routing pyproject files to the correct parser (likely json_version based on the
comment context). This will ensure pyproject files are parsed with their proper
extractor instead of relying on incidental regex compatibility between different
version extractors.
What
Foundation for the standardized release pipeline (Task 21): the pure-logic
control-plane layer that future release/deploy workflows will call. All under a
new isolated dir
scripts/release/— no existing files touched, no AWS /registry / CI dependencies, so every piece is unit-tested in isolation.
manifest.py+validate_manifest.pyrelease.yaml(the WHAT-to-ship contract)artifact_manifest.pypreflight.py+release_plan.pyledger.pylast_good_shafor rollbackversions.py+write_versions.pyThe DB baseline gate encodes the
1741/1753migration-drift incident class(divergent branch / "recorded baseline but table not built"); the DNS gate
blocks cross-stage record collisions; the resource gate blocks replace/delete
on Runner/RDS/Redis unless explicitly overridden.
Verification
Release ToolingCI workflow.runner replacediff →release_planexits 1 (blocked);--allow-infra-change→ exits 0 with a recorded warning.Out of scope (needs CI / AWS / repo settings — not in this PR)
Workflows that call this tooling (
release-build/release-plan/release-apply),the adapters feeding real
sst diff/ DB / DNS, AWS OIDC roles + GitHubEnvironments, agent-image CI, and prod. Tracked in
brian-notesTask 21 (#4–#16).Summary by CodeRabbit
New Features
Documentation
Chores