feat: add violet config + exec env allowlist + helper functions (PR 1/3)#13
Conversation
Additive infrastructure for the violet externalization refactor. Nothing in this commit changes runtime behavior — violet is not yet invoked. PR 2 swaps the InstallArtifactVersion call site; PR 3 removes the legacy function. - pkg/exec: add Command.EnvAllowlist (exact name or "PREFIX_*" glob) so child processes can be confined to a minimal-permission env. Empty allowlist preserves the current full os.Environ() passthrough, keeping testCommand unaffected. Wrap the last ~20 stderr lines into CommandResult.Err so the failure context is available without re-reading CommandResult.Stderr. - pkg/config: introduce OperatorConfig.Violet *VioletConfig (Bin / PackagePrefix / SkipPush *bool / PushArgs) and Version.ExpectedSha256. defaultConfig fills the MinIO prefix and SkipPush=true defaults when Violet is non-nil. *bool distinguishes "unset" from "explicit false" so the private-push scenario can opt out cleanly. - pkg/operator/operatorhub: declare const targetCatalogSource = "platform" next to systemNamespace, so the catalog source is no longer string-literal duplicated between Go and the violet CLI argv. - pkg/operator/operatorhub/violet.go: pure helpers used by PR 2. BuildPackageURL composes the <prefix>/<name>/<channel>/<name>.latest.ALL.<bv>.tgz convention; BuildVioletPushArgs assembles argv and auto-injects VIOLET_REGISTRY_USERNAME / _PASSWORD as --username / --password; MaskCommand redacts the password token for safe logging; VerifySha256 streams a file and checks the hex digest case-insensitively. - pkg/operator/operatorhub/violet_test.go: table tests covering both real channel formats (v4.6 and rc with rc.N.g<sha> suffix) plus skip-push, private-push, mask, and sha256 edge cases (22 sub-tests total). Plan: docs/plans/2026-05-18-refactor-externalize-violet-binary-plan.md
🤖 AI Code Review
SummaryPR 1/3 adds foundational infrastructure for externalizing the Review Statistics
Critical IssuesNone. WarningsNone. Suggestions
// Example improvement
if strings.HasSuffix(pat, "*") {
prefix := strings.TrimSuffix(pat, "*")
if strings.HasPrefix(name, prefix) {
return true
}
}
Positive Feedback
ℹ️ About this reviewThis review was automatically generated using the
|
|
|
||
| // DefaultVioletPackagePrefix is the MinIO root used when OperatorConfig.Violet | ||
| // is configured but its PackagePrefix is left blank. | ||
| const DefaultVioletPackagePrefix = "http://package-minio.alauda.cn:9199/packages/" |
Three silent-failure paths fixed (P1), four hardening / cleanup changes applied (P2). All P1 fixes close anti-patterns where preflight reported "clean" while actually bypassing the guard it was built to enforce. P1 — silent-failure closures: - #13/#14 cluster guard: in-cluster mode now requires --confirm-cluster matching violet.clusters; empty kubeconfig CurrentContext is now a hard error with an actionable 'kubectl config use-context' hint. Previously both paths log.Warn'd and returned nil, defeating the guard exactly when it mattered most (CI pods with violet.clusters set, multi-context kubeconfigs). - #15 BundleVersion required for operatorhub: empty value used to slip past validateConfig, producing 'operatorhub-foo.' AV names with a trailing dot — preflight reported clean while upgrade failed late with cryptic 404s. Required at load time. P2 — internal hardening: - #16 Drop the seen map from runPreflight: dead code under fail-fast semantics (architecture-strategist + simplicity-reviewer both flagged it independently). -14 LOC. - #018 checkInstallPlanResidue no longer discards the NestedString error. status.phase type drift now surfaces a wrapped error instead of being swallowed and producing a 'every IP is non-terminal' noise storm. - #019 New preflight.NewResidual constructor centralises kubectl cleanup template + %q shell-quoting. Three call sites in operatorhub/preflight.go go from 5-line struct literals to one-line constructor calls. P3 follow-ups (020-023) recorded as separate todos for later triage. Tests for the cmd-layer changes (#17) land in the next commit.
* feat(operator): scaffold PreflightBaseline interface
- new pkg/operator/preflight package holds the Residual value type so
both pkg/operator and its concrete impls can import without cycles
- OperatorInterface gains PreflightBaseline(ctx, version)
([]preflight.Residual, error); contract: read-only, baseline-only
- local: no-op returning (nil, nil) — README will document the assumption
that `make deploy` must be idempotent
- operatorhub: scaffolding stub; real check logic lands in follow-up
- plan: docs/plans/2026-05-19-feat-upgrade-preflight-check-plan.md
* feat(operatorhub): implement PreflightBaseline residue checks
Three read-only checks against the baseline of an upgrade path:
- Subscription/<name> in <namespace>
- ArtifactVersion/<artifact>.<bundleVersion> in cpaas-system
- non-terminal InstallPlan in <namespace>, filtered by the OLM-managed
label operators.coreos.com/<package>.<ns> so we never scan the
namespace's full historical IP archive
Behavior:
- 30s context timeout caps the whole call so a hung apiserver cannot
block the upgrade for the default o.timeout (10m)
- transient API errors (ServerTimeout / TooManyRequests /
ServiceUnavailable) are wrapped with a 'retry the run' hint
- permanent errors propagate with the check name attached
- kubectl cleanup hints quote both name and namespace via %q so
copy-pasted commands stay shell-safe
CSV residue is intentionally NOT checked: an independent CSV without
a corresponding Subscription or ArtifactVersion is a degenerate state
already covered by the other three. Plan documents how to add
PackageManifest-driven CSV resolution if a future incident requires it.
Tests cover clean cluster, single/multi residue, non-terminal phases
(empty/Planning/RequiresApproval/Installing), terminal phases
(Complete/Failed) ignored, transient + permanent error paths, and a
read-only contract assertion (spy reactor fails the test on any
Create/Update/Patch/Delete).
* feat(config): validate namespace + bundleVersion for operatorhub
Two new load-time checks gated on Type == operatorhub:
1. operatorConfig.namespace must be non-empty. An empty namespace
would let preflight's Subscription / InstallPlan queries silently
degrade to cluster-scope, producing false-positive residue
reports from completely unrelated operators.
2. Version.bundleVersion must match ^[a-zA-Z0-9._-]+$. The field is
interpolated into kubectl cleanup hints emitted by preflight and
into violet argv, so allowing shell metacharacters here would
propagate $, backticks, semicolons or quotes straight into the
user's terminal. Centralising the check at load time closes the
injection vector for every downstream consumer with a single
chokepoint.
Both checks fire only for operatorhub type — local mode runs
`make deploy` and has no equivalent risk surface.
Tests cover: namespace required (operatorhub), namespace optional
(local), shell-active characters rejected, realistic versions
(v4.6.3 / 4.6.3-rc.91 / v0.74.0) accepted.
* feat(cmd): wire preflight + cluster guard into Execute
Three additions to the upgrade entry point, all ahead of the upgrade loop:
1. *PreflightError (cmd-internal) formats []preflight.Residual into a
multi-line copy-pasteable report: one cleanup block per residual,
then a finalizer-unstuck command template, then the bypass hint.
The type stays in cmd/ so pkg/operator never picks up a dependency
on how the CLI talks to the user. Decision C (all-English text) is
the locked default; Error() is the single place to change wording
without touching any contract.
2. assertClusterMatch enforces the operator≡kubectl cluster identity
contract whenever operatorConfig.violet.clusters is set. The
recommended rule (Decision B, locked at default B1) is exact
string equality against KUBECONFIG's CurrentContext via
--confirm-cluster. In-cluster runs (no kubeconfig file) degrade
to a loud warning so CI pods aren't silently blocked.
3. runPreflight iterates UpgradePaths, calls op.PreflightBaseline on
each baseline (Versions[0]), and fails fast at the first dirty
path with a *PreflightError. (Kind, NS, Name) dedup is in place
for future aggregation; --skip-preflight bypasses the whole step
with a Warn-level audit line.
Cobra's SilenceUsage is set in AddFlags so that PreflightError's
actionable kubectl lines are not buried under --help output on a
non-zero exit. Flag-parsing errors still print usage (they execute
before RunE), so users who typo a flag are not stranded.
* docs(preflight): README usage + CLAUDE.md hard constraints
- README: new 'Preflight 前置检查' section documents what gets
scanned (Subscription / ArtifactVersion / non-terminal InstallPlan),
the 30s timeout, a sample error message, the two new flags
(--skip-preflight / --confirm-cluster), and the minimum RBAC verbs
needed to run preflight in isolation.
- CLAUDE.md:
- 数据流: preflight + cluster guard inserted ahead of the upgrade
loop; explicit note that preflight does NOT honour cfg.Immediate
(quality gate, not part of the upgrade chain).
- Operator 抽象 paragraph mentions PreflightBaseline and the leaf
pkg/operator/preflight package created to avoid an import cycle.
- 硬约束: four new entries — PreflightBaseline must stay read-only
and baseline-only (with a note on why CSV is intentionally
excluded and where to add it back if needed); cmd.SilenceUsage
is mandatory; --confirm-cluster matching rule (B1 default);
BundleVersion regex chokepoint.
- Plan acceptance criteria all marked complete.
* docs(plan): mark preflight-check plan as completed
* fix: address P1+P2 review findings for preflight check
Three silent-failure paths fixed (P1), four hardening / cleanup changes
applied (P2). All P1 fixes close anti-patterns where preflight reported
"clean" while actually bypassing the guard it was built to enforce.
P1 — silent-failure closures:
- #13/#14 cluster guard: in-cluster mode now requires --confirm-cluster
matching violet.clusters; empty kubeconfig CurrentContext is now a hard
error with an actionable 'kubectl config use-context' hint. Previously
both paths log.Warn'd and returned nil, defeating the guard exactly when
it mattered most (CI pods with violet.clusters set, multi-context kubeconfigs).
- #15 BundleVersion required for operatorhub: empty value used to slip
past validateConfig, producing 'operatorhub-foo.' AV names with a
trailing dot — preflight reported clean while upgrade failed late with
cryptic 404s. Required at load time.
P2 — internal hardening:
- #16 Drop the seen map from runPreflight: dead code under fail-fast
semantics (architecture-strategist + simplicity-reviewer both flagged
it independently). -14 LOC.
- #018 checkInstallPlanResidue no longer discards the NestedString error.
status.phase type drift now surfaces a wrapped error instead of being
swallowed and producing a 'every IP is non-terminal' noise storm.
- #019 New preflight.NewResidual constructor centralises kubectl cleanup
template + %q shell-quoting. Three call sites in operatorhub/preflight.go
go from 5-line struct literals to one-line constructor calls.
P3 follow-ups (020-023) recorded as separate todos for later triage.
Tests for the cmd-layer changes (#17) land in the next commit.
* test(cmd): unit tests for assertClusterMatch and runPreflight
Closes P2 #17 — the two cmd-layer functions added by PR #17 carried
zero coverage (pr-test-analyzer criticality 9).
Coverage:
- assertClusterMatch (10 sub-tests): violet nil / clusters empty no-ops;
in-cluster missing/mismatched/matching --confirm-cluster; file-mode
unreadable kubeconfig, empty current-context (pins the #14 fix in
place), missing/mismatched/matching --confirm-cluster.
- runPreflight (5 sub-tests): clean cluster, empty Versions skipped,
baseline-only invariant (regression guard: a refactor scanning all
versions would surface mid-upgrade AVs as false positives), fail-fast
across paths (second path's PreflightBaseline is verifiably not
reached), operator errors propagate unwrapped (must NOT be wrapped
as *PreflightError).
The fake operator records the Version passed to each PreflightBaseline
call so the baseline-only test asserts the actual contract rather than
just call count.
* chore(todos): remove completed P1/P2 review findings
The 7 todos (013-019) that drove the silent-failure fixes + cmd-layer
tests are done; their substance lives in the PR description and the
commits that closed them (1201665 + f721f8e). Removing the files
keeps todos/ as a list of OPEN work, which is its only useful state.
Remaining in todos/: 4 P3 follow-ups for this PR (020-023) and 6 P2
items from prior PRs that have not been actioned yet.
* chore(todos): drop P3 review follow-ups for PR #17
The four P3 nice-to-haves (test variants, internal simplifications,
agent-ergonomics knobs, misc cleanups) are not on the roadmap and
filing them as todos would just decorate the backlog with low-value
items nobody plans to pick up. The PR description still records the
reviewer recommendations for archeology; the file copies were
redundant.
---------
Co-authored-by: huanyang@alauda.io <huanyang@alauda.io>
Summary
PR 1 of 3 in the violet externalization refactor. This PR is purely additive — no existing call site invokes the new code yet, so runtime behavior is unchanged.
The stack:
What changed
Testing
Nothing else in the codebase calls `BuildPackageURL` / `BuildVioletPushArgs` / `MaskCommand` / `VerifySha256` yet, so this PR cannot regress existing flows.
Post-Deploy Monitoring & Validation
No additional operational monitoring required: this PR adds dormant code only and does not change any runtime call path. PR 2 introduces the actual behavior change and will carry the monitoring plan (download URL hit rate, violet exit codes, AV phase transition timing).
Design references