Skip to content

feat: add violet config + exec env allowlist + helper functions (PR 1/3)#13

Merged
yhuan123 merged 1 commit into
mainfrom
feat/violet-helpers-and-config
May 18, 2026
Merged

feat: add violet config + exec env allowlist + helper functions (PR 1/3)#13
yhuan123 merged 1 commit into
mainfrom
feat/violet-helpers-and-config

Conversation

@yhuan123

Copy link
Copy Markdown
Collaborator

Summary

PR 1 of 3 in the violet externalization refactor. This PR is purely additive — no existing call site invokes the new code yet, so runtime behavior is unchanged.

The stack:

  1. PR 1 (this one) — additive infra: config schema, exec env allowlist + stderr tail wrap, violet helper functions, table tests.
  2. PR 2 — swap `InstallArtifactVersion` to call `violet push`; keep the legacy function as dead code.
  3. PR 3 — delete the dead code.

What changed

File Change
`pkg/exec/exec.go` Add `Command.EnvAllowlist` (exact / `PREFIX_*` glob). Empty allowlist preserves current full `os.Environ()` passthrough. Wrap last ~20 stderr lines into `CommandResult.Err` for actionable failures.
`pkg/config/config.go` Add `OperatorConfig.Violet *VioletConfig` nested schema (`Bin` / `PackagePrefix` / `SkipPush *bool` / `PushArgs`) and `Version.ExpectedSha256`. `defaultConfig` fills MinIO prefix and `SkipPush=true` defaults when Violet is non-nil. `*bool` distinguishes "unset" from "explicit false".
`pkg/operator/operatorhub/operator.go` Declare `const targetCatalogSource = "platform"` next to `systemNamespace`.
`pkg/operator/operatorhub/violet.go` (new) Pure helpers: `BuildPackageURL`, `BuildVioletPushArgs` (auto-injects `VIOLET_REGISTRY_USERNAME`/`_PASSWORD` as `--username`/`--password`), `MaskCommand` (redacts password for logging), `VerifySha256`.
`pkg/operator/operatorhub/violet_test.go` (new) Table tests: 22 sub-tests across all four helpers, covering both real channel formats (`v4.6` and `rc`-with-sha-suffix), skip-push variants, private-push flow, password masking edge cases, and SHA-256 verification (correct / uppercase / mismatch / missing file).

Testing

  • `go vet ./...` — clean
  • `go build ./...` — clean
  • `go test ./pkg/operator/operatorhub/... -run 'TestBuildPackageURL|TestBuildVioletPushArgs|TestMaskCommand|TestVerifySha256' -v` — 22/22 pass

Nothing else in the codebase calls `BuildPackageURL` / `BuildVioletPushArgs` / `MaskCommand` / `VerifySha256` yet, so this PR cannot regress existing flows.

Post-Deploy Monitoring & Validation

No additional operational monitoring required: this PR adds dormant code only and does not change any runtime call path. PR 2 introduces the actual behavior change and will carry the monitoring plan (download URL hit rate, violet exit codes, AV phase transition timing).

Design references

  • Brainstorm: `docs/brainstorms/2026-05-18-violet-external-binary-brainstorm.md` (on `docs/violet-onboard-rolling-upgrade-plan` branch)
  • Plan: `docs/plans/2026-05-18-refactor-externalize-violet-binary-plan.md` (same branch)

Additive infrastructure for the violet externalization refactor. Nothing in
this commit changes runtime behavior — violet is not yet invoked. PR 2 swaps
the InstallArtifactVersion call site; PR 3 removes the legacy function.

- pkg/exec: add Command.EnvAllowlist (exact name or "PREFIX_*" glob) so child
  processes can be confined to a minimal-permission env. Empty allowlist
  preserves the current full os.Environ() passthrough, keeping testCommand
  unaffected. Wrap the last ~20 stderr lines into CommandResult.Err so the
  failure context is available without re-reading CommandResult.Stderr.

- pkg/config: introduce OperatorConfig.Violet *VioletConfig (Bin /
  PackagePrefix / SkipPush *bool / PushArgs) and Version.ExpectedSha256.
  defaultConfig fills the MinIO prefix and SkipPush=true defaults when
  Violet is non-nil. *bool distinguishes "unset" from "explicit false" so
  the private-push scenario can opt out cleanly.

- pkg/operator/operatorhub: declare const targetCatalogSource = "platform"
  next to systemNamespace, so the catalog source is no longer string-literal
  duplicated between Go and the violet CLI argv.

- pkg/operator/operatorhub/violet.go: pure helpers used by PR 2.
  BuildPackageURL composes the <prefix>/<name>/<channel>/<name>.latest.ALL.<bv>.tgz
  convention; BuildVioletPushArgs assembles argv and auto-injects
  VIOLET_REGISTRY_USERNAME / _PASSWORD as --username / --password; MaskCommand
  redacts the password token for safe logging; VerifySha256 streams a file
  and checks the hex digest case-insensitively.

- pkg/operator/operatorhub/violet_test.go: table tests covering both real
  channel formats (v4.6 and rc with rc.N.g<sha> suffix) plus skip-push,
  private-push, mask, and sha256 edge cases (22 sub-tests total).

Plan: docs/plans/2026-05-18-refactor-externalize-violet-binary-plan.md
@alaudabot

Copy link
Copy Markdown
Contributor

🤖 AI Code Review

Property Value
Model opencode/minimax-m2.5-free
Style strict
Issues Found 0
Config Source centralized
Profile ❌ Not Found
Personalized Prompt ❌ No
Prompt Path .github/review/profiles/alaudadevops/upgrade-test/pr-review.md
Alauda Skills ✅ base-acp-operator-list, base-acp-operator-release, base-authoring, base-m365, base-ocp-operator-list, base-skill-setup, builders-alauda-component-e2e-release, builders-alauda-component-upgrade, builders-alauda-pipeline, builders-claudetask-submit, builders-component-knowledge, builders-confluence, builders-dev-mesh-qa, builders-edge-ci-trace, builders-gitlab-ops, builders-helm-operator-generator, builders-install-cluster-plugin, builders-jira, builders-notify-wecom, builders-olm-operator-lifecycle, builders-prd-to-testcase, builders-publish-errata, builders-roadmap-studio, builders-story-split, builders-violet, builders-webapp-testing, cross-repo-add-mirror, cross-repo-publish, devops-add-bug-release-notes, devops-autodns, devops-bundle-csv-baseline-diff, devops-candidate-version-supervisor, devops-connectors-acceptance-test, devops-connectors-explore, devops-connectors-poc-case, devops-connectors-review, devops-connectors-unit-test, devops-connectors-upgrade-test, devops-connectors-write-user-docs, devops-creating-tekton-pipelines, devops-fix-go-vulns, devops-fork-alauda-binary-release, devops-gen-advanced-form-descriptors, devops-jira-rfd-acceptance, devops-knowledge-adoption, devops-refresh-containerfile-digests, devops-refresh-containerfile-tags, devops-replace-strings, devops-scan-docker-keywords, devops-sync-alauda-github-releases, devops-tekton-dynamic-form-optimizer, devops-tekton-operator-task-e2e, devops-tekton-pipeline-delivery, devops-tekton-refresh-results-tag, devops-tekton-task-delivery, devops-tekton-task-overview-template, devops-tekton-task-version-upgrade, devops-tekton-upgrade-notes, devops-tool-report-troubleshoot, devops-ui-e2e-code-audit, devops-ui-e2e-fix-base-on-report, devops-ui-e2e-regression-and-fix, devops-ui-generate-e2e-from-feature, devops-ui-pre-setup, devops-upgrade-go, devops-upstream-backport-cve, devops-upstream-upgrade
Reviewed at 2026-05-18 08:32:44 UTC

Summary

PR 1/3 adds foundational infrastructure for externalizing the violet binary invocation: new config schema (VioletConfig, Version.ExpectedSha256), an environment allowlist for exec, helper functions with table tests, and a targetCatalogSource constant. The code is additive and does not affect existing runtime paths, making this a safe, low-risk foundation layer.

Review Statistics

Category Count
Critical Issues 0
Warnings 0
Suggestions 4
Files Reviewed 5

Critical Issues

None.

Warnings

None.

Suggestions

  • pkg/exec/exec.go:85 (style/convention): matchAllowlist continues to the next iteration after a wildcard match but also has an unnecessary continue that can be removed since it's the last statement in the wildcard branch:
if strings.HasSuffix(pat, "*") {
    if strings.HasPrefix(name, strings.TrimSuffix(pat, "*")) {
        return true
    }
}
  • pkg/exec/exec.go:97 (style/convention): The strings.TrimSuffix(pat, "*") call on line 99 is evaluated unconditionally even when HasSuffix returns false. Consider extracting it to a variable to make intent clearer:
// Example improvement
if strings.HasSuffix(pat, "*") {
    prefix := strings.TrimSuffix(pat, "*")
    if strings.HasPrefix(name, prefix) {
        return true
    }
}
  • pkg/operator/operatorhub/violet.go:44 (style/formatting): The switch statement lacks a default case. Since all four cases return, this is functionally safe, but adding a default: return "", fmt.Errorf("unexpected") would make the exhaustive nature explicit and guard against future additions.

  • pkg/operator/operatorhub/violet_test.go (test/coverage): filterHostEnv and matchAllowlist in exec.go lack unit tests. Given these are now part of the public interface (via Command.EnvAllowlist), consider adding tests covering: empty allowlist, single exact match, wildcard prefix match, no match, and mixed allowlist entries.

Positive Feedback

  • Clean separation of concerns: The VioletConfig struct is well-designed with proper use of *bool to distinguish unset from explicit false.
  • Good error messages: BuildPackageURL returns descriptive errors for each empty input field, making debugging straightforward.
  • Security-conscious credential handling: Credentials are read from environment variables (VIOLET_REGISTRY_USERNAME/PASSWORD) and never persisted to disk or echoed in logs; MaskCommand provides safe logging.
  • Excellent test coverage: 22 subtests covering normal paths, edge cases, and error conditions across all four helper functions.
  • Backward compatibility preserved: EnvAllowlist with an empty slice defaults to full os.Environ() passthrough, ensuring no breaking changes.
  • Well-documented constants: The comment on targetCatalogSource explains its coupling with systemNamespace and why it lives in the same block.


ℹ️ About this review

This review was automatically generated using the run-actions workflow.

  • Shared prompt: .github/prompts/code-review.md
  • Config source: centralized
  • Profile path: Not Found
  • Profile ref: e75e733e9aa1b417a8b3c6441e53495dbcb418ad
  • No repository-specific prompt configured
  • Alauda skills: base-acp-operator-list, base-acp-operator-release, base-authoring, base-m365, base-ocp-operator-list, base-skill-setup, builders-alauda-component-e2e-release, builders-alauda-component-upgrade, builders-alauda-pipeline, builders-claudetask-submit, builders-component-knowledge, builders-confluence, builders-dev-mesh-qa, builders-edge-ci-trace, builders-gitlab-ops, builders-helm-operator-generator, builders-install-cluster-plugin, builders-jira, builders-notify-wecom, builders-olm-operator-lifecycle, builders-prd-to-testcase, builders-publish-errata, builders-roadmap-studio, builders-story-split, builders-violet, builders-webapp-testing, cross-repo-add-mirror, cross-repo-publish, devops-add-bug-release-notes, devops-autodns, devops-bundle-csv-baseline-diff, devops-candidate-version-supervisor, devops-connectors-acceptance-test, devops-connectors-explore, devops-connectors-poc-case, devops-connectors-review, devops-connectors-unit-test, devops-connectors-upgrade-test, devops-connectors-write-user-docs, devops-creating-tekton-pipelines, devops-fix-go-vulns, devops-fork-alauda-binary-release, devops-gen-advanced-form-descriptors, devops-jira-rfd-acceptance, devops-knowledge-adoption, devops-refresh-containerfile-digests, devops-refresh-containerfile-tags, devops-replace-strings, devops-scan-docker-keywords, devops-sync-alauda-github-releases, devops-tekton-dynamic-form-optimizer, devops-tekton-operator-task-e2e, devops-tekton-pipeline-delivery, devops-tekton-refresh-results-tag, devops-tekton-task-delivery, devops-tekton-task-overview-template, devops-tekton-task-version-upgrade, devops-tekton-upgrade-notes, devops-tool-report-troubleshoot, devops-ui-e2e-code-audit, devops-ui-e2e-fix-base-on-report, devops-ui-e2e-regression-and-fix, devops-ui-generate-e2e-from-feature, devops-ui-pre-setup, devops-upgrade-go, devops-upstream-backport-cve, devops-upstream-upgrade

@yhuan123 yhuan123 requested a review from kycheng May 18, 2026 10:54
Comment thread pkg/config/config.go

// DefaultVioletPackagePrefix is the MinIO root used when OperatorConfig.Violet
// is configured but its PackagePrefix is left blank.
const DefaultVioletPackagePrefix = "http://package-minio.alauda.cn:9199/packages/"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

不要硬编码在代码里

@yhuan123 yhuan123 merged commit 5e99557 into main May 18, 2026
4 checks passed
@yhuan123 yhuan123 deleted the feat/violet-helpers-and-config branch May 18, 2026 11:36
yhuan123 pushed a commit that referenced this pull request May 19, 2026
Three silent-failure paths fixed (P1), four hardening / cleanup changes
applied (P2). All P1 fixes close anti-patterns where preflight reported
"clean" while actually bypassing the guard it was built to enforce.

P1 — silent-failure closures:
- #13/#14 cluster guard: in-cluster mode now requires --confirm-cluster
  matching violet.clusters; empty kubeconfig CurrentContext is now a hard
  error with an actionable 'kubectl config use-context' hint. Previously
  both paths log.Warn'd and returned nil, defeating the guard exactly when
  it mattered most (CI pods with violet.clusters set, multi-context kubeconfigs).
- #15 BundleVersion required for operatorhub: empty value used to slip
  past validateConfig, producing 'operatorhub-foo.' AV names with a
  trailing dot — preflight reported clean while upgrade failed late with
  cryptic 404s. Required at load time.

P2 — internal hardening:
- #16 Drop the seen map from runPreflight: dead code under fail-fast
  semantics (architecture-strategist + simplicity-reviewer both flagged
  it independently). -14 LOC.
- #018 checkInstallPlanResidue no longer discards the NestedString error.
  status.phase type drift now surfaces a wrapped error instead of being
  swallowed and producing a 'every IP is non-terminal' noise storm.
- #019 New preflight.NewResidual constructor centralises kubectl cleanup
  template + %q shell-quoting. Three call sites in operatorhub/preflight.go
  go from 5-line struct literals to one-line constructor calls.

P3 follow-ups (020-023) recorded as separate todos for later triage.

Tests for the cmd-layer changes (#17) land in the next commit.
yhuan123 added a commit that referenced this pull request May 20, 2026
* feat(operator): scaffold PreflightBaseline interface

- new pkg/operator/preflight package holds the Residual value type so
  both pkg/operator and its concrete impls can import without cycles
- OperatorInterface gains PreflightBaseline(ctx, version)
  ([]preflight.Residual, error); contract: read-only, baseline-only
- local: no-op returning (nil, nil) — README will document the assumption
  that `make deploy` must be idempotent
- operatorhub: scaffolding stub; real check logic lands in follow-up
- plan: docs/plans/2026-05-19-feat-upgrade-preflight-check-plan.md

* feat(operatorhub): implement PreflightBaseline residue checks

Three read-only checks against the baseline of an upgrade path:

- Subscription/<name> in <namespace>
- ArtifactVersion/<artifact>.<bundleVersion> in cpaas-system
- non-terminal InstallPlan in <namespace>, filtered by the OLM-managed
  label operators.coreos.com/<package>.<ns> so we never scan the
  namespace's full historical IP archive

Behavior:
- 30s context timeout caps the whole call so a hung apiserver cannot
  block the upgrade for the default o.timeout (10m)
- transient API errors (ServerTimeout / TooManyRequests /
  ServiceUnavailable) are wrapped with a 'retry the run' hint
- permanent errors propagate with the check name attached
- kubectl cleanup hints quote both name and namespace via %q so
  copy-pasted commands stay shell-safe

CSV residue is intentionally NOT checked: an independent CSV without
a corresponding Subscription or ArtifactVersion is a degenerate state
already covered by the other three. Plan documents how to add
PackageManifest-driven CSV resolution if a future incident requires it.

Tests cover clean cluster, single/multi residue, non-terminal phases
(empty/Planning/RequiresApproval/Installing), terminal phases
(Complete/Failed) ignored, transient + permanent error paths, and a
read-only contract assertion (spy reactor fails the test on any
Create/Update/Patch/Delete).

* feat(config): validate namespace + bundleVersion for operatorhub

Two new load-time checks gated on Type == operatorhub:

1. operatorConfig.namespace must be non-empty. An empty namespace
   would let preflight's Subscription / InstallPlan queries silently
   degrade to cluster-scope, producing false-positive residue
   reports from completely unrelated operators.

2. Version.bundleVersion must match ^[a-zA-Z0-9._-]+$. The field is
   interpolated into kubectl cleanup hints emitted by preflight and
   into violet argv, so allowing shell metacharacters here would
   propagate $, backticks, semicolons or quotes straight into the
   user's terminal. Centralising the check at load time closes the
   injection vector for every downstream consumer with a single
   chokepoint.

Both checks fire only for operatorhub type — local mode runs
`make deploy` and has no equivalent risk surface.

Tests cover: namespace required (operatorhub), namespace optional
(local), shell-active characters rejected, realistic versions
(v4.6.3 / 4.6.3-rc.91 / v0.74.0) accepted.

* feat(cmd): wire preflight + cluster guard into Execute

Three additions to the upgrade entry point, all ahead of the upgrade loop:

1. *PreflightError (cmd-internal) formats []preflight.Residual into a
   multi-line copy-pasteable report: one cleanup block per residual,
   then a finalizer-unstuck command template, then the bypass hint.
   The type stays in cmd/ so pkg/operator never picks up a dependency
   on how the CLI talks to the user. Decision C (all-English text) is
   the locked default; Error() is the single place to change wording
   without touching any contract.

2. assertClusterMatch enforces the operator≡kubectl cluster identity
   contract whenever operatorConfig.violet.clusters is set. The
   recommended rule (Decision B, locked at default B1) is exact
   string equality against KUBECONFIG's CurrentContext via
   --confirm-cluster. In-cluster runs (no kubeconfig file) degrade
   to a loud warning so CI pods aren't silently blocked.

3. runPreflight iterates UpgradePaths, calls op.PreflightBaseline on
   each baseline (Versions[0]), and fails fast at the first dirty
   path with a *PreflightError. (Kind, NS, Name) dedup is in place
   for future aggregation; --skip-preflight bypasses the whole step
   with a Warn-level audit line.

Cobra's SilenceUsage is set in AddFlags so that PreflightError's
actionable kubectl lines are not buried under --help output on a
non-zero exit. Flag-parsing errors still print usage (they execute
before RunE), so users who typo a flag are not stranded.

* docs(preflight): README usage + CLAUDE.md hard constraints

- README: new 'Preflight 前置检查' section documents what gets
  scanned (Subscription / ArtifactVersion / non-terminal InstallPlan),
  the 30s timeout, a sample error message, the two new flags
  (--skip-preflight / --confirm-cluster), and the minimum RBAC verbs
  needed to run preflight in isolation.

- CLAUDE.md:
  - 数据流: preflight + cluster guard inserted ahead of the upgrade
    loop; explicit note that preflight does NOT honour cfg.Immediate
    (quality gate, not part of the upgrade chain).
  - Operator 抽象 paragraph mentions PreflightBaseline and the leaf
    pkg/operator/preflight package created to avoid an import cycle.
  - 硬约束: four new entries — PreflightBaseline must stay read-only
    and baseline-only (with a note on why CSV is intentionally
    excluded and where to add it back if needed); cmd.SilenceUsage
    is mandatory; --confirm-cluster matching rule (B1 default);
    BundleVersion regex chokepoint.

- Plan acceptance criteria all marked complete.

* docs(plan): mark preflight-check plan as completed

* fix: address P1+P2 review findings for preflight check

Three silent-failure paths fixed (P1), four hardening / cleanup changes
applied (P2). All P1 fixes close anti-patterns where preflight reported
"clean" while actually bypassing the guard it was built to enforce.

P1 — silent-failure closures:
- #13/#14 cluster guard: in-cluster mode now requires --confirm-cluster
  matching violet.clusters; empty kubeconfig CurrentContext is now a hard
  error with an actionable 'kubectl config use-context' hint. Previously
  both paths log.Warn'd and returned nil, defeating the guard exactly when
  it mattered most (CI pods with violet.clusters set, multi-context kubeconfigs).
- #15 BundleVersion required for operatorhub: empty value used to slip
  past validateConfig, producing 'operatorhub-foo.' AV names with a
  trailing dot — preflight reported clean while upgrade failed late with
  cryptic 404s. Required at load time.

P2 — internal hardening:
- #16 Drop the seen map from runPreflight: dead code under fail-fast
  semantics (architecture-strategist + simplicity-reviewer both flagged
  it independently). -14 LOC.
- #018 checkInstallPlanResidue no longer discards the NestedString error.
  status.phase type drift now surfaces a wrapped error instead of being
  swallowed and producing a 'every IP is non-terminal' noise storm.
- #019 New preflight.NewResidual constructor centralises kubectl cleanup
  template + %q shell-quoting. Three call sites in operatorhub/preflight.go
  go from 5-line struct literals to one-line constructor calls.

P3 follow-ups (020-023) recorded as separate todos for later triage.

Tests for the cmd-layer changes (#17) land in the next commit.

* test(cmd): unit tests for assertClusterMatch and runPreflight

Closes P2 #17 — the two cmd-layer functions added by PR #17 carried
zero coverage (pr-test-analyzer criticality 9).

Coverage:
- assertClusterMatch (10 sub-tests): violet nil / clusters empty no-ops;
  in-cluster missing/mismatched/matching --confirm-cluster; file-mode
  unreadable kubeconfig, empty current-context (pins the #14 fix in
  place), missing/mismatched/matching --confirm-cluster.
- runPreflight (5 sub-tests): clean cluster, empty Versions skipped,
  baseline-only invariant (regression guard: a refactor scanning all
  versions would surface mid-upgrade AVs as false positives), fail-fast
  across paths (second path's PreflightBaseline is verifiably not
  reached), operator errors propagate unwrapped (must NOT be wrapped
  as *PreflightError).

The fake operator records the Version passed to each PreflightBaseline
call so the baseline-only test asserts the actual contract rather than
just call count.

* chore(todos): remove completed P1/P2 review findings

The 7 todos (013-019) that drove the silent-failure fixes + cmd-layer
tests are done; their substance lives in the PR description and the
commits that closed them (1201665 + f721f8e). Removing the files
keeps todos/ as a list of OPEN work, which is its only useful state.

Remaining in todos/: 4 P3 follow-ups for this PR (020-023) and 6 P2
items from prior PRs that have not been actioned yet.

* chore(todos): drop P3 review follow-ups for PR #17

The four P3 nice-to-haves (test variants, internal simplifications,
agent-ergonomics knobs, misc cleanups) are not on the roadmap and
filing them as todos would just decorate the backlog with low-value
items nobody plans to pick up. The PR description still records the
reviewer recommendations for archeology; the file copies were
redundant.

---------

Co-authored-by: huanyang@alauda.io <huanyang@alauda.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants