chore: rename artifact_version.go + drop 6 P2 todos fixed by PR #14#15
Merged
Conversation
…o (PR 3/3) 修复 PR 1 之前就遗留的文件名 typo(多了一个尾巴的 g)。 PR 3 原计划要清理的 dead code `createArtifactVersion` 已经在 PR 2 的 commit 12f63d2 (Drop dead createArtifactVersion: violet binary owns the write path now) 一并清掉了——此 PR 收尾把这个一直没人收的文件名 typo 也解决掉,确保仓库里再没有 `versiong` 这个错字。 同步更新引用:docs/brainstorms、docs/plans、todos/004/005/007/010。 其中 `:行号` 是 2026-05-18 plan / PR 2 review 时刻的历史快照,PR 2 重写后行号已不对应,但属于历史性引用,保留原值不动。 Plan: docs/plans/2026-05-18-refactor-externalize-violet-binary-plan.md
Contributor
🤖 AI Code Review
SummaryPR 3/3 renames Review Statistics
Critical Issues(None) Warnings(None) Suggestions
Positive Feedback
ℹ️ About this reviewThis review was automatically generated using the
|
PR #14 的 commit a036487 + 12f63d2 已经把以下 6 个 P2 finding 实际修复, 对应 todo 文件失去价值,直接删除(按"todos/ 是看板不是档案"约定)。 | 删除文件 | 修复 commit | 改动要点 | |---------|-------------|---------| | 001 http-download-no-timeout | a036487 | downloadToTemp 用 context.WithTimeout(ctx, o.timeout) 包裹 | | 004 package-manifest-substring-csv-match| a036487 | strings.Contains → 精确匹配 + surface NestedString type error | | 005 wait-av-no-retry-on-transient-errors| a036487 | 引入 isTransientAPIError + switch 分支:NotFound/Transient 继续轮询 | | 006 download-close-error-swallowed | a036487 | explicit Close + 显式 error check,避免 truncated tgz 绕过 sha256 | | 007 empty-csv-fed-to-subscription | a036487 | InstallArtifactVersion 返回 (av, csv, err) 替代二次 NestedString | | 008 channel-default-inconsistency | a036487 + 12f63d2 | 删 "stable" 静默 fallback;channel required-check 移到 validateConfig | 剩余 pending 的 P2(002 / 003 / 009 / 010 / 011 / 012)继续保留—— 代码尚未落地,归档为 completed 会变成"修报告掩盖未做工作"。
kycheng
approved these changes
May 19, 2026
yhuan123
pushed a commit
that referenced
this pull request
May 19, 2026
Three silent-failure paths fixed (P1), four hardening / cleanup changes applied (P2). All P1 fixes close anti-patterns where preflight reported "clean" while actually bypassing the guard it was built to enforce. P1 — silent-failure closures: - #13/#14 cluster guard: in-cluster mode now requires --confirm-cluster matching violet.clusters; empty kubeconfig CurrentContext is now a hard error with an actionable 'kubectl config use-context' hint. Previously both paths log.Warn'd and returned nil, defeating the guard exactly when it mattered most (CI pods with violet.clusters set, multi-context kubeconfigs). - #15 BundleVersion required for operatorhub: empty value used to slip past validateConfig, producing 'operatorhub-foo.' AV names with a trailing dot — preflight reported clean while upgrade failed late with cryptic 404s. Required at load time. P2 — internal hardening: - #16 Drop the seen map from runPreflight: dead code under fail-fast semantics (architecture-strategist + simplicity-reviewer both flagged it independently). -14 LOC. - #018 checkInstallPlanResidue no longer discards the NestedString error. status.phase type drift now surfaces a wrapped error instead of being swallowed and producing a 'every IP is non-terminal' noise storm. - #019 New preflight.NewResidual constructor centralises kubectl cleanup template + %q shell-quoting. Three call sites in operatorhub/preflight.go go from 5-line struct literals to one-line constructor calls. P3 follow-ups (020-023) recorded as separate todos for later triage. Tests for the cmd-layer changes (#17) land in the next commit.
yhuan123
added a commit
that referenced
this pull request
May 20, 2026
* feat(operator): scaffold PreflightBaseline interface
- new pkg/operator/preflight package holds the Residual value type so
both pkg/operator and its concrete impls can import without cycles
- OperatorInterface gains PreflightBaseline(ctx, version)
([]preflight.Residual, error); contract: read-only, baseline-only
- local: no-op returning (nil, nil) — README will document the assumption
that `make deploy` must be idempotent
- operatorhub: scaffolding stub; real check logic lands in follow-up
- plan: docs/plans/2026-05-19-feat-upgrade-preflight-check-plan.md
* feat(operatorhub): implement PreflightBaseline residue checks
Three read-only checks against the baseline of an upgrade path:
- Subscription/<name> in <namespace>
- ArtifactVersion/<artifact>.<bundleVersion> in cpaas-system
- non-terminal InstallPlan in <namespace>, filtered by the OLM-managed
label operators.coreos.com/<package>.<ns> so we never scan the
namespace's full historical IP archive
Behavior:
- 30s context timeout caps the whole call so a hung apiserver cannot
block the upgrade for the default o.timeout (10m)
- transient API errors (ServerTimeout / TooManyRequests /
ServiceUnavailable) are wrapped with a 'retry the run' hint
- permanent errors propagate with the check name attached
- kubectl cleanup hints quote both name and namespace via %q so
copy-pasted commands stay shell-safe
CSV residue is intentionally NOT checked: an independent CSV without
a corresponding Subscription or ArtifactVersion is a degenerate state
already covered by the other three. Plan documents how to add
PackageManifest-driven CSV resolution if a future incident requires it.
Tests cover clean cluster, single/multi residue, non-terminal phases
(empty/Planning/RequiresApproval/Installing), terminal phases
(Complete/Failed) ignored, transient + permanent error paths, and a
read-only contract assertion (spy reactor fails the test on any
Create/Update/Patch/Delete).
* feat(config): validate namespace + bundleVersion for operatorhub
Two new load-time checks gated on Type == operatorhub:
1. operatorConfig.namespace must be non-empty. An empty namespace
would let preflight's Subscription / InstallPlan queries silently
degrade to cluster-scope, producing false-positive residue
reports from completely unrelated operators.
2. Version.bundleVersion must match ^[a-zA-Z0-9._-]+$. The field is
interpolated into kubectl cleanup hints emitted by preflight and
into violet argv, so allowing shell metacharacters here would
propagate $, backticks, semicolons or quotes straight into the
user's terminal. Centralising the check at load time closes the
injection vector for every downstream consumer with a single
chokepoint.
Both checks fire only for operatorhub type — local mode runs
`make deploy` and has no equivalent risk surface.
Tests cover: namespace required (operatorhub), namespace optional
(local), shell-active characters rejected, realistic versions
(v4.6.3 / 4.6.3-rc.91 / v0.74.0) accepted.
* feat(cmd): wire preflight + cluster guard into Execute
Three additions to the upgrade entry point, all ahead of the upgrade loop:
1. *PreflightError (cmd-internal) formats []preflight.Residual into a
multi-line copy-pasteable report: one cleanup block per residual,
then a finalizer-unstuck command template, then the bypass hint.
The type stays in cmd/ so pkg/operator never picks up a dependency
on how the CLI talks to the user. Decision C (all-English text) is
the locked default; Error() is the single place to change wording
without touching any contract.
2. assertClusterMatch enforces the operator≡kubectl cluster identity
contract whenever operatorConfig.violet.clusters is set. The
recommended rule (Decision B, locked at default B1) is exact
string equality against KUBECONFIG's CurrentContext via
--confirm-cluster. In-cluster runs (no kubeconfig file) degrade
to a loud warning so CI pods aren't silently blocked.
3. runPreflight iterates UpgradePaths, calls op.PreflightBaseline on
each baseline (Versions[0]), and fails fast at the first dirty
path with a *PreflightError. (Kind, NS, Name) dedup is in place
for future aggregation; --skip-preflight bypasses the whole step
with a Warn-level audit line.
Cobra's SilenceUsage is set in AddFlags so that PreflightError's
actionable kubectl lines are not buried under --help output on a
non-zero exit. Flag-parsing errors still print usage (they execute
before RunE), so users who typo a flag are not stranded.
* docs(preflight): README usage + CLAUDE.md hard constraints
- README: new 'Preflight 前置检查' section documents what gets
scanned (Subscription / ArtifactVersion / non-terminal InstallPlan),
the 30s timeout, a sample error message, the two new flags
(--skip-preflight / --confirm-cluster), and the minimum RBAC verbs
needed to run preflight in isolation.
- CLAUDE.md:
- 数据流: preflight + cluster guard inserted ahead of the upgrade
loop; explicit note that preflight does NOT honour cfg.Immediate
(quality gate, not part of the upgrade chain).
- Operator 抽象 paragraph mentions PreflightBaseline and the leaf
pkg/operator/preflight package created to avoid an import cycle.
- 硬约束: four new entries — PreflightBaseline must stay read-only
and baseline-only (with a note on why CSV is intentionally
excluded and where to add it back if needed); cmd.SilenceUsage
is mandatory; --confirm-cluster matching rule (B1 default);
BundleVersion regex chokepoint.
- Plan acceptance criteria all marked complete.
* docs(plan): mark preflight-check plan as completed
* fix: address P1+P2 review findings for preflight check
Three silent-failure paths fixed (P1), four hardening / cleanup changes
applied (P2). All P1 fixes close anti-patterns where preflight reported
"clean" while actually bypassing the guard it was built to enforce.
P1 — silent-failure closures:
- #13/#14 cluster guard: in-cluster mode now requires --confirm-cluster
matching violet.clusters; empty kubeconfig CurrentContext is now a hard
error with an actionable 'kubectl config use-context' hint. Previously
both paths log.Warn'd and returned nil, defeating the guard exactly when
it mattered most (CI pods with violet.clusters set, multi-context kubeconfigs).
- #15 BundleVersion required for operatorhub: empty value used to slip
past validateConfig, producing 'operatorhub-foo.' AV names with a
trailing dot — preflight reported clean while upgrade failed late with
cryptic 404s. Required at load time.
P2 — internal hardening:
- #16 Drop the seen map from runPreflight: dead code under fail-fast
semantics (architecture-strategist + simplicity-reviewer both flagged
it independently). -14 LOC.
- #018 checkInstallPlanResidue no longer discards the NestedString error.
status.phase type drift now surfaces a wrapped error instead of being
swallowed and producing a 'every IP is non-terminal' noise storm.
- #019 New preflight.NewResidual constructor centralises kubectl cleanup
template + %q shell-quoting. Three call sites in operatorhub/preflight.go
go from 5-line struct literals to one-line constructor calls.
P3 follow-ups (020-023) recorded as separate todos for later triage.
Tests for the cmd-layer changes (#17) land in the next commit.
* test(cmd): unit tests for assertClusterMatch and runPreflight
Closes P2 #17 — the two cmd-layer functions added by PR #17 carried
zero coverage (pr-test-analyzer criticality 9).
Coverage:
- assertClusterMatch (10 sub-tests): violet nil / clusters empty no-ops;
in-cluster missing/mismatched/matching --confirm-cluster; file-mode
unreadable kubeconfig, empty current-context (pins the #14 fix in
place), missing/mismatched/matching --confirm-cluster.
- runPreflight (5 sub-tests): clean cluster, empty Versions skipped,
baseline-only invariant (regression guard: a refactor scanning all
versions would surface mid-upgrade AVs as false positives), fail-fast
across paths (second path's PreflightBaseline is verifiably not
reached), operator errors propagate unwrapped (must NOT be wrapped
as *PreflightError).
The fake operator records the Version passed to each PreflightBaseline
call so the baseline-only test asserts the actual contract rather than
just call count.
* chore(todos): remove completed P1/P2 review findings
The 7 todos (013-019) that drove the silent-failure fixes + cmd-layer
tests are done; their substance lives in the PR description and the
commits that closed them (1201665 + f721f8e). Removing the files
keeps todos/ as a list of OPEN work, which is its only useful state.
Remaining in todos/: 4 P3 follow-ups for this PR (020-023) and 6 P2
items from prior PRs that have not been actioned yet.
* chore(todos): drop P3 review follow-ups for PR #17
The four P3 nice-to-haves (test variants, internal simplifications,
agent-ergonomics knobs, misc cleanups) are not on the roadmap and
filing them as todos would just decorate the backlog with low-value
items nobody plans to pick up. The PR description still records the
reviewer recommendations for archeology; the file copies were
redundant.
---------
Co-authored-by: huanyang@alauda.io <huanyang@alauda.io>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
背景
Plan 拆出三个 PR:
PR 3 原计划要删的 `createArtifactVersion` 在 PR 2 的 commit `12f63d2` 一并清掉了。这个 PR 把另外两个 long-overdue 的 cleanup 也一起收尾。
Summary
Commit 1 — rename Go 源文件 typo
Commit 2 — 清理 todos/
PR #14 的 `a036487` + `12f63d2` 已经把 6 个 P2 finding 实际修复。todos/ 目录定位为「看板」而非「档案」,按"完成即删"约定直接移除对应 pending 文件:
剩余 pending P2(002 / 003 / 009 / 010 / 011 / 012)继续保留 — 代码尚未落地,归档会变成"编辑报告掩盖未做工作"。
验证
回退性
纯 rename + 字符串替换 + 删 stale 文档,git revert 一键恢复,无运行时影响。