feat(cli): add okactl command-line tool for sandbox operations#497

Open
Liquorice-Ma wants to merge 10 commits into
openkruise:masterfrom
Liquorice-Ma:agents-cli

Conversation

@Liquorice-Ma Liquorice-Ma commented Jun 3, 2026

Ⅰ. Describe what this PR does

Add okactl — a kubectl-style CLI tool for OpenKruise Agents sandbox operations, eliminating the need to hand-write YAML or use kubectl edit.
Commands
okactl scale sandboxset NAME --replicas=N (alias: sbs)
Scale a SandboxSet's replica count via JSON Merge Patch (atomic, no optimistic-lock conflicts).
okactl set image sandboxset NAME CONTAINER=IMAGE [...] (alias: sbs)
Update container images in a SandboxSet's inline template. Detects TemplateRef usage and guides users to modify the SandboxTemplate directly. Supports --wait with --timeout to poll until the rolling update completes.
okactl restart sandbox NAME [-c CONTAINER ...] [--all] [--failure-policy=Fail|Ignore] (alias: sbx)
Restart containers in a running Sandbox by creating an OpenKruise ContainerRecreateRequest (CRR). Requires explicit -c or --all to prevent accidental full restarts. Supports --failure-policy (Fail stops on error, Ignore continues).
okactl create suo -l SELECTOR CONTAINER=IMAGE [...] (alias: sandboxupdateops)
Create a SandboxUpdateOps to batch-update images for claimed sandboxes. Uses server-side label selector filtering. Automatically cleans up existing active SUOs, with finalizer removal as a fallback only if the controller is unavailable.
okactl status sbs NAME / okactl status suo NAME
Show update progress for SandboxSet or SandboxUpdateOps, with automatic diagnosis of stalled updates (e.g., ImagePullBackOff, insufficient resources).
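The scale command above is described as a JSON Merge Patch on the replica count. A minimal sketch of what such a patch body could look like, assuming the count lives at `spec.replicas`; the field path and the helper name `buildReplicasPatch` are illustrative assumptions, not taken from this PR:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildReplicasPatch returns a JSON Merge Patch (RFC 7386) body that touches
// only spec.replicas. The field path is an assumption for illustration.
func buildReplicasPatch(replicas int32) ([]byte, error) {
	patch := map[string]interface{}{
		"spec": map[string]interface{}{
			"replicas": replicas,
		},
	}
	return json.Marshal(patch)
}

func main() {
	body, err := buildReplicasPatch(5)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

A patch of this shape carries no resourceVersion, so the API server should apply it without an optimistic-concurrency check, which is consistent with the "no optimistic-lock conflicts" note above.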
Files
cmd/okactl/main.go — CLI entrypoint
pkg/cli/ — All command implementations and unit tests
docs/developer-manuals/okactl.md — Developer manual
docs/proposals/ — Three design proposals
test/e2e/okactl_test.go — E2E tests

Ⅱ. Does this pull request fix one issue?

NONE

Ⅲ. Describe how to verify it

# Build
make build-okactl

# Scale a SandboxSet
okactl scale sandboxset my-sbs --replicas=5 -n sandbox-system

# Update container images
okactl set image sandboxset my-sbs main=nginx:2.0 sidecar=envoy:2.0 -n sandbox-system

# Restart containers
okactl restart sandbox my-sbx -c main -c sidecar -n sandbox-system
okactl restart sandbox my-sbx --all -n sandbox-system

# Batch-update claimed sandboxes
okactl create suo -l app=openclaw main=nginx:2.0 -n sandbox-system

# Check update progress
okactl status sbs my-sbs -n sandbox-system
okactl status suo my-suo -n sandbox-system

# Unit tests
go test ./pkg/cli/...

Ⅳ. Special notes for reviews

restart sandbox uses dynamic.Interface to create OpenKruise ContainerRecreateRequest (CRR) because CRR belongs to apps.kruise.io, not agents.kruise.io.
create suo deletes existing active SUOs before creating a new one. It first issues a Delete and waits for the controller to process finalizer cleanup (which also removes sandbox labels). Finalizer is force-removed via MergePatch only as a timeout fallback.
restart sandbox requires explicit -c or --all; it does not restart all containers silently by default.
Developer manual is also synced to the openkruise.io repo.

@kruise-bot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign zmberg for approval by writing /assign @zmberg in a comment. For more information see: The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@codecov

codecov Bot commented Jun 3, 2026

Codecov Report

❌ Patch coverage is 86.97749% with 81 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.13%. Comparing base (1ee08c4) to head (2146550).
⚠️ Report is 56 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/cli/restart.go | 83.75% | 17 Missing and 9 partials ⚠️ |
| pkg/cli/setimage.go | 86.97% | 16 Missing and 9 partials ⚠️ |
| pkg/cli/options.go | 74.54% | 11 Missing and 3 partials ⚠️ |
| pkg/cli/create.go | 89.00% | 6 Missing and 5 partials ⚠️ |
| pkg/cli/scale.go | 94.33% | 2 Missing and 1 partial ⚠️ |
| pkg/cli/status.go | 96.72% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #497      +/-   ##
==========================================
+ Coverage   78.34%   80.13%   +1.78%     
==========================================
  Files         162      208      +46     
  Lines       11739    15416    +3677     
==========================================
+ Hits         9197    12353    +3156     
- Misses       2187     2606     +419     
- Partials      355      457     +102     
Flag Coverage Δ
unittests 80.13% <86.97%> (+1.78%) ⬆️


Comment thread docs/agent-cli-design.md Outdated

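The OpenKruise Agents project originally had four components (controller, manager, gateway, runtime), all of them long-running services. In day-to-day operations, scaling a SandboxSet, updating images, or restarting containers could only be done with `kubectl edit` or hand-written YAML, which is tedious and error-prone.

To address this, we added a fifth component, **agent-cli**: a kubectl-style command-line tool that provides three core commands: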

Member

Consider renaming it to okactl.

Comment thread docs/agent-cli-design.md Outdated

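The first two commands (scale, set image) are **pure client-side operations**: they modify the SandboxSet CR directly through the K8s API, and the existing SandboxSet controller handles the change automatically.

The third command (restart) uses a **CRD-driven pattern** (modeled on OpenKruise's ContainerRecreateRequest): the CLI creates a `SandboxContainerRestart` CR, and a new controller watches it and performs the actual container restart.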

Member

Having the CLI create a CRR is enough; there is no need to create a SandboxContainerRestart CR.

Comment thread docs/agent-cli-design.md Outdated
@@ -0,0 +1,227 @@
# agent-cli Command-Line Tool Design and Implementation Document

Member

Please put the proposal in docs/proposals, and rewrite the design in English for a wider audience.

@Liquorice-Ma Liquorice-Ma changed the title from "新增agent-cli命令行工具" ("Add agent-cli command-line tool") to "feat(cli): add okactl command-line tool for sandbox operations" on Jun 5, 2026
feat(cli): add okactl command-line tool for sandbox operations

fix(cli): add in-cluster config support for okactl running inside Pods

Update okactl binary

okactl scale/setimage/restart -h
@Liquorice-Ma Liquorice-Ma force-pushed the agents-cli branch 4 times, most recently from 920a645 to 57d6ff2 Compare June 15, 2026 03:48
@Liquorice-Ma Liquorice-Ma force-pushed the agents-cli branch 2 times, most recently from 06fb203 to b48e3df Compare June 22, 2026 06:42
Add create suo subcommand that creates SandboxUpdateOps to batch update
container images of claimed sandboxes by label selector.
Includes auto-cleanup of existing SUOs, container name validation,
and correct PodTemplateSpec-level patch structure.

Also adds proposal doc for the create suo feature.
- Add E2E tests for scale, set image, create suo, and restart commands
- Add dedicated e2e-okactl.yaml workflow running on kind cluster
- Add unit tests to meet 80% coverage requirement
- Fix set image optimistic lock conflict with retry.RetryOnConflict
@Liquorice-Ma Liquorice-Ma force-pushed the agents-cli branch 2 times, most recently from 59cca84 to cbfe589 Compare June 22, 2026 09:37
… display

- Add top-level 'okactl status' command group with 'sbs' and 'suo' subcommands
  - status sbs: show SandboxSet rolling update progress with auto-diagnosis
  - status suo: show SandboxUpdateOps batch update progress
  - Both support --wait flag for polling until completion
- Add cobra Aliases for resource short names (sandboxset/sbs, sandboxupdateops/suo)
- Customize cobra usage template to display subcommand aliases in help output
- Remove 'set image status' subcommand (replaced by top-level 'status sbs')
- Update set image examples to reference 'okactl status sbs'
- Add developer manual for okactl CLI
- Add proposal document for status command design
- Add unit tests (coverage: 85.2%) and E2E tests for new commands

Signed-off-by: 马赫 <Yurong.mh@alibaba-inc.com>

@furykerry furykerry left a comment

Member

Please never commit binaries to the repo.

Comment thread pkg/cli/create.go Outdated
}

var deleted []string
for i := range list.Items {

Member

deleteActiveSandboxUpdateOps deletes ALL SUOs, not just active ones. Please delete only SUOs with Phase == Pending || Phase == Updating.
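The phase check requested here could be sketched as follows. The `suo` struct is a hypothetical stand-in for the real SandboxUpdateOps type, and the "Completed" phase used in the demo is an assumed name for a terminal phase; only "Pending" and "Updating" come from the comment above.

```go
package main

import "fmt"

// suo is a hypothetical stand-in for SandboxUpdateOps; only the fields
// needed for the phase check are modeled.
type suo struct {
	Name  string
	Phase string
}

// isActive reports whether an SUO is still in progress, using the two phase
// names given in the review comment.
func isActive(s suo) bool {
	return s.Phase == "Pending" || s.Phase == "Updating"
}

// activeNames returns the names of the SUOs that qualify for deletion.
func activeNames(items []suo) []string {
	var names []string
	for i := range items {
		if isActive(items[i]) {
			names = append(names, items[i].Name)
		}
	}
	return names
}

func main() {
	items := []suo{
		{Name: "a", Phase: "Pending"},
		{Name: "b", Phase: "Completed"}, // assumed terminal phase name
		{Name: "c", Phase: "Updating"},
	}
	fmt.Println(activeNames(items))
}
```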

Author

done

Comment thread pkg/cli/restart.go Outdated
}

if sbx.Spec.Template == nil {
return nil

Member

Please fetch the referenced SandboxTemplate for validation when Template is nil.

Author

done

Comment thread pkg/cli/setimage.go Outdated

// diagnoseSandboxSetUpdate checks sandboxes belonging to a SandboxSet and reports any issues.
// It builds a kubernetes client to inspect pod status when sandbox messages are empty.
func diagnoseSandboxSetUpdate(globalOpts *GlobalOptions, sbs *agentsv1alpha1.SandboxSet, reported *map[string]bool) {

Member

Each diagnosis call creates two new REST clients via globalOpts.AgentsClient() and globalOpts.KubeClient(), which involves re-reading kubeconfig and establishing new TLS connections. In --wait mode this happens every 3 poll cycles (~9s); consider passing the already-created clients as parameters to diagnoseSandboxSetUpdate.

Comment thread pkg/cli/create.go
}

// formatSuoImagePairs formats a map of container=image pairs as a slice of "container=image" strings.
func formatSuoImagePairs(images map[string]string) []string {

Member

formatSuoImagePairs() and buildSuoImagePatch() generate non-deterministic output from map iteration; consider sorting the keys before iterating in both functions.
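A minimal sketch of the sort-before-iterate pattern suggested here. The function name `formatImagePairs` is illustrative and is not the PR's actual helper:

```go
package main

import (
	"fmt"
	"sort"
)

// formatImagePairs renders container=image pairs in sorted key order so the
// output is identical on every run, regardless of map iteration order.
func formatImagePairs(images map[string]string) []string {
	keys := make([]string, 0, len(images))
	for k := range images {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, k+"="+images[k])
	}
	return pairs
}

func main() {
	fmt.Println(formatImagePairs(map[string]string{
		"sidecar": "envoy:2.0",
		"main":    "nginx:2.0",
	}))
}
```

The same key ordering can be reused when building the patch, so that two runs with the same arguments produce byte-identical requests.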

Author

done

Comment thread pkg/cli/setimage.go Outdated
}

// parseContainerImages parses "container=image" pairs and returns a map.
func parseContainerImages(args []string) (map[string]string, error) {

Member

setimage.go and create.go duplicate parseContainerImages / parseSuoContainerImages. Consider extracting a shared parseImageArgs() to reduce duplication.
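One possible shape for the shared helper is sketched below. The validation rules (rejecting empty names, empty images, and duplicate containers) are assumptions, since the bodies of the two existing parsers are not shown in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// parseImageArgs parses "container=image" arguments into a map. It rejects
// malformed pairs, empty names or images, and duplicate container names.
func parseImageArgs(args []string) (map[string]string, error) {
	if len(args) == 0 {
		return nil, fmt.Errorf("at least one CONTAINER=IMAGE pair is required")
	}
	images := make(map[string]string, len(args))
	for _, arg := range args {
		// Split on the first "=" only, so the image part is kept intact.
		name, image, ok := strings.Cut(arg, "=")
		if !ok || name == "" || image == "" {
			return nil, fmt.Errorf("invalid argument %q: expected CONTAINER=IMAGE", arg)
		}
		if _, dup := images[name]; dup {
			return nil, fmt.Errorf("container %q specified more than once", name)
		}
		images[name] = image
	}
	return images, nil
}

func main() {
	images, err := parseImageArgs([]string{"main=nginx:2.0", "sidecar=envoy:2.0"})
	fmt.Println(images, err)

	_, err = parseImageArgs([]string{"main"})
	fmt.Println(err)
}
```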

Author

done

okactl status sbs my-pool --wait

# Batch update images for claimed sandboxes
okactl create suo -l app=my-app app=nginx:1.25

Member

Is it possible to consolidate the create suo command with set image?

okactl set image -l app=my-app app=nginx:1.25

Author

Originally set image supported both sbs and sbx, but create suo was split out in an earlier review so that set image only handles SandboxSet while claimed sandboxes are updated via SUO.

Comment thread docs/developer-manuals/okactl.md Outdated
okactl set image sbs my-pool app=nginx:1.25

# Check update progress (or wait for completion)
okactl status sbs my-pool --wait

Member

--wait should be an option of set image, not status.

Author

done

马赫 added 2 commits June 24, 2026 12:01
Binary is now distributed via GitHub Releases instead of committing
to the repository.

Signed-off-by: 马赫 <Yurong.mh@alibaba-inc.com>
- Use apierrors.IsNotFound instead of string matching in waitForSUODeletion
- Replace parseSuoSelectorToMap with metav1.ParseToLabelSelector for full
  label selector syntax support (key in (v1,v2), key!=value, etc.)
- Validate container names against all matching sandboxes instead of only
  the first; warn on partial mismatch, error only when missing from all
- Add Running phase check to restart command before creating CRR
- Remove --wait flag from status command (belongs to set image only)
- Sync developer manual and proposal documents

Signed-off-by: 马赫 <Yurong.mh@alibaba-inc.com>

@zmberg zmberg left a comment

Member

Reviewed a couple of items — see inline comments.

Comment thread pkg/cli/restart.go
}

containers := o.containers
if len(containers) == 0 {

Member

Restarting ALL containers by default is dangerous.

When no -c flag is provided, the command silently restarts every container in the sandbox (including sidecars, init containers, etc.). A user might accidentally run okactl restart sbx my-sbx without -c and cause a production incident.

Suggested fix: make -c required and add an explicit --all flag:

okactl restart sbx my-sbx -c app        # restart specific container
okactl restart sbx my-sbx --all         # explicitly restart all
okactl restart sbx my-sbx               # error: must specify -c or --all

Alternatively, when no -c is given, list available container names and ask the user to specify explicitly.
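The flag rule proposed above could be validated along these lines. This is a sketch: the helper name `resolveTargets` is hypothetical, and treating -c and --all as mutually exclusive is an added assumption not stated in the comment.

```go
package main

import (
	"errors"
	"fmt"
)

// resolveTargets decides which containers to restart. Either containers (-c)
// or all (--all) must be given; available is the list of container names
// found in the sandbox.
func resolveTargets(containers []string, all bool, available []string) ([]string, error) {
	switch {
	case all && len(containers) > 0:
		// Assumption: combining both flags is treated as a usage error.
		return nil, errors.New("-c and --all are mutually exclusive")
	case all:
		return available, nil
	case len(containers) > 0:
		return containers, nil
	default:
		// Listing the available names covers the alternative suggestion above.
		return nil, fmt.Errorf("must specify -c or --all (available containers: %v)", available)
	}
}

func main() {
	available := []string{"app", "sidecar"}
	fmt.Println(resolveTargets([]string{"app"}, false, available))
	fmt.Println(resolveTargets(nil, true, available))
	fmt.Println(resolveTargets(nil, false, available))
}
```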

Comment thread cmd/okactl/main.go
limitations under the License.
*/

package main

Member

PR description is severely outdated and does not match the actual implementation.

The PR description mentions:

  • Binary name: agent-cli → Actual: okactl
  • Path: cmd/agent-cli/ → Actual: cmd/okactl/
  • New CRD: SandboxContainerRestart (scr) → Actual: uses OpenKruise ContainerRecreateRequest (CRR), no new CRD
  • New controller: pkg/controller/sandboxcontainerrestart/ → Actual: no new controller
  • 3 commands → Actual: 5 commands (scale, set image, restart, create suo, status)
  • Included 47MB binary to be removed → Actual: binary already removed, .gitignore added

Please update the PR description to accurately reflect the current implementation.

@zmberg zmberg left a comment

Member

Code-level review — see 8 inline comments below covering error handling, missing timeouts, hardcoded values, and logic gaps.

Comment thread pkg/cli/setimage.go Outdated

printSandboxSetStatus(sbs)
var reported map[string]bool
kubeClient, _ := globalOpts.KubeClient()

Member

Error silently ignored. Project guidelines state "Never ignore errors. Always check and handle err."

If the kubeconfig is invalid, the diagnosis feature will silently skip with no feedback to the user. Consider returning the error or at least printing a warning.

kubeClient, err := globalOpts.KubeClient()
if err != nil {
    fmt.Fprintf(os.Stderr, "Warning: failed to create kube client for diagnosis: %v\n", err)
}

Comment thread pkg/cli/setimage.go
// Create kubeClient once for diagnosis instead of on every poll cycle
kubeClient, _ := globalOpts.KubeClient()

for {

Member

No timeout for --wait. The polling loop runs forever with no exit condition beyond completion. If the update never completes (e.g., misconfigured image, persistent resource shortage), the CLI hangs indefinitely.

Also, ctx.Done() is never checked, which means Ctrl+C / SIGTERM won't be handled gracefully.

Suggestion: add a --timeout flag (default e.g. 5m) and use context.WithTimeout. Also add select { case <-ctx.Done(): return ctx.Err(); case <-time.After(pollInterval): } instead of time.Sleep.
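The suggested loop can be sketched as below. `waitUntil` and `pollDemo` are illustrative names, and the short durations exist only so the demo finishes quickly; in the CLI the timeout would come from the proposed --timeout flag.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waitUntil polls done every interval until it reports completion, returns
// an error, or ctx is cancelled or times out.
func waitUntil(ctx context.Context, interval time.Duration, done func(context.Context) (bool, error)) error {
	for {
		ok, err := done(ctx)
		if err != nil {
			return err
		}
		if ok {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(interval):
		}
	}
}

// pollDemo simulates an update that completes after completeAfter polls
// (0 means it never completes) under a short, fixed timeout.
func pollDemo(completeAfter int) error {
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()

	polls := 0
	return waitUntil(ctx, 5*time.Millisecond, func(context.Context) (bool, error) {
		polls++
		return completeAfter > 0 && polls >= completeAfter, nil
	})
}

func main() {
	fmt.Println(pollDemo(3)) // update finishes before the timeout
	fmt.Println(pollDemo(0)) // update never finishes, so the timeout fires
}
```

For Ctrl+C handling, the parent context could be created with `signal.NotifyContext` from the standard library, so that an interrupt cancels the same `ctx` the loop is already watching.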

Comment thread pkg/cli/setimage.go Outdated
}

sbxList, err := agentsClient.Sandboxes(ns).List(context.TODO(), metav1.ListOptions{
LabelSelector: fmt.Sprintf("agents.kruise.io/sandbox-template=%s", sbs.Name),

Member

Hardcoded label selector key. The string "agents.kruise.io/sandbox-template" should use the constant from the API package (e.g. agentsv1alpha1.LabelSandboxTemplate). If the label key ever changes, this will silently break.

Comment thread pkg/cli/create.go Outdated
return fmt.Errorf("invalid label selector %q: %w", o.selector, err)
}

// List all sandboxes and filter by label selector

Member

Client-side filtering instead of server-side. All sandboxes are listed and then filtered locally. For namespaces with many sandboxes this is inefficient and puts unnecessary load on the API server.

Suggestion: pass the selector to the API server directly:

sbxList, err := client.Sandboxes(ns).List(ctx, metav1.ListOptions{
    LabelSelector: o.selector,
})

This also makes the labels.Parse call at line 112 unnecessary.

Comment thread pkg/cli/create.go Outdated
for i := range sandboxes {
sbx := &sandboxes[i]
known := make(map[string]bool)
if sbx.Spec.Template != nil {

Member

validateSuoImageContainers doesn't handle TemplateRef sandboxes. When sbx.Spec.Template == nil (TemplateRef is used), the known map is empty, so every requested container will be counted as "missing" for that sandbox. This could trigger false positive warnings or even block creation if ALL matched sandboxes use TemplateRef.

Suggestion: resolve container names from the referenced SandboxTemplate when Template is nil, similar to how fetchContainerNames in restart.go handles it.

Comment thread pkg/cli/create.go Outdated
// removeSUOFinalizer removes the finalizer from a SandboxUpdateOps via JSON patch.
// This allows the SUO to be deleted immediately without waiting for the controller to process it.
func removeSUOFinalizer(client apiv1alpha1.ApiV1alpha1Interface, ns, name string) error {
patch := []byte(`[{"op":"replace","path":"/metadata/finalizers","value":[]}]`)

Member

Hardcoded JSON Patch with replace op. The replace operation on /metadata/finalizers will fail if the field doesn't exist (no finalizers). Consider using MergePatch with {"metadata":{"finalizers":[]}} which is idempotent regardless of whether finalizers exist.

Also, this bypasses the SUO controller's finalizer logic. If the controller relies on the finalizer to clean up internal state (e.g., expectations, rate limiters), removing it externally could leave stale state behind.
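To illustrate why the merge patch is idempotent, here is a small local model of JSON Merge Patch (RFC 7386) semantics applied to an object with and without finalizers. It is only a sketch of the semantics: the real request would be sent to the API server with `types.MergePatchType`, and the finalizer name used in the demo is made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mergePatch applies a JSON Merge Patch (RFC 7386) to target in place:
// objects are merged recursively, null deletes a key, and any other value
// (including arrays) replaces the existing one.
func mergePatch(target, patch map[string]interface{}) {
	for k, v := range patch {
		if v == nil {
			delete(target, k)
			continue
		}
		pObj, ok := v.(map[string]interface{})
		if !ok {
			target[k] = v
			continue
		}
		tObj, ok := target[k].(map[string]interface{})
		if !ok {
			tObj = map[string]interface{}{}
			target[k] = tObj
		}
		mergePatch(tObj, pObj)
	}
}

// clearFinalizers applies the suggested patch to a JSON object and returns
// the patched object as JSON.
func clearFinalizers(objJSON string) (string, error) {
	var obj, patch map[string]interface{}
	if err := json.Unmarshal([]byte(objJSON), &obj); err != nil {
		return "", err
	}
	if err := json.Unmarshal([]byte(`{"metadata":{"finalizers":[]}}`), &patch); err != nil {
		return "", err
	}
	mergePatch(obj, patch)
	out, err := json.Marshal(obj)
	return string(out), err
}

func main() {
	for _, in := range []string{
		`{"metadata":{"name":"suo-1","finalizers":["example.io/cleanup"]}}`, // made-up finalizer
		`{"metadata":{"name":"suo-1"}}`,                                     // no finalizers field
	} {
		out, err := clearFinalizers(in)
		if err != nil {
			panic(err)
		}
		fmt.Println(out)
	}
}
```

Both inputs end in the same state, whereas a JSON Patch `replace` on `/metadata/finalizers` requires the path to exist and would be rejected for the second input.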

Comment thread pkg/cli/restart.go Outdated
"podName": sandboxName,
"containers": containerTargets,
"strategy": map[string]interface{}{
"failurePolicy": "Fail",

Member

failurePolicy hardcoded to "Fail". Users cannot choose the Ignore policy, which would continue restarting remaining containers even if one fails. Consider adding a --failure-policy flag to let users choose between Fail and Ignore.
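Validating the flag value before building the CRR might look like the sketch below. The two accepted values come from the comment above; the helper name and the case-sensitive matching are assumptions.

```go
package main

import "fmt"

// Accepted values for --failure-policy, as named in the review comment.
const (
	failurePolicyFail   = "Fail"
	failurePolicyIgnore = "Ignore"
)

// validateFailurePolicy returns an error unless policy is Fail or Ignore.
// Matching is case-sensitive in this sketch.
func validateFailurePolicy(policy string) error {
	switch policy {
	case failurePolicyFail, failurePolicyIgnore:
		return nil
	default:
		return fmt.Errorf("invalid --failure-policy %q: must be %s or %s",
			policy, failurePolicyFail, failurePolicyIgnore)
	}
}

func main() {
	fmt.Println(validateFailurePolicy("Ignore"))
	fmt.Println(validateFailurePolicy("Retry"))
}
```

Rejecting unknown values on the client side gives the user an immediate error instead of a CRR that the API server or controller rejects later.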

@zmberg zmberg left a comment

Member

Additional code-level review — 8 more inline comments covering CLI responsibility boundaries, error handling, and type safety.

General note: create.go currently only contains the create suo subcommand. Consider renaming to createsuo.go for consistency with setimage.go naming convention.

Comment thread pkg/cli/create.go Outdated
return fmt.Errorf("invalid label selector %q: %w", o.selector, err)
}

// List all sandboxes and filter by label selector

Member

CLI should not perform sandbox listing, filtering, or container validation. These three blocks (list sandboxes, client-side filter by label selector, validate container names) are the SUO controller's responsibility, not the CLI's. The CLI's job is simply to create the SUO resource with the selector and patch — the controller will match sandboxes by selector and record any container name issues in the SUO status.

Problems with CLI doing this validation:

  1. Client-side filtering is inefficient (already noted in a separate comment)
  2. Validation logic is duplicated with the controller, requiring maintenance in two places
  3. TOCTOU issue: the check inspects current state, but sandboxes may change before the SUO is processed

Suggestion: Remove the list/filter/validate block entirely. Just build the SUO spec and create it.

Comment thread pkg/cli/create.go Outdated
}

// Delete any active (non-terminal) SUO before creating a new one
if err := deleteActiveSandboxUpdateOps(client, ns); err != nil {

Member

CLI should not manage existing resource lifecycles. A CLI tool's job is to create resources, not to delete active SUOs or force-remove finalizers. If there's a conflict with an existing active SUO, the API server or an admission webhook should reject the new creation with a clear error.

Additionally, deleteActiveSandboxUpdateOps deletes ALL active SUOs in the namespace regardless of selector, which could affect unrelated operations.

Suggestion: Remove this block entirely. If conflict detection is needed, add it as an admission webhook on the API server side.

Comment thread pkg/cli/create.go
return err
}

labelSelector, err := metav1.ParseToLabelSelector(o.selector)

Member

Label selector is parsed twice. labels.Parse at line 112 for client-side filtering, and metav1.ParseToLabelSelector here for the SUO spec. If the client-side filtering is removed (as suggested above), only this parse is needed.

Comment thread pkg/cli/create.go Outdated
const maxWait = 10 * time.Second
const pollInterval = 500 * time.Millisecond

for elapsed := time.Duration(0); elapsed < maxWait; elapsed += pollInterval {

Member

elapsed counter is inaccurate. It only accumulates pollInterval but ignores the actual time spent in the Get call. Under slow API server conditions, the real wait duration could significantly exceed maxWait.

Suggestion: Use context.WithTimeout and check ctx.Done() instead.

Comment thread pkg/cli/options.go
// When running locally, it falls back to the kubeconfig file.
func (o *GlobalOptions) RESTConfig() (*rest.Config, error) {
if o.KubeConfig == "" && o.Context == "" {
if cfg, err := inClusterConfigFn(); err == nil {

Member

In-cluster config failure is silently swallowed. If the user expects in-cluster config to work but it fails (e.g., missing ServiceAccount token), the code silently falls back to kubeconfig with no indication. Consider logging at debug level when in-cluster config is attempted but fails.

Comment thread pkg/cli/restart.go
})
}

// CRR targets the Pod which has the same name as the Sandbox

Member

Should use the official OpenKruise typed client instead of unstructured.Unstructured. OpenKruise provides a type-safe client API at github.com/openkruise/kruise-api that includes ContainerRecreateRequest typed definitions. Using map[string]interface{} to manually construct the CRR is error-prone and hard to maintain.

Reference: https://github.com/openkruise/kruise-api

Benefits of using the typed client:

  1. Compile-time type checking, preventing field name typos
  2. No need to manually maintain apiVersion/kind strings
  3. DeepCopy and serialization are guaranteed by generated code
  4. Direct use of apps.kruise.io/v1alpha1.ContainerRecreateRequest type

Comment thread pkg/cli/restart.go Outdated
"apiVersion": "apps.kruise.io/v1alpha1",
"kind": "ContainerRecreateRequest",
"metadata": map[string]interface{}{
"generateName": sandboxName + "-restart-",

Member

generateName causes CRR accumulation. Each restart creates a new CRR with no cleanup of previous ones. Multiple restarts of the same sandbox will accumulate CRR objects in the namespace.

Suggestion: Consider checking for an existing active CRR and reusing it, or using a deterministic name to avoid duplication.

Comment thread pkg/cli/setimage.go Outdated
}
for container := range images {
if !found[container] {
return fmt.Errorf("container %q not found in sandboxset %q", container, name)

Member

Misleading error wrapping inside RetryOnConflict. This validation error is correctly not retried, but it gets wrapped by the outer error as "failed to update sandboxset: container not found" — misleading because the update didn't fail, the pre-update validation did.

Suggestion: Move validation outside RetryOnConflict, or return a sentinel error that skips the outer wrapping.
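The sentinel-error option could be sketched like this. `errValidation`, `notFound`, and `wrapUpdateErr` are hypothetical names; `errConflict` stands in for whatever error the retried update itself might return.

```go
package main

import (
	"errors"
	"fmt"
)

// errValidation marks failures that happen before any update is attempted.
var errValidation = errors.New("validation failed")

// errConflict stands in for a genuine update failure in this sketch.
var errConflict = errors.New("conflict")

// notFound builds a validation error for a missing container.
func notFound(container, sbs string) error {
	return fmt.Errorf("%w: container %q not found in sandboxset %q", errValidation, container, sbs)
}

// wrapUpdateErr adds the "failed to update" prefix only to real update
// failures; validation errors pass through unchanged.
func wrapUpdateErr(err error) error {
	if err == nil || errors.Is(err, errValidation) {
		return err
	}
	return fmt.Errorf("failed to update sandboxset: %w", err)
}

func main() {
	fmt.Println(wrapUpdateErr(notFound("app", "my-sbs")))
	fmt.Println(wrapUpdateErr(errConflict))
}
```

The error returned from the retried closure would be passed through `wrapUpdateErr` once, after the retry helper returns, so callers can still use `errors.Is` to tell the two cases apart.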

@Liquorice-Ma Liquorice-Ma force-pushed the agents-cli branch 2 times, most recently from b56e69e to 8519b29 Compare June 29, 2026 05:57
fix 16 comments

4 participants