fix: thread StatusSocketPath through autopilot controllers #7605

Open
kimjune01 wants to merge 2 commits into k0sproject:main from kimjune01:fix/autopilot-status-socket-path

@kimjune01

Description

Autopilot hardcoded status.DefaultSocketPath in multiple controllers, breaking --status-socket and custom --run-dir configurations. The socket path configured via CfgVars.StatusSocketPath was never threaded through to autopilot's signal controllers or the cron updater.

This adds StatusSocketPath to RootConfig and passes it through:

  • signal.RegisterControllers → k0s.RegisterControllers
  • restart / restarted reconcilers (version check and PID lookup)
  • updates.RegisterControllers → newCronUpdater (status info fetch)
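
For readers unfamiliar with the pattern, the threading in the list above could look roughly like the sketch below. Only `RootConfig` and `StatusSocketPath` are names taken from this PR; `restartReconciler`, `newRestartReconciler`, and `registerControllers` are hypothetical stand-ins and do not reflect the actual k0s signatures.

```go
package main

import "fmt"

// RootConfig mirrors the idea of autopilot's root configuration.
// Only StatusSocketPath comes from the PR description; all other
// fields of the real struct are omitted here.
type RootConfig struct {
	StatusSocketPath string
}

// restartReconciler is a hypothetical consumer that needs the socket path.
type restartReconciler struct {
	statusSocketPath string
}

// newRestartReconciler receives the path explicitly instead of reading
// a package-level default.
func newRestartReconciler(statusSocketPath string) *restartReconciler {
	return &restartReconciler{statusSocketPath: statusSocketPath}
}

// registerControllers is a hypothetical registration step that hands the
// configured path down to every consumer it constructs.
func registerControllers(cfg RootConfig) []*restartReconciler {
	return []*restartReconciler{newRestartReconciler(cfg.StatusSocketPath)}
}

func main() {
	// The path below is only an illustrative value.
	cfg := RootConfig{StatusSocketPath: "/run/k0s/custom/status.sock"}
	for _, r := range registerControllers(cfg) {
		fmt.Println(r.statusSocketPath)
	}
}
```

Passing the value through constructors keeps each consumer free of global state, which is what makes a non-default socket path reachable at all.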

Falls back to status.DefaultSocketPath when StatusSocketPath is empty, preserving existing behavior for default configurations.
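
A minimal sketch of that fallback, assuming a helper along these lines. `socketPathOrDefault` is a hypothetical name, and `defaultSocketPath` stands in for `status.DefaultSocketPath`, using the default path quoted elsewhere in this thread; the actual change may apply the fallback differently.

```go
package main

import "fmt"

// defaultSocketPath stands in for status.DefaultSocketPath. The value is
// the default path quoted in this thread and is an assumption here.
const defaultSocketPath = "/run/k0s/status.sock"

// socketPathOrDefault is a hypothetical helper: it returns the configured
// path, or the default when nothing was configured.
func socketPathOrDefault(configured string) string {
	if configured == "" {
		return defaultSocketPath
	}
	return configured
}

func main() {
	fmt.Println(socketPathOrDefault(""))
	fmt.Println(socketPathOrDefault("/run/k0s/custom/status.sock"))
}
```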

Fixes #6750

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

How Has This Been Tested?

Verified the StatusSocketPath propagation manually by tracing the call chain from component/controller/autopilot.go and component/worker/autopilot.go through to each consumer. All previously hardcoded status.DefaultSocketPath references are replaced with the threaded value.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • My commit messages are signed-off

Autopilot hardcoded status.DefaultSocketPath in multiple controllers,
breaking --status-socket and custom --run-dir configurations.

Thread StatusSocketPath from CfgVars through RootConfig into signal
controllers (restart, restarted, version handler), and the cron updater.
Fall back to DefaultSocketPath when StatusSocketPath is empty.

Fixes k0sproject#6750

Signed-off-by: June Kim <kimjune01@gmail.com>
@kimjune01 kimjune01 requested review from a team as code owners May 9, 2026 16:50
@kimjune01 kimjune01 requested review from makhov and twz123 May 9, 2026 16:50
@twz123

twz123 commented May 11, 2026

Member

Thanks! Were you able to test this?

@kimjune01

Author

No runtime testing yet. The verification was through manual code inspection of the call chain. I can set up a test configuration with --status-socket if that would be helpful for validating the fix.

@kimjune01

Author

Yes, I've now run the full autopilot unit test suite locally against this branch:

$ go test ./pkg/autopilot/... -count=1
ok   github.com/k0sproject/k0s/pkg/autopilot/channels         0.518s
ok   github.com/k0sproject/k0s/pkg/autopilot/checks            0.889s
ok   github.com/k0sproject/k0s/pkg/autopilot/controller         1.390s
ok   github.com/k0sproject/k0s/pkg/autopilot/controller/delegate      2.761s
ok   github.com/k0sproject/k0s/pkg/autopilot/controller/plans         2.462s
ok   github.com/k0sproject/k0s/pkg/autopilot/controller/plans/cmdprovider/airgapupdate  6.282s
ok   github.com/k0sproject/k0s/pkg/autopilot/controller/plans/cmdprovider/k0supdate     3.652s
ok   github.com/k0sproject/k0s/pkg/autopilot/controller/plans/cmdprovider/k0supdate/discovery  7.935s
ok   github.com/k0sproject/k0s/pkg/autopilot/controller/plans/cmdprovider/k0supdate/utils      1.954s
ok   github.com/k0sproject/k0s/pkg/autopilot/controller/plans/core     4.132s
ok   github.com/k0sproject/k0s/pkg/autopilot/controller/signal/airgap  5.481s
ok   github.com/k0sproject/k0s/pkg/autopilot/controller/signal/common  4.849s
ok   github.com/k0sproject/k0s/pkg/autopilot/controller/signal/common/predicate  4.412s
ok   github.com/k0sproject/k0s/pkg/autopilot/controller/signal/k0s     7.166s
ok   github.com/k0sproject/k0s/pkg/autopilot/signaling/v2      7.189s
ok   github.com/k0sproject/k0s/pkg/autopilot/updater            6.175s

go build and go vet also pass clean on all changed packages (pkg/autopilot/..., pkg/component/controller/, pkg/component/worker/).

What I haven't tested: running an actual k0s cluster with --status-socket or a custom --run-dir to confirm the socket path reaches the autopilot controllers at runtime. That would require a multi-node cluster with autopilot enabled, which I don't have set up locally. If there's a preferred integration test setup or a smoke test I should run, happy to do that.

@twz123

twz123 commented May 13, 2026

Member

> What I haven't tested: running an actual k0s cluster with --status-socket or a custom --run-dir to confirm the socket path reaches the autopilot controllers at runtime. That would require a multi-node cluster with autopilot enabled, which I don't have set up locally. If there's a preferred integration test setup or a smoke test I should run, happy to do that.

You could parameterize one of the existing integration tests to use a custom run dir and a custom status socket. Not sure if it even has to be a multi-node cluster that gets updated? At least for the status socket, the code looks like it should fail for a single node cluster, as well, AFAICT.

I'd try to add a variant for, say, the ap-single integration test à la ap-single-custom-socket or sth. along the lines (see how os.Getenv("K0S_INTTEST_TARGET") is used for parameterization in other inttests), and then see if that's enough to repro the problem. But beware, the AP tests have some ... limitations 🙈: #3719. You might need to double check manually, or fix that issue first 😅
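
As a rough illustration of that kind of parameterization, a test could branch on the environment variable to pick extra controller flags. This is only a sketch: `controllerArgs` is a hypothetical helper, the socket path is an illustrative value, and the real inttest suites wire their arguments through their own setup code.

```go
package main

import (
	"fmt"
	"os"
)

// controllerArgs is a hypothetical helper that maps a test target name to
// the extra flags the controller should be started with.
func controllerArgs(target string) []string {
	switch target {
	case "ap-single-custom-socket":
		// Illustrative path; any non-default socket location would do.
		return []string{"--status-socket=/run/k0s/custom/status.sock"}
	default:
		// The plain ap-single variant keeps the defaults.
		return nil
	}
}

func main() {
	args := controllerArgs(os.Getenv("K0S_INTTEST_TARGET"))
	fmt.Printf("extra controller args: %q\n", args)
}
```

Keeping the mapping in one function means the default variant stays untouched and new variants only add a case.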

… run-dir

Adds a K0S_INTTEST_TARGET=ap-single-custom-socket variant to the existing
ap-single integration test. When invoked under that target, the controller
is launched with --status-socket=/run/k0s/custom/status.sock and
--data-dir=/var/lib/k0s-custom, exercising the StatusSocketPath plumbing
that the parent commit threads through the autopilot signal controllers.

After the k0supdate Plan completes, the test cross-checks (defense against
known issue k0sproject#3719) that the custom socket file exists and the default
/run/k0s/status.sock does not. With the previous hardcoded-default-socket
behavior, autopilot's post-restart status probe would have created or
contacted the default path.

The new check target reuses the ap-single test package and is wired into
the smoketests list so CI picks it up.

Refs k0sproject#6750

Signed-off-by: June Kim <kimjune01@gmail.com>
@kimjune01

Author

Pushed 6a6574c parameterizing ap-single per your suggestion. New target K0S_INTTEST_TARGET=ap-single-custom-socket reuses the existing ap-single package; when set, SetupTest launches the controller with --status-socket=/run/k0s/custom/status.sock --data-dir=/var/lib/k0s-custom, then TestApply runs the same k0supdate Plan it always runs. After the Plan reaches PlanCompleted, the test asserts the custom socket exists and /run/k0s/status.sock does not — that second check is the regression guard against the bug, since prior to this PR the autopilot signal controllers' post-restart probe would have hit the hardcoded default path. (I added it explicitly because of #3719: PlanCompleted alone isn't sufficient evidence the cluster path actually worked.) Wired into smoketests so CI runs it.
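
The two socket assertions described above amount to something like the following sketch. `checkSockets` is a hypothetical helper that inspects the local filesystem; the actual test presumably runs its checks on the bootloose node instead, so treat this as an outline of the logic only.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// checkSockets returns nil only if customPath exists and defaultPath does
// not. It is a hypothetical helper, not code from the PR.
func checkSockets(customPath, defaultPath string) error {
	if _, err := os.Stat(customPath); err != nil {
		return fmt.Errorf("custom socket %q not found: %w", customPath, err)
	}

	_, err := os.Stat(defaultPath)
	switch {
	case err == nil:
		// Regression guard: something created or kept the default path.
		return fmt.Errorf("default socket %q exists unexpectedly", defaultPath)
	case errors.Is(err, fs.ErrNotExist):
		return nil
	default:
		return fmt.Errorf("cannot stat default socket %q: %w", defaultPath, err)
	}
}

func main() {
	// Paths as quoted in the comment above.
	err := checkSockets("/run/k0s/custom/status.sock", "/run/k0s/status.sock")
	fmt.Println("check result:", err)
}
```

Treating stat errors other than "not exist" as failures avoids passing the check when the default path is merely unreadable.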

Verified locally: go vet ./inttest/ap-single/..., go vet ./pkg/autopilot/..., and a compile-only go test -run NONE ./inttest/ap-single/... all pass; make -n check-ap-single-custom-socket resolves to the expected K0S_INTTEST_TARGET=ap-single-custom-socket go test ... github.com/k0sproject/k0s/inttest/ap-single. The actual bootloose run needs CI — I don't have a Docker-on-Linux setup that can spin the alpine bootloose image. If the post-Plan socket assertions turn out to be too strict on bootloose (e.g. some other component touches /run/k0s/status.sock before the autopilot controllers run), happy to relax to "custom socket exists" only.

@github-actions

github-actions Bot commented Jun 4, 2026

Contributor

This pull request has merge conflicts that need to be resolved.

Development

Successfully merging this pull request may close these issues.

Non-default --status-socket or --run-dir break autopilot

2 participants