fix: change priorityclass default for nllb pod by cmwylie19 · Pull Request #7787 · k0sproject/k0s

cmwylie19 · 2026-06-11T22:03:15Z

Description

The nllb pod has a very low priority class, during a graceful shutdown kubelet removes it first which results in a severed path from the worker to the kas (kube-apiserver) before the other pods can drain or report statuses.

This was a suggestion from @jnummelin. I still think we need a way to add graceful shutdown too (7783) but @jnummelin suggests there may be another way.

Noticed an error in the tests when setting only the numeric Priority on the static Pod broke the check-nllb jobs. Tthe kubelet runs the static Pod fine locally, but for every static Pod it also creates a read-only "mirror" Pod in the API so the Pod is visible to kubectl in the cluster, and that create call goes through normal admission. When it registers the mirror Pod in the kas, the Priority admission controller rejects an explicit integer priority that doesn't match a priorityClassName (priority admission controller computed 0 from the given PriorityClass name). So the Pod never became ready and the tests timed out. So that is why i needed to add the priorityClassName too

Fixes #7786
Relates to #7783

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update

How Has This Been Tested?

Manual test
[n/a] Auto test added

Checklist

Signed-off-by: Case Wylie <cmwylie19@gmail.com>

pschichtel · 2026-06-12T00:05:14Z

Does the priority actually matter? We concluded in #6958, where this was considered before, that it probably doesn't, since nllb is running as static pods and the pod resource doesn't actually influence scheduling and execution of these pods.

twz123 · 2026-06-12T07:47:54Z

+			// The Envoy Pod is the worker's load-balanced path to the control
+			// plane, so it must outlive ordinary workloads during graceful node
+			// shutdown and be protected from node-pressure eviction.
+			PriorityClassName: "system-node-critical",


We need the same for the Traefik pod, as well, then.

AFAIK we need to use the numerical priorities only for static pods as kubelet does NOT understand the priority classes directly. It's the api-server / controller-manager that maps the prio class to numerical value at pod creation time.

Gotcha, TY for the feedback. I will update this to adjust to priority only!

✅ @jnummelin let me know if you or anyone else wants to see any more changes

jnummelin · 2026-06-12T07:48:19Z

hmm, interesting. Based on this I personally think eviction/drain cases are treated the same as for other pods.

I think we really need to do some testing on this.

Also, I don't think we can use priorityClass on a static pod. That's something the api-server uses to map into a priority number. So AFAIK kubelet doesn't really understand the priorityClass per se at all, only the numerical values.

jnummelin · 2026-06-12T08:40:11Z

I think more importantly, priority affects the order of things in the case of a gracefull node shutdown:

During a graceful shutdown, kubelet terminates pods in two phases:

Terminate regular pods running on the node.
Terminate [critical pods](https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-pod-as-critical) running on the node.

ref: https://kubernetes.io/docs/concepts/cluster-administration/node-shutdown/

cmwylie19 · 2026-06-12T12:07:47Z

hmm, interesting. Based on this I personally think eviction/drain cases are treated the same as for other pods.

I think we really need to do some testing on this.

Also, I don't think we can use priorityClass on a static pod. That's something the api-server uses to map into a priority number. So AFAIK kubelet doesn't really understand the priorityClass per se at all, only the numerical values.

Gotcha, I am going to add some debugging notes @Josh-Tracy from testing.

First config out of the box

> k get cm -n kube-system worker-config-default-1.32 -o yaml | grep Grace
\"shutdownGracePeriod\":\"0s\",\"shutdownGracePeriodCriticalPods\":\"0s\"

> systemd-inhibit --list | grep kubelet #no output, good.

Updated config for this test - 30s gives use 15 seconds for normal pods and 15 seconds for critical pods.

> cat /etc/k0s/k0s.yaml 
apiVersion: k0s.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: k0s
spec:
  featureGates:
    - name: ImageVolume
      enabled: true
  network:
    nodeLocalLoadBalancing:
      enabled: true
      type: EnvoyProxy
  workerProfiles:
    - name: default
      values:
        shutdownGracePeriod: 30s
        shutdownGracePeriodCriticalPods: 15s
> systemctl restart k0sworker
> systemctl restart k0scontroller

What the worker logs

> journalctl -u k0sworker -f
The system will power off now!

Jun 11 14:34:17 canes-da6a k0s[1346]: time="2026-06-11 14:34:17" level=info msg="I0611 14:34:17.128688    1722 nodeshutdown_manager_linux.go:236] \"Shutdown manager detected new shutdown event, isNodeShuttingDownNow\" event=true" component=kubelet stream=stderr
Jun 11 14:34:17 canes-da6a k0s[1346]: time="2026-06-11 14:34:17" level=info msg="I0611 14:34:17.128729    1722 nodeshutdown_manager_linux.go:244] \"Shutdown manager detected new shutdown event\" event=\"shutdown\"" component=kubelet stream=stderr
Jun 11 14:34:17 canes-da6a k0s[1346]: time="2026-06-11 14:34:17" level=info msg="I0611 14:34:17.128761    1722 nodeshutdown_manager_linux.go:293] \"Shutdown manager processing shutdown event\"" component=kubelet stream=stderr
Jun 11 14:34:17 canes-da6a k0s[1346]: time="2026-06-11 14:34:17" level=info msg="I0611 14:34:17.129848    1722 setters.go:602] \"Node became not ready\" node=\"canes-da6a\" condition={\"type\":\"Ready\",\"status\":\"False\",\"lastHeartbeatTime\":\"2026-06-11T14:34:17Z\",\"lastTransitionTime\":\"2026-06-11T14:34:17Z\",\"reason\":\"KubeletNotReady\",\"message\":\"node is shutting down\"}" component=kubelet stream=stderr
Jun 11 14:34:17 canes-da6a k0s[1346]: time="2026-06-11 14:34:17" level=info msg="I0611 14:34:17.130053    1722 nodeshutdown_manager.go:153] \"Shutdown manager killing pod with gracePeriod\" pod=\"kube-system/coredns-9f76996f-vlrq2\" gracePeriod=15" component=kubelet stream=stderr
Jun 11 14:34:17 canes-da6a k0s[1346]: time="2026-06-11 14:34:17" level=info msg="I0611 14:34:17.130068    1722 nodeshutdown_manager.go:153] \"Shutdown manager killing pod with gracePeriod\" pod=\"kube-system/nllb-canes-da6a\" gracePeriod=15" component=kubelet stream=stderr

Result:

the pods just sat in running, pending, or unknown when watching from the control node and the worker node's k0sworker was reporting the following for the pods. Might need to fix this? I think its the nllb pod being terminated immediately meaning the worker cannot communicate with the controllers API server and report status due to nllb having a priorityClass of 0

Failed to update status for pod ... Patch "https://[::1]:7443/...": unexpected EOF
... dial tcp [::1]:7443: connect: connection refused

Based on your link, would it be reasonable to bump/adjust the priority of Traefik and NLLB pod?

Once enhancement we did on our end was to adjust the workerProfile, but we are unable to assign priority to these critical workloads which is why our shutdowns are not happening gracefully.

  workerProfiles:
    - name: default
      values:
        shutdownGracePeriodByPodPriority:
          - priority: 0
            shutdownGracePeriodSeconds: 15
          - priority: 1000000           # postgres-operator-pod
            shutdownGracePeriodSeconds: 20
          - priority: 1000000000        # kubevirt-cluster-critical
            shutdownGracePeriodSeconds: 30
          - priority: 2000000000        # system-cluster-critical
            shutdownGracePeriodSeconds: 20
          - priority: 2000001000        # system-node-critical
            shutdownGracePeriodSeconds: 20

Josh-Tracy · 2026-06-12T12:23:48Z

Want to clarify that shutdown does happen gracefully as expected. The status of the workloads that were running just never updates as expected.

jnummelin · 2026-06-13T07:41:38Z

Based on your link, would it be reasonable to bump/adjust the priority of Traefik and NLLB pod?

Yes, that is my understanding. I have to say, the upstream docs are not really clear on this topic for static pods though.

Signed-off-by: Case Wylie <cmwylie19@gmail.com>

twz123

Thanks!

Signed-off-by: Case Wylie <cmwylie19@gmail.com>

cmwylie19 · 2026-06-17T00:46:29Z

Thanks!

Had to add priorityClassName to satisfy the kas requirements for the priority admission controller of the static pod because the priority did not match an actual priority class. Didn't notice until i saw failing tests

Signed-off-by: Case Wylie <cmwylie19@gmail.com>

twz123 · 2026-06-18T18:07:51Z

Could you please revert the timeout changes? They won't make the integration tests less flaky. We already have that on the radar. I'll just retrigger them until they pass. That's kinda annoying, but some of the requred fixes require upstream commits and aren't that trivial.

Signed-off-by: Case Wylie <cmwylie19@gmail.com>

cmwylie19 · 2026-06-18T18:27:08Z

Could you please revert the timeout changes? They won't make the integration tests less flaky. We already have that on the radar. I'll just retrigger them until they pass. That's kinda annoying, but some of the requred fixes require upstream commits and aren't that trivial.

perfect, done!

k0s-bot · 2026-06-19T04:51:43Z

Successfully created backport PR for release-1.36:

[Backport release-1.36] fix: change priorityclass default for nllb pod #7829

fix: change priorityclass default

4c28ff7

Signed-off-by: Case Wylie <cmwylie19@gmail.com>

cmwylie19 requested review from a team as code owners June 11, 2026 22:03

cmwylie19 requested review from jnummelin and kke June 11, 2026 22:03

twz123 reviewed Jun 12, 2026

View reviewed changes

chore: system-node-critical priority value

7763ba1

Signed-off-by: Case Wylie <cmwylie19@gmail.com>

twz123 previously approved these changes Jun 16, 2026

View reviewed changes

twz123 added bug Something isn't working area/worker backport/release-1.36 PR that needs to be backported/cherrypicked to the release-1.36 branch labels Jun 16, 2026

chore: fix test

30966f3

Signed-off-by: Case Wylie <cmwylie19@gmail.com>

cmwylie19 dismissed twz123’s stale review via 30966f3 June 17, 2026 00:41

chore: bump timeout

edd0566

Signed-off-by: Case Wylie <cmwylie19@gmail.com>

chore: revert timeout

c809852

Signed-off-by: Case Wylie <cmwylie19@gmail.com>

twz123 approved these changes Jun 19, 2026

View reviewed changes

twz123 merged commit 955832c into k0sproject:main Jun 19, 2026
217 of 220 checks passed

k0s-bot mentioned this pull request Jun 19, 2026

[Backport release-1.36] fix: change priorityclass default for nllb pod #7829

Open

Conversation

cmwylie19 commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change

How Has This Been Tested?

Checklist

Uh oh!

pschichtel commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

twz123 Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

jnummelin Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

cmwylie19 Jun 14, 2026

Choose a reason for hiding this comment

Uh oh!

cmwylie19 Jun 14, 2026

Choose a reason for hiding this comment

Uh oh!

jnummelin commented Jun 12, 2026

Uh oh!

jnummelin commented Jun 12, 2026

Uh oh!

cmwylie19 commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Josh-Tracy commented Jun 12, 2026

Uh oh!

jnummelin commented Jun 13, 2026

Uh oh!

twz123 left a comment

Choose a reason for hiding this comment

Uh oh!

cmwylie19 commented Jun 17, 2026

Uh oh!

twz123 commented Jun 18, 2026

Uh oh!

cmwylie19 commented Jun 18, 2026

Uh oh!

Uh oh!

k0s-bot commented Jun 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

cmwylie19 commented Jun 11, 2026 •

edited

Loading

pschichtel commented Jun 12, 2026 •

edited

Loading

cmwylie19 commented Jun 12, 2026 •

edited

Loading