Skip to content

feat: Allow configuring priorityClassName and terminationGracePeriodS…#7783

Open
cmwylie19 wants to merge 1 commit into
k0sproject:mainfrom
cmwylie19:7782-nllb-config
Open

feat: Allow configuring priorityClassName and terminationGracePeriodS…#7783
cmwylie19 wants to merge 1 commit into
k0sproject:mainfrom
cmwylie19:7782-nllb-config

Conversation

@cmwylie19

Copy link
Copy Markdown
Contributor

Description

The node-local load balancing (NLLB) Envoy Pod is a static pod generated by k0s in makePodManifest(). Its EnvoyProxy config only exposes image, imagePullPolicy, apiServerBindPort and konnectivityServerBindPort, so there is no way to set a priorityClassName or a terminationGracePeriodSeconds on the Pod.

This matters in practice. The Envoy Pod runs at priority 0, yet it is the worker's load-balanced path to the control plane. With graceful node shutdown enabled (shutdownGracePeriod / shutdownGracePeriodCriticalPods via a worker profile), the kubelet shutdown manager kills non-critical pods first and critical pods last. Because the Envoy Pod is priority 0, it is killed in the first phase, severing the worker's path to the API server ([::1]:7443) before the remaining pods can drain or report status:

Failed to update status for pod ...: Patch "https://[::1]:7443/...": unexpected EOF
... dial tcp [::1]:7443: connect: connection refused

This change exposes two new optional fields on spec.network.nodeLocalLoadBalancing.envoyProxy:

  • priorityClassName (string) — e.g. system-node-critical, so the Pod is protected from node-pressure eviction and shut down last during graceful node shutdown.
  • terminationGracePeriodSeconds (int64, >= 0) — override the default 30s grace period to let Envoy drain in-flight connections.

Both fields are plumbed through envoyPodParams into makePodManifest() and set on the Pod spec. Both default to unset, so existing behavior is unchanged.

Fixes #7782

Type of change

  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

How Has This Been Tested?

  • Manual test
  • Auto test added

Manual: configured envoyProxy.priorityClassName: system-node-critical and terminationGracePeriodSeconds in k0s.yaml, restarted the worker, and confirmed the resulting static Pod carries both fields. With system-node-critical set, the Envoy Pod is shut down in the critical-pods phase during graceful node shutdown instead of immediately.

Auto: TestMakePodManifest asserts the fields propagate to the Pod spec (and are absent by default); TestEnvoyProxy_PriorityClassAndGracePeriod covers config parsing and rejection of a negative grace period.

Checklist

  • My code follows the style guidelines of this project
  • My commit messages are signed-off
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

…econds on the NLLB Envoy Pod

Signed-off-by: Case Wylie <cmwylie19@gmail.com>
@cmwylie19 cmwylie19 requested review from a team as code owners June 11, 2026 18:07
@cmwylie19 cmwylie19 requested review from kke and twz123 June 11, 2026 18:07
@jnummelin

Copy link
Copy Markdown
Member

I wonder if we should rather actually "hardcode" the priority to system-node-critical 🤔

Like you said, having priority at 0 makes no sense as this really is node critical component. I can't think of a reason why anyone would actually have it on any other prio level.

The other thing I'm wondering is if we'd be able to use the recently merged resource patches feature for this. That'd allow users to basically override anything on the generated manifest without us having to maintain bazillion different opts in the config.

@cmwylie19

Copy link
Copy Markdown
Contributor Author

I wonder if we should rather actually "hardcode" the priority to system-node-critical 🤔

Like you said, having priority at 0 makes no sense as this really is node critical component. I can't think of a reason why anyone would actually have it on any other prio level.

The other thing I'm wondering is if we'd be able to use the recently merged resource patches feature for this. That'd allow users to basically override anything on the generated manifest without us having to maintain bazillion different opts in the config.

I couldn't agree more, it definitely needs system-node-critical, yes that looks like it may give us what we need in terms of theterminationGracePeriodSeconds too.

@cmwylie19

Copy link
Copy Markdown
Contributor Author

I wonder if we should rather actually "hardcode" the priority to system-node-critical 🤔
Like you said, having priority at 0 makes no sense as this really is node critical component. I can't think of a reason why anyone would actually have it on any other prio level.
The other thing I'm wondering is if we'd be able to use the recently merged resource patches feature for this. That'd allow users to basically override anything on the generated manifest without us having to maintain bazillion different opts in the config.

I couldn't agree more, it definitely needs system-node-critical, yes that looks like it may give us what we need in terms of theterminationGracePeriodSeconds too.

Just let me know how you would like me to proceed forward, if you want a default priorityClass or anything or if we need to close this. Happy to help!

@cmwylie19

Copy link
Copy Markdown
Contributor Author

I wonder if we should rather actually "hardcode" the priority to system-node-critical 🤔

Like you said, having priority at 0 makes no sense as this really is node critical component. I can't think of a reason why anyone would actually have it on any other prio level.

The other thing I'm wondering is if we'd be able to use the recently merged resource patches feature for this. That'd allow users to basically override anything on the generated manifest without us having to maintain bazillion different opts in the config.

@jnummelin I am not sure if we can use resource patches if I am following correctly based on the fact that the name of the nllb pod is based on the node name and the node-name is going to be random.

nllb-canes-b29b                   1/1     Running   0               7d19h   192.168.4.67   canes-b29b 
apiVersion: k0s.k0sproject.io/v1beta1
kind: ClusterConfig
metadata:
  name: k0s
  namespace: kube-system
spec:
  network:
    nodeLocalLoadBalancing:
      envoyProxy:
        patches:
        - target:
            kind: Pod
            name: ???
            namespace: kube-system
          patch:
            type: StrategicMergePatch
            content: |
              spec:
                terminationGracePeriodSeconds: 60

@github-actions

Copy link
Copy Markdown
Contributor

This pull request has merge conflicts that need to be resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Configure NLLB Envoy Pod Settings - Proposal for Enhancing Envoy Pod Configurability

2 participants