[ON HOLD] Modify kubelet with flags using Helm chart and jq during validation phase to improve node pressure management #1034

Open
pdamianov-dev wants to merge 33 commits into main from pdamianov-dev/k8s-pressure

@pdamianov-dev
Contributor

This is related to the creation of a suitable framework to run pressure tests against node images as part of Node Hardening efforts.

Supporting documents:

Contributor

Copilot AI left a comment

Pull request overview

Adds a new “CRI pressure” execution path that can modify kubelet flags via a privileged DaemonSet before running ClusterLoader2, enabling node pressure testing for node hardening work.

Changes:

  • Switches the k8s-resource-pressure topology to use a new cri-pressure engine template.
  • Adds a new pipeline step template that applies a kubelet-modifying DaemonSet before running the benchmark.
  • Extends cri.py with a modify-kubelet subcommand and adds DaemonSet-manifest generation in clusterloader2/utils.py.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| steps/topology/k8s-resource-pressure/execute-clusterloader2.yml | Routes the topology to the new cri-pressure execution template. |
| steps/engine/clusterloader2/cri-pressure/execute.yml | Adds a pre-benchmark step to apply a kubelet config updater DaemonSet, then runs CL2. |
| modules/python/clusterloader2/utils.py | Adds DaemonSet YAML generation for updating kubelet flags (but also modifies imports/initialization). |
| modules/python/clusterloader2/cri/cri.py | Adds modify-kubelet CLI command and wires it to the DaemonSet generator. |
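
For readers without access to the diff, here is a minimal sketch of what generating a kubelet-updating DaemonSet manifest could look like. It is not the code from this PR. The function name `build_kubelet_updater_daemonset`, the DaemonSet name, the namespace, the image, and the `KUBELET_EXTRA_FLAGS` variable are all assumptions for illustration, and the container only echoes the flags instead of touching the node. The manifest is emitted as JSON, which `kubectl apply -f` accepts alongside YAML.

```python
import json


def build_kubelet_updater_daemonset(kubelet_flags, name="kubelet-config-updater",
                                    namespace="kube-system"):
    """Return a DaemonSet manifest (as a dict) that carries kubelet flags to every node.

    Hypothetical sketch: name, namespace, image, and env var are assumptions,
    not values taken from the PR.
    """
    # Render {"eviction-hard": "..."} as "--eviction-hard=..." in a stable order.
    flags = " ".join(f"--{key}={value}" for key, value in sorted(kubelet_flags.items()))
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name, "namespace": namespace, "labels": labels},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "hostPID": True,  # a real updater would need host access
                    "containers": [{
                        "name": "updater",
                        "image": "busybox:1.36",  # placeholder image
                        "securityContext": {"privileged": True},
                        "env": [{"name": "KUBELET_EXTRA_FLAGS", "value": flags}],
                        # Placeholder command: prints the flags, then idles.
                        "command": ["sh", "-c",
                                    'echo "$KUBELET_EXTRA_FLAGS"; '
                                    "while true; do sleep 3600; done"],
                    }],
                },
            },
        },
    }


manifest = build_kubelet_updater_daemonset({"eviction-hard": "memory.available<100Mi"})
print(json.dumps(manifest, indent=2))
```

The threshold `memory.available<100Mi` is an arbitrary example value. The step that actually rewrites the kubelet configuration and restarts the kubelet is deliberately left out, because the thread does not show how the PR does it.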

"--registry_info", type=str, help="Container registry information scraped",
)

# Sub-command for modify-kubelet
Contributor

@vittoriasalim, Feb 3, 2026

If you need to apply the DaemonSet before benchmarking starts, you can apply it at the validation stage rather than in the engine, e.g.: https://github.com/Azure/telescope/pull/1045/changes
https://github.com/Azure/telescope/blob/main/steps/topology/karpenter/validate-resources.yml

I am not exactly sure what this is for.
If you need to modify the node image, you can do something like this: https://github.com/Azure/telescope/pull/1044/changes

@@ -1,8 +1,34 @@
trigger: none

parameters:
Contributor

We do not need parameters; simply enable them in your test:

            n1-p60-memory-managed:
              node_count: 1
              max_pods: 60
              repeats: 1
              operation_timeout: 3m
              load_type: memory
              pod_startup_latency_threshold: 23s
              kubernetes_version: "1.34"
              k8s_os_disk_type: Managed
              scrape_kubelets: True
              enable_custom_kubelet: false
              kubelet_config_type: "eviction-hard" # eviction-soft, eviction-soft-grace-period

In validate-resources, you can read
$KUBELET_CONFIG_TYPE

if KUBELET_CONFIG_TYPE == "eviction-hard":

  • override the memory, nodefs, and pid thresholds
    else if ...
    else if ...
    and so on
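
The branching suggested above can be sketched as a lookup table keyed by the config type. This is an illustration only: the function name `select_kubelet_flags`, the `ENABLE_CUSTOM_KUBELET` environment variable (assumed to mirror the `enable_custom_kubelet` matrix entry), and every threshold value are hypothetical. Only the three config type names and the memory, nodefs, and pid signals come from the thread.

```python
import os

# Hypothetical thresholds for illustration; none of these values come from the PR.
KUBELET_OVERRIDES = {
    "eviction-hard": {
        "eviction-hard": "memory.available<100Mi,nodefs.available<10%,pid.available<10%",
    },
    "eviction-soft": {
        "eviction-soft": "memory.available<300Mi,nodefs.available<15%,pid.available<15%",
        "eviction-soft-grace-period":
            "memory.available=1m,nodefs.available=1m,pid.available=1m",
    },
    "eviction-soft-grace-period": {
        "eviction-soft": "memory.available<300Mi,nodefs.available<15%,pid.available<15%",
        "eviction-soft-grace-period":
            "memory.available=5m,nodefs.available=5m,pid.available=5m",
    },
}


def select_kubelet_flags(env=None):
    """Return the kubelet flag overrides for the configured type, or {} if disabled."""
    env = os.environ if env is None else env
    # Assumed variable name, derived from the enable_custom_kubelet matrix entry.
    if env.get("ENABLE_CUSTOM_KUBELET", "false").lower() != "true":
        return {}
    config_type = env.get("KUBELET_CONFIG_TYPE", "")
    if config_type not in KUBELET_OVERRIDES:
        raise ValueError(f"unknown KUBELET_CONFIG_TYPE: {config_type!r}")
    # Copy so callers cannot mutate the shared table.
    return dict(KUBELET_OVERRIDES[config_type])


example_env = {"ENABLE_CUSTOM_KUBELET": "true", "KUBELET_CONFIG_TYPE": "eviction-hard"}
for flag, value in select_kubelet_flags(example_env).items():
    print(f"--{flag}={value}")
```

A table keeps each config type's overrides in one place, so adding a fourth type means adding one entry instead of another `else if` branch.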

Contributor Author

What if we want to pass in different flags with each run without changing the pipeline definition?

Contributor Author

We may not intend to schedule this pipeline; we will most likely run it manually or use a separate pipeline trigger.

Contributor Author

What if I set a queue-time variable instead?

Contributor

@vittoriasalim, Feb 8, 2026

We have a pipeline guideline that all additional variables must be stored in the matrix format, and we recommend that all pipelines follow it.
Image

Contributor

@vittoriasalim, Feb 8, 2026

Variable groups are usually for secret key values such as credentials and the masked value of aks_custom_header. Since you only care about manual runs, you can have three stages: eviction-hard, eviction-soft, and eviction-soft-grace-period. You can then select only the one you want to trigger. Another option is to use the test branch, where you can override whatever values you want for manual testing. Once you are sure of the values, you can submit a schedule PR to the main branch.
Image

Contributor

@vittoriasalim left a comment

Remove the parameters and enable those variables only inside your test.

Rename use_custom_kubelet to enable_custom_kubelet.

@pdamianov-dev changed the title from "Add new tooling to cri.py to support modifying kubelet using a daemonset with predefined custom kubelet flags" to "[ON HOLD] Modify kubelet with flags using Helm chart and jq during validation phase to improve node pressure management" on Feb 9, 2026