Add custom image support by jasminetMSFT · Pull Request #1054 · Azure/telescope

jasminetMSFT · 2026-02-12T00:47:15Z

Summary

Enables the use of custom container images in image pull benchmarks, allowing more realistic testing with large, multi-layer images that match real customer workloads.

Changes

Custom image support in deployment templates: Added TestImage parameter to deployment_template.yaml to allow overriding the default test image
CRI module updates: Extended cri.py and execute.yml to pass through the test_image parameter from pipeline configuration
Updated image-pull-n10 pipeline:
- Configured to use benchmark/customer-replica:v1 (10GB, 79-layer image matching customer manifest)
- Increased timeouts: operation_timeout: 30m, pod_startup_latency_threshold: 600s
- Reduced max_pods from 30 to 26 to prevent pod scheduling failures due to node resource constraints

Testing

Validated with 10-node cluster pulling 10GB custom image:

Pipeline: link

Copilot

Pull request overview

Adds support for overriding the container image used by the ClusterLoader2 CRI/image-pull benchmark so pipelines can exercise large, multi-layer “customer-like” images.

Changes:

Plumbs a new test_image/CL2_TEST_IMAGE parameter from the pipeline step into the CRI override generator (cri.py) and CL2 config.
Updates the deployment template to use a configurable image (AKS/Linux/memory path) and adds a topology spread constraint to improve node distribution.
Refines the containerd throughput “AvgPerNode” PromQL query to exclude nodes with no pulls.
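
To illustrate the last point, here is a minimal sketch of why excluding idle nodes changes a per-node average. The node names and throughput values are invented for illustration, and the function only mirrors the idea; the actual PromQL in containerd-measurements.yaml is not shown on this page.

```python
def avg_per_node(throughput_by_node, active_only=True):
    """Average throughput per node, optionally skipping nodes that pulled nothing."""
    values = list(throughput_by_node.values())
    if active_only:
        # Keep only nodes that actually reported pull throughput.
        values = [v for v in values if v > 0]
    if not values:
        return 0.0
    return sum(values) / len(values)


# Hypothetical sample: two nodes pulled at 100 MB/s, two nodes pulled nothing.
sample = {"node-a": 100.0, "node-b": 100.0, "node-c": 0.0, "node-d": 0.0}

print(avg_per_node(sample, active_only=False))  # 50.0
print(avg_per_node(sample, active_only=True))   # 100.0
```

Counting idle nodes halves the reported figure in this sample, which is the kind of distortion the refined query is meant to avoid.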

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

File	Description
steps/engine/clusterloader2/cri/execute.yml	Passes `--test_image` into the CRI override step via env/config.
modules/python/clusterloader2/cri/cri.py	Adds `--test_image` CLI arg and writes `CL2_TEST_IMAGE` into overrides.
modules/python/clusterloader2/cri/config/deployment_template.yaml	Introduces `TestImage` template param, uses it for the AKS/Linux/memory image, and adds topology spread constraints.
modules/python/clusterloader2/cri/config/containerd-measurements.yaml	Filters “AvgPerNode” to active nodes in PromQL.
modules/python/clusterloader2/cri/config/config.yaml	Wires `CL2_TEST_IMAGE` into the deployment template fill map.
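
The plumbing summarized in the table can be sketched roughly as follows. This is not the code from cri.py: the parser layout, the `render_overrides` helper, the `--override_file` default, and the `KEY: value` override format are assumptions made for illustration. Only the `--test_image` flag and the `CL2_TEST_IMAGE` key come from the PR, and the default image is inferred from the template condition quoted in the review.

```python
import argparse

# Assumed default, inferred from the image name the template compares against.
DEFAULT_TEST_IMAGE = "e2e-test-images/resource-consumer:1.13"


def build_parser():
    """Hypothetical CLI: only --test_image is taken from the PR."""
    parser = argparse.ArgumentParser(description="CRI override generator (sketch)")
    parser.add_argument("--test_image", default=DEFAULT_TEST_IMAGE)
    parser.add_argument("--override_file", default="overrides.yaml")
    return parser


def render_overrides(test_image):
    """Return the override line that exposes the image to the CL2 config."""
    return f"CL2_TEST_IMAGE: {test_image}\n"


def main(argv=None):
    args = build_parser().parse_args(argv)
    with open(args.override_file, "w", encoding="utf-8") as f:
        f.write(render_overrides(args.test_image))
```

Calling `main(["--test_image", "benchmark/customer-replica:v1"])` would write a single `CL2_TEST_IMAGE` line to the override file under these assumptions.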

Comments suppressed due to low confidence (1)

modules/python/clusterloader2/cri/config/deployment_template.yaml:59

When TestImage is not the default, this template no longer renders an explicit command/args for the memory container (the stress block is gated by if eq $TestImage ...). That changes the benchmark from a known long-running workload to whatever the image entrypoint does, which can cause early exits or add non-pull-related variance. Consider keeping a stable long-running command for all images, or making the command configurable alongside TestImage.

          {{if eq $TestImage "e2e-test-images/resource-consumer:1.13"}}
        command:
          - stress
        args:
          - --vm
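
The effect of that gate can be modelled in a few lines of Python. This is a sketch of the behaviour described in the comment, not the template itself: the `container_spec` helper and the container name are hypothetical, and the args list stops at `--vm` because the excerpt above is truncated.

```python
DEFAULT_IMAGE = "e2e-test-images/resource-consumer:1.13"


def container_spec(test_image):
    """Build a container spec the way the gated template block would."""
    spec = {"name": "memory", "image": test_image}  # name is hypothetical
    if test_image == DEFAULT_IMAGE:
        # Only the default image gets an explicit long-running command.
        spec["command"] = ["stress"]
        spec["args"] = ["--vm"]  # remaining args are elided in the excerpt
    return spec


print("command" in container_spec(DEFAULT_IMAGE))                    # True
print("command" in container_spec("benchmark/customer-replica:v1"))  # False
```

With a custom image the spec carries no command, so the pod runs whatever the image entrypoint does, which is the behaviour change the comment warns about.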

Copilot · 2026-02-12T00:53:12Z

modules/python/clusterloader2/cri/config/deployment_template.yaml

+      # Spread pods across nodes to ensure each node pulls the image
+      topologySpreadConstraints:
+      - maxSkew: 1
+        topologyKey: kubernetes.io/hostname
+        whenUnsatisfiable: ScheduleAnyway


The comment says this topologySpreadConstraints block “ensure[s] each node pulls the image”, but with whenUnsatisfiable: ScheduleAnyway it’s only best-effort (it won’t guarantee one pod per node). Consider either softening the wording, or switching to DoNotSchedule if you need a hard guarantee.
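
A small sketch of what "best-effort" means here, using a simplified definition of skew (the pod count on the fullest node minus the count on the emptiest node). The node names and the placement are invented, and the real scheduler also considers label selectors and node eligibility, which this ignores.

```python
from collections import Counter


def max_skew(pod_nodes, all_nodes):
    """Simplified skew: busiest node's pod count minus the emptiest node's."""
    counts = Counter(pod_nodes)
    per_node = [counts.get(node, 0) for node in all_nodes]
    return max(per_node) - min(per_node)


def nodes_without_pods(pod_nodes, all_nodes):
    """Nodes that received no pod and therefore never pull the image."""
    placed = set(pod_nodes)
    return [node for node in all_nodes if node not in placed]


nodes = ["n1", "n2", "n3"]
placement = ["n1", "n1", "n2"]  # hypothetical placement of three pods

print(max_skew(placement, nodes))            # 2
print(nodes_without_pods(placement, nodes))  # ['n3']
```

A skew of 2 exceeds `maxSkew: 1`. With `ScheduleAnyway` the scheduler treats the constraint as a preference, so a placement like this remains possible and `n3` would pull nothing; with `DoNotSchedule` a pod that would violate the constraint stays pending instead.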

Copilot · 2026-02-12T00:53:12Z

modules/python/clusterloader2/cri/cri.py

    node_count, node_per_step, max_pods, repeats, operation_timeout,
    load_type, scale_enabled, pod_startup_latency_threshold, provider,
-    registry_endpoint, os_type, scrape_kubelets, scrape_containerd, containerd_scrape_interval, host_network, override_file):
+    registry_endpoint, test_image, os_type, scrape_kubelets, scrape_containerd, containerd_scrape_interval, host_network, override_file):


override_config_clusterloader2 now requires the new test_image parameter, but existing in-repo callers (e.g., modules/python/tests/test_cri.py) invoke it without test_image, which will raise a TypeError and break the unit test suite. Consider making test_image optional with a default (matching the CLI default) and/or updating all internal call sites/tests to pass it explicitly.

Suggested change

-    registry_endpoint, test_image, os_type, scrape_kubelets, scrape_containerd, containerd_scrape_interval, host_network, override_file):
+    registry_endpoint, test_image=None, os_type=None, scrape_kubelets=False, scrape_containerd=False, containerd_scrape_interval=None, host_network=False, override_file=None):
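
The failure mode and the suggested fix can be reproduced with a reduced signature. The two functions below are stand-ins with three parameters rather than the real `override_config_clusterloader2`, and the registry hostname is a placeholder.

```python
def with_required(registry_endpoint, test_image, os_type):
    """Stand-in for the signature after the PR: test_image is mandatory."""
    return (registry_endpoint, test_image, os_type)


def with_default(registry_endpoint, test_image=None, os_type=None):
    """Stand-in for the suggested signature: test_image is optional."""
    return (registry_endpoint, test_image, os_type)


# An older call site that predates test_image and passes arguments by keyword.
old_call = {"registry_endpoint": "example.azurecr.io", "os_type": "linux"}

try:
    with_required(**old_call)
except TypeError as exc:
    print("breaks:", exc)

print(with_default(**old_call))  # ('example.azurecr.io', None, 'linux')
```

Note that a default only protects keyword callers. A call site that passes these arguments positionally would silently shift `os_type` into the `test_image` slot, so such callers still need to be updated explicitly.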

Copilot · 2026-02-12T00:53:12Z

modules/python/clusterloader2/cri/config/deployment_template.yaml

          {{if eq $Provider "aks"}}
-        image: {{$RegistryEndpoint}}/e2e-test-images/resource-consumer:1.13
+        image: {{$RegistryEndpoint}}/{{$TestImage}}
          {{else}}
        image: registry.k8s.io/e2e-test-images/resource-consumer:1.13
          {{end}}


TestImage is added as a template parameter, but it’s only used in the Linux/memory branch (Provider == aks). The CPU branch (and non-AKS providers) still uses a hard-coded image, so setting CL2_TEST_IMAGE will be ignored in those cases. If custom image support is intended generally for this scenario, apply it consistently (or document the limitation).
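
The branch logic in the diff above can be restated as a short Python function to make the limitation concrete. `resolve_image` is a hypothetical helper that mirrors the template conditions shown; the registry hostname is a placeholder.

```python
DEFAULT_IMAGE = "e2e-test-images/resource-consumer:1.13"


def resolve_image(provider, registry_endpoint, test_image=DEFAULT_IMAGE):
    """Mirror the template branch: only the AKS path honours test_image."""
    if provider == "aks":
        return f"{registry_endpoint}/{test_image}"
    # Any other provider falls through to the hard-coded upstream image.
    return f"registry.k8s.io/{DEFAULT_IMAGE}"


custom = "benchmark/customer-replica:v1"
print(resolve_image("aks", "example.azurecr.io", custom))
# example.azurecr.io/benchmark/customer-replica:v1
print(resolve_image("aws", "example.azurecr.io", custom))
# registry.k8s.io/e2e-test-images/resource-consumer:1.13
```

The second call shows the custom image being ignored without any warning, which is why the comment asks for consistent handling or a documented limitation.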

jikuma · 2026-02-12T22:27:38Z

pipelines/perf-eval/ACR Benchmark/image-pull-n10.yml

            image-pull-10pods-authenticated:
              node_count: 10
-              max_pods: 30
+              max_pods: 26


Why are we reducing the pod count?

See the summary: it's because of node resource constraints.

Is node autoscaling enabled on the node pool?

No, autoscaling is not enabled, as shown in this line.

jasminetMSFT added 8 commits February 11, 2026 03:14

Add test_image parameter support for custom benchmark images

8a62452

Update new-pipeline-test.yml for custom image testing

b4aac2d

modify image in deployment file

3ca734f

Increase timeouts for 10GB image pull test

dced8b3

Revert new-pipeline-test.yml to original template

3bcc4b1

change image in image-pull-n10

4a0b7a3

Reduce max_pods to 26 to prevent scheduling failures

9827643

Revert new-pipeline-test.yml to original

ec6dbec

jasminetMSFT marked this pull request as ready for review February 12, 2026 00:48

jasminetMSFT requested review from alyssa1303, anson627, sumanthreddy29 and xinWeiWei24 as code owners February 12, 2026 00:48

Copilot AI review requested due to automatic review settings February 12, 2026 00:48

jasminetMSFT requested review from vittoriasalim and wonderyl as code owners February 12, 2026 00:48

Copilot started reviewing on behalf of jasminetMSFT February 12, 2026 00:49 View session

Copilot AI reviewed Feb 12, 2026

View reviewed changes

Update CRI tests to include test_image parameter

d342ff2

jikuma reviewed Feb 12, 2026

View reviewed changes

jikuma approved these changes Feb 12, 2026

View reviewed changes

jasminetMSFT changed the title ~~Jasminet/custom image support~~ Add custom image support Feb 15, 2026

jasminetMSFT wants to merge 9 commits into main from jasminet/custom-image-support

2 participants
