feat(autoscaler): support scale from zero for Homogeneous Target #909
Abirdcfly wants to merge 1 commit into volcano-sh:main from
Conversation
1. Add scale-from-zero logic to the recommendation algorithm.
2. Support fetching pending-request metrics from the router endpoint.
3. Configure the router metrics endpoint via the `ROUTER_METRICS_ENDPOINT` env var.
4. Use the existing `kthena_router_fairness_queue_size` metric for pending requests.

When minReplicas is set to 0 and there are pending requests for the model, the autoscaler will automatically scale up from 0 to 1 instance.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: The full list of commands accepted by this bot can be found here. Details: Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
Code Review
This pull request introduces scale-from-zero support by enabling the autoscaler to fetch metrics from a router endpoint when no pods are active. The changes include updates to the metric collection logic and the recommendation algorithm. Review feedback identifies a critical logic error in the scale-up condition that could prevent activation for low request volumes and suggests several performance optimizations, such as reusing an HTTP client and streaming metric data to reduce memory allocations.
```go
if currentCount == 0 {
	// If there are any pending requests, scale up to at least 1
	if desired >= 1 {
		return getCeilDesiredInstances(desired)
	}
	return 0
}
```
There was a problem hiding this comment.
The condition desired >= 1 prevents the autoscaler from scaling up from zero when the number of pending requests is less than the target value per instance. For example, if the target is 10 and there is 1 pending request, desired is 0.1, and the code will return 0, leaving the system at zero replicas. To ensure the autoscaler scales up for any pending request, the check should be performed on the raw metric value. Since getCeilDesiredInstances returns at least 1 for any value > 0, this will correctly scale to 1 pod.
```diff
 if currentCount == 0 {
 	// If there are any pending requests, scale up to at least 1
-	if desired >= 1 {
+	if metric > 0 {
 		return getCeilDesiredInstances(desired)
 	}
 	return 0
 }
```
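To see why the `metric > 0` check matters, here is a minimal, self-contained sketch of the decision with the suggested fix applied. The `recommend` helper and its signature are hypothetical stand-ins; the real logic lives in `pkg/autoscaler/algorithm/recommendation.go`.

```go
package main

import (
	"fmt"
	"math"
)

// getCeilDesiredInstances mirrors the helper referenced in the PR:
// any positive desired value rounds up to at least one instance.
func getCeilDesiredInstances(desired float64) int {
	return int(math.Ceil(desired))
}

// recommend is a hypothetical reduction of the recommendation step,
// using the `metric > 0` check suggested in the review.
func recommend(metric, target float64, currentCount int) int {
	desired := metric / target
	if currentCount == 0 {
		// Any pending request at all should wake the target up.
		if metric > 0 {
			return getCeilDesiredInstances(desired)
		}
		return 0
	}
	return getCeilDesiredInstances(desired)
}

func main() {
	// target=10, one pending request: the old `desired >= 1` check
	// would have returned 0 here; `metric > 0` scales to 1.
	fmt.Println(recommend(1, 10, 0)) // 1
	fmt.Println(recommend(0, 10, 0)) // 0
}
```

With `desired = 0.1`, `getCeilDesiredInstances` still returns 1, so gating on the raw metric is enough to guarantee activation.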
```go
// RouterMetricsEndpoint is the optional endpoint of the router's metrics endpoint for scale-from-zero
// Format: http://<router-service-name>.<namespace>.svc.cluster.local:<port>/metrics
RouterMetricsEndpoint string
```
It is recommended to add a shared *http.Client to the MetricCollector struct. Creating a new http.Client for every metrics collection cycle (as seen in fetchMetricsFromRouter) is inefficient because it prevents TCP/TLS connection reuse and increases resource consumption.
```go
// RouterMetricsEndpoint is the optional endpoint of the router's metrics endpoint for scale-from-zero
// Format: http://<router-service-name>.<namespace>.svc.cluster.local:<port>/metrics
RouterMetricsEndpoint string
httpClient            *http.Client
```

```go
MetricTargets:         metricTargets,
WatchMetricList:       util.ExtractKeysToSet(metricTargets),
RouterMetricsEndpoint: routerMetricsEndpoint,
```
```go
client := &http.Client{
	Timeout: util.AutoscaleCtxTimeoutSeconds * time.Second,
}
```
```go
body, err := io.ReadAll(resp.Body)
if err != nil {
	return metrics, err
}

// Parse Prometheus metrics
decoder := expfmt.NewDecoder(strings.NewReader(string(body)), expfmt.NewFormat(expfmt.TypeTextPlain))
```
Instead of reading the entire response body into memory with io.ReadAll, you can pass resp.Body directly to the Prometheus decoder. This reduces memory allocations and improves performance by streaming the response content.
```go
// Parse Prometheus metrics
decoder := expfmt.NewDecoder(resp.Body, expfmt.NewFormat(expfmt.TypeTextPlain))
```
Pull request overview
Adds “scale from zero” support for homogeneous targets by allowing the autoscaler to use router-provided pending-request metrics when there are no backend pods to scrape.
Changes:
- Extend `MetricCollector.UpdateMetrics` to return external metrics, and plumb them through `Scaler.Scale`.
- Fetch `kthena-router` Prometheus metrics from an optional `ROUTER_METRICS_ENDPOINT` when there are zero pods.
- Add scale-from-zero logic for external metrics in the recommendation algorithm.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pkg/autoscaler/autoscaler/scaler.go | Passes external metrics from the collector into the recommendation algorithm. |
| pkg/autoscaler/autoscaler/optimizer.go | Updates UpdateMetrics call sites for the new return signature. |
| pkg/autoscaler/autoscaler/metric_collector.go | Adds router metrics endpoint config + external metrics fetching/parsing when no pods exist. |
| pkg/autoscaler/algorithm/recommendation.go | Introduces special-case behavior for external metrics when currentCount == 0. |
Comments suppressed due to low confidence (1)
pkg/autoscaler/algorithm/recommendation.go:105
- This new scale-from-zero behavior for external metrics isn’t covered by unit tests. Since pkg/autoscaler/algorithm/recommendation_test.go already has extensive coverage for external metrics, please add cases where CurrentInstancesCount==0 and external metric is (a) 0 and (b) >0 to lock in the expected scale-from-zero semantics.
```go
// Handle scale from zero case
if currentCount == 0 {
	// If there are any pending requests, scale up to at least 1
	if desired >= 1 {
		return getCeilDesiredInstances(desired)
	}
	return 0
}
ratio := desired / float64(currentCount)
if math.Abs(ratio-1.0) <= tolerance {
	return currentCount
}
return getCeilDesiredInstances(desired)
```
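The missing coverage could be locked in with a small case table. This sketch inlines a simplified `recommend` stand-in (not the real algorithm) so the table shape is concrete; the metric values are chosen so the expectations hold under the as-merged `desired >= 1` semantics.

```go
package main

import (
	"fmt"
	"math"
)

// Simplified, hypothetical stand-in for the recommendation step; the
// real implementation lives in pkg/autoscaler/algorithm/recommendation.go.
func recommend(metric, target float64, currentCount int) int {
	desired := metric / target
	if currentCount == 0 {
		if desired >= 1 {
			return int(math.Ceil(desired))
		}
		return 0
	}
	return int(math.Ceil(desired))
}

func main() {
	// Table of the scale-from-zero cases the review asks to cover:
	// CurrentInstancesCount == 0 with external metric (a) 0 and (b) > 0.
	cases := []struct {
		metric, target float64
		current, want  int
	}{
		{0, 10, 0, 0},  // (a) no pending requests: stay at zero
		{20, 10, 0, 2}, // (b) pending requests: scale up
	}
	for _, c := range cases {
		if got := recommend(c.metric, c.target, c.current); got != c.want {
			panic("unexpected recommendation")
		}
	}
	fmt.Println("all cases pass")
}
```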
```go
func NewMetricCollector(target *v1alpha1.Target, binding *v1alpha1.AutoscalingPolicyBinding, metricTargets map[string]float64) *MetricCollector {
	// Get router metrics endpoint from environment variable if set
	routerMetricsEndpoint := os.Getenv("ROUTER_METRICS_ENDPOINT")

	return &MetricCollector{
		PastHistograms: datastructure.NewSnapshotSlidingWindow[map[string]HistogramInfo](util.SecondToTimestamp(util.SloQuantileSlidingWindowSeconds), util.SecondToTimestamp(util.SloQuantileDataKeepSeconds)),
		Target:         target,
		Scope: Scope{
			Namespace:      binding.Namespace,
			OwnedBindingId: binding.UID,
		},
		MetricTargets:         metricTargets,
		WatchMetricList:       util.ExtractKeysToSet(metricTargets),
		RouterMetricsEndpoint: routerMetricsEndpoint,
	}
}
```
ROUTER_METRICS_ENDPOINT is introduced as the configuration knob here, but the Helm chart’s controller-manager Deployment template currently only sets POD_NAMESPACE and doesn’t expose a way to set additional env vars. That means the new router metrics endpoint can’t be configured when installing via the chart. Consider wiring this env var through chart values (or documenting another supported configuration mechanism) so the feature is actually usable by default installations.
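One possible shape for wiring this through the chart, sketched below. The `controllerManager.extraEnv` values key and the template hook are hypothetical — the chart does not currently expose them — and the endpoint URL (service name, namespace, port) is illustrative only.

```yaml
# values.yaml (hypothetical keys)
controllerManager:
  extraEnv:
    - name: ROUTER_METRICS_ENDPOINT
      value: http://kthena-router.kthena-system.svc.cluster.local:8080/metrics
```

```yaml
# templates/controller-manager deployment, env section (hypothetical addition)
env:
  - name: POD_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
  {{- with .Values.controllerManager.extraEnv }}
  {{- toYaml . | nindent 12 }}
  {{- end }}
```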
```go
// Add the queue size as pending request metric
addMetric(metrics, "pending_requests", metric.GetGauge().GetValue())
```
The external metric is being stored under the hard-coded name "pending_requests", but the actual Prometheus metric you parse is "kthena_router_fairness_queue_size" and autoscaling policies match external metrics by metricTargets keys. If a user configures metricName=kthena_router_fairness_queue_size (as implied by the PR description), the algorithm will never see the external metric. Consider using the Prometheus metric name (mf.GetName()) as the key, or ensuring the policy/controller consistently uses the "pending_requests" name (including validation/docs).
```diff
-// Add the queue size as pending request metric
-addMetric(metrics, "pending_requests", metric.GetGauge().GetValue())
+// Store the value under the Prometheus metric name so autoscaling
+// policies can match the configured external metric key directly.
+addMetric(metrics, mf.GetName(), metric.GetGauge().GetValue())
```
```go
desired := metric / target
// Handle scale from zero case
if currentCount == 0 {
	// If there are any pending requests, scale up to at least 1
	if desired >= 1 {
		return getCeilDesiredInstances(desired)
	}
	return 0
}
```
The new scale-from-zero branch changes behavior in a way that can prevent scaling up when there are pending requests but metric/target < 1 (e.g., target=10, metric=1). Previously, currentCount==0 would end up returning ceil(desired) which would scale to 1 for any positive desired. This also doesn’t fully match the PR description (“when there are pending requests … scale up from 0 to 1”). Consider basing the scale-from-zero decision on metric>0 (scale to at least 1), or otherwise clearly documenting/enforcing the intended threshold semantics.
What type of PR is this?
/kind feature
What this PR does / why we need it:
When minReplicas is set to 0 and there are pending requests for the model, autoscaler will automatically scale up from 0 to 1 instance.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?: