
feat(autoscaler): support scale from zero for Homogeneous Target#909

Draft
Abirdcfly wants to merge 1 commit intovolcano-sh:mainfrom
Abirdcfly:scale2zero

Conversation

@Abirdcfly
Contributor

What type of PR is this?
/kind feature

What this PR does / why we need it:

  1. Add scale-from-zero logic to the recommendation algorithm
  2. Support fetching pending-request metrics from the router endpoint
  3. Configure the router metrics endpoint via the ROUTER_METRICS_ENDPOINT env var
  4. Use the existing kthena_router_fairness_queue_size metric for pending requests

When minReplicas is set to 0 and there are pending requests for the model, the autoscaler will automatically scale up from 0 to 1 instance.
Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:


1. Add scale-from-zero logic to the recommendation algorithm
2. Support fetching pending-request metrics from the router endpoint
3. Configure the router metrics endpoint via the ROUTER_METRICS_ENDPOINT env var
4. Use the existing kthena_router_fairness_queue_size metric for pending requests

When minReplicas is set to 0 and there are pending requests for the model,
the autoscaler will automatically scale up from 0 to 1 instance.
@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign git-malu for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


@gemini-code-assist (Bot) left a comment


Code Review

This pull request introduces scale-from-zero support by enabling the autoscaler to fetch metrics from a router endpoint when no pods are active. The changes include updates to the metric collection logic and the recommendation algorithm. Review feedback identifies a critical logic error in the scale-up condition that could prevent activation for low request volumes and suggests several performance optimizations, such as reusing an HTTP client and streaming metric data to reduce memory allocations.

Comment on lines +94 to +100
	if currentCount == 0 {
		// If there are any pending requests, scale up to at least 1
		if desired >= 1 {
			return getCeilDesiredInstances(desired)
		}
		return 0
	}

high

The condition desired >= 1 prevents the autoscaler from scaling up from zero when the number of pending requests is less than the target value per instance. For example, if the target is 10 and there is 1 pending request, desired is 0.1, and the code will return 0, leaving the system at zero replicas. To ensure the autoscaler scales up for any pending request, the check should be performed on the raw metric value. Since getCeilDesiredInstances returns at least 1 for any value > 0, this will correctly scale to 1 pod.

Suggested change
-	if currentCount == 0 {
-		// If there are any pending requests, scale up to at least 1
-		if desired >= 1 {
-			return getCeilDesiredInstances(desired)
-		}
-		return 0
-	}
+	if currentCount == 0 {
+		// If there are any pending requests, scale up to at least 1
+		if metric > 0 {
+			return getCeilDesiredInstances(desired)
+		}
+		return 0
+	}

Comment on lines +50 to +52
	// RouterMetricsEndpoint is the optional endpoint of the router's metrics endpoint for scale-from-zero
	// Format: http://<router-service-name>.<namespace>.svc.cluster.local:<port>/metrics
	RouterMetricsEndpoint string

medium

It is recommended to add a shared *http.Client to the MetricCollector struct. Creating a new http.Client for every metrics collection cycle (as seen in fetchMetricsFromRouter) is inefficient because it prevents TCP/TLS connection reuse and increases resource consumption.

	// RouterMetricsEndpoint is the optional endpoint of the router's metrics endpoint for scale-from-zero
	// Format: http://<router-service-name>.<namespace>.svc.cluster.local:<port>/metrics
	RouterMetricsEndpoint string
	httpClient            *http.Client

		MetricTargets: metricTargets,
		WatchMetricList: util.ExtractKeysToSet(metricTargets),
		RouterMetricsEndpoint: routerMetricsEndpoint,

medium

Initialize the shared httpClient with the desired timeout in the constructor to ensure it is ready for use by the collector's methods.

		RouterMetricsEndpoint: routerMetricsEndpoint,
		httpClient: &http.Client{
			Timeout: util.AutoscaleCtxTimeoutSeconds * time.Second,
		},

Comment on lines +285 to +287
	client := &http.Client{
		Timeout: util.AutoscaleCtxTimeoutSeconds * time.Second,
	}

medium

Use the shared httpClient from the collector instead of creating a new one for every request to improve performance and resource efficiency.

	resp, err := collector.httpClient.Do(req)

Comment on lines +299 to +305
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return metrics, err
	}

	// Parse Prometheus metrics
	decoder := expfmt.NewDecoder(strings.NewReader(string(body)), expfmt.NewFormat(expfmt.TypeTextPlain))

medium

Instead of reading the entire response body into memory with io.ReadAll, you can pass resp.Body directly to the Prometheus decoder. This reduces memory allocations and improves performance by streaming the response content.

	// Parse Prometheus metrics
	decoder := expfmt.NewDecoder(resp.Body, expfmt.NewFormat(expfmt.TypeTextPlain))
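For illustration only, here is a self-contained sketch of the streaming variant; the function name fetchPendingRequests, its signature, the package name, and the summing of gauge values are assumptions made for this example rather than the PR's actual code. It only reuses pieces already discussed in this review: a shared http.Client, the kthena_router_fairness_queue_size metric family, and decoding resp.Body directly with expfmt.

	// Sketch only; names and signature are hypothetical.
	package autoscaler

	import (
		"context"
		"errors"
		"fmt"
		"io"
		"net/http"

		dto "github.com/prometheus/client_model/go"
		"github.com/prometheus/common/expfmt"
	)

	// fetchPendingRequests streams the router's /metrics response into the
	// Prometheus text-format decoder and sums the gauge values of the
	// kthena_router_fairness_queue_size metric family.
	func fetchPendingRequests(ctx context.Context, client *http.Client, endpoint string) (float64, error) {
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, endpoint, nil)
		if err != nil {
			return 0, err
		}
		resp, err := client.Do(req) // shared client, so TCP connections are reused
		if err != nil {
			return 0, err
		}
		defer resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			return 0, fmt.Errorf("unexpected status from router metrics endpoint: %s", resp.Status)
		}

		// Decode straight from resp.Body; no intermediate io.ReadAll buffer.
		decoder := expfmt.NewDecoder(resp.Body, expfmt.NewFormat(expfmt.TypeTextPlain))
		var pending float64
		for {
			mf := &dto.MetricFamily{}
			if err := decoder.Decode(mf); err != nil {
				if errors.Is(err, io.EOF) {
					break // all metric families consumed
				}
				return 0, err
			}
			if mf.GetName() != "kthena_router_fairness_queue_size" {
				continue
			}
			for _, m := range mf.GetMetric() {
				pending += m.GetGauge().GetValue()
			}
		}
		return pending, nil
	}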

Contributor

Copilot AI left a comment


Pull request overview

Adds “scale from zero” support for homogeneous targets by allowing the autoscaler to use router-provided pending-request metrics when there are no backend pods to scrape.

Changes:

  • Extend MetricCollector.UpdateMetrics to return external metrics, and plumb them through Scaler.Scale.
  • Fetch kthena-router Prometheus metrics from an optional ROUTER_METRICS_ENDPOINT when there are zero pods.
  • Add scale-from-zero logic for external metrics in the recommendation algorithm.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File and description:

  • pkg/autoscaler/autoscaler/scaler.go: Passes external metrics from the collector into the recommendation algorithm.
  • pkg/autoscaler/autoscaler/optimizer.go: Updates UpdateMetrics call sites for the new return signature.
  • pkg/autoscaler/autoscaler/metric_collector.go: Adds router metrics endpoint config plus external metrics fetching/parsing when no pods exist.
  • pkg/autoscaler/algorithm/recommendation.go: Introduces special-case behavior for external metrics when currentCount == 0.
Comments suppressed due to low confidence (1)

pkg/autoscaler/algorithm/recommendation.go:105

  • This new scale-from-zero behavior for external metrics isn’t covered by unit tests. Since pkg/autoscaler/algorithm/recommendation_test.go already has extensive coverage for external metrics, please add cases where CurrentInstancesCount==0 and external metric is (a) 0 and (b) >0 to lock in the expected scale-from-zero semantics.
	// Handle scale from zero case
	if currentCount == 0 {
		// If there are any pending requests, scale up to at least 1
		if desired >= 1 {
			return getCeilDesiredInstances(desired)
		}
		return 0
	}
	ratio := desired / float64(currentCount)
	if math.Abs(ratio-1.0) <= tolerance {
		return currentCount
	}
	return getCeilDesiredInstances(desired)
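
To make the requested cases concrete, a rough table-driven sketch follows; recommendFromZero is a hypothetical stand-in (the real function name and signature in recommendation.go are not shown here) that mirrors the quoted logic but uses the metric > 0 check proposed elsewhere in this review.

	// Sketch only; the helper below is a hypothetical stand-in, not the real code.
	package algorithm_test

	import (
		"math"
		"testing"
	)

	// recommendFromZero mirrors the quoted recommendation logic, with the
	// proposed metric > 0 scale-from-zero check instead of desired >= 1.
	func recommendFromZero(metric, target float64, currentCount int, tolerance float64) int {
		desired := metric / target
		if currentCount == 0 {
			// Any pending request wakes at least one replica.
			if metric > 0 {
				return int(math.Ceil(desired))
			}
			return 0
		}
		ratio := desired / float64(currentCount)
		if math.Abs(ratio-1.0) <= tolerance {
			return currentCount
		}
		return int(math.Ceil(desired))
	}

	func TestScaleFromZero(t *testing.T) {
		cases := []struct {
			name    string
			metric  float64
			target  float64
			current int
			want    int
		}{
			{name: "no pending requests stays at zero", metric: 0, target: 10, current: 0, want: 0},
			{name: "one pending request wakes one replica", metric: 1, target: 10, current: 0, want: 1},
			{name: "large backlog scales past one", metric: 25, target: 10, current: 0, want: 3},
		}
		for _, c := range cases {
			if got := recommendFromZero(c.metric, c.target, c.current, 0.1); got != c.want {
				t.Errorf("%s: got %d, want %d", c.name, got, c.want)
			}
		}
	}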


Comment on lines 55 to 69
func NewMetricCollector(target *v1alpha1.Target, binding *v1alpha1.AutoscalingPolicyBinding, metricTargets map[string]float64) *MetricCollector {
	// Get router metrics endpoint from environment variable if set
	routerMetricsEndpoint := os.Getenv("ROUTER_METRICS_ENDPOINT")

	return &MetricCollector{
		PastHistograms: datastructure.NewSnapshotSlidingWindow[map[string]HistogramInfo](util.SecondToTimestamp(util.SloQuantileSlidingWindowSeconds), util.SecondToTimestamp(util.SloQuantileDataKeepSeconds)),
		Target: target,
		Scope: Scope{
			Namespace: binding.Namespace,
			OwnedBindingId: binding.UID,
		},
		MetricTargets: metricTargets,
		WatchMetricList: util.ExtractKeysToSet(metricTargets),
		RouterMetricsEndpoint: routerMetricsEndpoint,
	}

Copilot AI Apr 22, 2026


ROUTER_METRICS_ENDPOINT is introduced as the configuration knob here, but the Helm chart’s controller-manager Deployment template currently only sets POD_NAMESPACE and doesn’t expose a way to set additional env vars. That means the new router metrics endpoint can’t be configured when installing via the chart. Consider wiring this env var through chart values (or documenting another supported configuration mechanism) so the feature is actually usable by default installations.

Comment on lines +325 to +326
	// Add the queue size as pending request metric
	addMetric(metrics, "pending_requests", metric.GetGauge().GetValue())

Copilot AI Apr 22, 2026


The external metric is being stored under the hard-coded name "pending_requests", but the actual Prometheus metric you parse is "kthena_router_fairness_queue_size" and autoscaling policies match external metrics by metricTargets keys. If a user configures metricName=kthena_router_fairness_queue_size (as implied by the PR description), the algorithm will never see the external metric. Consider using the Prometheus metric name (mf.GetName()) as the key, or ensuring the policy/controller consistently uses the "pending_requests" name (including validation/docs).

Suggested change
-	// Add the queue size as pending request metric
-	addMetric(metrics, "pending_requests", metric.GetGauge().GetValue())
+	// Store the value under the Prometheus metric name so autoscaling
+	// policies can match the configured external metric key directly.
+	addMetric(metrics, mf.GetName(), metric.GetGauge().GetValue())

Comment on lines 92 to +100
	desired := metric / target
	// Handle scale from zero case
	if currentCount == 0 {
		// If there are any pending requests, scale up to at least 1
		if desired >= 1 {
			return getCeilDesiredInstances(desired)
		}
		return 0
	}

Copilot AI Apr 22, 2026


The new scale-from-zero branch changes behavior in a way that can prevent scaling up when there are pending requests but metric/target < 1 (e.g., target=10, metric=1). Previously, currentCount==0 would end up returning ceil(desired) which would scale to 1 for any positive desired. This also doesn’t fully match the PR description (“when there are pending requests … scale up from 0 to 1”). Consider basing the scale-from-zero decision on metric>0 (scale to at least 1), or otherwise clearly documenting/enforcing the intended threshold semantics.
