feat(autoscaler): support scale from zero for Homogeneous Target #909
Abirdcfly wants to merge 1 commit into volcano-sh:main from
Conversation
1. Add scale-from-zero logic to the recommendation algorithm.
2. Support fetching pending-request metrics from the router endpoint.
3. Configure the router metrics endpoint via the `ROUTER_METRICS_ENDPOINT` env var.
4. Use the existing `kthena_router_fairness_queue_size` metric for pending requests.

When minReplicas is set to 0 and there are pending requests for the model, the autoscaler will automatically scale up from 0 to 1 instance.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: The full list of commands accepted by this bot can be found here. Details: Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
Code Review
This pull request introduces scale-from-zero support by enabling the autoscaler to fetch metrics from a router endpoint when no pods are active. The changes include updates to the metric collection logic and the recommendation algorithm. Review feedback identifies a critical logic error in the scale-up condition that could prevent activation for low request volumes and suggests several performance optimizations, such as reusing an HTTP client and streaming metric data to reduce memory allocations.
```go
if currentCount == 0 {
	// If there are any pending requests, scale up to at least 1
	if desired >= 1 {
		return getCeilDesiredInstances(desired)
	}
	return 0
}
```
There was a problem hiding this comment.
The condition desired >= 1 prevents the autoscaler from scaling up from zero when the number of pending requests is less than the target value per instance. For example, if the target is 10 and there is 1 pending request, desired is 0.1, and the code will return 0, leaving the system at zero replicas. To ensure the autoscaler scales up for any pending request, the check should be performed on the raw metric value. Since getCeilDesiredInstances returns at least 1 for any value > 0, this will correctly scale to 1 pod.
```diff
 if currentCount == 0 {
 	// If there are any pending requests, scale up to at least 1
-	if desired >= 1 {
+	if metric > 0 {
 		return getCeilDesiredInstances(desired)
 	}
 	return 0
 }
```
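To see why the `metric > 0` check matters, here is a minimal, self-contained sketch of the decision with the suggested fix applied. The `recommend` helper and its signature are hypothetical stand-ins; the real logic lives in `pkg/autoscaler/algorithm/recommendation.go`.

```go
package main

import (
	"fmt"
	"math"
)

// getCeilDesiredInstances mirrors the helper referenced in the PR:
// any positive desired value rounds up to at least one instance.
func getCeilDesiredInstances(desired float64) int {
	return int(math.Ceil(desired))
}

// recommend is a hypothetical reduction of the recommendation step,
// using the `metric > 0` check suggested in the review.
func recommend(metric, target float64, currentCount int) int {
	desired := metric / target
	if currentCount == 0 {
		// Any pending request at all should wake the target up.
		if metric > 0 {
			return getCeilDesiredInstances(desired)
		}
		return 0
	}
	return getCeilDesiredInstances(desired)
}

func main() {
	// target=10, one pending request: the old `desired >= 1` check
	// would have returned 0 here; `metric > 0` scales to 1.
	fmt.Println(recommend(1, 10, 0)) // 1
	fmt.Println(recommend(0, 10, 0)) // 0
}
```

With `desired = 0.1`, `getCeilDesiredInstances` still returns 1, so gating on the raw metric is enough to guarantee activation.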
```go
// RouterMetricsEndpoint is the optional endpoint of the router's metrics endpoint for scale-from-zero
// Format: http://<router-service-name>.<namespace>.svc.cluster.local:<port>/metrics
RouterMetricsEndpoint string
```
It is recommended to add a shared *http.Client to the MetricCollector struct. Creating a new http.Client for every metrics collection cycle (as seen in fetchMetricsFromRouter) is inefficient because it prevents TCP/TLS connection reuse and increases resource consumption.
```go
// RouterMetricsEndpoint is the optional endpoint of the router's metrics endpoint for scale-from-zero
// Format: http://<router-service-name>.<namespace>.svc.cluster.local:<port>/metrics
RouterMetricsEndpoint string
httpClient            *http.Client
```

```go
MetricTargets:         metricTargets,
WatchMetricList:       util.ExtractKeysToSet(metricTargets),
RouterMetricsEndpoint: routerMetricsEndpoint,
```
```go
client := &http.Client{
	Timeout: util.AutoscaleCtxTimeoutSeconds * time.Second,
}
```
```go
body, err := io.ReadAll(resp.Body)
if err != nil {
	return metrics, err
}

// Parse Prometheus metrics
decoder := expfmt.NewDecoder(strings.NewReader(string(body)), expfmt.NewFormat(expfmt.TypeTextPlain))
```
Instead of reading the entire response body into memory with io.ReadAll, you can pass resp.Body directly to the Prometheus decoder. This reduces memory allocations and improves performance by streaming the response content.
```go
// Parse Prometheus metrics
decoder := expfmt.NewDecoder(resp.Body, expfmt.NewFormat(expfmt.TypeTextPlain))
```
Pull request overview
Adds “scale from zero” support for homogeneous targets by allowing the autoscaler to use router-provided pending-request metrics when there are no backend pods to scrape.
Changes:
- Extend `MetricCollector.UpdateMetrics` to return external metrics, and plumb them through `Scaler.Scale`.
- Fetch `kthena-router` Prometheus metrics from an optional `ROUTER_METRICS_ENDPOINT` when there are zero pods.
- Add scale-from-zero logic for external metrics in the recommendation algorithm.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pkg/autoscaler/autoscaler/scaler.go | Passes external metrics from the collector into the recommendation algorithm. |
| pkg/autoscaler/autoscaler/optimizer.go | Updates UpdateMetrics call sites for the new return signature. |
| pkg/autoscaler/autoscaler/metric_collector.go | Adds router metrics endpoint config + external metrics fetching/parsing when no pods exist. |
| pkg/autoscaler/algorithm/recommendation.go | Introduces special-case behavior for external metrics when currentCount == 0. |
Comments suppressed due to low confidence (1)
pkg/autoscaler/algorithm/recommendation.go:105
- This new scale-from-zero behavior for external metrics isn’t covered by unit tests. Since pkg/autoscaler/algorithm/recommendation_test.go already has extensive coverage for external metrics, please add cases where CurrentInstancesCount==0 and external metric is (a) 0 and (b) >0 to lock in the expected scale-from-zero semantics.
```go
// Handle scale from zero case
if currentCount == 0 {
	// If there are any pending requests, scale up to at least 1
	if desired >= 1 {
		return getCeilDesiredInstances(desired)
	}
	return 0
}
ratio := desired / float64(currentCount)
if math.Abs(ratio-1.0) <= tolerance {
	return currentCount
}
return getCeilDesiredInstances(desired)
```
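The missing coverage could be locked in with a small case table. This sketch inlines a simplified `recommend` stand-in (not the real algorithm) so the table shape is concrete; the metric values are chosen so the expectations hold under the as-merged `desired >= 1` semantics.

```go
package main

import (
	"fmt"
	"math"
)

// Simplified, hypothetical stand-in for the recommendation step; the
// real implementation lives in pkg/autoscaler/algorithm/recommendation.go.
func recommend(metric, target float64, currentCount int) int {
	desired := metric / target
	if currentCount == 0 {
		if desired >= 1 {
			return int(math.Ceil(desired))
		}
		return 0
	}
	return int(math.Ceil(desired))
}

func main() {
	// Table of the scale-from-zero cases the review asks to cover:
	// CurrentInstancesCount == 0 with external metric (a) 0 and (b) > 0.
	cases := []struct {
		metric, target float64
		current, want  int
	}{
		{0, 10, 0, 0},  // (a) no pending requests: stay at zero
		{20, 10, 0, 2}, // (b) pending requests: scale up
	}
	for _, c := range cases {
		if got := recommend(c.metric, c.target, c.current); got != c.want {
			panic("unexpected recommendation")
		}
	}
	fmt.Println("all cases pass")
}
```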
```go
func NewMetricCollector(target *v1alpha1.Target, binding *v1alpha1.AutoscalingPolicyBinding, metricTargets map[string]float64) *MetricCollector {
	// Get router metrics endpoint from environment variable if set
	routerMetricsEndpoint := os.Getenv("ROUTER_METRICS_ENDPOINT")

	return &MetricCollector{
		PastHistograms: datastructure.NewSnapshotSlidingWindow[map[string]HistogramInfo](util.SecondToTimestamp(util.SloQuantileSlidingWindowSeconds), util.SecondToTimestamp(util.SloQuantileDataKeepSeconds)),
		Target:         target,
		Scope: Scope{
			Namespace:      binding.Namespace,
			OwnedBindingId: binding.UID,
		},
		MetricTargets:         metricTargets,
		WatchMetricList:       util.ExtractKeysToSet(metricTargets),
		RouterMetricsEndpoint: routerMetricsEndpoint,
	}
}
```
ROUTER_METRICS_ENDPOINT is introduced as the configuration knob here, but the Helm chart’s controller-manager Deployment template currently only sets POD_NAMESPACE and doesn’t expose a way to set additional env vars. That means the new router metrics endpoint can’t be configured when installing via the chart. Consider wiring this env var through chart values (or documenting another supported configuration mechanism) so the feature is actually usable by default installations.
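One possible shape for wiring this through the chart, sketched below. The `controllerManager.extraEnv` values key and the template hook are hypothetical — the chart does not currently expose them — and the endpoint URL (service name, namespace, port) is illustrative only.

```yaml
# values.yaml (hypothetical keys)
controllerManager:
  extraEnv:
    - name: ROUTER_METRICS_ENDPOINT
      value: http://kthena-router.kthena-system.svc.cluster.local:8080/metrics
```

```yaml
# templates/controller-manager deployment, env section (hypothetical addition)
env:
  - name: POD_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
  {{- with .Values.controllerManager.extraEnv }}
  {{- toYaml . | nindent 12 }}
  {{- end }}
```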
```go
// Add the queue size as pending request metric
addMetric(metrics, "pending_requests", metric.GetGauge().GetValue())
```
The external metric is being stored under the hard-coded name "pending_requests", but the actual Prometheus metric you parse is "kthena_router_fairness_queue_size" and autoscaling policies match external metrics by metricTargets keys. If a user configures metricName=kthena_router_fairness_queue_size (as implied by the PR description), the algorithm will never see the external metric. Consider using the Prometheus metric name (mf.GetName()) as the key, or ensuring the policy/controller consistently uses the "pending_requests" name (including validation/docs).
```diff
-// Add the queue size as pending request metric
-addMetric(metrics, "pending_requests", metric.GetGauge().GetValue())
+// Store the value under the Prometheus metric name so autoscaling
+// policies can match the configured external metric key directly.
+addMetric(metrics, mf.GetName(), metric.GetGauge().GetValue())
```
```go
desired := metric / target
// Handle scale from zero case
if currentCount == 0 {
	// If there are any pending requests, scale up to at least 1
	if desired >= 1 {
		return getCeilDesiredInstances(desired)
	}
	return 0
}
```
The new scale-from-zero branch changes behavior in a way that can prevent scaling up when there are pending requests but metric/target < 1 (e.g., target=10, metric=1). Previously, currentCount==0 would end up returning ceil(desired) which would scale to 1 for any positive desired. This also doesn’t fully match the PR description (“when there are pending requests … scale up from 0 to 1”). Consider basing the scale-from-zero decision on metric>0 (scale to at least 1), or otherwise clearly documenting/enforcing the intended threshold semantics.
What type of PR is this?
/kind feature
What this PR does / why we need it:
When minReplicas is set to 0 and there are pending requests for the model, autoscaler will automatically scale up from 0 to 1 instance.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?: