[Draft] Phase 1: PD-aware autoscaling (soft role coordination)#948
WHOIM1205 wants to merge 1 commit into volcano-sh:main
Conversation
Adds an optional Coordination block on HeterogeneousTarget to softly bias replica allocation across roles using a cost+pressure blend clipped to a preferred ratio band. No-op when Mode is Off or unset; bypassed in panic mode.

Signed-off-by: WHOIM1205 <rathourprateek8@gmail.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED. Needs approval from an approver in each of these files. The full list of commands accepted by this bot can be found here.
Code Review
This pull request introduces PD-aware coordination for heterogeneous targets, enabling the autoscaler to distribute replicas based on pressure signals and preferred ratio bands. The implementation includes new API types, deep-copy methods, and logic in the optimizer to blend cost and pressure shares. Feedback highlights an issue in the replica rounding logic that can cause total count drift and suggests additional validation to ensure the sum of minimum preferred ratios does not exceed 100%.
```go
for _, nm := range names {
	if out[nm] < minReps[nm] {
		out[nm] = minReps[nm]
	} else if out[nm] > maxReps[nm] {
		out[nm] = maxReps[nm]
	}
}
```
The current implementation of roundSharesToReplicas applies clamping to minReps and maxReps after the total number of replicas has been distributed. This can lead to a final replica map where the sum of replicas does not match the total (the recommended instances). Since total is guaranteed to be within [sum(minReps), sum(maxReps)], a valid distribution always exists. To avoid this "drift", the distribution logic should be aware of the per-parameter bounds during the allocation process rather than as a post-processing step. This is important to ensure the autoscaler respects the total capacity recommended by the algorithm.
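A minimal, self-contained illustration of the drift with made-up numbers (`clampOnly` is a hypothetical stand-in for the clamping loop quoted above, not the PR's actual function):

```go
package main

import "fmt"

// clampOnly applies per-role [min,max] bounds as a post-processing step,
// exactly like the quoted loop: it makes no attempt to preserve the total.
func clampOnly(names []string, out, minReps, maxReps map[string]int32) map[string]int32 {
	for _, nm := range names {
		if out[nm] < minReps[nm] {
			out[nm] = minReps[nm]
		} else if out[nm] > maxReps[nm] {
			out[nm] = maxReps[nm]
		}
	}
	return out
}

func main() {
	// 10 replicas were distributed as prefill=9, decode=1 before clamping.
	res := clampOnly(
		[]string{"prefill", "decode"},
		map[string]int32{"prefill": 9, "decode": 1},
		map[string]int32{"prefill": 1, "decode": 4},
		map[string]int32{"prefill": 8, "decode": 8},
	)
	fmt.Println(res, res["prefill"]+res["decode"]) // sum drifts to 12, not the requested 10
}
```

Here decode is pulled up to its min (1→4) while prefill is capped at its max (9→8), so the map sums to 12 even though 10 replicas were requested.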
```go
for name, r := range coord.PreferredRatio {
	entry := ratioPath.Key(name)
	if r.Min < 0 || r.Min > 100 {
		allErrs = append(allErrs, field.Invalid(entry.Child("min"), r.Min, "must be between 0 and 100"))
	}
	if r.Max < 0 || r.Max > 100 {
		allErrs = append(allErrs, field.Invalid(entry.Child("max"), r.Max, "must be between 0 and 100"))
	}
	if r.Min > r.Max {
		allErrs = append(allErrs, field.Invalid(entry, r, "min must be <= max"))
	}
	if _, ok := validNames[name]; !ok {
		allErrs = append(allErrs, field.Invalid(entry, name, "must match a heterogeneousTarget.params[].target.targetRef.name"))
	}
}
```
The validation for preferredRatio should ensure that the sum of the minimum percentages across all roles does not exceed 100%. If sum(Min) > 100, the renormalization logic in applyCoordination will scale down the shares, potentially resulting in an allocation that violates the user-specified minimum ratios and leading to unexpected behavior.
```go
var sumMin int32
for name, r := range coord.PreferredRatio {
	entry := ratioPath.Key(name)
	if r.Min < 0 || r.Min > 100 {
		allErrs = append(allErrs, field.Invalid(entry.Child("min"), r.Min, "must be between 0 and 100"))
	}
	if r.Max < 0 || r.Max > 100 {
		allErrs = append(allErrs, field.Invalid(entry.Child("max"), r.Max, "must be between 0 and 100"))
	}
	if r.Min > r.Max {
		allErrs = append(allErrs, field.Invalid(entry, r, "min must be <= max"))
	}
	if _, ok := validNames[name]; !ok {
		allErrs = append(allErrs, field.Invalid(entry, name, "must match a heterogeneousTarget.params[].target.targetRef.name"))
	}
	sumMin += r.Min
}
if sumMin > 100 {
	allErrs = append(allErrs, field.Invalid(ratioPath, coord.PreferredRatio, "sum of min ratios must not exceed 100"))
}
```
Pull request overview
This PR introduces an optional “PD-aware coordination” mechanism for heterogeneous autoscaling, adding an API surface to configure coordination and implementing an optimizer-side biasing step that blends cost-based allocation with per-role pressure signals, with optional preferred-ratio bands and panic-mode bypass.
Changes:
- Add `coordination` configuration under `HeterogeneousTarget` (mode + preferred ratio bands).
- Validate coordination configuration in the AutoscalingPolicyBinding admission validator.
- Apply coordination biasing in the autoscaler optimizer when enabled (and not in panic mode).
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| pkg/model-booster-controller/webhook/autoscaling_binding_validator.go | Adds admission-time validation for the new coordination block and preferred ratio constraints. |
| pkg/autoscaler/autoscaler/optimizer.go | Implements Phase 1 coordination blending (cost + pressure) and preferred-ratio clipping during replica distribution. |
| pkg/apis/workload/v1alpha1/coordination_deepcopy.go | Adds temporary hand-written DeepCopy implementations for new coordination types (intended to be replaced by generated code). |
| pkg/apis/workload/v1alpha1/autoscalingpolicybinding_types.go | Extends the API types to include Coordination configuration on HeterogeneousTarget. |
```go
// Build the set of valid role names from existing params so we can flag typos.
validNames := make(map[string]struct{}, len(asp_binding.Spec.HeterogeneousTarget.Params))
for _, p := range asp_binding.Spec.HeterogeneousTarget.Params {
	validNames[p.Target.TargetRef.Name] = struct{}{}
}
```
validateCoordination builds validNames from heterogeneousTarget.params[].target.targetRef.name but doesn’t validate that those names are non-empty and unique. Because coordination (and other optimizer code paths) key maps by name, empty/duplicate names can silently overwrite entries and make preferredRatio validation unreliable. Please add validation to require each param’s targetRef.name be set and unique (at least when coordination is present/enabled).
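A sketch of what the requested check could look like, with simplified types (the real validator attaches `field.ErrorList` entries with full field paths; `validateUniqueNames` and plain error strings are stand-ins for illustration):

```go
package main

import "fmt"

// validateUniqueNames flags empty and duplicate role names before any map
// is keyed by name, so typos/collisions surface at admission time instead
// of silently overwriting entries later.
func validateUniqueNames(names []string) []string {
	var errs []string
	seen := make(map[string]struct{}, len(names))
	for i, nm := range names {
		if nm == "" {
			errs = append(errs, fmt.Sprintf("params[%d].target.targetRef.name must not be empty", i))
			continue
		}
		if _, dup := seen[nm]; dup {
			errs = append(errs, fmt.Sprintf("params[%d].target.targetRef.name %q is duplicated", i, nm))
		}
		seen[nm] = struct{}{}
	}
	return errs
}

func main() {
	fmt.Println(validateUniqueNames([]string{"prefill", "", "prefill"}))
}
```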
```go
// Hand-written DeepCopy implementations for the Phase 1 PD-aware
// coordination types.
//
// IMPORTANT (reviewers): zz_generated.deepcopy.go is intentionally left
// untouched. Before merging, run `make generate` so controller-gen
// regenerates that file — this will (a) duplicate the methods below into
// zz_generated.deepcopy.go and (b) extend HeterogeneousTarget.DeepCopyInto
// to deep-copy the new Coordination pointer. After regeneration, this file
// can be deleted.
```
The API types were updated to add Coordination, but the generated deepcopy file (zz_generated.deepcopy.go) hasn’t been regenerated, so AutoscalingPolicyBinding/HeterogeneousTarget DeepCopy* currently won’t deep-copy the new Coordination pointer. This can lead to shared pointers/maps across copies. Also, keeping these hand-written DeepCopy methods in-tree will cause duplicate-method compile errors the next time make generate runs. Please run code generation and commit the updated zz_generated.deepcopy.go (including deep-copying HeterogeneousTarget.Coordination), then remove this temporary file.
```go
// Convert to integers via largest-remainder so the sum is preserved,
// then clamp to each param's Min/Max. Drift after clamping is allowed.
return roundSharesToReplicas(names, raw, total, minReps, maxReps)
```
roundSharesToReplicas clamps each role to its [min,max] after rounding but then accepts total drift. That means the returned replicasMap can sum to a value different from recommendedInstances, which breaks the optimizer’s implied invariant (only distribution changes, not the total) and desynchronizes Status history (it records the corrected total, not the applied total). Please adjust the post-clamp result to preserve the requested total (e.g., redistribute the delta across roles with remaining headroom/room-down while respecting min/max), or avoid clamping by ensuring the rounding step already respects min/max.
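One possible shape for the suggested fix — a total-preserving variant that does largest-remainder rounding, clamps, and then redistributes the clamping drift through roles with remaining room. `distributeWithBounds` is a simplified sketch assuming `total` lies in `[sum(minReps), sum(maxReps)]`, not the PR's actual implementation:

```go
package main

import (
	"fmt"
	"sort"
)

// distributeWithBounds converts fractional shares into integer replica
// counts whose sum equals total, while respecting per-role [min,max].
func distributeWithBounds(names []string, share map[string]float64, total int32, minReps, maxReps map[string]int32) map[string]int32 {
	out := make(map[string]int32, len(names))
	type rem struct {
		name string
		frac float64
	}
	var rems []rem
	var used int32
	for _, nm := range names {
		exact := share[nm] * float64(total)
		fl := int32(exact)
		out[nm] = fl
		used += fl
		rems = append(rems, rem{nm, exact - float64(fl)})
	}
	// Largest remainder first; stable sort with name tie-break keeps it deterministic.
	sort.SliceStable(rems, func(i, j int) bool {
		if rems[i].frac != rems[j].frac {
			return rems[i].frac > rems[j].frac
		}
		return rems[i].name < rems[j].name
	})
	for i := 0; used < total && i < len(rems); i++ {
		out[rems[i].name]++
		used++
	}
	// Clamp to [min,max], tracking how far the sum drifted from total.
	var excess int32 // sum(out) - total after clamping
	for _, nm := range names {
		if out[nm] < minReps[nm] {
			excess += minReps[nm] - out[nm]
			out[nm] = minReps[nm]
		} else if out[nm] > maxReps[nm] {
			excess -= out[nm] - maxReps[nm]
			out[nm] = maxReps[nm]
		}
	}
	// Walk the drift back through roles that still have room, so the
	// returned map sums to exactly total.
	for _, nm := range names {
		for excess > 0 && out[nm] > minReps[nm] {
			out[nm]--
			excess--
		}
		for excess < 0 && out[nm] < maxReps[nm] {
			out[nm]++
			excess++
		}
	}
	return out
}

func main() {
	out := distributeWithBounds(
		[]string{"prefill", "decode"},
		map[string]float64{"prefill": 0.9, "decode": 0.1},
		10,
		map[string]int32{"prefill": 1, "decode": 4},
		map[string]int32{"prefill": 8, "decode": 8},
	)
	fmt.Println(out) // sums to 10 even though both roles were clamped
}
```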
```go
// pressureMetricNames are the Phase 1 placeholder signals the optimizer
// consults to estimate per-role load. They are read by name from whatever
// the configured MetricCollector exposes — there is no hard coupling: a
// missing metric simply contributes zero. Phase 2 will let users configure
// these per role; for now keeping them as a small constant set keeps the
// blast radius small and the diff reviewable.
var pressureMetricNames = []string{
	"queue_depth",          // prefill + decode: pending requests
	"kv_cache_utilization", // decode: KV-cache pressure (typically 0..1)
	"ttft",                 // prefill: time-to-first-token (seconds)
}
```
The comment says these pressure signals are read from whatever the MetricCollector exposes, but MetricCollector currently only scrapes metrics listed in autoscalePolicy.Spec.Metrics (its WatchMetricList). As a result, coordination will almost always see these metrics as missing unless users also add them to scaling metrics (which will then affect the total recommendation, not just distribution). Consider extending MetricCollector (or the optimizer) to collect a separate/extra metric set for coordination without feeding them into MetricTargets, and/or clarify this requirement in the comment/docs.
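A sketch of the "separate/extra metric set" idea, with hypothetical names (`watchList`, `policyMetrics`) standing in for the collector's real WatchMetricList plumbing — the point is that the coordination signals are unioned in for scraping but kept out of the scaling metric set:

```go
package main

import (
	"fmt"
	"sort"
)

// pressureMetricNames mirrors the Phase 1 constant set from the PR.
var pressureMetricNames = []string{"queue_depth", "kv_cache_utilization", "ttft"}

// watchList computes the full set of metric names to scrape: the metrics
// from autoscalePolicy.Spec.Metrics plus, when coordination is enabled,
// the pressure signals. Only the policy metrics would feed MetricTargets;
// the extras exist purely for distribution biasing.
func watchList(policyMetrics []string, coordinationEnabled bool) []string {
	set := make(map[string]struct{}, len(policyMetrics))
	for _, m := range policyMetrics {
		set[m] = struct{}{}
	}
	if coordinationEnabled {
		for _, m := range pressureMetricNames {
			set[m] = struct{}{}
		}
	}
	out := make([]string, 0, len(set))
	for m := range set {
		out = append(out, m)
	}
	sort.Strings(out) // deterministic order for a stable scrape config
	return out
}

func main() {
	// "ttft" appears in both sets but is scraped once.
	fmt.Println(watchList([]string{"gpu_utilization", "ttft"}, true))
}
```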
```go
for _, nm := range names {
	r, ok := coord.PreferredRatio[nm]
	if !ok {
		continue
	}
	lo, hi := float64(r.Min)/100.0, float64(r.Max)/100.0
	if raw[nm] < lo {
		raw[nm] = lo
	}
	if raw[nm] > hi {
		raw[nm] = hi
	}
```
applyCoordination assumes PreferredRatio values are well-formed (0<=Min<=Max<=100). If an older object bypasses webhook validation/CRD validation (or validation is disabled), negative/invalid ranges can push raw shares negative and produce negative replica counts after rounding. Please defensively sanitize/ignore invalid RoleRange entries here (clamp to [0,100] and skip if Min>Max) so the optimizer can’t emit negative replicas even with bad inputs.
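A minimal sketch of the defensive sanitization, assuming the PR's `RoleRange` has `Min`/`Max` int32 percentage fields (`sanitizeRange` is a hypothetical helper; the optimizer could call it per role before clipping):

```go
package main

import "fmt"

// RoleRange mirrors the PR's percentage band type (assumed shape).
type RoleRange struct {
	Min, Max int32
}

// sanitizeRange normalizes a band so the optimizer can never produce
// negative shares even if webhook/CRD validation was bypassed: clamp both
// ends into [0,100] and report inverted bands so the caller can skip them.
func sanitizeRange(r RoleRange) (RoleRange, bool) {
	clamp := func(v int32) int32 {
		if v < 0 {
			return 0
		}
		if v > 100 {
			return 100
		}
		return v
	}
	r.Min, r.Max = clamp(r.Min), clamp(r.Max)
	if r.Min > r.Max {
		return RoleRange{}, false // inverted band: ignore clipping for this role
	}
	return r, true
}

func main() {
	fmt.Println(sanitizeRange(RoleRange{Min: -20, Max: 150})) // clamped to [0,100]
	fmt.Println(sanitizeRange(RoleRange{Min: 80, Max: 20}))   // rejected
}
```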
```go
allErrs = append(allErrs, validateOptimizeAndScalingPolicyExistence(asp_binding)...)
allErrs = append(allErrs, v.validateAutoscalingPolicyExistence(ctx, asp_binding)...)
allErrs = append(allErrs, validateBindingTargetKind(asp_binding)...)
allErrs = append(allErrs, validateCoordination(asp_binding)...)
```
New coordination validation was added, but there are no unit tests covering it. Since this package already has autoscaling_binding_validator_test.go, please add cases for invalid coordination.mode, out-of-range min/max, min>max, and unknown preferredRatio role names to prevent regressions.
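A table-driven sketch of the requested cases. `validateRange` is a simplified stand-in for the min/max portion of the real `validateCoordination` (which returns `field.ErrorList`); the case table shows the shape the `autoscaling_binding_validator_test.go` additions could take:

```go
package main

import "fmt"

// validateRange re-implements just the band checks from the validator,
// returning plain error strings instead of field.ErrorList for brevity.
func validateRange(min, max int32) []string {
	var errs []string
	if min < 0 || min > 100 {
		errs = append(errs, "min must be between 0 and 100")
	}
	if max < 0 || max > 100 {
		errs = append(errs, "max must be between 0 and 100")
	}
	if min > max {
		errs = append(errs, "min must be <= max")
	}
	return errs
}

func main() {
	cases := []struct {
		name     string
		min, max int32
		wantErrs int
	}{
		{"valid band", 10, 60, 0},
		{"min out of range", -5, 60, 1},
		{"max out of range", 10, 150, 1},
		{"min greater than max", 70, 30, 1},
	}
	for _, c := range cases {
		got := len(validateRange(c.min, c.max))
		fmt.Printf("%s: got %d errors, want %d\n", c.name, got, c.wantErrs)
	}
}
```

Cases for invalid `coordination.mode` and unknown `preferredRatio` role names would follow the same table pattern against the full validator.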
Phase 1: PD-aware autoscaling (soft role coordination)
This PR introduces an optional coordination mechanism for `HeterogeneousTarget` that allows the optimizer to bias replica distribution across roles (e.g. prefill / decode) using observed pressure signals, while preserving the existing cost-based behavior.

What this adds

- A `coordination` field on `HeterogeneousTarget` with modes `Off` (default) | `Preferred`
- `PreferredRatio`: soft per-role min/max percentage band

When enabled (`mode: Preferred`), the optimizer:

- reads the pressure signals `queue_depth`, `kv_cache_utilization`, and `ttft`
- respects each role's `minReplicas`/`maxReplicas`

Key properties
Implementation scope

- New API types: `Coordination`, `CoordinationMode`, `RoleRange`
- Optimizer helpers: `costShareFromBaseline`, `pressureShareNormalized`, `roundSharesToReplicas`

Notes for reviewers
This is a draft PR intended to validate the approach and integration point before adding tests and further polish.
Pressure signals are currently a small fixed set (Phase 1 decision to keep scope minimal).
Largest-remainder rounding is made deterministic via stable sorting.
Required before merge
- `make generate` must be run so that `zz_generated.deepcopy.go` includes the new `Coordination` field.

Until regeneration:

- `HeterogeneousTarget.DeepCopyInto` does not deep-copy the `Coordination` pointer

This is intentional for the draft to keep the diff minimal and reviewable.
Follow-ups (not part of this PR)
Goal of this PR
Validate:
before proceeding to Phase 2 and test coverage.