
[Draft] Phase 1: PD-aware autoscaling (soft role coordination)#948

Draft
WHOIM1205 wants to merge 1 commit into volcano-sh:main from
WHOIM1205:feat/phase1-pd-aware-autoscaling

Conversation

@WHOIM1205
Contributor

@WHOIM1205 WHOIM1205 commented May 1, 2026

Phase 1: PD-aware autoscaling (soft role coordination)

This PR introduces an optional coordination mechanism for HeterogeneousTarget that allows the optimizer to bias replica distribution across roles (e.g. prefill / decode) using observed pressure signals, while preserving the existing cost-based behavior.


What this adds

  • New optional coordination field on HeterogeneousTarget
  • Mode: Off (default) | Preferred
  • PreferredRatio: soft per-role min/max percentage band

When enabled (mode: Preferred), the optimizer:

  1. Starts from the existing cost-based allocation
  2. Computes a pressure-based share using available metrics:
    • queue_depth
    • kv_cache_utilization
    • ttft
  3. Blends cost and pressure (alpha = 0.5)
  4. Clips to the configured preferred ratio band
  5. Renormalizes and converts to integer replicas (largest remainder)
  6. Clamps to per-target minReplicas / maxReplicas
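Steps 3 to 5 can be illustrated with a minimal sketch. This is not the PR's code: `roleRange` and `blendAndClip` are hypothetical names, the share maps are assumed to be pre-normalized, and applying `alpha` to the cost share (with `1 - alpha` on pressure) is an assumption that makes no difference at the stated value of 0.5.

```go
package main

import "fmt"

// roleRange is a hypothetical stand-in for the PR's RoleRange (percentages).
type roleRange struct{ Min, Max int32 }

// blendAndClip mixes cost and pressure shares with weight alpha, clips each
// role to its preferred band (if one is configured), and renormalizes so the
// resulting shares sum to 1.
func blendAndClip(names []string, cost, pressure map[string]float64,
	bands map[string]roleRange, alpha float64) map[string]float64 {
	out := make(map[string]float64, len(names))
	sum := 0.0
	for _, nm := range names {
		s := alpha*cost[nm] + (1-alpha)*pressure[nm]
		if r, ok := bands[nm]; ok {
			lo, hi := float64(r.Min)/100, float64(r.Max)/100
			if s < lo {
				s = lo
			}
			if s > hi {
				s = hi
			}
		}
		out[nm] = s
		sum += s
	}
	if sum == 0 {
		return out // nothing to normalize
	}
	for _, nm := range names {
		out[nm] /= sum
	}
	return out
}

func main() {
	names := []string{"prefill", "decode"}
	cost := map[string]float64{"prefill": 0.5, "decode": 0.5}
	pressure := map[string]float64{"prefill": 0.9, "decode": 0.1}
	bands := map[string]roleRange{"prefill": {Min: 30, Max: 60}}
	shares := blendAndClip(names, cost, pressure, bands, 0.5)
	fmt.Printf("%.3f %.3f\n", shares["prefill"], shares["decode"]) // prints "0.667 0.333"
}
```

In this example the prefill share is clipped to 0.60 and then rises to about 0.667 after renormalization, which shows how a band can be exceeded when clipping and renormalizing are separate steps.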

Key properties

  • Fully opt-in — default behavior is unchanged
  • No new control loop or CRD changes beyond the optional field
  • Panic mode bypasses coordination (treated as Off)
  • Missing or invalid pressure signals fall back to cost-only behavior
  • Infeasible bands are handled via best-effort allocation (drift allowed, no hard failure)

Implementation scope

  • API:
    • Coordination, CoordinationMode, RoleRange
  • Optimizer:
    • Post-processing step after cost-based allocation
    • Helpers:
      • costShareFromBaseline
      • pressureShareNormalized
      • roundSharesToReplicas
  • Validation:
    • Ensures valid ranges and matching role names

Notes for reviewers

  • This is a draft PR intended to validate the approach and integration point before adding tests and further polish.

  • Pressure signals are currently a small fixed set (Phase 1 decision to keep scope minimal).

  • Largest-remainder rounding is made deterministic via stable sorting.
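As an illustration of the rounding note above, here is a minimal largest-remainder sketch. It is not the PR's `roundSharesToReplicas`; `largestRemainder` is a hypothetical helper that assumes the shares already sum to 1 and relies on `sort.SliceStable` so that ties fall back to the caller's name order.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// largestRemainder converts fractional shares (assumed to sum to 1) into
// integer replica counts that sum to total. Each role first receives the
// floor of its exact quota; leftover replicas go to the roles with the
// largest fractional remainders.
func largestRemainder(names []string, shares map[string]float64, total int) map[string]int {
	out := make(map[string]int, len(names))
	rem := make(map[string]float64, len(names))
	assigned := 0
	for _, nm := range names {
		exact := shares[nm] * float64(total)
		fl := math.Floor(exact)
		out[nm] = int(fl)
		rem[nm] = exact - fl
		assigned += int(fl)
	}
	// A stable sort keeps the input order among equal remainders, so the
	// result does not depend on map iteration order.
	order := append([]string(nil), names...)
	sort.SliceStable(order, func(i, j int) bool { return rem[order[i]] > rem[order[j]] })
	for i := 0; i < total-assigned && i < len(order); i++ {
		out[order[i]]++
	}
	return out
}

func main() {
	names := []string{"prefill", "decode"}
	shares := map[string]float64{"prefill": 0.5, "decode": 0.5}
	out := largestRemainder(names, shares, 7)
	fmt.Println(out["prefill"], out["decode"]) // prints "4 3"
}
```

With equal remainders the extra replica goes to whichever role appears first in `names`, so callers would need to pass the names in a fixed order for the result to be reproducible.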


Required before merge

make generate must be run so that zz_generated.deepcopy.go includes the new Coordination field.

Until regeneration:

  • HeterogeneousTarget.DeepCopyInto does not deep-copy the Coordination pointer
  • This can cause aliasing in copied objects

This is intentional for the draft to keep the diff minimal and reviewable.
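The aliasing risk can be reproduced with a small standalone sketch. The types below are simplified, hypothetical stand-ins for the PR's API types, and `shallowCopyInto` only mimics what a `DeepCopyInto` generated before the field existed would do.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the PR's API types.
type coordination struct {
	Mode string
}

type heterogeneousTarget struct {
	Coordination *coordination
}

// shallowCopyInto mimics a DeepCopyInto generated before the field existed:
// the struct assignment copies the pointer, not the value it points to.
func (in *heterogeneousTarget) shallowCopyInto(out *heterogeneousTarget) {
	*out = *in
}

// deepCopyInto additionally clones the pointed-to value, which is what
// regenerated code would be expected to do.
func (in *heterogeneousTarget) deepCopyInto(out *heterogeneousTarget) {
	*out = *in
	if in.Coordination != nil {
		c := *in.Coordination
		out.Coordination = &c
	}
}

func main() {
	orig := heterogeneousTarget{Coordination: &coordination{Mode: "Preferred"}}
	var shallow, deep heterogeneousTarget
	orig.shallowCopyInto(&shallow)
	orig.deepCopyInto(&deep)

	// Mutating the shallow copy also changes orig through the shared pointer.
	shallow.Coordination.Mode = "Off"
	fmt.Println(orig.Coordination.Mode, deep.Coordination.Mode) // prints "Off Preferred"
}
```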


Follow-ups (not part of this PR)

  • Unit tests for coordination logic
  • Configurable alpha / metrics (Phase 2)
  • Scheduler-side improvements (future work)

Goal of this PR

Validate:

  • API shape
  • Optimizer integration
  • Overall direction of PD-aware coordination

before proceeding to Phase 2 and test coverage.

Adds optional Coordination block on HeterogeneousTarget to softly bias replica allocation across roles using a cost+pressure blend clipped to a preferred ratio band. No-op when Mode is Off or unset; bypassed in panic mode.

Signed-off-by: WHOIM1205 <rathourprateek8@gmail.com>
Copilot AI review requested due to automatic review settings May 1, 2026 10:30
@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign hzxuzhonghu for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@WHOIM1205 WHOIM1205 changed the title from "feat(autoscaler): phase 1 PD-aware coordination" to "[Draft] Phase 1: PD-aware autoscaling (soft role coordination)" May 1, 2026
@WHOIM1205 WHOIM1205 marked this pull request as draft May 1, 2026 10:32

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces PD-aware coordination for heterogeneous targets, enabling the autoscaler to distribute replicas based on pressure signals and preferred ratio bands. The implementation includes new API types, deep-copy methods, and logic in the optimizer to blend cost and pressure shares. Feedback highlights an issue in the replica rounding logic that can cause total count drift and suggests additional validation to ensure the sum of minimum preferred ratios does not exceed 100%.

Comment on lines +420 to +426
for _, nm := range names {
	if out[nm] < minReps[nm] {
		out[nm] = minReps[nm]
	} else if out[nm] > maxReps[nm] {
		out[nm] = maxReps[nm]
	}
}

high

The current implementation of roundSharesToReplicas applies clamping to minReps and maxReps after the total number of replicas has been distributed. This can lead to a final replica map where the sum of replicas does not match the total (the recommended instances). Since total is guaranteed to be within [sum(minReps), sum(maxReps)], a valid distribution always exists. To avoid this "drift", the distribution logic should be aware of the per-parameter bounds during the allocation process rather than as a post-processing step. This is important to ensure the autoscaler respects the total capacity recommended by the algorithm.

Comment on lines +190 to +204
for name, r := range coord.PreferredRatio {
	entry := ratioPath.Key(name)
	if r.Min < 0 || r.Min > 100 {
		allErrs = append(allErrs, field.Invalid(entry.Child("min"), r.Min, "must be between 0 and 100"))
	}
	if r.Max < 0 || r.Max > 100 {
		allErrs = append(allErrs, field.Invalid(entry.Child("max"), r.Max, "must be between 0 and 100"))
	}
	if r.Min > r.Max {
		allErrs = append(allErrs, field.Invalid(entry, r, "min must be <= max"))
	}
	if _, ok := validNames[name]; !ok {
		allErrs = append(allErrs, field.Invalid(entry, name, "must match a heterogeneousTarget.params[].target.targetRef.name"))
	}
}

medium

The validation for preferredRatio should ensure that the sum of the minimum percentages across all roles does not exceed 100%. If sum(Min) > 100, the renormalization logic in applyCoordination will scale down the shares, potentially resulting in an allocation that violates the user-specified minimum ratios and leading to unexpected behavior.

	var sumMin int32
	for name, r := range coord.PreferredRatio {
		entry := ratioPath.Key(name)
		if r.Min < 0 || r.Min > 100 {
			allErrs = append(allErrs, field.Invalid(entry.Child("min"), r.Min, "must be between 0 and 100"))
		}
		if r.Max < 0 || r.Max > 100 {
			allErrs = append(allErrs, field.Invalid(entry.Child("max"), r.Max, "must be between 0 and 100"))
		}
		if r.Min > r.Max {
			allErrs = append(allErrs, field.Invalid(entry, r, "min must be <= max"))
		}
		if _, ok := validNames[name]; !ok {
			allErrs = append(allErrs, field.Invalid(entry, name, "must match a heterogeneousTarget.params[].target.targetRef.name"))
		}
		sumMin += r.Min
	}
	if sumMin > 100 {
		allErrs = append(allErrs, field.Invalid(ratioPath, coord.PreferredRatio, "sum of min ratios must not exceed 100"))
	}

Contributor

Copilot AI left a comment

Pull request overview

This PR introduces an optional “PD-aware coordination” mechanism for heterogeneous autoscaling, adding an API surface to configure coordination and implementing an optimizer-side biasing step that blends cost-based allocation with per-role pressure signals, with optional preferred-ratio bands and panic-mode bypass.

Changes:

  • Add coordination configuration under HeterogeneousTarget (mode + preferred ratio bands).
  • Validate coordination configuration in the AutoscalingPolicyBinding admission validator.
  • Apply coordination biasing in the autoscaler optimizer when enabled (and not in panic mode).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| pkg/model-booster-controller/webhook/autoscaling_binding_validator.go | Adds admission-time validation for the new coordination block and preferred ratio constraints. |
| pkg/autoscaler/autoscaler/optimizer.go | Implements Phase 1 coordination blending (cost + pressure) and preferred-ratio clipping during replica distribution. |
| pkg/apis/workload/v1alpha1/coordination_deepcopy.go | Adds temporary hand-written DeepCopy implementations for new coordination types (intended to be replaced by generated code). |
| pkg/apis/workload/v1alpha1/autoscalingpolicybinding_types.go | Extends the API types to include Coordination configuration on HeterogeneousTarget. |

Comment on lines +183 to +187
// Build the set of valid role names from existing params so we can flag typos.
validNames := make(map[string]struct{}, len(asp_binding.Spec.HeterogeneousTarget.Params))
for _, p := range asp_binding.Spec.HeterogeneousTarget.Params {
	validNames[p.Target.TargetRef.Name] = struct{}{}
}

Copilot AI May 1, 2026

validateCoordination builds validNames from heterogeneousTarget.params[].target.targetRef.name but doesn’t validate that those names are non-empty and unique. Because coordination (and other optimizer code paths) key maps by name, empty/duplicate names can silently overwrite entries and make preferredRatio validation unreliable. Please add validation to require each param’s targetRef.name be set and unique (at least when coordination is present/enabled).
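One way the requested check could look is sketched below. It is deliberately simplified: `validateRoleNames` is a hypothetical helper that takes plain strings and returns messages, whereas the real validator would presumably report through `field.ErrorList` like the surrounding code.

```go
package main

import "fmt"

// validateRoleNames returns one message for every empty or duplicated name.
// The slice index stands in for the position in heterogeneousTarget.params.
func validateRoleNames(names []string) []string {
	var problems []string
	seen := make(map[string]int, len(names))
	for i, nm := range names {
		if nm == "" {
			problems = append(problems,
				fmt.Sprintf("params[%d]: targetRef.name must be set", i))
			continue
		}
		if first, dup := seen[nm]; dup {
			problems = append(problems,
				fmt.Sprintf("params[%d]: targetRef.name %q duplicates params[%d]", i, nm, first))
			continue
		}
		seen[nm] = i
	}
	return problems
}

func main() {
	for _, p := range validateRoleNames([]string{"prefill", "", "prefill"}) {
		fmt.Println(p)
	}
}
```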

Comment on lines +19 to +27
// Hand-written DeepCopy implementations for the Phase 1 PD-aware
// coordination types.
//
// IMPORTANT (reviewers): zz_generated.deepcopy.go is intentionally left
// untouched. Before merging, run `make generate` so controller-gen
// regenerates that file — this will (a) duplicate the methods below into
// zz_generated.deepcopy.go and (b) extend HeterogeneousTarget.DeepCopyInto
// to deep-copy the new Coordination pointer. After regeneration, this file
// can be deleted.

Copilot AI May 1, 2026

The API types were updated to add Coordination, but the generated deepcopy file (zz_generated.deepcopy.go) hasn’t been regenerated, so AutoscalingPolicyBinding/HeterogeneousTarget DeepCopy* currently won’t deep-copy the new Coordination pointer. This can lead to shared pointers/maps across copies. Also, keeping these hand-written DeepCopy methods in-tree will cause duplicate-method compile errors the next time make generate runs. Please run code generation and commit the updated zz_generated.deepcopy.go (including deep-copying HeterogeneousTarget.Coordination), then remove this temporary file.

Comment on lines +309 to +312
	// Convert to integers via largest-remainder so the sum is preserved,
	// then clamp to each param's Min/Max. Drift after clamping is allowed.
	return roundSharesToReplicas(names, raw, total, minReps, maxReps)
}

Copilot AI May 1, 2026

roundSharesToReplicas clamps each role to its [min,max] after rounding but then accepts total drift. That means the returned replicasMap can sum to a value different from recommendedInstances, which breaks the optimizer’s implied invariant (only distribution changes, not the total) and desynchronizes Status history (it records the corrected total, not the applied total). Please adjust the post-clamp result to preserve the requested total (e.g., redistribute the delta across roles with remaining headroom/room-down while respecting min/max), or avoid clamping by ensuring the rounding step already respects min/max.
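The redistribution idea can be sketched as follows. This is an illustration rather than a proposed patch: `rebalance` is a hypothetical helper, and handing out the delta one replica at a time in name order is an arbitrary policy chosen for brevity.

```go
package main

import "fmt"

// rebalance clamps each role to [min, max] and then moves the remaining
// delta onto roles that still have headroom (or room to shrink), so the sum
// equals total whenever sum(min) <= total <= sum(max).
func rebalance(names []string, reps, minReps, maxReps map[string]int, total int) map[string]int {
	out := make(map[string]int, len(names))
	sum := 0
	for _, nm := range names {
		v := reps[nm]
		if v < minReps[nm] {
			v = minReps[nm]
		}
		if v > maxReps[nm] {
			v = maxReps[nm]
		}
		out[nm] = v
		sum += v
	}
	for delta := total - sum; delta != 0; {
		moved := false
		for _, nm := range names {
			switch {
			case delta > 0 && out[nm] < maxReps[nm]:
				out[nm]++
				delta--
				moved = true
			case delta < 0 && out[nm] > minReps[nm]:
				out[nm]--
				delta++
				moved = true
			}
		}
		if !moved {
			break // bounds make the total unreachable; keep the best effort
		}
	}
	return out
}

func main() {
	names := []string{"prefill", "decode"}
	reps := map[string]int{"prefill": 8, "decode": 2}
	minReps := map[string]int{"prefill": 1, "decode": 1}
	maxReps := map[string]int{"prefill": 5, "decode": 10}
	out := rebalance(names, reps, minReps, maxReps, 10)
	fmt.Println(out["prefill"], out["decode"]) // prints "5 5"
}
```

Here clamping alone would yield 5 + 2 = 7 replicas; the three missing replicas are moved to decode, the only role with headroom, so the requested total of 10 is preserved.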

Comment on lines +224 to +234
// pressureMetricNames are the Phase 1 placeholder signals the optimizer
// consults to estimate per-role load. They are read by name from whatever
// the configured MetricCollector exposes — there is no hard coupling: a
// missing metric simply contributes zero. Phase 2 will let users configure
// these per role; for now keeping them as a small constant set keeps the
// blast radius small and the diff reviewable.
var pressureMetricNames = []string{
	"queue_depth",          // prefill + decode: pending requests
	"kv_cache_utilization", // decode: KV-cache pressure (typically 0..1)
	"ttft",                 // prefill: time-to-first-token (seconds)
}

Copilot AI May 1, 2026

The comment says these pressure signals are read from whatever the MetricCollector exposes, but MetricCollector currently only scrapes metrics listed in autoscalePolicy.Spec.Metrics (its WatchMetricList). As a result, coordination will almost always see these metrics as missing unless users also add them to scaling metrics (which will then affect the total recommendation, not just distribution). Consider extending MetricCollector (or the optimizer) to collect a separate/extra metric set for coordination without feeding them into MetricTargets, and/or clarify this requirement in the comment/docs.

Comment on lines +285 to +296
for _, nm := range names {
	r, ok := coord.PreferredRatio[nm]
	if !ok {
		continue
	}
	lo, hi := float64(r.Min)/100.0, float64(r.Max)/100.0
	if raw[nm] < lo {
		raw[nm] = lo
	}
	if raw[nm] > hi {
		raw[nm] = hi
	}

Copilot AI May 1, 2026

applyCoordination assumes PreferredRatio values are well-formed (0<=Min<=Max<=100). If an older object bypasses webhook validation/CRD validation (or validation is disabled), negative/invalid ranges can push raw shares negative and produce negative replica counts after rounding. Please defensively sanitize/ignore invalid RoleRange entries here (clamp to [0,100] and skip if Min>Max) so the optimizer can’t emit negative replicas even with bad inputs.
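A defensive guard along these lines might look like the sketch below. `roleRange` and `sanitizeBand` are hypothetical names, and checking for an inverted range before clamping is one possible reading of the suggestion.

```go
package main

import "fmt"

// roleRange is a hypothetical stand-in for the PR's RoleRange (percentages).
type roleRange struct{ Min, Max int32 }

// sanitizeBand returns the band as fractions in [0, 1]. It reports ok=false
// for an inverted range (Min > Max) so the caller can skip clipping for that
// role, and clamps out-of-range bounds into [0, 100] otherwise.
func sanitizeBand(r roleRange) (lo, hi float64, ok bool) {
	if r.Min > r.Max {
		return 0, 0, false
	}
	clamp := func(v int32) int32 {
		if v < 0 {
			return 0
		}
		if v > 100 {
			return 100
		}
		return v
	}
	return float64(clamp(r.Min)) / 100, float64(clamp(r.Max)) / 100, true
}

func main() {
	lo, hi, ok := sanitizeBand(roleRange{Min: -10, Max: 60})
	fmt.Println(lo, hi, ok) // prints "0 0.6 true"

	_, _, ok = sanitizeBand(roleRange{Min: 70, Max: 30})
	fmt.Println(ok) // prints "false"
}
```

With a guard like this in the clipping loop, a malformed entry could at worst be ignored, so the raw shares would stay non-negative even when validation was bypassed.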

allErrs = append(allErrs, validateOptimizeAndScalingPolicyExistence(asp_binding)...)
allErrs = append(allErrs, v.validateAutoscalingPolicyExistence(ctx, asp_binding)...)
allErrs = append(allErrs, validateBindingTargetKind(asp_binding)...)
allErrs = append(allErrs, validateCoordination(asp_binding)...)

Copilot AI May 1, 2026

New coordination validation was added, but there are no unit tests covering it. Since this package already has autoscaling_binding_validator_test.go, please add cases for invalid coordination.mode, out-of-range min/max, min>max, and unknown preferredRatio role names to prevent regressions.


3 participants