feat: make gangPolicy optional (#448) #959

Open
AyushSriv06 wants to merge 1 commit into volcano-sh:main from AyushSriv06:feature/optional-gang-policy

@AyushSriv06

What type of PR is this?

What this PR does / why we need it:
This PR introduces a disableGangScheduling toggle to the ModelBooster API. Previously, gangPolicy was unconditionally injected into generated ModelServing resources. This change lets users opt out of gang scheduling in environments where it is not required or not supported.
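
As a rough illustration of the toggle described above, the sketch below models the new field on a trimmed-down backend type. The struct is a hypothetical stand-in, not the real ModelBooster API type; the plain `bool` with `omitempty` and the `Name` field are assumptions made for the example. It shows that the zero value keeps the previous behaviour and leaves the serialized spec unchanged.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ModelBackend is a hypothetical, trimmed-down stand-in for the real
// ModelBooster backend type; only the new toggle is modelled here.
type ModelBackend struct {
	Name string `json:"name"`
	// DisableGangScheduling, when true, asks the controller to omit
	// gangPolicy from the generated ModelServing. The zero value (false)
	// keeps the previous behaviour. The field type is an assumption.
	DisableGangScheduling bool `json:"disableGangScheduling,omitempty"`
}

// encode returns the JSON form of a backend.
func encode(b ModelBackend) string {
	out, err := json.Marshal(b)
	if err != nil {
		panic(err)
	}
	return string(out)
}

func main() {
	// Existing manifests that never set the field serialize as before.
	fmt.Println(encode(ModelBackend{Name: "default"}))
	// Opting out adds the field explicitly.
	fmt.Println(encode(ModelBackend{Name: "no-gang", DisableGangScheduling: true}))
}
```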

Which issue(s) this PR fixes:
Fixes #448

Copilot AI review requested due to automatic review settings May 8, 2026 11:51
@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign lizhencheng9527 for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot
Contributor

Welcome @AyushSriv06! It looks like this is your first PR to volcano-sh/kthena 🎉

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a disableGangScheduling field to the ModelBooster CRD, allowing users to opt out of gang scheduling for generated ModelServing resources. The changes cover CRD schema updates, documentation, client-go apply configurations, and the controller logic required to handle the new flag. A review comment suggests using an existing local variable in the conversion logic to improve code consistency and readability.

Comment thread pkg/model-booster-controller/convert/model_serving.go Outdated
Contributor

Copilot AI left a comment

Pull request overview

This PR adds a disableGangScheduling toggle to the ModelBooster backend API to allow generated ModelServing resources to omit spec.template.gangPolicy, enabling opt-out from gang scheduling behavior where it’s not desired/supported.

Changes:

  • Added spec.backend.disableGangScheduling to the ModelBooster CRD/API types and client-go applyconfiguration.
  • Updated BuildModelServing conversion to nil out serving.Spec.Template.GangPolicy when the toggle is enabled, plus added unit coverage.
  • Regenerated/updated CRD reference docs and Helm CRD schema; updated golden expected YAML revisions.
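
The conversion change in the second bullet can be sketched roughly as follows. All types here are simplified, hypothetical stand-ins for the real API types, and `buildModelServing` is an illustrative function, not the actual `BuildModelServing` implementation; only the field path `Spec.Template.GangPolicy` and the toggle name come from this PR.

```go
package main

import "fmt"

// GangPolicy is a placeholder; the real type's fields are omitted here.
type GangPolicy struct{}

// The types below are hypothetical, minimal stand-ins for the real API.
type ServingTemplate struct {
	GangPolicy *GangPolicy
}

type ModelServingSpec struct {
	Template ServingTemplate
}

type ModelServing struct {
	Spec ModelServingSpec
}

type ModelBackend struct {
	DisableGangScheduling bool
}

// buildModelServing mimics the described behaviour: the generated resource
// carries a gang policy by default, and the policy is cleared when the
// backend opts out.
func buildModelServing(backend ModelBackend) *ModelServing {
	serving := &ModelServing{
		Spec: ModelServingSpec{
			Template: ServingTemplate{GangPolicy: &GangPolicy{}},
		},
	}
	if backend.DisableGangScheduling {
		serving.Spec.Template.GangPolicy = nil
	}
	return serving
}

func main() {
	kept := buildModelServing(ModelBackend{})
	cleared := buildModelServing(ModelBackend{DisableGangScheduling: true})
	fmt.Println("default keeps gang policy:", kept.Spec.Template.GangPolicy != nil)
	fmt.Println("opt-out keeps gang policy:", cleared.Spec.Template.GangPolicy != nil)
}
```

The two calls in `main` mirror the two unit-test cases mentioned in the file summary: policy preserved by default, policy removed when disabled.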

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| pkg/model-booster-controller/convert/model_serving.go | Clears GangPolicy in generated ModelServing when disableGangScheduling is enabled. |
| pkg/model-booster-controller/convert/model_serving_test.go | Adds tests validating GangPolicy is removed when disabled and preserved by default. |
| pkg/apis/workload/v1alpha1/model_booster_types.go | Introduces the disableGangScheduling field on ModelBackend. |
| client-go/applyconfiguration/workload/v1alpha1/modelbackend.go | Exposes WithDisableGangScheduling for server-side apply usage. |
| charts/kthena/charts/workload/crds/workload.serving.volcano.sh_modelboosters.yaml | Updates Helm-packaged CRD schema to include disableGangScheduling. |
| docs/kthena/docs/reference/crd/workload.serving.volcano.sh.md | Updates generated CRD reference documentation for the new field. |
| pkg/model-booster-controller/convert/testdata/expected/model-serving.yaml | Updates expected revision label output (golden). |
| pkg/model-booster-controller/convert/testdata/expected/disaggregated-model-serving.yaml | Updates expected revision label output (golden). |
| pkg/model-booster-controller/convert/testdata/expected/disaggregated-model-serving-mooncake.yaml | Updates expected revision label output (golden). |
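
For readers unfamiliar with apply configurations, the following is a minimal sketch of the builder pattern that a method such as WithDisableGangScheduling typically follows in generated client-go code. The struct is a hypothetical reduction (the `Name` field and the `render` helper are assumptions for the example), not the generated file from this PR. Pointer fields let an unset value be left out of the apply patch entirely, while an explicit `false` is still sent.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ModelBackendApplyConfiguration is a hypothetical, reduced version of a
// generated apply configuration. Fields are pointers so that unset values
// are omitted from the server-side apply patch.
type ModelBackendApplyConfiguration struct {
	Name                  *string `json:"name,omitempty"`
	DisableGangScheduling *bool   `json:"disableGangScheduling,omitempty"`
}

// ModelBackend returns an empty apply configuration to build on.
func ModelBackend() *ModelBackendApplyConfiguration {
	return &ModelBackendApplyConfiguration{}
}

// WithName sets Name and returns the receiver for chaining.
func (b *ModelBackendApplyConfiguration) WithName(value string) *ModelBackendApplyConfiguration {
	b.Name = &value
	return b
}

// WithDisableGangScheduling sets the toggle and returns the receiver.
func (b *ModelBackendApplyConfiguration) WithDisableGangScheduling(value bool) *ModelBackendApplyConfiguration {
	b.DisableGangScheduling = &value
	return b
}

// render is an illustrative helper that serializes the patch body.
func render(b *ModelBackendApplyConfiguration) string {
	out, err := json.Marshal(b)
	if err != nil {
		panic(err)
	}
	return string(out)
}

func main() {
	fmt.Println(render(ModelBackend().WithName("default")))
	fmt.Println(render(ModelBackend().WithName("default").WithDisableGangScheduling(true)))
}
```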

Comment thread pkg/model-booster-controller/convert/model_serving.go
Comment thread pkg/apis/workload/v1alpha1/model_booster_types.go
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

Comment thread pkg/model-booster-controller/convert/model_serving.go
Comment thread pkg/apis/workload/v1alpha1/model_booster_types.go
Comment thread docs/kthena/docs/reference/crd/workload.serving.volcano.sh.md Outdated
Comment thread pkg/model-booster-controller/convert/model_serving_test.go
Signed-off-by: Ayush <ayushsrisks@gmail.com>
@AyushSriv06 AyushSriv06 force-pushed the feature/optional-gang-policy branch from 28b4c8d to 25e464d Compare May 8, 2026 12:46
@LiZhenCheng9527
Collaborator

In fact, this issue is simply intended to discuss whether this feature is required.

@LiZhenCheng9527
Collaborator

/cc @hzxuzhonghu
Do you think this skill is necessary?
IMO, Gang scheduling is widely recognised as a necessary capability in LLM scenarios. It should not be disabled.

@AyushSriv06
Author

> /cc @hzxuzhonghu
> Do you think this skill is necessary?
> IMO, Gang scheduling is widely recognised as a necessary capability in LLM scenarios. It should not be disabled.

Hello, thanks for the review. This is why I took up the issue: making gang scheduling optional helps in two cases. For single-node models, gang scheduling only adds overhead. For independently scaled replicas, partial availability is preferable to waiting until resources for every replica are free.
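
The trade-off in the second case can be shown with a toy model. This is not how the Volcano scheduler is implemented; `admitted` is a hypothetical helper that only contrasts all-or-nothing admission with best-effort admission when capacity is short.

```go
package main

import "fmt"

// admitted returns how many of the requested replicas start, given the
// number of free slots. With gang scheduling the group is all-or-nothing;
// without it, as many replicas as fit are started. Toy model only.
func admitted(requested, freeSlots int, gang bool) int {
	if freeSlots < 0 {
		freeSlots = 0
	}
	if requested <= freeSlots {
		return requested
	}
	if gang {
		// Not enough room for the whole group, so nothing starts.
		return 0
	}
	// Independent replicas: start whatever fits now.
	return freeSlots
}

func main() {
	fmt.Println("gang, 4 requested, 3 free:", admitted(4, 3, true))
	fmt.Println("no gang, 4 requested, 3 free:", admitted(4, 3, false))
}
```

For replicas that must start together (for example the roles of one multi-node inference group), the all-or-nothing result is what prevents a half-started group from holding resources; the opt-out only makes sense when each replica is useful on its own.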
