docs: add user-guide for scheduling gates queue admission #495
Merged: volcano-sh-bot merged 2 commits into volcano-sh:master from devzizu:feat/scheduling-gates-queue-admission on May 6, 2026
142 changes: 142 additions & 0 deletions
content/en/docs/user_guide_how_to_use_scheduling_gates_queue_admission.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,142 @@ | ||
| +++ | ||
| title = "Scheduling Gates Queue Admission User Guide" | ||
| date = 2026-05-04 | ||
| type = "docs" | ||
| weight = 50 | ||
| url = "/en/docs/user-guide/how_to_use_scheduling_gates_queue_admission/" | ||
| [menu.docs] | ||
| parent = "user-guide" | ||
| +++ | ||
|
|
||
| ## Overview | ||
|
|
||
| This page describes how to enable and use the `SchedulingGatesQueueAdmission` feature to prevent cluster autoscalers (such as [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) or [Karpenter](https://karpenter.sh/)) from triggering unnecessary scale-ups when pods are waiting for Volcano queue capacity. | ||
|
|
||
|
|
||
|
|
||
| ## Problem | ||
|
|
||
| Volcano marks pods as `Unschedulable` for any allocation failure, whether it's due to insufficient cluster resources (where autoscaling is appropriate) or queue capacity limits (where autoscaling is not needed). Cluster autoscalers cannot distinguish between these scenarios, causing unnecessary node scale-ups. | ||
|
|
||
| The problem is described in detail in the [design document](https://github.com/volcano-sh/volcano/blob/master/docs/design/scheduling-gates-queue-admission.md). | ||
|
|
||
| ## Solution | ||
|
|
||
| This feature uses Kubernetes [`schedulingGates`](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-scheduling-readiness/) to hold pods until the queue has capacity. While gated, pods are invisible to autoscalers. The gate is removed only after the queue capacity check passes. If the pod then cannot be scheduled because no suitable node exists, it is marked `Unschedulable`, allowing autoscalers to respond correctly. | ||
|
|
||
| ## Prerequisites | ||
|
|
||
| - Volcano v1.15+ with the `SchedulingGatesQueueAdmission` feature gate enabled. | ||
| - The `capacity` plugin configured in the scheduler (the feature is implemented in the capacity plugin and [will soon be integrated in `proportion` as well](https://github.com/volcano-sh/volcano/issues/5271)). | ||
|
|
||
| ## 1. Enable the Feature Gate | ||
|
|
||
| The feature is Alpha and disabled by default. Enable it on both the **scheduler** and **webhook-manager**. | ||
|
|
||
| ### Using Helm | ||
|
|
||
| ```bash | ||
| helm install volcano volcano/volcano --namespace volcano-system --create-namespace \ | ||
| --set custom.scheduler_feature_gates="SchedulingGatesQueueAdmission=true" \ | ||
| --set custom.admission_feature_gates="SchedulingGatesQueueAdmission=true" | ||
| ``` | ||
|
|
||
| ### Using kubectl apply | ||
|
|
||
| Add the following flag to both the `volcano-scheduler` and `volcano-admission` deployments: | ||
|
|
||
| ```yaml | ||
| --feature-gates=SchedulingGatesQueueAdmission=true | ||
| ``` | ||
|
|
||
| Optionally, configure the number of async gate removal workers (default: `5`): | ||
|
|
||
| ```yaml | ||
| --gate-removal-worker-num=10 | ||
| ``` | ||
|
|
||
| These workers process gate removals asynchronously: each worker picks up a pod whose queue capacity check has passed and removes its scheduling gate, allowing the pod to proceed to scheduling. Increasing this number can improve throughput when many pods are being ungated concurrently. | ||
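For the `kubectl apply` route, a minimal sketch of where these flags might sit in the scheduler Deployment is shown below. The container name and the surrounding layout are assumptions and may differ in your manifest, and placing `--gate-removal-worker-num` on the scheduler is inferred from the fact that the scheduler is the component that removes the gate. The `volcano-admission` Deployment would need the same `--feature-gates` entry.

```yaml
# Fragment of the volcano-scheduler Deployment (assumed layout).
spec:
  template:
    spec:
      containers:
        - name: volcano-scheduler   # assumed container name; check your manifest
          args:
            # Keep all existing flags and append the following.
            - --feature-gates=SchedulingGatesQueueAdmission=true
            - --gate-removal-worker-num=10   # optional; the default is 5
```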
|
|
||
| ## 2. Configure the Capacity Plugin | ||
|
|
||
| Ensure the `capacity` plugin is enabled in your scheduler configuration. The reserved resource tracking that prevents race conditions between gate removal and pod allocation is implemented in this plugin. | ||
|
|
||
| Example scheduler configuration: | ||
|
|
||
| ```yaml | ||
| actions: "enqueue, allocate, backfill" | ||
| tiers: | ||
| - plugins: | ||
| - name: priority | ||
| - name: gang | ||
| - plugins: | ||
| - name: predicates | ||
| - name: capacity | ||
| - name: nodeorder | ||
| ``` | ||
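Queue capacity itself is defined on the Volcano `Queue` object. As a rough sketch (the queue name and the limits are hypothetical, and this guide does not prescribe particular values), a queue capped at 4 CPUs and 8Gi of memory might look like the following. Opted-in pods submitted to such a queue would stay gated while the cap is reached.

```yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: team-a            # hypothetical queue name
spec:
  capability:             # upper bound on the resources this queue may use
    cpu: "4"
    memory: 8Gi
```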
|
|
||
| ## 3. Opt-in Pods | ||
|
|
||
| The feature is opt-in per pod. **Add the following annotation to each pod** that should use gate-controlled queue admission: | ||
|
|
||
| ```yaml | ||
| apiVersion: v1 | ||
| kind: Pod | ||
| metadata: | ||
| name: my-pod | ||
| annotations: | ||
| # Opt-in annotation | ||
| scheduling.volcano.sh/queue-allocation-gate: "true" | ||
| spec: | ||
| schedulerName: volcano | ||
| containers: | ||
| - name: worker | ||
| image: nginx | ||
| resources: | ||
| requests: | ||
| cpu: "1" | ||
| memory: "1Gi" | ||
| ``` | ||
|
|
||
| When this pod is created: | ||
|
|
||
| 1. The Volcano webhook injects a `scheduling.volcano.sh/queue-allocation-gate` scheduling gate. | ||
| 2. The pod stays gated (invisible to autoscalers) until the queue has capacity. | ||
| 3. Once capacity is available, the scheduler removes the gate. | ||
| 4. If the pod can be placed on a node, it gets scheduled normally. | ||
| 5. If no node fits (e.g., the pod needs a specific node type), it is marked `Unschedulable`, which correctly triggers the autoscaler. | ||
|
|
||
| ## 4. Verify the Feature is Working | ||
|
|
||
| After creating an opted-in pod, verify that the mutating webhook injected the gate: | ||
|
|
||
| ```bash | ||
| kubectl get pod my-pod -o jsonpath='{.spec.schedulingGates}' | ||
| ``` | ||
|
|
||
| Expected output (while waiting for queue capacity): | ||
|
|
||
| ```json | ||
| [{"name":"scheduling.volcano.sh/queue-allocation-gate"}] | ||
| ``` | ||
|
|
||
| Once the queue has capacity and the scheduler removes the gate, the field will be empty: | ||
|
|
||
| ```bash | ||
| kubectl get pod my-pod -o jsonpath='{.spec.schedulingGates}' | ||
| # empty output | ||
| ``` | ||
|
|
||
| ## Interaction with other Scheduling Gates | ||
|
|
||
| If a pod has additional scheduling gates from other controllers (*e.g.*, `example.com/my-gate`), Volcano will not remove its gate until the pod has **only the Volcano gate remaining**. This ensures Volcano does not interfere with other gate controllers and avoids reserving queue capacity for pods that are still blocked by external dependencies. | ||
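To illustrate, here is a sketch of a pod that opts in to the Volcano gate and also carries a gate owned by another controller. The pod name is hypothetical and `example.com/my-gate` stands in for whatever gate the external controller manages. After admission the pod would hold both gates, and Volcano would leave its own gate in place until the external one has been removed.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-multi-gated-pod   # hypothetical name
  annotations:
    # Opt-in annotation; the Volcano webhook adds its gate at admission
    scheduling.volcano.sh/queue-allocation-gate: "true"
spec:
  schedulerName: volcano
  schedulingGates:
    # Gate owned by another controller; Volcano does not remove it
    - name: example.com/my-gate
  containers:
    - name: worker
      image: nginx
      resources:
        requests:
          cpu: "1"
          memory: "1Gi"
```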
|
|
||
| ## Limitations | ||
|
|
||
| - Once a pod's gate is removed, it reserves queue capacity until it is scheduled or deleted. If the pod remains unschedulable (*e.g.*, while waiting for the autoscaler to add nodes), it continues to hold that capacity and can block other pods. The feature currently **does not implement a timeout** for reserved capacity, so pods that have been ungated but remain unschedulable can hold queue capacity indefinitely. | ||
| - The feature is **only implemented in the `capacity` plugin**. Users relying on the `proportion` plugin for queue resource management will still face false autoscaler scale-ups, as the scheduling gates mechanism is not yet integrated with `proportion`. Tracking issue: [#5271](https://github.com/volcano-sh/volcano/issues/5271). | ||
|
|
||
| ## Related | ||
|
|
||
| - [Design document](https://github.com/volcano-sh/volcano/blob/master/docs/design/scheduling-gates-queue-admission.md) | ||
| - [Kubernetes Pod Scheduling Readiness](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-scheduling-readiness/) | ||