diff --git a/content/en/docs/user_guide.md b/content/en/docs/user_guide.md
index c3da0db6..1129ebbc 100644
--- a/content/en/docs/user_guide.md
+++ b/content/en/docs/user_guide.md
@@ -33,3 +33,4 @@ This section contains the User Guide for Volcano.
 * [Volcano Job Plugin -- SSH User Guide](/en/docs/user-guide/how_to_use_ssh_plugin/)
 * [Volcano Job Plugin -- SVC User Guide](/en/docs/user-guide/how_to_use_svc_plugin/)
 * [Volcano vGPU User Guide](/en/docs/user-guide/how_to_use_volcano_vgpu/)
+* [Scheduling Gates Queue Admission User Guide](/en/docs/user-guide/how_to_use_scheduling_gates_queue_admission/)
diff --git a/content/en/docs/user_guide_how_to_use_scheduling_gates_queue_admission.md b/content/en/docs/user_guide_how_to_use_scheduling_gates_queue_admission.md
new file mode 100644
index 00000000..e680efe8
--- /dev/null
+++ b/content/en/docs/user_guide_how_to_use_scheduling_gates_queue_admission.md
@@ -0,0 +1,142 @@
++++
+title = "Scheduling Gates Queue Admission User Guide"
+date = 2026-05-04
+type = "docs"
+weight = 50
+url = "/en/docs/user-guide/how_to_use_scheduling_gates_queue_admission/"
+[menu.docs]
+  parent = "user-guide"
++++
+
+## Overview
+
+This page describes how to enable and use the `SchedulingGatesQueueAdmission` feature to prevent cluster autoscalers (such as [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) or [Karpenter](https://karpenter.sh/)) from triggering unnecessary scale-ups when pods are waiting for Volcano queue capacity.
+
+
+
+## Problem
+
+Volcano marks pods as `Unschedulable` for any allocation failure, whether it is caused by insufficient cluster resources (where autoscaling is appropriate) or by queue capacity limits (where autoscaling is not needed). Cluster autoscalers cannot distinguish between these two cases, so they also add nodes for pods that are only waiting for queue capacity.
+
+The problem is described in detail in the [design document](https://github.com/volcano-sh/volcano/blob/master/docs/design/scheduling-gates-queue-admission.md).
+
+## Solution
+
+This feature uses Kubernetes [`schedulingGates`](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-scheduling-readiness/) to hold pods until their queue has capacity. While gated, pods are invisible to autoscalers. The gate is removed only after the queue capacity check passes. If the pod then cannot be scheduled because no suitable node exists, it is marked `Unschedulable`, which lets autoscalers respond correctly.
+
+## Prerequisites
+
+- Volcano v1.15+ with the `SchedulingGatesQueueAdmission` feature gate enabled.
+- The `capacity` plugin configured in the scheduler (the feature is currently implemented only in the `capacity` plugin; [integration with `proportion` is planned](https://github.com/volcano-sh/volcano/issues/5271)).
+
+## 1. Enable the Feature Gate
+
+The feature is Alpha and disabled by default. Enable it on both the **scheduler** and the **webhook-manager** (the `volcano-admission` deployment).
+
+### Using Helm
+
+```bash
+helm install volcano volcano/volcano --namespace volcano-system --create-namespace \
+  --set custom.scheduler_feature_gates="SchedulingGatesQueueAdmission=true" \
+  --set custom.admission_feature_gates="SchedulingGatesQueueAdmission=true"
+```
+
+### Using kubectl apply
+
+Add the following flag to the container arguments of both the `volcano-scheduler` and `volcano-admission` deployments:
+
+```yaml
+- --feature-gates=SchedulingGatesQueueAdmission=true
+```
+
+Optionally, configure the number of asynchronous gate-removal workers (default: `5`):
+
+```yaml
+- --gate-removal-worker-num=10
+```
+
+These workers process gate removals asynchronously: each worker picks up a pod whose queue capacity check has passed and removes its scheduling gate, so the pod can proceed to scheduling. Increasing the number of workers can improve throughput when many pods are ungated at the same time.
+
+## 2. Configure the Capacity Plugin
+
+Ensure the `capacity` plugin is enabled in your scheduler configuration. This plugin implements the reserved-resource tracking that prevents race conditions between gate removal and pod allocation.
+
+Example scheduler configuration:
+
+```yaml
+actions: "enqueue, allocate, backfill"
+tiers:
+- plugins:
+  - name: priority
+  - name: gang
+- plugins:
+  - name: predicates
+  - name: capacity
+  - name: nodeorder
+```
+
+## 3. Opt-in Pods
+
+The feature is opt-in per pod. To use it, **add the following annotation to each pod** that should use gate-controlled queue admission:
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: my-pod
+  annotations:
+    # Opt-in annotation
+    scheduling.volcano.sh/queue-allocation-gate: "true"
+spec:
+  schedulerName: volcano
+  containers:
+  - name: worker
+    image: nginx
+    resources:
+      requests:
+        cpu: "1"
+        memory: "1Gi"
+```
+
+When this pod is created:
+
+1. The Volcano webhook injects a `scheduling.volcano.sh/queue-allocation-gate` scheduling gate.
+2. The pod stays gated (invisible to autoscalers) until the queue has capacity.
+3. Once capacity is available, the scheduler removes the gate.
+4. If the pod can be placed on a node, it is scheduled normally.
+5. If no node fits (for example, the pod needs a node type the cluster does not have yet), the pod is marked `Unschedulable`, which correctly triggers the autoscaler.
+
+## 4. Verify the Feature is Working
+
+After creating an opted-in pod, verify that the mutating webhook injected the gate:
+
+```bash
+kubectl get pod my-pod -o jsonpath='{.spec.schedulingGates}'
+```
+
+Expected output (while waiting for queue capacity):
+
+```json
+[{"name":"scheduling.volcano.sh/queue-allocation-gate"}]
+```
+
+Once the queue has capacity and the scheduler removes the gate, the field is empty:
+
+```bash
+kubectl get pod my-pod -o jsonpath='{.spec.schedulingGates}'
+# empty output
+```
+
+## Interaction with other Scheduling Gates
+
+If a pod has additional scheduling gates from other controllers (*e.g.*, `example.com/my-gate`), Volcano does not remove its own gate until the Volcano gate is **the only gate remaining**. This ensures Volcano does not interfere with other gate controllers and does not reserve queue capacity for pods that are still blocked by external dependencies.
+
+## Limitations
+
+- Once a pod's gate is removed, the pod reserves queue capacity until it is scheduled or deleted. If the pod remains unschedulable (*e.g.*, while waiting for the autoscaler to add nodes), it keeps holding that capacity, which can block other pods in the queue. The feature currently **does not implement a timeout** for reserved capacity, so ungated pods that remain unschedulable can hold queue capacity indefinitely.
+- The feature is **only implemented in the `capacity` plugin**. Users relying on the `proportion` plugin for queue resource management will still face false autoscaler scale-ups, as the scheduling gates mechanism is not yet integrated with `proportion`. Tracking issue: [#5271](https://github.com/volcano-sh/volcano/issues/5271).
+
+## Related
+
+- [Design document](https://github.com/volcano-sh/volcano/blob/master/docs/design/scheduling-gates-queue-admission.md)
+- [Kubernetes Pod Scheduling Readiness](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-scheduling-readiness/)
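The gate lifecycle in the guide's "3. Opt-in Pods" and "Interaction with other Scheduling Gates" sections can be sketched in Go as three small functions. The first injects the gate when the opt-in annotation is present. The second allows removal only when the Volcano gate is the last gate left. The third removes only that gate. This is a minimal sketch and not Volcano's webhook or scheduler code. The `Pod` and `SchedulingGate` types are simplified, hypothetical stand-ins for the Kubernetes API types, and the function names are illustrative. Only the annotation key and gate name come from the guide.

```go
package main

import "fmt"

// Annotation key and gate name as used in the guide; both strings are identical.
const (
	optInAnnotation = "scheduling.volcano.sh/queue-allocation-gate"
	volcanoGate     = "scheduling.volcano.sh/queue-allocation-gate"
)

// SchedulingGate and Pod are hypothetical, simplified stand-ins for the
// Kubernetes API types (not k8s.io/api/core/v1).
type SchedulingGate struct{ Name string }

type Pod struct {
	Name            string
	Annotations     map[string]string
	SchedulingGates []SchedulingGate
}

// injectGate mirrors the webhook step: an opted-in pod gets the Volcano gate
// exactly once; pods without the annotation are left untouched.
func injectGate(p *Pod) {
	if p.Annotations[optInAnnotation] != "true" {
		return
	}
	for _, g := range p.SchedulingGates {
		if g.Name == volcanoGate {
			return // already gated; keep the operation idempotent
		}
	}
	p.SchedulingGates = append(p.SchedulingGates, SchedulingGate{Name: volcanoGate})
}

// eligibleForGateRemoval is true only when the Volcano gate is the sole
// remaining gate, so gates owned by other controllers are never bypassed.
func eligibleForGateRemoval(p *Pod) bool {
	return len(p.SchedulingGates) == 1 && p.SchedulingGates[0].Name == volcanoGate
}

// removeVolcanoGate drops the Volcano gate and keeps every other gate.
func removeVolcanoGate(p *Pod) {
	kept := make([]SchedulingGate, 0, len(p.SchedulingGates))
	for _, g := range p.SchedulingGates {
		if g.Name != volcanoGate {
			kept = append(kept, g)
		}
	}
	p.SchedulingGates = kept
}

func main() {
	p := &Pod{
		Name:            "my-pod",
		Annotations:     map[string]string{optInAnnotation: "true"},
		SchedulingGates: []SchedulingGate{{Name: "example.com/my-gate"}},
	}
	injectGate(p)
	fmt.Println(len(p.SchedulingGates), eligibleForGateRemoval(p)) // 2 false

	// The other controller lifts its own gate; now Volcano may act.
	p.SchedulingGates = p.SchedulingGates[1:]
	fmt.Println(eligibleForGateRemoval(p)) // true

	removeVolcanoGate(p)
	fmt.Println(len(p.SchedulingGates)) // 0
}
```

In the guide's flow, eligibility is necessary but not sufficient. The queue capacity check must also pass before the scheduler removes the gate.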
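The guide's "Limitations" section says an ungated pod holds queue capacity until it is scheduled or deleted, with no timeout. A minimal Go model of that bookkeeping assumes a single hard limit per queue. A pod is ungated only if allocated plus reserved plus its own request still fits under the limit. Scheduling turns the pod's reservation into an allocation, and deleting the pod releases the reservation. All names here (`queueState`, `TryUngate`, `OnScheduled`, `OnDeleted`) and the two-field `Resource` type are hypothetical. Volcano's `capacity` plugin tracks more than this model and is not reproduced here.

```go
package main

import "fmt"

// Resource is a simplified, hypothetical resource vector (CPU in millicores,
// memory in MiB).
type Resource struct {
	MilliCPU int64
	MemoryMi int64
}

func (r Resource) Add(o Resource) Resource {
	return Resource{MilliCPU: r.MilliCPU + o.MilliCPU, MemoryMi: r.MemoryMi + o.MemoryMi}
}

func (r Resource) LessEqual(o Resource) bool {
	return r.MilliCPU <= o.MilliCPU && r.MemoryMi <= o.MemoryMi
}

// queueState models one queue with a single hard limit (Capability).
type queueState struct {
	Capability Resource
	Allocated  Resource            // pods already bound to nodes
	Reserved   map[string]Resource // pods ungated but not yet scheduled
}

func newQueue(capability Resource) *queueState {
	return &queueState{Capability: capability, Reserved: map[string]Resource{}}
}

// TryUngate reserves req for pod if it fits; otherwise the pod stays gated.
func (q *queueState) TryUngate(pod string, req Resource) bool {
	if _, ok := q.Reserved[pod]; ok {
		return true // already reserved
	}
	used := q.Allocated
	for _, r := range q.Reserved {
		used = used.Add(r)
	}
	if !used.Add(req).LessEqual(q.Capability) {
		return false
	}
	q.Reserved[pod] = req
	return true
}

// OnScheduled converts a reservation into an allocation once the pod is bound.
func (q *queueState) OnScheduled(pod string) {
	if r, ok := q.Reserved[pod]; ok {
		delete(q.Reserved, pod)
		q.Allocated = q.Allocated.Add(r)
	}
}

// OnDeleted releases the reservation of a pod deleted before it was scheduled.
// There is no timeout path: an unschedulable pod keeps its reservation.
func (q *queueState) OnDeleted(pod string) {
	delete(q.Reserved, pod)
}

func main() {
	q := newQueue(Resource{MilliCPU: 4000, MemoryMi: 8192})
	req := Resource{MilliCPU: 1000, MemoryMi: 1024} // cpu: "1", memory: "1Gi"
	for i := 0; i < 5; i++ {
		name := fmt.Sprintf("pod-%d", i)
		fmt.Println(name, q.TryUngate(name, req))
	}
	// pod-4 stays gated until some reservation is released.
	q.OnDeleted("pod-0")
	fmt.Println("pod-4", q.TryUngate("pod-4", req))
}
```

With a 4-CPU limit, the first four 1-CPU pods are ungated and `pod-4` stays gated. It is ungated only after `pod-0` is deleted. This is the "no timeout" behavior the guide warns about.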