Bug: GPU node never scales down to 0 #2023

@samuelstolicny

Description

Problem

When using autoscaler.min: 0 on a node pool, the node is provisioned on scale-up but never removed when idle. Tested over 8+ cluster rebuilds across two days, with and without GPU Operator, KubeAI, and taints. Scale-down never triggered in any test.

Possible Root Cause

The cluster-autoscaler enters a permanently stuck state after the initial scale-up. During node provisioning, the autoscaler reports "readiness not found" for the new node group. After this phase, it stops producing any meaningful logs, only node_instances_cache refreshes. It never evaluates scale-down candidates, even when the node has zero workload pods.

Cluster-autoscaler logs (stuck after 16:29, never recovers):

```
16:29:20Z  clusterstate.go:700  Readiness for node group gpu-gcp-eexjr4t not found
16:29:20Z  orchestrator.go:623  Node group gpu-gcp-eexjr4t is not ready for scaleup - unhealthy
16:29:30Z  clusterstate.go:495  Failed to find readiness information for gpu-gcp-eexjr4t
... then only cache refreshes forever, no scale-down evaluation
```

The adapter sidecar is functional: once it starts successfully, it serves Refresh, NodeGroups, NodeGroupForNode, and NodeGroupTargetSize requests. The problem is that the cluster-autoscaler process itself gets stuck.
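One way to confirm the stuck state from captured logs: a healthy cluster-autoscaler periodically logs scale-down evaluation, and the excerpt above contains none after the readiness errors. A minimal sketch (the excerpt is inlined into a temporary file here purely for illustration; against a live cluster you would pipe `kubectl logs` of the cluster-autoscaler container instead, and the grep pattern is an assumption about the relevant log keywords):

```shell
# Inline the captured excerpt; against a live cluster, use:
#   kubectl logs <autoscaler-pod> -c cluster-autoscaler
cat > /tmp/ca.log <<'EOF'
16:29:20Z  clusterstate.go:700  Readiness for node group gpu-gcp-eexjr4t not found
16:29:20Z  orchestrator.go:623  Node group gpu-gcp-eexjr4t is not ready for scaleup - unhealthy
16:29:30Z  clusterstate.go:495  Failed to find readiness information for gpu-gcp-eexjr4t
EOF

# A working autoscaler logs scale-down evaluation; this excerpt has none.
if ! grep -qiE 'scale[ _-]?down|unneeded' /tmp/ca.log; then
  echo "no scale-down activity logged"
fi
```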

Additional issue: the adapter has a startup race condition. It takes ~6 min to bind gRPC port 50000 (it waits for the Manager), and on pod restart it sometimes never binds at all.
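The startup race could be mitigated by gating readiness on the gRPC port. A hedged sketch for the autoscaler-adapter container, assuming the adapter serves (or could serve) the standard `grpc_health_v1` health service; Kubernetes-native gRPC probes are available since v1.24:

```yaml
# On the autoscaler-adapter container (assumes the standard gRPC health
# service is exposed on :50000; thresholds sized for the ~6 min Manager wait)
readinessProbe:
  grpc:
    port: 50000
  periodSeconds: 10
startupProbe:
  grpc:
    port: 50000
  periodSeconds: 10
  failureThreshold: 60   # allow up to 10 min before restarting the container
```

This would keep the cluster-autoscaler from dialing localhost:50000 before the adapter is actually listening, rather than failing into the unhealthy state.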

Tested Configurations

All produced the same result (no scale-down):

| Test | GPU Operator | KubeAI       | Taints | Result        |
|------|--------------|--------------|--------|---------------|
| 1    | Yes          | Yes (Ollama) | Yes    | No scale-down |
| 2    | Yes          | Yes (vLLM)   | Yes    | No scale-down |
| 3    | Yes          | No           | Yes    | No scale-down |
| 4    | No           | No           | No     | No scale-down |

Test 4 is the minimal reproduction: a bare cluster with no GPU Operator, no KubeAI, and no taints. A simple busybox pod was scheduled on the GPU node, then deleted; the GPU node stayed indefinitely.

Reproduction (Minimal)

1. Apply the InputManifest below.
2. Wait for the cluster to build (control node only, GPU pool at 0).
3. Remove the control-plane taint:

   ```
   kubectl taint nodes -l node-role.kubernetes.io/control-plane node-role.kubernetes.io/control-plane-
   ```

4. Deploy a pod targeting the GPU node:

   ```yaml
   apiVersion: v1
   kind: Pod
   metadata:
     name: gpu-test
   spec:
     affinity:
       nodeAffinity:
         requiredDuringSchedulingIgnoredDuringExecution:
           nodeSelectorTerms:
             - matchExpressions:
                 - key: node-role.kubernetes.io/control-plane
                   operator: DoesNotExist
     containers:
       - name: sleep
         image: busybox
         command: ["sleep", "3600"]
         resources:
           requests:
             cpu: "1"
             memory: "1Gi"
   ```

5. Wait for the GPU node to provision and the pod to become Running (~10 min).
6. Delete the pod: `kubectl delete pod gpu-test`
7. Wait 20+ min; the GPU node is never removed.
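The 20+ minute wait in the last step can be scripted instead of watched by hand. A minimal sketch of a generic poll helper; the `kubectl` expression in the trailing comment is the intended check (the node-pool label is a hypothetical placeholder, adjust to however your GPU nodes are labeled) and is not executed by the helper itself:

```shell
#!/bin/sh
# wait_until CMD TIMEOUT_SECONDS INTERVAL_SECONDS
# Repeatedly evaluates CMD until it succeeds (exit 0) or TIMEOUT elapses.
wait_until() {
  cmd=$1; timeout=$2; interval=$3; elapsed=0
  while ! eval "$cmd"; do
    [ "$elapsed" -ge "$timeout" ] && return 1
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 0
}

# Intended use against the cluster (hypothetical label selector):
# wait_until '[ "$(kubectl get nodes -l nodepool=gpu-gcp --no-headers | wc -l)" -eq 0 ]' 1800 30
```

If `wait_until` returns 1 after 30 minutes, the GPU node was never removed and the bug reproduced.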

InputManifest

```yaml
apiVersion: claudie.io/v1beta1
kind: InputManifest
metadata:
  name: inference-cluster
spec:
  providers:
    - name: hetzner-control
      providerType: hetzner
      templates:
        repository: "https://github.com/berops/claudie-config"
        tag: v0.10.0
        path: "templates/terraformer/hetzner"
      secretRef:
        name: hetzner-secret
        namespace: e2e-secrets
    - name: gcp-gpu
      providerType: gcp
      templates:
        repository: "https://github.com/berops/claudie-config"
        tag: v0.10.0
        path: "templates/terraformer/gcp"
      secretRef:
        name: gcp-secret
        namespace: e2e-secrets

  nodePools:
    dynamic:
      - name: control-hzn
        providerSpec:
          name: hetzner-control
          region: hel1
        count: 1
        serverType: cx33
        image: ubuntu-24.04

      - name: gpu-gcp
        providerSpec:
          name: gcp-gpu
          region: us-central1
          zone: us-central1-a
        autoscaler:
          min: 0
          max: 1
        serverType: g2-standard-8
        image: ubuntu-2404-noble-amd64-v20251001
        machineSpec:
          nvidiaGpuCount: 1
          nvidiaGpuType: nvidia-l4

  kubernetes:
    clusters:
      - name: inference
        version: v1.32.0
        network: 192.168.10.0/24
        pools:
          control:
            - control-hzn
          compute:
            - gpu-gcp
```

Autoscaler Pod Details

```
containers:
  - name: cluster-autoscaler   # connects to localhost:50000, watches inference cluster API
  - name: autoscaler-adapter   # gRPC server on :50000, bridges to Claudie Manager

args: --cloud-provider=externalgrpc
      --ignore-daemonsets-utilization=true
      --balance-similar-node-groups=true
```
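Note that none of the flags above touch scale-down behavior, so cluster-autoscaler defaults should apply (`--scale-down-enabled=true`, `--scale-down-unneeded-time=10m`, `--scale-down-delay-after-add=10m`). A hedged sketch of flags worth adding when reproducing, to rule out configuration and make the scale-down loop visible in the logs:

```
--v=4                           # verbose: logs scale-down candidate evaluation
--scale-down-enabled=true       # make the default explicit
--scale-down-unneeded-time=2m   # shorten the idle window for faster testing
--scale-down-delay-after-add=2m
```

Even with these, the bug as described should reproduce, since the autoscaler never reaches scale-down evaluation at all after the readiness errors.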

Labels: bug (Something isn't working)