feat: add VolumeAttachment wait after drain using cached client #207

Trojan295 · 2026-01-08T16:09:24Z

Add optional feature to wait for VolumeAttachments to be deleted after draining a node, preventing Multi-Attach errors when CSI drivers need time to clean up volumes before the node is terminated.

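Uses controller-runtime's cached client with field indexes, so VolumeAttachment and Pod lookups are served from the local cache instead of hitting the API server on every query. Follows Karpenter's approach of excluding VolumeAttachments (VAs) that belong to non-drainable pods (DaemonSets, static pods) to avoid deadlocks: those pods are never evicted by a drain, so their volumes would never detach.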
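Conceptually, a field index is a client-side map from an extracted field value to the cached objects that carry it. The sketch below models that idea with plain Go types. It is not controller-runtime's API (there, an index is registered with `IndexField` and queried with `client.MatchingFields`); `Attachment`, `nodeIndex`, and `buildNodeIndex` are hypothetical names for this example.

```go
package main

// Attachment is a hypothetical, minimal stand-in for a cached
// VolumeAttachment object.
type Attachment struct {
	Name     string
	NodeName string // corresponds to spec.nodeName
}

// nodeIndex maps a node name to the names of the attachments on it.
type nodeIndex map[string][]string

// buildNodeIndex groups attachments by node name, which is roughly
// what a field index keyed on spec.nodeName provides.
func buildNodeIndex(items []Attachment) nodeIndex {
	idx := nodeIndex{}
	for _, a := range items {
		idx[a.NodeName] = append(idx[a.NodeName], a.Name)
	}
	return idx
}

// attachmentsOn returns the attachment names for one node without
// scanning the full list.
func (idx nodeIndex) attachmentsOn(node string) []string {
	return idx[node]
}
```

Because the index is built over objects already held in the cache, a lookup by node does not depend on which field selectors the API server supports for that resource.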

Key changes:

- Add Drain config section with WaitForVolumeDetach, VolumeDetachTimeout, and CacheSyncTimeout settings
- Create controller-runtime cache with field indexes for VolumeAttachment and Pod resources by spec.nodeName
- Implement getVolumeAttachmentsForNode() using Karpenter-style filtering
- Graceful degradation: if cache sync fails, feature is disabled but drain operations continue normally
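The filtering idea can be sketched as follows: collect the volumes mounted by pods that a drain will not evict, then ignore attachments for those volumes. This is an illustrative model under stated assumptions, not the PR's `getVolumeAttachmentsForNode()`. The `Pod` and `VolumeAttachment` structs are simplified stand-ins for the Kubernetes types, `blockingAttachments` is a hypothetical name, and static pods are assumed to be identifiable by a `Node` owner.

```go
package main

// Pod is a simplified, hypothetical stand-in for a pod on the node.
type Pod struct {
	Name      string
	OwnerKind string   // e.g. "ReplicaSet", "DaemonSet", "Node" (assumed for static pods)
	PVNames   []string // PersistentVolumes the pod mounts
}

// VolumeAttachment is a simplified, hypothetical stand-in.
type VolumeAttachment struct {
	Name     string
	NodeName string
	PVName   string
}

// isDrainable reports whether a drain would evict the pod. DaemonSet
// pods and static pods stay on the node, so they are not drainable.
func isDrainable(p Pod) bool {
	return p.OwnerKind != "DaemonSet" && p.OwnerKind != "Node"
}

// blockingAttachments returns the attachments on node that the drain
// should wait for: those whose volume is not held by a non-drainable
// pod. podsOnNode must contain only pods scheduled on that node.
func blockingAttachments(node string, vas []VolumeAttachment, podsOnNode []Pod) []VolumeAttachment {
	skip := map[string]bool{}
	for _, p := range podsOnNode {
		if isDrainable(p) {
			continue
		}
		for _, pv := range p.PVNames {
			skip[pv] = true // this volume will never detach during drain
		}
	}

	var out []VolumeAttachment
	for _, va := range vas {
		if va.NodeName == node && !skip[va.PVName] {
			out = append(out, va)
		}
	}
	return out
}
```

Without the `skip` set, a DaemonSet pod with a persistent volume would keep its attachment alive indefinitely and the wait would only end at the timeout.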

Environment variables:

- DRAIN_WAIT_FOR_VOLUME_DETACH: Enable feature (default: false)
- DRAIN_VOLUME_DETACH_TIMEOUT: Max wait time (default: 60s)
- CACHE_SYNC_TIMEOUT: Cache sync timeout (default: 120s)


…tach

The spec.nodeName field selector is not supported by the Kubernetes API server for VolumeAttachment resources. This caused the waitForVolumeDetach function to fail silently. Switch to using the controller-runtime cached client, which has a custom field indexer configured for spec.nodeName lookups on VolumeAttachments.

Trojan295 added 3 commits January 8, 2026 15:15

use API

31bfc06
