Add shared GPUs for deployments using DRA and MPS/timeslicing #693

Merged: Phillezi merged 36 commits into main from dra-rbac on Feb 18, 2026

@Phillezi
Member

This PR lets admins allocate GPUs with a chosen sharing strategy (MPS or time-slicing), which users can then consume from their deployments.
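With DRA, one way for several pods to share a GPU is for all of them to reference the same ResourceClaim by name, instead of each pod getting its own claim from a ResourceClaimTemplate. A minimal sketch of the pod-spec fragment a deployment could render, assuming the admin has already created a claim named shared-gpu; podSpecWithClaim, the container name and the image are hypothetical, while the JSON field names follow the core/v1 PodSpec (resourceClaims, resources.claims).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Minimal stand-ins for the core/v1 fields involved (the real types live in
// k8s.io/api/core/v1); only what is needed to reference a claim is modeled.
type ClaimRef struct {
	Name string `json:"name"`
}

type PodResourceClaim struct {
	Name              string `json:"name"`
	ResourceClaimName string `json:"resourceClaimName"`
}

type Resources struct {
	Claims []ClaimRef `json:"claims"`
}

type Container struct {
	Name      string    `json:"name"`
	Image     string    `json:"image"`
	Resources Resources `json:"resources"`
}

type PodSpec struct {
	ResourceClaims []PodResourceClaim `json:"resourceClaims"`
	Containers     []Container        `json:"containers"`
}

// podSpecWithClaim (hypothetical) wires an existing ResourceClaim into a
// single-container pod spec. Every replica rendered from this spec points at
// the same claim, so they all share the device(s) allocated to it.
func podSpecWithClaim(claimName, image string) PodSpec {
	const localName = "gpu" // pod-local handle for the claim
	return PodSpec{
		ResourceClaims: []PodResourceClaim{{Name: localName, ResourceClaimName: claimName}},
		Containers: []Container{{
			Name:      "app",
			Image:     image,
			Resources: Resources{Claims: []ClaimRef{{Name: localName}}},
		}},
	}
}

func main() {
	spec := podSpecWithClaim("shared-gpu", "registry.example.com/cuda-app:latest")
	out, err := json.MarshalIndent(spec, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

How the GPU is actually split between those pods (MPS or time-slicing) is then decided by the claim's vendor configuration, not by the pod spec.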

The feature is gated by the zone capability dra and should only be enabled on clusters running Kubernetes 1.34 or later. It also requires the NVIDIA DRA driver to be installed in the cluster.
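The description states the version and driver requirements but not whether they are checked in code. A defensive gate could combine the capability and the version; this minimal sketch assumes zone capabilities are plain strings and the cluster version is a string such as v1.34.1. draEnabled and parseMajorMinor are hypothetical names, not taken from this PR.

```go
package main

import (
	"fmt"
	"slices"
	"strconv"
	"strings"
)

// parseMajorMinor extracts the major and minor numbers from a version such as
// "v1.34.1" or "1.34". Anything after the second dot (patch, vendor suffix) is ignored.
func parseMajorMinor(version string) (major, minor int, err error) {
	parts := strings.SplitN(strings.TrimPrefix(version, "v"), ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unexpected version %q", version)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, err
	}
	if minor, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// draEnabled (hypothetical) reports whether DRA-backed GPU sharing may be offered
// in a zone: the zone must advertise "dra" and run Kubernetes 1.34 or later.
func draEnabled(capabilities []string, k8sVersion string) bool {
	if !slices.Contains(capabilities, "dra") {
		return false
	}
	major, minor, err := parseMajorMinor(k8sVersion)
	if err != nil {
		return false // fail closed on versions we cannot parse
	}
	return major > 1 || (major == 1 && minor >= 34)
}

func main() {
	fmt.Println(draEnabled([]string{"dra"}, "v1.34.1")) // true
	fmt.Println(draEnabled([]string{"dra"}, "v1.33.5")) // false
	fmt.Println(draEnabled(nil, "v1.35.0"))             // false
}
```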

Additional changes

VM usage can be controlled with the role permission useVMs. If it is not set, it defaults to true; setting useVMs: false prevents that role from creating new VMs.
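Defaulting to true when unset is the usual reason to model an optional flag as a *bool in Go, so that an absent key can be told apart from an explicit false. A minimal sketch, assuming the permission is decoded from JSON; RolePermissions, CanUseVMs and decode are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RolePermissions (hypothetical) holds per-role switches. UseVMs is a pointer so
// that an absent key (nil) can default to true while an explicit false denies.
type RolePermissions struct {
	UseVMs *bool `json:"useVMs,omitempty"`
}

// CanUseVMs returns the effective permission: true unless explicitly set to false.
func (p RolePermissions) CanUseVMs() bool {
	return p.UseVMs == nil || *p.UseVMs
}

// decode parses a role's permission block, panicking on malformed input.
func decode(raw string) RolePermissions {
	var p RolePermissions
	if err := json.Unmarshal([]byte(raw), &p); err != nil {
		panic(err)
	}
	return p
}

func main() {
	fmt.Println(decode(`{}`).CanUseVMs())                // true (unset defaults to true)
	fmt.Println(decode(`{"useVMs": true}`).CanUseVMs())  // true
	fmt.Println(decode(`{"useVMs": false}`).CanUseVMs()) // false
}
```

A plain bool would decode an absent key as false, which would silently deny VMs to every role that omits the setting.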

GPU usage can be controlled with the role quota gpus.
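How the gpus quota is counted is not spelled out here. A minimal sketch, assuming it caps the total number of GPUs a role's user may have requested at once; Quotas, checkGPUQuota and errGPUQuotaExceeded are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// Quotas (hypothetical) mirrors a role's quota block; only the GPU field is shown.
type Quotas struct {
	GPUs int
}

var errGPUQuotaExceeded = errors.New("gpu quota exceeded")

// checkGPUQuota rejects a request when the GPUs already in use plus the newly
// requested ones would exceed the role's quota.
func checkGPUQuota(q Quotas, inUse, requested int) error {
	if requested < 0 {
		return fmt.Errorf("invalid gpu request: %d", requested)
	}
	if inUse+requested > q.GPUs {
		return fmt.Errorf("%w: %d in use + %d requested > %d allowed",
			errGPUQuotaExceeded, inUse, requested, q.GPUs)
	}
	return nil
}

func main() {
	q := Quotas{GPUs: 2}
	fmt.Println(checkGPUQuota(q, 1, 1)) // <nil>
	fmt.Println(checkGPUQuota(q, 1, 2)) // gpu quota exceeded: 1 in use + 2 requested > 2 allowed
}
```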

Phillezi and others added 30 commits October 14, 2025 11:20
…cy in the native Go DNS resolver (netgo)

Had problems resolving *.localhost subdomains.
…nt vendors' GPU configuration, with an NVIDIA implementation for now, plus code to generate NVIDIA opaque types for the GPU config so we don't have to rely on importing NVIDIA's code. That code has an init func that fails unless a specific variable is set to a version at compile time (with ldflags), which made testing extremely hard.
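The generated opaque types themselves are not shown in this description. A minimal sketch of what such stand-ins might look like, modeled on the GpuConfig examples published with NVIDIA's DRA driver (apiVersion resource.nvidia.com/v1beta1, kind GpuConfig, a sharing block whose strategy is TimeSlicing or MPS); the exact field names are assumptions to check against the driver, and newGpuConfig is hypothetical. The marshalled JSON is what would go into the parameters of an opaque device configuration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hand-written stand-ins for NVIDIA's opaque GPU config, so callers need not
// import the driver's packages. Field names are modeled on NVIDIA's published
// GpuConfig examples and are assumptions; verify them against the driver.
type GpuConfig struct {
	APIVersion string      `json:"apiVersion"`
	Kind       string      `json:"kind"`
	Sharing    *GpuSharing `json:"sharing,omitempty"`
}

type GpuSharing struct {
	Strategy          string             `json:"strategy"` // "TimeSlicing" or "MPS"
	TimeSlicingConfig *TimeSlicingConfig `json:"timeSlicingConfig,omitempty"`
	MpsConfig         *MpsConfig         `json:"mpsConfig,omitempty"`
}

type TimeSlicingConfig struct {
	Interval string `json:"interval,omitempty"` // e.g. "Long"
}

type MpsConfig struct {
	DefaultActiveThreadPercentage  *int   `json:"defaultActiveThreadPercentage,omitempty"`
	DefaultPinnedDeviceMemoryLimit string `json:"defaultPinnedDeviceMemoryLimit,omitempty"`
}

// newGpuConfig (hypothetical) builds opaque parameters for a sharing strategy.
// The 50% thread share and "Long" interval are illustrative values only.
// Unknown strategies leave Sharing unset so the driver's default applies.
func newGpuConfig(strategy string) GpuConfig {
	cfg := GpuConfig{APIVersion: "resource.nvidia.com/v1beta1", Kind: "GpuConfig"}
	switch strategy {
	case "MPS":
		pct := 50
		cfg.Sharing = &GpuSharing{Strategy: "MPS", MpsConfig: &MpsConfig{DefaultActiveThreadPercentage: &pct}}
	case "TimeSlicing":
		cfg.Sharing = &GpuSharing{Strategy: "TimeSlicing", TimeSlicingConfig: &TimeSlicingConfig{Interval: "Long"}}
	}
	return cfg
}

func main() {
	raw, err := json.Marshal(newGpuConfig("MPS"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(raw))
}
```

Keeping these as plain structs means tests only need encoding/json, with no build-time version variable to set.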

Left for full k8s-layer impl:
- Actually impl CreateResourceClaimManifest
- Add Tolerations to the ResourceClaimPublic requests
- Figure out if / how Constraints from the resourcev1.ResourceClaimSpec should be added
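For the remaining k8s-layer items, a ResourceClaim in resource.k8s.io/v1 places tolerations on each exact device request (under exactly) and vendor settings in an opaque config entry bound to that request. A minimal stdlib-only sketch of the manifest shape, using plain structs instead of k8s.io/api types and leaving constraints out, matching the open item above. buildClaim, the request name gpu and the toleration key are hypothetical; the device class and driver name gpu.nvidia.com follow NVIDIA's published examples, and the field layout should be checked against the v1 API reference.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Plain-struct mirrors of the resource.k8s.io/v1 fields used below; the real
// types live in k8s.io/api/resource/v1.
type ObjectMeta struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace"`
}

type DeviceToleration struct {
	Key      string `json:"key"`
	Operator string `json:"operator"`         // "Exists" or "Equal"
	Effect   string `json:"effect,omitempty"` // "NoSchedule" or "NoExecute"
}

type ExactDeviceRequest struct {
	DeviceClassName string             `json:"deviceClassName"`
	AllocationMode  string             `json:"allocationMode,omitempty"`
	Count           int64              `json:"count,omitempty"`
	Tolerations     []DeviceToleration `json:"tolerations,omitempty"`
}

type DeviceRequest struct {
	Name    string              `json:"name"`
	Exactly *ExactDeviceRequest `json:"exactly,omitempty"`
}

type OpaqueDeviceConfiguration struct {
	Driver     string          `json:"driver"`
	Parameters json.RawMessage `json:"parameters"`
}

type DeviceClaimConfiguration struct {
	Requests []string                   `json:"requests,omitempty"`
	Opaque   *OpaqueDeviceConfiguration `json:"opaque,omitempty"`
}

type DeviceClaim struct {
	Requests []DeviceRequest            `json:"requests"`
	Config   []DeviceClaimConfiguration `json:"config,omitempty"`
}

type ResourceClaimSpec struct {
	Devices DeviceClaim `json:"devices"`
}

type ResourceClaim struct {
	APIVersion string            `json:"apiVersion"`
	Kind       string            `json:"kind"`
	Metadata   ObjectMeta        `json:"metadata"`
	Spec       ResourceClaimSpec `json:"spec"`
}

// buildClaim (hypothetical) renders a one-GPU ResourceClaim whose request
// carries the given tolerations and whose opaque config holds params, the
// already-marshalled vendor settings (for NVIDIA, a GpuConfig object).
func buildClaim(name, namespace string, params json.RawMessage, tolerations []DeviceToleration) ResourceClaim {
	const req = "gpu"
	return ResourceClaim{
		APIVersion: "resource.k8s.io/v1",
		Kind:       "ResourceClaim",
		Metadata:   ObjectMeta{Name: name, Namespace: namespace},
		Spec: ResourceClaimSpec{Devices: DeviceClaim{
			Requests: []DeviceRequest{{Name: req, Exactly: &ExactDeviceRequest{
				DeviceClassName: "gpu.nvidia.com",
				AllocationMode:  "ExactCount",
				Count:           1,
				Tolerations:     tolerations,
			}}},
			Config: []DeviceClaimConfiguration{{
				Requests: []string{req},
				Opaque:   &OpaqueDeviceConfiguration{Driver: "gpu.nvidia.com", Parameters: params},
			}},
		}},
	}
}

func main() {
	params := json.RawMessage(`{"apiVersion":"resource.nvidia.com/v1beta1","kind":"GpuConfig","sharing":{"strategy":"TimeSlicing"}}`)
	tol := []DeviceToleration{{Key: "example.com/maintenance", Operator: "Exists", Effect: "NoSchedule"}}
	out, err := json.MarshalIndent(buildClaim("shared-gpu", "default", params, tol), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```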

Left for broader (api) impl:
- Model structs for this, with additional data for controlling rbac
- DTOs
- Routes
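For the API-level items, one common split is to keep admin-only RBAC data on the stored model and expose a trimmed DTO to users. A minimal sketch under that assumption; GPUClaim, GPUClaimRead, AllowedRoles, UsableBy, ToDTO and listForUser are hypothetical names, not from this PR.

```go
package main

import (
	"fmt"
	"slices"
)

// GPUClaim (hypothetical) is the stored model: what the admin allocated plus
// RBAC data deciding which roles may consume it.
type GPUClaim struct {
	ID           string
	Zone         string
	Strategy     string   // e.g. "MPS" or "TimeSlicing"
	AllowedRoles []string // in this sketch, empty means no role may use it
}

// GPUClaimRead (hypothetical) is the DTO returned to users; RBAC data is omitted.
type GPUClaimRead struct {
	ID       string `json:"id"`
	Zone     string `json:"zone"`
	Strategy string `json:"strategy"`
}

// UsableBy reports whether any of the caller's roles is allowed.
func (c GPUClaim) UsableBy(roles []string) bool {
	for _, r := range roles {
		if slices.Contains(c.AllowedRoles, r) {
			return true
		}
	}
	return false
}

// ToDTO strips the admin-only fields.
func (c GPUClaim) ToDTO() GPUClaimRead {
	return GPUClaimRead{ID: c.ID, Zone: c.Zone, Strategy: c.Strategy}
}

// listForUser filters claims by RBAC and converts the visible ones to DTOs,
// roughly what a list route handler would return.
func listForUser(claims []GPUClaim, roles []string) []GPUClaimRead {
	out := []GPUClaimRead{}
	for _, c := range claims {
		if c.UsableBy(roles) {
			out = append(out, c.ToDTO())
		}
	}
	return out
}

func main() {
	claims := []GPUClaim{
		{ID: "a100-mps", Zone: "se-1", Strategy: "MPS", AllowedRoles: []string{"researcher"}},
		{ID: "t4-ts", Zone: "se-1", Strategy: "TimeSlicing", AllowedRoles: []string{"student", "researcher"}},
	}
	fmt.Println(listForUser(claims, []string{"student"})) // [{t4-ts se-1 TimeSlicing}]
}
```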
Phillezi merged commit 6540d1c into main on Feb 18, 2026
2 checks passed