enable per-physical-GPU replica counts in time-slicing and MPS by jonathan-meiri · Pull Request #1787 · NVIDIA/k8s-device-plugin

jonathan-meiri · 2026-05-19T13:11:13Z

Summary

Enable per-physical-GPU replica counts in sharing.timeSlicing and sharing.mps config. See #1786 for the design context (prior art, what this does and doesn't try to do, MPS vs time-slicing semantics).

Contributed by @Meiri28 on behalf of @runatom-ai.

Closes #1786

Changes

api/config/v1/replicas.go: replace disableResoureRenaming with applyDefaults. The new helper still fills the existing defaults (auto-rename when renameByDefault: true and Rename is unset; default Devices.All = true when no selector is set) but no longer strips user-supplied Rename or Devices.List / Devices.Count. The two "not yet supported in the config" warnings go away.
api/config/v1/config.go: update DisableResourceNamingInConfig to call the new helper. Function name and public signature unchanged.
internal/rm/allocate.go: in distributedAlloc, change the sort key from min(total - available) to max(available). The two orderings are equivalent in the homogeneous case, but the previous key biased toward exhausting smaller physical GPUs first when replica counts differ. The now-unused total field and its initialization loop are removed.

Device-map construction in internal/rm/device_map.go is unchanged — it already handles Devices.All / Devices.Count / Devices.List correctly. Net behavior change is ~50 LoC plus tests.
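The defaulting behavior described for applyDefaults can be sketched roughly as follows. This is a minimal illustration, not the code in api/config/v1/replicas.go: the `Resource` and `Devices` types are simplified stand-ins, `applyDefaultsSketch` is a hypothetical name, and the `.shared` rename suffix is an assumption made for the example.

```go
package main

import "fmt"

// Devices is a simplified, hypothetical stand-in for the device selector.
type Devices struct {
	All   bool
	Count int
	List  []string
}

// Resource is a simplified, hypothetical stand-in for one replicated-resource entry.
type Resource struct {
	Name     string
	Rename   string
	Devices  Devices
	Replicas int
}

// applyDefaultsSketch fills only the fields the user left unset and never
// overwrites an explicit Rename or device selector.
// The ".shared" suffix is an assumed default for illustration.
func applyDefaultsSketch(resources []Resource, renameByDefault bool) {
	for i := range resources {
		r := &resources[i]
		if renameByDefault && r.Rename == "" {
			r.Rename = r.Name + ".shared"
		}
		// Default to "all devices" only when no selector was given at all.
		if !r.Devices.All && r.Devices.Count == 0 && len(r.Devices.List) == 0 {
			r.Devices.All = true
		}
	}
}

func main() {
	resources := []Resource{
		// Explicit rename and device list: both must survive unchanged.
		{Name: "nvidia.com/gpu", Rename: "nvidia.com/gpu-small", Devices: Devices{List: []string{"0"}}, Replicas: 2},
		// Nothing set: receives the default rename and Devices.All = true.
		{Name: "nvidia.com/gpu", Replicas: 4},
	}
	applyDefaultsSketch(resources, true)
	for _, r := range resources {
		fmt.Printf("%s replicas=%d all=%v list=%v\n", r.Rename, r.Replicas, r.Devices.All, r.Devices.List)
	}
}
```

The point of the sketch is the asymmetry: defaults are additive, so an entry that pins GPU 0 to two replicas keeps its selector and name, while an entry with nothing set behaves as before.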

Commits are DCO-signed.

TDD red phase. These tests describe two pieces of behavior that are either missing or subtly incorrect today:

1. internal/rm: distributedAlloc must respect heterogeneous physical-GPU replica counts. The current sort key (least-used-first) is equivalent to max-available-first only when all GPUs have the same replica count, and biases toward exhausting the smaller GPU first when they differ. The new TestDistributedAlloc_HeterogeneousReplicas_RespectsCapacity captures this gap; two regression tests cover the existing homogeneous case so the upcoming fix does not change current behavior.
2. api/config/v1: DisableResourceNamingInConfig must preserve per-entry Rename and Devices.List fields so operators can configure different replica counts for different physical GPUs. Today both are silently stripped, collapsing heterogeneous configs into a single homogeneous resource. Two new tests (TimeSlicing and MPS) describe the desired preservation behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Jonathan Meiri <33288957+Meiri28@users.noreply.github.com>
Co-Authored-By: runatom-ai <258621014+runatom-ai@users.noreply.github.com>
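To make the difference between the two sort keys concrete, here is a small per-GPU simulation. It is a simplified sketch, not the distributedAlloc implementation: the `gpu` type, the `pick` helper, and the 2-replica and 8-replica devices are all made up for illustration, and `sort.SliceStable` is used only so the demo is deterministic on ties.

```go
package main

import (
	"fmt"
	"sort"
)

// gpu is a hypothetical per-physical-GPU view of replica capacity.
type gpu struct {
	name      string
	total     int
	available int
}

// pick hands out n replicas one at a time, re-sorting the GPUs with the
// given ordering before each pick and taking from the first GPU that
// still has a free replica.
func pick(gpus []gpu, n int, less func(a, b gpu) bool) map[string]int {
	state := append([]gpu(nil), gpus...) // copy so the caller's slice is untouched
	picked := map[string]int{}
	for step := 0; step < n; step++ {
		sort.SliceStable(state, func(i, j int) bool { return less(state[i], state[j]) })
		for k := range state {
			if state[k].available > 0 {
				state[k].available--
				picked[state[k].name]++
				break
			}
		}
	}
	return picked
}

func main() {
	gpus := []gpu{
		{name: "small", total: 2, available: 2},
		{name: "large", total: 8, available: 8},
	}
	// Previous key: fewest replicas in use first, i.e. min(total - available).
	leastUsed := func(a, b gpu) bool { return a.total-a.available < b.total-b.available }
	// New key: most free replicas first, i.e. max(available).
	mostAvailable := func(a, b gpu) bool { return a.available > b.available }

	fmt.Println("least used:    ", pick(gpus, 4, leastUsed))
	fmt.Println("most available:", pick(gpus, 4, mostAvailable))
}
```

With these inputs the least-used key alternates between the two devices and leaves the 2-replica GPU fully consumed after four picks, while the most-available key takes all four from the 8-replica GPU. When every GPU has the same total, the two keys rank devices identically.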

Two paired changes that together let operators configure different time-slicing (and MPS) replica counts for different physical GPUs on a single node.

api/config/v1: Replace disableResoureRenaming with applyDefaults, which fills in defaults for unset Rename and Devices fields but leaves explicit user configuration intact. Previously, any per-entry Rename or non-Devices.All selection was silently stripped on config load, collapsing heterogeneous configs into a single homogeneous resource. The device-map construction in internal/rm/device_map.go already handled Devices.All / Count / List correctly; only the config-load gate was preventing the feature from working.

internal/rm: Change distributedAlloc's sort key from min(total - available) to max(available). The two orderings are equivalent in the existing homogeneous-replicas case, but the previous key biased toward exhausting smaller devices first when replica counts differ. The 'total' field is no longer used and has been removed along with the second loop that populated it.

Failing tests added in the preceding commit now pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: runatom-ai <258621014+runatom-ai@users.noreply.github.com>
Signed-off-by: Jonathan Meiri <33288957+Meiri28@users.noreply.github.com>

copy-pr-bot · 2026-05-19T13:11:18Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

jonathan-meiri · 2026-05-19T13:18:10Z

Contributing on behalf of @runatom-ai.

TDD red phase for an issue independent of NVIDIA#1787.

Setup: a node with two physical GPUs of equal advertised replica counts, where one slot on the "second" GPU has already been allocated to another pod. A new pod requests two more slots. The function name and docstring of distributedAlloc promise an even spread, but today the function deterministically picks both of the new pod's slots from the GPU with the most remaining replicas, leaving the other physical GPU's available slot untouched.

The bug is in the sort tie-break. After the first pick the per-GPU 'used' counts tie across the candidates, and sort.Slice is unstable, so the next iteration ends up picking the next slot on the GPU we just picked from rather than rotating to the sibling GPU that still has capacity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: runatom-ai <258621014+runatom-ai@users.noreply.github.com>
Signed-off-by: Jonathan Meiri <33288957+Meiri28@users.noreply.github.com>
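The tie described above can be illustrated with a per-GPU sketch that adds an explicit secondary sort key. This is one possible tie-break chosen for illustration, not necessarily the fix that follows this commit: `gpuState`, `spreadWithTieBreak`, and the "fewest picks for the current request" rule are all hypothetical. Because both GPUs advertise the same total, least-used and most-available rank them identically, so the sketch sorts on `available`.

```go
package main

import (
	"fmt"
	"sort"
)

// gpuState is a hypothetical per-physical-GPU view used only for this sketch.
type gpuState struct {
	name      string
	available int // free replicas on this physical GPU
	picked    int // replicas already handed to the current request
}

// spreadWithTieBreak hands out up to n replicas, preferring the GPU with the
// most free replicas and, on a tie, the GPU this request has used least.
// It returns the GPU names in pick order.
func spreadWithTieBreak(gpus []gpuState, n int) []string {
	state := append([]gpuState(nil), gpus...) // copy so the caller's slice is untouched
	var order []string
	for step := 0; step < n; step++ {
		sort.SliceStable(state, func(i, j int) bool {
			if state[i].available != state[j].available {
				return state[i].available > state[j].available
			}
			// Explicit tie-break: rotate away from the GPU just picked.
			return state[i].picked < state[j].picked
		})
		if len(state) == 0 || state[0].available == 0 {
			break // no free replicas remain on any GPU
		}
		state[0].available--
		state[0].picked++
		order = append(order, state[0].name)
	}
	return order
}

func main() {
	// Two GPUs with two replicas each; one replica on gpu-1 is already in use.
	gpus := []gpuState{
		{name: "gpu-0", available: 2},
		{name: "gpu-1", available: 1},
	}
	fmt.Println(spreadWithTieBreak(gpus, 2))
}
```

In this setup the first pick goes to gpu-0, after which both GPUs have one free replica. Without a secondary key the second pick depends on how the sort orders equal elements; with it, the second pick goes to gpu-1 and the request is spread across both devices.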

Meiri28 and others added 2 commits May 19, 2026 15:48

jonathan-meiri changed the title ~~enable per-physical-GPU replica counts in time-slicing and MPS~~ [runatom-ai] enable per-physical-GPU replica counts in time-slicing and MPS May 19, 2026

jonathan-meiri changed the title ~~[runatom-ai] enable per-physical-GPU replica counts in time-slicing and MPS~~ enable per-physical-GPU replica counts in time-slicing and MPS May 19, 2026

Merge branch 'main' into heterogeneous-time-slicing-replicas

93fe507

jonathan-meiri wants to merge 3 commits into NVIDIA:main from jonathan-meiri:heterogeneous-time-slicing-replicas
