Skip to content

enable per-physical-GPU replica counts in time-slicing and MPS#1787

Open
jonathan-meiri wants to merge 3 commits into
NVIDIA:mainfrom
jonathan-meiri:heterogeneous-time-slicing-replicas
Open

enable per-physical-GPU replica counts in time-slicing and MPS#1787
jonathan-meiri wants to merge 3 commits into
NVIDIA:mainfrom
jonathan-meiri:heterogeneous-time-slicing-replicas

Conversation

@jonathan-meiri
Copy link
Copy Markdown

@jonathan-meiri jonathan-meiri commented May 19, 2026

Summary

Enable per-physical-GPU replica counts in sharing.timeSlicing and sharing.mps config. See #1786 for the design context (prior art, what this does and doesn't try to do, MPS vs time-slicing semantics).

Contributed by @Meiri28 on behalf of @runatom-ai.

Closes #1786

Changes

  • api/config/v1/replicas.go: replace disableResoureRenaming with applyDefaults. The new helper still fills the existing defaults (auto-rename when renameByDefault: true and Rename is unset; default Devices.All = true when no selector is set) but no longer strips user-supplied Rename or Devices.List / Devices.Count. The two "not yet supported in the config" warnings go away.
  • api/config/v1/config.go: update DisableResourceNamingInConfig to call the new helper. Function name and public signature unchanged.
  • internal/rm/allocate.go: in distributedAlloc, change the sort key from min(total - available) to max(available). The two orderings are equivalent in the homogeneous case, but the previous key biased toward exhausting smaller physical GPUs first when replica counts differ. The now-unused total field and its initialization loop are removed.

Device-map construction in internal/rm/device_map.go is unchanged — it already handles Devices.All / Devices.Count / Devices.List correctly. Net behavior change is ~50 LoC plus tests.

Commits are DCO-signed.

Meiri28 and others added 2 commits May 19, 2026 15:48
TDD red phase. These tests describe two pieces of behavior that are
either missing or subtly incorrect today:

1. internal/rm: distributedAlloc must respect heterogeneous physical-GPU
   replica counts. The current sort key (least-used-first) is equivalent
   to max-available-first only when all GPUs have the same replica
   count, and biases toward exhausting the smaller GPU first when they
   differ. The new TestDistributedAlloc_HeterogeneousReplicas_RespectsCapacity
   captures this gap; two regression tests cover the existing
   homogeneous case so the upcoming fix does not change current
   behavior.

2. api/config/v1: DisableResourceNamingInConfig must preserve per-entry
   Rename and Devices.List fields so operators can configure different
   replica counts for different physical GPUs. Today both are silently
   stripped, collapsing heterogeneous configs into a single homogeneous
   resource. Two new tests (TimeSlicing and MPS) describe the desired
   preservation behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Jonathan Meiri <33288957+Meiri28@users.noreply.github.com>
Co-Authored-By: runatom-ai <258621014+runatom-ai@users.noreply.github.com>
Two paired changes that together let operators configure different
time-slicing (and MPS) replica counts for different physical GPUs on
a single node.

api/config/v1: Replace disableResoureRenaming with applyDefaults, which
fills in defaults for unset Rename and Devices fields but leaves
explicit user configuration intact. Previously, any per-entry Rename
or non-Devices.All selection was silently stripped on config load,
collapsing heterogeneous configs into a single homogeneous resource.
The device-map construction in internal/rm/device_map.go already
handled Devices.All / Count / List correctly; only the config-load
gate was preventing the feature from working.

internal/rm: Change distributedAlloc's sort key from min(total - available)
to max(available). The two orderings are equivalent in the existing
homogeneous-replicas case, but the previous key biased toward
exhausting smaller devices first when replica counts differ. The
'total' field is no longer used and has been removed along with the
second loop that populated it.

Failing tests added in the preceding commit now pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: runatom-ai <258621014+runatom-ai@users.noreply.github.com>
Signed-off-by: Jonathan Meiri <33288957+Meiri28@users.noreply.github.com>
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 19, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@jonathan-meiri
Copy link
Copy Markdown
Author

Contributing on behalf of @runatom-ai.

@jonathan-meiri jonathan-meiri changed the title enable per-physical-GPU replica counts in time-slicing and MPS [runatom-ai] enable per-physical-GPU replica counts in time-slicing and MPS May 19, 2026
@jonathan-meiri jonathan-meiri changed the title [runatom-ai] enable per-physical-GPU replica counts in time-slicing and MPS enable per-physical-GPU replica counts in time-slicing and MPS May 19, 2026
jonathan-meiri pushed a commit to jonathan-meiri/k8s-device-plugin that referenced this pull request May 19, 2026
TDD red phase for an issue independent of NVIDIA#1787.

Setup: a node with two physical GPUs of equal advertised replica
counts, where one slot on the "second" GPU has already been allocated
to another pod. A new pod requests two more slots. The function name
and docstring of distributedAlloc promise an even spread, but today
the function deterministically picks both of the new pod's slots from
the GPU with the most remaining replicas — leaving the other physical
GPU's available slot untouched.

The bug is in the sort tie-break. After the first pick the per-GPU
'used' counts tie across the candidates, and sort.Slice is unstable,
so the next iteration ends up picking the next slot on the GPU we
just picked from rather than rotating to the sibling GPU that still
has capacity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: runatom-ai <258621014+runatom-ai@users.noreply.github.com>
Signed-off-by: Jonathan Meiri <33288957+Meiri28@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support per-physical-GPU replica counts in sharing.timeSlicing and sharing.mps

2 participants