enable per-physical-GPU replica counts in time-slicing and MPS#1787
Open
jonathan-meiri wants to merge 3 commits into
Open
enable per-physical-GPU replica counts in time-slicing and MPS#1787jonathan-meiri wants to merge 3 commits into
jonathan-meiri wants to merge 3 commits into
Conversation
TDD red phase. These tests describe two pieces of behavior that are either missing or subtly incorrect today: 1. internal/rm: distributedAlloc must respect heterogeneous physical-GPU replica counts. The current sort key (least-used-first) is equivalent to max-available-first only when all GPUs have the same replica count, and biases toward exhausting the smaller GPU first when they differ. The new TestDistributedAlloc_HeterogeneousReplicas_RespectsCapacity captures this gap; two regression tests cover the existing homogeneous case so the upcoming fix does not change current behavior. 2. api/config/v1: DisableResourceNamingInConfig must preserve per-entry Rename and Devices.List fields so operators can configure different replica counts for different physical GPUs. Today both are silently stripped, collapsing heterogeneous configs into a single homogeneous resource. Two new tests (TimeSlicing and MPS) describe the desired preservation behavior. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Jonathan Meiri <33288957+Meiri28@users.noreply.github.com> Co-Authored-By: runatom-ai <258621014+runatom-ai@users.noreply.github.com>
Two paired changes that together let operators configure different time-slicing (and MPS) replica counts for different physical GPUs on a single node. api/config/v1: Replace disableResoureRenaming with applyDefaults, which fills in defaults for unset Rename and Devices fields but leaves explicit user configuration intact. Previously, any per-entry Rename or non-Devices.All selection was silently stripped on config load, collapsing heterogeneous configs into a single homogeneous resource. The device-map construction in internal/rm/device_map.go already handled Devices.All / Count / List correctly; only the config-load gate was preventing the feature from working. internal/rm: Change distributedAlloc's sort key from min(total - available) to max(available). The two orderings are equivalent in the existing homogeneous-replicas case, but the previous key biased toward exhausting smaller devices first when replica counts differ. The 'total' field is no longer used and has been removed along with the second loop that populated it. Failing tests added in the preceding commit now pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-Authored-By: runatom-ai <258621014+runatom-ai@users.noreply.github.com> Signed-off-by: Jonathan Meiri <33288957+Meiri28@users.noreply.github.com>
Author
|
Contributing on behalf of @runatom-ai. |
jonathan-meiri
pushed a commit
to jonathan-meiri/k8s-device-plugin
that referenced
this pull request
May 19, 2026
TDD red phase for an issue independent of NVIDIA#1787. Setup: a node with two physical GPUs of equal advertised replica counts, where one slot on the "second" GPU has already been allocated to another pod. A new pod requests two more slots. The function name and docstring of distributedAlloc promise an even spread, but today the function deterministically picks both of the new pod's slots from the GPU with the most remaining replicas — leaving the other physical GPU's available slot untouched. The bug is in the sort tie-break. After the first pick the per-GPU 'used' counts tie across the candidates, and sort.Slice is unstable, so the next iteration ends up picking the next slot on the GPU we just picked from rather than rotating to the sibling GPU that still has capacity. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-Authored-By: runatom-ai <258621014+runatom-ai@users.noreply.github.com> Signed-off-by: Jonathan Meiri <33288957+Meiri28@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Enable per-physical-GPU replica counts in
sharing.timeSlicingandsharing.mpsconfig. See #1786 for the design context (prior art, what this does and doesn't try to do, MPS vs time-slicing semantics).Contributed by @Meiri28 on behalf of @runatom-ai.
Closes #1786
Changes
api/config/v1/replicas.go: replacedisableResoureRenamingwithapplyDefaults. The new helper still fills the existing defaults (auto-rename whenrenameByDefault: trueandRenameis unset; defaultDevices.All = truewhen no selector is set) but no longer strips user-suppliedRenameorDevices.List/Devices.Count. The two "not yet supported in the config" warnings go away.api/config/v1/config.go: updateDisableResourceNamingInConfigto call the new helper. Function name and public signature unchanged.internal/rm/allocate.go: indistributedAlloc, change the sort key frommin(total - available)tomax(available). The two orderings are equivalent in the homogeneous case, but the previous key biased toward exhausting smaller physical GPUs first when replica counts differ. The now-unusedtotalfield and its initialization loop are removed.Device-map construction in
internal/rm/device_map.gois unchanged — it already handlesDevices.All/Devices.Count/Devices.Listcorrectly. Net behavior change is ~50 LoC plus tests.Commits are DCO-signed.