Support per-physical-GPU replica counts in sharing.timeSlicing and sharing.mps #1786

## Summary

Allow operators to configure **different replica counts for different physical GPUs on the same node** under the existing `sharing.timeSlicing` and `sharing.mps` config. The schema for this already exists (`ReplicatedResource.Devices` + `ReplicatedResource.Rename`); the device-map construction in `internal/rm/device_map.go` already supports it. The only thing blocking the feature is one helper (`disableResoureRenaming` in `api/config/v1/replicas.go`) that strips these fields on config load.

## Motivation

Today, every physical GPU on a node ends up with the same replica count because `sharing.timeSlicing.resources` collapses to a single homogeneous entry. That's fine for symmetric nodes, but it forces an all-or-nothing trade-off: either every GPU on the node gets aggressive sharing (more slices, smaller share each) or none does.

Real-world cases where per-GPU replica counts would help:

- **Light + heavy on the same node.** Reserve one GPU at `replicas: 2` for latency-sensitive inference (50% share each, two co-tenants max) and another at `replicas: 8` for batch jobs that don't mind small slices.
- **Mixed GPU classes on one node** (different ages or memory tiers). With distinct `rename`s, each GPU is advertised under a different resource name, so consumers can explicitly request the tier they need.
- **MPS specifically.** Because MPS enforces `CUDA_MPS_ACTIVE_THREAD_PERCENTAGE = 100 / replicas`, a per-GPU replica count translates directly into a per-GPU compute cap. This is where the feature is semantically cleanest.

Per-node configs (via the GPU Operator's labelled-config map) only solve this when entire nodes are homogeneous. Within a single node, no current option works.

## Proposal

After the change, this config would do what it reads as:

```yaml
version: v1
sharing:
  timeSlicing:
    resources:
      - name: nvidia.com/gpu
        rename: nvidia.com/gpu-light
        devices: ["0"]
        replicas: 2
      - name: nvidia.com/gpu
        rename: nvidia.com/gpu-heavy
        devices: ["1"]
        replicas: 8
```

Pods request `nvidia.com/gpu-light` or `nvidia.com/gpu-heavy` explicitly. Behavior for configs that omit `devices:` / `rename:` is unchanged (existing single-resource configs keep working).
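
As an illustration of the consumer side, a pod targeting the lightly shared GPU might look roughly like the sketch below. The pod name, container name, and image are placeholders, not part of the proposal. Only the resource name comes from the config above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: latency-sensitive-inference    # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: app                        # hypothetical name
      image: registry.example.com/inference:latest   # placeholder image
      resources:
        limits:
          # One of the two replicas that the config above would advertise for GPU 0.
          nvidia.com/gpu-light: 1
```

A batch workload would request `nvidia.com/gpu-heavy: 1` in the same way.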

## What this is NOT trying to do

- **Not changing time-slicing compute semantics.** Time-slicing on a single physical GPU round-robins CUDA contexts in the closed-source driver. Holding more time-slicing replicas on the same physical GPU does *not* give a pod a larger compute share — the device plugin has no way to inject scheduler weight there. The `failRequestsGreaterThanOne: true` default still applies and is still the right guardrail for time-slicing.
  - For **MPS**, the picture is cleaner: replicas → `CUDA_MPS_ACTIVE_THREAD_PERCENTAGE`, which the MPS daemon actually enforces. So MPS users get the intuitive "more replicas per GPU = more compute share" behavior, and the per-GPU `replicas` config becomes a per-GPU compute-cap knob.
- **Not introducing new k8s scheduling semantics.** The existing per-resource-name model is unchanged. Pods just see two (or more) distinct resource names instead of one.
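
For the MPS case described above, the equivalent config could be sketched as follows. This assumes `sharing.mps.resources` accepts the same `rename` / `devices` / `replicas` fields as `sharing.timeSlicing.resources` once the gate is lifted. The replica values are illustrative, and the percentages in the comments are simply `100 / replicas` as stated in the Motivation, not output captured from a running daemon.

```yaml
version: v1
sharing:
  mps:
    resources:
      - name: nvidia.com/gpu
        rename: nvidia.com/gpu-light   # illustrative name, reused from the Proposal
        devices: ["0"]
        replicas: 2                    # 100 / 2 = 50% active-thread cap per replica
      - name: nvidia.com/gpu
        rename: nvidia.com/gpu-heavy   # illustrative name, reused from the Proposal
        devices: ["1"]
        replicas: 4                    # 100 / 4 = 25% active-thread cap per replica
```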

## Prior art

The same feature gap has been raised before. None of these were closed by a decision; most went stale:

- #628 "Dedicated GPU's for time slicing on multi GPU set ups." Closed stale. @frittentheke identified the exact `disableResoureRenaming` gate.
- #491 "More flexible time-slicing strategy configuration." Open. @klueska: *"not opposed in principle"* but raised scheduler-integration concerns.
- #1422 "Heterogeneous GPU MPS Replicas Configuration." Closed; the answer was "use per-node configs via the GPU Operator," which doesn't help within-node heterogeneity.
- #1018 (PR) "bugfix: Allow for optional TimeSlicing configuration." Open. Adjacent area.
- #1621 (PR) "Add configurable allocation policy (packed/distributed) for replicated and MIG resources." Open since Feb 2026; touches the same allocator. Happy to coordinate with @wkd-woo to avoid conflicts.

The comment on `DisableResourceNamingInConfig` already hints at intent: *"This may be reenabled in a future release."*

## Next steps

We have a working implementation with tests against `5c15cbe` and will open a PR shortly. Filing this issue first so the design context isn't buried inside the PR description.

cc @elezar @klueska
