[Bug]: Cannot assign GPU per container in multi-container Pod with time-slicing #1690

@hoangcongchuc

Summary

When using NVIDIA GPU time-slicing (device-sharing-strategy=time-slicing, rename-by-default=false), we cannot assign a GPU slice independently to each container of a multi-container Pod on a node with a single physical GPU.

GPU requests are aggregated at the Pod level, making it impossible to run multiple GPU-dependent containers in a single Pod unless the underlying node has multiple physical GPUs.


Environment

  • Kubernetes (EKS)
  • Karpenter (NodePool / EC2NodeClass)
  • Bottlerocket (bottlerocket@v1.53.0)
  • Instance types:
    • g6e.xlarge, g6e.2xlarge (1 physical GPU each)
  • NVIDIA device plugin:
    • device-sharing-strategy = "time-slicing"
    • replicas = 4
    • rename-by-default = false

Problem Scenario

We run a Pod with multiple containers, each requiring GPU:

containers:
  - name: container-a
    resources:
      limits:
        nvidia.com/gpu: 1

  - name: container-b
    resources:
      limits:
        nvidia.com/gpu: 1

Expectation

Since time-slicing with replicas = 4 exposes the single physical GPU as:

nvidia.com/gpu: 4

we expect:

  • Each container consumes 1 GPU slice
  • The Pod uses 2 slices in total
  • Pod should run on a node with 1 physical GPU
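
The expected accounting above can be written out as a small sketch. This is a toy model in Python, not Kubernetes or device-plugin code; `advertised_slices`, `pod_request`, and the dictionary layout are hypothetical names used only for illustration. It assumes the node advertises its physical GPU count multiplied by `replicas`, as the `nvidia.com/gpu: 4` figure above suggests.

```python
# Toy model of the expected slice accounting; not real Kubernetes code.
# All names here are hypothetical and exist only for illustration.

def advertised_slices(physical_gpus: int, replicas: int) -> int:
    """Slices a node advertises as nvidia.com/gpu under time-slicing."""
    return physical_gpus * replicas


def pod_request(container_limits: dict[str, int]) -> int:
    """Total nvidia.com/gpu requested by a Pod: the sum over its containers."""
    return sum(container_limits.values())


# Values taken from this report: 1 physical GPU, replicas = 4,
# and two containers that each set nvidia.com/gpu: 1.
capacity = advertised_slices(physical_gpus=1, replicas=4)
requested = pod_request({"container-a": 1, "container-b": 1})

fits = requested <= capacity
print(f"requested={requested} capacity={capacity} fits={fits}")
```

Under these assumptions the Pod total of 2 is within the 4 advertised slices, which is why we expected it to fit on a single-GPU node.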

Actual Behavior

  • Kubernetes aggregates GPU requests at the Pod level:
      Pod requires: nvidia.com/gpu = 2
  • Karpenter evaluates instance types based on physical GPU count
  • Instance types with 1 GPU cannot satisfy the request

➡️ Pod is unschedulable unless using multi-GPU instances
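
To make the mismatch concrete, here is a minimal sketch of the behavior reported above, written as a toy Python model. It is not Karpenter's implementation; the function `candidate_instance_types` and the `multi-gpu-example` entry are hypothetical, and the assumption that instance selection compares the Pod total against the physical GPU count comes from this report rather than from Karpenter's source.

```python
# Toy model of the reported behavior; NOT Karpenter's real logic.
# Assumption (from this report): instance types are filtered by
# physical GPU count, ignoring the time-slicing replica multiplier.

PHYSICAL_GPUS = {
    "g6e.xlarge": 1,
    "g6e.2xlarge": 1,
    "multi-gpu-example": 4,  # hypothetical stand-in for a 4-GPU node
}


def candidate_instance_types(pod_gpu_request: int, replicas: int = 1) -> list[str]:
    """Return instance types whose GPU capacity covers the Pod total.

    replicas=1 models selection on physical GPUs only;
    replicas=4 models selection that accounts for time-slicing.
    """
    return [
        name
        for name, gpus in PHYSICAL_GPUS.items()
        if gpus * replicas >= pod_gpu_request
    ]


pod_total = 1 + 1  # container-a + container-b

print(candidate_instance_types(pod_total))              # physical GPUs only
print(candidate_instance_types(pod_total, replicas=4))  # slice-aware
```

With the physical-only comparison, only the multi-GPU entry survives, which matches the unschedulable result we see on 1-GPU instance types. The slice-aware comparison keeps all three entries.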


Observed Workarounds & Trade-offs

❌ Option 1 — Only 1 container uses GPU

container-a → GPU
container-b → CPU only
  • Not viable: container-b requires GPU → crashes / restarts

❌ Option 2 — Split into multiple Pods

Pod A → container-a (GPU)
Pod B → container-b (GPU)
  • Requires architectural changes
  • Effectively turns into multiple services
  • Not suitable when containers are tightly coupled

✅ Option 3 — Use multi-GPU nodes (current workaround)

  • Use an instance type with ≥2 physical GPUs
  • Example: node with 4 GPUs
  • Pod schedules successfully

Downsides:

  • Significantly higher cost
  • Wastes GPU capacity when workloads are small
  • Defeats the purpose of GPU time-slicing (cost efficiency)

Key Insight

Time-slicing increases allocatable GPU slots, but:

  • Kubernetes still treats nvidia.com/gpu as:
    • an integer resource
    • Pod-scoped
  • There is no concept of:
    • per-container GPU isolation
    • fractional GPU assignment inside a Pod

Impact

  • Cannot efficiently run multi-container GPU workloads
  • Forces one of:
    • code refactor (split services)
    • over-provisioning (multi-GPU nodes)
  • Reduces cost efficiency of GPU sharing setups


Question

Is this limitation inherent to the Kubernetes device plugin design, or is there a recommended pattern for multi-container GPU workloads with time-slicing?

Conclusion

GPU time-slicing does not enable “1 GPU per container” within a Pod.
GPU resources are still aggregated at the Pod level, forcing users to either redesign workloads or over-provision infrastructure.

