Using a shared GPU: MPS with KUBERNETES_VIRTUAL_GPUS #474

@arthas3014

What is the version?

4.1.1-4.0.4

What happened?

I started several Pods on an A100 using MPS and ran dcgm-exporter with the environment variable KUBERNETES_VIRTUAL_GPUS set to true. Each pod is bound to a metric with a label vgpu:xx, but the metrics for all pods have the same value in the time series. Each pod should be using a different share of the GPU; otherwise, wouldn't these identical per-pod metric values be meaningless?
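
A minimal sketch of how the symptom described above could be checked from the exporter's text output. It groups samples by metric name and physical GPU, then reports groups where every `vgpu` label carries the same value. The sample text, pod names, and values are made up for illustration, and the exact label set (`gpu`, `vgpu`, `pod`) is an assumption based on the description in this issue; adjust it to what your exporter actually emits.

```python
import re
from collections import defaultdict

# Hypothetical sample of dcgm-exporter output; pod names and values are made up.
SAMPLE = '''\
# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
DCGM_FI_DEV_GPU_UTIL{gpu="0",vgpu="0",pod="pod-a"} 37
DCGM_FI_DEV_GPU_UTIL{gpu="0",vgpu="1",pod="pod-b"} 37
DCGM_FI_DEV_GPU_UTIL{gpu="0",vgpu="2",pod="pod-c"} 37
'''

# Matches lines of the form: name{label="value",...} number
LINE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def values_by_gpu(text):
    """Map (metric name, gpu label) -> {vgpu label: value}."""
    groups = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if not match:
            continue  # skip samples without labels
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels))
        if 'vgpu' not in labels:
            continue  # only virtual-GPU samples are of interest here
        groups[(name, labels.get('gpu'))][labels['vgpu']] = float(value)
    return groups


def identical_groups(text):
    """Return the (metric, gpu) keys whose vgpu samples all share one value."""
    return [
        key
        for key, per_vgpu in values_by_gpu(text).items()
        if len(per_vgpu) > 1 and len(set(per_vgpu.values())) == 1
    ]


if __name__ == '__main__':
    for metric, gpu in identical_groups(SAMPLE):
        print(f'{metric} on gpu {gpu}: all vgpu samples report the same value')
```

To run this against a live exporter, replace `SAMPLE` with the text returned by its metrics endpoint. Identical values at a single instant can also occur by coincidence, so comparing several scrapes over time gives a more reliable picture.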

What did you expect to happen?

Each Pod should report its own, distinct usage metric values.

What is the GPU model?

A100

What is the environment?

No response

How did you deploy the dcgm-exporter and what is the configuration?

No response

How to reproduce the issue?

No response

Anything else we need to know?

No response
