VEP #254: Guest GPU Metrics via VSOCK by machadovilaca · Pull Request #255 · kubevirt/enhancements

machadovilaca · 2026-04-10T10:18:27Z

VEP Metadata

Tracking issue: #254
SIG label: /sig observability /sig compute

What this PR does

GPU workloads running inside KubeVirt virtual machines currently lack observability. Cluster administrators and users have no way to monitor GPU utilization, memory usage, temperature, power consumption, or error counts for GPUs passed through to VMs.

This VEP introduces a mechanism for collecting GPU metrics from inside the guest and exposing them as Prometheus metrics on the host.

Special notes for your reviewer

Signed-off-by: machadovilaca <machadovilaca@gmail.com>

kubevirt-bot · 2026-04-10T10:18:31Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign vladikr for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

alaypatel07 · 2026-04-14T14:31:15Z

/cc @rthallisey

dominikholler · 2026-04-14T15:04:22Z

+VSOCK (`AF_VSOCK`) provides socket-based communication between guest and host without virtio-serial.
+
+**Rejected because:**
+- VSOCK requires kernel support that is not universally available, especially on older guests and Windows.


Windows support is upcoming, and I would be surprised if Linux guests which are too old for vsock would utilize GPUs

@enp0s3 said exactly the same at the KubeVirt VEP discussion meeting. If the general opinion is that "Windows support is upcoming" is fine, then I would be okay with it.

I actually tried VSOCK first and was able to get a PoC running, but ditched it because of the Windows support.
Since we already used virtio-serial, it seemed like a good approach.

Windows is already supported.

Updated to use DCGM with VSOCK.

dominikholler · 2026-04-14T15:06:59Z

+
+**Rejected because:**
+- VSOCK requires kernel support that is not universally available, especially on older guests and Windows.
+- Virtio-serial is already used by KubeVirt for qemu-guest-agent and downward metrics, making it a proven transport.


@kostyanf14 (and maybe @jcanocan ) Do you agree?

@YanVugenfirer

Virtio-serial is being used already by downward metrics and it works fine. However, I don't have any experience in Windows guests.

kostyanf14 · 2026-04-14T16:35:05Z

+**Rejected because:**
+- VSOCK requires kernel support that is not universally available, especially on older guests and Windows.
+- Virtio-serial is already used by KubeVirt for qemu-guest-agent and downward metrics, making it a proven transport.
+- Virtio-serial channels appear as simple character devices in the guest, making the agent trivial to implement on both Linux and Windows.


I would reject virtio-serial because we see some problems with large data transfers: virtio-win/kvm-guest-drivers-windows#1462

Here the data sent should be small; I shared the structure in the VEP document.

machadovilaca · 2026-04-15T10:34:44Z

@dominikholler @kostyanf14 added vsock as an alternative in the design section

machadovilaca · 2026-04-15T10:35:22Z

/cc @michalskrivanek

added guest-file-read as an alternative in the design section

Signed-off-by: machadovilaca <machadovilaca@gmail.com>

rthallisey · 2026-04-16T12:57:49Z

+## Motivation
+
+GPU passthrough and vGPU workloads are increasingly common in KubeVirt for AI/ML training, inference, and media processing. Host-level GPU
+monitoring tools like NVIDIA DCGM exporter are not available in these configurations. The NVIDIA GPU Operator does not deploy this service


Should kubevirt be responsible for maintaining a gpu metric system? This seems generally useful for virtualization.

KubeVirt wouldn't really be maintaining a GPU metric system. The heavy work would be done entirely by DCGM inside the guest, and KubeVirt would handle the communication and exposition of the data.

With that said, NVIDIA DCGM exporter is the de facto way to expose GPU metrics to Prometheus, so it would be ideal and simpler for users if it stayed that way. What I think that would mean:

1. KubeVirt creates a VSOCK for the VMI if any GPU device is configured
2. DCGM runs in the guest, listening on VSOCK
3. NVIDIA GPU Operator deploys DCGM exporter on nodes configured for vGPU and GPU passthrough
4. DCGM exporter connects to VMIs via VSOCK to query DCGM data and expose metrics to Prometheus

I would then add some Prometheus recording rules to correlate DCGM exporter metrics with KubeVirt VMIs.
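
The correlation step can be pictured as a label join. Below is a minimal Python sketch of that idea, not the recording rules themselves. It assumes, hypothetically, that GPU samples and a KubeVirt info-style series share a `vsock_cid` label; that label name and the sample layout are illustrative and do not come from the VEP. A real recording rule would express the same join in PromQL.

```python
from typing import Any, Dict, List

# Hypothetical sample layout: {"labels": {...}, "value": float}
Sample = Dict[str, Any]


def correlate(gpu_samples: List[Sample], vmi_info: List[Sample],
              on: str = "vsock_cid") -> List[Sample]:
    """Attach VMI labels to GPU samples that share the join label `on`."""
    # Index the VMI info series by the join label, similar to the
    # right-hand side of a PromQL `on(...) group_left(...)` join.
    by_key = {s["labels"][on]: s["labels"] for s in vmi_info if on in s["labels"]}
    joined = []
    for sample in gpu_samples:
        vmi_labels = by_key.get(sample["labels"].get(on))
        if vmi_labels is None:
            continue  # no matching VMI: drop the sample, as an inner join would
        joined.append({
            "labels": {**sample["labels"], **vmi_labels},
            "value": sample["value"],
        })
    return joined
```

GPU samples without a matching VMI are dropped here; whether they should be kept with empty VMI labels is a design choice the discussion leaves open.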

@machadovilaca I think that adding the VSOCK automatically will be a problematic approach. The correct way to configure VSOCK for a VM is via the KubeVirt API.

rthallisey · 2026-04-16T13:05:07Z

+
+#### VSOCK
+
+VSOCK (`AF_VSOCK`) is a socket address family for guest-host communication using the virtio-vsock transport. KubeVirt already has VSOCK


Using VSOCK can bypass the launcher and go right to the handler or another node-local exporter. I can see some advantages to that.

I do not understand the full meaning of this comment. I just want to highlight that it might be beneficial to avoid bypassing KubeVirt APIs, e.g. because of #223

The data path in the vsock approach is simpler, e.g. guest -> exporter.
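
For readers unfamiliar with `AF_VSOCK`, a host-side client is a plain stream socket addressed by `(CID, port)` instead of `(IP, port)`. The Python sketch below shows only that transport step, under stated assumptions: the helper names are hypothetical, the port a guest service listens on is not specified in this thread, and nothing here speaks the DCGM protocol.

```python
import socket

# CIDs 0, 1 and 2 are reserved (hypervisor, local loopback, host);
# guests are assigned CIDs starting at 3.
RESERVED_CIDS = {0, 1, 2}
U32_MAX = 0xFFFFFFFF  # also the "any" wildcard for both CID and port


def guest_address(cid: int, port: int) -> tuple:
    """Validate and return a (cid, port) address for a guest endpoint."""
    if cid in RESERVED_CIDS or not 0 <= cid < U32_MAX:
        raise ValueError(f"CID {cid} is not a guest CID")
    if not 0 <= port < U32_MAX:
        raise ValueError(f"port {port} is out of range")
    return (cid, port)


def connect_to_guest(cid: int, port: int, timeout: float = 5.0) -> socket.socket:
    """Open a stream connection from the host to a listener inside the guest."""
    address = guest_address(cid, port)  # validate before allocating a socket
    if not hasattr(socket, "AF_VSOCK"):
        raise OSError("AF_VSOCK is not available on this platform")
    sock = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(address)
    except OSError:
        sock.close()
        raise
    return sock
```

Because the address is just a CID and a port, nothing in the transport itself ties the connection to a Kubernetes namespace, which is relevant to the namespace-confinement concern raised later in the thread.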

rthallisey · 2026-04-16T13:12:05Z

+Collect GPU metrics from the host using NVIDIA DCGM or the GPU node exporter.
+
+**Rejected because:**
+- The NVIDIA GPU Operator does not deploy DCGM exporter on nodes where GPUs are configured for passthrough or vGPU, because the host no


This is changing. The DCGM team is working on adding vsock support so we can gather metrics from inside guest and share it over vsock. Once implemented, it would make sense to always have DCGM exporter on the host.

Do you have any pointers for that work? I tried proposing VSOCK support for the DCGM exporter and they suggested pursuing other approaches:

NVIDIA/dcgm-exporter#649 (comment)

https://docs.nvidia.com/datacenter/dcgm/latest/release-notes/changelog.html#features

Host engine: Added support for listening on the VSOCK protocol.

Added support for the following fields: DCGM_FI_DEV_GET_GPU_RECOVERY_ACTION, DCGM_FI_DEV_GPU_RECOVERY_ACTION, DCGM_FI_DEV_MEMORY_UNREPAIRABLE_FLAG, DCGM_FI_DEV_NVLINK_ECC_DATA_ERROR_COUNT_TOTAL, DCGM_FI_DEV_NVLINK_PPCNT_IBPC_PORT_XMIT_WAIT

updated to use DCGM with VSOCK

Signed-off-by: machadovilaca <machadovilaca@gmail.com>

enp0s3

@machadovilaca Hi. Actually I don't see any added value for the vGPU metrics to be maintained by the KubeVirt tree. It will couple DCGM metrics development with KV release cycle. I think that DCGM can manage with its own metrics exporter.

enp0s3 · 2026-04-28T03:51:30Z

@machadovilaca @rthallisey To be more specific, I don't understand the drawbacks of the following approach:

1. Aggregate VMI to vGPU metrics on a higher level using Prometheus PromQL.
2. Using the regular VM pod network, attach a Kubernetes service to the VM network and collect the metrics that way, instead of using VSOCK.

enp0s3 · 2026-04-28T03:55:19Z

@rthallisey Another drawback of using VSOCK is that we are willing to graduate it to be namespace confined, which might impose obstacles for current DCGM design, leading to the need of escalating the privileges of DCGM to be able to access every network namespace on the node.

machadovilaca · 2026-04-28T09:42:01Z

> @machadovilaca @rthallisey To be more specific, I don't understand the drawbacks of the following approach:
>
> 1. Aggregate VMI to vGPU metrics on a higher level using Prometheus PromQL.
> 2. Using the regular VM pod network, attach a Kubernetes service to the VM network and collect the metrics that way, instead of using VSOCK.

@enp0s3

Whether or not we decide to support this VSOCK DCGM approach, if the metrics Prometheus ingests come from DCGM directly, we will always need a way to correlate them to VMIs. Even if we end up doing nothing to simplify this metric collection, we can still provide recording rules for the correlations.

It is just a question of simplifying the work for the end user. The approach you are suggesting should even work right now, but for each VMI the user needs to: 1. configure the network for the VMI, 2. install DCGM in the VMI, 3. create the Kubernetes service, 4. create a service monitor, 5. correlate GPU and VMI metrics.

IMO we should look for ways to simplify this.

rthallisey · 2026-04-28T12:51:01Z

> Another drawback of using VSOCK is that we are willing to graduate it to be namespace confined

Fair point. Perhaps though there's a way to solve global vsock with some security hardening. I'd like to at least explore that option.

The two use cases I see are:

1. Advertise kubevirt specific gpu metrics that tie vmi to gpu
2. Advertise all gpu metrics from a passthrough GPU to the dcgm-exporter

Use case 1 is meant for kubevirt admins. Use case 2 can tie into the existing GPU tooling ecosystem.

enp0s3 · 2026-04-28T16:12:09Z

> > @machadovilaca @rthallisey To be more specific, I don't understand the drawbacks of the following approach:
> >
> > 1. Aggregate VMI to vGPU metrics on a higher level using Prometheus PromQL.
> > 2. Using the regular VM pod network, attach a Kubernetes service to the VM network and collect the metrics that way, instead of using VSOCK.
>
> @enp0s3
>
> Whether or not we decide to support this VSOCK DCGM approach, if the metrics Prometheus ingests come from DCGM directly, we will always need a way to correlate them to VMIs. Even if we end up doing nothing to simplify this metric collection, we can still provide recording rules for the correlations.

+1 for recording rules. IMO it's much simpler than the current approach.

> It is just a question of simplifying the work for the end user. The approach you are suggesting should even work right now, but for each VMI the user needs to: 1. configure the network for the VMI, 2. install DCGM in the VMI, 3. create the Kubernetes service, 4. create a service monitor, 5. correlate GPU and VMI metrics.
>
> IMO we should look for ways to simplify this.

This can be simplified using tools like Helm.

enp0s3

I would prefer this VEP to eventually converge with VEP 143. If we go with the VSOCK approach, I would like the code to live in the separate monitoring stack.

enp0s3 · 2026-04-28T16:14:26Z

+
+- Expose per-VM, per-GPU utilization metrics as Prometheus metrics from virt-handler.
+- Support both GPU passthrough and vGPU devices.
+- Support Linux and Windows guests.


How are we going to support Windows? Do we have a way to test this?

enp0s3 · 2026-04-28T16:15:49Z

+
+- https://github.com/kubevirt/kubevirt
+
+## Design


We need to mention that VSOCK has to be requested in the VM spec.

My idea was that if the feature gate is enabled and GPUs are present in the spec, virt-controller would automatically attach a VSOCK device to the domain XML.

@machadovilaca But it's only relevant for NVIDIA GPUs, isn't it?

Initially my idea was to keep this generic enough to handle other GPUs in the future.
But with some of the work moving to the DCGM exporter supporting VSOCK, and the NVIDIA operator deploying the exporter for all use cases, it is now more focused on NVIDIA, yes.

IMO it's suboptimal to always attach VSOCK upon GPU resource request.

I think that often, if a GPU is attached, DCGM will be installed in the guest alongside the drivers, so the DCGM exporter would be able to collect metrics with no additional user operation beyond what they do today.

But I can change that. Which do you think is better: using the existing attach-VSOCK field, or creating a new one specific to GPU metrics that, if enabled, would create the VSOCK?

@machadovilaca I would leave the responsibility to attach the VSOCK for the user. In case the VSOCK isn't attached via the VM spec we won't collect the metrics.

ok

Then, with the new DCGM exporter support for VSOCK, we should be able to just add some documentation for users to enable VSOCK in the spec, install DCGM in the guest, and start the DCGM service listening on the VSOCK.

Now, NVIDIA needs to update the NVIDIA GPU Operator to deploy the DCGM exporter on nodes configured for vGPU or GPU passthrough, which it currently doesn't.

On the KubeVirt side, we would only need to add the recording rules that correlate the DCGM exporter metrics with the VMIs.

enp0s3 · 2026-04-29T03:21:51Z

+          deviceName: nvidia.com/A100
+```
+
+## Alternatives


@machadovilaca Can you please add to the alternatives why the inline communication method using regular Kubernetes networking was rejected? What are the drawbacks of deploying additional Kubernetes resources in order to expose the metric service that is running inside the guest?

In addition, what are the drawbacks of using recording rules or any other high-level tools to tie the VM to the GPU? Why is explicit instrumentation needed to tie these resources together?

Added the "DCGM via regular Kubernetes Networking" alternative.

Recording rules are not an issue. Even if we expose VSOCK for the DCGM exporter to collect the DCGM metrics, we would still need the recording rules.

@machadovilaca ~~I would leave the responsibility to attach the VSOCK for the user. In case the VSOCK isn't attached via the VM spec we won't collect the metrics.~~ sorry confused with another thread.

Signed-off-by: machadovilaca <machadovilaca@gmail.com>

enp0s3

/lgtm

@machadovilaca Thank you!

enp0s3 · 2026-04-30T13:06:23Z

@vladikr Hi, could you please have a look?

enp0s3 · 2026-04-30T13:32:31Z

@machadovilaca Three more topics we should converge on:

1. Windows support: can we commit to it?
2. Combined consumption of VSOCK, both by the metric collector and by the KubeVirt API. We should mention the limitations.
3. The trigger to collect the metrics: there can be a non-NVIDIA GPU with attached VSOCK, so how would we distinguish that case?
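
To make the trigger question concrete, one possible predicate is sketched below in Python. It is an illustration only, not the rule the VEP settles on. It assumes the VMI manifest is available as a plain dictionary, that VSOCK is requested through the `autoattachVSOCK` device field (my understanding of the existing KubeVirt API), and that NVIDIA devices can be recognized by the `nvidia.com/` resource-name prefix seen in `deviceName: nvidia.com/A100`. The function name is hypothetical.

```python
from typing import Any, Dict

# Resource-name prefix, as in `deviceName: nvidia.com/A100`.
NVIDIA_PREFIX = "nvidia.com/"


def should_collect_gpu_metrics(vmi: Dict[str, Any]) -> bool:
    """Return True only if the VMI requests VSOCK and at least one NVIDIA GPU."""
    devices = vmi.get("spec", {}).get("domain", {}).get("devices") or {}
    # Assumed field name for the user-requested VSOCK device.
    has_vsock = devices.get("autoattachVSOCK") is True
    gpus = devices.get("gpus") or []
    has_nvidia_gpu = any(
        str(gpu.get("deviceName", "")).startswith(NVIDIA_PREFIX) for gpu in gpus
    )
    return has_vsock and has_nvidia_gpu
```

A VMI with VSOCK attached but a non-NVIDIA GPU is skipped by this check. It does not tell whether DCGM is actually listening in the guest, so a collector would still need to tolerate connection failures.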

vladikr · 2026-04-30T14:28:59Z

@machadovilaca @enp0s3 To be honest, personally, I didn't have the capacity to review this proposal deep enough to understand whether it fits KubeVirt or not ...
I am not able to give my approval for this cycle.
Let's defer it to the next one, which will give us enough time to have a proper discussion.

kubevirt-bot · 2026-04-30T15:00:57Z

New changes are detected. LGTM label has been removed.

enp0s3 · 2026-04-30T15:55:37Z

@vladikr Makes sense. Sorry for the noise. It looks like we need more time.

rthallisey · 2026-04-30T17:02:19Z

+
+This VEP introduces a mechanism for collecting GPU metrics from inside the guest and exposing them as Prometheus metrics on the host. NVIDIA
+DCGM (Data Center GPU Manager) 4.5.0 added native support for listening on the VSOCK protocol, enabling direct guest-to-host communication
+without a custom guest agent. virt-launcher connects to DCGM inside the guest via VSOCK and exposes the GPU metrics data through the unified


DCGM-exporter running on the host will poll for metrics over the socket. DCGM should be running in the guest and listening on the socket. Having dcgm-exporter or dcgm connect to the launcher over VSOCK doesn't sound right to me.

VEP kubevirt#254: Guest GPU Metrics via virtio-serial

cf731e5

Signed-off-by: machadovilaca <machadovilaca@gmail.com>

kubevirt-bot added the dco-signoff: yes Indicates the PR's author has DCO signed all their commits. label Apr 10, 2026

kubevirt-bot requested review from lyarwood, vladikr and xpivarc April 10, 2026 10:18

kubevirt-bot added the size/L label Apr 10, 2026

machadovilaca mentioned this pull request Apr 13, 2026

VEP 254: Guest GPU Metrics via VSOCK #254

Open

4 tasks

kubevirt-bot requested a review from rthallisey April 14, 2026 14:31

dominikholler reviewed Apr 14, 2026

kostyanf14 reviewed Apr 14, 2026

kubevirt-bot requested a review from michalskrivanek April 15, 2026 10:35

VEP kubevirt#254: Present guest-host communication alternatives

f64ad5c

Signed-off-by: machadovilaca <machadovilaca@gmail.com>

machadovilaca force-pushed the vep-254-guest-gpu-metrics-via-virtio-serial branch from 02f5441 to f64ad5c Compare April 15, 2026 10:37

rthallisey reviewed Apr 16, 2026

VEP kubevirt#254: Replace custom agent with DCGM VSOCK

75c58da

Signed-off-by: machadovilaca <machadovilaca@gmail.com>

machadovilaca changed the title ~~VEP #254: Guest GPU Metrics via virtio-serial~~ VEP #254: Guest GPU Metrics via VSOCK Apr 22, 2026

enp0s3 reviewed Apr 27, 2026

enp0s3 reviewed Apr 28, 2026

enp0s3 reviewed Apr 29, 2026

machadovilaca added 2 commits April 29, 2026 10:37

VEP kubevirt#254: Add DCGM via regular Kubernetes Networking alternative

68385bb

Signed-off-by: machadovilaca <machadovilaca@gmail.com>

VEP kubevirt#254: Integrate GPU metrics with GetVMStats unified RPC

db608f1

Signed-off-by: machadovilaca <machadovilaca@gmail.com>

enp0s3 reviewed Apr 30, 2026

kubevirt-bot assigned enp0s3 Apr 30, 2026

kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Apr 30, 2026

VEP kubevirt#254: Add opt-in annotation, VSOCK limitations, and scope to Linux

152ed4d

Signed-off-by: machadovilaca <machadovilaca@gmail.com>

kubevirt-bot removed the lgtm Indicates that a PR is ready to be merged. label Apr 30, 2026

rthallisey reviewed Apr 30, 2026
