diff --git a/veps/sig-observability/gpu-metrics-via-vsock/vep.md b/veps/sig-observability/gpu-metrics-via-vsock/vep.md
new file mode 100644
index 00000000..d38bab70
--- /dev/null
+++ b/veps/sig-observability/gpu-metrics-via-vsock/vep.md
@@ -0,0 +1,371 @@
+# VEP #254: Guest GPU Metrics via VSOCK
+
+## VEP Status Metadata
+
+### Target releases
+
+- This VEP targets alpha for version: v1.9
+- This VEP targets beta for version:
+- This VEP targets GA for version:
+
+### Release Signoff Checklist
+
+Items marked with (R) are required *prior to targeting to a milestone / release*.
+
+- [ ] (R) Enhancement issue created, which links to VEP dir in [kubevirt/enhancements]
+- [ ] (R) Alpha target version is explicitly mentioned and approved
+- [ ] (R) Beta target version is explicitly mentioned and approved
+- [ ] (R) GA target version is explicitly mentioned and approved
+
+## Overview
+
+GPU workloads running inside KubeVirt virtual machines currently lack observability. Cluster administrators and users have no way to monitor
+GPU utilization, memory usage, temperature, power consumption, or error counts for GPUs passed through to VMs.
+
+This VEP introduces a mechanism for collecting GPU metrics from inside the guest and exposing them as Prometheus metrics on the host. NVIDIA
+DCGM (Data Center GPU Manager) 4.5.0 added native support for listening on the VSOCK protocol, enabling direct guest-to-host communication
+without a custom guest agent. virt-launcher connects to DCGM inside the guest via VSOCK and exposes the GPU metrics data through the unified
+`GetVMStats` gRPC call ([VEP #143](https://github.com/kubevirt/enhancements/pull/81)). virt-handler queries virt-launcher via `GetVMStats`,
+and the observability-controller queries virt-handler to produce `kubevirt_vmi_gpu_*` metrics.
+
+## Motivation
+
+GPU passthrough and vGPU workloads are increasingly common in KubeVirt for AI/ML training, inference, and media processing. Host-level GPU
+monitoring tools like NVIDIA DCGM exporter are not available in these configurations. The NVIDIA GPU Operator does not deploy this service
+on nodes where GPUs are configured for passthrough or vGPU, because the host no longer has direct access to the device. This leaves GPU
+workloads inside VMs completely unmonitored.
+
+NVIDIA DCGM 4.5.0 introduced native VSOCK support, allowing the DCGM daemon inside a guest VM to accept connections from the host over
+VSOCK. By leveraging this capability, KubeVirt can collect GPU metrics directly from DCGM without maintaining a custom guest agent, providing
+per-VM, per-GPU observability that is consistent with the existing `kubevirt_vmi_*` metrics namespace and enabling unified dashboards and
+alerting.
+
+## Goals
+
+- Expose per-VM, per-GPU utilization metrics as Prometheus metrics from the observability-controller.
+- Support both GPU passthrough and vGPU devices.
+- Support Linux guests.
+- Leverage DCGM's native VSOCK support to avoid maintaining a custom guest agent.
+- Integrate with the unified `GetVMStats` gRPC call rather than introducing a separate RPC.
+
+## Non Goals
+
+- Managing GPU drivers or DCGM installation inside the guest.
+- Supporting non-NVIDIA GPUs (AMD, Intel) in the initial implementation.
+- Alerting rules or Grafana dashboards (these can be added separately).
+- Collecting GPU metrics from the host side (e.g., via DCGM on the host).
+- Windows guest support (see [Future Work](#future-work)).
+
+## Definition of Users
+
+- **Cluster administrators** who need to monitor GPU utilization across VMs for capacity planning, cost allocation, and health monitoring.
+- **VM users** running GPU workloads who want to see GPU metrics alongside other VM metrics in existing monitoring infrastructure.
+- **Platform teams** building autoscaling or scheduling decisions based on GPU utilization.
+
+## User Stories
+
+### GPU Utilization Monitoring
+As a cluster administrator, I want to see GPU utilization, memory usage, and temperature for each VM so I can identify underutilized or
+overheating GPUs and take action.
+
+### Capacity Planning
+As a platform engineer, I want per-VM GPU metrics in Prometheus so I can build dashboards showing GPU utilization trends across the cluster
+and plan capacity.
+
+### Error Detection
+As an operations engineer, I want to be alerted when a GPU inside a VM reports ECC errors so I can proactively migrate the workload before
+hardware failure.
+
+## Repos
+
+- https://github.com/kubevirt/kubevirt
+- https://github.com/kubevirt/kubevirt-observability-controller
+
+## Design
+
+The design integrates GPU metrics into the existing monitoring data pipeline established by
+[VEP #143](https://github.com/kubevirt/enhancements/pull/81). GPU metrics become a new data category within the unified `GetVMStats` gRPC
+call, following the same pattern as domain stats, guest info, and filesystem data.
+
+```
+Guest VM (QEMU)                     virt-launcher                      virt-handler                observability-controller
++--------------------------+        +----------------------------+     +---------------------+     +-------------------------+
+|                          |        |                            |     |                     |     |                         |
+|  DCGM (nv-hostengine)   |         |  DomainManager             |     |                     |     |                         |
+|  - collects GPU metrics  | <====> |    gpuMetricsCache         |     |                     |     |                         |
+|  - listens on VSOCK      | vsock  |      TimeDefinedCache      |     |                     |     |                         |
+|                          |        |    scrapeGPUMetrics()      |     |                     |     |  queries virt-handler   |
++--------------------------+        |                            |     |  GetVMStats()       | <-- |  for all VMI data       |
+                                    |  cmd-server (gRPC)         |     |   includes gpuStats |     |                         |
+                                    |    GetVMStats() handler    | <-- |                     |     |  emits:                 |
+                                    |      includes gpuStats     |     |                     |     |    kubevirt_vmi_gpu_*   |
+                                    +----------------------------+     +---------------------+     +-------------------------+
+```
+
+### 1. Opting In: Annotation and VSOCK Enablement
+
+Users must annotate the VM to enable GPU metrics collection and enable VSOCK on the VM spec. The annotation
+`kubevirt.io/gpu-metrics-collector` signals to virt-launcher that it should connect to DCGM via VSOCK and collect GPU metrics. Without this
+annotation, virt-launcher does not attempt any GPU metrics collection, regardless of whether the VM has GPUs or VSOCK enabled.
+
+```yaml
+annotations:
+  kubevirt.io/gpu-metrics-collector: "dcgm-vsock"
+```
+
+The annotation serves three purposes:
+
+1. **virt-launcher trigger**: virt-launcher only attempts DCGM VSOCK connections on VMIs carrying this annotation, avoiding unnecessary
+connection attempts and timeouts on VMs with non-NVIDIA GPUs or VMs where DCGM is not installed.
+2. **virt-api VSOCK conflict warning**: virt-api can check for this annotation and warn users when they attempt to use VSOCK via the
+KubeVirt client (`virtctl vsock`) that the VSOCK device is also in use for GPU metrics collection on the DCGM port.
+3. **Explicit opt-in**: Keeps GPU metrics collection strictly opt-in, making it clear which VMs are participating.
+
+Users must also enable VSOCK on the VM spec. VSOCK (`AF_VSOCK`) is a socket address family for guest-host communication using the
+virtio-vsock transport. KubeVirt already has VSOCK support with per-VMI CID assignment by virt-controller.
+
+DCGM 4.5.0 added native support for listening on the VSOCK protocol. The DCGM daemon (`nv-hostengine`) inside the guest can be configured
+to listen on a VSOCK port, allowing virt-launcher on the host to connect and query GPU metrics using DCGM's client protocol. This provides
+proper socket semantics including flow control and connection state detection.
+
+**Advantages over virtio-serial:**
+- Standard socket API with flow control and connection state detection.
+- No data transfer size limitations (virtio-serial Windows drivers fail WriteFile calls >2MB).
+- Already supported by KubeVirt with per-VMI CID assignment.
+- DCGM natively supports VSOCK, eliminating the need for a custom guest agent.
+
+**Downsides:**
+- Requires Linux kernel 4.8+ in the guest; older kernels have no support.
+- Windows guests require virtio-win drivers with VSOCK support.
+
+**Shared VSOCK usage:**
+
+KubeVirt already uses VSOCK for two purposes: an internal gRPC service on VSOCK port 1 for TLS certificate distribution to guests, and
+user-initiated port-forwarding via `virtctl vsock`. GPU metrics collection adds another consumer on the same virtio-vsock device.
+Limitations to be aware of:
+
+- **Bandwidth sharing**: All VSOCK traffic (KubeVirt internal, `virtctl vsock`, and DCGM metrics) shares a single virtio-vsock device per
+VM. There is no QoS or prioritization between consumers.
+- **No connection isolation**: VSOCK multiplexes connections by port number on the same device.
+- **Port management**: DCGM must listen on a VSOCK port that does not conflict with port 1 (reserved by KubeVirt for its internal gRPC
+service) or other guest services listening on VSOCK.
+
+In practice, the GPU metrics payload is small (a few KB per scrape) and collected at most once every 3.25 seconds, so contention is unlikely
+under normal operation.
+
+### 2. Guest: DCGM with VSOCK
+
+NVIDIA DCGM runs inside the guest VM as the GPU metrics provider. The DCGM daemon (`nv-hostengine`) is configured to listen on a VSOCK
+port, accepting connections from the host. Users are responsible for installing and configuring DCGM in the guest.
+
+DCGM collects GPU metrics via NVML and exposes them through its client API. The guest only needs DCGM installed and configured to listen
+on VSOCK; no additional KubeVirt-specific agent is required.
+
+The metrics collected from DCGM include GPU utilization, memory usage, temperature, power consumption, ECC errors, encoder/decoder
+utilization, and running process counts.
+
+### 3. virt-launcher: GPU Metrics in GetVMStats
+
+GPU metrics are integrated into the `GetVMStats` gRPC handler introduced by [VEP #143](https://github.com/kubevirt/enhancements/pull/81).
+A new `GpuStatsRequest` / `gpuStats` field is added to `VMStatsRequest` / `VMStatsResponse`, following the same pattern as the other data
+categories (domain stats, guest info, filesystems, etc.).
+
+**DomainManager (`LibvirtDomainManager`)**: When the VMI carries the `kubevirt.io/gpu-metrics-collector: "dcgm-vsock"` annotation,
+a `gpuMetricsCache` (`TimeDefinedCache[string]`, 3250ms TTL) caches the metrics from DCGM. The recalculation function
+(`scrapeGPUMetrics`) connects to the guest's DCGM via VSOCK (using the VMI's CID and a well-known port), queries GPU metrics through the
+DCGM client protocol, and returns the response. This follows the same caching pattern as `domainStatsCache` for domain stats. If the
+annotation is absent, no VSOCK connection is attempted.
+
+**GetVMStats handler**: When the caller includes `GpuStatsRequest` in the `VMStatsRequest`, the handler reads from `gpuMetricsCache` and
+populates the `gpuStats` field in the response. This keeps GPU metrics collection consistent with the unified monitoring data pipeline.
+
+### 4. virt-handler: Requesting GPU Stats
+
+virt-handler includes `GpuStatsRequest` when calling `GetVMStats` on each virt-launcher. The GPU metrics data is returned alongside domain
+stats and other monitoring data in the same `GetVMStats` response.
+
+### 5. observability-controller: Emitting Prometheus Metrics
+
+The observability-controller ([VEP #143](https://github.com/kubevirt/enhancements/pull/81)) queries virt-handler to collect runtime VM data.
+GPU metrics are collected as part of this existing flow. The controller parses the `gpuStats` data from `GetVMStats` responses and emits
+`kubevirt_vmi_gpu_*` Prometheus metrics.
+
+The `gpuMetrics` resource metrics implementation emits collector results for each GPU device found in the response. This follows the same
+`resourceMetrics` pattern used for CPU, memory, block, network, and filesystem metrics.
+
+### Metrics Emitted
+
+| Metric | Type | Description |
+|--------|------|-------------|
+| `kubevirt_vmi_gpu_utilization_percent` | Gauge | GPU compute utilization (0-100) |
+| `kubevirt_vmi_gpu_memory_utilization_percent` | Gauge | GPU memory controller utilization (0-100) |
+| `kubevirt_vmi_gpu_memory_used_bytes` | Gauge | GPU memory used in bytes |
+| `kubevirt_vmi_gpu_memory_total_bytes` | Gauge | GPU total memory in bytes |
+| `kubevirt_vmi_gpu_temperature_celsius` | Gauge | GPU temperature in degrees Celsius |
+| `kubevirt_vmi_gpu_power_usage_milliwatts` | Gauge | GPU power draw in milliwatts |
+| `kubevirt_vmi_gpu_ecc_errors_single_bit_total` | Gauge | Lifetime corrected ECC error count |
+| `kubevirt_vmi_gpu_ecc_errors_double_bit_total` | Gauge | Lifetime uncorrected ECC error count |
+| `kubevirt_vmi_gpu_encoder_utilization_percent` | Gauge | Video encoder utilization (0-100) |
+| `kubevirt_vmi_gpu_decoder_utilization_percent` | Gauge | Video decoder utilization (0-100) |
+| `kubevirt_vmi_gpu_running_processes` | Gauge | Number of compute processes on the GPU |
+
+All per-device metrics carry labels: `node`, `namespace`, `name`, `gpu_index`, `gpu_uuid`, `gpu_name`, plus VMI labels prefixed with
+`kubernetes_vmi_label_`.
+
+## API Examples
+
+Users must enable VSOCK on the VM spec and have DCGM installed and listening on VSOCK inside the guest:
+
+```yaml
+apiVersion: kubevirt.io/v1
+kind: VirtualMachineInstance
+metadata:
+  name: gpu-workload
+  annotations:
+    kubevirt.io/gpu-metrics-collector: "dcgm-vsock"
+spec:
+  domain:
+    devices:
+      autoattachVSOCK: true
+      gpus:
+        - name: gpu1
+          deviceName: nvidia.com/A100
+```
+
+## Alternatives
+
+### Custom Guest Agent via Virtio-Serial
+
+A standalone Go binary (`gpu-metrics-agent`) runs inside the guest, collects GPU metrics via NVML, and communicates with the host over a
+dedicated virtio-serial channel using a simple text protocol (`GET\n` -> JSON response).
+
+**Rejected because:**
+- Requires maintaining a separate guest agent repository and release lifecycle.
+- Virtio-serial lacks flow control and connection state detection.
+- Windows virtio-serial drivers have known issues with large data transfers (>2MB).
+- DCGM 4.5.0's native VSOCK support makes a custom agent unnecessary.
+
+### Custom Guest Agent via VSOCK
+
+Same as above but using VSOCK instead of virtio-serial as the transport.
+
+**Rejected because:**
+- Still requires maintaining a custom guest agent when DCGM can serve metrics directly.
+
+### Host-Side GPU Metrics (DCGM / Node Exporter)
+
+Collect GPU metrics from the host using NVIDIA DCGM or the GPU node exporter.
+
+**Rejected because:**
+- The NVIDIA GPU Operator does not deploy DCGM exporter on nodes where GPUs are configured for passthrough or vGPU, because the host no
+longer has direct access to the device.
+
+### QEMU Guest Agent guest-file-read
+
+The guest writes GPU metrics to a file, and the host reads it via QGA's `guest-file-open`, `guest-file-read`, and `guest-file-close`
+commands.
+
+**Rejected because:**
+- Each scrape requires three QGA round-trips, adding latency.
+- Reading while writing can produce partial or corrupt data.
+- Enabling `guest-file-read` allows reading arbitrary guest files, requiring careful security analysis.
+
+### QEMU Guest Agent guest-exec
+
+The host uses QGA `guest-exec` to run a metrics collection command inside the guest.
+
+**Rejected because:**
+- `guest-exec` is disabled by default in many distributions (e.g., RHEL/CentOS) due to security concerns.
+- Common SELinux issues blocking executed commands.
+- Output is base64-encoded and must be polled, adding latency.
+
+### Exposing DCGM via regular Kubernetes Networking
+
+Instead of using VSOCK, DCGM inside the guest could listen on a standard network
+interface and be exposed to Prometheus via Kubernetes Services and
+ServiceMonitors.
+
+**Rejected because:**
+
+- **Additional resource overhead**: Each VM would need a dedicated Service and
+ServiceMonitor created and deleted in sync with the VM lifecycle. This scales
+with the number of GPU VMs and adds complexity that does not exist with the
+VSOCK approach.
+
+- **Network dependency**: Requires the guest to have a network interface on the
+pod network. Not all VM use-cases will have usable network configurations.
+SR-IOV only, isolated via Multus, or no network connectivity at all. VSOCK is
+independent of the networking configuration.
+
+- **Security**: With VSOCK, communication is scoped to host-guest only and is
+managed entirely by KubeVirt. virt-handler's Service and ServiceMonitor are the
+only externally reachable endpoints where this data will be exposed, and their
+security is handled by KubeVirt. Exposing DCGM on a network interface shifts
+this responsibility to the user, who must secure DCGM against access from other
+sources.
+
+### Dedicated GetGPUMetrics gRPC RPC
+
+A separate `GetGPUMetrics` RPC on the `Cmd` gRPC service, called by virt-handler alongside `GetDomainStats` and `GetFilesystems`.
+
+**Rejected because:**
+- VEP #143 introduces a unified `GetVMStats` RPC that consolidates all monitoring data into a single call. Adding a separate RPC for GPU
+metrics would work against that consolidation goal.
+- GPU metrics fit naturally as a new data category within `GetVMStats`, following the same pattern as domain stats, guest info, and
+filesystems.
+
+## Scalability
+
+- **Caching**: GPU metrics are cached in virt-launcher with a 3.25-second TTL, so multiple scrapes within that window reuse the same data
+without reconnecting to DCGM.
+- **Unified collection**: GPU metrics are fetched as part of the `GetVMStats` call, adding no additional gRPC round-trips between
+virt-handler and virt-launcher.
+- **No persistent connections**: The host does not maintain long-lived connections to DCGM in the guest.
+- **Scale**: Comparable to the existing domain stats and filesystem stats collection, which already collect per-VMI data as part of
+`GetVMStats`.
+
+## Update/Rollback Compatibility
+
+- VSOCK must be enabled per-VMI by the user. Once the `GPUMetrics` feature gate is implemented, disabling it or rolling back will stop GPU
+metrics collection for new VMIs; existing running VMIs are unaffected.
+- This VEP depends on the `GetVMStats` RPC from VEP #143. If VEP #143 is not yet implemented, GPU metrics cannot be collected.
+- GPU metrics collection is opt-in via the `kubevirt.io/gpu-metrics-collector` annotation. If the annotation is absent, DCGM is not
+installed, or VSOCK is not enabled, virt-launcher returns empty `gpuStats` and no GPU metrics are emitted for that VMI.
+- No API changes beyond the annotation and requiring `autoattachVSOCK: true`; no migration compatibility concerns.
+
+## Functional Testing Approach
+
+- **Unit tests**: Test the GPU stats handler within `GetVMStats` with mock VSOCK responses (success, error, timeout, DCGM not running).
+- **Unit tests**: Test VSOCK connection setup for VMIs with GPU devices and VSOCK enabled vs. absent.
+- **Integration tests**: Start a VMI with a mock DCGM VSOCK listener, verify `kubevirt_vmi_gpu_*` metrics are emitted from the
+observability-controller metrics endpoint.
+
+## Future Work
+
+### Windows Guest Support
+
+Windows VSOCK driver support was added recently in virtio-win build 285. Once the driver matures and DCGM's VSOCK support on Windows is
+validated, Windows guest support can be added as a follow-up.
+
+## Implementation History
+
+## Graduation Requirements
+
+### Alpha
+
+- [ ] Feature gate `GPUMetrics` guards all code changes
+- [ ] `GpuStatsRequest` / `gpuStats` field added to `GetVMStats` proto messages (depends on VEP #143)
+- [ ] virt-launcher connects to guest DCGM via VSOCK and populates `gpuStats` in `GetVMStats` response
+- [ ] observability-controller parses GPU stats and emits Prometheus metrics
+- [ ] Unit tests for GPU stats collection, VSOCK connection, and DCGM protocol handling
+- [ ] Documentation for enabling VSOCK and installing/configuring DCGM in the guest
+
+### Beta
+
+- [ ] Integration tests with mock DCGM VSOCK listener in kubevirtci
+- [ ] Prometheus recording rules and/or alerts for common GPU failure scenarios
+- [ ] DCGM version compatibility validated (minimum version requirements documented)
+
+### GA
+
+- [ ] Stable for at least two releases with no breaking changes