feat: add csi-volume-device-exporter by sradco · Pull Request #1 · openshift-virtualization/csi-volume-device-exporter

sradco · 2026-05-04T07:35:54Z

Summary

A Prometheus exporter DaemonSet that maps CSI volumes to their underlying node block devices, enabling correlation of storage path health metrics with Kubernetes workloads.

When a storage path degrades (e.g., a Fibre Channel link drops or an NVMe-oF controller dies), this exporter, combined with new node_exporter collectors, enables alerts that identify which PVs and VMs are affected.

Jira: https://redhat.atlassian.net/browse/CNV-66837

Components

Discovery engine - Reads kubelet vol_data.json + /proc/1/mountinfo for universal CSI driver coverage. Also supports driver-specific JSON (Trident, HPE).
Prometheus metrics - csi_volume_node_device_info maps each CSI volume to its block device, plus self-monitoring metrics.
Alerts:
- CSIVolumeMultipathDegraded (warning) - PV-backed multipath has non-active paths
- CSIVolumeMultipathLost (critical) - All multipath paths down, I/O failing
- CSIVolumeNVMeSubsystemDegraded (warning) - NVMe-oF subsystem has non-live controllers
- CSIVolumeNVMeSubsystemLost (critical) - All NVMe-oF controllers dead
- CSIVolumeDeviceExporterDown (warning) - Exporter not scraped
Alert unit tests - promtool-based tests in hack/prom-rule-ci/.
Deployment manifests - DaemonSet, PodMonitor, SCC for OpenShift.
Runbooks - Actionable runbooks for each alert.
CI workflow - GitHub Actions verify (test, lint, alert tests).
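
As a rough illustration of what an info-style metric such as csi_volume_node_device_info looks like on the wire, the sketch below renders one series per volume in the Prometheus text exposition format. It is not the exporter's code: the label names (persistentvolume, device, driver), the HELP string, and both function names are assumptions made up for the example.

```python
def escape_label_value(value):
    """Escape backslash, newline and double quote, as the text format requires."""
    return value.replace("\\", "\\\\").replace("\n", "\\n").replace('"', '\\"')


def render_info_metric(name, help_text, rows):
    """Render an info-style gauge: one series per row, value fixed at 1."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels in rows:
        pairs = ",".join(
            f'{key}="{escape_label_value(str(val))}"'
            for key, val in sorted(labels.items())
        )
        lines.append(f"{name}{{{pairs}}} 1")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample row; real label names may differ.
    rows = [{"persistentvolume": "pvc-abc", "device": "dm-3", "driver": "example.csi.driver"}]
    print(render_info_metric("csi_volume_node_device_info",
                             "Maps a CSI volume to its node block device.", rows), end="")
```

The value is always 1; the information lives in the labels, which is what makes the series usable as a join key in alert expressions.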

Security model

Non-privileged container (no capabilities, read-only rootfs)
Read-only hostPath mounts (/var/lib/kubelet, /proc, /sys)
No Kubernetes API access
UBI 9 Minimal base image, static binary

Related PRs

blockdevice: add DMMultipathDevices for DM-multipath sysfs parsing prometheus/procfs#796 (DM-multipath sysfs parsing)
sysfs: add NVMeSubsystemClass for NVMe-oF subsystem parsing prometheus/procfs#797 (NVMe-oF subsystem sysfs parsing with namespace discovery)
collector: add dmmultipath collector for DM-multipath sysfs metrics prometheus/node_exporter#3581 (dmmultipath collector)
collector: add nvmesubsystem collector for NVMe-oF path health prometheus/node_exporter#3579 (nvmesubsystem collector with namespace_info)

Test plan

Unit tests pass (make test)
Alert unit tests pass (make test-alerts) — covers all 5 alerts
Cluster validation on OpenShift 4.21 (Ceph RBD + Cinder)
PromQL three-way join validated with synthetic sysfs fixtures
Lint (make lint)

A Prometheus exporter DaemonSet that maps CSI volumes to their underlying node block devices, enabling correlation of storage path health metrics with Kubernetes workloads.

Components:
- Discovery engine (kubelet vol_data.json + mountinfo, Trident, HPE)
- Prometheus metrics (csi_volume_node_device_info + self-monitoring)
- Alerts (CSIVolumeMultipathDegraded, CSIVolumeDeviceExporterDown)
- Alert unit tests (promtool)
- Deployment manifests (DaemonSet, PodMonitor, SCC)
- CI workflow (verify: test, lint, alert tests)

Signed-off-by: Shirly Radco <sradco@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

awels

So what about discovering other vendors like Dell, Portworx, etc.? I am assuming they all have their own JSON format. That will be annoying to maintain.

awels · 2026-05-04T12:36:17Z

@@ -0,0 +1,28 @@
+apiVersion: security.openshift.io/v1


Do we really need a special SCC? Can't we use any of the privileged ones that already exist?

Commented about it in the code now

I considered the built-in SCCs.
The closest is node-exporter, but it's owned by the cluster-monitoring operator, so binding our ServiceAccount to it would create a hidden dependency on another operator's internals that could break silently if they change it. A custom least-privilege SCC (as done by virt-handler and other CNV DaemonSets) is the correct pattern here. The justification is now documented in a comment in deploy/scc.yaml.

@awels do you agree?

awels · 2026-05-04T12:50:29Z

+1. Check the multipath device state:
+
+   ```bash
+   kubectl debug node/$NODE -it --image=registry.access.redhat.com/ubi9/ubi-minimal \


This is kind of a mix of U/S and D/S commands and images. I don't think registry.access.redhat.com is publicly visible, maybe point to quay.io instead?

awels · 2026-05-04T12:50:48Z

+1. Check the overall multipath state on the affected node:
+
+   ```bash
+   kubectl debug node/$NODE -it --image=registry.access.redhat.com/ubi9/ubi-minimal \


This is kind of a mix of U/S and D/S commands and images. I don't think registry.access.redhat.com is publicly visible, maybe point to quay.io instead?

awels · 2026-05-04T12:51:04Z

+1. Check the NVMe subsystem controller states:
+
+   ```bash
+   kubectl debug node/$NODE -it --image=registry.access.redhat.com/ubi9/ubi-minimal \


This is kind of a mix of U/S and D/S commands and images. I don't think registry.access.redhat.com is publicly visible, maybe point to quay.io instead?

awels · 2026-05-04T12:51:10Z

+1. Check NVMe subsystem state:
+
+   ```bash
+   kubectl debug node/$NODE -it --image=registry.access.redhat.com/ubi9/ubi-minimal \


This is kind of a mix of U/S and D/S commands and images. I don't think registry.access.redhat.com is publicly visible, maybe point to quay.io instead?

akalenyu

Really preliminary pass, interesting stuff

When a storage path degrades (e.g., a Fibre Channel link drops or an NVMe-oF controller dies), this exporter — combined with new node_exporter collectors — enables alerts that identify which PVs and VMs are affected.

Have you considered leveraging kubevirt's PausedIOError condition to possibly achieve similar alerts? https://kubevirt.io/user-guide/storage/disks_and_volumes/#error-policy

sradco · 2026-05-05T15:20:56Z

So what about discovering other vendors like Dell, Portworx, etc.? I am assuming they all have their own JSON format. That will be annoying to maintain.

The exporter has a universal discovery path that reads kubelet's own CSI metadata (vol_data.json + mountinfo) — this works for every CSI driver without any driver-specific code. The Trident/HPE modules are optional enrichment for cases where those drivers expose extra metadata in their own JSON. We don't need to add per-vendor code for basic functionality.
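
To make the universal path a bit more concrete, here is a minimal sketch of the vol_data.json half of it. It assumes the directory layout commonly seen under /var/lib/kubelet (pods/<pod-uid>/volumes/kubernetes.io~csi/<pv-name>/vol_data.json) and the kubelet field names driverName, volumeHandle and specVolID; the function name read_vol_data and the returned dictionary keys are hypothetical and not taken from the exporter.

```python
import json
from pathlib import Path

# Assumed kubelet layout for filesystem-mode CSI volumes.
VOL_DATA_GLOB = "pods/*/volumes/kubernetes.io~csi/*/vol_data.json"


def read_vol_data(kubelet_root):
    """Return one record per readable vol_data.json under kubelet_root."""
    records = []
    for path in sorted(Path(kubelet_root).glob(VOL_DATA_GLOB)):
        try:
            data = json.loads(path.read_text())
        except (OSError, json.JSONDecodeError):
            continue  # volume may be mid-teardown; skip rather than fail
        if not isinstance(data, dict):
            continue
        records.append({
            "pod_uid": path.parts[-5],
            "pv_name": data.get("specVolID", path.parent.name),
            "driver": data.get("driverName", ""),
            "volume_handle": data.get("volumeHandle", ""),
            # The mount point is what gets looked up in mountinfo afterwards.
            "mount_point": str(path.parent / "mount"),
        })
    return records
```

Nothing here depends on which CSI driver wrote the volume, which is the point of the universal path: the mount_point from each record can then be looked up in mountinfo to find the backing device.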

sradco · 2026-05-05T15:29:24Z

Really preliminary pass, interesting stuff

When a storage path degrades (e.g., a Fibre Channel link drops or an NVMe-oF controller dies), this exporter — combined with new node_exporter collectors — enables alerts that identify which PVs and VMs are affected.

Have you considered leveraging kubevirt's PausedIOError condition to possibly achieve similar alerts? https://kubevirt.io/user-guide/storage/disks_and_volumes/#error-policy

+1. PausedIOError fires after I/O has already failed and the VM is paused: it's a "damage done" reactive signal. Our multipath/NVMe alerts fire on path degradation, when some paths are unhealthy but I/O may still be working via surviving paths - that's the early warning window to act before workloads are impacted.

Additionally, our alerts cover any PV-backed workload (not just VMs) and identify the root cause (which FC link, NVMe controller, or fabric segment failed), which PausedIOError doesn't provide.
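
The difference between the "degraded" and "lost" severities can be sketched as a small classification over per-path states. This is only an illustration of the idea, not the alert rules in this PR: the function name, the "unknown" result for an empty list, and the exact state strings are assumptions.

```python
def classify_paths(states, healthy_state):
    """Classify a device from its path states.

    "ok"       - every path is healthy
    "degraded" - some paths are unhealthy, at least one survives (early warning)
    "lost"     - no healthy path left, I/O is likely failing
    "unknown"  - no paths reported at all
    """
    if not states:
        return "unknown"
    healthy = sum(1 for state in states if state == healthy_state)
    if healthy == len(states):
        return "ok"
    if healthy == 0:
        return "lost"
    return "degraded"


if __name__ == "__main__":
    # Multipath example: one of two paths has failed, I/O can still flow.
    print(classify_paths(["active", "failed"], healthy_state="active"))
    # NVMe-oF example: every controller is dead.
    print(classify_paths(["dead", "dead"], healthy_state="live"))
```

The "degraded" branch is the window described above: redundancy is reduced, but surviving paths may still be serving I/O.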

sradco · 2026-05-18T09:29:43Z

@awels, @akalenyu, do you approve this PR?

akalenyu · 2026-05-18T10:04:37Z

@awels, @akalenyu, do you approve this PR?

Not just yet, still need some time. This is not a trivial path to take, so, while the impl. may be sound, I am missing some back and forth on the approach itself (there are some security concerns as well, with hostPID for instance).

Meanwhile, I noticed this PR is linked to a csi-addons change, which sounds interesting. Could you elaborate?

akalenyu · 2026-05-18T10:16:21Z

@awels, @akalenyu, do you approve this PR?

Not just yet, still need some time. This is not a trivial path to take, so, while the impl. may be sound, I am missing some back and forth on the approach itself (there are some security concerns as well, with hostPID for instance).

Meanwhile, I noticed this PR is linked to a csi-addons change, which sounds interesting. Could you elaborate?

For example, what if we take the csi-addons path and have each driver implement the RPC and tell us this information?
Here's what it looks like in practice: https://github.com/ceph/ceph-csi/blob/a5474f81497297aa9cd341e12529923b1349084d/internal/csi-addons/rbd/reclaimspace.go#L120-L123

sradco · 2026-05-20T07:34:32Z

@awels, @akalenyu, do you approve this PR?

Not just yet, still need some time. This is not a trivial path to take, so, while the impl. may be sound, I am missing some back and forth on the approach itself (there are some security concerns as well, with hostPID for instance).

Thank you! This is a very good point.
I will remove hostPID: true from daemonset.yaml, change the host-proc volume from hostPath: /proc to hostPath: /proc/1/mountinfo, update the volume mount path accordingly, and update ParseMountInfo to accept the direct file path instead of a proc directory.
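
For illustration, a parser that takes the mountinfo file path directly (rather than a proc directory) could look roughly like the sketch below. It is written in Python purely as an example and is not the exporter's ParseMountInfo; the function names and the returned field names are assumptions. The field positions follow the documented /proc/[pid]/mountinfo format, where a lone "-" separates the optional fields from the filesystem type and mount source.

```python
import re

_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def _unescape(field):
    """Decode the octal escapes mountinfo uses for space, tab, newline, backslash."""
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), field)


def parse_mountinfo_text(text):
    """Map each mount point to its major:minor, filesystem type and source."""
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        try:
            # Fields 0-5 are fixed; optional fields follow until the "-" separator.
            sep = fields.index("-", 6)
        except ValueError:
            continue  # malformed or truncated line
        if len(fields) < sep + 3:
            continue
        mounts[_unescape(fields[4])] = {
            "major_minor": fields[2],
            "fstype": fields[sep + 1],
            "source": _unescape(fields[sep + 2]),
        }
    return mounts


def parse_mountinfo(path="/proc/1/mountinfo"):
    """Read a single mountinfo file; no access to the rest of /proc is needed."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return parse_mountinfo_text(handle.read())
```

Taking a file path instead of a directory means the container only needs that one file mounted read-only, which fits the narrower hostPath described above.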

Meanwhile, I noticed this PR is linked to a csi-addons change, which sounds interesting. Could you elaborate?

@akalenyu, On the csi-addons path, I explored this (hence the linked closed PR).
The conclusion was that it's an architectural mismatch. kubernetes-csi-addons is a CSI API extension project, where every feature adds a gRPC operation that CSI drivers implement via a sidecar.
The exporter has no CRDs, no gRPC, and no sidecar; it's a standalone DaemonSet reading host files.
Adding it there would establish a completely new paradigm in that project, which is not appropriate without a community design discussion first.

More fundamentally: even if each CSI driver exposed block device names via an RPC, that would only solve half the problem. The actual path health state comes from node_exporter's multipath/NVMe metrics, and the exporter's job is to provide the join key (CSI volume → block device) to correlate those metrics with Kubernetes workloads.
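
The join-key idea can be sketched outside PromQL as a plain lookup on (node, device). The row shapes below are invented for the example: node, device, persistentvolume and state are hypothetical label names, and affected_volumes is not a function from this PR.

```python
from collections import defaultdict


def affected_volumes(volume_info, unhealthy_devices):
    """Join volume-to-device rows with path-health rows on (node, device)."""
    by_device = defaultdict(list)
    for row in volume_info:
        by_device[(row["node"], row["device"])].append(row["persistentvolume"])

    hits = []
    for dev in unhealthy_devices:
        # Devices with no CSI volume on them simply produce no result.
        for pv in by_device.get((dev["node"], dev["device"]), []):
            hits.append({
                "node": dev["node"],
                "device": dev["device"],
                "persistentvolume": pv,
                "state": dev["state"],
            })
    return hits


if __name__ == "__main__":
    volumes = [
        {"node": "worker-0", "device": "dm-3", "persistentvolume": "pvc-abc"},
        {"node": "worker-0", "device": "dm-4", "persistentvolume": "pvc-def"},
    ]
    unhealthy = [{"node": "worker-0", "device": "dm-3", "state": "degraded"}]
    for hit in affected_volumes(volumes, unhealthy):
        print(hit)
```

Without the volume-to-device rows, the health rows alone only say that dm-3 on worker-0 is degraded, not which PV sits on it.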

Users today are blind: we have new multipath/NVMe path health metrics in node_exporter but no way to tie them to impacted PVs or VMs. This exporter is the bridge that works for all drivers immediately, with no driver changes required.

Add three new alerts:
- CSIVolumeMultipathLost (critical): all paths to a multipath device are down, I/O is likely failing
- CSIVolumeNVMeSubsystemDegraded (warning): NVMe-oF subsystem has at least one non-live controller path
- CSIVolumeNVMeSubsystemLost (critical): all NVMe-oF controller paths are dead

The NVMe alerts use node_nvmesubsystem_namespace_info to precisely map NVMe namespace devices (nvme0n1) to their subsystems, enabling correct correlation even on nodes with multiple NVMe subsystems.

Includes runbooks and promtool unit tests for all new alerts.

Signed-off-by: Shirly Radco <sradco@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

sradco · 2026-05-20T10:09:03Z

@akalenyu @awels I updated the code based on the review. Please let me know what you think.
This is a critical-priority item.

sradco · 2026-05-20T10:10:19Z

Hi @jan--f, @jsafrane, I would appreciate your review of this PR.

sradco force-pushed the initial-exporter branch 2 times, most recently from 9e21efa to 3893e1d Compare May 4, 2026 07:54

sradco force-pushed the initial-exporter branch 4 times, most recently from fdaac67 to 6e760c0 Compare May 4, 2026 08:58

awels reviewed May 4, 2026


akalenyu reviewed May 5, 2026


sradco force-pushed the initial-exporter branch from 6e760c0 to 50af95b Compare May 5, 2026 15:36

sradco mentioned this pull request May 14, 2026

Add csi-volume-device-exporter: Prometheus exporter for CSI volume-to-device mapping csi-addons/kubernetes-csi-addons#1039

Closed


sradco force-pushed the initial-exporter branch from 50af95b to fdb99da Compare May 20, 2026 10:02
