feat: add csi-volume-device-exporter #1
A Prometheus exporter DaemonSet that maps CSI volumes to their underlying node block devices, enabling correlation of storage path health metrics with Kubernetes workloads.

Components:
- Discovery engine (kubelet vol_data.json + mountinfo, Trident, HPE)
- Prometheus metrics (csi_volume_node_device_info + self-monitoring)
- Alerts (CSIVolumeMultipathDegraded, CSIVolumeDeviceExporterDown)
- Alert unit tests (promtool)
- Deployment manifests (DaemonSet, PodMonitor, SCC)
- CI workflow (verify: test, lint, alert tests)

Signed-off-by: Shirly Radco <sradco@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
awels
left a comment
So what about discovering other vendors like Dell, Portworx, etc.? I am assuming they all have their own JSON format. That will be annoying to maintain.
@@ -0,0 +1,28 @@
apiVersion: security.openshift.io/v1
Do we really need a special SCC? Can't we use any of the privileged ones that already exist?
I have now added a comment about this in the code.
I considered the built-in SCCs.
The closest is node-exporter, but it is owned by the cluster-monitoring operator. Binding our ServiceAccount to it would create a hidden dependency on another operator's internals, which could break silently if they change it. A custom least-privilege SCC (as done by virt-handler and other CNV DaemonSets) is the correct pattern here. The justification is now documented in a comment in deploy/scc.yaml.
1. Check the multipath device state:
kubectl debug node/$NODE -it --image=registry.access.redhat.com/ubi9/ubi-minimal \
This is kind of a mix of U/S and D/S commands and images. I don't think registry.access.redhat.com is publicly visible, maybe point to quay.io instead?
1. Check the overall multipath state on the affected node:
kubectl debug node/$NODE -it --image=registry.access.redhat.com/ubi9/ubi-minimal \
This is kind of a mix of U/S and D/S commands and images. I don't think registry.access.redhat.com is publicly visible, maybe point to quay.io instead?
1. Check the NVMe subsystem controller states:
kubectl debug node/$NODE -it --image=registry.access.redhat.com/ubi9/ubi-minimal \
This is kind of a mix of U/S and D/S commands and images. I don't think registry.access.redhat.com is publicly visible, maybe point to quay.io instead?
1. Check NVMe subsystem state:
kubectl debug node/$NODE -it --image=registry.access.redhat.com/ubi9/ubi-minimal \
This is kind of a mix of U/S and D/S commands and images. I don't think registry.access.redhat.com is publicly visible, maybe point to quay.io instead?
akalenyu
left a comment
Really preliminary pass, interesting stuff
When a storage path degrades (e.g., a Fibre Channel link drops or an NVMe-oF controller dies), this exporter — combined with new node_exporter collectors — enables alerts that identify which PVs and VMs are affected.
Have you considered leveraging KubeVirt's PausedIOError condition to possibly achieve similar alerts? https://kubevirt.io/user-guide/storage/disks_and_volumes/#error-policy
The exporter has a universal discovery path that reads kubelet's own CSI metadata (vol_data.json + mountinfo). This works for every CSI driver without any driver-specific code. The Trident/HPE modules are optional enrichment for cases where those drivers expose extra metadata in their own JSON. We don't need to add per-vendor code for basic functionality.
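To make the universal discovery path concrete, here is a minimal Python sketch of the idea, not the exporter's actual code. It assumes kubelet mounts filesystem-mode CSI volumes under `.../pods/<pod-uid>/volumes/kubernetes.io~csi/<pv-name>/mount` and that `vol_data.json` carries `driverName` and `volumeHandle` keys. The function names and sample data are hypothetical, and block-mode volumes and escaped characters in mount points are ignored.

```python
import json
import re
from typing import Dict, List, Optional

# Assumed kubelet layout for filesystem-mode CSI volumes (an assumption,
# not taken from the PR): .../pods/<uid>/volumes/kubernetes.io~csi/<pv>/mount
CSI_MOUNT_RE = re.compile(
    r"/pods/(?P<pod_uid>[^/]+)/volumes/kubernetes\.io~csi/(?P<pv>[^/]+)/mount$"
)


def parse_mountinfo_line(line: str) -> Optional[Dict[str, str]]:
    """Extract device id, mount point, fs type and source from one mountinfo line."""
    # Fields before " - " are positional; optional fields precede the separator.
    left, sep, right = line.strip().partition(" - ")
    if not sep:
        return None
    pre, post = left.split(), right.split()
    if len(pre) < 5 or len(post) < 2:
        return None
    return {
        "major_minor": pre[2],
        "mount_point": pre[4],
        "fs_type": post[0],
        "source": post[1],
    }


def map_csi_volumes(mountinfo_text: str, vol_data_by_pv: Dict[str, dict]) -> List[dict]:
    """Join CSI mount entries with parsed vol_data.json content, keyed by PV name."""
    rows = []
    for line in mountinfo_text.splitlines():
        entry = parse_mountinfo_line(line)
        if entry is None:
            continue
        match = CSI_MOUNT_RE.search(entry["mount_point"])
        if match is None:
            continue  # not a kubelet CSI volume mount
        meta = vol_data_by_pv.get(match.group("pv"), {})
        rows.append({
            "pv": match.group("pv"),
            "pod_uid": match.group("pod_uid"),
            "device": entry["source"],
            "major_minor": entry["major_minor"],
            "driver": meta.get("driverName", ""),
            "volume_handle": meta.get("volumeHandle", ""),
        })
    return rows


# Hypothetical sample data, for illustration only.
SAMPLE_MOUNTINFO = """\
22 1 8:1 / / rw,relatime shared:1 - xfs /dev/sda1 rw
310 22 253:3 / /var/lib/kubelet/pods/1234-abcd/volumes/kubernetes.io~csi/pvc-42/mount rw,relatime shared:99 - ext4 /dev/mapper/mpatha rw
"""
SAMPLE_VOL_DATA = json.loads(
    '{"driverName": "csi.example.com", "volumeHandle": "vol-0001", "specVolID": "pvc-42"}'
)

if __name__ == "__main__":
    for row in map_csi_volumes(SAMPLE_MOUNTINFO, {"pvc-42": SAMPLE_VOL_DATA}):
        print(row)
```

Nothing in this sketch depends on a vendor-specific file, which is the point being made above: the mount table and kubelet's own metadata are enough for the basic volume-to-device mapping.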
+1. Additionally, our alerts cover any PV-backed workload (not just VMs) and identify the root cause (which FC link, NVMe controller, or fabric segment failed), which
Not just yet, I still need some time. This is not a trivial path to take, so while the implementation may be sound, I am missing some back-and-forth on the approach itself (there are some security concerns as well, with hostPID for instance). Meanwhile, I noticed this PR is linked to a csi-addons change, which sounds interesting. Could you elaborate?
For example, what if we take the csi-addons path, and have each driver implement the RPC and tell us this information?
Thank you! This is a very good point.
@akalenyu, on the csi-addons path: I explored this (hence the linked closed PR). More fundamentally, even if each CSI driver exposed block device names via an RPC, that would solve only half the problem. The actual path health state comes from ... Users today are blind: we have new multipath/NVMe path health metrics in ...
Add three new alerts:
- CSIVolumeMultipathLost (critical): all paths to a multipath device are down, I/O is likely failing
- CSIVolumeNVMeSubsystemDegraded (warning): NVMe-oF subsystem has at least one non-live controller path
- CSIVolumeNVMeSubsystemLost (critical): all NVMe-oF controller paths are dead

The NVMe alerts use node_nvmesubsystem_namespace_info to precisely map NVMe namespace devices (nvme0n1) to their subsystems, enabling correct correlation even on nodes with multiple NVMe subsystems.

Includes runbooks and promtool unit tests for all new alerts.

Signed-off-by: Shirly Radco <sradco@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
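The correlation described in this commit message can be sketched in Python as a three-way join: volume to namespace device, namespace device to subsystem, subsystem to controller states. This is only an illustration of the logic under stated assumptions. The real alerts are Prometheus rules, and the function name, input shapes, and sample identifiers below are hypothetical. The sketch reports only the most severe condition per volume, whereas the rule definitions above would let the "degraded" alert fire alongside "lost".

```python
from typing import Dict, Iterable, List, Tuple


def classify_nvme_volumes(
    volume_devices: Iterable[Tuple[str, str]],
    namespace_subsystem: Dict[str, str],
    controller_states: Dict[str, List[str]],
) -> Dict[str, str]:
    """Return the NVMe path health ("ok", "degraded", "lost") per PV.

    volume_devices:      (pv_name, device) pairs, e.g. ("pvc-a", "nvme0n1")
    namespace_subsystem: namespace device -> subsystem name
    controller_states:   subsystem name -> state of each controller path
    """
    health = {}
    for pv, device in volume_devices:
        subsystem = namespace_subsystem.get(device)
        if subsystem is None:
            continue  # not an NVMe namespace device, nothing to correlate
        states = controller_states.get(subsystem, [])
        if not states:
            continue  # no controller data for this subsystem, state unknown
        live = sum(1 for state in states if state == "live")
        if live == 0:
            health[pv] = "lost"      # all controller paths are non-live
        elif live < len(states):
            health[pv] = "degraded"  # at least one non-live controller path
        else:
            health[pv] = "ok"
    return health


if __name__ == "__main__":
    # Hypothetical node with several NVMe subsystems and one SCSI disk.
    print(classify_nvme_volumes(
        [("pvc-a", "nvme0n1"), ("pvc-b", "nvme1n1"), ("pvc-c", "sda")],
        {"nvme0n1": "nvme-subsys0", "nvme1n1": "nvme-subsys1"},
        {"nvme-subsys0": ["live", "live"], "nvme-subsys1": ["live", "connecting"]},
    ))
```

Keying the lookup on the namespace device rather than on the node is what keeps the result correct when a node has more than one NVMe subsystem: a failure in one subsystem does not mark volumes on another as affected.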
Summary
A Prometheus exporter DaemonSet that maps CSI volumes to their underlying node block devices, enabling correlation of storage path health metrics with Kubernetes workloads.
When a storage path degrades (e.g., a Fibre Channel link drops or an NVMe-oF controller dies), this exporter, combined with new node_exporter collectors, enables alerts that identify which PVs and VMs are affected.
Jira: https://redhat.atlassian.net/browse/CNV-66837
Components
- vol_data.json + /proc/1/mountinfo for universal CSI driver coverage. Also supports driver-specific JSON (Trident, HPE).
- csi_volume_node_device_info maps each CSI volume to its block device, plus self-monitoring metrics.
- CSIVolumeMultipathDegraded (warning) - PV-backed multipath has non-active paths
- CSIVolumeMultipathLost (critical) - All multipath paths down, I/O failing
- CSIVolumeNVMeSubsystemDegraded (warning) - NVMe-oF subsystem has non-live controllers
- CSIVolumeNVMeSubsystemLost (critical) - All NVMe-oF controllers dead
- CSIVolumeDeviceExporterDown (warning) - Exporter not scraped
- hack/prom-rule-ci/.
Security model
/var/lib/kubelet, /proc, /sys)
Related PRs
Test plan
- make test
- make test-alerts - covers all 5 alerts
- make lint