KubeVirt metric names should align with Kubernetes metric names.
KubeVirt users should have the same experience when searching for node, container, pod, and virtual machine metrics.
Naming requirements:
- Check if a similar Kubernetes metric, for node, container or pod, exists and try to align to it.
- KubeVirt metrics prefixes:
  - Running VM metrics should have a `kubevirt_vmi_` prefix
  - HCO operator metrics should have a `kubevirt_hco_` prefix
  - Network operator metrics should have a `kubevirt_network_` prefix
  - Storage operator metrics should have a `kubevirt_cdi_` prefix
  - SSP operator metrics should have a `kubevirt_ssp_` prefix
  - HPP operator metrics should have a `kubevirt_hpp_` prefix

For example, see the following Kubernetes network metrics:
- node_network_receive_packets_total
- node_network_transmit_packets_total
- container_network_receive_packets_total
- container_network_transmit_packets_total
The corresponding KubeVirt metrics for a VMI should be:
- kubevirt_vmi_network_receive_packets_total
- kubevirt_vmi_network_transmit_packets_total

- Metric `Help` message MUST be verbose, since it is propagated to the metrics.md file when running `make-generate`.

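
The naming requirements above can be sketched as a small check. This is only an illustration, not part of KubeVirt: the helper names `has_kubevirt_prefix` and `vmi_name_for` are hypothetical, and the prefix list is copied from the requirements above.

```python
# Prefixes taken from the naming requirements above.
KUBEVIRT_PREFIXES = (
    "kubevirt_vmi_",
    "kubevirt_hco_",
    "kubevirt_network_",
    "kubevirt_cdi_",
    "kubevirt_ssp_",
    "kubevirt_hpp_",
)

# Kubernetes scopes whose metric names KubeVirt names should mirror.
KUBERNETES_SCOPES = ("node_", "container_", "pod_")


def has_kubevirt_prefix(metric_name: str) -> bool:
    """Return True if the name starts with one of the listed prefixes."""
    # str.startswith accepts a tuple; the comparison is case-sensitive.
    return metric_name.startswith(KUBEVIRT_PREFIXES)


def vmi_name_for(kubernetes_metric: str) -> str:
    """Derive the aligned VMI metric name from a Kubernetes metric name."""
    for scope in KUBERNETES_SCOPES:
        if kubernetes_metric.startswith(scope):
            # Swap the Kubernetes scope for the VMI prefix, keep the rest.
            return "kubevirt_vmi_" + kubernetes_metric[len(scope):]
    raise ValueError(f"not a node/container/pod metric: {kubernetes_metric}")
```

For instance, `vmi_name_for("node_network_receive_packets_total")` yields the aligned VMI name listed above, while a name starting with a capitalized `Kubevirt_` fails the prefix check.
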
Use recording rules when doing calculations or when using the same query for other alerts or dashboards, instead of repeating the same query in many places.
The Prometheus recording rules appear in Prometheus as metrics.
In order to easily identify the KubeVirt recording rules, they should follow the same naming conventions as the metrics.
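
As a rough sketch of the idea, a recording rule can be defined once and then referenced by name wherever the query is needed. The rule name, the expression and the `alert_expr` helper below are hypothetical and chosen only for illustration; `record` and `expr` are the field names used by Prometheus rule files.

```python
# Hypothetical recording rule: the rule name and expression are
# illustrative assumptions, not existing KubeVirt rules.
RECORDING_RULE = {
    # Follows the same prefix convention as regular metrics.
    "record": "kubevirt_vmi_network_receive_packets_rate_5m",
    "expr": "sum(rate(kubevirt_vmi_network_receive_packets_total[5m]))",
}


def alert_expr(threshold: int) -> str:
    """Build an alert expression that reuses the recorded series by name."""
    # The calculation lives in one place; alerts and dashboards only
    # reference the recorded metric instead of repeating the query.
    return f"{RECORDING_RULE['record']} > {threshold}"
```

Because the recorded series appears in Prometheus as a metric, keeping the `kubevirt_vmi_` prefix makes it easy to find next to the metrics it is derived from.
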
When creating a KubeVirt alert rule, please follow the OpenShift Alerting Consistency Guide.
In addition to the OpenShift Style Guide, the KubeVirt alerts MUST include:
- `kubernetes_operator_part_of` label indicating the operator name. Value should be set to `kubevirt`.
- `kubernetes_operator_component` label indicating the value of the sub operator name.
- `operator_health_impact` label indicating how the alert impacts the operator's functionality. This label differs from `severity`, as `severity` indicates the ability to deliver a service for the cluster as a whole, whereas `operator_health_impact` indicates the impact of the issue on the operator's functionality. The loss of operator's functionality doesn't necessarily mean that the ability to deliver services for the cluster as a whole is affected. For example, an alert may have a `warning` severity, when talking about the impact on the cluster health, but have a `critical` impact on the operator's health. Also, when an alert is tied to a specific workload it can have a `warning` severity, but no impact on the operator's health.

  Valid values for this label are:
  - `critical` - For alerts that indicate that there is a loss of operator's functionality and part of the operator might not work as expected.
  - `warning` - For alerts that indicate that there is a risk for the operator's functionality and soon parts of the operator might not work as expected.
  - `none` - For alerts that don't indicate that there is a loss of operator's functionality and it is working as expected.

Optional labels:
- `priority` label indicating the alert's level of importance and the order in which it should be fixed.
  - Valid priorities are: `high`, `medium`, or `low`. The higher the priority, the sooner the alert should be resolved.
  - If the alert doesn't include a `priority` label, we can assume it is a `medium` priority alert.

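
The label requirements above could be checked with something like the following sketch. The function names and the example component value are hypothetical; the label names and valid values are the ones listed above.

```python
REQUIRED_LABELS = (
    "kubernetes_operator_part_of",
    "kubernetes_operator_component",
    "operator_health_impact",
)
HEALTH_IMPACTS = {"critical", "warning", "none"}
PRIORITIES = {"high", "medium", "low"}


def check_alert_labels(labels: dict) -> list:
    """Return a list of problems; an empty list means the labels conform."""
    problems = [
        f"missing label: {name}" for name in REQUIRED_LABELS if name not in labels
    ]
    # A missing part_of label is already reported above, so only a
    # present-but-wrong value is flagged here.
    if labels.get("kubernetes_operator_part_of", "kubevirt") != "kubevirt":
        problems.append("kubernetes_operator_part_of must be 'kubevirt'")
    impact = labels.get("operator_health_impact")
    if impact is not None and impact not in HEALTH_IMPACTS:
        problems.append(f"invalid operator_health_impact: {impact}")
    # priority is optional, but must be a valid value when present.
    priority = labels.get("priority")
    if priority is not None and priority not in PRIORITIES:
        problems.append(f"invalid priority: {priority}")
    return problems


def effective_priority(labels: dict) -> str:
    """An alert without a priority label is treated as medium."""
    return labels.get("priority", "medium")


# Example alert labels; the component value is a made-up placeholder.
example = {
    "severity": "warning",
    "kubernetes_operator_part_of": "kubevirt",
    "kubernetes_operator_component": "example-sub-operator",
    "operator_health_impact": "critical",
}
```

Note that `example` combines a `warning` severity with a `critical` operator health impact, which the rules above allow because the two labels measure different things.
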
Note: KubeVirt alert runbooks are saved in the kubevirt/monitoring repository.