[BUG] Consul Is Degraded Alert and Consul Is Down Alert are not working correctly #165

**Describe the bug**

**ConsulIsDegradedAlarm**
Query:
```
sum(kube_pod_status_ready{exported_namespace="{{ .Release.Namespace }}",exported_pod=~"{{ template "consul.fullname" . }}-server-[0-9]+",condition="false"}) / sum(kube_pod_status_ready{exported_namespace="{{ .Release.Namespace }}",exported_pod=~"{{ template "consul.fullname" . }}-server-[0-9]+"}) > 0
```
https://github.com/Netcracker/qubership-consul/blob/main/charts/helm/consul-service/templates/prometheus-rules.yaml#L33

The ConsulIsDegradedAlarm alert behaves incorrectly: it does not always trigger, and when it does, it stays active for too short a time. As a result, problems with Consul may go unnoticed.

Specific cases of incorrect behavior:

1. When one of the Consul pods is deleted:
   - If Consul recovers the pod quickly, the alert may not trigger at all.
   - When it does trigger, it stays active only while the pod is in the Terminating status.

2. When a pod has errors:
   - If a pod restarts due to an error, the alert is unstable and triggers only briefly during the restart.

3. When the StatefulSet is scaled down to 2:
   - The alert triggers only while the third pod is terminating.
   - After the third pod is deleted, the expression evaluates to 0 unready pods / 2 total pods = 0, so the alert stops triggering even though the Consul cluster is in a degraded state (Consul requires 3 servers).
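
The arithmetic in case 3 can be reproduced with a small model of the alert expression. This is only an illustrative sketch, not part of the chart: the function names `unready_ratio` and `degraded_alarm_fires` are hypothetical, and the model assumes that each existing pod contributes exactly one "ready" or "not ready" sample and that deleted pods contribute no samples at all.

```python
from typing import List, Optional


def unready_ratio(pods_ready: List[bool]) -> Optional[float]:
    """Model of sum(condition="false") / sum(all conditions).

    pods_ready holds one flag per pod that currently exists.
    Returns None when no pods exist (no samples, so no result).
    """
    if not pods_ready:
        return None
    unready = sum(1 for ready in pods_ready if not ready)
    return unready / len(pods_ready)


def degraded_alarm_fires(pods_ready: List[bool]) -> bool:
    """Model of the `> 0` comparison used by ConsulIsDegradedAlarm."""
    ratio = unready_ratio(pods_ready)
    return ratio is not None and ratio > 0


# Third pod still exists but is not ready (for example, terminating): 1 / 3 > 0.
print(degraded_alarm_fires([True, True, False]))  # True

# Third pod is gone after scaling down to 2: 0 / 2 = 0, so the alert is silent.
print(degraded_alarm_fires([True, True]))  # False
```

The model shows that the expression compares unready pods only against the pods that currently exist, so a missing server is invisible to it.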


**ConsulIsDownAlarm**
Query:
```
sum(kube_pod_status_ready{exported_namespace="{{ .Release.Namespace }}",exported_pod=~"{{ template "consul.fullname" . }}-server-[0-9]+",condition="false"}) / sum(kube_pod_status_ready{exported_namespace="{{ .Release.Namespace }}",exported_pod=~"{{ template "consul.fullname" . }}-server-[0-9]+"}) == 1
```

1. When one of the Consul pods is deleted:
   - In most cases, Consul recovers the pod quickly.

2. When the StatefulSet is scaled to 0:
   - The alert triggers only if each of the 3 pods is not in the Running status but still exists.
   - As a result, ConsulIsDownAlarm triggers only for a short time.
   - Once none of the 3 pods exist, this alert becomes inactive and ConsulDoesNotExistAlarm triggers instead.
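
The scale-to-0 behavior can be illustrated with a minimal model of the `== 1` expression. This is a sketch under stated assumptions rather than the chart's actual logic: `down_alarm_fires` is a hypothetical name, each existing pod is assumed to contribute one "ready" or "not ready" sample, and pods that no longer exist are assumed to contribute nothing.

```python
from typing import List


def down_alarm_fires(pods_ready: List[bool]) -> bool:
    """Model of unready / total == 1 over the pods that currently exist."""
    if not pods_ready:
        # No pods means no samples: the expression returns nothing,
        # so the alert is inactive.
        return False
    unready = pods_ready.count(False)
    return unready / len(pods_ready) == 1


# All three pods exist but none is ready: 3 / 3 == 1.
print(down_alarm_fires([False, False, False]))  # True

# One pod is still ready: 2 / 3 != 1.
print(down_alarm_fires([False, False, True]))  # False

# All pods are gone: nothing to evaluate, so the alert goes inactive.
print(down_alarm_fires([]))  # False
```

In this model the alert can be active only in the window where every pod still exists and is not ready, which matches the short-lived triggering described above.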

**Expected behavior**
Alerts work reliably and correctly reflect problems with Consul.

**Environment:**
 - Application Version: main
 - K8S Version: 
