
feat(events): add stale event review reminders #154

@alexandervazquez98

Description


Pre-flight Checks

  • I have searched existing issues and this is not a duplicate
  • I understand this issue needs status:approved before a PR can be opened

Problem Description

Operators need visibility into old or stale events that may require human review, especially collection-failure events that can remain open for a long time after metrics or relationships change.

The business rule is that events should not be silently closed by the system without operator/monitorist review. However, if old events are never surfaced again, they can be forgotten and continue aging indefinitely.

This came up while planning #152: SNMP no-response collection failures should be fixed going forward, but historical stale/orphan events should not be auto-cleaned without human intervention.

Proposed Solution

Add an operator-facing reminder/recommendation mechanism for stale events.

Possible behavior:

  • Detect candidate stale events using safe criteria, for example:
    • old OPEN or ACK events beyond a configurable age;
    • collection-failure events whose metric/CI relationship no longer exists;
    • events whose last_seen is older than a configurable threshold (not refreshed for a long time);
    • initially focused on SNMP collection-failure events, later extensible to other event families.
  • Show them in a notifications/recommendations section instead of closing them automatically.
  • Include enough context for the operator to decide:
    • event id/title/severity/status;
    • CI and metric, when still resolvable;
    • age, last_seen, and reason for recommendation;
    • suggested action such as review/close/recover if applicable.
  • Require explicit operator action for any closure/recovery.
  • Record audit metadata when an operator acts on a recommendation.
  • Keep the recommendation process dry-run/read-only by default.
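The behavior above can be sketched as two separate steps: a read-only scan that produces recommendations with their reasons, and an operator action that is the only path allowed to change an event's status and that always writes an audit entry. This is a minimal sketch under stated assumptions, not existing project code. Python is used only for illustration. The `Event`, `Recommendation`, and `AuditEntry` shapes, the `find_stale_candidates` and `apply_operator_action` functions, the `relationship_exists` callback, the status values (`OPEN`, `ACK`, `CLOSED`, `RECOVERED`), the `snmp_collection_failure` family tag, and the 30-day/7-day defaults are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical event model; fields and status values are illustrative only.
@dataclass
class Event:
    id: int
    title: str
    severity: str
    status: str              # assumed: "OPEN", "ACK", "CLOSED", "RECOVERED"
    family: str              # assumed tag, e.g. "snmp_collection_failure"
    created_at: datetime
    last_seen: datetime
    ci_id: Optional[int] = None
    metric_id: Optional[int] = None

@dataclass
class Recommendation:
    event_id: int
    title: str
    severity: str
    status: str
    ci_id: Optional[int]
    metric_id: Optional[int]
    age: timedelta
    last_seen: datetime
    reasons: List[str]
    suggested_action: str    # "review" or "close"; never applied automatically

@dataclass
class AuditEntry:
    event_id: int
    operator: str
    action: str
    previous_status: str
    reasons: List[str]
    at: datetime

def find_stale_candidates(
    events: Iterable[Event],
    now: datetime,
    relationship_exists: Callable[[int, int], bool],
    max_age: timedelta = timedelta(days=30),
    max_silence: timedelta = timedelta(days=7),
    families: Tuple[str, ...] = ("snmp_collection_failure",),
) -> List[Recommendation]:
    """Read-only scan: returns recommendations and never mutates an event."""
    recs = []
    for ev in events:
        # Only OPEN/ACK events from the enabled families are considered.
        if ev.status not in ("OPEN", "ACK") or ev.family not in families:
            continue
        reasons = []
        age = now - ev.created_at
        if age > max_age:
            reasons.append(f"{ev.status} for {age.days} days")
        if now - ev.last_seen > max_silence:
            reasons.append(f"not refreshed since {ev.last_seen.isoformat()}")
        # Orphaned: the event still points at a CI/metric pair that no longer exists.
        orphaned = (
            ev.ci_id is not None
            and ev.metric_id is not None
            and not relationship_exists(ev.ci_id, ev.metric_id)
        )
        if orphaned:
            reasons.append("metric/CI relationship no longer exists")
        if reasons:
            recs.append(Recommendation(
                event_id=ev.id, title=ev.title, severity=ev.severity,
                status=ev.status, ci_id=ev.ci_id, metric_id=ev.metric_id,
                age=age, last_seen=ev.last_seen, reasons=reasons,
                suggested_action="close" if orphaned else "review",
            ))
    return recs

# Assumed mapping from operator action to the resulting event status.
TARGET_STATUS = {"close": "CLOSED", "recover": "RECOVERED"}

def apply_operator_action(event: Event, rec: Recommendation, operator: str,
                          action: str, now: datetime,
                          audit_log: List[AuditEntry]) -> None:
    """The only path that changes status; it requires a named operator."""
    if not operator:
        raise PermissionError("closure/recovery requires an explicit operator")
    if action not in TARGET_STATUS:
        raise ValueError(f"unsupported action: {action!r}")
    if rec.event_id != event.id:
        raise ValueError("recommendation does not belong to this event")
    # Record who acted, on what, and why, before changing the event.
    audit_log.append(AuditEntry(event.id, operator, action, event.status,
                                list(rec.reasons), now))
    event.status = TARGET_STATUS[action]
```

Keeping detection and action in separate functions mirrors the business rule: running the scan (for example on a schedule) can only surface events, while any closure or recovery needs an explicit operator and leaves an audit trail. Extending the scan to other event families would only mean passing a different `families` tuple.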

Affected Area

Other

Alternatives Considered

  • Auto-clean stale/orphan events as part of #152 (fix(polling): treat SNMP no-response events as warnings). Rejected because it conflicts with the business rule that every event should pass through an operator/monitorist.
  • Only document a manual database query. This is safer than auto-cleaning, but weak because it does not keep stale events visible in normal operations.

Additional Context

Related to #152, but should be implemented separately so the SNMP no-response severity fix remains focused.

Metadata


Assignees

No one assigned

Labels

enhancement (New feature or request), status:approved (Issue approved for implementation)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
