feat(events): normalize legacy event discriminators #155

@alexandervazquez98

Pre-flight Checks

  • I have searched existing issues and this is not a duplicate
  • I understand this issue needs status:approved before a PR can be opened

Problem Description

While judging #152, the new event discriminator model exposed a broader legacy-compatibility problem: historical events do not consistently have discriminator fields such as event_type, failure_family, or source_protocol.

The #152 fix adds discriminator-aware behavior for future polling rounds, but old threshold, availability, and collection-failure events may still be ambiguous. This can create edge cases during upgrade, for example:

  • continuing threshold or availability breaches may create a new event instead of reusing a legacy event with event_type = NULL;
  • generic collection failures may match legacy events whose message starts with 'Metric Collection Failed:' even when those events were actually SNMP no-response events;
  • free-text legacy messages are fragile as the only compatibility signal;
  • protocol/failure-family boundaries can be unclear when source_protocol and failure_family are missing.
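
As a rough illustration of the first edge case, the sketch below shows how a strict discriminator-aware lookup can miss a legacy row. Only `event_type` and `THRESHOLD_BREACH` come from this issue; `find_open_event`, `device_id`, `metric`, and the in-memory list are hypothetical stand-ins, not the project's actual lookup code.

```python
from typing import Optional

# Hypothetical in-memory stand-in for open events. Only event_type is a field
# named in this issue; device_id and metric are assumed for illustration.
open_events = [
    {"id": 1, "device_id": "sw-01", "metric": "cpu", "event_type": None},  # legacy row
]


def find_open_event(device_id: str, metric: str, event_type: str) -> Optional[dict]:
    """Strict lookup: the stored event_type must equal the requested one."""
    for event in open_events:
        key = (event["device_id"], event["metric"], event["event_type"])
        if key == (device_id, metric, event_type):
            return event
    return None


# The breach is still ongoing, but the legacy row has event_type = None,
# so the strict lookup finds nothing and a caller would open a second event.
match = find_open_event("sw-01", "cpu", "THRESHOLD_BREACH")
```

Under these assumptions `match` is `None`, which is the duplicate-event risk described in the first bullet.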

This is related to #152 but should not be folded into that bugfix because it is a broader data normalization/migration concern.

Proposed Solution

Add a dedicated legacy event normalization/reconciliation path.

Possible approach:

  • Define canonical event discriminator rules for existing events:
    • THRESHOLD_BREACH
    • AVAILABILITY
    • COLLECTION_FAILURE
    • protocol/failure family where inferable.
  • Provide a dry-run report that classifies current events and identifies ambiguous cases.
  • Provide an explicit, audited migration/reconciliation command or admin workflow to backfill safe discriminator fields.
  • Avoid silently closing/recovering events; this issue is about classification/backfill, not operator closure.
  • Preserve compatibility with #154 (feat(events): add stale event review reminders), where stale event reminders/recommendations should surface old events for operator review.
  • Add tests for legacy event matching so future lifecycle code does not duplicate or overwrite old events incorrectly.
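
One possible shape for the dry-run report, as a minimal sketch rather than a proposed implementation. The inference rules here are assumptions made for illustration: `threshold_id` is a hypothetical column, the sample messages are made up, and the `safe` / `ambiguous` / `already_set` statuses are invented names. Nothing is written back; the function only returns proposals and a summary count.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

LEGACY_PREFIX = "Metric Collection Failed:"


@dataclass(frozen=True)
class Proposal:
    event_id: int
    event_type: Optional[str]  # proposed discriminator, or None if unknown
    status: str                # "already_set", "safe", or "ambiguous"
    reason: str


def classify(event: dict) -> Proposal:
    """Propose a discriminator for one event without modifying it."""
    if event.get("event_type") is not None:
        return Proposal(event["id"], event["event_type"], "already_set", "event_type present")
    message = event.get("message") or ""
    if message.startswith(LEGACY_PREFIX):
        # The prefix alone cannot separate generic failures from SNMP no-response.
        return Proposal(event["id"], "COLLECTION_FAILURE", "ambiguous",
                        "prefix match cannot infer failure_family")
    if event.get("threshold_id") is not None:  # hypothetical column
        return Proposal(event["id"], "THRESHOLD_BREACH", "safe", "threshold reference present")
    return Proposal(event["id"], None, "ambiguous", "no rule matched")


def dry_run(events: list) -> tuple:
    """Classify every event and count outcomes; performs no writes."""
    proposals = [classify(e) for e in events]
    return proposals, Counter(p.status for p in proposals)


# Made-up sample rows for illustration only.
legacy_events = [
    {"id": 1, "event_type": None, "message": "Metric Collection Failed: timeout", "threshold_id": None},
    {"id": 2, "event_type": None, "message": "CPU above limit", "threshold_id": 7},
    {"id": 3, "event_type": "AVAILABILITY", "message": "Device unreachable", "threshold_id": None},
    {"id": 4, "event_type": None, "message": "Unknown legacy text", "threshold_id": None},
]
proposals, summary = dry_run(legacy_events)
```

Keeping classification separate from any write step means an audited backfill command could apply only the `safe` proposals and leave `ambiguous` ones for operator review.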

Affected Area

Other

Alternatives Considered

  • Keep extending #152 (fix(polling): treat SNMP no-response events as warnings) to handle every legacy ambiguity. Rejected because the diff is already large and the remaining findings are migration/compatibility concerns rather than the core no-response warning bug.
  • Rely permanently on message STARTS WITH 'Metric Collection Failed:'. Rejected because free-text matching is too fragile for long-term event lifecycle semantics.
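
A small sketch of why prefix matching is fragile. The message texts below are invented for illustration and are not taken from real events; only the 'Metric Collection Failed:' prefix comes from this issue.

```python
LEGACY_PREFIX = "Metric Collection Failed:"

# Invented sample messages; real legacy wording may differ.
samples = {
    "generic": "Metric Collection Failed: parse error",
    "snmp_no_response": "Metric Collection Failed: no response from agent",
    "reworded": "Metric collection failed: parse error",  # casing changed
}

# Case-sensitive prefix check, equivalent to a STARTS WITH match.
matches = {name: text.startswith(LEGACY_PREFIX) for name, text in samples.items()}
```

Here the generic and SNMP no-response messages both match, so the prefix cannot tell them apart, while a trivially reworded message no longer matches at all.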

Additional Context

Created after Judgment Day Round 4 on #152. Round 4 found plausible but unconfirmed legacy compatibility warnings around null event_type, missing source_protocol, generic failure_family, and legacy message-prefix matching.

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request)
    status:approved (Issue approved for implementation)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
