Pre-flight Checks
I have searched existing issues and this is not a duplicate
I understand this issue needs status:approved before a PR can be opened
Problem Description
While judging #152, the new event discriminator model exposed a broader legacy-compatibility problem: historical events do not consistently have discriminator fields such as event_type, failure_family, or source_protocol.
The #152 fix adds discriminator-aware behavior for future polling rounds, but old threshold, availability, and collection-failure events may still be ambiguous. This can create edge cases during upgrade, for example:
continuing threshold or availability breaches may create a new event instead of reusing a legacy event with event_type = NULL;
generic collection failures may match legacy Metric Collection Failed: events that were actually SNMP no-response events;
free-text legacy messages are fragile as the only compatibility signal;
protocol/failure-family boundaries can be unclear when source_protocol and failure_family are missing.
This is related to #152 but should not be folded into that bugfix because it is a broader data normalization/migration concern.
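The first edge case above can be illustrated with a minimal sketch. The `Event` shape and the `find_open_event_strict` helper are hypothetical stand-ins, not the project's actual model or lookup code; they only show why a strictly discriminator-aware match cannot see a legacy row whose event_type is NULL.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    # Hypothetical minimal shape; the real event table has more columns.
    device_id: int
    metric: str
    event_type: Optional[str]  # None stands in for a legacy NULL
    status: str = "OPEN"


def find_open_event_strict(
    events: List[Event], device_id: int, metric: str, event_type: str
) -> Optional[Event]:
    """Return an open event only if its discriminator is set and equal."""
    for e in events:
        if (
            e.status == "OPEN"
            and e.device_id == device_id
            and e.metric == metric
            and e.event_type == event_type
        ):
            return e
    return None


# A legacy threshold event created before discriminators existed.
legacy = Event(device_id=1, metric="cpu_load", event_type=None)

# The next polling round looks for a THRESHOLD_BREACH event and finds none,
# so lifecycle code would open a second event for the same ongoing breach.
match = find_open_event_strict([legacy], 1, "cpu_load", "THRESHOLD_BREACH")
print(match)  # None
```

Backfilling event_type on the legacy row, rather than loosening the match, is what the proposal below aims for.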
Proposed Solution
Add a dedicated legacy event normalization/reconciliation path.
Possible approach:
Define canonical event discriminator rules for existing events:
THRESHOLD_BREACH
AVAILABILITY
COLLECTION_FAILURE
protocol/failure family where inferable.
Provide a dry-run report that classifies current events and identifies ambiguous cases.
Provide an explicit, audited migration/reconciliation command or admin workflow to backfill safe discriminator fields.
Avoid silently closing/recovering events; this issue is about classification/backfill, not operator closure.
Add tests for legacy event matching so future lifecycle code does not duplicate or overwrite old events incorrectly.
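A rough sketch of the classification and dry-run steps might look like the following. Everything here is an assumption for illustration: the dict-shaped events, the `threshold_id` column, and the function names are hypothetical, and the rules are placeholders for whatever canonical rules get agreed. No AVAILABILITY rule is included because the issue does not name a safe signal for it, and failure_family is deliberately left untouched because the message prefix alone cannot separate SNMP no-response events from generic failures.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

LEGACY_FAILURE_PREFIX = "Metric Collection Failed:"


def classify_legacy_event(event: Dict) -> Tuple[Optional[str], str]:
    """Return (proposed event_type, reason). None means ambiguous."""
    if event.get("event_type"):
        return event["event_type"], "already set"
    message = event.get("message") or ""
    if message.startswith(LEGACY_FAILURE_PREFIX):
        return "COLLECTION_FAILURE", "legacy message prefix"
    if event.get("threshold_id") is not None:  # hypothetical column
        return "THRESHOLD_BREACH", "linked threshold"
    return None, "no safe signal"


def dry_run_report(events: List[Dict]) -> Dict:
    """Classify events and summarize the result without writing anything."""
    counts: Counter = Counter()
    ambiguous_ids = []
    for event in events:
        proposed, _reason = classify_legacy_event(event)
        if proposed is None:
            counts["AMBIGUOUS"] += 1
            ambiguous_ids.append(event["id"])
        else:
            counts[proposed] += 1
    return {"counts": dict(counts), "ambiguous_ids": ambiguous_ids}
```

The report only reads events. An audited backfill step would consume it separately and skip every ID listed as ambiguous, so nothing is closed or recovered as a side effect.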
Affected Area
Other
Alternatives Considered
Keep extending #152 (fix(polling): treat SNMP no-response events as warnings) to handle every legacy ambiguity. Rejected because the diff is already large and the remaining findings are migration/compatibility concerns rather than the core no-response warning bug.
Rely permanently on message STARTS WITH 'Metric Collection Failed:'. Rejected because free-text matching is too fragile for long-term event lifecycle semantics.
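To make the fragility concrete, here is a small sketch with two made-up message bodies (the text after the prefix is hypothetical, not taken from real events). Both pass the prefix check, so the prefix alone cannot tell an SNMP no-response event from a generic collection failure.

```python
PREFIX = "Metric Collection Failed:"

# Hypothetical legacy messages; only the prefix comes from the issue.
messages = [
    "Metric Collection Failed: SNMP request timed out, no response",
    "Metric Collection Failed: unexpected value type",
]

# Prefix matching treats both events as the same kind of failure.
matches = [m.startswith(PREFIX) for m in messages]
print(matches)  # [True, True]
```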
Additional Context
Created after Judgment Day Round 4 on #152. Round 4 found plausible but unconfirmed legacy compatibility warnings around null event_type, missing source_protocol, generic failure_family, and legacy message-prefix matching.