Fix flaky heartbeat and view change tests #4
Open
chgeuer wants to merge 1 commit into ityonemo:main
ViewChangeTest: Replaced the live 3-node cluster with an isolated single-replica setup using dummy peer processes. The original test asserted on intermediate vote state (the view_change_votes map), which is cleared when the view change completes: a TOCTOU race. It now asserts on the DoViewChange telemetry event instead.
HeartbeatTest: Three fixes:
- Used deterministically sorted node names (hbt_a/b/c) so the correct primary is identified. The original names (primary_xxx/backup1_xxx) sorted incorrectly under :erlang.term_to_binary/1, causing the test to stop the wrong node.
- Registered a dummy process under the dead primary's name after stopping it, so the backups' broadcast in start_manual_view_change doesn't crash on send/2 to an unregistered atom.
- Attached the telemetry listener before stopping the primary, to avoid a window where the timeout event fires before anyone is listening.
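To illustrate why the original names picked the wrong node, here is a minimal sketch. It assumes replicas are ordered by comparing the `:erlang.term_to_binary/1` encoding of their names and that the first entry is the primary for view 0. The helper `primary_for_view_0` and the `_1` suffixes are hypothetical and not taken from the library.

```elixir
# Hypothetical helper: order node names by their external term encoding
# and treat the first one as the primary for view 0 (an assumption about
# how the code under test picks its primary).
primary_for_view_0 = fn names ->
  names
  |> Enum.sort_by(&:erlang.term_to_binary/1)
  |> hd()
end

# Role-based names: :backup1_1 sorts before :primary_1, so the node the
# test believes is the primary is not the one selected here.
old_names = [:primary_1, :backup1_1, :backup2_1]
IO.inspect(primary_for_view_0.(old_names), label: "old primary")
# prints: old primary: :backup1_1

# Equal-length names that differ only in a/b/c sort in the intended order.
new_names = [:hbt_a_1, :hbt_b_1, :hbt_c_1]
IO.inspect(primary_for_view_0.(new_names), label: "new primary")
# prints: new primary: :hbt_a_1
```

The encoding of an atom carries a length prefix ahead of its characters, so names of different lengths would be ordered by length first. Keeping the names the same length and varying a single letter avoids that surprise as well.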
Fixes #3
Changes
ViewChangeTest (line 130)
Replaced the live 3-node cluster with an isolated single-replica setup using dummy peer processes. The original test asserted on the intermediate `view_change_votes` state, which is cleared when the view change completes: a TOCTOU race. It now asserts on the `[:view_change, :do_view_change, :sent]` telemetry event instead, which proves a majority was reached without inspecting transient state.
HeartbeatTest (line 109)
Three fixes:
- Deterministic node names: Used `hbt_a_`/`hbt_b_`/`hbt_c_` prefixes that sort correctly under `:erlang.term_to_binary/1`, so the test identifies and stops the actual primary for view 0.
- Dummy process for dead primary: After stopping the primary, registers a dummy process under its name so the backups' `start_manual_view_change` broadcast via `send/2` does not crash on an unregistered atom.
- Telemetry listener ordering: Attaches the `[:timer, :primary_timeout]` listener before stopping the primary, eliminating the window where the event fires with no listener.
Verification
Both tests pass 5/5 consecutive runs (previously failed ~50-80% of the time). Full suite: 95 tests, 0 failures.
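The dummy-process fix in HeartbeatTest relies on the fact that `send/2` raises `ArgumentError` when the destination is an atom with no registered process. Below is a minimal sketch of that behaviour in plain Elixir, outside the test suite. The name `:hbt_a_demo`, the message, and the `safe_send` helper are hypothetical.

```elixir
# Hypothetical registered name standing in for the stopped primary.
name = :hbt_a_demo

# Wrap send/2 so the sketch reports the outcome instead of crashing.
safe_send = fn dest, msg ->
  try do
    send(dest, msg)
    :delivered
  rescue
    ArgumentError -> :no_such_name
  end
end

# Nothing is registered under the name yet, so send/2 raises.
before = safe_send.(name, :start_view_change)

# Register a placeholder that simply idles, standing in for the dead primary.
dummy = spawn(fn -> Process.sleep(:infinity) end)
true = Process.register(dummy, name)

# The same send now succeeds; the message sits unread in the dummy's mailbox.
after_register = safe_send.(name, :start_view_change)

IO.inspect({before, after_register})
# prints: {:no_such_name, :delivered}

# Clean up so the name is free again.
Process.exit(dummy, :kill)
```

In a real test the placeholder would be stopped in an `on_exit` callback or linked to the test process, so the registered name does not leak between runs.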