Fix/await completed race condition async#1944
Open
w8mr wants to merge 4 commits into
Open
Conversation
f20ddc0 to
3b8cd8f
Compare
Critical bug fix: Counter was not decremented when EventTransitions failed with non-retry strategies (BlockTransition, etc.), causing awaitCompleted() to never return even though execution was complete. Fix: Decrement counter for EventTransition failures that are NOT retrying. Retrying transitions keep counter unchanged (still in-flight). Impact: - Before: awaitCompleted() would hang if EventTransition blocked - After: awaitCompleted() correctly returns when execution complete Testing: WebshopRecipeSpec still passes 10/10 times.
Add AwaitCompletedRaceConditionSpec to baker-akka-runtime module that reliably reproduces the race condition bug and verifies the fix. The test creates a recipe with two sensory events (OrderPlaced and PaymentMade) where PaymentMade triggers a ReserveItems interaction that produces an ItemsReserved output event. The race occurs when awaitCompleted() returns before the ItemsReserved EventTransition completes. Test includes: - Single test case that checks for reservedItems ingredient after awaitCompleted - Stress test with 20 iterations to reliably catch the race condition Test results: - On buggy commit (eecacde): FAILS ~80% of the time (stress test) - On fix branch: PASSES 100% consistently (all 20 iterations) Also consolidate multiple analysis documents into single comprehensive documentation file and archive detailed analysis documents. docs/awaitCompleted-race-condition-fix.md: - Complete description of the bug, fix, and test - Root cause analysis - Implementation details - Verification instructions
Fixes the race condition in awaitCompleted() by tracking in-flight event transitions without blocking actor threads. Changes: 1. Added inFlightEventTransitions: Int field to Instance state 2. Increment counter when event transitions start (in step function) 3. Decrement counter when event transitions complete (in event sourcing) 4. Updated hasCompletedExecution to check counter == 0 Benefits over synchronous solution: - Maintains async execution (no actor blocking) - Better performance under load - Follows Akka best practices - Minimal code changes (~33 lines) Trade-offs: - Counter is not persisted (resets on crash) - Mitigation: Counter resets on recovery, in-flight ops lost anyway Testing: WebshopRecipeSpec passes 10/10 times (100% success rate).
3b8cd8f to
8c144e9
Compare
The test 'should wait by awaitCompleted for a long-running process to finish' was using '_ =' instead of '_ <-' when firing the sensory event, causing it to not wait for the event to be received. This created a race condition where awaitCompleted could be called before the interaction even started, causing it to return immediately (duration = 0ms) instead of waiting for the interaction to complete (expected > 500ms). Changed to '_ <-' to properly await the event being received before timing the awaitCompleted call. Before fix: Test failed with '0 milliseconds was not greater than 500 milliseconds' After fix: Test passes consistently
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.