Skip to content

Fix/await completed race condition async#1944

Open
w8mr wants to merge 4 commits into
masterfrom
fix/awaitCompleted-race-condition-async
Open

Fix/await completed race condition async#1944
w8mr wants to merge 4 commits into
masterfrom
fix/awaitCompleted-race-condition-async

Conversation

@w8mr
Copy link
Copy Markdown
Collaborator

@w8mr w8mr commented May 16, 2026

No description provided.

@w8mr w8mr requested a review from Tim-Linschoten May 16, 2026 22:28
@w8mr w8mr force-pushed the fix/awaitCompleted-race-condition-async branch 3 times, most recently from f20ddc0 to 3b8cd8f Compare May 17, 2026 07:08
w8mr added 3 commits May 17, 2026 09:10
Critical bug fix: Counter was not decremented when EventTransitions failed
with non-retry strategies (BlockTransition, etc.), causing awaitCompleted()
to never return even though execution was complete.

Fix: Decrement counter for EventTransition failures that are NOT retrying.
Retrying transitions keep counter unchanged (still in-flight).

Impact:
- Before: awaitCompleted() would hang if EventTransition blocked
- After: awaitCompleted() correctly returns when execution complete

Testing: WebshopRecipeSpec still passes 10/10 times.
Add AwaitCompletedRaceConditionSpec to baker-akka-runtime module that
reliably reproduces the race condition bug and verifies the fix.

The test creates a recipe with two sensory events (OrderPlaced and
PaymentMade) where PaymentMade triggers a ReserveItems interaction that
produces an ItemsReserved output event. The race occurs when awaitCompleted()
returns before the ItemsReserved EventTransition completes.

Test includes:
- Single test case that checks for reservedItems ingredient after awaitCompleted
- Stress test with 20 iterations to reliably catch the race condition

Test results:
- On buggy commit (eecacde): FAILS ~80% of the time (stress test)
- On fix branch: PASSES 100% consistently (all 20 iterations)

Also consolidate multiple analysis documents into single comprehensive
documentation file and archive detailed analysis documents.

docs/awaitCompleted-race-condition-fix.md:
  - Complete description of the bug, fix, and test
  - Root cause analysis
  - Implementation details
  - Verification instructions
Fixes the race condition in awaitCompleted() by tracking in-flight event
transitions without blocking actor threads.

Changes:
1. Added inFlightEventTransitions: Int field to Instance state
2. Increment counter when event transitions start (in step function)
3. Decrement counter when event transitions complete (in event sourcing)
4. Updated hasCompletedExecution to check counter == 0

Benefits over synchronous solution:
- Maintains async execution (no actor blocking)
- Better performance under load
- Follows Akka best practices
- Minimal code changes (~33 lines)

Trade-offs:
- Counter is not persisted (resets on crash)
- Mitigation: Counter resets on recovery, in-flight ops lost anyway

Testing: WebshopRecipeSpec passes 10/10 times (100% success rate).
@w8mr w8mr force-pushed the fix/awaitCompleted-race-condition-async branch from 3b8cd8f to 8c144e9 Compare May 17, 2026 07:11
The test 'should wait by awaitCompleted for a long-running process to finish'
was using '_ =' instead of '_ <-' when firing the sensory event, causing it
to not wait for the event to be received.

This created a race condition where awaitCompleted could be called before
the interaction even started, causing it to return immediately (duration = 0ms)
instead of waiting for the interaction to complete (expected > 500ms).

Changed to '_ <-' to properly await the event being received before timing
the awaitCompleted call.

Before fix: Test failed with '0 milliseconds was not greater than 500 milliseconds'
After fix: Test passes consistently
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant