Add support for jitter so that we can ensure evenly distributed tasks don't cause all workers to restart at the same time #570
Summary
This PR adds jitter support to the `max_async_tasks` parameter to prevent synchronized worker restarts. When multiple workers are configured with the same `max_async_tasks` limit and receive evenly distributed tasks, they tend to hit their limits simultaneously, causing cascading restarts that can impact system stability.
Changes
- `max_async_tasks_jitter` parameter: Added across the receiver, broker, and CLI interfaces to configure randomized jitter
- Jitter is applied to the `max_async_tasks` limit when creating the semaphore
- `--max-async-tasks-jitter` flag added to the worker command for runtime configuration
Technical Details
The implementation spreads out worker restart times by randomizing each worker's actual task limit. For example, with `max_async_tasks=100` and `max_async_tasks_jitter=10`, workers will have limits ranging from 100 to 110. This avoids the thundering herd problem, where synchronized restarts make all workers compete for the same resources at once, and lets the system degrade more gracefully under load.
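
The randomized limit described above can be sketched roughly as follows. This is an illustration, not the code from this PR: the helper names `jittered_limit` and `make_task_semaphore` are hypothetical, and the sketch assumes the jitter is a uniformly drawn integer in the inclusive range 0 to `max_async_tasks_jitter`, chosen once per worker at startup.

```python
import asyncio
import random
from typing import Optional


def jittered_limit(
    max_async_tasks: int,
    max_async_tasks_jitter: int = 0,
    rng: Optional[random.Random] = None,
) -> int:
    """Return the base limit plus a random offset in [0, max_async_tasks_jitter].

    Hypothetical helper: the inclusive upper bound and the uniform
    distribution are assumptions made for this sketch.
    """
    if max_async_tasks_jitter < 0:
        raise ValueError("max_async_tasks_jitter must be non-negative")
    rng = rng or random.Random()
    return max_async_tasks + rng.randint(0, max_async_tasks_jitter)


def make_task_semaphore(
    max_async_tasks: int,
    max_async_tasks_jitter: int = 0,
) -> asyncio.Semaphore:
    """Create a semaphore whose capacity is the jittered limit (hypothetical)."""
    return asyncio.Semaphore(jittered_limit(max_async_tasks, max_async_tasks_jitter))


if __name__ == "__main__":
    # Each simulated worker draws its own limit, so the limits differ
    # and the workers are unlikely to all reach them at the same moment.
    limits = [jittered_limit(100, 10) for _ in range(5)]
    print(limits)
```

Under these assumptions, `max_async_tasks=100` with `max_async_tasks_jitter=10` gives every worker a limit between 100 and 110, and a jitter of 0 leaves the configured limit unchanged.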