@adiberk adiberk commented Dec 23, 2025

Summary

This PR adds jitter support to the max_async_tasks parameter to prevent synchronized worker restarts. When multiple workers are configured with the same max_async_tasks limit and receive evenly distributed tasks, they tend to hit their limits simultaneously, causing cascading restarts that can impact system stability.

Changes

  • New max_async_tasks_jitter parameter: Added across the receiver, broker, and CLI interfaces to configure randomized jitter
  • Randomized semaphore limits: When creating the semaphore, the receiver now adds a random value between 0 and the jitter value to the base max_async_tasks limit
  • CLI argument: Added the --max-async-tasks-jitter flag to the worker command for runtime configuration

Technical Details

The implementation spreads out worker restart times by randomizing each worker's actual task limit. For example, with max_async_tasks=100 and max_async_tasks_jitter=10, each worker gets a limit between 100 and 110. This avoids the thundering herd problem, where synchronized restarts cause all workers to compete for the same resources at once, and so the system degrades more gracefully under load.
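To illustrate the idea, here is a minimal sketch of how a jittered limit could be computed and used to size a semaphore. It is not the PR's actual code: the helper names `jittered_limit` and `build_semaphore` are hypothetical, and it assumes the jitter is an integer drawn uniformly from the inclusive range 0 to max_async_tasks_jitter.

```python
import asyncio
import random
from typing import Optional


def jittered_limit(
    max_async_tasks: int,
    max_async_tasks_jitter: int = 0,
    rng: Optional[random.Random] = None,
) -> int:
    """Return the base limit plus a random integer in [0, jitter].

    Hypothetical helper for illustration only.
    """
    if max_async_tasks_jitter < 0:
        raise ValueError("max_async_tasks_jitter must be non-negative")
    # Allow a seeded generator to be injected for reproducible tests.
    generator = rng if rng is not None else random.Random()
    return max_async_tasks + generator.randint(0, max_async_tasks_jitter)


def build_semaphore(
    max_async_tasks: int,
    max_async_tasks_jitter: int = 0,
) -> asyncio.Semaphore:
    """Create a semaphore sized by this worker's jittered limit."""
    return asyncio.Semaphore(
        jittered_limit(max_async_tasks, max_async_tasks_jitter)
    )


if __name__ == "__main__":
    # Three simulated workers each draw their own limit in 100..110.
    for worker_id in range(3):
        print(worker_id, jittered_limit(100, 10))
```

Because each worker draws its limit independently, two workers with identical configuration will usually end up with different limits, which is what staggers the moment they reach them. A jitter of 0 leaves the limit at exactly max_async_tasks.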

adiberk commented Dec 23, 2025

@s3rius I would be curious as to whether you think this is valuable or not?

I hit an issue where, because tasks were distributed evenly, my workers across 3 replicas would end up shutting down at around the same time under extremely high task volume.
