feat(proxy): add --max_requests_before_restart_jitter to stagger worker restarts#30601
Conversation
|
|
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Greptile SummaryAdds
Confidence Score: 5/5Safe to merge — the change is additive, opt-in, and falls back gracefully on older uvicorn versions. The implementation is a straightforward additive CLI flag that mirrors an existing pattern (--timeout_worker_healthcheck) already in the codebase. The early-return guard for the missing-base-flag case and the feature-detection for old uvicorn versions are both correct. The gunicorn path applies the option only when the base flag is also set. No existing behavior is altered. No files require special attention.
|
| Filename | Overview |
|---|---|
| litellm/proxy/proxy_cli.py | Adds --max_requests_before_restart_jitter CLI flag with correct feature-detection for uvicorn>=0.41.0 (same pattern as timeout_worker_healthcheck), early-return guard when base flag is absent, and clean forwarding to gunicorn's max_requests_jitter. No logic errors found. |
| tests/test_litellm/proxy/test_proxy_cli.py | Adds six focused unit tests covering: uvicorn happy-path, gunicorn happy-path, gunicorn options dict, both "no base flag" warning paths, old-uvicorn version-detection fallback. All tests use mocks only (no real network calls), consistent with existing test style. |
Reviews (4): Last reviewed commit: "feat(proxy): add --max_requests_before_r..." | Re-trigger Greptile
|
@greptileai the warning/forward inconsistency you flagged is fixed in 19de008; the jitter kwarg is no longer forwarded when --max_requests_before_restart is unset, on both the uvicorn and gunicorn paths. Mind taking another look |
…er restarts Setting --max_requests_before_restart alone recycles every worker at almost the same time once they have served a similar number of requests, which under sustained load can drop a whole pod's capacity at once roughly every 7-10 days. This exposes a jitter knob that adds a random amount in [0, jitter] to the restart threshold per worker so restarts are staggered. It maps to uvicorn's limit_max_requests_jitter and gunicorn's max_requests_jitter. uvicorn only gained limit_max_requests_jitter in 0.41.0 while litellm still allows uvicorn>=0.33.0, so the uvicorn path feature-detects the parameter via the Config signature and warns instead of crashing on older versions. The flag has no effect without --max_requests_before_restart, so the kwarg is not forwarded in that case and a warning is printed on both the uvicorn and gunicorn paths. Resolves LIT-3774
19de008 to
0aaa1df
Compare
|
@greptileai squashed to a single commit and rebased onto the latest litellm_internal_staging in 0aaa1df; no content change from the 5/5 review, just history cleanup. Please re-confirm on the new HEAD |
|
CI note: the only red check is Docs for the new flag: BerriAI/litellm-docs#367 |
39ab43c
into
litellm_internal_staging
Relevant issues
Closes #24401 (the original community report and PR #24405 cover the same flag)
Linear ticket
Resolves LIT-3774
Problem
Setting
--max_requests_before_restartalone recycles every worker at almost the same time once they have served a similar number of requests. Under sustained or bursty load that drops a whole pod's worth of capacity at once; one customer saw all containers in a pod terminate together roughly every 7 to 10 days. The standard mitigation is jitter, and both uvicorn (limit_max_requests_jitter) and gunicorn (max_requests_jitter) support it, but LiteLLM did not expose itChanges
Adds a
--max_requests_before_restart_jitterCLI flag (envMAX_REQUESTS_BEFORE_RESTART_JITTER). Each worker adds a random amount in[0, jitter]to its restart threshold so workers recycle at different request counts instead of in lockstep. It maps to uvicorn'slimit_max_requests_jitterand gunicorn'smax_requests_jitteruvicorn only gained
limit_max_requests_jitterin 0.41.0, while LiteLLM still allowsuvicorn>=0.33.0. Rather than passing the kwarg unconditionally (which raisesTypeErroron 0.33 through 0.40), the uvicorn path feature-detects the parameter fromuvicorn.Config's signature, the same way the existing--timeout_worker_healthcheckflag does, and prints a clear "requires uvicorn>=0.41.0" warning instead of crashing. The flag has no effect without--max_requests_before_restart, which is warned about on both the uvicorn and gunicorn paths. Granian and hypercorn do not support a per-request recycle limit, so the flag is intentionally not threaded there (the granian path already warns that--max_requests_before_restartitself is unsupported)Type
🆕 New Feature
Screenshots / Proof of Fix
All runs use a real proxy launched via
python litellm/proxy/proxy_cli.pyagainst a real Postgres 16 (no mocks). This environment ships uvicorn 0.33.0 and gunicorn 23.0.0, so it exercises both the older-uvicorn fallback and the live gunicorn recycle behavior the customer relies on.litellm_internal_staging, the flag does not exist:curl http://localhost:4000/health/readiness, recording the cumulative request count at which a worker recycled (a newBooting workerline appeared in the gunicorn log).Without jitter the two workers recycle in lockstep, which is the reported failure mode:
With
--max_requests_before_restart_jitter 40the same workload spreads the restarts out so they no longer coincide:Pre-Submission checklist