feat(proxy): add --max_requests_before_restart_jitter to stagger worker restarts by yassin-berriai · Pull Request #30601 · BerriAI/litellm

yassin-berriai · 2026-06-17T03:17:50Z

Relevant issues

Closes #24401 (the original community report and PR #24405 cover the same flag)

Linear ticket

Resolves LIT-3774

Problem

Setting --max_requests_before_restart alone recycles every worker at almost the same time once they have served a similar number of requests. Under sustained or bursty load that drops a whole pod's worth of capacity at once; one customer saw all containers in a pod terminate together roughly every 7 to 10 days. The standard mitigation is jitter, and both uvicorn (limit_max_requests_jitter) and gunicorn (max_requests_jitter) support it, but LiteLLM did not expose it.

Changes

Adds a --max_requests_before_restart_jitter CLI flag (env MAX_REQUESTS_BEFORE_RESTART_JITTER). Each worker adds a random amount in [0, jitter] to its restart threshold so workers recycle at different request counts instead of in lockstep. It maps to uvicorn's limit_max_requests_jitter and gunicorn's max_requests_jitter.
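For reviewers who want the arithmetic spelled out, here is a minimal sketch of the per-worker threshold described above. It is an illustration only, not the uvicorn or gunicorn source; the helper name restart_threshold and the seeded generator are assumptions made for the example.

```python
import random


def restart_threshold(max_requests: int, jitter: int, rng: random.Random) -> int:
    """Hypothetical helper: request count at which one worker recycles.

    Each worker draws its own offset in [0, jitter] (inclusive on both ends)
    and adds it to the shared base limit.
    """
    return max_requests + rng.randint(0, max(jitter, 0))


# Seeded only so the example is repeatable; real workers draw independently.
rng = random.Random(0)
thresholds = [restart_threshold(1000, 50, rng) for _ in range(4)]

# With jitter 0 every worker gets the same threshold, which is the lockstep case.
no_jitter = [restart_threshold(1000, 0, rng) for _ in range(4)]
```

With a base of 1000 and jitter of 50, every value in thresholds falls between 1000 and 1050, so workers that receive similar traffic reach their limits at different request counts.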

uvicorn only gained limit_max_requests_jitter in 0.41.0, while LiteLLM still allows uvicorn>=0.33.0. Rather than passing the kwarg unconditionally (which raises TypeError on 0.33 through 0.40), the uvicorn path feature-detects the parameter from uvicorn.Config's signature, the same way the existing --timeout_worker_healthcheck flag does, and prints a clear "requires uvicorn>=0.41.0" warning instead of crashing. The flag has no effect without --max_requests_before_restart, and both the uvicorn and gunicorn paths warn about that case. Granian and hypercorn do not support a per-request recycle limit, so the flag is intentionally not threaded there (the granian path already warns that --max_requests_before_restart itself is unsupported).
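The feature-detection approach can be sketched roughly as follows. This is not the code in proxy_cli.py: OldConfig and NewConfig are stand-ins for uvicorn.Config before and after 0.41.0, and accepts_kwarg and build_uvicorn_kwargs are hypothetical names chosen for the example. Only inspect.signature is a real standard-library call.

```python
import inspect


def accepts_kwarg(func, name: str) -> bool:
    """Return True if the callable declares a parameter with this name."""
    return name in inspect.signature(func).parameters


class OldConfig:
    """Stand-in for a uvicorn.Config that predates the jitter parameter."""

    def __init__(self, app, limit_max_requests=None):
        self.app = app


class NewConfig:
    """Stand-in for a uvicorn.Config that accepts the jitter parameter."""

    def __init__(self, app, limit_max_requests=None, limit_max_requests_jitter=0):
        self.app = app


def build_uvicorn_kwargs(config_cls, max_requests, jitter):
    """Hypothetical helper: return (kwargs, warnings) for the server config."""
    kwargs, warnings = {}, []
    if max_requests is not None:
        kwargs["limit_max_requests"] = max_requests
    if jitter is not None:
        if max_requests is None:
            # Jitter alone does nothing, so warn and do not forward it.
            warnings.append("jitter has no effect without a base limit")
        elif accepts_kwarg(config_cls.__init__, "limit_max_requests_jitter"):
            kwargs["limit_max_requests_jitter"] = jitter
        else:
            # Older server: skip the kwarg instead of raising TypeError.
            warnings.append("jitter requires a newer uvicorn; ignoring the flag")
    return kwargs, warnings


new_kwargs, new_warnings = build_uvicorn_kwargs(NewConfig, 1000, 50)
old_kwargs, old_warnings = build_uvicorn_kwargs(OldConfig, 1000, 50)
```

Checking the signature rather than comparing version strings keeps the guard tied to what the installed class actually accepts.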

Type

🆕 New Feature

Screenshots / Proof of Fix

All runs use a real proxy launched via python litellm/proxy/proxy_cli.py against a real Postgres 16 (no mocks). This environment ships uvicorn 0.33.0 and gunicorn 23.0.0, so it exercises both the older-uvicorn fallback and the live gunicorn recycle behavior the customer relies on.

Before the change, on litellm_internal_staging, the flag does not exist:

$ python litellm/proxy/proxy_cli.py --max_requests_before_restart_jitter 50 --local
Usage: proxy_cli.py [OPTIONS] [CLI_ARGS]...
Try 'proxy_cli.py --help' for help.

Error: No such option: --max_requests_before_restart_jitter Did you mean --max_requests_before_restart?

With the change the flag is wired into the CLI:

$ python litellm/proxy/proxy_cli.py --help | grep -A6 max_requests_before_restart_jitter
  --max_requests_before_restart_jitter INTEGER
                                  Stagger worker restarts by adding a random
                                  amount in [0, jitter] to
                                  --max_requests_before_restart so workers do
                                  not recycle at the same time (uvicorn:
                                  limit_max_requests_jitter, requires
                                  uvicorn>=0.41.0; gunicorn: ...

uvicorn path on uvicorn 0.33.0 (below the 0.41.0 floor); the proxy boots normally and degrades gracefully instead of crashing:

$ litellm --config config.yaml --max_requests_before_restart 1000 --max_requests_before_restart_jitter 50
LiteLLM Proxy: --max_requests_before_restart_jitter requires uvicorn>=0.41.0, but installed uvicorn==0.33.0. Ignoring the flag.
INFO:     Started server process [67612]
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:4000 (Press CTRL+C to quit)

$ curl -s http://localhost:4000/health/readiness
{"status": "healthy", "db": "connected"}

gunicorn path end to end. Two workers, base threshold 20, hammered with sequential curl http://localhost:4000/health/readiness, recording the cumulative request count at which a worker recycled (a new Booting worker line appeared in the gunicorn log).

Without jitter the two workers recycle in lockstep, which is the reported failure mode:

$ litellm --config config.yaml --run_gunicorn --num_workers 2 --max_requests_before_restart 20
[nojit] recycle at cumulative request #41 (+1 worker booted)
[nojit] recycle at cumulative request #42 (+1 worker booted)
[nojit] recycle at cumulative request #83 (+2 worker booted)

With --max_requests_before_restart_jitter 40 the same workload spreads the restarts out so they no longer coincide:

$ litellm --config config.yaml --run_gunicorn --num_workers 2 --max_requests_before_restart 20 --max_requests_before_restart_jitter 40
[jit40] recycle at cumulative request #91 (+1 worker booted)
[jit40] recycle at cumulative request #118 (+1 worker booted)
[jit40] recycle at cumulative request #158 (+1 worker booted)
[jit40] recycle at cumulative request #172 (+1 worker booted)
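The difference between the two runs can be reproduced in a toy model. The sketch below assumes strict round-robin dispatch and counts only the simulated requests, neither of which holds for a real gunicorn process, so its numbers will not match the logs above; recycle_points is a hypothetical name used for illustration.

```python
import random


def recycle_points(num_workers, base, jitter, total_requests, seed=0):
    """Toy model: cumulative request numbers at which some worker recycles.

    Assumes round-robin dispatch; each worker redraws its threshold
    (base plus a random offset in [0, jitter]) after every recycle.
    """
    rng = random.Random(seed)

    def new_threshold():
        return base + rng.randint(0, jitter)

    thresholds = [new_threshold() for _ in range(num_workers)]
    served = [0] * num_workers
    points = []
    for n in range(1, total_requests + 1):
        worker = (n - 1) % num_workers
        served[worker] += 1
        if served[worker] >= thresholds[worker]:
            points.append(n)  # this worker restarts after request n
            served[worker] = 0
            thresholds[worker] = new_threshold()
    return points


lockstep = recycle_points(num_workers=2, base=20, jitter=0, total_requests=100)
staggered = recycle_points(num_workers=2, base=20, jitter=40, total_requests=400)
```

In this model the no-jitter run recycles both workers on consecutive requests (39 and 40, then 79 and 80), while the jittered run draws a fresh threshold between 20 and 60 for each worker after every restart.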

Pre-Submission checklist

I have added meaningful tests
My PR passes all unit tests
My PR's scope is as isolated as possible; it only solves 1 specific problem

CLAassistant · 2026-06-17T03:17:58Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

codecov · 2026-06-17T03:20:45Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

greptile-apps · 2026-06-17T15:45:54Z

Greptile Summary

Adds --max_requests_before_restart_jitter (env MAX_REQUESTS_BEFORE_RESTART_JITTER) to stagger worker recycle times so workers don't all restart in lockstep once they reach the same request count.

uvicorn path: uses inspect.signature(uvicorn.Config.__init__) to feature-detect limit_max_requests_jitter (requires ≥0.41.0) and degrades gracefully with a clear version warning on older installs, following the same pattern already used by --timeout_worker_healthcheck.
gunicorn path: forwards the value directly as max_requests_jitter; both paths warn and no-op when the base --max_requests_before_restart flag is absent.
Six new mock-only unit tests cover the uvicorn happy-path, old-uvicorn fallback, gunicorn options forwarding, and both "no base flag" warning branches.
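The gunicorn branch summarized above can be sketched in the same spirit. max_requests and max_requests_jitter are real gunicorn settings; the helper build_gunicorn_options, its return shape, and the warning text are assumptions for illustration and do not reproduce the code in proxy_cli.py or its tests.

```python
def build_gunicorn_options(num_workers, max_requests=None, jitter=None):
    """Hypothetical helper: return (options, warnings) for a gunicorn app."""
    options = {"workers": num_workers}
    warnings = []
    if max_requests is not None:
        options["max_requests"] = max_requests
        if jitter is not None:
            # Forward jitter only alongside the base limit it modifies.
            options["max_requests_jitter"] = jitter
    elif jitter is not None:
        # No base limit: warn and leave the jitter setting out entirely.
        warnings.append("jitter has no effect without a base request limit")
    return options, warnings


with_base, with_base_warnings = build_gunicorn_options(2, max_requests=20, jitter=40)
without_base, without_base_warnings = build_gunicorn_options(2, jitter=40)
```

A unit test for this shape only needs to inspect the returned dictionary, which is why such branches can be covered without starting a server.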

Confidence Score: 5/5

Safe to merge — the change is additive, opt-in, and falls back gracefully on older uvicorn versions.

The implementation is a straightforward additive CLI flag that mirrors an existing pattern (--timeout_worker_healthcheck) already in the codebase. The early-return guard for the missing-base-flag case and the feature-detection for old uvicorn versions are both correct. The gunicorn path applies the option only when the base flag is also set. No existing behavior is altered.

No files require special attention.

Important Files Changed

Filename	Overview
litellm/proxy/proxy_cli.py	Adds --max_requests_before_restart_jitter CLI flag with correct feature-detection for uvicorn>=0.41.0 (same pattern as timeout_worker_healthcheck), early-return guard when base flag is absent, and clean forwarding to gunicorn's max_requests_jitter. No logic errors found.
tests/test_litellm/proxy/test_proxy_cli.py	Adds six focused unit tests covering: uvicorn happy-path, gunicorn happy-path, gunicorn options dict, both "no base flag" warning paths, old-uvicorn version-detection fallback. All tests use mocks only (no real network calls), consistent with existing test style.

Reviews (4): Last reviewed commit: "feat(proxy): add --max_requests_before_r..."

yassin-berriai · 2026-06-17T15:46:49Z

@greptileai

yassin-berriai · 2026-06-17T15:51:11Z

@greptileai the warning/forward inconsistency you flagged is fixed in 19de008; the jitter kwarg is no longer forwarded when --max_requests_before_restart is unset, on both the uvicorn and gunicorn paths. Mind taking another look?

…er restarts Setting --max_requests_before_restart alone recycles every worker at almost the same time once they have served a similar number of requests, which under sustained load can drop a whole pod's capacity at once roughly every 7-10 days. This exposes a jitter knob that adds a random amount in [0, jitter] to the restart threshold per worker so restarts are staggered. It maps to uvicorn's limit_max_requests_jitter and gunicorn's max_requests_jitter. uvicorn only gained limit_max_requests_jitter in 0.41.0 while litellm still allows uvicorn>=0.33.0, so the uvicorn path feature-detects the parameter via the Config signature and warns instead of crashing on older versions. The flag has no effect without --max_requests_before_restart, so the kwarg is not forwarded in that case and a warning is printed on both the uvicorn and gunicorn paths. Resolves LIT-3774

yassin-berriai · 2026-06-17T16:02:38Z

@greptileai squashed to a single commit and rebased onto the latest litellm_internal_staging in 0aaa1df; no content change from the 5/5 review, just history cleanup. Please re-confirm on the new HEAD.

yassin-berriai · 2026-06-17T16:20:47Z

CI note: the only red check is ci/circleci: llm_translation_testing, which is failing on the latest litellm_internal_staging HEAD (cee6c9c) and on other open PRs (for example #30659) independently of this change; this PR only adds a CLI flag in proxy_cli.py and cannot affect provider translation. The proxy-runtime / Run tests entry was a cancelled duplicate run, not a failure (the active run and its coverage upload passed). Greptile is 5/5 on the current HEAD.

Docs for the new flag: BerriAI/litellm-docs#367

mateo-berri

LGTM; thanks!

yassin-berriai marked this pull request as ready for review June 17, 2026 15:43

greptile-apps Bot reviewed Jun 17, 2026

Comment thread litellm/proxy/proxy_cli.py

yassin-berriai force-pushed the litellm_uvicorn_max_requests_jitter branch from 19de008 to 0aaa1df Compare June 17, 2026 16:02

yassin-berriai mentioned this pull request Jun 17, 2026

docs(proxy): document --max_requests_before_restart_jitter BerriAI/litellm-docs#367

Open

yassin-berriai enabled auto-merge (squash) June 17, 2026 16:44

mateo-berri approved these changes Jun 17, 2026

yassin-berriai merged commit 39ab43c into litellm_internal_staging Jun 17, 2026
146 of 149 checks passed

yassin-berriai deleted the litellm_uvicorn_max_requests_jitter branch June 17, 2026 18:28
