
feat(proxy): add --max_requests_before_restart_jitter to stagger worker restarts #30601

Merged
yassin-berriai merged 1 commit into litellm_internal_staging from litellm_uvicorn_max_requests_jitter
Jun 17, 2026

Conversation

@yassin-berriai

@yassin-berriai yassin-berriai commented Jun 17, 2026

Contributor

Relevant issues

Closes #24401 (the original community report and PR #24405 cover the same flag)

Linear ticket

Resolves LIT-3774

Problem

Setting --max_requests_before_restart alone recycles every worker at almost the same time once they have served a similar number of requests. Under sustained or bursty load that drops a whole pod's worth of capacity at once; one customer saw all containers in a pod terminate together roughly every 7 to 10 days. The standard mitigation is jitter, and both uvicorn (limit_max_requests_jitter) and gunicorn (max_requests_jitter) support it, but LiteLLM did not expose it.

Changes

Adds a --max_requests_before_restart_jitter CLI flag (env MAX_REQUESTS_BEFORE_RESTART_JITTER). Each worker adds a random amount in [0, jitter] to its restart threshold so workers recycle at different request counts instead of in lockstep. It maps to uvicorn's limit_max_requests_jitter and gunicorn's max_requests_jitter.
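
To make the "[0, jitter]" rule concrete, here is a minimal sketch of how per-worker thresholds could be drawn. It is illustrative only and is not the code of uvicorn, gunicorn, or LiteLLM; the function names `restart_threshold` and `worker_thresholds` are hypothetical.

```python
import random
from typing import List, Optional


def restart_threshold(max_requests: int, jitter: int, rng: random.Random) -> int:
    """Return one worker's threshold: the base plus a random amount in [0, jitter]."""
    if jitter <= 0:
        # No jitter configured: every worker shares the same threshold.
        return max_requests
    return max_requests + rng.randint(0, jitter)  # randint is inclusive on both ends


def worker_thresholds(
    num_workers: int, max_requests: int, jitter: int, seed: Optional[int] = None
) -> List[int]:
    """Draw an independent threshold for each worker (hypothetical helper)."""
    rng = random.Random(seed)
    return [restart_threshold(max_requests, jitter, rng) for _ in range(num_workers)]


if __name__ == "__main__":
    # Base 1000 with jitter 50: each worker lands somewhere in 1000..1050.
    print(worker_thresholds(num_workers=4, max_requests=1000, jitter=50))
```

With a jitter of 0 all workers get the same threshold, which is the lockstep behavior described under Problem.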

uvicorn only gained limit_max_requests_jitter in 0.41.0, while LiteLLM still allows uvicorn>=0.33.0. Rather than passing the kwarg unconditionally (which raises TypeError on 0.33 through 0.40), the uvicorn path feature-detects the parameter from uvicorn.Config's signature, the same way the existing --timeout_worker_healthcheck flag does, and prints a clear "requires uvicorn>=0.41.0" warning instead of crashing. The flag has no effect without --max_requests_before_restart, which is warned about on both the uvicorn and gunicorn paths. Granian and hypercorn do not support a per-request recycle limit, so the flag is intentionally not threaded there (the granian path already warns that --max_requests_before_restart itself is unsupported).
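
The feature-detection idea can be sketched with `inspect.signature`. The snippet below is a sketch under stated assumptions, not the code in proxy_cli.py: `OldConfig` and `NewConfig` are hypothetical stand-ins for an older and a newer `uvicorn.Config`, and `accepts_kwarg` and `jitter_kwargs` are illustrative helper names.

```python
import inspect
from typing import Any, Dict, Optional, Tuple

JITTER_KWARG = "limit_max_requests_jitter"


def accepts_kwarg(func: Any, name: str) -> bool:
    """True if the callable's signature declares a parameter called `name`."""
    return name in inspect.signature(func).parameters


class OldConfig:
    """Hypothetical stand-in for a Config class that predates the jitter parameter."""

    def __init__(self, app: str, limit_max_requests: Optional[int] = None) -> None:
        self.app = app


class NewConfig:
    """Hypothetical stand-in for a Config class that accepts the jitter parameter."""

    def __init__(
        self,
        app: str,
        limit_max_requests: Optional[int] = None,
        limit_max_requests_jitter: int = 0,
    ) -> None:
        self.app = app


def jitter_kwargs(config_cls: type, jitter: int) -> Tuple[Dict[str, int], Optional[str]]:
    """Return (kwargs to forward, optional warning) instead of risking a TypeError."""
    if accepts_kwarg(config_cls.__init__, JITTER_KWARG):
        return {JITTER_KWARG: jitter}, None
    # Illustrative warning text; the real message is shown in the proof section.
    return {}, f"{JITTER_KWARG} is not supported by {config_cls.__name__}; ignoring it."


if __name__ == "__main__":
    print(jitter_kwargs(OldConfig, 50))
    print(jitter_kwargs(NewConfig, 50))
```

Checking the signature rather than comparing version strings keeps the check tied to what the installed class actually accepts.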

Type

🆕 New Feature

Screenshots / Proof of Fix

All runs use a real proxy launched via python litellm/proxy/proxy_cli.py against a real Postgres 16 (no mocks). This environment ships uvicorn 0.33.0 and gunicorn 23.0.0, so it exercises both the older-uvicorn fallback and the live gunicorn recycle behavior the customer relies on.

  1. Before the change, on litellm_internal_staging, the flag does not exist:
$ python litellm/proxy/proxy_cli.py --max_requests_before_restart_jitter 50 --local
Usage: proxy_cli.py [OPTIONS] [CLI_ARGS]...
Try 'proxy_cli.py --help' for help.

Error: No such option: --max_requests_before_restart_jitter Did you mean --max_requests_before_restart?
  2. With the change the flag is wired into the CLI:
$ python litellm/proxy/proxy_cli.py --help | grep -A6 max_requests_before_restart_jitter
  --max_requests_before_restart_jitter INTEGER
                                  Stagger worker restarts by adding a random
                                  amount in [0, jitter] to
                                  --max_requests_before_restart so workers do
                                  not recycle at the same time (uvicorn:
                                  limit_max_requests_jitter, requires
                                  uvicorn>=0.41.0; gunicorn: ...
  3. uvicorn path on uvicorn 0.33.0 (below the 0.41.0 floor); the proxy boots normally and degrades gracefully instead of crashing:
$ litellm --config config.yaml --max_requests_before_restart 1000 --max_requests_before_restart_jitter 50
LiteLLM Proxy: --max_requests_before_restart_jitter requires uvicorn>=0.41.0, but installed uvicorn==0.33.0. Ignoring the flag.
INFO:     Started server process [67612]
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:4000 (Press CTRL+C to quit)

$ curl -s http://localhost:4000/health/readiness
{"status": "healthy", "db": "connected"}
  4. gunicorn path end to end. Two workers, base threshold 20, hammered with sequential curl http://localhost:4000/health/readiness, recording the cumulative request count at which a worker recycled (a new Booting worker line appeared in the gunicorn log).

Without jitter the two workers recycle in lockstep, which is the reported failure mode:

$ litellm --config config.yaml --run_gunicorn --num_workers 2 --max_requests_before_restart 20
[nojit] recycle at cumulative request #41 (+1 worker booted)
[nojit] recycle at cumulative request #42 (+1 worker booted)
[nojit] recycle at cumulative request #83 (+2 worker booted)

With --max_requests_before_restart_jitter 40 the same workload spreads the restarts out so they no longer coincide:

$ litellm --config config.yaml --run_gunicorn --num_workers 2 --max_requests_before_restart 20 --max_requests_before_restart_jitter 40
[jit40] recycle at cumulative request #91 (+1 worker booted)
[jit40] recycle at cumulative request #118 (+1 worker booted)
[jit40] recycle at cumulative request #158 (+1 worker booted)
[jit40] recycle at cumulative request #172 (+1 worker booted)
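
The lockstep versus staggered pattern can also be reproduced with a toy model. The sketch below assumes strict round-robin request distribution, which real gunicorn workers do not guarantee, so it will not reproduce the exact request numbers above; `simulate_recycles` is a hypothetical name.

```python
import random
from typing import List, Tuple


def simulate_recycles(
    num_workers: int, max_requests: int, jitter: int, total_requests: int, seed: int = 0
) -> List[Tuple[int, int]]:
    """Return (cumulative request number, worker index) for every simulated recycle."""
    rng = random.Random(seed)

    def draw_limit() -> int:
        # Each (re)started worker draws a fresh threshold in [base, base + jitter].
        return max_requests + (rng.randint(0, jitter) if jitter > 0 else 0)

    served = [0] * num_workers
    limits = [draw_limit() for _ in range(num_workers)]
    recycles = []
    for n in range(1, total_requests + 1):
        worker = (n - 1) % num_workers  # assumption: strict round-robin dispatch
        served[worker] += 1
        if served[worker] >= limits[worker]:
            recycles.append((n, worker))
            served[worker] = 0
            limits[worker] = draw_limit()
    return recycles


if __name__ == "__main__":
    print("no jitter:", simulate_recycles(2, 20, 0, 100))
    print("jitter 40:", simulate_recycles(2, 20, 40, 200))
```

Without jitter the two simulated workers recycle on consecutive requests; with jitter each worker's threshold is redrawn after every restart, so the recycle points drift apart.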

Pre-Submission checklist

  • I have added meaningful tests
  • My PR passes all unit tests
  • My PR's scope is as isolated as possible; it only solves 1 specific problem

@CLAassistant


CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@codecov

codecov Bot commented Jun 17, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.


@yassin-berriai yassin-berriai marked this pull request as ready for review June 17, 2026 15:43
@greptile-apps

greptile-apps Bot commented Jun 17, 2026

Contributor

Greptile Summary

Adds --max_requests_before_restart_jitter (env MAX_REQUESTS_BEFORE_RESTART_JITTER) to stagger worker recycle times so workers don't all restart in lockstep once they reach the same request count.

  • uvicorn path: uses inspect.signature(uvicorn.Config.__init__) to feature-detect limit_max_requests_jitter (requires ≥0.41.0) and degrades gracefully with a clear version warning on older installs, following the same pattern already used by --timeout_worker_healthcheck.
  • gunicorn path: forwards the value directly as max_requests_jitter; both paths warn and no-op when the base --max_requests_before_restart flag is absent.
  • Six new mock-only unit tests cover the uvicorn happy-path, old-uvicorn fallback, gunicorn options forwarding, and both "no base flag" warning branches.
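
The "warn and no-op when the base flag is absent" rule on the gunicorn path can be sketched as follows. `max_requests` and `max_requests_jitter` are real gunicorn settings; the helper name `gunicorn_recycle_options` and the warning text are assumptions for illustration, not the code or messages in proxy_cli.py.

```python
from typing import Dict, List, Optional, Tuple


def gunicorn_recycle_options(
    max_requests: Optional[int], jitter: Optional[int]
) -> Tuple[Dict[str, int], List[str]]:
    """Build the recycle-related gunicorn options plus any warnings to print."""
    options: Dict[str, int] = {}
    warnings: List[str] = []

    if max_requests is not None:
        options["max_requests"] = max_requests

    if jitter is not None:
        if max_requests is None:
            # Jitter alone does nothing, so warn and do not forward it.
            warnings.append(
                "--max_requests_before_restart_jitter has no effect without "
                "--max_requests_before_restart; ignoring it."  # hypothetical wording
            )
        else:
            options["max_requests_jitter"] = jitter

    return options, warnings


if __name__ == "__main__":
    print(gunicorn_recycle_options(20, 40))
    print(gunicorn_recycle_options(None, 40))
```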

Confidence Score: 5/5

Safe to merge — the change is additive, opt-in, and falls back gracefully on older uvicorn versions.

The implementation is a straightforward additive CLI flag that mirrors an existing pattern (--timeout_worker_healthcheck) already in the codebase. The early-return guard for the missing-base-flag case and the feature-detection for old uvicorn versions are both correct. The gunicorn path applies the option only when the base flag is also set. No existing behavior is altered.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| litellm/proxy/proxy_cli.py | Adds --max_requests_before_restart_jitter CLI flag with correct feature-detection for uvicorn>=0.41.0 (same pattern as timeout_worker_healthcheck), early-return guard when base flag is absent, and clean forwarding to gunicorn's max_requests_jitter. No logic errors found. |
| tests/test_litellm/proxy/test_proxy_cli.py | Adds six focused unit tests covering: uvicorn happy-path, gunicorn happy-path, gunicorn options dict, both "no base flag" warning paths, old-uvicorn version-detection fallback. All tests use mocks only (no real network calls), consistent with existing test style. |

Reviews (4): Last reviewed commit: "feat(proxy): add --max_requests_before_r..."

Comment thread litellm/proxy/proxy_cli.py
@yassin-berriai

Contributor Author

@greptileai

@yassin-berriai

Contributor Author

@greptileai the warning/forward inconsistency you flagged is fixed in 19de008; the jitter kwarg is no longer forwarded when --max_requests_before_restart is unset, on both the uvicorn and gunicorn paths. Mind taking another look?

…er restarts

Setting --max_requests_before_restart alone recycles every worker at almost the
same time once they have served a similar number of requests, which under
sustained load can drop a whole pod's capacity at once roughly every 7-10 days.

This exposes a jitter knob that adds a random amount in [0, jitter] to the
restart threshold per worker so restarts are staggered. It maps to uvicorn's
limit_max_requests_jitter and gunicorn's max_requests_jitter. uvicorn only
gained limit_max_requests_jitter in 0.41.0 while litellm still allows
uvicorn>=0.33.0, so the uvicorn path feature-detects the parameter via the
Config signature and warns instead of crashing on older versions. The flag has
no effect without --max_requests_before_restart, so the kwarg is not forwarded
in that case and a warning is printed on both the uvicorn and gunicorn paths.

Resolves LIT-3774
@yassin-berriai yassin-berriai force-pushed the litellm_uvicorn_max_requests_jitter branch from 19de008 to 0aaa1df Compare June 17, 2026 16:02
@yassin-berriai

Contributor Author

@greptileai squashed to a single commit and rebased onto the latest litellm_internal_staging in 0aaa1df; no content change from the 5/5 review, just history cleanup. Please re-confirm on the new HEAD.

@yassin-berriai

Contributor Author

CI note: the only red check is ci/circleci: llm_translation_testing, which is failing on the latest litellm_internal_staging HEAD (cee6c9c) and on other open PRs (for example #30659) independently of this change; this PR only adds a CLI flag in proxy_cli.py and cannot affect provider translation. The proxy-runtime / Run tests entry was a cancelled duplicate run, not a failure (the active run and its coverage upload passed). Greptile is 5/5 on the current HEAD.

Docs for the new flag: BerriAI/litellm-docs#367

@yassin-berriai yassin-berriai enabled auto-merge (squash) June 17, 2026 16:44

@mateo-berri mateo-berri left a comment

Collaborator

LGTM; thanks!

@yassin-berriai yassin-berriai merged commit 39ab43c into litellm_internal_staging Jun 17, 2026
146 of 149 checks passed
@yassin-berriai yassin-berriai deleted the litellm_uvicorn_max_requests_jitter branch June 17, 2026 18:28

Development

Successfully merging this pull request may close these issues.

[Feature]: Expose --limit-max-requests-jitter flag of uvicorn with LiteLLM

3 participants