17 changes: 17 additions & 0 deletions docs/proxy/cli.md
@@ -91,6 +91,23 @@ This page documents all command-line interface (CLI) arguments available for the
litellm
```

### --max_requests_before_restart_jitter
- **Default:** `None`
- **Type:** `int`
- Adds a random amount in `[0, jitter]` to `--max_requests_before_restart` for each worker so workers recycle at staggered request counts instead of all at once. Has no effect without `--max_requests_before_restart`.
- For uvicorn: maps to `limit_max_requests_jitter` (requires `uvicorn>=0.41.0`; on older versions the flag is ignored with a warning)
- For gunicorn: maps to `max_requests_jitter`
- **Usage:**
```shell
litellm --max_requests_before_restart 10000 --max_requests_before_restart_jitter 1000
```
- **Usage - set Environment Variable:** `MAX_REQUESTS_BEFORE_RESTART_JITTER`
```shell
export MAX_REQUESTS_BEFORE_RESTART=10000
export MAX_REQUESTS_BEFORE_RESTART_JITTER=1000
litellm
```
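
The arithmetic behind the flag can be sketched in bash. This is an illustration only: it assumes each worker draws one uniform integer in `[0, jitter]`, as Gunicorn documents for `max_requests_jitter`. The real draw happens inside Uvicorn or Gunicorn, and `restart_threshold` is a hypothetical helper, not part of LiteLLM.

```shell
#!/usr/bin/env bash
# Illustration only: the per-worker draw really happens inside Uvicorn/Gunicorn.
# restart_threshold is a hypothetical helper, not part of LiteLLM.

# Print the request count at which one worker would recycle:
# base threshold + a random integer in [0, jitter].
restart_threshold() {
  local base=$1 jitter=$2
  if [ "$jitter" -le 0 ]; then
    echo "$base"   # no jitter: every worker shares the same threshold
    return
  fi
  # $RANDOM is bash-only (0..32767); the modulo yields an integer in [0, jitter]
  echo $(( base + RANDOM % (jitter + 1) ))
}

# Same values as the usage example above: threshold 10000, jitter 1000
for worker in 1 2 3 4; do
  echo "worker $worker recycles after $(restart_threshold 10000 1000) requests"
done
```

Each worker lands somewhere between 10000 and 11000 requests, so the restarts spread over a 1000-request window per worker instead of all landing on the same count.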

## Server Backend Options

### --run_gunicorn
7 changes: 7 additions & 0 deletions docs/proxy/prod.md
@@ -124,6 +124,13 @@ When you run **multiple workers in one container** and rely on `--max_requests_b
CMD ["--port", "4000", "--config", "./proxy_server_config.yaml", "--num_workers", "4", "--run_gunicorn", "--max_requests_before_restart", "10000"]
```

When several workers boot together and serve similar traffic, they reach the request threshold at almost the same time and recycle in lockstep, taking several workers out of service at once. Add `--max_requests_before_restart_jitter` to offset each worker's threshold by a random amount in `[0, jitter]` so restarts are staggered instead of synchronized; for example, with `--max_requests_before_restart 10000` and a jitter of `1000`, each worker recycles after somewhere between 10000 and 11000 requests. The flag maps to Uvicorn's [`limit_max_requests_jitter`](https://uvicorn.dev/settings/#resource-limits) (requires `uvicorn>=0.41.0`) and Gunicorn's [`max_requests_jitter`](https://gunicorn.org/reference/settings/#max_requests_jitter), and has no effect without `--max_requests_before_restart`.

```shell
# Stagger recycling so workers don't all restart at once
CMD ["--port", "4000", "--config", "./proxy_server_config.yaml", "--num_workers", "4", "--run_gunicorn", "--max_requests_before_restart", "10000", "--max_requests_before_restart_jitter", "1000"]
```
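
To see why the jitter matters for four workers in one container, the following bash sketch prints the proxy-wide request number at which each worker would recycle. It assumes perfectly even round-robin traffic across workers (a simplification; real load is uneven) and a uniform per-worker draw in `[0, jitter]`; `recycle_points` is a hypothetical helper, not part of LiteLLM.

```shell
#!/usr/bin/env bash
# Illustration only: assumes perfectly even round-robin traffic across workers,
# which real load balancing only approximates. recycle_points is hypothetical.

# Print, for each worker, the proxy-wide request number at which it recycles.
# Args: number of workers, base threshold, jitter.
recycle_points() {
  local workers=$1 base=$2 jitter=$3 w t
  for (( w = 1; w <= workers; w++ )); do
    t=$base
    if [ "$jitter" -gt 0 ]; then
      # Per-worker threshold in [base, base + jitter] ($RANDOM is bash-only)
      t=$(( base + RANDOM % (jitter + 1) ))
    fi
    # Under round-robin, worker w serves its t-th request at global request (t-1)*workers + w
    echo $(( (t - 1) * workers + w ))
  done
}

echo "without jitter:"; recycle_points 4 10000 0
echo "with jitter 1000:"; recycle_points 4 10000 1000
```

Without jitter the four workers recycle on four consecutive requests (39997 through 40000), which is the lockstep capacity drop described above; with a jitter of 1000 the recycle points spread across a window of up to roughly 4000 proxy-wide requests.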

### 3c. Keep restarts hitless

A restart is "hitless" when in-flight requests finish before the process exits, so no client sees a dropped connection. Two cases matter in production: