Check for existing issues
What happened?
We've been experiencing periodic instability in our LiteLLM proxy pod. The issue manifests as the Prisma query engine process crashing:
13:50:30 - LiteLLM Proxy:WARNING: utils.py:4178 - Attempting Prisma DB reconnect. reason=db_health_watchdog_connection_error
13:50:30 - LiteLLM Proxy:ERROR: utils.py:3918 - prisma-query-engine PID 71 exited (waitpid thread); triggering reconnect.
Once the query engine crashes, all subsequent database calls fail immediately:
prisma.errors.ClientNotConnectedError: Client is not connected to the query engine, you must call `connect()` before attempting to query data.
LiteLLM detects the persistent failures and performs a clean shutdown, triggering a Kubernetes pod restart. This restart cycle repeated 4 times before the reconnect finally succeeded and the pod reached a stable state on the 5th attempt.
We have two questions:
- Is this a known bug in LiteLLM or Prisma, or is this expected behavior under certain conditions?
- Are there any configuration changes (e.g., connection pool tuning, reconnect retry settings, keepalive parameters) we can apply to either prevent the query engine from crashing or make the reconnect more resilient so it succeeds on the first attempt rather than requiring multiple pod restarts?
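To make the second question concrete, here is a rough sketch of the in-process behavior we are hoping for: retry the reconnect with exponential backoff instead of shutting down after the first failures. This is a generic illustration, not LiteLLM's or Prisma's actual reconnect code. The function `connect_with_backoff`, its parameters, and the `connect` callable are hypothetical names chosen for this example.

```python
import asyncio
import random
from typing import Awaitable, Callable


async def connect_with_backoff(
    connect: Callable[[], Awaitable[None]],  # hypothetical: any async reconnect callable
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 10.0,
) -> int:
    """Call `connect` until it succeeds and return the number of attempts used.

    Re-raises the last error once `max_attempts` is exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        try:
            await connect()
            return attempt
        except Exception:
            if attempt == max_attempts:
                raise
            # Exponential backoff capped at max_delay, with jitter so that
            # several workers do not retry at the same instant.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            await asyncio.sleep(delay * random.uniform(0.5, 1.0))

    raise RuntimeError("unreachable")  # loop always returns or raises
```

In our incident the reconnect only succeeded on the 5th pod restart, so something like `max_attempts=5` inside the process would have avoided the restarts, assuming the query engine can be respawned without restarting the pod.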
Steps to Reproduce
The issue occurs intermittently and is not easy to reproduce.
Relevant log output
What part of LiteLLM is this about?
Proxy
What LiteLLM version are you on?
1.83.10-stable
Twitter / LinkedIn details
No response