Commit 9d6bbb8

Author: Project Team
Create fresh ollama.Client per request to avoid stale connection hangs
A persistent httpx client reuses connections. After a streaming response completes, the underlying HTTP/1.1 connection can be left in a state where the server has closed it but the client hasn't detected that yet. The next request then hangs silently until the read timeout fires, holding the flock the entire time and starving every subsequent request.

Create a new ollama.Client (and therefore a fresh httpx connection) for each inference call. The per-request overhead is negligible compared to the 10s inference time.
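In isolation, the per-request pattern looks like this: a minimal, self-contained sketch assuming the `ollama` Python library and a reachable server. The function name, prompt handling, and 120s default timeout here are illustrative, not taken from the actual codebase.

```python
import httpx
import ollama

def extract_once(host: str, model: str, prompt: str, timeout: float = 120.0) -> str:
    # A fresh Client means a fresh httpx connection pool, so a connection
    # the server half-closed after an earlier streaming response can never
    # be picked up and silently reused here.
    client = ollama.Client(
        host=host,
        timeout=httpx.Timeout(timeout=timeout, connect=10.0),
    )
    chunks = []
    for chunk in client.chat(
        model=model,
        messages=[{'role': 'user', 'content': prompt}],
        stream=True,
    ):
        chunks.append(chunk['message']['content'])
    return ''.join(chunks)
```

Each call pays only connection setup, which is dwarfed by the inference time itself.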
1 parent 1ebec5c

1 file changed: app/ocr_backends.py (10 additions, 5 deletions)
```diff
@@ -93,10 +93,11 @@ def __init__(self, model: str = "llama3.2-vision", host: str = "http://localhost
             import httpx
             import ollama
             self.ollama = ollama
-            self._client = ollama.Client(
-                host=host,
-                timeout=httpx.Timeout(timeout=float(timeout), connect=10.0),
-            )
+            self._httpx_timeout = httpx.Timeout(timeout=float(timeout), connect=10.0)
+            # Do NOT create a persistent _client here. A long-lived httpx client
+            # can end up with a stale/broken connection after a streaming response
+            # completes, causing subsequent requests to hang silently. We create
+            # a fresh client per request in _do_extract() instead.
         except ImportError:
             self._is_available = False
             self._availability_error = "ollama Python library not installed. Install with: pip install ollama"
@@ -271,7 +272,11 @@ def _do_extract(self, image_path: str, start_time: float) -> Dict[str, Any]:
         # streaming because the runner will still abort cleanly on pipe
         # breaks regardless of the keep_alive setting.
         chunks = []
-        for chunk in self._client.chat(
+        client = self.ollama.Client(
+            host=self.host,
+            timeout=self._httpx_timeout,
+        )
+        for chunk in client.chat(
             model=self.model,
             messages=[{
                 'role': 'user',
```
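A note on the design choice: a similar effect could in principle be achieved by keeping one long-lived client and disabling connection reuse, since the ollama Python client forwards extra keyword arguments to the underlying httpx.Client. A hedged sketch, assuming that forwarding behavior holds in the installed version:

```python
import httpx
import ollama

# Alternative sketch (not what this commit does): keep one persistent
# client but forbid keep-alive, so every request opens a fresh TCP
# connection anyway. Assumes ollama.Client passes `limits` through to
# httpx.Client, and uses the conventional default host; verify both
# against the installed ollama library version.
client = ollama.Client(
    host="http://localhost:11434",
    timeout=httpx.Timeout(timeout=120.0, connect=10.0),
    limits=httpx.Limits(max_keepalive_connections=0),
)
```

The commit's per-request client gets there without relying on that forwarding, and the extra connection setup is negligible next to the 10s inference time.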
