You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
…where fn is _sync_store / _sync_store_batch / _sync_update, and those functions call self._embedding_provider.embed(...)inside the lock (lines 1121, 1187, 1392, 1763, 1925, 3088).
The existing TODO acknowledges it:
Throughput on parallel store operations is reduced to one-at-a-time; a future optimisation could narrow the lock to the SQL portion of each _sync_* method (releasing it across the embedding network call) but correctness comes first.
distillery_status calls three locked operations: count_entries, list_feed_sources, probe_readiness (which also goes through _run_sync at line 1034). So when one slow embedding call holds the lock, status probes queue behind it three times over and clients see "server is dead."
distillery_status calls timed out repeatedly while writes continued to (slowly) complete.
Server remained unresponsive to status probes for >5 minutes after writes stopped.
Fix
Two-stage:
Stage 1 — split health from worker pool (small change, big win):
Make probe_readiness and count_entries use a second, read-only DuckDB connection that is not behind _conn_lock. DuckDB supports multiple concurrent read-only connections to the same file.
distillery_status then completes in ms regardless of write contention.
Stage 2 — release lock across embedding call (what the TODO says):
In _sync_store / _sync_store_batch / _sync_update, compute embeddings before acquiring the SQL portion of the lock. Pattern:
The SQL critical section then takes the lock for milliseconds, not seconds.
Pre-existing thread-safety invariant for DuckDBPyConnection is preserved because the lock still covers all SQL.
Stage 1 alone fixes the "looks dead" symptom. Stage 2 fixes the underlying throughput issue.
Acceptance
distillery_status returns within 1s while 10 concurrent embedding-bound writes are in flight.
Stage 2: end-to-end p95 latency on parallel distillery_store drops by ≥3x on a slow-provider stub.
Integration test: spawn 20 concurrent stores backed by a provider that sleeps 500ms per embed; assert total wall-time < 20 × 500ms (i.e. some parallelism).
Root cause
DuckDBStore._run_syncholds a single globalasyncio.Lockacross the entire sync function, including the embedding HTTP round-trip:src/distillery/store/duckdb.py:976:…where
fnis_sync_store/_sync_store_batch/_sync_update, and those functions callself._embedding_provider.embed(...)inside the lock (lines 1121, 1187, 1392, 1763, 1925, 3088).The existing TODO acknowledges it:
distillery_statuscalls three locked operations:count_entries,list_feed_sources,probe_readiness(which also goes through_run_syncat line 1034). So when one slow embedding call holds the lock, status probes queue behind it three times over and clients see "server is dead."Repro
In one session today:
distillery_storecalls fired (embedding-heavy).distillery_statuscalls timed out repeatedly while writes continued to (slowly) complete.Fix
Two-stage:
Stage 1 — split health from worker pool (small change, big win):
probe_readinessandcount_entriesuse a second, read-only DuckDB connection that is not behind_conn_lock. DuckDB supports multiple concurrent read-only connections to the same file.distillery_statusthen completes in ms regardless of write contention.Stage 2 — release lock across embedding call (what the TODO says):
_sync_store/_sync_store_batch/_sync_update, compute embeddings before acquiring the SQL portion of the lock. Pattern:DuckDBPyConnectionis preserved because the lock still covers all SQL.Stage 1 alone fixes the "looks dead" symptom. Stage 2 fixes the underlying throughput issue.
Acceptance
distillery_statusreturns within 1s while 10 concurrent embedding-bound writes are in flight.distillery_storedrops by ≥3x on a slow-provider stub.References
src/distillery/store/duckdb.py:943-995(_run_sync— the single global lock)src/distillery/store/duckdb.py:1017-1039(probe_readiness— goes through the same lock)src/distillery/store/duckdb.py:1121, 1187, 1392, 1763, 1925, 3088(embedding calls inside the locked section)src/distillery/mcp/tools/meta.py:120-152(status handler calling three locked store operations)