feat(redis-lock): Redis distributed lock for memory writes — Redis-first + file-lock fallback (C1/C2/H1/H2/H4/N5)#704
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: befe6cb245
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let redisLockManager: RedisLockManager | null = null; | ||
| let redisInitPromise: Promise<RedisLockManager | null> | null = null; |
There was a problem hiding this comment.
Move Redis lock declarations outside the class body
The new let declarations are placed directly inside MemoryStore's class body, which makes src/store.ts invalid TypeScript/ESM syntax and prevents the module from loading (writes fail before runtime logic is reached). This block needs to be at module scope (or converted to valid class members) so the package can be imported.
Useful? React with 👍 / 👎.
| const initPromise = (async () => { | ||
| try { | ||
| const mgr = await createRedisLockManager(); | ||
| if (mgr !== null) { | ||
| redisLockManager = mgr; // resolve 後寫入 cache,後續 caller 走 fast path |
There was a problem hiding this comment.
Assign init guard before starting Redis initialization
The compare-and-swap still races because initPromise is started before ownership is claimed in redisInitPromise; a losing concurrent caller can still create its own Redis manager and later overwrite redisLockManager, leaking an extra client and causing inconsistent cache state. Claim the shared promise first, then run initialization through that single promise.
Useful? React with 👍 / 👎.
| if (redisInitPromise !== null) { | ||
| return redisInitPromise; |
There was a problem hiding this comment.
Reset init promise after null manager initialization
When initialization returns null (for example, Redis is temporarily down), redisInitPromise remains permanently set and all future calls return that resolved null without retrying. This turns a transient outage into a permanent fallback to file locking until process restart, which defeats the Redis-first behavior under recovery scenarios.
Useful? React with 👍 / 👎.
4c63877 to
befe6cb
Compare
Codex 對抗辯論後追加修復(C1 / N5)C1 — 雙重執行問題(對抗後降級為效能問題)Codex 發現:T1 和 T2 都 start 辯論後結論:Codex 讓步 — 只有一個結果被 cache,沒有 orphan。但浪費是真實的。 修復:加 // T2: if init in progress, spin-wait
if (_initInProgress) {
while (_initInProgress) { await sleep(10); }
return redisLockManager; // 直接用 cache
}
_initInProgress = true;
try {
redisLockManager = await createRedisLockManager();
} finally {
_initInProgress = false;
}N5 — ioredis 死客戶端不回退(對抗後確認為真 bug)Codex 發現: 辯論後結論:Codex 確認這是真實 bug,需要轉為 修復: const isIoredisStoppedState =
errMsg.includes('Connection is closed') ||
errMsg.includes('Stream connection is closed') ||
errMsg.includes('is connecting') ||
errMsg.includes('is disconnected');
if (isRedisConnectionError(err) || isIoredisStoppedState) {
throw new RedisUnavailableError(`Redis connection failed: ${err}`);
}本地測試:8/8 ✅( |
rwmjhb
left a comment
There was a problem hiding this comment.
Thanks for the latest iteration. This version is in much better shape than the previous head: the install/import blocker is partially addressed and the syntax issue is gone. I am still requesting changes before merge for two remaining correctness/package issues.
First, ioredis is imported unconditionally from production code (src/redis-lock.ts, reached through src/store.ts), but it is declared under devDependencies. Consumers installing the published package without dev dependencies will still fail to load the store module before the Redis fallback logic can run. Please move ioredis to runtime dependencies, or make it a true optional/peer dependency with a lazy import and missing-module fallback.
Second, the Redis-to-file-lock fallback can split writers across two independent lock domains. If one process successfully holds the Redis lock and another process loses Redis connectivity, the second process falls back to runWithFileLockCore() and can enter the LanceDB write critical section under the file lock while the first process is still protected only by Redis. That reintroduces concurrent writes during partial Redis outages, which is exactly the class of failure this PR is trying to avoid. The fallback needs a design that does not allow Redis-locked and file-locked writers to run at the same time, or it should fail closed when Redis coordination is configured but unavailable.
Please also consider tightening the follow-up risks from the review: URL normalization currently drops path-selected Redis DBs, the Redis tests can pass without exercising failure paths when Redis is unavailable, and the global memory-write key serializes unrelated stores sharing the same Redis instance.
The direction is valuable, but the lock-domain issue and dependency classification need to be fixed before this is safe to merge.
- Add redis-lock.ts (new file) with isRedisConnectionError, RedisUnavailableError, RedisLockManager, createRedisLockManager - Add C1 compare-and-swap initPromise guard in getRedisLockManager() - Add H1 instanceof guard before Symbol.for check in runWithFileLock() - Add H2 depth=3 documentation in isRedisConnectionError() - Add H4 remove pre-flight ping, let first SET() fail naturally - Add N2 nodeTmpdir() function call fix - Extract runWithFileLockCore() as internal fallback for file lock
c067b31 to
6f69b7c
Compare
感謝 Maintainer Review — Issue 1 & 2 回應 + 修復程式碼已推送修復 commit Issue 1:
|
| 失敗模式 | 原因 | 處理方式 |
|---|---|---|
| Init failure | Redis 根本連不上 | createRedisLockManager() 回傳 null → file lock fallback(合理) |
| Runtime failure | Redis 執行中瞬斷 | acquire() 拋 RedisUnavailableError → 直接 throw(安全) |
store.ts 修復:
private async runWithFileLock<T>(fn: () => Promise<T>): Promise<T> {
- try {
- const mgr = await getRedisLockManager();
- if (mgr) {
- const release = await mgr.acquire("memory-write", 60000);
- try { return await fn(); }
- finally { await release(); }
- }
- } catch (err) {
- // M1: RedisUnavailableError 時進 file-lock fallback
- if (err instanceof RedisUnavailableError) {
- return this.runWithFileLockCore(fn);
- }
- if (err && typeof err === "object" && Symbol.for("RedisUnavailableError") in err) {
- return this.runWithFileLockCore(fn);
- }
- throw err;
+ const mgr = await getRedisLockManager();
+ if (mgr) {
+ // Redis lock manager 存在 → 用 Redis lock
+ // Runtime 中 Redis 瞬斷 → acquire() 拋出 RedisUnavailableError → 直接往上拋
+ const release = await mgr.acquire("memory-write", 60000);
+ try { return await fn(); }
+ finally { await release(); }
}
- return this.runWithFileLockCore(fn);
+ // Redis manager 不存在(init failure)→ file lock fallback(正常)
+ return this.runWithFileLockCore(fn);
}redis-lock.ts 修復(acquire() 開頭):
async acquire(key: string, ttl?: number): Promise<() => Promise<void>> {
+ if (!this.redis) {
+ throw new RedisUnavailableError("Redis client not initialized");
+ }防止 split brain 的效果:
正常狀態:
Process A — 持有 Redis lock(key="memory-write")→ 寫入 ✅
Redis 瞬斷後(修復後):
Process B — Redis 瞬斷 → acquire() 拋出 RedisUnavailableError → write fail ❌
Process A — 繼續用 Redis lock 寫,沒有人跟它搶 ✅
(沒有偷偷繞到 file lock,沒有 split brain)
Commit
6f69b7c — fix(redis-lock): dynamic import + Option E runtime fallback removal
…ce, Option E tests Issue 3 fix: parseRedisUrl() now uses URL API to extract hostname/port/db separately. DB selection (/0, /1, /2...) is preserved instead of being eaten by replace(). Added /^\d+$/ validation to reject non-numeric DB values (e.g. /abc) with warning log. Issue 4 fix: Lock key namespaced by dbPath hash (hashString). RedisLockManager stores _lockNamespace from config.dbPath. store.ts passes this.config.dbPath to getRedisLockManager(). Issue 5 fix: Test coverage for Option E runtime behavior. - acquire() throws RedisUnavailableError when redis is null - isHealthy() returns false when redis is null - Lock key namespace with dbPath works without crash Adversarial review by Claude Code confirmed: - Issue 3: NaN silent fallback bug found and fixed - Issue 4: hashString collision risk low (1.6M space); no module-level cache bug - Issue 5: Option E is a deliberate trade-off (consistency over availability)
Issues 3/4/5 修復 — Commit
|
rwmjhb
left a comment
There was a problem hiding this comment.
Thanks for iterating on this. I am going to close this PR rather than keep pushing on the current branch.
The Redis-lock direction is useful, but the current implementation still has core runtime/correctness blockers:
ioredisis still indevDependencies, so normal consumers will not install it and the Redis lock path will not activate in production installs.- Lock-domain selection can split inside one process. A caller that times out while Redis init is still in progress can fall back to the file lock, while a later caller in the same process uses Redis. That leaves concurrent writes protected by different lock domains.
- Redis URL parsing drops auth/TLS options and misparses legacy
host:portstyle values, so protected or TLS Redis deployments are likely to fail or connect incorrectly. - The full test run timed out, so the branch does not currently give us enough confidence for an invasive locking change.
Given the size of the branch and the remaining architectural blockers, please open a fresh, smaller PR if you want to continue this work. A good next version would put Redis runtime dependencies in dependencies, make lock-domain selection single-flight and consistent for the whole process, use robust Redis URL/config parsing, and include Redis success/failure tests that run in CI.
PR 拆分進度更新已根據 review 建議,將 #704 拆分為 3 個獨立的 PR。以下是目前的進度: ✅ PR-1(已建立)URL Parsing Fix + Lock Domain Single-Flight
⏳ PR-2(待建立)RedisLockManager 主體實作 ⏳ PR-3(待建立)整合測試 + CI 針對 review 的 4 個問題,PR-1 已處理的項目:
|
- package.json: test script 加入 redis-url-parsing.test.mjs - scripts/ci-test-manifest.mjs: storage-and-schema 群組加入 redis-url-parsing.test.mjs - scripts/verify-ci-test-manifest.mjs: EXPECTED_BASELINE 加入 redis-url-parsing.test.mjs
…dis-url-parsing.test.mjs Merge fix/pr704-redis-url-parsing-v2 with origin/master: - Add missing issue606_sdk-migration.test.mjs to verify baseline (master feature) - Correct ordering in ci-test: issue606 before issue680 before issue736 - Restore test/redis-url-parsing.test.mjs (PR CortexReach#704 test, dropped during rebase)
目標
為
memory-lancedb-pro實作 Redis 分散式鎖,解決高並發寫入時的 lock contention 問題。實作摘要
策略:Redis-first + file-lock fallback
RedisUnavailableError)時自動降級到proper-lockfile(本地 file lock)架構:
src/redis-lock.ts(231 行)createRedisLockManager()— 建立 ioredis client,失敗時返回nullRedisUnavailableError— 自訂錯誤類型isRedisConnectionError()— 遞迴檢查 error cause chain(depth=3)RedisLockManager— 封裝 Redis lock 邏輯(acquire/release/setTTL)src/store.ts(+82 行)getRedisLockManager()— module-level singleton,帶_initInProgressflagrunWithFileLock()— Redis-first wrapperrunWithFileLockCore()— file-lock fallback 實作nodeTmpdir()— N2 fix:正確的 function call 而非 property accessCodex adversarial review 修復(對抗辯論後)
初始修復(C1/H1/H2/H4)
getRedisLockManager()initPromise race_initInProgressflag 防止雙重 initSymbol.for跨 module boundary 脆弱instanceof作第一線,Symbol.for 降為 ESM-safe fallbackisRedisConnectionErrordepth=3 未文件化acquire()pre-flight ping → TOCTOU raceSET()自然失敗對抗辯論後追加修復(C1/N5)
createRedisLockManager()_initInProgressflag:T2 spin-wait 而非自己也 start initretryStrategy=null後,operation 拋非標準錯誤RedisUnavailableError否則 fallback 不觸發acquire()catch 補捉 "Connection is closed" 等錯誤訊息,轉為RedisUnavailableError設計亮點
_initInProgressflag(C1 final fix)雙重 guard 檢測 RedisUnavailableError
ioredis 死客戶端檢測(N5 fix)
驗證
node --check src/store.ts src/redis-lock.ts— TypeScript 編譯通過node --test test/redis-lock-error-types.test.ts— 6/6 通過node --test test/redis-lock-concurrent-init.test.ts— 2/2 通過本地測試環境: Redis 未啟動時會優雅降級到 file-lock,測試覆蓋:
getRedisLockManager()只建立一個 client(C1_initInProgressguard)RedisUnavailableError(N5 fix)檔案變更
src/redis-lock.tssrc/store.tstest/redis-lock-error-types.test.tstest/redis-lock-concurrent-init.test.tspackage.jsonpackage-lock.jsonRelated
Closes #662
Related: PR #703(前身,需廢棄)
Generated by OpenClaw — 2026-04-27