fix: prevent completed step from being written to cancelled run#128
Conversation
Add status='leased' guard to persistStep and updateProgress, matching
the existing pattern in completeRun/failRun/renewLease. This closes the
race window where cancel() sets status='cancelled' but persistStep only
checked lease_generation.
Changes:
- persistStep INSERT...SELECT: add AND status = 'leased'
- persistStep step index UPDATE: add .where('status', '=', 'leased')
- updateProgress: add .where('status', '=', 'leased')
- context.ts: emit step:cancel event before throwing CancelledError
when persistStep returns null due to cancellation
- context.ts: update comments to reflect new guard semantics
- Add regression test: cancel between fn() completion and persistStep
Closes #121
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughステップ永続化にリース(status='leased')チェックを追加し、実行中のキャンセル競合を検出して Changes
Sequence Diagram(s)sequenceDiagram
participant Worker as Worker\n(ステップ実行)
participant Controller as Controller\n(cancel / lease/status)
participant Storage as Storage/DB\n(WHERE: id, lease_generation, status='leased')
participant Events as EventEmitter
rect rgba(200,220,255,0.5)
Worker->>Storage: persistStep(insert/update) with lease_generation
end
Controller->>Storage: set run.status='cancelled' (no lease_generation bump)
Controller->>Worker: abort signal
Worker->>Storage: persistStep continuation (re-check WHERE includes status='leased')
alt Storage match fails (run not 'leased' or lease lost)
Storage-->>Worker: 0 rows affected
Worker->>Events: emit step:cancel
Worker-->>Controller: throw CancelledError
else Storage match succeeds
Storage-->>Worker: success
Worker->>Events: emit step:fail / step:complete (with explicit type + error)
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip CodeRabbit can use your project's `biome` configuration to improve the quality of JS/TS/CSS/JSON code reviews.Add a configuration file to your project to customize how CodeRabbit runs |
- persistStep now rejects cancelled runs (returns null), so the isCancelled ternary in the savedStep-truthy path is unreachable. Simplified to unconditional step:fail emit. - Remove 'leased' as const casts — string literal is inferred correctly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
packages/durably/src/context.ts (1)
126-139:⚠️ Potential issue | 🟠 Major
persistStep()の失敗理由をAbortSignalだけで判定しているのが危険です。別インスタンス/別プロセスから
cancelRun()されたケースだと、persistStep()がstatus='cancelled'でnullを返してもcontroller.signal.abortedはfalseのままです。その場合ここはLeaseLostError側に倒れて、step:cancelも emit されません。savedStep === nullのときはgetRun()で現在のstatusを再確認するか、persistStep()から cancellation / lease-loss を区別できる結果を返したいです。Also applies to: 174-198
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/src/context.ts` around lines 126 - 139, When persistStep(...) returns null you must not assume controller.signal.aborted is the only reason; modify the code paths around persistStep (the block using persistStep, abortForLeaseLoss, throwIfAborted and the similar block at 174-198) to disambiguate cancellation vs lease loss by calling getRun(run.id) and checking run.status (or extend persistStep to return a discriminated result like {type: 'cancelled'|'lease-lost'|'ok', step}): if getRun shows status === 'cancelled' treat it as a cancellation (emit step:cancel and handle accordingly), otherwise treat it as lease loss (invoke abortForLeaseLoss/throw LeaseLostError). Ensure both occurrences use the same disambiguation logic so cancelRun-triggered cancellations are handled correctly.packages/durably/src/storage.ts (1)
931-955:⚠️ Potential issue | 🔴 CriticalUPDATE の実行結果を検証して、ガード条件失敗時に rollback すること。
INSERT ... SELECTの後にUPDATE durably_runs ... status = 'leased'が 0 行更新となった場合、トランザクション全体では成功するが、current_step_indexが進まずに step だけが残る可能性があります。ガード条件(status と lease_generation)が満たされなくなった場合の consistency を保つため、UPDATE のnumUpdatedRowsを検証し、0 だった場合はエラー throw で rollback させてください。修正イメージ
if (input.status === 'completed') { - await trx + const advanceResult = await trx .updateTable('durably_runs') .set({ current_step_index: input.index + 1, updated_at: completedAt, }) .where('id', '=', runId) .where('status', '=', 'leased' as const) .where('lease_generation', '=', leaseGeneration) - .execute() + .executeTakeFirst() + + if (Number(advanceResult.numUpdatedRows) === 0) { + throw new Error('persist-step-guard-failed') + } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/src/storage.ts` around lines 931 - 955, After performing the INSERT ... SELECT into durably_steps (insertResult), inspect the result of the subsequent trx.updateTable('durably_runs').set(...).where(...).execute() (assign it to e.g. updateResult) and if Number(updateResult.numUpdatedRows) === 0 throw an error to force the transaction to rollback; this ensures that if the guard conditions (status = 'leased' and matching lease_generation) no longer hold and the run row wasn't advanced, the step insert is not left inconsistent. Use the same descriptive context in the thrown error so logs show it came from the current_step_index advance step.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/durably/tests/shared/step.shared.ts`:
- Around line 674-730: テストはローカルの中断経路でしか動いておらず、persistStep() が DB 側で null
を返す分岐(savedStep === null / step:cancel)を検証していません。修正案:既存のケースを残しつつ新しいテストを追加して、1)
別の Durably インスタンスから同じ run を cancel() して外部キャンセル経路を作る(参照シンボル: createDurably,
d.cancel, jobs.raceDef.trigger, getRun, storage.getSteps)か、2) persistStep をスパイして
null を返すようにして DB ガード経路を強制する(参照シンボル: persistStep, context.ts の throwIfAborted および
savedStep === null 分岐)。どちらかを採用して、run.status が 'cancelled' であることと completed
ステップが増えないことをアサートしてください。
---
Outside diff comments:
In `@packages/durably/src/context.ts`:
- Around line 126-139: When persistStep(...) returns null you must not assume
controller.signal.aborted is the only reason; modify the code paths around
persistStep (the block using persistStep, abortForLeaseLoss, throwIfAborted and
the similar block at 174-198) to disambiguate cancellation vs lease loss by
calling getRun(run.id) and checking run.status (or extend persistStep to return
a discriminated result like {type: 'cancelled'|'lease-lost'|'ok', step}): if
getRun shows status === 'cancelled' treat it as a cancellation (emit step:cancel
and handle accordingly), otherwise treat it as lease loss (invoke
abortForLeaseLoss/throw LeaseLostError). Ensure both occurrences use the same
disambiguation logic so cancelRun-triggered cancellations are handled correctly.
In `@packages/durably/src/storage.ts`:
- Around line 931-955: After performing the INSERT ... SELECT into durably_steps
(insertResult), inspect the result of the subsequent
trx.updateTable('durably_runs').set(...).where(...).execute() (assign it to e.g.
updateResult) and if Number(updateResult.numUpdatedRows) === 0 throw an error to
force the transaction to rollback; this ensures that if the guard conditions
(status = 'leased' and matching lease_generation) no longer hold and the run row
wasn't advanced, the step insert is not left inconsistent. Use the same
descriptive context in the thrown error so logs show it came from the
current_step_index advance step.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1e1f5b43-76b4-46bd-b281-616088a3c2e7
📒 Files selected for processing (3)
packages/durably/src/context.tspackages/durably/src/storage.tspackages/durably/tests/shared/step.shared.ts
Use direct DB update to simulate external cancel (bypassing in-process abort signal), so the test verifies persistStep's status='leased' guard actually rejects the write — not just the throwIfAborted() fast path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/durably/src/context.ts`:
- Around line 126-128: The code treats savedStep === null as lease loss based
only on controller.signal.aborted, which misclassifies DB-side cancellations;
change the logic around storage.persistStep and the LeaseLostError path so that
when savedStep === null you re-query the authoritative DB state (e.g. call
storage.getRun(run.id) or storage.getStep(run.id, stepId)) and if that record is
marked cancelled prefer treating it as a cancellation (emit the
run:cancel/step:cancel flow) instead of throwing LeaseLostError; only fall back
to LeaseLostError when the DB shows the lease was actually reclaimed and not a
cancellation, and keep references to persistStep, savedStep, LeaseLostError,
controller.signal.aborted to locate and update the code.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8c82fd83-ed78-41f2-96db-05e13b794c44
📒 Files selected for processing (2)
packages/durably/src/context.tspackages/durably/src/storage.ts
When persistStep returns null, check the DB for the run's actual status to distinguish cancel from lease loss. This handles external cancellations (from another process) where the in-process abort signal isn't set. Extracted throwForRefusedStep() helper to eliminate 3x duplicated logic. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Fixes a race condition where
persistStep()could write a completed step to an already-cancelled run. The bug existed becausepersistSteponly checkedlease_generation, whilecancelRunchangesstatusbut notlease_generation.Root Cause
completeRun,failRun, andrenewLeaseall guard with bothstatus = 'leased'ANDlease_generation. ButpersistStepandupdateProgressonly checkedlease_generation, creating a window where cancellation goes unnoticed.Fix
Add
status = 'leased'guard topersistStep(INSERT and UPDATE) andupdateProgress, matching the existing pattern. Also properly emitstep:cancelevent when persistStep is correctly refused due to cancellation.Test
New regression test: creates a step where
fn()completes, cancel happens beforepersistStep, verifies no completed step is written.Closes #121
🤖 Generated with Claude Code
Found by Codex (GPT-5.4) code review
Summary by CodeRabbit
バグ修正
テスト