Skip to content

Phase 1: Lease-based runtime model#101

Merged
coji merged 33 commits into
mainfrom
phase1-runtime-exploration
Mar 8, 2026
Merged

Phase 1: Lease-based runtime model#101
coji merged 33 commits into
mainfrom
phase1-runtime-exploration

Conversation

@coji
Copy link
Copy Markdown
Owner

@coji coji commented Mar 8, 2026

Summary

  • Implement lease-based runtime model (leased status, lease_owner, lease_expires_at) with processOne() / processUntilIdle() APIs
  • Add PostgreSQL-specific claim path using FOR UPDATE SKIP LOCKED + pg_advisory_xact_lock for atomicity and concurrency-key serialization
  • Split storage into QueueStore / CheckpointStore with per-backend adapter tests (PostgreSQL, SQLite, libSQL, SQLocal)
  • Fix PostgreSQL double-claim bug (CTE MATERIALIZED snapshot isolation) and concurrency-key race (advisory lock + retry loop)
  • Rewrite all Japanese RFC translations for readability

Key design decisions

  • Advisory locks for concurrency keys: FOR UPDATE SKIP LOCKED alone can't prevent two workers from claiming different rows with the same key. Advisory locks serialize per key; retry loop ensures unrelated work isn't suppressed.
  • No SERIALIZABLE: Advisory locks are targeted (only affect concurrency-key rows) vs SERIALIZABLE which can cause spurious conflicts across unrelated claims.
  • CTE removal: PostgreSQL MATERIALIZED CTEs use independent snapshots, causing the same rows to be visible to concurrent transactions.

Test plan

  • pnpm validate passes
  • All PostgreSQL tests pass (15/15, including previously skipped skipDirectConcurrentClaimRace)
  • All SQLite/libSQL/SQLocal tests pass
  • Stale-owner e2e tests pass

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • PostgreSQL dev environment via Docker Compose and new db start/stop scripts
    • Browser-local singleton detection and per-tab init with warning
    • New runtime methods for single-run and batch processing: processOne() and processUntilIdle()
  • Architecture & API Updates

    • Unified Store model and lease-based ownership; status renamed from "running" → "leased"
    • Event renamed: run:start → run:leased
    • Config keys renamed: pollingInterval → pollingIntervalMs; heartbeatInterval → leaseRenewIntervalMs; added leaseMs and preserveSteps
  • React Updates

    • Hook/property rename: isRunning → isLeased
  • Tests & Tooling

    • Added extensive cross-backend concurrency/semantics/stress tests and dialect helpers
  • Chores

    • Added npm scripts for DB workflows and test adjustments for Postgres support

coji and others added 4 commits March 8, 2026 08:39
Replace the MATERIALIZED CTE approach with a 3-step claim pattern:
SELECT FOR UPDATE SKIP LOCKED → pg_advisory_xact_lock → UPDATE.
This fixes both the CTE snapshot isolation bug (double-claiming the
same row) and the concurrency-key race (two workers claiming different
rows with the same key). Enable the previously skipped direct
concurrent claim test for PostgreSQL. Update RFC docs accordingly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When advisory lock re-verification detects a concurrency-key conflict,
exclude that key and loop to find the next eligible candidate within
the same transaction. This prevents premature idle when unrelated
pending work is still claimable. Also tighten RFC wording that
overstated what was proven.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rewrite all 7 Japanese RFC documents from scratch based on the
English originals. Key improvements: consistent です/ます style,
natural Japanese phrasing instead of literal translations, unified
technical terminology, relative link paths, and Phase 1 advisory
lock findings reflected in database-claim-patterns-ja.md.
Also include minor English doc updates for database-runtime-fit.md
and phase1-exploration-notes.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@vercel
Copy link
Copy Markdown

vercel Bot commented Mar 8, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
durably-demo Ready Ready Preview Mar 8, 2026 8:15am

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 8, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Walkthrough

Implements a lease-based runtime and storage rearchitecture: replaces heartbeat/running semantics with lease/leased, introduces a unified public Store interface and queue primitives (processOne/processUntilIdle), adds Postgres test/CI support and migrations, and updates events, tests, docs, React bindings, examples, and test helpers accordingly.

Changes

Cohort / File(s) Summary
Docker & CI / NPM
docker-compose.postgres.yml, .github/workflows/ci.yml, package.json, packages/durably/package.json
Add Postgres 16-alpine service with healthcheck, CI PostgreSQL integration, db:up/db:down scripts and Postgres-focused test scripts; update built deps.
Runtime RFCs & Docs
docs/rfcs/runtime-rearchitecture/..., docs/rfcs/runtime-rearchitecture/ja/*
Large RFC and localization updates: introduce unified Store<TLabels>, leaseGeneration fencing, processOne/processUntilIdle as portable contract, claim/lease patterns, Phase 1 exploration notes, and deployment × DB matrix.
Schema & Migrations
packages/durably/src/schema.ts, packages/durably/src/migrations.ts
Schema evolution: add lease_owner, lease_expires_at, lease_generation, updated_at; replace running with leased; bump LATEST_SCHEMA_VERSION and add migrations to migrate legacy heartbeat data.
Storage API & Implementation
packages/durably/src/storage.ts, packages/durably/src/index.ts
Introduce RunStatus, lease fields on Run, unify queue/checkpoint into exported Store<TLabels>, rename methods (createRunenqueue, batchCreateRunsenqueueMany, claimNextPendingRunclaimNext), add createKyselyStore and checkpoint helpers, add lease-managed operations (renewLease, completeRun, persistStep, etc.).
Core Runtime & Worker
packages/durably/src/durably.ts, packages/durably/src/worker.ts, packages/durably/src/context.ts
Add queue-driven APIs (processOne, processUntilIdle), browser-local singleton/tab-lock handling, lease renewal loop, lease-loss abort (LeaseLostError), StepContext cooperative cancellation and abortLeaseOwnership, plus worker simplification delegating to processOne.
Events & Errors
packages/durably/src/events.ts, packages/durably/src/errors.ts, packages/durably/src/index.ts
Replace run:start with run:leased/RunLeasedEvent (add leaseOwner/leaseExpiresAt), add run:lease-renewed, export LeaseLostError, update public exports for storage types and store factory.
Tests & Test Helpers
packages/durably/tests/helpers/*, packages/durably/tests/shared/*, packages/durably/tests/node/*, packages/durably/tests/browser/*
Add Postgres/SQLite/libSQL/browser dialect helpers, new local-sqlite/better-sqlite3 support, reusable concurrency/semantics/stress test harnesses, many node/browser E2E tests validating leases, reclaim, migrations, and processUntilIdle semantics.
React bindings & Hooks
packages/durably-react/src/**, packages/durably-react/tests/**
Public API rename isRunningisLeased, status unions include leased, SSE/event handling to run:leased, auto-resume prefers leased, and polling option renames to pollingIntervalMs (plus lease config options documented).
Examples & Website Docs
examples/**, website/**, packages/durably-react/README.md, CLAUDE.md
Update examples and docs to lease-based lifecycle, new option names (pollingIntervalMs, leaseRenewIntervalMs, leaseMs, preserveSteps), event/status renames, and updated API docs and guides.
Tests / Test Configs & Shared Test updates
packages/durably/tests/shared/*, multiple test files
Refactor test suites and helpers to assert lease semantics, rename API usages in tests (createRunenqueue, createSteppersistStep), update polling option names in helpers, and add migration/step-preservation tests.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Runtime as Durably Runtime
    participant Store as Unified Store
    participant Worker as Worker (processOne)
    participant Checkpoint as Checkpoint/Steps

    Client->>Runtime: trigger(input)
    Runtime->>Store: enqueue(input)

    loop Worker processing
        Worker->>Store: claimNext(workerId, now, leaseMs)
        Note over Store: Acquire lease (leaseGeneration, leaseOwner, leaseExpiresAt)
        Store-->>Worker: Run (status: leased)
        Worker->>Checkpoint: getSteps(runId)
        Checkpoint-->>Worker: Steps[]
        Worker->>Worker: execute job (StepContext with leaseGeneration)
        Worker->>Store: persistStep(runId, leaseGeneration, input)
        Worker->>Store: completeRun(runId, leaseGeneration, output, completedAt)
        Worker->>Store: renewLease(...) or releaseExpiredLeases()
    end

    Client->>Runtime: getRun(runId)
    Runtime->>Store: getRun(runId)
    Store-->>Runtime: Run (status/lease fields)
Loading
sequenceDiagram
    participant R1 as Runtime A
    participant R2 as Runtime B
    participant Store as Unified Store

    R1->>Store: claimNext(A, now, leaseMs)
    R2->>Store: claimNext(B, now, leaseMs)
    Note over Store: Concurrency-key advisory lock / SKIP LOCKED ensures single winner
    Store-->>R1: Run (leased by A)
    Store-->>R2: null

    R1->>Store: renewLease(runId, leaseGeneration, now, leaseMs)
    Note over Store: Lease extended

    Note over Store: Lease expires
    R2->>Store: claimNext(B, now2, leaseMs)
    Store-->>R2: Run (reclaimed by B)

    R1->>Store: completeRun(runId, oldLeaseGeneration, ...)
    Store-->>R1: false (stale owner)
    R2->>Store: completeRun(runId, newLeaseGeneration, ...)
    Store-->>R2: true (completed)
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related issues

Possibly related PRs

Poem

🐰 I hopped through schemas, leases in my paws,
Enqueued a job and watched the store applause,
A lease was claimed — then renewed with grace,
One rabbit runtime won the concurrency race,
Migrations and tests tidied up the place.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch phase1-runtime-exploration

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

🧹 Nitpick comments (14)
packages/durably/src/errors.ts (1)

16-20: Expose runId as data on LeaseLostError.

Since this is now a public error type, callers should not have to parse the message text to learn which run lost its lease.

♻️ Proposed fix
 export class LeaseLostError extends Error {
+  readonly runId: string
+
   constructor(runId: string) {
     super(`Lease ownership was lost: ${runId}`)
     this.name = 'LeaseLostError'
+    this.runId = runId
   }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably/src/errors.ts` around lines 16 - 20, The LeaseLostError
currently only encodes the runId in the message; update the LeaseLostError class
to store the runId as a public property so callers can read it without parsing
the message: add a public readonly runId: string field on the LeaseLostError
class and assign it in the constructor (while keeping the existing message and
name behavior), so callers can access error.runId directly from instances
created by new LeaseLostError(runId).
docs/rfcs/runtime-rearchitecture/core-runtime.md (1)

500-501: Minor: Consider simplifying phrasing.

"in support of runtime execution" could be shortened to "for runtime execution" or "supporting runtime execution".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/rfcs/runtime-rearchitecture/core-runtime.md` around lines 500 - 501, The
phrase "in support of runtime execution" in the sentence about "Claim and
reclaim become adapter-level semantics" is wordy—replace it with a simpler
alternative such as "for runtime execution" or "supporting runtime execution" to
improve clarity; update the line mentioning "Claim and reclaim" (and adjacent
line referencing `processOne()`) to use the chosen shorter phrasing
consistently.
packages/durably/src/worker.ts (1)

70-87: Consider returning an already-resolved Promise when no work is in-flight.

When stop() is called with no in-flight work, it implicitly returns undefined rather than a resolved Promise. While this works due to async semantics, explicitly returning Promise.resolve() would be clearer for callers expecting a Promise.

♻️ Optional: Explicit Promise return
     async stop(): Promise<void> {
       if (!running) {
         return
       }

       running = false

       if (pollingTimeout) {
         clearTimeout(pollingTimeout)
         pollingTimeout = null
       }

       if (inFlight) {
         return new Promise<void>((resolve) => {
           stopResolver = resolve
         })
       }
+
+      return Promise.resolve()
     },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably/src/worker.ts` around lines 70 - 87, The stop() async method
can return undefined when there is no in-flight work; change it to explicitly
return a resolved Promise for clarity: after setting running = false and
clearing pollingTimeout, if inFlight is false return Promise.resolve(); keep the
existing branch that sets stopResolver and returns a new Promise when inFlight
is true. Update the stop() implementation to reference the same symbols
(running, pollingTimeout, inFlight, stopResolver) and return Promise.resolve()
in the no-inFlight case.
packages/durably/tests/node/db-stress.libsql.test.ts (1)

7-13: Add cleanup to remove the temporary database file.

Unlike the local-sqlite test which has a cleanup function to remove the temp file, this libSQL test leaves the file behind. This can accumulate test artifacts in the temp directory.

♻️ Proposed fix to add cleanup
+import { rm } from 'node:fs/promises'
 import { randomUUID } from 'node:crypto'
 import { tmpdir } from 'node:os'
 import { join } from 'node:path'
 import { createNodeDialectForFile } from '../helpers/node-dialect'
 import { createDbStressTests } from '../shared/db-stress.shared'

 createDbStressTests('libsql', () => {
   const filePath = join(tmpdir(), `durably-libsql-shared-${randomUUID()}.db`)

   return {
     createDialect: () => createNodeDialectForFile(filePath),
+    cleanup: async () => {
+      await rm(filePath, { force: true })
+    },
   }
 })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably/tests/node/db-stress.libsql.test.ts` around lines 7 - 13,
The libsql stress test creates a temp DB file but doesn't remove it; update the
object returned from createDbStressTests('libsql', ...) to include a cleanup
function that deletes the temp filePath (the one built with tmpdir() and
randomUUID()) after tests complete. Implement cleanup that checks for the file's
existence and removes it (e.g., fs.unlinkSync or fs.rmSync with force) and
export it alongside createDialect (refer to createDialect and
createNodeDialectForFile) so the test harness can call it.
packages/durably/tests/browser/db-stress.test.ts (1)

1-41: Consider reusing the shared stress test harness.

This browser test duplicates much of the logic from db-stress.shared.ts. The main difference is the browser dialect. Consider adapting createDbStressTests to accept a browser dialect factory, which would reduce duplication and ensure tests stay in sync.

That said, if browser-specific quirks require a separate implementation, this standalone approach is acceptable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably/tests/browser/db-stress.test.ts` around lines 1 - 41, The
test duplicates shared stress-test logic; update the shared harness to accept a
browser dialect factory: modify createDbStressTests to take a parameter like
browserDialectFactory (or a generic dialectFactory) and use
createBrowserDialectForName when invoking it from the browser test, then replace
the duplicated setup in db-stress.test.ts with a call to
createDbStressTests(dialectFactory) so the browser-specific dialect is injected
while preserving any browser-specific tweaks; reference createDbStressTests,
createBrowserDialectForName, and the runtimes/migrate/deleteFrom setup to locate
and consolidate the logic.
packages/durably/tests/shared/db-stress.shared.ts (1)

20-31: Consider migrating all runtimes or documenting single-migration assumption.

Only runtimes[0].migrate() is called, assuming all runtimes share the same underlying database. This works for file-backed SQLite and shared PostgreSQL schemas, but it's worth noting that this assumption is implicit.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably/tests/shared/db-stress.shared.ts` around lines 20 - 31, Test
setup calls runtimes[0].migrate() implicitly assuming a shared DB; either
migrate every runtime or document the single-migration assumption. Fix by
replacing the single call to runtimes[0].migrate() in the beforeEach (where
runtimes is created via createDurably) with a loop that awaits migrate() on each
runtime, or add an explicit comment in the beforeEach noting that only
runtimes[0] is migrated because all runtimes share the same underlying database
(file-backed SQLite or shared PostgreSQL schema).
packages/durably/src/events.ts (1)

34-45: Clarify the deprecation comment.

The JSDoc states RunStartEvent is a "Deprecated compatibility alias for run:leased," but RunStartEvent uses type: 'run:start', not 'run:leased'. These are semantically distinct event types. If RunStartEvent is being deprecated in favor of RunLeasedEvent, consider rewording to clarify the migration path (e.g., "Deprecated: prefer RunLeasedEvent for new code").

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably/src/events.ts` around lines 34 - 45, The JSDoc on
RunStartEvent is misleading: update the comment for the RunStartEvent interface
to clearly mark it as deprecated and point to the replacement RunLeasedEvent (or
whatever the new type is) and clarify the migration path; reference the
RunStartEvent interface and the replacement RunLeasedEvent/type name so readers
know to prefer RunLeasedEvent for new code and that RunStartEvent (with type:
'run:start') remains for compatibility only.
packages/durably/tests/shared/db-concurrency.shared.ts (1)

202-267: Potential resource leak: runtimes array not destroyed in this test.

This test creates separate runtimeA and runtimeB instances and destroys them at line 266, but the shared runtimes array (created in beforeEach) is still active. The afterEach hook will destroy runtimes, but since this test also calls runtimeA.migrate() and clears tables via runtimeA.db, the four runtimes instances remain idle but connected throughout test execution.

Consider either:

  1. Reusing runtimes[0] and runtimes[1] with job registration instead of creating new instances, or
  2. Destroying the shared runtimes early in this test if they're not needed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably/tests/shared/db-concurrency.shared.ts` around lines 202 -
267, The test creates its own runtimeA/runtimeB but leaves the shared runtimes
array from beforeEach alive, risking leaked DB connections; either reuse the
existing runtimes by registering concurrencyJob on runtimes[0] and runtimes[1]
(instead of createDurably) or explicitly destroy the shared runtimes early in
this test (call await runtimes.forEach(async r => r.db.destroy()/r.destroy()) or
similar) before creating runtimeA/runtimeB so only the intended instances remain
active; update the test to use the chosen approach and ensure cleanup
(runtimeA/runtimeB and any shared runtimes) so no idle connections persist.
packages/durably/tests/browser/singleton-warning.test.ts (1)

8-13: Consider using Promise.allSettled for more robust cleanup.

If one runtime's stop() or destroy() throws, the remaining runtimes won't be cleaned up. Using Promise.allSettled would ensure all cleanup attempts complete regardless of individual failures.

♻️ Optional: More robust cleanup
   afterEach(async () => {
-    await Promise.all(runtimes.map((runtime) => runtime.stop()))
-    await Promise.all(runtimes.map((runtime) => runtime.db.destroy()))
+    await Promise.allSettled(runtimes.map((runtime) => runtime.stop()))
+    await Promise.allSettled(runtimes.map((runtime) => runtime.db.destroy()))
     runtimes.length = 0
     vi.restoreAllMocks()
   })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably/tests/browser/singleton-warning.test.ts` around lines 8 -
13, The afterEach cleanup currently uses Promise.all for runtimes.map(runtime =>
runtime.stop()) and runtimes.map(runtime => runtime.db.destroy()), which aborts
remaining cleanup if one promise rejects; change both to Promise.allSettled so
every runtime's stop() and db.destroy() is attempted, then inspect the settled
results and log or rethrow any failures as appropriate (use the runtimes
variable and the afterEach hook, and keep vi.restoreAllMocks() as-is) — this
ensures afterEach always tries to clean up all runtimes even if some
stop()/db.destroy() calls fail.
packages/durably/src/index.ts (1)

61-71: Be careful turning Phase 1 store shapes into root-level API.

The RFCs in this PR position processOne() as the portability contract and the queue/checkpoint split as adapter internals. Re-exporting QueueStore, CheckpointStore, and Storage from the root barrel makes those exploratory shapes part of the supported semver surface for all consumers. If third-party adapter authoring is not intentionally first-class yet, prefer an explicit experimental namespace or a deeper import path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably/src/index.ts` around lines 61 - 71, This re-exports
QueueStore, CheckpointStore, and Storage at the package root, which promotes
exploratory adapter shapes to the public semver surface; instead remove these
three types from the root export list and surface them under an explicit
experimental/deep import instead (e.g., export them from an `experimental`
namespace or from a deeper module path), leaving only the intended portability
contract (e.g., processOne and public runtime types) at the top level; update
any internal references to QueueStore/CheckpointStore/Storage to import from the
new experimental/deep path so third-party consumers aren’t accidentally locked
into these adapter shapes.
packages/durably/tests/node/phase1-exploration.test.ts (1)

7-11: Drain runtimes in afterEach().

With two tests in this file, Lines 9-10 stop the first runtime again during the second teardown because the array is never cleared. That makes cleanup order-dependent for no benefit.

♻️ Suggested cleanup
   afterEach(async () => {
-    await Promise.all(runtimes.map((runtime) => runtime.stop()))
+    await Promise.all(runtimes.splice(0).map((runtime) => runtime.stop()))
   })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably/tests/node/phase1-exploration.test.ts` around lines 7 - 11,
The afterEach teardown currently stops all runtimes in the shared runtimes array
but never clears it, causing duplicate stop attempts across tests; update the
afterEach handler (the runtimes array and the async () => runtimes.map(runtime
=> runtime.stop()) block) to both await stopping all runtimes and then clear the
runtimes array (e.g., set runtimes.length = 0 or splice(0)) so each test’s
cleanup only affects that test’s entries.
packages/durably/src/context.ts (1)

189-192: Progress update is fire-and-forget but should handle errors.

The progress persistence call storage.checkpoint.updateProgress(run.id, progressData) is not awaited (intentional fire-and-forget), but any rejection will become an unhandled promise rejection.

Consider adding a .catch() to silently swallow or log errors:

🔧 Proposed fix to handle potential rejection
     progress(current: number, total?: number, message?: string): void {
       const progressData = { current, total, message }
       // Fire and forget - don't await
-      storage.checkpoint.updateProgress(run.id, progressData)
+      storage.checkpoint.updateProgress(run.id, progressData).catch(() => {
+        // Silently ignore progress update failures - they're non-critical
+      })
       // Emit progress event
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably/src/context.ts` around lines 189 - 192, The progress method
currently calls storage.checkpoint.updateProgress(run.id, progressData) without
awaiting, which can produce unhandled promise rejections; update the progress
method to keep the fire-and-forget behavior but append a .catch(...) to the
returned promise (or otherwise handle rejection) so errors are either silently
swallowed or logged via the existing logger; modify the progress method around
storage.checkpoint.updateProgress to include this rejection handling while
preserving run.id and progressData usage.
packages/durably/tests/node/stale-owner.e2e.test.ts (1)

29-38: Timing configuration may cause flakiness in CI environments.

The 25ms leaseMs combined with 40ms sleeps (lines 76, 123, 179, 243) provides only ~15ms margin for lease expiration detection. Under CI load or slow I/O, this margin may be insufficient.

Consider either:

  1. Increasing leaseMs to 50-100ms with proportionally larger sleep buffers
  2. Using a polling assertion (e.g., vi.waitFor) to confirm lease expiration rather than a fixed delay
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably/tests/node/stale-owner.e2e.test.ts` around lines 29 - 38,
The test uses very short leases in createDurably (runtimeA and runtimeB with
leaseMs: 25) which makes CI flaky; update the lease configuration to a more
robust value (e.g., set leaseMs to 100 and keep leaseRenewIntervalMs
proportional) and either increase the fixed sleep times accordingly or replace
fixed delays with a polling assertion (e.g., vi.waitFor) to wait for the lease
to expire/owner change; locate the createDurably calls and the corresponding
test waits/sleeps and apply the increased leaseMs or switch the sleeps to
polling checks that verify lease expiration/owner change.
packages/durably/tests/node/process-until-idle.serverless.test.ts (1)

72-82: Test validates quick return but doesn't register any jobs.

The test calls runtime.migrate() without registering any jobs, then calls processUntilIdle(). This works because there's simply no work in the queue, but consider adding a comment clarifying this is intentional (testing empty queue behavior with no registered jobs).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably/tests/node/process-until-idle.serverless.test.ts` around
lines 72 - 82, The test 'returns quickly when there is no claimable work' calls
runtime.migrate() and then runtime.processUntilIdle() without registering any
jobs, which is intentional but not documented; add an inline comment inside this
test (near the runtime.migrate() or before the processUntilIdle() call) stating
that the absence of registered jobs is deliberate and the test is verifying
empty-queue behavior for processUntilIdle(), referencing the test name and the
runtime.processUntilIdle() invocation so future readers understand this is not a
missing setup.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/rfcs/runtime-rearchitecture/deployment-models.md`:
- Around line 296-314: Update the Browser-Local Mode guidance to explicitly
state that OPFS-backed SQLite requires a Secure Context (HTTPS or localhost); in
the section mentioning "browser-local database runtime such as SQLocal /
OPFS-backed SQLite" add a short note that OPFS is only available in secure
contexts and that attempts to initialize OPFS on plain HTTP will fail, and
advise developers to use HTTPS or localhost or fall back to other storage
options when presenting how to create and share a single Durably instance per
tab.

In `@docs/rfcs/runtime-rearchitecture/ja/core-runtime.md`:
- Around line 206-209: The paragraph ambiguously implies a new default of
retaining step history; explicitly state whether Phase 1 changes the existing
cleanup behavior or not: if Phase 1 does NOT change the contract, update the
text to say "This does not change the existing cleanupSteps contract; adapters
must continue deleting step output data at terminal state per cleanupSteps," and
if Phase 1 DOES change it, state "Phase 1 changes the default: step history is
retained by default and cleanupSteps semantics are updated to require explicit
cleanup policies," then add a short note for adapter implementors to follow the
revised contract; reference the cleanupSteps contract and "Phase 1" so readers
know which behavior applies.

In `@docs/rfcs/runtime-rearchitecture/README.md`:
- Around line 67-68: The README contains filesystem-absolute Markdown links like
[core-runtime.md](/Users/coji/progs/oss/durably/docs/rfcs/runtime-rearchitecture/core-runtime.md);
replace those with repo-relative links (e.g., [core-runtime.md](core-runtime.md)
or the appropriate relative path under docs/rfcs/runtime-rearchitecture/) so
links work in other checkouts and on GitHub, and apply the same change to the
other absolute links in this file (the other occurrences in the same README).

In `@packages/durably/src/durably.ts`:
- Around line 466-505: renewLease currently returns a boolean and the local
timer computes renewedLeaseExpiresAt from Date.now(), which can drift past the
persisted DB expiry; change queue.renewLease to return the persisted lease
expiry (e.g., an ISO timestamp or millis) and, in the leaseTimer callback (where
renewLease is called), use that returned expiry value to call
scheduleLeaseDeadline(...) and to populate leaseExpiresAt in the
'run:lease-renewed' event (reference: leaseTimer, queue.renewLease,
scheduleLeaseDeadline, renewedLeaseExpiresAt, state.leaseMs, run.id); also
update any types/signatures and error handling to accept the new return shape.
- Around line 507-591: The finally block always deletes persisted step outputs
via checkpoint.deleteSteps(run.id) when state.preserveSteps is false, which can
wipe checkpoints on non-terminal exits (e.g., LeaseLostError or when
queue.completeRun/queue.failRun returned false). Change the cleanup so deletion
only occurs when the run actually reached a terminal state and cleanupSteps is
enabled: use the boolean results already returned by queue.completeRun
(completed) and queue.failRun (failed) or track a terminal flag set when either
completes successfully, and only call checkpoint.deleteSteps(run.id) if
!state.preserveSteps && cleanupStepsEnabled && terminalReached; keep clearing
timers and calling dispose() unchanged. Ensure you reference LeaseLostError,
completeRun, failRun, checkpoint.deleteSteps, state.preserveSteps and run.id
when making the change.

In `@packages/durably/src/migrations.ts`:
- Around line 131-139: The migration currently converts legacy rows by running
db.updateTable('durably_runs').set({status: 'leased', lease_owner:
'migration:v2', lease_expires_at: null}).where('status', '=', 'running') which
incorrectly creates non-expiring synthetic leases and strands in-flight work;
before applying this update, add a check for any rows with status 'running'
(query durably_runs WHERE status = 'running') and if any exist either abort the
migration by throwing an error or map them to a safe deterministic state (e.g.,
a terminal or requeueable status) instead of setting lease_expires_at to
null—ensure the change is made in the same migration function in migrations.ts
and remove the set of lease_expires_at = null when choosing to abort or remap.

In `@packages/durably/src/storage.ts`:
- Around line 675-693: The renewLease implementation in renewLease currently
only checks status and lease_owner allowing stale processes to renew after
lease_expires_at; update renewLease (in packages/durably/src/storage.ts) to also
require the current lease not be expired by adding a condition that
lease_expires_at > now (compare the DB column lease_expires_at against the
incoming now string) in the query before performing the update so renewals are
rejected when the lease has already expired.
- Around line 762-774: The cancelRun update is not conditional and can overwrite
a run that completed between the prior read and this write; change cancelRun to
perform an atomic conditional update by adding a WHERE clause that only updates
when the run is not already in a terminal state (e.g., WHERE status NOT IN
('completed','failed','cancelled')) and (optionally) when the lease_owner or
expected status matches the prior read; have the method check the number of
affected rows (or return a boolean) so callers (e.g., durably.cancel) can detect
whether the transition succeeded and avoid clobbering terminal data.
- Around line 341-344: The getRuns label-filtering logic in
createKyselyQueueStore currently hardcodes json_extract(...) (SQLite) which
breaks when backend === 'postgres'; update createKyselyQueueStore (and the
getRuns implementation) to use a dialect-aware JSON extractor: either branch on
backend === 'postgres' to emit "column ->> 'key'" for Postgres and
json_extract(column, '$.key') for others, or add a helper function (e.g.,
jsonExtract(column, key) used by getRuns) that returns the correct SQL snippet
based on DatabaseBackend; ensure the DatabaseBackend parameter passed into
createKyselyQueueStore is used for this decision so label filters work on
PostgreSQL.

In `@packages/durably/tests/helpers/local-sqlite-dialect.ts`:
- Around line 11-12: Add a busy timeout to the local test SQLite connection so
concurrent writers don't get immediate SQLITE_BUSY; after creating the Database
(the sqlite variable) and setting WAL via sqlite.pragma('journal_mode = WAL'),
call the SQLite busy timeout PRAGMA (e.g., set sqlite.pragma('busy_timeout =
5000') or similar timeout in milliseconds) to allow retries under contention.

In `@packages/durably/tests/node/db-semantics.postgres.test.ts`:
- Line 4: The postgres test inconsistently invokes the factory instead of
passing it like the others; change the call to createDbSemanticsTests so it
passes the factory function createPostgresSchemaResource (remove the
parentheses) to match how createNodeDialect, createLocalSqliteDialect, and
createBrowserDialect are passed and ensure the harness receives the factory
function rather than a pre-invoked resource.

---

Nitpick comments:
In `@docs/rfcs/runtime-rearchitecture/core-runtime.md`:
- Around line 500-501: The phrase "in support of runtime execution" in the
sentence about "Claim and reclaim become adapter-level semantics" is
wordy—replace it with a simpler alternative such as "for runtime execution" or
"supporting runtime execution" to improve clarity; update the line mentioning
"Claim and reclaim" (and adjacent line referencing `processOne()`) to use the
chosen shorter phrasing consistently.

In `@packages/durably/src/context.ts`:
- Around line 189-192: The progress method currently calls
storage.checkpoint.updateProgress(run.id, progressData) without awaiting, which
can produce unhandled promise rejections; update the progress method to keep the
fire-and-forget behavior but append a .catch(...) to the returned promise (or
otherwise handle rejection) so errors are either silently swallowed or logged
via the existing logger; modify the progress method around
storage.checkpoint.updateProgress to include this rejection handling while
preserving run.id and progressData usage.

In `@packages/durably/src/errors.ts`:
- Around line 16-20: The LeaseLostError currently only encodes the runId in the
message; update the LeaseLostError class to store the runId as a public property
so callers can read it without parsing the message: add a public readonly runId:
string field on the LeaseLostError class and assign it in the constructor (while
keeping the existing message and name behavior), so callers can access
error.runId directly from instances created by new LeaseLostError(runId).

In `@packages/durably/src/events.ts`:
- Around line 34-45: The JSDoc on RunStartEvent is misleading: update the
comment for the RunStartEvent interface to clearly mark it as deprecated and
point to the replacement RunLeasedEvent (or whatever the new type is) and
clarify the migration path; reference the RunStartEvent interface and the
replacement RunLeasedEvent/type name so readers know to prefer RunLeasedEvent
for new code and that RunStartEvent (with type: 'run:start') remains for
compatibility only.

In `@packages/durably/src/index.ts`:
- Around line 61-71: This re-exports QueueStore, CheckpointStore, and Storage at
the package root, which promotes exploratory adapter shapes to the public semver
surface; instead remove these three types from the root export list and surface
them under an explicit experimental/deep import instead (e.g., export them from
an `experimental` namespace or from a deeper module path), leaving only the
intended portability contract (e.g., processOne and public runtime types) at the
top level; update any internal references to QueueStore/CheckpointStore/Storage
to import from the new experimental/deep path so third-party consumers aren’t
accidentally locked into these adapter shapes.

In `@packages/durably/src/worker.ts`:
- Around line 70-87: The stop() async method can return undefined when there is
no in-flight work; change it to explicitly return a resolved Promise for
clarity: after setting running = false and clearing pollingTimeout, if inFlight
is false return Promise.resolve(); keep the existing branch that sets
stopResolver and returns a new Promise when inFlight is true. Update the stop()
implementation to reference the same symbols (running, pollingTimeout, inFlight,
stopResolver) and return Promise.resolve() in the no-inFlight case.

In `@packages/durably/tests/browser/db-stress.test.ts`:
- Around line 1-41: The test duplicates shared stress-test logic; update the
shared harness to accept a browser dialect factory: modify createDbStressTests
to take a parameter like browserDialectFactory (or a generic dialectFactory) and
use createBrowserDialectForName when invoking it from the browser test, then
replace the duplicated setup in db-stress.test.ts with a call to
createDbStressTests(dialectFactory) so the browser-specific dialect is injected
while preserving any browser-specific tweaks; reference createDbStressTests,
createBrowserDialectForName, and the runtimes/migrate/deleteFrom setup to locate
and consolidate the logic.

In `@packages/durably/tests/browser/singleton-warning.test.ts`:
- Around line 8-13: The afterEach cleanup currently uses Promise.all for
runtimes.map(runtime => runtime.stop()) and runtimes.map(runtime =>
runtime.db.destroy()), which aborts remaining cleanup if one promise rejects;
change both to Promise.allSettled so every runtime's stop() and db.destroy() is
attempted, then inspect the settled results and log or rethrow any failures as
appropriate (use the runtimes variable and the afterEach hook, and keep
vi.restoreAllMocks() as-is) — this ensures afterEach always tries to clean up
all runtimes even if some stop()/db.destroy() calls fail.

In `@packages/durably/tests/node/db-stress.libsql.test.ts`:
- Around line 7-13: The libsql stress test creates a temp DB file but doesn't
remove it; update the object returned from createDbStressTests('libsql', ...) to
include a cleanup function that deletes the temp filePath (the one built with
tmpdir() and randomUUID()) after tests complete. Implement cleanup that checks
for the file's existence and removes it (e.g., fs.unlinkSync or fs.rmSync with
force) and export it alongside createDialect (refer to createDialect and
createNodeDialectForFile) so the test harness can call it.

In `@packages/durably/tests/node/phase1-exploration.test.ts`:
- Around line 7-11: The afterEach teardown currently stops all runtimes in the
shared runtimes array but never clears it, causing duplicate stop attempts
across tests; update the afterEach handler (the runtimes array and the async ()
=> runtimes.map(runtime => runtime.stop()) block) to both await stopping all
runtimes and then clear the runtimes array (e.g., set runtimes.length = 0 or
splice(0)) so each test’s cleanup only affects that test’s entries.

In `@packages/durably/tests/node/process-until-idle.serverless.test.ts`:
- Around line 72-82: The test 'returns quickly when there is no claimable work'
calls runtime.migrate() and then runtime.processUntilIdle() without registering
any jobs, which is intentional but not documented; add an inline comment inside
this test (near the runtime.migrate() or before the processUntilIdle() call)
stating that the absence of registered jobs is deliberate and the test is
verifying empty-queue behavior for processUntilIdle(), referencing the test name
and the runtime.processUntilIdle() invocation so future readers understand this
is not a missing setup.

In `@packages/durably/tests/node/stale-owner.e2e.test.ts`:
- Around line 29-38: The test uses very short leases in createDurably (runtimeA
and runtimeB with leaseMs: 25) which makes CI flaky; update the lease
configuration to a more robust value (e.g., set leaseMs to 100 and keep
leaseRenewIntervalMs proportional) and either increase the fixed sleep times
accordingly or replace fixed delays with a polling assertion (e.g., vi.waitFor)
to wait for the lease to expire/owner change; locate the createDurably calls and
the corresponding test waits/sleeps and apply the increased leaseMs or switch
the sleeps to polling checks that verify lease expiration/owner change.

In `@packages/durably/tests/shared/db-concurrency.shared.ts`:
- Around line 202-267: The test creates its own runtimeA/runtimeB but leaves the
shared runtimes array from beforeEach alive, risking leaked DB connections;
either reuse the existing runtimes by registering concurrencyJob on runtimes[0]
and runtimes[1] (instead of createDurably) or explicitly destroy the shared
runtimes early in this test (call await runtimes.forEach(async r =>
r.db.destroy()/r.destroy()) or similar) before creating runtimeA/runtimeB so
only the intended instances remain active; update the test to use the chosen
approach and ensure cleanup (runtimeA/runtimeB and any shared runtimes) so no
idle connections persist.

In `@packages/durably/tests/shared/db-stress.shared.ts`:
- Around line 20-31: Test setup calls runtimes[0].migrate() implicitly assuming
a shared DB; either migrate every runtime or document the single-migration
assumption. Fix by replacing the single call to runtimes[0].migrate() in the
beforeEach (where runtimes is created via createDurably) with a loop that awaits
migrate() on each runtime, or add an explicit comment in the beforeEach noting
that only runtimes[0] is migrated because all runtimes share the same underlying
database (file-backed SQLite or shared PostgreSQL schema).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b2191ec1-5e61-4adf-811f-13d4bc6cde72

📥 Commits

Reviewing files that changed from the base of the PR and between 464bae6 and 01c7500.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (53)
  • docker-compose.postgres.yml
  • docs/rfcs/runtime-rearchitecture/README.md
  • docs/rfcs/runtime-rearchitecture/core-runtime.md
  • docs/rfcs/runtime-rearchitecture/database-claim-patterns.md
  • docs/rfcs/runtime-rearchitecture/database-runtime-fit.md
  • docs/rfcs/runtime-rearchitecture/deployment-database-matrix.md
  • docs/rfcs/runtime-rearchitecture/deployment-models.md
  • docs/rfcs/runtime-rearchitecture/ja/README.md
  • docs/rfcs/runtime-rearchitecture/ja/ambient-agent-concepts-ja.md
  • docs/rfcs/runtime-rearchitecture/ja/core-runtime.md
  • docs/rfcs/runtime-rearchitecture/ja/database-adapter-sketches-ja.md
  • docs/rfcs/runtime-rearchitecture/ja/database-claim-patterns-ja.md
  • docs/rfcs/runtime-rearchitecture/ja/database-runtime-fit-ja.md
  • docs/rfcs/runtime-rearchitecture/ja/deployment-models-ja.md
  • docs/rfcs/runtime-rearchitecture/phase1-exploration-notes.md
  • package.json
  • packages/durably/package.json
  • packages/durably/src/context.ts
  • packages/durably/src/durably.ts
  • packages/durably/src/errors.ts
  • packages/durably/src/events.ts
  • packages/durably/src/index.ts
  • packages/durably/src/job.ts
  • packages/durably/src/migrations.ts
  • packages/durably/src/plugins/log-persistence.ts
  • packages/durably/src/schema.ts
  • packages/durably/src/server.ts
  • packages/durably/src/storage.ts
  • packages/durably/src/worker.ts
  • packages/durably/tests/browser/db-semantics.test.ts
  • packages/durably/tests/browser/db-stress.test.ts
  • packages/durably/tests/browser/singleton-warning.test.ts
  • packages/durably/tests/helpers/browser-dialect.ts
  • packages/durably/tests/helpers/local-sqlite-dialect.ts
  • packages/durably/tests/helpers/node-dialect.ts
  • packages/durably/tests/helpers/postgres-dialect.ts
  • packages/durably/tests/node/db-concurrency.libsql.test.ts
  • packages/durably/tests/node/db-concurrency.local-sqlite.test.ts
  • packages/durably/tests/node/db-concurrency.postgres.test.ts
  • packages/durably/tests/node/db-semantics.local-sqlite.test.ts
  • packages/durably/tests/node/db-semantics.postgres.test.ts
  • packages/durably/tests/node/db-semantics.test.ts
  • packages/durably/tests/node/db-stress.libsql.test.ts
  • packages/durably/tests/node/db-stress.local-sqlite.test.ts
  • packages/durably/tests/node/db-stress.postgres.test.ts
  • packages/durably/tests/node/migration-clean-break.test.ts
  • packages/durably/tests/node/phase1-exploration.test.ts
  • packages/durably/tests/node/process-until-idle.serverless.test.ts
  • packages/durably/tests/node/stale-owner.e2e.test.ts
  • packages/durably/tests/shared/db-concurrency.shared.ts
  • packages/durably/tests/shared/db-semantics.shared.ts
  • packages/durably/tests/shared/db-stress.shared.ts
  • packages/durably/tests/shared/migrate.shared.ts

Comment thread docs/rfcs/runtime-rearchitecture/deployment-models.md
Comment thread docs/rfcs/runtime-rearchitecture/ja/core-runtime.md Outdated
Comment thread docs/rfcs/runtime-rearchitecture/README.md Outdated
Comment thread packages/durably/src/durably.ts
Comment thread packages/durably/src/durably.ts
Comment thread packages/durably/src/storage.ts Outdated
Comment thread packages/durably/src/storage.ts
Comment thread packages/durably/src/storage.ts Outdated
Comment thread packages/durably/tests/helpers/local-sqlite-dialect.ts
Comment thread packages/durably/tests/node/db-semantics.postgres.test.ts
coji and others added 3 commits March 8, 2026 10:07
Update existing tests, React package, examples, and docs to use
'leased' status consistently as defined in the runtime rearchitecture
RFC. Remove the ClientRunStatus mapping layer that converted leased
back to running at the API boundary — the clean-break approach
exposes leased directly to all consumers.

Key changes:
- Legacy tests: running → leased, heartbeatAt → leaseExpiresAt,
  run:start → run:leased, cleanupSteps default change
- React hooks: RunStatus uses leased, isRunning → isLeased,
  run:start events → run:leased, MappedTypedRun removed
- Server: SSE emits run:leased, filter no longer maps running
- Examples and docs updated accordingly

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Clean break from legacy runtime: remove RunStartEvent type, heartbeatAt
field, and all remaining run:start / running references across core,
tests, React package, examples, and docs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace run:start → run:leased, running → leased, heartbeatInterval →
leaseInterval, heartbeatAt → leaseExpiresAt, isRunning → isLeased,
RunStartEvent → RunLeasedEvent across all website and package docs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove old option names: pollingInterval, heartbeatInterval,
  staleThreshold, cleanupSteps (use pollingIntervalMs,
  leaseRenewIntervalMs, leaseMs, preserveSteps)
- Remove redundant `store` property from Durably interface
  (use `storage.queue` / `storage.checkpoint` instead)
- Remove stale `running` from VALID_STATUSES, restore satisfies guard
- Use existing getErrorMessage() helper instead of inline pattern
- Use storage.deleteRun() instead of duplicating checkpoint+queue calls
- Remove unnecessary spread in toClientRun()
- Default preserveSteps to false (matches old cleanupSteps: true default)
- Update all tests, examples, and docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (15)
examples/fullstack-react-router/app/routes/_index/dashboard.tsx (1)

90-96: ⚠️ Potential issue | 🟡 Minor

Missing leased status in statusClasses.

The status checks on lines 221 and 232 now use 'leased' instead of 'running', but statusClasses still defines styling for 'running' without a corresponding entry for 'leased'. This causes runs with status 'leased' to fall back to the default gray styling on line 161 instead of the intended blue styling.

🎨 Proposed fix to add leased status styling
 const statusClasses: Record<string, string> = {
   pending: 'bg-yellow-100 text-yellow-800',
-  running: 'bg-blue-100 text-blue-800',
+  leased: 'bg-blue-100 text-blue-800',
   completed: 'bg-green-100 text-green-800',
   failed: 'bg-red-100 text-red-800',
   cancelled: 'bg-gray-100 text-gray-800',
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/fullstack-react-router/app/routes/_index/dashboard.tsx` around lines
90 - 96, statusClasses is missing an entry for the 'leased' run status, so UI
falls back to the default gray; add a 'leased' key to the statusClasses Record
(next to the existing 'running' entry) with the same styling values used for
'running' (e.g., blue classes) so items with status === 'leased' render the
intended blue styling; update the constant named statusClasses in dashboard.tsx
accordingly.
examples/spa-vite-react/src/components/dashboard.tsx (1)

214-225: ⚠️ Potential issue | 🟡 Minor

Complete the lease-status migration in the badge map.

These action guards now treat leased as the active state, but statusClasses in the same component still only styles running. As a result, leased runs will render with the fallback gray badge instead of the active-state styling. Please add a leased entry there, and remove running if it is no longer emitted.

Suggested follow-up
 const statusClasses: Record<string, string> = {
   pending: 'bg-yellow-100 text-yellow-800',
-  running: 'bg-blue-100 text-blue-800',
+  leased: 'bg-blue-100 text-blue-800',
   completed: 'bg-green-100 text-green-800',
   failed: 'bg-red-100 text-red-800',
   cancelled: 'bg-gray-100 text-gray-800',
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/spa-vite-react/src/components/dashboard.tsx` around lines 214 - 225,
Update the badge map `statusClasses` in the dashboard component to include a
`leased` key with the same active-state styling used for `pending`/active badges
and remove the `running` key if your backend no longer emits it; locate the
`statusClasses` object in the same file (e.g., near the badge rendering logic
and where `run.status` is checked) and add an entry for 'leased' mirroring the
active classes, then delete the 'running' entry so leased runs display the
correct active badge color.
packages/durably-react/src/hooks/use-auto-resume.ts (1)

7-8: ⚠️ Potential issue | 🟡 Minor

Update the public docs to say leased instead of running.

These comments still describe auto-resume as tracking pending/running runs, but the hook now resumes pending/leased runs. Keeping the JSDoc in sync matters here because this is exported API surface and shows up in editor hints/generated docs.

Also applies to: 29-30

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably-react/src/hooks/use-auto-resume.ts` around lines 7 - 8,
Update the JSDoc comments in use-auto-resume.ts to describe that the hook
auto-resumes "pending/leased" runs rather than "pending/running": locate the
JSDoc for the useAutoResume hook (and the related option description around the
top of the file and the other occurrence noted) and replace the word "running"
with "leased" so the exported API docs and editor hints reflect the current
behavior.
website/api/durably-react/spa.md (1)

113-129: ⚠️ Potential issue | 🟡 Minor

Disable the example button for pending runs too.

With the new lease model, isLeased no longer covers the queued/pending window. This example still allows duplicate triggers until the worker acquires the lease, even though isPending is already available in scope.

Suggested doc tweak
-      <button onClick={handleClick} disabled={isLeased}>
+      <button onClick={handleClick} disabled={isPending || isLeased}>
         Run
       </button>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@website/api/durably-react/spa.md` around lines 113 - 129, The button's
disabled prop should also account for pending runs so users can't trigger
duplicates before a lease is acquired; update the component that uses useJob
(references: useJob, isLeased, isPending, trigger, handleClick) to disable the
button when either isLeased or isPending is true (i.e., combine both flags in
the disabled condition).
examples/spa-react-router/app/routes/_index/dashboard.tsx (1)

89-95: ⚠️ Potential issue | 🟡 Minor

Add leased to statusClasses for consistent styling.

The statusClasses map includes running but not leased. Since the PR migrates to lease-based semantics and line 217 checks for run.status === 'leased', leased runs will fall back to the gray styling at line 158. Add the leased key for consistent blue styling:

🎨 Proposed fix
   const statusClasses: Record<string, string> = {
     pending: 'bg-yellow-100 text-yellow-800',
-    running: 'bg-blue-100 text-blue-800',
+    leased: 'bg-blue-100 text-blue-800',
     completed: 'bg-green-100 text-green-800',
     failed: 'bg-red-100 text-red-800',
     cancelled: 'bg-gray-100 text-gray-800',
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/spa-react-router/app/routes/_index/dashboard.tsx` around lines 89 -
95, The statusClasses map is missing a key for 'leased', causing run.status ===
'leased' to fall back to gray; update the statusClasses Record (variable name:
statusClasses) to include a 'leased' entry using the same classes as 'running'
(the blue styles) so leased runs render consistently with running runs
referenced by checks like run.status === 'leased'.
packages/durably/docs/llms.md (3)

32-38: ⚠️ Potential issue | 🟡 Minor

Use cleanupSteps here, not preserveSteps.

This example points readers at the wrong option name/behavior. The documented config for terminal-state step cleanup is cleanupSteps, so preserveSteps: true is misleading.

Suggested doc fix
-  preserveSteps: true, // Keep step output data on terminal state (default: true)
+  cleanupSteps: true, // Delete step output data on terminal state (default: true)
As per coding guidelines: "Create steps via `step.run()`, ensuring each step's success state and return value is persisted. Step output data is deleted when runs reach terminal state (controlled by `cleanupSteps` configuration)."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably/docs/llms.md` around lines 32 - 38, The example uses the
wrong config key: update the createDurably config shown (the block around
createDurably) to use cleanupSteps instead of preserveSteps and set its boolean
meaning correctly (cleanupSteps: false to preserve step outputs, true to delete
them at terminal state); ensure any explanatory text around that example
references cleanupSteps and matches the docs about step.run() persistence and
terminal-state cleanup.

528-541: ⚠️ Potential issue | 🟡 Minor

The Run type here still omits the lease fields.

Later in this doc, ClientRun is described as stripping leaseOwner and leaseExpiresAt, but the public Run definition here exposes neither. That makes the type section internally inconsistent.

Based on learnings: Update packages/durably/docs/llms.md to keep it in sync whenever API changes are made. This file is bundled in the npm package and symlinked to website/public/llms.txt for web access.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably/docs/llms.md` around lines 528 - 541, The Run type in the
docs is missing the lease fields and is inconsistent with the later ClientRun
discussion; update the Run interface in packages/durably/docs/llms.md to include
the leaseOwner and leaseExpiresAt fields (with appropriate types, e.g., string |
null for leaseOwner and string | null for leaseExpiresAt or Date-compatible
strings) and ensure ClientRun is described as stripping those two fields;
reference the Run and ClientRun symbols so the doc matches the actual API
surface exported by the code.

234-244: ⚠️ Potential issue | 🟡 Minor

Add run:lease-renewed to the event examples.

The lease-based lifecycle here jumps from run:leased straight to terminal events, but the PR also adds a lease-renewal event. Please include it so this LLM doc reflects the full public event surface.

Based on learnings: Update packages/durably/docs/llms.md to keep it in sync whenever API changes are made. This file is bundled in the npm package and symlinked to website/public/llms.txt for web access.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably/docs/llms.md` around lines 234 - 244, Add the missing
lease-renewal event example to the lifecycle event list: after the existing
durably.on('run:leased', ...) entry add a durably.on('run:lease-renewed', (e) =>
...) example that logs the runId and any lease metadata (mirroring the style of
run:leased), so the docs include the new public event run:lease-renewed and
remain in sync with the API.
website/api/create-durably.md (2)

207-246: ⚠️ Potential issue | 🟡 Minor

Run is missing the companion leaseOwner field.

This page documents leaseExpiresAt, but not the leaseOwner field introduced with leased runs. That leaves the getRun() / getRuns() shape incomplete.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@website/api/create-durably.md` around lines 207 - 246, The Run interface/docs
currently document leaseExpiresAt but omit the companion leaseOwner field added
for leased runs; update the Run type declaration and the accompanying table to
include leaseOwner: string | null (ISO identifier or agent id) and mention it in
the description, and ensure any examples/consumers like getRun()/getRuns()
references reflect the new leaseOwner property so the shape is complete and
consistent with leased-run behavior.

50-149: ⚠️ Potential issue | 🟡 Minor

Document processOne() and processUntilIdle() on this page.

The method list still jumps from start()/stop() to the registration/query APIs, so the new manual processing APIs introduced in this PR are invisible to consumers.

Based on learnings: Update packages/durably/docs/llms.md to keep it in sync whenever API changes are made. This file is bundled in the npm package and symlinked to website/public/llms.txt for web access.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@website/api/create-durably.md` around lines 50 - 149, The API docs are
missing the new manual processing methods processOne() and processUntilIdle();
add short entries for both (signatures, brief description of behavior, return
types, and when to use) into the create-durably API page so they appear in the
method list alongside start()/stop()/register(), and also update the packaged
LLM docs file (llms.md that is bundled into the npm package and mirrored for the
website) to reflect the new APIs so consumers installing the package or visiting
the site see the changes; reference the exact symbols processOne and
processUntilIdle when adding the docs to ensure discoverability.
CLAUDE.md (1)

33-53: ⚠️ Potential issue | 🟡 Minor

Finish the lease-model update in this doc.

Line 33 was renamed, but the sections below still describe running, heartbeat_at, heartbeatInterval, and heartbeat recovery. That leaves CLAUDE.md internally inconsistent with the new lease-based runtime and can send future edits back toward the old model.

Based on learnings, "Use the doc-check skill (located at .claude/skills/doc-check/) as a documentation update checklist to verify API changes are properly documented."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CLAUDE.md` around lines 33 - 53, Update CLAUDE.md to finish the lease-model
migration: replace references to the old
"running"/"heartbeat_at"/"heartbeatInterval"/"heartbeat recovery" semantics with
the lease-based terminology (e.g., lease, lease_expiry, lease_interval,
lease_recovery) and describe how leases are acquired, renewed, and recovered
instead of heartbeats; remove or reword any assumptions about `running` state
transitions tied to heartbeat timestamps and update examples and defaults
(previously `heartbeatInterval`, `staleThreshold`) to the new lease defaults and
behavior; finally run the doc-check checklist in .claude/skills/doc-check/ to
verify all API names and behavior changes are documented consistently.
website/api/events.md (1)

287-299: ⚠️ Potential issue | 🟡 Minor

Include run:delete in the documented DurablyEvent union.

This page already documents run:delete above, but the union here still omits the corresponding event type. Readers copying this type block won't see the full public event surface.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@website/api/events.md` around lines 287 - 299, The DurablyEvent union is
missing the run:delete variant; update the union declaration to include the
RunDeleteEvent variant (i.e., add "| RunDeleteEvent") so the documented public
event surface matches the earlier run:delete documentation; ensure the
RunDeleteEvent type name exactly matches other event type names used in this
file (e.g., RunTriggerEvent, RunCompleteEvent) so tooling and consumers pick it
up.
packages/durably-react/src/client/use-job.ts (1)

176-208: ⚠️ Potential issue | 🟠 Major

Don't let followLatest override an explicit initialRunId.

autoResume already treats initialRunId as a pinned run, but this job-level SSE subscription still switches away as soon as any newer run:trigger or run:leased arrives for the same job. That breaks the reconnection scenario described by the option docs.

Suggested fix
   useEffect(() => {
-    if (!followLatest) return
+    if (!followLatest || initialRunId) return
 
     const params = new URLSearchParams({ jobName })
     const eventSource = new EventSource(`${api}/runs/subscribe?${params}`)
@@
     return () => {
       eventSource.close()
     }
-  }, [api, jobName, followLatest])
+  }, [api, jobName, followLatest, initialRunId])
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably-react/src/client/use-job.ts` around lines 176 - 208, The SSE
handler in the useEffect that follows latest runs updates currentRunId on any
run:trigger/run:leased event and thus overrides an explicitly provided
initialRunId; modify the eventSource.onmessage logic in the useEffect (the block
that parses event.data and calls setCurrentRunId) to skip calling
setCurrentRunId when an explicit initialRunId is set/pinned (i.e., only update
from SSE if initialRunId is falsy or the component is not pinned to that run),
ensuring followLatest does not override the pinned initialRunId.
packages/durably-react/src/client/use-job-run.ts (1)

94-123: ⚠️ Potential issue | 🟠 Major

Scope callback transition tracking to the current runId.

prevStatusRef survives across runs, so switching from a completed/failed run to a new runId compares the new status against the previous run's terminal state. In that case onStart can be skipped for the new run.

Suggested fix
-  const prevStatusRef = useRef<RunStatus | null>(null)
+  const prevRef = useRef<{ runId: string | null; status: RunStatus | null }>({
+    runId: null,
+    status: null,
+  })
 
   useEffect(() => {
-    const prevStatus = prevStatusRef.current
-    prevStatusRef.current = effectiveStatus
+    const prevStatus =
+      prevRef.current.runId === runId ? prevRef.current.status : null
+    prevRef.current = { runId, status: effectiveStatus }
@@
   }, [
+    runId,
     effectiveStatus,
     isPending,
     isLeased,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably-react/src/client/use-job-run.ts` around lines 94 - 123,
prevStatusRef currently persists across different runId values causing
transitions to compare against the previous run's terminal state; modify the
useEffect (or add a small preceding effect) to detect runId changes and reset
prevStatusRef.current to null when runId has changed (e.g., track prevRunIdRef
and if prevRunIdRef.current !== runId then set prevStatusRef.current = null and
prevRunIdRef.current = runId) so that onStart/onComplete/onFail callbacks
(referenced in the same useEffect) are evaluated only against the current run's
status transitions using effectiveStatus.
packages/durably/src/storage.ts (1)

402-445: ⚠️ Potential issue | 🟠 Major

Deduplicate (jobName, idempotencyKey) inside one enqueueMany() call.

Line 405 only looks in durably_runs, so a later item in the same batch won't see an earlier staged insert. That means two identical keys in one batch are both treated as new, and Line 442's created_at === now sentinel can still batch-insert both. Track seen keys and newly created rows explicitly inside the transaction instead of inferring "newness" from timestamps.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably/src/storage.ts` around lines 402 - 445, The batch enqueue
logic currently only checks existing rows in durably_runs via trx and then
infers new rows by comparing created_at to now, which misses duplicate
(jobName,idempotencyKey) pairs within the same inputs array; update the loop
that processes inputs (the block creating runs, using variables inputs, runs,
ulid, now, and trx) to maintain an in-transaction Set or Map of seen keys (e.g.,
`${input.jobName}|${input.idempotencyKey ?? ''}`) and skip creating a new run
when that key has already been processed in this call; instead of relying on
created_at === now to detect new rows, explicitly mark newly-created run objects
with a flag (e.g., isNew: true) and then use runs.filter(r => r.isNew) when
building newRuns before calling
trx.insertInto('durably_runs').values(...).execute(), ensuring duplicates in the
same enqueueMany() are deduplicated and only one insert occurs.
♻️ Duplicate comments (5)
packages/durably/src/durably.ts (2)

466-494: ⚠️ Potential issue | 🟠 Major

Reschedule the local deadline from the renewed lease, not Date.now().

Lines 482-486 recompute the next deadline after the renewal roundtrip. That gives the local worker extra time past the persisted lease_expires_at, so another worker can reclaim the run while this instance still thinks it owns it. Return the renewed expiry from queue.renewLease() or, at minimum, derive it from the same now sent to storage.

⏱️ Minimal correction
-          const renewedLeaseExpiresAt = new Date(
-            Date.now() + state.leaseMs,
-          ).toISOString()
+          const renewedLeaseExpiresAt = new Date(
+            Date.parse(now) + state.leaseMs,
+          ).toISOString()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably/src/durably.ts` around lines 466 - 494, The local lease
deadline is being rescheduled using Date.now() after a successful
queue.renewLease call, which can extend local ownership past the persisted
lease_expires_at; update the logic in the lease renewal handler (the setInterval
callback around leaseTimer that calls queue.renewLease for run.id and workerId)
to use the renewed expiry returned by queue.renewLease (or if renewLease cannot
return it, compute the expiry from the same now value passed into
queue.renewLease by using now + state.leaseMs) when calling
scheduleLeaseDeadline and when constructing renewedLeaseExpiresAt and the
run:lease-renewed event payload.

573-581: ⚠️ Potential issue | 🔴 Critical

Only purge checkpoints after the run definitely reached a terminal state.

Lines 579-580 still run after LeaseLostError and after completeRun() / failRun() return false, which wipes step data from runs another worker may immediately reclaim. Gate cleanup behind a terminal-state flag instead of the unconditional finally path. As per coding guidelines, "Step output data is deleted when runs reach terminal state (controlled by cleanupSteps configuration)."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably/src/durably.ts` around lines 573 - 581, The cleanup
currently deletes step data in the finally block unconditionally
(clearInterval(leaseTimer), clearTimeout(leaseDeadlineTimer), dispose(), then if
(!state.preserveSteps) await checkpoint.deleteSteps(run.id)), which can run
after LeaseLostError or when completeRun()/failRun() returned false and cause
other workers to lose step data; change this so that step deletion only happens
after you've confirmed the run reached a terminal state (e.g., set a
runIsTerminal flag when completeRun() or failRun() succeed) and also respect the
cleanupSteps/preserveSteps configuration, then move the await
checkpoint.deleteSteps(run.id) behind a conditional that checks runIsTerminal &&
!state.preserveSteps (or the cleanupSteps flag) so deletion never runs from the
unconditional finally path or after LeaseLostError.
packages/durably/src/storage.ts (3)

758-769: ⚠️ Potential issue | 🔴 Critical

Make cancellation an atomic state transition.

Line 758 rewrites whatever row matches id, so durably.cancel() can clobber a run that completed after its pre-read and still emit run:cancel. Restrict the write to non-terminal states and surface whether the update actually won.

🔒 Suggested shape for the fix
-      await db
+      const result = await db
         .updateTable('durably_runs')
         .set({
           status: 'cancelled',
           lease_owner: null,
           lease_expires_at: null,
           completed_at: now,
           updated_at: now,
         })
         .where('id', '=', runId)
-        .execute()
+        .where('status', 'in', ['pending', 'leased'])
+        .executeTakeFirst()
+
+      if (Number(result.numUpdatedRows) === 0) {
+        throw new Error(`Cannot cancel terminal run: ${runId}`)
+      }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably/src/storage.ts` around lines 758 - 769, The cancelRun update
must be made an atomic conditional write: in cancelRun, restrict the UPDATE on
the durably_runs row to only apply when status is non-terminal (e.g., NOT IN
['completed','cancelled','failed'] or equivalent terminal states used elsewhere)
by adding that condition to the .where chain, then inspect the execute() result
to determine whether any row was actually updated and surface that outcome
(change the signature to return a boolean or throw an error if the update did
not win) so callers (e.g., durably.cancel) can know whether cancellation
succeeded.

488-511: ⚠️ Potential issue | 🟠 Major

getRuns() still hardcodes SQLite SQL in the PostgreSQL path.

Lines 494-497 use json_extract(...), and Lines 508-510 use LIMIT -1 for offset-only pagination. Both are SQLite-specific, so getRuns({ labels }) and getRuns({ offset }) will break once backend === 'postgres'. Route both fragments through a backend-aware helper instead of emitting SQLite SQL unconditionally.

In PostgreSQL, is `json_extract(...)` available, and is `LIMIT -1` valid SQL for "no limit" when used with OFFSET?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably/src/storage.ts` around lines 488 - 511, getRuns currently
emits SQLite-only SQL: the json_extract(...) expression and the use of limit(-1)
for offset-only pagination will break for backend === 'postgres'. Replace the
inline SQLite fragments by calling backend-aware helpers from getRuns (the block
that iterates over filter.labels and the block that applies filter.offset
without filter.limit): create a helper (e.g., buildJsonLabelExpr or
getLabelJsonExpr) that given a label key returns the correct SQL expression for
durably_runs.labels for the current backend (Postgres should use the JSON
operator/extraction appropriate for text equality), and create a helper (e.g.,
applyOffsetWithOptionalLimit) that emits a proper "no limit" behavior for
Postgres instead of LIMIT -1; then update the code in getRuns (the loop handling
labels and the offset/limit logic) to use those helpers rather than hardcoded
json_extract(...) or limit(-1).

671-755: ⚠️ Potential issue | 🔴 Critical

Reject stale owners in all lease-bound updates.

Lines 684-687, 726-729, and 750-753 only verify status and lease_owner. A suspended worker can wake up after lease_expires_at and still renew or finalize the run until another process rewrites the row. Add lease_expires_at IS NOT NULL plus lease_expires_at > now/completedAt to each UPDATE so stale owners lose at the storage boundary. As per coding guidelines, "Background tab interruptions are handled via heartbeat recovery."

🛡️ Minimal guard to add
       const result = await db
         .updateTable('durably_runs')
         .set({
           lease_expires_at: leaseExpiresAt,
           updated_at: now,
         })
         .where('id', '=', runId)
         .where('status', '=', 'leased')
         .where('lease_owner', '=', workerId)
+        .where('lease_expires_at', 'is not', null)
+        .where('lease_expires_at', '>', now)
         .executeTakeFirst()

Apply the same expiry check to completeRun() and failRun(), using completedAt as the comparison value.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably/src/storage.ts` around lines 671 - 755, The lease-bound
updates currently only check status and lease_owner allowing a stale worker (one
whose lease has already expired) to renew or finalize a run; update
renewLease(runId, workerId, now, leaseMs) to also require lease_expires_at IS
NOT NULL and lease_expires_at > now, and update completeRun(runId, workerId,
output, completedAt) and failRun(runId, workerId, error, completedAt) to also
require lease_expires_at IS NOT NULL and lease_expires_at > completedAt in their
WHERE clauses so that expired leases cannot be renewed or used to complete/fail
runs at the storage boundary.
🧹 Nitpick comments (2)
packages/durably/tests/shared/storage.shared.ts (1)

358-375: Assert the lease fields on claimed runs.

These tests cover the status rename, but a broken claim path that leaves leaseOwner / leaseExpiresAt unset would still pass. Since reclaim and stale-owner handling depend on those fields, it’s worth checking them here on both the returned run and the DB readback.

💡 Suggested assertion shape
         expect(claimed).not.toBeNull()
         expect(claimed!.id).toBe(created.id)
         expect(claimed!.status).toBe('leased')
+        expect(claimed!.leaseOwner).toBeTruthy()
+        expect(claimed!.leaseExpiresAt).toBeTruthy()
         expect(claimed!.startedAt).not.toBeNull()
         expect(claimed!.stepCount).toBe(0)

         // Verify run is now leased in DB
         const run = await durably.storage.getRun(created.id)
         expect(run!.status).toBe('leased')
+        expect(run!.leaseOwner).toBe(claimed!.leaseOwner)
+        expect(run!.leaseExpiresAt).toBe(claimed!.leaseExpiresAt)

Also applies to: 394-400, 434-440

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably/tests/shared/storage.shared.ts` around lines 358 - 375, Add
assertions that the claim path sets lease fields: after calling
durably.storage.claimNextPendingRun() assert that claimed.leaseOwner is a
non-empty string and claimed.leaseExpiresAt is not null (and optionally >
claimed.startedAt or > Date.now() for validity), and repeat the same checks on
the DB readback from durably.storage.getRun(created.id); update the test case
that calls claimNextPendingRun and getRun (references: claimNextPendingRun,
getRun, claimed, created) to include these leaseOwner and leaseExpiresAt
assertions so broken claim paths that omit lease metadata will fail the test.
packages/durably/tests/node/core-extensions.test.ts (1)

94-111: Assert the leased-event payload, not just the event name.

This still passes if run:leased is emitted without the new lease metadata. Since leaseOwner / leaseExpiresAt are part of the lease-based contract, this public-stream test should pin them down too.

💡 Suggested assertion shape
-      expect(events.some((e) => e.type === 'run:leased')).toBe(true)
+      const leasedEvent = events.find((e) => e.type === 'run:leased')
+      expect(leasedEvent).toBeDefined()
+      expect(leasedEvent).toMatchObject({
+        type: 'run:leased',
+        leaseOwner: expect.any(String),
+        leaseExpiresAt: expect.any(String),
+      })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably/tests/node/core-extensions.test.ts` around lines 94 - 111,
Update the test that subscribes to the run stream and not only check for the
'run:leased' event name but also assert the event's payload contains the lease
metadata: locate the subscription logic around durably.subscribe(run.id) /
events array and after finding the event with type === 'run:leased' assert that
its payload includes non-empty leaseOwner and a valid leaseExpiresAt (e.g.,
numeric timestamp or ISO string) properties (for example by using events.find(e
=> e.type === 'run:leased') and checking .payload.leaseOwner and
.payload.leaseExpiresAt are present and of the expected type).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/server-node/basic.ts`:
- Line 52: Update the example output label to match the lifecycle term used in
the filter: change the console.log that prints `Running: ${runs.filter((r) =>
r.status === 'leased').length}` so the visible label reads `Leased:` instead of
`Running:`; locate the console.log call that references runs.filter((r) =>
r.status === 'leased') and update the string literal accordingly.

In `@packages/durably-react/src/hooks/use-job-run.ts`:
- Around line 36-39: Update the JSDoc for the isLeased property in the useJobRun
hook to use lease terminology instead of "running": change the comment above
isLeased (in the useJobRun definition) to something like "Whether the run
currently holds a lease" or "Whether the run is currently leased" so public docs
and IDE hover text reflect the lease lifecycle; ensure you update only the
comment for the isLeased property (not the property name) and keep wording
concise and consistent with other lease-related docs.

In `@packages/durably-react/src/hooks/use-job.ts`:
- Around line 58-60: Update the exported JSDoc for the isLeased property so the
description reflects its new meaning (leased state) rather than "currently
running"; find the JSDoc comment attached to the exported isLeased property in
use-job (the exported type/interface that defines isLeased) and change the text
to something like "Whether the run is currently leased" or "Whether this run is
currently leased by a worker" so IntelliSense and generated docs match the new
API.

In `@packages/durably-react/src/shared/sse-event-subscriber.ts`:
- Around line 22-25: The SSE handler currently collapses "run:leased" to a bare
marker and omits lease renewals; update the switch in sse-event-subscriber so
that when data.type === 'run:leased' you forward the lease metadata (e.g.,
leaseOwner, leaseExpiresAt) to onEvent instead of emitting a bare marker, and
add a new case for 'run:lease-renewed' that also forwards the renewal payload to
onEvent; locate the logic around data.type, onEvent, and ensure both
'run:leased' and 'run:lease-renewed' pass the full event object (including
leaseOwner and leaseExpiresAt) through to the React subscription layer.

In `@packages/durably/tests/shared/recovery.shared.ts`:
- Around line 40-52: The test captures initialLeaseExpiresAt from a pending run
(run.leaseExpiresAt is null) so the current comparison against
(initialLeaseExpiresAt ?? 0) is meaningless; change the test to capture the
pre-renewal lease timestamp after the run transitions to 'leased' (i.e., call
d.start(), await the first leased state via d.jobs.job.getRun(run.id) and assert
status === 'leased', then record initialLeaseExpiresAt from that leased midRun),
then wait and fetch a laterRun (e.g., another d.jobs.job.getRun(run.id)) and
assert that laterRun.leaseExpiresAt is greater than the recorded
initialLeaseExpiresAt; update references to run, initialLeaseExpiresAt,
midRun/getRun accordingly.

In `@website/api/create-durably.md`:
- Around line 36-44: Update the `dialect` table row in
website/api/create-durably.md to be backend-agnostic by changing its Description
from “Kysely SQLite dialect” to either “Kysely dialect” or enumerate supported
backends (e.g., “Kysely dialect — supported backends: SQLite, PostgreSQL,
libSQL, SQLocal”); locate the `dialect` entry in the options table and replace
the wording accordingly so it reflects the multi-backend support added in this
PR.

In `@website/api/durably-react/fullstack.md`:
- Around line 112-114: The dashboard snippet is inconsistent: it only shows the
Cancel button for run.status === 'leased' while useRunActions and the return
table indicate pending runs are cancellable; update the conditional rendering
around the Cancel button (where cancel is called and run.status is checked) to
allow both 'leased' and 'pending' (e.g., check run.status === 'leased' ||
run.status === 'pending') so the UI, the example in useRunActions, and the
return table behavior match.
- Around line 69-74: In CsvImporter replace the invalid placeholder rows: [...]
passed to trigger (from durably.importCsv.useJob()) with valid TSX/JS – e.g.
pass an actual empty array or a previously defined variable (like rows or
csvRows) instead of the literal ellipsis; update the onClick to call trigger({
rows: [] }) or trigger({ rows: rowsVariable }) so the trigger function receives
a proper array.

In `@website/api/durably-react/types.md`:
- Line 61: The doc comment for the HTTP projection is inconsistent: it lists
only leaseExpiresAt, idempotencyKey, concurrencyKey, and updatedAt as excluded
from the ClientRun type while the actual ClientRun shape also omits leaseOwner;
update the sentence in website/api/durably-react/types.md so the exclusion list
matches the ClientRun shape (include leaseOwner in the list or otherwise
describe “internal fields” to cover leaseOwner) and ensure the term ClientRun is
referenced consistently with the type definition.

---

Outside diff comments:
In `@CLAUDE.md`:
- Around line 33-53: Update CLAUDE.md to finish the lease-model migration:
replace references to the old
"running"/"heartbeat_at"/"heartbeatInterval"/"heartbeat recovery" semantics with
the lease-based terminology (e.g., lease, lease_expiry, lease_interval,
lease_recovery) and describe how leases are acquired, renewed, and recovered
instead of heartbeats; remove or reword any assumptions about `running` state
transitions tied to heartbeat timestamps and update examples and defaults
(previously `heartbeatInterval`, `staleThreshold`) to the new lease defaults and
behavior; finally run the doc-check checklist in .claude/skills/doc-check/ to
verify all API names and behavior changes are documented consistently.

In `@examples/fullstack-react-router/app/routes/_index/dashboard.tsx`:
- Around line 90-96: statusClasses is missing an entry for the 'leased' run
status, so UI falls back to the default gray; add a 'leased' key to the
statusClasses Record (next to the existing 'running' entry) with the same
styling values used for 'running' (e.g., blue classes) so items with status ===
'leased' render the intended blue styling; update the constant named
statusClasses in dashboard.tsx accordingly.

In `@examples/spa-react-router/app/routes/_index/dashboard.tsx`:
- Around line 89-95: The statusClasses map is missing a key for 'leased',
causing run.status === 'leased' to fall back to gray; update the statusClasses
Record (variable name: statusClasses) to include a 'leased' entry using the same
classes as 'running' (the blue styles) so leased runs render consistently with
running runs referenced by checks like run.status === 'leased'.

In `@examples/spa-vite-react/src/components/dashboard.tsx`:
- Around line 214-225: Update the badge map `statusClasses` in the dashboard
component to include a `leased` key with the same active-state styling used for
`pending`/active badges and remove the `running` key if your backend no longer
emits it; locate the `statusClasses` object in the same file (e.g., near the
badge rendering logic and where `run.status` is checked) and add an entry for
'leased' mirroring the active classes, then delete the 'running' entry so leased
runs display the correct active badge color.

In `@packages/durably-react/src/client/use-job-run.ts`:
- Around line 94-123: prevStatusRef currently persists across different runId
values causing transitions to compare against the previous run's terminal state;
modify the useEffect (or add a small preceding effect) to detect runId changes
and reset prevStatusRef.current to null when runId has changed (e.g., track
prevRunIdRef and if prevRunIdRef.current !== runId then set
prevStatusRef.current = null and prevRunIdRef.current = runId) so that
onStart/onComplete/onFail callbacks (referenced in the same useEffect) are
evaluated only against the current run's status transitions using
effectiveStatus.

In `@packages/durably-react/src/client/use-job.ts`:
- Around line 176-208: The SSE handler in the useEffect that follows latest runs
updates currentRunId on any run:trigger/run:leased event and thus overrides an
explicitly provided initialRunId; modify the eventSource.onmessage logic in the
useEffect (the block that parses event.data and calls setCurrentRunId) to skip
calling setCurrentRunId when an explicit initialRunId is set/pinned (i.e., only
update from SSE if initialRunId is falsy or the component is not pinned to that
run), ensuring followLatest does not override the pinned initialRunId.

In `@packages/durably-react/src/hooks/use-auto-resume.ts`:
- Around line 7-8: Update the JSDoc comments in use-auto-resume.ts to describe
that the hook auto-resumes "pending/leased" runs rather than "pending/running":
locate the JSDoc for the useAutoResume hook (and the related option description
around the top of the file and the other occurrence noted) and replace the word
"running" with "leased" so the exported API docs and editor hints reflect the
current behavior.

In `@packages/durably/docs/llms.md`:
- Around line 32-38: The example uses the wrong config key: update the
createDurably config shown (the block around createDurably) to use cleanupSteps
instead of preserveSteps and set its boolean meaning correctly (cleanupSteps:
false to preserve step outputs, true to delete them at terminal state); ensure
any explanatory text around that example references cleanupSteps and matches the
docs about step.run() persistence and terminal-state cleanup.
- Around line 528-541: The Run type in the docs is missing the lease fields and
is inconsistent with the later ClientRun discussion; update the Run interface in
packages/durably/docs/llms.md to include the leaseOwner and leaseExpiresAt
fields (with appropriate types, e.g., string | null for leaseOwner and string |
null for leaseExpiresAt or Date-compatible strings) and ensure ClientRun is
described as stripping those two fields; reference the Run and ClientRun symbols
so the doc matches the actual API surface exported by the code.
- Around line 234-244: Add the missing lease-renewal event example to the
lifecycle event list: after the existing durably.on('run:leased', ...) entry add
a durably.on('run:lease-renewed', (e) => ...) example that logs the runId and
any lease metadata (mirroring the style of run:leased), so the docs include the
new public event run:lease-renewed and remain in sync with the API.

In `@packages/durably/src/storage.ts`:
- Around line 402-445: The batch enqueue logic currently only checks existing
rows in durably_runs via trx and then infers new rows by comparing created_at to
now, which misses duplicate (jobName,idempotencyKey) pairs within the same
inputs array; update the loop that processes inputs (the block creating runs,
using variables inputs, runs, ulid, now, and trx) to maintain an in-transaction
Set or Map of seen keys (e.g., `${input.jobName}|${input.idempotencyKey ?? ''}`)
and skip creating a new run when that key has already been processed in this
call; instead of relying on created_at === now to detect new rows, explicitly
mark newly-created run objects with a flag (e.g., isNew: true) and then use
runs.filter(r => r.isNew) when building newRuns before calling
trx.insertInto('durably_runs').values(...).execute(), ensuring duplicates in the
same enqueueMany() are deduplicated and only one insert occurs.

In `@website/api/create-durably.md`:
- Around line 207-246: The Run interface/docs currently document leaseExpiresAt
but omit the companion leaseOwner field added for leased runs; update the Run
type declaration and the accompanying table to include leaseOwner: string | null
(ISO identifier or agent id) and mention it in the description, and ensure any
examples/consumers like getRun()/getRuns() references reflect the new leaseOwner
property so the shape is complete and consistent with leased-run behavior.
- Around line 50-149: The API docs are missing the new manual processing methods
processOne() and processUntilIdle(); add short entries for both (signatures,
brief description of behavior, return types, and when to use) into the
create-durably API page so they appear in the method list alongside
start()/stop()/register(), and also update the packaged LLM docs file (llms.md
that is bundled into the npm package and mirrored for the website) to reflect
the new APIs so consumers installing the package or visiting the site see the
changes; reference the exact symbols processOne and processUntilIdle when adding
the docs to ensure discoverability.

In `@website/api/durably-react/spa.md`:
- Around line 113-129: The button's disabled prop should also account for
pending runs so users can't trigger duplicates before a lease is acquired;
update the component that uses useJob (references: useJob, isLeased, isPending,
trigger, handleClick) to disable the button when either isLeased or isPending is
true (i.e., combine both flags in the disabled condition).

In `@website/api/events.md`:
- Around line 287-299: The DurablyEvent union is missing the run:delete variant;
update the union declaration to include the RunDeleteEvent variant (i.e., add "|
RunDeleteEvent") so the documented public event surface matches the earlier
run:delete documentation; ensure the RunDeleteEvent type name exactly matches
other event type names used in this file (e.g., RunTriggerEvent,
RunCompleteEvent) so tooling and consumers pick it up.

---

Duplicate comments:
In `@packages/durably/src/durably.ts`:
- Around line 466-494: The local lease deadline is being rescheduled using
Date.now() after a successful queue.renewLease call, which can extend local
ownership past the persisted lease_expires_at; update the logic in the lease
renewal handler (the setInterval callback around leaseTimer that calls
queue.renewLease for run.id and workerId) to use the renewed expiry returned by
queue.renewLease (or if renewLease cannot return it, compute the expiry from the
same now value passed into queue.renewLease by using now + state.leaseMs) when
calling scheduleLeaseDeadline and when constructing renewedLeaseExpiresAt and
the run:lease-renewed event payload.
- Around line 573-581: The cleanup currently deletes step data in the finally
block unconditionally (clearInterval(leaseTimer),
clearTimeout(leaseDeadlineTimer), dispose(), then if (!state.preserveSteps)
await checkpoint.deleteSteps(run.id)), which can run after LeaseLostError or
when completeRun()/failRun() returned false and cause other workers to lose step
data; change this so that step deletion only happens after you've confirmed the
run reached a terminal state (e.g., set a runIsTerminal flag when completeRun()
or failRun() succeed) and also respect the cleanupSteps/preserveSteps
configuration, then move the await checkpoint.deleteSteps(run.id) behind a
conditional that checks runIsTerminal && !state.preserveSteps (or the
cleanupSteps flag) so deletion never runs from the unconditional finally path or
after LeaseLostError.

In `@packages/durably/src/storage.ts`:
- Around line 758-769: The cancelRun update must be made an atomic conditional
write: in cancelRun, restrict the UPDATE on the durably_runs row to only apply
when status is non-terminal (e.g., NOT IN ['completed','cancelled','failed'] or
equivalent terminal states used elsewhere) by adding that condition to the
.where chain, then inspect the execute() result to determine whether any row was
actually updated and surface that outcome (change the signature to return a
boolean or throw an error if the update did not win) so callers (e.g.,
durably.cancel) can know whether cancellation succeeded.
- Around line 488-511: getRuns currently emits SQLite-only SQL: the
json_extract(...) expression and the use of limit(-1) for offset-only pagination
will break for backend === 'postgres'. Replace the inline SQLite fragments by
calling backend-aware helpers from getRuns (the block that iterates over
filter.labels and the block that applies filter.offset without filter.limit):
create a helper (e.g., buildJsonLabelExpr or getLabelJsonExpr) that given a
label key returns the correct SQL expression for durably_runs.labels for the
current backend (Postgres should use the JSON operator/extraction appropriate
for text equality), and create a helper (e.g., applyOffsetWithOptionalLimit)
that emits a proper "no limit" behavior for Postgres instead of LIMIT -1; then
update the code in getRuns (the loop handling labels and the offset/limit logic)
to use those helpers rather than hardcoded json_extract(...) or limit(-1).
- Around line 671-755: The lease-bound updates currently only check status and
lease_owner allowing a stale worker (one whose lease has already expired) to
renew or finalize a run; update renewLease(runId, workerId, now, leaseMs) to
also require lease_expires_at IS NOT NULL and lease_expires_at > now, and update
completeRun(runId, workerId, output, completedAt) and failRun(runId, workerId,
error, completedAt) to also require lease_expires_at IS NOT NULL and
lease_expires_at > completedAt in their WHERE clauses so that expired leases
cannot be renewed or used to complete/fail runs at the storage boundary.

---

Nitpick comments:
In `@packages/durably/tests/node/core-extensions.test.ts`:
- Around line 94-111: Update the test that subscribes to the run stream and not
only check for the 'run:leased' event name but also assert the event's payload
contains the lease metadata: locate the subscription logic around
durably.subscribe(run.id) / events array and after finding the event with type
=== 'run:leased' assert that its payload includes non-empty leaseOwner and a
valid leaseExpiresAt (e.g., numeric timestamp or ISO string) properties (for
example by using events.find(e => e.type === 'run:leased') and checking
.payload.leaseOwner and .payload.leaseExpiresAt are present and of the expected
type).

In `@packages/durably/tests/shared/storage.shared.ts`:
- Around line 358-375: Add assertions that the claim path sets lease fields:
after calling durably.storage.claimNextPendingRun() assert that
claimed.leaseOwner is a non-empty string and claimed.leaseExpiresAt is not null
(and optionally > claimed.startedAt or > Date.now() for validity), and repeat
the same checks on the DB readback from durably.storage.getRun(created.id);
update the test case that calls claimNextPendingRun and getRun (references:
claimNextPendingRun, getRun, claimed, created) to include these leaseOwner and
leaseExpiresAt assertions so broken claim paths that omit lease metadata will
fail the test.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ab0d0885-a5fd-4556-8be9-6ae713590f82

📥 Commits

Reviewing files that changed from the base of the PR and between 01c7500 and d55a2ac.

📒 Files selected for processing (70)
  • CLAUDE.md
  • examples/fullstack-react-router/app/routes/_index/dashboard.tsx
  • examples/fullstack-react-router/app/routes/_index/data-sync-progress.tsx
  • examples/fullstack-react-router/app/routes/_index/image-processing-progress.tsx
  • examples/fullstack-react-router/app/routes/_index/run-progress.tsx
  • examples/server-node/basic.ts
  • examples/spa-react-router/app/routes/_index/dashboard.tsx
  • examples/spa-react-router/app/routes/_index/data-sync-progress.tsx
  • examples/spa-react-router/app/routes/_index/image-processing-progress.tsx
  • examples/spa-react-router/app/routes/_index/run-progress.tsx
  • examples/spa-vite-react/src/components/dashboard.tsx
  • examples/spa-vite-react/src/components/data-sync-progress.tsx
  • examples/spa-vite-react/src/components/image-processing-progress.tsx
  • examples/spa-vite-react/src/components/run-progress.tsx
  • packages/durably-react/README.md
  • packages/durably-react/docs/llms.md
  • packages/durably-react/src/client/create-durably.ts
  • packages/durably-react/src/client/create-job-hooks.ts
  • packages/durably-react/src/client/use-job-run.ts
  • packages/durably-react/src/client/use-job.ts
  • packages/durably-react/src/client/use-run-actions.ts
  • packages/durably-react/src/client/use-runs.ts
  • packages/durably-react/src/hooks/use-auto-resume.ts
  • packages/durably-react/src/hooks/use-job-run.ts
  • packages/durably-react/src/hooks/use-job-subscription.ts
  • packages/durably-react/src/hooks/use-job.ts
  • packages/durably-react/src/hooks/use-runs.ts
  • packages/durably-react/src/shared/durably-event-subscriber.ts
  • packages/durably-react/src/shared/event-subscriber.ts
  • packages/durably-react/src/shared/sse-event-subscriber.ts
  • packages/durably-react/src/shared/subscription-reducer.ts
  • packages/durably-react/src/shared/use-subscription.ts
  • packages/durably-react/src/types.ts
  • packages/durably-react/tests/browser/use-job-run.test.tsx
  • packages/durably-react/tests/browser/use-job.test.tsx
  • packages/durably-react/tests/client/use-job-run.test.tsx
  • packages/durably-react/tests/client/use-job.test.tsx
  • packages/durably-react/tests/client/use-runs.test.tsx
  • packages/durably-react/tests/types.test.ts
  • packages/durably/docs/llms.md
  • packages/durably/src/durably.ts
  • packages/durably/src/events.ts
  • packages/durably/src/index.ts
  • packages/durably/src/server.ts
  • packages/durably/src/storage.ts
  • packages/durably/tests/node/core-extensions.test.ts
  • packages/durably/tests/react/strict-mode.test.tsx
  • packages/durably/tests/shared/events.shared.ts
  • packages/durably/tests/shared/plugin.shared.ts
  • packages/durably/tests/shared/recovery.shared.ts
  • packages/durably/tests/shared/server.shared.ts
  • packages/durably/tests/shared/sse.shared.ts
  • packages/durably/tests/shared/step.shared.ts
  • packages/durably/tests/shared/storage.shared.ts
  • packages/durably/tests/shared/worker.shared.ts
  • website/api/create-durably.md
  • website/api/durably-react/fullstack.md
  • website/api/durably-react/index.md
  • website/api/durably-react/spa.md
  • website/api/durably-react/types.md
  • website/api/events.md
  • website/api/http-handler.md
  • website/api/index.md
  • website/guide/concepts.md
  • website/guide/deployment.md
  • website/guide/error-handling.md
  • website/guide/fullstack-mode.md
  • website/guide/server-mode.md
  • website/guide/spa-mode.md
  • website/public/llms.txt
✅ Files skipped from review due to trivial changes (1)
  • packages/durably-react/src/client/create-durably.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/durably/src/index.ts
  • packages/durably/src/events.ts
  • packages/durably/src/server.ts

Comment thread examples/server-node/basic.ts Outdated
Comment thread packages/durably-react/src/hooks/use-job-run.ts
Comment thread packages/durably-react/src/hooks/use-job.ts Outdated
Comment thread packages/durably-react/src/shared/sse-event-subscriber.ts
Comment thread packages/durably/tests/shared/recovery.shared.ts
Comment thread website/api/create-durably.md Outdated
Comment thread website/api/durably-react/fullstack.md
Comment thread website/api/durably-react/fullstack.md Outdated
Comment thread website/api/durably-react/types.md Outdated
Verifies the server returns 400 for the removed 'running' status value.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Browser OPFS is single-tab exclusive, so multi-runtime lease reclaim
cannot be tested there. The scenario is covered by 3 Node-side tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update worker:error context, spa-mode tab suspension, and
error-handling cancellation descriptions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The phase1 branch added Postgres-specific tests that need a running
PostgreSQL instance on port 55432.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- test:node excludes *.postgres.test.ts (no Docker needed)
- test:node:postgres runs Postgres tests only
- test:node:all runs everything (used in CI with Postgres service)
- Document Postgres test setup in CLAUDE.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/durably/src/storage.ts (1)

401-443: ⚠️ Potential issue | 🟠 Major

Deduplicate idempotency keys inside enqueueMany().

Because inserts are deferred until after the scan, the second { jobName, idempotencyKey } in the same batch never sees the first. That can either create duplicate runs or make the whole batch fail on the unique constraint. Track seen keys/new rows explicitly instead of inferring “new” from created_at === now.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably/src/storage.ts` around lines 401 - 443, In enqueueMany(),
deduplicate idempotency keys within the same batch instead of relying on
created_at === now: maintain an in-memory Set or Map keyed by jobName +
idempotencyKey as you iterate inputs (inside the loop that builds runs), check
input.idempotencyKey against that set before creating a new run or pushing an
existing run object, and if a duplicate in the same batch is found either reuse
the previously created run entry (from the Map) or skip adding the second one so
the subsequent batch insert into durably_runs via trx.insertInto('durably_runs')
won't violate the unique constraint; update references to runs, newRuns, and the
logic that filters by created_at/now to use this explicit tracking instead.
♻️ Duplicate comments (5)
packages/durably/src/storage.ts (3)

757-767: ⚠️ Potential issue | 🔴 Critical

Make cancelRun() an atomic state transition.

This unconditional update can still rewrite a run to cancelled after another worker has already completed or failed it between the caller’s read and this write. Gate the update to non-terminal states and check the affected-row count.

Suggested fix
-      await db
+      const result = await db
         .updateTable('durably_runs')
         .set({
           status: 'cancelled',
           lease_owner: null,
           lease_expires_at: null,
           completed_at: now,
           updated_at: now,
         })
         .where('id', '=', runId)
-        .execute()
+        .where('status', 'in', ['pending', 'leased'])
+        .executeTakeFirst()
+
+      if (Number(result.numUpdatedRows) === 0) {
+        throw new Error(`Cannot cancel terminal run: ${runId}`)
+      }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably/src/storage.ts` around lines 757 - 767, The cancelRun()
update is unconditional and can overwrite terminal states; make it an atomic
guarded transition by updating durably_runs only when status is NOT in terminal
states (e.g., not IN ('completed','failed','cancelled')) and matching the runId,
then inspect the affected-row count returned by
updateTable('durably_runs').execute() (or the DB client's result) to detect if
no row was updated and return/fail accordingly (e.g., return false or throw) so
callers know the cancel did not take effect; keep the same set fields (status,
lease_owner, lease_expires_at, completed_at, updated_at) but only apply when the
row was in a cancellable state.

486-495: ⚠️ Potential issue | 🟠 Major

PostgreSQL label filters still emit SQLite JSON syntax.

backend === 'postgres' is supported here, but getRuns({ labels }) still hardcodes json_extract(...). Label-filtered queries will fail on PostgreSQL until this branches to a Postgres expression such as ->>/#>> or a dialect helper.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably/src/storage.ts` around lines 486 - 495, The label-filter
WHERE currently always uses SQLite's json_extract(...) (inside the getRuns label
loop where labels is validated) which breaks when backend === 'postgres'; update
the loop in storage.ts (the block referencing validateLabels, labels and
sql`json_extract(...)`) to branch on the SQL dialect and emit a Postgres JSON
extraction expression (e.g., labels ->> 'key' or #>> array_path) when backend
=== 'postgres' (or use your existing SQL dialect helper), ensure the JSON
path/quoting is built safely for each label key and keep the value as a bound
parameter so the query stays parameterized and secure.

675-685: ⚠️ Potential issue | 🔴 Critical

Reject renewals after the lease has already expired.

This update only checks status and lease_owner. A suspended worker can wake up after lease_expires_at, renew anyway, and keep executing instead of yielding to stale-owner recovery.

Suggested fix
       const result = await db
         .updateTable('durably_runs')
         .set({
           lease_expires_at: leaseExpiresAt,
           updated_at: now,
         })
         .where('id', '=', runId)
         .where('status', '=', 'leased')
         .where('lease_owner', '=', workerId)
+        .where('lease_expires_at', 'is not', null)
+        .where('lease_expires_at', '>', now)
         .executeTakeFirst()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably/src/storage.ts` around lines 675 - 685, The renewal
currently updates durably_runs without verifying the existing lease hasn't
already expired; modify the update in the function that computes leaseExpiresAt
and calls db.updateTable('durably_runs') so it also checks the current
lease_expires_at is still in the future (e.g., add a where('lease_expires_at',
'>', now) or equivalent timestamp comparison) and handle possible NULLs if
leases can be absent, so the UPDATE only succeeds when the lease is not yet
expired and owned by workerId.
packages/durably/tests/shared/recovery.shared.ts (1)

40-52: ⚠️ Potential issue | 🟠 Major

This test still captures leaseExpiresAt before the run is leased.

trigger() returns a pending run, so run.leaseExpiresAt is null here. Comparing against initialLeaseExpiresAt ?? 0 makes the assertion pass trivially and doesn't prove renewal. Capture the baseline after the run first reaches leased.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably/tests/shared/recovery.shared.ts` around lines 40 - 52, The
test captures leaseExpiresAt from the pending run returned by
d.jobs.job.trigger({}) so the baseline is null; instead, poll or await until the
run reaches 'leased' using d.jobs.job.getRun(run.id), then capture
baselineLeaseExpiresAt = midRun.leaseExpiresAt, call d.start(), wait, fetch the
run again via d.jobs.job.getRun(run.id) and assert the new leaseExpiresAt is
greater than the baselineLeaseExpiresAt; reference the functions run.id,
d.jobs.job.getRun, d.jobs.job.trigger, d.start, and the leaseExpiresAt field
when making these changes.
packages/durably/src/durably.ts (1)

558-567: ⚠️ Potential issue | 🔴 Critical

Step deletion in finally may wipe data for reclaimed runs.

The finally block unconditionally deletes steps when !state.preserveSteps, but this executes even when:

  • LeaseLostError is caught (line 533-534 returns, but finally still runs)
  • completeRun() or failRun() return false (ownership was lost)

This can delete checkpoints from a run that another worker has already reclaimed.

🐛 Suggested fix: track terminal state before cleanup
+    let reachedTerminalState = false
     try {
       // ... existing code ...
       const completed = await queue.completeRun(...)
       if (completed) {
+        reachedTerminalState = true
         eventEmitter.emit({ type: 'run:complete', ... })
       }
     } catch (error) {
       if (error instanceof LeaseLostError || error instanceof CancelledError) {
         return
       }
       // ...
       const failed = await queue.failRun(...)
       if (failed) {
+        reachedTerminalState = true
         // ... emit fail event ...
       }
     } finally {
       clearInterval(leaseTimer)
       if (leaseDeadlineTimer) {
         clearTimeout(leaseDeadlineTimer)
       }
       dispose()
-      if (!state.preserveSteps) {
+      if (!state.preserveSteps && reachedTerminalState) {
         await checkpoint.deleteSteps(run.id)
       }
     }

As per coding guidelines: "Step output data is deleted when runs reach terminal state (controlled by cleanupSteps configuration)."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably/src/durably.ts` around lines 558 - 567, The finally block
currently deletes steps unconditionally when !state.preserveSteps, which can
wipe data if ownership was lost; change the flow to track terminal state and
ownership before cleanup: introduce local flags (e.g., terminalReached and
ownershipLost) around calls to completeRun() and failRun() and in LeaseLostError
catch set ownershipLost, set terminalReached only when completeRun()/failRun()
return true, then in the finally block only call checkpoint.deleteSteps(run.id)
when !state.preserveSteps && terminalReached && !ownershipLost; keep clearing
leaseTimer/leaseDeadlineTimer and dispose() unchanged.
🧹 Nitpick comments (1)
packages/durably/tests/shared/recovery.shared.ts (1)

64-109: The custom-renewal test doesn’t assert the interval yet.

timestamps.length > 0 passes even if you only observed the initial claim lease. Assert at least two distinct leaseExpiresAt values and verify that the later one moves after roughly the configured 200ms renewal window.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably/tests/shared/recovery.shared.ts` around lines 64 - 109, The
test currently only checks timestamps.length > 0 which can pass without any
lease renewal; update the "respects custom lease renewal interval" test to
assert that timestamps contains at least two distinct leaseExpiresAt values and
that a later timestamp is greater than an earlier one by roughly the configured
leaseRenewIntervalMs (200ms) with a small tolerance (e.g. >150ms). Locate the
test's timestamps array and the customDurably created with leaseRenewIntervalMs,
parse the recorded leaseExpiresAt strings into numbers/dates, assert
distinctness (length of unique values >= 2) and assert that (later - earlier) >=
~150ms to validate the renewal behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/durably/src/context.ts`:
- Around line 75-99: The code may execute and persist a step after the lease was
lost because it only checks run ownership once before running the step and then
relies on a local abort flag; to fix, re-fetch the run via
storage.queue.getRun(run.id) and verify currentRun.status === 'leased' &&
currentRun.leaseOwner === workerId immediately before calling createStep() or
advanceRunStepIndex(), aborting (call abortForLeaseLoss()/controller.abort() and
throwIfAborted()) if the check fails; do the same added ownership re-check after
resolving fn() and before persisting the checkpoint so
createStep/advanceRunStepIndex only run while the worker still owns the lease
(apply same change around the existing checks that use currentRun,
throwIfAborted, storage.checkpoint.getCompletedStep, createStep, and
advanceRunStepIndex).

In `@packages/durably/tests/browser/db-stress.test.ts`:
- Line 10: The variable runtimes is declared but not initialized, so if
beforeEach throws the teardown that calls runtimes.map(...) will fail and hide
the original error; initialize runtimes to an empty array at declaration (change
the declaration of runtimes in db-stress.test.ts to let runtimes:
Array<Durably<any, any>> = []) so beforeEach/teardown can safely run even when
setup fails; ensure any teardown logic (the function that iterates runtimes)
relies on the now-initialized runtimes variable.
- Around line 28-49: The test assumes a single-winner claim semantics that only
Postgres currently guarantees; update the spec to only run when the storage
backend supports atomic claimNext (e.g., check storage capability or backend
type) or mark the test pending for non-Postgres backends. Locate the test in
packages/durably/tests/browser/db-stress.test.ts and gate the loop/assertion
with a runtime/storage capability check (or use test.skip/pending) tied to the
storage.claimNext implementation (reference claimNext in
packages/durably/src/storage.ts) so the expectation
expect(winners).toHaveLength(1) only runs when the backend provides atomic
single-winner claims.

In `@packages/durably/tests/shared/plugin.shared.ts`:
- Around line 57-60: The test uses expect(events).toContain('run:leased') which
won’t fail if the same event is pushed multiple times; update the assertions
inside vi.waitFor to assert the exact occurrence count instead (e.g., assert
events.filter(e => e === 'run:leased').length === 1 or use toHaveLength(1)) so
duplicate lease notifications or double-registration of the listener are
detected; apply the same change for the other similar assertions (the
occurrences checking 'run:leased' around the other vi.waitFor block and any
related event checks) and keep using vi.waitFor and the same events array and
test flow.

In `@website/api/create-durably.md`:
- Line 220: Update the docs so the leaseExpiresAt field is documented as
nullable: change occurrences of "leaseExpiresAt: string" to "leaseExpiresAt:
string | null" (e.g., in the create-durably API examples and the accompanying
table) to match the Run type defined in packages/durably/src/storage.ts which
uses leaseExpiresAt: string | null; ensure both instances mentioned in the
comment are updated.

In `@website/api/index.md`:
- Around line 272-287: The Run type table is missing the leaseExpiresAt field;
update the API docs table under the Run type to add a row for `leaseExpiresAt`
with type `string | null` and description like "ISO timestamp when the current
lease on the run expires (null if not leased)"; ensure the note about HTTP
endpoints/ClientRun still mentions that `ClientRun` excludes `leaseExpiresAt`
and keep the reference to `toClientRun(run)` intact so readers see what is
projected away.

In `@website/public/llms.txt`:
- Around line 35-38: The inline comment for the preserveSteps option is
backwards—update the comment on the preserveSteps property to clearly state that
setting it to true keeps step output after a run reaches a terminal state (e.g.,
"set to true to keep step data after terminal state; default: false"). Also add
a short second sentence referencing how steps are created/persisted (e.g.,
"Create steps via step.run(); step success state and return value are persisted;
step output deletion is controlled by cleanupSteps configuration") so readers
understand step.run(), persisted state, and that cleanupSteps controls deletion.

---

Outside diff comments:
In `@packages/durably/src/storage.ts`:
- Around line 401-443: In enqueueMany(), deduplicate idempotency keys within the
same batch instead of relying on created_at === now: maintain an in-memory Set
or Map keyed by jobName + idempotencyKey as you iterate inputs (inside the loop
that builds runs), check input.idempotencyKey against that set before creating a
new run or pushing an existing run object, and if a duplicate in the same batch
is found either reuse the previously created run entry (from the Map) or skip
adding the second one so the subsequent batch insert into durably_runs via
trx.insertInto('durably_runs') won't violate the unique constraint; update
references to runs, newRuns, and the logic that filters by created_at/now to use
this explicit tracking instead.

---

Duplicate comments:
In `@packages/durably/src/durably.ts`:
- Around line 558-567: The finally block currently deletes steps unconditionally
when !state.preserveSteps, which can wipe data if ownership was lost; change the
flow to track terminal state and ownership before cleanup: introduce local flags
(e.g., terminalReached and ownershipLost) around calls to completeRun() and
failRun() and in LeaseLostError catch set ownershipLost, set terminalReached
only when completeRun()/failRun() return true, then in the finally block only
call checkpoint.deleteSteps(run.id) when !state.preserveSteps && terminalReached
&& !ownershipLost; keep clearing leaseTimer/leaseDeadlineTimer and dispose()
unchanged.

In `@packages/durably/src/storage.ts`:
- Around line 757-767: The cancelRun() update is unconditional and can overwrite
terminal states; make it an atomic guarded transition by updating durably_runs
only when status is NOT in terminal states (e.g., not IN
('completed','failed','cancelled')) and matching the runId, then inspect the
affected-row count returned by updateTable('durably_runs').execute() (or the DB
client's result) to detect if no row was updated and return/fail accordingly
(e.g., return false or throw) so callers know the cancel did not take effect;
keep the same set fields (status, lease_owner, lease_expires_at, completed_at,
updated_at) but only apply when the row was in a cancellable state.
- Around line 486-495: The label-filter WHERE currently always uses SQLite's
json_extract(...) (inside the getRuns label loop where labels is validated)
which breaks when backend === 'postgres'; update the loop in storage.ts (the
block referencing validateLabels, labels and sql`json_extract(...)`) to branch
on the SQL dialect and emit a Postgres JSON extraction expression (e.g., labels
->> 'key' or #>> array_path) when backend === 'postgres' (or use your existing
SQL dialect helper), ensure the JSON path/quoting is built safely for each label
key and keep the value as a bound parameter so the query stays parameterized and
secure.
- Around line 675-685: The renewal currently updates durably_runs without
verifying the existing lease hasn't already expired; modify the update in the
function that computes leaseExpiresAt and calls db.updateTable('durably_runs')
so it also checks the current lease_expires_at is still in the future (e.g., add
a where('lease_expires_at', '>', now) or equivalent timestamp comparison) and
handle possible NULLs if leases can be absent, so the UPDATE only succeeds when
the lease is not yet expired and owned by workerId.

In `@packages/durably/tests/shared/recovery.shared.ts`:
- Around line 40-52: The test captures leaseExpiresAt from the pending run
returned by d.jobs.job.trigger({}) so the baseline is null; instead, poll or
await until the run reaches 'leased' using d.jobs.job.getRun(run.id), then
capture baselineLeaseExpiresAt = midRun.leaseExpiresAt, call d.start(), wait,
fetch the run again via d.jobs.job.getRun(run.id) and assert the new
leaseExpiresAt is greater than the baselineLeaseExpiresAt; reference the
functions run.id, d.jobs.job.getRun, d.jobs.job.trigger, d.start, and the
leaseExpiresAt field when making these changes.

---

Nitpick comments:
In `@packages/durably/tests/shared/recovery.shared.ts`:
- Around line 64-109: The test currently only checks timestamps.length > 0 which
can pass without any lease renewal; update the "respects custom lease renewal
interval" test to assert that timestamps contains at least two distinct
leaseExpiresAt values and that a later timestamp is greater than an earlier one
by roughly the configured leaseRenewIntervalMs (200ms) with a small tolerance
(e.g. >150ms). Locate the test's timestamps array and the customDurably created
with leaseRenewIntervalMs, parse the recorded leaseExpiresAt strings into
numbers/dates, assert distinctness (length of unique values >= 2) and assert
that (later - earlier) >= ~150ms to validate the renewal behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 782b177a-ce2b-4ab0-8d7f-8e413b3e2dd1

📥 Commits

Reviewing files that changed from the base of the PR and between d55a2ac and 5988ca3.

📒 Files selected for processing (36)
  • .github/workflows/ci.yml
  • CLAUDE.md
  • examples/server-node/lib/durably.ts
  • examples/spa-react-router/app/lib/durably.ts
  • examples/spa-vite-react/src/lib/durably.ts
  • packages/durably-react/tests/browser/use-job-logs.test.tsx
  • packages/durably-react/tests/browser/use-job-run.test.tsx
  • packages/durably-react/tests/browser/use-job.test.tsx
  • packages/durably-react/tests/browser/use-runs.test.tsx
  • packages/durably-react/tests/helpers/create-test-durably.ts
  • packages/durably/docs/llms.md
  • packages/durably/src/context.ts
  • packages/durably/src/durably.ts
  • packages/durably/src/server.ts
  • packages/durably/src/storage.ts
  • packages/durably/tests/browser/db-stress.test.ts
  • packages/durably/tests/node/core-extensions.test.ts
  • packages/durably/tests/react/strict-mode.test.tsx
  • packages/durably/tests/shared/concurrency.shared.ts
  • packages/durably/tests/shared/durably.shared.ts
  • packages/durably/tests/shared/log.shared.ts
  • packages/durably/tests/shared/plugin.shared.ts
  • packages/durably/tests/shared/recovery.shared.ts
  • packages/durably/tests/shared/run-api.shared.ts
  • packages/durably/tests/shared/server.shared.ts
  • packages/durably/tests/shared/step.shared.ts
  • packages/durably/tests/shared/worker.shared.ts
  • website/api/create-durably.md
  • website/api/durably-react/spa.md
  • website/api/events.md
  • website/api/index.md
  • website/guide/concepts.md
  • website/guide/deployment.md
  • website/guide/error-handling.md
  • website/guide/spa-mode.md
  • website/public/llms.txt
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/durably/tests/node/core-extensions.test.ts
  • packages/durably/tests/shared/worker.shared.ts
  • packages/durably-react/tests/browser/use-job.test.tsx

Comment thread packages/durably/src/context.ts Outdated
Comment thread packages/durably/src/storage.ts Outdated
}

describe('sqlocal queue stress', () => {
let runtimes: Array<Durably<any, any>>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Initialize runtimes before teardown can run.

If beforeEach throws before assignment, Line 25 will fail on runtimes.map(...) and mask the real setup error. Initializing this to [] makes cleanup safe.

🛠️ Proposed fix
-  let runtimes: Array<Durably<any, any>>
+  let runtimes: Array<Durably<any, any>> = []

Also applies to: 24-25

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably/tests/browser/db-stress.test.ts` at line 10, The variable
runtimes is declared but not initialized, so if beforeEach throws the teardown
that calls runtimes.map(...) will fail and hide the original error; initialize
runtimes to an empty array at declaration (change the declaration of runtimes in
db-stress.test.ts to let runtimes: Array<Durably<any, any>> = []) so
beforeEach/teardown can safely run even when setup fails; ensure any teardown
logic (the function that iterates runtimes) relies on the now-initialized
runtimes variable.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rate Limit Exceeded

@coji have exceeded the limit for the number of chat messages per hour. Please wait 0 minutes and 0 seconds before sending another message.

Comment thread packages/durably/tests/browser/db-stress.test.ts
Comment thread packages/durably/tests/shared/plugin.shared.ts
Comment thread website/api/create-durably.md Outdated
Comment thread website/api/index.md Outdated
Comment thread website/public/llms.txt Outdated
- Add leaseOwner field to Run type tables in API docs
- Fix ClientRun exclusion list to include leaseOwner
- Update JSDoc comments for lease terminology
- Fix lease renewal test to properly capture initial lease after claim
- Add OPFS Secure Context note in deployment RFC
- Fix example code in fullstack docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (4)
website/api/durably-react/fullstack.md (1)

463-463: ⚠️ Potential issue | 🟡 Minor

Update description to match code examples.

The description states "Cancel a leased job" but the code examples at lines 112-114 and 434 both demonstrate cancelling runs with either pending or leased status. Update the description to reflect both statuses.

📝 Suggested fix
-| `cancel`    | `(runId: string) => Promise<void>`              | Cancel a leased job                         |
+| `cancel`    | `(runId: string) => Promise<void>`              | Cancel a pending or leased job              |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@website/api/durably-react/fullstack.md` at line 463, Update the table
description for the `cancel` method signature `(runId: string) => Promise<void>`
so it matches the examples: change the text from "Cancel a leased job" to
something like "Cancel a pending or leased run" (or "Cancel a run in pending or
leased status") to reflect that examples at lines showing canceling runs handle
both `pending` and `leased` statuses; update the single-cell table entry where
`cancel` appears to use this new wording.
packages/durably/src/context.ts (1)

119-136: ⚠️ Potential issue | 🟠 Major

Post-step lease ownership re-check is still missing before persisting checkpoint.

After fn() resolves at line 121, the code checks the local abort signal at line 122 but does not re-fetch the run from the database to verify status === 'leased' && leaseOwner === workerId before calling createStep() and advanceRunStepIndex(). If the lease expired during step execution and another worker reclaimed the run, this worker could still persist a stale checkpoint.

The local abort signal relies on the leaseDeadlineTimer in durably.ts firing, but there's a race window where the timer hasn't fired yet while the DB lease has already expired.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/durably/src/context.ts` around lines 119 - 136, After
fn(controller.signal) resolves and throwIfAborted() is called, re-fetch the run
from storage to ensure the lease still belongs to this worker before persisting
the checkpoint; specifically, call storage.getRun(run.id) (or equivalent) and
verify fetchedRun.status === 'leased' && fetchedRun.leaseOwner === workerId, and
only then call storage.createStep(...) and storage.advanceRunStepIndex(...); if
the check fails, abort without writing the step or advancing the index (cleanly
return or throw) to avoid persisting a stale checkpoint.
docs/rfcs/runtime-rearchitecture/core-runtime.md (1)

206-209: ⚠️ Potential issue | 🟡 Minor

Clarify whether Phase 1 changes the default step cleanup behavior.

This section states "preserve step history" as the default, which appears to conflict with the existing cleanupSteps contract. Based on learnings, step output data is deleted when runs reach terminal state (controlled by cleanupSteps configuration).

If Phase 1 intentionally changes this default to preserve step history, please state that explicitly and note the behavioral change for adapter implementors. If Phase 1 does NOT change the cleanupSteps contract, update the text to clarify that this is describing a recommended pattern rather than a new default behavior.

Based on learnings, step output data is deleted when runs reach terminal state, controlled by cleanupSteps.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/rfcs/runtime-rearchitecture/core-runtime.md` around lines 206 - 209,
Clarify whether Phase 1 changes the default step cleanup behavior: update the
prose around "preserve step history" to explicitly state if Phase 1 changes the
existing cleanupSteps contract or not; if Phase 1 preserves history by default,
say "Phase 1 changes the default: runs will preserve step history and will not
delete step output on terminal state" and call out adapter implementors to
update their implementations accordingly (reference cleanupSteps and
StepRecord); otherwise, mark the text as a recommended pattern (not a default)
and add a short note that existing behavior—where step output data is deleted at
terminal state controlled by cleanupSteps—remains unchanged.
docs/rfcs/runtime-rearchitecture/ja/core-runtime.md (1)

206-209: ⚠️ Potential issue | 🟡 Minor

Phase 1 でステップクリーンアップのデフォルト動作が変更されるか明示してください。

このセクションでは「ステップ履歴を保持します」がデフォルトであると述べていますが、既存の cleanupSteps 契約と矛盾する可能性があります。学習によれば、ステップ出力データは run が終了状態に達したときに削除されます(cleanupSteps 設定で制御)。

Phase 1 で意図的にこのデフォルトを変更する場合は、明示的に述べ、adapter 実装者向けに動作変更を記載してください。Phase 1 で cleanupSteps 契約を変更しない場合は、これが新しいデフォルト動作ではなく推奨パターンを説明していることを明確にしてください。

Based on learnings, step output data is deleted when runs reach terminal state, controlled by cleanupSteps.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/rfcs/runtime-rearchitecture/ja/core-runtime.md` around lines 206 - 209,
明確化が必要です: ドキュメント内の「ステップ履歴を保持します」という文言が既存の cleanupSteps 契約と矛盾しているため、Phase 1
で実際にデフォルト動作を変更するのかを明示してください。もし Phase 1 でデフォルトを変更するなら「Phase 1 で cleanupSteps
のデフォルト挙動を変更し、ステップ出力を保持する」と明記し、adapter 実装者向けに cleanupSteps
契約の新仕様と移行手順を追記してください。変更しないなら、この箇所を「推奨パターン(デフォルトではない)」と明確に書き換え、既存の cleanupSteps
設定(run が終端状態のときにステップ出力を削除する挙動)を参照して矛盾を解消してください。
🧹 Nitpick comments (1)
docs/rfcs/runtime-rearchitecture/core-runtime.md (1)

497-497: Minor: Consider simpler phrasing.

The phrase "in support of" could be shortened to "supporting" for conciseness, though the current phrasing is acceptable.

✍️ Optional simplification
-3. Claim and reclaim become adapter-level semantics in support of runtime execution.
+3. Claim and reclaim become adapter-level semantics supporting runtime execution.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/rfcs/runtime-rearchitecture/core-runtime.md` at line 497, Replace the
phrase "in support of" with the shorter "supporting" in the sentence "Claim and
reclaim become adapter-level semantics in support of runtime execution." (edit
the sentence in core-runtime.md so it reads "Claim and reclaim become
adapter-level semantics supporting runtime execution.")
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@docs/rfcs/runtime-rearchitecture/core-runtime.md`:
- Around line 206-209: Clarify whether Phase 1 changes the default step cleanup
behavior: update the prose around "preserve step history" to explicitly state if
Phase 1 changes the existing cleanupSteps contract or not; if Phase 1 preserves
history by default, say "Phase 1 changes the default: runs will preserve step
history and will not delete step output on terminal state" and call out adapter
implementors to update their implementations accordingly (reference cleanupSteps
and StepRecord); otherwise, mark the text as a recommended pattern (not a
default) and add a short note that existing behavior—where step output data is
deleted at terminal state controlled by cleanupSteps—remains unchanged.

In `@docs/rfcs/runtime-rearchitecture/ja/core-runtime.md`:
- Around line 206-209: 明確化が必要です: ドキュメント内の「ステップ履歴を保持します」という文言が既存の cleanupSteps
契約と矛盾しているため、Phase 1 で実際にデフォルト動作を変更するのかを明示してください。もし Phase 1 でデフォルトを変更するなら「Phase 1
で cleanupSteps のデフォルト挙動を変更し、ステップ出力を保持する」と明記し、adapter 実装者向けに cleanupSteps
契約の新仕様と移行手順を追記してください。変更しないなら、この箇所を「推奨パターン(デフォルトではない)」と明確に書き換え、既存の cleanupSteps
設定(run が終端状態のときにステップ出力を削除する挙動)を参照して矛盾を解消してください。

In `@packages/durably/src/context.ts`:
- Around line 119-136: After fn(controller.signal) resolves and throwIfAborted()
is called, re-fetch the run from storage to ensure the lease still belongs to
this worker before persisting the checkpoint; specifically, call
storage.getRun(run.id) (or equivalent) and verify fetchedRun.status === 'leased'
&& fetchedRun.leaseOwner === workerId, and only then call
storage.createStep(...) and storage.advanceRunStepIndex(...); if the check
fails, abort without writing the step or advancing the index (cleanly return or
throw) to avoid persisting a stale checkpoint.

In `@website/api/durably-react/fullstack.md`:
- Line 463: Update the table description for the `cancel` method signature
`(runId: string) => Promise<void>` so it matches the examples: change the text
from "Cancel a leased job" to something like "Cancel a pending or leased run"
(or "Cancel a run in pending or leased status") to reflect that examples at
lines showing canceling runs handle both `pending` and `leased` statuses; update
the single-cell table entry where `cancel` appears to use this new wording.

---

Nitpick comments:
In `@docs/rfcs/runtime-rearchitecture/core-runtime.md`:
- Line 497: Replace the phrase "in support of" with the shorter "supporting" in
the sentence "Claim and reclaim become adapter-level semantics in support of
runtime execution." (edit the sentence in core-runtime.md so it reads "Claim and
reclaim become adapter-level semantics supporting runtime execution.")

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 54d18a25-24a6-411b-92f3-e79b204259a7

📥 Commits

Reviewing files that changed from the base of the PR and between 5988ca3 and 5aad4a5.

📒 Files selected for processing (33)
  • .github/workflows/ci.yml
  • CLAUDE.md
  • docs/rfcs/runtime-rearchitecture/README.md
  • docs/rfcs/runtime-rearchitecture/core-runtime.md
  • docs/rfcs/runtime-rearchitecture/database-claim-patterns.md
  • docs/rfcs/runtime-rearchitecture/deployment-models.md
  • docs/rfcs/runtime-rearchitecture/ja/README.md
  • docs/rfcs/runtime-rearchitecture/ja/core-runtime.md
  • docs/rfcs/runtime-rearchitecture/ja/database-claim-patterns-ja.md
  • docs/rfcs/runtime-rearchitecture/phase1-exploration-notes.md
  • examples/server-node/basic.ts
  • package.json
  • packages/durably-react/src/hooks/use-job-run.ts
  • packages/durably-react/src/hooks/use-job.ts
  • packages/durably/package.json
  • packages/durably/src/context.ts
  • packages/durably/src/durably.ts
  • packages/durably/src/index.ts
  • packages/durably/src/job.ts
  • packages/durably/src/migrations.ts
  • packages/durably/src/storage.ts
  • packages/durably/tests/browser/db-stress.test.ts
  • packages/durably/tests/node/phase1-exploration.test.ts
  • packages/durably/tests/shared/db-concurrency.shared.ts
  • packages/durably/tests/shared/db-semantics.shared.ts
  • packages/durably/tests/shared/db-stress.shared.ts
  • packages/durably/tests/shared/durably.shared.ts
  • packages/durably/tests/shared/recovery.shared.ts
  • packages/durably/tests/shared/storage.shared.ts
  • website/api/create-durably.md
  • website/api/durably-react/fullstack.md
  • website/api/durably-react/types.md
  • website/api/index.md
🚧 Files skipped from review as they are similar to previous changes (5)
  • packages/durably/tests/shared/db-stress.shared.ts
  • packages/durably/tests/browser/db-stress.test.ts
  • packages/durably/tests/node/phase1-exploration.test.ts
  • packages/durably/tests/shared/db-concurrency.shared.ts
  • docs/rfcs/runtime-rearchitecture/ja/README.md

OPFS requires exclusive access, so only one tab can run Durably at a
time. Use Web Locks API to detect when another tab holds the lock and
show a friendly message instead of crashing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Update RFC checkpoint retention section to reflect current defaults
- Initialize runtimes array to prevent teardown crash on setup failure
- Fix preserveSteps comment in llms.md (was misleading about default)
- Add busy_timeout pragma to local SQLite test helper

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
createStep() and advanceRunStepIndex() now verify lease ownership
(status='leased' AND lease_owner=workerId) before writing, closing the
race window where a stale worker could persist steps after its lease
expired and was reclaimed by another worker. completeRun/failRun already
had these guards; this extends the same protection to step-level writes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
coji and others added 2 commits March 8, 2026 14:55
Update the runtime rearchitecture RFC (EN/JA) with design decisions
from Phase 1 implementation:

- Add leaseGeneration (fencing token) to RunRecord and Store interface
- Replace workerId-based guards with leaseGeneration across all
  lease-holder writes (renewLease, completeRun, failRun, persistStep)
- Introduce persistStep as atomic replacement for createStep +
  advanceRunStepIndex (eliminates TOCTOU window)
- Add completed step uniqueness via partial unique index
- Document side effect boundaries and step idempotency requirements

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace scattered workerId-based WHERE guards with a monotonically
increasing leaseGeneration counter (fencing token) for all lease-holder
writes. This eliminates workerId reuse collisions and provides a
single, unified guard mechanism.

Key changes:
- Add lease_generation column to durably_runs (migration v4)
- Replace createStep + advanceRunStepIndex with atomic persistStep
  using INSERT...SELECT guarded by lease_generation
- Add partial unique index on completed steps for replay determinism
- Update renewLease, completeRun, failRun to use leaseGeneration
- Update all tests to use the new API

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add lease_generation field and cancelled step status to the database
schema description. Note partial unique index on completed steps.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Extract 'lease-lost' magic string to LEASE_LOST constant in context.ts
- Unify completeRun/failRun copy-paste into shared terminateRun helper

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
cancel() works on both pending and leased runs, not just leased.
Updated in fullstack.md, http-handler.md, and error-handling.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Owner Author

@coji coji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: Duplicate comments (4)

1. fullstack.md — "Cancel a leased job" → Fixed

Updated to "Cancel a pending or leased run" in fullstack.md, http-handler.md, and error-handling.md (4756f82).

2. context.ts — "Post-step lease ownership re-check missing" → Already resolved

This was flagged against the old createStep + advanceRunStepIndex code. The current implementation uses persistStep which does an atomic INSERT...SELECT WHERE lease_generation = ? — the DB-level guard ensures no stale checkpoint can be written without an extra round-trip. No code change needed.

3 & 4. core-runtime.md / ja/core-runtime.md — "preserve step history" default → Already correct

Both EN (line 251) and JA (line 234) already state: "deletes step output data by default (preserveSteps: false)" with preserveSteps: true as an opt-in. The StepRecord interface at lines 206-209 that triggered this comment is just a type definition, not a default-behavior statement. No change needed.

…rite failure

- Add leaseGeneration parameter to updateProgress() to prevent stale
  workers from polluting another worker's progress data
- Emit worker:error events when completeRun/failRun returns false
  (lease lost), improving debuggability

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coji coji merged commit b6a897a into main Mar 8, 2026
4 checks passed
@coji coji deleted the phase1-runtime-exploration branch March 8, 2026 08:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant