Phase 1: Lease-based runtime model#101
Conversation
Replace the MATERIALIZED CTE approach with a 3-step claim pattern: SELECT FOR UPDATE SKIP LOCKED → pg_advisory_xact_lock → UPDATE. This fixes both the CTE snapshot isolation bug (double-claiming the same row) and the concurrency-key race (two workers claiming different rows with the same key). Enable the previously skipped direct concurrent claim test for PostgreSQL. Update RFC docs accordingly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When advisory lock re-verification detects a concurrency-key conflict, exclude that key and loop to find the next eligible candidate within the same transaction. This prevents premature idle when unrelated pending work is still claimable. Also tighten RFC wording that overstated what was proven. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rewrite all 7 Japanese RFC documents from scratch based on the English originals. Key improvements: consistent です/ます style, natural Japanese phrasing instead of literal translations, unified technical terminology, relative link paths, and Phase 1 advisory lock findings reflected in database-claim-patterns-ja.md. Also include minor English doc updates for database-runtime-fit.md and phase1-exploration-notes.md. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Caution Review failedPull request was closed or merged during review 📝 WalkthroughWalkthroughImplements a lease-based runtime and storage rearchitecture: replaces heartbeat/running semantics with lease/leased, introduces a unified public Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Runtime as Durably Runtime
participant Store as Unified Store
participant Worker as Worker (processOne)
participant Checkpoint as Checkpoint/Steps
Client->>Runtime: trigger(input)
Runtime->>Store: enqueue(input)
loop Worker processing
Worker->>Store: claimNext(workerId, now, leaseMs)
Note over Store: Acquire lease (leaseGeneration, leaseOwner, leaseExpiresAt)
Store-->>Worker: Run (status: leased)
Worker->>Checkpoint: getSteps(runId)
Checkpoint-->>Worker: Steps[]
Worker->>Worker: execute job (StepContext with leaseGeneration)
Worker->>Store: persistStep(runId, leaseGeneration, input)
Worker->>Store: completeRun(runId, leaseGeneration, output, completedAt)
Worker->>Store: renewLease(...) or releaseExpiredLeases()
end
Client->>Runtime: getRun(runId)
Runtime->>Store: getRun(runId)
Store-->>Runtime: Run (status/lease fields)
sequenceDiagram
participant R1 as Runtime A
participant R2 as Runtime B
participant Store as Unified Store
R1->>Store: claimNext(A, now, leaseMs)
R2->>Store: claimNext(B, now, leaseMs)
Note over Store: Concurrency-key advisory lock / SKIP LOCKED ensures single winner
Store-->>R1: Run (leased by A)
Store-->>R2: null
R1->>Store: renewLease(runId, leaseGeneration, now, leaseMs)
Note over Store: Lease extended
Note over Store: Lease expires
R2->>Store: claimNext(B, now2, leaseMs)
Store-->>R2: Run (reclaimed by B)
R1->>Store: completeRun(runId, oldLeaseGeneration, ...)
Store-->>R1: false (stale owner)
R2->>Store: completeRun(runId, newLeaseGeneration, ...)
Store-->>R2: true (completed)
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related issues
Possibly related PRs
Poem
✨ Finishing Touches
🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 11
🧹 Nitpick comments (14)
packages/durably/src/errors.ts (1)
16-20: ExposerunIdas data onLeaseLostError.Since this is now a public error type, callers should not have to parse the message text to learn which run lost its lease.
♻️ Proposed fix
export class LeaseLostError extends Error { + readonly runId: string + constructor(runId: string) { super(`Lease ownership was lost: ${runId}`) this.name = 'LeaseLostError' + this.runId = runId } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/src/errors.ts` around lines 16 - 20, The LeaseLostError currently only encodes the runId in the message; update the LeaseLostError class to store the runId as a public property so callers can read it without parsing the message: add a public readonly runId: string field on the LeaseLostError class and assign it in the constructor (while keeping the existing message and name behavior), so callers can access error.runId directly from instances created by new LeaseLostError(runId).docs/rfcs/runtime-rearchitecture/core-runtime.md (1)
500-501: Minor: Consider simplifying phrasing."in support of runtime execution" could be shortened to "for runtime execution" or "supporting runtime execution".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/rfcs/runtime-rearchitecture/core-runtime.md` around lines 500 - 501, The phrase "in support of runtime execution" in the sentence about "Claim and reclaim become adapter-level semantics" is wordy—replace it with a simpler alternative such as "for runtime execution" or "supporting runtime execution" to improve clarity; update the line mentioning "Claim and reclaim" (and adjacent line referencing `processOne()`) to use the chosen shorter phrasing consistently.packages/durably/src/worker.ts (1)
70-87: Consider returning an already-resolved Promise when no work is in-flight.When
stop()is called with no in-flight work, it implicitly returnsundefinedrather than a resolved Promise. While this works due to async semantics, explicitly returningPromise.resolve()would be clearer for callers expecting a Promise.♻️ Optional: Explicit Promise return
async stop(): Promise<void> { if (!running) { return } running = false if (pollingTimeout) { clearTimeout(pollingTimeout) pollingTimeout = null } if (inFlight) { return new Promise<void>((resolve) => { stopResolver = resolve }) } + + return Promise.resolve() },🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/src/worker.ts` around lines 70 - 87, The stop() async method can return undefined when there is no in-flight work; change it to explicitly return a resolved Promise for clarity: after setting running = false and clearing pollingTimeout, if inFlight is false return Promise.resolve(); keep the existing branch that sets stopResolver and returns a new Promise when inFlight is true. Update the stop() implementation to reference the same symbols (running, pollingTimeout, inFlight, stopResolver) and return Promise.resolve() in the no-inFlight case.packages/durably/tests/node/db-stress.libsql.test.ts (1)
7-13: Add cleanup to remove the temporary database file.Unlike the
local-sqlitetest which has acleanupfunction to remove the temp file, this libSQL test leaves the file behind. This can accumulate test artifacts in the temp directory.♻️ Proposed fix to add cleanup
+import { rm } from 'node:fs/promises' import { randomUUID } from 'node:crypto' import { tmpdir } from 'node:os' import { join } from 'node:path' import { createNodeDialectForFile } from '../helpers/node-dialect' import { createDbStressTests } from '../shared/db-stress.shared' createDbStressTests('libsql', () => { const filePath = join(tmpdir(), `durably-libsql-shared-${randomUUID()}.db`) return { createDialect: () => createNodeDialectForFile(filePath), + cleanup: async () => { + await rm(filePath, { force: true }) + }, } })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/tests/node/db-stress.libsql.test.ts` around lines 7 - 13, The libsql stress test creates a temp DB file but doesn't remove it; update the object returned from createDbStressTests('libsql', ...) to include a cleanup function that deletes the temp filePath (the one built with tmpdir() and randomUUID()) after tests complete. Implement cleanup that checks for the file's existence and removes it (e.g., fs.unlinkSync or fs.rmSync with force) and export it alongside createDialect (refer to createDialect and createNodeDialectForFile) so the test harness can call it.packages/durably/tests/browser/db-stress.test.ts (1)
1-41: Consider reusing the shared stress test harness.This browser test duplicates much of the logic from
db-stress.shared.ts. The main difference is the browser dialect. Consider adaptingcreateDbStressTeststo accept a browser dialect factory, which would reduce duplication and ensure tests stay in sync.That said, if browser-specific quirks require a separate implementation, this standalone approach is acceptable.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/tests/browser/db-stress.test.ts` around lines 1 - 41, The test duplicates shared stress-test logic; update the shared harness to accept a browser dialect factory: modify createDbStressTests to take a parameter like browserDialectFactory (or a generic dialectFactory) and use createBrowserDialectForName when invoking it from the browser test, then replace the duplicated setup in db-stress.test.ts with a call to createDbStressTests(dialectFactory) so the browser-specific dialect is injected while preserving any browser-specific tweaks; reference createDbStressTests, createBrowserDialectForName, and the runtimes/migrate/deleteFrom setup to locate and consolidate the logic.packages/durably/tests/shared/db-stress.shared.ts (1)
20-31: Consider migrating all runtimes or documenting single-migration assumption.Only
runtimes[0].migrate()is called, assuming all runtimes share the same underlying database. This works for file-backed SQLite and shared PostgreSQL schemas, but it's worth noting that this assumption is implicit.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/tests/shared/db-stress.shared.ts` around lines 20 - 31, Test setup calls runtimes[0].migrate() implicitly assuming a shared DB; either migrate every runtime or document the single-migration assumption. Fix by replacing the single call to runtimes[0].migrate() in the beforeEach (where runtimes is created via createDurably) with a loop that awaits migrate() on each runtime, or add an explicit comment in the beforeEach noting that only runtimes[0] is migrated because all runtimes share the same underlying database (file-backed SQLite or shared PostgreSQL schema).packages/durably/src/events.ts (1)
34-45: Clarify the deprecation comment.The JSDoc states
RunStartEventis a "Deprecated compatibility alias for run:leased," butRunStartEventusestype: 'run:start', not'run:leased'. These are semantically distinct event types. IfRunStartEventis being deprecated in favor ofRunLeasedEvent, consider rewording to clarify the migration path (e.g., "Deprecated: prefer RunLeasedEvent for new code").🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/src/events.ts` around lines 34 - 45, The JSDoc on RunStartEvent is misleading: update the comment for the RunStartEvent interface to clearly mark it as deprecated and point to the replacement RunLeasedEvent (or whatever the new type is) and clarify the migration path; reference the RunStartEvent interface and the replacement RunLeasedEvent/type name so readers know to prefer RunLeasedEvent for new code and that RunStartEvent (with type: 'run:start') remains for compatibility only.packages/durably/tests/shared/db-concurrency.shared.ts (1)
202-267: Potential resource leak:runtimesarray not destroyed in this test.This test creates separate
runtimeAandruntimeBinstances and destroys them at line 266, but the sharedruntimesarray (created inbeforeEach) is still active. TheafterEachhook will destroyruntimes, but since this test also callsruntimeA.migrate()and clears tables viaruntimeA.db, the fourruntimesinstances remain idle but connected throughout test execution.Consider either:
- Reusing
runtimes[0]andruntimes[1]with job registration instead of creating new instances, or- Destroying the shared
runtimesearly in this test if they're not needed.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/tests/shared/db-concurrency.shared.ts` around lines 202 - 267, The test creates its own runtimeA/runtimeB but leaves the shared runtimes array from beforeEach alive, risking leaked DB connections; either reuse the existing runtimes by registering concurrencyJob on runtimes[0] and runtimes[1] (instead of createDurably) or explicitly destroy the shared runtimes early in this test (call await runtimes.forEach(async r => r.db.destroy()/r.destroy()) or similar) before creating runtimeA/runtimeB so only the intended instances remain active; update the test to use the chosen approach and ensure cleanup (runtimeA/runtimeB and any shared runtimes) so no idle connections persist.packages/durably/tests/browser/singleton-warning.test.ts (1)
8-13: Consider usingPromise.allSettledfor more robust cleanup.If one runtime's
stop()ordestroy()throws, the remaining runtimes won't be cleaned up. UsingPromise.allSettledwould ensure all cleanup attempts complete regardless of individual failures.♻️ Optional: More robust cleanup
afterEach(async () => { - await Promise.all(runtimes.map((runtime) => runtime.stop())) - await Promise.all(runtimes.map((runtime) => runtime.db.destroy())) + await Promise.allSettled(runtimes.map((runtime) => runtime.stop())) + await Promise.allSettled(runtimes.map((runtime) => runtime.db.destroy())) runtimes.length = 0 vi.restoreAllMocks() })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/tests/browser/singleton-warning.test.ts` around lines 8 - 13, The afterEach cleanup currently uses Promise.all for runtimes.map(runtime => runtime.stop()) and runtimes.map(runtime => runtime.db.destroy()), which aborts remaining cleanup if one promise rejects; change both to Promise.allSettled so every runtime's stop() and db.destroy() is attempted, then inspect the settled results and log or rethrow any failures as appropriate (use the runtimes variable and the afterEach hook, and keep vi.restoreAllMocks() as-is) — this ensures afterEach always tries to clean up all runtimes even if some stop()/db.destroy() calls fail.packages/durably/src/index.ts (1)
61-71: Be careful turning Phase 1 store shapes into root-level API.The RFCs in this PR position
processOne()as the portability contract and the queue/checkpoint split as adapter internals. Re-exportingQueueStore,CheckpointStore, andStoragefrom the root barrel makes those exploratory shapes part of the supported semver surface for all consumers. If third-party adapter authoring is not intentionally first-class yet, prefer an explicitexperimentalnamespace or a deeper import path.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/src/index.ts` around lines 61 - 71, This re-exports QueueStore, CheckpointStore, and Storage at the package root, which promotes exploratory adapter shapes to the public semver surface; instead remove these three types from the root export list and surface them under an explicit experimental/deep import instead (e.g., export them from an `experimental` namespace or from a deeper module path), leaving only the intended portability contract (e.g., processOne and public runtime types) at the top level; update any internal references to QueueStore/CheckpointStore/Storage to import from the new experimental/deep path so third-party consumers aren’t accidentally locked into these adapter shapes.packages/durably/tests/node/phase1-exploration.test.ts (1)
7-11: DrainruntimesinafterEach().With two tests in this file, Lines 9-10 stop the first runtime again during the second teardown because the array is never cleared. That makes cleanup order-dependent for no benefit.
♻️ Suggested cleanup
afterEach(async () => { - await Promise.all(runtimes.map((runtime) => runtime.stop())) + await Promise.all(runtimes.splice(0).map((runtime) => runtime.stop())) })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/tests/node/phase1-exploration.test.ts` around lines 7 - 11, The afterEach teardown currently stops all runtimes in the shared runtimes array but never clears it, causing duplicate stop attempts across tests; update the afterEach handler (the runtimes array and the async () => runtimes.map(runtime => runtime.stop()) block) to both await stopping all runtimes and then clear the runtimes array (e.g., set runtimes.length = 0 or splice(0)) so each test’s cleanup only affects that test’s entries.packages/durably/src/context.ts (1)
189-192: Progress update is fire-and-forget but should handle errors.The progress persistence call
storage.checkpoint.updateProgress(run.id, progressData)is not awaited (intentional fire-and-forget), but any rejection will become an unhandled promise rejection.Consider adding a
.catch()to silently swallow or log errors:🔧 Proposed fix to handle potential rejection
progress(current: number, total?: number, message?: string): void { const progressData = { current, total, message } // Fire and forget - don't await - storage.checkpoint.updateProgress(run.id, progressData) + storage.checkpoint.updateProgress(run.id, progressData).catch(() => { + // Silently ignore progress update failures - they're non-critical + }) // Emit progress event🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/src/context.ts` around lines 189 - 192, The progress method currently calls storage.checkpoint.updateProgress(run.id, progressData) without awaiting, which can produce unhandled promise rejections; update the progress method to keep the fire-and-forget behavior but append a .catch(...) to the returned promise (or otherwise handle rejection) so errors are either silently swallowed or logged via the existing logger; modify the progress method around storage.checkpoint.updateProgress to include this rejection handling while preserving run.id and progressData usage.packages/durably/tests/node/stale-owner.e2e.test.ts (1)
29-38: Timing configuration may cause flakiness in CI environments.The 25ms
leaseMscombined with 40ms sleeps (lines 76, 123, 179, 243) provides only ~15ms margin for lease expiration detection. Under CI load or slow I/O, this margin may be insufficient.Consider either:
- Increasing
leaseMsto 50-100ms with proportionally larger sleep buffers- Using a polling assertion (e.g.,
vi.waitFor) to confirm lease expiration rather than a fixed delay🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/tests/node/stale-owner.e2e.test.ts` around lines 29 - 38, The test uses very short leases in createDurably (runtimeA and runtimeB with leaseMs: 25) which makes CI flaky; update the lease configuration to a more robust value (e.g., set leaseMs to 100 and keep leaseRenewIntervalMs proportional) and either increase the fixed sleep times accordingly or replace fixed delays with a polling assertion (e.g., vi.waitFor) to wait for the lease to expire/owner change; locate the createDurably calls and the corresponding test waits/sleeps and apply the increased leaseMs or switch the sleeps to polling checks that verify lease expiration/owner change.packages/durably/tests/node/process-until-idle.serverless.test.ts (1)
72-82: Test validates quick return but doesn't register any jobs.The test calls
runtime.migrate()without registering any jobs, then callsprocessUntilIdle(). This works because there's simply no work in the queue, but consider adding a comment clarifying this is intentional (testing empty queue behavior with no registered jobs).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/tests/node/process-until-idle.serverless.test.ts` around lines 72 - 82, The test 'returns quickly when there is no claimable work' calls runtime.migrate() and then runtime.processUntilIdle() without registering any jobs, which is intentional but not documented; add an inline comment inside this test (near the runtime.migrate() or before the processUntilIdle() call) stating that the absence of registered jobs is deliberate and the test is verifying empty-queue behavior for processUntilIdle(), referencing the test name and the runtime.processUntilIdle() invocation so future readers understand this is not a missing setup.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/rfcs/runtime-rearchitecture/deployment-models.md`:
- Around line 296-314: Update the Browser-Local Mode guidance to explicitly
state that OPFS-backed SQLite requires a Secure Context (HTTPS or localhost); in
the section mentioning "browser-local database runtime such as SQLocal /
OPFS-backed SQLite" add a short note that OPFS is only available in secure
contexts and that attempts to initialize OPFS on plain HTTP will fail, and
advise developers to use HTTPS or localhost or fall back to other storage
options when presenting how to create and share a single Durably instance per
tab.
In `@docs/rfcs/runtime-rearchitecture/ja/core-runtime.md`:
- Around line 206-209: The paragraph ambiguously implies a new default of
retaining step history; explicitly state whether Phase 1 changes the existing
cleanup behavior or not: if Phase 1 does NOT change the contract, update the
text to say "This does not change the existing cleanupSteps contract; adapters
must continue deleting step output data at terminal state per cleanupSteps," and
if Phase 1 DOES change it, state "Phase 1 changes the default: step history is
retained by default and cleanupSteps semantics are updated to require explicit
cleanup policies," then add a short note for adapter implementors to follow the
revised contract; reference the cleanupSteps contract and "Phase 1" so readers
know which behavior applies.
In `@docs/rfcs/runtime-rearchitecture/README.md`:
- Around line 67-68: The README contains filesystem-absolute Markdown links like
[core-runtime.md](/Users/coji/progs/oss/durably/docs/rfcs/runtime-rearchitecture/core-runtime.md);
replace those with repo-relative links (e.g., [core-runtime.md](core-runtime.md)
or the appropriate relative path under docs/rfcs/runtime-rearchitecture/) so
links work in other checkouts and on GitHub, and apply the same change to the
other absolute links in this file (the other occurrences in the same README).
In `@packages/durably/src/durably.ts`:
- Around line 466-505: renewLease currently returns a boolean and the local
timer computes renewedLeaseExpiresAt from Date.now(), which can drift past the
persisted DB expiry; change queue.renewLease to return the persisted lease
expiry (e.g., an ISO timestamp or millis) and, in the leaseTimer callback (where
renewLease is called), use that returned expiry value to call
scheduleLeaseDeadline(...) and to populate leaseExpiresAt in the
'run:lease-renewed' event (reference: leaseTimer, queue.renewLease,
scheduleLeaseDeadline, renewedLeaseExpiresAt, state.leaseMs, run.id); also
update any types/signatures and error handling to accept the new return shape.
- Around line 507-591: The finally block always deletes persisted step outputs
via checkpoint.deleteSteps(run.id) when state.preserveSteps is false, which can
wipe checkpoints on non-terminal exits (e.g., LeaseLostError or when
queue.completeRun/queue.failRun returned false). Change the cleanup so deletion
only occurs when the run actually reached a terminal state and cleanupSteps is
enabled: use the boolean results already returned by queue.completeRun
(completed) and queue.failRun (failed) or track a terminal flag set when either
completes successfully, and only call checkpoint.deleteSteps(run.id) if
!state.preserveSteps && cleanupStepsEnabled && terminalReached; keep clearing
timers and calling dispose() unchanged. Ensure you reference LeaseLostError,
completeRun, failRun, checkpoint.deleteSteps, state.preserveSteps and run.id
when making the change.
In `@packages/durably/src/migrations.ts`:
- Around line 131-139: The migration currently converts legacy rows by running
db.updateTable('durably_runs').set({status: 'leased', lease_owner:
'migration:v2', lease_expires_at: null}).where('status', '=', 'running') which
incorrectly creates non-expiring synthetic leases and strands in-flight work;
before applying this update, add a check for any rows with status 'running'
(query durably_runs WHERE status = 'running') and if any exist either abort the
migration by throwing an error or map them to a safe deterministic state (e.g.,
a terminal or requeueable status) instead of setting lease_expires_at to
null—ensure the change is made in the same migration function in migrations.ts
and remove the set of lease_expires_at = null when choosing to abort or remap.
In `@packages/durably/src/storage.ts`:
- Around line 675-693: The renewLease implementation in renewLease currently
only checks status and lease_owner allowing stale processes to renew after
lease_expires_at; update renewLease (in packages/durably/src/storage.ts) to also
require the current lease not be expired by adding a condition that
lease_expires_at > now (compare the DB column lease_expires_at against the
incoming now string) in the query before performing the update so renewals are
rejected when the lease has already expired.
- Around line 762-774: The cancelRun update is not conditional and can overwrite
a run that completed between the prior read and this write; change cancelRun to
perform an atomic conditional update by adding a WHERE clause that only updates
when the run is not already in a terminal state (e.g., WHERE status NOT IN
('completed','failed','cancelled')) and (optionally) when the lease_owner or
expected status matches the prior read; have the method check the number of
affected rows (or return a boolean) so callers (e.g., durably.cancel) can detect
whether the transition succeeded and avoid clobbering terminal data.
- Around line 341-344: The getRuns label-filtering logic in
createKyselyQueueStore currently hardcodes json_extract(...) (SQLite) which
breaks when backend === 'postgres'; update createKyselyQueueStore (and the
getRuns implementation) to use a dialect-aware JSON extractor: either branch on
backend === 'postgres' to emit "column ->> 'key'" for Postgres and
json_extract(column, '$.key') for others, or add a helper function (e.g.,
jsonExtract(column, key) used by getRuns) that returns the correct SQL snippet
based on DatabaseBackend; ensure the DatabaseBackend parameter passed into
createKyselyQueueStore is used for this decision so label filters work on
PostgreSQL.
In `@packages/durably/tests/helpers/local-sqlite-dialect.ts`:
- Around line 11-12: Add a busy timeout to the local test SQLite connection so
concurrent writers don't get immediate SQLITE_BUSY; after creating the Database
(the sqlite variable) and setting WAL via sqlite.pragma('journal_mode = WAL'),
call the SQLite busy timeout PRAGMA (e.g., set sqlite.pragma('busy_timeout =
5000') or similar timeout in milliseconds) to allow retries under contention.
In `@packages/durably/tests/node/db-semantics.postgres.test.ts`:
- Line 4: The postgres test inconsistently invokes the factory instead of
passing it like the others; change the call to createDbSemanticsTests so it
passes the factory function createPostgresSchemaResource (remove the
parentheses) to match how createNodeDialect, createLocalSqliteDialect, and
createBrowserDialect are passed and ensure the harness receives the factory
function rather than a pre-invoked resource.
---
Nitpick comments:
In `@docs/rfcs/runtime-rearchitecture/core-runtime.md`:
- Around line 500-501: The phrase "in support of runtime execution" in the
sentence about "Claim and reclaim become adapter-level semantics" is
wordy—replace it with a simpler alternative such as "for runtime execution" or
"supporting runtime execution" to improve clarity; update the line mentioning
"Claim and reclaim" (and adjacent line referencing `processOne()`) to use the
chosen shorter phrasing consistently.
In `@packages/durably/src/context.ts`:
- Around line 189-192: The progress method currently calls
storage.checkpoint.updateProgress(run.id, progressData) without awaiting, which
can produce unhandled promise rejections; update the progress method to keep the
fire-and-forget behavior but append a .catch(...) to the returned promise (or
otherwise handle rejection) so errors are either silently swallowed or logged
via the existing logger; modify the progress method around
storage.checkpoint.updateProgress to include this rejection handling while
preserving run.id and progressData usage.
In `@packages/durably/src/errors.ts`:
- Around line 16-20: The LeaseLostError currently only encodes the runId in the
message; update the LeaseLostError class to store the runId as a public property
so callers can read it without parsing the message: add a public readonly runId:
string field on the LeaseLostError class and assign it in the constructor (while
keeping the existing message and name behavior), so callers can access
error.runId directly from instances created by new LeaseLostError(runId).
In `@packages/durably/src/events.ts`:
- Around line 34-45: The JSDoc on RunStartEvent is misleading: update the
comment for the RunStartEvent interface to clearly mark it as deprecated and
point to the replacement RunLeasedEvent (or whatever the new type is) and
clarify the migration path; reference the RunStartEvent interface and the
replacement RunLeasedEvent/type name so readers know to prefer RunLeasedEvent
for new code and that RunStartEvent (with type: 'run:start') remains for
compatibility only.
In `@packages/durably/src/index.ts`:
- Around line 61-71: This re-exports QueueStore, CheckpointStore, and Storage at
the package root, which promotes exploratory adapter shapes to the public semver
surface; instead remove these three types from the root export list and surface
them under an explicit experimental/deep import instead (e.g., export them from
an `experimental` namespace or from a deeper module path), leaving only the
intended portability contract (e.g., processOne and public runtime types) at the
top level; update any internal references to QueueStore/CheckpointStore/Storage
to import from the new experimental/deep path so third-party consumers aren’t
accidentally locked into these adapter shapes.
In `@packages/durably/src/worker.ts`:
- Around line 70-87: The stop() async method can return undefined when there is
no in-flight work; change it to explicitly return a resolved Promise for
clarity: after setting running = false and clearing pollingTimeout, if inFlight
is false return Promise.resolve(); keep the existing branch that sets
stopResolver and returns a new Promise when inFlight is true. Update the stop()
implementation to reference the same symbols (running, pollingTimeout, inFlight,
stopResolver) and return Promise.resolve() in the no-inFlight case.
In `@packages/durably/tests/browser/db-stress.test.ts`:
- Around line 1-41: The test duplicates shared stress-test logic; update the
shared harness to accept a browser dialect factory: modify createDbStressTests
to take a parameter like browserDialectFactory (or a generic dialectFactory) and
use createBrowserDialectForName when invoking it from the browser test, then
replace the duplicated setup in db-stress.test.ts with a call to
createDbStressTests(dialectFactory) so the browser-specific dialect is injected
while preserving any browser-specific tweaks; reference createDbStressTests,
createBrowserDialectForName, and the runtimes/migrate/deleteFrom setup to locate
and consolidate the logic.
In `@packages/durably/tests/browser/singleton-warning.test.ts`:
- Around line 8-13: The afterEach cleanup currently uses Promise.all for
runtimes.map(runtime => runtime.stop()) and runtimes.map(runtime =>
runtime.db.destroy()), which aborts remaining cleanup if one promise rejects;
change both to Promise.allSettled so every runtime's stop() and db.destroy() is
attempted, then inspect the settled results and log or rethrow any failures as
appropriate (use the runtimes variable and the afterEach hook, and keep
vi.restoreAllMocks() as-is) — this ensures afterEach always tries to clean up
all runtimes even if some stop()/db.destroy() calls fail.
In `@packages/durably/tests/node/db-stress.libsql.test.ts`:
- Around line 7-13: The libsql stress test creates a temp DB file but doesn't
remove it; update the object returned from createDbStressTests('libsql', ...) to
include a cleanup function that deletes the temp filePath (the one built with
tmpdir() and randomUUID()) after tests complete. Implement cleanup that checks
for the file's existence and removes it (e.g., fs.unlinkSync or fs.rmSync with
force) and export it alongside createDialect (refer to createDialect and
createNodeDialectForFile) so the test harness can call it.
In `@packages/durably/tests/node/phase1-exploration.test.ts`:
- Around line 7-11: The afterEach teardown currently stops all runtimes in the
shared runtimes array but never clears it, causing duplicate stop attempts
across tests; update the afterEach handler (the runtimes array and the async ()
=> runtimes.map(runtime => runtime.stop()) block) to both await stopping all
runtimes and then clear the runtimes array (e.g., set runtimes.length = 0 or
splice(0)) so each test’s cleanup only affects that test’s entries.
In `@packages/durably/tests/node/process-until-idle.serverless.test.ts`:
- Around line 72-82: The test 'returns quickly when there is no claimable work'
calls runtime.migrate() and then runtime.processUntilIdle() without registering
any jobs, which is intentional but not documented; add an inline comment inside
this test (near the runtime.migrate() or before the processUntilIdle() call)
stating that the absence of registered jobs is deliberate and the test is
verifying empty-queue behavior for processUntilIdle(), referencing the test name
and the runtime.processUntilIdle() invocation so future readers understand this
is not a missing setup.
In `@packages/durably/tests/node/stale-owner.e2e.test.ts`:
- Around line 29-38: The test uses very short leases in createDurably (runtimeA
and runtimeB with leaseMs: 25) which makes CI flaky; update the lease
configuration to a more robust value (e.g., set leaseMs to 100 and keep
leaseRenewIntervalMs proportional) and either increase the fixed sleep times
accordingly or replace fixed delays with a polling assertion (e.g., vi.waitFor)
to wait for the lease to expire/owner change; locate the createDurably calls and
the corresponding test waits/sleeps and apply the increased leaseMs or switch
the sleeps to polling checks that verify lease expiration/owner change.
In `@packages/durably/tests/shared/db-concurrency.shared.ts`:
- Around line 202-267: The test creates its own runtimeA/runtimeB but leaves the
shared runtimes array from beforeEach alive, risking leaked DB connections;
either reuse the existing runtimes by registering concurrencyJob on runtimes[0]
and runtimes[1] (instead of createDurably) or explicitly destroy the shared
runtimes early in this test (call await runtimes.forEach(async r =>
r.db.destroy()/r.destroy()) or similar) before creating runtimeA/runtimeB so
only the intended instances remain active; update the test to use the chosen
approach and ensure cleanup (runtimeA/runtimeB and any shared runtimes) so no
idle connections persist.
In `@packages/durably/tests/shared/db-stress.shared.ts`:
- Around line 20-31: Test setup calls runtimes[0].migrate() implicitly assuming
a shared DB; either migrate every runtime or document the single-migration
assumption. Fix by replacing the single call to runtimes[0].migrate() in the
beforeEach (where runtimes is created via createDurably) with a loop that awaits
migrate() on each runtime, or add an explicit comment in the beforeEach noting
that only runtimes[0] is migrated because all runtimes share the same underlying
database (file-backed SQLite or shared PostgreSQL schema).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b2191ec1-5e61-4adf-811f-13d4bc6cde72
⛔ Files ignored due to path filters (1)
pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (53)
docker-compose.postgres.ymldocs/rfcs/runtime-rearchitecture/README.mddocs/rfcs/runtime-rearchitecture/core-runtime.mddocs/rfcs/runtime-rearchitecture/database-claim-patterns.mddocs/rfcs/runtime-rearchitecture/database-runtime-fit.mddocs/rfcs/runtime-rearchitecture/deployment-database-matrix.mddocs/rfcs/runtime-rearchitecture/deployment-models.mddocs/rfcs/runtime-rearchitecture/ja/README.mddocs/rfcs/runtime-rearchitecture/ja/ambient-agent-concepts-ja.mddocs/rfcs/runtime-rearchitecture/ja/core-runtime.mddocs/rfcs/runtime-rearchitecture/ja/database-adapter-sketches-ja.mddocs/rfcs/runtime-rearchitecture/ja/database-claim-patterns-ja.mddocs/rfcs/runtime-rearchitecture/ja/database-runtime-fit-ja.mddocs/rfcs/runtime-rearchitecture/ja/deployment-models-ja.mddocs/rfcs/runtime-rearchitecture/phase1-exploration-notes.mdpackage.jsonpackages/durably/package.jsonpackages/durably/src/context.tspackages/durably/src/durably.tspackages/durably/src/errors.tspackages/durably/src/events.tspackages/durably/src/index.tspackages/durably/src/job.tspackages/durably/src/migrations.tspackages/durably/src/plugins/log-persistence.tspackages/durably/src/schema.tspackages/durably/src/server.tspackages/durably/src/storage.tspackages/durably/src/worker.tspackages/durably/tests/browser/db-semantics.test.tspackages/durably/tests/browser/db-stress.test.tspackages/durably/tests/browser/singleton-warning.test.tspackages/durably/tests/helpers/browser-dialect.tspackages/durably/tests/helpers/local-sqlite-dialect.tspackages/durably/tests/helpers/node-dialect.tspackages/durably/tests/helpers/postgres-dialect.tspackages/durably/tests/node/db-concurrency.libsql.test.tspackages/durably/tests/node/db-concurrency.local-sqlite.test.tspackages/durably/tests/node/db-concurrency.postgres.test.tspackages/durably/tests/node/db-semantics.local-sqlite.test.tspackages/durably/tests/node/db-semantics.postgres.test.tspackages/durably/tests/node/db-semantics.test.tspackages/durably/tests/node/db-stress.libsql.test.tspackages/durably/tests/node/db-stress.local-sqlite.test.tspackages/durably/tests/node/db-stress.postgres.test.tspackages/durably/tests/node/migration-clean-break.test.tspackages/durably/tests/node/phase1-exploration.test.tspackages/durably/tests/node/process-until-idle.serverless.test.tspackages/durably/tests/node/stale-owner.e2e.test.tspackages/durably/tests/shared/db-concurrency.shared.tspackages/durably/tests/shared/db-semantics.shared.tspackages/durably/tests/shared/db-stress.shared.tspackages/durably/tests/shared/migrate.shared.ts
Update existing tests, React package, examples, and docs to use 'leased' status consistently as defined in the runtime rearchitecture RFC. Remove the ClientRunStatus mapping layer that converted leased back to running at the API boundary — the clean-break approach exposes leased directly to all consumers. Key changes: - Legacy tests: running → leased, heartbeatAt → leaseExpiresAt, run:start → run:leased, cleanupSteps default change - React hooks: RunStatus uses leased, isRunning → isLeased, run:start events → run:leased, MappedTypedRun removed - Server: SSE emits run:leased, filter no longer maps running - Examples and docs updated accordingly Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Clean break from legacy runtime: remove RunStartEvent type, heartbeatAt field, and all remaining run:start / running references across core, tests, React package, examples, and docs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace run:start → run:leased, running → leased, heartbeatInterval → leaseInterval, heartbeatAt → leaseExpiresAt, isRunning → isLeased, RunStartEvent → RunLeasedEvent across all website and package docs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove old option names: pollingInterval, heartbeatInterval, staleThreshold, cleanupSteps (use pollingIntervalMs, leaseRenewIntervalMs, leaseMs, preserveSteps) - Remove redundant `store` property from Durably interface (use `storage.queue` / `storage.checkpoint` instead) - Remove stale `running` from VALID_STATUSES, restore satisfies guard - Use existing getErrorMessage() helper instead of inline pattern - Use storage.deleteRun() instead of duplicating checkpoint+queue calls - Remove unnecessary spread in toClientRun() - Default preserveSteps to false (matches old cleanupSteps: true default) - Update all tests, examples, and docs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (15)
examples/fullstack-react-router/app/routes/_index/dashboard.tsx (1)
90-96:⚠️ Potential issue | 🟡 MinorMissing
leasedstatus instatusClasses.The status checks on lines 221 and 232 now use
'leased'instead of'running', butstatusClassesstill defines styling for'running'without a corresponding entry for'leased'. This causes runs with status'leased'to fall back to the default gray styling on line 161 instead of the intended blue styling.🎨 Proposed fix to add leased status styling
const statusClasses: Record<string, string> = { pending: 'bg-yellow-100 text-yellow-800', - running: 'bg-blue-100 text-blue-800', + leased: 'bg-blue-100 text-blue-800', completed: 'bg-green-100 text-green-800', failed: 'bg-red-100 text-red-800', cancelled: 'bg-gray-100 text-gray-800', }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/fullstack-react-router/app/routes/_index/dashboard.tsx` around lines 90 - 96, statusClasses is missing an entry for the 'leased' run status, so UI falls back to the default gray; add a 'leased' key to the statusClasses Record (next to the existing 'running' entry) with the same styling values used for 'running' (e.g., blue classes) so items with status === 'leased' render the intended blue styling; update the constant named statusClasses in dashboard.tsx accordingly.examples/spa-vite-react/src/components/dashboard.tsx (1)
214-225:⚠️ Potential issue | 🟡 MinorComplete the lease-status migration in the badge map.
These action guards now treat
leasedas the active state, butstatusClassesin the same component still only stylesrunning. As a result, leased runs will render with the fallback gray badge instead of the active-state styling. Please add aleasedentry there, and removerunningif it is no longer emitted.Suggested follow-up
const statusClasses: Record<string, string> = { pending: 'bg-yellow-100 text-yellow-800', - running: 'bg-blue-100 text-blue-800', + leased: 'bg-blue-100 text-blue-800', completed: 'bg-green-100 text-green-800', failed: 'bg-red-100 text-red-800', cancelled: 'bg-gray-100 text-gray-800', }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/spa-vite-react/src/components/dashboard.tsx` around lines 214 - 225, Update the badge map `statusClasses` in the dashboard component to include a `leased` key with the same active-state styling used for `pending`/active badges and remove the `running` key if your backend no longer emits it; locate the `statusClasses` object in the same file (e.g., near the badge rendering logic and where `run.status` is checked) and add an entry for 'leased' mirroring the active classes, then delete the 'running' entry so leased runs display the correct active badge color.packages/durably-react/src/hooks/use-auto-resume.ts (1)
7-8:⚠️ Potential issue | 🟡 MinorUpdate the public docs to say
leasedinstead ofrunning.These comments still describe auto-resume as tracking
pending/runningruns, but the hook now resumespending/leasedruns. Keeping the JSDoc in sync matters here because this is exported API surface and shows up in editor hints/generated docs.Also applies to: 29-30
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably-react/src/hooks/use-auto-resume.ts` around lines 7 - 8, Update the JSDoc comments in use-auto-resume.ts to describe that the hook auto-resumes "pending/leased" runs rather than "pending/running": locate the JSDoc for the useAutoResume hook (and the related option description around the top of the file and the other occurrence noted) and replace the word "running" with "leased" so the exported API docs and editor hints reflect the current behavior.website/api/durably-react/spa.md (1)
113-129:⚠️ Potential issue | 🟡 MinorDisable the example button for pending runs too.
With the new lease model,
isLeasedno longer covers the queued/pending window. This example still allows duplicate triggers until the worker acquires the lease, even thoughisPendingis already available in scope.Suggested doc tweak
- <button onClick={handleClick} disabled={isLeased}> + <button onClick={handleClick} disabled={isPending || isLeased}> Run </button>🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@website/api/durably-react/spa.md` around lines 113 - 129, The button's disabled prop should also account for pending runs so users can't trigger duplicates before a lease is acquired; update the component that uses useJob (references: useJob, isLeased, isPending, trigger, handleClick) to disable the button when either isLeased or isPending is true (i.e., combine both flags in the disabled condition).examples/spa-react-router/app/routes/_index/dashboard.tsx (1)
89-95:⚠️ Potential issue | 🟡 MinorAdd
leasedtostatusClassesfor consistent styling.The
statusClassesmap includesrunningbut notleased. Since the PR migrates to lease-based semantics and line 217 checks forrun.status === 'leased', leased runs will fall back to the gray styling at line 158. Add theleasedkey for consistent blue styling:🎨 Proposed fix
const statusClasses: Record<string, string> = { pending: 'bg-yellow-100 text-yellow-800', - running: 'bg-blue-100 text-blue-800', + leased: 'bg-blue-100 text-blue-800', completed: 'bg-green-100 text-green-800', failed: 'bg-red-100 text-red-800', cancelled: 'bg-gray-100 text-gray-800', }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/spa-react-router/app/routes/_index/dashboard.tsx` around lines 89 - 95, The statusClasses map is missing a key for 'leased', causing run.status === 'leased' to fall back to gray; update the statusClasses Record (variable name: statusClasses) to include a 'leased' entry using the same classes as 'running' (the blue styles) so leased runs render consistently with running runs referenced by checks like run.status === 'leased'.packages/durably/docs/llms.md (3)
32-38:⚠️ Potential issue | 🟡 MinorUse
cleanupStepshere, notpreserveSteps.This example points readers at the wrong option name/behavior. The documented config for terminal-state step cleanup is
cleanupSteps, sopreserveSteps: trueis misleading.As per coding guidelines: "Create steps via `step.run()`, ensuring each step's success state and return value is persisted. Step output data is deleted when runs reach terminal state (controlled by `cleanupSteps` configuration)."Suggested doc fix
- preserveSteps: true, // Keep step output data on terminal state (default: true) + cleanupSteps: true, // Delete step output data on terminal state (default: true)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/docs/llms.md` around lines 32 - 38, The example uses the wrong config key: update the createDurably config shown (the block around createDurably) to use cleanupSteps instead of preserveSteps and set its boolean meaning correctly (cleanupSteps: false to preserve step outputs, true to delete them at terminal state); ensure any explanatory text around that example references cleanupSteps and matches the docs about step.run() persistence and terminal-state cleanup.
528-541:⚠️ Potential issue | 🟡 MinorThe
Runtype here still omits the lease fields.Later in this doc,
ClientRunis described as strippingleaseOwnerandleaseExpiresAt, but the publicRundefinition here exposes neither. That makes the type section internally inconsistent.Based on learnings: Update
packages/durably/docs/llms.mdto keep it in sync whenever API changes are made. This file is bundled in the npm package and symlinked towebsite/public/llms.txtfor web access.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/docs/llms.md` around lines 528 - 541, The Run type in the docs is missing the lease fields and is inconsistent with the later ClientRun discussion; update the Run interface in packages/durably/docs/llms.md to include the leaseOwner and leaseExpiresAt fields (with appropriate types, e.g., string | null for leaseOwner and string | null for leaseExpiresAt or Date-compatible strings) and ensure ClientRun is described as stripping those two fields; reference the Run and ClientRun symbols so the doc matches the actual API surface exported by the code.
234-244:⚠️ Potential issue | 🟡 MinorAdd
run:lease-renewedto the event examples.The lease-based lifecycle here jumps from
run:leasedstraight to terminal events, but the PR also adds a lease-renewal event. Please include it so this LLM doc reflects the full public event surface.Based on learnings: Update
packages/durably/docs/llms.mdto keep it in sync whenever API changes are made. This file is bundled in the npm package and symlinked towebsite/public/llms.txtfor web access.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/docs/llms.md` around lines 234 - 244, Add the missing lease-renewal event example to the lifecycle event list: after the existing durably.on('run:leased', ...) entry add a durably.on('run:lease-renewed', (e) => ...) example that logs the runId and any lease metadata (mirroring the style of run:leased), so the docs include the new public event run:lease-renewed and remain in sync with the API.website/api/create-durably.md (2)
207-246:⚠️ Potential issue | 🟡 Minor
Runis missing the companionleaseOwnerfield.This page documents
leaseExpiresAt, but not theleaseOwnerfield introduced with leased runs. That leaves thegetRun()/getRuns()shape incomplete.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@website/api/create-durably.md` around lines 207 - 246, The Run interface/docs currently document leaseExpiresAt but omit the companion leaseOwner field added for leased runs; update the Run type declaration and the accompanying table to include leaseOwner: string | null (ISO identifier or agent id) and mention it in the description, and ensure any examples/consumers like getRun()/getRuns() references reflect the new leaseOwner property so the shape is complete and consistent with leased-run behavior.
50-149:⚠️ Potential issue | 🟡 MinorDocument
processOne()andprocessUntilIdle()on this page.The method list still jumps from
start()/stop()to the registration/query APIs, so the new manual processing APIs introduced in this PR are invisible to consumers.Based on learnings: Update
packages/durably/docs/llms.mdto keep it in sync whenever API changes are made. This file is bundled in the npm package and symlinked towebsite/public/llms.txtfor web access.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@website/api/create-durably.md` around lines 50 - 149, The API docs are missing the new manual processing methods processOne() and processUntilIdle(); add short entries for both (signatures, brief description of behavior, return types, and when to use) into the create-durably API page so they appear in the method list alongside start()/stop()/register(), and also update the packaged LLM docs file (llms.md that is bundled into the npm package and mirrored for the website) to reflect the new APIs so consumers installing the package or visiting the site see the changes; reference the exact symbols processOne and processUntilIdle when adding the docs to ensure discoverability.CLAUDE.md (1)
33-53:⚠️ Potential issue | 🟡 MinorFinish the lease-model update in this doc.
Line 33 was renamed, but the sections below still describe
running,heartbeat_at,heartbeatInterval, and heartbeat recovery. That leavesCLAUDE.mdinternally inconsistent with the new lease-based runtime and can send future edits back toward the old model.Based on learnings, "Use the doc-check skill (located at
.claude/skills/doc-check/) as a documentation update checklist to verify API changes are properly documented."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CLAUDE.md` around lines 33 - 53, Update CLAUDE.md to finish the lease-model migration: replace references to the old "running"/"heartbeat_at"/"heartbeatInterval"/"heartbeat recovery" semantics with the lease-based terminology (e.g., lease, lease_expiry, lease_interval, lease_recovery) and describe how leases are acquired, renewed, and recovered instead of heartbeats; remove or reword any assumptions about `running` state transitions tied to heartbeat timestamps and update examples and defaults (previously `heartbeatInterval`, `staleThreshold`) to the new lease defaults and behavior; finally run the doc-check checklist in .claude/skills/doc-check/ to verify all API names and behavior changes are documented consistently.website/api/events.md (1)
287-299:⚠️ Potential issue | 🟡 MinorInclude
run:deletein the documentedDurablyEventunion.This page already documents
run:deleteabove, but the union here still omits the corresponding event type. Readers copying this type block won't see the full public event surface.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@website/api/events.md` around lines 287 - 299, The DurablyEvent union is missing the run:delete variant; update the union declaration to include the RunDeleteEvent variant (i.e., add "| RunDeleteEvent") so the documented public event surface matches the earlier run:delete documentation; ensure the RunDeleteEvent type name exactly matches other event type names used in this file (e.g., RunTriggerEvent, RunCompleteEvent) so tooling and consumers pick it up.packages/durably-react/src/client/use-job.ts (1)
176-208:⚠️ Potential issue | 🟠 MajorDon't let
followLatestoverride an explicitinitialRunId.
autoResumealready treatsinitialRunIdas a pinned run, but this job-level SSE subscription still switches away as soon as any newerrun:triggerorrun:leasedarrives for the same job. That breaks the reconnection scenario described by the option docs.Suggested fix
useEffect(() => { - if (!followLatest) return + if (!followLatest || initialRunId) return const params = new URLSearchParams({ jobName }) const eventSource = new EventSource(`${api}/runs/subscribe?${params}`) @@ return () => { eventSource.close() } - }, [api, jobName, followLatest]) + }, [api, jobName, followLatest, initialRunId])🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably-react/src/client/use-job.ts` around lines 176 - 208, The SSE handler in the useEffect that follows latest runs updates currentRunId on any run:trigger/run:leased event and thus overrides an explicitly provided initialRunId; modify the eventSource.onmessage logic in the useEffect (the block that parses event.data and calls setCurrentRunId) to skip calling setCurrentRunId when an explicit initialRunId is set/pinned (i.e., only update from SSE if initialRunId is falsy or the component is not pinned to that run), ensuring followLatest does not override the pinned initialRunId.packages/durably-react/src/client/use-job-run.ts (1)
94-123:⚠️ Potential issue | 🟠 MajorScope callback transition tracking to the current
runId.
prevStatusRefsurvives across runs, so switching from a completed/failed run to a newrunIdcompares the new status against the previous run's terminal state. In that caseonStartcan be skipped for the new run.Suggested fix
- const prevStatusRef = useRef<RunStatus | null>(null) + const prevRef = useRef<{ runId: string | null; status: RunStatus | null }>({ + runId: null, + status: null, + }) useEffect(() => { - const prevStatus = prevStatusRef.current - prevStatusRef.current = effectiveStatus + const prevStatus = + prevRef.current.runId === runId ? prevRef.current.status : null + prevRef.current = { runId, status: effectiveStatus } @@ }, [ + runId, effectiveStatus, isPending, isLeased,🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably-react/src/client/use-job-run.ts` around lines 94 - 123, prevStatusRef currently persists across different runId values causing transitions to compare against the previous run's terminal state; modify the useEffect (or add a small preceding effect) to detect runId changes and reset prevStatusRef.current to null when runId has changed (e.g., track prevRunIdRef and if prevRunIdRef.current !== runId then set prevStatusRef.current = null and prevRunIdRef.current = runId) so that onStart/onComplete/onFail callbacks (referenced in the same useEffect) are evaluated only against the current run's status transitions using effectiveStatus.packages/durably/src/storage.ts (1)
402-445:⚠️ Potential issue | 🟠 MajorDeduplicate
(jobName, idempotencyKey)inside oneenqueueMany()call.Line 405 only looks in
durably_runs, so a later item in the same batch won't see an earlier staged insert. That means two identical keys in one batch are both treated as new, and Line 442'screated_at === nowsentinel can still batch-insert both. Track seen keys and newly created rows explicitly inside the transaction instead of inferring "newness" from timestamps.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/src/storage.ts` around lines 402 - 445, The batch enqueue logic currently only checks existing rows in durably_runs via trx and then infers new rows by comparing created_at to now, which misses duplicate (jobName,idempotencyKey) pairs within the same inputs array; update the loop that processes inputs (the block creating runs, using variables inputs, runs, ulid, now, and trx) to maintain an in-transaction Set or Map of seen keys (e.g., `${input.jobName}|${input.idempotencyKey ?? ''}`) and skip creating a new run when that key has already been processed in this call; instead of relying on created_at === now to detect new rows, explicitly mark newly-created run objects with a flag (e.g., isNew: true) and then use runs.filter(r => r.isNew) when building newRuns before calling trx.insertInto('durably_runs').values(...).execute(), ensuring duplicates in the same enqueueMany() are deduplicated and only one insert occurs.
♻️ Duplicate comments (5)
packages/durably/src/durably.ts (2)
466-494:⚠️ Potential issue | 🟠 MajorReschedule the local deadline from the renewed lease, not
Date.now().Lines 482-486 recompute the next deadline after the renewal roundtrip. That gives the local worker extra time past the persisted
lease_expires_at, so another worker can reclaim the run while this instance still thinks it owns it. Return the renewed expiry fromqueue.renewLease()or, at minimum, derive it from the samenowsent to storage.⏱️ Minimal correction
- const renewedLeaseExpiresAt = new Date( - Date.now() + state.leaseMs, - ).toISOString() + const renewedLeaseExpiresAt = new Date( + Date.parse(now) + state.leaseMs, + ).toISOString()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/src/durably.ts` around lines 466 - 494, The local lease deadline is being rescheduled using Date.now() after a successful queue.renewLease call, which can extend local ownership past the persisted lease_expires_at; update the logic in the lease renewal handler (the setInterval callback around leaseTimer that calls queue.renewLease for run.id and workerId) to use the renewed expiry returned by queue.renewLease (or if renewLease cannot return it, compute the expiry from the same now value passed into queue.renewLease by using now + state.leaseMs) when calling scheduleLeaseDeadline and when constructing renewedLeaseExpiresAt and the run:lease-renewed event payload.
573-581:⚠️ Potential issue | 🔴 CriticalOnly purge checkpoints after the run definitely reached a terminal state.
Lines 579-580 still run after
LeaseLostErrorand aftercompleteRun()/failRun()returnfalse, which wipes step data from runs another worker may immediately reclaim. Gate cleanup behind a terminal-state flag instead of the unconditionalfinallypath. As per coding guidelines, "Step output data is deleted when runs reach terminal state (controlled bycleanupStepsconfiguration)."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/src/durably.ts` around lines 573 - 581, The cleanup currently deletes step data in the finally block unconditionally (clearInterval(leaseTimer), clearTimeout(leaseDeadlineTimer), dispose(), then if (!state.preserveSteps) await checkpoint.deleteSteps(run.id)), which can run after LeaseLostError or when completeRun()/failRun() returned false and cause other workers to lose step data; change this so that step deletion only happens after you've confirmed the run reached a terminal state (e.g., set a runIsTerminal flag when completeRun() or failRun() succeed) and also respect the cleanupSteps/preserveSteps configuration, then move the await checkpoint.deleteSteps(run.id) behind a conditional that checks runIsTerminal && !state.preserveSteps (or the cleanupSteps flag) so deletion never runs from the unconditional finally path or after LeaseLostError.packages/durably/src/storage.ts (3)
758-769:⚠️ Potential issue | 🔴 CriticalMake cancellation an atomic state transition.
Line 758 rewrites whatever row matches
id, sodurably.cancel()can clobber a run that completed after its pre-read and still emitrun:cancel. Restrict the write to non-terminal states and surface whether the update actually won.🔒 Suggested shape for the fix
- await db + const result = await db .updateTable('durably_runs') .set({ status: 'cancelled', lease_owner: null, lease_expires_at: null, completed_at: now, updated_at: now, }) .where('id', '=', runId) - .execute() + .where('status', 'in', ['pending', 'leased']) + .executeTakeFirst() + + if (Number(result.numUpdatedRows) === 0) { + throw new Error(`Cannot cancel terminal run: ${runId}`) + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/src/storage.ts` around lines 758 - 769, The cancelRun update must be made an atomic conditional write: in cancelRun, restrict the UPDATE on the durably_runs row to only apply when status is non-terminal (e.g., NOT IN ['completed','cancelled','failed'] or equivalent terminal states used elsewhere) by adding that condition to the .where chain, then inspect the execute() result to determine whether any row was actually updated and surface that outcome (change the signature to return a boolean or throw an error if the update did not win) so callers (e.g., durably.cancel) can know whether cancellation succeeded.
488-511:⚠️ Potential issue | 🟠 Major
getRuns()still hardcodes SQLite SQL in the PostgreSQL path.Lines 494-497 use
json_extract(...), and Lines 508-510 useLIMIT -1for offset-only pagination. Both are SQLite-specific, sogetRuns({ labels })andgetRuns({ offset })will break oncebackend === 'postgres'. Route both fragments through a backend-aware helper instead of emitting SQLite SQL unconditionally.In PostgreSQL, is `json_extract(...)` available, and is `LIMIT -1` valid SQL for "no limit" when used with OFFSET?🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/src/storage.ts` around lines 488 - 511, getRuns currently emits SQLite-only SQL: the json_extract(...) expression and the use of limit(-1) for offset-only pagination will break for backend === 'postgres'. Replace the inline SQLite fragments by calling backend-aware helpers from getRuns (the block that iterates over filter.labels and the block that applies filter.offset without filter.limit): create a helper (e.g., buildJsonLabelExpr or getLabelJsonExpr) that given a label key returns the correct SQL expression for durably_runs.labels for the current backend (Postgres should use the JSON operator/extraction appropriate for text equality), and create a helper (e.g., applyOffsetWithOptionalLimit) that emits a proper "no limit" behavior for Postgres instead of LIMIT -1; then update the code in getRuns (the loop handling labels and the offset/limit logic) to use those helpers rather than hardcoded json_extract(...) or limit(-1).
671-755:⚠️ Potential issue | 🔴 CriticalReject stale owners in all lease-bound updates.
Lines 684-687, 726-729, and 750-753 only verify
statusandlease_owner. A suspended worker can wake up afterlease_expires_atand still renew or finalize the run until another process rewrites the row. Addlease_expires_at IS NOT NULLpluslease_expires_at > now/completedAtto eachUPDATEso stale owners lose at the storage boundary. As per coding guidelines, "Background tab interruptions are handled via heartbeat recovery."🛡️ Minimal guard to add
const result = await db .updateTable('durably_runs') .set({ lease_expires_at: leaseExpiresAt, updated_at: now, }) .where('id', '=', runId) .where('status', '=', 'leased') .where('lease_owner', '=', workerId) + .where('lease_expires_at', 'is not', null) + .where('lease_expires_at', '>', now) .executeTakeFirst()Apply the same expiry check to
completeRun()andfailRun(), usingcompletedAtas the comparison value.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/src/storage.ts` around lines 671 - 755, The lease-bound updates currently only check status and lease_owner allowing a stale worker (one whose lease has already expired) to renew or finalize a run; update renewLease(runId, workerId, now, leaseMs) to also require lease_expires_at IS NOT NULL and lease_expires_at > now, and update completeRun(runId, workerId, output, completedAt) and failRun(runId, workerId, error, completedAt) to also require lease_expires_at IS NOT NULL and lease_expires_at > completedAt in their WHERE clauses so that expired leases cannot be renewed or used to complete/fail runs at the storage boundary.
🧹 Nitpick comments (2)
packages/durably/tests/shared/storage.shared.ts (1)
358-375: Assert the lease fields on claimed runs.These tests cover the status rename, but a broken claim path that leaves
leaseOwner/leaseExpiresAtunset would still pass. Since reclaim and stale-owner handling depend on those fields, it’s worth checking them here on both the returned run and the DB readback.💡 Suggested assertion shape
expect(claimed).not.toBeNull() expect(claimed!.id).toBe(created.id) expect(claimed!.status).toBe('leased') + expect(claimed!.leaseOwner).toBeTruthy() + expect(claimed!.leaseExpiresAt).toBeTruthy() expect(claimed!.startedAt).not.toBeNull() expect(claimed!.stepCount).toBe(0) // Verify run is now leased in DB const run = await durably.storage.getRun(created.id) expect(run!.status).toBe('leased') + expect(run!.leaseOwner).toBe(claimed!.leaseOwner) + expect(run!.leaseExpiresAt).toBe(claimed!.leaseExpiresAt)Also applies to: 394-400, 434-440
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/tests/shared/storage.shared.ts` around lines 358 - 375, Add assertions that the claim path sets lease fields: after calling durably.storage.claimNextPendingRun() assert that claimed.leaseOwner is a non-empty string and claimed.leaseExpiresAt is not null (and optionally > claimed.startedAt or > Date.now() for validity), and repeat the same checks on the DB readback from durably.storage.getRun(created.id); update the test case that calls claimNextPendingRun and getRun (references: claimNextPendingRun, getRun, claimed, created) to include these leaseOwner and leaseExpiresAt assertions so broken claim paths that omit lease metadata will fail the test.packages/durably/tests/node/core-extensions.test.ts (1)
94-111: Assert the leased-event payload, not just the event name.This still passes if
run:leasedis emitted without the new lease metadata. SinceleaseOwner/leaseExpiresAtare part of the lease-based contract, this public-stream test should pin them down too.💡 Suggested assertion shape
- expect(events.some((e) => e.type === 'run:leased')).toBe(true) + const leasedEvent = events.find((e) => e.type === 'run:leased') + expect(leasedEvent).toBeDefined() + expect(leasedEvent).toMatchObject({ + type: 'run:leased', + leaseOwner: expect.any(String), + leaseExpiresAt: expect.any(String), + })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/tests/node/core-extensions.test.ts` around lines 94 - 111, Update the test that subscribes to the run stream and not only check for the 'run:leased' event name but also assert the event's payload contains the lease metadata: locate the subscription logic around durably.subscribe(run.id) / events array and after finding the event with type === 'run:leased' assert that its payload includes non-empty leaseOwner and a valid leaseExpiresAt (e.g., numeric timestamp or ISO string) properties (for example by using events.find(e => e.type === 'run:leased') and checking .payload.leaseOwner and .payload.leaseExpiresAt are present and of the expected type).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/server-node/basic.ts`:
- Line 52: Update the example output label to match the lifecycle term used in
the filter: change the console.log that prints `Running: ${runs.filter((r) =>
r.status === 'leased').length}` so the visible label reads `Leased:` instead of
`Running:`; locate the console.log call that references runs.filter((r) =>
r.status === 'leased') and update the string literal accordingly.
In `@packages/durably-react/src/hooks/use-job-run.ts`:
- Around line 36-39: Update the JSDoc for the isLeased property in the useJobRun
hook to use lease terminology instead of "running": change the comment above
isLeased (in the useJobRun definition) to something like "Whether the run
currently holds a lease" or "Whether the run is currently leased" so public docs
and IDE hover text reflect the lease lifecycle; ensure you update only the
comment for the isLeased property (not the property name) and keep wording
concise and consistent with other lease-related docs.
In `@packages/durably-react/src/hooks/use-job.ts`:
- Around line 58-60: Update the exported JSDoc for the isLeased property so the
description reflects its new meaning (leased state) rather than "currently
running"; find the JSDoc comment attached to the exported isLeased property in
use-job (the exported type/interface that defines isLeased) and change the text
to something like "Whether the run is currently leased" or "Whether this run is
currently leased by a worker" so IntelliSense and generated docs match the new
API.
In `@packages/durably-react/src/shared/sse-event-subscriber.ts`:
- Around line 22-25: The SSE handler currently collapses "run:leased" to a bare
marker and omits lease renewals; update the switch in sse-event-subscriber so
that when data.type === 'run:leased' you forward the lease metadata (e.g.,
leaseOwner, leaseExpiresAt) to onEvent instead of emitting a bare marker, and
add a new case for 'run:lease-renewed' that also forwards the renewal payload to
onEvent; locate the logic around data.type, onEvent, and ensure both
'run:leased' and 'run:lease-renewed' pass the full event object (including
leaseOwner and leaseExpiresAt) through to the React subscription layer.
In `@packages/durably/tests/shared/recovery.shared.ts`:
- Around line 40-52: The test captures initialLeaseExpiresAt from a pending run
(run.leaseExpiresAt is null) so the current comparison against
(initialLeaseExpiresAt ?? 0) is meaningless; change the test to capture the
pre-renewal lease timestamp after the run transitions to 'leased' (i.e., call
d.start(), await the first leased state via d.jobs.job.getRun(run.id) and assert
status === 'leased', then record initialLeaseExpiresAt from that leased midRun),
then wait and fetch a laterRun (e.g., another d.jobs.job.getRun(run.id)) and
assert that laterRun.leaseExpiresAt is greater than the recorded
initialLeaseExpiresAt; update references to run, initialLeaseExpiresAt,
midRun/getRun accordingly.
In `@website/api/create-durably.md`:
- Around line 36-44: Update the `dialect` table row in
website/api/create-durably.md to be backend-agnostic by changing its Description
from “Kysely SQLite dialect” to either “Kysely dialect” or enumerate supported
backends (e.g., “Kysely dialect — supported backends: SQLite, PostgreSQL,
libSQL, SQLocal”); locate the `dialect` entry in the options table and replace
the wording accordingly so it reflects the multi-backend support added in this
PR.
In `@website/api/durably-react/fullstack.md`:
- Around line 112-114: The dashboard snippet is inconsistent: it only shows the
Cancel button for run.status === 'leased' while useRunActions and the return
table indicate pending runs are cancellable; update the conditional rendering
around the Cancel button (where cancel is called and run.status is checked) to
allow both 'leased' and 'pending' (e.g., check run.status === 'leased' ||
run.status === 'pending') so the UI, the example in useRunActions, and the
return table behavior match.
- Around line 69-74: In CsvImporter replace the invalid placeholder rows: [...]
passed to trigger (from durably.importCsv.useJob()) with valid TSX/JS – e.g.
pass an actual empty array or a previously defined variable (like rows or
csvRows) instead of the literal ellipsis; update the onClick to call trigger({
rows: [] }) or trigger({ rows: rowsVariable }) so the trigger function receives
a proper array.
In `@website/api/durably-react/types.md`:
- Line 61: The doc comment for the HTTP projection is inconsistent: it lists
only leaseExpiresAt, idempotencyKey, concurrencyKey, and updatedAt as excluded
from the ClientRun type while the actual ClientRun shape also omits leaseOwner;
update the sentence in website/api/durably-react/types.md so the exclusion list
matches the ClientRun shape (include leaseOwner in the list or otherwise
describe “internal fields” to cover leaseOwner) and ensure the term ClientRun is
referenced consistently with the type definition.
---
Outside diff comments:
In `@CLAUDE.md`:
- Around line 33-53: Update CLAUDE.md to finish the lease-model migration:
replace references to the old
"running"/"heartbeat_at"/"heartbeatInterval"/"heartbeat recovery" semantics with
the lease-based terminology (e.g., lease, lease_expiry, lease_interval,
lease_recovery) and describe how leases are acquired, renewed, and recovered
instead of heartbeats; remove or reword any assumptions about `running` state
transitions tied to heartbeat timestamps and update examples and defaults
(previously `heartbeatInterval`, `staleThreshold`) to the new lease defaults and
behavior; finally run the doc-check checklist in .claude/skills/doc-check/ to
verify all API names and behavior changes are documented consistently.
In `@examples/fullstack-react-router/app/routes/_index/dashboard.tsx`:
- Around line 90-96: statusClasses is missing an entry for the 'leased' run
status, so UI falls back to the default gray; add a 'leased' key to the
statusClasses Record (next to the existing 'running' entry) with the same
styling values used for 'running' (e.g., blue classes) so items with status ===
'leased' render the intended blue styling; update the constant named
statusClasses in dashboard.tsx accordingly.
In `@examples/spa-react-router/app/routes/_index/dashboard.tsx`:
- Around line 89-95: The statusClasses map is missing a key for 'leased',
causing run.status === 'leased' to fall back to gray; update the statusClasses
Record (variable name: statusClasses) to include a 'leased' entry using the same
classes as 'running' (the blue styles) so leased runs render consistently with
running runs referenced by checks like run.status === 'leased'.
In `@examples/spa-vite-react/src/components/dashboard.tsx`:
- Around line 214-225: Update the badge map `statusClasses` in the dashboard
component to include a `leased` key with the same active-state styling used for
`pending`/active badges and remove the `running` key if your backend no longer
emits it; locate the `statusClasses` object in the same file (e.g., near the
badge rendering logic and where `run.status` is checked) and add an entry for
'leased' mirroring the active classes, then delete the 'running' entry so leased
runs display the correct active badge color.
In `@packages/durably-react/src/client/use-job-run.ts`:
- Around line 94-123: prevStatusRef currently persists across different runId
values causing transitions to compare against the previous run's terminal state;
modify the useEffect (or add a small preceding effect) to detect runId changes
and reset prevStatusRef.current to null when runId has changed (e.g., track
prevRunIdRef and if prevRunIdRef.current !== runId then set
prevStatusRef.current = null and prevRunIdRef.current = runId) so that
onStart/onComplete/onFail callbacks (referenced in the same useEffect) are
evaluated only against the current run's status transitions using
effectiveStatus.
In `@packages/durably-react/src/client/use-job.ts`:
- Around line 176-208: The SSE handler in the useEffect that follows latest runs
updates currentRunId on any run:trigger/run:leased event and thus overrides an
explicitly provided initialRunId; modify the eventSource.onmessage logic in the
useEffect (the block that parses event.data and calls setCurrentRunId) to skip
calling setCurrentRunId when an explicit initialRunId is set/pinned (i.e., only
update from SSE if initialRunId is falsy or the component is not pinned to that
run), ensuring followLatest does not override the pinned initialRunId.
In `@packages/durably-react/src/hooks/use-auto-resume.ts`:
- Around line 7-8: Update the JSDoc comments in use-auto-resume.ts to describe
that the hook auto-resumes "pending/leased" runs rather than "pending/running":
locate the JSDoc for the useAutoResume hook (and the related option description
around the top of the file and the other occurrence noted) and replace the word
"running" with "leased" so the exported API docs and editor hints reflect the
current behavior.
In `@packages/durably/docs/llms.md`:
- Around line 32-38: The example uses the wrong config key: update the
createDurably config shown (the block around createDurably) to use cleanupSteps
instead of preserveSteps and set its boolean meaning correctly (cleanupSteps:
false to preserve step outputs, true to delete them at terminal state); ensure
any explanatory text around that example references cleanupSteps and matches the
docs about step.run() persistence and terminal-state cleanup.
- Around line 528-541: The Run type in the docs is missing the lease fields and
is inconsistent with the later ClientRun discussion; update the Run interface in
packages/durably/docs/llms.md to include the leaseOwner and leaseExpiresAt
fields (with appropriate types, e.g., string | null for leaseOwner and string |
null for leaseExpiresAt or Date-compatible strings) and ensure ClientRun is
described as stripping those two fields; reference the Run and ClientRun symbols
so the doc matches the actual API surface exported by the code.
- Around line 234-244: Add the missing lease-renewal event example to the
lifecycle event list: after the existing durably.on('run:leased', ...) entry add
a durably.on('run:lease-renewed', (e) => ...) example that logs the runId and
any lease metadata (mirroring the style of run:leased), so the docs include the
new public event run:lease-renewed and remain in sync with the API.
In `@packages/durably/src/storage.ts`:
- Around line 402-445: The batch enqueue logic currently only checks existing
rows in durably_runs via trx and then infers new rows by comparing created_at to
now, which misses duplicate (jobName,idempotencyKey) pairs within the same
inputs array; update the loop that processes inputs (the block creating runs,
using variables inputs, runs, ulid, now, and trx) to maintain an in-transaction
Set or Map of seen keys (e.g., `${input.jobName}|${input.idempotencyKey ?? ''}`)
and skip creating a new run when that key has already been processed in this
call; instead of relying on created_at === now to detect new rows, explicitly
mark newly-created run objects with a flag (e.g., isNew: true) and then use
runs.filter(r => r.isNew) when building newRuns before calling
trx.insertInto('durably_runs').values(...).execute(), ensuring duplicates in the
same enqueueMany() are deduplicated and only one insert occurs.
In `@website/api/create-durably.md`:
- Around line 207-246: The Run interface/docs currently document leaseExpiresAt
but omit the companion leaseOwner field added for leased runs; update the Run
type declaration and the accompanying table to include leaseOwner: string | null
(ISO identifier or agent id) and mention it in the description, and ensure any
examples/consumers like getRun()/getRuns() references reflect the new leaseOwner
property so the shape is complete and consistent with leased-run behavior.
- Around line 50-149: The API docs are missing the new manual processing methods
processOne() and processUntilIdle(); add short entries for both (signatures,
brief description of behavior, return types, and when to use) into the
create-durably API page so they appear in the method list alongside
start()/stop()/register(), and also update the packaged LLM docs file (llms.md
that is bundled into the npm package and mirrored for the website) to reflect
the new APIs so consumers installing the package or visiting the site see the
changes; reference the exact symbols processOne and processUntilIdle when adding
the docs to ensure discoverability.
In `@website/api/durably-react/spa.md`:
- Around line 113-129: The button's disabled prop should also account for
pending runs so users can't trigger duplicates before a lease is acquired;
update the component that uses useJob (references: useJob, isLeased, isPending,
trigger, handleClick) to disable the button when either isLeased or isPending is
true (i.e., combine both flags in the disabled condition).
In `@website/api/events.md`:
- Around line 287-299: The DurablyEvent union is missing the run:delete variant;
update the union declaration to include the RunDeleteEvent variant (i.e., add "|
RunDeleteEvent") so the documented public event surface matches the earlier
run:delete documentation; ensure the RunDeleteEvent type name exactly matches
other event type names used in this file (e.g., RunTriggerEvent,
RunCompleteEvent) so tooling and consumers pick it up.
---
Duplicate comments:
In `@packages/durably/src/durably.ts`:
- Around line 466-494: The local lease deadline is being rescheduled using
Date.now() after a successful queue.renewLease call, which can extend local
ownership past the persisted lease_expires_at; update the logic in the lease
renewal handler (the setInterval callback around leaseTimer that calls
queue.renewLease for run.id and workerId) to use the renewed expiry returned by
queue.renewLease (or if renewLease cannot return it, compute the expiry from the
same now value passed into queue.renewLease by using now + state.leaseMs) when
calling scheduleLeaseDeadline and when constructing renewedLeaseExpiresAt and
the run:lease-renewed event payload.
- Around line 573-581: The cleanup currently deletes step data in the finally
block unconditionally (clearInterval(leaseTimer),
clearTimeout(leaseDeadlineTimer), dispose(), then if (!state.preserveSteps)
await checkpoint.deleteSteps(run.id)), which can run after LeaseLostError or
when completeRun()/failRun() returned false and cause other workers to lose step
data; change this so that step deletion only happens after you've confirmed the
run reached a terminal state (e.g., set a runIsTerminal flag when completeRun()
or failRun() succeed) and also respect the cleanupSteps/preserveSteps
configuration, then move the await checkpoint.deleteSteps(run.id) behind a
conditional that checks runIsTerminal && !state.preserveSteps (or the
cleanupSteps flag) so deletion never runs from the unconditional finally path or
after LeaseLostError.
In `@packages/durably/src/storage.ts`:
- Around line 758-769: The cancelRun update must be made an atomic conditional
write: in cancelRun, restrict the UPDATE on the durably_runs row to only apply
when status is non-terminal (e.g., NOT IN ['completed','cancelled','failed'] or
equivalent terminal states used elsewhere) by adding that condition to the
.where chain, then inspect the execute() result to determine whether any row was
actually updated and surface that outcome (change the signature to return a
boolean or throw an error if the update did not win) so callers (e.g.,
durably.cancel) can know whether cancellation succeeded.
- Around line 488-511: getRuns currently emits SQLite-only SQL: the
json_extract(...) expression and the use of limit(-1) for offset-only pagination
will break for backend === 'postgres'. Replace the inline SQLite fragments by
calling backend-aware helpers from getRuns (the block that iterates over
filter.labels and the block that applies filter.offset without filter.limit):
create a helper (e.g., buildJsonLabelExpr or getLabelJsonExpr) that given a
label key returns the correct SQL expression for durably_runs.labels for the
current backend (Postgres should use the JSON operator/extraction appropriate
for text equality), and create a helper (e.g., applyOffsetWithOptionalLimit)
that emits a proper "no limit" behavior for Postgres instead of LIMIT -1; then
update the code in getRuns (the loop handling labels and the offset/limit logic)
to use those helpers rather than hardcoded json_extract(...) or limit(-1).
- Around line 671-755: The lease-bound updates currently only check status and
lease_owner allowing a stale worker (one whose lease has already expired) to
renew or finalize a run; update renewLease(runId, workerId, now, leaseMs) to
also require lease_expires_at IS NOT NULL and lease_expires_at > now, and update
completeRun(runId, workerId, output, completedAt) and failRun(runId, workerId,
error, completedAt) to also require lease_expires_at IS NOT NULL and
lease_expires_at > completedAt in their WHERE clauses so that expired leases
cannot be renewed or used to complete/fail runs at the storage boundary.
---
Nitpick comments:
In `@packages/durably/tests/node/core-extensions.test.ts`:
- Around line 94-111: Update the test that subscribes to the run stream and not
only check for the 'run:leased' event name but also assert the event's payload
contains the lease metadata: locate the subscription logic around
durably.subscribe(run.id) / events array and after finding the event with type
=== 'run:leased' assert that its payload includes non-empty leaseOwner and a
valid leaseExpiresAt (e.g., numeric timestamp or ISO string) properties (for
example by using events.find(e => e.type === 'run:leased') and checking
.payload.leaseOwner and .payload.leaseExpiresAt are present and of the expected
type).
In `@packages/durably/tests/shared/storage.shared.ts`:
- Around line 358-375: Add assertions that the claim path sets lease fields:
after calling durably.storage.claimNextPendingRun() assert that
claimed.leaseOwner is a non-empty string and claimed.leaseExpiresAt is not null
(and optionally > claimed.startedAt or > Date.now() for validity), and repeat
the same checks on the DB readback from durably.storage.getRun(created.id);
update the test case that calls claimNextPendingRun and getRun (references:
claimNextPendingRun, getRun, claimed, created) to include these leaseOwner and
leaseExpiresAt assertions so broken claim paths that omit lease metadata will
fail the test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ab0d0885-a5fd-4556-8be9-6ae713590f82
📒 Files selected for processing (70)
CLAUDE.mdexamples/fullstack-react-router/app/routes/_index/dashboard.tsxexamples/fullstack-react-router/app/routes/_index/data-sync-progress.tsxexamples/fullstack-react-router/app/routes/_index/image-processing-progress.tsxexamples/fullstack-react-router/app/routes/_index/run-progress.tsxexamples/server-node/basic.tsexamples/spa-react-router/app/routes/_index/dashboard.tsxexamples/spa-react-router/app/routes/_index/data-sync-progress.tsxexamples/spa-react-router/app/routes/_index/image-processing-progress.tsxexamples/spa-react-router/app/routes/_index/run-progress.tsxexamples/spa-vite-react/src/components/dashboard.tsxexamples/spa-vite-react/src/components/data-sync-progress.tsxexamples/spa-vite-react/src/components/image-processing-progress.tsxexamples/spa-vite-react/src/components/run-progress.tsxpackages/durably-react/README.mdpackages/durably-react/docs/llms.mdpackages/durably-react/src/client/create-durably.tspackages/durably-react/src/client/create-job-hooks.tspackages/durably-react/src/client/use-job-run.tspackages/durably-react/src/client/use-job.tspackages/durably-react/src/client/use-run-actions.tspackages/durably-react/src/client/use-runs.tspackages/durably-react/src/hooks/use-auto-resume.tspackages/durably-react/src/hooks/use-job-run.tspackages/durably-react/src/hooks/use-job-subscription.tspackages/durably-react/src/hooks/use-job.tspackages/durably-react/src/hooks/use-runs.tspackages/durably-react/src/shared/durably-event-subscriber.tspackages/durably-react/src/shared/event-subscriber.tspackages/durably-react/src/shared/sse-event-subscriber.tspackages/durably-react/src/shared/subscription-reducer.tspackages/durably-react/src/shared/use-subscription.tspackages/durably-react/src/types.tspackages/durably-react/tests/browser/use-job-run.test.tsxpackages/durably-react/tests/browser/use-job.test.tsxpackages/durably-react/tests/client/use-job-run.test.tsxpackages/durably-react/tests/client/use-job.test.tsxpackages/durably-react/tests/client/use-runs.test.tsxpackages/durably-react/tests/types.test.tspackages/durably/docs/llms.mdpackages/durably/src/durably.tspackages/durably/src/events.tspackages/durably/src/index.tspackages/durably/src/server.tspackages/durably/src/storage.tspackages/durably/tests/node/core-extensions.test.tspackages/durably/tests/react/strict-mode.test.tsxpackages/durably/tests/shared/events.shared.tspackages/durably/tests/shared/plugin.shared.tspackages/durably/tests/shared/recovery.shared.tspackages/durably/tests/shared/server.shared.tspackages/durably/tests/shared/sse.shared.tspackages/durably/tests/shared/step.shared.tspackages/durably/tests/shared/storage.shared.tspackages/durably/tests/shared/worker.shared.tswebsite/api/create-durably.mdwebsite/api/durably-react/fullstack.mdwebsite/api/durably-react/index.mdwebsite/api/durably-react/spa.mdwebsite/api/durably-react/types.mdwebsite/api/events.mdwebsite/api/http-handler.mdwebsite/api/index.mdwebsite/guide/concepts.mdwebsite/guide/deployment.mdwebsite/guide/error-handling.mdwebsite/guide/fullstack-mode.mdwebsite/guide/server-mode.mdwebsite/guide/spa-mode.mdwebsite/public/llms.txt
✅ Files skipped from review due to trivial changes (1)
- packages/durably-react/src/client/create-durably.ts
🚧 Files skipped from review as they are similar to previous changes (3)
- packages/durably/src/index.ts
- packages/durably/src/events.ts
- packages/durably/src/server.ts
Verifies the server returns 400 for the removed 'running' status value. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
6f159ff to
a977bd9
Compare
Browser OPFS is single-tab exclusive, so multi-runtime lease reclaim cannot be tested there. The scenario is covered by 3 Node-side tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update worker:error context, spa-mode tab suspension, and error-handling cancellation descriptions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The phase1 branch added Postgres-specific tests that need a running PostgreSQL instance on port 55432. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- test:node excludes *.postgres.test.ts (no Docker needed) - test:node:postgres runs Postgres tests only - test:node:all runs everything (used in CI with Postgres service) - Document Postgres test setup in CLAUDE.md Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/durably/src/storage.ts (1)
401-443:⚠️ Potential issue | 🟠 MajorDeduplicate idempotency keys inside
enqueueMany().Because inserts are deferred until after the scan, the second
{ jobName, idempotencyKey }in the same batch never sees the first. That can either create duplicate runs or make the whole batch fail on the unique constraint. Track seen keys/new rows explicitly instead of inferring “new” fromcreated_at === now.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/src/storage.ts` around lines 401 - 443, In enqueueMany(), deduplicate idempotency keys within the same batch instead of relying on created_at === now: maintain an in-memory Set or Map keyed by jobName + idempotencyKey as you iterate inputs (inside the loop that builds runs), check input.idempotencyKey against that set before creating a new run or pushing an existing run object, and if a duplicate in the same batch is found either reuse the previously created run entry (from the Map) or skip adding the second one so the subsequent batch insert into durably_runs via trx.insertInto('durably_runs') won't violate the unique constraint; update references to runs, newRuns, and the logic that filters by created_at/now to use this explicit tracking instead.
♻️ Duplicate comments (5)
packages/durably/src/storage.ts (3)
757-767:⚠️ Potential issue | 🔴 CriticalMake
cancelRun()an atomic state transition.This unconditional update can still rewrite a run to
cancelledafter another worker has already completed or failed it between the caller’s read and this write. Gate the update to non-terminal states and check the affected-row count.Suggested fix
- await db + const result = await db .updateTable('durably_runs') .set({ status: 'cancelled', lease_owner: null, lease_expires_at: null, completed_at: now, updated_at: now, }) .where('id', '=', runId) - .execute() + .where('status', 'in', ['pending', 'leased']) + .executeTakeFirst() + + if (Number(result.numUpdatedRows) === 0) { + throw new Error(`Cannot cancel terminal run: ${runId}`) + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/src/storage.ts` around lines 757 - 767, The cancelRun() update is unconditional and can overwrite terminal states; make it an atomic guarded transition by updating durably_runs only when status is NOT in terminal states (e.g., not IN ('completed','failed','cancelled')) and matching the runId, then inspect the affected-row count returned by updateTable('durably_runs').execute() (or the DB client's result) to detect if no row was updated and return/fail accordingly (e.g., return false or throw) so callers know the cancel did not take effect; keep the same set fields (status, lease_owner, lease_expires_at, completed_at, updated_at) but only apply when the row was in a cancellable state.
486-495:⚠️ Potential issue | 🟠 MajorPostgreSQL label filters still emit SQLite JSON syntax.
backend === 'postgres'is supported here, butgetRuns({ labels })still hardcodesjson_extract(...). Label-filtered queries will fail on PostgreSQL until this branches to a Postgres expression such as->>/#>>or a dialect helper.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/src/storage.ts` around lines 486 - 495, The label-filter WHERE currently always uses SQLite's json_extract(...) (inside the getRuns label loop where labels is validated) which breaks when backend === 'postgres'; update the loop in storage.ts (the block referencing validateLabels, labels and sql`json_extract(...)`) to branch on the SQL dialect and emit a Postgres JSON extraction expression (e.g., labels ->> 'key' or #>> array_path) when backend === 'postgres' (or use your existing SQL dialect helper), ensure the JSON path/quoting is built safely for each label key and keep the value as a bound parameter so the query stays parameterized and secure.
675-685:⚠️ Potential issue | 🔴 CriticalReject renewals after the lease has already expired.
This update only checks
statusandlease_owner. A suspended worker can wake up afterlease_expires_at, renew anyway, and keep executing instead of yielding to stale-owner recovery.Suggested fix
const result = await db .updateTable('durably_runs') .set({ lease_expires_at: leaseExpiresAt, updated_at: now, }) .where('id', '=', runId) .where('status', '=', 'leased') .where('lease_owner', '=', workerId) + .where('lease_expires_at', 'is not', null) + .where('lease_expires_at', '>', now) .executeTakeFirst()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/src/storage.ts` around lines 675 - 685, The renewal currently updates durably_runs without verifying the existing lease hasn't already expired; modify the update in the function that computes leaseExpiresAt and calls db.updateTable('durably_runs') so it also checks the current lease_expires_at is still in the future (e.g., add a where('lease_expires_at', '>', now) or equivalent timestamp comparison) and handle possible NULLs if leases can be absent, so the UPDATE only succeeds when the lease is not yet expired and owned by workerId.packages/durably/tests/shared/recovery.shared.ts (1)
40-52:⚠️ Potential issue | 🟠 MajorThis test still captures
leaseExpiresAtbefore the run is leased.
trigger()returns apendingrun, sorun.leaseExpiresAtisnullhere. Comparing againstinitialLeaseExpiresAt ?? 0makes the assertion pass trivially and doesn't prove renewal. Capture the baseline after the run first reachesleased.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/tests/shared/recovery.shared.ts` around lines 40 - 52, The test captures leaseExpiresAt from the pending run returned by d.jobs.job.trigger({}) so the baseline is null; instead, poll or await until the run reaches 'leased' using d.jobs.job.getRun(run.id), then capture baselineLeaseExpiresAt = midRun.leaseExpiresAt, call d.start(), wait, fetch the run again via d.jobs.job.getRun(run.id) and assert the new leaseExpiresAt is greater than the baselineLeaseExpiresAt; reference the functions run.id, d.jobs.job.getRun, d.jobs.job.trigger, d.start, and the leaseExpiresAt field when making these changes.packages/durably/src/durably.ts (1)
558-567:⚠️ Potential issue | 🔴 CriticalStep deletion in
finallymay wipe data for reclaimed runs.The
finallyblock unconditionally deletes steps when!state.preserveSteps, but this executes even when:
LeaseLostErroris caught (line 533-534 returns, butfinallystill runs)completeRun()orfailRun()returnfalse(ownership was lost)This can delete checkpoints from a run that another worker has already reclaimed.
🐛 Suggested fix: track terminal state before cleanup
+ let reachedTerminalState = false try { // ... existing code ... const completed = await queue.completeRun(...) if (completed) { + reachedTerminalState = true eventEmitter.emit({ type: 'run:complete', ... }) } } catch (error) { if (error instanceof LeaseLostError || error instanceof CancelledError) { return } // ... const failed = await queue.failRun(...) if (failed) { + reachedTerminalState = true // ... emit fail event ... } } finally { clearInterval(leaseTimer) if (leaseDeadlineTimer) { clearTimeout(leaseDeadlineTimer) } dispose() - if (!state.preserveSteps) { + if (!state.preserveSteps && reachedTerminalState) { await checkpoint.deleteSteps(run.id) } }As per coding guidelines: "Step output data is deleted when runs reach terminal state (controlled by
cleanupStepsconfiguration)."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/src/durably.ts` around lines 558 - 567, The finally block currently deletes steps unconditionally when !state.preserveSteps, which can wipe data if ownership was lost; change the flow to track terminal state and ownership before cleanup: introduce local flags (e.g., terminalReached and ownershipLost) around calls to completeRun() and failRun() and in LeaseLostError catch set ownershipLost, set terminalReached only when completeRun()/failRun() return true, then in the finally block only call checkpoint.deleteSteps(run.id) when !state.preserveSteps && terminalReached && !ownershipLost; keep clearing leaseTimer/leaseDeadlineTimer and dispose() unchanged.
🧹 Nitpick comments (1)
packages/durably/tests/shared/recovery.shared.ts (1)
64-109: The custom-renewal test doesn’t assert the interval yet.
timestamps.length > 0passes even if you only observed the initial claim lease. Assert at least two distinctleaseExpiresAtvalues and verify that the later one moves after roughly the configured 200ms renewal window.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/tests/shared/recovery.shared.ts` around lines 64 - 109, The test currently only checks timestamps.length > 0 which can pass without any lease renewal; update the "respects custom lease renewal interval" test to assert that timestamps contains at least two distinct leaseExpiresAt values and that a later timestamp is greater than an earlier one by roughly the configured leaseRenewIntervalMs (200ms) with a small tolerance (e.g. >150ms). Locate the test's timestamps array and the customDurably created with leaseRenewIntervalMs, parse the recorded leaseExpiresAt strings into numbers/dates, assert distinctness (length of unique values >= 2) and assert that (later - earlier) >= ~150ms to validate the renewal behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/durably/src/context.ts`:
- Around line 75-99: The code may execute and persist a step after the lease was
lost because it only checks run ownership once before running the step and then
relies on a local abort flag; to fix, re-fetch the run via
storage.queue.getRun(run.id) and verify currentRun.status === 'leased' &&
currentRun.leaseOwner === workerId immediately before calling createStep() or
advanceRunStepIndex(), aborting (call abortForLeaseLoss()/controller.abort() and
throwIfAborted()) if the check fails; do the same added ownership re-check after
resolving fn() and before persisting the checkpoint so
createStep/advanceRunStepIndex only run while the worker still owns the lease
(apply same change around the existing checks that use currentRun,
throwIfAborted, storage.checkpoint.getCompletedStep, createStep, and
advanceRunStepIndex).
In `@packages/durably/tests/browser/db-stress.test.ts`:
- Line 10: The variable runtimes is declared but not initialized, so if
beforeEach throws the teardown that calls runtimes.map(...) will fail and hide
the original error; initialize runtimes to an empty array at declaration (change
the declaration of runtimes in db-stress.test.ts to let runtimes:
Array<Durably<any, any>> = []) so beforeEach/teardown can safely run even when
setup fails; ensure any teardown logic (the function that iterates runtimes)
relies on the now-initialized runtimes variable.
- Around line 28-49: The test assumes a single-winner claim semantics that only
Postgres currently guarantees; update the spec to only run when the storage
backend supports atomic claimNext (e.g., check storage capability or backend
type) or mark the test pending for non-Postgres backends. Locate the test in
packages/durably/tests/browser/db-stress.test.ts and gate the loop/assertion
with a runtime/storage capability check (or use test.skip/pending) tied to the
storage.claimNext implementation (reference claimNext in
packages/durably/src/storage.ts) so the expectation
expect(winners).toHaveLength(1) only runs when the backend provides atomic
single-winner claims.
In `@packages/durably/tests/shared/plugin.shared.ts`:
- Around line 57-60: The test uses expect(events).toContain('run:leased') which
won’t fail if the same event is pushed multiple times; update the assertions
inside vi.waitFor to assert the exact occurrence count instead (e.g., assert
events.filter(e => e === 'run:leased').length === 1 or use toHaveLength(1)) so
duplicate lease notifications or double-registration of the listener are
detected; apply the same change for the other similar assertions (the
occurrences checking 'run:leased' around the other vi.waitFor block and any
related event checks) and keep using vi.waitFor and the same events array and
test flow.
In `@website/api/create-durably.md`:
- Line 220: Update the docs so the leaseExpiresAt field is documented as
nullable: change occurrences of "leaseExpiresAt: string" to "leaseExpiresAt:
string | null" (e.g., in the create-durably API examples and the accompanying
table) to match the Run type defined in packages/durably/src/storage.ts which
uses leaseExpiresAt: string | null; ensure both instances mentioned in the
comment are updated.
In `@website/api/index.md`:
- Around line 272-287: The Run type table is missing the leaseExpiresAt field;
update the API docs table under the Run type to add a row for `leaseExpiresAt`
with type `string | null` and description like "ISO timestamp when the current
lease on the run expires (null if not leased)"; ensure the note about HTTP
endpoints/ClientRun still mentions that `ClientRun` excludes `leaseExpiresAt`
and keep the reference to `toClientRun(run)` intact so readers see what is
projected away.
In `@website/public/llms.txt`:
- Around line 35-38: The inline comment for the preserveSteps option is
backwards—update the comment on the preserveSteps property to clearly state that
setting it to true keeps step output after a run reaches a terminal state (e.g.,
"set to true to keep step data after terminal state; default: false"). Also add
a short second sentence referencing how steps are created/persisted (e.g.,
"Create steps via step.run(); step success state and return value are persisted;
step output deletion is controlled by cleanupSteps configuration") so readers
understand step.run(), persisted state, and that cleanupSteps controls deletion.
---
Outside diff comments:
In `@packages/durably/src/storage.ts`:
- Around line 401-443: In enqueueMany(), deduplicate idempotency keys within the
same batch instead of relying on created_at === now: maintain an in-memory Set
or Map keyed by jobName + idempotencyKey as you iterate inputs (inside the loop
that builds runs), check input.idempotencyKey against that set before creating a
new run or pushing an existing run object, and if a duplicate in the same batch
is found either reuse the previously created run entry (from the Map) or skip
adding the second one so the subsequent batch insert into durably_runs via
trx.insertInto('durably_runs') won't violate the unique constraint; update
references to runs, newRuns, and the logic that filters by created_at/now to use
this explicit tracking instead.
---
Duplicate comments:
In `@packages/durably/src/durably.ts`:
- Around line 558-567: The finally block currently deletes steps unconditionally
when !state.preserveSteps, which can wipe data if ownership was lost; change the
flow to track terminal state and ownership before cleanup: introduce local flags
(e.g., terminalReached and ownershipLost) around calls to completeRun() and
failRun() and in LeaseLostError catch set ownershipLost, set terminalReached
only when completeRun()/failRun() return true, then in the finally block only
call checkpoint.deleteSteps(run.id) when !state.preserveSteps && terminalReached
&& !ownershipLost; keep clearing leaseTimer/leaseDeadlineTimer and dispose()
unchanged.
In `@packages/durably/src/storage.ts`:
- Around line 757-767: The cancelRun() update is unconditional and can overwrite
terminal states; make it an atomic guarded transition by updating durably_runs
only when status is NOT in terminal states (e.g., not IN
('completed','failed','cancelled')) and matching the runId, then inspect the
affected-row count returned by updateTable('durably_runs').execute() (or the DB
client's result) to detect if no row was updated and return/fail accordingly
(e.g., return false or throw) so callers know the cancel did not take effect;
keep the same set fields (status, lease_owner, lease_expires_at, completed_at,
updated_at) but only apply when the row was in a cancellable state.
- Around line 486-495: The label-filter WHERE currently always uses SQLite's
json_extract(...) (inside the getRuns label loop where labels is validated)
which breaks when backend === 'postgres'; update the loop in storage.ts (the
block referencing validateLabels, labels and sql`json_extract(...)`) to branch
on the SQL dialect and emit a Postgres JSON extraction expression (e.g., labels
->> 'key' or #>> array_path) when backend === 'postgres' (or use your existing
SQL dialect helper), ensure the JSON path/quoting is built safely for each label
key and keep the value as a bound parameter so the query stays parameterized and
secure.
- Around line 675-685: The renewal currently updates durably_runs without
verifying the existing lease hasn't already expired; modify the update in the
function that computes leaseExpiresAt and calls db.updateTable('durably_runs')
so it also checks the current lease_expires_at is still in the future (e.g., add
a where('lease_expires_at', '>', now) or equivalent timestamp comparison) and
handle possible NULLs if leases can be absent, so the UPDATE only succeeds when
the lease is not yet expired and owned by workerId.
In `@packages/durably/tests/shared/recovery.shared.ts`:
- Around line 40-52: The test captures leaseExpiresAt from the pending run
returned by d.jobs.job.trigger({}) so the baseline is null; instead, poll or
await until the run reaches 'leased' using d.jobs.job.getRun(run.id), then
capture baselineLeaseExpiresAt = midRun.leaseExpiresAt, call d.start(), wait,
fetch the run again via d.jobs.job.getRun(run.id) and assert the new
leaseExpiresAt is greater than the baselineLeaseExpiresAt; reference the
functions run.id, d.jobs.job.getRun, d.jobs.job.trigger, d.start, and the
leaseExpiresAt field when making these changes.
---
Nitpick comments:
In `@packages/durably/tests/shared/recovery.shared.ts`:
- Around line 64-109: The test currently only checks timestamps.length > 0 which
can pass without any lease renewal; update the "respects custom lease renewal
interval" test to assert that timestamps contains at least two distinct
leaseExpiresAt values and that a later timestamp is greater than an earlier one
by roughly the configured leaseRenewIntervalMs (200ms) with a small tolerance
(e.g. >150ms). Locate the test's timestamps array and the customDurably created
with leaseRenewIntervalMs, parse the recorded leaseExpiresAt strings into
numbers/dates, assert distinctness (length of unique values >= 2) and assert
that (later - earlier) >= ~150ms to validate the renewal behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 782b177a-ce2b-4ab0-8d7f-8e413b3e2dd1
📒 Files selected for processing (36)
.github/workflows/ci.ymlCLAUDE.mdexamples/server-node/lib/durably.tsexamples/spa-react-router/app/lib/durably.tsexamples/spa-vite-react/src/lib/durably.tspackages/durably-react/tests/browser/use-job-logs.test.tsxpackages/durably-react/tests/browser/use-job-run.test.tsxpackages/durably-react/tests/browser/use-job.test.tsxpackages/durably-react/tests/browser/use-runs.test.tsxpackages/durably-react/tests/helpers/create-test-durably.tspackages/durably/docs/llms.mdpackages/durably/src/context.tspackages/durably/src/durably.tspackages/durably/src/server.tspackages/durably/src/storage.tspackages/durably/tests/browser/db-stress.test.tspackages/durably/tests/node/core-extensions.test.tspackages/durably/tests/react/strict-mode.test.tsxpackages/durably/tests/shared/concurrency.shared.tspackages/durably/tests/shared/durably.shared.tspackages/durably/tests/shared/log.shared.tspackages/durably/tests/shared/plugin.shared.tspackages/durably/tests/shared/recovery.shared.tspackages/durably/tests/shared/run-api.shared.tspackages/durably/tests/shared/server.shared.tspackages/durably/tests/shared/step.shared.tspackages/durably/tests/shared/worker.shared.tswebsite/api/create-durably.mdwebsite/api/durably-react/spa.mdwebsite/api/events.mdwebsite/api/index.mdwebsite/guide/concepts.mdwebsite/guide/deployment.mdwebsite/guide/error-handling.mdwebsite/guide/spa-mode.mdwebsite/public/llms.txt
🚧 Files skipped from review as they are similar to previous changes (3)
- packages/durably/tests/node/core-extensions.test.ts
- packages/durably/tests/shared/worker.shared.ts
- packages/durably-react/tests/browser/use-job.test.tsx
| } | ||
|
|
||
| describe('sqlocal queue stress', () => { | ||
| let runtimes: Array<Durably<any, any>> |
There was a problem hiding this comment.
Initialize runtimes before teardown can run.
If beforeEach throws before assignment, Line 25 will fail on runtimes.map(...) and mask the real setup error. Initializing this to [] makes cleanup safe.
🛠️ Proposed fix
- let runtimes: Array<Durably<any, any>>
+ let runtimes: Array<Durably<any, any>> = []Also applies to: 24-25
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/durably/tests/browser/db-stress.test.ts` at line 10, The variable
runtimes is declared but not initialized, so if beforeEach throws the teardown
that calls runtimes.map(...) will fail and hide the original error; initialize
runtimes to an empty array at declaration (change the declaration of runtimes in
db-stress.test.ts to let runtimes: Array<Durably<any, any>> = []) so
beforeEach/teardown can safely run even when setup fails; ensure any teardown
logic (the function that iterates runtimes) relies on the now-initialized
runtimes variable.
There was a problem hiding this comment.
Rate Limit Exceeded
@coji have exceeded the limit for the number of chat messages per hour. Please wait 0 minutes and 0 seconds before sending another message.
- Add leaseOwner field to Run type tables in API docs - Fix ClientRun exclusion list to include leaseOwner - Update JSDoc comments for lease terminology - Fix lease renewal test to properly capture initial lease after claim - Add OPFS Secure Context note in deployment RFC - Fix example code in fullstack docs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
♻️ Duplicate comments (4)
website/api/durably-react/fullstack.md (1)
463-463:⚠️ Potential issue | 🟡 MinorUpdate description to match code examples.
The description states "Cancel a leased job" but the code examples at lines 112-114 and 434 both demonstrate cancelling runs with either
pendingorleasedstatus. Update the description to reflect both statuses.📝 Suggested fix
-| `cancel` | `(runId: string) => Promise<void>` | Cancel a leased job | +| `cancel` | `(runId: string) => Promise<void>` | Cancel a pending or leased job |🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@website/api/durably-react/fullstack.md` at line 463, Update the table description for the `cancel` method signature `(runId: string) => Promise<void>` so it matches the examples: change the text from "Cancel a leased job" to something like "Cancel a pending or leased run" (or "Cancel a run in pending or leased status") to reflect that examples at lines showing canceling runs handle both `pending` and `leased` statuses; update the single-cell table entry where `cancel` appears to use this new wording.packages/durably/src/context.ts (1)
119-136:⚠️ Potential issue | 🟠 MajorPost-step lease ownership re-check is still missing before persisting checkpoint.
After
fn()resolves at line 121, the code checks the local abort signal at line 122 but does not re-fetch the run from the database to verifystatus === 'leased' && leaseOwner === workerIdbefore callingcreateStep()andadvanceRunStepIndex(). If the lease expired during step execution and another worker reclaimed the run, this worker could still persist a stale checkpoint.The local abort signal relies on the
leaseDeadlineTimerindurably.tsfiring, but there's a race window where the timer hasn't fired yet while the DB lease has already expired.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/src/context.ts` around lines 119 - 136, After fn(controller.signal) resolves and throwIfAborted() is called, re-fetch the run from storage to ensure the lease still belongs to this worker before persisting the checkpoint; specifically, call storage.getRun(run.id) (or equivalent) and verify fetchedRun.status === 'leased' && fetchedRun.leaseOwner === workerId, and only then call storage.createStep(...) and storage.advanceRunStepIndex(...); if the check fails, abort without writing the step or advancing the index (cleanly return or throw) to avoid persisting a stale checkpoint.docs/rfcs/runtime-rearchitecture/core-runtime.md (1)
206-209:⚠️ Potential issue | 🟡 MinorClarify whether Phase 1 changes the default step cleanup behavior.
This section states "preserve step history" as the default, which appears to conflict with the existing
cleanupStepscontract. Based on learnings, step output data is deleted when runs reach terminal state (controlled bycleanupStepsconfiguration).If Phase 1 intentionally changes this default to preserve step history, please state that explicitly and note the behavioral change for adapter implementors. If Phase 1 does NOT change the
cleanupStepscontract, update the text to clarify that this is describing a recommended pattern rather than a new default behavior.Based on learnings, step output data is deleted when runs reach terminal state, controlled by
cleanupSteps.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/rfcs/runtime-rearchitecture/core-runtime.md` around lines 206 - 209, Clarify whether Phase 1 changes the default step cleanup behavior: update the prose around "preserve step history" to explicitly state if Phase 1 changes the existing cleanupSteps contract or not; if Phase 1 preserves history by default, say "Phase 1 changes the default: runs will preserve step history and will not delete step output on terminal state" and call out adapter implementors to update their implementations accordingly (reference cleanupSteps and StepRecord); otherwise, mark the text as a recommended pattern (not a default) and add a short note that existing behavior—where step output data is deleted at terminal state controlled by cleanupSteps—remains unchanged.docs/rfcs/runtime-rearchitecture/ja/core-runtime.md (1)
206-209:⚠️ Potential issue | 🟡 MinorPhase 1 でステップクリーンアップのデフォルト動作が変更されるか明示してください。
このセクションでは「ステップ履歴を保持します」がデフォルトであると述べていますが、既存の
cleanupSteps契約と矛盾する可能性があります。学習によれば、ステップ出力データは run が終了状態に達したときに削除されます(cleanupSteps設定で制御)。Phase 1 で意図的にこのデフォルトを変更する場合は、明示的に述べ、adapter 実装者向けに動作変更を記載してください。Phase 1 で
cleanupSteps契約を変更しない場合は、これが新しいデフォルト動作ではなく推奨パターンを説明していることを明確にしてください。Based on learnings, step output data is deleted when runs reach terminal state, controlled by
cleanupSteps.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/rfcs/runtime-rearchitecture/ja/core-runtime.md` around lines 206 - 209, 明確化が必要です: ドキュメント内の「ステップ履歴を保持します」という文言が既存の cleanupSteps 契約と矛盾しているため、Phase 1 で実際にデフォルト動作を変更するのかを明示してください。もし Phase 1 でデフォルトを変更するなら「Phase 1 で cleanupSteps のデフォルト挙動を変更し、ステップ出力を保持する」と明記し、adapter 実装者向けに cleanupSteps 契約の新仕様と移行手順を追記してください。変更しないなら、この箇所を「推奨パターン(デフォルトではない)」と明確に書き換え、既存の cleanupSteps 設定(run が終端状態のときにステップ出力を削除する挙動)を参照して矛盾を解消してください。
🧹 Nitpick comments (1)
docs/rfcs/runtime-rearchitecture/core-runtime.md (1)
497-497: Minor: Consider simpler phrasing.The phrase "in support of" could be shortened to "supporting" for conciseness, though the current phrasing is acceptable.
✍️ Optional simplification
-3. Claim and reclaim become adapter-level semantics in support of runtime execution. +3. Claim and reclaim become adapter-level semantics supporting runtime execution.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/rfcs/runtime-rearchitecture/core-runtime.md` at line 497, Replace the phrase "in support of" with the shorter "supporting" in the sentence "Claim and reclaim become adapter-level semantics in support of runtime execution." (edit the sentence in core-runtime.md so it reads "Claim and reclaim become adapter-level semantics supporting runtime execution.")
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@docs/rfcs/runtime-rearchitecture/core-runtime.md`:
- Around line 206-209: Clarify whether Phase 1 changes the default step cleanup
behavior: update the prose around "preserve step history" to explicitly state if
Phase 1 changes the existing cleanupSteps contract or not; if Phase 1 preserves
history by default, say "Phase 1 changes the default: runs will preserve step
history and will not delete step output on terminal state" and call out adapter
implementors to update their implementations accordingly (reference cleanupSteps
and StepRecord); otherwise, mark the text as a recommended pattern (not a
default) and add a short note that existing behavior—where step output data is
deleted at terminal state controlled by cleanupSteps—remains unchanged.
In `@docs/rfcs/runtime-rearchitecture/ja/core-runtime.md`:
- Around line 206-209: 明確化が必要です: ドキュメント内の「ステップ履歴を保持します」という文言が既存の cleanupSteps
契約と矛盾しているため、Phase 1 で実際にデフォルト動作を変更するのかを明示してください。もし Phase 1 でデフォルトを変更するなら「Phase 1
で cleanupSteps のデフォルト挙動を変更し、ステップ出力を保持する」と明記し、adapter 実装者向けに cleanupSteps
契約の新仕様と移行手順を追記してください。変更しないなら、この箇所を「推奨パターン(デフォルトではない)」と明確に書き換え、既存の cleanupSteps
設定(run が終端状態のときにステップ出力を削除する挙動)を参照して矛盾を解消してください。
In `@packages/durably/src/context.ts`:
- Around line 119-136: After fn(controller.signal) resolves and throwIfAborted()
is called, re-fetch the run from storage to ensure the lease still belongs to
this worker before persisting the checkpoint; specifically, call
storage.getRun(run.id) (or equivalent) and verify fetchedRun.status === 'leased'
&& fetchedRun.leaseOwner === workerId, and only then call
storage.createStep(...) and storage.advanceRunStepIndex(...); if the check
fails, abort without writing the step or advancing the index (cleanly return or
throw) to avoid persisting a stale checkpoint.
In `@website/api/durably-react/fullstack.md`:
- Line 463: Update the table description for the `cancel` method signature
`(runId: string) => Promise<void>` so it matches the examples: change the text
from "Cancel a leased job" to something like "Cancel a pending or leased run"
(or "Cancel a run in pending or leased status") to reflect that examples at
lines showing canceling runs handle both `pending` and `leased` statuses; update
the single-cell table entry where `cancel` appears to use this new wording.
---
Nitpick comments:
In `@docs/rfcs/runtime-rearchitecture/core-runtime.md`:
- Line 497: Replace the phrase "in support of" with the shorter "supporting" in
the sentence "Claim and reclaim become adapter-level semantics in support of
runtime execution." (edit the sentence in core-runtime.md so it reads "Claim and
reclaim become adapter-level semantics supporting runtime execution.")
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 54d18a25-24a6-411b-92f3-e79b204259a7
📒 Files selected for processing (33)
.github/workflows/ci.ymlCLAUDE.mddocs/rfcs/runtime-rearchitecture/README.mddocs/rfcs/runtime-rearchitecture/core-runtime.mddocs/rfcs/runtime-rearchitecture/database-claim-patterns.mddocs/rfcs/runtime-rearchitecture/deployment-models.mddocs/rfcs/runtime-rearchitecture/ja/README.mddocs/rfcs/runtime-rearchitecture/ja/core-runtime.mddocs/rfcs/runtime-rearchitecture/ja/database-claim-patterns-ja.mddocs/rfcs/runtime-rearchitecture/phase1-exploration-notes.mdexamples/server-node/basic.tspackage.jsonpackages/durably-react/src/hooks/use-job-run.tspackages/durably-react/src/hooks/use-job.tspackages/durably/package.jsonpackages/durably/src/context.tspackages/durably/src/durably.tspackages/durably/src/index.tspackages/durably/src/job.tspackages/durably/src/migrations.tspackages/durably/src/storage.tspackages/durably/tests/browser/db-stress.test.tspackages/durably/tests/node/phase1-exploration.test.tspackages/durably/tests/shared/db-concurrency.shared.tspackages/durably/tests/shared/db-semantics.shared.tspackages/durably/tests/shared/db-stress.shared.tspackages/durably/tests/shared/durably.shared.tspackages/durably/tests/shared/recovery.shared.tspackages/durably/tests/shared/storage.shared.tswebsite/api/create-durably.mdwebsite/api/durably-react/fullstack.mdwebsite/api/durably-react/types.mdwebsite/api/index.md
🚧 Files skipped from review as they are similar to previous changes (5)
- packages/durably/tests/shared/db-stress.shared.ts
- packages/durably/tests/browser/db-stress.test.ts
- packages/durably/tests/node/phase1-exploration.test.ts
- packages/durably/tests/shared/db-concurrency.shared.ts
- docs/rfcs/runtime-rearchitecture/ja/README.md
OPFS requires exclusive access, so only one tab can run Durably at a time. Use Web Locks API to detect when another tab holds the lock and show a friendly message instead of crashing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Update RFC checkpoint retention section to reflect current defaults - Initialize runtimes array to prevent teardown crash on setup failure - Fix preserveSteps comment in llms.md (was misleading about default) - Add busy_timeout pragma to local SQLite test helper Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
createStep() and advanceRunStepIndex() now verify lease ownership (status='leased' AND lease_owner=workerId) before writing, closing the race window where a stale worker could persist steps after its lease expired and was reclaimed by another worker. completeRun/failRun already had these guards; this extends the same protection to step-level writes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update the runtime rearchitecture RFC (EN/JA) with design decisions from Phase 1 implementation: - Add leaseGeneration (fencing token) to RunRecord and Store interface - Replace workerId-based guards with leaseGeneration across all lease-holder writes (renewLease, completeRun, failRun, persistStep) - Introduce persistStep as atomic replacement for createStep + advanceRunStepIndex (eliminates TOCTOU window) - Add completed step uniqueness via partial unique index - Document side effect boundaries and step idempotency requirements Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace scattered workerId-based WHERE guards with a monotonically increasing leaseGeneration counter (fencing token) for all lease-holder writes. This eliminates workerId reuse collisions and provides a single, unified guard mechanism. Key changes: - Add lease_generation column to durably_runs (migration v4) - Replace createStep + advanceRunStepIndex with atomic persistStep using INSERT...SELECT guarded by lease_generation - Add partial unique index on completed steps for replay determinism - Update renewLease, completeRun, failRun to use leaseGeneration - Update all tests to use the new API Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add lease_generation field and cancelled step status to the database schema description. Note partial unique index on completed steps. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Extract 'lease-lost' magic string to LEASE_LOST constant in context.ts - Unify completeRun/failRun copy-paste into shared terminateRun helper Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
cancel() works on both pending and leased runs, not just leased. Updated in fullstack.md, http-handler.md, and error-handling.md. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
coji
left a comment
There was a problem hiding this comment.
Re: Duplicate comments (4)
1. fullstack.md — "Cancel a leased job" → Fixed
Updated to "Cancel a pending or leased run" in fullstack.md, http-handler.md, and error-handling.md (4756f82).
2. context.ts — "Post-step lease ownership re-check missing" → Already resolved
This was flagged against the old createStep + advanceRunStepIndex code. The current implementation uses persistStep which does an atomic INSERT...SELECT WHERE lease_generation = ? — the DB-level guard ensures no stale checkpoint can be written without an extra round-trip. No code change needed.
3 & 4. core-runtime.md / ja/core-runtime.md — "preserve step history" default → Already correct
Both EN (line 251) and JA (line 234) already state: "deletes step output data by default (preserveSteps: false)" with preserveSteps: true as an opt-in. The StepRecord interface at lines 206-209 that triggered this comment is just a type definition, not a default-behavior statement. No change needed.
…rite failure - Add leaseGeneration parameter to updateProgress() to prevent stale workers from polluting another worker's progress data - Emit worker:error events when completeRun/failRun returns false (lease lost), improving debuggability Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
leasedstatus,lease_owner,lease_expires_at) withprocessOne()/processUntilIdle()APIsFOR UPDATE SKIP LOCKED+pg_advisory_xact_lockfor atomicity and concurrency-key serializationQueueStore/CheckpointStorewith per-backend adapter tests (PostgreSQL, SQLite, libSQL, SQLocal)Key design decisions
FOR UPDATE SKIP LOCKEDalone can't prevent two workers from claiming different rows with the same key. Advisory locks serialize per key; retry loop ensures unrelated work isn't suppressed.Test plan
pnpm validatepassesskipDirectConcurrentClaimRace)🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Architecture & API Updates
React Updates
Tests & Tooling
Chores