[did2 #11] [ndi-cloud-node] Dataset writeLock: exclusive write lease + 423 rejection #784

@stevevanhooser

Part of the NDI-matlab did2 migration, but the PR lives in waltham-data-science/ndi-cloud-node, not NDI-matlab.

Scope

Add a general-purpose per-dataset exclusive write lock to the cloud. The did2 migration (issue 10) is the first consumer, but the same mechanism is intended to be reused for any future maintenance operation that needs to quiesce writers on a dataset (schema repair, bulk re-ingest, dedup passes, backup/restore, etc.).

Why server-side, not client-side: the lock must be enforced by the entity that accepts writes; otherwise other clients (older versions, scripts using the REST API directly, a crashed migrator) can still write and corrupt mid-flight state. A client-side check also has an unavoidable time-of-check-to-time-of-use (TOCTOU) race: another writer can slip in between the check and the write. Finally, server-side enforcement gives us an audit trail (who held the lock, when, and why), which is the more HIPAA-friendly posture.

Schema changes on Dataset

Add a writeLock subdocument:

writeLock: {
  state: 'idle' | 'held',
  heldBy: string,        // user/client identifier
  heldUntil: Date,       // server-side expiry
  acquiredAt: Date,
  reason: string         // free-form, e.g. 'did2-migration', 'schema-repair'
}
  • Default TTL: ~30 minutes (the acquire endpoint accepts an optional ttlSeconds override).
  • Refreshable by the holder; extends heldUntil.
  • Auto-expires: once heldUntil is in the past, the lock is considered released. The next write attempt observes this and clears the field (lazy expiry; no background sweeper required for v1).
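The TTL, refresh, and lazy-expiry rules above can be sketched as a few pure functions. This is only an illustration, not the proposed implementation: the discriminated-union shape (an idle lock carrying no holder fields), the helper names isActive, expireIfStale, and refreshed, and the choice to treat heldUntil equal to "now" as already expired are all assumptions.

```typescript
// Hypothetical sketch. Only the field names come from the writeLock
// subdocument above; everything else is an assumption for illustration.
type WriteLock =
  | { state: 'idle' }
  | {
      state: 'held';
      heldBy: string;
      heldUntil: Date;
      acquiredAt: Date;
      reason: string;
    };

const DEFAULT_TTL_SECONDS = 30 * 60; // ~30 minutes

// A lock counts as active only while heldUntil is strictly in the future.
function isActive(lock: WriteLock, now: Date): boolean {
  return lock.state === 'held' && lock.heldUntil.getTime() > now.getTime();
}

// Lazy expiry: run on a write attempt; returns the lock value to persist.
// A stale 'held' lock collapses to 'idle', so no background sweeper is needed.
function expireIfStale(lock: WriteLock, now: Date): WriteLock {
  return isActive(lock, now) ? lock : { state: 'idle' };
}

// Refresh by the holder: extends heldUntil from "now"; acquiredAt is untouched.
function refreshed(
  lock: WriteLock,
  now: Date,
  ttlSeconds: number = DEFAULT_TTL_SECONDS,
): WriteLock {
  if (lock.state !== 'held') return lock;
  return { ...lock, heldUntil: new Date(now.getTime() + ttlSeconds * 1000) };
}
```

Keeping these helpers free of I/O means the expiry rules can be unit-tested with an injected clock, independent of the datastore.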

New endpoints

  • POST /v1/datasets/:id/write-lock — acquire. Body: { reason: string, ttlSeconds?: number }. Returns 409 if already held by someone else (response body includes heldBy, heldUntil, reason so the caller can surface a useful message). 200 if acquired (or refreshed by the existing holder, idempotent).
  • DELETE /v1/datasets/:id/write-lock — release. Only the holder may release. 403 otherwise.
  • PATCH /v1/datasets/:id/write-lock — refresh (extends heldUntil). Only the holder. 403 otherwise. 410 if the lock has expired since the holder last touched it.
  • GET /v1/datasets/:id/write-lock — inspect (useful for diagnostics and for clients deciding whether to wait or fail fast).
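The status-code rules for acquire, release, and refresh can be sketched as pure decision functions that a route handler might call. This is a rough sketch under stated assumptions: the names HeldLock, Result, acquire, release, and refresh are hypothetical, null stands in for an idle lock, and the responses for cases the list above does not cover (releasing when no lock is live, refreshing when no lock exists) are guesses rather than part of the proposal. A real handler would also need to apply the decision atomically against the datastore.

```typescript
// Hypothetical sketch of the endpoint decisions; not the actual handlers.
interface HeldLock {
  heldBy: string;
  heldUntil: Date;
  acquiredAt: Date;
  reason: string;
}
type Lock = HeldLock | null; // null stands in for state 'idle'

interface Result {
  status: number;
  lock: Lock; // lock value to persist (and to echo in the response body)
}

const isLive = (l: HeldLock, now: Date): boolean =>
  l.heldUntil.getTime() > now.getTime();

// POST: 409 if someone else holds a live lock; otherwise 200.
function acquire(lock: Lock, caller: string, reason: string, ttlSeconds: number, now: Date): Result {
  const heldUntil = new Date(now.getTime() + ttlSeconds * 1000);
  if (lock !== null && isLive(lock, now)) {
    if (lock.heldBy !== caller) return { status: 409, lock };
    return { status: 200, lock: { ...lock, heldUntil } }; // idempotent re-acquire
  }
  return { status: 200, lock: { heldBy: caller, heldUntil, acquiredAt: now, reason } };
}

// DELETE: only the holder may release a live lock.
function release(lock: Lock, caller: string, now: Date): Result {
  if (lock !== null && isLive(lock, now) && lock.heldBy !== caller) {
    return { status: 403, lock };
  }
  return { status: 200, lock: null }; // assumption: releasing an idle/expired lock is a no-op
}

// PATCH: 403 for non-holders, 410 if the lock has already expired.
function refresh(lock: Lock, caller: string, ttlSeconds: number, now: Date): Result {
  if (lock === null) return { status: 410, lock: null }; // assumption: nothing to refresh
  if (lock.heldBy !== caller) return { status: 403, lock };
  if (!isLive(lock, now)) return { status: 410, lock: null };
  return { status: 200, lock: { ...lock, heldUntil: new Date(now.getTime() + ttlSeconds * 1000) } };
}
```

For example, acquire(null, 'alice', 'did2-migration', 1800, new Date()) yields status 200 with a fresh lock, and a second acquire by a different caller against that lock yields 409 with the existing lock available for the response body.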

Write-path enforcement

In every Document write path — single create, bulk create, update, delete — check the parent dataset's writeLock. If state == 'held', heldUntil is in the future, and the request isn't from the holder, reject with 423 Locked. The 423 response body should include heldBy, heldUntil, and reason so callers (including the NDI-matlab UI) can show something like "dataset is locked for: did2-migration, expires 14:32 UTC".

  • Reads are not blocked — maintenance windows must not break reader clients.
  • Holder can still write (so the migrator can push converted docs back).
  • Expired lock auto-releases on next write attempt (lazy expiry).
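A minimal sketch of the write-path check follows, written as a pure function so it could be shared by the single create, bulk create, update, and delete paths. The names DatasetWriteLock, WriteDecision, and checkWrite are hypothetical, as is the choice to serialize heldUntil as an ISO 8601 string in the 423 body; how the caller identity is derived from the request is left out entirely.

```typescript
// Hypothetical sketch; only the writeLock field names come from this issue.
interface DatasetWriteLock {
  state: 'idle' | 'held';
  heldBy: string;
  heldUntil: Date;
  acquiredAt: Date;
  reason: string;
}

type WriteDecision =
  | { allowed: true; clearLock: boolean } // clearLock: persist lazy expiry
  | {
      allowed: false;
      status: 423;
      body: { heldBy: string; heldUntil: string; reason: string };
    };

function checkWrite(
  lock: DatasetWriteLock | undefined,
  caller: string,
  now: Date,
): WriteDecision {
  // No lock, or lock idle: writes proceed as usual.
  if (!lock || lock.state !== 'held') return { allowed: true, clearLock: false };

  // Lazy expiry: the lock has lapsed, so allow the write and clear the field.
  if (lock.heldUntil.getTime() <= now.getTime()) {
    return { allowed: true, clearLock: true };
  }

  // The holder may keep writing (e.g. the migrator pushing converted docs).
  if (lock.heldBy === caller) return { allowed: true, clearLock: false };

  // Everyone else gets 423 Locked with enough detail for a useful message.
  return {
    allowed: false,
    status: 423,
    body: {
      heldBy: lock.heldBy,
      heldUntil: lock.heldUntil.toISOString(),
      reason: lock.reason,
    },
  };
}
```

Read paths never call this check, which is what keeps reads unblocked during a maintenance window.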

Tests

  • Acquire / release / refresh / expire paths.
  • Acquire is idempotent for the current holder; 409 for anyone else.
  • Only the holder can release or refresh (403 for anyone else).
  • Refresh returns 410 once the lock has expired.
  • Write-rejection with 423 during the held window for non-holders.
  • Holder can still write while the lock is held.
  • Expired lock auto-releases on the next write attempt and the write proceeds.
  • Reads succeed while the lock is held.
  • 423 response body includes heldBy, heldUntil, reason.

Out of scope

  • Cloud-side did2 conversion — by design, conversion stays on the client (see the design discussion that led to this issue list).
  • Multi-granularity locks (per-document, per-collection). Single dataset-wide exclusive lock only.
  • Named/multiple concurrent locks on the same dataset. One lock per dataset.
  • Background sweepers / cron-style expiry. Lazy expiry is sufficient for v1.

Where to open the PR

waltham-data-science/ndi-cloud-node. Target branch: claude/ndi-matlab-did2-migration-Dp03V (already exists), or a fresh Vnext branch if one has been cut.

Dependencies

None. Can land any time before issue 10's PR opens against the cloud.

Note on naming

Earlier drafts of this issue called the field migrationState and the endpoints /migration-lease. Renamed to writeLock / /write-lock because the mechanism is a general-purpose exclusive write lock — did2 migration is just the first reason value.
