Part of the NDI-matlab did2 migration, but the PR lives in waltham-data-science/ndi-cloud-node, not NDI-matlab.
Scope
Add a general-purpose per-dataset exclusive write lock to the cloud. The did2 migration (issue 10) is the first consumer, but the same mechanism is intended to be reused for any future maintenance operation that needs to quiesce writers on a dataset (schema repair, bulk re-ingest, dedup passes, backup/restore, etc.).
Why server-side, not client-side: the lock must be enforced by the component that accepts writes; otherwise other clients (older versions, scripts that call the REST API directly, a crashed migrator) can still write and corrupt mid-flight state. A client-side check also has an unavoidable TOCTOU race: a client can see the lock as idle, and another client can acquire it before the first client's write arrives. Server-side enforcement also gives us an audit trail (who held the lock, when, and why), which is the HIPAA-friendlier posture.
Schema changes on Dataset
Add a writeLock subdocument:
```
writeLock: {
  state: 'idle' | 'held',
  heldBy: string,      // user/client identifier
  heldUntil: Date,     // server-side expiry
  acquiredAt: Date,
  reason: string       // free-form, e.g. 'did2-migration', 'schema-repair'
}
```
- TTL ~30 minutes.
- Refreshable by the holder; a refresh extends heldUntil.
- Auto-expires: once heldUntil is in the past, the lock is considered released. The next write attempt observes this and clears the field (lazy expiry; no background sweeper required for v1).
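Lazy expiry means the stored state and the effective state can disagree. A lock whose heldUntil has passed may still be stored as 'held' until the next write clears it. The sketch below assumes a TypeScript codebase and shows how a read-only status check, such as the GET inspect endpoint, could report such a lock as released without writing anything. WriteLock, LockView, isEffectivelyHeld, inspect, and DEFAULT_TTL_MS are hypothetical names. The 30-minute default is taken from the TTL above.

```typescript
// Hypothetical TypeScript mirror of the Dataset.writeLock subdocument.
interface WriteLock {
  state: 'idle' | 'held';
  heldBy: string;    // user/client identifier
  heldUntil: Date;   // server-side expiry
  acquiredAt: Date;
  reason: string;    // free-form, e.g. 'did2-migration', 'schema-repair'
}

// What a status check reports (dates serialized as ISO-8601 strings).
interface LockView {
  state: 'idle' | 'held';
  heldBy?: string;
  heldUntil?: string;
  reason?: string;
}

const DEFAULT_TTL_MS = 30 * 60 * 1000; // assumed default, ~30 minutes

// Stored state may still say 'held' after heldUntil has passed, because clearing
// is deferred to the next write. Only an unexpired 'held' lock actually counts.
function isEffectivelyHeld(lock: WriteLock | undefined, now: Date): lock is WriteLock {
  return lock !== undefined && lock.state === 'held' && lock.heldUntil.getTime() > now.getTime();
}

// Read-only status check: reports a lapsed lock as idle and never mutates `lock`.
function inspect(lock: WriteLock | undefined, now: Date): LockView {
  if (!isEffectivelyHeld(lock, now)) return { state: 'idle' };
  return {
    state: 'held',
    heldBy: lock.heldBy,
    heldUntil: lock.heldUntil.toISOString(),
    reason: lock.reason,
  };
}
```

With this split, readers never pay for cleanup. Only the write path persists the transition to 'idle'.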
New endpoints
POST /v1/datasets/:id/write-lock — acquire. Body: { reason: string, ttlSeconds?: number }. Returns 409 if already held by someone else (response body includes heldBy, heldUntil, reason so the caller can surface a useful message). 200 if acquired (or refreshed by the existing holder, idempotent).
DELETE /v1/datasets/:id/write-lock — release. Only the holder may release. 403 otherwise.
PATCH /v1/datasets/:id/write-lock — refresh (extends heldUntil). Only the holder. 403 otherwise. 410 if the lock has expired since the holder last touched it.
GET /v1/datasets/:id/write-lock — inspect (useful for diagnostics and for clients deciding whether to wait or fail fast).
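The status-code rules for acquire, refresh, and release can be sketched as pure functions over the stored lock. The caller loads writeLock, calls one function, and persists the returned lock if one is present. This is a storage-agnostic sketch in TypeScript, not the ndi-cloud-node implementation. The function and type names are hypothetical, and two behaviours go beyond the text above and are labeled as assumptions in the code:
- "the holder" is whoever the stored record names, even after the lock lapses, which is how refresh can tell 410 apart from 403;
- release keeps the holder fields and only flips the state, for the audit trail.

```typescript
// Hypothetical, storage-agnostic sketch of the acquire/refresh/release rules.
interface WriteLock {
  state: 'idle' | 'held';
  heldBy: string;
  heldUntil: Date;
  acquiredAt: Date;
  reason: string;
}

interface LockResult {
  status: 200 | 403 | 409 | 410;
  lock?: WriteLock; // new value to persist; absent means "leave unchanged"
  body?: { heldBy: string; heldUntil: Date; reason: string }; // 409 details
}

const DEFAULT_TTL_SECONDS = 30 * 60; // assumed default, ~30 minutes

function isActive(lock: WriteLock | undefined, now: Date): lock is WriteLock {
  return lock !== undefined && lock.state === 'held' && lock.heldUntil.getTime() > now.getTime();
}

function expiresAt(now: Date, ttlSeconds: number): Date {
  return new Date(now.getTime() + ttlSeconds * 1000);
}

// POST /v1/datasets/:id/write-lock
function acquire(
  lock: WriteLock | undefined, caller: string, reason: string,
  now: Date, ttlSeconds: number = DEFAULT_TTL_SECONDS,
): LockResult {
  if (isActive(lock, now)) {
    if (lock.heldBy !== caller) {
      // Held by someone else: 409 with enough detail for a useful message.
      return { status: 409, body: { heldBy: lock.heldBy, heldUntil: lock.heldUntil, reason: lock.reason } };
    }
    // Idempotent re-acquire by the holder: keep acquiredAt, extend heldUntil.
    return { status: 200, lock: { ...lock, reason, heldUntil: expiresAt(now, ttlSeconds) } };
  }
  // Idle or lapsed: the caller takes the lock.
  return {
    status: 200,
    lock: { state: 'held', heldBy: caller, heldUntil: expiresAt(now, ttlSeconds), acquiredAt: now, reason },
  };
}

// PATCH /v1/datasets/:id/write-lock
function refresh(
  lock: WriteLock | undefined, caller: string,
  now: Date, ttlSeconds: number = DEFAULT_TTL_SECONDS,
): LockResult {
  // Assumption: the stored heldBy identifies the holder, even if the lock lapsed.
  if (lock === undefined || lock.heldBy !== caller) return { status: 403 };
  // The caller's lock expired since it last touched it.
  if (!isActive(lock, now)) return { status: 410 };
  return { status: 200, lock: { ...lock, heldUntil: expiresAt(now, ttlSeconds) } };
}

// DELETE /v1/datasets/:id/write-lock
function release(lock: WriteLock | undefined, caller: string): LockResult {
  if (lock === undefined || lock.heldBy !== caller) return { status: 403 };
  // Assumption: keep holder fields for the audit trail; only flip the state.
  return { status: 200, lock: { ...lock, state: 'idle' } };
}
```

Against a real database, the read and the write for acquire need to be one atomic conditional update, such as a compare-and-set on writeLock. Otherwise two concurrent acquires can both observe an idle lock and both return 200.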
Write-path enforcement
In every Document write path — single create, bulk create, update, delete — check the parent dataset's writeLock. If state == 'held', heldUntil is in the future, and the request isn't from the holder, reject with 423 Locked. The 423 response body should include heldBy, heldUntil, and reason so callers (including the NDI-matlab UI) can show something like "dataset is locked for: did2-migration, expires 14:32 UTC".
- Reads are not blocked — maintenance windows must not break reader clients.
- Holder can still write (so the migrator can push converted docs back).
- Expired lock auto-releases on next write attempt (lazy expiry).
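The enforcement rule above can be sketched as a single guard that every Document write path calls before touching storage. Read paths never call it. The sketch is in TypeScript; checkWrite, formatLockMessage, LockedBody, and GuardResult are hypothetical names. It also assumes that heldUntil goes out as an ISO-8601 string in the 423 body, which is how a JavaScript Date serializes to JSON. The message helper shows how a caller could produce the "dataset is locked for: …" text from the body.

```typescript
// Hypothetical write-path guard for single create, bulk create, update, delete.
interface WriteLock {
  state: 'idle' | 'held';
  heldBy: string;
  heldUntil: Date;
  acquiredAt: Date;
  reason: string;
}

interface LockedBody {
  heldBy: string;
  heldUntil: string; // ISO-8601 (assumed wire format)
  reason: string;
}

type GuardResult =
  | { ok: true; clearedLock?: WriteLock } // clearedLock: persist this (lazy expiry)
  | { ok: false; status: 423; body: LockedBody };

function checkWrite(lock: WriteLock | undefined, caller: string, now: Date): GuardResult {
  if (lock === undefined || lock.state !== 'held') return { ok: true };
  if (lock.heldUntil.getTime() <= now.getTime()) {
    // Lazy expiry: the lock lapsed, so clear it and let this write proceed.
    return { ok: true, clearedLock: { ...lock, state: 'idle' } };
  }
  if (lock.heldBy === caller) return { ok: true }; // the holder may still write
  return {
    ok: false,
    status: 423,
    body: { heldBy: lock.heldBy, heldUntil: lock.heldUntil.toISOString(), reason: lock.reason },
  };
}

// How a caller might turn a 423 body into a user-facing message (times in UTC).
function formatLockMessage(body: LockedBody): string {
  const until = new Date(body.heldUntil);
  const hh = ('0' + until.getUTCHours()).slice(-2);
  const mm = ('0' + until.getUTCMinutes()).slice(-2);
  return `dataset is locked for: ${body.reason}, expires ${hh}:${mm} UTC`;
}
```

Having one guard shared by all four write paths makes it harder for a newly added write endpoint to skip the check.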
Tests
- Acquire / release / refresh / expire paths.
- Acquire is idempotent for the current holder; 409 for anyone else.
- Only the holder can release (403 otherwise) and refresh (403 otherwise).
- Write-rejection with 423 during the held window for non-holders.
- Holder can still write while the lock is held.
- Expired lock auto-releases on the next write attempt and the write proceeds.
- Reads succeed while the lock is held.
- 423 response body includes heldBy, heldUntil, reason.
Out of scope
- Cloud-side did2 conversion — by design, conversion stays on the client (see the design discussion that led to this issue list).
- Multi-granularity locks (per-document, per-collection). Single dataset-wide exclusive lock only.
- Named/multiple concurrent locks on the same dataset. One lock per dataset.
- Background sweepers / cron-style expiry. Lazy expiry is sufficient for v1.
Where to open the PR
waltham-data-science/ndi-cloud-node. Target branch: claude/ndi-matlab-did2-migration-Dp03V (already exists), or a fresh Vnext if cut.
Dependencies
None. Can land any time before issue 10's PR opens against the cloud.
Note on naming
Earlier drafts of this issue called the field migrationState and the endpoints /migration-lease. Renamed to writeLock / /write-lock because the mechanism is a general-purpose exclusive write lock — did2 migration is just the first reason value.